Lazy Iteration is being actively researched recently. There are two main strategies
- Generators: C# with LINQ, Javascript Generators, Python List Comprehension
- Callback: NodeJS Events, Ruby Blocks
Generators are widely implemented and their use cases are quite well understood. Mainstream languages just recently implemented lambda functions (Lisp had them since 1958!) which are required for iteration with callback. This article introduces Streams, which is the most basic way to do iteration with callback.
Lazy Iteration
An iterator is used to modularize your code. You first put all your values in a container. Then you have an iterator that iterates through the container, and finally a code that processes the values.
The goal of the lazy iteration is to merge the generation and iteration steps. This has several benefits:
- No storage: values are generated, processed and then garbage collected.
- Arbitrary number of values: Can represent asynchronous I/O.
- Earlier Process: As soon as one value is generated, it can be processed. Better for user interactivity.
Generators
Lazy iteration is traditionally done with generators. A generator is a function that each time it is being called, returns the next value. Language makers allow to use yield
to return multiple values. When you call the function again, it jumps back where it stopped.
function generator() { for (var x = 0; x < 10; ++x) { for (var y = 0; y < 10; ++y) { yield [x, y]; } } } while (value in generator()) { // Do something with value } |
However, the implementation of generators is not trivial. There are several articles that explain how it works in C#: Behind the Scenes - Yield Keyword, What does Yield keyword generate.
To give you an idea of the complexity, this is a version of the same generator without yield.
var p = null; function generator() { // First time if (p == null) { p = [0, 0]; return p; } // Loop through the 2 dimensions (x, y) for (var dim = 2 - 1; dim >= 0; --dim) { p[dim] += 1; // i++ if (p[dim] == 10) { // i < 10 p[dim] = 0; // i = 0 continue; } return p; // return [x, y] } // If we are here, we got through all the values. } |
Streams
Generators are not the only way to deal with to do lazy iteration. We can use continuations (or callbacks). We will call "Stream" a function that generates values, and will pass them to the continuation.
function stream(continuation) { for (var x = 0; x < 10; ++x) { for (var y = 0; y < 10; ++y) { continuation([x, y]); // We call the function that will process the value } } } |
As you can see, the code used to generate the values remains unchanged. Now we are going to use the stream: the most basic way to do that is to print the values.
stream(function (value) { console.log(value); }); // [0, 0] // [0, 1] // [0, 2] // ... // [9, 9] |
A stream is a function and we call it with another function. This is probably the first thing that will intrigue you. We just landed into the functional world! A stream is a high-order function. If you are not familiar with this concept, you can think a stream as a foreach function.
function print(stream) { stream(function (v) { // For each value of the stream console.log(v); // Print it }); } |
Rebuilding core functions
Now that we defined what a stream is, we want to see if we can express the basic functional operations that are map, filter and reduce.
function map(stream, f) { return function (continuation) { // We return a stream that stream(function (value) { // For each value of the stream continuation(f(value)); // Continues with the application of f on the value }); }; } function filter(stream, f) { return function (continuation) { // We return a stream that stream(function (value) { // For each value of the stream if (f(value)) { // Tests if it matches the filter continuation(value); // And continues it } }); }; } function reduce(stream, f, initial) { var ret = initial; // Store the initial value stream(function (value) { // For each value of the stream ret = f(value, ret); // Reduce it with the current value }); return ret; // and return the computed value } |
Example: Range
This was probably not clear how to use the object we just built. Let us take a simple example, we are going to play with the simplest thing we can generate: numbers from 0 to 10 🙂
// The range function is a factory for a stream function range(min, max) { return function (continuation) { for (var i = min; i < max; ++i) { // Once we have generated a value, we pass it to the continuation continuation(i); } }; // The returned object is a function that takes a continuation // and calls it with the generated values, // therefore this is a stream } // We create a stream stream = range(0, 10); // A stream takes a continuation that will be executed on all the values it generates print(stream); // 0 1 2 3 4 5 6 7 8 9 // We can use the map, to change every value from v to 2 * v stream = map(stream, function (v) { return 2 * v; }); print(stream); // 0 2 4 6 8 10 12 14 16 18 // And filter only the multiples of 3 stream = filter(stream, function (v) { return v % 3 == 0; }); print(stream); // 0 6 12 18 // In our case the number of generated values is finite // We can therefore reduce them with the + operation reduce(stream, function (a, b) { return a + b; }, 0); // 36 (Note: This is a real value, not a stream) |
Example: Message
A stream is a function that will call the continuation for all the values it generates. Therefore, any API that generates values with a callback can be used as a stream. As an example, we will use the window.postMessage API.
// We create the stream events = function (continuation) { window.addEventListener("message", continuation); } // We only want events that are from a trusted origin trusted_events = filter(events, function (e) { return e.origin == 'http://vjeux.com'; }); // We don't care about the event object, we just want the data messages = map(trusted_events, function (e) { return e.data; }); // Now that we have the messages we wanted // in the form we wanted, we can process them. messages(function (m) { console.log('Message received!', m); }); |
enumerate
We can easily reproduce the enumerate function of Python.
function enumerate(stream) { var i = 0; return map(stream, function (v) { return [i++, v]; }); } print(enumerate(range(5, 10))); // [0, 5] // [1, 6] // [2, 7] // [3, 8] // [4, 9] |
Stream Comprehension
It is quite painful to work with streams for basic operations (filtering and mapping). Languages like Python introduced a shorthand called List Comprehension. It is possible to do the same with streams.
function comprehension(f_map, stream, f_filter) { if (filter) { stream = filter(stream, f_filter); } return map(stream, f_map); } print(comprehension(#(v) { 2 * v }, range(0, 10), #(v) { v % 3 == 0 })); // 0 6 12 18 // Python equivalent: // [2 * v for v in range(0, 10) if v % 3 == 0] |
I make use of the Harmony # function proposal. It is still not really user-friendly. A modification of the language is probably required to make it enjoyable.
Recursive Stream
It took some time for C# to have recursive yield. Our stream proposal on the other hand works directly in a recursive fashion.
function traverse(tree) { return function (continuation) { // We return a stream that continuation(tree.value); // Continues with the value for (var i = 0; i < tree.children.length; ++i) { // And traverse recursively on the children traverse(tree.children[i])(continuation); } }; } var tree = { value: '1', children: [ { value: '1.1', children: [ { value: '1.1.1', children: [] } ] }, { value: '1.2', children: [] } ] }; print(traverse(tree)); // 1 // 1.1 // 1.1.1 // 1.2 |
List
Since we are building a stream using a functional approach, we want to build functional lists. We need two things, an empty list and a way to construct a new list by adding an element to an existing list.
function empty() { // An empty list is a stream that does not call the continuation return function (continuation) { }; } function cons(head, tail) { return function (continuation) { // To construct a new list, we return a stream continuation(head); // That first continues with the head tail(continuation); // and continues the tail }; } print(cons(1, cons(2, cons(3, empty())))); // 1 2 3 |
I am not exactly sure how to write a head & tail function that returns two streams, one with the head and one with the tail.
zip
The zip function takes two streams and returns a single stream where each value is the combination of both streams.
function zip(stream_a, stream_b) { values_a = []; values_b = []; return function (continuation) { stream_a(function (v) { // For each value of stream_a values_a.push(v); // Store it if (values_b.length) { // If there is a value of stream_b awaiting // Continue with both values continuation([values_a.shift(), values_b.shift()]); } }); // Same for stream_b stream_b(function (v) { values_b.push(v); if (values_a.length) { continuation([values_a.shift(), values_b.shift()]); } }); } } print(zip(range(0, 10), range(5, 10))); // [0, 5] // [1, 6] // [2, 7] // [3, 8] // [4, 9] // Note: All the values of the first range after 4 are being ignored // because both streams do not have the same length. |
Infinite zip
Here, the stream is generated by a for
loop. Since the scheduler of Javascript is not pre-emptive, nothing can be executed while we are generating the values. Therefore we need to see all the values of one stream before we can return any value. It is problematic for infinite streams.
print(zip(range(1, Infinity), range(10, Infinity))); // Freeze! |
We are going to simulate a scheduler to solve this problem. We first need a generator. This is a function that will return the next value everytime it is being called. We can either use Firefox 2+ yield to build a generator or do it by hand.
We will construct a stream on top of this generator. The trick is to use window.setTimeout with a zero-delay between the generation of two values. Both streams will be able to generate values in parallel.
// With yield function xrange(min, max) { for (var i = min; i < max; ++i) { yield i; } } // Without yield function StopIteration() {} // We first define the exception function xrange(min, max) { var i = min; // We create the cursor return { // And return an object with a next: function () { // next() that will be called to get the next value if (i == max) { // If we reached the end, throw new StopIteration(); // We throw the StopIteration exception } // else return i++; // We return the value } }; } // Then we make a function that converts a generator to a stream function generator2stream(generator) { return function rec(continuation) { // We return a stream that try { continuation(generator.next()); // continues with the next value of the generator window.setTimeout(function () { // and gives the hand back. rec(continuation); // It will provide the next value }, 0); // as soon as the scheduler calls it again. } catch (e) { // When the generator has finished, it throws a StopIteration exception // We want to ignore it. if (!(e instanceof StopIteration)) { throw e; } } }; } a = generator2stream(xrange(1, Infinity)); b = generator2stream(xrange(10, Infinity)); print(zip(a, b)); // [1, 10] // [2, 11] // [3, 12] // ... |
But if the source of the stream releases the flow of execution between the generation of two values, this works well. This is the case of all async I/O. Let's take as example click and mousemove events. We want to synchronize the click and move events aka everytime both have happend, do something with them.
var click = function (continuation) { $('body').click(function (e) { continuation(e); }); }; var move = function (continuation) { $('body').click(function (e) { continuation(e); }); }; // The 2 lines before are best understood like this: // var click = $('body').click; // var move = $('body').mousemove; // However it is not working because of the dynamic scoping of `this` :( // Return only the type and enumerate for a better display click = enumerate(map(click, function (e) { return e.type; })); move = enumerate(map(move, function (e) { return e.type; })); print(zip(click, move)); // move 0 // move 1 // click 0 // -> [click 0, move 0] // click 1 // -> [click 1, move 1] // click 2 // click 3 // move 2 // -> [click 2, move 2] |
Conclusion
This article showed that Streams supports all the basic iteration techniques. Even better, the implementation of all them is straightforward. This looks all shiny, so why nobody uses it?
I think that it is because it relies heavily on functional programming. Lambda functions is a requirement and unfortunately they used to be only implemented on languages such as Lisp, Haskell, ML ... However, this trend is evolving. Languages with lambda such as Javascript, Python, Ruby are growing, and mainstream languages such as C# and C++ are getting lambdas. The next step is to educate people to functional programming.
If you want to know more about lazy iteration, here are some related links: