
per: composable forward-passing processor functions

The other day I tried implementing SICP streams in JavaScript as part of a fun project I’m tinkering with. I noticed that (unsurprisingly, given how they work) they generate a lot of memory garbage during the main loop of whatever operation you’re doing, and the overhead of this can actually become significant.

So I blogged them, sighed to myself, and then ripped them out of my project. What should I use instead? I want a combination of high performance and convenience. Also I miss generators, which of course I can’t (yet) assume are available on my target platforms, as they include browsers dating as far back as the ancient IE9 (ask your grandmother) and Chrome 30 without the about:flags experimental JavaScript features enabled.

My building block will be a function of this shape, which I’ll refer to as a processor:

function timesTwo(emit, value) {
    emit(value * 2); // this part is optional
}

It takes two parameters, the first being a function to which it can pass values forward, and the second being a value for it to process. So you have to call it to give it a value, and it may or may not emit values onwards to wherever you tell it.
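
For example, calling timesTwo yourself looks like this:

timesTwo(function(value) {
    console.log(value); // logs 42
}, 21);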

Of course it can emit nothing, or many values:

function triplicate(emit, value) {
    emit(value);
    emit(value);
    emit(value);
}

If you’re into Ruby this will be very familiar as a way of representing a sequence of values, and implementing this kind of function is nice and easy: calling emit feels just like using yield in a true generator.

The difference here from the Ruby pattern is that we’re discussing a pure function, rather than a method on an object. So we take a second argument as an input that can be used to determine what we emit. For example, we could assume our values will be arrays, and “flatten” them:

function flatten(emit, value) {
    value.forEach(emit); // JS array's forEach fits perfectly
}

If you called this three times passing an array each time, emit would get all the elements of the three arrays as a flat stream of elements (in separate calls), not knowing which arrays they were originally from.
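
To make that concrete:

var log = function(value) { console.log(value); };

flatten(log, [1, 2]);
flatten(log, [3]);
flatten(log, [4, 5, 6]);
// logs 1, 2, 3, 4, 5, 6 - one flat sequence, six separate calls to log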

Alternatively we can ignore the second parameter (not even bothering to declare it) and so define a pure source of data that emits its sequence of values when called:

function chatter(emit) {
    emit('hi');
    emit('whatever');
    emit('bye');
}

So far, so patterny. But the pain with these kinds of building blocks is that they don’t directly compose. An intermediate processor like flatten wants two arguments: where to emit its output and what value to process. But any preceding step just wants a function called emit that accepts one argument: a value.

We can take any two processors and turn them into a single processor that chains the two (put on your higher-order functional programming spectacles now):

function compose(first, second) {
    return function(emit, value) {
        return first(function(firstValue) {
            return second(emit, firstValue);
        }, value);
    };
}

(Note: I pass back the return value through the layers because it has a use that we’ll go into later.)

See how it works? compose creates and returns a new function that weaves first and second together. The new wrapper receives the outer emit and hands it to second, so that’s where the final results will be sent. Meanwhile first is given another new function to emit to, and inside that function we call second.
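
Using the processors we already have:

var tripleDouble = compose(triplicate, timesTwo);

tripleDouble(function(value) {
    console.log(value); // logs 10, three times
}, 5);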

That’s the beautiful version. But it means that we have to allocate a function object every time a value is passed in. And let’s be crazy C++ programmers for a moment and assume that’s Just Not Good Enough. We could rewrite compose to be butt-ugly and yet still do what we want, without doing any dynamic allocation after the initial setup:

function compose(first, second) {
    var secondEmit;
    function firstEmit(firstVal) {
        return second(secondEmit, firstVal);
    }
    return function(emit, value) {
        secondEmit = emit;
        return first(firstEmit, value);            
    };
}

Yes, EEWWW indeed. We use a local variable, secondEmit, in which we stash the outer emit as soon as we know it. And we create a firstEmit function once, so we can reuse it.

In simple scenarios this will behave the same as the beautiful version. But not always:

function troubleMaker(emit, value) {
    setInterval(function() { emit(value); }, 100);
}

var doubleTrouble = compose(troubleMaker, timesTwo);

doubleTrouble(function(value) { console.log(value); }, 3);
doubleTrouble(function(value) { document.write(value); }, 4);

Now we have the value 6 being printed to the console and the value 8 being appended to the document. Except… not if we use the second version of compose, because then the second call to doubleTrouble would redirect both streams to the new destination (by updating that secondEmit local variable). What a mess.

Fortunately, we’re not C++ programmers! Phew! This doesn’t mean that we don’t care about performance. It just means that we do some measuring before going crazy about imaginary overhead. And on V8 I find that the “faster” version is approximately… 1% faster. Screw that. Let’s stick with the beautiful, easy-to-predict version.
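
If you want to try this on your own engine, a rough harness along these lines will do (my quick sketch, nothing scientific; give each version of compose a distinct name first):

function benchmark(composeFn, iterations) {
    var pipeline = composeFn(triplicate, timesTwo),
        sink = function() {},
        start = Date.now();
    for (var i = 0; i < iterations; i++) {
        pipeline(sink, i);
    }
    return Date.now() - start; // elapsed milliseconds
}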

The one other interesting feature of the pattern is what the return value can be used for: to signal that the receiver doesn’t want to receive any more data. To do that, the receiver returns true. So our flatten example should actually look like this:

function flatten(emit, value) {
    return value.some(emit);
}

That way, we stop looping as soon as the receiver says it’s full (because that’s what a JavaScript array’s some method does: it quits and returns true as soon as the function you gave it returns true).
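
For example, here’s a receiver that declares itself full after three values:

var seen = [];
function takeThree(value) {
    seen.push(value);
    return seen.length >= 3; // true means: I'm full, stop sending
}

flatten(takeThree, [1, 2, 3, 4, 5]);
// seen is [1, 2, 3] - some() quit before visiting 4 and 5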

So that’s the pattern. What more is there to say? Well, although writing a processor (or pure value generator) is very easy, because you just write imperative code and call emit whenever you like, it’s not so convenient to use them as components. Yes, we have compose to efficiently pipeline processors together, but there are a lot of “standard” things we tend to do on sequences where it would be nice not to have to write boilerplate code. Especially when writing tests (shudder).

To make this really easy and succinct, I’ve cooked up a little library called per:

npm install per

(Source code)

It’s reminiscent of jQuery, in that it puts a lightweight wrapper around something so that we can call useful operations on it. Most operations return another instance of the wrapper, so calls can be chained. The only entry point into the whole library is a single function called per, so in node (or browserify or webmake) you’d say:

var per = require('per');

In the browser you could just load per.js via a script tag and then you get a global per. Then you can wrap functions like this:

var numbers = per(function(emit) {
    for (var n = 0; n < 100; n++) {
        emit(n);
    }
});

The simplest methods on a per are ways to capture and return the emitted values:

var f = numbers.first(),    // f == 0
    l = numbers.last(),     // l == 99
    a = numbers.all();      // array [0... 99]
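
Under the hood, first presumably exploits the return-true convention we met earlier, stopping the source as soon as it has what it needs. A sketch of the essence (my guess, not per’s actual source):

function first(generator) {
    var result;
    generator(function(value) {
        result = value;
        return true; // full after one value, so a co-operating source stops
    });
    return result;
}

Note that a source has to co-operate for this to help: our numbers generator above ignores emit’s return value, so it would run to completion anyway.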

The function wrapped by a per is available in a property called forEach, so named because for a simple generator you can treat it a little like an array:

numbers.forEach(function(value) {
    console.log(value);
});

To compose functions, there is a per method, which works exactly like the compose function we figured out above:

var odds = numbers.per(function(emit, value) {
                           if (value % 2) {
                               emit(value);
                           }
                       });

The above pattern is an example of a filter, which evaluates an expression to decide whether or not to forward a value. This is such an obvious pattern that we should have a built-in formalization of it:

var odds = numbers.filter(function(value) {
                              return value % 2
                          });

For extra brevity we support passing a string expression in terms of x:

var odds = numbers.filter('x%2');
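
Presumably that’s compiled with something like new Function internally. I haven’t checked per’s source on this point, but it’s the standard trick:

// An assumption about how 'x%2' might become a callable filter -
// the usual new Function trick, not verified against per's source
function compileExpression(str) {
    return new Function('x', 'return (' + str + ');');
}

compileExpression('x%2')(3); // 1, which is truthy, so 3 gets through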

Another common pattern is to transform a value and pass it on, which is captured by the map method:

var evens = numbers.filter('x%2').map('x-1');

How can we use the resulting combination? As always it has a forEach property, and because numbers doesn’t need a value parameter we only need to pass it a function to emit to (or we can use first, last or all).
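
So with our evens chain from above:

evens.forEach(function(value) {
    console.log(value); // 0, 2, 4 ... 98
});

var all = evens.all(); // [0, 2, 4 ... 98]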

There are operators skip and take that work like their corresponding LINQ operators:

odds.skip(3).all()          // [7, 9, 11, 13...]
odds.skip(2).take(3).all()  // [5, 7, 9]

These are examples of stateful transformers, because they have an internal counter that controls their behaviour.
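
If you wrote skip by hand as a plain processor it might look roughly like this (a sketch of the idea, not per’s actual implementation):

function makeSkip(n) {
    var remaining = n; // internal counter: this is the state
    return function(emit, value) {
        if (remaining > 0) {
            remaining--;
            return; // swallow this value
        }
        return emit(value);
    };
}

The closure gives the transformer its own counter, which is also why a stateful transformer shouldn’t be shared between two pipelines.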

We can also construct an initial per by passing an array:

var a = per([1, 2, 3, 4, 5]);

console.log(a.first()); // 1
console.log(a.last()); // 5

You can even pass per a simple value and it will act as a function that merely emits that one value. For example that value might be something complex, such as a whole document that you’re going to pass through several parsing stages.
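
A made-up example of that usage:

// Hypothetical: the whole string flows through as a single value
var words = per('the quick brown fox').per(function(emit, text) {
    text.split(' ').forEach(emit);
});

words.all(); // ['the', 'quick', 'brown', 'fox']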

The above examples all use a fairly simple structure: the first call to per provides an initial value (or stream of values), then there is a chain of transformations, then the final step collects the results. This is fine if you want to process a stream of data in a single quick operation.

But what if you have a source of data that occasionally delivers you a value, and you want to send it through a pipeline of transformers? That is, rather than a single deluge you have more of a drip, drip, drip of information. You can prepare a chain of transformers:

var p = per(foo1).per(foo2)
                 .map(bar1)
                 .filter(bar2)
                 .listen(function(value) {
                     console.log(value);
                 });

listen is used to inject a function into the pipeline that just receives values but otherwise has no effect on the stream. So in fact it’s like per except that you don’t have to write emit(value) – that happens automatically.

So now you need a way to pass values into the front, for which you can use submit:

p.submit(5);
p.submit(41);
p.submit('hi');

Sometimes you need to direct the flow of information through multiple separate sections simultaneously. This is where multicast comes in handy:

var p = per(foo1).per(foo2).multicast(
    per(foo1).filter('!(x%2)').take(72).into(results1),
    per(foo2).filter('x%2').take(68).into(results2)
);

You can pass as many arguments as you like to multicast (they can be instances of per or plain transformer functions) and they will each receive every value. By the way, multicast is built on listen, so it doesn’t affect the stream, except that when all the recipients have returned true to indicate that they are full, then multicast itself will do the same.
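
The fan-out logic might look something like this (my sketch of the idea, not per’s actual code; it assumes plain transformer functions whose own pipelines end in sinks, as with into above):

function fanOut(targets) {
    var full = targets.map(function() { return false; });
    return function(emit, value) {
        targets.forEach(function(target, i) {
            if (!full[i]) {
                // each branch has its own sink, so it gets a throwaway emit
                full[i] = !!target(function() {}, value);
            }
        });
        emit(value); // like listen: the main stream flows on unchanged
        return full.every(function(f) { return f; });
    };
}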

But really the core idea is to break up a complex multi-stage task into functions like this:

function transformer(emit, value) {
    // do something interesting and maybe call emit
    // a few times, passing forward values to the next stage
}

The per library merely provides some common ways to stitch such functions together.
