functional | Smellegant Code

SICP-style Streams in JavaScript

October 29, 2013 earwicker 1 comment

In the not-famous-enough book Structure and Interpretation of Computer Programs (Abelson & Sussman, or “The Wizard book”) we learn about streams.

A stream is a tempting variation on the old school Lisp style of linked list. To get a plain old list, we can set up objects like this:

var a = {
    value: 'apple',
    next: null
};

var b = {
    value: 'banana',
    next: a
};

var c = {
    value: 'cantaloupe',
    next: b
};

So here our whole list is represented by c, and we can loop through it and print all the fruits:

for (var i = c; i != null; i = i.next) {
    console.log(i.value);
}

So far, so boring. The idea with a stream is very simple. Instead of storing the next object in the next property, we store a function that, if called, will return the next object. That is, we make it lazy. Note that our loop would still look much the same:

for (var i = c; i != null; i = i.next()) {
    console.log(i.value);
}

The only difference is we call next() instead of just reading it. And to set up the objects we’d have to say:

var a = {
    value: 'apple',
    next: function() { return null; }
};

var b = {
    value: 'banana',
    next: function() { return a; }
};

var c = {
    value: 'cantaloupe',
    next: function() { return b; }
};

So far, so pointless. But the value of this does not come from silly hand-built examples. In real software you would use this to generate streams from other data sources, or from other streams. It’s like Linq-to-objects in C#, but the foundations are actually more purely functional, because even the iteration process involves only immutable objects, and so everything is repeatable, nothing is destroyed merely by using it. Part-way through a stream you can stash the current node, and come back to it later. It will still represent “the rest of the stream”, even though you already used it once.
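To make the "stash a node and come back later" point concrete, here is a small sketch using the hand-built lazy objects from above. The collect helper is a name I made up for illustration; it is not part of any library.

```javascript
// The hand-built lazy list from above.
var a = { value: 'apple',      next: function() { return null; } };
var b = { value: 'banana',     next: function() { return a; } };
var c = { value: 'cantaloupe', next: function() { return b; } };

// Hypothetical helper: walk from a node to the end, gathering values.
function collect(node) {
    var values = [];
    for (var i = node; i != null; i = i.next()) {
        values.push(i.value);
    }
    return values;
}

// Stash a node part-way through the stream...
var stashed = c.next();

// ...use the whole stream up...
var all = collect(c);

// ...and the stashed node still means "the rest of the stream",
// however many times we walk it.
var restOnce = collect(stashed);
var restAgain = collect(stashed);

console.log(all);       // [ 'cantaloupe', 'banana', 'apple' ]
console.log(restOnce);  // [ 'banana', 'apple' ]
console.log(restAgain); // [ 'banana', 'apple' ]
```

Nothing was consumed by the first walk, which is the difference from a typical iterator that can only be run through once.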

It is this extreme level of generality that persuaded me to try using streams in a real JavaScript library. I want to write a rich text editor for HTML Canvas (more of that in a later post, hopefully). So I would have streams of characters, streams of words, streams of lines, etc. It seemed to fit, and also I have a week off work and it’s fun to re-invent the wheel.

I start with an object representing the empty stream. This is nicer than using null, because I want to provide member functions on streams. If you had to check whether a stream was null before calling methods on it, that would suck mightily.

var empty = {};

function getEmpty() {
    return empty;
}

Then we need a way to make a non-empty stream:

function create(value, next) {
    return Object.create(empty, {
        value: { value: value },
        next: { value: next || getEmpty }
    });
}

It uses the empty stream as its prototype, and adds immutable properties for value and the next function. If no next function is passed, we substitute getEmpty. So calling create('banana') would make a stream of just one item.
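A quick sketch of what that gives us, repeating the definitions so it runs on its own. The variable name single is just for illustration.

```javascript
var empty = {};

function getEmpty() {
    return empty;
}

function create(value, next) {
    return Object.create(empty, {
        value: { value: value },
        next: { value: next || getEmpty }
    });
}

// A stream of just one item.
var single = create('banana');

// Properties declared through Object.create's second argument are
// non-writable unless we say otherwise, which is where the
// immutability comes from.
var descriptor = Object.getOwnPropertyDescriptor(single, 'value');

console.log(single.value);                            // banana
console.log(descriptor.writable);                     // false
console.log(single.next() === empty);                 // true
console.log(Object.getPrototypeOf(single) === empty); // true
```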

One very handy building block is range:

var range = function(start, limit) {
    return start >= limit ? empty : create(start, function() {
        return range(start + 1, limit);
    });
};

Note the pattern, as it is typical: the next works by calling the outer function with the arguments needed to make it do the next step. And you may be thinking – AHGGHGH! Stack overflow! But no, as long as we loop through the stream using our for-loop pattern, the stack will not get arbitrarily deep.
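If you want to convince yourself about the stack, here is a sketch that walks a fairly long range with the for-loop pattern. The figure of 100000 is an arbitrary choice of mine; each call to next() returns before the following one starts, so the depth stays constant.

```javascript
var empty = {};

function getEmpty() {
    return empty;
}

function create(value, next) {
    return Object.create(empty, {
        value: { value: value },
        next: { value: next || getEmpty }
    });
}

var range = function(start, limit) {
    return start >= limit ? empty : create(start, function() {
        return range(start + 1, limit);
    });
};

// Walk the whole stream with the usual loop. Each next() makes one
// call to range, which builds one node and returns immediately.
var count = 0;
var sum = 0;
for (var s = range(0, 100000); s !== empty; s = s.next()) {
    count++;
    sum += s.value;
}

console.log(count); // 100000
console.log(sum);   // 4999950000
```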

Here’s a favourite of mine, so often forgotten about:

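var unfold = function(seed, increment, terminator) {
    // If the seed is already the terminator there is nothing to emit.
    // Without this check an empty list or an empty range would yield
    // one bogus item.
    return seed === terminator ? empty : create(seed, function() {
        return unfold(increment(seed), increment, terminator);
    });
};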

You call it with a seed value, which becomes the first value of the stream, and also an increment function that knows how to get from one value to the next, and a terminator value that would be returned by the increment function when it has no more values. So in fact you could implement range in terms of unfold, provided start is below limit, because the stopping test is now an exact match rather than >=:

var range = function(start, limit) {
    return unfold(start, function(v) { return v + 1; }, limit);
};

It can also turn a traditional linked list into a stream:

var fromList = function(front) {
    return unfold(front, function(i) { return i.next; }, null);
};
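Here is a sketch of unfold in use. The copy of unfold in this snippet checks the seed against the terminator before creating a node, so a null list gives an empty stream. The collect helper and the halving example are my own inventions for illustration.

```javascript
var empty = {};
function getEmpty() { return empty; }
function create(value, next) {
    return Object.create(empty, {
        value: { value: value },
        next: { value: next || getEmpty }
    });
}

var unfold = function(seed, increment, terminator) {
    return seed === terminator ? empty : create(seed, function() {
        return unfold(increment(seed), increment, terminator);
    });
};

// Hypothetical helper: gather a stream's values into an array.
function collect(stream) {
    var values = [];
    for (var s = stream; s !== empty; s = s.next()) {
        values.push(s.value);
    }
    return values;
}

// Keep halving until we reach 1, then signal the end with null.
var halves = collect(unfold(64, function(v) {
    return v > 1 ? v / 2 : null;
}, null));
console.log(halves); // [ 64, 32, 16, 8, 4, 2, 1 ]

// A plain linked list turned into a stream. Each stream value is a
// whole list node, so the fruit is one level down, at node.value.
var list = { value: 'cantaloupe', next: { value: 'banana', next: null } };
var fromList = function(front) {
    return unfold(front, function(i) { return i.next; }, null);
};
var fruits = collect(fromList(list)).map(function(node) {
    return node.value;
});
console.log(fruits); // [ 'cantaloupe', 'banana' ]
```

Note that fromList produces a stream of nodes rather than a stream of the values inside them, because the seed that unfold emits is the node itself.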

Groovy! Now we have several ways to originate a stream, so let's add some methods. Recall that empty is the prototype for streams, so:

empty.forEach = function(each) {
    for (var s = this; s !== empty; s = s.next()) {
        each(s.value);
    }
};

Nothing to it! And we can use forEach to get a stream into an array:

empty.toArray = function() {
    var ar = [];
    this.forEach(function(i) { ar.push(i); });
    return ar;
};

Of course, how could we live without the awesome power of map?

empty.map = function(mapFunc) {
    var self = this;
    return self === empty ? empty : create(mapFunc(self.value), function() {
        return self.next().map(mapFunc);
    });
};
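To see the laziness at work, here is a sketch that counts how often the mapping function runs. The calls counter and the squares example are made up for illustration. It also shows something worth knowing about this design: nothing is cached, so asking the same node for next() twice does the work twice.

```javascript
var empty = {};
function getEmpty() { return empty; }
function create(value, next) {
    return Object.create(empty, {
        value: { value: value },
        next: { value: next || getEmpty }
    });
}

var range = function(start, limit) {
    return start >= limit ? empty : create(start, function() {
        return range(start + 1, limit);
    });
};

empty.map = function(mapFunc) {
    var self = this;
    return self === empty ? empty : create(mapFunc(self.value), function() {
        return self.next().map(mapFunc);
    });
};

var calls = 0;
var squares = range(1, 1000000).map(function(n) {
    calls++;
    return n * n;
});

// Only the first item has been mapped so far.
var callsAfterMap = calls;

// Stepping forward twice maps two more items.
var third = squares.next().next();
var callsAfterStepping = calls;

// Asking again repeats the work, because next() is not memoized.
squares.next();
var callsAfterRepeat = calls;

console.log(callsAfterMap);      // 1
console.log(third.value);        // 9
console.log(callsAfterStepping); // 3
console.log(callsAfterRepeat);   // 4
```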

Again, that lazy-recursive pattern. And now we can very easily implement converting an array into a stream:

var fromArray = function(ar) {
    return range(0, ar.length).map(function(i) {
        return ar[i];
    });
};

How about concat? Well, this has a slight wrinkle in that if the argument is a function, I treat it as a lazy way to get the second sequence:

empty.concat = function(other) {
    function next(item) {
        return item === empty
            ? (typeof other === 'function' ? other() : other)
            : create(item.value, function() { return next(item.next()); });
    }
    return next(this);
};
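A sketch of the lazy form of concat, with the definitions repeated so it runs alone. The built flag is my own addition, there only to show when the second stream gets made.

```javascript
var empty = {};
function getEmpty() { return empty; }
function create(value, next) {
    return Object.create(empty, {
        value: { value: value },
        next: { value: next || getEmpty }
    });
}

var range = function(start, limit) {
    return start >= limit ? empty : create(start, function() {
        return range(start + 1, limit);
    });
};

empty.concat = function(other) {
    function next(item) {
        return item === empty
            ? (typeof other === 'function' ? other() : other)
            : create(item.value, function() { return next(item.next()); });
    }
    return next(this);
};

// Pass a function, so the second stream is only built on demand.
var built = false;
var joined = range(0, 3).concat(function() {
    built = true;
    return range(10, 12);
});

var builtBeforeWalking = built;

var values = [];
for (var s = joined; s !== empty; s = s.next()) {
    values.push(s.value);
}

console.log(builtBeforeWalking); // false
console.log(built);              // true
console.log(values);             // [ 0, 1, 2, 10, 11 ]
```

That on-demand behaviour is what bind relies on: the rest of the outer stream is not touched until the current sub-stream runs out.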

And with concat we can easily implement the holy grail of methods, bind (known as SelectMany in Linq and flatMap in Scala):

empty.bind = function(bindFunc) {
    var self = this;
    return self === empty ? empty : bindFunc(self.value).concat(function() {
        return self.next().bind(bindFunc);
    });
};

Think that one through – it’s a mind-bender. The bindFunc returns a sub-stream for each item in the outer stream, and we join them all together. So:

assertEqual(

    // ordinary array of numbers
    [1, 2, 3, 4, 5, 6, 7, 8, 9],

    // making that same array in an interesting way
    Stream.fromArray(
        [[1, 2, 3], [4], [5, 6], [], [7], [], [], [8, 9]]
    ).bind(function(ar) {
        return Stream.fromArray(ar);
    }).toArray()

);

Anyway, I wrote my rich text layout engine using this stream foundation, and (as I like to do with these things) I set up an animation loop and watched it repeatedly carry out the entire word-break and line-wrap process from scratch in every frame, to see what frame rate I could get. Sadly, according to the browsers’ profilers, the runtime was spending a LOT of time creating and throwing away temporary objects, collecting garbage and all the other housekeeping tasks that I’d set for it just so I could use this cool stream concept. Interestingly, in terms of this raw crunching through objects, IE 10 was faster than Chrome 30. But I know that by using a simpler basic abstraction it would be much faster in both browsers.

How do I know? Well, because I found that I could speed up my program very easily by caching the stream of words in an ordinary array. And guess what… I could just use arrays in the first place. I am only scanning forward through streams and I definitely want to cache all my intermediate results. So I may as well just build arrays. (Even though I haven’t started the rewrite yet, I know it will be way faster because of what the profilers told me).

So, for now, we say: farewell streams, we hardly knew ye.

Categories: Uncategorized Tags: functional, javascript, sicp, streams

Asynchronous Memoization in JavaScript

January 16, 2012 earwicker Leave a comment

In pure functional programming there is a simple rule: if you evaluate (call) a function more than once with the exact same argument values, you’ll keep getting the same return value. It follows that there is no need to call it more than once, which is super-awesome!! because it means you can put a caching mechanism in front of the function that keeps a map (hash table, Dictionary, etc.) of all the return values produced so far, each one keyed by the bundle of arguments that produced that return value.

Of course it’s only worth doing this if the dictionary lookup is faster than simply re-executing the function itself, and if the same small set of arguments is highly likely to be passed repeatedly. Yawn!!

In a non-pure language like JavaScript many (most?) functions are not pure: they examine information from other sources besides their parameters. However, they often have contexts within which they are “pure enough”. e.g. the information the user can see on the screen is, naively speaking, a projection of the information stored in the database, but if that were really true then when the database changes, the screen would immediately change as well. But it doesn’t; instead it usually remains stale until the user presses Refresh. This corresponds to “emptying the cache”.

In a complex app, there may be several separate components that project the same information in different ways. If they all go back to the external source for that information, and it is changing in real time, you could end up with an inconsistent picture on the screen. This might even cause instability, if one component tries to talk to another and they assume they’ll both have identical snapshots of the external information.

So memoization actually has a purpose in JavaScript: it can simulate a “freeze frame” of your dependencies on external data. But we need to provide the ability to delete things from the cache at the time of our choosing, “unfreezing” the frame so we can take a new snapshot.
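Before asynchrony complicates things, here is a minimal sketch of that "freeze frame" idea for an ordinary synchronous function. Everything in it is an assumption for illustration: the memoize helper, the reset method, and the use of JSON.stringify to turn the arguments into a key (which only suits arguments that serialize sensibly).

```javascript
// Hypothetical helper: cache results by argument list, with a way
// to throw the cache away when we want a fresh snapshot.
function memoize(fn) {
    var cache = {};
    var wrapped = function() {
        var key = JSON.stringify(Array.prototype.slice.call(arguments));
        if (!Object.prototype.hasOwnProperty.call(cache, key)) {
            cache[key] = fn.apply(this, arguments);
        }
        return cache[key];
    };
    wrapped.reset = function() {
        cache = {};
    };
    return wrapped;
}

// Stands in for data that lives outside the function.
var externalRate = 2;
var calls = 0;

var convert = memoize(function(amount) {
    calls++;
    return amount * externalRate;
});

var first = convert(10);   // computed: 20

externalRate = 3;
var frozen = convert(10);  // still 20, the frame is frozen

convert.reset();
var fresh = convert(10);   // recomputed: 30

console.log(first, frozen, fresh); // 20 20 30
console.log(calls);                // 2
```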

Another complicating factor in JavaScript is asynchrony. JavaScript programmers just have to get used to doing the following transformation into “continuation-passing style” by hand; starting with:

var add = function(a1, a2) {
  return a1 + a2;
};

We switch to:

var add = function(a1, a2, done) {
  done(a1 + a2);
};

So the caller of add can no longer say:

var sum = add(2, 2);
alert('The answer is ' + sum);

And must instead say:

add(2, 2, function(sum) {
  alert('The answer is ' + sum);
});

This allows the implementation of add to utilise other functions that need to be passed a continuation in the same manner. Yo!!
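For instance, here is a sketch of one continuation-passing function built out of another. The addThree function is hypothetical, invented just to show the chaining.

```javascript
var add = function(a1, a2, done) {
    done(a1 + a2);
};

// Hypothetical: add three numbers by chaining two calls to add.
// The second call can only start inside the first one's continuation,
// and our own caller's continuation is handed on at the end.
var addThree = function(a1, a2, a3, done) {
    add(a1, a2, function(partial) {
        add(partial, a3, done);
    });
};

var result;
addThree(1, 2, 3, function(sum) {
    result = sum;
    console.log('The answer is ' + sum); // The answer is 6
});
```

Here add happens to call back immediately, but addThree would look exactly the same if add had to wait for a server first.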

So, let’s memoize. To simplify matters we’ll start by assuming we’ll be dealing with functions that take no parameters (or always take exactly the same parameters, it’s the same thing). It means we can replace the map with a single variable. We’re displaying “prices” (whatever the hell they are) to the user, so if synchronous was a realistic option we’d start with:

var getPrices = function() { 
    /// Talk to server to get prices, p, somehow.
    return p; 
};

Sadly we have to be asynchronous, but it’s no biggie:

var getPrices = function(done) {
    /// Talk to server to get prices, p, somehow.
    done(p); 
};

Seems like memoizing that will be easy!

var makeCache = function(getter) {

  var value = null, ready = false;

  return {

    reset: function() {
      value = null;
      ready = false;
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        getter(function(got) {
          value = got;
          ready = true;
          done(value);
        });
      }
    }
  };

};

You’d use it to “wrap” the getPrices function like this:

var pricesCache = makeCache(getPrices);

pricesCache.get(function(prices) {
  // we've got the prices!  
});

And when you want to reset the cache, just say:

pricesCache.reset();

But actually there’s a bug here: do you know what it is? Give yourself a playful slap on the thigh if you got it.

What if there’s more than one call to pricesCache.get before the first one comes back with the data? We only set the ready flag when we’ve got the answer, which might take a second. In the meantime, various parts of the UI might be grabbing the prices to make their own updates. Each such call will launch a separate (unnecessary) call to the backend. What’s worse is that the prices may actually change during this mess, and so the callers will end up with inconsistent price information, just like I was a-bellyachin’ about up yonder.
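Here is a sketch that reproduces the problem with the simple cache. The fake getPrices is an assumption of mine: instead of talking to a server it parks each callback in a pending array, so we can decide by hand when, and with what prices, each request "comes back".

```javascript
// The first, naive cache from above.
var makeCache = function(getter) {
    var value = null, ready = false;
    return {
        reset: function() {
            value = null;
            ready = false;
        },
        get: function(done) {
            if (ready) {
                done(value);
            } else {
                getter(function(got) {
                    value = got;
                    ready = true;
                    done(value);
                });
            }
        }
    };
};

// Hypothetical stand-in for the server: count the requests and hold
// on to each callback until we choose to answer it.
var backendCalls = 0;
var pending = [];
var getPrices = function(done) {
    backendCalls++;
    pending.push(done);
};

var pricesCache = makeCache(getPrices);
var seen = [];

// Two parts of the UI ask before any answer has arrived.
pricesCache.get(function(prices) { seen.push(prices); });
pricesCache.get(function(prices) { seen.push(prices); });

console.log(backendCalls); // 2

// The prices change between the two answers.
pending[0](100);
pending[1](105);

console.log(seen); // [ 100, 105 ]
```

Two requests went out where one would do, and the two callers ended up holding different prices.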

First reaction: oh, oh, I know, it’s a state machine! We thought there were two states, as indicated by the boolean ready flag. But actually there’s three:

No value.
Okay, I’m getting the value, sheesh.
Got the value.

But hold on to your hose, little fireman. Think this one through for a second. It’s pretty clear that when the first caller tries to get, we need to transition to the middle state and make our call to the real getter function. And when the prices come back to us, we transition to the final state and call the callback function. But what about when a second caller tries to get and we’re already in the middle state? That’s the whole reason for doing this, to be able to handle that differently. Where do we put their callback function?

So, yes, it is a state machine, but not a three-state one. We need to keep a list of callback functions, so that when the prices come back, we can loop through those callback functions and give every single gosh darn one of them a calling they ain’t gonna forgit:

var makeCache = function(getter) {

  var value, ready, waiting = [];

  return {

    reset: function() {
      value = null;
      ready = false;
      waiting = [];
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        waiting.push(done);

        if (waiting.length === 1) {
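          getter(function(got) {

            value = got;
            ready = true;

            // Swap in a fresh array before calling out, so a callback
            // that calls reset() cannot leave waiting unusable.
            var callbacks = waiting;
            waiting = [];
            callbacks.forEach(function(w) { w(got); });
          });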
        }
      }
    }
  };

};

Notice how I use waiting.forEach to loop through the callbacks. By definition here I’m calling some code that I don’t have control of. It might call back into pricesCache.get. That may seem intrinsically problematic, because it sounds like it could keep happening forever and cause a stack overflow. But it might be perfectly valid: there could be some separate code making the second call to get the prices, which supplies a different callback. Anyway, is it a problem for my cache implementation? No, because any calls to pricesCache.get during my callback loop will find that ready is already set, and so will not affect the waiting array. And even if pricesCache.reset were called, that would cause a fresh array to be created and stored in waiting.
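A sketch of that behaviour, including a callback that calls straight back into the cache. The copy of makeCache below takes the list of callbacks into a local variable before calling them, so nothing a callback does can disturb the loop. As before, the fake getPrices with its pending array is my own stand-in for the server.

```javascript
var makeCache = function(getter) {
    var value, ready, waiting = [];
    return {
        reset: function() {
            value = null;
            ready = false;
            waiting = [];
        },
        get: function(done) {
            if (ready) {
                done(value);
            } else {
                waiting.push(done);
                if (waiting.length === 1) {
                    getter(function(got) {
                        value = got;
                        ready = true;
                        var callbacks = waiting;
                        waiting = [];
                        callbacks.forEach(function(w) { w(got); });
                    });
                }
            }
        }
    };
};

// Hypothetical stand-in for the server, answered by hand.
var backendCalls = 0;
var pending = [];
var getPrices = function(done) {
    backendCalls++;
    pending.push(done);
};

var pricesCache = makeCache(getPrices);
var seen = [];

pricesCache.get(function(prices) {
    seen.push('first:' + prices);
    // Re-entrant call from inside a callback: ready is already set,
    // so this is answered on the spot from the cached value.
    pricesCache.get(function(again) { seen.push('nested:' + again); });
});
pricesCache.get(function(prices) { seen.push('second:' + prices); });

console.log(backendCalls); // 1

pending[0](100);

console.log(seen); // [ 'first:100', 'nested:100', 'second:100' ]
```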

And finally, nice piece of trivia for ya: even if there was some way for waiting to grow while we are still looping through it, according to the specification of Array.prototype.forEach the new item(s) won’t be included in the iteration.
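You can check that piece of trivia in a few lines. The arrays here are throwaway names for illustration.

```javascript
var items = [1, 2, 3];
var visited = [];

items.forEach(function(x) {
    visited.push(x);
    // Grow the array while it is being iterated.
    if (x === 1) {
        items.push(4);
    }
});

// forEach fixes the range of indexes before the first callback runs,
// so the appended element is stored but never visited.
console.log(visited); // [ 1, 2, 3 ]
console.log(items);   // [ 1, 2, 3, 4 ]
```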

Categories: Uncategorized Tags: asychrony, functional, javascript

Exceptions Part 2: Why do we need to catch them?

June 24, 2010 earwicker 6 comments

See the Contents Page for this series of articles

Last time we established that an exception is the way a function unambiguously produces no value – or no side-effect, which amounts to the same (no)thing. To take the best known example from mathematics, the function f(x) = 1/x has no value if x is zero. If you’re asking what anything divided by zero is, well… you shouldn’t be asking. Just don’t ask.

But by catching an exception, you are saying “I expect this”. So to be entirely rigorous about it, isn’t it wrong that we ever try to catch these things? Maybe, like in mathematics, we should never attempt to evaluate them in the first place.
Read more…

Categories: Uncategorized Tags: Exceptions, functional

More on delegates vs interfaces

October 29, 2009 earwicker 1 comment

A commenter on yesterday’s post asked about the lack of Dispose. Here’s a way of doing it:
Read more…

Categories: C#, delegates, functional Tags: C#, delegates, functional

Threadsafe Interfaces, Delegates vs Interfaces, Stuff Like That

October 26, 2009 earwicker 3 comments

The mighty Skeet posted the other day about the idea of using a single instance of IEnumerable as a shared input for multiple threads (not that I realised at first, but I got there eventually).

Clearly the interface is no good for that purpose, because two operations (“check for existence of next item” and “read next item”) are exposed as separate methods/properties, so it can’t be made atomic. Jared Parsons has blogged about this many times, and very readably.

This got me thinking, because I’ve noticed that I can often shrink an interface declaration down so it only has one method. And then it doesn’t need to be an interface; it can just be a delegate. That way you can implement it on the fly with a lambda expression. If you express it without any out/ref parameters, you don’t even have to declare a new delegate type. And if you have a Tuple class (as in .NET 4.0), you don’t need to declare any new types – just higher order methods.
Read more…

Categories: C#, functional, threads Tags: C#, functional, threads
