Implementing a minimal social network in node (with generators and thunks)

June 25, 2014

Yes, it’s time for another survey of the excruciatingly slow progress of generators in JavaScript! But with the happy conclusion that they are practically usable right now (and add massive value) in at least one context. Also I made my own social network!

Executive summary:

It’s over a decade since generators first went mainstream in Python 2.3 (the yield keyword), almost a decade since they appeared in C# 2.0 (yield return), and only a year after that they were added to Mozilla’s JavaScript 1.7 (yield again). Next year will mark 40 years since CLU originated the idea (can you guess what keyword CLU used to yield values?).

But getting a usable form of generators into browsers is taking ridiculously long; ridiculous because they are a perfectly acceptable solution to the central problem of the browser API, which is managing a chain of asynchronous operations via callbacks. All you have to do is represent your asynchronous operation as a value that you yield from the generator. This allows some simple management code to detect when the result eventually arrives and so reawaken your generator where it left off. So control flips back and forth between your generator and the management code. The pain of asynchronicity is hidden from you and you get to relax in your underpants and drink hot chocolate like everything’s fine.
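
To see that flipping back and forth with no async machinery at all, here is a minimal sketch that steps a generator by hand. The conversation generator and the values passed in are made up for illustration; the only thing relied on is the standard next(value) method of generator objects.

```javascript
// A generator that pauses twice, each time handing a question to its caller.
function* conversation() {
    var name = yield 'what is your name?';
    var colour = yield 'hello ' + name + ', favourite colour?';
    return name + ' likes ' + colour;
}

var gen = conversation();

// Each next(value) resumes the generator; the value passed in becomes the
// result of the yield expression the generator was paused on.
var first = gen.next();        // runs up to the first yield
var second = gen.next('Ada');  // resumes; the first yield evaluates to 'Ada'
var last = gen.next('green');  // resumes; the second yield evaluates to 'green'
```

The "management code" discussed in the rest of the post is just an automated version of those three next calls, where the value passed back in arrives from an asynchronous operation.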

Thunks

How do you represent an operation as a value? There are two popular options. When I first got excited about this in JavaScript back in 2010, I did this:

var ASYNC = {
    ajax: function(url) {
        return function(receiver) {
            jQuery.ajax({ url: url, success: receiver });
        };
    }
    // (other helpers omitted)
};

The caller of that function is writing a generator, so they say:

var secrets = yield ASYNC.ajax('readSecretEmail');

Therefore the management code responsible for cycling through the generator must be expecting a function to be yielded.

This pattern – in which a function (such as the above ASYNC.ajax) returns a value by the convoluted route of (deep breath) returning a function to which another function (which I called receiver) is passed that accepts the return value as a parameter – has somewhere picked up the name thunk, a name previously applied to many things, but let’s run with it.
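
To make the shape concrete, here is a minimal sketch of a thunk-returner and a caller. The lookup function and the log array are invented for illustration, and the receiver is called straight away rather than after real I/O, purely to keep the example short.

```javascript
var log = [];

// Hypothetical thunk-returner: calling it only describes the operation.
function lookup(key) {
    log.push('lookup called');
    return function (receiver) {
        // The work starts here, when somebody finally supplies a receiver.
        // A real implementation would kick off I/O and call receiver later.
        log.push('thunk invoked');
        receiver('value for ' + key);
    };
}

var thunk = lookup('readSecretEmail'); // nothing has been delivered yet

var received;
thunk(function (value) {
    received = value; // the "return value" arrives as a parameter
});
```

Note that the operation is just a value (thunk) until something decides to invoke it, which is what lets a generator yield it to someone else.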

This seems familiar…

If you’ve read my demystification of call-cc, you may remember I had to take a deep breath before tersely summarising it thus: “a function A that accepts a function B that will immediately be called by the language runtime, passing it a function C that when eventually (optionally) called will cause the previous call to A to return the argument previously passed to C and so take us back to where we started.”

These deep breaths are not a coincidence. There’s a deep parallel here. Replace that “function A” with the yield keyword. We give it “function B”, so that’s our thunk (such as the function returned by ASYNC.ajax). The “language runtime” is the management code driving the generator. It calls our thunk passing it a “function C”, so that must be the receiver accepted by our thunk. And when ASYNC.ajax passes a value to the receiver, “the previous call to A” (that is, the yield) will return that value so our code can carry on where it left off.

So this whole pattern (the combination of yield-ing thunks and some management code to drive the generator) is exactly the same shape as call-cc. There truly is nothing new under the sun (call-cc may be 50 years old). The only difference is that in the kind of thunks popular in node development, the receiver takes two parameters: (error, value). This mirrors the way ordinary functions can either return a value or throw an exception (and therefore error should be something like the standard Error object).

What about promises?

Another parallel is with promises. I’ve seen thunks described as a competing alternative to promises, but for our purposes they are almost homeomorphic: you can stretch one to make it look like the other, and back again. Let’s see how (and why it’s not quite a perfect homeomorphism, but we don’t care).

A promise has a then function that takes two callbacks, each taking one value: the first takes the actual value and the second takes an error, so only one of those callbacks will actually be called.

But of course if an object only has one interesting function, it might as well be that function. No good reason to make people say p.then(v, e) when they could just say p(v, e). Sure enough, a thunk is that function. Only instead of taking two callbacks, it takes one that will be passed both parameters (the value and the error). According to node convention, they are the other way round: (error, value). But only one will be meaningful: if the error is not null or undefined, the value can be ignored.

So if we have a thunk, how would we wrap it in an equivalent promise?

var promise = {
    then: function(valCb, errCb) {
        thunk(function(err, val) {
            if (err) {
                errCb(err);
            } else {
                valCb(val);
            }
        });
    }
};

Well, not quite. I’ve left out a big chunk of what normally makes promises so useful: then should return another promise, one that is set up to resolve (or reject) based on the return value of valCb or errCb. Best of all is what happens if those callbacks themselves return promises: the promise returned by then automatically “chains” onto them.

But as we’re going to be using generators, we don’t give a crap about chaining because we have a much nicer solution. So in brief, that’s what I mean by an imperfect homeomorphism: not giving a crap about chaining. Where we’re going (puts on sunglasses), we don’t care if our async APIs return promises or thunks.
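
For completeness, the stretch in the other direction (promise to thunk) can be sketched in a few lines. The name thunkFromPromise is made up, and the only assumption about the promise is that it has a then method taking a value callback and an error callback.

```javascript
// Wrap anything with a then(valCb, errCb) method as a node-style thunk.
function thunkFromPromise(promise) {
    return function (receiver) {
        promise.then(
            function (val) { receiver(null, val); },
            function (err) { receiver(err); }
        );
    };
}

// Stand-in "promise" that resolves immediately (hypothetical, for illustration).
var fakePromise = {
    then: function (valCb, errCb) { valCb(42); }
};

var result;
thunkFromPromise(fakePromise)(function (err, val) {
    result = err ? 'failed' : val;
});
```

With a real promise the receiver would be called later rather than immediately, but the shape of the conversion is the same.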

Running the generator

Back in the day I wrote my own helper called ASYNC.run that was responsible for launching a generator and stepping through it, re-awakening it whenever an async result appeared. But since then the API of generator objects has changed as part of standardisation. Also since generators were added to V8 and became accessible in non-stable builds of node, people have been working on little libraries that implement this same pattern (with the essential extra of dealing with errors as well as success values). There is one called suspend and another called co, and as you’d expect there’s not much to choose between them.
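
The core of such a helper is small. What follows is a minimal sketch under stated assumptions, not the actual source of co, suspend or ASYNC.run: it assumes every yielded value is a node-style thunk, and the names run, succeed and fail are invented for the example.

```javascript
// Drive a generator function to completion, treating each yielded value
// as a thunk that will call back with (error, value).
function run(genFn, done) {
    var gen = genFn();

    function step(err, val) {
        var result;
        try {
            // An error is thrown into the generator at the paused yield,
            // so ordinary try/catch works inside the generator.
            result = err ? gen.throw(err) : gen.next(val);
        } catch (e) {
            return done(e); // the generator itself threw (or did not catch)
        }
        if (result.done) {
            return done(null, result.value);
        }
        result.value(step); // hand ourselves to the thunk as its receiver
    }

    step();
}

// Two trivial thunk-returners that call back immediately (illustration only).
function succeed(value) { return function (cb) { cb(null, value); }; }
function fail(message) { return function (cb) { cb(new Error(message)); }; }

var outcome;
run(function* () {
    var a = yield succeed(1);
    var b;
    try {
        b = yield fail('boom');
    } catch (e) {
        b = 10; // recover from the failed operation
    }
    return a + b;
}, function (err, value) {
    outcome = err || value;
});
```

The real libraries accept more kinds of yielded values and are more careful about edge cases, but this is the essential loop.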

Koa

In fact if you use the delightful koa as the basis for your node web app then you’ll forget that anything needs to drive the generator. You just write (almost all) your code in generator functions and It Just Works. You may not need to directly depend on something like co at all.

bitstupid.com

I’ve been threatening my co-workers with the idea that I would start my own social network, ever since Twitter caught on. It struck me that 140 characters is a lot of information, pretty daunting for the average person to deal with. So why not restrict each user to a single bit, the theoretical minimum quantity of information? You can toggle the value of your own bit, you can toggle other people’s bits, they can toggle yours. It can be a society-changing metaphor for whatever sordid, disgusting animalistic activity you wish. That’s none of my business.

So now I finally got around to it. The code is very short. This is largely down to the choice of redis as the database. My data layer is in the data.js module. All the exports are generators, e.g.:

exports.getSecret = function* (of) {

    var secret = yield redis.hget('user:' + of, 'secret');
    if (secret === null) {
        secret = (yield crypto.randomBytes(128)).toString('base64');
        if ((yield redis.hsetnx('user:' + of, 'secret', secret)) === 0) {
            secret = yield redis.hget('user:' + of, 'secret');
        } else {
            yield redis.set('secret:' + secret, of);
        }
    }

    if (secret === null) {
        throw new Error('Unexpected: could not get secret of ' + of);
    }

    return secret;
};

In other words, as with C# async/await, it makes the code look no different from the synchronous variety except for the constant repetition of a certain keyword (either yield or await) before every API call. I maintain that good design is about choosing the right defaults, and here we have the wrong choice. An asynchronous API call should automatically work as if you prefixed it with yield. Only if you specifically want to obtain the underlying value representation (the promise, the Task, the thunk…) should you need to prefix it with a keyword. But I already complained about that and few people seem to understand what I’m talking about. In any case I can’t see how this could be solved efficiently in a dynamic language like JavaScript (a statically typed language is another matter: TypeScript maybe has the opportunity to get it right).

But despite all the yield keywords cluttering up the code, this is still obviously a great leap ahead of normal callback-based node code. Try rewriting the above getSecret and you’ll see what I mean. It only just occurred to me to bother for explanatory purposes, so I don’t know if this is correct, but it’s roughly the right shape:

exports.getSecret = function(of, callback) {

    redis.hget('user:' + of, 'secret', function(err, secret) {
        if (err) {
            callback(err);
        } else {
            if (secret !== null) {
                callback(null, secret);
            } else {
                crypto.randomBytes(128, function(err, secret) {
                    if (err) {
                        callback(err);
                    } else {
                        secret = secret.toString('base64');
                        redis.hsetnx('user:' + of, 'secret', secret, function(err, result) {
                            if (err) {
                                callback(err);
                            } else {
                                if (result === 0) {
                                    redis.hget('user:' + of, 'secret', function(err, secret) {
                                        if (err) {
                                            callback(err);
                                        } else if (secret === null) {
                                            callback(new Error('Unexpected: could not get secret of ' + of));
                                        } else {
                                            callback(null, secret);
                                        }
                                    });
                                } else {
                                    redis.set('secret:' + secret, of, function(err) {
                                        if (err) {
                                            callback(err);
                                        } else {
                                            callback(null, secret);
                                        }
                                    });
                                }
                            }
                        });
                    }
                });
            }
        }
    });
};

It’s worth noting that a lot of the mess is due to the misery of having to explicitly pass back errors. To be fair to promise chaining, that alone would clean up the error-handling mess (again, this is just an untested rough guess):

exports.getSecret = function(of) { 
    return redis.hget('user:' + of, 'secret').then(function(secret) {
        return secret || crypto.randomBytes(128).then(function(secret) {
            secret = secret.toString('base64');
            return redis.hsetnx('user:' + of, 'secret', secret).then(function(result) {
                if (result === 0) {
                    return redis.hget('user:' + of, 'secret').then(function(secret) {
                        if (secret === null) {
                            throw new Error('Unexpected: could not get secret of ' + of);
                        }
                        return secret;
                    });
                }
                return redis.set('secret:' + secret, of).then(function() {
                    return secret;
                });
            });
        });
    });
};

But it’s still very noisy. Lots of then(function(secret) { and hence nesting. Also this isn’t even a particularly complex case.

ay

Try this seemingly innocuous example from server.js:

app.get('/bits/:of', function* () {
    var bit = yield data.readBit(
        this.params.of.toLowerCase(), 
        this.query.skip, 
        this.query.take);

    yield ay(bit.changes).forEach(function* (change) {
        change.info = yield data.getInfo(change.by);
    });

    this.body = bit;
});

First I get the bit object that describes the state of the requested user’s bit. It includes a changes array that lists the occasions on which someone toggled the bit. Each change includes a username, by. I want to augment each one with an info property that gives all the known information about that user, but data.getInfo is another async API of course. That rules out using the usual Array#forEach, because you can’t give it a generator. Same goes for map, filter, reduce and so on.

So I wrote a little library called ay (“array yield”) to get around this. It lets you chain the usual operations on an array but pass in generator functions, in such a way that any yields can escape out to the surrounding context. Here’s a more complete example of usage:

var num = yield ay(nums)
    .map(function* (i) {
        yield sleep(10);
        return i * 2;
    })
    .filter(function* (i) {
        yield sleep(10);
        return i > 2;
    })
    .reduce(function* (a, b) {
        yield sleep(10);
        return a + b;
    });
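
One way to let yields "escape" from a callback is delegation with yield*. The sketch below is not how ay is implemented and does not use its API; it is a hand-rolled illustration with invented names (forEachYielding, mapYielding), driven by a trivial loop instead of co.

```javascript
// Apply a generator function to each element in turn. yield* delegates to
// the inner generator, so anything it yields surfaces in whoever drives us.
function* forEachYielding(items, genFn) {
    for (var i = 0; i < items.length; i++) {
        yield* genFn(items[i], i);
    }
}

// Same idea, collecting each inner generator's return value.
function* mapYielding(items, genFn) {
    var results = [];
    for (var i = 0; i < items.length; i++) {
        results.push(yield* genFn(items[i], i));
    }
    return results;
}

// Drive it by hand: answer every yielded number with its double.
var gen = mapYielding([1, 2, 3], function* (n) {
    var doubled = yield n; // ask the driver for something
    return doubled + 1;
});

var state = gen.next();
while (!state.done) {
    state = gen.next(state.value * 2);
}
var mapped = state.value;
```

Inside a koa handler the equivalent call would be written with yield* rather than yield, which is one reason a wrapper object with chainable methods is the more convenient design.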

funkify

Another library I cooked up for bitstupid.com is funkify. There’s a widely cited library thunkify that can turn a single traditional node callback API into a thunk-returner. But what if you want to do a whole module or other object? I found a couple of libraries for doing that, but they both modified the original object. Didn’t like that; what if it breaks it? Also there is a subtlety in writing something that wraps another object: what if it is a function, but also has several function properties attached to it? An example of this is the widely used request, which can be called directly as a function but also has functions like get and post hanging off of it. Hence funkify, which is wise to such things:

var request = funkify(require('request'));

// In a generator...
var info = yield request.get({ uri: 'blah' });

Or to adapt the superb redis client library:

var redis = funkify(require('redis').createClient());

It’s a pain-free way to instantly turn a library into a thunk-returner ready for use from koa.
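
The underlying idea can be sketched briefly. This is not funkify’s actual source: thunkifyOne, wrapAll and the fetchThing API are hypothetical names, and the sketch only handles one level of enumerable function properties.

```javascript
// Turn one node-style callback function into a thunk-returner.
function thunkifyOne(fn, self) {
    return function () {
        var args = Array.prototype.slice.call(arguments);
        return function (callback) {
            fn.apply(self, args.concat([callback]));
        };
    };
}

// Build a new wrapper whose function properties are thunk-returners,
// leaving the original object untouched. If the target is itself a
// function, the wrapper is callable too.
function wrapAll(target) {
    var wrapper = typeof target === 'function' ? thunkifyOne(target, null) : {};
    for (var key in target) {
        if (typeof target[key] === 'function') {
            wrapper[key] = thunkifyOne(target[key], target);
        }
    }
    return wrapper;
}

// Hypothetical callback-style API: callable itself, with a method attached.
function fetchThing(name, callback) { callback(null, 'thing:' + name); }
fetchThing.get = function (name, callback) { callback(null, 'got:' + name); };

var originalGet = fetchThing.get;
var wrapped = wrapAll(fetchThing);

var results = [];
wrapped('a')(function (err, val) { results.push(val); });
wrapped.get('b')(function (err, val) { results.push(val); });
```

Because the wrapper is a separate object, code elsewhere that still uses the callback style keeps working against the original.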

Generators on stable node versions

Generators are not yet available in mainstream V8 and so neither are they in stable node releases. It’s pretty easy to fix this if you’re happy running unstable builds – just install nvm:

curl https://raw.githubusercontent.com/creationix/nvm/v0.8.0/install.sh | sh

and then say:

nvm install 0.11

But if you’d rather run a stable build (or your host only lets you run some ancient version), what can you do? It’s wonderfully easy thanks to gnode:

npm install -g gnode

Then just use the gnode command instead of node. On my server I use the delightful forever to get the app running:

forever start /usr/local/bin/gnode server.js

That way when I disconnect, it stays up (and it gets restarted automatically if it crashes).
