
A pure library approach to async/await in standard JavaScript

I’m very keen on JavaScript gaining actual language support for the one thing it is mostly used for: asynchronous programming, so I never shut up about it.

The C# approach was trailed very nicely here back in Oct 2010. Over the years I’ve occasionally searched for usable projects that might implement this kind of thing in JavaScript, but they all seem to be abandoned, so I stopped looking.

And then I had an idea for how to do it purely with an ordinary JS library – a crazy idea, to be sure, but an idea nonetheless. So I searched to see if anyone else had come up with it (let me rephrase that: like all simple ideas in software, it’s a certainty that other people came up with it decades ago, probably in LISP, but I searched anyway).

I haven’t found anything yet, but I did find Bruno Jouhier’s streamline.js, which looks like a very nice (and non-abandoned!) implementation of the precompiler approach.

So what was my crazy idea? Well, as Streamline’s creator said in a reply to a comment on his blog:

But, no matter how clever it is, a pure JS library will never be able to solve the “topological” issue that I tried to describe in my last post… You need extra power to solve this problem: either a fiber or coroutine library with a yield call, a CPS transform like streamline, or direct support from the language (a yield operator).

Well, that sounds like a challenge!

If we really, really want to, we can in fact solve this problem with a pure library running in ordinary JavaScript, with no generators or fibers and no precompiler. Whether the approach is practical is another matter: it chews the CPU like a hungry wolf, and while the main selling point of this kind of thing is that it makes life easier for beginners, to use my approach you have to understand the concept of a pure function and pay close attention to when you're stepping outside of purity.

To set the stage, Bruno Jouhier’s example is a node.js routine that recurses through file system directories. Here’s a simplified version using the sync APIs:

var fs = require('fs');
var path = require('path');

var recurseDir = function(dir) {
    fs.readdirSync(dir).forEach(function(child) {
        if (child[0] != '.') {
            var childPath = path.join(dir, child);
            if (fs.statSync(childPath).isDirectory()) {
                recurseDir(childPath);
            } else {
                console.log(childPath);
            }
        }
    });
};

recurseDir(process.argv[2]);

And – ta da! – here’s a version that uses the async APIs but appears not to:

var fs = require('fs');
var path = require('path');

var Q = require('q');
var interrupt = require('./interrupt.js');

var readdir = interrupt.bind(Q.nfbind(fs.readdir));
var stat = interrupt.bind(Q.nfbind(fs.stat));
var consoleLog = interrupt.bind(console.log);

interrupt.async(function() {

    var recurseDir = function(dir) {
        readdir(dir).forEach(function(child) {
            if (child[0] != '.') {
                var childPath = path.join(dir, child);
                if (stat(childPath).isDirectory()) {
                    recurseDir(childPath);
                } else {
                    consoleLog(childPath);
                }
            }
        });
    };

    recurseDir(process.argv[2]);
});

The core of the program, the recurseDir function, looks practically identical. The only difference is that it calls specially wrapped versions of readdir, stat and console.log, e.g.

var readdir = interrupt.bind(Q.nfbind(fs.readdir));

The inner wrapper Q.nfbind is from the lovely q module that provides us with promises with (almost) the same pattern as jQuery.Deferred. Q.nfbind wraps a node API so that instead of accepting a function(error, result) callback it returns a promise, which can reduce yuckiness by up to 68%.
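
For instance (a minimal sketch; fs.readFile here is just a stand-in for any callback-style node API):

var fs = require('fs');
var Q = require('q');

// fs.readFile normally takes a trailing function(error, result) callback...
var readFile = Q.nfbind(fs.readFile);

// ...but the wrapped version returns a promise instead:
readFile('notes.txt', 'utf8').then(function(text) {
    console.log(text);
}, function(err) {
    console.error('read failed: ' + err);
});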

But interrupt.bind is my own fiendish contribution:

exports.bind = function(impl) {
    return function() {
        var that = this;
        var args = arguments;
        return exports.await(function() {
            return impl.apply(that, args);
        });
    };
};

So it wraps a promise-returning function inside that interrupt.await thingy. To understand what that is for, we have to go back to the start of the example, where we say:

interrupt.async(function() {

The function we pass there (let's call it our “async block”) will be executed multiple times. An interruptible coroutine can be paused and then resumed from the same point, but standard JavaScript doesn't provide a way to stop a function and start up again from where it left off. You can only start again from the beginning.

In order for that to work, when the function reruns all its activity a second, third, fourth… (and so on) time, repeating everything it has already done, it has to behave exactly the same as it did on the previous run(s). Which is where functional purity comes in. A pure function is one that returns the same value when given the same arguments. So Math.random is not pure. Nor is reading the file system (because it might change under your feet). But quite a lot of things are pure: anything that depends only on our parameters, or on the local variables containing whatever we've figured out so far from our parameters.
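
For instance (a toy illustration with made-up helper names; fs and path as required earlier):

// Pure: given the same arguments, this always returns the same value.
var childPathOf = function(dir, child) {
    return path.join(dir, child);
};

// Impure: the answer depends on the outside world, not just the arguments.
var firstChildOf = function(dir) {
    return fs.readdirSync(dir)[0];
};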

So, inside interrupt.async we can do anything pure without headaches. But whenever we want to know about the outside world, we have to be careful. The way we do that is with interrupt.await, e.g.

var stuff = interrupt.await(function() {
    return $.get('blah-de-blah');
});

The first time the async block runs, when it reaches interrupt.await, it executes the function we pass to it (the “initializer”), which in this case starts a download and returns a promise that will be resolved when the download is ready. But then interrupt.await throws an exception, which cancels execution of the async block. When the promise is resolved, the async block is executed again, and this time interrupt.await totally ignores the function passed to it, and instead returns the result of the download from the promise created on the first run, which I call an externality (because it's data that came from outside).
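
To make the life cycle concrete, here's a sketch of an async block with two awaits (getUser and getOrders are hypothetical promise-returning functions):

var result = interrupt.async(function() {
    // Run 1: the initializer starts the first download, then
    // interrupt.await throws Interrupt to abandon this run.
    var user = interrupt.await(function() {
        return getUser();
    });

    // Run 2: 'user' is replayed from the recorded externality; this await
    // starts the second download and interrupts again.
    var orders = interrupt.await(function() {
        return getOrders(user.id);
    });

    // Run 3: both externalities replay instantly and we finish; 'result'
    // is a promise that resolves with this return value.
    return orders.length;
});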

The internal representation is actually quite simple. Here’s interrupt.async:

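// Assumed helpers from elsewhere in interrupt.js (not shown in this excerpt):
// log() is a debug logger; defer() and getPromise() abstract over the promise
// library in use (e.g. Q.defer() and deferred.promise would fit).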
function Interrupt() {}

var currentContext = null;

exports.async = function(impl) {
    log('Creating async context');
    var thisContext = {
        ready: [],
        waiting: null,
        slot: 0,
        result: defer(),
        attempt: function() {
            log('Swapping in context for execution attempt');
            var oldContext = currentContext;
            currentContext = thisContext;
            currentContext.slot = 0;
            try {
                thisContext.result.resolve(impl());
                log('Completed successfully');
            } catch (x) {
                if (x instanceof Interrupt) {
                    log('Execution was interrupted');
                    return;
                } else {
                    log('Exception occurred: ' + JSON.stringify(x));
                    throw x;
                }
            } finally {
                log('Restoring previous context');
                currentContext = oldContext;
            }
        }
    };
    log('Making first attempt at execution');
    thisContext.attempt();
    return getPromise(thisContext.result);
};

The important part is the context, which has an array, ready, of previously captured externalities, and an integer, slot, which is the index in the ready array where the next externality will be recorded.
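
To picture it, here's a hypothetical snapshot of a context part-way through the directory walk:

// Snapshot of a context between attempts, after two externalities
// have been captured (illustrative values only):
//
//   ready:   [['a.txt', 'sub'],   // slot 0: the array readdir produced
//             someStatsObject]    // slot 1: the fs.Stats object from stat
//   slot:    0,                   // reset to 0 at the start of attempt()
//   waiting: null                 // or the promise we're currently blocked on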

The more fiddly work is done in interrupt.await:

exports.await = function(init) {
    if (!currentContext) {
        throw new Error('Used interrupt.await outside of interrupt.async');
    }
    var ctx = currentContext;
    if (ctx.ready.length > ctx.slot) {
        log('Already obtained value for slot ' + ctx.slot);
        var val = ctx.ready[ctx.slot];
        if (val && val.__exception) {
            log('Throwing exception for slot ' + ctx.slot);
            throw val.__exception;
        }
        log('Returning value ' + JSON.stringify(val) + ' for slot ' + ctx.slot);
        ctx.slot++;
        return val;
    }
    if (ctx.waiting) {
        log('Still waiting for value for ' + ctx.slot + ', will interrupt');
        throw new Interrupt();
    }
    log('Executing initializer for slot ' + ctx.slot);
    var promise = init();
    if (promise && promise.then) {
        log('Obtained a promise for slot ' + ctx.slot);
        var handler = function(val) {
            if ((ctx.slot != ctx.ready.length) ||
                (ctx.waiting != promise)) {
                throw new Error('Inconsistent state in interrupt context');
            }
            log('Obtained a value ' + JSON.stringify(val) + ' for slot ' + ctx.slot);
            ctx.ready.push(val);
            ctx.waiting = null;
            log('Requesting retry of execution');
            ctx.attempt();
        };
        promise.then(handler, function(reason) {
            log('Obtained an error ' + JSON.stringify(reason) + ' for slot ' + ctx.slot);
            handler({ __exception: reason });
        });
        ctx.waiting = promise;
        throw new Interrupt();
    }
    if (ctx.slot != ctx.ready.length) {
        throw new Error('Inconsistent state in interrupt context');
    }
    // 'promise' is not a promise!
    log('Obtained a plain value ' + JSON.stringify(promise) + ' for slot ' + ctx.slot);
    ctx.ready.push(promise);
    ctx.slot++;
    return promise;
};

It can deal with an initializer that returns a plain value, and in that case it avoids the overhead of interrupting and restarting, but it still enforces the same behaviour of capturing the externality, so the recorded value can be returned on any subsequent repeat run instead of running the initializer again.
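
For example (a sketch, inside an async block), an initializer that returns a plain value is captured without any interruption:

// Date.now() returns a number, not a promise, so no Interrupt is thrown:
var started = interrupt.await(function() {
    return Date.now();
});
// First run: Date.now() is evaluated once and pushed into the ready array.
// Replays: the recorded timestamp is returned; Date.now() never runs again.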

In fact we have an example of this in the original example: console.log is not a pure function. It has side-effects: conceptually, it returns a new state-of-the-universe every time we call it. So it has to be wrapped in interrupt.await, just like any other impure operation, and we faithfully record that it returned undefined so we can return that next time we execute the same step. In this case we’re not really recording a particular external value, but we are recording the fact that we’ve already caused a particular external side-effect, so we don’t cause it multiple times.
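
The difference is easy to demonstrate (a sketch using the wrappers defined earlier; the path is made up):

interrupt.async(function() {
    console.log('starting');          // unwrapped: prints again on every replay!
    var info = stat('some/path');     // promise: interrupts and reruns the block
    consoleLog('size: ' + info.size); // wrapped: the side-effect happens exactly once
});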

As long as the await-wrapping rule is followed, it all works perfectly. The problem, of course, is that if there are a lot of asynchronous promises and side-effecting calls involved, it will start to slow down as it repeatedly stops and re-executes everything it has done so far. Although in fact, it doesn't repeat everything: a lot of the hard work involves interacting with the OS, and that is (of necessity) wrapped in interrupt.await, and so only happens once. On subsequent executions the value cached in the ready array is reused, and that lookup is by position, which is quite fast. So each re-execution only involves “going through the motions” to get back to where it left off.

Even so, this extra grinding of the CPU does start to slow things down very noticeably after a healthy number of interrupts (modern JavaScript is fast, but not a miracle worker): because each interrupt replays every step taken so far, the total work grows roughly quadratically with the number of interrupts. Recursing through the file system is a very good test, because the block effectively has to revisit everything it has visited so far, and it has to do this once for every single file (due to the stat call) and twice for every directory (stat plus readdir).

One way to “cheat” would be to replace the Array.prototype.forEach function with something that understood how to interact with the current async context, and could skip forward to the right place in the iteration… but I’ll save that for another rainy day.

  1. December 29, 2012 at 12:11 pm

    I followed the pingback and I’m flabbergasted! It can also be done with a pure library! Brilliant!!!

    As you mention, this approach puts more burden on the developer because it requires wrappers around all impure calls. It also requires every catch and finally clause to test for Interrupt exceptions and rethrow them (no big deal). So, together with the perf issue, this is probably not such a great practical solution. But I had it wrong: there is a solution which does not require extra power. Bravo!

    Bruno

    • earwicker
      December 29, 2012 at 1:09 pm

      Thanks for the feedback, much appreciated! BTW since writing this I saw more of your blog and read about the disappointing feedback you’ve received after trying to promote this pattern in the Node community. I’ve seen that kind of reasoning before:

      • I obtain great value from X
      • But only if I perform menial task Y as “payment”, and so therefore,
      • Performing Y is what causes X to give me value, and,
      • Anyone who tells me to stop performing Y is endangering my access to the value of X!

      Where (of course) X is an event loop + async APIs, and Y is hand-written callback chains.

      Anyone who says “Look, we could just do Z, which is much nicer than Y, and that would unlock the value of X just fine!” gets shouted down. Hey, let them keep doing things the dumb way, if they enjoy it!

      Async programming is a must on the client as well (the browser's invisible event loop being the inspiration for the Node approach), so streamline.js deserves a wider audience than just Node. Microsoft is talking about adding async/await to their Typescript pre-compiler soon. It's going to be an assumed, accepted part of the mainstream before long.

  2. December 30, 2012 at 12:29 am

    Yes, I share your XYZ analysis completely.

    The same reasoning applies to node’s stream API: X = event loop + async streams, Y = complex event driven API for streams, Z = simple, unix-like callback API for streams.

    Situation looks a bit better on the streams front though. A new API is coming with 1.0, some kind of (Y + Z) / 2.

    I’ve been trying (loudly?) to promote the Zs but I’ve almost given up now. I just watch the discussions and I’m amazed by the energy which is being wasted (== doing things the dumb way).

    Regarding async/await, I’ve experimented with an alternate async/await syntax on top of the streamline transform. It was not too difficult to get the basics going but I haven’t completed it yet. This is available in the “await” branch.

    I really enjoyed your post. The idea of re-executing the function from start until all the async calls have completed is really clever.

  3. December 30, 2012 at 8:08 pm

    Node-style callbacks are not dumb at all. However, they make dumb programs look really dumb. Let’s have a look at the recurseDir() function from above. It is simple. Every programmer able to write the sync version can make it async without any problem. Yes, the async version is a bit more verbose, but from a practical point of view they are equally readable and understandable. It’s really dumb to craft an async utility if everything you need to ship is just a recurse-dir program. So let’s have a look at a more complex case, e.g. build systems. We all know how rake, make, etc. work. They all use the same computational model, which fits building perfectly. It’s really dumb to write build scripts in an imperative way, and with asynchrony it becomes just unimaginable. On the other hand, use the right design and asynchrony doesn’t add any complexity at all. I mentioned build systems not by accident. Their computational model is very useful for any application where you have to read/process complexly interrelated data from here and there. So design such applications right and you will never have trouble. The same is true for any kind of application.

    PS: You can have a look at https://github.com/eldargab/make-flow which is an implementation of build-system-style control flow for everyday use. I found that it solves the majority of callback troubles you typically have in node.

    • earwicker
      January 1, 2013 at 1:29 am

      I don’t disagree with that. Don’t know if you’re aware of the knockout library and the ko.computed function, but it’s effectively the same “make”-style dependency computation model but working on observable values (and it automatically figures out the dependencies), and it works great. So, yes, use the right tools for the job. But if you’re actually doing something imperative, then imperative code is precisely the right tool to be using, correct? If (for purely technical reasons) you have to call some async API at some stage in the imperative storyline, and that causes your programming language to fail to maintain the imperative style, then it is failing as a language in this area.

