Archive

Posts Tagged ‘asychrony’

TypeScript 1.6 – Async functions

February 1, 2015 7 comments

Update: have retitled this post based on the roadmap, which excitingly now has generators and async/await slated for 1.6!

I realise that I’m in danger of writing the same blog post about once a year, and I am definitely going to start making notes on my experiences using TypeScript generally, now that I’m using it on an industrial scale (around 40,000 lines converted from JavaScript in the last month or so, and the latest features in 1.4 have taken it to a new level of brilliance).

But the fact is, we’re getting tantalisingly close to my holy grail of convenient async programming and static typing in one marvellous open source package, on JavaScript-enabled platforms. If you get TypeScript’s source:

git clone https://github.com/Microsoft/TypeScript.git
cd TypeScript

And then switch to the prototypeAsync branch:

git checkout prototypeAsync

And do the usual steps to build the compiler:

npm install -g jake
npm install
jake local 

You now have a TypeScript 1.5-ish compiler that you can run with:

node built/local/tsc.js -t ES5 my-code.ts

The -t ES5 flag is important because for the async code generation the compiler otherwise assumes that you’re targeting ES6, which (as of now, in browsers and mainstream node) you probably aren’t.

And then things are very straightforward (assuming you have a promisified API to call):

    async function startup() {

        if (!await fs.exists(metabasePath)) {
            await fs.mkdir(metabasePath);
        }
        if (!await fs.exists(coverartPath)) {
            fs.mkdir(coverartPath);
        }

        console.log("Loading metabase...");
        var metabaseJson: string;
        try {
            metabaseJson = await fs.readFile(metabaseFile, 'utf8');
        } catch (x) {
            console.log("No existing metabase found");
        }

        // and so on...

This corresponds very closely to previous uses of yield (such as this), but without the need to manually wrap the function in a helper that makes a promise out of a generator.

As explained in the ES7 proposal the feature can be described in exactly those terms, and sure enough the TypeScript compiler structures its output as a function that makes a generator, wrapped in a function that turns a generator into a promise.

This of course made me assume that ES6 generator syntax would also be implemented, but it’s not yet. But no matter! As I previously demonstrated with C#, if a generator has been wrapped in a promise, we can wrap it back in a generator.

To keep the example short and sweet, I’m going to skip three details:

  • exception handling (which is really no different to returning values)
  • passing values into the generator so they are “returned” from the next use of yield (similarly, passing in an Error so it will be thrown out of yield)
  • returning a value at the end of the generator.

The first two are just more of the same, but the last one turned out to be technically tricky and I suspect is impossible. It’s a quirky and non-essential feature of ES6 generators anyway.

To start with I need type declarations for Promise and also (for reasons that will become clear) Thenable, so I grabbed es6-promise.d.ts from DefinitelyTyped.

Then we write a function, generator that accepts a function and returns a generator object (albeit a simplified one that only has the next method):

    function generator<TYields>(
        impl: (yield: (val: TYields) => Thenable<void>
    ) => Promise<void>) {

        var started = false,
            yielded: TYields,
            continuation: () => void;

        function start() {
            impl(val => {
                yielded = val;
                return {
                    then(onFulfilled?: () => void) {
                        continuation = onFulfilled;
                        return this;
                    }
                };
            });
        }

        return {
            next(): { value?: TYields; done: boolean } {
                if (!started) {
                    started = true;
                    start();
                } else if (continuation) {
                    var c = continuation;
                    continuation = null;
                    c();
                }
                return !continuation ? { done: true } 
                    : { value: yielded, done: false };
            }
        };
    }

The impl function would be written using async/await, e.g.:

    var g = generator<string>(async (yield) => {

        console.log("Started");

        await yield("first");

        console.log("Continuing");

        for (var n = 0; n < 5; n++) {
            await yield("Number: " + n);
        }

        await yield("last");

        console.log("Done");
    });

Note how it accepts a parameter yield that is itself a function: this serves as the equivalent of the yield keyword, although we have to prefix it with await:

    await yield("first");

And then we can drive the progress of the generator g in the usual way, completely synchronously:

    for (var r; r = g.next(), !r.done;) {    
        console.log("-- " + r.value);
    }

Which prints:

Started
-- first
Continuing
-- Number: 0
-- Number: 1
-- Number: 2
-- Number: 3
-- Number: 4
-- last
Done

So how does this work? Well, firstly (and somewhat ironically) we have to avoid using promises as much as possible. The reason has to do with the terrifying Zalgo. As it says in the Promises/A+ spec, when you cause a promise to be resolved, this does not immediately (synchronously) trigger a call to any functions that have been registered via then. This is important because it ensures that such callbacks are always asynchronous.

But this has nothing to do with generators, which do not inherently have anything to do with asynchronicity. In the above example, we must be able to create the generator and iterate through it to exhaustion, all in a single event loop tick. So if we rely on promises to carry messages back and forth between the code inside and outside the generator, it just ain’t gonna work. Our “driving” loop on the outside is purely synchronous. It doesn’t yield for anyone or anything.

Hence, observe that when generator calls impl:

    impl(val => {
        yielded = val;
        return {
            then(onFulfilled?: () => void) {
                continuation = onFulfilled;
                return this;
            }
        };
    });

it completely ignored the returned promise, and in the implementation of yield (which is that lambda that accepts val) it cooks up a poor man’s pseudo-promise that clearly does not implement Promises/A+. Technically this is known as a mere Thenable. It doesn’t implement proper chaining behaviour (fortunately unnecessary in this context), instead returning itself. The onFulfilled function is just stashed in the continuation variable for later use in next:

    if (!started) {
        started = true;
        start();
    } else if (continuation) {
        var c = continuation;
        continuation = null;
        c();
    }
    return !continuation ? { done: true } 
                         : { value: yielded, done: false };

The first part is trivial: if we haven’t started, then start and remember that we’ve done so. Then we come to the meat of the logic: if this is the second time next has been called, then we’ve started. That means that impl has been called, and it ran until it hit the first occurrence of await yield, i.e.:

    await yield("first");

The TypeScript compiler’s generated code will have received our Thenable, and enlisted on it by calling then, which means we have stashed that callback in continuation. To be sure we only call it once, we “swap” it out of continuation into a temporary variable before we call it:

    var c = continuation;
    continuation = null;
    c();

That (synchronously) executes another chunk of impl until the next await yield, but note that we left continuation set to null. This is important because what if impl runs out of code to execute? We can detect this, because continuation will remain null. And so the last part looks like this:

    return !continuation ? { done: true } 
                         : { value: yielded, done: false };

Why do we have to use this stateful trickery? To reiterate (pun!) the promise returned by impl is meant to signal to us when impl has finished, but it’s just no good to us, because it’s a well-behaved promise, so it wouldn’t execute our callback until the next event loop tick, which is way too late in good old synchronous generators.

But this means we can’t get the final return value (if any) of impl, as the only way to see that from the outside is by enlisting on the returned promise. And that’s why I can’t make that one feature of generators work in this example.

Anyway, hopefully soon this will just be of nerdy historical interest, once generators make it into TypeScript. What might be the stumbling block? Well, TypeScript is all about static typing. In an ES6 generator in plain JavaScript, all names (that can be bound to values) have the same static type, known in TypeScript as any, or in the vernacular as whatever:

    function *g() {
        var x = yield "hello";
        var y = yield 52;
        yield [x, y];
    }

    var i = g();
    var a = i.next().value;
    var b = i.next(61).value;
    var c = i.next("humpty").value;

The runtime types are another matter: as the only kind of assignments here are initialisation, so each variable only ever contains one type of value, we can analyse it and associate a definite type with each:

x: number = 61
y: string = "humpty"
a: string = "hello"
b: number = 52;
c: [number, string] = [61, "humpty"]

But in TypeScript, we want the compiler to track this kind of stuff for us. Could it use type inference to do any good? The two questions to be answered are:

  • What is the type of the value accepted by next, which is also the type of the value “returned” by the yield operator inside the generator?
  • What is the type of the value returned in the value property of the object returned by next, which is also the type of value accepted by the yield operator?

The compiler could look at the types that the generator passes to yield. It could take the union of those types (string | number | [number, string]) and thus infer the type of the value property of the object returned by next. But the flow of information in the other direction isn’t so easy: the type of value “returned” from yield inside the generator depends on what the driving code passes to next. It’s not possible to tie down the type via inference alone.

There are therefore two possibilities:

  • Leave the type as any. This is not great, especially not if (like me) you’re a noImplicitAny adherent. Static typing is the whole point!
  • Allow the programmer to fully specify the type signature of yield.

The latter is obviously my preference. Imagine you could write the interface of a generator:

    interface MyGeneratorFunc {
        *(blah: string): Generator<number, string>;
    }

Note the * prefix, which would tell the compiler that we’re describing a generator function, by analogy with function *. And because it’s a generator, the compiler requires us to follow up with the return type Generator, which would be a built-in interface. The two type parameters describe:

  • the values passed to yield (and the final return value of the whole generator)
  • the values “returned” from yield

Note that the first type covers two kinds of outputs from the generator, but they have to be described by the same type because both are emitted in the value property of the object returned by the generator object’s next method:

function *g() {
    yield "eggs";
    yield "ham";
    return "lunch";
}

var i = g();
i.next() // {value: "eggs", done: false}
i.next() // {value: "ham", done: false}
i.next() // {value: "lunch", done: true} - Note: done === true 

Therefore, if we need them to be different types, we’ll have to use a union type to munge them together. In the most common simple use cases, the second type argument would be void.

This would probably be adequate, but in reality it’s trickier than this. Supposing in a parallel universe this extension was already implemented, but async/await was still the stuff of nightmares, how might we use it to describe the use of generators to achieve asynchrony? It’d be quite tricky. How about:

    interface AsyncFunc {
        *(): Generator<Promise<?>, ?>;
    }

See what I mean? What replaces those question marks? What we’d like to say is that wherever yield occurs inside the generator, it should accept a promise of a T and give back a plain T, where those Ts are the same type for a single use of yield, and yet it can be a different T for each use of yield in the same generator.

The hypothetical declaration above just can’t capture that relation. It’s all getting messy. No wonder they’re doing async/await first. On the other hand, we’re not in that universe, so maybe this stuff does’t matter.

These details aside, given how mature the prototype appears to be, I’m very much hoping that it will be released soon, with or without ordinary generators. It’s solid enough for me to use it for my fun home project, and it’s obviously so much better than any other way of describing complex asynchronous operations that I am even happy to give up IDE integration in order to use it (though I’d be very interested to hear if it’s possible to get it working in Visual Studio or Eclipse).

(But so far I’m sticking to a strict policy of waiting for official releases before unleashing them on my co-workers, so for now my day job remains on 1.4. And so my next post will be about some of the fun I’m having with that.)

Asynchronous Memoization in JavaScript

January 16, 2012 Leave a comment

In pure functional programming there is a simple rule: if you evaluate (call) a function more than once with the exact same argument values, you’ll keep getting the same return value. It follows that there is no need to call it more than once, which is super-awesome!! because it means you can put a caching mechanism in front of the function that keeps a map (hash table, Dictionary, etc.) of all the return values produced so far, each one keyed by the bundle of arguments that produced that return value.

Of course it’s only worth doing this if the dictionary lookup is faster than simply re-executing the function itself, and if the same small set of arguments are highly likely to be passed repeatedly. Yawn!!

In a non-pure language like JavaScript many (most?) functions are not pure: they examine information from other sources besides their parameters. However, they often have contexts within which they are “pure enough”. e.g. the information the user can see on the screen is, naively speaking, a projection of the information stored in the database, but if that were really true then when the database changes, the screen would immediately change as well. But it doesn’t; instead it usually remains stale until the user presses Refresh. This corresponds to “emptying the cache”.

In a complex app, there may be several separate components that project the same information in different ways. If they all go back to the external source for that information, and it is changing in real time, you could end up with an inconsistent picture on the screen. This might even cause instability, if one component tries to talk to another and they assume they’ll both have identical snapshots of the external information.

So memoization actually has a purpose in JavaScript: it can simulate a “freeze frame” of your dependencies on external data. But we need to provide the ability to delete things from the cache at the time of our choosing, “unfreezing” the frame so we can take a new snapshot.

Another complicating factor in JavaScript is asynchrony. JavaScript programmers just have to get used to doing the following transformation into “continuation-passing style” by hand; starting with:

var add = function(a1, a2) {
  return a1 + a2;
};

We switch to:

var add = function(a1, a2, done) {
  done(a1 + a2);
};

So the caller of add can no longer say:

var sum = add(2, 2);
alert('The answer is ' + sum);

And must instead say:

add(2, 2, function(sum) {
  alert('The answer is ' + sum);
});

This allows the implementation of add to utilise other functions that need to be passed a continuation in the same manner. Yo!!

So, let’s memoize. To simplify matters we’ll start by assuming we’ll be dealing with functions that take no parameters (or always take exactly the same parameters, it’s the same thing). It means we can replace the map with a single variable. We’re displaying “prices” (whatever the hell they are) to the user, so if synchronous was a realistic option we’d start with:

var getPrices = function() { 
    /// Talk to server to get prices, p, somehow.
    return p; 
};

Sadly we have to be asynchronous, but it’s no biggie:

var getPrices = function(done) {
    /// Talk to server to get prices, p, somehow.
    done(p); 
};

Seems like memoizing something that will be easy!

var makeCache = function(getter) {

  var value = null, ready = false;

  return {

    reset: function() {
      value = null;
      ready = false;
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        getter(function(got) {
          value = got;
          ready = true;
          done(value);
        });
      }
    }
  };

};

You’d use it to “wrap” the getPrices function like this:

var pricesCache = makeCache(getPrices);

pricesCache.get(function(prices) {
  // we've got the prices!  
});

And when you want to reset the cache, just say:

pricesCache.reset();

But actually there’s a bug here: do you know what it is? Give yourself a playful slap on the thigh if you got it.

What if there’s more than one call to pricesCache.get before the first one comes back with the data? We only set the ready flag when we’ve got the answer, which might take a second. In the meantime, various parts of the UI might be grabbing the prices to make their own updates. Each such call will launch a separate (unnecessary) call to the backend. What’s worse is that the prices may actually change during this mess, and so the callers will end up with inconsistent price information, just like I was a-bellyachin’ about up yonder.

First reaction: oh, oh, I know, it’s a state machine! We thought there were two states, as indicated by the boolean ready flag. But actually there’s three:

  1. No value.
  2. Okay, I’m, getting the value, sheesh.
  3. Got the value.

But hold on to your hose, little fireman. Think this one through for a second. It’s pretty clear that when the first caller tries to get, we need to transition to the middle state and make our call to the real getter function. And when the prices come back to us, we transition to the final state and call the callback function. But what about when a second caller tries to get and we’re already in the middle state? That’s the whole reason for doing this, to be able to handle that differently. Where do we put their callback function?

So, yes, it is a state machine, but not a three-state one. We need to keep a list of callback functions, so that when the prices come back, we can loop through those callback functions and give every single gosh darn one of them a calling they ain’t gonna forgit:

var makeCache = function(getter) {

  var value, ready, waiting = [];

  return {

    reset: function() {
      value = null;
      ready = false;
      waiting = [];
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        waiting.push(done);

        if (waiting.length === 1) {
          getter(function(got) {

            value = got;
            ready = true;

            waiting.forEach(function(w) { w(value); });
            waiting = null;
          });
        }
      }
    }
  };

};

Notice how I use waiting.forEach to loop through the callbacks. By definition here I’m calling some code that I don’t have control of. It might call back into pricesCache.get. That may seem intrinsically problematic, because it sounds like it could keep happening forever and cause a stack overflow. But it might be perfectly valid: there could be some separate code making the second call to get the prices, which supplies a different callback. Anyway, is it a problem for my cache implementation? No, because any calls to pricesCache.get during my callback loop will find that ready is already set, and so will not affect the waiting array. And even if pricesCache.reset were called, that would cause a fresh array to be created and stored in waiting.

And finally, nice piece of trivia for ya: even if there was some way for waiting to grow while we are still looping through it, according to the specification of Array.forEach the new item(s) won’t be included in the iteration.