
Posts Tagged ‘async’

TypeScript 1.6 – Async functions

February 1, 2015

Update: I have retitled this post in line with the roadmap, which excitingly now has generators and async/await slated for 1.6!

I realise that I’m in danger of writing the same blog post about once a year, and I am definitely going to start making notes on my experiences using TypeScript generally, now that I’m using it on an industrial scale (around 40,000 lines converted from JavaScript in the last month or so, and the latest features in 1.4 have taken it to a new level of brilliance).

But the fact is, we’re getting tantalisingly close to my holy grail of convenient async programming and static typing in one marvellous open source package, on JavaScript-enabled platforms. If you get TypeScript’s source:

    git clone https://github.com/Microsoft/TypeScript.git
    cd TypeScript

And then switch to the prototypeAsync branch:

    git checkout prototypeAsync

And do the usual steps to build the compiler:

    npm install -g jake
    npm install
    jake local

You now have a TypeScript 1.5-ish compiler that you can run with:

    node built/local/tsc.js -t ES5 my-code.ts

The -t ES5 flag is important because for the async code generation the compiler otherwise assumes that you’re targeting ES6, which (as of now, in browsers and mainstream node) you probably aren’t.

And then things are very straightforward (assuming you have a promisified API to call):

    async function startup() {

        if (!await fs.exists(metabasePath)) {
            await fs.mkdir(metabasePath);
        }
        if (!await fs.exists(coverartPath)) {
            await fs.mkdir(coverartPath);
        }

        console.log("Loading metabase...");
        var metabaseJson: string;
        try {
            metabaseJson = await fs.readFile(metabaseFile, 'utf8');
        } catch (x) {
            console.log("No existing metabase found");
        }

        // and so on...

This corresponds very closely to previous uses of yield (such as this), but without the need to manually wrap the function in a helper that makes a promise out of a generator.

As explained in the ES7 proposal the feature can be described in exactly those terms, and sure enough the TypeScript compiler structures its output as a function that makes a generator, wrapped in a function that turns a generator into a promise.
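The helper that turns a generator into a promise is small enough to sketch. The TypeScript below is not the compiler’s actual emitted helper, only a minimal illustration of the same technique, assuming an ES2015 or later compile target so that native generators exist; runGenerator is a hypothetical name. It resumes the generator with each settled value, or throws the rejection reason back into it.

```typescript
// Drives a generator that yields promises (or plain values), resuming it with each
// settled value, and returns a promise for the generator's final return value.
function runGenerator<T>(makeGen: () => Generator<unknown, T, any>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        const gen = makeGen();

        // Advance one step; `advance` is either gen.next(value) or gen.throw(error).
        function step(advance: () => IteratorResult<unknown, T>): void {
            let result: IteratorResult<unknown, T>;
            try {
                result = advance();
            } catch (e) {
                reject(e); // the generator threw and did not catch it
                return;
            }
            if (result.done) {
                resolve(result.value); // the generator returned
                return;
            }
            // Wait for the yielded value, then feed the outcome back in.
            Promise.resolve(result.value).then(
                value => step(() => gen.next(value)),
                error => step(() => gen.throw(error)));
        }

        step(() => gen.next());
    });
}

// Usage: yield behaves like await inside the generator.
const sum = runGenerator(function* () {
    const a: number = yield Promise.resolve(20);
    const b: number = yield 22; // plain values are passed straight back
    return a + b;
});

// A rejected promise surfaces as an exception at the yield.
const recovered = runGenerator(function* () {
    try {
        yield Promise.reject(new Error("boom"));
        return "not reached";
    } catch (e) {
        return "caught: " + (e as Error).message;
    }
});
```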

This of course made me assume that ES6 generator syntax would also be implemented, but it’s not yet. But no matter! As I previously demonstrated with C#, if a generator has been wrapped in a promise, we can wrap it back in a generator.

To keep the example short and sweet, I’m going to skip three details:

  • exception handling (which is really no different to returning values)
  • passing values into the generator so they are “returned” from the next use of yield (similarly, passing in an Error so it will be thrown out of yield)
  • returning a value at the end of the generator.

The first two are just more of the same, but the last one turned out to be technically tricky and, I suspect, impossible. It’s a quirky and non-essential feature of ES6 generators anyway.

To start with I need type declarations for Promise and also (for reasons that will become clear) Thenable, so I grabbed es6-promise.d.ts from DefinitelyTyped.

Then we write a function, generator, that accepts a function and returns a generator object (albeit a simplified one that only has the next method):

    function generator<TYields>(
        impl: (yield: (val: TYields) => Thenable<void>
    ) => Promise<void>) {

        var started = false,
            yielded: TYields,
            continuation: () => void;

        function start() {
            impl(val => {
                yielded = val;
                return {
                    then(onFulfilled?: () => void) {
                        continuation = onFulfilled;
                        return this;
                    }
                };
            });
        }

        return {
            next(): { value?: TYields; done: boolean } {
                if (!started) {
                    started = true;
                    start();
                } else if (continuation) {
                    var c = continuation;
                    continuation = null;
                    c();
                }
                return !continuation ? { done: true } 
                    : { value: yielded, done: false };
            }
        };
    }

The impl function would be written using async/await, e.g.:

    var g = generator<string>(async (yield) => {

        console.log("Started");

        await yield("first");

        console.log("Continuing");

        for (var n = 0; n < 5; n++) {
            await yield("Number: " + n);
        }

        await yield("last");

        console.log("Done");
    });

Note how it accepts a parameter yield that is itself a function: this serves as the equivalent of the yield keyword, although we have to prefix it with await:

    await yield("first");

And then we can drive the progress of the generator g in the usual way, completely synchronously:

    for (var r; r = g.next(), !r.done;) {    
        console.log("-- " + r.value);
    }

Which prints:

    Started
    -- first
    Continuing
    -- Number: 0
    -- Number: 1
    -- Number: 2
    -- Number: 3
    -- Number: 4
    -- last
    Done
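For comparison, here is roughly what the generator helper emulates: the same sequence written as a native ES2015 generator function (so this sketch assumes an ES2015 or later compile target), driven by the same kind of synchronous loop. The printed array is only there so the output can be checked against the listing above.

```typescript
// Native generator producing the same sequence as the async/await-based g above.
function* sequence(log: string[]): Generator<string, void, unknown> {
    log.push("Started");
    yield "first";
    log.push("Continuing");
    for (let n = 0; n < 5; n++) {
        yield "Number: " + n;
    }
    yield "last";
    log.push("Done");
}

// Drive it synchronously, recording what would have been printed.
const printed: string[] = [];
const iter = sequence(printed);
while (true) {
    const r = iter.next();
    if (r.done) {
        break;
    }
    printed.push("-- " + r.value);
}
```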

So how does this work? Well, firstly (and somewhat ironically) we have to avoid using promises as much as possible. The reason has to do with the terrifying Zalgo. As it says in the Promises/A+ spec, when you cause a promise to be resolved, this does not immediately (synchronously) trigger a call to any functions that have been registered via then. This is important because it ensures that such callbacks are always asynchronous.

But this has nothing to do with generators, which do not inherently have anything to do with asynchronicity. In the above example, we must be able to create the generator and iterate through it to exhaustion, all in a single event loop tick. So if we rely on promises to carry messages back and forth between the code inside and outside the generator, it just ain’t gonna work. Our “driving” loop on the outside is purely synchronous. It doesn’t yield for anyone or anything.
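The point about ordering is easy to check. In this minimal sketch the promise is already resolved, yet its then callback has still not run by the time the next synchronous line executes, which is exactly why a purely synchronous driving loop cannot rely on promises:

```typescript
const order: string[] = [];

// The promise is already resolved, but the callback is only queued, not run.
Promise.resolve("value").then(v => {
    order.push("then callback saw " + v);
});

order.push("synchronous code after then");

// Snapshot taken before control returns to the event loop:
// only the synchronous entry is present at this point.
const seenSynchronously = order.slice();
```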

Hence, observe that when generator calls impl:

    impl(val => {
        yielded = val;
        return {
            then(onFulfilled?: () => void) {
                continuation = onFulfilled;
                return this;
            }
        };
    });

it completely ignores the returned promise, and in the implementation of yield (the lambda that accepts val) it cooks up a poor man’s pseudo-promise that clearly does not implement Promises/A+. Technically this is known as a mere Thenable. It doesn’t implement proper chaining behaviour (fortunately unnecessary in this context), instead returning itself. The onFulfilled function is just stashed in the continuation variable for later use in next:

    if (!started) {
        started = true;
        start();
    } else if (continuation) {
        var c = continuation;
        continuation = null;
        c();
    }
    return !continuation ? { done: true } 
                         : { value: yielded, done: false };

The first part is trivial: if we haven’t started, then start and remember that we’ve done so. Then we come to the meat of the logic: if this is the second time next has been called, then we’ve started. That means that impl has been called, and it ran until it hit the first occurrence of await yield, i.e.:

    await yield("first");

The TypeScript compiler’s generated code will have received our Thenable, and enlisted on it by calling then, which means we have stashed that callback in continuation. To be sure we only call it once, we “swap” it out of continuation into a temporary variable before we call it:

    var c = continuation;
    continuation = null;
    c();

That (synchronously) executes another chunk of impl until the next await yield. Note that we left continuation set to null, which matters because impl might run out of code to execute. If it does, it never calls then again, so continuation remains null, and that is how we detect it. And so the last part looks like this:

    return !continuation ? { done: true } 
                         : { value: yielded, done: false };

Why do we have to use this stateful trickery? To reiterate (pun!), the promise returned by impl is meant to signal to us when impl has finished, but it’s no good to us: it’s a well-behaved promise, so it wouldn’t execute our callback until the current synchronous code had finished running, which is way too late for good old synchronous generators.

But this means we can’t get the final return value (if any) of impl, as the only way to see that from the outside is by enlisting on the returned promise. And that’s why I can’t make that one feature of generators work in this example.

Anyway, hopefully soon this will just be of nerdy historical interest, once generators make it into TypeScript. What might be the stumbling block? Well, TypeScript is all about static typing. In an ES6 generator in plain JavaScript, all names (that can be bound to values) have the same static type, known in TypeScript as any, or in the vernacular as whatever:

    function *g() {
        var x = yield "hello";
        var y = yield 52;
        yield [x, y];
    }

    var i = g();
    var a = i.next().value;
    var b = i.next(61).value;
    var c = i.next("humpty").value;

The runtime types are another matter: since the only assignments here are initialisations, each variable only ever holds one type of value, so we can analyse the code and associate a definite type with each:

    x: number = 61
    y: string = "humpty"
    a: string = "hello"
    b: number = 52
    c: [number, string] = [61, "humpty"]
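The trace can be confirmed by running the generator. This sketch uses the three-parameter Generator<TYield, TReturn, TNext> type that TypeScript gained later (in version 3.6; its parameters are ordered differently from the hypothetical Generator interface discussed in this post). It also shows the difficulty that follows: the value coming back from yield can only be typed as the union of everything the driver might pass to next, so x and y need unchecked casts before they fit a [number, string] tuple.

```typescript
type Yielded = string | number | [number, string];

function* g(): Generator<Yielded, void, number | string> {
    const x = yield "hello"; // statically number | string; 61 at runtime
    const y = yield 52;      // statically number | string; "humpty" at runtime
    // The compiler cannot tell which member of the union each yield returns,
    // so building the tuple needs casts that nothing checks.
    yield [x as number, y as string];
}

const i = g();
const a = i.next().value;         // "hello"
const b = i.next(61).value;       // 52
const c = i.next("humpty").value; // [61, "humpty"]
```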

But in TypeScript, we want the compiler to track this kind of stuff for us. Could it use type inference to do any good? The two questions to be answered are:

  • What is the type of the value accepted by next, which is also the type of the value “returned” by the yield operator inside the generator?
  • What is the type of the value returned in the value property of the object returned by next, which is also the type of value accepted by the yield operator?

The compiler could look at the types that the generator passes to yield. It could take the union of those types (string | number | [number, string]) and thus infer the type of the value property of the object returned by next. But the flow of information in the other direction isn’t so easy: the type of the value “returned” from yield inside the generator depends on what the driving code passes to next, so it can’t be pinned down by inference alone.

There are therefore two possibilities:

  • Leave the type as any. This is not great, especially not if (like me) you’re a noImplicitAny adherent. Static typing is the whole point!
  • Allow the programmer to fully specify the type signature of yield.

The latter is obviously my preference. Imagine you could write the interface of a generator:

    interface MyGeneratorFunc {
        *(blah: string): Generator<number, string>;
    }

Note the * prefix, which would tell the compiler that we’re describing a generator function, by analogy with function *. And because it’s a generator, the compiler requires us to follow up with the return type Generator, which would be a built-in interface. The two type parameters describe:

  • the values passed to yield (and the final return value of the whole generator)
  • the values “returned” from yield

Note that the first type covers two kinds of outputs from the generator, but they have to be described by the same type because both are emitted in the value property of the object returned by the generator object’s next method:

    function *g() {
        yield "eggs";
        yield "ham";
        return "lunch";
    }

    var i = g();
    i.next() // {value: "eggs", done: false}
    i.next() // {value: "ham", done: false}
    i.next() // {value: "lunch", done: true} - Note: done === true

Therefore, if we need them to be different types, we’ll have to use a union type to munge them together. In the most common simple use cases, the second type argument would be void.

This would probably be adequate, but in reality it’s trickier than this. Suppose that, in a parallel universe, this extension had already been implemented but async/await was still the stuff of nightmares. How might we use it to describe the use of generators to achieve asynchrony? It’d be quite tricky. How about:

    interface AsyncFunc {
        *(): Generator<Promise<?>, ?>;
    }

See what I mean? What replaces those question marks? What we’d like to say is that wherever yield occurs inside the generator, it should accept a promise of a T and give back a plain T, where the two Ts are the same type for a single use of yield, and yet T can differ between uses of yield in the same generator.

The hypothetical declaration above just can’t capture that relation. It’s all getting messy. No wonder they’re doing async/await first. On the other hand, we’re not in that universe, so maybe this stuff doesn’t matter.

These details aside, given how mature the prototype appears to be, I’m very much hoping that it will be released soon, with or without ordinary generators. It’s solid enough for me to use it for my fun home project, and it’s obviously so much better than any other way of describing complex asynchronous operations that I am even happy to give up IDE integration in order to use it (though I’d be very interested to hear if it’s possible to get it working in Visual Studio or Eclipse).

(But so far I’m sticking to a strict policy of waiting for official releases before unleashing them on my co-workers, so for now my day job remains on 1.4. And so my next post will be about some of the fun I’m having with that.)


A pure library approach to async/await in standard JavaScript

December 28, 2012

I’m very keen on JavaScript gaining actual language support for the one thing it is mostly used for: asynchronous programming, so I never shut up about it.

The C# approach was trailed very nicely here back in Oct 2010. Over the years I’ve occasionally searched for usable projects that might implement this kind of thing in JavaScript, but they all seem to be abandoned, so I stopped looking.

And then I had an idea for how to do it purely with an ordinary JS library – a crazy idea, to be sure, but an idea nonetheless. So I searched to see if anyone else had come up with it (let me rephrase that: like all simple ideas in software, it’s a certainty that other people came up with it decades ago, probably in LISP, but I searched anyway).

I haven’t found anything yet, but I did find Bruno Jouhier’s streamline.js, which looks like a very nice (and non-abandoned!) implementation of the precompiler approach.

So what was my crazy idea? Well, as Streamline’s creator said in a reply to a comment on his blog:

But, no matter how clever it is, a pure JS library will never be able to solve the “topological” issue that I tried to describe in my last post… You need extra power to solve this problem: either a fiber or coroutine library with a yield call, a CPS transform like streamline, or direct support from the language (a yield operator).

Well, that sounds like a challenge!

If we really really want to, we can in fact solve this problem with a pure library running in ordinary JavaScript, with no generators or fibers and no precompiler. Whether this approach is practical is another matter, partly because it chews the CPU like a hungry wolf, but also because the main selling point for this kind of thing is that it makes life easier for beginners, and unfortunately to use my approach you have to understand the concept of a pure function and pay close attention to when you’re stepping outside of purity.

To set the stage, Bruno Jouhier’s example is a node.js routine that recurses through file system directories. Here’s a simplified version using the sync APIs:

var fs = require('fs');
var path = require('path');

var recurseDir = function(dir) {
    fs.readdirSync(dir).forEach(function(child) {
        if (child[0] != '.') {
            var childPath = path.join(dir, child);
            if (fs.statSync(childPath).isDirectory()) {
                recurseDir(childPath);
            } else {
                console.log(childPath);
            }
        }
    });
};

recurseDir(process.argv[2]);

And – ta da! – here’s a version that uses the async APIs but appears not to:

var fs = require('fs');
var path = require('path');

var Q = require('q');
var interrupt = require('./interrupt.js');

var readdir = interrupt.bind(Q.nfbind(fs.readdir));
var stat = interrupt.bind(Q.nfbind(fs.stat));
var consoleLog = interrupt.bind(console.log);

interrupt.async(function() {

    var recurseDir = function(dir) {
        readdir(dir).forEach(function(child) {
            if (child[0] != '.') {
                var childPath = path.join(dir, child);
                if (stat(childPath).isDirectory()) {
                    recurseDir(childPath);
                } else {
                    consoleLog(childPath);
                }
            }
        });
    };

    recurseDir(process.argv[2]);
});

The core of the program, the recurseDir function, looks practically identical. The only difference is that it calls specially wrapped versions of readdir, stat and console.log, e.g.

var readdir = interrupt.bind(Q.nfbind(fs.readdir));

The inner wrapper Q.nfbind is from the lovely q module that provides us with promises with (almost) the same pattern as jQuery.Deferred. Q.nfbind wraps a node API so that instead of accepting a function(error, result) callback it returns a promise, which can reduce yuckiness by up to 68%.
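The essence of what Q.nfbind does can be sketched in TypeScript. This is not Q’s implementation, just a minimal illustration assuming TypeScript 4.0 or later (for variadic tuple types) and the Node convention that the callback comes last and receives (error, result); toPromise, addLater and failLater are hypothetical names.

```typescript
// A Node-style callback: error first, then the result.
type NodeCallback<T> = (error: Error | null, result?: T) => void;

// Wraps a function whose last parameter is a Node-style callback so that it
// returns a promise instead.
function toPromise<A extends unknown[], T>(
    fn: (...args: [...A, NodeCallback<T>]) => void
): (...args: A) => Promise<T> {
    return (...args: A) =>
        new Promise<T>((resolve, reject) => {
            fn(...args, (error: Error | null, result?: T) => {
                if (error) {
                    reject(error);
                } else {
                    resolve(result as T);
                }
            });
        });
}

// Stand-ins for Node-style APIs such as fs.stat (hypothetical, for illustration).
function addLater(x: number, y: number, callback: NodeCallback<number>): void {
    setTimeout(() => callback(null, x + y), 0);
}
function failLater(message: string, callback: NodeCallback<number>): void {
    setTimeout(() => callback(new Error(message)), 0);
}

const addAsync = toPromise(addLater);   // (x: number, y: number) => Promise<number>
const failAsync = toPromise(failLater); // (message: string) => Promise<number>
```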

But interrupt.bind is my own fiendish contribution:

exports.bind = function(impl) {
    return function() {
        var that = this;
        var args = arguments;
        return exports.await(function() {
            return impl.apply(that, args);
        });
    };
};

So it wraps a promise-returning function inside that interrupt.await thingy. To understand what that is for, we have to go back to the start of the example, where we say:

interrupt.async(function() {

The function we pass there (let’s call it our “async block”) will be executed multiple times – this is a basic fact about all interruptible coroutines. They can be paused and restarted. But standard JavaScript doesn’t provide a way to stop a function and then start up again from the same point. You can only start again from the beginning.

For that to work, every time the function reruns (a second, third, fourth time, and so on), repeating everything it has already done, it has to behave exactly the same as it did on the previous runs. Which is where functional purity comes in. A pure function is one that always returns the same value when given the same arguments, and has no side effects. So Math.random is not pure. Nor is reading the file system (because it might change under your feet). But quite a lot of things are pure: anything that only depends on our parameters, or on the local variables holding whatever we have figured out so far from those parameters.

So, inside interrupt.async we can do anything pure without headaches. But whenever we want to know about the outside world, we have to be careful. The way we do that is with interrupt.await, e.g.

var stuff = interrupt.await(function() {
    return $.get('blah-de-blah');
});

The first time the async block runs, when it reaches interrupt.await, it executes the function we pass to it (the “initializer”), which in this case starts a download and returns a promise that will be resolved when the download is ready. But then interrupt.await throws an exception, which cancels execution of the async block. When the promise is resolved, the async block is executed again, and this time interrupt.await ignores the function passed to it entirely and instead returns the result of the download from the promise created on the first run, which I call an externality (because it’s data that came from outside).

The internal representation is actually quite simple. Here’s interrupt.async:

function Interrupt() {}

var currentContext = null;

exports.async = function(impl) {
    log('Creating async context');
    var thisContext = {
        ready: [],
        waiting: null,
        slot: 0,
        result: defer(),
        attempt: function() {
            log('Swapping in context for execution attempt');
            var oldContext = currentContext;
            currentContext = thisContext;
            currentContext.slot = 0;
            try {
                thisContext.result.resolve(impl());
                log('Completed successfully');
            } catch (x) {
                if (x instanceof Interrupt) {
                    log('Execution was interrupted');
                    return;
                } else {
                    log('Exception occurred: ' + JSON.stringify(x));
                    throw x;
                }
            } finally {
                log('Restoring previous context');
                currentContext = oldContext;
            }
        }
    };
    log('Making first attempt at execution');
    thisContext.attempt();
    return getPromise(thisContext.result);
};

The important part is the context, which has an array, ready, of previously captured externalities, and an integer, slot, which is the position in the ready array of the next externality to be replayed or recorded.

The more fiddly work is done in interrupt.await:

exports.await = function(init) {
    if (!currentContext) {
        throw new Error('Used interrupt.await outside of interrupt.async');
    }
    var ctx = currentContext;
    if (ctx.ready.length > ctx.slot) {
        // Advance past this slot before possibly throwing, so that an async
        // block which catches the recorded exception stays in step
        var slot = ctx.slot++;
        log('Already obtained value for slot ' + slot);
        var val = ctx.ready[slot];
        if (val && val.__exception) {
            log('Throwing exception for slot ' + slot);
            throw val.__exception;
        }
        log('Returning value ' + JSON.stringify(val) + ' for slot ' + slot);
        return val;
    }
    if (ctx.waiting) {
        log('Still waiting for value for ' + ctx.slot + ', will interrupt');
        throw new Interrupt();
    }
    log('Executing initializer for slot ' + ctx.slot);
    var promise = init();
    if (promise && promise.then) {
        log('Obtained a promise for slot ' + ctx.slot);
        var handler = function(val) {
            if ((ctx.slot != ctx.ready.length) ||
                (ctx.waiting != promise)) {
                throw new Error('Inconsistent state in interrupt context');
            }
            log('Obtained a value ' + JSON.stringify(val) + ' for slot ' + ctx.slot);
            ctx.ready.push(val);
            ctx.waiting = null;
            log('Requesting retry of execution');
            ctx.attempt();
        };
        promise.then(handler, function(reason) {
            log('Obtained an error ' + JSON.stringify(reason) + ' for slot ' + ctx.slot);
            handler({ __exception: reason });
        });
        ctx.waiting = promise;
        throw new Interrupt();
    }
    if (ctx.slot != ctx.ready.length) {
        throw new Error('Inconsistent state in interrupt context');
    }
    // 'promise' is not a promise!
    log('Obtained a plain value ' + JSON.stringify(promise) + ' for slot ' + ctx.slot);
    ctx.ready.push(promise);
    ctx.slot++;
    return promise;
};

It can deal with an initializer that returns a plain value, and avoids the overhead of interrupting and restarting in that case, but still enforces the same behaviour of capturing the externality so it can be returned on any subsequent repeat run instead of running the initializer again.

In fact the directory-walking example already contains a case of this: console.log is not a pure function. It has side-effects: conceptually, it returns a new state-of-the-universe every time we call it. So it has to be wrapped in interrupt.await, just like any other impure operation, and we faithfully record that it returned undefined so we can return that next time we execute the same step. In this case we’re not really recording a particular external value, but we are recording the fact that we’ve already caused a particular external side-effect, so we don’t cause it multiple times.
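The whole re-execution technique fits in a compact, self-contained TypeScript sketch, which may be easier to experiment with than the excerpts above (those rely on log, defer and getPromise helpers that are not shown). This is a simplification rather than the original interrupt.js: replayAsync, replayAwait and Outcome are hypothetical names, and it uses a native Promise for the overall result. The usage at the end counts how many times the block runs and how many times the side-effecting step actually executes.

```typescript
// Thrown to abandon the current attempt; the block is rerun when a value arrives.
class Interrupt {}

// An externality recorded for one slot: either a value or an error.
type Outcome = { ok: true; value: unknown } | { ok: false; error: unknown };

interface Context {
    ready: Outcome[];    // externalities captured so far, in call order
    slot: number;        // index of the next replayAwait call in this attempt
    waiting: boolean;    // true while the promise for the next slot is pending
    attempt: () => void; // reruns the block from the beginning
}

let current: Context | null = null;

function replayAsync<T>(block: () => T): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        const ctx: Context = {
            ready: [], slot: 0, waiting: false,
            attempt: () => {
                const previous = current;
                current = ctx;
                ctx.slot = 0;
                try {
                    resolve(block());
                } catch (x) {
                    if (!(x instanceof Interrupt)) {
                        reject(x); // a genuine error, not a pause
                    }
                } finally {
                    current = previous;
                }
            },
        };
        ctx.attempt();
    });
}

function replayAwait<T>(init: () => T | PromiseLike<T>): T {
    const ctx = current;
    if (!ctx) {
        throw new Error("replayAwait used outside replayAsync");
    }
    if (ctx.slot < ctx.ready.length) {
        // Replaying: return (or rethrow) what was recorded, without calling init.
        const outcome = ctx.ready[ctx.slot++];
        if (!outcome.ok) {
            throw outcome.error;
        }
        return outcome.value as T;
    }
    if (ctx.waiting) {
        throw new Interrupt();
    }
    const result = init();
    if (isThenable(result)) {
        // Record the outcome when it arrives, then rerun the block from the start.
        const settle = (outcome: Outcome): void => {
            ctx.ready.push(outcome);
            ctx.waiting = false;
            ctx.attempt();
        };
        ctx.waiting = true;
        result.then(value => settle({ ok: true, value }), error => settle({ ok: false, error }));
        throw new Interrupt();
    }
    // A plain value (or a completed side effect): record it so later runs skip init.
    ctx.ready.push({ ok: true, value: result });
    ctx.slot++;
    return result as T;
}

function isThenable<T>(x: T | PromiseLike<T>): x is PromiseLike<T> {
    return typeof x === "object" && x !== null
        && typeof (x as PromiseLike<T>).then === "function";
}

// Usage: two asynchronous steps with an impure step (like console.log) in between.
let runs = 0;
let sideEffects = 0;
const total = replayAsync(() => {
    runs++;
    const first = replayAwait(() => Promise.resolve(20));
    replayAwait(() => { sideEffects++; return undefined; });
    const second = replayAwait(() => Promise.resolve(22));
    return first + second;
});
```

The block runs three times (once initially and once per resolved promise), but the impure step executes only once because its result is replayed from the ready array.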

As long as the await-wrapping rule is followed, it all works perfectly. The problem, of course, is that if there are a lot of asynchronous promises and side-effecting calls involved, it will start to slow down as it repeatedly stops and re-executes everything it has done so far. In fact, though, it doesn’t repeat everything. A lot of the hard work involves interacting with the OS, and that is (of necessity) wrapped in interrupt.await, and so only happens once. On subsequent executions the value cached in the ready array is reused, and because it is looked up by position, that is quite fast. So each re-execution only involves “going through the motions” to get back to where it left off.

Even so, this extra grinding of the CPU does start to slow it down very noticeably after a healthy number of interrupts (modern JavaScript is fast, but not a miracle worker). Recursing through the file system is a very good test, because on every re-execution it has to revisit all the places it has already visited, and it is interrupted once for every single file (due to the stat call) and twice for every directory (stat plus readdir).

One way to “cheat” would be to replace the Array.prototype.forEach function with something that understood how to interact with the current async context, and could skip forward to the right place in the iteration… but I’ll save that for another rainy day.


A way that async/await-style functionality might make it into browsers

October 9, 2012

Many moons ago (actually it’s 35.7 moons ago) I wrote an excited post about JavaScript generators in Firefox. Sadly they are still only in Firefox, and there’s no sign of them appearing elsewhere.

But one way they could appear elsewhere is via compilers that generate plain JavaScript, and the latest player is TypeScript. Why is this a good fit? Because, with a very thin layer of helper code, generators replicate the functionality of C# 5’s async/await, and if that was a good idea for C# and Silverlight, it’s got to be a good idea for TypeScript. (The downside is that the auto-generated JavaScript would not be very readable, but I can still dream that this will happen…)

My old post is somewhat messy because I was unaware of jQuery.Deferred. If we bring that into the mix, to serve as the JS equivalent of Task, then things get really nice. To wit:

async(function() {
  
  // sleeping inside a loop
  for (var n = 0; n < 10; n++) {
    $('.counter').text('Counting: ' + n);
    yield sleep(500);
  }

  // asynchronous download
  $('.downloaded').text(yield $.get('test.txt'));

  // and some more looping, why not
  for (var n = 0; n < 10; n++) {
    $('.counter').text('Counting: ' + (10 - n));
    yield sleep(500);
  }
});

In other words, by passing a function to something called async, I can use the yield keyword in exactly the same way as C# 5’s await keyword.

The yielded values are just jQuery.Deferred objects – or rather, they are objects that have a done method, through which we can register a callback to receive the resulting value at some later time. So the implementation of sleep is straightforward:

var sleep = function(ms) {
  var d = $.Deferred();
  setTimeout(function() { d.resolve(); }, ms);
  return d;
};

By calling resolve, we trigger any registered done functions. So who is registering? This is what async looks like:

var async = function(gen) {
  var result;
  var step = function() {
    var yielded;

    try {
      yielded = gen.send(result); // run to next yield
    } catch (x) {
      if (x instanceof StopIteration) {
        return;
      }
      throw x;
    }

    yielded.done(function(newResult) {
      result = newResult; // what to return from yield
      step();
    });
  };
  gen = gen(); // start the generator
  step();
};

So async calls the function passed to it to get a generator object. It can then repeatedly call send on that object to pass it values (which will be returned from yield inside the generator). It assumes that the objects that come back from send (which were passed to yield inside the generator) will have a done function, allowing us to register to be notified when an asynchronous operation completes.
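The send/StopIteration protocol used above belongs to Firefox’s pre-standard generators. With standard ES2015 generators the equivalent call is next(value), and completion is reported by done: true on the result rather than by an exception. A minimal TypeScript sketch of the same driver on that basis, assuming only that yielded objects have a jQuery-style done method for registering a callback; DoneLike, runWithDone and settledLater are hypothetical names, and settledLater is a stand-in for a real Deferred.

```typescript
// Anything with a jQuery.Deferred-style `done` method for registering a callback.
interface DoneLike<T> {
    done(callback: (value: T) => void): unknown;
}

function runWithDone(makeGen: () => Generator<DoneLike<any>, void, any>): void {
    const gen = makeGen();
    const step = (result?: unknown): void => {
        const r = gen.next(result); // run to the next yield, passing in the last result
        if (r.done) {
            return;                 // finished: no StopIteration involved
        }
        r.value.done(step);         // resume when the yielded operation completes
    };
    step();
}

// A stand-in for a Deferred that completes asynchronously with a fixed value.
function settledLater<T>(value: T): DoneLike<T> {
    const callbacks: Array<(v: T) => void> = [];
    setTimeout(() => callbacks.forEach(cb => cb(value)), 0);
    return {
        done(callback: (v: T) => void) {
            callbacks.push(callback);
        },
    };
}

// Usage: each yield hands back the value the Deferred completed with.
const received: number[] = [];
runWithDone(function* () {
    for (let n = 0; n < 3; n++) {
        const value: number = yield settledLater(n * 10);
        received.push(value);
    }
});
```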

Note that async could use some further complication, because it currently doesn’t deal with exceptions (note that the try/catch block above is merely to deal with the strange way that generators indicate when they’ve finished). But generators have full support for communicating exceptions back to the yielding code, so it should all be do-able.

And that’s all there is to it. You can see the example running here:

http://earwicker.com/yieldasync/

… but only in Firefox, of course.


A Boring Discovery!

August 28, 2012

I was writing a simple example program to explain C#5’s async/await keywords. To keep it really simple and “old school”, I decided to make it a console app, and to read lines from the console until the user typed quit. First, the synchronous version:

public static List<string> Foo()
{
    var lines = new List<string>();

    string line;
    while ((line = Console.In.ReadLine()) != "quit")
        lines.Add(line);

    return lines;
}

And then the async version:

public static async Task<List<string>> Foo()
{
    var lines = new List<string>();

    string line;
    while ((line = await Console.In.ReadLineAsync()) != "quit")
        lines.Add(line);

    return lines;
}

Really straightforward, right? I just add the async keyword, wrap the return type inside Task<T> and then I can use await Console.In.ReadLineAsync() instead of Console.In.ReadLine().

So I tried this, and gosh-dang-diddly, it didn’t behave at all as expected. In fact, both versions behaved the same. Could it be something exciting to do with how the SynchronizationContext is set up in console apps? Sorry, no. Try to think of something much duller than that.

The answer is… wait for it… ReadLineAsync() isn’t asynchronous at all. It doesn’t return its Task<string> until the whole line has been read.

Why is that? TextReader.ReadLineAsync appears to be properly asynchronous, and Console.In returns a TextReader as you’d expect… but not quite. It first passes it to TextReader.Synchronized, and guess how the resulting wrapper class implements ReadLineAsync? Thanks to .NET Reflector, we don’t have to guess:

public override Task<string> ReadLineAsync()
{
    return Task.FromResult<string>(this.ReadLine());
}

Yep, it calls ordinary ReadLine, totally synchronously, and then makes an already-finished Task<string>. In real life this is probably harmless because it’s such an edge case, but you can imagine the tedious diversion it caused in the middle of the explanation I was giving.
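The same trap is easy to reproduce in TypeScript, where Promise.resolve(...) plays the role of Task.FromResult: a function can return a promise and still have done all of its work before returning. This is only an analogy, not the .NET code; readNow and readLater are hypothetical helpers over an in-memory list of lines, and since JavaScript has no thread pool, readLater defers the work with setTimeout rather than moving it to another thread.

```typescript
// Does the work synchronously, then wraps the answer: like Task.FromResult(ReadLine()).
function readNow(lines: string[], trace: string[]): Promise<string | undefined> {
    trace.push("read happened");
    return Promise.resolve(lines.shift());
}

// Schedules the work for later, so the caller gets control back first.
function readLater(lines: string[], trace: string[]): Promise<string | undefined> {
    return new Promise<string | undefined>(resolve => {
        setTimeout(() => {
            trace.push("read happened");
            resolve(lines.shift());
        }, 0);
    });
}

const nowTrace: string[] = [];
readNow(["hello"], nowTrace);
nowTrace.push("caller continued");   // nowTrace: ["read happened", "caller continued"]

const laterTrace: string[] = [];
readLater(["hello"], laterTrace);
laterTrace.push("caller continued"); // laterTrace: ["caller continued"] until the timer fires
```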

To get around it, I used this alternative helper, to move the ReadLine to the thread pool:

public static Task<string> ReadConsoleAsync()
{
    return Task.Run(() => Console.ReadLine());
}

Moral: I have no idea. How about “Sometimes, things don’t exactly work.” Is that a moral?


Async/await iterator – updated for Visual Studio 11 Preview

January 29, 2012

A long overdue install of the Visual Studio 11 Preview, and the changes to the asynchronous language features since 2010 (my, how time flies) are enough to break the code I blogged over a year ago.

The first problem is that a few of the methods of an “awaiter” (what in C++ we’d call the awaiter concept) have been renamed, and there’s now a property called IsCompleted, and that’s fine and dandy.

But when I tried exercising the code I hit a more thorny problem, which is that my test program would terminate somewhat randomly when an exception was rethrown from a background thread. For a program that I thought was single threaded, that’s pretty bad!

I no longer have the original CTP installed, so I can’t be sure about this, but I think a fairly major change has been made since then. There’s now a difference between an async method that returns void and an async method that returns Task (as opposed to Task<T>).

Contrary to what might be assumed, the relationship between Task and Task<T> is not the same as that between IEnumerable and IEnumerable<T>. That is, Task is not some old pre-generics version of the same idea. Instead, it was specially created to represent a task that doesn’t return any value at all; that is, something like void, but asynchronous.

I believe (though I’m not certain) that in the original CTP, a void async method would actually return a Task, so that its lifetime could be managed externally even though it wouldn’t produce a value. In the latest version that is not the case. The Task associated with a void async method is simply not available, and the compiler-generated version of the method really does return void. This in turn means that you can’t use await on such methods.

You can still explicitly declare your async method to return Task, so nothing has been lost. And this certainly makes everything more clear and consistent to callers: methods really do return what they are declared to return, as usual. But it also changes the behaviour of exceptions.

In all cases, if an exception tries to escape from your async method, a catch-all handler in the compiler-generated state machine will catch it so that it can be rethrown in an appropriate context. But the choice of context depends entirely on whether the method returns void or Task. The policy is determined by AsyncVoidMethodBuilder or AsyncTaskMethodBuilder respectively. With the help of ReSharper, we can see that the latter gives the caught exception to the Task, via task.TrySetException. So the decision to rethrow (or not) is entirely up to whoever holds the Task, and they can check its Exception property whenever they like.
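Here is a minimal sketch of the Task case. The exception does not escape from the call itself. It sits in the returned Task until someone looks, either by reading `Exception` or by awaiting. `FailAsync` and `ObserveAsync` are illustrative names, and `Task.CompletedTask` assumes .NET 4.6 or later.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskExceptionDemo
{
    // Hypothetical method: the exception escapes its body, so the
    // compiler-generated code stores it in the returned Task
    public static async Task FailAsync()
    {
        await Task.CompletedTask; // already complete, so no suspension
        throw new InvalidOperationException("stored in the task");
    }

    // Awaiting the Task is one way to observe the exception: await
    // rethrows it, unwrapped from the AggregateException
    public static async Task<string> ObserveAsync()
    {
        try
        {
            await FailAsync();
            return "no exception";
        }
        catch (InvalidOperationException x)
        {
            return x.Message;
        }
    }
}
```

Calling `FailAsync()` returns normally with a faulted Task. Nothing is thrown until the Task is awaited.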

But in the void case, it’s totally different. The Task never gets passed the exception. What would be the point? We can’t get at the Task. The exception is unobservable; to avoid that loss of information, an arrangement is made to rethrow the exception at the next available opportunity, by creating a delegate that will rethrow it and then posting that delegate to the “context”.

The “context” is a somewhat vague concept; the architecture uses three different representations, depending on the scenario. But in the case of a simple console-based test program, the exception-rethrowing delegate is simply passed to the thread pool, and so it brings down the whole process at a random time (though reasonably soon). In a GUI program the exception would be thrown on the main GUI thread. You can supply your own context by setting a per-thread instance of SynchronizationContext, in which you can override the Post method. It doesn’t let you get at the exception, but it does give you a delegate that, if you executed it, would throw the exception, which you can then catch!
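Here is a sketch of that trick. It assumes, as described above, that the builder posts the rethrowing delegate to whatever SynchronizationContext was current when the async void method started. `QueueingContext` and `FailAndForget` are hypothetical names. The context simply queues whatever is posted, so that we can run it later inside a try/catch.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context that queues posted callbacks instead of running them
public class QueueingContext : SynchronizationContext
{
    public readonly Queue<KeyValuePair<SendOrPostCallback, object>> Posted =
        new Queue<KeyValuePair<SendOrPostCallback, object>>();

    public override void Post(SendOrPostCallback d, object state)
    {
        Posted.Enqueue(new KeyValuePair<SendOrPostCallback, object>(d, state));
    }
}

public static class AsyncVoidDemo
{
    private static async void FailAndForget()
    {
        await Task.CompletedTask; // completes synchronously
        throw new InvalidOperationException("escaped from async void");
    }

    // Returns the message of the exception we managed to catch, or null
    public static string CatchEscapedException()
    {
        var context = new QueueingContext();
        var previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            FailAndForget(); // returns normally; nothing is thrown here
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }

        string caught = null;
        while (context.Posted.Count > 0)
        {
            var item = context.Posted.Dequeue();
            try
            {
                item.Key(item.Value); // the posted delegate rethrows
            }
            catch (InvalidOperationException x)
            {
                caught = x.Message;
            }
        }
        return caught;
    }
}
```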

The upshot? An exception that escapes from an async void method is definitely a sign of a bug somewhere. Of course, this does not automatically mean you should add your own catch-all! Sometimes crashing the process is the least-worst option. There is no single correct way to deal with bugs – it’s a question of economics and so is not an exact science.

So in short, async void is a niche thing. Where you might be tempted to write async void, you almost certainly want async Task (with no type argument) instead. And my example of implementing the equivalent of yield return definitely needs updating.

First, I stash the Task in a field. Second, after executing the continuation I check the Task.Exception property to see if anything bad happened that needs rethrowing:

if (_task.Exception != null)
{
    // Unpeel the AggregateException wrapping
    Exception inner = _task.Exception;
    while (inner is AggregateException)
        inner = inner.InnerException;

    throw inner;
}
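Here is the same unpeeling logic pulled out into a standalone helper. `Unwrap` and `Rethrow` are illustrative names. One possible refinement on .NET 4.5 or later is `ExceptionDispatchInfo.Capture(inner).Throw()`. It rethrows the inner exception while keeping its original stack trace, which a plain `throw inner;` discards. The extra null check avoids throwing null for an AggregateException that has no inner exceptions.

```csharp
using System;
using System.Runtime.ExceptionServices;

public static class ExceptionUnpeeling
{
    // Strip away any number of AggregateException layers,
    // following the first InnerException each time
    public static Exception Unwrap(Exception e)
    {
        while (e is AggregateException && e.InnerException != null)
            e = e.InnerException;
        return e;
    }

    // Rethrow the unwrapped exception, preserving its original stack trace
    public static void Rethrow(Exception e)
    {
        ExceptionDispatchInfo.Capture(Unwrap(e)).Throw();
    }
}
```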

Aside from that it works much the same way as before, though I’ve added a lot of comments and organised it a little differently to hopefully make the behaviour clearer. I’ve also had to add an implementation of the new awaiter property:

public bool IsCompleted
{
    get { return false; }
}

Well, that was easy. Returning true would be a very bad idea in this example, as we can discover with more ReSharper digging. The compiler-generated state machine examines that property, and if it is true then it doesn’t bother suspending the async method. It calls GetResult and carries straight on. So we wouldn’t get the interleaved execution behaviour that we’re relying on.
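To see that shortcut in isolation, here is a sketch of a hypothetical awaiter, `CountingAwaiter`. It reports a fixed IsCompleted and counts how often OnCompleted is called. When IsCompleted is true, the generated code goes straight to GetResult and never calls OnCompleted. When it is false, the method suspends and hands over its continuation, which this sketch stores so we can run it by hand. The awaiter implements INotifyCompletion, which the released C# 5 compiler requires of awaiters.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical awaitable/awaiter, used only to observe the compiler's behaviour
public class CountingAwaiter : INotifyCompletion
{
    private readonly bool _isCompleted;
    public int OnCompletedCalls;
    public Action Continuation;

    public CountingAwaiter(bool isCompleted)
    {
        _isCompleted = isCompleted;
    }

    public CountingAwaiter GetAwaiter() { return this; }

    public bool IsCompleted { get { return _isCompleted; } }

    public void OnCompleted(Action continuation)
    {
        OnCompletedCalls++;
        Continuation = continuation; // stored, not run: we decide when to resume
    }

    public void GetResult() { }
}

public static class IsCompletedDemo
{
    public static async Task AwaitOnce(CountingAwaiter awaiter)
    {
        await awaiter;
    }
}
```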

Here’s the whole thing:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public delegate Task IteratorMethod<TItem>(YieldEnumerator<TItem> e);

// The released C# 5 compiler requires awaiters to implement INotifyCompletion
public class YieldEnumerator<TItem> : IEnumerator<TItem>, INotifyCompletion
{
    // Will be executed to get the next value
    private Action _continuation;

    // Will become the value of Current
    private TItem _nextValue;
    private bool _hasNextValue;

    // To be thrown inside the async method, as if by the await keyword
    private Exception _exception;

    // The task associated with our running async method
    private Task _task;

    public YieldEnumerator(IteratorMethod<TItem> iteratorMethod)
    {
        _task = iteratorMethod(this);
    }

    private void Execute()
    {
        // If _task is null, we've been disposed.
        if (_task == null)
            return;

        // If we already have a buffered value that hasn't been
        // retrieved, we shouldn't run anything yet. If there's no
        // continuation to run, the async method has finished.
        if (!_hasNextValue && _continuation != null)
        {
            // Be ultra-careful not to run same _continuation twice
            var t = _continuation;
            _continuation = null;
            t(); // may or may not have stored a new _continuation
        }

        // The async method may have hit a snag, either just now or
        // before its first await (when the constructor called it)
        if (_task.Exception != null)
        {
            // Unpeel the AggregateException wrapping
            Exception inner = _task.Exception;
            while (inner is AggregateException)
                inner = inner.InnerException;

            throw inner;
        }
    }

    public YieldEnumerator<TItem> GetAwaiter()
    {
        return this;
    }

    // Performance optimisation added since original CTP. If we
    // returned true, the compiler-generated code would skip
    // OnCompleted and go straight to GetResult, so the flow of the
    // async method would never be interrupted in the way that we
    // require.
    public bool IsCompleted
    {
        get { return false; }
    }

    // Was called BeginAwait in the original CTP
    public void OnCompleted(Action continuation)
    {
        Debug.Assert(_continuation == null);
        _continuation = continuation;
    }

    // Was called EndAwait
    public void GetResult()
    {
        // This is called by compiler-generated code caused by the
        // await keyword, so it's a chance to throw an exception to
        // be caught by the code in the async method
        if (_exception != null)
        {
            var t = _exception;
            _exception = null;
            throw t;
        }
    }

    // Our equivalent of yield return
    public YieldEnumerator<TItem> YieldReturn(TItem value)
    {
        if (_hasNextValue)
        {
            // Shouldn't happen because MoveNext ought to have
            // been called and we should be inside the async
            // code at this point
            throw new InvalidOperationException();
        }

        _nextValue = value;
        _hasNextValue = true;
        return this;
    }

    public TItem Current { get; private set; }

    object System.Collections.IEnumerator.Current
    {
        get { return Current; }
    }

    public bool MoveNext()
    {
        Execute();

        if (_hasNextValue)
        {
            Current = _nextValue;
            _hasNextValue = false;
            return true;
        }

        return false;
    }

    private sealed class AbandonEnumeratorException : Exception {}

    public void Dispose()
    {
        // Nothing to do if we've already been disposed
        if (_task == null)
            return;

        // If async method is not yet complete, throw an exception
        // inside it to make it grind to a halt (discarding any
        // buffered value, so that Execute will resume the method)
        if (_continuation != null)
        {
            _hasNextValue = false;
            _exception = new AbandonEnumeratorException();
            try { Execute(); } catch (AbandonEnumeratorException) { }
        }

        _task.Dispose();
        _task = null;
    }

    public void Reset()
    {
        throw new NotImplementedException("Reset");
    }
}

// The usual obvious IEnumerable to go with our IEnumerator
public class YieldEnumerable<TItem> : IEnumerable<TItem>
{
    private readonly IteratorMethod<TItem> _iteratorMethod;

    public YieldEnumerable(IteratorMethod<TItem> iteratorMethod)
    {
        _iteratorMethod = iteratorMethod;
    }

    public IEnumerator<TItem> GetEnumerator()
    {
        return new YieldEnumerator<TItem>(_iteratorMethod);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

class Program
{
    public static async Task MyIteratorMethod1(YieldEnumerator<int> e)
    {
        Console.WriteLine("A");
        await e.YieldReturn(1);
        Console.WriteLine("B");
        await e.YieldReturn(2);
        Console.WriteLine("C");
        await e.YieldReturn(3);
        Console.WriteLine("D");
    }

    public static async Task MyIteratorMethod2(YieldEnumerator<int> e)
    {
        try
        {
            Console.WriteLine("A");
            await e.YieldReturn(1);
            Console.WriteLine("B");
            await e.YieldReturn(2);
            Console.WriteLine("C");
            await e.YieldReturn(3);
            Console.WriteLine("D");
        }
        finally
        {
            Console.WriteLine("Running finally");
        }
    }

    public static async Task MyIteratorMethodInfinite(YieldEnumerator<int> e)
    {
        for (var n = 0; ; n++)
            await e.YieldReturn(n);
    }

    public static async Task MyIteratorBroken1(YieldEnumerator<int> e)
    {
        // always happens, but compiler doesn't know that
        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");

        await e.YieldReturn(1);
    }

    public static async Task MyIteratorBroken2(YieldEnumerator<int> e)
    {
        await e.YieldReturn(1);

        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");
    }

    public static async Task MyIteratorBroken3(YieldEnumerator<int> e)
    {
        await e.YieldReturn(1);

        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");

        await e.YieldReturn(2);
    }

    static void Main(string[] args)
    {
        foreach (var i in new YieldEnumerable<int>(MyIteratorMethod1))
            Console.WriteLine("Yielded: " + i);

        foreach (var i in new YieldEnumerable<int>(MyIteratorMethod2))
        {
            Console.WriteLine("Yielded: " + i);
            break; // finally should still run
        }

        foreach (var i in new YieldEnumerable<int>(MyIteratorMethodInfinite))
        {
            if (i % 1000000 == 0) // every million times...
                Console.WriteLine("Yielded: " + i);

            if (i > 10000000)
                break;
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken1))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken2))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken3))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }
    }
}
Categories: Uncategorized

Unification of async, await and yield return

December 14, 2010 14 comments

The new await facility in the C# 5 CTP is mightily reminiscent of the iterator methods (yield return) added in C# 2. So much so that it is irresistible (to someone like me, anyway) to see if I can reinvent one using the other.

For many years now people have already been implementing something a lot like await using yield return, and I’ve blogged about that before. It’s quite possible to write a simple library such that yield return can play much the same role as await, except for some irritating limitations.
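For reference, the older direction can be sketched in a few lines. An iterator yields Tasks, and a small driver resumes the iterator each time the yielded Task finishes. Here the driver is `RunSteps` and the sample iterator is `AddLater`, both hypothetical. This is one common shape of the technique, not the code from my earlier posts. Its limitations include the fact that a yield return can’t hand a result back to the iterator, which has to read `.Result` itself. Also, C# doesn’t allow yield return inside a try block that has a catch clause.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class IteratorTasks
{
    // Drives an iterator that yields Tasks: each time a yielded Task
    // finishes (however it finishes), the iterator is resumed, much as
    // if it had awaited that Task.
    public static Task RunSteps(IEnumerable<Task> steps)
    {
        var done = new TaskCompletionSource<bool>();
        var e = steps.GetEnumerator();
        Action step = null;
        step = () =>
        {
            try
            {
                if (!e.MoveNext())
                {
                    e.Dispose();
                    done.SetResult(true);
                    return;
                }
                e.Current.ContinueWith(_ => step(), TaskContinuationOptions.ExecuteSynchronously);
            }
            catch (Exception x)
            {
                // The iterator itself threw: fault the overall Task
                e.Dispose();
                done.SetException(x);
            }
        };
        step();
        return done.Task;
    }

    // Example "async" method written with yield return
    public static IEnumerable<Task> AddLater(List<int> log)
    {
        Task<int> a = Task.FromResult(20);
        yield return a;
        Task<int> b = Task.Run(() => 22);
        yield return b;
        log.Add(a.Result + b.Result); // both Tasks are finished by now
    }
}
```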

But can we go the other way, and write a library that provides the capability of yield return using only the new await feature?

Short answer: yes.

Long answer: (now read on…)
Read more…

C# 5.0 async/await and GUI events

October 30, 2010 7 comments

A silly example. I’m tickled by the idea of treating a button as a simple task:

async void RefreshButtonAsync()
{
    for (;;)
    {
        // returns a Task that finishes
        // next time the button is clicked
        await refreshButton.ClickAsync();
        await RefreshAsync();
    }
}

Wait for the button to be clicked, wait for the refresh to finish, repeat forever… that’s your task, little fella!
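ClickAsync isn’t something a standard Button gives you, so you’d write it yourself. Here is one possible sketch, assuming only that the button exposes an ordinary Click event. It subscribes, completes a TaskCompletionSource on the first click, and then unsubscribes. `FakeButton` is a hypothetical stand-in for a real control, so the sketch can run without a GUI.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a GUI button: just an object with a Click event
public class FakeButton
{
    public event EventHandler Click;

    public void PerformClick()
    {
        var handler = Click;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

public static class ButtonExtensions
{
    // Returns a Task that finishes the next time the button is clicked
    public static Task ClickAsync(this FakeButton button)
    {
        var tcs = new TaskCompletionSource<bool>();
        EventHandler handler = null;
        handler = (sender, args) =>
        {
            button.Click -= handler; // one-shot: only the next click counts
            tcs.TrySetResult(true);
        };
        button.Click += handler;
        return tcs.Task;
    }
}
```

By default, TrySetResult runs the awaiting code’s continuation synchronously, so in this sketch the loop resumes inside the click handler itself.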
Read more…

Categories: Uncategorized