Archive for January, 2012

Async/await iterator – updated for Visual Studio 11 Preview

January 29, 2012

A long overdue install of the Visual Studio 11 Preview, and the changes to the asynchronous language features since 2010 (my, how time flies) are enough to break the code I blogged over a year ago.

The first problem is that a few of the methods of an “awaiter” (what in C++ we’d call a concept) have been renamed, and there’s now a property called IsCompleted, and that’s fine and dandy.

But when I tried exercising the code I hit a more thorny problem, which is that my test program would terminate somewhat randomly when an exception was rethrown from a background thread. For a program that I thought was single threaded, that’s pretty bad!

I don’t have my install of the original CTP, so I’m not sure about this, but I think a fairly major change was made since then: there’s now a difference between an async method that returns void and an async method that returns Task (as opposed to Task<T>).

Contrary to what might be assumed, the relationship between Task and Task<T> is not the same as that between IEnumerable and IEnumerable<T>. That is, Task is not some old pre-generics version of the same idea. Instead, it was specially created to represent a task that doesn’t return any value at all; that is, something like void, but asynchronous.

I believe (though I’m not certain) that in the original CTP, a void async method would actually return a Task, so that its lifetime could be managed externally even though it wouldn’t produce a value. But in the latest version that is not the case: the Task associated with a void async method is simply not available, and the compiler-generated version of the method really does return void. Which means in turn that you can’t use await on such methods.

You can still explicitly declare your async method to return Task, so nothing has been lost. And this certainly makes everything more clear and consistent to callers: methods really do return what they are declared to return, as usual. But it also changes the behaviour of exceptions.

In either case, if an exception tries to escape out of your async method, there is a catch-all handler in the compiler-generated state machine which will catch it, so it can be rethrown in an appropriate context. But the choice of context depends entirely on whether the method returns void or Task. The policy is determined by AsyncVoidMethodBuilder or AsyncTaskMethodBuilder respectively. With the help of ReSharper, we can see that the latter gives the caught exception to the Task, via task.TrySetException. So the decision to rethrow (or not) is entirely up to whoever has hold of the Task: they can check its Exception property whenever they like.

But in the void case, it’s totally different. The Task never gets passed the exception. What would be the point? We can’t get at the Task. The exception is unobservable; to avoid that loss of information, an arrangement is made to rethrow the exception at the next available opportunity, by creating a delegate that will rethrow it and then posting that delegate to the “context”.

The “context” is a somewhat vague concept; the architecture uses three different representations, depending on the scenario. But in the case of a simple console-based test program, the exception-rethrowing delegate is simply passed to the thread pool, and so it brings down the whole process at a random time (though reasonably soon). In a GUI program the exception would be thrown on the main GUI thread. You can supply your own context by setting a per-thread instance of SynchronizationContext, in which you can override the Post method. It doesn’t let you get at the exception, but it does give you a delegate that, if you executed it, would throw the exception, which you can then catch!

The upshot? An exception that escapes an async void method is definitely a sign of a bug somewhere. Although of course this does not automatically mean you should add your own catch-all! Sometimes crashing the process is the least-worst option. There is no single correct way to deal with bugs – it’s a question of economics, and so is not an exact science.

So in short, async void is a niche thing. In most situations you almost certainly want async Task with no type argument. And my example of implementing the equivalent of yield return definitely needs updating.

First, I stash the Task in a field. Second, after executing the continuation I check the Task.Exception property to see if anything bad happened that needs rethrowing:

if (_task.Exception != null)
{
    // Unpeel the AggregateException wrapping
    Exception inner = _task.Exception;
    while (inner is AggregateException)
        inner = inner.InnerException;

    throw inner;
}

Aside from that it works much the same way as before, though I’ve added a lot of comments and organised it a little differently to hopefully make the behaviour clearer. I’ve also had to add an implementation of the new awaiter property:

public bool IsCompleted
{
    get { return false; }
}

Well, that was easy. Returning true would be a very bad idea in this example, as we can discover with more ReSharper digging. The compiler-generated state machine examines that property, and if it is true then it doesn’t bother to yield control back to the thread, so we don’t get the interleaved execution behaviour that we’re relying on.

Here’s the whole thing:

public delegate Task IteratorMethod<TItem>(YieldEnumerator<TItem> e);

public class YieldEnumerator<TItem> : IEnumerator<TItem>
{
    // Will be executed to get the next value
    private Action _continuation;

    // Will become the value of Current
    private TItem _nextValue;
    private bool _hasNextValue;

    // To be thrown inside the async method, as if by the await keyword
    private Exception _exception;

    // The task associated with our running async method
    private Task _task;

    public YieldEnumerator(IteratorMethod<TItem> iteratorMethod)
    {
        _task = iteratorMethod(this);
    }

    private void Execute()
    {
        // If we already have a buffered value that hasn't been
        // retrieved, we shouldn't do anything yet. If we don't
        // and there's no continuation to run, we've finished.
        // And if _task is null, we've been disposed.
        if (_hasNextValue || _continuation == null || _task == null)
            return;

        // Be ultra-careful not to run same _continuation twice
        var t = _continuation;
        _continuation = null;
        t(); // may or may not have stored a new _continuation

        // And may also have hit a snag!
        if (_task.Exception != null)
        {
            // Unpeel the AggregateException wrapping
            Exception inner = _task.Exception;
            while (inner is AggregateException)
                inner = inner.InnerException;

            throw inner;
        }
    }

    public YieldEnumerator<TItem> GetAwaiter()
    {
        return this;
    }

    // Performance optimisation added since original CTP. If we
    // returned true, the compiler-generated code would bypass the
    // OnCompleted/GetResult dance altogether, and the flow of the
    // async method would never be interrupted in the way that we
    // require.
    public bool IsCompleted
    {
        get { return false; }
    }

    // Was called BeginAwait in the original CTP
    public void OnCompleted(Action continuation)
    {
        Debug.Assert(_continuation == null);
        _continuation = continuation;
    }

    // Was called EndAwait
    public void GetResult()
    {
        // This is called by compiler-generated code caused by the
        // await keyword, so it's a chance to throw an exception to
        // be caught by the code in the async method
        if (_exception != null)
        {
            var t = _exception;
            _exception = null;
            throw t;
        }
    }

    // Our equivalent of yield return
    public YieldEnumerator<TItem> YieldReturn(TItem value)
    {
        if (_hasNextValue)
        {
            // Shouldn't happen because MoveNext ought to have
            // been called and we should be inside the async
            // code at this point
            throw new InvalidOperationException();
        }

        _nextValue = value;
        _hasNextValue = true;
        return this;
    }

    public TItem Current { get; private set; }

    object System.Collections.IEnumerator.Current
    {
        get { return Current; }
    }

    public bool MoveNext()
    {
        Execute();

        if (_hasNextValue)
        {
            Current = _nextValue;
            _hasNextValue = false;
            return true;
        }

        return false;
    }

    private sealed class AbandonEnumeratorException : Exception {}

    public void Dispose()
    {
        // If async method is not yet complete, throw an exception
        // inside it to make it grind to a halt
        if (_continuation != null)
        {
            _exception = new AbandonEnumeratorException();
            try { Execute(); } catch (AbandonEnumeratorException) { }
        }

        _task.Dispose();
        _task = null;
    }

    public void Reset()
    {
        throw new NotImplementedException("Reset");
    }
}

// The usual obvious IEnumerable to go with our IEnumerator
public class YieldEnumerable<TItem> : IEnumerable<TItem>
{
    private readonly IteratorMethod<TItem> _iteratorMethod;

    public YieldEnumerable(IteratorMethod<TItem> iteratorMethod)
    {
        _iteratorMethod = iteratorMethod;
    }

    public IEnumerator<TItem> GetEnumerator()
    {
        return new YieldEnumerator<TItem>(_iteratorMethod);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

class Program
{
    public static async Task MyIteratorMethod1(YieldEnumerator<int> e)
    {
        Console.WriteLine("A");
        await e.YieldReturn(1);
        Console.WriteLine("B");
        await e.YieldReturn(2);
        Console.WriteLine("C");
        await e.YieldReturn(3);
        Console.WriteLine("D");
    }

    public static async Task MyIteratorMethod2(YieldEnumerator<int> e)
    {
        try
        {
            Console.WriteLine("A");
            await e.YieldReturn(1);
            Console.WriteLine("B");
            await e.YieldReturn(2);
            Console.WriteLine("C");
            await e.YieldReturn(3);
            Console.WriteLine("D");
        }
        finally
        {
            Console.WriteLine("Running finally");
        }
    }

    public static async Task MyIteratorMethodInfinite(YieldEnumerator<int> e)
    {
        for (var n = 0; ; n++)
            await e.YieldReturn(n);
    }

    public static async Task MyIteratorBroken1(YieldEnumerator<int> e)
    {
        // always happens, but compiler doesn't know that
        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");

        await e.YieldReturn(1);
    }

    public static async Task MyIteratorBroken2(YieldEnumerator<int> e)
    {
        await e.YieldReturn(1);

        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");
    }

    public static async Task MyIteratorBroken3(YieldEnumerator<int> e)
    {
        await e.YieldReturn(1);

        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");

        await e.YieldReturn(2);
    }

    static void Main(string[] args)
    {
        foreach (var i in new YieldEnumerable<int>(MyIteratorMethod1))
            Console.WriteLine("Yielded: " + i);

        foreach (var i in new YieldEnumerable<int>(MyIteratorMethod2))
        {
            Console.WriteLine("Yielded: " + i);
            break; // finally should still run
        }

        foreach (var i in new YieldEnumerable<int>(MyIteratorMethodInfinite))
        {
            if (i % 1000000 == 0) // every million times...
                Console.WriteLine("Yielded: " + i);

            if (i > 10000000)
                break;
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken1))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken2))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken3))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }
    }
}

Asynchronous Memoization in JavaScript

January 16, 2012

In pure functional programming there is a simple rule: if you evaluate (call) a function more than once with the exact same argument values, you’ll keep getting the same return value. It follows that there is no need to actually call it more than once, which means you can put a caching mechanism in front of the function: it keeps a map (hash table, Dictionary, etc.) of all the return values produced so far, each one keyed by the bundle of arguments that produced it.

Of course it’s only worth doing this if the dictionary lookup is faster than simply re-executing the function itself, and if the same small set of arguments is highly likely to be passed repeatedly.
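As a warm-up, here’s what that caching mechanism looks like for ordinary synchronous functions (a minimal sketch; `memoize` and `slowSquare` are made-up names, and JSON-encoding the arguments as a key only works for simple value-like parameters):

```javascript
// A minimal synchronous memoizer: keys the cache by JSON-encoding
// the arguments array
var memoize = function(fn) {
  var cache = {};
  return function() {
    var key = JSON.stringify(Array.prototype.slice.call(arguments));
    if (!(key in cache)) {
      cache[key] = fn.apply(null, arguments);
    }
    return cache[key];
  };
};

var callCount = 0;
var slowSquare = memoize(function(x) {
  callCount++; // count how often the real function actually runs
  return x * x;
});

slowSquare(4); // computes 16 and caches it
slowSquare(4); // served from the cache; the real function doesn't run
```

After those two calls, `callCount` is still 1 – the second call never reached the underlying function.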

In a non-pure language like JavaScript, many (most?) functions are not pure: they examine information from other sources besides their parameters. However, they often have contexts within which they are “pure enough”. For example, the information the user can see on the screen is, naively speaking, a projection of the information stored in the database; but if that were literally true, then whenever the database changed the screen would immediately change as well. It doesn’t; instead it usually remains stale until the user presses Refresh, which corresponds to “emptying the cache”.

In a complex app, there may be several separate components that project the same information in different ways. If they all go back to the external source for that information, and it is changing in real time, you could end up with an inconsistent picture on the screen. This might even cause instability, if one component tries to talk to another and they assume they’ll both have identical snapshots of the external information.

So memoization actually has a purpose in JavaScript: it can simulate a “freeze frame” of your dependencies on external data. But we need to provide the ability to delete things from the cache at the time of our choosing, “unfreezing” the frame so we can take a new snapshot.

Another complicating factor in JavaScript is asynchrony. JavaScript programmers just have to get used to doing the following transformation into “continuation-passing style” by hand; starting with:

var add = function(a1, a2) {
  return a1 + a2;
};

We switch to:

var add = function(a1, a2, done) {
  done(a1 + a2);
};

So the caller of add can no longer say:

var sum = add(2, 2);
alert('The answer is ' + sum);

And must instead say:

add(2, 2, function(sum) {
  alert('The answer is ' + sum);
});

This allows the implementation of add to utilise other functions that need to be passed a continuation in the same manner.
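To see why that matters, here’s a sketch of a CPS function built on top of add (`addThree` is just an illustrative name): because add can only deliver its result through a continuation, addThree has to deliver its own result the same way.

```javascript
// CPS version of add, as above
var add = function(a1, a2, done) {
  done(a1 + a2);
};

// A CPS function composed from another CPS function: the partial
// sum arrives via a continuation, so the final sum must too
var addThree = function(a1, a2, a3, done) {
  add(a1, a2, function(partial) {
    add(partial, a3, done);
  });
};

var result;
addThree(1, 2, 3, function(sum) { result = sum; });
// result is now 6 (everything here happens to run synchronously)
```

If add later became genuinely asynchronous (a server round-trip, say), addThree’s callers wouldn’t need to change at all.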

So, let’s memoize. To simplify matters we’ll start by assuming we’re dealing with functions that take no parameters (or that always take exactly the same parameters, which amounts to the same thing). That means we can replace the map with a single variable. Suppose we’re displaying “prices” (whatever they are) to the user; if a synchronous call were a realistic option, we’d start with:

var getPrices = function() { 
    /// Talk to server to get prices, p, somehow.
    return p; 
};

Sadly we have to be asynchronous, but it’s no biggie:

var getPrices = function(done) {
    /// Talk to server to get prices, p, somehow.
    done(p); 
};

Seems like this will be easy to memoize:

var makeCache = function(getter) {

  var value = null, ready = false;

  return {

    reset: function() {
      value = null;
      ready = false;
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        getter(function(got) {
          value = got;
          ready = true;
          done(value);
        });
      }
    }
  };

};

You’d use it to “wrap” the getPrices function like this:

var pricesCache = makeCache(getPrices);

pricesCache.get(function(prices) {
  // we've got the prices!  
});

And when you want to reset the cache, just say:

pricesCache.reset();

But actually there’s a bug here: do you know what it is?

What if there’s more than one call to pricesCache.get before the first one comes back with the data? We only set the ready flag when we’ve got the answer, which might take a second. In the meantime, various parts of the UI might be grabbing the prices to make their own updates. Each such call will launch a separate (unnecessary) call to the backend. What’s worse is that the prices may actually change during this window, so the callers could end up with inconsistent price information – exactly the problem described earlier.
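To make the bug concrete, here’s a sketch using a fake getter that, like a real server call, doesn’t answer straight away (makeCache is the naive version from above; `fakeGetPrices` and the `pendingReplies` bookkeeping are invented for the demo):

```javascript
// The naive cache from above, repeated so the demo is self-contained
var makeCache = function(getter) {
  var value = null, ready = false;
  return {
    reset: function() {
      value = null;
      ready = false;
    },
    get: function(done) {
      if (ready) {
        done(value);
      } else {
        getter(function(got) {
          value = got;
          ready = true;
          done(value);
        });
      }
    }
  };
};

// A fake backend: records each call, and only answers when we
// explicitly invoke the saved reply function
var pendingReplies = [];
var backendCalls = 0;
var fakeGetPrices = function(done) {
  backendCalls++;
  pendingReplies.push(done);
};

var pricesCache = makeCache(fakeGetPrices);
var results = [];
pricesCache.get(function(p) { results.push(p); });
pricesCache.get(function(p) { results.push(p); });

// backendCalls is already 2: the second get arrived before the first
// reply, so a redundant server call was launched. Worse, if the price
// changed between the two replies, the callers see different values:
pendingReplies[0](100);
pendingReplies[1](105);
// results is now [100, 105] – an inconsistent picture
```

With a real XHR the same interleaving happens, just nondeterministically, which is why the bug is so easy to miss in testing.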

First reaction: aha, it’s a state machine! We thought there were two states, as indicated by the boolean ready flag. But actually there are three:

  1. No value.
  2. Getting the value.
  3. Got the value.

But hold on; think this one through for a second. It’s pretty clear that when the first caller tries to get, we need to transition to the middle state and make our call to the real getter function. And when the prices come back to us, we transition to the final state and call the callback function. But what about when a second caller tries to get while we’re already in the middle state? That’s the whole reason for doing this: to be able to handle that case differently. Where do we put their callback function?

So, yes, it is a state machine, but not a three-state one. We need to keep a list of callback functions, so that when the prices come back, we can loop through that list and call every single one of them:

var makeCache = function(getter) {

  var value, ready, waiting = [];

  return {

    reset: function() {
      value = null;
      ready = false;
      waiting = [];
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        waiting.push(done);

        if (waiting.length === 1) {
          getter(function(got) {

            value = got;
            ready = true;

            waiting.forEach(function(w) { w(value); });
            waiting = null;
          });
        }
      }
    }
  };

};

Notice how I use waiting.forEach to loop through the callbacks. By definition I’m calling code I don’t control here, and it might call back into pricesCache.get. That may seem intrinsically problematic, because it sounds like it could keep happening forever and cause a stack overflow. But it might be perfectly valid: there could be some separate piece of code making a second call to get the prices, supplying a different callback. Is it a problem for my cache implementation? No, because any call to pricesCache.get during my callback loop will find that ready is already set, and so will not affect the waiting array. And even if pricesCache.reset were called, that would cause a fresh array to be created and stored in waiting.
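As a quick check of both claims – one backend call per batch of gets, and safe re-entrancy – here’s the fixed cache exercised with the same kind of fake getter (again, `fakeGetPrices` and `pendingReplies` are invented for the demo):

```javascript
// The fixed cache from above, with a waiting list
var makeCache = function(getter) {
  var value, ready, waiting = [];
  return {
    reset: function() {
      value = null;
      ready = false;
      waiting = [];
    },
    get: function(done) {
      if (ready) {
        done(value);
      } else {
        waiting.push(done);
        if (waiting.length === 1) {
          getter(function(got) {
            value = got;
            ready = true;
            waiting.forEach(function(w) { w(value); });
            waiting = null;
          });
        }
      }
    }
  };
};

// A fake backend that records calls and answers on demand
var pendingReplies = [];
var backendCalls = 0;
var fakeGetPrices = function(done) {
  backendCalls++;
  pendingReplies.push(done);
};

var results = [];
var pricesCache = makeCache(fakeGetPrices);
pricesCache.get(function(p) {
  results.push(p);
  // A re-entrant get from inside a callback: ready is already
  // true at this point, so it's answered straight from the cache
  pricesCache.get(function(q) { results.push(q); });
});
pricesCache.get(function(p) { results.push(p); });

pendingReplies[0](42); // the server finally answers, once
// backendCalls is 1, and all three callbacks received 42
```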

And finally, a nice piece of trivia for you: even if there were some way for waiting to grow while we were still looping through it, according to the specification of Array.prototype.forEach the new items would not be included in the iteration.
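A quick way to convince yourself of that (a toy demo, not from the spec itself):

```javascript
var items = ['a'];
var visited = [];
items.forEach(function(x) {
  visited.push(x);
  if (items.length < 3) {
    items.push('added-during-loop'); // appended mid-iteration
  }
});
// forEach fixed its range before the first callback, so only 'a'
// was visited, even though items now has more elements
```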