
Virtualized scrolling in knockout.js

December 26, 2012

If you have an observableArray of things and you want to display it, of course the first thing to go for is the foreach binding. But what if the array is sometimes pretty long? It can take a long time for knockout to build the elements for each item. Once it's in the thousands, it can freeze the UI for a while.

The classic-and-classy solution is to only ever create the handful of elements that are currently visible within the scrollable area. For example, here's a plain old foreach binding to an array called all:

<div class="dataDisplay" data-bind="foreach: all">
    <div class="dataField" data-bind="text: name, drag: pick"></div>
    <div class="dataValue"><pre data-bind="text: value"></pre></div>
</div>

The first change I had to make was to put the list into a nested div. Why? Because I want to keep restricting the size of the dataDisplay using absolute positioning, but I'm also going to need to programmatically set the height of the scrollable content (which, as you'd expect, is going to be much bigger than the fixed height of the outer div):

<div class="dataDisplay">
    <div class="dataDisplayContent" data-bind="virtualScroll: { rows: all, rowHeight: 26 }">
        <div class="dataField" data-bind="text: name, drag: pick"></div>
        <div class="dataValue"><pre data-bind="text: value"></pre></div>
    </div>
</div>

You may have noticed another change - the binding is now something called virtualScroll, and it takes an object with two properties: rows is just the same observableArray as before, and rowHeight is the pixel height of each row. This is the fiddly part of virtual scrolling: by far the easiest technique is to hold the height of the rows constant, as we'll see.

So, how does the virtualScroll binding work? Like many effective bindings, it's a bit like a swan: on the surface the motion is graceful, but below the water things don't look so tidy. In fact it's a mess, so I'll tackle it one piece at a time. It starts in the usual way:

ko.bindingHandlers.virtualScroll = {
    init: function(element, valueAccessor, allBindingsAccessor, 
                   viewModel, context) {

The very first thing we have to do is steal the contents of the target element. These will serve as our template for each item in the list, so we clone a copy that we can use later:

        var clone = $(element).clone();
        $(element).empty();

Then we grab our settings. I name the whole object config, and I immediately unwrap the rowHeight so I can conveniently use it in several places:

        var config = ko.utils.unwrapObservable(valueAccessor());
        var rowHeight = ko.utils.unwrapObservable(config.rowHeight);

Now we get the length of the rows array, multiply it by the fixed rowHeight, and set the result as the height of our target element. That typically makes it taller than the container div, which should then get scrollbars (because it should have overflow: auto set in CSS).

But thanks to the magic of ko.computed, knockout notices that we access the rows() observable and so arranges for the function to run again when the observable's value changes. And so when the length of the array changes, so does the height of our scrollable content. (This would be a little more effort if the rows could vary in height).

        ko.computed(function() {
            $(element).css({
                height: config.rows().length * rowHeight
            });
        });

Now, ko.computed is wonderful, but it can only work with existing observables. Sadly, the browser DOM is full of useful information that doesn't tell us when it changes. As a hack-around, I use a little helper function called simulatedObservable, which you can read about here. For now, just accept that offset and windowHeight are observables that track the y-offset of the target element, and the height of the whole window.

        var offset = simulatedObservable(element, function() {
            return $(element).offset().top;
        });

        var windowHeight = simulatedObservable(element, function() {
            return window.innerHeight;
        });

We have all the external info we need. Next we have to maintain our own state, which is a record of all the rows that we have "materialised" into real DOM elements. The others simply don't exist.

        var created = {};

And now we begin a rather large nested function called refresh, responsible for materialising any rows that are currently visible.

        var refresh = function() {
            var o = offset();
            var data = config.rows();
            var top = Math.max(0, Math.floor(-o / rowHeight) - 10);
            var bottom = Math.min(data.length, Math.ceil((-o + windowHeight()) / rowHeight));

The top and bottom variables are the indexes of the first and one-after-last rows that we consider visible (actually we're being quite generous because we use the height of the whole window as the bottom bound). This would be a lot more troublesome if the rows could vary in height.
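To make the arithmetic concrete, here's the same calculation as a standalone function (a hypothetical helper for illustration, not part of the binding itself):

```javascript
// Hypothetical standalone version of the visible-range arithmetic above.
// offsetTop is the element's y-offset (it goes negative once we scroll
// past it), so -offsetTop is how far we have scrolled into the list.
function visibleRange(offsetTop, windowHeight, rowHeight, rowCount) {
    // overscan by 10 rows above, and use the whole window height below
    var top = Math.max(0, Math.floor(-offsetTop / rowHeight) - 10);
    var bottom = Math.min(rowCount,
        Math.ceil((-offsetTop + windowHeight) / rowHeight));
    return { top: top, bottom: bottom };
}
```

So with 26-pixel rows scrolled 1000 pixels up in a 600-pixel window, rows 28 through 61 get materialised; everything else stays virtual.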

So we can loop through just these rows, clone our "template" and ask knockout to bind to it, using the row's item from the array as the data model.

            for (var row = top; row < bottom; row++) {
                if (!created[row]) {
                    var rowDiv = $('<div></div>');
                    rowDiv.css({
                        position: 'absolute',
                        height: rowHeight,
                        left: 0,
                        right: 0,
                        top: row * rowHeight
                    });
                    rowDiv.append(clone.clone().children());
                    ko.applyBindingsToDescendants(
                        context.createChildContext(data[row]), rowDiv[0]);
                    created[row] = rowDiv;
                    $(element).append(rowDiv);
                }
            }

Finally, we can destroy any previously materialised row that is not in the currently visible range:

            Object.keys(created).forEach(function(rowNum) {
                if (rowNum < top || rowNum >= bottom) {
                    created[rowNum].remove();
                    delete created[rowNum];
                }
            });
        };

And that concludes the refresh function. There are two places we call it. First, when the rows() observable changes, we clear out all materialised rows, and then call refresh.

        config.rows.subscribe(function() {
            Object.keys(created).forEach(function(rowNum) {
                created[rowNum].remove();
                delete created[rowNum];
            });
            refresh();
        });

The second place is a little simpler: we just give it to ko.computed so it can do its usual magic and ensure that refresh runs whenever there is a change in any of the values it depends on.

        ko.computed(refresh);

Finally, we tell Knockout not to automatically perform binding on the child elements, as we take care of that ourselves:

        return { controlsDescendantBindings: true };
    }
};

You can see this in action in my Nimbah project (source code here), where I use it to display the input or output data for the selected node, allowing it to scale to thousands of rows without slowing down the UI.

Knockout observables where there were no observables before!

December 26, 2012

A truly great example of a jQuery plugin is Ben Alman’s resize, which allows you to bind to a resize event on any element, not just the window. It does this by checking for changes to the size of the element in a timer loop - a hack, to be sure, but very nicely hidden behind the same programming interface used in the more normal cases.

Inspired by this, we can take the same approach for knockout, and turn any property of the DOM into an observable.

var simulatedObservable = (function() {

    var timer = null;
    var items = [];

    var check = function() {
        items = items.filter(function(item) {
            return !!item.elem.parents('html').length;
        });
        if (items.length === 0) {
            clearInterval(timer);
            timer = null;
            return;
        }
        items.forEach(function(item) {
            item.obs(item.getter());
        });
    };

    return function(elem, getter) {
        var obs = ko.observable(getter());
        items.push({ obs: obs, getter: getter, elem: $(elem) });
        if (timer === null) {
            timer = setInterval(check, 100);
        }
        return obs;
    };
})();

What's going on? The whole thing is in the module pattern, so the actual function we're defining is the one we return at the end. It accepts an element and another function that should get the current value of the property we want to observe. Why do we need the element? It's just so we can detect when we should stop observing the property.

As long as one of these observables is active, we keep running the check function at 100 ms intervals, which means we share one timer across all instances, automatically stopping it when there are no observables still active. And the check function is pretty self-explanatory. It doesn't even really need to check if the value has changed, because ko.observable does that for us when we give it a new value. The only bit of real work it has to do is check whether an element is still in the DOM (which is what that parents('html').length part is about).

So how can we use this? Let me count the ways... Suppose you want an observable that holds the vertical offset of a div?

var offset = simulatedObservable(element, function() {
    return $(element).offset().top;
});

And you're done!


jQuery UI drag and drop bindings for knockout.js

December 26, 2012

If there’s some functionality you need in your knockout app, just add custom bindings. Here’s one way to enable drag and drop by bringing in the jQuery UI library to do it all for us. First, the drag binding:

ko.bindingHandlers.drag = {
    init: function(element, valueAccessor, allBindingsAccessor, 
                   viewModel, context) {
        var value = valueAccessor();
        $(element).draggable({
            containment: 'window',
            helper: function(evt, ui) {
                var h = $(element).clone().css({
                    width: $(element).width(),
                    height: $(element).height()
                });
                h.data('ko.draggable.data', value(context, evt));
                return h;
            },
            appendTo: 'body'
        });
    }
};

The value for the binding is a function that should return some object or value that describes what is being dragged, e.g. here the view model of the draggable element has a save function that returns the persistent data of the object:

<div data-bind="drag: save">

Then to allow it to be dropped somewhere, we need the drop binding:

ko.bindingHandlers.drop = {
    init: function(element, valueAccessor, allBindingsAccessor, 
                   viewModel, context) {
        var value = valueAccessor();
        $(element).droppable({
            tolerance: 'pointer',
            hoverClass: 'dragHover',
            activeClass: 'dragActive',
            drop: function(evt, ui) {
                value(ui.helper.data('ko.draggable.data'), context);
            }
        });
    }
};

Again, the value is a function, which this time receives the dragged data as its first argument, so it's nice and symmetrical:

<div data-bind="drop: dropped">

If the drop regions are contained in scrollable areas, then decent autoscrolling is a must, and it can be implemented from the drag binding using a timer. There's a working implementation in my Nimbah project.
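The timer-based idea can be boiled down to a tiny pure function (a hypothetical sketch, not the Nimbah implementation): on each tick, nudge the container's scrollTop whenever the pointer is within some margin of its top or bottom edge.

```javascript
// Hypothetical per-tick autoscroll step. The container is described by a
// plain object { top, bottom, scrollTop }; returns the new scrollTop.
function autoScrollStep(container, pointerY, margin, speed) {
    if (pointerY < container.top + margin) {
        // pointer near the top edge: scroll up (but not past zero)
        return Math.max(0, container.scrollTop - speed);
    }
    if (pointerY > container.bottom - margin) {
        // pointer near the bottom edge: scroll down
        return container.scrollTop + speed;
    }
    return container.scrollTop; // pointer in the middle: leave it alone
}
```

A setInterval started from the drag binding's start event (and cleared on stop) could call something like this for each scrollable ancestor and assign the result back to its scrollTop.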

Asynchronous computed observables in Knockout.js

December 10, 2012

An observable property is a single value that we can get or set, and we can also subscribe to be notified when its value changes. In JavaScript there’s a very nice implementation of an observable property in the lovely Knockout.js library:

var x = ko.observable('My initial value');

console.log(x()); // use x() to read the value

x.subscribe(function() {
    console.log('the value of x just changed to: ' + x());
});

x('A new value'); // use x(v) to assign a value

Atop this simple idea, lots of otherwise hairy interactions (and we all hate those) become surprisingly straightforward.
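If you're wondering what's inside the box, the whole contract can be sketched in a few lines of plain JavaScript (a toy stand-in for illustration, nothing like Knockout's real implementation):

```javascript
// Toy observable: call with no arguments to read, with one argument to
// write; subscribers are notified whenever the value actually changes.
function toyObservable(initial) {
    var value = initial;
    var subscribers = [];
    var obs = function(newValue) {
        if (arguments.length === 0) return value;   // read
        if (newValue !== value) {                   // write, then notify
            value = newValue;
            subscribers.slice().forEach(function(s) { s(newValue); });
        }
    };
    obs.subscribe = function(fn) { subscribers.push(fn); };
    return obs;
}
```

A single function value that is both getter and setter, plus a list of subscribers: that's the whole trick.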

One tremendous addition is ko.computed. You specify a function that computes a value from other observable properties:

var y = ko.computed(function() {
    return 'How about this? ' + x();
});

Now y is - of course - also an observable, and its value automatically updates whenever anything it depends on (in this case, x) changes. Yes, just like a spreadsheet. While the computation function is running, Knockout is watching to see which other observables are accessed, and automatically subscribes to them on our behalf.

This has a nice implication - we hardly ever need to call subscribe ourselves. We can adjust the first example like this:

ko.computed(function() {
    console.log('the value of x just changed to: ' + x());
});

No need to manually figure out what observables our code depends on! Just wrap it in ko.computed, and it will automatically re-execute whenever there is a change in any of the observables it consumes. (Hint: our computation function returns nothing, which means really it returns undefined, and so the observable returned from ko.computed will always have that value, and so we don't need to store it in a variable - we just want the auto-subscribe behaviour).
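That watching trick is less magical than it sounds. Here's a toy illustration (again, nothing like Knockout's real code): keep a note of which computation is currently running, and have every observable that gets read during that time subscribe it.

```javascript
// Illustrative-only dependency tracking, all names hypothetical.
var currentComputation = null;  // the computation being evaluated right now

function trackedObservable(value) {
    var subscribers = [];
    return function(newValue) {
        if (arguments.length === 0) {
            // a read during a computation registers that computation
            if (currentComputation &&
                subscribers.indexOf(currentComputation) < 0) {
                subscribers.push(currentComputation);
            }
            return value;
        }
        value = newValue;
        subscribers.slice().forEach(function(s) { s(); });
    };
}

function toyComputed(fn) {
    function run() {
        currentComputation = run;
        try { fn(); } finally { currentComputation = null; }
    }
    run(); // the first evaluation records the dependencies
}
```

The first run of the computation does double duty: it produces the value and, as a side effect of every observable read, builds the subscription list.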

Then there's another brilliant idea, described here - actually it's two or three brilliant ideas.

The first brilliant idea is that you can say .extend( ...blah...) after you define a ko.computed, to enhance how it works. And we can create our own extensions this way, by adding them to the ko.extenders object.

The second brilliant idea is that we can keep a bit of data in an observable, and then use a ko.computed to start an AJAX request to retrieve data from the server, based on what is in the observable. Every time the local data changes, another request happens. And of course the resulting information is stuffed into another observable, to which the UI is bound, so the screen updates. It's such a neat way to do it.

The third brilliant idea is to combine the above two ideas, and make an extender that automatically delays the execution of a computation function until a specific time interval (in milliseconds) after the required observables have stopped changing. This will stop too many AJAX requests being fired off while the user is still fiddling with values. When they leave the keyboard alone for half a second, then we actually act on the new information. Or as the example puts it:

this.throttledValue = ko.computed(this.instantaneousValue)
                        .extend({ throttle: 400 });

Now, there is actually a problem with the second idea: if you hit the backend with two AJAX requests in quick succession, the first one might take longer to return than the second and so arrive last, leaving the screen in an inconsistent state: old data from the server, despite a more recent local state.
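To make that failure mode concrete, here's a minimal sketch of the simplest possible guard (plain callbacks standing in for Deferreds, and all the names hypothetical): tag each request with a token, and only publish a response whose token is still the latest.

```javascript
// latestOnly(publish) returns a send function. Each call to send starts a
// "request" by calling start(done), where done(data) delivers the response
// later. Responses from superseded requests are silently dropped.
function latestOnly(publish) {
    var latestToken = 0;
    return function send(start) {
        var token = ++latestToken;
        start(function done(data) {
            if (token === latestToken) publish(data);
        });
    };
}
```

The extender below goes one step further and actively rejects the superseded request, but the principle is the same: only the newest request gets to write to the observable.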

This is addressed by an example on the Knockout wiki, which is well worth reading to the end (note that "dependent observable" is just the longer/older name for ko.computed).

On the downside, that part at the end hasn't yet made it into an extender. But we can fix that!

ko.extenders.async = function(computedDeferred, initialValue) {

    var plainObservable = ko.observable(initialValue), currentDeferred;
    plainObservable.inProgress = ko.observable(false);

    ko.computed(function() {
        if (currentDeferred) {
            currentDeferred.reject();
            currentDeferred = null;
        }

        var newDeferred = computedDeferred();
        if (newDeferred &&
            (typeof newDeferred.done == "function")) {

            // It's a deferred
            plainObservable.inProgress(true);

            // Create our own wrapper so we can reject
            currentDeferred = $.Deferred().done(function(data) {
                plainObservable.inProgress(false);
                plainObservable(data);
            });
            newDeferred.done(currentDeferred.resolve);
        } else {
            // A real value, so just publish it immediately
            plainObservable(newDeferred);
        }
    });

    return plainObservable;
};

How to use it? Here's a real example from my code:

var displayName = ko.computed(function() {

    return internalName() ?
        backend.getDisplayName(internalName()) :
        localStrings.none;

}).extend({ async: localStrings.none }),

Here, internalName is an ordinary observable containing something like a GUID. The backend.getDisplayName function asks the server for a human-readable name, so of course it's asynchronous, and it returns a $.Deferred. But if we don't currently know an internalName, I just return localStrings.none, which is an ordinary string.

Then we add the .extend({ async: localStrings.none }) part, so what displayName actually contains is the plainObservable built by the extender. So now I can include displayName in my view model, and bind it to a <span> in my UI. When internalName changes, the UI updates soon after.

The value passed for the async property is the initial value that the observable will have when there is not yet a response from the server. You can make it anything that your view model will be able to cope with.

And of course, you can chain extenders, so you could put:

.extend({ throttle: 300 }).extend({ async: localStrings.none })

As always, it's when the real life examples get more complex, and you have layers of things that depend on other things, that this becomes a really valuable way to cut through the complexity and get you back to a simple declaration of how all the data is related.

A way that async/await-style functionality might make it into browsers

October 9, 2012

Many moons ago (actually it’s 35.7 moons ago) I wrote an excited post about JavaScript generators in Firefox. Sadly they are still only in Firefox, and there’s no sign of them appearing elsewhere.

But one way they could appear elsewhere is via compilers that generate plain JavaScript, and the latest player is TypeScript. Why is this a good fit? Because, with a very thin layer of helper code, generators replicate the functionality of C# 5's async/await, and if that was a good idea for C# and Silverlight, it's got to be a good idea for TypeScript. (The downside is that the auto-generated JavaScript would not be very readable, but I can still dream that this will happen...)

My old post is somewhat messy because I was unaware of jQuery.Deferred. If we bring that into the mix, to serve as the JS equivalent of Task, then things get really nice. To wit:

async(function() {
  
  // sleeping inside a loop
  for (var n = 0; n < 10; n++) {
    $('.counter').text('Counting: ' + n);
    yield sleep(500);
  }

  // asynchronous download
  $('.downloaded').text(yield $.get('test.txt'));

  // and some more looping, why not
  for (var n = 0; n < 10; n++) {
    $('.counter').text('Counting: ' + (10 - n));
    yield sleep(500);
  }
});

In other words, by passing a function to something called async, I can use the yield keyword in exactly the same way as C# 5's await keyword.

The yielded values are just jQuery.Deferred objects - or rather, they are objects that contain a function called done, to which a resulting value may be passed (at some later time). So the implementation of sleep is straightforward:

var sleep = function(ms) {
  var d = $.Deferred();
  setTimeout(function() { d.resolve(); }, ms);
  return d;
};

By calling resolve, we trigger any registered done functions. So who is registering? This is what async looks like:

var async = function(gen) {
  var result;
  var step = function() {
    var yielded;

    try {
      yielded = gen.send(result); // run to next yield
    } catch (x) {
      if (x instanceof StopIteration) {
        return;
      }
      throw x;
    }

    yielded.done(function(newResult) {
      result = newResult; // what to return from yield
      step();
    });
  };
  gen = gen(); // start the generator
  step();
};

So async calls the function passed to it to get a generator object. It can then repeatedly call send on that object to pass it values (which will be returned from yield inside the generator). It assumes that the objects that come back from send (which were passed to yield inside the generator) will have a done function, allowing us to register to be notified when an asynchronous operation completes.
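Today's standard (ES6) generators expose the same capability through next().value and next().done instead of Firefox's send() and StopIteration. Here's the equivalent driver as a sketch (again ignoring exception propagation, and with hypothetical names):

```javascript
// The same trampoline against standard ES6 generators. Anything yielded
// must expose a done(callback) method, so jQuery Deferreds still qualify.
function asyncEs6(genFn) {
    var gen = genFn();
    function step(result) {
        var r = gen.next(result);    // run to the next yield (or the end)
        if (r.done) return;          // generator finished normally
        r.value.done(function(newResult) {
            step(newResult);         // resume with the async op's result
        });
    }
    step(undefined);
}
```

The shape is identical; only the generator protocol changed, and completion is signalled by a done flag rather than a thrown StopIteration.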

Note that async could use some further elaboration, because it currently doesn't deal with exceptions (the try/catch block above merely deals with the strange way that generators indicate when they've finished). But generators have full support for communicating exceptions back to the yielding code, so it should all be do-able.

And that's all there is to it. You can see the example running here:

http://earwicker.com/yieldasync/

... but only in Firefox, of course.


TypeScript reactions

October 5, 2012

Miscellaneous

Great implementation of modules. I use CommonJS on the browser side (yes, the non-asynchronous module pattern), so the default suits me fine.

Love the => syntax of course, and the fact that it fixes the meaning of this within lambdas. No need to manually copy into a that variable.

Type system

... appears to be better than that of C#/CLR. Structural typing is clearly the right way to do it. It makes much more sense than nominal typing. The dumbest example of that in C# is the strange fact that two identically-signatured delegate types are not assignment compatible. Well, in TS, types are compared by their structures, not their names, so that problem goes away. And the same for classes and interfaces. If a class could implement an interface, then it already does. When they add generics, it's going to be excellent.

I wonder if they've given any thought to the problem of evolution of interfaces, such as default implementations of methods (this could be implemented by having the compiler insert some checking/substitution code at the call site).

Async, await, async, await, async, await...

The really big missing ingredient is continuations. They've just added this to C#. Clearly it is just as important for browser-based apps, if not more important. And consider the fact that server-side JS at the moment is roughly synonymous with node, which is crying out for continuations in the language. From a Twitter conversation with Joe Palmer, it's clear that the TS team currently places a big emphasis on the clean appearance of the JS output. But I think they're going to have to twist the arms of the IE teams and get source map support working, and then they can butcher the JS as much as they like without hurting the debugging experience.

The Dreaded C++ Comparison

These days a C++ comparison from me is normally an insult (despite speaking ASFAC++B), but in this case, definitely not. C++ beat a lot of other languages simply by inheriting C. And JS is the C of the web, so TS has a good chance of being the C++ of the web, in the sense of becoming as popular as its parent language, while probably not actually displacing it everywhere.

And at its best, C++ was a collection of extensions to C, a pretty ugly platform to start building on. To do that tastefully was a challenge, and (despite what people say in jest) C++ succeeded in lots of ways. Playing with TS, I see a similarly tastefully chosen set of extensions.

When you consider C++0x concepts, which were going to be (and may yet one day be) structural type definitions layered over the existing duck-typing of C++ templates, the comparison becomes even stronger. TS's type system has a lot in common with C++0x concepts.

The Competition

A common criticism so far seems to be "Why not use (my favourite language X) as a starting point?" The answer, surely, is that it may be your favourite language but it's nowhere compared to JS, which is now so widely deployed and used that it makes most other languages look like hobby projects! This criticism is correlated strongly with the idea that JS is a toy language, disgusting, revolting, etc., i.e. irrational emotional garbage. JS has a lot of stupid things in it, yes, we all know about {} + {} and [] + [] and {} + [] and so on, but I've now churned out a zillion lines of JS without ever hitting such issues. No language is perfect, but is JS useful? Yes.

In fact, the main competition faced by TS is therefore JS itself. This is why I think TS needs to do some heavy lifting (such as continuation support) beyond a mere layer of sugar and type checking, in order to offer a compelling advantage over JS.

And then there is CoffeeScript. Syntax just isn't that important for most people. Semantics are much more important. It's no good that your code looks really short and pretty if it takes you just as long to figure out what it needs to say. By the addition of static typing, TS holds out the hope of genuinely improving productivity. (With continuations it could make asynchronous event-driven programming a lot easier too.)

Oh and there's Dart. I can't even begin to understand what Google is thinking. It's not compatible with, nor discernibly superior to, anything that already exists. It's just their version of the same old stuff.

A Boring Discovery!

August 28, 2012

I was writing a simple example program to explain C# 5's async/await keywords. To keep it really simple and “old school”, I decided to make it a console app, and to read lines from the console until the user typed quit. First, the synchronous version:

public static List<string> Foo()
{
    var lines = new List<string>();

    string line;
    while ((line = Console.In.ReadLine()) != "quit")
        lines.Add(line);

    return lines;
}

And then the async version:

public static async Task<List<string>> Foo()
{
    var lines = new List<string>();

    string line;
    while ((line = await Console.In.ReadLineAsync()) != "quit")
        lines.Add(line);

    return lines;
}

Really straightforward, right? I just add the async keyword, wrap the return type inside Task<T> and then I can use await Console.In.ReadLineAsync() instead of Console.In.ReadLine().

So I tried this, and gosh-dang-diddly, it didn't behave at all as expected. In fact, both versions behaved the same. Could it be something exciting to do with how the SynchronizationContext is set up in console apps? Sorry, no. Try to think of something much duller than that.

The answer is... wait for it... ReadLineAsync() isn't asynchronous at all. It doesn't return its Task<string> until the whole line has been read.

Why is that? TextReader.ReadLineAsync appears to be properly asynchronous, and Console.In returns a TextReader as you'd expect... but not quite. It first passes it to TextReader.Synchronized, and guess how the resulting wrapper class implements ReadLineAsync? Thanks to .NET Reflector, we don't have to guess:

public override Task<string> ReadLineAsync()
{
    return Task.FromResult<string>(this.ReadLine());
}

Yep, it calls ordinary ReadLine, totally synchronously, and then makes an already-finished Task<string>. In real life this is probably harmless because it's such an edge case, but you can imagine the tedious diversion it caused in the middle of the explanation I was giving.

To get around it, I used this alternative helper, to move the ReadLine to the thread pool:

public static Task<string> ReadConsoleAsync()
{
    return Task.Run(() => Console.ReadLine());
}

Moral: I have no idea. How about "Sometimes, things don't exactly work." Is that a moral?


SEISMIC – Really Simple Mercurial Sharing

March 17, 2012

SEISMIC stands for:

S – Self
E – Explanatory
I – Installer
S – Sharing
M – Mercurial
I – Intercourse
C – Cupcakes

Okay, so the last two words don’t fit with the topic. I had to put them in to pad out the acronym. Here’s how you run it:

wget http://earwicker.com/seismic.sh
sudo bash seismic.sh

Ideally you'll be typing those two lines into a fresh install of Ubuntu 11.10. Even if your install isn't so fresh, the script tries to only set things up if they haven't already been set up.

After it does the first-time steps, it offers you some options:

[0] Quit
[1] Add user
[2] Add repository

Like I say, it's self-explanatory. Maybe there's a quicker way to get started and do the obvious maintenance tasks for sharing mercurial repositories, but I don't know about it yet.

Once you've run it, you can see your new shared repositories here:

http://your-vm-hostname/hg

You'll need to log in using one of the user accounts you've created. You can also clone a repository on a client machine:

hg clone http://your-vm-hostname/hg/your-repository

And you can commit and push changes back to it - again, hg push will require your Mercurial username/password.

Running Windows? Oh dear. Why not set up a VM? (Don't have any VM hosting software? VirtualBox is free).

The config created by this script is very simple (so it has a high probability of working). On the downside, it's not really secure: basic (plain text) authentication is used. But it gives you a working starting point to investigate further, e.g. as the song goes, "If you liked it then you should have put a certificate on it".

Tips for setting up a VM:

- Download the Ubuntu server 64-bit .iso

- Set up your VM so it has a terabyte virtual disk and grows on demand, and a bridged network connection instead of NAT.

- During install you should get to specify a suitable hostname. If not (or you change your mind about it), after your first login:

sudo nano /etc/hostname

Then reboot.

In case the script download site goes wrong, here's what it contains:


# Install apache and mercurial
apt-get install apache2 mercurial

# Create dir /var/hg/repos where all repositories will live
if [ -d /var/hg/repos ]
then
  echo "Already created /var/hg/repos"
else
  mkdir /var/hg
  mkdir /var/hg/repos
  chown -R www-data:www-data /var/hg/repos

  # Allow pushing without SSL
  echo "[web]" >> /etc/mercurial/hgrc
  echo "allow_push = *" >> /etc/mercurial/hgrc
  echo "push_ssl = false" >> /etc/mercurial/hgrc
fi

# Copy the hg .cgi script into place and make it runnable
if [ -a /var/hg/hgweb.cgi ]
then
  echo "Already created /var/hg/hgweb.cgi"
else
  cp /usr/share/doc/mercurial/examples/hgweb.cgi /var/hg/hgweb.cgi
  chmod a+x /var/hg/hgweb.cgi
  sed -i.bak "s|/path/to/repo/or/config|/var/hg/hgweb.config|" /var/hg/hgweb.cgi
fi

if [ -a /var/hg/hgweb.config ]
then
  echo "Already created /var/hg/hgweb.config"
else 
  echo "[paths]
/ = /var/hg/repos/*" > /var/hg/hgweb.config
fi

# Configure Apache
if grep /var/hg/hgweb.cgi /etc/apache2/sites-available/default
then
  echo "Already configured Apache"
else
  sed -i.bak 's|</VirtualHost>|ScriptAlias /hg \"/var/hg/hgweb.cgi\"\
  <Location /hg>\
  AuthType Basic\
  AuthName \"Mercurial repositories\"\
  AuthUserFile /var/hg/hgusers\
  Require valid-user\
  </Location>\
  </VirtualHost>|' /etc/apache2/sites-available/default
  apache2ctl restart
fi

shouldQuit=false

while [ $shouldQuit == false ]
do
  echo ""
  echo "[0] Quit"
  echo "[1] Add user"
  echo "[2] Add repository"

  read menuoption

  case $menuoption in
    0) shouldQuit=true;;

    1) echo -n "Creating new Mercurial user - give them a name:"
       read hgnewusername
       if [ -a /var/hg/hgusers ] 
       then
         htpasswd -m /var/hg/hgusers $hgnewusername
       else
         htpasswd -mc /var/hg/hgusers $hgnewusername
       fi
       ;;

    2) echo ""
       echo "Existing repositories:"
       ls /var/hg/repos
       echo ""
       echo -n "Enter name for new repository:"
       read hgrepname
       echo -n "Enter contact name:"
       read hgrepcont
       echo -n "Enter description:"
       read hgrepdesc
       cd /var/hg/repos
       mkdir $hgrepname
       cd $hgrepname
       hg init
       echo "[web]
contact = $hgrepcont
description = $hgrepdesc" > .hg/hgrc
       cd ..
       chown -R www-data:www-data .
       ;;

  esac
done

Async/await iterator – updated for Visual Studio 11 Preview

January 29, 2012 2 comments

A long overdue install of the Visual Studio 11 Preview, and the changes to the asynchronous language features since 2010 (my, how time flies) are enough to break the code I blogged over a year ago.

The first problem is that a few of the methods of an “awaiter” (what in C++ we’d call the awaiter concept) have been renamed, and there’s now a property called IsCompleted. That's fine and dandy.

But when I tried exercising the code I hit a more thorny problem, which is that my test program would terminate somewhat randomly when an exception was rethrown from a background thread. For a program that I thought was single threaded, that's pretty bad!

I don't have my install of the original CTP, so I'm not sure about this, but I think a fairly major change was made since then: there's now a difference between an async method that returns void and an async method that returns Task (as opposed to Task<T>).

Contrary to what might be assumed, the relationship between Task and Task<T> is not the same as that between IEnumerable and IEnumerable<T>. That is, Task is not some old pre-generics version of the same idea. Instead, it was specially created to represent a task that doesn't return any value at all; that is, something like void, but asynchronous.

I believe (though I'm not certain) that in the original CTP, a void async method would actually return a Task, so as to ensure that its lifetime could be managed externally even though it wouldn't produce a value. But in the latest version that is not the case: the Task associated with a void async method is just not available, and the compiler-generated version of the method really does return void. Which means in turn that you can't use await on such methods.

You can still explicitly declare your async method to return Task, so nothing has been lost. And this certainly makes everything more clear and consistent to callers: methods really do return what they are declared to return, as usual. But it also changes the behaviour of exceptions.

In either case, if an exception tries to escape out of your async method, there is a catch-all handler in the compiler-generated state machine which will catch it, so it can be rethrown in an appropriate context. But the choice of context depends totally on whether the method returns void or Task. The policy is determined by AsyncVoidMethodBuilder or AsyncTaskMethodBuilder respectively. With the help of Resharper, we can see that the latter gives the caught exception to the Task, via task.TrySetException. So then the decision to rethrow (or not) is entirely up to whoever has hold of the Task. They can check the Exception property whenever they like.

But in the void case, it's totally different. The Task never gets passed the exception. What would be the point? We can't get at the Task. The exception is unobservable; to avoid that loss of information, an arrangement is made to rethrow the exception at the next available opportunity, by creating a delegate that will rethrow it and then posting that delegate to the "context".

The "context" is a somewhat vague concept; the architecture uses three different representations, depending on the scenario. But in the case of a simple console-based test program, the exception-rethrowing delegate is simply passed to the thread pool, and so it brings down the whole process at a random time (though reasonably soon). In a GUI program the exception would be thrown on the main GUI thread. You can supply your own context by setting a per-thread instance of SynchronizationContext, in which you can override the Post method. It doesn't let you get at the exception, but it does give you a delegate that, if you executed it, would throw the exception, which you can then catch!

The upshot? An exception that leaves an async void is definitely a sign of a bug somewhere. Although of course this does not automatically mean you should add your own catch-all! Sometimes crashing the process is the least-worst option. There is no single correct way to deal with bugs - it's a question of economics and so is not an exact science.

So in short, async void is a niche thing. In most situations you almost certainly want async Task with no type argument. And my example of implementing the equivalent of yield return definitely needs updating.

First, I stash the Task in a field. Second, after executing the continuation, I check the Task.Exception property to see if anything bad happened that needs rethrowing:

if (_task.Exception != null)
{
    // Unpeel the AggregateException wrapping
    Exception inner = _task.Exception;
    while (inner is AggregateException)
        inner = inner.InnerException;

    throw inner;
}

Aside from that it works much the same way as before, though I've added a lot of comments and organised it a little differently to hopefully make the behaviour clearer. I've also had to add an implementation of the new awaiter property:

public bool IsCompleted
{
    get { return false; }
}

Well, that was easy. Returning true would be a very bad idea in this example, as we can discover with more Resharper digging. The compiler-generated state machine examines that property, and if it is true then it doesn't bother to yield control back to the thread. So we don't get the interleaved execution behaviour that we're relying on.

Here's the whole thing:

public delegate Task IteratorMethod<TItem>(YieldEnumerator<TItem> e);

public class YieldEnumerator<TItem> : IEnumerator<TItem>
{
    // Will be executed to get the next value
    private Action _continuation;

    // Will become the value of Current
    private TItem _nextValue;
    private bool _hasNextValue;

    // To be thrown inside the async method, as if by the await keyword
    private Exception _exception;

    // The task associated with our running async method
    private Task _task;

    public YieldEnumerator(IteratorMethod iteratorMethod)
    {
        _task = iteratorMethod(this);
    }

    private void Execute()
    {
        // If we already have a buffered value that hasn't been
        // retrieved, we shouldn't do anything yet. If we don't
        // and there's no continuation to run, we've finished.
        // And if _task is null, we've been disposed.
        if (_hasNextValue || _continuation == null || _task == null)
            return;

        // Be ultra-careful not to run same _continuation twice
        var t = _continuation;
        _continuation = null;
        t(); // may or may not have stored a new _continuation

        // And may also have hit a snag!
        if (_task.Exception != null)
        {
            // Unpeel the AggregateException wrapping
            Exception inner = _task.Exception;
            while (inner is AggregateException)
                inner = inner.InnerException;

            throw inner;
        }
    }

    public YieldEnumerator<TItem> GetAwaiter()
    {
        return this;
    }

    // Performance optimisation added since original CTP. If we
    // returned true, the compiler-generated code would bypass the
    // OnCompleted/GetResult dance altogether, and the flow of the
    // async method would never be interrupted in the way that we
    // require.
    public bool IsCompleted
    {
        get { return false; }
    }

    // Was called BeginAwait in the original CTP
    public void OnCompleted(Action continuation)
    {
        Debug.Assert(_continuation == null);
        _continuation = continuation;
    }

    // Was called EndAwait
    public void GetResult()
    {
        // This is called by compiler-generated code caused by the
        // await keyword, so it's a chance to throw an exception to
        // be caught by the code in the async method
        if (_exception != null)
        {
            var t = _exception;
            _exception = null;
            throw t;
        }
    }

    // Our equivalent of yield return
    public YieldEnumerator<TItem> YieldReturn(TItem value)
    {
        if (_hasNextValue)
        {
            // Shouldn't happen because MoveNext ought to have
            // been called and we should be inside the async
            // code at this point
            throw new InvalidOperationException();
        }

        _nextValue = value;
        _hasNextValue = true;
        return this;
    }

    public TItem Current { get; private set; }

    object System.Collections.IEnumerator.Current
    {
        get { return Current; }
    }

    public bool MoveNext()
    {
        Execute();

        if (_hasNextValue)
        {
            Current = _nextValue;
            _hasNextValue = false;
            return true;
        }

        return false;
    }

    private sealed class AbandonEnumeratorException : Exception {}

    public void Dispose()
    {
        // If async method is not yet complete, throw an exception
        // inside it to make it grind to a halt
        if (_continuation != null)
        {
            _exception = new AbandonEnumeratorException();
            try { Execute(); } catch (AbandonEnumeratorException) { }
        }

        _task.Dispose();
        _task = null;
    }

    public void Reset()
    {
        throw new NotImplementedException("Reset");
    }
}

// The usual obvious IEnumerable to go with our IEnumerator
public class YieldEnumerable<TItem> : IEnumerable<TItem>
{
    private readonly IteratorMethod<TItem> _iteratorMethod;

    public YieldEnumerable(IteratorMethod<TItem> iteratorMethod)
    {
        _iteratorMethod = iteratorMethod;
    }

    public IEnumerator<TItem> GetEnumerator()
    {
        return new YieldEnumerator<TItem>(_iteratorMethod);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

class Program
{
    public static async Task MyIteratorMethod1(YieldEnumerator<int> e)
    {
        Console.WriteLine("A");
        await e.YieldReturn(1);
        Console.WriteLine("B");
        await e.YieldReturn(2);
        Console.WriteLine("C");
        await e.YieldReturn(3);
        Console.WriteLine("D");
    }

    public static async Task MyIteratorMethod2(YieldEnumerator<int> e)
    {
        try
        {
            Console.WriteLine("A");
            await e.YieldReturn(1);
            Console.WriteLine("B");
            await e.YieldReturn(2);
            Console.WriteLine("C");
            await e.YieldReturn(3);
            Console.WriteLine("D");
        }
        finally
        {
            Console.WriteLine("Running finally");
        }
    }

    public static async Task MyIteratorMethodInfinite(YieldEnumerator<int> e)
    {
        for (var n = 0; ; n++)
            await e.YieldReturn(n);
    }

    public static async Task MyIteratorBroken1(YieldEnumerator<int> e)
    {
        // always happens, but compiler doesn't know that
        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");

        await e.YieldReturn(1);
    }

    public static async Task MyIteratorBroken2(YieldEnumerator<int> e)
    {
        await e.YieldReturn(1);

        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");
    }

    public static async Task MyIteratorBroken3(YieldEnumerator<int> e)
    {
        await e.YieldReturn(1);

        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");

        await e.YieldReturn(2);
    }

    static void Main(string[] args)
    {
        foreach (var i in new YieldEnumerable<int>(MyIteratorMethod1))
            Console.WriteLine("Yielded: " + i);

        foreach (var i in new YieldEnumerable<int>(MyIteratorMethod2))
        {
            Console.WriteLine("Yielded: " + i);
            break; // finally should still run
        }

        foreach (var i in new YieldEnumerable<int>(MyIteratorMethodInfinite))
        {
            if (i % 1000000 == 0) // every million times...
                Console.WriteLine("Yielded: " + i);

            if (i > 10000000)
                break;
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken1))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken2))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken3))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }
    }
}

Asynchronous Memoization in JavaScript

January 16, 2012 Leave a comment

In pure functional programming there is a simple rule: if you evaluate (call) a function more than once with the exact same argument values, you’ll keep getting the same return value. It follows that there is no need to call it more than once, which means you can put a caching mechanism in front of the function that keeps a map (hash table, Dictionary, etc.) of all the return values produced so far, each one keyed by the bundle of arguments that produced that return value.

Of course it’s only worth doing this if the dictionary lookup is faster than simply re-executing the function itself, and if the same small set of arguments is highly likely to be passed repeatedly.
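To make the classic version concrete, here's a minimal sketch of a general memoizer (my own illustration, not anything from a library). It keys the map by JSON-stringifying the arguments, which assumes they are simple serialisable values:

```javascript
// A minimal memoizer for pure functions. The cache key is built by
// JSON-stringifying the argument list, so this only works for
// arguments that serialise cleanly (numbers, strings, plain objects).
var memoize = function(fn) {
  var cache = {};
  return function() {
    var key = JSON.stringify(Array.prototype.slice.call(arguments));
    if (!(key in cache)) {
      cache[key] = fn.apply(this, arguments);
    }
    return cache[key];
  };
};

// The wrapped function only runs once per distinct argument list
var calls = 0;
var slowSquare = memoize(function(x) {
  calls++;
  return x * x;
});

slowSquare(4); // computes
slowSquare(4); // served from the cache; calls is still 1
```

Note the trade-off mentioned above baked right in: every call pays for a JSON.stringify, so this only wins if the wrapped function is substantially more expensive than that.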

In a non-pure language like JavaScript many (most?) functions are not pure: they examine information from other sources besides their parameters. However, they often have contexts within which they are “pure enough”. For example, the information the user can see on the screen is, naively speaking, a projection of the information stored in the database; but if that were really true, then whenever the database changed, the screen would immediately change as well. But it doesn’t; instead it usually remains stale until the user presses Refresh. This corresponds to “emptying the cache”.

In a complex app, there may be several separate components that project the same information in different ways. If they all go back to the external source for that information, and it is changing in real time, you could end up with an inconsistent picture on the screen. This might even cause instability, if one component tries to talk to another and they assume they’ll both have identical snapshots of the external information.

So memoization actually has a purpose in JavaScript: it can simulate a “freeze frame” of your dependencies on external data. But we need to provide the ability to delete things from the cache at the time of our choosing, “unfreezing” the frame so we can take a new snapshot.

Another complicating factor in JavaScript is asynchrony. JavaScript programmers just have to get used to doing the following transformation into “continuation-passing style” by hand; starting with:

var add = function(a1, a2) {
  return a1 + a2;
};

We switch to:

var add = function(a1, a2, done) {
  done(a1 + a2);
};

So the caller of add can no longer say:

var sum = add(2, 2);
alert('The answer is ' + sum);

And must instead say:

add(2, 2, function(sum) {
  alert('The answer is ' + sum);
});

This allows the implementation of add to utilise other functions that need to be passed a continuation in the same manner.
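That composability is worth seeing in action. Here's a made-up example (addThree is my name, not anything standard) of one CPS function built on top of another, simply threading its own continuation through as the final callback:

```javascript
var add = function(a1, a2, done) {
  done(a1 + a2);
};

// A CPS function implemented in terms of another CPS function:
// the outer continuation is passed along as the last callback.
var addThree = function(a, b, c, done) {
  add(a, b, function(sum) {
    add(sum, c, done);
  });
};

addThree(1, 2, 3, function(total) {
  // total is 6
});
```

If add were genuinely asynchronous (a server round-trip, say), addThree would automatically be asynchronous too, without any change to its callers.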

So, let's memoize. To simplify matters we'll start by assuming we're dealing with functions that take no parameters (or are always called with exactly the same parameters, which amounts to the same thing). It means we can replace the map with a single variable. We're displaying "prices" (whatever the hell they are) to the user, so if synchronous were a realistic option we'd start with:

var getPrices = function() { 
    /// Talk to server to get prices, p, somehow.
    return p; 
};

Sadly we have to be asynchronous, but it's no biggie:

var getPrices = function(done) {
    /// Talk to server to get prices, p, somehow.
    done(p); 
};

Seems like something that will be easy to memoize!

var makeCache = function(getter) {

  var value = null, ready = false;

  return {

    reset: function() {
      value = null;
      ready = false;
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        getter(function(got) {
          value = got;
          ready = true;
          done(value);
        });
      }
    }
  };

};

You'd use it to "wrap" the getPrices function like this:

var pricesCache = makeCache(getPrices);

pricesCache.get(function(prices) {
  // we've got the prices!  
});

And when you want to reset the cache, just say:

pricesCache.reset();

But actually there's a bug here: can you spot it?

What if there's more than one call to pricesCache.get before the first one comes back with the data? We only set the ready flag when we've got the answer, which might take a second. In the meantime, various parts of the UI might be grabbing the prices to make their own updates. Each such call will launch a separate (unnecessary) call to the backend. What's worse is that the prices may actually change during this mess, so the callers will end up with inconsistent price information - exactly the problem I was complaining about earlier.
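To make the race concrete, here's the first version of makeCache driven by a getter that replies asynchronously (a setTimeout standing in for the server round-trip, in this sketch). Two parts of the UI ask before the first reply arrives, and the backend gets hit twice:

```javascript
// The first (buggy) makeCache from above
var makeCache = function(getter) {
  var value = null, ready = false;
  return {
    reset: function() {
      value = null;
      ready = false;
    },
    get: function(done) {
      if (ready) {
        done(value);
      } else {
        getter(function(got) {
          value = got;
          ready = true;
          done(value);
        });
      }
    }
  };
};

// A getter that answers later, like a real server would
var backendCalls = 0;
var slowGetter = function(done) {
  backendCalls++; // count round-trips to the "server"
  setTimeout(function() { done(42); }, 10);
};

var cache = makeCache(slowGetter);
cache.get(function(prices) { /* first part of the UI */ });
cache.get(function(prices) { /* second part, asking before the first reply */ });
// backendCalls is now 2: the same data was fetched twice
```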

First reaction: oh, I know, it's a state machine! We thought there were two states, as indicated by the boolean ready flag. But actually there are three:

  1. No value.
  2. Okay, I'm getting the value, sheesh.
  3. Got the value.

But hold on a moment. Think this one through for a second. It's pretty clear that when the first caller tries to get, we need to transition to the middle state and make our call to the real getter function. And when the prices come back to us, we transition to the final state and call the callback function. But what about when a second caller tries to get while we're already in the middle state? That's the whole reason for doing this: to be able to handle that case differently. Where do we put their callback function?

So, yes, it is a state machine, but not a three-state one. We need to keep a list of callback functions, so that when the prices come back, we can loop through the list and call every single one of them:

var makeCache = function(getter) {

  var value, ready, waiting = [];

  return {

    reset: function() {
      value = null;
      ready = false;
      waiting = [];
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        waiting.push(done);

        if (waiting.length === 1) {
          getter(function(got) {

            value = got;
            ready = true;

            waiting.forEach(function(w) { w(value); });
            waiting = null;
          });
        }
      }
    }
  };

};

Notice how I use waiting.forEach to loop through the callbacks. By definition here I'm calling some code that I don't have control of. It might call back into pricesCache.get. That may seem intrinsically problematic, because it sounds like it could keep happening forever and cause a stack overflow. But it might be perfectly valid: there could be some separate code making the second call to get the prices, which supplies a different callback. Anyway, is it a problem for my cache implementation? No, because any calls to pricesCache.get during my callback loop will find that ready is already set, and so will not affect the waiting array. And even if pricesCache.reset were called, that would cause a fresh array to be created and stored in waiting.

And finally, a nice piece of trivia: even if there were some way for waiting to grow while we are still looping through it, according to the specification of Array.forEach, the new item(s) won't be included in the iteration.
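A quick way to convince yourself of that forEach behaviour (a standalone sketch, nothing to do with the cache itself):

```javascript
var visited = [];
var arr = [1, 2, 3];

arr.forEach(function(x) {
  if (x === 1) {
    arr.push(99); // appended mid-iteration
  }
  visited.push(x);
});

// visited is [1, 2, 3]: forEach fixed its range before the first
// callback, so the appended 99 was never visited
```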
