Archive for December, 2012

A pure library approach to async/await in standard JavaScript

December 28, 2012

I’m very keen on JavaScript gaining actual language support for the one thing it is mostly used for: asynchronous programming, so I never shut up about it.

The C# approach was trailed very nicely here back in Oct 2010. Over the years I’ve occasionally searched for usable projects that might implement this kind of thing in JavaScript, but they all seem to be abandoned, so I stopped looking.

And then I had an idea for how to do it purely with an ordinary JS library – a crazy idea, to be sure, but an idea nonetheless. So I searched to see if anyone else had come up with it (let me rephrase that: like all simple ideas in software, it’s a certainty that other people came up with it decades ago, probably in LISP, but I searched anyway).

I haven’t found anything yet, but I did find Bruno Jouhier’s streamline.js, which looks like a very nice (and non-abandoned!) implementation of the precompiler approach.

So what was my crazy idea? Well, as Streamline’s creator said in a reply to a comment on his blog:

But, no matter how clever it is, a pure JS library will never be able to solve the “topological” issue that I tried to describe in my last post… You need extra power to solve this problem: either a fiber or coroutine library with a yield call, a CPS transform like streamline, or direct support from the language (a yield operator).

Well, that sounds like a challenge!

If we really really want to, we can in fact solve this problem with a pure library running in ordinary JavaScript, with no generators or fibers and no precompiler. Whether this approach is practical is another matter, partly because it chews the CPU like a hungry wolf, but also because the main selling point for this kind of thing is that it makes life easier for beginners, and unfortunately to use my approach you have to understand the concept of a pure function and pay close attention to when you’re stepping outside of purity.

To set the stage, Bruno Jouhier’s example is a node.js routine that recurses through file system directories. Here’s a simplified version using the sync APIs:

var fs = require('fs');
var path = require('path');

var recurseDir = function(dir) {
    fs.readdirSync(dir).forEach(function(child) {
        if (child[0] != '.') {
            var childPath = path.join(dir, child);
            if (fs.statSync(childPath).isDirectory()) {
                recurseDir(childPath);
            } else {
                console.log(childPath);
            }
        }
    });
};

recurseDir(process.argv[2]);

And – ta da! – here’s a version that uses the async APIs but appears not to:

var fs = require('fs');
var path = require('path');

var Q = require('q');
var interrupt = require('./interrupt.js');

var readdir = interrupt.bind(Q.nfbind(fs.readdir));
var stat = interrupt.bind(Q.nfbind(fs.stat));
var consoleLog = interrupt.bind(console.log);

interrupt.async(function() {

    var recurseDir = function(dir) {
        readdir(dir).forEach(function(child) {
            if (child[0] != '.') {
                var childPath = path.join(dir, child);
                if (stat(childPath).isDirectory()) {
                    recurseDir(childPath);
                } else {
                    consoleLog(childPath);
                }
            }
        });
    };

    recurseDir(process.argv[2]);
});

The core of the program, the recurseDir function, looks practically identical. The only difference is that it calls specially wrapped versions of readdir, stat and console.log, e.g.

var readdir = interrupt.bind(Q.nfbind(fs.readdir));

The inner wrapper Q.nfbind is from the lovely q module that provides us with promises with (almost) the same pattern as jQuery.Deferred. Q.nfbind wraps a node API so that instead of accepting a function(error, result) callback it returns a promise, which can reduce yuckiness by up to 68%.
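
To see roughly what Q.nfbind is doing, here’s a minimal hand-rolled sketch (using the modern built-in Promise for brevity; Q’s real implementation is more thorough and predates it):

```javascript
// Minimal sketch of a nodeback-to-promise wrapper in the style of Q.nfbind.
// Illustrative only: the real Q.nfbind handles progress notifications,
// multi-value results and more.
function nfbind(nodeFunc) {
    return function () {
        var args = Array.prototype.slice.call(arguments);
        var self = this;
        return new Promise(function (resolve, reject) {
            // Append the (error, result) callback that node APIs expect
            nodeFunc.apply(self, args.concat(function (err, result) {
                if (err) reject(err);
                else resolve(result);
            }));
        });
    };
}

// A fake node-style API, just for demonstration:
function fakeReadFile(path, callback) {
    callback(null, 'contents of ' + path);
}

var readFileAsync = nfbind(fakeReadFile);
readFileAsync('a.txt').then(function (text) {
    console.log(text); // "contents of a.txt"
});
```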

But interrupt.bind is my own fiendish contribution:

exports.bind = function(impl) {
    return function() {
        var that = this;
        var args = arguments;
        return exports.await(function() {
            return impl.apply(that, args);
        });
    };
};

So it wraps a promise-returning function inside that interrupt.await thingy. To understand what that is for, we have to go back to the start of the example, where we say:

interrupt.async(function() {

The function we pass there (let’s call it our “async block”) will be executed multiple times – this is a basic fact about all interruptible coroutines. They can be paused and restarted. But standard JavaScript doesn’t provide a way to stop a function and then start up again from the same point. You can only start again from the beginning.

In order for that to work, each time the function re-runs, repeating everything it has already done, it has to behave exactly the same as it did on the previous run(s). Which is where functional purity comes in. A pure function is one that returns the same value whenever it is given the same arguments. So Math.random is not pure. Nor is reading the file system (because it might change under your feet). But quite a lot of code is pure: anything that depends only on our parameters, or on local variables holding whatever we have figured out so far from those parameters.
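
A quick illustration of the distinction (my own toy example, not from the interrupt library):

```javascript
// Pure: depends only on its arguments, so re-running it is harmless.
function area(width, height) {
    return width * height;
}

// Impure: reads and changes outside state, so re-running it gives
// different answers each time.
var counter = 0;
function nextId() {
    return ++counter;
}

console.log(area(3, 4)); // 12, every time
console.log(nextId());   // 1
console.log(nextId());   // 2 - not the same!
```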

So, inside interrupt.async we can do anything pure without headaches. But whenever we want to know about the outside world, we have to be careful. The way we do that is with interrupt.await, e.g.

var stuff = interrupt.await(function() {
    return $.get('blah-de-blah');
});

The first time the async block runs, when it reaches interrupt.await, it executes the function we pass to it (the “initializer”), which in this case starts a download and returns a promise that will be resolved when the download is ready. But then interrupt.await throws an exception, which cancels execution of the async block. When the promise is resolved, the async block is executed again, and this time interrupt.await ignores the function passed to it and instead returns the result of the download from the promise created on the first run, which I call an externality (because it’s data that came from outside).
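
The whole trick can be boiled down to a toy version (my own sketch, much cruder than the real interrupt.js: no promises, no error handling, and the “async” work here completes synchronously just to keep the demo short):

```javascript
function Interrupt() {}

function run(block) {
    var ready = [];   // externalities captured so far, in order
    function attempt() {
        var slot = 0; // index of the next await in this run
        function await_(init) {
            if (slot < ready.length) {
                return ready[slot++];      // replay a captured value
            }
            init(function (val) {          // start the external work...
                ready.push(val);           // ...capture its result...
                attempt();                 // ...and re-run from the top
            });
            throw new Interrupt();         // abandon this run
        }
        try {
            block(await_);
        } catch (x) {
            if (!(x instanceof Interrupt)) throw x;
        }
    }
    attempt();
}

// The block runs three times: interrupted at the first await, then at the
// second, then straight through with both values replayed from the cache.
var runs = 0;
run(function (await_) {
    runs++;
    var a = await_(function (done) { done(10); });
    var b = await_(function (done) { done(a + 5); });
    console.log('a + b =', a + b, 'after', runs, 'runs'); // a + b = 25 after 3 runs
});
```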

The internal representation is actually quite simple. Here’s interrupt.async:

function Interrupt() {}

var currentContext = null;

exports.async = function(impl) {
    log('Creating async context');
    var thisContext = {
        ready: [],
        waiting: null,
        slot: 0,
        result: defer(),
        attempt: function() {
            log('Swapping in context for execution attempt');
            var oldContext = currentContext;
            currentContext = thisContext;
            currentContext.slot = 0;
            try {
                thisContext.result.resolve(impl());
                log('Completed successfully');
            } catch (x) {
                if (x instanceof Interrupt) {
                    log('Execution was interrupted');
                    return;
                } else {
                    log('Exception occurred: ' + JSON.stringify(x));
                    throw x;
                }
            } finally {
                log('Restoring previous context');
                currentContext = oldContext;
            }
        }
    };
    log('Making first attempt at execution');
    thisContext.attempt();
    return getPromise(thisContext.result);
};

The important part is the context, which has an array, ready, of previously captured externalities, and an integer, slot, which is the index in the ready array where the next externality will be recorded.

The more fiddly work is done in interrupt.await:

exports.await = function(init) {
    if (!currentContext) {
        throw new Error('Used interrupt.await outside of interrupt.async');
    }
    var ctx = currentContext;
    if (ctx.ready.length > ctx.slot) {
        log('Already obtained value for slot ' + ctx.slot);
        var val = ctx.ready[ctx.slot];
        if (val && val.__exception) {
            log('Throwing exception for slot ' + ctx.slot);
            throw val.__exception;
        }
        log('Returning value ' + JSON.stringify(val) + ' for slot ' + ctx.slot);
        ctx.slot++;
        return val;
    }
    if (ctx.waiting) {
        log('Still waiting for value for ' + ctx.slot + ', will interrupt');
        throw new Interrupt();
    }
    log('Executing initializer for slot ' + ctx.slot);
    var promise = init();
    if (promise && promise.then) {
        log('Obtained a promise for slot ' + ctx.slot);
        var handler = function(val) {
            if ((ctx.slot != ctx.ready.length) ||
                (ctx.waiting != promise)) {
                throw new Error('Inconsistent state in interrupt context');
            }
            log('Obtained a value ' + JSON.stringify(val) + ' for slot ' + ctx.slot);
            ctx.ready.push(val);
            ctx.waiting = null;
            log('Requesting retry of execution');
            ctx.attempt();
        };
        promise.then(handler, function(reason) {
            log('Obtained an error ' + JSON.stringify(reason) + ' for slot ' + ctx.slot);
            handler({ __exception: reason });
        });
        ctx.waiting = promise;
        throw new Interrupt();
    }
    if (ctx.slot != ctx.ready.length) {
        throw new Error('Inconsistent state in interrupt context');
    }
    // 'promise' is not a promise!
    log('Obtained a plain value ' + JSON.stringify(promise) + ' for slot ' + ctx.slot);
    ctx.ready.push(promise);
    ctx.slot++;
    return promise;
};

It can deal with an initializer that returns a plain value, and avoids the overhead of interrupting and restarting in that case, but still enforces the same behaviour of capturing the externality so it can be returned on any subsequent repeat run instead of running the initializer again.

In fact we have an example of this in the original example: console.log is not a pure function. It has side-effects: conceptually, it returns a new state-of-the-universe every time we call it. So it has to be wrapped in interrupt.await, just like any other impure operation, and we faithfully record that it returned undefined so we can return that next time we execute the same step. In this case we’re not really recording a particular external value, but we are recording the fact that we’ve already caused a particular external side-effect, so we don’t cause it multiple times.

As long as the await-wrapping rule is followed, it all works perfectly. The problem, of course, is that if there are a lot of asynchronous promises and side-effecting calls involved, then it will start to slow down as it repeatedly stops and re-executes everything it has done so far. Although in fact, it doesn’t repeat everything. A lot of the hard work involves interacting with the OS, and that is (of necessity) wrapped in interrupt.await, and so only happens once. On subsequent executions the value cached in the ready array is reused, and that lookup is by position, so it’s quite fast. Each re-execution only involves “going through the motions” to get back to where it left off.

Even so, this extra grinding of the CPU does start to slow it down very noticeably after a healthy number of interrupts (modern JavaScript is fast, but not a miracle worker). Recursing through the file system is a very good stress test, because each interrupt forces a replay of everything done so far, and an interrupt occurs for every single file (due to the stat call) and twice for every directory.
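
To put a rough number on that, assume one interrupt per externality: a block containing n awaits gets executed n + 1 times, and across all those runs the cached slots are replayed n(n + 1) / 2 times in total, so the overhead grows quadratically. A sketch of the arithmetic:

```javascript
// Counts the total number of cache replays across all re-executions,
// assuming one interrupt per await (my simplified cost model, not a
// measurement of the real library).
function totalReplays(awaits) {
    var replays = 0;
    // Run k (for k = 0..awaits) replays the k values captured so far.
    for (var run = 0; run <= awaits; run++) {
        replays += run;
    }
    return replays;
}

console.log(totalReplays(10));   // 55
console.log(totalReplays(1000)); // 500500 - why deep recursion gets slow
```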

One way to “cheat” would be to replace the Array.prototype.forEach function with something that understood how to interact with the current async context, and could skip forward to the right place in the iteration… but I’ll save that for another rainy day.


Use of Knockout in Nimbah

December 27, 2012

As well as Nimbah being a tool that does something I occasionally need, it’s also a chance to experiment. (Also by writing this up, I’m probably going to spot some things I can simplify.)

While I’ve been using Knockout regularly in my work for a few months, it’s been as a tool that I apply here and there where appropriate. But in Nimbah I decided to go for broke and base the entire thing on Knockout. So in essence it’s a single view model, the whole document is bound once to the view model, and when you edit the pipeline you are just editing the view model.

A simplified schematic version of the view model is:

var viewModel = {
    inputText: ko.observable(localStorage.getItem('savedInputText') || ''),
    selected: ko.observable(),
    pipeline: ko.observable(null),
    layout: ko.observable('vertical')
};

ko.computed(function() {
    var root = viewModel.pipeline();
    if (root) {
        root.inputValue(viewModel.inputText());
    }
});

viewModel.outputText = ko.computed(function() {
    var root = viewModel.pipeline();
    return root ? viewModel.stringify(root.outputValue()) : '';
});

In the UI the inputText and outputText properties are each bound to a textarea (the output one being readonly).

But internally, thanks to the ko.computed parts, they are really little more than aliases for two properties on the root pipeline: inputValue and outputValue. The underlying model of Nimbah is based on nodes that all follow this same pattern. A pipeline is a node and operators are nodes, and some operators contain their own pipelines as child nodes, and so on.

The building block is the node function, which builds a raw node – again in simplified schematic form:

var node = function() {
    var model = {};
    model.readOnlyChildren = ko.observableArray();

    model.insertAfter = function(newChild, existingChild) ...
    model.insertBefore = function(newChild, existingChild) ...
    model.remove = function() ...
    model.parent = ko.computed({
        read: function() ...
        write: function(val) ...
    });
    model.firstChild = ...
    model.lastChild = ...
    model.nextSibling = ...
    model.previousSibling = ...

    return model;
};

Each node has (possibly null) references to its parent, firstChild, lastChild, nextSibling and previousSibling, all of which are shielded by ko.computed with read/write operations so that they remain consistent. For example, if you assign node A to be the nextSibling of node B, that’s equivalent to B.parent().insertAfter(A, B), which ensures that B also becomes the previousSibling of A and that B’s parent is now also A’s parent.

There’s an observable array of the node’s children, called readOnlyChildren to try to emphasise that it isn’t meant to be directly modified. I have to expose it because it’s what the UI binds to, but its contents are automatically maintained (via individual insertions and deletions) by the above “proper” node properties.
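
To make the bookkeeping concrete, here’s a stripped-down, ko-free sketch of the linked-node structure (my own guess at its shape; the real Nimbah code wraps each property in ko.computed and also maintains the readOnlyChildren array):

```javascript
function makeNode() {
    var node = {
        parent: null,
        firstChild: null,
        lastChild: null,
        nextSibling: null,
        previousSibling: null
    };
    // Insert newChild after existingChild (or at the front if existingChild
    // is null), fixing up all the references so they stay consistent.
    node.insertAfter = function (newChild, existingChild) {
        var next = existingChild ? existingChild.nextSibling : node.firstChild;
        newChild.parent = node;
        newChild.previousSibling = existingChild;
        newChild.nextSibling = next;
        if (existingChild) existingChild.nextSibling = newChild;
        else node.firstChild = newChild;
        if (next) next.previousSibling = newChild;
        else node.lastChild = newChild;
    };
    return node;
}

var pipeline = makeNode();
var opA = makeNode();
var opB = makeNode();
pipeline.insertAfter(opA, null); // opA becomes the first child
pipeline.insertAfter(opB, opA);  // opB slots in after opA
```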

Why do it this way? Because of the way nodes obtain their input values. If an operator has a previousSibling, it uses that sibling’s outputValue as its own inputValue. If it has a null previousSibling, it’s the first operator in its pipeline, so it should be using the pipeline’s inputValue. And guess what? The pipeline is its parent, so it has no problem getting the inputValue from it. Hence for any operator node, inputValue looks like this:

model.inputValue = ko.computed(function() {
    return model.previousSibling() ? model.previousSibling().outputValue() :
           model.parent() ? model.parent().inputValue() : null;
});

In a pipeline it’s a lot simpler:

model.inputValue = ko.observable(null);

This is because pipelines can act as the root of everything, or be used in whatever way an operator wants to use them. So it’s up to the owner of the pipeline to feed it with the right inputValue.

Of course, the outputValue of a node has to be entirely its own responsibility – the whole point of a node is how it turns a given input into the right output. For a pipeline, it’s just the last child node’s outputValue (or for an empty pipeline it’s just the inputValue):

model.outputValue = ko.computed(function() {
    return model.lastChild() ? model.lastChild().outputValue() : 
           model.inputValue();
});

Each kind of operator has to implement outputValue differently. Here’s split:

model.outputValue = ko.computed(function() {
    var input = model.inputValue();
    if (input && typeof input.split == 'function') {
        return input.split(model.separator());
    }
    return input;
});

So the upshot of all this is that whenever you edit the text in the input textarea, it ripples through all the operators in the pipeline completely automatically, through the miracle of ko.computed. If a node is unplugged from some location and then plugged in somewhere else, everything updates accordingly. It’s just beautiful!

And that’s before we even get on to the joy of how undo/redo just bolts onto the side without any real effort…


A few extra operators in Nimbah

December 27, 2012

I added skip and take operators that work exactly like their namesakes in Linq. Also there’s a new substring operator that has a nice instant feedback UI:

[screenshot: the substring operator’s instant-feedback UI]


Nimbah in one minute

December 27, 2012

Over the “holidays” I’ve been tinkering with a simple online tool that gives me a fast and easy way to reformat text (prompted by the fact that the macro recorder has been removed from Visual Studio 2012).

Here’s a very quick intro to it (will be clearer if you open it in YouTube fullscreen at 1080p).

You can play with it yourself here: http://www.earwicker.com/nimbah


Virtualized scrolling in knockout.js

December 26, 2012

If you have an observableArray of things and you want to display it, of course the first thing to go for is the foreach binding. But what if the array is sometimes pretty long? It can take a long time for knockout to build the elements for each item. Once it’s in the thousands, it can freeze the UI for a while.

The classic-and-classy solution is to only ever create the handful of elements that are currently visible within the scrollable area. For example, here’s a plain old foreach binding to an array called all:

<div class="dataDisplay" data-bind="foreach: all">
    <div class="dataField" data-bind="text: name, drag: pick"></div>
    <div class="dataValue"><pre data-bind="text: value"></pre></div>
</div>

The first change I had to make was to put the list into a nested div. Why? Because I want to keep restricting the size of the dataDisplay using absolute positioning, but I’m also going to need to programmatically set the height of the scrollable content (which, as you’d expect, is going to be much bigger than the fixed height of the outer div):

<div class="dataDisplay">
    <div class="dataDisplayContent" data-bind="virtualScroll: { rows: all, rowHeight: 26 }">
        <div class="dataField" data-bind="text: name, drag: pick"></div>
        <div class="dataValue"><pre data-bind="text: value"></pre></div>
    </div>
</div>

You may have noticed another change – the binding is now something called virtualScroll, and it takes an object with two properties: rows is just the same observableArray as before, and rowHeight is the pixel height of each row. This is the fiddly part of virtual scrolling: by far the easiest technique is to hold the height of the rows constant, as we’ll see.

So, how does the virtualScroll binding work? Like many effective bindings, it’s like a swan; on the surface the motion is graceful, but below the water things don’t look so tidy. In fact it’s a mess so I’ll tackle it one piece at a time. It starts in the usual way:

ko.bindingHandlers.virtualScroll = {
    init: function(element, valueAccessor, allBindingsAccessor, 
                   viewModel, context) {

The very first thing we have to do is steal the contents of the target element. This will serve as our template for each item in the list, so we clone our own copy to use later:

        var clone = $(element).clone();
        $(element).empty();

Then we grab our settings. I name the whole object config, and I immediately unwrap the rowHeight so I can conveniently use it in several places:

        var config = ko.utils.unwrapObservable(valueAccessor());
        var rowHeight = ko.utils.unwrapObservable(config.rowHeight);

Now we get the length of the rows array, multiply by the fixed rowHeight, and set that as the height of our target element. This typically makes it bigger than the container div, which should then get scrollbars (because it should have overflow: auto set in its CSS).

But thanks to the magic of ko.computed, knockout notices that we access the rows() observable and so arranges for the function to run again when the observable’s value changes. And so when the length of the array changes, so does the height of our scrollable content. (This would be a little more effort if the rows could vary in height).

        ko.computed(function() {
            $(element).css({
                height: config.rows().length * rowHeight
            });
        });

Now, ko.computed is wonderful, but it can only work with existing observables. Sadly, the browser DOM is full of useful information that doesn’t tell us when it changes. As a hack-around, I use a little helper function called simulatedObservable, which you can read about here. For now, just accept that offset and windowHeight are observables that track the y-offset of the target element, and the height of the whole window.

        var offset = simulatedObservable(element, function() {
            return $(element).offset().top;
        });

        var windowHeight = simulatedObservable(element, function() {
            return window.innerHeight;
        });

We have all the external info we need. Next we have to maintain our own state, which is a record of all the rows that we have “materialised” into real DOM elements. The others simply don’t exist.

        var created = {};

And now we begin a rather large nested function called refresh, responsible for materialising any rows that are currently visible.

        var refresh = function() {
            var o = offset();
            var data = config.rows();
            var top = Math.max(0, Math.floor(-o / rowHeight) - 10);
            var bottom = Math.min(data.length, Math.ceil((-o + windowHeight()) / rowHeight));

The top and bottom variables are the indexes of the first and one-after-last rows that we consider visible (actually we’re being quite generous because we use the height of the whole window as the bottom bound). This would be a lot more troublesome if the rows could vary in height.
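
As a sanity check of that arithmetic (with made-up numbers: a 26px row height, the content scrolled so its top is 260px above the viewport, and a 520px-high window over 1000 rows):

```javascript
var rowHeight = 26;
var o = -260;            // element offset: scrolled 260px above the viewport
var windowHeight = 520;
var rowCount = 1000;

// First visible row, minus a 10-row cushion above, clamped to 0:
var top = Math.max(0, Math.floor(-o / rowHeight) - 10);
// One past the last visible row, clamped to the array length:
var bottom = Math.min(rowCount, Math.ceil((-o + windowHeight) / rowHeight));

console.log(top, bottom); // 0 30 - rows 0..29 get materialised
```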

So we can loop through just these rows, clone our “template” and ask knockout to bind to it, using the row’s item from the array as the data model.

            for (var row = top; row < bottom; row++) {
                if (!created[row]) {
                    var rowDiv = $('<div></div>');
                    rowDiv.css({
                        position: 'absolute',
                        height: rowHeight,
                        left: 0,
                        right: 0,
                        top: row * rowHeight
                    });
                    rowDiv.append(clone.clone().children());
                    ko.applyBindingsToDescendants(
                        context.createChildContext(data[row]), rowDiv[0]);
                    created[row] = rowDiv;
                    $(element).append(rowDiv);
                }
            }

Finally, we can destroy any previously materialised row that is not in the currently visible range:

            Object.keys(created).forEach(function(rowNum) {
                if (rowNum < top || rowNum >= bottom) {
                    created[rowNum].remove();
                    delete created[rowNum];
                }
            });
        };

And that concludes the refresh function. There are two places we call it. First, when the rows() observable changes, we clear out all materialised rows, and then call refresh.

        config.rows.subscribe(function() {
            Object.keys(created).forEach(function(rowNum) {
                created[rowNum].remove();
                delete created[rowNum];
            });
            refresh();
        });

The second place is a little simpler: we just give it to ko.computed so it can do its usual magic and ensure that refresh runs whenever there is a change in any of the values it depends on.

        ko.computed(refresh);

Finally, we tell Knockout not to automatically perform binding on the child elements, as we take care of that ourselves:

        return { controlsDescendantBindings: true };
    }
};

You can see this in action in my Nimbah project (source code here), where I use it to display the input or output data for the selected node, allowing it to scale to thousands of rows without slowing down the UI.

Knockout observables where there were no observables before!

December 26, 2012

A truly great example of a jQuery plugin is Ben Alman’s resize, which allows you to bind to a resize event on any element, not just the window. It does this by checking for changes to the size of the element in a timer loop – a hack, to be sure, but very nicely hidden behind the same programming interface used in the more normal cases.

Inspired by this, we can take the same approach for knockout, and turn any property of the DOM into an observable.

var simulatedObservable = (function() {

    var timer = null;
    var items = [];

    var check = function() {
        items = items.filter(function(item) {
            return !!item.elem.parents('html').length;
        });
        if (items.length === 0) {
            clearInterval(timer);
            timer = null;
            return;
        }
        items.forEach(function(item) {
            item.obs(item.getter());
        });
    };

    return function(elem, getter) {
        var obs = ko.observable(getter());
        items.push({ obs: obs, getter: getter, elem: $(elem) });
        if (timer === null) {
            timer = setInterval(check, 100);
        }
        return obs;
    };
})();

What’s going on? The whole thing is in the module pattern, so the actual function we’re defining is the one we return at the end. It accepts an element and another function that should get the current value of the property we want to observe. Why do we need the element? It’s just so we can detect when we should stop observing the property.

As long as one of these observables is active, we keep running the check function at 100 ms intervals, which means we share one timer across all instances, automatically stopping it when there are no observables still active. And the check function is pretty self-explanatory. It doesn’t even really need to check if the value has changed, because ko.observable does that for us when we give it a new value. The only bit of real work it has to do is check whether an element is still in the DOM (which is what that parents('html').length part is about).
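
The shape of the idea, minus Knockout (a hand-rolled sketch with an explicit check method so the timer is out of the picture; the real version pushes the sampled value into a ko.observable, which does the change detection for us):

```javascript
// Wraps a getter in a tiny observable that is re-sampled by calling check()
// (in the real binding, check() is driven by the shared 100ms timer).
function polledValue(getter) {
    var subscribers = [];
    var obs = {
        value: getter(),
        subscribe: function (fn) { subscribers.push(fn); },
        check: function () {
            var next = getter();
            if (next !== obs.value) {   // only notify on a real change
                obs.value = next;
                subscribers.forEach(function (fn) { fn(next); });
            }
        }
    };
    return obs;
}

// Example: observe a plain variable.
var height = 100;
var obsHeight = polledValue(function () { return height; });
obsHeight.subscribe(function (h) { console.log('height is now ' + h); });

height = 150;
obsHeight.check(); // prints "height is now 150"
obsHeight.check(); // no change, prints nothing
```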

So how can we use this? Let me count the ways… Suppose you want an observable that holds the vertical offset of a div?

var offset = simulatedObservable(element, function() {
    return $(element).offset().top;
});

And you’re done!


jQuery UI drag and drop bindings for knockout.js

December 26, 2012

If there’s some functionality you need in your knockout app, just add custom bindings. Here’s one way to enable drag and drop by bringing in the jQuery UI library to do it all for us. First, the drag binding:

ko.bindingHandlers.drag = {
    init: function(element, valueAccessor, allBindingsAccessor, 
                   viewModel, context) {
        var value = valueAccessor();
        $(element).draggable({
            containment: 'window',
            helper: function(evt, ui) {
                var h = $(element).clone().css({
                    width: $(element).width(),
                    height: $(element).height()
                });
                h.data('ko.draggable.data', value(context, evt));
                return h;
            },
            appendTo: 'body'
        });
    }
};

The value for the binding is a function that should return some object or value that describes what is being dragged, e.g. here the view model of the draggable element has a save function that returns the persistent data of the object:

<div data-bind="drag: save">

Then to allow it to be dropped somewhere, we need the drop binding:

ko.bindingHandlers.drop = {
    init: function(element, valueAccessor, allBindingsAccessor, 
                   viewModel, context) {
        var value = valueAccessor();
        $(element).droppable({
            tolerance: 'pointer',
            hoverClass: 'dragHover',
            activeClass: 'dragActive',
            drop: function(evt, ui) {
                value(ui.helper.data('ko.draggable.data'), context);
            }
        });
    }
};

Again, the value is a function, which this time receives the dragged data as its first argument, so it’s nice and symmetrical:

<div data-bind="drop: dropped">

If the drop regions are contained in scrollable areas, then decent autoscrolling is a must, and it can be implemented from the drag binding using a timer. There’s a working implementation in my Nimbah project.