Archive

Archive for the ‘Uncategorized’ Category

How to write a “class” in JavaScript

February 16, 2013

Of all the problems with JavaScript, by far the worst is the way people try to [ab]use it based on their training in Java and Object Orientation. They want to write a class, and then they want to write another class that inherits from the first class, because That How You Write A Software’sTM.

And this is made worse by a few features in the language that encourage people to make these mistakes: new, prototype, this.

The first hurdle to get over is inheritance. Consult any current text on OO and it will tell you that interface inheritance is fine, but implementation inheritance is a heap of trouble, so try to avoid it. In JavaScript, interface inheritance is completely unnecessary because there is no static typing at all. So that leaves implementation inheritance, which is what people usually mean by “inheritance” when they’ve been to Hollywood Upstairs Java College, and this is the part that even Java experts try to warn you away from.

So to get this absolutely clear: when you try to do inheritance in JavaScript, you’re trying to replicate something from Java that experts tell you not to do in Java. If you already know that, and you’re still insistent on trying, I can’t help you. You’re obviously insane.

For everyone else, this is the simplest way to write something that serves the approximate purpose of a “class” in JavaScript. It’s a factory function: when you call it, it manufactures an object:

var person = function(firstName, lastName) {
    return {
        getFullName: function() {
            return firstName + ' ' + lastName;
        },
        setFirstName: function(newName) {
            firstName = newName;
        },
        setLastName: function(newName) {
            lastName = newName;
        }
    };
};

// Usage:
var henryJones = person('Henry', 'Jones');

console.log(henryJones.getFullName());

henryJones.setFirstName('Indiana');

Note how there was no need to copy the constructor parameters into some separate fields. They already behave like private fields (actually more private than in the JVM or CLR, because in those runtimes you can use reflection to get to private fields).
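
To see that privacy in action, try this in the console (continuing with the henryJones instance from above):

console.log(henryJones.firstName);      // undefined - firstName is not a property of the object
console.log(henryJones.getFullName());  // 'Indiana Jones' - only the closures can see the name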

There is a difference between the above pattern and using prototype on a constructor function that has to be called with new (I mean aside from the convenience of not having to use new and this everywhere). The difference is that for each instance of a person, we create four objects: the one we return and the three functions stored in it.

If instead we did it the messy way:

function Person(firstName, lastName) {
    this.firstName = firstName;
    this.lastName = lastName;
}

Person.prototype.getFullName = function() {
    return this.firstName + ' ' + this.lastName;
};

Person.prototype.setFirstName = function(newName) {
    return this.firstName = newName;
};

Person.prototype.setLastName = function(newName) {
    return this.lastName = newName;
};

// Usage:
var henryJones = new Person('Henry', 'Jones');

Now the three function objects are shared between all instances, so only one object gets created per person instance. And to the typical, irrational programmer who places premature optimisation above all other concerns, this is vastly preferable. Never mind that the fields firstName and lastName are now public! That is a minor concern compared to the terrible thought of creating three extra function objects per instance.
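
To see what "public" means in practice, nothing stops calling code from doing this to the prototype-based version:

henryJones.firstName = 42;              // no encapsulation, so anything goes
delete henryJones.lastName;             // we can even remove the field entirely
console.log(henryJones.getFullName());  // '42 undefined'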

But hang on one second: have you considered the following factors?

  • How many person instances are going to exist at any time while your code is running? If it’s small, then the cost per instance is irrelevant.
  • What other things are you going to do, per instance? If you're going to build a load of DOM nodes with animations on them and fire off AJAX calls every second (come on, admit it, that's exactly what you're going to do), then the cost of a few function objects per instance is lost in the noise.
  • Have you realised just how amazingly good modern JS runtimes are? They love allocating and throwing away lots of little objects.

If you do ever run into a situation where the difference is significant, you can apply the necessary optimisation in one place: inside your person factory function, without changing any other code:

var person = (function() {
    var vtable = {
        getFullName: function() {
            return this._firstName + ' ' + this._lastName;
        },
        setFirstName: function(newName) {    
            this._firstName = newName;
        },
        setLastName: function(newName) {
            this._lastName = newName;
        }
    };
    return function(firstName, lastName) {
        var p = Object.create(vtable);
        p._firstName = firstName;
        p._lastName = lastName;
        return p;
    };
})();
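
And the calling code doesn't change at all:

var henryJones = person('Henry', 'Jones');
henryJones.setFirstName('Indiana');
console.log(henryJones.getFullName()); // 'Indiana Jones'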

But in the vast majority of situations, there will never be any need to do that (i.e. when you do it, the difference will be practically unmeasurable), so why bother until you have a realistic model of your application with which to carry out your performance testing? (And you’ve also made sure that you have nothing better to do with your time…)

To find out precisely how much difference this makes, and thus when it might be worth worrying about it, all we have to do is try allocating some objects in the Chrome console. Let’s pick a ridiculously big number: a million.

var start = new Date().getTime();
var p = [];
for (var n = 0; n < 1000000; n++) {
    p.push(person('Henry', 'Jones' + n));
}
console.log(new Date().getTime() - start);

We can then run this test three times each with the plain and optimised implementations and find out what the overhead actually amounts to.

The first point of interest is that the allocation of a million objects takes about two and a half seconds in a freshly-opened Chrome instance on my run-of-the-mill Dell notebook, with either implementation. Taking the average of three runs with each, the “fast” version took 2390 ms and the “slow” took 2570 ms. That’s 180 ms difference over a million allocations! We’re talking about mere nanoseconds per extra function object being allocated. Just forget about speed being an issue. It’s not.

What about memory? The relevant Chrome process showed an increase of 81 MB for the “fast” version, versus 204 MB for the “slow”, so a total overhead of 123 MB caused by those extra function objects. Again, that’s shared across a million objects, so the overhead per object is just 130 bytes or so. Is that a big deal?

If you’re writing a game and trying to model specks of dust floating around the screen, it might be realistic to allocate a million of something. But in real business apps, we don’t create a million person objects all at once. We create objects based on downloaded data that we retrieve in small pages from the server (otherwise if a thousand users try to open the app at the same time, the server will have to transmit gigabytes of stuff that none of those users have enough lifetime left to ever read).

So a hundred objects is more realistic. Then the total overhead in this example would be 13 KB – the space consumed by a small image, the kind used in most real web apps without a moment’s thought.

Another comparison is to look at the overhead of DOM nodes. The objects we create in JavaScript are typically there to create and manage our user interface elements in the browser DOM. How much does a single empty DIV cost?

var p = []; 
for (var n = 0; n < 1000000; n++) {
    p.push(document.createElement('div')); 
}

That added 128 MB to a fresh Chrome process. What a coincidence! 134 bytes per empty DIV, a fraction more overhead than our three function objects.

So the conclusion must be: relax, stop worrying about illusory “performance” concerns, and focus instead on writing code that is readable, maintainable, hard to get wrong, hard to break accidentally when modifying it, and so on.


Arguing with a Decade-Old Eric Lippert Post

January 7, 2013

But it’s on the first page of Google results for c++ smart pointer addref release so I have to do this, dammit!

The post in question is: Smart Pointers are Too Smart.

Eric asked:

Does this code look correct?

map[srpFoo.Disown()] = srpBar.Disown();

It sure looks correct, doesn’t it?

Nope, it looks like a disaster waiting to happen. The map ought to have keys and values that are smart pointers, so the correct code would be:

map[srpFoo] = srpBar;

If the map holds raw pointers, that in itself is the error, because it will necessitate some complicated dance to transfer ownership in and out. That means bandying around raw pointers casually, and in C++ that leads to disaster, because it can only be done safely in code paths that can be proven never to throw exceptions.

In fact it’s even more general and simple than that. If you write some code in your C++ program that sets up any state at all, and the previous state will not be restored automatically during stack unwinding, and you haven’t taken the time to prove that no exceptions will be thrown while that state is in effect, you’re doing it wrong.

Far from showing what goes wrong when you use smart pointers, Eric showed what goes wrong when you stop using them, even for a nanosecond. If you never use them, the wrongitude will be uninterrupted.

It’s quite possible that forcing yourself to put all the AddRef and Release calls in the right places by hand will have a sort of positive side-effect: you will be so distracted by this effort that you will not have sufficient time or patience to be productive, and so this will force you to write only very simple programs, which is definitely a good thing, but perhaps arrived at for the wrong reasons.

And conversely, when a problem becomes rarer, it seems more egregious. “I remember the good old days, in the Blitz, when we were being bombed every day, but at least we were used to it!”

The real problem underlying all of COM is probably that IUnknown doesn’t provide a way to attach a listener for when the object is destroyed. If it had this, we could create a safe “weak” pointer, one that doesn’t keep the object in existence but is automatically set to nullptr when the object expires. This is addressed in the more recent WinRT variety of COM, where all sensible objects should implement IWeakReferenceSource, and in fact classes created with C++/CX do this automatically.

And ultimately this is perhaps the simplest response to Eric’s blog post: with C++/CX, smart pointers are baked into the language. Reference count cycles will still occur, no question, which is why the whole exercise is a silly step backwards, but only relative to something much better: pick just about any managed GC-enabled runtime.


A pure library approach to async/await in standard JavaScript

December 28, 2012

I’m very keen on JavaScript gaining actual language support for the one thing it is mostly used for: asynchronous programming, so I never shut up about it.

The C# approach was trailed very nicely here back in Oct 2010. Over the years I’ve occasionally searched for usable projects that might implement this kind of thing in JavaScript, but they all seem to be abandoned, so I stopped looking.

And then I had an idea for how to do it purely with an ordinary JS library – a crazy idea, to be sure, but an idea nonetheless. So I searched to see if anyone else had come up with it (let me rephrase that: like all simple ideas in software, it’s a certainty that other people came up with it decades ago, probably in LISP, but I searched anyway).

I haven’t found anything yet, but I did find Bruno Jouhier’s streamline.js, which looks like a very nice (and non-abandoned!) implementation of the precompiler approach.

So what was my crazy idea? Well, as Streamline’s creator said in a reply to a comment on his blog:

But, no matter how clever it is, a pure JS library will never be able to solve the “topological” issue that I tried to describe in my last post… You need extra power to solve this problem: either a fiber or coroutine library with a yield call, a CPS transform like streamline, or direct support from the language (a yield operator).

Well, that sounds like a challenge!

If we really really want to, we can in fact solve this problem with a pure library running in ordinary JavaScript, with no generators or fibers and no precompiler. Whether this approach is practical is another matter, partly because it chews the CPU like a hungry wolf, but also because the main selling point for this kind of thing is that it makes life easier for beginners, and unfortunately to use my approach you have to understand the concept of a pure function and pay close attention to when you’re stepping outside of purity.

To set the stage, Bruno Jouhier’s example is a node.js routine that recurses through file system directories. Here’s a simplified version using the sync APIs:

var fs = require('fs');
var path = require('path');

var recurseDir = function(dir) {
    fs.readdirSync(dir).forEach(function(child) {
        if (child[0] != '.') {
            var childPath = path.join(dir, child);
            if (fs.statSync(childPath).isDirectory()) {
                recurseDir(childPath);
            } else {
                console.log(childPath);
            }
        }
    });
};

recurseDir(process.argv[2]);

And – ta da! – here’s a version that uses the async APIs but appears not to:

var fs = require('fs');
var path = require('path');

var Q = require('q');
var interrupt = require('./interrupt.js');

var readdir = interrupt.bind(Q.nfbind(fs.readdir));
var stat = interrupt.bind(Q.nfbind(fs.stat));
var consoleLog = interrupt.bind(console.log);

interrupt.async(function() {

    var recurseDir = function(dir) {
        readdir(dir).forEach(function(child) {
            if (child[0] != '.') {
                var childPath = path.join(dir, child);
                if (stat(childPath).isDirectory()) {
                    recurseDir(childPath);
                } else {
                    consoleLog(childPath);
                }
            }
        });
    };

    recurseDir(process.argv[2]);
});

The core of the program, the recurseDir function, looks practically identical. The only difference is that it calls specially wrapped versions of readdir, stat and console.log, e.g.

var readdir = interrupt.bind(Q.nfbind(fs.readdir));

The inner wrapper Q.nfbind is from the lovely q module that provides us with promises with (almost) the same pattern as jQuery.Deferred. Q.nfbind wraps a node API so that instead of accepting a function(error, result) callback it returns a promise, which can reduce yuckiness by up to 68%.
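
To make that concrete, here's a rough before-and-after sketch of what Q.nfbind does for us (the dir variable here is just for illustration):

var fs = require('fs');
var Q = require('q');
var dir = '.';

// Plain node style: results arrive via a function(error, result) callback
fs.readdir(dir, function(error, names) {
    if (error) { /* handle the error */ }
    else { /* use names */ }
});

// Wrapped with Q.nfbind: the same API now returns a promise
var readdir = Q.nfbind(fs.readdir);
readdir(dir).then(function(names) {
    /* use names */
}, function(error) {
    /* handle the error */
});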

But interrupt.bind is my own fiendish contribution:

exports.bind = function(impl) {
    return function() {
        var that = this;
        var args = arguments;
        return exports.await(function() {
            return impl.apply(that, args);
        });
    };
};

So it wraps a promise-returning function inside that interrupt.await thingy. To understand what that is for, we have to go back to the start of the example, where we say:

interrupt.async(function() {

The function we pass there (let’s call it our “async block”) will be executed multiple times – this is a basic fact about all interruptible coroutines. They can be paused and restarted. But standard JavaScript doesn’t provide a way to stop a function and then start up again from the same point. You can only start again from the beginning.

In order for that to work, when the function reruns all its activity a second, third, fourth… (and so on) time, repeating everything it has already done, it has to behave exactly the same as it did the previous time(s). Which is where functional purity comes in. A pure function is one that returns the same value when provided with the same arguments. So Math.random is not pure. Nor is reading the file system (because it might change under your feet). But quite a lot of things are pure: anything that only depends on our parameters, or on the local variables containing whatever we figured out so far from our parameters.

So, inside interrupt.async we can do anything pure without headaches. But whenever we want to know about the outside world, we have to be careful. The way we do that is with interrupt.await, e.g.

var stuff = interrupt.await(function() {
    return $.get('blah-de-blah');
});

The first time the async block runs, when it goes into interrupt.await, it executes the function we pass to it (the “initializer”), which in this case starts a download and returns a promise that will be resolved when the download is ready. But then interrupt.await throws an exception, which cancels execution of the async block. When the promise is resolved, the async block is executed again, and this time interrupt.await totally ignores the function passed to it, but instead returns the result of the download from the promise created on the first run, which I call an externality (because it’s data that came from outside).
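
To make the purity rule concrete, here's a hypothetical sketch (not part of the example above) showing the kind of thing that would break, and the fix:

interrupt.async(function() {
    // BAD: Math.random is impure, so each re-execution of this block would
    // see a different value and stop behaving like the previous run
    var bad = Math.random();

    // OK: wrapped in interrupt.await, the value is captured as an externality
    // on the first run and replayed unchanged on every subsequent run
    var good = interrupt.await(function() {
        return Math.random();
    });
});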

The internal representation is actually quite simple. Here’s interrupt.async:

// Note: log, defer and getPromise are small helpers defined elsewhere in the
// module (not shown here) - presumably a logger and a deferred/promise pair.
function Interrupt() {}

var currentContext = null;

exports.async = function(impl) {
    log('Creating async context');
    var thisContext = {
        ready: [],
        waiting: null,
        slot: 0,
        result: defer(),
        attempt: function() {
            log('Swapping in context for execution attempt');
            var oldContext = currentContext;
            currentContext = thisContext;
            currentContext.slot = 0;
            try {
                thisContext.result.resolve(impl());
                log('Completed successfully');
            } catch (x) {
                if (x instanceof Interrupt) {
                    log('Execution was interrupted');
                    return;
                } else {
                    log('Exception occurred: ' + JSON.stringify(x));
                    throw x;
                }
            } finally {
                log('Restoring previous context');
                currentContext = oldContext;
            }
        }
    };
    log('Making first attempt at execution');
    thisContext.attempt();
    return getPromise(thisContext.result);
};

The important part is the context, which has an array, ready, of previously captured externalities, and an integer, slot, which is the index in the ready array where the next externality will be recorded.

The more fiddly work is done in interrupt.await:

exports.await = function(init) {
    if (!currentContext) {
        throw new Error('Used interrupt.await outside of interrupt.async');
    }
    var ctx = currentContext;
    if (ctx.ready.length > ctx.slot) {
        log('Already obtained value for slot ' + ctx.slot);
        var val = ctx.ready[ctx.slot];
        if (val && val.__exception) {
            log('Throwing exception for slot ' + ctx.slot);
            throw val.__exception;
        }
        log('Returning value ' + JSON.stringify(val) + ' for slot ' + ctx.slot);
        ctx.slot++;
        return val;
    }
    if (ctx.waiting) {
        log('Still waiting for value for ' + ctx.slot + ', will interrupt');
        throw new Interrupt();
    }
    log('Executing initializer for slot ' + ctx.slot);
    var promise = init();
    if (promise && promise.then) {
        log('Obtained a promise for slot ' + ctx.slot);
        var handler = function(val) {
            if ((ctx.slot != ctx.ready.length) ||
                (ctx.waiting != promise)) {
                throw new Error('Inconsistent state in interrupt context');
            }
            log('Obtained a value ' + JSON.stringify(val) + ' for slot ' + ctx.slot);
            ctx.ready.push(val);
            ctx.waiting = null;
            log('Requesting retry of execution');
            ctx.attempt();
        };
        promise.then(handler, function(reason) {
            log('Obtained an error ' + JSON.stringify(reason) + ' for slot ' + ctx.slot);
            handler({ __exception: reason });
        });
        ctx.waiting = promise;
        throw new Interrupt();
    }
    if (ctx.slot != ctx.ready.length) {
        throw new Error('Inconsistent state in interrupt context');
    }
    // 'promise' is not a promise!
    log('Obtained a plain value ' + JSON.stringify(promise) + ' for slot ' + ctx.slot);
    ctx.ready.push(promise);
    ctx.slot++;
    return promise;
};

It can deal with an initializer that returns a plain value, and avoids the overhead of interrupting and restarting in that case, but still enforces the same behaviour of capturing the externality so it can be returned on any subsequent repeat run instead of running the initializer again.

In fact we have an example of this in the original example: console.log is not a pure function. It has side-effects: conceptually, it returns a new state-of-the-universe every time we call it. So it has to be wrapped in interrupt.await, just like any other impure operation, and we faithfully record that it returned undefined so we can return that next time we execute the same step. In this case we’re not really recording a particular external value, but we are recording the fact that we’ve already caused a particular external side-effect, so we don’t cause it multiple times.
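
As another hypothetical illustration of why the wrapping matters, consider what an unwrapped console.log would do inside an async block:

interrupt.async(function() {
    console.log('starting');             // unwrapped: printed again on every re-execution
    var names = readdir('.');            // interrupts here, so the block re-runs
    consoleLog('found ' + names.length); // wrapped: recorded once, never repeated
});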

As long as the await-wrapping rule is followed, it all works perfectly. The problem, of course, is that if there are a lot of asynchronous promises and side-effecting calls involved, it will start to slow down as it repeatedly stops and re-executes everything it has done so far. Although in fact, it doesn’t repeat everything. A lot of the hard work involves interacting with the OS, and that is (of necessity) wrapped in interrupt.await, and so only happens once. On subsequent executions the value cached in the ready array is reused; it’s looked up by position, which is quite fast. So each re-execution only involves “going through the motions” to get back to where it left off.

Even so, this extra grinding of the CPU does start to slow it down very noticeably after a healthy number of interrupts (modern JavaScript is fast, but not a miracle worker). Recursing through the file system is a very good test, because each re-execution effectively has to revisit all the places it has already visited, and a re-execution happens for every single file (due to the stat call) and twice for directories.

One way to “cheat” would be to replace the Array.prototype.forEach function with something that understood how to interact with the current async context, and could skip forward to the right place in the iteration… but I’ll save that for another rainy day.


Use of Knockout in Nimbah

December 27, 2012

As well as Nimbah being a tool that does something I occasionally need, it’s also a chance to experiment. (Also by writing this up, I’m probably going to spot some things I can simplify.)

While I’ve been using Knockout regularly in my work for a few months, it’s been as a tool that I apply here and there where appropriate. But in Nimbah I decided to go for broke and base the entire thing on Knockout. So in essence it’s a single view model, the whole document is bound once to the view model, and when you edit the pipeline you are just editing the view model.

A simplified schematic version of the view model is:

var viewModel = {
    inputText: ko.observable(localStorage.getItem('savedInputText') || ''),
    selected: ko.observable(),
    pipeline: ko.observable(null),
    layout: ko.observable('vertical')
};

ko.computed(function() {
    var root = viewModel.pipeline();
    if (root) {
        root.inputValue(viewModel.inputText());
    }
});

viewModel.outputText = ko.computed(function() {
    var root = viewModel.pipeline();
    return root ? viewModel.stringify(root.outputValue()) : '';
});

In the UI the inputText and outputText properties are each bound to a textarea (the output one being readonly).

But internally, thanks to the ko.computed parts, they are really little more than aliases for two properties on the root pipeline: inputValue and outputValue. The underlying model of Nimbah is based on nodes that all follow this same pattern. A pipeline is a node and operators are nodes, and some operators contain their own pipelines as child nodes, and so on.

The building block is the node function, which builds a raw node – again in simplified schematic form:

var node = function() {
    var model = {};
    model.readOnlyChildren = ko.observableArray();

    model.insertAfter = function(newChild, existingChild) ...
    model.insertBefore = function(newChild, existingChild) ...
    model.remove = function() ...
    model.parent = ko.computed({
        read: function() ...
        write: function(val) ...
    });
    model.firstChild = ...
    model.lastChild = ...
    model.nextSibling = ...
    model.previousSibling = ...

    return model;
};

Each node has (possibly null) references to its parent, firstChild, lastChild, nextSibling and previousSibling, all of which are shielded by ko.computed with read/write operations so that they remain consistent. For example, if you assign node A to be the nextSibling of node B, that’s equivalent to B.parent().insertAfter(A, B), which ensures that B also becomes the previousSibling of A and that B’s parent is now also A’s parent.
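
In other words (a rough usage sketch, with a and b standing for any two node instances):

// assigning a as the next sibling of b...
b.nextSibling(a);

// ...is equivalent to asking b's parent to insert a after b:
b.parent().insertAfter(a, b);

// and afterwards the related references all agree:
a.previousSibling() === b;   // true
a.parent() === b.parent();   // true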

There’s an observable array of the node’s children, called readOnlyChildren to try to emphasise that it isn’t meant to be directly modified. I have to expose it because it’s what the UI binds to, but its contents are automatically maintained (via individual insertions and deletions) by the above “proper” node properties.

Why do it this way? Because of the way nodes obtain their input values. If an operator has a previousSibling, it uses that sibling’s outputValue as its own inputValue. If it has a null previousSibling, it’s the first operator in its pipeline, so it should be using the pipeline’s inputValue. And guess what? The pipeline is its parent, so it has no problem getting the inputValue from it. Hence for any operator node, inputValue looks like this:

model.inputValue = ko.computed(function() {
    return model.previousSibling() ? model.previousSibling().outputValue() :
           model.parent() ? model.parent().inputValue() : null;
});

In a pipeline it’s a lot simpler:

model.inputValue = ko.observable(null);

This is because pipelines can act as the root of everything, or be used in whatever way an operator wants to use them. So it’s up to the owner of the pipeline to feed it with the right inputValue.

Of course, the outputValue of a node has to be entirely its own responsibility – the whole point of a node is how it turns a given input into the right output. For a pipeline, it’s just the last child node’s outputValue (or for an empty pipeline it’s just the inputValue):

model.outputValue = ko.computed(function() {
    return model.lastChild() ? model.lastChild().outputValue() : 
           model.inputValue();
});

Each kind of operator has to implement outputValue differently. Here’s split:

model.outputValue = ko.computed(function() {
    var input = model.inputValue();
    if (input && typeof input.split == 'function') {
        return input.split(model.separator());
    }
    return input;
});
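
For comparison, a hypothetical join operator could follow exactly the same shape (this is just a sketch of the pattern, assuming a separator observable like split has, not necessarily how Nimbah implements it):

model.outputValue = ko.computed(function() {
    var input = model.inputValue();
    if (input && typeof input.join == 'function') {
        return input.join(model.separator());
    }
    return input;
});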

So the upshot of all this is that whenever you edit the text in the input textarea, it ripples through all the operators in the pipeline completely automatically, through the miracle of ko.computed. If a node is unplugged from some location and then plugged in somewhere else, everything updates accordingly. It’s just beautiful!

And that’s before we even get on to the joy of how undo/redo just bolts onto the side without any real effort…


A few extra operators in Nimbah

December 27, 2012

I added skip and take operators that work exactly like their namesakes in Linq. Also there’s a new substring operator that has a nice instant feedback UI:

[Screenshot: the substring operator's instant-feedback UI]


Nimbah in one minute

December 27, 2012

Over the “holidays” I’ve been tinkering with a simple online tool that gives me a fast and easy way to reformat text (prompted by the fact that the macro recorder has been removed from Visual Studio 2012).

Here’s a very quick intro to it (will be clearer if you open it in YouTube fullscreen at 1080p).

You can play with it yourself here: http://www.earwicker.com/nimbah


Virtualized scrolling in knockout.js

December 26, 2012

If you have an observableArray of things and you want to display it, of course the first thing to go for is the foreach binding. But what if the array is sometimes pretty long? It can take a long time for knockout to build the elements for each item. Once it’s in the thousands, it can freeze the UI for a while.

The classic-and-classy solution is to only ever create the handful of elements that are currently visible within the scrollable area. For example, here’s a plain old foreach binding to an array called all:

<div class="dataDisplay" data-bind="foreach: all">
    <div class="dataField" data-bind="text: name, drag: pick"></div>
    <div class="dataValue"><pre data-bind="text: value"></pre></div>
</div>

The first change I had to make was to put the list into a nested div. Why? Because I want to keep restricting the size of the dataDisplay using absolute positioning, but I’m also going to need to programmatically set the height of the scrollable content (which, as you’d expect, is going to be much bigger than the fixed height of the outer div):

<div class="dataDisplay">
    <div class="dataDisplayContent" data-bind="virtualScroll: { rows: all, rowHeight: 26 }">
        <div class="dataField" data-bind="text: name, drag: pick"></div>
        <div class="dataValue"><pre data-bind="text: value"></pre></div>
    </div>
</div>

You may have noticed another change – the binding is now something called virtualScroll, and it takes an object with two properties: rows is just the same observableArray as before, and rowHeight is the pixel height of each row. This is the fiddly part of virtual scrolling: by far the easiest technique is to hold the height of the rows constant, as we’ll see.
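
For reference, a hypothetical view model to go with that markup might look something like this (each item just needs whatever properties the row template binds to; pick belongs to a custom drag binding that isn't covered here):

var viewModel = {
    all: ko.observableArray([
        { name: 'firstName', value: 'Henry', pick: function() { /* drag handler */ } },
        { name: 'lastName',  value: 'Jones', pick: function() { /* drag handler */ } }
    ])
};

ko.applyBindings(viewModel);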

So, how does the virtualScroll binding work? Like many effective bindings, it’s like a swan; on the surface the motion is graceful, but below the water things don’t look so tidy. In fact it’s a mess so I’ll tackle it one piece at a time. It starts in the usual way:

ko.bindingHandlers.virtualScroll = {
    init: function(element, valueAccessor, allBindingsAccessor, 
                   viewModel, context) {

The very first thing we have to do is steal the contents of the target element. They will serve as our template for each item in the list, so we clone our own copy so we can use it later:

        var clone = $(element).clone();
        $(element).empty();

Then we grab our settings. I name the whole object config, and I immediately unwrap the rowHeight so I can conveniently use it in several places:

        var config = ko.utils.unwrapObservable(valueAccessor());
        var rowHeight = ko.utils.unwrapObservable(config.rowHeight);

Now, we get the length of the rows array, multiply it by the fixed rowHeight, and use the result to set the height of our target element. This typically makes it taller than the container div, which should then get scrollbars (because it should have overflow: auto set in CSS).

But thanks to the magic of ko.computed, knockout notices that we access the rows() observable and so arranges for the function to run again when the observable’s value changes. And so when the length of the array changes, so does the height of our scrollable content. (This would be a little more effort if the rows could vary in height).

        ko.computed(function() {
            $(element).css({
                height: config.rows().length * rowHeight
            });
        });

Now, ko.computed is wonderful, but it can only work with existing observables. Sadly, the browser DOM is full of useful information that doesn’t tell us when it changes. As a hack-around, I use a little helper function called simulatedObservable, which you can read about here. For now, just accept that offset and windowHeight are observables that track the y-offset of the target element, and the height of the whole window.
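
If you're curious, one plausible way such a helper could work is to poll on a timer and push the readings into an ordinary observable (a rough sketch only, not necessarily the implementation linked above):

var simulatedObservable = function(element, reader) {
    var obs = ko.observable(reader());
    var timer = setInterval(function() {
        // stop polling once the element has gone from the document
        if (!document.body.contains(element)) {
            clearInterval(timer);
            return;
        }
        obs(reader());
    }, 100);
    return obs;
};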

        var offset = simulatedObservable(element, function() {
            return $(element).offset().top;
        });

        var windowHeight = simulatedObservable(element, function() {
            return window.innerHeight;
        });

We have all the external info we need. Next we have to maintain our own state, which is a record of all the rows that we have “materialised” into real DOM elements. The others simply don’t exist.

        var created = {};

And now we begin a rather large nested function called refresh, responsible for materialising any rows that are currently visible.

        var refresh = function() {
            var o = offset();
            var data = config.rows();
            var top = Math.max(0, Math.floor(-o / rowHeight) - 10);
            var bottom = Math.min(data.length, Math.ceil((-o + windowHeight()) / rowHeight));

The top and bottom variables are the indexes of the first and one-after-last rows that we consider visible (actually we’re being quite generous because we use the height of the whole window as the bottom bound). This would be a lot more troublesome if the rows could vary in height.

So we can loop through just these rows, clone our “template” and ask knockout to bind to it, using the row’s item from the array as the data model.

            for (var row = top; row < bottom; row++) {
                if (!created[row]) {
                    var rowDiv = $('<div></div>');
                    rowDiv.css({
                        position: 'absolute',
                        height: rowHeight,
                        left: 0,
                        right: 0,
                        top: row * rowHeight
                    });
                    rowDiv.append(clone.clone().children());
                    ko.applyBindingsToDescendants(
                        context.createChildContext(data[row]), rowDiv[0]);
                    created[row] = rowDiv;
                    $(element).append(rowDiv);
                }
            }

Finally, we can destroy any previously materialised row that is not in the currently visible range:

            Object.keys(created).forEach(function(rowNum) {
                if (rowNum < top || rowNum >= bottom) {
                    created[rowNum].remove();
                    delete created[rowNum];
                }
            });
        };

And that concludes the refresh function. There are two places we call it. First, when the rows() observable changes, we clear out all materialised rows, and then call refresh.

        config.rows.subscribe(function() {
            Object.keys(created).forEach(function(rowNum) {
                created[rowNum].remove();
                delete created[rowNum];
            });
            refresh();
        });

The second place is a little simpler: we just give it to ko.computed so it can do its usual magic and ensure that refresh runs whenever there is a change in any of the values it depends on.

        ko.computed(refresh);

Finally, we tell Knockout not to automatically perform binding on the child elements, as we take care of that ourselves:

        return { controlsDescendantBindings: true };
    }
};

You can see this in action in my Nimbah project (source code here), where I use it to display the input or output data for the selected node, allowing it to scale to thousands of rows without slowing down the UI.
