Archive

Posts Tagged ‘javascript’

doop: Immutable classes for TypeScript

March 7, 2016

Update 2016-03-07: In version 0.9.2 I’ve updated the name of the decorator function to begin with lowercase, to fit with the convention (of course you can rename it to whatever you prefer when you import).

As great as Immutable.js is, especially with a TypeScript declaration included in the package, the Record class leaves me a little disappointed.

In an ordinary class with public properties we’re used to being able to say:

const a = new Animal();
a.hasTail = true;
a.legs = 2;

const tail = a.hasTail ? "a" : "no";
const description = `Has ${a.legs} legs and ${tail} tail.`;

That is, each property is a single named feature that can be used to set and get its value. But immutability makes things a little more complicated, because rather than changing a property of an object, we instead create a whole new object that has the same values on all its properties except for the one we want to change. It’s just a convenient version of “clone and update”. This is how it has to be with immutable data. You can’t change an object, but you can easily make a new object that is modified to your requirements.

Why is this hard to achieve in a statically typed way? This thread gives a nice quick background. In a nutshell, you use TypeScript because you want to statically declare the structure of your data. Immutable.js provides a class called Record that lets you define class-like data types, but at runtime rather than compile time. You can overlay TypeScript interface declarations onto them at compile time, but it’s a bit messy. Inheritance is troublesome, and there’s a stubborn set method that takes the property name as a string, so there’s nothing stopping you at compile-time from specifying the wrong property name or the wrong type of value.

The most complex suggestion in that thread is to use code generation to automatically generate a complete statically typed immutable class, from a simpler declaration in a TS-like syntax. This is certainly an option, but seems like a defeat for something so fundamental to programming as declaring the data structures we’re going to use in memory.

Really this kind of class declaration should be second nature. If we’re going to adopt immutable data as an approach, we’re going to be flinging these things around like there’s no tomorrow.

So I wanted to see if something simpler could be done using the built-in metaprogramming capabilities in TypeScript, namely decorators. And it can! And it’s not as ugly as it might be! And there’s a nice hack hiding under it!

How it looks

This is how to declare an immutable class with some properties and one method that reads the properties.

import { doop } from "../doop";

@doop
class Animal {

    @doop
    get hasTail() { return doop<boolean, this>(); }

    @doop
    get legs() { return doop<number, this>(); }

    @doop
    get food() { return doop<string, this>(); }

    constructor() {
        this.hasTail(true).legs(2);
    }

    describe() {
        const tail = this.hasTail() ? "a" : "no";
        return `Has ${this.legs()} legs, ${tail} tail and likes to eat ${this.food()}.`;
    }
}

The library doop exposes a single named feature, doop, and you can see it being used in three ways in the above snippet:

  • As a class decorator, right above the Animal class: this allows it to “finish off” the class definition when the code is loaded into the JS engine.
  • As a property decorator, above each property: this inserts a function that implements both get and set functionality.
  • As a helper function, called inside each property getter, where it declares the property’s type.

Although not visible in that snippet, there’s also a generic interface, Doop, returned by the helper function, and hence supported by each property:

interface Doop<V, O> {
    (): V;
    (newValue: V): O;
}

That’s a function object with two overloads. So to get the value of a property (as you can see happening in the describe method) you call it like a function with no arguments:

if (a.hasTail()) { ...

It’s a little annoying that you can’t just say:

if (a.hasTail) { ...

But that would rule out being able to “set” (make a modified clone) through the same named feature on the object. If the type of hasTail were boolean, we’d be stuck.

There’s a particular pattern you follow to create a property in an immutable class. You have to define it as a getter function (using the get prefix), and return the result of calling doop as a helper function, which is where you get to specify the type of the property. Note: you only need to define a getter; doop provides getting and pseudo-setting (i.e. cloning) via the same property, with static type checking.

See how the constructor is able to call its properties to supply them with initial values. This looks a lot like mutation, doesn’t it? Well, it is. But it’s okay because we’re in the constructor. doop won’t let this happen on properties of a class that has finished being constructed and therefore is in danger of being seen to change (NB. you can leak a reference to your unfinished object out of your constructor by passing this as an argument to some outside function… so don’t do that).

And in the describe method (which is just here as an example, not part of any mandatory pattern) you can see how we retrieve the values by calling properties as if they were methods, this time passing no parameters.

But what’s not demonstrated in this example is “setting” a value in an already-constructed object. It looks like this:

const a = new Animal();
expect(a.legs()).toEqual(2); // jasmine spec-style assertion

// create a modified clone
const b = a.legs(4);
expect(b.legs()).toEqual(4);

// original object is unaffected
expect(a.legs()).toEqual(2);

Inheritance is supported; a derived class can add more properties, and in its constructor (after calling super()) it can mutate the base class’s properties. The runtime performance of a derived class should be identical to that of an equivalent class that declares all the properties itself.
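
As a sketch (assuming the same import as before; Dog, breed and the food value are invented purely for illustration, following the pattern shown above):

@doop
class Dog extends Animal {

    @doop
    get breed() { return doop<string, this>(); }

    constructor() {
        super();
        // still under construction, so this is allowed to look like mutation,
        // including on the base class's food property
        this.food("biscuits").breed("collie");
    }
}

const d = new Dog();
d.legs();   // 2, set by the Animal constructor
d.breed();  // "collie"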

One thing to be wary of is adding ordinary instance properties to a doop class. It would be difficult to effectively block this happening, and in any case there may occasionally be a good reason to do it, as long as you understand one basic limitation of it: ordinary instance properties belong to an instance. When you call a property to set its value, you are returned a new instance, and there is no magic that automatically copies or initialises any instance fields. Only the other doop properties will have the same values as in the original instance. Any plain instance fields in the new instance will have the value undefined.

For simplicity’s sake, just make sure in a doop class that all data is stored in doop properties.
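
For example (a hypothetical Cat class, just to make the limitation visible):

@doop
class Cat {

    @doop
    get legs() { return doop<number, this>(); }

    // a plain instance field, not managed by doop
    nickname: string;

    constructor() {
        this.legs(4);
        this.nickname = "Jones";
    }
}

const c = new Cat();
const c2 = c.legs(3); // clone via a doop property

c.nickname;  // "Jones"
c2.nickname; // undefined - plain fields are not copied to the clone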

Implementation

The implementation of the cloning is basically the one described here so it’s super-fast.

I mentioned there’s a hack involved, and it’s this: I needed a way to generate, from a single declaration in the library user’s code, something that can perform two completely different operations: a simple get and a pseudo-set that returns a new instance. That means I need each property to be an object with two functions. But if I do that literally, then a get would look like this:

// A bit ugly
const v = a.legs.get();
const a2 = a.legs.set(4);

I don’t like the verbosity, for starters. But there’s a worse problem caused by legs being an extra object in the middle. Think about how this works in JS. Inside the get function this would point to legs, which is just some helper object stored in a property defined on the prototype used by all instances of the Animal class. It’s not associated with an instance. It doesn’t know what instance we’re trying to get a value from. I could fix this by creating a duplicate legs object as an instance property on every Animal instance, and then giving it a back-reference to the owning Animal, but that would entirely defeat the whole point of the really fast implementation, which uses a secret array so it can be rapidly cloned, whereas copying object properties is very much slower.

Or I could make legs, as a property getter, allocate a new middle object on the fly and pass through the this reference. So every time you so much as looked at a property, you’d be allocating an object that needs to be garbage collected. Modern GCs are amazing, but still, let’s not invent work for them.

So what if instead of properties, I made the user declare a function with two overloads for getting and setting? That solves the this problem, but greatly increases the boilerplate code overhead. The user would actually have to write two declarations for the overloads (stating the “property” type twice) and a third for the implementation:

// Ugh
@doop
legs(): number;
legs(v: number): this;
legs(v?: number): any { }

The function body would be empty because the doop decorator replaces it with a working version. But it’s just a big splurge of noise so it’s not good enough. And yet it’s the best usage syntax available. Ho hum.

Lateral thinking to the rescue: in TypeScript we can declare the interface to a function with two overloads. Here it is again:

export interface Doop<V, O> {
    (): V;
    (newValue: V): O;
}

Note that O is the type of the object that owns the property, as that’s what the “setter” overload has to return a new instance of.

Using a getter in the actual doop library looks like this:

const l: number = a.legs();

There are at least two possible interpretations of a.legs():

  • legs is function that returns the number we want.
  • legs is a property backed by a getter function, that returns a function with at least one overload (): number, which when called returns the number we want.

To explain the second one more carefully: the part that says a.legs will actually call the getter function, which returns a second function, so a.legs() would actually make two calls. The returned function would need to be created on-the-fly so it has access to the relevant this, so this is very much like the GC-heavy option I described earlier.

But it’s not possible to tell which it is from the syntax. And that’s quite good. Because if we tell the TypeScript compiler that we’re declaring a getter function that returns a function, it will generate JavaScript such as a.legs(). But at runtime, we can use the simple implementation where legs is just a function. The doop decorator can make that switcheroo, and we get the best of both worlds: a one-liner declaration of a property getter, and a minimal overhead implementation.

Well, it seemed nifty to me when I realised it would work!

So this is what the doop property decorator does: the user has declared a property, and all we care about is its name. All properties are the same at runtime: just a function that can be called to either get or clone-mutate.
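
To make the switcheroo concrete, here is a very rough sketch of the kind of thing the property decorator could do. This is not the actual doop source (in particular it ignores the constructor-time mutation rule and inheritance); it just shows the shape of the trick: throw away the declared getter and install on the prototype a plain function that either reads a slot of a hidden array, or clones that array with one slot changed.

// Rough sketch only, not the real doop implementation
function doopProperty(target: any, key: string, descriptor: PropertyDescriptor): PropertyDescriptor {
    // all we need from the declaration is a slot index for this property name
    const slot = target.__doopSlots__ || 0;
    target.__doopSlots__ = slot + 1;

    return {
        value: function (newValue?: any) {
            const values = this.__doopValues__ || (this.__doopValues__ = []);
            if (arguments.length === 0) {
                return values[slot];               // get
            }
            const clone = Object.create(target);   // pseudo-set: make a new instance...
            clone.__doopValues__ = values.slice(); // ...rapidly clone the secret array...
            clone.__doopValues__[slot] = newValue; // ...and change just one slot
            return clone;
        }
    };
}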

doop on GitHub

JavaScript for Java programmers

December 7, 2013

I just found on my hard drive a talk I gave over two years ago. If you’re a reasonably experienced Java programmer looking for a way to really understand how JavaScript works (especially functions as objects, closures, etc.), it may be of help to you:

http://www.youtube.com/watch?v=FGNKoHv7xPY


Rich Text Editor in the HTML Canvas – Part 1: Introducing CAROTA

November 4, 2013

I’m developing a rich text editor from scratch in JavaScript, atop the HTML5 canvas. It’s called Carota (Latin for carrot, which sounds like “caret”, and I like carrots).

Here is the demo page, which is very self-explanatory, in that it presents a bunch of information about the editor, inside the editor itself, so you can fiddle with it and instantly see how it persists the text in JSON. As you can see, it’s quite far along. In fact I suspect it is already good enough for every way I currently make use of rich text in browser applications. If your browser is old, it will not work. (Hint: IE8 is way old.)

So… Why? What a crazy waste of time when browsers already have the marvellous contentEditable feature, right?

A quick survey of the state-of-the-art suggests otherwise. Google Docs uses its own text layout and rendering system, only using the DOM as a low-level display mechanism (the details on that link are very relevant and interesting). Go to Apple’s iCloud, which now has a beta of their Pages word processor, and use your browser to look at how they do it: the text is rendered using absolute, meticulously positioned SVG elements, so they too perform their own layout.

And having tried for the last year to get contentEditable to serve my purposes, in the same way on all browsers (actually, even one browser would be something), I can understand why the Twin Behemoths of the Cloud have taken control of their own text layout. So I’m going to do the same thing, but with Canvas. (My previous plan was to do a plugin for Windows so I’d be able to use the Win32 Rich Edit control, but that kind of plugin is about to die out.)

Before I got as far as drawing any text on screen, I had to be very careful to build up a fundamental model of how flowing text actually works. I wanted to end up with beautiful components that each do something extremely simple, and plug together into a working editor. That way it’s easier to change stuff to meet future needs. I’ve designed it from the ground up to be hacked by other people to do whatever they want.

So, hopefully I’ll be back soon to start describing how it works. In the meantime, fork me on github and you can also get the development set-up via the usual:

npm install carota

For a really quick minimal demo, try this jsfiddle, which just creates an editor in an empty DIV and then uses load and save for persistence.

per: composable forward-passing processor functions

October 31, 2013

The other day I tried implementing SICP streams in JavaScript as part of a fun project I’m tinkering with. I noticed that (unsurprisingly, given how they work) they generate a lot of memory garbage during the main loop of whatever operation you’re doing, and the overhead of this can actually become significant.

So I blogged them, sighed to myself, and then ripped them out of my project. What should I use instead? I want a combination of high performance and convenience. Also I miss generators, which of course I can’t (yet) assume are available on my target platforms, as they include browsers dating as far back as the ancient IE9 (ask your grandmother) and Chrome 30 without the about:flags experimental JavaScript features enabled.

My building block will be a function of this shape, which I’ll refer to as a processor:

function timesTwo(emit, value) {
    emit(value * 2); // this part is optional
}

It takes two parameters, the first being a function to which it can pass values forward, and the second being a value for it to process. So you have to call it to give it a value, and it may or may not emit values onwards to wherever you tell it.

Of course it can emit nothing, or many values:

function triplicate(emit, value) {
    emit(value);
    emit(value);
    emit(value);
}

If you’re into Ruby this will be very familiar to you as a way of representing a sequence of values, and if you implement this kind of function it’s nice and easy, like using yield in a true generator.

The difference here from the Ruby pattern is that we’re discussing a pure function, rather than a method on an object. So we take a second argument as an input that can be used to determine what we emit. For example, we could assume our values will be arrays, and “flatten” them:

function flatten(emit, value) {
    value.forEach(emit); // JS array's forEach fits perfectly
}

If you called this three times passing an array each time, emit would get all the elements of the three arrays as a flat stream of elements (in separate calls), not knowing which arrays they were originally from.
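
For example, feeding it a few arrays by hand (print is just a stand-in for whatever emit function comes next):

var print = function(x) { console.log(x); };

flatten(print, [1, 2]);
flatten(print, [3]);
flatten(print, [4, 5, 6]);
// print is called six times with 1, 2, 3, 4, 5, 6 -
// no trace of the original array boundaries remains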

Alternatively we can ignore the second parameter (not even bothering to declare it) and so define a pure source of data that emits its sequence of values when called:

function chatter(emit) {
    emit('hi');
    emit('whatever');
    emit('bye');
}

So far, so patterny. But the pain with these kinds of building blocks is that they don’t directly compose. An intermediate processor like flatten wants two arguments: where to emit its output and what value to process. But any preceding step just wants a function called emit that accepts one argument: a value.

We can take any two processors and turn them into a single processor that chains the two (put on your higher-order functional programming spectacles now):

function compose(first, second) {
    return function(emit, value) {
        return first(function(firstValue) {
            return second(emit, firstValue);
        }, value);
    };
}

(Note: I pass back the return value through the layers because it has a use that we’ll go into later.)

See how it works? compose creates and returns a new function. It weaves first and second together. The new wrapper receives the outer emit, and that is given to second, so that’s where the final results will be sent. The first function is passed another new function to emit to, which is where we call second.
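
For example, weaving together the two processors from earlier:

var tripleThenDouble = compose(triplicate, timesTwo);

tripleThenDouble(function(value) { console.log(value); }, 5);
// logs 10 three times: each value emitted by triplicate is passed
// on to timesTwo, which doubles it and emits the result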

That’s the beautiful version. But it means that we have to allocate a function object every time a value is passed in. And let’s be crazy C++ programmers for a moment and assume that’s Just Not Good Enough. We could rewrite compose to be butt-ugly and yet still do what we want, without doing any dynamic allocation after initial setup:

function compose(first, second) {
    var secondEmit;
    function firstEmit(firstVal) {
        return second(secondEmit, firstVal);
    }
    return function(emit, value) {
        secondEmit = emit;
        return first(firstEmit, value);            
    };
}

Yes, EEWWW indeed. We use a local variable, secondEmit, in which we stash the outer emit as soon as we know it. And we create a firstEmit function once, so we can reuse it.

In simple scenarios this will behave the same as the beautiful version. But not always:

function troubleMaker(emit, value) {
    setInterval(function() { emit(value); }, 100);
};

var doubleTrouble = compose(troubleMaker, timesTwo);

doubleTrouble(function(value) { console.log(value); }, 3);
doubleTrouble(function(value) { document.write(value); }, 4);

Now we have the value 6 being printed to the console and the value 8 being appended to the document. Except… not if we use the second version of compose, because then the second call to doubleTrouble would redirect both streams to the new destination (by updating that secondEmit local variable). What a mess.

Fortunately, we’re not C++ programmers! Phew! This doesn’t mean that we don’t care about performance. It just means that we do some measuring before going crazy about imaginary overhead. And on V8 I find that the “faster” version is approximately… 1% faster. Screw that. Let’s stick with the beautiful, easy-to-predict version.

The one other interesting feature of the pattern is what we can use the return value for: to signal that the receiver doesn’t want to receive any more data. To do this they return true. So our flatten example should actually look like this:

function flatten(emit, value) {
    return value.some(emit);
}

That way, we stop looping unnecessarily when there’s no need to keep going (because that’s what a JavaScript array’s some method does: quits and returns true when the function you gave it returns true).

So that’s the pattern. What more is there to say? Well, although writing a processor (or pure value generator) is very easy, because you just write imperative code and call emit whenever you like, it’s not so convenient to use them as components. Yes, we have compose to efficiently pipeline processors together, but there are a lot of “standard” things we tend to do on sequences where it would be nice not to have to write boilerplate code. Especially when writing tests (shudder).

To make this really easy and succinct, I’ve cooked up a little library called per:

npm install per

(Source code)

It’s reminiscent of jQuery, in that it puts a lightweight wrapper around something so that we can call useful operations on it. Most operations return another instance of the wrapper, so calls can be chained. The only entry point into the whole library is a single function called per, so in node (or browserify or webmake) you’d say:

var per = require('per');

In the browser you could just load per.js via the script tag and then you get a global per. Then you can wrap functions like this:

var numbers = per(function(emit) {
    for (var n = 0; n < 100; n++) {
        emit(n);
    }
});

The simplest methods on a per are ways to capture and return the emitted values:

var f = numbers.first(),    // f == 0
    l = numbers.last(),     // l == 99
    a = numbers.all()       // array [0... 99]

The function wrapped by a per is available in a property called forEach, so named because for a simple generator you can treat it a little like an array:

numbers.forEach(function(value) {
    console.log(value);
});

To compose functions, there is a per method which is exactly like the compose function we figured out above.

var odds = numbers.per(function(emit, value) {
                           if (value % 2) {
                               emit(value);
                           }
                       });

The above pattern is an example of a filter, which evaluates an expression to decide whether or not to forward a value. This is such an obvious pattern that we should have a built-in formalization of it:

var odds = numbers.filter(function(value) {
                              return value % 2
                          });

For extra brevity we support passing a string expression in terms of x:

var odds = numbers.filter('x%2');

Another common pattern is to transform a value and pass it on, which is captured by the map method:

var evens = numbers.filter('x%2').map('x-1');

How can we use the resulting combination? As always it has a forEach property, and because numbers doesn’t need a value parameter we only need to pass it a function to emit to (or we can use first, last or all).
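
For example, using the evens chain from above:

evens.forEach(function(value) {
    console.log(value);     // 0, 2, 4 ... 98
});

console.log(evens.first()); // 0
console.log(evens.last());  // 98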

There are operators skip and take that work like their corresponding Linq operators:

odds.skip(3).all()          // [7, 9, 11, 13...]
odds.skip(2).take(3).all()  // [5, 7, 9]

These are examples of stateful transformers, because they have an internal counter that controls their behaviour.

We can also construct an initial per by passing an array:

var a = per([1, 2, 3, 4, 5]);

console.log(a.first()); // 1
console.log(a.last()); // 5

You can even pass per a simple value and it will act as a function that merely emits that one value. For example that value might be something complex, such as a whole document that you’re going to pass through several parsing stages.

The above examples all use a fairly simple structure: the first call to per provides an initial value (or stream of values), then there is a chain of transformations, then the final step collects the results. This is fine if you want to process a stream of data in a single quick operation.

But what if you have a source of data that occasionally delivers you a value, and you want to send it through a pipeline of transformers? That is, rather than a single deluge you have more of a drip, drip, drip of information. You can prepare a chain of transformers:

var p = per(foo1).per(foo2)
                 .map(bar1)
                 .filter(bar2)
                 .listen(function(value) {
                     console.log(value);
                 });

listen is used to inject a function into the pipeline that just receives values, but otherwise has no effect on the stream. So in fact it’s like per except that you don’t have to write emit(value) – that just happens automatically.

So now you need a way to pass values into the front, for which you can use submit:

p.submit(5);
p.submit(41);
p.submit('hi');

Sometimes you need to direct the flow of information through multiple separate sections simultaneously. This is where multicast comes in handy:

var p = per(foo1).per(foo2).multicast(
    per(foo1).filter('!(x%2)').take(72).into(results1),
    per(foo2).filter('x%2').take(68).into(results2)
);

You can pass as many arguments as you like to multicast (they can be instances of per or plain transformer functions) and they will each receive every value. By the way, multicast is built on listen, so it doesn’t affect the stream, except that when all the recipients have returned true to indicate that they are full, then multicast itself will do the same.

But really the core idea is to break up a complex multi-stage task into functions like this:

function transformer(emit, value) {
    // do something interesting and maybe call emit
    // a few times, passing forward values to the next stage
}

The per library merely provides some common ways to stitch such functions together.
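
For example, here’s a small end-to-end pipeline built that way (the word-splitting transformer is made up purely for illustration):

var longWords = per(['the quick brown fox', 'jumps over the lazy dog'])
    .per(function(emit, value) {
        value.split(' ').forEach(emit); // a hand-written transformer
    })
    .filter(function(word) { return word.length > 4; })
    .all();                             // ['quick', 'brown', 'jumps']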


SICP-style Streams in JavaScript

October 29, 2013

In the not-famous-enough book Structure and Interpretation of Computer Programs (Abelson & Sussman, or “The Wizard book”) we learn about streams.

A stream is a tempting variation on the old school Lisp style of linked list. To get a plain old list, we can set up objects like this:

var a = {
    value: 'apple',
    next: null
};

var b = {
    value: 'banana',
    next: a
};

var c = {
    value: 'cantaloupe',
    next: b
};

So here our whole list is represented by c, and we can loop through it and print all the fruits:

for (var i = c; i != null; i = i.next) {
    console.log(i.value);
}

So far, so boring. The idea with a stream is very simple. Instead of storing the next object in the next property, we store a function that, if called, will return the next object. That is, we make it lazy. Note that our loop would still look much the same:

for (var i = c; i != null; i = i.next()) {
    console.log(i.value);
}

The only difference is we call next() instead of just reading it. And to set up the objects we’d have to say:

var a = {
    value: 'apple',
    next: function() { return null; }
};

var b = {
    value: 'banana',
    next: function() { return a; }
};

var c = {
    value: 'cantaloupe',
    next: function() { return b; }
};

So far, so pointless. But the value of this does not come from silly hand-built examples. In real software you would use this to generate streams from other data sources, or from other streams. It’s like Linq-to-objects in C#, but the foundations are actually more purely functional, because even the iteration process involves only immutable objects, and so everything is repeatable, nothing is destroyed merely by using it. Part-way through a stream you can stash the current node, and come back to it later. It will still represent “the rest of the stream”, even though you already used it once.

It is this extreme level of generality that persuaded me to try using streams in a real JavaScript library. I want to write a rich text editor for HTML Canvas (more of that in a later post, hopefully). So I would have streams of characters, streams of words, streams of lines, etc. It seemed to fit, and also I have a week off work and it’s fun to re-invent the wheel.

I start with an object representing the empty stream. This is nicer than using null, because I want to provide member functions on streams. If you had to check whether a stream was null before calling methods on it, that would suck mightily.

var empty = {};

function getEmpty() {
    return empty;
}

Then we need a way to make a non-empty stream:

function create(value, next) {
    return Object.create(empty, {
        value: { value: value },
        next: { value: next || getEmpty }
    });
}

It uses the empty stream as its prototype, and adds immutable properties for value and the next function. If no next function is passed, we substitute getEmpty. So calling create('banana') would make a stream of just one item.

One very handy building block is range:

var range = function(start, limit) {
    return start >= limit ? empty : create(start, function() {
        return range(start + 1, limit);
    });
};

Note the pattern, as it is typical: the next works by calling the outer function with the arguments needed to make it do the next step. And you may be thinking – AHGGHGH! Stack overflow! But no, as long as we loop through the stream using our for-loop pattern, the stack will not get arbitrarily deep.

Here’s a favourite of mine, so often forgotten about:

var unfold = function(seed, increment, terminator) {
    return create(seed, function() {
        var next = increment(seed);
        return next === terminator ? empty :
            unfold(next, increment, terminator);
    });
};

You call it with a seed value, which becomes the first value of the stream, and also an increment function that knows how to get from one value to the next, and a terminator value that would be returned by the increment function when it has no more values. So in fact you could implement range in terms of unfold:

var range = function(start, limit) {
    return unfold(start, function(v) { return v + 1; }, limit);
};

It can also turn a traditional linked list into a stream:

var fromList = function(front) {
    return unfold(front, function(i) { return i.next; }, null);
};

Groovy! Now we have several ways to originate a stream, so let’s add some methods. Recall that empty is the prototype for streams, so:

empty.forEach = function(each) {
    for (var s = this; s !== empty; s = s.next()) {
        each(s.value);
    }
};

Nothing to it! And we can use forEach to get a stream into an array:

empty.toArray = function() {
    var ar = [];
    this.forEach(function(i) { ar.push(i); });
    return ar;
};

Of course, how could we live without the awesome power of map?

empty.map = function(mapFunc) {
    var self = this;
    return self === empty ? empty : create(mapFunc(self.value), function() {
        return self.next().map(mapFunc);
    });
};

Again, that lazy-recursive pattern. And now we can very easily implement converting an array into a stream:

var fromArray = function(ar) {
    return range(0, ar.length).map(function(i) {
        return ar[i];
    });
}
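
Just to check the plumbing so far:

console.log(range(0, 5).toArray());     // [0, 1, 2, 3, 4]

console.log(fromArray(['a', 'b', 'c']).map(function(s) {
    return s.toUpperCase();
}).toArray());                          // ['A', 'B', 'C']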

How about concat? Well, this has a slight wrinkle in that if the argument is a function, I treat it as a lazy way to get the second sequence:

empty.concat = function(other) {
    function next(item) {
        return item === empty
            ? (typeof other === 'function' ? other() : other)
            : create(item.value, function() { return next(item.next()); });
    }
    return next(this);
};
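
So both of these produce the same result:

range(1, 3).concat(range(10, 12)).toArray();   // [1, 2, 10, 11]

range(1, 3).concat(function() {                // lazily supplying the second stream
    return range(10, 12);
}).toArray();                                  // [1, 2, 10, 11]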

And with concat we can easily implement the holy grail of methods, bind (known as SelectMany in Linq and flatMap in Scala):

empty.bind = function(bindFunc) {
    var self = this;
    return self === empty ? empty : bindFunc(self.value).concat(function() {
        return self.next().bind(bindFunc);
    });
};

Think that one through – it’s a mind-bender. The bindFunc returns a sub-stream for each item in the outer stream, and we join them all together. So:

assertEqual(

    // ordinary array of numbers
    [1, 2, 3, 4, 5, 6, 7, 8, 9],

    // making that same array in an interesting way
    Stream.fromArray(
        [[1, 2, 3], [4], [5, 6], [], [7], [], [], [8, 9]]
    ).bind(function(ar) {
        return Stream.fromArray(ar);
    }).toArray()

);

Anyway, I wrote my rich text layout engine using this stream foundation, and (as I like to do with these things) I set up an animation loop and watched it repeatedly carry out the entire word-break and line-wrap process from scratch in every frame, to see what frame rate I could get. Sadly, according to the browsers’ profilers, the runtime was spending a LOT of time creating and throwing away temporary objects, collecting garbage and all the other housekeeping tasks that I’d set for it just so I could use this cool stream concept. Interestingly, in terms of this raw crunching through objects, IE 10 was faster than Chrome 30. But I know that by using a simpler basic abstraction it would be much faster in both browsers.

How do I know? Well, because I found that I could speed up my program very easily by caching the stream of words in an ordinary array. And guess what… I could just use arrays in the first place. I am only scanning forward through streams and I definitely want to cache all my intermediate results. So I may as well just build arrays. (Even though I haven’t started the rewrite yet, I know it will be way faster because of what the profilers told me).

So, for now, we say: farewell, streams, we hardly knew ye.

How to write a “class” in JavaScript

February 16, 2013

Of all the problems with JavaScript, by far the worst is the way people try to [ab]use it based on their training in Java and Object Orientation. They want to write a class, and then they want to write another class that inherits from the first class, because That’s How You Write A Software™.

And this is made worse by a few features in the language that encourage people to make these mistakes: new, prototype, this.

The first hurdle to get over is inheritance. Consult any current text on OO and it will tell you that interface inheritance is fine, but implementation inheritance is a heap of trouble, so try to avoid it. In JavaScript, interface inheritance is completely unnecessary because there is no static typing at all. So that leaves implementation inheritance, which is what people usually mean by “inheritance” when they’ve been to Hollywood Upstairs Java College, and this is the part that even Java experts try to warn you away from.

So to get this absolutely clear: when you try to do inheritance in JavaScript, you’re trying to replicate something from Java that experts tell you not to do in Java. If you already know that, and you’re still insistent on trying, I can’t help you. You’re obviously insane.

For everyone else, this is the simplest way to write something that serves the approximate purpose of a “class” in JavaScript. It’s a factory function: when you call it, it manufactures an object:

var person = function(firstName, lastName) {
    return {
        getFullName: function() {
            return firstName + ' ' + lastName;
        },
        setFirstName: function(newName) {
            firstName = newName;
        },
        setLastName: function(newName) {
            lastName = newName;
        }
    };
};

// Usage:
var henryJones = person('Henry', 'Jones');

console.log(henryJones.getFullName());

henryJones.setFirstName('Indiana');

Note how there was no need to copy the constructor parameters into some separate fields. They already behave like private fields (actually more private than in the JVM or CLR, because in those runtimes you can use reflection to get to private fields).

There is a difference between the above pattern and using prototype on a constructor function that has to be called with new (I mean aside from the convenience of not having to use new and this everywhere). The difference is that for each instance of a person, we create four objects: the one we return and the three functions stored in it.

If instead we did it the messy way:

function Person(firstName, lastName) {
    this.firstName = firstName;
    this.lastName = lastName;
};

Person.prototype.getFullName = function() {
    return this.firstName + ' ' + this.lastName;
};

Person.prototype.setFirstName = function(newName) {
    return this.firstName = newName;
};

Person.prototype.setLastName = function(newName) {
    return this.lastName = newName;
};

// Usage:
var henryJones = new Person('Henry', 'Jones');

Now the three function objects are shared between all instances, so only one object gets created per person instance. And to the typical, irrational programmer who places premature optimisation above all other concerns, this is vastly preferable. Never mind that the fields firstName and lastName are now public! That is a minor concern compared to the terrible thought of creating three extra function objects per instance.

But hang on one second: have you considered the following factors?

  • How many person instances are going to exist at any time while your code is running? If it’s small, then the cost per instance is irrelevant.
  • What other things are you going to do, per instance? If you’re going to build a load of DOM nodes with animations on them and fire off AJAX calls every second (come on, admit it, that’s exactly what you’re going to do), then the cost of a few function objects per instance is lost in the noise.
  • Have you realised just how amazingly good modern JS runtimes are? They love allocating and throwing away lots of little objects.

If you do ever run into a situation where the difference is significant, you can apply the necessary optimisation in one place: inside your person factory function, without changing any other code:

var person = (function() {
    var vtable = {
        getFullName: function() {
            return this._firstName + ' ' + this._lastName;
        },
        setFirstName: function(newName) {    
            this._firstName = newName;
        },
        setLastName: function(newName) {
            this._lastName = newName;
        }
    };
    return function(firstName, lastName) {
        var p = Object.create(vtable);
        p._firstName = firstName;
        p._lastName = lastName;
        return p;
    };
})();

But in the vast majority of situations, there will never be any need to do that (i.e. when you do it, the difference will be practically unmeasurable), so why bother until you have a realistic model of your application with which to carry out your performance testing? (And you’ve also made sure that you have nothing better to do with your time…)

To find out precisely how much difference this makes, and thus when it might be worth worrying about it, all we have to do is try allocating some objects in the Chrome console. Let’s pick a ridiculously big number: a million.

var start = new Date().getTime();
var p = [];
for (var n = 0; n < 1000000; n++) {
    p.push(person('Henry', 'Jones' + n));
}
console.log(new Date().getTime() - start);

We can then run this test three times each with the plain and optimised implementations and find out what the overhead actually amounts to.

The first point of interest is that the allocation of a million objects takes about two and a half seconds in a freshly-opened Chrome instance on my run-of-the-mill Dell notebook, with either implementation. Taking the average of three runs with each, the “fast” version took 2390 ms and the “slow” took 2570 ms. That’s 180 ms difference over a million allocations! We’re talking about mere nanoseconds per extra function object being allocated. Just forget about speed being an issue. It’s not.

What about memory? The relevant Chrome process showed an increase of 81 MB for the “fast” version, versus 204 MB for the “slow”, so a total overhead of 123 MB caused by those extra function objects. Again, that’s shared across a million objects, so the overhead per object is just 130 bytes or so. Is that a big deal?

If you’re writing a game and trying to model specks of dust floating around the screen, it might be realistic to allocate a million of something. But in real business apps, we don’t create a million person objects all at once. We create objects based on downloaded data that we retrieve in small pages from the server (otherwise if a thousand users try to open the app at the same time, the server will have to transmit gigabytes of stuff that none of those users have enough lifetime left to ever read).

So a hundred objects is more realistic. Then the total overhead in this example would be 13 KB – the space consumed by a small image, the kind used in most real web apps without a moment’s thought.

Another comparison is to look at the overhead of DOM nodes. The objects we create in JavaScript are typically there to create and manage our user interface elements in the browser DOM. How much does a single empty DIV cost?

var p = []; 
for (var n = 0; n < 1000000; n++) {
    p.push(document.createElement('div')); 
}

That added 128 MB to a fresh Chrome process. What a coincidence! 134 bytes per empty DIV, a fraction more overhead than our three function objects.

So the conclusion must be: relax, stop worrying about illusory “performance” concerns, and focus instead on writing code that is readable, maintainable, hard to get wrong, hard to break accidentally when modifying it, and so on.


A pure library approach to async/await in standard JavaScript

December 28, 2012

I’m very keen on JavaScript gaining actual language support for the one thing it is mostly used for: asynchronous programming, so I never shut up about it.

The C# approach was trailed very nicely here back in Oct 2010. Over the years I’ve occasionally searched for usable projects that might implement this kind of thing in JavaScript, but they all seem to be abandoned, so I stopped looking.

And then I had an idea for how to do it purely with an ordinary JS library – a crazy idea, to be sure, but an idea nonetheless. So I searched to see if anyone else had come up with it (let me rephrase that: like all simple ideas in software, it’s a certainty that other people came up with it decades ago, probably in LISP, but I searched anyway).

I haven’t found anything yet, but I did find Bruno Jouhier’s streamline.js, which looks like a very nice (and non-abandoned!) implementation of the precompiler approach.

So what was my crazy idea? Well, as Streamline’s creator said in a reply to a comment on his blog:

But, no matter how clever it is, a pure JS library will never be able to solve the “topological” issue that I tried to describe in my last post… You need extra power to solve this problem: either a fiber or coroutine library with a yield call, a CPS transform like streamline, or direct support from the language (a yield operator).

Well, that sounds like a challenge!

If we really really want to, we can in fact solve this problem with a pure library running in ordinary JavaScript, with no generators or fibers and no precompiler. Whether this approach is practical is another matter, partly because it chews the CPU like a hungry wolf, but also because the main selling point for this kind of thing is that it makes life easier for beginners, and unfortunately to use my approach you have to understand the concept of a pure function and pay close attention to when you’re stepping outside of purity.

To set the stage, Bruno Jouhier’s example is a node.js routine that recurses through file system directories. Here’s a simplified version using the sync APIs:

var fs = require('fs');
var path = require('path');

var recurseDir = function(dir) {
    fs.readdirSync(dir).forEach(function(child) {
        if (child[0] != '.') {
            var childPath = path.join(dir, child);
            if (fs.statSync(childPath).isDirectory()) {
                recurseDir(childPath);
            } else {
                console.log(childPath);
            }
        }
    });
};

recurseDir(process.argv[2]);

And – ta da! – here’s a version that uses the async APIs but appears not to:

var fs = require('fs');
var path = require('path');

var Q = require('q');
var interrupt = require('./interrupt.js');

var readdir = interrupt.bind(Q.nfbind(fs.readdir));
var stat = interrupt.bind(Q.nfbind(fs.stat));
var consoleLog = interrupt.bind(console.log);

interrupt.async(function() {

    var recurseDir = function(dir) {
        readdir(dir).forEach(function(child) {
            if (child[0] != '.') {
                var childPath = path.join(dir, child);
                if (stat(childPath).isDirectory()) {
                    recurseDir(childPath);
                } else {
                    consoleLog(childPath);
                }
            }
        });
    };

    recurseDir(process.argv[2]);
});

The core of the program, the recurseDir function, looks practically identical. The only difference is that it calls specially wrapped versions of readdir, stat and console.log, e.g.

var readdir = interrupt.bind(Q.nfbind(fs.readdir));

The inner wrapper Q.nfbind is from the lovely q module that provides us with promises with (almost) the same pattern as jQuery.Deferred. Q.nfbind wraps a node API so that instead of accepting a function(error, result) callback it returns a promise, which can reduce yuckiness by up to 68%.

But interrupt.bind is my own fiendish contribution:

exports.bind = function(impl) {
    return function() {
        var that = this;
        var args = arguments;
        return exports.await(function() {
            return impl.apply(that, args);
        });
    };
};

So it wraps a promise-returning function inside that interrupt.await thingy. To understand what that is for, we have to go back to the start of the example, where we say:

interrupt.async(function() {

The function we pass there (let’s call it our “async block”) will be executed multiple times – this is a basic fact about all interruptible coroutines. They can be paused and restarted. But standard JavaScript doesn’t provide a way to stop a function and then start up again from the same point. You can only start again from the beginning.

In order for that to work, when a function reruns all its activity a second, third, fourth… (and so on) time, repeating everything it has already done, then it has to behave exactly the same as it did the previous time(s). Which is where functional purity comes in. A pure function is one that returns the same value when provided with the same arguments. So Math.random is not pure. Nor is reading the file system (because it might change under your feet). But quite a lot of things are pure: anything that only depends on our parameters, or the local variables containing whatever we figured out so far from our parameters.

So, inside interrupt.async we can do anything pure without headaches. But whenever we want to know about the outside world, we have to be careful. The way we do that is with interrupt.await, e.g.

var stuff = interrupt.await(function() {
    return $.get('blah-de-blah');
});

The first time the async block runs, when it goes into interrupt.await, it executes the function we pass to it (the “initializer”), which in this case starts a download and returns a promise that will be resolved when the download is ready. But then interrupt.await throws an exception, which cancels execution of the async block. When the promise is resolved, the async block is executed again, and this time interrupt.await totally ignores the function passed to it, but instead returns the result of the download from the promise created on the first run, which I call an externality (because it’s data that came from outside).

The internal representation is actually quite simple. Here’s interrupt.async:

function Interrupt() {}

var currentContext = null;

exports.async = function(impl) {
    log('Creating async context');
    var thisContext = {
        ready: [],
        waiting: null,
        slot: 0,
        result: defer(),
        attempt: function() {
            log('Swapping in context for execution attempt');
            var oldContext = currentContext;
            currentContext = thisContext;
            currentContext.slot = 0;
            try {
                thisContext.result.resolve(impl());
                log('Completed successfully');
            } catch (x) {
                if (x instanceof Interrupt) {
                    log('Execution was interrupted');
                    return;
                } else {
                    log('Exception occurred: ' + JSON.stringify(x));
                    throw x;
                }
            } finally {
                log('Restoring previous context');
                currentContext = oldContext;
            }
        }
    }
    log('Making first attempt at execution');
    thisContext.attempt();
    return getPromise(thisContext.result);
};

The important part is the context, which has an array, ready, of previously captured externalities, and an integer, slot, which is the index in the ready array where the next externality will be recorded.

The more fiddly work is done in interrupt.await:

exports.await = function(init) {
    if (!currentContext) {
        throw new Error('Used interrupt.await outside of interrupt.async');
    }
    var ctx = currentContext;
    if (ctx.ready.length > ctx.slot) {
        log('Already obtained value for slot ' + ctx.slot);
        var val = ctx.ready[ctx.slot];
        if (val && val.__exception) {
            log('Throwing exception for slot ' + ctx.slot);
            throw val.__exception;
        }
        log('Returning value ' + JSON.stringify(val) + ' for slot ' + ctx.slot);
        ctx.slot++;
        return val;
    }
    if (ctx.waiting) {
        log('Still waiting for value for ' + ctx.slot + ', will interrupt');
        throw new Interrupt();
    }
    log('Executing initializer for slot ' + ctx.slot);
    var promise = init();
    if (promise && promise.then) {
        log('Obtained a promise for slot ' + ctx.slot);
        var handler = function(val) {
            if ((ctx.slot != ctx.ready.length) ||
                (ctx.waiting != promise)) {
                throw new Error('Inconsistent state in interrupt context');
            }
            log('Obtained a value ' + JSON.stringify(val) + ' for slot ' + ctx.slot);
            ctx.ready.push(val);
            ctx.waiting = null;
            log('Requesting retry of execution');
            ctx.attempt();
        };
        promise.then(handler, function(reason) {
            log('Obtained an error ' + JSON.stringify(reason) + ' for slot ' + ctx.slot);
            handler({ __exception: reason });
        });
        ctx.waiting = promise;
        throw new Interrupt();
    }
    if (ctx.slot != ctx.ready.length) {
        throw new Error('Inconsistent state in interrupt context');
    }
    // 'promise' is not a promise!
    log('Obtained a plain value ' + JSON.stringify(promise) + ' for slot ' + ctx.slot);
    ctx.ready.push(promise);
    ctx.slot++;
    return promise;
};

It can deal with an initializer that returns a plain value, and avoids the overhead of interrupting and restarting in that case, but still enforces the same behaviour of capturing the externality so it can be returned on any subsequent repeat run instead of running the initializer again.
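
So, for example, this is fine – the initializer runs once, and on every re-execution of the async block the recorded value is simply handed back:

var startedAt = interrupt.await(function() {
    return new Date().getTime(); // impure but not a promise: captured once, replayed thereafter
});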

In fact we have an example of this in the original example: console.log is not a pure function. It has side-effects: conceptually, it returns a new state-of-the-universe every time we call it. So it has to be wrapped in interrupt.await, just like any other impure operation, and we faithfully record that it returned undefined so we can return that next time we execute the same step. In this case we’re not really recording a particular external value, but we are recording the fact that we’ve already caused a particular external side-effect, so we don’t cause it multiple times.

As long as the await-wrapping rule is followed, it all works perfectly. The problem, of course, is that if there are a lot of asynchronous promises and side-effecting calls involved, then it will start to slow down as it repeatedly stops and re-executes everything it has done so far. Although in fact, it doesn’t repeat everything. A lot of the hard work involves interacting with the OS, and that is (of necessity) wrapped in interrupt.await, and so only happens once. On subsequent executions the value cached in the ready array is reused, which is looked up by position, which is quite fast. So each re-execution only involves “going through the motions” to get back to where it left off.

Even so, this extra grinding of the CPU does start to slow it down very noticeably after a healthy number of interrupts (modern JavaScript is fast, but not a miracle worker). The recursion of the file system is a very good test, because it has to effectively recursively revisit all the places it already visited so far, and has to do this for every single file (due to the stat call) and twice for directories.

One way to “cheat” would be to replace the Array.prototype.forEach function with something that understood how to interact with the current async context, and could skip forward to the right place in the iteration… but I’ll save that for another rainy day.
