
A way that async/await-style functionality might make it into browsers

October 9, 2012

Many moons ago (actually it’s 35.7 moons ago) I wrote an excited post about JavaScript generators in Firefox. Sadly they are still only in Firefox, and there’s no sign of them appearing elsewhere.

But one way they could appear elsewhere is via compilers that generate plain JavaScript, and the latest player is TypeScript. Why is this a good fit? Because, with a very thin layer of helper code, generators replicate the functionality of C# 5’s async/await, and if that was a good idea for C# and Silverlight, it’s got to be a good idea for TypeScript. (The downside is that the auto-generated JavaScript would not be very readable, but I can still dream that this will happen…)

My old post is somewhat messy because I was unaware of jQuery.Deferred. If we bring that into the mix, to serve as the JS equivalent of Task, then things get really nice. To wit:

async(function() {
  
  // sleeping inside a loop
  for (var n = 0; n < 10; n++) {
    $('.counter').text('Counting: ' + n);
    yield sleep(500);
  }

  // asynchronous download
  $('.downloaded').text(yield $.get('test.txt'));

  // and some more looping, why not
  for (var n = 0; n < 10; n++) {
    $('.counter').text('Counting: ' + (10 - n));
    yield sleep(500);
  }
});

In other words, by passing a function to something called async, I can use the yield keyword in exactly the same way as C# 5’s await keyword.

The yielded values are just jQuery.Deferred objects – or rather, they are objects that contain a function called done, to which a resulting value may be passed (at some later time). So the implementation of sleep is straightforward:

var sleep = function(ms) {
  var d = $.Deferred();
  setTimeout(function() { d.resolve(); }, ms);
  return d;
};

By calling resolve, we trigger any registered done functions. So who is registering? This is what async looks like:

var async = function(gen) {
  var result;
  var step = function() {
    var yielded;

    try {
      yielded = gen.send(result); // run to next yield
    } catch (x) {
      if (x instanceof StopIteration) {
        return;
      }
      throw x;
    }

    yielded.done(function(newResult) {
      result = newResult; // what to return from yield
      step();
    });
  };
  gen = gen(); // start the generator
  step();
};

So async calls the function passed to it to get a generator object. It can then repeatedly call send on that object to pass it values (which will be returned from yield inside the generator). It assumes that the objects that come back from send (which were passed to yield inside the generator) will have a done function, allowing us to register to be notified when an asynchronous operation completes.

Note that async could use some further elaboration, because it currently doesn’t deal with exceptions (the try/catch block above is merely there to deal with the strange way that generators indicate they’ve finished). But generators have full support for communicating exceptions back to the yielding code, so it should all be do-able.
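To sketch what that exception support could look like: the helper above uses the legacy SpiderMonkey generator API (send/StopIteration), which only ever worked in Firefox, so this version is written against the standard ES6 generator protocol (next/throw returning {value, done}) instead, with a minimal stand-in deferred. The names makeDeferred and runAsync are mine, not jQuery’s, and runAsync is so named to avoid colliding with the modern async keyword:

```javascript
// Minimal stand-in for a deferred: done/fail callback lists plus
// resolve/reject triggers. Illustrative only - not jQuery's full API.
function makeDeferred() {
  var doneCbs = [], failCbs = [];
  return {
    done: function(cb) { doneCbs.push(cb); return this; },
    fail: function(cb) { failCbs.push(cb); return this; },
    resolve: function(v) { doneCbs.forEach(function(cb) { cb(v); }); },
    reject: function(e) { failCbs.forEach(function(cb) { cb(e); }); }
  };
}

// Like async above, but with failure support: a rejected deferred is
// rethrown inside the generator via gen.throw, so an ordinary
// try/catch around a yield just works.
function runAsync(genFunc) {
  var gen = genFunc();
  var step = function(method, arg) {
    var result = gen[method](arg); // run to the next yield (or the end)
    if (result.done) return;
    result.value
      .done(function(v) { step('next', v); })   // becomes yield's value
      .fail(function(e) { step('throw', e); }); // thrown at the yield
  };
  step('next', undefined);
}
```

A sleep built on makeDeferred works exactly as before, and a deferred that gets rejected surfaces as an exception at the yield, which is precisely the await behaviour.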

And that’s all there is to it. You can see the example running here:

http://earwicker.com/yieldasync/

… but only in Firefox, of course.


TypeScript reactions

October 5, 2012

Miscellaneous

Great implementation of modules. I use CommonJS on the browser side (yes, the non-asynchronous module pattern), so the default suits me fine.

Love the => syntax of course, and the fact that it fixes the meaning of this within lambdas. No need to manually copy into a that variable.

Type system

… appears to be better than that of C#/CLR. Structural typing is clearly the right way to do it. It makes much more sense than nominal typing. The dumbest example of that in C# is the strange fact that two identically-signatured delegate types are not assignment compatible. Well, in TS, types are compared by their structures, not their names, so that problem goes away. And the same for classes and interfaces. If a class could implement an interface, then it already does. When they add generics, it’s going to be excellent.

I wonder if they’ve given any thought to the problem of evolution of interfaces. One answer would be default implementations of interface methods (this could be implemented by having the compiler insert some checking/substitution code at the call site).

Async, await, async, await, async, await…

The really big missing ingredient is continuations. They’ve just added this to C#. Clearly it is just as important for browser-based apps, if not more important. And consider the fact that server-side JS at the moment is roughly synonymous with node, which is crying out for continuations in the language. From a Twitter conversation with Joe Palmer, it’s clear that the TS team currently places a big emphasis on the clean appearance of the JS output. But I think they’re going to have to twist the arms of the IE teams and get source map support working, and then they can butcher the JS as much as they like without hurting the debugging experience.

The Dreaded C++ Comparison

These days a C++ comparison from me is normally an insult (despite speaking ASFAC++B) but in this case, definitely not. C++ beat a lot of other languages simply by inheriting C. And JS is the C of the web, so TS has a good chance of being the C++ of the web, in the sense of becoming equally popular as its parent language, while probably not actually displacing it everywhere.

And at its best, C++ was a collection of extensions to C, a pretty ugly platform to start building on. To do that tastefully was a challenge, and (despite what people say in jest) C++ succeeded in lots of ways. Playing with TS, I see a similarly tastefully chosen set of extensions.

When you consider C++0x concepts, which were going to be (and may yet one day be) structural type definitions layered over the existing duck-typing of C++ templates, the comparison becomes even stronger. TS’s type system has a lot in common with C++0x concepts.

The Competition

A common criticism so far seems to be “Why not use (my favourite language X) as a starting point?” The answer, surely, is that it may be your favourite language but it’s nowhere compared to JS, which is now so widely deployed and used that it makes most other languages look like hobby projects! This criticism is correlated strongly with the idea that JS is a toy language, disgusting, revolting, etc., i.e. irrational emotional garbage. JS has a lot of stupid things in it, yes, we all know about {} + {} and [] + [] and {} + [] and so on, but I’ve now churned out a zillion lines of JS without ever hitting such issues. No language is perfect, but is JS useful? Yes.

In fact, the main competition faced by TS is therefore JS. This is why I think TS needs to do some heavy lifting (such as continuation support) besides a mere layer of sugar and type checking, in order to become a compelling advantage over JS itself.

And then there is CoffeeScript. Syntax just isn’t that important for most people. Semantics are much more important. It’s no good that your code looks really short and pretty if it takes you just as long to figure out what it needs to say. By the addition of static typing, TS holds out the hope of genuinely improving productivity. (With continuations it could make asynchronous event-driven programming a lot easier too.)

Oh and there’s Dart. I can’t even begin to understand what Google is thinking. It’s not compatible with, nor discernibly superior to, anything that already exists. It’s just their version of the same old stuff.

A Boring Discovery!

August 28, 2012

I was writing a simple example program to explain C# 5’s async/await keywords. To keep it really simple and “old school”, I decided to make it a console app, and to read lines from the console until the user typed quit. First, the synchronous version:

public static List<string> Foo()
{
    var lines = new List<string>();

    string line;
    while ((line = Console.In.ReadLine()) != "quit")
        lines.Add(line);

    return lines;
}

And then the async version:

public static async Task<List<string>> Foo()
{
    var lines = new List<string>();

    string line;
    while ((line = await Console.In.ReadLineAsync()) != "quit")
        lines.Add(line);

    return lines;
}

Really straightforward, right? I just add the async keyword, wrap the return type inside Task<T> and then I can use await Console.In.ReadLineAsync() instead of Console.In.ReadLine().

So I tried this, and gosh-dang-diddly, it didn’t behave at all as expected. In fact, both versions behaved the same. Could it be something exciting to do with how the SynchronizationContext is set up in console apps? Sorry, no. Try to think of something much duller than that.

The answer is… wait for it… ReadLineAsync() isn’t asynchronous at all. It doesn’t return its Task<string> until the whole line has been read.

Why is that? TextReader.ReadLineAsync appears to be properly asynchronous, and Console.In returns a TextReader as you’d expect… but not quite. It first passes it to TextReader.Synchronized, and guess how the resulting wrapper class implements ReadLineAsync? Thanks to .NET Reflector, we don’t have to guess:

public override Task<string> ReadLineAsync()
{
    return Task.FromResult<string>(this.ReadLine());
}

Yep, it calls ordinary ReadLine, totally synchronously, and then makes an already-finished Task<string>. In real life this is probably harmless because it’s such an edge case, but you can imagine the tedious diversion it caused in the middle of the explanation I was giving.

To get around it, I used this alternative helper, to move the ReadLine to the thread pool:

public static Task<string> ReadConsoleAsync()
{
    return Task.Run(() => Console.ReadLine());
}

Moral: I have no idea. How about “Sometimes, things don’t exactly work.” Is that a moral?


SEISMIC – Really Simple Mercurial Sharing

March 17, 2012

SEISMIC stands for:

S – Self
E – Explanatory
I – Installer
S – Sharing
M – Mercurial
I – Intercourse
C – Cupcakes

Okay, so the last two words don’t fit with the topic. I had to put them in to pad out the acronym. Here’s how you run it:

wget http://earwicker.com/seismic.sh
sudo bash seismic.sh

Ideally you’ll be typing those two lines into a fresh install of Ubuntu 11.10. Even if your install isn’t so fresh, the script tries to only set things up if they haven’t already been set up.

After it does the first-time steps, it offers you some options:

[0] Quit
[1] Add user
[2] Add repository

Like I say, it’s self-explanatory. Maybe there’s a quicker way to get started and do the obvious maintenance tasks for sharing mercurial repositories, but I don’t know about it yet.

Once you’ve run it, you can see your new shared repositories here:

http://your-vm-hostname/hg

You’ll need to log in using one of the user accounts you’ve created. You can also clone a repository on a client machine:

hg clone http://your-vm-hostname/hg/your-repository

And you can commit and push changes back to it – again, hg push will require your Mercurial username/password.

Running Windows? Oh dear. Why not set up a VM? (Don’t have any VM hosting software? VirtualBox is free).

The config created by this script is very simple (so it has a high probability of working). On the downside it’s not really secure: only basic (plain-text) authentication is used. Still, it gives you a working starting point to investigate further, e.g. as the song goes, “If you liked it then you should have put a certificate on it”.

Tips for setting up a VM:

- Download the Ubuntu server 64-bit .iso

- Set up your VM so it has a terabyte virtual disk and grows on demand, and a bridged network connection instead of NAT.

- During install you should get to specify a suitable hostname. If not (or you change your mind about it), after your first login:

sudo nano /etc/hostname

Then reboot.

In case the script download site goes wrong, here’s what it contains:


# Install apache and mercurial
apt-get install apache2 mercurial

# Create dir /var/hg/repos where all repositories will live
if [ -d /var/hg/repos ]
then
  echo "Already created /var/hg/repos"
else
  mkdir /var/hg
  mkdir /var/hg/repos
  chown -R www-data:www-data /var/hg/repos

  # Allow pushing without SSL
  echo "[web]" >> /etc/mercurial/hgrc
  echo "allow_push = *" >> /etc/mercurial/hgrc
  echo "push_ssl = false" >> /etc/mercurial/hgrc
fi

# Copy the hg .cgi script into place and make it runnable
if [ -a /var/hg/hgweb.cgi ]
then
  echo "Already created /var/hg/hgweb.cgi"
else
  cp /usr/share/doc/mercurial/examples/hgweb.cgi /var/hg/hgweb.cgi
  chmod a+x /var/hg/hgweb.cgi
  sed -i.bak "s|/path/to/repo/or/config|/var/hg/hgweb.config|" /var/hg/hgweb.cgi
fi

if [ -a /var/hg/hgweb.config ]
then
  echo "Already created /var/hg/hgweb.config"
else 
  echo "[paths]
/ = /var/hg/repos/*" > /var/hg/hgweb.config
fi

# Configure Apache
if grep /var/hg/hgweb.cgi /etc/apache2/sites-available/default
then
  echo "Already configured Apache"
else
  sed -i.bak 's|</VirtualHost>|ScriptAlias /hg \"/var/hg/hgweb.cgi\"\
  <Location /hg>\
  AuthType Basic\
  AuthName \"Mercurial repositories\"\
  AuthUserFile /var/hg/hgusers\
  Require valid-user\
  </Location>\
  </VirtualHost>|' /etc/apache2/sites-available/default
  apache2ctl restart
fi

shouldQuit=false

while [ $shouldQuit == false ]
do
  echo ""
  echo "[0] Quit"
  echo "[1] Add user"
  echo "[2] Add repository"

  read menuoption

  case $menuoption in
    0) shouldQuit=true;;

    1) echo -n "Creating new Mercurial user - give them a name:"
       read hgnewusername
       if [ -a /var/hg/hgusers ] 
       then
         htpasswd -m /var/hg/hgusers $hgnewusername
       else
         htpasswd -mc /var/hg/hgusers $hgnewusername
       fi
       ;;

    2) echo ""
       echo "Existing repositories:"
       ls /var/hg/repos
       echo ""
       echo -n "Enter name for new repository:"
       read hgrepname
       echo -n "Enter contact name:"
       read hgrepcont
       echo -n "Enter description:"
       read hgrepdesc
       cd /var/hg/repos
       mkdir $hgrepname
       cd $hgrepname
       hg init
       echo "[web]
contact = $hgrepcont
description = $hgrepdesc" > .hg/hgrc
       cd ..
       chown -R www-data:www-data .
       ;;

  esac
done

Async/await iterator – updated for Visual Studio 11 Preview

January 29, 2012

A long overdue install of the Visual Studio 11 Preview, and the changes to the asynchronous language features since 2010 (my, how time flies) are enough to break the code I blogged over a year ago.

The first problem is that a few of the methods of an “awaiter” (what in C++ we’d call the awaiter concept) have been renamed, and there’s now a property called IsCompleted – and that’s fine and dandy.

But when I tried exercising the code I hit a more thorny problem, which is that my test program would terminate somewhat randomly when an exception was rethrown from a background thread. For a program that I thought was single threaded, that’s pretty bad!

I don’t have my install of the original CTP, so I’m not sure about this, but I think a fairly major change was made since then: there’s now a difference between an async method that returns void and an async method that returns Task (as opposed to Task<T>).

Contrary to what might be assumed, the relationship between Task and Task<T> is not the same as that between IEnumerable and IEnumerable<T>. That is, Task is not some old pre-generics version of the same idea. Instead, it was specially created to represent a task that doesn’t return any value at all; that is, something like void, but asynchronous.

I believe (though I’m not certain) that in the original CTP, a void async method would actually return a Task, so as to ensure that its lifetime could be managed externally even though it wouldn’t produce a value. But in the latest version that is not the case: the Task associated with a void async method is just not available, and the compiler-generated version of the method really does return void. Which means in turn that you can’t use await on such methods.

You can still explicitly declare your async method to return Task, so nothing has been lost. And this certainly makes everything more clear and consistent to callers: methods really do return what they are declared to return, as usual. But it also changes the behaviour of exceptions.

In both cases, if an exception tries to escape out of your async method, there is a catch-all handler in the compiler-generated state machine which will catch it, so it can be rethrown in an appropriate context. But the choice of context depends totally on whether the method returns void or Task. The policy is determined by AsyncVoidMethodBuilder or AsyncTaskMethodBuilder respectively. With the help of Resharper, we can see that the latter gives the caught exception to the Task, via task.TrySetException. So then the decision to rethrow (or not) is entirely up to whoever has a hold of the Task. They can check the Exception property whenever they like.

But in the void case, it’s totally different. The Task never gets passed the exception. What would be the point? We can’t get at the Task. The exception is unobservable; to avoid that loss of information, an arrangement is made to rethrow the exception at the next available opportunity, by creating a delegate that will rethrow it and then posting that delegate to the “context”.

The “context” is a somewhat vague concept; the architecture uses three different representations, depending on the scenario. But in the case of a simple console-based test program, the exception-rethrowing delegate is simply passed to the thread pool, and so it brings down the whole process at a random time (though reasonably soon). In a GUI program the exception would be thrown on the main GUI thread. You can supply your own context by setting a per-thread instance of SynchronizationContext, in which you can override the Post method. It doesn’t let you get at the exception, but it does give you a delegate that, if you executed it, would throw the exception, which you can then catch!

The upshot? An exception that leaves an async void is definitely a sign of a bug somewhere. Although of course this does not automatically mean you should add your own catch-all! Sometimes crashing the process is the least-worst option. There is no single correct way to deal with bugs – it’s a question of economics and so is not an exact science.

So in short, async void is a niche thing. In most situations you almost certainly want async Task with no type argument. And my example of implementing the equivalent of yield return definitely needs updating.

First, I stash the Task in a field. Second, after executing the continuation, I check the Task.Exception property to see if anything bad happened that needs rethrowing:

if (_task.Exception != null)
{
    // Unpeel the AggregateException wrapping
    Exception inner = _task.Exception;
    while (inner is AggregateException)
        inner = inner.InnerException;

    throw inner;
}

Aside from that it works much the same way as before, though I’ve added a lot of comments and organised it a little differently to hopefully make the behaviour clearer. I’ve also had to add an implementation of the new awaiter property:

public bool IsCompleted
{
    get { return false; }
}

Well, that was easy. Returning true would be a very bad idea in this example, as we can discover with more Resharper digging. The compiler-generated state machine examines that property, and if it is true then it doesn’t bother to yield control back to the thread. So we don’t get the interleaved execution behaviour that we’re relying on.

Here’s the whole thing:

public delegate Task IteratorMethod<TItem>(YieldEnumerator<TItem> e);

public class YieldEnumerator<TItem> : IEnumerator<TItem>
{
    // Will be executed to get the next value
    private Action _continuation;

    // Will become the value of Current
    private TItem _nextValue;
    private bool _hasNextValue;

    // To be thrown inside the async method, as if by the await keyword
    private Exception _exception;

    // The task associated with our running async method
    private Task _task;

    public YieldEnumerator(IteratorMethod<TItem> iteratorMethod)
    {
        _task = iteratorMethod(this);
    }

    private void Execute()
    {
        // If we already have a buffered value that hasn't been
        // retrieved, we shouldn't do anything yet. If we don't
        // and there's no continuation to run, we've finished.
        // And if _task is null, we've been disposed.
        if (_hasNextValue || _continuation == null || _task == null)
            return;

        // Be ultra-careful not to run same _continuation twice
        var t = _continuation;
        _continuation = null;
        t(); // may or may not have stored a new _continuation

        // And may also have hit a snag!
        if (_task.Exception != null)
        {
            // Unpeel the AggregateException wrapping
            Exception inner = _task.Exception;
            while (inner is AggregateException)
                inner = inner.InnerException;

            throw inner;
        }
    }

    public YieldEnumerator<TItem> GetAwaiter()
    {
        return this;
    }

    // Performance optimisation added since original CTP. If we
    // returned true, the compiler-generated code would bypass the
    // OnCompleted/GetResult dance altogether, and the flow of the
    // async method would never be interrupted in the way that we
    // require.
    public bool IsCompleted
    {
        get { return false; }
    }

    // Was called BeginAwait in the original CTP
    public void OnCompleted(Action continuation)
    {
        Debug.Assert(_continuation == null);
        _continuation = continuation;
    }

    // Was called EndAwait
    public void GetResult()
    {
        // This is called by compiler-generated code caused by the
        // await keyword, so it's a chance to throw an exception to
        // be caught by the code in the async method
        if (_exception != null)
        {
            var t = _exception;
            _exception = null;
            throw t;
        }
    }

    // Our equivalent of yield return
    public YieldEnumerator<TItem> YieldReturn(TItem value)
    {
        if (_hasNextValue)
        {
            // Shouldn't happen because MoveNext ought to have
            // been called and we should be inside the async
            // code at this point
            throw new InvalidOperationException();
        }

        _nextValue = value;
        _hasNextValue = true;
        return this;
    }

    public TItem Current { get; private set; }

    object System.Collections.IEnumerator.Current
    {
        get { return Current; }
    }

    public bool MoveNext()
    {
        Execute();

        if (_hasNextValue)
        {
            Current = _nextValue;
            _hasNextValue = false;
            return true;
        }

        return false;
    }

    private sealed class AbandonEnumeratorException : Exception {}

    public void Dispose()
    {
        // If async method is not yet complete, throw an exception
        // inside it to make it grind to a halt
        if (_continuation != null)
        {
            _exception = new AbandonEnumeratorException();
            try { Execute(); } catch (AbandonEnumeratorException) { }
        }

        _task.Dispose();
        _task = null;
    }

    public void Reset()
    {
        throw new NotImplementedException("Reset");
    }
}

// The usual obvious IEnumerable to go with our IEnumerator
public class YieldEnumerable<TItem> : IEnumerable<TItem>
{
    private readonly IteratorMethod<TItem> _iteratorMethod;

    public YieldEnumerable(IteratorMethod<TItem> iteratorMethod)
    {
        _iteratorMethod = iteratorMethod;
    }

    public IEnumerator<TItem> GetEnumerator()
    {
        return new YieldEnumerator<TItem>(_iteratorMethod);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

class Program
{
    public static async Task MyIteratorMethod1(YieldEnumerator<int> e)
    {
        Console.WriteLine("A");
        await e.YieldReturn(1);
        Console.WriteLine("B");
        await e.YieldReturn(2);
        Console.WriteLine("C");
        await e.YieldReturn(3);
        Console.WriteLine("D");
    }

    public static async Task MyIteratorMethod2(YieldEnumerator<int> e)
    {
        try
        {
            Console.WriteLine("A");
            await e.YieldReturn(1);
            Console.WriteLine("B");
            await e.YieldReturn(2);
            Console.WriteLine("C");
            await e.YieldReturn(3);
            Console.WriteLine("D");
        }
        finally
        {
            Console.WriteLine("Running finally");
        }
    }

    public static async Task MyIteratorMethodInfinite(YieldEnumerator<int> e)
    {
        for (var n = 0; ; n++)
            await e.YieldReturn(n);
    }

    public static async Task MyIteratorBroken1(YieldEnumerator<int> e)
    {
        // always happens, but compiler doesn't know that
        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");

        await e.YieldReturn(1);
    }

    public static async Task MyIteratorBroken2(YieldEnumerator<int> e)
    {
        await e.YieldReturn(1);

        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");
    }

    public static async Task MyIteratorBroken3(YieldEnumerator<int> e)
    {
        await e.YieldReturn(1);

        if (DateTime.Now.Year < 10000)
            throw new IOException("Bad");

        await e.YieldReturn(2);
    }

    static void Main(string[] args)
    {
        foreach (var i in new YieldEnumerable<int>(MyIteratorMethod1))
            Console.WriteLine("Yielded: " + i);

        foreach (var i in new YieldEnumerable<int>(MyIteratorMethod2))
        {
            Console.WriteLine("Yielded: " + i);
            break; // finally should still run
        }

        foreach (var i in new YieldEnumerable<int>(MyIteratorMethodInfinite))
        {
            if (i % 1000000 == 0) // every million times...
                Console.WriteLine("Yielded: " + i);

            if (i > 10000000)
                break;
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken1))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken2))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }

        try
        {
            foreach (var i in new YieldEnumerable<int>(MyIteratorBroken3))
                Console.WriteLine("Yielded: " + i);
        }
        catch (IOException)
        {
            Console.WriteLine("Caught expected exception");
        }
    }
}

Asynchronous Memoization in JavaScript

January 16, 2012

In pure functional programming there is a simple rule: if you evaluate (call) a function more than once with the exact same argument values, you’ll keep getting the same return value. It follows that there is no need to call it more than once, which is super-awesome!! because it means you can put a caching mechanism in front of the function that keeps a map (hash table, Dictionary, etc.) of all the return values produced so far, each one keyed by the bundle of arguments that produced that return value.

Of course it’s only worth doing this if the dictionary lookup is faster than simply re-executing the function itself, and if the same small set of arguments are highly likely to be passed repeatedly. Yawn!!
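As a reminder of what that basic (synchronous) technique looks like, here’s a minimal sketch. Keying the cache by JSON.stringify of the arguments is an illustrative shortcut of mine – it only works for JSON-serializable arguments:

```javascript
// Minimal synchronous memoizer: caches return values keyed by the
// arguments that produced them. JSON.stringify as the key is an
// assumption that only holds for JSON-serializable arguments.
var memoize = function(func) {
  var cache = {};
  return function() {
    var key = JSON.stringify(Array.prototype.slice.call(arguments));
    if (!(key in cache)) {
      cache[key] = func.apply(this, arguments);
    }
    return cache[key];
  };
};
```

Wrap any (pure enough) function and repeated calls with the same arguments never re-execute it.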

In a non-pure language like JavaScript many (most?) functions are not pure: they examine information from other sources besides their parameters. However, they often have contexts within which they are “pure enough”. e.g. the information the user can see on the screen is, naively speaking, a projection of the information stored in the database, but if that were really true then when the database changes, the screen would immediately change as well. But it doesn’t; instead it usually remains stale until the user presses Refresh. This corresponds to “emptying the cache”.

In a complex app, there may be several separate components that project the same information in different ways. If they all go back to the external source for that information, and it is changing in real time, you could end up with an inconsistent picture on the screen. This might even cause instability, if one component tries to talk to another and they assume they’ll both have identical snapshots of the external information.

So memoization actually has a purpose in JavaScript: it can simulate a “freeze frame” of your dependencies on external data. But we need to provide the ability to delete things from the cache at the time of our choosing, “unfreezing” the frame so we can take a new snapshot.

Another complicating factor in JavaScript is asynchrony. JavaScript programmers just have to get used to doing the following transformation into “continuation-passing style” by hand; starting with:

var add = function(a1, a2) {
  return a1 + a2;
};

We switch to:

var add = function(a1, a2, done) {
  done(a1 + a2);
};

So the caller of add can no longer say:

var sum = add(2, 2);
alert('The answer is ' + sum);

And must instead say:

add(2, 2, function(sum) {
  alert('The answer is ' + sum);
});

This allows the implementation of add to utilise other functions that need to be passed a continuation in the same manner. Yo!!
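For instance, a hypothetical addThree built on top of the CPS-style add has to chain through callbacks itself rather than composing return values:

```javascript
// Continuation-passing add, as above.
var add = function(a1, a2, done) {
  done(a1 + a2);
};

// Composing CPS functions: each intermediate result arrives via a
// callback, so the next step is nested inside it.
var addThree = function(a1, a2, a3, done) {
  add(a1, a2, function(partial) {
    add(partial, a3, done);
  });
};

addThree(1, 2, 3, function(sum) {
  // sum is 6
});
```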

So, let’s memoize. To simplify matters we’ll start by assuming we’ll be dealing with functions that take no parameters (or always take exactly the same parameters, it’s the same thing). It means we can replace the map with a single variable. We’re displaying “prices” (whatever the hell they are) to the user, so if synchronous was a realistic option we’d start with:

var getPrices = function() { 
    /// Talk to server to get prices, p, somehow.
    return p; 
};

Sadly we have to be asynchronous, but it’s no biggie:

var getPrices = function(done) {
    /// Talk to server to get prices, p, somehow.
    done(p); 
};

Seems like something that will be easy to memoize!

var makeCache = function(getter) {

  var value = null, ready = false;

  return {

    reset: function() {
      value = null;
      ready = false;
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        getter(function(got) {
          value = got;
          ready = true;
          done(value);
        });
      }
    }
  };

};

You’d use it to “wrap” the getPrices function like this:

var pricesCache = makeCache(getPrices);

pricesCache.get(function(prices) {
  // we've got the prices!  
});

And when you want to reset the cache, just say:

pricesCache.reset();

But actually there’s a bug here: do you know what it is? Give yourself a playful slap on the thigh if you got it.

What if there’s more than one call to pricesCache.get before the first one comes back with the data? We only set the ready flag when we’ve got the answer, which might take a second. In the meantime, various parts of the UI might be grabbing the prices to make their own updates. Each such call will launch a separate (unnecessary) call to the backend. What’s worse is that the prices may actually change during this mess, and so the callers will end up with inconsistent price information, just like I was a-bellyachin’ about up yonder.
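To make the bug concrete, here’s that first version of makeCache again, exercised with a getter that stalls (it just captures its callbacks, standing in for a slow server call). Every get issued before the first response triggers another underlying call:

```javascript
// The first (buggy) makeCache, reproduced so this sketch is
// self-contained.
var makeCache = function(getter) {
  var value = null, ready = false;
  return {
    reset: function() {
      value = null;
      ready = false;
    },
    get: function(done) {
      if (ready) {
        done(value);
      } else {
        getter(function(got) {
          value = got;
          ready = true;
          done(value);
        });
      }
    }
  };
};

// A getter standing in for a slow backend: count invocations and
// stall by just storing the callback instead of calling it.
var underlyingCalls = 0;
var pending = [];
var pricesCache = makeCache(function(done) {
  underlyingCalls++;
  pending.push(done);
});

pricesCache.get(function(prices) { /* first consumer */ });
pricesCache.get(function(prices) { /* second consumer */ });
// underlyingCalls is now 2: each get launched its own backend call,
// and if prices changed in between, the two consumers would get
// inconsistent snapshots.
```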

First reaction: oh, oh, I know, it’s a state machine! We thought there were two states, as indicated by the boolean ready flag. But actually there’s three:

  1. No value.
  2. Okay, I’m getting the value, sheesh.
  3. Got the value.

But hold on to your hose, little fireman. Think this one through for a second. It’s pretty clear that when the first caller tries to get, we need to transition to the middle state and make our call to the real getter function. And when the prices come back to us, we transition to the final state and call the callback function. But what about when a second caller tries to get and we’re already in the middle state? That’s the whole reason for doing this, to be able to handle that differently. Where do we put their callback function?

So, yes, it is a state machine, but not a three-state one. We need to keep a list of callback functions, so that when the prices come back, we can loop through those callback functions and give every single gosh darn one of them a calling they ain’t gonna forgit:

var makeCache = function(getter) {

  var value, ready, waiting = [];

  return {

    reset: function() {
      value = null;
      ready = false;
      waiting = [];
    },

    get: function(done) {
      if (ready) {
        done(value);
      } else {
        waiting.push(done);

        if (waiting.length === 1) {
          getter(function(got) {

            value = got;
            ready = true;

            waiting.forEach(function(w) { w(value); });
            waiting = null;
          });
        }
      }
    }
  };

};

Notice how I use waiting.forEach to loop through the callbacks. By definition here I’m calling some code that I don’t have control of. It might call back into pricesCache.get. That may seem intrinsically problematic, because it sounds like it could keep happening forever and cause a stack overflow. But it might be perfectly valid: there could be some separate code making the second call to get the prices, which supplies a different callback. Anyway, is it a problem for my cache implementation? No, because any calls to pricesCache.get during my callback loop will find that ready is already set, and so will not affect the waiting array. And even if pricesCache.reset were called, that would cause a fresh array to be created and stored in waiting.

And finally, a nice piece of trivia for ya: even if there were some way for waiting to grow while we are still looping through it, according to the specification of Array.prototype.forEach the new item(s) won’t be included in the iteration.
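To see the single-flight behaviour concretely, here’s a self-contained sketch. The getPrices stand-in, the pending queue and the example price object are invented purely for this demo (a real getter would talk to a server); the cache itself is exactly the code above:

```javascript
// A fake asynchronous getter: it records its callbacks in `pending`
// so we can complete them by hand, making the demo deterministic.
var pending = [];
var callCount = 0;
var getPrices = function(done) {
  callCount++;
  pending.push(done);
};

// The cache, exactly as above.
var makeCache = function(getter) {
  var value, ready, waiting = [];
  return {
    reset: function() {
      value = null;
      ready = false;
      waiting = [];
    },
    get: function(done) {
      if (ready) {
        done(value);
      } else {
        waiting.push(done);
        if (waiting.length === 1) {
          getter(function(got) {
            value = got;
            ready = true;
            waiting.forEach(function(w) { w(value); });
            waiting = null;
          });
        }
      }
    }
  };
};

var cache = makeCache(getPrices);
var results = [];
cache.get(function(p) { results.push(p); });
cache.get(function(p) { results.push(p); });   // second caller just waits

console.log(callCount);          // 1 - only one trip to the "server"

pending.shift()({ apples: 3 });  // the "server" responds

console.log(results.length);     // 2 - both callers got the same prices

cache.get(function(p) { results.push(p); });   // now served from the cache
console.log(results.length);     // 3
console.log(callCount);          // still 1
```

Two overlapping callers cost one backend call, and they both see the same price object, which is the consistency property we were after.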

Linq to C++ (or something much better)

October 31, 2011 8 comments

A really long time ago (26th Jan 2009 to be exact), I got excited about lambdas in C++0x, and the idea of bringing a Linq-like lazily-evaluated sequence filtering system to C++. I wrote a blog post about this idea partly as a pedagogical exercise in explaining the parallels/divergences in how an iterable sequence is represented in C++ and C#.

Whenever I’m asked if this would make sense as a real library, I’ve replied that in the real world it would make much more sense to build on the stuff in Boost, especially the range library.

And guess what? At that very moment someone was already working on it (only with a far more elegant, extensible interface), and the following month a set of extensions to range were put up for review, and were accepted… then a long time passed (they’re very careful) until summer of 2010, when finally the new features were released in Boost 1.43.

I only discovered this just now. People still seem to be finding that old blog post, so it only seems proper to get up-to-date. My first stab looked like this:

#include <boost/range/algorithm/for_each.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include <boost/range/adaptor/transformed.hpp>

#include <boost/lexical_cast.hpp>

#include <vector>
#include <iostream>
#include <string>

int main()
{
	using namespace boost::adaptors;

	std::vector<int> numbers;
	numbers.push_back(0);
	numbers.push_back(-1);
	numbers.push_back(4);
	numbers.push_back(-3);
	numbers.push_back(5);
	numbers.push_back(8);
	numbers.push_back(-2);
	
	auto quoted_positives = numbers 
		| filtered([] (int n) { return n >= 0; })
		| transformed([] (int x) { 
                    return "\"" + boost::lexical_cast<std::string>(x) + "\""; 
                  });

	boost::for_each(quoted_positives, [] (std::string n) { 
            std::cout << n << std::endl; 
        });
	return 0;
}

As you can see, the key is the use of operator| to make a piping syntax, so that we can naturally read from left to right (you may have seen the same approach used in F#). So I start with a vector of plain integers called numbers, I then pipe that through the filtered adaptor so as to get just the positives (corresponding to Where in Linq), and finally through the transformed adaptor to turn each positive into a quoted string (corresponding to map in most functional frameworks and to Select in Linq).

Except… it doesn’t work.

The problem is that boost is not yet entirely C++11 lambda-friendly. It’s unfortunate that the boost documentation (and other online information) is peppered with the word lambda signifying something else, so I may just be missing something. But the gist of the problem appears to be that transformed wants a functor with a member type called result_type. So you are supposed to write something like this:

struct quote
{
    typedef std::string result_type;

    std::string operator()(int x) const 
    {
        return "\"" + boost::lexical_cast<std::string>(x) + "\""; 
    }
};

C++11 lambdas don’t have that result_type inside them, so they don’t directly fit. It goes without saying that we don’t like this, because the whole point of lambdas is to avoid writing a separate class somewhere else.

It is surely possible to extend boost’s function_traits so that they can reflect on lambdas. In the past they have not been written to work on general functors because a functor can have multiple overloads of operator(), so you are required to manually pick one of those members before you ask questions. But this position is no longer sensible (if indeed it ever was). The majority of functors will now be lambdas, with a single overload of operator(), and it is (as we can see from this example) perfectly reasonable to want to know the return type of that single overload. A lambda is an alternative to a plain function, and whatever is useful for a function will be useful for a lambda.

But in the meantime, it is interesting to see how many hoops we have to jump through to make the above sample code work as expected.

If you read my last update on this sordid subject you’ll know that I don’t want to have to state the type of something that can be inferred, so I’m not going to be satisfied by a wrapper template that just adds a manually specified result_type member to my lambda. Ugh! It must be inferred automatically.

Fortunately this is just a matter of using a few simple steps in metaprogramming. Firstly, a template designed to match a member function:

template <class T>
struct mem_fun_traits { };

Of course in this general form it will match anything. But we’ll only bother to specialise it for one case: a pointer to a const member function that accepts one argument:

template <class R, class C, class A>
struct mem_fun_traits<R (C::*)(A) const> 
{
    typedef R result_type;
    typedef A argument_type;
};

So if you pass mem_fun_traits the type of a pointer to the operator() within a lambda, you know what it accepts and returns. But how do you get that type argument? With decltype that’s easy-peasy:

template<typename T>
struct functor_member {
    typedef decltype(&T::operator()) type;
};

Armed with these tools, we can write a traits class for our simple one-argument, single-overload functors (and hence, by an extraordinary coincidence, also for C++11 lambdas):

template <class F>
struct functor_traits
{
    typedef typename functor_member<F>::type member_t;
    typedef mem_fun_traits<member_t> traits;

    typedef typename traits::result_type result_type;
    typedef typename traits::argument_type argument_type;
};

This is the thing that should be integrated into boost’s function_traits, to make lambdas work as true replacements for functions. I’m sure someone’s working on it.

In the event that this doesn’t happen, we can shoehorn it together anyway by making a function template that adds type metadata to any single-argument functor:

template <class F>
class functor_with_traits_t
{
    F f;

public:
    functor_with_traits_t(F f_) : f(f_) {}

    typedef typename functor_traits<F>::result_type result_type;
    typedef typename functor_traits<F>::argument_type argument_type;

    result_type operator()(argument_type a) const { return f(a); }
};

template <class F>
inline functor_with_traits_t<F> functor_with_traits(F f) { return functor_with_traits_t<F>(f); }

So if you have a plain old lambda like this:

auto l = [] (int x) { 
    return "\"" + boost::lexical_cast<std::string>(x) + "\""; 
};

You can “fix” it to have type metadata like so:

auto l2 = functor_with_traits(l);

This makes it compatible with boost’s transformed! Although manually putting in that wrapper each time is pretty tedious. We can work around that by defining our own adaptor (let’s call it selected in slavish imitation of Linq) that just plugs together functor_with_traits and transformed. Sounds simple, though as always the difficulty is in figuring out the types:

template <class F>
inline um... selected(F f)
{
    return boost::adaptors::transformed(functor_with_traits(f)); 
}

The um... is “whatever type we’re returning”. Wouldn’t it be nice if we could put auto there? Well, we almost can: decltype to the rescue as usual:

decltype(
    boost::adaptors::transformed(
        functor_with_traits(make_ref<F>())
    )
)

That is: give me the type that would be returned if I called transformed, passing it the result of a call to functor_with_traits, passing it an F. Of course we can’t assume that F can be default constructed, hence the use of that funny make_ref function that always comes in handy with decltype.

So our completed adaptor is:

template <class F>
decltype(
    boost::adaptors::transformed(
        functor_with_traits(make_ref<F>())
    )
)
selected(F f)
{
    return boost::adaptors::transformed(functor_with_traits(f)); 
}

We can use selected in place of transformed and everything is hunky dory. Full example:

#include <boost/range/algorithm/for_each.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include <boost/range/adaptor/transformed.hpp>

#include <boost/lexical_cast.hpp>

#include <vector>
#include <iostream>
#include <string>

template <class T>
T *make_ptr() { return (T *)0; }

template <class T>
const T &make_ref() { return *(make_ptr<T>()); }

template <class T>
struct mem_fun_traits { };

template <class R, class C, class A>
struct mem_fun_traits<R (C::*)(A) const> 
{
  typedef R result_type;
  typedef A argument_type;
};

template<typename T>
struct functor_member {
    typedef decltype(&T::operator()) type;
};

template <class F>
struct functor_traits
{
	typedef typename functor_member<F>::type member_t;
	typedef mem_fun_traits<member_t> traits;

	typedef typename traits::result_type result_type;
	typedef typename traits::argument_type argument_type;
};

template <class F>
class functor_with_traits_t
{
	F f;

public:
	functor_with_traits_t(F f_) : f(f_) {}

	typedef typename functor_traits<F>::result_type result_type;
	typedef typename functor_traits<F>::argument_type argument_type;

	result_type operator()(argument_type a) const { return f(a); }
};

template <class F>
functor_with_traits_t<F> functor_with_traits(F f) { return functor_with_traits_t<F>(f); }

template <class F>
decltype(boost::adaptors::transformed(functor_with_traits(make_ref<F>()))) selected(F f)
{
	return boost::adaptors::transformed(functor_with_traits(f)); 
}

int main()
{
	using namespace boost::adaptors;

	std::vector<int> numbers;
	numbers.push_back(0);
	numbers.push_back(-1);
	numbers.push_back(4);
	numbers.push_back(-3);
	numbers.push_back(5);
	numbers.push_back(8);
	numbers.push_back(-2);
	
	auto quoted_positives = numbers 
		| filtered([] (int n) { return n >= 0; })
		| selected([] (int x) { 
                    return "\"" + 
                      boost::lexical_cast<std::string>(x) +
                      "\"";
                  });

	boost::for_each(quoted_positives, [] (std::string n) {
            std::cout << n << std::endl; 
        });
	return 0;
}

(Tested on MSVC++ 16.00.40219.01 and GNU G++ 4.5.3.)

Categories: Uncategorized Tags: , , , ,

Secrets of the GNU Make Ninjas: Part 2 – Introducing MORK

October 9, 2011 1 comment

Suppose you had written some make scripts that took advantage of all the fun metaprogramming possibilities to make life as simple as possible for you. Imagine what that might look like.

There would be a directory representing your whole project. (Everything else we discuss will be under that directory, so I’ll use full relative paths from now on.)

It would have a subdirectory containing some general-purpose makefiles that take care of all the messy work, which you usually never need to look at. Let’s call that subdirectory morkfiles, because mork sounds like make, only it’s funnier, and so is a good name for our fantasy makefile framework.

And there would be one actual makefile under the top level directory, acting as the entry point. It would contain only this:

include morkfiles/main.makefile

It’s just there to allow you to conveniently type make at the command line and kick off a build of all targets, or say make clean to wipe everything previously built.

Executables

So, how would you go about writing, say, an executable?

You’d start with a subdirectory called, say, myexe, where you’ll build your first executable (hence the boring example name).

You’ll need a main source file, myexe/myexe.cpp:

#include <iostream>

int main()
{
	std::cout << "Hello from myexe" << std::endl;
}

Not exactly world-changing, but good enough for now. If you typed make at this point, it would be weird and wrong if it went ahead and assumed anything. No, first we have to undertake the arduous task of writing a morkfile, which would be saved as myexe/morkfile and would contain this:

$(this)_type = exe

Now, there’s a lot of stuff to explain there, so I’ll go slowly… wait a second, no there isn’t. It’s literally one line long. It means, unsurprisingly, “This thing is going to be an executable”.

If you run make now, a mini-miracle occurs. Somehow mork figures out what to do. And you can run your new program:

myexe/bin/myexe

or on Windows:

myexe\bin\myexe.exe

Things to note:

  • We haven’t told mork where to find the myexe directory. It automatically finds all files called morkfile. You can reorganise your source tree willy-nilly, add or remove modules, and never have to update any central configuration file.
  • Neither did we have to add myexe/myexe.cpp to a list. An executable module is simply assumed to have source files directly inside it. All files with extension .cpp are automatically compiled and linked. We could add more source files under myexe if we wanted to, re-run make and they would be added to our executable.
  • And of course, we only recompile what we need to. The second time you run make after changing nothing, it immediately reports it has nothing to do (even if you have dozens of modules, it says this very quickly, because only one instance of make is executing).

If you’ve used make before you may be wondering about the business of using the preprocessor to generate include file dependencies. Don’t worry about it; that stuff is taken care of automatically. No need to say make depends or anything like that. No stupid error messages when you delete a header. It just works.

Libraries

But real projects involve multiple targets – some are executables, some are static libraries, some are shared libraries (DLLs on Windows). These all have to depend on one another, which typically means that they have to be built in the right order, and include each other’s header files.

So let’s make a static library. Create another directory called myutil – you can put it absolutely anywhere beneath the top-level project directory (even inside myexe though that would be unnecessarily confusing, so don’t do that).

This time the source file, myutil/myutil.cpp, has no main function. Instead it exposes a function intended to be called from another module:

#include <myutil.hpp>

std::string myutil_says()
{
	return "world.";
}

We also need to write that header file. It goes in a subdirectory of the module, as myutil/include/myutil.hpp:

#ifndef MYUTIL_INCLUDED
#define MYUTIL_INCLUDED

#include <string>

std::string myutil_says();

#endif//MYUTIL_INCLUDED

That include subdirectory is where you put all headers that may need to be included by other modules – it’s the “public interface” of your module. Note that you don’t have to use the extension .hpp; the only significant thing here is the exact name and location of the include subdirectory.

Finally we have to confront that terrifying task of writing myutil/morkfile:

$(this)_type = static

Wait a second, that looks vaguely familiar! The only difference from last time is that we’re giving this module the type static instead of exe.

At this point we can run make and, no surprises, a library is built. It’s created under myutil/lib, but as we’ll see, you don’t even need to know that.

Now let’s make myexe use myutil. Back in myexe/myexe.cpp we adjust it to:

#include <iostream>

#include <myutil.hpp>

int main()
{
	std::cout << "Hello, " + myutil_says() << std::endl;
}

So myexe requires myutil. Now, mork is helpful, but not telepathic (or smart/stupid enough to try to scan your source files looking for implied dependencies between modules). You have to tell it that myexe requires myutil, by amending myexe/morkfile like so:

$(this)_type = exe
$(this)_requires = myutil

Couldn’t be simpler, could it? If your module requires other modules, you list them in the $(this)_requires variable, separated by spaces, and mork takes care of adding myutil/include to the compiler options for myexe. It also (of course) deals with the linker options.

But it goes further – requirements are automatically transitive. That is, if module A requires module B (including some of its headers), and module B requires module C (so some of B’s headers include some of C’s headers), then it follows that A will not be able to compile successfully unless it can include headers from both B and C. But this is handled automatically: you only have to configure A to require B, and A will implicitly require C also, and so “it just works”.

Note that you only give the name, not the full path, to the required modules. Hence modules need to have unique names within the entire source tree, not just unique paths. This is a good idea anyway because when you distribute your software it’s usually convenient to put all the binaries (executables and shared libraries) in one directory, so they’ll need to have unique names.

But it has an additional nice advantage here, which is that even if you have dozens of modules that require one another in complex ways, you can still rearrange the modules under neat organisational subdirectories. It makes no difference to mork.

Dynamic Libraries

What about dynamic libraries (shared objects on Linux or DLLs on Windows)? It’s as easy as amending myutil/morkfile to:

$(this)_type = shared

That’s it. Run make again and everything is taken care of – mork even copies the built DLLs or shared objects to the bin directory of any executable that requires them, so it can be instantly executed.

Java

Oh yes, for good measure mork can build Java .jar files as well. You just have the morkfile look like this:

$(this)_type = jar

You put your Java source files in a subdirectory called src. The $(this)_requires setting works exactly the same – it controls build order, as with native code modules, but also figures out the classpath settings.

This can be useful if you’re writing a mixed native/Java project with the emphasis on the native code. (If it was mostly Java, you’d presumably use Java build tools… by the way, I have something a little bit like mork but layered on ant, and integrating very nicely with Eclipse, so I’ll write that up sometime soon.)

Other stuff

There are more topics to consider, my favourite being code generation: a massively important technique, sadly underused, and one that many build tools don’t seem designed to support – but mork is ready for it. And how do you integrate unit testing? Also very easy. But this post is long enough already.

ME WANT MORK!

I almost forgot… here’s where mork lives:

http://earwicker.com/mork

It has some examples with it that are very similar to the ones suggested here, but have more complex interdependencies. If you want to use it on Linux, it should just work, assuming you have the usual Gnu development tools.

Windows Quirks

On Windows, you need to install Cygwin with the development tools, and Visual Studio (for maximum interoperability with standard Windows libraries, mork uses Visual C++ as the compiler but relies on Cygwin for make, the bash shell and related commands – it even uses gcc, purely to get make-compatible dependencies).

So you need to have the Visual C++ tools and Cygwin accessible in your environment. The ordering of the PATH environment variable is important. Visual C++ paths need to be at the front of the PATH, so the correct (Microsoft) link.exe is used, then cygwin\bin so that the correct (Cygwin) find.exe is used, and finally the rest of your standard Windows PATH.

Next time: Part 3, in which we delve into how mork works, how to extend it, why the morkfile settings begin with $(this)_ and so on.

Categories: Uncategorized Tags: , , , ,

Secrets of the GNU Make Ninjas: Part 1

October 1, 2011 2 comments

The title of this post is a lie, of course. I am not a Ninja in GNU Make. I started learning it last week.

However, in that time I’ve cooked up a tiny framework of makefiles that I’m excited about. To put it another way, I’ve been driven partially insane by the experience – how else could someone get excited about something as old-fashioned as makefiles?

But if you’ve seen typical makefiles before, and then you see my beautiful new* alternative way of using them, I guarantee** you’ll get excited too. You’ll throw away*** your Visual Studio, Eclipse, NetBeans, Ant, Nant, Maven, and so on. They’ll seem so clunky in comparison to the elegance of my makefile framework. Oh yes.

(* Not really new)
(** Not an actual guarantee)
(*** Except for editing code and debugging and all that other stuff)

Note that throughout I’m talking about GNU make, not any of the many other incompatible makes.

The main cause of the excitement is my surprise when I actually started R-ingTFM. Turns out that make is no less than a functional metaprogramming environment.

Like C++ templates, it uses recursively expanded definitions to generate a program. That generation process is “phase one”. Then the generated program actually runs, and has imperative/procedural elements to it: that’s “phase two”. Broadly speaking.

This is extremely powerful. Most of the things that people complain about being impossible in make are in fact perfectly possible, even easy, as long as you treat it as what it is: a functional metaprogramming environment.

Also, by the way, because it’s declarative and functional and so on, it’s amenable to automatic concurrency: make looks at what you’ve told it, and figures out which parts can run in parallel, and takes care of it (on Linux, anyway.)

It’s example time. If you want to play along, start a text editor, save the file as makefile. On Windows you’ll need to get Cygwin or MinGW, on Mac OS X you’ll need to install Xcode. Then you can type snippets in and save them, and from the same directory at the command line, run make. (Note that some of these early examples won’t do anything as we’re just defining variables.)

somevar = a b c d

I’ve created a variable called somevar, and stored a string containing four letters separated by spaces. Space-separated lists are special – as well as being used wholesale as a simple string, there are handy built-in functions that treat them as lists.

This leads to Important Conclusion 1: spaces in filenames are bad. Just don’t put spaces in the names of any directories or source files that you want make to deal with, and you’ll be fine. It probably seems a bit backward at first, but it turns out to be a fine trade-off.

To get the value of a variable:

othervar = before$(somevar)after

Now othervar contains beforea b c dafter because I referred to somevar in parentheses with a dollar sign in front, which tells make to substitute the value of the named variable.

Do those two definitions have to be in a particular order? What if I gave make the othervar definition at the top of the file and then the somevar definition after that? Can you guess?

The answer is that the order doesn’t matter, because of the way make expands variables. My definition of othervar is stored exactly as I wrote it: the dollar sign, the parentheses, the somevar, are not expanded. Only when I actually refer to othervar in some context will make carry out the expansion, and by that stage it will know all the variable definitions.

Hence this is a problem:

a = $(b)
b = $(a)

Torturing make like that will just get you error messages.

You can in fact cause the expansion to happen at the point where you define the variable, by using := instead of plain =. For example, if you wanted to find all the png files in the images directory, you could say (using the built-in wildcard function):

all_pngs := $(wildcard images/*.png)

(Important Conclusion (IC)2: use forward slashes for file paths. On Windows, Cygwin will deal with it.)

The advantage of using := is that the directory gets scanned once and then all_pngs will contain a space-separated list of filenames. If you used plain = then the directory would get scanned every time you referred (directly or indirectly) to all_pngs.
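You can see the difference between = and := with a throwaway makefile, run here from the shell so it’s easy to try. (The file name is a temp file, and .RECIPEPREFIX – available in GNU make 3.82+ – is used instead of a literal tab purely so the snippet survives copy-paste; real makefiles normally just use tabs.)

```shell
mk=$(mktemp)
printf '%s\n' \
  '.RECIPEPREFIX = >' \
  'x = 1' \
  'y_lazy = $(x)' \
  'y_now := $(x)' \
  'x = 2' \
  'all:' \
  '>@echo lazy=$(y_lazy) now=$(y_now)' \
  > "$mk"
make -f "$mk"
# prints: lazy=2 now=1
```

y_now captured the value of x at definition time; y_lazy is re-expanded at the point of use, by which time x has changed.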

And hence, IC3: variables defined with plain = are in fact functions. When you refer to them, you’re really calling them. They’re functions that don’t take any arguments, but they are still functions. They are in a simple sense closures: they close over the whole single variable namespace.

IC4: choose variable names carefully. One big namespace => danger of accidental clashes.

Actually you can pass arguments to variables when you refer to them. There’s a built-in function called call:

square_brackets = [ $(1) ]

example := hello $(call square_brackets,world)

Now example contains hello [ world ] – note how you can get the argument’s value in the definition by referring to its position (one-based).

Back to our list of pngs. I want to get a list of jpgs that would have the same base names as those pngs:

image_basenames = $(basename $(all_pngs))

Note how I don’t have to say “for each thing on the list, do something to it”. The built-in basename is list-aware. That is, it breaks up the input according to spaces, and then operates on each piece, and finally joins them back into a single string with space separators. To finish the example:

jpg_names = $(addsuffix .jpg,$(image_basenames))

Or to do both steps in one line:

jpg_names = $(addsuffix .jpg,$(basename $(all_pngs)))

There’s also a built-in foreach, which is like a general list operator, analogous to Select in Linq, or map in most other functional libraries. If you’re up on your monads, you’ll already have realised something semi-cool.

The core of a monad is the bind operator. For list monads this goes by various names depending on the language: in Linq it’s SelectMany, in Scala it’s flatMap. But it always does the same thing: for each item of the list, create a new list (somehow), and then join all the little lists together into one big list. The input is a list, the output is a list: that makes it composable.

In make the built-in list operators are all automatically monadically composable in that way, because a list is a string and a string is a list. (If the string has no spaces, it’s a list of one item, right?) So if you have a list-of-lists-of strings, then you already have a list-of strings. They’re indistinguishable. I said semi-cool because what if you don’t want the list to be flattened? But mostly you do.
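Here are those list-aware functions doing their thing, as a throwaway makefile driven from the shell. (The png names are invented, and .RECIPEPREFIX – GNU make 3.82+ – replaces the usual tab indentation purely so the snippet survives copy-paste.)

```shell
mk=$(mktemp)
printf '%s\n' \
  '.RECIPEPREFIX = >' \
  'all_pngs = images/cat.png images/dog.png' \
  'jpg_names = $(addsuffix .jpg,$(basename $(all_pngs)))' \
  'doubled = $(foreach x,a b c,$(x)$(x))' \
  'all:' \
  '>@echo $(jpg_names)' \
  '>@echo $(doubled)' \
  > "$mk"
make -f "$mk"
# prints:
#   images/cat.jpg images/dog.jpg
#   aa bb cc
```

Note that the foreach result is itself just a space-separated string, ready to be fed straight into any other list function – that’s the flattening-for-free behaviour described above.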

IC5: This make stuff has all kinds of deep analogical relationships to interesting things!

Everything we’ve done so far is actually useless because it just defines strings (or lists, same thing). With some shell incantations we can run make and print out the final value of a variable:

make -p | grep 'my_variable_name = '

But to actually cause make to do something we have to define a rule. The simplest kind of rule is:

files-to-build : required-files
	shell command
	shell command
	and so on


The shell commands are executed as normal operating system shell programs. The required-files and files-to-build are of course space separated. You can refer to variables anywhere. Note that each shell command has a tab indentation, and it must be a proper tab character, not spaces.

IC6: Make depends on a shell, so for portable makefiles, use UNIX-style shells on all systems (it’s the default on Windows with Cygwin anyway).

IC7: Use an editor that can make tab characters visible in some way.

The commands under a rule will only be executed if any of files-to-build has a timestamp that is earlier than any of required-files (or doesn’t exist at all). If any of required-files doesn’t exist, then make will complain, unless there is another rule in which that missing file appears in the files-to-build list, in which case make will run that other rule first, in the hope that it will cause the missing file to miraculously appear.
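You can watch the timestamp logic in action with a one-rule makefile (a temp directory and invented file names; make -q, the “question” flag, just reports via its exit code whether anything would need rebuilding; .RECIPEPREFIX – GNU make 3.82+ – stands in for the usual tab):

```shell
dir=$(mktemp -d); cd "$dir"
printf '%s\n' \
  '.RECIPEPREFIX = >' \
  'out.txt: in.txt' \
  '>cp in.txt out.txt' \
  > rule.mk
echo hello > in.txt
make -f rule.mk out.txt                          # first run: copies the file
make -q -f rule.mk out.txt && echo up-to-date    # nothing to do now
sleep 1; touch in.txt                            # input now newer than output
make -q -f rule.mk out.txt || echo needs-rebuild
```

The rule only fires when out.txt is missing or older than in.txt – which is the entire contract you’re designing for when you write rules.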

The upshot is that if you put a non-existent filename on the left, and nothing on the right, and then you make sure your shell commands don’t create the non-existent file, then you have something that will run every time you execute make:

run_every_time:
	@echo Hello, world!

When you run make, it will notice that run_every_time doesn’t exist, and go ahead with the commands in an attempt to make it. Note that it is not a terrible crime to fail to generate the target file of a rule, but most rules should be designed to generate their targets. Most problems people get into with make involve trying to use phony rules to do imperative programming.

IC8: This is a tool for generating output files from input files, doing the minimum amount of work each time it is run. Each rule should take in some input files and write out some output files. Avoid writing rules that try to cheat the system; otherwise you’ll eventually get into a mess.

Now, what we’ve seen so far comprises the standard feature set of make as used in most makefiles. But I mentioned metaprogramming before; that’s not possible with these features alone. Consider:

a = b = won't work
$(a)

That is, in variable a I store an expression that would (if it appeared on its own) assign the value won't work to b. And then on the next line I refer to that variable, hoping that this will cause b to be defined. But it’s a syntax error.

However, with another built-in, eval, we can do this:

$(eval $(a))

And that works! The variable b gets defined. So we can generate statements and then get them evaluated. But some things in make are inherently multi-line: rules especially. Can we define multi-line variables and so automate the declaring of rules? You’re darn tootin’ we can:

define my_recipe
cake: flour eggs sugar butter spices
	preheat oven
	mix sugar and butter
	stir in eggs
	stir in flour
	bake
	oh, forgot to add spices
endef

Note that my_recipe is just a variable. And by the way, did I mention that we can pass parameters to these templates, via call, before handing the result to eval? I didn’t? Well, consider it duly mentioned.

This is where we can experience the true joy of metaprogramming, because we’re using the $(syntax) of variable references to generate code that must then go on to declare variable references that will only make sense later on. Some variables should be expanded before eval interprets its input. Others should go through eval unmolested so that they can be expanded later. To arrange this, we have to tell make to ignore some of our variable references by writing two dollars together: $$ is the escape sequence for a dollar. So eval ignores those variable references and “outputs” one dollar as a result, which can then be seen by make when it actually tries to evaluate things later on.
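To make the $$ business concrete, here’s a throwaway rule template instantiated twice via call and eval (invented target and variable names; .RECIPEPREFIX – GNU make 3.82+ – stands in for tabs so the snippet survives copy-paste). Note the $$ inside the template: it survives eval as a single $, so $(msg_...) is only expanded when the recipe actually runs:

```shell
mk=$(mktemp)
printf '%s\n' \
  '.RECIPEPREFIX = >' \
  'define echo_rule' \
  'say-$(1):' \
  '>@echo $$(msg_$(1))' \
  'endef' \
  'msg_alpha = first' \
  'msg_beta = second' \
  '$(eval $(call echo_rule,alpha))' \
  '$(eval $(call echo_rule,beta))' \
  > "$mk"
make -f "$mk" say-alpha say-beta
# prints: first, then second
```

The $(1) references are substituted by call before eval runs; the $$(msg_...) reference is deliberately deferred until the rule fires. Two levels of expansion, one template.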

So, IC9: anything you’d declare in a makefile can be auto-generated within make from a multi-line template stored in a variable. There is never any need to repeat yourself. Say it once, say it loud: I’m metaprogramming and I’m proud.
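To make the escaping concrete, here’s a sketch (the program names are invented) that stamps out one link rule per program. The single-dollar references are expanded when the template is instantiated; the double-dollar ones survive into the generated rule:

```make
PROGS := frobnicate uglify

# $(1) is expanded now, by call; $$(CC), $$@ and $$^ come out of the
# eval as $(CC), $@ and $^, to be expanded only when the rule runs.
define program_rule
$(1): $(1).o common.o
	$$(CC) -o $$@ $$^
endef

$(foreach p,$(PROGS),$(eval $(call program_rule,$(p))))
```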

It can get quite hair-raising when you’re trying to understand snippets of metaprogramming, because two levels are intermingled. It is code talking about code. If you like that kind of thing, you’re a proper nerd like me. Hi!

Anyway, with these tools of functional metaprogramming in our back pocket, we can easily get make to solve a huge range of problems. Yes, easily. And we can make it so it runs fast. Yes, fast!

And yet so many blogs are full of people complaining about how make wouldn’t let them do this or that, or it took ten minutes to compile each file, so they came up with something else entirely. It’s a tragedy. As the LISP programmers say, every non-trivial C program contains an ad hoc implementation of LISP. Well, every new build-directing tool is basically the same as make, except that the author didn’t bother to RTF-make-M, so they didn’t realise they didn’t need to write their own version.

The number one cause of irritation with make is a technique called recursive make. When you have to build five static libraries, three dynamic libraries and four executables, plus some code generation, it’s often suggested that you should have a main makefile that visits each module’s directory and there runs another instance of make. But this turns out to be a very bad idea, as noted in the famous paper Recursive Make Considered Harmful. I tried recursive make last week, and I’m not going to use it ever again. Rejecting anything recursive runs counter to your nerd instincts, of course, because recursion is such an important, primal nerd totem. But this isn’t the same as abandoning recursion altogether! All the techniques described above are powerfully recursive – much more so than the specific technique known as “recursive make”, where you actually launch a separate instance. That technique sucks.
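The non-recursive alternative (sketched here with invented module names) is to keep a single instance of make and pull each module’s rules into it with include, so one dependency graph covers the whole project:

```make
MODULES := libfoo libbar app

# Each module contributes a fragment of rules, written with paths
# relative to the top directory; make sees the whole graph at once.
include $(patsubst %,%/module.mk,$(MODULES))
```

With the eval/call machinery above, each module.mk can be little more than a list of sources and a couple of template instantiations.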

In Part 2: a quick tutorial of my fantasy make framework, where everything is easy and fast. And it’s not just a fantasy!

Categories: Uncategorized Tags:

Specifying how to specify a BNF-like grammar in a BNF-like grammar

July 25, 2011 1 comment

I love this bit of the XML specification, the part where in definitions 47-50, it states how to use * to mean “the preceding can appear zero or more times” and + to mean “the preceding can appear one or more times” and ? to mean “the preceding is optional” and | to mean “either of these may appear”, but it does this using those very same symbols, with the same meanings.

Categories: Uncategorized