Secrets of the GNU Make Ninjas: Part 1
The title of this post is a lie, of course. I am not a Ninja in GNU Make. I started learning it last week.
However, in that time I’ve cooked up a tiny framework of makefiles that I’m excited about. To put it another way, I’ve been driven partially insane by the experience – how else could someone get excited about something as old-fashioned as makefiles?
But if you’ve seen typical makefiles before, and then you see my beautiful new* alternative way of using them, I guarantee** you’ll get excited too. You’ll throw away*** your Visual Studio, Eclipse, NetBeans, Ant, Nant, Maven, and so on. They’ll seem so clunky in comparison to the elegance of my makefile framework. Oh yes.
(* Not really new)
(** Not an actual guarantee)
(*** Except for editing code and debugging and all that other stuff)
Note that throughout I’m talking about GNU make, not any of the many other incompatible makes.
The main cause of the excitement is my surprise when I actually started R-ing TFM. Turns out that make is no less than a functional metaprogramming environment.
Like C++ templates, it uses recursively expanded definitions to generate a program. That generation process is “phase one”. Then the generated program actually runs, and has imperative/procedural elements to it: that’s “phase two”. Broadly speaking.
This is extremely powerful. Most of the things that people complain about being impossible in make are in fact perfectly possible, even easy, as long as you treat it as what it is: a functional metaprogramming environment.
Also, by the way, because it’s declarative and functional and so on, it’s amenable to automatic concurrency: run it with the -j option and make looks at what you’ve told it, figures out which parts can run in parallel, and takes care of it (on Linux, anyway).
It’s example time. If you want to play along, start a text editor, save the file as makefile. On Windows you’ll need to get Cygwin or MinGW, on Mac OS X you’ll need to install Xcode. Then you can type snippets in and save them, and from the same directory at the command line, run make. (Note that some of these early examples won’t do anything as we’re just defining variables.)
somevar = a b c d
I’ve created a variable called somevar, and stored a string containing four letters separated by spaces. Space-separated lists are special – as well as being used wholesale as a simple string, there are handy built-in functions that treat them as lists.
This leads to Important Conclusion 1: spaces in filenames are bad. Just don’t put spaces in the names of any directories or source files that you want make to deal with, and you’ll be fine. It probably seems a bit backward at first, but it turns out to be a fine trade-off.
To get the value of a variable:
othervar = before$(somevar)after
Now othervar contains beforea b c dafter because I referred to somevar in parentheses with a dollar sign in front, which tells make to substitute the value of the named variable.
Do those two definitions have to be in a particular order? What if I gave make the othervar definition at the top of the file and then the somevar definition after that? Can you guess?
No, the order doesn’t matter, and the reason is the way make expands variables. My definition of othervar is stored exactly as I wrote it: the dollar sign, the parentheses and the somevar are not expanded. Only when I actually refer to othervar in some context will make carry out the expansion, and by that stage it will know all the variable definitions.
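To see the deferred expansion for yourself, here is a minimal sketch that puts the two definitions in the "wrong" order. The $(info ...) built-in (it prints its argument while make reads the file) and the do-nothing all target are my additions, not part of the original example.

```make
# othervar is defined BEFORE somevar exists; nothing is expanded yet.
othervar = before$(somevar)after
somevar = a b c d

# The reference below is what finally triggers the expansion.
# Prints: beforea b c dafter
$(info $(othervar))

# A do-nothing default target so make has something to "build".
all: ; @:
```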
Hence this is a problem:
a = $(b)
b = $(a)
Torturing make like that will just get you error messages.
You can in fact cause the expansion to happen at the point where you define the variable, by using := instead of plain =. For example, if you wanted to find all the png files in the images directory, you could say (using the built-in wildcard function):
all_pngs := $(wildcard images/*.png)
(Important Conclusion (IC)2: use forward slashes for file paths. On Windows, Cygwin will deal with it.)
The advantage of using := is that the directory gets scanned once and then all_pngs will contain a space-separated list of filenames. If you used plain = then the directory would get scanned every time you referred (directly or indirectly) to all_pngs.
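The difference between the two kinds of assignment is easiest to see when the referenced variable changes afterwards. A small sketch, with variable names (x, lazy, eager) that are purely illustrative:

```make
x = one

# Plain = stores the text "$(x)" and expands it on every reference.
lazy = $(x)

# := expands right now, so eager holds the string "one" from here on.
eager := $(x)

x = two

# Prints: lazy=two eager=one
$(info lazy=$(lazy) eager=$(eager))

# A do-nothing default target so make has something to "build".
all: ; @:
```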
And hence, IC3: variables defined with plain = are in fact functions. When you refer to them, you’re really calling them. They’re functions that don’t take any arguments, but they are still functions. They are in a simple sense closures: they close over the whole single variable namespace.
IC4: choose variable names carefully. One big namespace => danger of accidental clashes.
Actually you can pass arguments to variables when you refer to them. There’s a built-in function called call:
square_brackets = [ $(1) ]
example := hello $(call square_brackets,world)
Now example contains hello [ world ] – note how you can get the argument’s value in the definition by referring to its position (one-based).
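The same mechanism works with more than one argument. In this sketch, wrap is a hypothetical helper of my own: $(1) is the text, $(2) the opening delimiter and $(3) the closing one.

```make
# Arguments are separated by commas in the call and numbered from 1.
wrap = $(2) $(1) $(3)

example := hello $(call wrap,world,<,>)

# Prints: hello < world >
$(info $(example))

# A do-nothing default target so make has something to "build".
all: ; @:
```

An argument that the caller leaves out simply expands to nothing, so it pays to keep an eye on the argument count.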
Back to our list of pngs. I want to get a list of jpgs that would have the same base names as those pngs:
image_basenames = $(basename $(all_pngs))
Note how I don’t have to say “for each thing on the list, do something to it”. The built-in basename is list-aware. That is, it breaks up the input according to spaces, and then operates on each piece, and finally joins them back into a single string with space separators. To finish the example:
jpg_names = $(addsuffix .jpg,$(image_basenames))
Or to do both steps in one line:
jpg_names = $(addsuffix .jpg,$(basename $(all_pngs)))
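Here is the whole pipeline as a sketch you can run without an images directory. I have swapped the wildcard call for a literal list of two made-up filenames so the output is predictable; the last line uses patsubst, another list-aware built-in that does the same job in one step.

```make
# Hypothetical file list standing in for $(wildcard images/*.png).
all_pngs := images/cat.png images/dog.png

image_basenames := $(basename $(all_pngs))
jpg_names := $(addsuffix .jpg,$(image_basenames))

# Prints: images/cat images/dog
$(info $(image_basenames))
# Prints: images/cat.jpg images/dog.jpg
$(info $(jpg_names))
# Prints: images/cat.jpg images/dog.jpg
$(info $(patsubst %.png,%.jpg,$(all_pngs)))

# A do-nothing default target so make has something to "build".
all: ; @:
```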
There’s also a built-in foreach, which is like a general list operator, analogous to Select in Linq, or map in most other functional libraries. If you’re up on your monads, you’ll already have realised something semi-cool.
The core of a monad is the bind operator. For list monads this goes by various names depending on the language: in Linq it’s SelectMany, in Scala it’s flatMap. But it always does the same thing: for each item of the list, create a new list (somehow), and then join all the little lists together into one big list. The input is a list, the output is a list: that makes it composable.
In make the built-in list operators are all automatically monadically composable in that way, because a list is a string and a string is a list. (If the string has no spaces, it’s a list of one item, right?) So if you have a list-of-lists-of strings, then you already have a list-of strings. They’re indistinguishable. I said semi-cool because what if you don’t want the list to be flattened? But mostly you do.
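A sketch of that flattening in action, using two nested foreach calls. The directory and extension names are made up; the point is that the inner call produces a little list for each directory and the outer call hands back one flat list.

```make
# Hypothetical inputs.
dirs := src test
exts := c h

# For each directory, build a list of patterns, one per extension.
# The per-directory lists are joined with spaces, so the result is flat.
patterns := $(foreach d,$(dirs),$(foreach e,$(exts),$(d)/*.$(e)))

# Prints: src/*.c src/*.h test/*.c test/*.h
$(info $(patterns))

# A do-nothing default target so make has something to "build".
all: ; @:
```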
IC5: This make stuff has all kinds of deep analogical relationships to interesting things!
Everything we’ve done so far is actually useless because it just defines strings (or lists, same thing). With some shell incantations we can run make and print out the final value of a variable:
make -p | grep 'my_variable_name = '
But to actually cause make to do something we have to define a rule. The simplest kind of rule is:
files-to-build : required-files
	shell command
	shell command
	and so on
The shell commands are executed as normal operating system shell programs. The required-files and files-to-build are of course space separated. You can refer to variables anywhere. Note that each shell command has a tab indentation, and it must be a proper tab character, not spaces.
IC6: Make depends on a shell, so for portable makefiles, use UNIX-style shells on all systems (it’s the default on Windows with Cygwin anyway).
IC7: Use an editor that can make tab characters visible in some way.
The commands under a rule will only be executed if any of files-to-build has a timestamp that is earlier than any of required-files (or doesn’t exist at all). If any of required-files doesn’t exist, then make will complain, unless there is another rule in which that missing file appears in the files-to-build list, in which case make will run that other rule first, in the hope that it will cause the missing file to miraculously appear.
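A small sketch of two chained rules may make the timestamp logic concrete. All three file names are hypothetical, and the recipe lines must start with a real tab character.

```make
# The first rule in the file is the default goal.
# report.txt needs sorted.txt, so make looks for a rule that builds it.
report.txt: sorted.txt
	tr a-z A-Z < sorted.txt > report.txt

# sorted.txt is rebuilt only when it is missing or older than notes.txt.
sorted.txt: notes.txt
	sort notes.txt > sorted.txt
```

With a notes.txt in the directory, the first run of make executes sort and then tr. A second run does no work, because both outputs are newer than their inputs. Touch notes.txt and both commands run again.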
The upshot is that if you put a non-existent filename on the left, and nothing on the right, and then you make sure your shell commands don’t create the non-existent file, then you have something that will run every time you execute make:
run_every_time:
	@echo Hello, world!
When you run make, it will notice that run_every_time doesn’t exist, and go ahead with the commands in an attempt to make it. Note that it is not a terrible crime to fail to generate the target file of a rule, but most rules should be designed to do that. Most problems people get into with make involve trying to use phony rules to do imperative programming.
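One caveat with this trick: if a file called run_every_time ever does appear in the directory, make will decide the target is up to date and stop running the commands. GNU make has a special target, .PHONY, for declaring that a name is not a real file. A minimal sketch:

```make
# Tell make that run_every_time is a label, not a file to be checked.
.PHONY: run_every_time

run_every_time:
	@echo Hello, world!
```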
IC8: This is a tool for generating output files from input files, doing the minimum amount of work each time it is run. Each rule should take in some input files and write out some output files. Avoid writing rules that try to cheat the system; otherwise you’ll eventually get into a mess.
Now, what we’ve seen so far comprises the standard feature set of make as used in most makefiles. But I mentioned metaprogramming before; that’s not possible with these features alone. Consider:
a = b = won't work
$(a)
That is, in variable a I store an expression that would (if it appeared on its own) assign the value won't work to b. And then on the next line I refer to that variable, hoping that this will cause b to be defined. But it’s a syntax error.
However, with another built-in, eval, we can do this:
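A minimal sketch of that eval call, assuming the same variable a as in the failing example. The value text and the $(info ...) line are mine, added only to show the result.

```make
# a holds the text of an assignment; on its own it defines nothing.
a = b = now it works

# eval expands its argument and then parses the result as makefile syntax,
# so this line behaves as if "b = now it works" had been written here.
$(eval $(a))

# Prints: b is 'now it works'
$(info b is '$(b)')

# A do-nothing default target so make has something to "build".
all: ; @:
```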
And that works! The variable b gets defined. So we can generate statements and then get them evaluated. But some things in make are inherently multi-line: rules especially. Can we define multi-line variables and so automate the declaring of rules? You’re darn tootin’ we can:
define my_recipe
cake: flour eggs sugar butter spices
	preheat oven
	mix sugar and butter
	stir in eggs
	stir in flour
	bake
	oh, forgot to add spices
endef
my_recipe is just a variable. And by the way, did I mention that we can pass parameters to eval? I didn’t? Well, consider it duly mentioned.
This is where we can experience the true joy of metaprogramming, because we’re using the $(syntax) of variable references to generate code that must then go on to declare variable references that will only make sense later on. Some variables should be expanded before eval interprets its input. Others should go through eval unmolested so that they can be expanded later. To arrange this, we have to tell make to ignore some of our variable references by writing two dollars together: $$ is the escape sequence for a dollar. So eval ignores those variable references and “outputs” one dollar as a result, which can then be seen by make when it actually tries to evaluate things later on.
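Here is a sketch of how the pieces fit together: define for the template, call to fill in the parameter, eval to turn the result into real rules, and foreach to do it once per program. Every name in it (programs, hello, goodbye, the _sources and _objs variables, program_template) is hypothetical. Single-dollar references such as $(1) are filled in by call; double-dollar references survive as ordinary references for eval to parse.

```make
# Hypothetical programs and their source files.
programs := hello goodbye
hello_sources := hello.c util.c
goodbye_sources := goodbye.c util.c

# Default goal: build every program.
all: $(programs)

# $(1) is the program name. $$ becomes a single $ after call has run.
define program_template
$(1)_objs := $$(addsuffix .o,$$(basename $$($(1)_sources)))
$(1): $$($(1)_objs)
	$$(CC) -o $(1) $$($(1)_objs)
endef

# Stamp out one variable and one rule per program.
$(foreach p,$(programs),$(eval $(call program_template,$(p))))

# Prints: hello_objs = hello.o util.o
$(info hello_objs = $(hello_objs))
```

The $(info ...) line runs while the makefile is being read, so it prints even if the source files are missing. Actually building the programs needs those .c files to exist, and relies on make’s built-in rule for compiling a .c file into a .o file.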
So, IC9: anything you’d declare in a makefile can be auto-generated within make from a multi-line template stored in a variable. There is never any need to repeat yourself. Say it once, say it loud: I’m metaprogramming and I’m proud.
It can get quite hair-raising when you’re trying to understand snippets of metaprogramming, because two levels are intermingled. It is code talking about code. If you like that kind of thing, you’re a proper nerd like me. Hi!
Anyway, with these tools of functional metaprogramming in our back pocket, we can easily get make to solve a huge range of problems. Yes, easily. And we can make it so it runs fast. Yes, fast!
And yet so many blogs are full of people complaining about how make wouldn’t let them do this or that, or it took ten minutes to compile each file, so they came up with something else entirely. It’s a tragedy. Like the LISP programmers would say, every non-trivial C program contains an ad hoc implementation of LISP. Well, every new build-directing tool is basically the same as make, except that the author didn’t bother to RTF-make-M, so they didn’t realise they didn’t need to write their own version.
The number one cause of irritations with make is a technique called recursive make. When you have to build five static libraries, three dynamic libraries and four executables, plus some code generation, it’s often suggested that you should have a main makefile that visits each module’s directory and there runs another instance of make. But this turns out to be a very bad idea, as noted in this famous paper. I tried recursive make last week, and I’m not going to use it ever again. It runs counter to your nerd instincts to reject anything recursive, of course, because recursion is such an important, primal nerd totem. But this isn’t the same as abandoning recursion altogether! All the techniques described above are powerfully recursive – much more so than the specific technique known as “recursive make”, where you actually launch a separate instance. That technique sucks.
In Part 2: a quick tutorial of my fantasy make framework, where everything is easy and fast. And it’s not just a fantasy!