How to Build Racket on Windows 7

Here are the steps I followed to build Racket from a git checkout on a fresh Windows 7 installation.

  • Installed Visual Studio Express 2008 (not 2010 or 2012). This is hard to find on Microsoft's website; this Stack Overflow question linked to the Visual Studio Express 2008 ISO directly.

  • Used Virtual Clone Drive from SlySoft to mount the ISO in order to install Visual Studio, since Microsoft thoughtfully omitted ISO-mounting functionality from the core operating system.

  • Installed MASM32 to get a working assembler, since Microsoft thoughtfully omitted an assembler from their core compiler suite. (I am informed that later and/or non-Express editions of Visual Studio do actually include an assembler.)

  • Added the following directories to the system path:

    • C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin, for cl.exe etc.
    • C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE, for VCExpress.exe
    • C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\Tools, for vsvars32.bat
    • C:\masm32\bin, for ml.exe

    They do have to appear in that order, in particular with the MASM directory last, since it includes a link.exe that will otherwise conflict with the Visual Studio linker.
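If you want to double-check the ordering before starting a long build, here is a minimal sketch in Python. It assumes the install locations listed above; the function names are hypothetical. (In cmd.exe, `where link` also lists the matching link.exe files in search order.)

```python
import os

# Directory names taken from the list above; adjust if your installation differs.
VC_BIN = r"C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin"
MASM_BIN = r"C:\masm32\bin"


def normalize(entry: str) -> str:
    """Case-fold and drop a trailing backslash, since Windows paths are case-insensitive."""
    return entry.strip().rstrip("\\").lower()


def vs_linker_wins(path_value: str) -> bool:
    """True if VC\\bin is searched before masm32\\bin (or masm32 is absent)."""
    entries = [normalize(e) for e in path_value.split(";") if e.strip()]
    vc, masm = normalize(VC_BIN), normalize(MASM_BIN)
    if vc not in entries:
        return False  # cl.exe and the Visual Studio link.exe would not be found at all
    if masm not in entries:
        return True
    return entries.index(vc) < entries.index(masm)


if __name__ == "__main__":
    print("OK" if vs_linker_wins(os.environ.get("PATH", "")) else "Fix PATH order")
```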

  • Installed GitHub for Windows and checked out Racket.

  • Opened a cmd.exe shell, NOT a PowerShell instance. Environment variables propagate differently in PowerShell: running a .bat file from PowerShell starts a child cmd.exe, so the variables vsvars32.bat sets are lost as soon as it exits. Use cmd.exe, and only cmd.exe, to build Racket.

  • Ran vsvars32.bat in that shell. This step is important, as otherwise the start of the xform stage in the build will fail to find stdio.h.

  • Navigated to my Racket checkout within that shell, and from there to src/worksp. Ran build.bat.

Following these steps will in principle lead to a fresh Racket.exe living in the root directory of your Racket checkout. There are many, many things that could go wrong however. Good luck!

Crude benchmarks of NaCl and scrypt in the browser

As I just wrote, I've ported libraries for cryptography (js-nacl) and password-based key derivation (js-scrypt) to JavaScript.

Some browsers are faster at running these cryptographic routines than others. The results below are from casual (nay, unscientific!) speed measurements in the browsers I had handy on my machine.

The setup:

  • Chrome 26.0.1410.43
  • Safari 5.1.8 (6534.58.2)
  • Aurora 21.0a2 (2013-03-30)
  • Firefox 19.0.2
  • MacBook Air late 2010 (3,1), 1.6 GHz Core 2 Duo, 4 GB RAM, OS X 10.6.8

I had to exclude Firefox from the nacl tests, since it lacks window.crypto.getRandomValues.
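The measurements here are simple operations-per-second counts. The harness itself isn't shown, so as an illustration only, here is a minimal sketch of that kind of crude measurement: repeat an operation until a fixed time budget has elapsed, then divide. It is written in Python against hashlib's SHA-512 rather than the in-browser JavaScript, and `ops_per_sec` is a hypothetical name.

```python
import hashlib
import time


def ops_per_sec(op, min_seconds=0.2):
    """Call op() repeatedly for at least min_seconds; return calls per second."""
    count = 0
    start = time.perf_counter()
    while True:
        op()
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return count / elapsed


message = b"The quick brown fox jumps over the lazy dog"
rate = ops_per_sec(lambda: hashlib.sha512(message).digest())
print(f"SHA-512: {rate:,.0f} hashes/sec")
```

A single short run like this is sensitive to JIT warm-up, garbage collection and whatever else the machine is doing, which is part of why numbers gathered this way deserve the "unscientific" label.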

Hashing strings/bytes with SHA-512

Here we see Safari has the edge. Aurora is oddly slow.

Hash operations (per sec)

Computing random nonces

This is a thin wrapper over window.crypto.getRandomValues. Safari wins hands-down here. I wonder how good the generated randomness is?

Random nonce generation (per sec)

Authenticated encryption using a shared key

These are Salsa20/Poly1305 authenticated encryptions using a precomputed shared key. Broadly speaking, boxing was quicker than unboxing. The browsers perform roughly equally here.

Secret-key operations (per sec)

Computing a shared key from public/private keys

These are operations whose runtime is dominated by a Curve25519 scalar multiplication. In three of the four cases, the operation is used to compute a Diffie-Hellman shared key from a public key and a secret key; in the remaining case (crypto_box_keypair_from_seed) it is used to compute a public key from a secret key. Chrome is significantly faster than the other browsers here.

Shared-key computations (per sec)

scrypt Key Derivation Function

Here, Safari is the only browser that underperforms significantly. The other three all compute an scrypt-derived key in 2–4 seconds, using defaults suggested by the scrypt paper as being suitable for interactive login.
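For a native point of comparison, here is a sketch of a single scrypt derivation using Python's hashlib.scrypt (available when Python is built against OpenSSL 1.1 or later). It assumes the scrypt paper's interactive-login parameters are N = 2^14, r = 8, p = 1, and that these match what was measured in the browser; the post doesn't spell the parameters out. `derive_key` is a hypothetical helper.

```python
import hashlib
import os

# Interactive-login parameters from the scrypt paper (assumed to match the
# browser measurements): N = 2**14, r = 8, p = 1.
N, R, P = 2 ** 14, 8, 1


def derive_key(password: bytes, salt: bytes, dklen: int = 32) -> bytes:
    """Derive a key with scrypt; needs roughly 128 * r * N = 16 MiB of memory."""
    return hashlib.scrypt(password, salt=salt, n=N, r=R, p=P, dklen=dklen)


salt = os.urandom(16)
key = derive_key(b"correct horse battery staple", salt)
print(key.hex())
```

The memory-hard inner loop is exactly what makes scrypt expensive in JavaScript: every call touches a 16 MiB buffer many times over.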

scrypt() calls per second

Conclusions

scrypt is slow. Precompute Diffie-Hellman shared keys where you can.

NaCl and scrypt in the Browser (and node.js)

I've produced Emscripten-compiled variants of both NaCl, a cryptographic library, and scrypt, a password-based key derivation function.

  • js-nacl (documentation) includes support both for the browser and for node.js.

  • js-scrypt (documentation) supports just the browser, since there are plenty of existing, faster alternatives for scrypt for node.js.

I'm looking forward to exploring some of the possible applications of combining the two libraries!

One important missing piece is certificates; for this, dusting off SPKI might prove interesting.

A calling convention for ARM that supports proper tail-calls efficiently

Because proper tail calls are necessary for object-oriented languages, we can't quite use the standard calling conventions unmodified when compiling OO languages efficiently to ARM architectures.

Here's one approach to a non-standard, efficient, tail-call-supporting calling convention that I've been exploring recently.

The big change from the standard is that we do not move the stack pointer down over outbound arguments when we make a call.

Instead, the callee moves the stack pointer as they see fit. The reason for this is so that the callee can tail-call someone else without having to do any hairy adjusting of the frame, and so that the original caller doesn't have to know anything about what's left to clean up when they receive control: all the clean-up has already been completed.

This bears stating again: just after return from a subroutine, all clean-up has already been completed.

In the official standard, the stack space used to communicate arguments to a callee is owned by the caller. In this modified convention, that space is owned by the callee as soon as control is transferred.

Other aspects of the convention are similar to the AAPCS standard:

  • keep the stack Full Descending, just like the standard.
  • ensure it is 8-byte aligned at all times, just like (a slight restriction of) the standard.
  • make outbound arguments leftmost-low in memory, that is, "pushed from right to left". This makes the convention compatible with naive C struct overlaying of memory.
  • furthermore, ensure argument 0 in memory is also 8-byte aligned.

Details of the stack layout

Consider compiling a single subroutine, either a leaf or a non-leaf routine. We need to allocate stack space to incoming arguments, to saved temporaries, to outbound arguments, and to padding so we maintain the correct stack alignment. Let

  • Ni = inward-arg-count, the number of arguments the routine expects
  • No = most-tail-args, the largest number of outbound tail-call arguments the routine produces
  • Nt = inward-temp-count, the number of temps the routine requires
  • Na = outward-arg-count, the number of arguments supplied in a particular call the routine makes to some other routine

Upon entry to the routine, where Ni=5, No=7, Nt=3, Na=3, we have the following stack layout. Recall that stacks are full-descending.

(low)                                                               (high)
    | outbound  |   |   temps   |   |shuffle|      inbound      |
    | 0 | 1 | 2 |---| 0 | 1 | 2 |---| - | - | 0 | 1 | 2 | 3 | 4 |---|
                    ^                                               ^
                  sp for non-leaf                                sp for leaf

I've marked two interesting locations in the stack: the position of the stack pointer for leaf routines, and the position of the stack pointer for non-leaf routines, which need some space of their own to store their internal state at times when they delegate to another routine. Leaf routines simply leave the stack pointer in place as they start execution; non-leaf routines adjust the stack pointer themselves as control arrives from their caller.

Note that the first four arguments are transferred in registers, but that stack slots still need to be reserved for them. Note also the padding after the outbound arguments, the temps, and the inbound/shuffle-space.

The shuffle-space is used to move values around during preparation for a tail call whenever the routine needs to supply more arguments to the tail-called routine than it received in turn from its caller.

The extra shuffle slots are only required if there's no room in the inbound slots plus padding. For example, if Ni=5 and No=6, then since we expect the inbound arguments to have one slot of padding, that slot can be used as shuffle space.
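The region sizes in the diagram follow mechanically from Ni, No, Nt and Na. Here is a minimal sketch in Python, assuming 4-byte slots and assuming the inbound/shuffle region is sized by the larger of Ni and No (so that it still holds every inbound argument when a routine tail-calls with fewer arguments than it received). `frame_layout` is a hypothetical name.

```python
SLOT = 4  # bytes per argument/temp slot on 32-bit ARM


def pad8(x: int) -> int:
    """Round x up to the nearest multiple of 8."""
    return (x + 7) & ~7


def frame_layout(ni: int, no: int, nt: int, na: int) -> dict:
    """Sizes of the regions in the stack diagram, from low to high addresses."""
    inbound_area = pad8(ni * SLOT)
    # The inbound/shuffle region must hold whichever is larger: the Ni
    # incoming arguments or the No outgoing tail-call arguments.
    arg_area = pad8(max(ni, no) * SLOT)
    return {
        "outbound_bytes": pad8(na * SLOT),
        "temp_bytes": pad8(nt * SLOT),
        "shuffle_slots": (arg_area - inbound_area) // SLOT,
        "inbound_bytes": inbound_area,
    }


print(frame_layout(ni=5, no=7, nt=3, na=3))
# {'outbound_bytes': 16, 'temp_bytes': 16, 'shuffle_slots': 2, 'inbound_bytes': 24}
print(frame_layout(ni=5, no=6, nt=3, na=3)["shuffle_slots"])
# 0: the padding slot after the inbound arguments is enough
```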

Addressing calculations

Leaf procedures do not move the stack pointer on entry. Nonleaf procedures do move the stack pointer on entry. This means we have different addressing calculations depending on whether we're a leaf or nonleaf procedure.

  • Pad8(x) = x rounded up to the nearest multiple of 8.
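  • sp_delta = Pad8(max(Ni, No) * 4) + Pad8(Nt * 4), the distance SP might move on entry and exit. Taking the larger of Ni and No ensures the argument area still holds every inbound argument when the routine tail-calls with fewer arguments than it received.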

Leaf procedures, where the stack pointer does not move on entry to the routine:

inward(n) = rn, if n < 4
          | sp - Pad8(Ni * 4) + (n * 4)
temp(n) = sp - sp_delta + (n * 4)
outward(n) (tail calls only) = rn, if n < 4
                             | sp - Pad8(Na * 4) + (n * 4)

Nonleaf procedures, where the stack pointer moves down by sp_delta bytes on entry to the routine:

inward(n) = rn, if n < 4
          | sp + sp_delta - Pad8(Ni * 4) + (n * 4)
temp(n) = sp + (n * 4)
outward(n) (non-tail calls) = rn, if n < 4
                            | sp - Pad8(Na * 4) + (n * 4)
outward(n) (tail calls) = rn, if n < 4
                        | sp + sp_delta - Pad8(Na * 4) + (n * 4)
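These formulas translate directly into code. Here is a sketch in Python that returns SP-relative byte offsets, again assuming 4-byte slots and an argument area sized by the larger of Ni and No. The function names mirror the notation above but are otherwise hypothetical; for n < 4 the value actually travels in rn, but the slot offset is still computed because the slot is reserved.

```python
def pad8(x: int) -> int:
    """Round x up to the nearest multiple of 8."""
    return (x + 7) & ~7


def sp_delta(ni, no, nt):
    """Bytes a non-leaf routine moves SP down by on entry."""
    return pad8(max(ni, no) * 4) + pad8(nt * 4)


def inward(n, *, ni, no, nt, leaf):
    """Offset of the slot reserved for inbound argument n."""
    base = 0 if leaf else sp_delta(ni, no, nt)
    return base - pad8(ni * 4) + n * 4


def temp(n, *, ni, no, nt, leaf):
    """Offset of temporary n."""
    base = -sp_delta(ni, no, nt) if leaf else 0
    return base + n * 4


def outward(n, *, na, ni, no, nt, leaf, tail):
    """Offset of the slot for outbound argument n in a call passing na arguments."""
    if leaf and not tail:
        raise ValueError("leaf routines make no non-tail calls")
    base = sp_delta(ni, no, nt) if (tail and not leaf) else 0
    return base - pad8(na * 4) + n * 4


def location(n, offset):
    """Arguments 0-3 travel in r0-r3; later ones live at [sp, #offset]."""
    return f"r{n}" if n < 4 else f"[sp, #{offset}]"


frame = dict(ni=5, no=7, nt=3)
print(sp_delta(**frame))                                # 48
print(location(4, inward(4, leaf=True, **frame)))       # [sp, #-8]
print(location(4, inward(4, leaf=False, **frame)))      # [sp, #40]
print(outward(0, na=7, leaf=False, tail=True, **frame)) # 16, the first shuffle slot
```

The last line shows why the shuffle space exists: a non-leaf routine tail-calling with seven arguments writes argument 0 into the lowest shuffle slot, just below its own inbound arguments.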

Variations

This convention doesn't easily support varargs. One option would be to sacrifice simple C struct overlaying of the inbound argument stack area, flipping arguments so they are pushed from left to right instead of from right to left. That way, the first argument is always at a known location.

Another option would be to use an argument count at the end of the argument list in the varargs case. This requires both the caller and callee to be aware that a varargs-specific convention is being used.

Of course, varargs may not even be required: instead, a vector could be passed in as a normal argument. Whether this makes sense or not depends on the language being compiled.

Successors to "Enterprise Integration Patterns"?

"Enterprise Integration Patterns", by Gregor Hohpe and Bobby Woolf, has been a classic go-to volume for a lot of people working with distributed systems over the years.

It was published back in 2003, though, before things like AMQP, ZeroMQ, WebSockets and Twitter.

Is there anything that could be considered an update on the book? Something that covers modern integration scenarios. Perhaps something that touches not only on the newer messaging technologies but also on NoSQL, improvements to the browser environment, and so on.

What are people reading to get a common vocabulary for all this stuff and to get their heads around how the pieces fit together?

Mac OS X gripes

I've been using a Mac as my personal computer since late 2003. It's been fine for all of that time, but recent releases of the software are starting to make me want to go back to Debian or Ubuntu, warts and all.

  • Every time I restart my machine, it forgets my trackpad settings.

  • About one time in ten that I wake my machine from sleep, it shows the beachball forever and never recovers.

Combine that with the lack of a compiler shipped with the machine by default, and Debian is starting to look downright attractive again.

'bad_vertex' errors while developing and testing RabbitMQ plugins

Today I have been doing maintenance work on an old RabbitMQ plugin of mine. Part of this work was updating its Makefiles to work with the latest RabbitMQ build system.

The problem and symptom

After getting it to compile and trying to run it, I started seeing errors like this:

Error: {'EXIT',
       {{badmatch,
        {error,
        {edge,{bad_vertex,mochiweb},rabbitmq_mochiweb,mochiweb}}},
    [{rabbit_plugins,dependencies,3,
         [{file,"src/rabbit_plugins.erl"},{line,100}]},
     {rabbit_plugins_main,format_plugins,4,
         [{file,"src/rabbit_plugins_main.erl"},{line,184}]},
     {rabbit_plugins_main,start,0,
         [{file,"src/rabbit_plugins_main.erl"},{line,70}]},
     {init,start_it,1,[]},
     {init,start_em,1,[]}]}}

These came not just from rabbitmq-plugins: I saw a similar error when starting the RabbitMQ server itself.

The reason turned out to be simple: I had symbolically linked the rabbitmq-mochiweb and mochiweb-wrapper directories into my plugins directory, as per the manual, but what the manual didn't say was that this works for all plugins except the -wrapper plugins (and the Erlang "client" plugin, rabbitmq-erlang-client a.k.a. amqp_client.ez).
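The shape of the error makes sense once you picture the plugin dependency graph. Erlang's digraph module rejects an edge whose endpoint is not already a vertex, reporting it as {bad_vertex, V}. Here is a toy model in Python, not RabbitMQ's actual code, assuming plugin discovery adds one vertex per plugin it finds and then one edge per dependency; the names are hypothetical.

```python
class BadVertex(Exception):
    """Raised when a dependency edge points at a plugin that was never found."""


def build_plugin_graph(plugins):
    """plugins maps each discovered plugin name to the names it depends on.

    Vertices are added first; an edge to a name that is not a vertex is
    rejected rather than silently creating the missing vertex.
    """
    vertices = set(plugins)
    edges = []
    for name, deps in plugins.items():
        for dep in deps:
            if dep not in vertices:
                raise BadVertex(("edge", ("bad_vertex", dep), name, dep))
            edges.append((name, dep))
    return edges


# Symlinked wrapper directory: mochiweb is never discovered as a plugin.
broken = {"rabbitmq_mochiweb": ["mochiweb"]}
# mochiweb-*.ez copied into plugins/: now it is a vertex too.
fixed = {"rabbitmq_mochiweb": ["mochiweb"], "mochiweb": []}

try:
    build_plugin_graph(broken)
except BadVertex as err:
    print("Error:", err.args[0])
print(build_plugin_graph(fixed))
```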

The solution

Symlink all the plugins except the wrapper plugins and amqp_client.ez.

The wrapper plugin *.ez files and amqp_client.ez need to be present in the plugins directory itself. So instead of the instructions given, try the following steps:

$ mkdir -p rabbitmq-server/plugins
$ cd rabbitmq-server/plugins
$ ln -s ../../rabbitmq-mochiweb
$ cp rabbitmq-mochiweb/dist/mochiweb*ez .
$ cp rabbitmq-mochiweb/dist/webmachine*ez .
$ ../scripts/rabbitmq-plugins enable rabbitmq_mochiweb

A working configuration for me has the following contents of the plugins directory:

total 816
-rw-r--r--   1 tonyg  staff  260123 Sep 17 18:36 mochiweb-2.3.1-rmq0.0.0-gitd541e9a.ez
lrwxr-xr-x   1 tonyg  staff      23 Sep 17 17:59 rabbitmq-mochiweb -> ../../rabbitmq-mochiweb
-rw-r--r--   1 tonyg  staff  149142 Sep 17 18:37 webmachine-1.9.1-rmq0.0.0-git52e62bc.ez

Co-optional arguments?

Racket and many other languages have the concept of optional arguments at the receiver side:

(define (a-procedure mandatory-arg #:foo [foo 123])
  ;; Use mandatory-arg and foo as normal
  ...)

(a-procedure mandatory-value #:foo 234) ;; override foo
(a-procedure mandatory-value) ;; leave foo alone

Here, the procedure is offering its caller the option of supplying a value for foo, and if the caller does not do so, uses 123 as the value for foo. The caller is permitted not to care (or even to know!) about the #:foo option.

Almost no language that I know of has the symmetric case: an optional argument at the caller side (and because the feature doesn't exist, I'm having to invent syntax to show what I mean):

(define (run-callback callback)
  (callback mandatory-value [#:foo 234]))

(run-callback (lambda (mandatory-arg) ...)) ;; ignores the extra value
(run-callback (lambda (mandatory-arg #:foo [foo 123]) ...)) ;; uses it

The intent here is that the procedure is permitted not to care (or even to know) about the additional value its caller supplies.

This would be useful for callbacks and other situations where the code invoking a procedure has potentially-useful information that the specific (and statically-unknown) procedure being called may or may not care about.

I said above that almost no languages have this feature: it turns out that Squeak and Pharo have something similar in their BlockClosure>>#cull: methods. The following code works when given a closure accepting zero, one, two or three arguments:

someBlock cull: 1 cull: 2 cull: 3

(Update: @msimoni points out that Common Lisp's &allow-other-keys is similar, too. It still requires the callee to declare something extra, though.)
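Languages with runtime signature reflection can approximate this from the caller's side. Here is a sketch in Python using inspect.signature: `cull` mimics the Smalltalk positional version, and `call_offering` mimics the invented `[#:foo 234]` syntax by passing an offered keyword only if the callee declares a keyword-only parameter of that name. Both helpers, and `run_callback`, are hypothetical; this is reflection on the callee, not a language feature.

```python
import inspect

POSITIONAL = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)


def cull(fn, *args):
    """Like Smalltalk's cull:cull:..., pass fn only as many leading args as it accepts."""
    params = list(inspect.signature(fn).parameters.values())
    if any(p.kind is inspect.Parameter.VAR_POSITIONAL for p in params):
        return fn(*args)
    return fn(*args[: sum(p.kind in POSITIONAL for p in params)])


def call_offering(fn, *args, **offered):
    """Call fn with args, plus only those offered keywords that fn declares."""
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **offered)
    accepted = {
        k: v for k, v in offered.items()
        if k in params and params[k].kind is inspect.Parameter.KEYWORD_ONLY
    }
    return fn(*args, **accepted)


def run_callback(callback):
    # Offer foo=234; callbacks that don't declare foo never see it.
    return call_offering(callback, "mandatory-value", foo=234)


print(run_callback(lambda mandatory_arg: mandatory_arg))
# mandatory-value
print(run_callback(lambda mandatory_arg, *, foo=123: (mandatory_arg, foo)))
# ('mandatory-value', 234)
print(cull(lambda a, b: a + b, 1, 2, 3))
# 3
```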

The Software Crisis, 2012 edition

This morning's breakfast experiment involves rough estimation of the changes in cost and capacity of computing power over the last few years, and imagining where that will leave us in ten years.

In summary: hardware is already superpowered. The gap between hardware and software is already huge. It's only going to get worse. What is to be done about the parlous state of software?

SD Cards

Rough googling suggests a 512MB card cost about $60 in 2005. That's around $68 in 2012 dollars. A commodity 32GB card costs about $25 today. Assuming exponential decline in price, or equivalently exponential increase in capacity for constant cost, we see that we can expect capacity per dollar to roughly double year on year.

CPU

More rough googling suggests CPU capacity (measured in GFLOPS) is increasing at roughly a factor of 1.6 per year. GPUs are improving more quickly, approximately doubling in speed each year.

DRAM

Wikipedia informs me that in 1971, DRAM cost 5c per bit, and in 1999 it cost 20µc per bit. Again assuming exponential scaling, that gives approximately a factor of 1.56 increase in capacity per dollar year on year.
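The yearly factors for SD cards and DRAM come from taking the overall improvement and spreading it evenly, as a compound rate, over the years between the two data points. A sketch of that arithmetic in Python, using only the prices quoted above (`annual_factor` is a hypothetical name):

```python
def annual_factor(total_ratio: float, years: int) -> float:
    """Yearly multiplier that compounds to total_ratio over the given years."""
    return total_ratio ** (1 / years)


# SD cards: 512 MB for ~$68 (2012 dollars) in 2005; 32 GB for ~$25 in 2012.
mb_per_dollar_2005 = 512 / 68
mb_per_dollar_2012 = 32 * 1024 / 25
sd_factor = annual_factor(mb_per_dollar_2012 / mb_per_dollar_2005, 2012 - 2005)

# DRAM: 5 cents per bit in 1971; 20 micro-cents per bit in 1999.
dram_factor = annual_factor(5 / 20e-6, 1999 - 1971)

print(f"SD capacity per dollar: x{sd_factor:.2f} per year")  # x2.09
print(f"DRAM bits per dollar:   x{dram_factor:.2f} per year")  # x1.56
```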

Total

Component   2012    2017    2022
Storage        1     ~32   ~1024
Compute        1   ~10.5    ~110
RAM            1    ~9.2     ~85
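Each row of the table is just the corresponding yearly factor compounded over five and ten years: about 2 for storage, 1.6 for compute (the CPU figure, not the faster GPU one) and 1.56 for RAM. A sketch reproducing it in Python (`project` is a hypothetical name):

```python
def project(annual_factor: float, years: int) -> float:
    """Relative capacity after compounding annual_factor for the given years."""
    return annual_factor ** years


growth = {"Storage": 2.0, "Compute": 1.6, "RAM": 1.56}

print(f"{'Component':10}{'2012':>6}{'2017':>8}{'2022':>8}")
for name, g in growth.items():
    print(f"{name:10}{1:>6}{project(g, 5):>8.1f}{project(g, 10):>8.0f}")
```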

In five years' time, expect to be working with machines that are ten times as fast, that have ten times as much RAM, and that have thirty-two times as much secondary storage as today's machines.

In ten years' time, expect machines one hundred times as fast, with one hundred times the amount of RAM, and one thousand times the amount of storage.

I'm not even sure how to measure progress in software. But my impression is that it isn't keeping up its end of the bargain. Perhaps we're seeing linear improvement, at best.

I think a big part of the problem is that our ambition hasn't increased to match our capacity. We haven't kept our expectations from software in line with the ability of our new hardware.

RabbitHub Status, May 2012

RabbitHub, an implementation of PubSubHubbub I built as part of my work on RabbitMQ, hasn't been maintained for a while now. It lags the current state of the art in two respects:

  1. it will need straightforward updates to work with the current versions of RabbitMQ Server, and
  2. it implements PSHB version 0.1 rather than the current 0.3.

Fixing the former would take about two days of expert work, and I don't know how much work fixing the latter would be. Both are perfectly reasonable to consider, though.

RabbitHub is unique, as far as I know, in one respect: it uses PSHB as a generic webhook-based messaging protocol, rather than the narrower Atom distribution protocol that the designers had in mind. There are other PSHB implementations, notably the reference implementation and Superfeedr, but as far as I know they are all specialized to Atom transport. A list of hub implementations can be found at http://code.google.com/p/pubsubhubbub/wiki/Hubs.