Erlang on Xen - Out of memory: the ultimate exception

Out of memory: the ultimate exception
-------------------------------------

Among the unpleasant situations your application may encounter, the out-of-memory exception is probably the worst. It tends to happen deep in the entrails of the system and is notoriously difficult to recover from. The best we can expect from an application trapped in an out-of-memory situation is that it logs the error and dies. But error reporting itself may very well be impaired by the inability to grab more memory. A lame workaround is to preallocate a chunk of memory specifically for the final error report.
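The preallocation trick can be sketched in C roughly as follows. This is a minimal illustration, not code from any particular VM; the names `reserve_init`, `xmalloc`, `out_of_memory` and the 64 KB `RESERVE_SIZE` are hypothetical. The reserve is grabbed at startup and released only when an allocation fails, so that the final error report has a little memory to work with.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical size of the emergency reserve; it should cover whatever
   the final error report needs. */
#define RESERVE_SIZE (64 * 1024)

static void *emergency_reserve = NULL;

/* Grab the reserve at startup, while memory is still plentiful.
   Returns nonzero on success. */
static int reserve_init(void)
{
    emergency_reserve = malloc(RESERVE_SIZE);
    return emergency_reserve != NULL;
}

/* Called when an allocation fails: hand the reserve back to the
   allocator so the error report has room to work, then log and die. */
static void out_of_memory(const char *where)
{
    free(emergency_reserve);
    emergency_reserve = NULL;
    fprintf(stderr, "fatal: out of memory in %s\n", where);
    exit(EXIT_FAILURE);
}

/* malloc wrapper: never returns NULL, because failure is terminal. */
static void *xmalloc(size_t n, const char *where)
{
    void *p = malloc(n);
    if (p == NULL)
        out_of_memory(where);
    return p;
}
```

Freeing the reserve does not guarantee that `fprintf` will succeed, which is why this remains a lame solution: it only improves the odds.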

Every time I start a new application with manual memory management, I promise myself that I will meticulously watch for out-of-memory situations and react accordingly. It soon becomes obvious that the result is not worth the effort: there are simply too many places where the application may run out of memory.

Is it possible to recover gracefully from an out-of-memory exception? The practical approach would be to limit the damage done by the error to the smallest context surrounding the failing code. In the case of Linux, such a context is an OS process, yet Linux does not seem inclined to crash processes when they run out of memory. Instead, Linux resorts to swapping, extending memory onto horribly slow disk storage.

The Erlang VM offers a workable solution to out-of-memory exceptions. Erlang processes do not share memory with each other and thus form a neat context in which to capture an out-of-memory exception. The Erlang process that failed to allocate memory goes away, but the rest of the system (including the virtual machine) continues unscathed. Unfortunately, the BEAM VM does not seem to take full advantage of this: the BEAM emulator is prone to crashing when memory is tight.
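To make the isolation argument concrete, here is a toy model in C, assuming each process owns a private, fixed-size heap. All names (`proc_t`, `proc_alloc`, `schedule_round`, `HEAP_SIZE`) are hypothetical, and this is not how BEAM or LING actually lay out process heaps. Because heaps are not shared, an allocation failure can be charged to exactly one process, which is terminated while the others keep running.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy model: each process owns a private fixed-size heap (hypothetical sizes). */
#define NPROCS    3
#define HEAP_SIZE 1024

typedef struct {
    unsigned char heap[HEAP_SIZE];
    size_t used;                 /* bump-pointer offset into heap */
    int alive;
} proc_t;

static proc_t procs[NPROCS];

static void procs_init(void)
{
    for (int i = 0; i < NPROCS; i++) {
        procs[i].used = 0;
        procs[i].alive = 1;
    }
}

/* Allocates from the process's own heap; NULL means *this* process is
   out of memory. Alignment handling is omitted for brevity. */
static void *proc_alloc(proc_t *p, size_t n)
{
    if (n > HEAP_SIZE - p->used)
        return NULL;
    void *ptr = p->heap + p->used;
    p->used += n;
    return ptr;
}

/* Gives every live process one allocation request. A failure terminates
   only the process that made the request; the scheduler carries on. */
static void schedule_round(const size_t requests[NPROCS])
{
    for (int i = 0; i < NPROCS; i++) {
        if (!procs[i].alive)
            continue;
        if (proc_alloc(&procs[i], requests[i]) == NULL) {
            procs[i].alive = 0;
            fprintf(stderr, "process %d: out of memory, terminated\n", i);
        }
    }
}

/* Number of processes still running. */
static int alive_count(void)
{
    int n = 0;
    for (int i = 0; i < NPROCS; i++)
        n += procs[i].alive;
    return n;
}
```

If process 1 requests more than its heap can hold, only process 1 is marked dead; processes 0 and 2 keep allocating in later rounds.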

One of the design goals of the LING VM was predictable recovery from out-of-memory situations. To achieve this, we keep the C code footprint to the absolute minimum. We always prefer Erlang when adding new functions, despite the potential performance impact. We are reluctant to use libraries that do their own memory management. Also, out-of-memory exceptions, unlike other errors, are handled using setjmp/longjmp. As a result, a LING instance with as little as 32 MB of memory can run all the test suites we use. Many tests fail due to low memory, but the VM holds.
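A minimal sketch of the setjmp/longjmp idea follows, assuming a single private arena for the running process. It is an illustration under those assumptions, not LING's source, and every name in it (`heap_alloc`, `run_process`, `build_list`, `ARENA_SIZE`) is hypothetical. The allocator never returns NULL: on exhaustion it jumps straight back to the process boundary, so code deep in the call stack needs no error checks, and the dead process's memory is reclaimed by resetting the arena.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical arena: the private heap of the currently running process. */
#define ARENA_SIZE 4096

static unsigned char arena[ARENA_SIZE];
static size_t arena_used;        /* bump-pointer offset into arena */
static jmp_buf *oom_target;      /* where to jump when memory runs out */

/* Never returns NULL: on exhaustion, unwind straight to run_process.
   Alignment handling is omitted for brevity. */
static void *heap_alloc(size_t n)
{
    if (n > ARENA_SIZE - arena_used)
        longjmp(*oom_target, 1);
    void *p = arena + arena_used;
    arena_used += n;
    return p;
}

/* Example process body: allocates `len` 64-byte cells with no NULL checks. */
static size_t build_list(size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char *cell = heap_alloc(64);
        cell[0] = (unsigned char)i;
    }
    return len * 64;
}

typedef size_t (*proc_body)(size_t arg);

/* Runs one process. Returns 0 on normal exit, -1 if the process died of
   memory exhaustion. Either way its arena is reclaimed in one step and
   the caller (the "VM") continues. */
static int run_process(proc_body body, size_t arg, size_t *result)
{
    jmp_buf env;

    arena_used = 0;              /* fresh private heap */
    oom_target = &env;
    if (setjmp(env) != 0) {
        /* Arrived here via longjmp from heap_alloc. */
        oom_target = NULL;
        arena_used = 0;
        return -1;
    }
    *result = body(arg);
    oom_target = NULL;
    arena_used = 0;
    return 0;
}
```

Because the jump skips every intermediate frame, this only works if nothing on those frames owns resources outside the arena (open files, malloc'd buffers, locks). Keeping the C footprint small, as described above, is what makes that condition easy to maintain.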

Now, every time I hear that people want to use a low-level interface, such as NIFs, to reimplement a high-level function in C for better performance, I ask myself: what about out-of-memory exceptions? The native implementation of the distribution layer in BEAM is an example. People trade robustness for a marginal performance gain far too easily.
