ppx_core: context-free rewriters for better semantics and faster compilation

At Jane Street, we have always been heavy users of pre-processors, first with camlp4 and now ppx. Pre-processing makes the infrastructure a bit more complex, but it save us a lot of time by taking care of a lot of tedious boilerplate code and in some case makes the code a bit prettier.

All in all, our standard set has 19 rewriters:

  • ppx_assert
  • ppx_bench
  • ppx_bin_prot
  • ppx_compare
  • ppx_custom_printf
  • ppx_enumerate
  • ppx_expect
  • ppx_fail
  • ppx_fields_conv
  • ppx_here
  • ppx_inline_test
  • ppx_js_style*
  • ppx_let
  • ppx_pipebang
  • ppx_sexp_conv
  • ppx_sexp_message
  • ppx_sexp_value
  • ppx_typerep_conv
  • ppx_variants_conv

These rewriters fall into 3 big categories:

  1. type driven code generators: ppx_sexp_conv, ppx_bin_prot, ...
  2. inline tests and benchmarks: ppx_inline_test, ppx_expect, ppx_bench
  3. convenience: ppx_sexp_value, ppx_custom_printf, ...

The first category is the one that definitely justify the use of pre-processors, until we get something better in the language itself.

With such a high number of code transformations, there is an important question of how they compose with each other. For instance what happens if the output of a ppx generates some code that is rewritten by another ppx?

Since the switch from camlp4 to ppx a year ago really, category 1 transformers were handled all at once as a whole-AST mapping pass by ppx_type_conv while all the other one were implemented as separate passes. With the previous list that means 13 passes, given that 7 of them are ppx_type_conv plugins. This means that the output depended on the order in which the various passes were applied.

Intuitively, one would think it's not a big deal, given that it is quite rare for a ppx to produce code that would be rewritten by another ppx. Still we ran into several issues over time:

  • Some ppx rewriters - such as ppx_inline_test that rewrites [%%test ...] extensions - captures a pretty-print of their payload, for debugging purposes. Depending on when ppx_inline_test is applied, the payload won't be the same, as it might have been expanded by other ppx rewriters, which is confusing for users.
  • A few ppx rewriters interpret the payload of a specific extension point as a DSL to be interpreted. This is the case of ppx_sexp_value and ppx_sexp_message. If another ppx messed with the payload before them, the result will be unspecified. We had such an issue with ppx_here: inside [%sexp ...], [%here] is interpreted by ppx_sexp_value and ppx_sexp_message and produces "<filename>:<line>:<column>", while outside it is interpreted by ppx_here and produces a record of type Lexing.position

Initially we dealt with these issues by using a specific order in the default set of rewriters, but that's more of a dirty hack than a real solution. Often developers are not aware of this and might end up using a wrong order when using a custom set of rewriters. Moreover this worked because we have control over the order with Jenga, but in opensource packages using oasis, ocamlbuild and ocamlfind we have no control over the final ordering.

But apart from the semantic problems, there is an obvious performance problem: all the transformations are local, but still we are doing 12 passes over the entire AST. What a waste of CPU time!

The different ways of composing ppx rewriters

Before jumping into the subject of this post, we recall a few of the various methods one can use to compose ppx rewriters.

Via separate process

The default method, that was adopted early by the community is to define each transformation as a separate executable. To compose them, one just has to call all the executables one by one. The main advantage of this approach is that each transformation is a black box and can do whatever dirty hacks it wants.

This is what you get when you are using a ppx by just putting the package name in your build system without doing anything special.

Via a driver

Another approach, that we developed at Jane Street is to link all the transformations into a single executable. For this to work properly all transformations must use the same framework. Technically they all register themselves with ppx_driver via a call to Ppx_driver.register_transformation. Ppx_driver is then responsible for composing them.

There are several advantages of the second approach: since ppx_driver has knowledge of all transformations, it can do extended checks such as making sure that all attributes have been interpreted. This helps detect typos, which in practice saves a lot of debugging time. But what really interest us in this post is that it can use more clever composition methods.

Code transformations using ppx_driver can still export a single executable compatible with the first method, that's why all Jane Street ppx rewriters can be used with both methods.

ppx_driver has an ocamlbuild plugin to simplify building custom drivers.

Context free transformations

Given that all transformations are local, it was clear that they should be defined as such; i.e. if all you want to do is turn [%sexp "blah"] into Sexp.Atom "blah", you don't need to visit the whole AST yourself. You just need to instruct whatever framework you are using that you want to rewrite [%sexp ...] extension points.

Context-free extension expander

We started with this idea a few month ago by adding an API in ppx_core to declare context-free extension expanders. For instance, this shows how you would declare a ppx that interpret an extension [%foo ...] inside expressions:

  1. open Ppx_core.Std
  3. let ext =
  4. Extension.declare "foo" Expression
  5. Ast_pattern.(...)
  6. (fun ~path ~loc <parsed-payload...> -> <expansion>)
  8. let () = Ppx_driver.register "foo" ~extensions:[ext]

The Ast_pattern.(...) bit describes what the extension expects as its payload.

Since ppx_driver knows about all the local extension expanders, it can expand them all in one pass over the AST. Moreover it can detect ambiguities and error out in such cases.

There was a choice to make as to whether rewrite the AST in a bottom-up or top-down manner. We choose top-down, to allow extension expanders to interpret their payload before anyone else, and so they can correctly implement a DSL.

This solved most of the initial issues and reduced the number of passes to 7:

  • all extension expanders
  • ppx_type_conv
  • ppx_custom_printf
  • ppx_expect
  • ppx_fail
  • ppx_pipebang
  • ppx_js_style

ppx_expect wasn't initially defined as a context-free extension expander for technical reasons.

Making everything context-free

Recently we went even further and added a Context_free module to Ppx_core to cover all of our transformations. It doesn't support all possible rewriting but support enough to implement a lot of common ones:

  • context-free extension expanders
  • some specific support to implement type-driven code generators
  • support for ppx rewriters interpreting a function application at pre-processing time, such as ppx_custom_printf that interprets !"<format>"

With this we reduced the number of passes to only 2:

  • context free transformations
  • ppx_js_style

ppx_js_style is still done in a separate pass for simplicity. It is run last to ensure we don't generate code that doesn't match our coding rules.

Now, whatever order developers specify their ppx in their build system, they will get the exact same output.

Seeing the exact passes

Ppx_driver got a new option to print what passes it will execute, for instance with ppx-jane which a standard driver containing all of the Jane Street ppx rewriters linked in (available in the ppx_jane package in opam):

$ ppx-jane -print-passes

$ ppx-jane -print-passes -no-check

Safety checks are implemented as additional passes, that's why we see more than one passes by default.


No performance comparison was done when introducing context free extension expanders, but we did some for the second stage, when we changed all ppx rewriters to use Context_free; processing a file with the resulting driver was twice as fast (check passes included).

But how does this compare to the more traditional method of running each rewriter in a separate process? To find out we did some benchmark by taking one of the biggest ml file in core_kernel (src/command.ml) and comparing the two methods. We put a type error on the first line to be sure we stop just after pre-processing.

For reference, following are the numbers for calling ocamlfind ocamlc on the file with no pre-processing:

$ time ocamlfind ocamlc -c command.ml
File "command.ml", line 1, characters 12-15:
Error: This expression has type char but an expression was expected of type

real 0m0.022s
user 0m0.016s
sys  0m0.006s

To preprocess the file with ppx_jane as a single driver executable, one just has to pass one -ppx option, or a -pp option given that ppx_driver can be used either as a -ppx either as a -pp:

# via -ppx
$ time ocamlfind ocamlc \
  -ppx 'ppx-jane -as-ppx -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.095s
user 0m0.074s
sys  0m0.020s

# via -pp
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.091s
user 0m0.066s
sys  0m0.024s

# via -pp, with checks disabled
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-check -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.070s
user 0m0.051s
sys  0m0.018s

# via -pp, without merging passes
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-merge -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.229s
user 0m0.206s
sys  0m0.022s

Using the other method turned out to be quite painful, given that the various ppx cannot share command line arguments, they had to be specified more than once:

$ time ocamlfind ocamlc -package ppx_jane \
  -ppxopt "ppx_inline_test,-inline-test-lib blah -inline-test-drop" \
  -ppxopt "ppx_bench,-inline-test-lib blah -bench-drop" \
  -ppxopt "ppx_expect,-inline-test-lib blah" \
  -c command.ml 2> /dev/null

real 0m0.339s
user 0m0.233s
sys  0m0.098s

So without surprise the single pass in a single executable method is really a lot faster.


This code is available on github. The context-free extension point API is already available in opam. The newer one is only in the git repository for ppx_core and ppx_driver. You can try them out by using our development opam repository. You should have a look at this if you care about how your rewriters are composed and/or if you care about compilation speed.

  • ppx_js_style is not currently released; it is an internal ppx that we use to enforce our coding standards.

Unraveling of the tech hiring market

Recruiting talented people has always been challenging.

In some years that meant competing with a hot new company that aggressively courted every fresh graduate with promises of stock options and IPO glory.  In other years there wasn’t a specific company so much as an entire rising industry looking for people (I’m looking at you cloud services, driverless cars, and peer-to-peer sharing).  Either way, we understood the yearly back and forth.  Our job was to explain to candidates how we stacked up, and more importantly, why a career at Jane Street might be the right choice for many of them.

But this year I got to learn a new name for a new challenge.  “Unraveling”.

I first encountered it in a book I was reading for fun: "Who Gets What, and Why", by the Nobel Prize-winning economist Alvin Roth.  He does a lovely job explaining the idea of a matching market.  In a matching market each person wants only one of each item, each item is unique, and each item can be given to at most one person at a time.  Jobs are a classic matching market, and just like any market, matching markets can work well, or poorly.

Unraveling is one of the primary things that makes a matching market fail.  When a market unravels  matches start to happen earlier and earlier, to the point where people no longer get a complete view of their options.  In the book Roth relates the story of a person who stepped off of a plane to find three voicemails on his phone.  The first offered him a job, the second urged him to respond soon, and the last rescinded the offer because he hadn't responded quickly enough.

We call them exploding offers, and this year they have gotten completely out of hand as companies race to the bottom in their efforts to recruit the next wave of interns and fresh graduates.

Colleges try to impose deadline limits explicitly to stop unraveling, and in the past these have largely been honored.  The cheating and fudging, such as it was, was kept to the fringes.  But this year it seems like the seal is broken, and we've seen major companies delivering internship and full-time offers with 2 week (and less) hard deadlines.  Other companies now routinely deliver expiring bonus offers for signing early.  Many of these offers circumvent or outright break the guidelines set down by schools, and if past matching markets are a model for this one, next year will come with even earlier offers and worse conditions.

This unraveling has been the subject of a lot of discussion, both internally at Jane Street and with the various schools we recruit at, who see it - rightly - as bad for their students.  How can someone make a thoughtful decision about where they want to build a career without the time to interview at more than one or two places?  Unfortunately, most of this discussion is out of the public light, and so the unraveling continues.

We can't control the actions of others, but we also don’t have to follow the herd, so we'd like to be clear:

Jane Street is committed to making sure that you have the time and information you need to decide on an offer from us.  Our offer letters do have good-until dates as a matter of professional practice, but we try to work with every candidate to choose a date that works for them.  We are also happy to extend the date if something unexpected comes up, or, frankly, if someone just needs more time.

Choosing where to start your career is a big decision and we hope you have the time to make a good one.

Seven Implementations of Incremental

We finally got a decent recording of one of my favorite talks. This one is about our Incremental library (which I wrote about here), and in particular about the story of how we got to the present, quite performant, implementation.

It's not clear from the talk, but the work on this library wasn't done by me: The initial version was implemented by Stephen Weeks and Milan Stanojevic, and most of the intermediate versions were implemented by Nick Chapman and Valentin Gatien-Baron.

The high quality org-mode slides, though, are all me.

OCaml 4.03: Everything else

In my previous post I wrote about Flambda, which is the single biggest feature coming to OCaml in this release. In this post, I'll review the other features of 4.03 that caught my eye.

Inline records

Variants are my favorite thing about OCaml, and in this release, they're getting better. You've always been able to define variants with multiple arguments, e.g.:

  1. type shape =
  2. | Circle of float * float * float
  3. | Rect of float * float * float * float

But, as with this example, it can sometimes be a little hard to figure out what the meaning of the individual fields are, since they don't have labels. We can make this better by replacing our multi-argument variants with single argument variants containing approriately named records, as follows.

  1. type circle = { center_x: float;
  2. center_y: float;
  3. radius: float;
  4. }
  6. type rect = { x_lo: float;
  7. y_lo: float;
  8. x_hi: float;
  9. y_hi: float;
  10. }
  12. type shape =
  13. | Circle of circle
  14. | Rect of rect

This works, but the separation of the record type from the variant definition is a little awkward. Beyond that, this approach imposes extra runtime costs. A multi-argument variant takes up just a single heap-allocated block, while a single argument variant containing a record takes two blocks.

With 4.03, we can have the best of both worlds by definining variants containing inline records. Here's what they look like.

  1. type shape =
  2. | Circle of { center_x: float;
  3. center_y: float;
  4. radius: float;
  5. }
  6. | Rect of { x_lo: float;
  7. y_lo: float;
  8. x_hi: float;
  9. y_hi: float;
  10. }

And we can write code that uses these types as follows:

  1. let area = function
  2. | Circle c -> 3.14159 *. c.radius *. c.radius
  3. | Rect r -> (r.x_hi -. r.x_lo) *. (r.y_hi -. r.y_lo)

Note, however, that the values r and c aren't quite first-class. For example, we can't use them away from the context of the match statement in which they're found. So this function:

  1. let maybe_rect = function Circle _ -> None | Rect r -> Some r

fails with the error This form is not allowed as the type of the inlined record could escape.

Even with this complexity, inlined records are really convenient.

Another advantage of inline records is that they allow us to express variants with mutable fields. This is useful in a variety of circumstances. In Core, we fake this using Obj.magic for our mutable AVL tree implementation, and new we can remove these hacks. Similar uses of Obj.magic were removed from OCaml's own imperative queue module in this release as well.

Uchar and result

A couple of useful types have been added to the standard library: Uchar.t and result. Uchar.t represents a unicode character, and is effectively just an int under the covers.

The result type is a type used for signaling success or failure, and has the following type.

  1. type ('a,'b) result = Ok of 'a | Error of 'b

Both of these are in some sense trivial, but they're there as a coordination point between different libraries. Lots of OCaml libraries have some analogue of result, and each OCaml Unicode library has its own character type. By including these in the standard library, it provides an easy point of agreement between different external libraries, while adding almost no complexity to the core distribution itself.

Better unboxing for externals

A small but valuable change: it's now possible to write C bindings through which one can pass unboxed versions of a number of different types, including floats, Int64's, Int32's and Nativeint's. This was previously possible only for calls that took only floats and returned a float. This makes it easier to write efficient, zero-allocation C bindings in a wider variety of cases.

GC Latency improvements

I talked about most of this here, so I won't go into great detail now. But the news is that the changes that Damien Doligez did during his stay at Jane Street have finally made it upstream.


To a first approximation, you can think of a GC as a machine that determines what memory can be reclaimed by figuring out what data is reachable, and then reclaiming everything else. The simplest notion of reachability counts everything reachable by following pointers, starting at the GC roots, which are mostly just the values on the call stack.

This notion of reachability is a bit too restrictive, however. In particular, in some circumstances you want to keep pointers to objects without preventing those objects from being reclaimed. This is useful for some kinds of caching, where you want to cache previously computed values for as long as they're referenced, but no longer.

OCaml has long had an answer to this problem, which is the notion of a weak pointer. Weak pointers are references that aren't counted when computing reachability. When an object that is pointed to by a weak reference is collected, the weak reference itself is nulled out.

Weak references are good enough for some purposes (e.g. hash-consing), but they can be awkward for many use cases. One basic use-case for which weak references are an awkward fit is memoizing a function, where one wants to keep entries in the table as long as the input to the function in question is still reachable.

You could imagine just keeping the key of the hash-table in a weak pointer, and then using a finalizer to remove the entry in the table once the key gets collected. But this fails if there is a reference from the output of the function back to the key, which is common enough.

Ephemerons were proposed back in 97 by Barry Hayes to solve just this problem. The idea is pretty simple: an ephemeron has multiple keys and a single data entry. When determining whether the keys are alive, one doesn't count references from values that are reachable only via the ephemeron, so the references from the data back to the key don't keep the key alive. Also, once any of the ephemerons keys are collected, the key and the data element are removed immediately. Weak pointers are now just a special case of ephemerons --- a weak pointer is effectively just an epehmeron with no data.

Ephemerons don't come up that much, but if you need to build a memoization table, ephemerons make the task simpler and the result less leak-prone.

Licenses, Github and OCamlbuild

A few organizational changes are landing in 4.03 that I think are worth noting. First, OCaml development is officially moving from Subversion to Git, with Github as the primary coordination point for OCaml development.

Second, OCaml's license is changing from the somewhat awkward and rarely used QPL, to LGPLv2 with a linking exception. The latter had already been used for various libraries distributed with the compiler. But, acknowledging the fact that more and more of the guts of the compiler are being used as libraries for a variety of reasons, it was recently decided to move everything to LGPLv2.

And finally, ocamlbuild is being moved out of the core OCaml distribution. This is one in a series of decisions to break out software that was previously bundled together with OCaml. This allows the core team to focus more on the core compiler itself, and makes the exported projects freer to make changes on whatever schedule they see fit.

I think keeping the OCaml distribution lean is an excellent strategic decision, allowing the core team to focus on the compiler itself, and allowing tooling and libraries to be worked on independently.

Together, these are all small changes. But they're part of a trend towards making OCaml development simpler and more agile.

What's missing

There are a number of much anticipated features that haven't made it into this release. In particular, the multicore GC, which at one point had been expected to land in 4.03, has been pushed back, likely to 4.04. Algebraic effects, which are in part motivated by the multicore GC, also didn't make it.

And finally, modular implicits, which are intended to provide typeclass-like functionality for OCaml, are not in this release either. That said, they're being actively worked on, and I'm expecting there will be more news about those in the next six months.

All in, it's been a pleasure watching this release take shape. And just as important as what got in is what didn't. Despite a greatly increased rate of change, the process is quite conservative --- nonsense features just don't make it in. My gratitude goes out to the core team for their safekeeping of the language.

A better inliner for OCaml, and why it matters

OCaml 4.03 is branched and a first release candidate is imminent, so it seems like a good time to take stock of what's coming.

This post will focus on just one of those features: Flambda, a new IR (intermediate representation) in the depths of the compiler designed to allow for better inlining, paired with a set of optimizations leveraging that IR.

Why inlining matters

If your expectations about inlining come from a language like C, you might not be all that excited by Flambda. In C, after all, the benefits of inlining are relatively small, mostly allowing one to avoid function call overhead. That's useful, but has limited impact.

In a language like OCaml, inlining is about more than just function call overhead. That's because inlining grows the amount of code that the optimizer can look at a given point, and that makes other optimizations more effective. And there are simply more optimizations available in a language like OCaml than in one like C. In particular, many allocations can be optimized away, which can have a rather big effect on the performance of a program.

To get a sense of how we can eliminate allocations, consider this simple example.

  1. let rec fold l ~init ~f =
  2. match l with
  3. | [] -> init
  4. | hd :: tl -> fold tl ~init:(f init hd ) ~f
  6. let pow_sum l n =
  7. fold l ~init:0 ~f:(fun x y -> pow x n + pow y n)

In the current compiler, every invocation of pow_sum will allocate a closure for the function f. With Flambda, this code should be allocation free, since the fold can be inlined into pow_sum, f can be inlined into the body of the now inlined fold, and the fold can be specialized by adding an extra argument, thus making the closure allocation unnecessary.

Here's another example. Imagine you wanted to sum the first and last elements of a list. You could do this using functions in the list module and a simple pattern match, as follows.

  1. let first_plus_last =
  2. match List.hd l, List.last l with
  3. | None, _ | _, None -> None
  4. | Some x, Some y -> Some (x + y)

Without any inlining, this function creates three heap-allocated values, which are the various options being returned.

Now, let's consider what happens if we rewrite this code using the option monad.

  1. let (>>=) o f =
  2. match o with
  3. | None -> None
  4. | Some x -> f x
  6. let first_plus_last l =
  7. List.hd l >>= fun x ->
  8. List.last l >>= fun y ->
  9. Some (x + y)

Or, equivalently, using our ppx_let syntax:

  1. let first_plus_last l =
  2. let%bind x = List.hd l in
  3. let%bind y = List.last l in
  4. Some (x + y)

Without any inlining, this implementation will create five heap allocated values: one for each of the closures on the right hand side of the bind, one for the options returned from List.hd and List.tl respectively, and once for the final Some.

It's of course possible to write a version that only allocates one object, for the return value.

  1. let first_plus_last l =
  2. match l with
  3. | [] -> None
  4. | x::_ ->
  5. let rec last_exn = function
  6. | [] -> assert false
  7. | [y] -> y
  8. | _ :: tl -> last_exn tl
  9. in
  10. Some (x + last_exn l)

But this is obviously uglier and harder to read than either of the earlier versions.

With Flambda (and the appropriate flags, i.e., (-O3 -unbox-closures), these examples all allocate the same amount. That's because, once the inlining is done and all the code is in one place, the compiler can do things like observe that the option returned from List.hd is immediately deconstructed and is never used for anything else. As a result, the allocation of the options can be removed entirely.

This example highlights why these kinds of compiler optimizations are valuable. It's not that they're strictly necessary for good performance --- usually, pretty good performance can be achieved with sufficiently ugly code. But what good optimizations can do for us is let us write simpler, easier to understand code, without sacrificing performance.

A basis for future optimizations

In addition to the optimizations that will be there when 4.03 lands, Flambda will also make it easier to build new optimizations. One example comes from a project that was done last summer by Will Crichton, one of our interns. The aim of the project was to make the kind of allocation removal I described above more predictable.

From how I described it above, it might seem like Flambda's ability to remove allocations relies on the ability to inline the allocating functions in their entirety. That would be problematic, because what is and isn't inlined can be a bit unpredictable. That's because there have to be some heuristic thresholds, since inlining everything can lead to excessively large executables.

As such, depending on inlining makes it harder to predict the allocation behavior of your code, since it now hinges on the heuristics for determining when a function should be inlined. The situation isn't quite so grim --- some of the optimizations that come along with Flambda (like -unbox-closures) don't depend so critically on inlining. But still, many of Flambda's optimizations do depend on inlining, and as such whether they hit becomes harder to predict.

But we can make this better. The specific project the Will worked on had to do with allocations that come from returning small objects that are immediately deconstructed. With Flambda as it is today, such allocations are only removed if the both the construction and deconstruction of the value are together, which often requires inlining. Will's project made this more robust by changing the calling conventions in Flambda, specifically by adding a first class multi-argument return to the Flambda IR.

With a multi-argument return in place, a function that returns a small tuple, say, could instead be compiled to return the components of the tuple as individual values to be passed back on the stack. Then, this function can be wrapped with a small function that picks those values off the stack and allocates the necessary object.

This small wrapper function is by design small enough to be inlined reliably. Once that inlining is done, both the construction and deconstruction of the value will be visible to the optimizer, and they can be collapsed. This should work whether or not the main function is inlined.

This is really just one example. Our hope and expectation is that Flambda will be the basis of many improvements in the compilation of OCaml over the next few years.


The performance impact we've seen from our experiments with Flambda seem pretty promising so far. On real world applications we've tested, it's pretty normal to see allocations reduced by 20-30%. We've found similarly sized improvements in application latency as well. And we think that these numbers will improve yet more as more optimizations are added on top of Flambda.

Beyond improving existing programs, we already are seeing how Flambda allows us to write prettier and cleaner code without compromising on performance. For example, Flambda allows us to make freer use of OCaml's functors, which are a great abstraction tool, but one that imposed significant costs pre-Flambda. With Flambda, functors can simply be inlined away.

Flambda makes things easier in other ways too. For example, we're in the processing of developing a new internal protocol for zero-allocation messaging, which requires a lot of code generation, via PPX. Being able to rely on Flambda greatly simplifies that code generation, since it lets us write a fairly naive code generator, which generates efficient code because of Flambda. We can write code that computes offsets into the message naively, and Flambda, via a combination of inlining and constant folding, moves the computation of these offsets to compile time, so the runtime field access becomes a simple lookup.

Even if you're perfectly happy with OCaml's performance as is, Flambda is still an exciting change. That's because various upcoming language improvements like modular implicits (a feature that brings some of the same benefits as Haskell's typeclasses) will only really perform acceptably well with a good inliner in place. So in addition to making your current programs run faster, Flambda also enables new abstractions that can make your future programs easier to read and write.

One downside of all this is that it reduces the predictability of OCaml's performance, which is an important property of the language. Part of how we hope to address this is by improving Flambda's heuristics to be more predictable, but that's unlikely to be enough on its own. That's why OCaml 4.03 comes with new annotations that let the programmer require or prohibit inlining for a particular function.

Over time, we hope to find a good balance, where the heuristics do a pretty good and pretty predictable job by default, but where we give programmers hooks to control things more precisely where it matters.


Getting Flambda this far has been a big project, and a lot of people were involved. OCamlPro did a lot of the heavy lifting, with Pierre Chambart writing most of the compiler code, and Louis Gesbert and Michel Laporte working on benchmarking. Jane Street jumped in too, with Mark Shinwell and Leo White contributing mightily to the code and design as part of the review and stabilization process, and Jeremie Dimino helping with testing. And a number of other people on the core team (notably Alain Frisch, Gabriel Scherer, Runhang Li, Damien Doligez) contributed in a variety of ways to the final product. Thanks to all involved!

Self Adjusting DOM and Diffable Data

In my last post, I gave some simple examples showing how you could use self adjusting computations, or SAC, as embodied by our Incremental library, to incrementalize the computation of virtual dom nodes. In this post, I'd like to discuss how we can extend this approach to more realistic scales, and some of the extensions to Incremental itself that are required to get there.

Along the way, we'll discuss the way that diffable data structures relate to self adjusting computations, and how we can use ordinary pure data types and functional updates to those data types as the basis for building efficient incremental computations.

First, let's bring back the code of the example from that post, which is a simple component that displayes the state of a machine in a datacenter.

  1. module Machine = struct
  2. module Model = struct
  3. type display_mode = Temperature | Uptime
  4. type t = { temperature: float;
  5. uptime: Time.Span.t;
  6. selected: bool;
  7. mode: display_mode;
  8. }
  9. [@@deriving fields]
  10. end
  12. let view model =
  13. let mode = Incr.map model ~f:Model.mode in
  14. let temperature = Incr.map model ~f:Model.temperature in
  15. let selected = Incr.map model ~f:Model.selected in
  16. let uptime = Incr.map model ~f:Model.uptime in
  17. let text =
  18. match%bind mode with
  19. | Temperature ->
  20. let%map x = temperature in sprintf "%F degrees" x
  21. | Uptime ->
  22. let%map x = uptime in Time.Span.to_string x
  23. in
  24. let%map text = text and selected = selected in
  25. Vdom.text
  26. (if selected then [Vdom.class_ "selected"] else [])
  27. text
  28. end

It's worth mentioning that in the service of exercising the machinery, I've over-incrementalized this example. It would very likely be cheaper to just generate the necessary vdom in its entirety every time the model changes. That said, the above is still a good example of how to use Incremental, and it's worth understanding how it works.

The first thing that is done by view is to project out different parts of the model into separate incrementals. The key observation is that, even though model changes, if Model.mode model doesn't change, then the corresponding incremental will cut off computation there, preventing dependent computations from refiring. So, whenever the model changes we'll check each field in the model and whether it changed before launching dependent computations.

By default, this cutoff happens based on physical equality; but one can also specify more expansive semantic notions of equality when that's important.

Mapping over maps, incrementally

The projection approach works well for simple fixed-size records, but it doesn't work when it comes to more complex models with variable sized components. To see why, let's consider what happens if we add one more layer, an outer model consisting of a collection of named machines. We do this using OCaml's Map data structure, which is essentially a functional dictionary.

  1. module Model = struct
  2. type t = Machine.Model.t String.Map.t
  3. end

How do we construct an incremental view for this? We can no longer simply project out all of the individual machine models into a fixed set of incrementals, since the number of machines in the map is not known in advance.

What we need here is a way of efficiently mapping over an incremental map. (The naming collision between the Incr.map function and the Map.t data structure is an unforutnate source of confusion here.) Here's a signature that captures what we need:

  1. val Incr.Map.map
  2. : ('k,'v,'cmp) Map.t Incr.t
  3. -> f:('v Incr.t -> 'w Incr.t)
  4. -> ('k,'w,'cmp) Map.t Incr.t

If you're not used to reading OCaml signatures, this can take a moment to digest. First, note that ('k,'v,'cmp) Map.t denotes a map whose key is of type 'k, whose data is of type 'v. (You can ignore the 'cmp parameter.)

So, what does this function do? A good rule of thumb for incremental is that if you want to understand the meaning of an incremental function (ignoring performance), you can just drop all the incremental bits. If we do that for the above type signature, we get this.

  1. val Map.map
  2. : ('k,'v,'cmp) Map.t
  3. -> f:('v -> 'w)
  4. -> ('k,'w,'cmp) Map.t

This type correctly suggests that the function is, as the name suggests, a map, which is to say, a function that transforms the elements of a container, thereby producing a new instance of that container.

That tells us what Incr.Map.map is supposed to compute, but what about the performance of that computation? In particular, what is the structure of the incremental graph this generates?

The following picture illustrates the desired incremental graph. The idea is that there is a path in the graph for every key/data pair in the map. When a given key is updated, an update should be passed down the corresponding incremental path. When a key is added or removed, then paths should be added or removed accordingly. And, rather than recomputing map' from scratch on every update, we want to do the corresponding update, insert or delete into map'.

The incremental graph generated by Incr.Map.map

This function is enough to build an efficient version of our view. In particular, we can write our view function as follows:

  1. let view (model : Model.t Incr.t) : Vdom.t Incr.t =
  2. let%map machine_vdoms = Incr.Map.map model ~f:Machine.view in
  3. Vdom.node "div" [] (Map.data machine_vdoms)

Note that at the end, we do need to construct a complete list of all the displayed machines, and that does have to be reconstructed in its entirety on every change. That's because the virtual dom API uses lists, which can't be updated incrementally in an efficient way. If the virtual dom were modified to take a tree-like structure for nodes, this could be handled efficiently.

That said, I don't think this is a serious practical problem, since in most UIs, the cost of allocating such nodes is a small part of the overall cost of reconstructing the virtual dom.

Diffable data and incrementality

In order to implement Incr.Map.map efficiently, we need some special support both from our Map datatype and from incremental itself. From Map, we need a way to efficiently figure out a set of updates, inserts and deletes that will take us from one map to another. Happily, the Map module in Core_kernel contains a function that can help.

  1. val Map.symmetric_diff
  2. : ('k, 'v, 'cmp) Map.t
  3. -> ('k, 'v, 'cmp) Map.t
  4. -> data_equal:('v -> 'v -> bool)
  5. -> ('k * [ `Left of 'v | `Right of 'v | `Unequal of 'v * 'v ]) Sequence.t

Here, the Left case corresponds to data that's only in the first map, but not in the second, and so is a deletion. Similarly, the Right case corresponds to an addition, and the Unequal case corresponds to an update to the value.

A key property of Map.symmetric_diff is that it takes advantage of physical equality of nodes by short-circuiting. This means that if you take a map, make some modifications to it, and then compute the symmetric diff between the original map and the new one you just created, the computational effort is proportional to the work that was required to generate the second map in the first place. In other words, you can efficiently read back the set of changes applied to a map.

With Map.symmetric_diff, we have a way to efficiently determine what has changed in the map, and thereby which incrementals need to be updated, added or removed. This diffing is reminiscent of both the diffing inherent in the virtual dom approach, and the enter/update/exit pattern that characterizes the D3 visualization library.

Implementing Incr.Map.map

Map.symmetric_diff isn't quite enough to build Incr.Map.map. We also need some extra support from Incremental itself for controlling the pattern of updates so that even though the node containing the input map has many dependent nodes, we only fire the nodes that are impliciated by the symmetric diff. Indeed, we're still working on providing first class support for this kind of primitive within Incremental.

Even without those hooks, we can still take some steps in the right direction, the first of which is to use mutable state to keep the old map to diff against. Essentially, we want a function with this signature:

  1. val Incr.diff_map :
  2. 'a Incr.t -> f:(old:('a * 'b) option -> 'a -> 'b) -> 'b Incr.t

Here, the function f gets access to the old input and old output of this same function when computing the new one. We can implement diff_map straightforwardly enough:

  1. let diff_map i ~f =
  2. let old = ref None in
  3. let%map a = i in
  4. let b = f ~old:!old a in
  5. old := Some (a,b);
  6. b

Using diff_map, we can implement a simpler form of Incr.Map.map which takes a function that transforms the elements of the map directly, rather than incrementally. It would have this signature:

  1. val Incr.Map.simple_map
  2. : ('a, 'b, 'c) Map.t Incr.t
  3. -> f:('b -> 'd)
  4. -> ('a, 'd, 'c) Map.t Incr.t

And here's the implementation.

  1. let simple_map m ~f =
  2. diff_map m ~f:(fun ~old m ->
  3. match old with
  4. | None -> Map.map m ~f
  5. | Some (old_in,old_out) ->
  6. let diff = Map.symmetric_diff ~data_equal:phys_equal old_in m in
  7. Sequence.fold diff ~init:old_out ~f:(fun acc (key,change) ->
  8. match change with
  9. | `Left _ -> Map.remove acc key
  10. | `Right d | `Unequal (_,d) -> Map.add acc ~key ~data:(f d)
  11. )
  12. )

The implementation is simple enough: when the map changes, we diff the old input and the new input. By folding over that diff, we can efficiently update the old output to produce the new output.

Implementing the full Incr.Map.map is more complicated, because the fact that f is a function that consumes and produces incrementals means that you need to construct a more complicated incremental graph --- effectively, you need to split, transform and merge the results.

We can bridge the difference between Incr.Map.simple_map and Incr.Map.map with the following two function.

  1. val Incr.Map.split :
  2. ('a,'b,'cmp) Map.t Incr.t -> ('a,'b Incr.t,'cmp) Map.t Incr.t
  3. val Incr.Map.join :
  4. ('a,'b Incr.t,'cmp) Map.t Incr.t -> ('a,'b,'cmp) Map.t Incr.t

Incr.Map.split takes an incremental map and produces an incremental map whose data is itself incremental. The idea is that the outer incremental updates only when a key is added or removed from the input map. Changes to the input map that change data of an existing key instead lead to an update to the corresponding inner incremental. Incr.Map.join is simply the inverse operation, taking an incremental map full of incrementals, and producing a simple incremental map.

Using these two functions together, we can convert our simple_map into a full implementation of Incr.Map.map, as follows.

  1. let map m ~f =
  2. Incr.Map.join (simple_map ~f (Incr.Map.split m))

While there are hacks to approximate them using Incr.map and Incr.bind, the ordinary incremental interface isn't enough to build a version of Incr.Map.join and Incr.Map.split with the right performance characteristics. For that reason, we've started work on an Expert interface within Incremental that lets you create incremental nodes with precise control over their dependencies and when they refire.

Why virtual-dom?

Almost everyone that I've discussed these ideas with, both inside of Jane Street and beyond, have asked the question: given that you have Incremental, why bother with using a virtual dom at all? After all, Incremental provides a mechanism for tracking which parts of the computation have changed. WHy should one have two layers that are doing something so similar?

We've made things even more duplicative by diffing not just the virtual dom but also our own internal data structures as a way of creating more efficient incremental computations on top of ordinary functional data types.

But despite the seeming redundancy of this approach, I think it has real value. That's because, although the optimizations provided by Incremental are valuable, programming with Incremental is a worse experience than doing ordinary functional programming with immutable values. The big win of virtual dom is that it lets you do most of your programming in a typical functional style with reasonable performance. You only need to resort to using Incremental when the performance provided by this simple approach isn't sufficient.

But diffability isn't just useful on the output of your incremental computation. It's also helpful on the inputs, and that's where the diffability of Core_kernel's maps comes in handy. That way, we can write our code for updating our model in an ordinary functional style, but still use incremental to efficiently scaffold our view functions on top.

Summing up

We have some beginning versions of all of this infrastructure internally at this point. In particular, we've built fast versions of Incr.Map.map and friends, and used this for creating some simple dynamic web applications. The results so far are pretty promising. By paying some careful attention to how the input data changes and incrementalizing accordingly, we can build applications that display thousands of rows of data of which hundreds of cells are changing every second with relatively low latency, able to respond to every update in just a handful of milliseconds.

I doubt our performance is as good as more widely used and likely better optimized frameworks like React and Mercury. But the thing that has struck me about all of this work is how easy and natural it has been. With only a small amount of effort, we've been able to leverage existing tools, from Incremental to Async to virtual-dom, to write highly responsive applications in a simple and natural style, and one that fits naturally within our existing OCaml libraries and tools.

Another interesting question is how this all relates to the use of FRP for UIs. I've written a bit about the difference between FRP and SAC, but the short version is that FRP is more about modeling temporal processes, and SAC is purely about optimization.

For my perspective, SAC fits more cleanly into the goal of simply rendering virtual-dom, and I can't see as much value in having time-oriented operations as a first-class construct. And the fact that SAC is focused purely on optimization allows it to do a better job of that. In particular, the dynamism of operations like bind makes it possible to precisely control the dependency structure of the system, and do things like quiesce parts of the UI that are not in view. This is harder to do in an FRP system without introducing memory leaks.

All of which makes me think that self adjusting computation is a useful idiom for UI programming, and perhaps one that should be adopted more widely.

Self Adjusting DOM

I've been thinking recently about how to structure dynamic web applications, and in particular about the role that incremental computation should play.

In this post, I'd like to outline an approach we've been experimenting with internally which uses Incremental, a general purpose library for building so-called self adjusting computations. Self adjusting computations are essentially graph-structured computations that can be updated efficiently when their inputs change.

I'll describe this all in terms of OCaml, which is the language we're doing these experiments in (courtesy of the excellent js_of_ocaml), but the ideas should be applicable to other languages.

Elm in OCaml

Let's start by outlining a simple, non-incremental approach, inspired by the Elm Architecture. In that architecture, there are three primary elements from which an application is composed: the model, the view and a set of actions.

The model represents the logical state of the application. This includes things like the data being displayed on the page, the current page being displayed, and the part of the page that's in focus. But it omits most presentation details, and doesn't specify the concrete choice of dom elements used.

Actions are the things that can happen to the model. The arrival of new data to be displayed, or the click of a mouse, are translated into actions which are in turn interpreted to update the state. These are important, but don't affect the incrementality story much, so we won't discuss them in detail.

The view is a function that takes the current model and produces a vdom (virtual dom) tree, an immutable datastructure that represents the intended state of the dom. It is the view function that makes the concrete presentation choices that are left unsaid by the model. The view determines some of the dynamic behavior of the application as well, since within the vdom you can register callbacks on keypresses or mouse clicks that enqueue actions to later be applied to the state.

We can wrap this up into a module signature, as follows.

  1. module type App_intf = sig
  2. module Model : sig
  3. type t
  4. end
  6. module Action : sig
  7. type t
  8. val apply : t -> Model.t -> Model.t
  9. end
  11. val view : Model.t -> (Action.t -> unit) -> Vdom.t
  12. end

Note that the second argument to view is a function that can be used for scheduling actions to occur. This function is used to build callbacks which are attached to the vdom.

We can combine this with a simple interface for starting up an application:

  1. val start_app
  2. : (module App_intf with type Model.t = 'model)
  3. -> init:'model
  4. -> unit

which is responsible for running the display loop.

This isn't quite enough; in particular, I've omitted the necessary hooks for kicking off asynchronous processes for doing things like grabbing data from a server. But the model above is a pretty good start, and gives you a sense of the structure of an Elm-style application.

This approach isn't too bad from a performance perspective. In particular, the on_startup function minimizes the amount of churn to the dom by, on every action, diffing the newly generated vdom against the previous instantiation, to produce a minimal patch to apply to the dom proper. And modifications to the dom itself are quite expensive.

But this approach doesn't scale to big, complex UIs. That's because, even though most actions change the model in only small ways, the full vdom tree has to be created every time. If the vdom is large and complicated, this is a serious problem.

Incremental Basics

Let's see how we can use Incremental to optimize generation of the vdom.

Imagine we want to display information about the state of a set of machines in our datacenter. For each machine, assume we have two pieces of data: the temperature, and whether that particular server is currently selected. Now, given incremental values representing the data for just one machine, how can we use Incremental to produce the vdom?

Incremental provides operators for building computations on top of incremental values. For example, the function Incr.map2 has this signature.

  1. val Incr.map2 : 'a Incr.t -> 'b Incr.t -> f:('a -> 'b -> 'c) -> 'c Incr.t

This lets us build a computation that takes two incremental inputs, and combines them to make a single new incremental. We can use this for generating an incremental vdom to display our machine.

  1. let view temperature selected =
  2. Incr.map2 temperature selected
  3. ~f:(fun temperature selected ->
  4. Vdom.text
  5. (if selected then [Vdom.class_ "selected"] else [])
  6. (sprintf "%F degrees" temperature))

We can write this a bit more naturally using the ppx_let syntax extension. Essentially, ppx_let allows us to encode maps using ordinary let binding syntax.

  1. let view temperature selected =
  2. let%map temperature = temperature and selected = selected in
  3. Vdom.text
  4. (if selected then [Vdom.class_ "selected"] else [])
  5. (sprintf "%F degrees" temperature)

The key issue here is that the code regenerating the text node will only be rerun when necessary, i.e., when the value of either selected or temperature have changes. In a complex view with lots of incremental inputs and lots of vdom nodes built on top of them, judicious use of map allow you to recompute only the vdom that are in need of an update.

Using bind

It turns out that map isn't enough. One limitation of map is that the dependencies introduced by map are static. e.g. Incr.map2 a b ~f will produce a node that reruns f every time a or b change.

But sometimes, you want dynamic rather than static dependencies. For a trivial example, imagine that our machine view has different inputs that might be used in different situations, say, there's a mode that determines whether uptime or temperature are displayed. We could implement such a view on top of map as follows.

  1. let view temperature uptime selected mode =
  2. let%map temperature = temperature
  3. and uptime = uptime
  4. and selected = selected
  5. and mode = mode
  6. in
  7. Vdom.text
  8. (if selected then [Vdom.class_ "selected"] else [])
  9. (match mode with
  10. | Temperature -> sprintf "%F degrees" temperature
  11. | Uptime -> Time.Span.to_string uptime)

Here, the appearance of the dom node is dynamic, but the dependencies are not. Even if you're in Temperature node, you'll still recompute the Vdom when the uptime changes.

On this small scale, the extra cost is trivial. But as you consider larger more complicated views, the ability to control dependencies precisely can have real value.

In this case, we can control the dependencies using bind. Here's the signature of bind:

  1. val bind : 'a Incr.t -> ('a -> 'b Incr.t) -> 'b Incr.t

The signature is deceptively simple, but it lets you do something powerful. In particular, Incr.bind i f returns an incremental whose dependencies, and even whose interior nodes, are chosen dynamically by f.

Here's a simple rewrite of the code above that takes advantage of bind.

  1. let view temperature uptime selected mode =
  2. let text =
  3. Incr.bind mode (function
  4. | Temperature ->
  5. let%map x = temperature in sprintf "%F degrees" x
  6. | Uptime ->
  7. let%map x = uptime in Time.Span.to_string x
  8. )
  9. in
  10. let%map text = text and selected = selected in
  11. Vdom.text
  12. (if selected then [Vdom.class_ "selected"] else [])
  13. text

Here, bind lets us create a text incremental that depends either on the temperature or on the humidity, depending on the mode. We can write this with our syntax extension, using its specialized match syntax.

  1. let view temperature uptime selected mode =
  2. let text =
  3. match%bind mode with
  4. | Temperature ->
  5. let%map x = temperature in sprintf "%F degrees" x
  6. | Uptime ->
  7. let%map x = uptime in Time.Span.to_string x
  8. in
  9. let%map text = text and selected = selected in
  10. Vdom.text
  11. (if selected then [Vdom.class_ "selected"] else [])
  12. text

One thing that's nice about ppx_let is that it makes it easier to separate thinking about what your code does from how it's incrementalized. If you ignore the %map and %bind annotations, what you're left with is enough to understand the meaning of the computation that's being incrementalized. The annotations are only important for understanding the performance characteristics of the incremental recomputation.

Decomposing incrementals

Here's an updated version of our App_intf which includes Incremental.

  1. module type App_intf = sig
  2. module Model : sig
  3. type t
  4. end
  6. module Action : sig
  7. type t
  8. val apply : t -> Model.t -> Model.t
  9. end
  11. val view : Model.t Incr.t -> (Action.t -> unit) -> Vdom.t Incr.t
  12. end

The only change is the view function, which instead of taking a Model.t and returning Vdom.t now takes a Model.t Incr.t and returns a Vdom.t Incr.t. And instead of calling this function on every update, we simply call it once at the beginning. The start_app function would be responsible for repeatedly updating the Model.t Incr.t as the model changes, and can then read off the new vdom node from the Vdom.t Incr.t.

This all sounds good on the surface, but there's a fly in this ointment, which is that my earlier examples were built on the assumption that the different inputs to the computation were already broken down into separate incremental values. But here, we have one big incremental value containing the entire model. How do we apply Incremental effectively in this case?

Let's revist our server-display example from before. Now, instead of assuming we have a different incremental for each property of a server, imagine we have one incremental representing the full state of one server. We can use the following as our model type:

  1. module Model = struct
  2. type display_mode = Temperature | Uptime
  3. type t = { temperature: float;
  4. uptime: Time.Span.t;
  5. selected: bool;
  6. mode: display_mode;
  7. }
  8. [@@deriving fields]
  9. end

The deriving annotation above provides us with accessor functions for each field, which will be useful shortly.

It's easy enough to write a view function that returns an incremental that recomputes in its entirety every time the model changes, as shown below.

  1. let view model =
  2. let%map model = model in
  3. let text =
  4. match model.mode with
  5. | Temperature ->
  6. sprintf "%F degrees" model.temperature
  7. | Uptime ->
  8. Time.Span.to_string model.uptime
  9. in
  10. Vdom.text
  11. (if model.selected then [Vdom.class_ "selected"] else [])
  12. text

And for this tiny example, that's likely the correct thing to do. But we'll want to incrementalize more precisely for real world examples, so let's see how we could do that in this case.

What we effectively need to do to incrementalize this is to convert our one big incremental model into a number of smaller incrementals. We can do that by projecting out individual components of the model using map.

For example, if we write:

  1. Incr.map model ~f:(fun m -> m.mode)

or, equivalently

  1. Incr.map model ~f:Model.mode

We'll get an incremental that contains only the mode. Critically, incrementals that are depend on this one will only update when the mode actually changes, not, say, the temperature is updated. That's because Incremental cuts off computations when the new output is physically equal to the old one, so that even if the model changes, each projected incremental will only propagate the computation if its data has changed.

We can use this approach to first project out the fields of the model record into different incrementals, and then build out computation on top of that. That looks like this:

  1. let view model =
  2. let mode = Incr.map model ~f:Model.mode in
  3. let temperature = Incr.map model ~f:Model.temperature in
  4. let selected = Incr.map model ~f:Model.selected in
  5. let uptime = Incr.map model ~f:Model.uptime in
  6. let text =
  7. match%bind mode with
  8. | Temperature ->
  9. let%map x = temperature in sprintf "%F degrees" x
  10. | Uptime ->
  11. let%map x = uptime in Time.Span.to_string x
  12. in
  13. let%map text = text and selected = selected in
  14. Vdom.text
  15. (if selected then [Vdom.class_ "selected"] else [])
  16. text

And this has basically the right incremental structure.

This is a good start, but we still don't really have the full story. In particular, the above approach to incrementalization makes sense when our overall data is organized as simple static structures like records. But it's not clear what to do when we have more complex data structures. For example, what if we had a collection of machines stored in a map or a set? We don't yet have a way of efficiently handling this kind of complex abstract data type using Incremental.

More on that in my next post.

Incremental computation and the web

I've recently been thinking about the world of JavaScript and web applications. That's odd for me, since I know almost nothing about the web. Indeed, Jane Street's use of web technologies is quite minimal -- nearly all of our user interfaces are text based, and all told we've been pretty happy with that.

But there are real limitations to console apps, and if you need something richer that's cross-platform, the web is pretty appealing. For us it's made yet more appealing by the fact that OCaml, our language of choice, compiles into JavaScript via js_of_ocaml.

So recently, when a few people internally got interested in trying out JavaScript-based UIs, I dug in a little to try to understand the landscape, and help us figure out what approach to take.

Virtual DOM, all at once

I started by trying to understand more about the approaches taken in the wider world for building JavaScript apps. One idea the struck me as particularly interesting was virtual DOM. Virtual DOM showed up first in Facebook's React, but has since inspired other implementations, like Matt Esch's virtual-dom library, which in turn is the basis of the Mercury web framework and Elm's Html library. Other libraries have wrapped and extended React, like ClojureScript's Om framework.

To understand the appeal of virtual DOM, you first need to understand what the world looks like without it. JavaScript applications in the browser are fundamentally tied to the DOM, which is the tree of objects that reflects the structure of the HTML that the page is built from. The DOM is wired into the browser, and mutating the DOM is how you change what's shown on the screen, a necessity for a dynamic web app.

But working directly with DOM can be awkward. For one thing, it encourages you to write your display logic twice: once to create the initial state of your page, and then again for the code that updates the DOM in response to external events.

But it's worse than just having to write your logic twice: the second time is also trickier. That's because, for performance reasons, you need to minimize changes to the DOM, since those changes can cause expensive reflows and the like. Doing this kind of change minimization by hand can be pretty painful.

The goal of virtual DOM is to let you write your display logic just once, and to do so in a convenient, straight-ahead style. The idea is simple: instead of modifying the DOM directly, you create immutable trees that represent the DOM you'd like to have, a virtual DOM. Every time your state changes, you recompute the virtual DOM, and then diff the new virtual DOM against the old. The resulting patch can then be applied to the actual DOM, which minimizes changes to the DOM proper.

This makes it easier to express your display logic cleanly. Elm uses virtual DOM as part of the Elm architecture. In that approach, the state of the application is kept in a model value, which abstractly represents the state of the application, omitting presentation details. A separate view function is used to convert the model into a virtual DOM tree that describes the page that should be shown to the user.

The application is made dynamic by adding an action type which summarizes the kinds of changes that can be made to the model. Actions are enqueued either from callbacks embedded in the virtual DOM, or by other jobs that communicate with the outside world via web requests and the like. When the action is applied to the model, a DOM update is done by recomputing the virtual DOM, and then diffing and patching the real DOM accordingly.

Virtual DOM, incrementally

As described, the above approach involves computing the virtual DOM from scratch after every action. This is done even if the change to the DOM implied by the action is small, which is the common case. Essentially, every key press and mouse click causes the entire virtual DOM to be recomputed.

In a world where DOM updates are the only expense that matters, this isn't so bad. And for sufficiently small web applications, that's almost right. But once you're creating large, dynamic UIs, this simple story falls apart, and the cost of recreating the virtual DOM every time matters.

That's why in all of these virtual DOM APIs and frameworks, there's some form of incrementalization built in, a way to avoid paying the full cost of rebuilding the virtual DOM when the logical changes are small.

In React, for example, the state is organized into a set of hierarchical components, each with its own render function. These components are structured to match the structure of the HTML that they generate, with the idea that you'll only have to re-render the few components whose input data has actually changed. React effectively memoizes the render computation at the component level.

Elm, rather than tying the incrementalization directly to a framework-level notion of component, lets you introduce memoization in the construction of individual virtual DOM nodes. To do this, Elm's Html module exposes a set of "lazy" functions with roughly these signatures (shown with OCaml syntax).

  1. val lazy1 : ('a -> Html.t) -> 'a -> Html.t
  2. val lazy2 : ('a -> 'b -> Html.t) -> 'a -> 'b -> Html.t
  3. val lazy3 : ('a -> 'b -> 'c -> Html.t) -> 'a -> 'b -> 'c -> Html.t

Here, the first argument is the render function, and the remaining arguments are the values to be passed to the render function.

The idea is that a call to one of these lazy functions won't call the render function immediately. Instead, it creates a special node that stores the render function and its arguments for later. The render function is only called as part of the process of diffing two virtual DOM trees. When the diff gets to the point of comparing two such nodes, it first compares the things that the node was built from, i.e., the render function and its arguments. If they're the same, then the diff is empty. If they differ, then the render function is run to compute more of the tree, and the diffing process continues from there.

It's worth noting that forcing the render function for a given node to run will create more of the virtual DOM tree, but it won't necessarily build everything below that node. In particular, the tree that's created may contain yet more lazy nodes, which won't be forced until the diffing process gets to them.

By making enough nodes lazy in this way, you can incrementalize the computation of the entire virtual dom tree, only forcing the recomputation of parts of the virtual dom that could have changed given the changes in the underlying data model.

Elm's approach has some limitations. While it doesn't limit memoization to a particular notion of a component, it does tie it to nodes in the DOM tree. This can be limiting, since it prevents you from sharing other parts of the computation that don't result concretely in DOM nodes.

It's also a little anti-modular, in that you basically need to call your lazy function on simple values and top-level functions, so ordinary functional programming modularization techniques, which often rely on passing around closures, don't work as well as you'd hope.

Beyond virtual DOM

Virtual DOM isn't the only approach to simplifying the process of programming the DOM. Another example I ran across is Mike Bostock's amazing D3 library. D3 has some of the same goals as virtual DOM, in that it aims to provide a nice way to construct complex web pages based on some more abstract data model. Like virtual DOM, D3's approach lets you specify the view logic once, while producing a view that responds efficiently to changing data. D3 is doing this in the service of data visualization, but the approach it takes is not limited to that domain.

Where virtual DOM encourages you to think of your view calculation as an all at once affair, D3 makes you think about incrementalization explicitly where it matters. In particular, when you specify how the view changes in response to data, you do so by explicitly specifying what happens in three cases: enter, update, and exit. The enter case corresponds to new data points arriving, update corresponds to data points that are changing, and exit corresponds to data being removed.

These transformations are specified using a spiffed up version of the DOM selectors API, which lets you can select a collection of nodes by stating conditions that those nodes satisfy. You can then specify ways of transforming those nodes, and, somewhat surprisingly, specify the creation of nodes that don't exist yet. This is done using the append operation, and is all part of what's is called data binding in the D3 world.

If this sounds confusing, well, I found it confusing too. But the D3 approach has some good things going for it. For one thing, it gives you a natural way of thinking about animations, since you can specify simple animations to run on the enter/exit/update actions, which is more awkward in virtual DOM based approaches.

To borrow an analogy from circuits, virtual DOM is level-triggered, meaning the view depends only on the current value of the state; but D3 is edge-triggered, meaning that the display logic can depend on the precise transition that's occurring. This is a real difference in the models, but I'm not sure how important it is in practice.

To some degree, you can get around this issue on the virtual DOM side by expressing more time-dependent information in the model. Also, you can add edge-triggered events on top of your virtual DOM, which React does. That said, it's not as front-and-center in the Virtual DOM API as it is with D3, where edge-triggered animations are an easy and natural part of the design.

Incrementality everywhere

Given that incrementality seems to show up in one form or another in all of these web frameworks, it's rather striking how rarely it's talked about. Certainly, when discussing virtual DOM, people tend to focus on the simplicity of just blindly generating your virtual DOM and letting the diff algorithm sort out the problems. The subtleties of incrementalization are left as a footnote.

That's understandable, since for many applications you can get away without worrying about incrementalizing the computation of the virtual DOM. But it's worth paying attention to nonetheless, since more complex UIs need incrementalization, and the incrementalization strategy affects the design of a UI framework quite deeply.

The other benefit of thinking about incrementalization as a first class part of the design is it can lead you in new directions. In that vein, I've been experimenting with using self-adjusting computations, as embodied by our Incremental library, as another approach to incrementalizing computation of the virtual DOM.

Self-adjusting computations is a general purpose approach to building efficient on-line computations developed by Umut Acar in his dissertation. Thinking about Incremental in the context of GUI development has lead us to some new ideas about how to build efficient JavaScript GUIs, and some new ideas about how Incremental itself should work. I hope to write more about that in an upcoming post.

(You can check out the next post here.)


Most of what I've written here comes from talking to people who know a lot more about the web than I do, and I wanted to thank them. I had some very enlightening conversations with Jordan Walke about React's design and history. I've also talked a bunch to Evan Czaplicki about Elm, and I'm indebted to Spiros Eliopoulos, who helped me learn a bit about D3, in part through his ocaml-d3 bindings. Also thanks to Hugo Heuzard and Izzy Meckler for writing a bunch of useful code and helping me learn about js_of_ocaml and more generally about various details of JavaScript and modern browsers.

Why OCaml?

Here's a post from a talk I gave this last summer during our internship program about why we use OCaml. It spends a lot of time on how OCaml fits into the space of programming language designs, and why we think OCaml is in a real sweet spot in that design space, especially for the kind of work we do at Jane Street.

Warning: it's a very informal talk, with lots of questions and answers from the audience, not all of which are clearly audible, for which I apologize. Still, I hope people will get something out of it.

Testing with expectations

Testing is important, and it's hard to get people to do as much of it as they should. Testing tools matter because the smoother the process is, the more tests people will write.

Especially in the functional programming world, most of the talk about testing tools is focused on tools for property-based testing, like the various and sundry quickcheck-style systems. These are great, but sometimes, you don't want to write down properties --- what you want is to write your tests in terms of simple, concrete scenarios.

We've recently added support for what we call expect tests, a kind of test optimized for this kind testing. Expect tests allow you to write test scenarios without having to manually write out the output generated by the code you're testing. Instead, that output is captured and recorded automatically for you, in a way that makes it easy to integrate into the source of the test.

Our expect tests were inspired by Mercurial's unified test format. Unified tests are designed for testing command-line tools like hg, and so are specialized to the shell.

Here's an example using cram, an implementation of the unified test idea that is independent of Mercurial. Let's say we want to test the UNIX sort command. We might start by writing a test file, simple.t, that looks like this.

Dump some lines into a file

  $ cat > colors << HERE
  > red
  > yellow
  > green
  > HERE

sort the file and dump to stdout

  $ sort colors

If you then run cram on this, cram will show you that the test failed by showing you a diff.

expect-test $ cram simple.t
--- simple.t
+++ simple.t.err
@@ -10,5 +10,11 @@
 sort the file and dump to stdout

   $ sort colors
+  green
+  red
+  yellow

It also creates a new file, simple.t.err, which contains the output of running the script, intermixed with the script itself. You can accept the new version just by moving the err file over the original.

mv simple.t.err simple.t

If you run cram now, you'll see that the tests pass.

$ cram simple.t
# Ran 1 tests, 0 skipped, 0 failed.

If we break the tests somehow, then the diff will show us exactly what failed. For example, if we replace sort with cat, here's what Cram will show us:

bash-3.2$ cram simple.t 
--- simple.t
+++ simple.t.err
@@ -10,7 +10,7 @@
 sort the file and dump to stdout

   $ cat colors
-  green
+  green

# Ran 1 tests, 0 skipped, 1 failed.

Note how good the diff is for seeing how your test failed.

Starting with the development of Iron last year, we started using cram tests pretty extensively for command-line programs. We found it to be a very productive idiom, but it's pretty awkward to apply outside of the command-line domain. That's why we started thinking about how to get the benefits of cram, but in OCaml.

Breaking out of the shell

Unified tests are great for three reasons:

  • they let you combine the scenario and the output of that scenario (and comments) into one readable file
  • they help you construct the file automatically
  • they display test failures as easy-to-interpret diffs.

None of these advantages is tied to using the shell. To bring this to OCaml, though, we needed to figure out a reasonable way of embedding these tests in an OCaml program, without breaking all of the tooling. We did this by leveraging OCaml's annotations, which let us get the data we need in place without breaking from the ordinary syntax of an OCaml program. That means that tools like merlin and ocp-indent and editor modes like tuareg will work without incident.

We can write the OCaml analogue of our cram test by creating the following file, named simple.ml.

  1. open Core.Std
  3. let%expect_test "simple sort" =
  4. let sorted = List.sort ~cmp:String.compare ["red";"yellow";"green"] in
  5. [%sexp_of: string list] sorted |> Sexp.to_string_hum |> print_endline;
  6. [%expect {| |}]

Here, the let%expect_test introduces a new test, and registers it with our inline test framework. [%expect {| |}] introduces a section where output is captured, and multiple such declarations can go in a single test.

Since we haven't actually filled in the output, running the test will fail. Here's the diff it would show.

open Core.Std

  let%expect_test "simple sort" =
    let sorted = List.sort ~cmp:String.compare ["red";"yellow";"green"] in
    [%sexp_of: string list] sorted |> Sexp.to_string_hum |> print_endline;
-   [%expect {| |}]
+   [%expect {| (green red yellow) |}]

As with cram, a new file will have been generated, in this case called simple.ml.corrected, containing the updated test file. As with cram, you can accept the new test results by just copying the generated file over the original.

The above example is simple, but expect tests really shine when you start doing bigger and more complicated scenarios. And the ability to do this in ordinary OCaml code means you can use it for a much wider set of applications.

The source hasn't been released yet, but it will come out as part of our ordinary public release process, and we hope others will give it a spin when it does come out.