What the interns have wrought, 2016

Now that the interns have mostly gone back to school, it's a good time to look back at what they did while they were here. We had a bumper crop -- more than 30 dev interns between our London, New York and Hong Kong offices -- and they worked on just about every corner of our code-base.

In this post, I wanted to talk about just one of those areas: building efficient, browser-based user interfaces.

Really, that's kind of a weird topic for us. Jane Street is not a web company, and is not by nature a place that spends a lot of time on pretty user interfaces. The software we build is for our own consumption, so the kind of spit and polish you see in consumer-oriented UIs is just not that interesting here.

But we do care about having functional, usable user interfaces. The work we do is very data driven, and that rewards having UIs that are good at presenting up-to-date data clearly and concisely. And we want to achieve that while keeping the complexity of developing these UIs low.

Historically, almost all of our UIs have been text-based. While I love the command line, it does impose some limitations. For one thing, terminals don't offer any (decent) graphical capabilities. But beyond that obvious constraint, getting all the pieces you need for a decent UI to work well in a terminal, from scrolling to text-entry widgets, requires a lot of work that just isn't necessary in a browser.

So this year, we've finally started pushing to make it easy for us to write browser-based applications, in particular relying on OCaml's shockingly good JavaScript back-end. This has allowed us to write web applications in OCaml, using our usual tools and libraries. As I've blogged about previously, we've also been exploring how to use Incremental, a framework for building efficient on-line computations, to make browser UIs that are both pleasant to write and performant.

That's roughly where we were at the beginning of the summer: some good ideas about what designs to look at, and a few good foundational libraries. So the real story is what our interns, Aaron Zeng and Corwin de Boor, did to take us farther.

Web Tables

Aaron Zeng's project was to take all of the ideas and libraries we'd been working on and put them to use in a real project. That project was web-tables, a new user-interface for an existing tool developed on our options desk, called the annotated signal publisher. This service provides a simple way for traders to publish and then view streams of interesting tabular data, often based on analysis of our own trading or trading that we see in the markets.

The publisher fed its data to Catalog, our internal pub-sub system. From Catalog, the data could be viewed in Excel, or in our terminal-based Catalog browser. But neither of these approaches worked as well or as flexibly as we wanted.

Enter web-tables. What we wanted was pretty simple: the ability to display tabular data from the annotated signal publisher with customizable formatting, filtering and sorting. This involved breaking a lot of new ground, from figuring out how to do the sorting and filtering in an efficient and incremental way, to fixing performance issues with our RPC-over-websockets implementation, to figuring out a deployment and versioning story that let people easily create and deploy new views of their data.

One of the great things about the project is how quickly it was put into use. The options guys started using web-tables before the project was even really finished, and there was a tight feedback loop between Aaron and his mentor Matt Russell, who was using the tool on a daily basis.

Optimizing rendering

Aaron's web-tables work used incr_dom, a small framework that sets up the basic idioms for creating UIs using Incremental. As part of that work, we discovered some limitations of the library that made it hard to hit the performance goals we were aiming for. Corwin de Boor's project was to fix those limitations.

The key to building an efficient UI for displaying a lot of data is figuring out what work you can avoid. To this end, Corwin wanted to build UIs that logically contained thousands or even millions of rows, while only actually materializing DOM nodes corresponding to the hundred or so rows that are actually in view.

In order to figure out which nodes to render, he had to first figure out which nodes would be visible, based on the location of the browser's viewport. This in turn required a way of looking up data-points based on where that data would be expected to render on the screen.

Corwin did this by building a data structure for storing the expected height of each object that could be rendered, while allowing one to query for a node based on the sum of the heights of all the nodes ahead of it. He did this by taking an existing splay tree library and rewriting it so it could be parameterized with a reduction function that would be used to aggregate extra information along the spine of the splay tree. By integrating the reduction into the splay tree itself, the necessary data could be kept up to date efficiently as the splay tree was modified.
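To make that concrete, here's a rough sketch of what such an augmented-tree interface might look like. This is not Corwin's actual code; the module name, the function names, and the shape of the reduction parameters are all illustrative assumptions.

    (* A hypothetical sketch of an augmented splay-tree interface: each element
       has a measure (here, its expected pixel height), and the tree keeps the
       reduction of those measures along its spine, so the row at a given
       scroll offset can be found in logarithmic time. Names are illustrative. *)
    module type Measured_splay_tree = sig
      type 'a t

      val empty
        :  measure:('a -> float)            (* e.g. the expected height of a row *)
        -> reduce:(float -> float -> float) (* e.g. ( +. ) *)
        -> zero:float
        -> 'a t

      val insert : 'a t -> index:int -> 'a -> 'a t
      val remove : 'a t -> index:int -> 'a t

      (* Find the element whose accumulated measure contains the given value,
         e.g. the row sitting at a given vertical offset. *)
      val find_by_accumulated_measure : 'a t -> float -> 'a option
    end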

Corwin also spent a lot of time improving incr_dom itself, taking inspiration from other systems like React and the Elm language. We even corresponded a bit with Jordan Walke and Evan Czaplicki, the authors of React and Elm respectively.

One thing that came out of this was a neat trick for making the incr_dom API cleaner by using a relatively new feature of OCaml called open types. The details are a little technical (you can see the final result here and here), but I think what we ended up with is a bit of an advance on the state of the art.

There were a lot of other bits and pieces, like improving the handling of keyboard events in js_of_ocaml, creating a new incremental data-structure library called incr_select for more efficiently handling things like focus and visibility, and restructuring the incr_dom APIs to make them simpler to understand and open up new opportunities for optimization.

By the end, Corwin was able to build a demo that smoothly scrolled over a million rows of continuously updating data, all while keeping update times below 5ms. We're now looking at how to take all of this work and feed it into real applications, including Aaron's work on web-tables.

Sound like fun?

If this sounds like interesting stuff, you should consider applying for a summer internship with us!

Jane Street internships are a great learning experience, and a lot of fun to boot. You even get a chance to travel: interns get to visit Hong Kong, London or New York (depending on where they started!) as part of their summer.

And the projects I described here are really just a taste. Here are some other projects interns worked on this summer:

  • Building a low-latency order router.
  • Adding data representation optimizations to the OCaml compiler.
  • A service for simulating a variety of network failures, to make it easier to test distributed applications.
  • Making it possible to write Emacs extensions in OCaml (this was actually Aaron Zeng's second project).

and that's still just a small sample. One thing I love about the work at Jane Street is the surprising variety of problems we find ourselves needing to solve.

Unraveling of the tech hiring market

Recruiting talented people has always been challenging.

In some years that meant competing with a hot new company that aggressively courted every fresh graduate with promises of stock options and IPO glory.  In other years there wasn’t a specific company so much as an entire rising industry looking for people (I’m looking at you cloud services, driverless cars, and peer-to-peer sharing).  Either way, we understood the yearly back and forth.  Our job was to explain to candidates how we stacked up, and more importantly, why a career at Jane Street might be the right choice for many of them.

But this year I got to learn a new name for a new challenge.  “Unraveling”.

I first encountered it in a book I was reading for fun: "Who Gets What, and Why", by the Nobel Prize-winning economist Alvin Roth.  He does a lovely job explaining the idea of a matching market.  In a matching market each person wants only one of each item, each item is unique, and each item can be given to at most one person at a time.  Jobs are a classic matching market, and just like any market, matching markets can work well, or poorly.

Unraveling is one of the primary things that makes a matching market fail.  When a market unravels, matches start to happen earlier and earlier, to the point where people no longer get a complete view of their options.  In the book, Roth relates the story of a person who stepped off a plane to find three voicemails on his phone.  The first offered him a job, the second urged him to respond soon, and the last rescinded the offer because he hadn't responded quickly enough.

We call them exploding offers, and this year they have gotten completely out of hand as companies race to the bottom in their efforts to recruit the next wave of interns and fresh graduates.

Colleges try to impose deadline limits explicitly to stop unraveling, and in the past these have largely been honored.  The cheating and fudging, such as it was, was kept to the fringes.  But this year it seems like the seal is broken, and we've seen major companies delivering internship and full-time offers with two-week (and shorter) hard deadlines.  Other companies now routinely deliver expiring bonus offers for signing early.  Many of these offers circumvent or outright break the guidelines set down by schools, and if past matching markets are a model for this one, next year will come with even earlier offers and worse conditions.

This unraveling has been the subject of a lot of discussion, both internally at Jane Street and with the various schools we recruit at, who see it - rightly - as bad for their students.  How can someone make a thoughtful decision about where they want to build a career without the time to interview at more than one or two places?  Unfortunately, most of this discussion is out of the public light, and so the unraveling continues.

We can't control the actions of others, but we also don’t have to follow the herd, so we'd like to be clear:

Jane Street is committed to making sure that you have the time and information you need to decide on an offer from us.  Our offer letters do have good-until dates as a matter of professional practice, but we try to work with every candidate to choose a date that works for them.  We are also happy to extend the date if something unexpected comes up, or, frankly, if someone just needs more time.

Choosing where to start your career is a big decision and we hope you have the time to make a good one.

Do you love dev tools? Come work at Jane Street.

In the last few years, we've spent more and more effort working on developer tools, to the point where we now have a tools-and-compilers group devoted to the area, for which we're actively hiring.

The group builds software supporting around 200 developers, sysadmins and traders on an OCaml codebase running into millions of lines of code. This codebase provides the foundation for the firm's business of trading on financial markets around the world.

Software that the group develops, much of which is written in-house, includes:
- build, continuous integration and code review systems;
- preprocessors and core libraries;
- editor enhancements and integration.

The group also devotes significant time to working on the OCaml compiler itself, often in collaboration with external parties, with work being released as open source. Recent examples include the Flambda optimization framework and the Spacetime memory profiler.

Candidates need to be familiar with a statically typed functional language and possess some amount of experience (within industry or otherwise) in this kind of infrastructure.

We're looking for candidates for both our New York and London offices. Benefits and compensation are highly competitive.

If you are interested, please email tools-and-compilers-job@janestreet.com with a CV and cover letter.

Let syntax, and why you should use it

Earlier this year, we created ppx_let, a PPX rewriter that introduces a syntax for working with monadic and applicative libraries like Command, Async, Result and Incremental. We've now amassed about six months of experience with it, and we've seen enough to recommend it to a wider audience.

For those of you who haven't seen it, let syntax lets you write this:

    let%bind <var> = <expr1> in <expr2>

instead of this:

    <expr1> >>= fun <var> -> <expr2>

with analogous support for the map operator, via let%map. The choice of monad is made by opening the appropriate Let_syntax module, e.g., you might open Deferred.Result.Let_syntax, or Incr.Let_syntax. Note that Async.Std now opens Deferred.Let_syntax by default.
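As a small, contrived sketch of what this looks like in practice (the file-reading example and its names are mine, not from the original library documentation):

    (* A minimal sketch, assuming Core and Async are available. Opening
       Async.Std brings Deferred.Let_syntax into scope, so the let%map below
       is the Deferred map. *)
    open Core.Std
    open Async.Std

    let count_lines filename =
      let%map contents = Reader.file_contents filename in
      List.length (String.split_lines contents)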

There's also support for match statements, e.g.:

    match%bind <expr0> with
    | <pattern1> -> <expr1>
    | <pattern2> -> <expr2>

is equivalent to:

    <expr0> >>= function
    | <pattern1> -> <expr1>
    | <pattern2> -> <expr2>

There's also support for parallel binds and maps, using the and syntax. So this:

    let%map <var1> = <expr1> and <var2> = <expr2> in <expr3>

is (roughly) equivalent to this:

    map3 <expr1> <expr2> ~f:(fun <var1> <var2> -> <expr3>)

This pattern generalizes to arbitrarily wide maps. It's implemented using map and the both operator, which sacrifices some performance in exchange for generality, vs the explicit mapN operators.
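To illustrate roughly how that works (this is a sketch of the idea, not the exact code ppx_let emits), here's the shape of the rewrite, using the option type as a stand-in applicative:

    (* Stand-ins for the Let_syntax functions, using options: *)
    let both a b =
      match a, b with
      | Some x, Some y -> Some (x, y)
      | _ -> None

    let map a ~f = match a with Some x -> Some (f x) | None -> None

    (* [let%map x = a and y = b and z = c in x + y + z] becomes, roughly: *)
    let demo a b c =
      map (both a (both b c)) ~f:(fun (x, (y, z)) -> x + y + z)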


My experience with the new syntax has been quite positive. Here's my summary of the wins.

Parallel binds

For libraries like Command.Param and Incremental, where multi-way map functions (like `map2` and `map3`) are important, it's been a pretty big win in terms of the comprehensibility of the resulting code. This tends to be the case for applicatives like Command.Param, which are just monads without bind. The big advantage is that by writing:

    let%map x1 = some_very long expression
    and x2 = some_other long expression
    and x3 = yet another_thing
    in
    x1 + x2 / x3

we get to put the variable names directly next to the expressions they're being bound to. Using an explicit mapN operator, the result is more awkward:

    map3
      (some_very long expression)
      (some_other long expression)
      (yet another_thing)
      ~f:(fun x1 x2 x3 -> x1 + x2 / x3)

This is error prone, since it's easy to mix up the variables and the expressions. To avoid the corresponding issue in the original Command library, we used some fancy combinators and the dreaded step operator, leading to some hard to understand code. The let-syntax equivalents are materially easier to use.

Variables first

Using a let-like syntax lets you put the variable before the definition, which follows the pattern of ordinary OCaml code, and makes it a bit easier to read. This cleans up some otherwise awkward patterns that are pretty common in our code. In particular, instead of this:

    begin
      <expr1>;
      let <var> = <expr2> in
      <expr3>
    end
    >>= fun meaningful_variable_name ->
    <expr4>

You can write this:

    let%bind meaningful_variable_name =
      <expr1>;
      let <var> = <expr2> in
      <expr3>
    in
    <expr4>

which flows a bit more naturally, in part because the meaningful variable name comes first, and in part because the extra begin and end are dropped.

Connecting bind to let

Let binds are a lot like monadic binds, even before you add in any special syntax. That is, this

    <expr1> >>= fun x -> <expr2>

is a lot like this.

    let x = <expr1> in <expr2>

This is why monads are sometimes described as "programmable let-binds" (or, relatedly, "programmable semicolons", which are just let-binds with a unit argument.)

I've found this to be a useful analogy in understanding monads, and the analogy is made clearer with let syntax. We have some preliminary reports of this making monadic code more approachable for beginners, which lines up with my intuition.

The similarity between ordinary lets and monadic lets also makes diffs easier to read. e.g., in Async, if some function goes from being synchronous to deferred, the change at the call point would now be from this

    let x = some_synchronous_thing () in
    more things

to this.

    some_asynchronous_thing ()
    >>= fun x ->
    more things

With let-syntax, we would instead change it to this.

    let%bind x = some_asynchronous_thing () in
    more things

I.e., the only thing that changes is the addition of %bind. The resulting diff is more targeted, making the substance of the change a bit easier to see, and making refactorings that add or remove blocking easier to do and to understand.


It's not all wine and roses. There are some downsides to let-syntax:

It's new and different

Enough said.

It's kinda ugly

This is a matter of taste, but I've heard some distaste for the percent sign itself. That's something forced on us by PPX, but I don't exactly disagree.

Also, the %bind and %map are a little wordy. There's been some talk of adding the ability to define alternate let syntaxes in OCaml proper, which would allow you to write something like this.

    let* x = some_asynchronous_thing () in
    let* y = some_other_thing () in
    let+ z = a_third thing in
    x + y + z

Here, let* would be equivalent to let%bind, and let+ would be equivalent to let%map. Again, it's not clear to me that this would, all in, be a win.

I personally find the new syntax, all in, less ugly than using infix operators everywhere, but again, tastes vary.

Unit binds aren't great

In particular, because we have no "monadic semicolon" in the syntax, you have to go from this:

    <expr1>
    >>= fun () ->
    <expr2>

to this:

    let%bind () = <expr1> in
    <expr2>

which is not ideal, since it's not parallel to the normal semicolon syntax for this outside of the monad. We've looked at making it possible to do something like:

    <expr1> ;%bind
    <expr2>

which would be more parallel with ordinary OCaml syntax, but that's not yet possible, and it's not clear it's a net win.

It changes how you think about interleaving in Async

In Async, when you write:

    load_something ()
    >>= fun x ->
    process_thing x

you can think of the point where interleaving can happen as the place where the bind operator is found. With let-syntax:

    let%bind x = load_something () in
    process_thing x

the location is different, and somewhat less obvious. My experience has been that this was easy to adjust to and hasn't tripped me up in practice, but it's a concern.


A few thoughts on how to use let syntax effectively.

Let syntax for variables, infix for point-free

One might wonder whether there's any use for the infix operators once you are using Let_syntax. I believe the answer is yes. In particular, the style we've adopted is to use let syntax when binding a variable:

    let%bind x = some_expression in

and infix operators when going point-free, i.e., when not binding variables.

    let v = some_function x >>| ok_exn in

One special case of this is binding unit, where some people prefer to use the following pattern, since we don't have a nice monadic semi-colon yet.

    let%bind x = some_operation in
    some_other_operation >>= fun () ->
    let%bind y = yet_another_thing () in
    a_final_thing y

rather than:

    let%bind x = some_operation in
    let%bind () = some_other_operation in
    let%bind y = yet_another_thing () in
    a_final_thing y

Mixing monads

One change we made recently was to add the return function and the monadic infix operators to the Let_syntax module that one opens to choose a monad. This has the useful property that opening a Let_syntax module switches you cleanly from one monad to another; mixing multiple monads in the same scope is hard to think about.

Command.Param and Deferred

One interesting case that comes up is mixing the Command.Param syntax with the Deferred syntax. This one is pretty easy to solve, because you don't typically need to mix them together. It's just that in the body of the command, you often want Deferred, but in the definition of the command line parser, you want to use Command.Param. This can be handled by doing a local open of Command.Param.Let_syntax or Deferred.Let_syntax as necessary, as in the sketch below.
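Here's a rough sketch of the shape this takes. It assumes Core.Std and Async.Std are open, and filename_param is a hypothetical Command.Param.t standing in for the real argument-parser definition.

    (* Command.Param's Let_syntax for the argument parser, Deferred's
       Let_syntax in the body. [filename_param] is a hypothetical parser. *)
    let param : (unit -> unit Deferred.t) Command.Param.t =
      let open Command.Param.Let_syntax in
      let%map filename = filename_param in
      fun () ->
        let open Deferred.Let_syntax in
        let%bind contents = Reader.file_contents filename in
        printf "%d bytes\n" (String.length contents);
        return ()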

Deferred and Deferred.Result

A more complicated case is choosing between the Deferred and Deferred.Result monads. In Async, there are infix operators that let you use both sets of bind and map operators (basically, with question-marks at the end of the ordinary infix operators for the Deferred.Result operators.)

Mixing these operators together in a single scope can be a little awkward, often leaving people to add and remove question-marks until things compile. With let syntax, you really have to pick a single monad, which is easier to read, but then requires some changes in behavior. In particular, you often need to move things from one monad to another. For example, if you're in the Deferred monad and get a result of type Deferred.Or_error.t, you might want to do something like this:

    let open Deferred.Let_syntax in
    let%bind v = some_operation x y >>| ok_exn in

Here, mapping ok_exn over the result takes the error and raises it, if necessary. Similarly, if you're using an operation that's in the ordinary Deferred monad but you're operating in the Deferred.Result monad, you might want to lift that operation up, i.e.:

    let open Deferred.Result.Let_syntax in
    let%bind v = some_other_operation x y |> Deferred.map ~f:(fun x -> Ok x) in

This is something of a mouthful, so we just added the Deferred.ok function, so on our latest release you can write:

    let open Deferred.Result.Let_syntax in
    let%bind v = some_other_operation x y |> Deferred.ok in

This idiom is useful whether or not you're using let syntax.

ppx_core: context-free rewriters for better semantics and faster compilation

At Jane Street, we have always been heavy users of pre-processors, first with camlp4 and now ppx. Pre-processing makes the infrastructure a bit more complex, but it saves us a lot of time by taking care of a lot of tedious boilerplate code, and in some cases makes the code a bit prettier.

All in all, our standard set has 19 rewriters:

  • ppx_assert
  • ppx_bench
  • ppx_bin_prot
  • ppx_compare
  • ppx_custom_printf
  • ppx_enumerate
  • ppx_expect
  • ppx_fail
  • ppx_fields_conv
  • ppx_here
  • ppx_inline_test
  • ppx_js_style*
  • ppx_let
  • ppx_pipebang
  • ppx_sexp_conv
  • ppx_sexp_message
  • ppx_sexp_value
  • ppx_typerep_conv
  • ppx_variants_conv

These rewriters fall into 3 big categories:

  1. type driven code generators: ppx_sexp_conv, ppx_bin_prot, ...
  2. inline tests and benchmarks: ppx_inline_test, ppx_expect, ppx_bench
  3. convenience: ppx_sexp_value, ppx_custom_printf, ...

The first category is the one that definitely justifies the use of pre-processors, until we get something better in the language itself.

With such a high number of code transformations, there is an important question of how they compose with each other. For instance, what happens if the output of one ppx contains code that should be rewritten by another ppx?

Since the switch from camlp4 to ppx about a year ago, category 1 transformers have been handled all at once, as a whole-AST mapping pass, by ppx_type_conv, while all the others were implemented as separate passes. With the previous list, that means 13 passes, given that 7 of them are ppx_type_conv plugins. As a result, the output depended on the order in which the various passes were applied.

Intuitively, one would think it's not a big deal, given that it is quite rare for a ppx to produce code that would be rewritten by another ppx. Still we ran into several issues over time:

  • Some ppx rewriters - such as ppx_inline_test, which rewrites [%%test ...] extensions - capture a pretty-print of their payload for debugging purposes. Depending on when ppx_inline_test is applied, the payload won't be the same, as it might have been expanded by other ppx rewriters, which is confusing for users.
  • A few ppx rewriters treat the payload of a specific extension point as a DSL to be interpreted. This is the case for ppx_sexp_value and ppx_sexp_message. If another ppx messes with the payload before them, the result is unspecified. We had such an issue with ppx_here: inside [%sexp ...], [%here] is interpreted by ppx_sexp_value and ppx_sexp_message and produces "<filename>:<line>:<column>", while outside it is interpreted by ppx_here and produces a record of type Lexing.position.

Initially we dealt with these issues by using a specific order in the default set of rewriters, but that's more of a dirty hack than a real solution. Developers are often not aware of this and might end up using the wrong order when using a custom set of rewriters. Moreover, this worked because we control the order with Jenga, but in open-source packages using oasis, ocamlbuild and ocamlfind, we have no control over the final ordering.

But apart from the semantic problems, there is an obvious performance problem: all the transformations are local, but still we are doing 12 passes over the entire AST. What a waste of CPU time!

The different ways of composing ppx rewriters

Before jumping into the subject of this post, let's recall the various methods one can use to compose ppx rewriters.

Via separate processes

The default method, adopted early by the community, is to define each transformation as a separate executable. To compose them, one just calls all the executables one by one. The main advantage of this approach is that each transformation is a black box and can do whatever dirty hacks it wants.

This is what you get when you are using a ppx by just putting the package name in your build system without doing anything special.

Via a driver

Another approach, which we developed at Jane Street, is to link all the transformations into a single executable. For this to work properly all transformations must use the same framework. Technically they all register themselves with ppx_driver via a call to Ppx_driver.register_transformation. Ppx_driver is then responsible for composing them.

There are several advantages to the second approach: since ppx_driver has knowledge of all transformations, it can do extended checks such as making sure that all attributes have been interpreted. This helps detect typos, which in practice saves a lot of debugging time. But what really interests us in this post is that it can use more clever composition methods.

Code transformations using ppx_driver can still be exported as a single executable compatible with the first method, which is why all Jane Street ppx rewriters can be used with both methods.

ppx_driver has an ocamlbuild plugin to simplify building custom drivers.

Context free transformations

Given that all transformations are local, it was clear that they should be defined as such; i.e. if all you want to do is turn [%sexp "blah"] into Sexp.Atom "blah", you don't need to visit the whole AST yourself. You just need to instruct whatever framework you are using that you want to rewrite [%sexp ...] extension points.

Context-free extension expander

We started with this idea a few months ago by adding an API in ppx_core to declare context-free extension expanders. For instance, this shows how you would declare a ppx that interprets an extension [%foo ...] inside expressions:

    open Ppx_core.Std

    let ext =
      Extension.declare "foo" Expression
        Ast_pattern.(...)
        (fun ~path ~loc <parsed-payload...> -> <expansion>)

    let () = Ppx_driver.register_transformation "foo" ~extensions:[ext]

The Ast_pattern.(...) bit describes what the extension expects as its payload.

Since ppx_driver knows about all the local extension expanders, it can expand them all in one pass over the AST. Moreover it can detect ambiguities and error out in such cases.

There was a choice to make as to whether to rewrite the AST in a bottom-up or top-down manner. We chose top-down, to allow extension expanders to interpret their payload before anyone else, so that they can correctly implement a DSL.

This solved most of the initial issues and reduced the number of passes to 7:

  • all extension expanders
  • ppx_type_conv
  • ppx_custom_printf
  • ppx_expect
  • ppx_fail
  • ppx_pipebang
  • ppx_js_style

ppx_expect wasn't initially defined as a context-free extension expander for technical reasons.

Making everything context-free

Recently we went even further and added a Context_free module to Ppx_core to cover all of our transformations. It doesn't support all possible rewritings, but it supports enough to implement a lot of common ones:

  • context-free extension expanders
  • some specific support to implement type-driven code generators
  • support for ppx rewriters interpreting a function application at pre-processing time, such as ppx_custom_printf that interprets !"<format>"

With this we reduced the number of passes to only 2:

  • context free transformations
  • ppx_js_style

ppx_js_style is still done in a separate pass for simplicity. It is run last to ensure we don't generate code that doesn't match our coding rules.

Now, whatever order developers specify their ppx rewriters in their build system, they will get the exact same output.

Seeing the exact passes

Ppx_driver got a new option to print the passes it will execute, for instance with ppx-jane, which is a standard driver with all of the Jane Street ppx rewriters linked in (available in the ppx_jane package in opam):

$ ppx-jane -print-passes

$ ppx-jane -print-passes -no-check

Safety checks are implemented as additional passes, which is why we see more than one pass by default.


No performance comparison was done when introducing context free extension expanders, but we did some for the second stage, when we changed all ppx rewriters to use Context_free; processing a file with the resulting driver was twice as fast (check passes included).

But how does this compare to the more traditional method of running each rewriter in a separate process? To find out we did some benchmarking, taking one of the biggest ml files in core_kernel (src/command.ml) and comparing the two methods. We put a type error on the first line to be sure we stop just after pre-processing.

For reference, following are the numbers for calling ocamlfind ocamlc on the file with no pre-processing:

$ time ocamlfind ocamlc -c command.ml
File "command.ml", line 1, characters 12-15:
Error: This expression has type char but an expression was expected of type

real 0m0.022s
user 0m0.016s
sys  0m0.006s

To preprocess the file with ppx_jane as a single driver executable, one just has to pass a single -ppx option, or a -pp option, given that ppx_driver can be used either as a -ppx or as a -pp:

# via -ppx
$ time ocamlfind ocamlc \
  -ppx 'ppx-jane -as-ppx -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.095s
user 0m0.074s
sys  0m0.020s

# via -pp
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.091s
user 0m0.066s
sys  0m0.024s

# via -pp, with checks disabled
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-check -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.070s
user 0m0.051s
sys  0m0.018s

# via -pp, without merging passes
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-merge -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.229s
user 0m0.206s
sys  0m0.022s

Using the other method turned out to be quite painful: since the various ppx rewriters cannot share command line arguments, they had to be specified more than once:

$ time ocamlfind ocamlc -package ppx_jane \
  -ppxopt "ppx_inline_test,-inline-test-lib blah -inline-test-drop" \
  -ppxopt "ppx_bench,-inline-test-lib blah -bench-drop" \
  -ppxopt "ppx_expect,-inline-test-lib blah" \
  -c command.ml 2> /dev/null

real 0m0.339s
user 0m0.233s
sys  0m0.098s

So, unsurprisingly, the single-pass, single-executable method is a lot faster.


This code is available on github. The context-free extension point API is already available in opam. The newer one is only in the git repository for ppx_core and ppx_driver. You can try them out by using our development opam repository. You should have a look at this if you care about how your rewriters are composed and/or if you care about compilation speed.

  • ppx_js_style is not currently released; it is an internal ppx that we use to enforce our coding standards.

Seven Implementations of Incremental

We finally got a decent recording of one of my favorite talks. This one is about our Incremental library (which I wrote about here), and in particular about the story of how we got to the present, quite performant, implementation.

It's not clear from the talk, but the work on this library wasn't done by me: The initial version was implemented by Stephen Weeks and Milan Stanojevic, and most of the intermediate versions were implemented by Nick Chapman and Valentin Gatien-Baron.

The high quality org-mode slides, though, are all me.

OCaml 4.03: Everything else

In my previous post I wrote about Flambda, which is the single biggest feature coming to OCaml in this release. In this post, I'll review the other features of 4.03 that caught my eye.

Inline records

Variants are my favorite thing about OCaml, and in this release, they're getting better. You've always been able to define variants with multiple arguments, e.g.:

    type shape =
      | Circle of float * float * float
      | Rect of float * float * float * float

But, as with this example, it can sometimes be a little hard to figure out what the meanings of the individual fields are, since they don't have labels. We can make this better by replacing our multi-argument variants with single-argument variants containing appropriately named records, as follows.

    type circle = { center_x: float;
                    center_y: float;
                    radius: float;
                  }

    type rect = { x_lo: float;
                  y_lo: float;
                  x_hi: float;
                  y_hi: float;
                }

    type shape =
      | Circle of circle
      | Rect of rect

This works, but the separation of the record type from the variant definition is a little awkward. Beyond that, this approach imposes extra runtime costs. A multi-argument variant takes up just a single heap-allocated block, while a single argument variant containing a record takes two blocks.

With 4.03, we can have the best of both worlds by defining variants containing inline records. Here's what they look like.

    type shape =
      | Circle of { center_x: float;
                    center_y: float;
                    radius: float;
                  }
      | Rect of { x_lo: float;
                  y_lo: float;
                  x_hi: float;
                  y_hi: float;
                }

And we can write code that uses these types as follows:

    let area = function
      | Circle c -> 3.14159 *. c.radius *. c.radius
      | Rect r -> (r.x_hi -. r.x_lo) *. (r.y_hi -. r.y_lo)

Note, however, that the values r and c aren't quite first-class. For example, we can't use them away from the context of the match statement in which they're found. So this function:

    let maybe_rect = function Circle _ -> None | Rect r -> Some r

fails with the error This form is not allowed as the type of the inlined record could escape.

Even with this complexity, inlined records are really convenient.

Another advantage of inline records is that they allow us to express variants with mutable fields. This is useful in a variety of circumstances. In Core, we fake this using Obj.magic for our mutable AVL tree implementation, and now we can remove these hacks. Similar uses of Obj.magic were removed from OCaml's own imperative queue module in this release as well.
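For instance, a mutable singly-linked list node can now be written directly; this is just an illustrative sketch, not the Core implementation.

    type 'a node =
      | Empty
      | Cons of { value : 'a; mutable next : 'a node }

    (* Mutate the inline record's field in place; note that [c] can be used
       freely inside the match, it just isn't allowed to escape it. *)
    let set_next node n =
      match node with
      | Empty -> ()
      | Cons c -> c.next <- n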

Uchar and result

A couple of useful types have been added to the standard library: Uchar.t and result. Uchar.t represents a Unicode character, and is effectively just an int under the covers.

The result type is used for signaling success or failure, and has the following definition.

    type ('a,'b) result = Ok of 'a | Error of 'b

Both of these are in some sense trivial, but they're there as a coordination point between different libraries. Lots of OCaml libraries have some analogue of result, and each OCaml Unicode library has its own character type. By including these in the standard library, it provides an easy point of agreement between different external libraries, while adding almost no complexity to the core distribution itself.

Better unboxing for externals

A small but valuable change: it's now possible to write C bindings through which one can pass unboxed versions of a number of different types, including floats, Int64's, Int32's and Nativeint's. This was previously possible only for calls that took only floats and returned a float. This makes it easier to write efficient, zero-allocation C bindings in a wider variety of cases.
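As a sketch, such a binding might look like the following; the C stub names here are made up for illustration.

    (* [@unboxed] asks for the argument or result to be passed without boxing
       (now allowed for Int32, Int64 and Nativeint, not just float), and
       [@@noalloc] promises that the stub doesn't allocate on the OCaml heap.
       The stub names are hypothetical. *)
    external mix
      :  (int64 [@unboxed])
      -> (float [@unboxed])
      -> (int64 [@unboxed])
      = "mix_bytecode" "mix_native"
      [@@noalloc]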

GC Latency improvements

I talked about most of this here, so I won't go into great detail now. But the news is that the changes Damien Doligez made during his stay at Jane Street have finally made it upstream.

Ephemerons

To a first approximation, you can think of a GC as a machine that determines what memory can be reclaimed by figuring out what data is reachable, and then reclaiming everything else. The simplest notion of reachability counts everything reachable by following pointers, starting at the GC roots, which are mostly just the values on the call stack.

This notion of reachability is a bit too restrictive, however. In particular, in some circumstances you want to keep pointers to objects without preventing those objects from being reclaimed. This is useful for some kinds of caching, where you want to cache previously computed values for as long as they're referenced, but no longer.

OCaml has long had an answer to this problem, which is the notion of a weak pointer. Weak pointers are references that aren't counted when computing reachability. When an object that is pointed to by a weak reference is collected, the weak reference itself is nulled out.

Weak references are good enough for some purposes (e.g. hash-consing), but they can be awkward for many use cases. One basic use-case for which weak references are an awkward fit is memoizing a function, where one wants to keep entries in the table as long as the input to the function in question is still reachable.

You could imagine just keeping the key of the hash-table in a weak pointer, and then using a finalizer to remove the entry in the table once the key gets collected. But this fails if there is a reference from the output of the function back to the key, which is common enough.

Ephemerons were proposed back in 1997 by Barry Hayes to solve just this problem. The idea is pretty simple: an ephemeron has multiple keys and a single data entry. When determining whether the keys are alive, one doesn't count references from values that are reachable only via the ephemeron, so the references from the data back to the key don't keep the key alive. Also, once any of the ephemeron's keys are collected, the key and the data element are removed immediately. Weak pointers are now just a special case of ephemerons --- a weak pointer is effectively just an ephemeron with no data.

Ephemerons don't come up that much, but if you need to build a memoization table, ephemerons make the task simpler and the result less leak-prone.
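OCaml 4.03 exposes these in the standard library's Ephemeron module. Here's a minimal sketch of the single-key case (the memoization plumbing around it is omitted):

    (* The key is held weakly: the data doesn't keep the key alive, even if
       the data points back at the key. *)
    let demo () =
      let key = ref 42 in
      let eph : (int ref, string) Ephemeron.K1.t = Ephemeron.K1.create () in
      Ephemeron.K1.set_key eph key;
      Ephemeron.K1.set_data eph "cached result";
      match Ephemeron.K1.get_data eph with
      | Some s -> print_endline s   (* [key] is still reachable here *)
      | None -> print_endline "key was collected, so the data is gone"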

Licenses, Github and OCamlbuild

A few organizational changes are landing in 4.03 that I think are worth noting. First, OCaml development is officially moving from Subversion to Git, with Github as the primary coordination point for OCaml development.

Second, OCaml's license is changing from the somewhat awkward and rarely used QPL, to LGPLv2 with a linking exception. The latter had already been used for various libraries distributed with the compiler. But, acknowledging the fact that more and more of the guts of the compiler are being used as libraries for a variety of reasons, it was recently decided to move everything to LGPLv2.

And finally, ocamlbuild is being moved out of the core OCaml distribution. This is one in a series of decisions to break out software that was previously bundled together with OCaml. This allows the core team to focus more on the core compiler itself, and makes the exported projects freer to make changes on whatever schedule they see fit.

I think keeping the OCaml distribution lean is an excellent strategic decision, allowing the core team to focus on the compiler itself, and allowing tooling and libraries to be worked on independently.

Together, these are all small changes. But they're part of a trend towards making OCaml development simpler and more agile.

What's missing

There are a number of much anticipated features that haven't made it into this release. In particular, the multicore GC, which at one point had been expected to land in 4.03, has been pushed back, likely to 4.04. Algebraic effects, which are in part motivated by the multicore GC, also didn't make it.

And finally, modular implicits, which are intended to provide typeclass-like functionality for OCaml, are not in this release either. That said, they're being actively worked on, and I'm expecting there will be more news about those in the next six months.

All in, it's been a pleasure watching this release take shape. And just as important as what got in is what didn't. Despite a greatly increased rate of change, the process is quite conservative --- nonsense features just don't make it in. My gratitude goes out to the core team for their safekeeping of the language.

A better inliner for OCaml, and why it matters

OCaml 4.03 is branched and a first release candidate is imminent, so it seems like a good time to take stock of what's coming.

This post will focus on just one of those features: Flambda, a new IR (intermediate representation) in the depths of the compiler designed to allow for better inlining, paired with a set of optimizations leveraging that IR.

Why inlining matters

If your expectations about inlining come from a language like C, you might not be all that excited by Flambda. In C, after all, the benefits of inlining are relatively small, mostly allowing one to avoid function call overhead. That's useful, but has limited impact.

In a language like OCaml, inlining is about more than just function call overhead. That's because inlining grows the amount of code that the optimizer can look at a given point, and that makes other optimizations more effective. And there are simply more optimizations available in a language like OCaml than in one like C. In particular, many allocations can be optimized away, which can have a rather big effect on the performance of a program.

To get a sense of how we can eliminate allocations, consider this simple example.

    let rec fold l ~init ~f =
      match l with
      | [] -> init
      | hd :: tl -> fold tl ~init:(f init hd) ~f

    let pow_sum l n =
      fold l ~init:0 ~f:(fun x y -> pow x n + pow y n)

In the current compiler, every invocation of pow_sum will allocate a closure for the function f. With Flambda, this code should be allocation free, since the fold can be inlined into pow_sum, f can be inlined into the body of the now inlined fold, and the fold can be specialized by adding an extra argument, thus making the closure allocation unnecessary.

Here's another example. Imagine you wanted to sum the first and last elements of a list. You could do this using functions in the list module and a simple pattern match, as follows.

    let first_plus_last l =
      match List.hd l, List.last l with
      | None, _ | _, None -> None
      | Some x, Some y -> Some (x + y)

Without any inlining, this function creates three heap-allocated values, which are the various options being returned.

Now, let's consider what happens if we rewrite this code using the option monad.

    let (>>=) o f =
      match o with
      | None -> None
      | Some x -> f x

    let first_plus_last l =
      List.hd l >>= fun x ->
      List.last l >>= fun y ->
      Some (x + y)

Or, equivalently, using our ppx_let syntax:

    let first_plus_last l =
      let%bind x = List.hd l in
      let%bind y = List.last l in
      Some (x + y)

Without any inlining, this implementation will create five heap-allocated values: one for each of the closures on the right hand side of the binds, one each for the options returned from List.hd and List.last, and one for the final Some.

It's of course possible to write a version that only allocates one object, for the return value.

    let first_plus_last l =
      match l with
      | [] -> None
      | x::_ ->
        let rec last_exn = function
          | [] -> assert false
          | [y] -> y
          | _ :: tl -> last_exn tl
        in
        Some (x + last_exn l)

But this is obviously uglier and harder to read than either of the earlier versions.

With Flambda (and the appropriate flags, i.e., -O3 -unbox-closures), these examples all allocate the same amount. That's because, once the inlining is done and all the code is in one place, the compiler can do things like observe that the option returned from List.hd is immediately deconstructed and is never used for anything else. As a result, the allocation of the options can be removed entirely.

This example highlights why these kinds of compiler optimizations are valuable. It's not that they're strictly necessary for good performance --- usually, pretty good performance can be achieved with sufficiently ugly code. But what good optimizations can do for us is let us write simpler, easier to understand code, without sacrificing performance.

A basis for future optimizations

In addition to the optimizations that will be there when 4.03 lands, Flambda will also make it easier to build new optimizations. One example comes from a project that was done last summer by Will Crichton, one of our interns. The aim of the project was to make the kind of allocation removal I described above more predictable.

From how I described it above, it might seem like Flambda's ability to remove allocations relies on the ability to inline the allocating functions in their entirety. That would be problematic, because what is and isn't inlined can be a bit unpredictable. That's because there have to be some heuristic thresholds, since inlining everything can lead to excessively large executables.

As such, depending on inlining makes it harder to predict the allocation behavior of your code, since it now hinges on the heuristics for determining when a function should be inlined. The situation isn't quite so grim --- some of the optimizations that come along with Flambda (like -unbox-closures) don't depend so critically on inlining. But still, many of Flambda's optimizations do depend on inlining, and as such whether they hit becomes harder to predict.

But we can make this better. The specific project Will worked on had to do with allocations that come from returning small objects that are immediately deconstructed. With Flambda as it is today, such allocations are only removed if both the construction and deconstruction of the value end up together, which often requires inlining. Will's project made this more robust by changing the calling conventions in Flambda, specifically by adding a first class multi-argument return to the Flambda IR.

With a multi-argument return in place, a function that returns a small tuple, say, could instead be compiled to return the components of the tuple as individual values to be passed back on the stack. Then, this function can be wrapped with a small function that picks those values off the stack and allocates the necessary object.

This small wrapper function is by design small enough to be inlined reliably. Once that inlining is done, both the construction and deconstruction of the value will be visible to the optimizer, and they can be collapsed. This should work whether or not the main function is inlined.

This is really just one example. Our hope and expectation is that Flambda will be the basis of many improvements in the compilation of OCaml over the next few years.


The performance impact we've seen from our experiments with Flambda seem pretty promising so far. On real world applications we've tested, it's pretty normal to see allocations reduced by 20-30%. We've found similarly sized improvements in application latency as well. And we think that these numbers will improve yet more as more optimizations are added on top of Flambda.

Beyond improving existing programs, we already are seeing how Flambda allows us to write prettier and cleaner code without compromising on performance. For example, Flambda allows us to make freer use of OCaml's functors, which are a great abstraction tool, but one that imposed significant costs pre-Flambda. With Flambda, functors can simply be inlined away.

Flambda makes things easier in other ways too. For example, we're in the process of developing a new internal protocol for zero-allocation messaging, which requires a lot of code generation, via PPX. Being able to rely on Flambda greatly simplifies that code generation, since it lets us write a fairly naive code generator, which generates efficient code because of Flambda. We can write code that computes offsets into the message naively, and Flambda, via a combination of inlining and constant folding, moves the computation of these offsets to compile time, so the runtime field access becomes a simple lookup.

Even if you're perfectly happy with OCaml's performance as is, Flambda is still an exciting change. That's because various upcoming language improvements like modular implicits (a feature that brings some of the same benefits as Haskell's typeclasses) will only really perform acceptably well with a good inliner in place. So in addition to making your current programs run faster, Flambda also enables new abstractions that can make your future programs easier to read and write.

One downside of all this is that it reduces the predictability of OCaml's performance, which is an important property of the language. Part of how we hope to address this is by improving Flambda's heuristics to be more predictable, but that's unlikely to be enough on its own. That's why OCaml 4.03 comes with new annotations that let the programmer require or prohibit inlining for a particular function.

Over time, we hope to find a good balance, where the heuristics do a pretty good and pretty predictable job by default, but where we give programmers hooks to control things more precisely where it matters.


Getting Flambda this far has been a big project, and a lot of people were involved. OCamlPro did a lot of the heavy lifting, with Pierre Chambart writing most of the compiler code, and Louis Gesbert and Michel Laporte working on benchmarking. Jane Street jumped in too, with Mark Shinwell and Leo White contributing mightily to the code and design as part of the review and stabilization process, and Jeremie Dimino helping with testing. And a number of other people on the core team (notably Alain Frisch, Gabriel Scherer, Runhang Li, Damien Doligez) contributed in a variety of ways to the final product. Thanks to all involved!

Self Adjusting DOM and Diffable Data

In my last post, I gave some simple examples showing how you could use self adjusting computations, or SAC, as embodied by our Incremental library, to incrementalize the computation of virtual dom nodes. In this post, I'd like to discuss how we can extend this approach to more realistic scales, and some of the extensions to Incremental itself that are required to get there.

Along the way, we'll discuss the way that diffable data structures relate to self adjusting computations, and how we can use ordinary pure data types and functional updates to those data types as the basis for building efficient incremental computations.

First, let's bring back the code of the example from that post, which is a simple component that displays the state of a machine in a datacenter.

    module Machine = struct
      module Model = struct
        type display_mode = Temperature | Uptime
        type t = { temperature: float;
                   uptime: Time.Span.t;
                   selected: bool;
                   mode: display_mode;
                 }
        [@@deriving fields]
      end

      let view model =
        let mode = Incr.map model ~f:Model.mode in
        let temperature = Incr.map model ~f:Model.temperature in
        let selected = Incr.map model ~f:Model.selected in
        let uptime = Incr.map model ~f:Model.uptime in
        let text =
          match%bind mode with
          | Temperature ->
            let%map x = temperature in sprintf "%F degrees" x
          | Uptime ->
            let%map x = uptime in Time.Span.to_string x
        in
        let%map text = text and selected = selected in
        Vdom.text
          (if selected then [Vdom.class_ "selected"] else [])
          text
    end

It's worth mentioning that in the service of exercising the machinery, I've over-incrementalized this example. It would very likely be cheaper to just generate the necessary vdom in its entirety every time the model changes. That said, the above is still a good example of how to use Incremental, and it's worth understanding how it works.

The first thing view does is project out different parts of the model into separate incrementals. The key observation is that, even when model changes, if Model.mode model doesn't change, then the corresponding incremental will cut off computation there, preventing dependent computations from refiring. So, whenever the model changes, we check each field of the model to see whether it changed before launching dependent computations.

By default, this cutoff happens based on physical equality; but one can also specify more expansive semantic notions of equality when that's important.
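With the Incremental API, that looks roughly like the following; the exact cutoff-constructor names are an assumption on my part and may differ between versions of the library.

    (* A hedged sketch: treat the temperature as unchanged unless it moves by
       more than some epsilon, so downstream nodes don't refire on every tiny
       fluctuation. [Incr.Cutoff.create] is assumed to take a predicate over
       the old and new values that returns true when propagation should stop. *)
    let temperature_incr model =
      let temperature = Incr.map model ~f:Machine.Model.temperature in
      Incr.set_cutoff temperature
        (Incr.Cutoff.create (fun ~old_value ~new_value ->
           Float.abs (old_value -. new_value) < 0.05));
      temperature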

Mapping over maps, incrementally

The projection approach works well for simple fixed-size records, but it doesn't work when it comes to more complex models with variable sized components. To see why, let's consider what happens if we add one more layer, an outer model consisting of a collection of named machines. We do this using OCaml's Map data structure, which is essentially a functional dictionary.

    module Model = struct
      type t = Machine.Model.t String.Map.t
    end

How do we construct an incremental view for this? We can no longer simply project out all of the individual machine models into a fixed set of incrementals, since the number of machines in the map is not known in advance.

What we need here is a way of efficiently mapping over an incremental map. (The naming collision between the Incr.map function and the Map.t data structure is an unfortunate source of confusion here.) Here's a signature that captures what we need:

    val Incr.Map.map
      :  ('k,'v,'cmp) Map.t Incr.t
      -> f:('v Incr.t -> 'w Incr.t)
      -> ('k,'w,'cmp) Map.t Incr.t

If you're not used to reading OCaml signatures, this can take a moment to digest. First, note that ('k,'v,'cmp) Map.t denotes a map whose keys are of type 'k and whose data is of type 'v. (You can ignore the 'cmp parameter.)

So, what does this function do? A good rule of thumb for incremental is that if you want to understand the meaning of an incremental function (ignoring performance), you can just drop all the incremental bits. If we do that for the above type signature, we get this.

    val Map.map
      :  ('k,'v,'cmp) Map.t
      -> f:('v -> 'w)
      -> ('k,'w,'cmp) Map.t

This type tells us that the function is, as the name suggests, a map, which is to say, a function that transforms the elements of a container, producing a new instance of that container.

That tells us what Incr.Map.map is supposed to compute, but what about the performance of that computation? In particular, what is the structure of the incremental graph this generates?

The following picture illustrates the desired incremental graph. The idea is that there is a path in the graph for every key/data pair in the map. When a given key is updated, an update should be passed down the corresponding incremental path. When a key is added or removed, then paths should be added or removed accordingly. And, rather than recomputing map' from scratch on every update, we want to do the corresponding update, insert or delete into map'.

[Figure: the incremental graph generated by Incr.Map.map]

This function is enough to build an efficient version of our view. In particular, we can write our view function as follows:

    let view (model : Model.t Incr.t) : Vdom.t Incr.t =
      let%map machine_vdoms = Incr.Map.map model ~f:Machine.view in
      Vdom.node "div" [] (Map.data machine_vdoms)

Note that at the end, we do need to construct a complete list of all the displayed machines, and that does have to be reconstructed in its entirety on every change. That's because the virtual dom API uses lists, which can't be updated incrementally in an efficient way. If the virtual dom were modified to take a tree-like structure for nodes, this could be handled efficiently.

That said, I don't think this is a serious practical problem, since in most UIs, the cost of allocating such nodes is a small part of the overall cost of reconstructing the virtual dom.

Diffable data and incrementality

In order to implement Incr.Map.map efficiently, we need some special support both from our Map datatype and from Incremental itself. From Map, we need a way to efficiently figure out a set of updates, inserts and deletes that will take us from one map to another. Happily, the Map module in Core_kernel contains a function that can help.

    val Map.symmetric_diff
      : ('k, 'v, 'cmp) Map.t
      -> ('k, 'v, 'cmp) Map.t
      -> data_equal:('v -> 'v -> bool)
      -> ('k * [ `Left of 'v | `Right of 'v | `Unequal of 'v * 'v ]) Sequence.t

Here, the Left case corresponds to data that's in the first map but not in the second, and so represents a deletion. Similarly, the Right case corresponds to an addition, and the Unequal case corresponds to an update to the value.

A key property of Map.symmetric_diff is that it takes advantage of physical equality of nodes by short-circuiting. This means that if you take a map, make some modifications to it, and then compute the symmetric diff between the original map and the new one you just created, the computational effort is proportional to the work that was required to generate the second map in the first place. In other words, you can efficiently read back the set of changes applied to a map.
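
Here's a quick sketch of what reading back a diff looks like in practice; the maps and the printing are purely illustrative, and I'm using Map.add to replace bindings in the same way the implementation further down does.

    let () =
      let original = String.Map.of_alist_exn [ "box-01", 45.0; "box-02", 51.5 ] in
      let updated  = Map.add (Map.remove original "box-02") ~key:"box-03" ~data:48.2 in
      (* Only the touched keys show up in the diff. *)
      Map.symmetric_diff original updated ~data_equal:Float.equal
      |> Sequence.iter ~f:(fun (key, change) ->
        match change with
        | `Left _         -> printf "%s removed\n" key
        | `Right _        -> printf "%s added\n"   key
        | `Unequal (_, _) -> printf "%s changed\n" key)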

With Map.symmetric_diff, we have a way to efficiently determine what has changed in the map, and thereby which incrementals need to be updated, added or removed. This diffing is reminiscent of both the diffing inherent in the virtual dom approach, and the enter/update/exit pattern that characterizes the D3 visualization library.

Implementing Incr.Map.map

Map.symmetric_diff isn't quite enough to build Incr.Map.map. We also need some extra support from Incremental itself for controlling the pattern of updates, so that even though the node containing the input map has many dependent nodes, we only fire the nodes implicated by the symmetric diff. Indeed, we're still working on providing first-class support for this kind of primitive within Incremental.

Even without those hooks, we can still take some steps in the right direction, the first of which is to use mutable state to keep the old map to diff against. Essentially, we want a function with this signature:

    val Incr.diff_map :
      'a Incr.t -> f:(old:('a * 'b) option -> 'a -> 'b) -> 'b Incr.t

Here, the function f gets access to the old input and old output of this same function when computing the new one. We can implement diff_map straightforwardly enough:

    let diff_map i ~f =
      let old = ref None in
      let%map a = i in
      let b = f ~old:!old a in
      old := Some (a,b);
      b

Using diff_map, we can implement a simpler form of Incr.Map.map which takes a function that transforms the elements of the map directly, rather than incrementally. It would have this signature:

    val Incr.Map.simple_map
      : ('a, 'b, 'c) Map.t Incr.t
      -> f:('b -> 'd)
      -> ('a, 'd, 'c) Map.t Incr.t

And here's the implementation.

    let simple_map m ~f =
      diff_map m ~f:(fun ~old m ->
        match old with
        | None -> Map.map m ~f
        | Some (old_in,old_out) ->
          let diff = Map.symmetric_diff ~data_equal:phys_equal old_in m in
          Sequence.fold diff ~init:old_out ~f:(fun acc (key,change) ->
            match change with
            | `Left _ -> Map.remove acc key
            | `Right d | `Unequal (_,d) -> Map.add acc ~key ~data:(f d)
          )
      )

The implementation is simple enough: when the map changes, we diff the old input and the new input. By folding over that diff, we can efficiently update the old output to produce the new output.

Implementing the full Incr.Map.map is more complicated: because f consumes and produces incrementals, we need to construct a more elaborate incremental graph, one that splits the input map apart, transforms each piece, and merges the results back together.

We can bridge the difference between Incr.Map.simple_map and Incr.Map.map with the following two functions.

    val Incr.Map.split :
      ('a,'b,'cmp) Map.t Incr.t -> ('a,'b Incr.t,'cmp) Map.t Incr.t
    val Incr.Map.join :
      ('a,'b Incr.t,'cmp) Map.t Incr.t -> ('a,'b,'cmp) Map.t Incr.t

Incr.Map.split takes an incremental map and produces an incremental map whose data is itself incremental. The idea is that the outer incremental updates only when a key is added or removed from the input map. Changes to the input map that change data of an existing key instead lead to an update to the corresponding inner incremental. Incr.Map.join is simply the inverse operation, taking an incremental map full of incrementals, and producing a simple incremental map.

Using these two functions together, we can convert our simple_map into a full implementation of Incr.Map.map, as follows.

    let map m ~f =
      Incr.Map.join (simple_map ~f (Incr.Map.split m))

While there are hacks to approximate them using Incr.map and Incr.bind, the ordinary incremental interface isn't enough to build a version of Incr.Map.join and Incr.Map.split with the right performance characteristics. For that reason, we've started work on an Expert interface within Incremental that lets you create incremental nodes with precise control over their dependencies and when they refire.

Why virtual-dom?

Almost everyone I've discussed these ideas with, both inside of Jane Street and beyond, has asked the same question: given that you have Incremental, why bother with a virtual dom at all? After all, Incremental provides a mechanism for tracking which parts of a computation have changed. Why should one have two layers doing something so similar?

We've made things even more duplicative by diffing not just the virtual dom but also our own internal data structures as a way of creating more efficient incremental computations on top of ordinary functional data types.

But despite the seeming redundancy of this approach, I think it has real value. That's because, although the optimizations provided by Incremental are valuable, programming with Incremental is a worse experience than doing ordinary functional programming with immutable values. The big win of virtual dom is that it lets you do most of your programming in a typical functional style with reasonable performance. You only need to resort to using Incremental when the performance provided by this simple approach isn't sufficient.

But diffability isn't just useful on the output of your incremental computation. It's also helpful on the inputs, and that's where the diffability of Core_kernel's maps comes in handy. That way, we can write our code for updating our model in an ordinary functional style, but still use Incremental to efficiently scaffold our view functions on top.

Summing up

We have some beginning versions of all of this infrastructure internally at this point. In particular, we've built fast versions of Incr.Map.map and friends, and used them to create some simple dynamic web applications. The results so far are pretty promising. By paying careful attention to how the input data changes and incrementalizing accordingly, we can build applications that display thousands of rows of data, with hundreds of cells changing every second, while keeping latency low: each update is reflected within a handful of milliseconds.

I doubt our performance is as good as more widely used and likely better optimized frameworks like React and Mercury. But the thing that has struck me about all of this work is how easy and natural it has been. With only a small amount of effort, we've been able to leverage existing tools, from Incremental to Async to virtual-dom, to write highly responsive applications in a simple and natural style, and one that fits naturally within our existing OCaml libraries and tools.

Another interesting question is how this all relates to the use of FRP for UIs. I've written a bit about the difference between FRP and SAC, but the short version is that FRP is more about modeling temporal processes, and SAC is purely about optimization.

From my perspective, SAC fits more cleanly into the goal of simply rendering virtual-dom, and I don't see as much value in having time-oriented operations as a first-class construct. And the fact that SAC is focused purely on optimization allows it to do a better job of that. In particular, the dynamism of operations like bind makes it possible to precisely control the dependency structure of the system, and do things like quiesce parts of the UI that are not in view. This is harder to do in an FRP system without introducing memory leaks.

All of which makes me think that self adjusting computation is a useful idiom for UI programming, and perhaps one that should be adopted more widely.

Self Adjusting DOM

I've been thinking recently about how to structure dynamic web applications, and in particular about the role that incremental computation should play.

In this post, I'd like to outline an approach we've been experimenting with internally which uses Incremental, a general purpose library for building so-called self adjusting computations. Self adjusting computations are essentially graph-structured computations that can be updated efficiently when their inputs change.

I'll describe this all in terms of OCaml, which is the language we're doing these experiments in (courtesy of the excellent js_of_ocaml), but the ideas should be applicable to other languages.

Elm in OCaml

Let's start by outlining a simple, non-incremental approach, inspired by the Elm Architecture. In that architecture, there are three primary elements from which an application is composed: the model, the view and a set of actions.

The model represents the logical state of the application. This includes things like the data being displayed on the page, the current page being displayed, and the part of the page that's in focus. But it omits most presentation details, and doesn't specify the concrete choice of dom elements used.

Actions are the things that can happen to the model. The arrival of new data to be displayed, or the click of a mouse, are translated into actions which are in turn interpreted to update the state. These are important, but don't affect the incrementality story much, so we won't discuss them in detail.
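
That said, just to make the shape concrete, here's a rough sketch of what a tiny action type for the machine display described below might look like; the constructors here are purely illustrative, not taken from any real application.

    module Action = struct
      type t =
        | Set_temperature of float
        | Toggle_selected

      (* Interpreting an action is just a pure function from the old model to
         the new one. *)
      let apply t (model : Model.t) =
        match t with
        | Set_temperature temperature -> { model with temperature }
        | Toggle_selected -> { model with selected = not model.selected }
    end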

The view is a function that takes the current model and produces a vdom (virtual dom) tree, an immutable data structure that represents the intended state of the dom. It is the view function that makes the concrete presentation choices that are left unsaid by the model. The view also determines some of the dynamic behavior of the application, since within the vdom you can register callbacks on keypresses or mouse clicks that enqueue actions to later be applied to the state.

We can wrap this up into a module signature, as follows.

    module type App_intf = sig
      module Model : sig
        type t
      end

      module Action : sig
        type t
        val apply : t -> Model.t -> Model.t
      end

      val view : Model.t -> (Action.t -> unit) -> Vdom.t
    end

Note that the second argument to view is a function that can be used for scheduling actions to occur. This function is used to build callbacks which are attached to the vdom.
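
To make that concrete, here's a rough sketch of a view that uses it. Vdom.button, Model.to_string and the Refresh action are hypothetical stand-ins, since I haven't spelled out the full Vdom API; the point is just that the callback enqueues an action rather than touching the dom directly.

    let view (model : Model.t) (schedule : Action.t -> unit) =
      Vdom.node "div" []
        [ Vdom.text [] (Model.to_string model)
        ; (* The callback doesn't mutate anything; it just schedules an action
             for the main loop to apply to the model later. *)
          Vdom.button ~on_click:(fun _ -> schedule Action.Refresh) "refresh"
        ]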

We can combine this with a simple interface for starting up an application:

    val start_app
      : (module App_intf with type Model.t = 'model)
      -> init:'model
      -> unit

which is responsible for running the display loop.

This isn't quite enough; in particular, I've omitted the necessary hooks for kicking off asynchronous processes for doing things like grabbing data from a server. But the model above is a pretty good start, and gives you a sense of the structure of an Elm-style application.

This approach isn't too bad from a performance perspective. In particular, start_app minimizes the amount of churn to the dom by, on every action, diffing the newly generated vdom against the previous instantiation to produce a minimal patch to apply to the dom proper. That matters, because modifications to the dom itself are quite expensive.
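
Concretely, you can imagine the main loop looking something like the following sketch, where next_action, schedule, Vdom.diff, Vdom.patch and dom_root are hypothetical names for the action queue and the diff/patch plumbing, not functions from an actual library.

    let rec loop model old_vdom =
      let action = next_action () in
      let model = Action.apply action model in
      let new_vdom = view model schedule in
      (* Diff against the previous vdom, and apply only the resulting minimal
         patch to the real dom. *)
      Vdom.patch dom_root (Vdom.diff old_vdom new_vdom);
      loop model new_vdom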

But this approach doesn't scale to big, complex UIs. That's because, even though most actions change the model in only small ways, the full vdom tree has to be created every time. If the vdom is large and complicated, this is a serious problem.

Incremental Basics

Let's see how we can use Incremental to optimize generation of the vdom.

Imagine we want to display information about the state of a set of machines in our datacenter. For each machine, assume we have two pieces of data: the temperature, and whether that particular server is currently selected. Now, given incremental values representing the data for just one machine, how can we use Incremental to produce the vdom?

Incremental provides operators for building computations on top of incremental values. For example, the function Incr.map2 has this signature.

    val Incr.map2 : 'a Incr.t -> 'b Incr.t -> f:('a -> 'b -> 'c) -> 'c Incr.t

This lets us build a computation that takes two incremental inputs, and combines them to make a single new incremental. We can use this for generating an incremental vdom to display our machine.

    let view temperature selected =
      Incr.map2 temperature selected
        ~f:(fun temperature selected ->
          Vdom.text
            (if selected then [Vdom.class_ "selected"] else [])
            (sprintf "%F degrees" temperature))

We can write this a bit more naturally using the ppx_let syntax extension. Essentially, ppx_let allows us to encode maps using ordinary let binding syntax.

    let view temperature selected =
      let%map temperature = temperature and selected = selected in
      Vdom.text
        (if selected then [Vdom.class_ "selected"] else [])
        (sprintf "%F degrees" temperature)

The key point here is that the code regenerating the text node will only be rerun when necessary, i.e., when the value of either selected or temperature has changed. In a complex view with lots of incremental inputs and lots of vdom nodes built on top of them, judicious use of map allows you to recompute only the vdom nodes that actually need an update.

Using bind

It turns out that map isn't enough. One limitation of map is that the dependencies it introduces are static: Incr.map2 a b ~f will produce a node that reruns f every time a or b changes.
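
Here's a small sketch that makes that refiring visible, driving updates by hand with Incremental's Var and Observer interface (the names below are from that interface as I remember it, so treat the details as approximate).

    let () =
      let a = Incr.Var.create 1 in
      let b = Incr.Var.create 2 in
      let total =
        Incr.observe
          (Incr.map2 (Incr.Var.watch a) (Incr.Var.watch b) ~f:(fun a b ->
             printf "recomputing\n";
             a + b))
      in
      Incr.stabilize ();  (* prints "recomputing" *)
      Incr.Var.set b 3;
      Incr.stabilize ();  (* prints again: a change to either input refires f *)
      printf "%d\n" (Incr.Observer.value_exn total)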

But sometimes, you want dynamic rather than static dependencies. For a trivial example, imagine that our machine view has different inputs that might be used in different situations; say, there's a mode that determines whether uptime or temperature is displayed. We could implement such a view on top of map as follows.

    let view temperature uptime selected mode =
      let%map temperature = temperature
      and uptime = uptime
      and selected = selected
      and mode = mode
      in
      Vdom.text
        (if selected then [Vdom.class_ "selected"] else [])
        (match mode with
         | Temperature -> sprintf "%F degrees" temperature
         | Uptime -> Time.Span.to_string uptime)

Here, the appearance of the dom node is dynamic, but the dependencies are not. Even if you're in Temperature mode, you'll still recompute the vdom when the uptime changes.

On this small scale, the extra cost is trivial. But as you consider larger more complicated views, the ability to control dependencies precisely can have real value.

In this case, we can control the dependencies using bind. Here's the signature of bind:

    val bind : 'a Incr.t -> ('a -> 'b Incr.t) -> 'b Incr.t

The signature is deceptively simple, but it lets you do something powerful. In particular, Incr.bind i f returns an incremental whose dependencies, and even whose interior nodes, are chosen dynamically by f.

Here's a simple rewrite of the code above that takes advantage of bind.

    let view temperature uptime selected mode =
      let text =
        Incr.bind mode (function
          | Temperature ->
            let%map x = temperature in sprintf "%F degrees" x
          | Uptime ->
            let%map x = uptime in Time.Span.to_string x
        )
      in
      let%map text = text and selected = selected in
      Vdom.text
        (if selected then [Vdom.class_ "selected"] else [])
        text

Here, bind lets us create a text incremental that depends either on the temperature or on the uptime, depending on the mode. We can write this with our syntax extension, using its specialized match syntax.

    let view temperature uptime selected mode =
      let text =
        match%bind mode with
        | Temperature ->
          let%map x = temperature in sprintf "%F degrees" x
        | Uptime ->
          let%map x = uptime in Time.Span.to_string x
      in
      let%map text = text and selected = selected in
      Vdom.text
        (if selected then [Vdom.class_ "selected"] else [])
        text

One thing that's nice about ppx_let is that it makes it easier to separate thinking about what your code does from how it's incrementalized. If you ignore the %map and %bind annotations, what you're left with is enough to understand the meaning of the computation that's being incrementalized. The annotations are only important for understanding the performance characteristics of the incremental recomputation.

Decomposing incrementals

Here's an updated version of our App_intf which includes Incremental.

    module type App_intf = sig
      module Model : sig
        type t
      end

      module Action : sig
        type t
        val apply : t -> Model.t -> Model.t
      end

      val view : Model.t Incr.t -> (Action.t -> unit) -> Vdom.t Incr.t
    end

The only change is the view function, which instead of taking a Model.t and returning Vdom.t now takes a Model.t Incr.t and returns a Vdom.t Incr.t. And instead of calling this function on every update, we simply call it once at the beginning. The start_app function would be responsible for repeatedly updating the Model.t Incr.t as the model changes, and can then read off the new vdom node from the Vdom.t Incr.t.
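
Here's a rough sketch of what that loop might look like, again leaning on Incremental's Var and Observer interface; next_action, schedule and apply_to_dom are hypothetical plumbing for the action queue and the diff/patch step, not functions from our actual libraries.

    let run (type model)
        (module A : App_intf with type Model.t = model) ~(init : model) =
      let model_v = Incr.Var.create init in
      (* The view is built exactly once, on top of the incremental model. *)
      let vdom = Incr.observe (A.view (Incr.Var.watch model_v) schedule) in
      let rec loop () =
        let action = next_action () in
        Incr.Var.set model_v (A.Action.apply action (Incr.Var.value model_v));
        (* Stabilizing pushes the change through the incremental graph, so the
           observer now holds the updated vdom, ready to be diffed and patched
           against the previous one. *)
        Incr.stabilize ();
        apply_to_dom (Incr.Observer.value_exn vdom);
        loop ()
      in
      loop ()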

This all sounds good on the surface, but there's a fly in this ointment, which is that my earlier examples were built on the assumption that the different inputs to the computation were already broken down into separate incremental values. But here, we have one big incremental value containing the entire model. How do we apply Incremental effectively in this case?

Let's revisit our server-display example from before. Now, instead of assuming we have a different incremental for each property of a server, imagine we have one incremental representing the full state of one server. We can use the following as our model type:

    module Model = struct
      type display_mode = Temperature | Uptime
      type t = { temperature: float;
                 uptime: Time.Span.t;
                 selected: bool;
                 mode: display_mode;
               }
      [@@deriving fields]
    end

The deriving annotation above provides us with accessor functions for each field, which will be useful shortly.
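
Concretely, the generated accessors we rely on below look roughly like this (deriving fields also generates a Fields module of first-class fields, which we don't need here):

    val temperature : t -> float
    val uptime      : t -> Time.Span.t
    val selected    : t -> bool
    val mode        : t -> display_mode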

It's easy enough to write a view function that returns an incremental that recomputes in its entirety every time the model changes, as shown below.

    let view model =
      let%map model = model in
      let text =
        match model.mode with
        | Temperature ->
          sprintf "%F degrees" model.temperature
        | Uptime ->
          Time.Span.to_string model.uptime
      in
      Vdom.text
        (if model.selected then [Vdom.class_ "selected"] else [])
        text

And for this tiny example, that's likely the correct thing to do. But we'll want to incrementalize more precisely for real world examples, so let's see how we could do that in this case.

What we effectively need to do to incrementalize this is to convert our one big incremental model into a number of smaller incrementals. We can do that by projecting out individual components of the model using map.

For example, if we write:

    Incr.map model ~f:(fun m -> m.mode)

or, equivalently

    Incr.map model ~f:Model.mode

We'll get an incremental that contains only the mode. Critically, incrementals that depend on this one will only update when the mode actually changes, not when, say, the temperature is updated. That's because Incremental cuts off computation when the new output is physically equal to the old one, so even if the model changes, each projected incremental will only propagate the computation if its own data has changed.

We can use this approach to first project out the fields of the model record into separate incrementals, and then build our computation on top of those. That looks like this:

    let view model =
      let mode = Incr.map model ~f:Model.mode in
      let temperature = Incr.map model ~f:Model.temperature in
      let selected = Incr.map model ~f:Model.selected in
      let uptime = Incr.map model ~f:Model.uptime in
      let text =
        match%bind mode with
        | Temperature ->
          let%map x = temperature in sprintf "%F degrees" x
        | Uptime ->
          let%map x = uptime in Time.Span.to_string x
      in
      let%map text = text and selected = selected in
      Vdom.text
        (if selected then [Vdom.class_ "selected"] else [])
        text

And this has basically the right incremental structure.

This is a good start, but we still don't really have the full story. In particular, the above approach to incrementalization makes sense when our overall data is organized as simple static structures like records. But it's not clear what to do when we have more complex data structures. For example, what if we had a collection of machines stored in a map or a set? We don't yet have a way of efficiently handling this kind of complex abstract data type using Incremental.

More on that in my next post.