How to Build an Exchange

We're about to do the first of what will hopefully become a series of public tech talks in our NY office.

The first talk is on February 2nd, and is an overview of the architecture of a modern exchange. The talk is being given by Brian Nigito, and is inspired by our work on JX, a crossing engine built at Jane Street. But Brian's experience is much broader, going all the way back to the Island ECN, which in my mind marks the birth of the modern exchange.

I did some work on JX, and one of the things I was struck by is the role that performance plays in the design. In particular, JX uses a simple replication scheme based on reliable multicast that relies critically on the components of the system having high throughput and low, deterministic latencies.

This is a situation where performance engineering is done not so much to reduce end-to-end latency as to act as a kind of architectural superpower, making it possible to build systems in a simpler and more reliable way than would be possible otherwise.

Anyway, I think it's a fascinating topic. If you're interested in coming, you can go here to get the details and sign up. (We have a registration step so we can get people through building security.)

A brief trip through Spacetime

Spacetime is a new memory profiling facility for OCaml to help find space leaks and unwanted allocations. Whilst still a little rough around the edges, we've found it to be a very useful tool. Since there's not much documentation for using Spacetime beyond this readme, I've written a little intro to give people an idea of how to use it.

Generating a profile

As an example of Spacetime in action let's get a profile for the js_of_ocaml compiler. First we'll need a Spacetime-enabled OCaml compiler:

$ opam switch 4.04.0+spacetime
$ eval `opam config env`

Using this compiler we build the executable. In this case we just let opam build it for us:

$ opam install js_of_ocaml

Now we run the executable, using the environment variable OCAML_SPACETIME_INTERVAL to turn on profiling and specify how frequently Spacetime should inspect the OCaml heap (in milliseconds).

$ OCAML_SPACETIME_INTERVAL=1000 js_of_ocaml core_kernel.cma

Executables with Spacetime enabled run more slowly and use more system memory than usual, but the contents of the OCaml heap should be unaffected.

Running the executable produces a file in the current directory:

$ ls | grep spacetime
spacetime-8045

8045 was the pid of the process. If your executable forks then there may be multiple Spacetime files.

Now that we have a Spacetime profile, we need to install the prof_spacetime profile viewer:

$ opam switch 4.04.0
$ eval `opam config env`
$ opam install prof_spacetime

Next we process the profile:

$ prof_spacetime process -e .opam/4.04.0+spacetime/bin/js_of_ocaml spacetime-8045
Processing series...done

$ ls | grep spacetime
spacetime-8045
spacetime-8045.p

This can take a while – a couple of minutes in this case. The -e option is used to pass prof_spacetime the executable that produced the profile. This option is not strictly necessary, but without it the profile wouldn't include locations for C code.

Web viewer

Now we are ready to look at the profile; we'll start with the web viewer:

$ prof_spacetime serve -p spacetime-8045.p
Processing series...done
Serving on

Live words

Navigating to the appropriate address in a browser, we are greeted by an exciting colourful graph:

Spacetime graph

This graph shows the number of live words in the program over time, divided up by the source location at which the words were allocated. If we place our mouse over a section of the graph it will display the source location for that section:

Spacetime mouseover

and by clicking on a section we are taken to a new graph:

Spacetime subgraph

This graph contains only those live words allocated at the clicked source location. It is divided up by the source location of the call to the function which performed the allocation – i.e. the next frame up the backtrace. This is a key feature of Spacetime: you can see not only where words were allocated, but also which calls led to those allocations. By continuing to click through these graphs we can recover the entire backtrace at which some words were allocated:

Spacetime subgraph

Clicking on the (top of stack) link returns us to the original graph.

Allocated words

The live graphs ("Live words" and "Live blocks") are useful for locating space leaks. For removing unwanted allocations, the allocation graph is more useful. By clicking the "All allocated words" link we are shown a new graph:

Spacetime subgraph

This graph shows the cumulative total of allocations in the program, divided up by the source location of those allocations. Holding your mouse over a section will display the location of that section. Clicking on a section will take you to a new graph containing only the allocations from that section, divided up by the location of the next frame up the backtrace.

Terminal viewer

Now we'll try the terminal viewer:

$ prof_spacetime view -p spacetime-8045.p

which launches us into a lambda-term style terminal view:

Spacetime subgraph

This shows the live words at a particular time (1.017844s) in the program's execution divided up by the source location at which the words were allocated. The ← and → keys move between different points in time. The ↑ and ↓ keys select different rows. Pressing return on a row loads a new view:

Spacetime subgraph

This shows the live words allocated at the selected source location (memory.c: 552) divided up by the source location of the call to the function containing the allocation – i.e. the next frame up the backtrace. Pressing backspace takes us back to the previous view.

Finally, pressing tab switches between the three different modes: live words, live blocks and allocated words. Use the q key to exit the viewer.

A solution to the ppx versioning problem

Ppx is a preprocessing system for OCaml in which rewriters map over the OCaml abstract syntax tree (AST), interpreting special syntax fragments to generate code.

Ppx rewriters get to work on the same AST definition as the compiler, which has many advantages:

  • The AST corresponds (almost) exactly to the OCaml language. This is not completely true as the AST can represent programs that you can't write, but it's quite close.

  • Given that the compiler and pre-processor agree on the data-type, they can communicate with each other using the unsafe Marshal module, which is a relatively cheap and fast way of serializing and deserializing OCaml values (see the sketch after this list).

  • Finally, the biggest advantage for the user is that the locations in the original code are exactly preserved, which is a requirement for usable error messages. This is not so great for the generated code, as the best one can do is reuse some locations from the original source code and hope for the best. In practice the user sometimes gets nonsensical errors, but this is a commonly accepted trade-off.
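To make the Marshal point concrete, here is a simplified sketch of the kind of exchange involved (the real compiler/ppx protocol goes through temporary files and includes magic-number checks, so treat this as illustrative only):

open Parsetree  (* from compiler-libs *)

(* Read a structure the way a -ppx executable receives it: as a
   Marshal-ed value. *)
let read_ast (ic : in_channel) : structure =
  (Marshal.from_channel ic : structure)

(* Write the rewritten structure back out for the compiler to consume. *)
let write_ast (oc : out_channel) (ast : structure) : unit =
  Marshal.to_channel oc ast []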

There is, however, one drawback to all this: the compiler AST is not stable, and code using it is at the mercy of its evolution. We got lucky with the 4.04 release of OCaml, but the 4.03 release was quite disruptive. Even before releases, whenever the AST definition changes during a development cycle, many ppx rewriters might not be usable for a while, which makes testing a lot harder.

Several ideas have been flying around, such as adding a layer to convert between different versions of the AST. While this would work, it has the drawback that you need this layer for every variant of the AST. And when you want to make a patch modifying the AST, you'll need to do the extra work of updating this layer first.

In this blog post we show how we managed to solve the ppx compatibility problem in a way that improves the user experience and lets us produce releases that don't depend on ppx rewriters at all.

We did this work while working on Base, our upcoming standard library. In the end, it's likely we'll use only the third of the methods described below for Base, while the others will be used to improve the user experience with the rest of our packages.

What do other code generators do?

Ppx is not the first system for generating code that is a mix of user-written code and machine-generated code. A typical class of generators that get it right, i.e. that preserve locations and are independent of the AST definition, is lexer/parser generators, and not only the ones distributed with the compiler.

Let's take the example of lexer generators (parser generators work basically the same way). The user writes a series of rules, each consisting of a regular expression and an action to take when the input matches it:

rule token = parse
| "+"  { PLUS  }
| "-"  { MINUS }
| '0'..'9'+ as s { INT (int_of_string s) }
| " "* { token lexbuf (* skip blanks *) }

This code is written in a .mll file and the generator then produces a .ml file with code for the lexing engine interleaved with the user written actions.

In order to keep the locations of the user-written code pointing to the right place in the .mll file, the generator produces:

# 42 "lexer.mll"
         token lexbuf (* skip blanks *)

The OCaml compiler interprets the line starting with a # and updates its current location to point to the beginning of line 42 in file lexer.mll. This is called a line directive.

To go back into the generated code, the lexer generator produces:

# 100 ""

where 100 corresponds to the real line number in the generated file.

With this method, when there is an error in the user-written code, it points to the lexer.mll file, while when there is an error in the generated code it points to the generated file. Even if the generated code might not be particularly easy to understand, at least you get to see the real code the compiler chokes on.

Another big advantage is that when using a debugger, you can follow the execution through the generated code.

Can we do the same for ppx?

At first glance, it seems that ppx rewriters work in a very different way, but the result is the same: only parts of the file are generated, and the rest is taken as-is from what the user wrote. In fact, compared to the lexer case, most of the resulting code is user-written.

There is however some work to do to get the same result as with lexer generators. First you have to distinguish the generated code from the user code.

If you take a ppx rewriter as a black box, then the only way is to apply some kind of tree diff between the input and the output. In our ppx framework however, we know exactly what fragments of the AST are rewritten by plugins and we know the rewriting is always local. This makes the job a lot simpler and probably faster as well, so we chose to take advantage of this information.

The method

It works this way: while mapping the AST, we collect all the fragments of generated code along with the location of the code they replace in the original file. At the end we sort them in file order and make sure there is no overlap. Every fragment is pretty-printed to a string.

What we end up with is a list of text substitutions: beginning position, end position, replacement text. The next step is to simply apply these substitutions to the original file. If you read the blog post about how we switched from camlp4 to ppx, you'll notice the resemblance here.
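Concretely, you can think of each substitution as a little record like this (a sketch; the field names are made up, not the actual ppx_driver types):

(* One text substitution: replace the original source between two
   positions with the pretty-printed generated code. *)
type substitution =
  { start_pos : Lexing.position (* where the replaced range begins *)
  ; end_pos   : Lexing.position (* where the replaced range ends *)
  ; repl      : string          (* replacement text *)
  }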

This is what the transformation looks like:

(* ----- input ----- *)
type t = int [@@deriving sexp_of]

let f x = x + 1

let g x = [%sexp_of: t] x

(* ----- output ----- *)
# 1 ""
type t = int [@@deriving sexp_of]
# 4 ""
let sexp_of_t = sexp_of_int
# 2 ""

let f x = x + 1

let g x =
# 11 ""
# 5 ""

The result for [@@deriving sexp_of] is not bad at all. For code rewritten inside expressions, the result is not as good, given that it breaks those expressions up. But given that extensions are often sparse in our source files, this is still acceptable.

This mode can be selected with ppx_driver-based rewriters by passing the -reconcile flag.

Solving the compatibility problem

With this mode, one can first generate a .ml.pp file and feed that to the compiler. Given that the concrete syntax of the language breaks much less often than the internal AST definition, a working ppx is likely to work for a very long time.

We'll soon start releasing a separate package that snapshots one version of the lexer/parser/AST/printer of the OCaml compiler. This package will have its own release schedule and will typically be updated soon after each release of OCaml. This will give ppx authors time to upgrade their code when it breaks, while still allowing people to try out the new compiler with their favorite packages.

Mode for release tarballs

In addition to the mode described above, ppx_driver has a second mode, -reconcile-with-comments, where the result is similar to the one with line directives except that the generated code is enclosed within comment markers:

type t = int [@@deriving sexp_of]
let sexp_of_t = sexp_of_int

let f x = x + 1

let g x =

This mode is intended for release tarballs. One can replace all the files in place with the pre-processed version using -reconcile-with-comments. The result is readable and has the big advantage that you don't need to depend on the ppx rewriter, which means the package is faster to install for users.

Jane Street packages will eventually move to this scheme, either for the next stable release or the one after that. One technical issue with this method is that to take full advantage of it, the runtime support libraries of the various ppx rewriters must be installable without the rewriter itself. Splitting the packages in opam is fine, but splitting the repository is not desirable, as often both components make strong assumptions about each other.

For Jane Street packages, we'll need to update our release system so that it supports generating two opam packages from one repository.

ppx as a verification tool only

While these new methods improve the ppx story in general, for Base we wanted to go even further and allow users to build Base without the need for ppx at all, both for the release and for the development versions. Not only to cut down the dependencies, but also to provide a better experience in general. For instance if you are working on a patched compiler and need the development version of Base, you shouldn't need all the ppx rewriters that might not work for some reason.

We explored various bootstrap stories, and while they worked, they were not very nice, especially for such an important library. Its development and build processes should be straightforward.

We even looked into not using ppx at all. While this is OK for many ppx rewriters that are mostly syntactic sugar, it is more problematic for [@@deriving ...]. It's not so much that the code is hard to write by hand (most data structures in Base are either simple datatypes or require hand-written combinators anyway), but that it is a pain to review. This code is very mechanical, and you have to make sure that the constant strings correspond to the constructor/field names, and other things where the machine can do much better than a human.

In the end we found a solution that keeps the best of both worlds, i.e. being able to build the original source code without pre-processing while avoiding having to write and review this boilerplate code.

The idea is to use ppx in the same way that we write expect tests: the tool only checks that what comes after the type definition corresponds to what the rewriters would derive from it. In case of a mismatch, it produces a .corrected file, just like expect tests.

We are currently experimenting with this method for Base. It's possible that we'll have some marker to delimit the end of the generated code. In the end the code could look like this:

type t = A | B [@@deriving sexp_of]

let sexp_of_t = function
  | A -> Sexp.Atom "A"
  | B -> Sexp.Atom "B"


Given that the compiler ignores attributes it doesn't understand, this code compiles just fine without any pre-processing.

When running the ppx rewriter in this expect mode, the generated AST is matched against the source AST without taking locations into account, which means that you can reformat the code as you wish and even add comments.

The challenge now is to update our ppx rewriters so that they produce code that we are happy to show. Until now we didn't focus too much on that, but we have a good idea of how to do it. The plan is to move more of the logic of the various deriving systems into proper functions instead of generating more code. Note that this is an improvement in general, as proper functions are a lot easier to understand and maintain than code generators.


In this blog post we described a simple and clean method to decouple ppx rewriters from the release schedule of the compiler. This method has the advantage that once a ppx rewriter is written, it is likely to work for a long time, and in particular to work out of the box with development compilers.

Moreover, this method is better for users, as errors point to the real code the compiler sees, and when debugging they can follow the execution through generated code without trouble.

All this is currently implemented in ppx_core/ppx_driver. Our github repositories haven't been updated in a while, as the Base refactoring disrupted our public release process quite a bit. These new features should be published in the coming weeks and will be part of the next stable release of our packages, planned for the beginning of December.

Observations of a functional programmer

I was recently invited to do the keynote at the Commercial Users of Functional Programming workshop, a 15-year-old gathering which is attached to ICFP, the primary academic functional programming conference.

You can watch the whole video here, but it's a bit on the long side, so I thought an index might be useful. (Also, because of my rather minimal use of slides, this is closer to a podcast than a video...)

Anyway, here are the sections:

Hope you enjoy!

What the interns have wrought, 2016

Now that the interns have mostly gone back to school, it's a good time to look back at what they did while they were here. We had a bumper crop -- more than 30 dev interns between our London, New York and Hong Kong offices -- and they worked on just about every corner of our code-base.

In this post, I wanted to talk about just one of those areas: building efficient, browser-based user interfaces.

Really, that's kind of a weird topic for us. Jane Street is not a web company, and is not by nature a place that spends a lot of time on pretty user interfaces. The software we build is for our own consumption, so the kind of spit and polish you see in consumer-oriented UIs is just not that interesting here.

But we do care about having functional, usable user interfaces. The work we do is very data driven, and that rewards having UIs that are good at presenting up-to-date data clearly and concisely. And we want to achieve that while keeping complexity of developing these UIs low.

Historically, almost all of our UIs have been text-based. While I love the command line, it does impose some limitations. For one thing, terminals don't offer any (decent) graphical capabilities. But beyond that obvious constraint, getting all the pieces you need for a decent UI to work well in a terminal, from scrolling to text-entry widgets, requires a lot of work that just isn't necessary in a browser.

So this year, we've finally started pushing to make it easy for us to write browser-based applications, in particular relying on OCaml's shockingly good JavaScript back-end. This has allowed us to write web applications in OCaml, using our usual tools and libraries. As I've blogged about previously, we've also been exploring how to use Incremental, a framework for building efficient on-line computations, to make browser UIs that are both pleasant to write and performant.

That's roughly where we were at the beginning of the summer: some good ideas about what designs to look at, and a few good foundational libraries. So the real story is what our interns, Aaron Zeng and Corwin de Boor, did to take us farther.

Web Tables

Aaron Zeng's project was to take all of the ideas and libraries we'd been working on and put them to use in a real project. That project was web-tables, a new user-interface for an existing tool developed on our options desk, called the annotated signal publisher. This service provides a simple way for traders to publish and then view streams of interesting tabular data often based on analysis of our own trading or trading that we see in the markets.

The publisher fed its data to Catalog, our internal pub-sub system. From Catalog, the data could be viewed in Excel, or in our terminal-based Catalog browser. But neither of these approaches worked as well or as flexibly as we wanted.

Enter web-tables. What we wanted was pretty simple: the ability to display tabular data from the annotated signal publisher with customizable formatting, filtering and sorting. This involved breaking a lot of new ground, from figuring out how to do the sorting and filtering in an efficient and incremental way, to fixing performance issues with our RPC-over-websockets implementation, to figuring out a deployment and versioning story that let people easily create and deploy new views of their data.

One of the great things about the project is how quickly it was put into use. The options guys started using web-tables before the project was even really finished, and there was a tight feedback loop between Aaron and his mentor Matt Russell, who was using the tool on a daily basis.

Optimizing rendering

Aaron's web-tables work used incr_dom, a small framework that sets up the basic idioms for creating UIs using Incremental. As part of that work, we discovered some limitations of the library that made it hard to hit the performance goals we were aiming for. Corwin de Boor's project was to fix those limitations.

The key to building an efficient UI for displaying a lot of data is figuring out what work you can avoid. To this end, Corwin wanted to build UIs that logically contained thousands or even millions of rows, while only actually materializing DOM nodes corresponding to the hundred or so rows that are actually in view.

In order to figure out which nodes to render, he had to first figure out which nodes would be visible, based on the location of the browser's viewport. This in turn required a way of looking up data-points based on where that data would be expected to render on the screen.

Corwin did this by building a data structure that stores the expected height of each object that could be rendered, while allowing one to query for a node based on the sum of the heights of all the nodes ahead of it. He took an existing splay tree library and rewrote it so it could be parameterized with a reduction function used to aggregate extra information along the spine of the splay tree. By integrating the reduction into the splay tree itself, the necessary data could be kept up to date efficiently as the splay tree was modified.
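In interface terms, the resulting structure might look something like this (a sketch; all names here are hypothetical, not the actual library):

(* A splay tree whose nodes also maintain a reduction (here, total pixel
   height) over their subtrees, so that we can find the row at a given
   vertical offset in logarithmic time. *)
module type Row_index = sig
  type 'a t

  (* Add a row together with its expected rendered height. *)
  val insert : 'a t -> key:int -> height:int -> data:'a -> 'a t

  (* Find the row containing a given vertical offset, i.e. the first row
     whose cumulative height exceeds [offset]. *)
  val find_by_offset : 'a t -> offset:int -> 'a option
end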

Corwin also spent a lot of time improving incr_dom itself, taking inspiration from other systems like React and the Elm language. We even corresponded a bit with Jordan Walke and Evan Czaplicki, the authors of React and Elm respectively.

One thing that came out of this was a neat trick for making the incr_dom API cleaner by using a relatively new feature of OCaml called open types. The details are a little technical (you can see the final result here and here), but I think what we ended up with is a bit of an advance on the state of the art.
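For reference, here is the open-types feature in miniature (plain OCaml, not the incr_dom code itself):

(* An "open" (extensible) type can gain new constructors after its
   definition, even in other modules: *)
type event = ..

type event += Key_press of char
type event += Scroll of int

let describe = function
  | Key_press c -> Printf.sprintf "key %c" c
  | Scroll n -> Printf.sprintf "scroll %d" n
  | _ -> "unknown event"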

There were a lot of other bits and pieces, like improving the handling of keyboard events in js_of_ocaml, creating a new incremental data-structure library called incr_select for more efficiently handling things like focus and visibility, and restructuring the incr_dom APIs to make them simpler to understand and open up new opportunities for optimization.

By the end, Corwin was able to build a demo that smoothly scrolled over a million rows of continuously updating data, all while keeping update times below 5ms. We're now looking at how to take all of this work and feed it into real applications, including Aaron's work on web-tables.

Sound like fun?

If this sounds like interesting stuff, you should consider applying for a summer internship with us!

Jane Street internships are a great learning experience, and a lot of fun to boot. You even get a chance to travel: interns get to visit Hong Kong, London or New York (depending on where they started!) as part of their summer.

And the projects I described here are really just a taste. Here are some other projects interns worked on this summer:

  • Building a low-latency order router
  • Adding data representation optimizations to the OCaml compiler
  • A service for simulating a variety of network failures, to make it easier to test distributed applications
  • Making it possible to write Emacs extensions in OCaml (this was actually Aaron Zeng's second project)

and that's still just a small sample. One thing I love about the work at Jane Street is the surprising variety of problems we find ourselves needing to solve.

Unraveling of the tech hiring market

Recruiting talented people has always been challenging.

In some years that meant competing with a hot new company that aggressively courted every fresh graduate with promises of stock options and IPO glory.  In other years there wasn't a specific company so much as an entire rising industry looking for people (I'm looking at you, cloud services, driverless cars, and peer-to-peer sharing).  Either way, we understood the yearly back and forth.  Our job was to explain to candidates how we stacked up, and more importantly, why a career at Jane Street might be the right choice for many of them.

But this year I got to learn a new name for a new challenge.  “Unraveling”.

I first encountered it in a book I was reading for fun: "Who Gets What, and Why", by the Nobel Prize-winning economist Alvin Roth.  He does a lovely job explaining the idea of a matching market.  In a matching market each person wants only one of each item, each item is unique, and each item can be given to at most one person at a time.  Jobs are a classic matching market, and just like any market, matching markets can work well, or poorly.

Unraveling is one of the primary things that makes a matching market fail.  When a market unravels, matches start to happen earlier and earlier, to the point where people no longer get a complete view of their options.  In the book, Roth relates the story of a person who stepped off a plane to find three voicemails on his phone.  The first offered him a job, the second urged him to respond soon, and the last rescinded the offer because he hadn't responded quickly enough.

We call them exploding offers, and this year they have gotten completely out of hand as companies race to the bottom in their efforts to recruit the next wave of interns and fresh graduates.

Colleges try to impose deadline limits explicitly to stop unraveling, and in the past these have largely been honored.  The cheating and fudging, such as it was, was kept to the fringes.  But this year it seems like the seal is broken, and we've seen major companies delivering internship and full-time offers with two-week (and shorter) hard deadlines.  Other companies now routinely deliver expiring bonus offers for signing early.  Many of these offers circumvent or outright break the guidelines set down by schools, and if past matching markets are a model for this one, next year will come with even earlier offers and worse conditions.

This unraveling has been the subject of a lot of discussion, both internally at Jane Street and with the various schools we recruit at, who see it - rightly - as bad for their students.  How can someone make a thoughtful decision about where they want to build a career without the time to interview at more than one or two places?  Unfortunately, most of this discussion is out of the public light, and so the unraveling continues.

We can't control the actions of others, but we also don’t have to follow the herd, so we'd like to be clear:

Jane Street is committed to making sure that you have the time and information you need to decide on an offer from us.  Our offer letters do have good-until dates as a matter of professional practice, but we try to work with every candidate to choose a date that works for them.  We are also happy to extend the date if something unexpected comes up, or, frankly, if someone just needs more time.

Choosing where to start your career is a big decision and we hope you have the time to make a good one.

Do you love dev tools? Come work at Jane Street.

In the last few years, we've spent more and more effort working on developer tools, to the point where we now have a tools-and-compilers group devoted to the area, for which we're actively hiring.

The group builds software supporting around 200 developers, sysadmins and traders on an OCaml codebase running into millions of lines of code. This codebase provides the foundation for the firm's business of trading on financial markets around the world.

Software that the group develops, much of which is written in-house, includes:
- build, continuous integration and code review systems;
- preprocessors and core libraries;
- editor enhancements and integration.

The group also devotes significant time to working on the OCaml compiler itself, often in collaboration with external parties, with work being released as open source. Recent work includes the Flambda optimization framework and the Spacetime memory profiler.

Candidates need to be familiar with a statically typed functional language and possess some amount of experience (within industry or otherwise) in this kind of infrastructure.

We're looking for candidates for both our New York and London offices. Benefits and compensation are highly competitive.

If you are interested, please email with a CV and cover letter.

Let syntax, and why you should use it

Earlier this year, we created ppx_let, a ppx rewriter that introduces a syntax for working with monadic and applicative libraries like Command, Async, Result and Incremental. We've now amassed about six months of experience with it, and we've seen enough to recommend it to a wider audience.

For those of you who haven't seen it, let syntax lets you write this:

let%bind <var> = <expr1> in <expr2>

instead of this:

<expr1> >>= fun <var> -> <expr2>

with analogous support for the map operator, via let%map. The choice of monad is made by opening the appropriate Let_syntax module, e.g., you might open Deferred.Result.Let_syntax, or Incr.Let_syntax. Note that Async.Std now opens Deferred.Let_syntax by default.
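For example, here is a small sketch of what this looks like in practice (startup is a made-up name; Reader.file_contents is Async's function for reading a whole file):

open Core.Std
open Async.Std

(* With Async.Std open, Deferred.Let_syntax is already in scope, so
   let%bind below is Deferred.bind. *)
let startup () =
  let%bind config = Reader.file_contents "config.sexp" in
  printf "loaded %d bytes\n" (String.length config);
  return ()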

There's also support for match statements, e.g.:

match%bind <expr0> with
| <pattern1> -> <expr1>
| <pattern2> -> <expr2>

is equivalent to:

<expr0> >>= function
| <pattern1> -> <expr1>
| <pattern2> -> <expr2>

There's also support for parallel binds and maps, using the and syntax. So this:

let%map <var1> = <expr1> and <var2> = <expr2> in <expr3>

is (roughly) equivalent to this:

map3 <expr1> <expr2> ~f:(fun <var1> <var2> -> <expr3>)

This pattern generalizes to arbitrarily wide maps. It's implemented using map and the both operator, which sacrifices some performance in exchange for generality, vs the explicit mapN operators.
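To make the translation concrete, here is a sketch in the Option monad of what the parallel form expands to (the actual generated code may differ in its details):

open Core.Std

(* What the user writes: *)
let sum_3 a b c =
  let open Option.Let_syntax in
  let%map x1 = a and x2 = b and x3 = c in
  x1 + x2 + x3

(* Roughly what it becomes, using [map] and [both]: the [and]s pair up
   the computations, and a single [map] consumes the nested tuple. *)
let sum_3' a b c =
  Option.map (Option.both a (Option.both b c))
    ~f:(fun (x1, (x2, x3)) -> x1 + x2 + x3)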


My experience with the new syntax has been quite positive. Here's my summary of the wins.

Parallel binds

For libraries like Command.Param and Incremental, where multi-way map functions (like map2 and map3) are important, it's been a pretty big win for the comprehensibility of the resulting code. Multi-way maps tend to matter for applicatives like Command.Param, which are just monads without bind. The big advantage is that by writing:

let%map x1 = some_very long expression
and x2 = some_other long expression
and x3 = yet another_thing
in
x1 + x2 / x3

we get to put the variable names directly next to the expressions they're being bound to. Using an explicit mapN operator, the result is more awkward:

map3
  (some_very long expression)
  (some_other long expression)
  (yet another_thing)
  ~f:(fun x1 x2 x3 -> x1 + x2 / x3)

This is error prone, since it's easy to mix up the variables and the expressions. To avoid the corresponding issue in the original Command library, we used some fancy combinators and the dreaded step operator, leading to some hard to understand code. The let-syntax equivalents are materially easier to use.

Variables first

Using a let-like syntax lets you put the variable before the definition, which follows the pattern of ordinary OCaml code, and makes it a bit easier to read. This cleans up some otherwise awkward patterns that are pretty common in our code. In particular, instead of this:

begin
  <expr1>;
  let <var> = <expr2> in
  <expr3>
end
>>= fun meaningful_variable_name ->
<expr4>

You can write this:

let%bind meaningful_variable_name =
  <expr1>;
  let <var> = <expr2> in
  <expr3>
in
<expr4>

which flows a bit more naturally, in part because the meaningful variable name comes first, and in part because the extra begin and end are dropped.

Connecting bind to let

Let binds are a lot like monadic binds, even before you add in any special syntax. I.e., this

<expr1> >>= fun x -> <expr2>

is a lot like this.

let x = <expr1> in <expr2>

This is why monads are sometimes described as "programmable let-binds" (or, relatedly, "programmable semicolons", which are just let-binds with a unit argument.)
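For instance, here's that idea in miniature using Core's Or_error (a sketch):

open Core.Std

(* "Programmable let": each let%bind is an ordinary-looking let whose
   meaning also includes short-circuiting on the first Error. *)
let parse_pair s1 s2 =
  let open Or_error.Let_syntax in
  let%bind a = Or_error.try_with (fun () -> Int.of_string s1) in
  let%map b = Or_error.try_with (fun () -> Int.of_string s2) in
  (a, b)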

I've found this to be a useful analogy in understanding monads, and the analogy is made clearer with let syntax. We have some preliminary reports of this making monadic code more approachable for beginners, which lines up with my intuition.

The similarity between ordinary lets and monadic lets also makes diffs easier to read. E.g., in Async, if some function goes from being synchronous to deferred, the change at the call point would now be from this

let x = some_synchronous_thing () in
more things

to this.

some_asynchronous_thing ()
>>= fun x ->
more things

With let-syntax, we would instead change it to this.

let%bind x = some_asynchronous_thing () in
more things

i.e., the only thing that changes is the addition of %bind. The resulting diff is more targeted, making the substance of the change a bit easier to see, and making refactorings that add or remove blocking easier to do and understand.


It's not all wine and roses. There are some downsides to let-syntax:

It's new and different

Enough said.

It's kinda ugly

This is a matter of taste, but I've heard some distaste for the percent sign itself. That's something forced on us by PPX, but I don't exactly disagree.

Also, the %bind and %map are a little wordy. There's been some talk of adding the ability to define alternate let syntaxes in OCaml proper, which would allow you to write something like this.

let* x = some_asynchronous_thing () in
let* y = some_other_thing () in
let+ z = a_third thing in
x + y + z

Here, let* would be equivalent to let%bind, and let+ to let%map. Again, it's not clear to me that this would be a net win.

I personally find the new syntax, on balance, less ugly than using infix operators everywhere, but again, tastes vary.

Unit binds aren't great

In particular, because we have no "monadic semicolon" in the syntax, you have to go from this:

<expr1>
>>= fun () ->
<expr2>

to this:

let%bind () = <expr1> in
<expr2>

which is not ideal, since it's not parallel to the normal semicolon syntax for this outside of the monad. We've looked at making it possible to do something like:

<expr1> ;%bind
<expr2>

which would be more parallel with ordinary OCaml syntax, but that's not yet possible, and it's not clear it's a net win.

It changes how you think about interleaving in Async

In Async, when you write:

load_something ()
>>= fun x ->
process_thing x

you can think of the point where interleaving can happen as the place where the bind operator is found. With let-syntax:

let%bind x = load_something () in
process_thing x

the location is different, and somewhat less obvious. My experience has been that this was easy to adjust to and hasn't tripped me up in practice, but it's a concern.


A few thoughts on how to use let syntax effectively.

Let syntax for variables, infix for point-free

One might wonder whether there's any use for the infix operators once you are using Let_syntax. I believe the answer is yes. In particular, the style we've adopted is to use let syntax when binding a variable.

let%bind x = some_expression in

and infix operators when going point-free, i.e., when not binding variables.

let v = some_function x >>| ok_exn in

One special case of this is binding unit, where some people prefer to use the following pattern, since we don't have a nice monadic semi-colon yet.

let%bind x = some_operation in
some_other_operation >>= fun () ->
let%bind y = yet_another_thing () in
a_final_thing y

rather than:

let%bind x = some_operation in
let%bind () = some_other_operation in
let%bind y = yet_another_thing () in
a_final_thing y

Mixing monads

One change we made recently was to add the return function and the monadic infix operators to the Let_syntax module that one opens to choose a monad. This has the useful property that opening a Let_syntax module switches you cleanly from one monad to another. Mixing multiple monads in the same scope is hard to think about.

Command.Param and Deferred

One interesting case that comes up is mixing the Command.Param syntax with the Deferred syntax. This is pretty easy to solve, because you don't typically need to mix them together. It's just that in the body of the command you often want Deferred, while in the definition of the command-line parser you want Command.Param. This can be handled by doing a local open of Command.Param.Let_syntax or Deferred.Let_syntax as necessary.
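As a rough sketch of that pattern (the exact Command API for wiring these together has changed across versions, so the details here are illustrative):

open Core.Std
open Async.Std

(* The parser side locally opens Command.Param.Let_syntax... *)
let verbose_param =
  let open Command.Param in
  let open Command.Param.Let_syntax in
  let%map verbose = flag "-verbose" no_arg ~doc:" print progress" in
  verbose

(* ...while the body locally opens Deferred.Let_syntax. *)
let main ~verbose =
  let open Deferred.Let_syntax in
  let%bind () = after (sec 1.) in
  if verbose then printf "done\n";
  return ()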

Deferred and Deferred.Result

A more complicated case is choosing between the Deferred and Deferred.Result monads. In Async, there are infix operators that let you use both sets of bind and map operators (basically, the Deferred.Result operators are the ordinary infix operators with question marks at the end).

Mixing these operators together in a single scope can be a little awkward, often leaving people to add and remove question-marks until things compile. With let syntax, you really have to pick a single monad, which is easier to read, but then requires some changes in behavior. In particular, you often need to move things from one monad to another. For example, if you're in the Deferred monad and get a result of type Deferred.Or_error.t, you might want to do something like this:

let open Deferred.Let_syntax in
let%bind v = some_operation x y >>| ok_exn in

Here, mapping with ok_exn will unwrap the result, raising the exception if there was an error. Similarly, if you're using an operation that's in the ordinary Deferred monad but you're operating in the Deferred.Result monad, you might want to lift that operation up, i.e.:

let open Deferred.Result.Let_syntax in
let%bind v = some_other_operation x y |> Deferred.map ~f:(fun x -> Ok x) in

This is something of a mouthful, so we just added the Deferred.ok function, so on our latest release you can write:

let open Deferred.Result.Let_syntax in
let%bind v = some_other_operation x y |> Deferred.ok in

This idiom is useful whether or not you're using let syntax.

ppx_core: context-free rewriters for better semantics and faster compilation

At Jane Street, we have always been heavy users of pre-processors, first with camlp4 and now ppx. Pre-processing makes the infrastructure a bit more complex, but it saves us a lot of time by taking care of a lot of tedious boilerplate code, and in some cases makes the code a bit prettier.

All in all, our standard set has 19 rewriters:

  • ppx_assert
  • ppx_bench
  • ppx_bin_prot
  • ppx_compare
  • ppx_custom_printf
  • ppx_enumerate
  • ppx_expect
  • ppx_fail
  • ppx_fields_conv
  • ppx_here
  • ppx_inline_test
  • ppx_js_style*
  • ppx_let
  • ppx_pipebang
  • ppx_sexp_conv
  • ppx_sexp_message
  • ppx_sexp_value
  • ppx_typerep_conv
  • ppx_variants_conv

These rewriters fall into 3 big categories:

  1. type driven code generators: ppx_sexp_conv, ppx_bin_prot, ...
  2. inline tests and benchmarks: ppx_inline_test, ppx_expect, ppx_bench
  3. convenience: ppx_sexp_value, ppx_custom_printf, ...

The first category is the one that definitely justifies the use of pre-processors, until we get something better in the language itself.

With such a high number of code transformations, there is an important question of how they compose with each other. For instance, what happens if one ppx generates code that would be rewritten by another ppx?

Since the switch from camlp4 to ppx a year ago, category 1 transformers have been handled all at once, as a whole-AST mapping pass, by ppx_type_conv, while all the others were implemented as separate passes. With the list above, that means 13 passes, given that 7 of them are ppx_type_conv plugins. As a result, the output depended on the order in which the various passes were applied.

Intuitively, one would think it's not a big deal, given that it is quite rare for a ppx to produce code that would be rewritten by another ppx. Still we ran into several issues over time:

  • Some ppx rewriters - such as ppx_inline_test, which rewrites [%%test ...] extensions - capture a pretty-print of their payload for debugging purposes. Depending on when ppx_inline_test is applied, the payload won't be the same, as it might have been expanded by other ppx rewriters, which is confusing for users.
  • A few ppx rewriters interpret the payload of a specific extension point as a DSL to be interpreted. This is the case for ppx_sexp_value and ppx_sexp_message. If another ppx messes with the payload before they see it, the result is unspecified. We had such an issue with ppx_here: inside [%sexp ...], [%here] is interpreted by ppx_sexp_value and ppx_sexp_message and produces "<filename>:<line>:<column>", while outside it is interpreted by ppx_here and produces a record of type Lexing.position (illustrated below).
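To illustrate the [%here] case (a sketch of the behavior described above):

(* Outside [%sexp ...], ppx_here expands [%here] to a record of type
   Lexing.position: *)
let pos : Lexing.position = [%here]

(* Inside [%sexp ...], ppx_sexp_value interprets [%here] itself and
   produces an atom of the form "<filename>:<line>:<column>": *)
let msg : Sexplib.Sexp.t = [%sexp "failure at", [%here]]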

Initially we dealt with these issues by using a specific order in the default set of rewriters, but that's more of a dirty hack than a real solution. Developers are often not aware of this and might end up using a wrong order when using a custom set of rewriters. Moreover, this worked because we control the order with Jenga, but in open-source packages using oasis, ocamlbuild and ocamlfind, we have no control over the final ordering.

But apart from the semantic problems, there is an obvious performance problem: all the transformations are local, but still we are doing 12 passes over the entire AST. What a waste of CPU time!

The different ways of composing ppx rewriters

Before jumping into the subject of this post, we recall a few of the various methods one can use to compose ppx rewriters.

Via separate processes

The default method, adopted early by the community, is to define each transformation as a separate executable. To compose them, one just calls the executables one by one. The main advantage of this approach is that each transformation is a black box and can do whatever dirty hacks it wants.

This is what you get when you are using a ppx by just putting the package name in your build system without doing anything special.

Via a driver

Another approach, which we developed at Jane Street, is to link all the transformations into a single executable. For this to work properly, all transformations must use the same framework. Technically, they all register themselves with ppx_driver via a call to Ppx_driver.register_transformation. Ppx_driver is then responsible for composing them.

There are several advantages to the second approach: since ppx_driver has knowledge of all transformations, it can do extended checks, such as making sure that all attributes have been interpreted. This helps detect typos, which in practice saves a lot of debugging time. But what really interests us in this post is that it can use more clever composition methods.

Code transformations using ppx_driver can still be exported as a single executable compatible with the first method; that's why all Jane Street ppx rewriters can be used with both methods.

ppx_driver has an ocamlbuild plugin to simplify building custom drivers.

Context free transformations

Given that all transformations are local, it was clear that they should be defined as such; i.e. if all you want to do is turn [%sexp "blah"] into Sexp.Atom "blah", you shouldn't need to visit the whole AST yourself. You just need to instruct whatever framework you are using that you want to rewrite [%sexp ...] extension points.

Context-free extension expander

We started with this idea a few months ago by adding an API in ppx_core to declare context-free extension expanders. For instance, this is how you would declare a ppx that interprets an extension [%foo ...] inside expressions:

open Ppx_core.Std

let ext =
  Extension.declare "foo" Expression
    Ast_pattern.(...)
    (fun ~path ~loc <parsed-payload...> -> <expansion>)

let () = Ppx_driver.register "foo" ~extensions:[ext]

The Ast_pattern.(...) bit describes what the extension expects as its payload.
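For example (a sketch), a pattern that accepts a single string constant as the payload, as in [%foo "blah"], looks like this:

(* Accept exactly one expression in the payload, require it to be a
   string constant, and pass the string on to the expansion function. *)
let payload_pattern = Ast_pattern.(single_expr_payload (estring __))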

Since ppx_driver knows about all the local extension expanders, it can expand them all in one pass over the AST. Moreover it can detect ambiguities and error out in such cases.

There was a choice to make as to whether to rewrite the AST in a bottom-up or top-down manner. We chose top-down, to allow extension expanders to interpret their payload before anyone else, so that they can correctly implement a DSL.

This solved most of the initial issues and reduced the number of passes to 7:

  • all extension expanders
  • ppx_type_conv
  • ppx_custom_printf
  • ppx_expect
  • ppx_fail
  • ppx_pipebang
  • ppx_js_style

ppx_expect wasn't initially defined as a context-free extension expander for technical reasons.

Making everything context-free

Recently we went even further and added a Context_free module to Ppx_core to cover all of our transformations. It doesn't support all possible rewritings, but supports enough to implement a lot of common ones:

  • context-free extension expanders
  • some specific support to implement type-driven code generators
  • support for ppx rewriters interpreting a function application at pre-processing time, such as ppx_custom_printf that interprets !"<format>"

With this we reduced the number of passes to only 2:

  • context free transformations
  • ppx_js_style

ppx_js_style is still done in a separate pass for simplicity. It is run last to ensure we don't generate code that doesn't match our coding rules.

Now, whatever order developers specify their ppx in their build system, they will get the exact same output.

Seeing the exact passes

Ppx_driver got a new option to print the passes it will execute; for instance, with ppx-jane, which is a standard driver containing all of the Jane Street ppx rewriters linked in (available in the ppx_jane package in opam):

$ ppx-jane -print-passes

$ ppx-jane -print-passes -no-check

Safety checks are implemented as additional passes; that's why we see more than one pass by default.


No performance comparison was done when introducing context free extension expanders, but we did some for the second stage, when we changed all ppx rewriters to use Context_free; processing a file with the resulting driver was twice as fast (check passes included).

But how does this compare to the more traditional method of running each rewriter in a separate process? To find out, we did a benchmark by taking one of the biggest ml files in core_kernel (src/ and comparing the two methods. We put a type error on the first line to be sure we stop just after pre-processing.

For reference, following are the numbers for calling ocamlfind ocamlc on the file with no pre-processing:

$ time ocamlfind ocamlc -c
File "", line 1, characters 12-15:
Error: This expression has type char but an expression was expected of type

real 0m0.022s
user 0m0.016s
sys  0m0.006s

To preprocess the file with ppx_jane as a single driver executable, one just has to pass a single -ppx option, or a -pp option, given that ppx_driver can be used either as a -ppx or as a -pp:

# via -ppx
$ time ocamlfind ocamlc \
  -ppx 'ppx-jane -as-ppx -inline-test-lib core -inline-test-drop -bench-drop' \
  -c 2> /dev/null 

real 0m0.095s
user 0m0.074s
sys  0m0.020s

# via -pp
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -inline-test-lib core -inline-test-drop -bench-drop' \
  -c 2> /dev/null 

real 0m0.091s
user 0m0.066s
sys  0m0.024s

# via -pp, with checks disabled
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-check -inline-test-lib core -inline-test-drop -bench-drop' \
  -c 2> /dev/null 

real 0m0.070s
user 0m0.051s
sys  0m0.018s

# via -pp, without merging passes
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-merge -inline-test-lib core -inline-test-drop -bench-drop' \
  -c 2> /dev/null 

real 0m0.229s
user 0m0.206s
sys  0m0.022s

Using the other method turned out to be quite painful: given that the various ppx rewriters cannot share command-line arguments, some had to be specified more than once:

$ time ocamlfind ocamlc -package ppx_jane \
  -ppxopt "ppx_inline_test,-inline-test-lib blah -inline-test-drop" \
  -ppxopt "ppx_bench,-inline-test-lib blah -bench-drop" \
  -ppxopt "ppx_expect,-inline-test-lib blah" \
  -c 2> /dev/null

real 0m0.339s
user 0m0.233s
sys  0m0.098s

So, unsurprisingly, the single-pass, single-executable method is a lot faster.


This code is available on github. The context-free extension point API is already available in opam. The newer one is only in the git repositories for ppx_core and ppx_driver. You can try them out by using our development opam repository. You should have a look at this if you care about how your rewriters are composed and/or about compilation speed.

  • ppx_js_style is not currently released; it is an internal ppx that we use to enforce our coding standards.

Seven Implementations of Incremental

We finally got a decent recording of one of my favorite talks. This one is about our Incremental library (which I wrote about here), and in particular about the story of how we got to the present, quite performant, implementation.

It's not clear from the talk, but the work on this library wasn't done by me: the initial version was implemented by Stephen Weeks and Milan Stanojevic, and most of the intermediate versions were implemented by Nick Chapman and Valentin Gatien-Baron.

The high quality org-mode slides, though, are all me.