One more talk, two more videos

I'm happy to announce our next public tech talk, called Seven Implementations of Incremental, on Wednesday, April 5th, presented by yours truly. You can register here.

The talk covers the history of Incremental, a library for building efficient online algorithms. The need to update computations incrementally is pretty common, and we've found Incremental to be useful in creating such computations in a number of different domains, from constructing efficient financial calculations to writing responsive, data-rich web UIs.

The ideas behind Incremental aren't new with us; there is a lot of prior art, most notably Umut Acar's work on self-adjusting computation, on which Incremental is most directly modeled.

But there's a big gap between the academic version of an idea and a production-ready instantiation, and this talk is about crossing that gap. It discusses the seven different implementations we went through, and the various mistakes we made along the way to the current one we use in production.

So join us. I hope you enjoy seeing what we learned about building this kind of system, as well as hearing about the hilarious pratfalls along the way.

On another note, we have finally posted videos from our two previous talks, including Brian Nigito's talk on the architecture of the modern exchange, and Arjun Guha's talk on taming Puppet. And, of course, you can subscribe to our channel while you're there.

Jane Street Tech Talks: Verifying Puppet Configs

Our first Jane Street Tech Talk went really well! Thanks to everyone who came and made it a fun event.

Now it's time for another. We're planning for the series to feature a combination of voices from inside and outside Jane Street. This one is of the latter variety: on March 6th, Arjun Guha will be presenting On Verification for System Configuration Languages, which is about using static verification techniques for catching bugs in Puppet configs.

I've known Arjun for years, and he's both a good hacker and a serious academic with a real knack for finding good ways of applying ideas from programming languages to systems problems. Also, he has excellent taste in programming languages...

I hope you'll come! You can sign up here.

How to Build an Exchange

UPDATE: We are full up. Tons of people signed up for the talk, and we're now at the limit of what we feel like we can support in the space. Thanks for all the interest, and if you didn't get into this one, don't worry, we have more talks coming!

We're about to do the first of what will hopefully become a series of public tech talks in our NY office.

The first talk is on February 2nd, and is an overview of the architecture of a modern exchange. The talk is being given by Brian Nigito, and is inspired by our work on JX, a crossing engine built at Jane Street. But Brian's experience is much broader, going all the way back to the Island ECN, which in my mind marks the birth of the modern exchange.

I did some work on JX, and one of the things I was struck by is the role that performance plays in the design. In particular, JX uses a simple replication scheme based on reliable multicast that relies critically on the components of the system having high throughput and low, deterministic latencies.

This is a situation where performance engineering is done not so much for reducing end-to-end latency, but instead to act as a kind of architectural superpower, making it possible to build systems in a simpler and more reliable way than would be possible otherwise.

Anyway, I think it's a fascinating topic. If you're interested in coming, you can go here to get the details and sign up. (We have a registration step so we can get people through building security.)

Observations of a functional programmer

I was recently invited to do the keynote at the Commercial Users of Functional Programming workshop, a 15-year-old gathering which is attached to ICFP, the primary academic functional programming conference.

You can watch the whole video here, but it's a bit on the long side, so I thought an index might be useful. (Also, because of my rather minimal use of slides, this is closer to a podcast than a video...)

Anyway, here are the sections:

Hope you enjoy!

What the interns have wrought, 2016

Now that the interns have mostly gone back to school, it's a good time to look back at what they did while they were here. We had a bumper crop -- more than 30 dev interns between our London, New York and Hong Kong offices -- and they worked on just about every corner of our code-base.

In this post, I wanted to talk about just one of those areas: building efficient, browser-based user interfaces.

Really, that's kind of a weird topic for us. Jane Street is not a web company, and is not by nature a place that spends a lot of time on pretty user interfaces. The software we build is for our own consumption, so the kind of spit and polish you see in consumer-oriented UIs is just not that interesting here.

But we do care about having functional, usable user interfaces. The work we do is very data driven, and that rewards having UIs that are good at presenting up-to-date data clearly and concisely. And we want to achieve that while keeping complexity of developing these UIs low.

Historically, almost all of our UIs have been text-based. While I love the command line, it does impose some limitations. For one thing, terminals don't offer any (decent) graphical capabilities. But beyond that obvious constraint, getting all the pieces you need for a decent UI to work well in a terminal, from scrolling to text-entry widgets, requires a lot of work that just isn't necessary in a browser.

So this year, we've finally started pushing to make it easy for us to write browser-based applications, in particular relying on OCaml's shockingly good JavaScript back-end. This has allowed us to write web applications in OCaml, using our usual tools and libraries. As I've blogged about previously, we've also been exploring how to use Incremental, a framework for building efficient on-line computations, to make browser UIs that are both pleasant to write and performant.

That's roughly where we were at the beginning of the summer: some good ideas about what designs to look at, and a few good foundational libraries. So the real story is what our interns, Aaron Zeng and Corwin de Boor, did to take us farther.

Web Tables

Aaron Zeng's project was to take all of the ideas and libraries we'd been working on and put them to use in a real project. That project was web-tables, a new user interface for an existing tool developed on our options desk, called the annotated signal publisher. This service provides a simple way for traders to publish and then view streams of interesting tabular data, often based on analysis of our own trading or of trading that we see in the markets.

The publisher fed its data to Catalog, our internal pub-sub system. From Catalog, the data could be viewed in Excel, or in our terminal-based Catalog browser. But neither of these approaches worked as well or as flexibly as we wanted.

Enter web-tables. What we wanted was pretty simple: the ability to display tabular data from the annotated signal publisher with customizable formatting, filtering and sorting. This involved breaking a lot of new ground, from figuring out how to do the sorting and filtering in an efficient and incremental way, to fixing performance issues with our RPC-over-websockets implementation, to figuring out a deployment and versioning story that let people easily create and deploy new views of their data.

One of the great things about the project is how quickly it was put into use. The options guys started using web-tables before the project was even really finished, and there was a tight feedback loop between Aaron and his mentor Matt Russell, who was using the tool on a daily basis.

Optimizing rendering

Aaron's web-tables work used incr_dom, a small framework that sets up the basic idioms for creating UIs using Incremental. As part of that work, we discovered some limitations of the library that made it hard to hit the performance goals we wanted to. Corwin de Boor's project was to fix those limitations.

The key to building an efficient UI for displaying a lot of data is figuring out what work you can avoid. To this end, Corwin wanted to build UIs that logically contained thousands or even millions of rows, while only actually materializing DOM nodes corresponding to the hundred or so rows that are actually in view.

In order to figure out which nodes to render, he had to first figure out which nodes would be visible, based on the location of the browser's viewport. This in turn required a way of looking up data-points based on where that data would be expected to render on the screen.

Corwin did this by building a data structure for storing the expected height of each object that could be rendered, while allowing one to query for a node based on the sum of the heights of all the nodes ahead of it. He did this by taking an existing splay tree library and rewriting it so it could be parameterized with a reduction function that would be used to aggregate extra information along the spine of the splay tree. By integrating the reduction into the splay tree itself, the necessary data could be kept up to date efficiently as the splay tree was modified.
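To make the idea concrete, here's a stripped-down sketch of the query side of that structure. All the names here are made up for illustration: the real version is a splay tree parameterized by a general reduction function, whereas this sketch hard-codes the height sum as the aggregate and omits splaying and rebalancing entirely.

```ocaml
(* Each node caches the total height of its subtree, so we can find the
   row at a given vertical offset in time proportional to the tree's
   depth, rather than scanning all the rows ahead of it. *)
type tree =
  | Leaf
  | Node of { left : tree; height : int; id : int; right : tree; total : int }

let total = function Leaf -> 0 | Node n -> n.total

(* Smart constructor that maintains the cached aggregate. *)
let node left height id right =
  Node { left; height; id; right; total = total left + height + total right }

(* Find the row containing vertical offset [y], walking the cached sums. *)
let rec find_at_offset t y =
  match t with
  | Leaf -> None
  | Node { left; height; id; right; _ } ->
    let l = total left in
    if y < l then find_at_offset left y
    else if y < l + height then Some id
    else find_at_offset right (y - (l + height))
```

The essential trick is the same as in the real library: because the aggregate is maintained at every node, a positional query never has to touch more than one root-to-leaf path.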

Corwin also spent a lot of time improving incr_dom itself, taking inspiration from other systems like React and the Elm language. We even corresponded a bit with Jordan Walke and Evan Czaplicki, the authors of React and Elm respectively.

One thing that came out of this was a neat trick for making the incr_dom API cleaner by using a relatively new feature of OCaml called open types. The details are a little technical (you can see the final result here and here), but I think what we ended up with is a bit of an advance on the state of the art.

There were a lot of other bits and pieces, like improving the handling of keyboard events in js_of_ocaml, creating a new incremental data-structure library called incr_select for more efficiently handling things like focus and visibility, and restructuring the incr_dom APIs to make them simpler to understand and open up new opportunities for optimization.

By the end, Corwin was able to build a demo that smoothly scrolled over a million rows of continuously updating data, all while keeping update times below 5ms. We're now looking at how to take all of this work and feed it into real applications, including Aaron's work on web-tables.

Sound like fun?

If this sounds like interesting stuff, you should consider applying for a summer internship with us!

Jane Street internships are a great learning experience, and a lot of fun to boot. You even get a chance to travel: interns get to visit Hong Kong, London or New York (depending on where they started!) as part of their summer.

And the projects I described here are really just a taste. Here are some other projects interns worked on this summer:

  • Building a low-latency order router.
  • Adding data representation optimizations to the OCaml compiler.
  • A service for simulating a variety of network failures, to make it easier to test distributed applications.
  • Making it possible to write Emacs extensions in OCaml (this was actually Aaron Zeng's second project).

and that's still just a small sample. One thing I love about the work at Jane Street is the surprising variety of problems we find ourselves needing to solve.

Do you love dev tools? Come work at Jane Street.

In the last few years, we've spent more and more effort working on developer tools, to the point where we now have a tools-and-compilers group devoted to the area, for which we're actively hiring.

The group builds software supporting around 200 developers, sysadmins and traders on an OCaml codebase running into millions of lines of code. This codebase provides the foundation for the firm's business of trading on financial markets around the world.

The software the group supports, much of which is written in-house, includes:
- build, continuous integration and code review systems;
- preprocessors and core libraries;
- editor enhancements and integration.

The group also devotes significant time to working on the OCaml compiler itself, often in collaboration with external parties, with work being released as open source. Recent work includes work on the Flambda optimization framework and the Spacetime memory profiler.

Candidates need to be familiar with a statically typed functional language and possess some amount of experience (within industry or otherwise) in this kind of infrastructure.

We're looking for candidates for both our New York and London office. Benefits and compensation are highly competitive.

If you are interested, please email with a CV and cover letter.

Let syntax, and why you should use it

Earlier this year, we created ppx_let, a PPX rewriter that introduces a syntax for working with monadic and applicative libraries like Command, Async, Result and Incremental. We've now amassed about six months of experience with it, and we've seen enough to recommend it to a wider audience.

For those of you who haven't seen it, let syntax lets you write this:

  let%bind <var> = <expr1> in <expr2>

instead of this:

  <expr1> >>= fun <var> -> <expr2>

with analogous support for the map operator, via let%map. The choice of monad is made by opening the appropriate Let_syntax module, e.g., you might open Deferred.Result.Let_syntax, or Incr.Let_syntax. Note that Async.Std now opens Deferred.Let_syntax by default.

There's also support for match statements, e.g.:

  match%bind <expr0> with
  | <pattern1> -> <expr1>
  | <pattern2> -> <expr2>

is equivalent to:

  <expr0> >>= function
  | <pattern1> -> <expr1>
  | <pattern2> -> <expr2>

There's also support for parallel binds and maps, using the and syntax. So this:

  let%map <var1> = <expr1> and <var2> = <expr2> in <expr3>

is (roughly) equivalent to this:

  map2 <expr1> <expr2> ~f:(fun <var1> <var2> -> <expr3>)

This pattern generalizes to arbitrarily wide maps. It's implemented using map and the both operator, which sacrifices some performance in exchange for generality, vs the explicit mapN operators.
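To see how the both-based desugaring works, here's a sketch for the option applicative. This is the desugaring in spirit, not the actual code ppx_let generates:

```ocaml
(* [map] and [both] for the option applicative. [both] pairs up two
   computations; an n-way map is then just nested [both]s followed by a
   single [map] over the resulting tuple. *)
let map o ~f = match o with None -> None | Some x -> Some (f x)

let both a b =
  match a, b with
  | Some x, Some y -> Some (x, y)
  | _, _ -> None

(* let%map x1 = e1 and x2 = e2 and x3 = e3 in body
   becomes, roughly: *)
let map3_via_both e1 e2 e3 ~f =
  map (both e1 (both e2 e3)) ~f:(fun (x1, (x2, x3)) -> f x1 x2 x3)
```

The intermediate tuples built by `both` are the performance cost mentioned above; a hand-written map3 would pass the three values directly.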


My experience with the new syntax has been quite positive. Here's my summary of the wins.

Parallel binds

For libraries like Command.Param and Incremental, where multi-way map functions (like `map2` and `map3`) are important, it's been a pretty big win in terms of the comprehensibility of the resulting code. This tends to be the case for applicatives like Command.Param, which are just monads without bind. The big advantage is that by writing:

  let%map x1 = some_very long expression
  and x2 = some_other long expression
  and x3 = yet another_thing
  in
  x1 + x2 / x3

we get to put the variable names directly next to the expressions they're being bound to. Using an explicit mapN operator, the result is more awkward:

  map3
    (some_very long expression)
    (some_other long expression)
    (yet another_thing)
    ~f:(fun x1 x2 x3 -> x1 + x2 / x3)

This is error prone, since it's easy to mix up the variables and the expressions. To avoid the corresponding issue in the original Command library, we used some fancy combinators and the dreaded step operator, leading to some hard to understand code. The let-syntax equivalents are materially easier to use.

Variables first

Using a let-like syntax lets you put the variable before the definition, which follows the pattern of ordinary OCaml code, and makes it a bit easier to read. This cleans up some otherwise awkward patterns that are pretty common in our code. In particular, instead of this:

  begin
    <expr1>;
    let <var> = <expr2> in
    <expr3>
  end
  >>= fun meaningful_variable_name ->
  <expr4>

You can write this:

  let%bind meaningful_variable_name =
    <expr1>;
    let <var> = <expr2> in
    <expr3>
  in
  <expr4>

which flows a bit more naturally, in part because the meaningful variable name comes first, and in part because the extra begin and end are dropped.

Connecting bind to let

Let binds are a lot like monadic binds, even before you add in any special syntax. That is, this

  <expr1> >>= fun x -> <expr2>

is a lot like this.

  let x = <expr1> in <expr2>

This is why monads are sometimes described as "programmable let-binds" (or, relatedly, "programmable semicolons", which are just let-binds with a unit argument.)

I've found this to be a useful analogy in understanding monads, and the analogy is made clearer with let syntax. We have some preliminary reports of this making monadic code more approachable for beginners, which lines up with my intuition.

The similarity between ordinary lets and monadic lets also makes diffs easier to read. e.g., in Async, if some function goes from being synchronous to deferred, the change at the call point would now be from this

  let x = some_synchronous_thing () in
  more things

to this.

  some_asynchronous_thing ()
  >>= fun x ->
  more things

With let-syntax, we would instead change it to this.

  let%bind x = some_asynchronous_thing () in
  more things

i.e., the only thing that changes is the addition of %bind. The resulting diff is more targeted, making the substance of the change easier to see, and making refactorings that add or remove blocking easier to do and to understand.


It's not all wine and roses. There are some downsides to let-syntax:

It's new and different

Enough said.

It's kinda ugly

This is a matter of taste, but I've heard some distaste for the percent sign itself. That's something forced on us by PPX, but I don't exactly disagree.

Also, the %bind and %map are a little wordy. There's been some talk of adding the ability to define alternate let syntaxes in OCaml proper, which would allow you to write something like this.

  let* x = some_asynchronous_thing () in
  let* y = some_other_thing () in
  let+ z = a_third thing in
  x + y + z

Here, let* would be equivalent to let%bind, and let+ is equivalent to let%map. Again, it's not clear to me that this would all in be a win.
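As a sketch of how this could hang together, here's the idea written out for the option monad, assuming such operators could be defined like ordinary values (which is not something OCaml supports as of this writing; the definitions below are purely illustrative):

```ocaml
(* Hypothetical user-defined let operators for the option monad:
   [let*] behaves like let%bind, [let+] like let%map. *)
let ( let* ) o f = match o with None -> None | Some x -> f x
let ( let+ ) o f = match o with None -> None | Some x -> Some (f x)

(* Short-circuits to None if any of the three inputs is None. *)
let sum_three a b c =
  let* x = a in
  let* y = b in
  let+ z = c in
  x + y + z
```

Note how the final binding uses the map form, since nothing after it needs to return into the monad.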

I personally find the new syntax all in less ugly than using infix operators everywhere, but again, tastes vary.

Unit binds aren't great

In particular, because we have no "monadic semicolon" in the syntax, you have to go from this:

  <expr1>
  >>= fun () ->
  <expr2>

to this:

  let%bind () = <expr1> in
  <expr2>

which is not ideal, since it's not parallel to the normal semicolon syntax for this outside of the monad. We've looked at making it possible to do something like:

  <expr1> ;%bind
  <expr2>

which would be more parallel with ordinary OCaml syntax, but that's not yet possible, and it's not clear it's a net win.

It changes how you think about interleaving in Async

In Async, when you write:

  load_something ()
  >>= fun x ->
  process_thing x

you can think of the point where interleaving can happen as the place where the bind operator is found. With let-syntax:

  let%bind x = load_something () in
  process_thing x

the location is different, and somewhat less obvious. My experience has been that this was easy to adjust to and hasn't tripped me up in practice, but it's a concern.


A few thoughts on how to use let syntax effectively.

Let syntax for variables, infix for point-free

One might wonder whether there's any use for the infix operators once you are using Let_syntax. I believe the answer is yes. In particular, the style we've adopted is to use let syntax when binding a variable:

  let%bind x = some_expression in

and infix operators when going point-free, i.e., when not binding variables:

  let v = some_function x >>| ok_exn in

One special case of this is binding unit, where some people prefer to use the following pattern, since we don't have a nice monadic semi-colon yet.

  let%bind x = some_operation in
  some_other_operation >>= fun () ->
  let%bind y = yet_another_thing () in
  a_final_thing y

rather than:

  let%bind x = some_operation in
  let%bind () = some_other_operation in
  let%bind y = yet_another_thing () in
  a_final_thing y

Mixing monads

One change we made recently was to add the return function and the monadic infix operators to the Let_syntax module that one opens to choose a monad. This has the useful property of causing one to basically switch cleanly from one monad to another when you open the Let_syntax module. Mixing multiple monads in the same scope is hard to think about.

Command.Param and Deferred

One interesting case that comes up is mixing the Command.Param syntax with the Deferred syntax. This one is pretty easy to solve, because you don't typically need to mix them in the same scope: in the body of the command you often want Deferred, while in the definition of the command-line parser you want Command.Param. This can be handled by doing a local open of Command.Param.Let_syntax or Deferred.Let_syntax as necessary.

Deferred and Deferred.Result

A more complicated case is choosing between the Deferred and Deferred.Result monads. In Async, there are infix operators for both sets of bind and map operations (the Deferred.Result versions are just the ordinary infix operators with question marks at the end).

Mixing these operators together in a single scope can be a little awkward, often leaving people to add and remove question-marks until things compile. With let syntax, you really have to pick a single monad, which is easier to read, but then requires some changes in behavior. In particular, you often need to move things from one monad to another. For example, if you're in the Deferred monad and get a result of type Deferred.Or_error.t, you might want to do something like this:

  let open Deferred.Let_syntax in
  let%bind v = some_operation x y >>| ok_exn in

Here, mapping ok_exn over the result will take the error and raise it, if necessary. Similarly, if you're using an operation that's in the ordinary Deferred monad but you're operating in the Deferred.Result monad, you might want to lift that operation up, i.e.:

  let open Deferred.Result.Let_syntax in
  let%bind v = some_other_operation x y |> Deferred.map ~f:(fun x -> Ok x) in

This is something of a mouthful, so we added the Deferred.ok function; with our latest release you can write:

  let open Deferred.Result.Let_syntax in
  let%bind v = some_other_operation x y |> Deferred.ok in

This idiom is useful whether or not you're using let syntax.

Seven Implementations of Incremental

We finally got a decent recording of one of my favorite talks. This one is about our Incremental library (which I wrote about here), and in particular about the story of how we got to the present, quite performant, implementation.

It's not clear from the talk, but the work on this library wasn't done by me: The initial version was implemented by Stephen Weeks and Milan Stanojevic, and most of the intermediate versions were implemented by Nick Chapman and Valentin Gatien-Baron.

The high quality org-mode slides, though, are all me.

OCaml 4.03: Everything else

In my previous post I wrote about Flambda, which is the single biggest feature coming to OCaml in this release. In this post, I'll review the other features of 4.03 that caught my eye.

Inline records

Variants are my favorite thing about OCaml, and in this release, they're getting better. You've always been able to define variants with multiple arguments, e.g.:

  type shape =
    | Circle of float * float * float
    | Rect of float * float * float * float

But, as with this example, it can sometimes be a little hard to figure out what the meaning of the individual fields are, since they don't have labels. We can make this better by replacing our multi-argument variants with single-argument variants containing appropriately named records, as follows.

  type circle = { center_x: float;
                  center_y: float;
                  radius: float;
                }

  type rect = { x_lo: float;
                y_lo: float;
                x_hi: float;
                y_hi: float;
              }

  type shape =
    | Circle of circle
    | Rect of rect

This works, but the separation of the record type from the variant definition is a little awkward. Beyond that, this approach imposes extra runtime costs. A multi-argument variant takes up just a single heap-allocated block, while a single argument variant containing a record takes two blocks.

With 4.03, we can have the best of both worlds by defining variants containing inline records. Here's what they look like.

  type shape =
    | Circle of { center_x: float;
                  center_y: float;
                  radius: float;
                }
    | Rect of { x_lo: float;
                y_lo: float;
                x_hi: float;
                y_hi: float;
              }

And we can write code that uses these types as follows:

  let area = function
    | Circle c -> 3.14159 *. c.radius *. c.radius
    | Rect r -> (r.x_hi -. r.x_lo) *. (r.y_hi -. r.y_lo)

Note, however, that the values r and c aren't quite first-class. For example, we can't use them outside the context of the match statement in which they're found. So this function:

  let maybe_rect = function Circle _ -> None | Rect r -> Some r

fails with the error "This form is not allowed as the type of the inlined record could escape".

Even with this complexity, inlined records are really convenient.

Another advantage of inline records is that they allow us to express variants with mutable fields. This is useful in a variety of circumstances. In Core, we fake this using Obj.magic for our mutable AVL tree implementation, and now we can remove these hacks. Similar uses of Obj.magic were removed from OCaml's own imperative queue module in this release as well.
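As a small example of the pattern (a simplified sketch, not Core's actual implementation), a mutable singly-linked list node can now be expressed directly, with no separate record type and no Obj.magic:

```ocaml
(* A mutable singly-linked list using an inline record with a mutable
   field. Before 4.03, getting this layout meant either a separate
   record type (an extra indirection) or unsafe Obj.magic tricks. *)
type 'a cell =
  | Nil
  | Cons of { value : 'a; mutable next : 'a cell }

let rec to_list = function
  | Nil -> []
  | Cons { value; next } -> value :: to_list next

(* Destructively append [y] at the end of [x], returning the head. *)
let rec append_in_place x y =
  match x with
  | Nil -> y
  | Cons c ->
    (match c.next with
     | Nil -> c.next <- y
     | Cons _ -> ignore (append_in_place c.next y));
    x
```

The whole list lives in one block per element, just like a multi-argument variant, while still allowing in-place mutation of the next field.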

Uchar and result

A couple of useful types have been added to the standard library: Uchar.t and result. Uchar.t represents a unicode character, and is effectively just an int under the covers.

The result type is used for signaling success or failure, and is defined as follows.

  type ('a, 'b) result = Ok of 'a | Error of 'b

Both of these are in some sense trivial, but they're there as a coordination point between different libraries. Lots of OCaml libraries have some analogue of result, and each OCaml Unicode library has its own character type. By including these in the standard library, it provides an easy point of agreement between different external libraries, while adding almost no complexity to the core distribution itself.
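The shape of code written against result is simple enough to show in a few lines. The functions here are made up for illustration (and the type is redefined locally so the example stands alone on any compiler version):

```ocaml
(* Redefined here for self-containedness; in 4.03 this type is built in. *)
type ('a, 'b) result = Ok of 'a | Error of 'b

(* A fallible operation: carries an error message instead of raising. *)
let safe_div x y =
  if y = 0 then Error "division by zero" else Ok (x / y)

(* Consumers pattern-match on the two cases. *)
let describe = function
  | Ok n -> "ok: " ^ string_of_int n
  | Error msg -> "error: " ^ msg
```

The value of standardizing this type is exactly that `safe_div` and `describe` could live in two unrelated libraries and still agree on the interface.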

Better unboxing for externals

A small but valuable change: it's now possible to write C bindings through which one can pass unboxed versions of a number of different types, including floats, Int64's, Int32's and Nativeint's. This was previously possible only for calls that took only floats and returned a float. This makes it easier to write efficient, zero-allocation C bindings in a wider variety of cases.

GC Latency improvements

I talked about most of this here, so I won't go into great detail now. But the news is that the changes that Damien Doligez did during his stay at Jane Street have finally made it upstream.


Ephemerons

To a first approximation, you can think of a GC as a machine that determines what memory can be reclaimed by figuring out what data is reachable, and then reclaiming everything else. The simplest notion of reachability counts everything reachable by following pointers, starting at the GC roots, which are mostly just the values on the call stack.

This notion of reachability is a bit too restrictive, however. In particular, in some circumstances you want to keep pointers to objects without preventing those objects from being reclaimed. This is useful for some kinds of caching, where you want to cache previously computed values for as long as they're referenced, but no longer.

OCaml has long had an answer to this problem, which is the notion of a weak pointer. Weak pointers are references that aren't counted when computing reachability. When an object that is pointed to by a weak reference is collected, the weak reference itself is nulled out.
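The stdlib exposes weak pointers through the Weak module, as arrays of weak references. Here's a minimal sketch of the semantics (the assertion only checks the still-reachable case, since exactly when the GC clears a slot isn't something a small example can rely on):

```ocaml
(* Wrap a value in a one-slot weak array. [Weak.get] returns [Some v]
   while [v] is still strongly reachable, and [None] after the GC has
   collected it. *)
let make_weak_ref v =
  let w = Weak.create 1 in
  Weak.set w 0 (Some v);
  w

let () =
  let key = ref "cached" in
  let w = make_weak_ref key in
  (* [key] is still strongly reachable here, so the slot is full. *)
  assert (Weak.get w 0 = Some key)
  (* Once the last strong reference to [key] is dropped, a subsequent
     major collection clears the slot and [Weak.get w 0] yields [None]. *)
```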

Weak references are good enough for some purposes (e.g. hash-consing), but they can be awkward for many use cases. One basic use-case for which weak references are an awkward fit is memoizing a function, where one wants to keep entries in the table as long as the input to the function in question is still reachable.

You could imagine just keeping the key of the hash-table in a weak pointer, and then using a finalizer to remove the entry in the table once the key gets collected. But this fails if there is a reference from the output of the function back to the key, which is common enough.

Ephemerons were proposed back in 1997 by Barry Hayes to solve just this problem. The idea is pretty simple: an ephemeron has multiple keys and a single data entry. When determining whether the keys are alive, one doesn't count references from values that are reachable only via the ephemeron, so the references from the data back to the key don't keep the key alive. Also, once any of the ephemeron's keys is collected, the key and the data element are removed immediately. Weak pointers are now just a special case of ephemerons --- a weak pointer is effectively just an ephemeron with no data.

Ephemerons don't come up that much, but if you need to build a memoization table, ephemerons make the task simpler and the result less leak-prone.

Licenses, Github and OCamlbuild

A few organizational changes are landing in 4.03 that I think are worth noting. First, OCaml development is officially moving from Subversion to Git, with Github as the primary coordination point for OCaml development.

Second, OCaml's license is changing from the somewhat awkward and rarely used QPL, to LGPLv2 with a linking exception. The latter had already been used for various libraries distributed with the compiler. But, acknowledging the fact that more and more of the guts of the compiler are being used as libraries for a variety of reasons, it was recently decided to move everything to LGPLv2.

And finally, ocamlbuild is being moved out of the core OCaml distribution. This is one in a series of decisions to break out software that was previously bundled together with OCaml. This allows the core team to focus more on the core compiler itself, and makes the exported projects freer to make changes on whatever schedule they see fit.

I think keeping the OCaml distribution lean is an excellent strategic decision, allowing the core team to focus on the compiler itself, and allowing tooling and libraries to be worked on independently.

Together, these are all small changes. But they're part of a trend towards making OCaml development simpler and more agile.

What's missing

There are a number of much anticipated features that haven't made it into this release. In particular, the multicore GC, which at one point had been expected to land in 4.03, has been pushed back, likely to 4.04. Algebraic effects, which are in part motivated by the multicore GC, also didn't make it.

And finally, modular implicits, which are intended to provide typeclass-like functionality for OCaml, are not in this release either. That said, they're being actively worked on, and I'm expecting there will be more news about those in the next six months.

All in, it's been a pleasure watching this release take shape. And just as important as what got in is what didn't. Despite a greatly increased rate of change, the process is quite conservative --- nonsense features just don't make it in. My gratitude goes out to the core team for their safekeeping of the language.

A better inliner for OCaml, and why it matters

OCaml 4.03 is branched and a first release candidate is imminent, so it seems like a good time to take stock of what's coming.

This post will focus on just one of those features: Flambda, a new IR (intermediate representation) in the depths of the compiler designed to allow for better inlining, paired with a set of optimizations leveraging that IR.

Why inlining matters

If your expectations about inlining come from a language like C, you might not be all that excited by Flambda. In C, after all, the benefits of inlining are relatively small, mostly allowing one to avoid function call overhead. That's useful, but has limited impact.

In a language like OCaml, inlining is about more than just function call overhead. That's because inlining grows the amount of code that the optimizer can see at a given point, and that makes other optimizations more effective. And there are simply more optimizations available in a language like OCaml than in one like C. In particular, many allocations can be optimized away, which can have a rather big effect on the performance of a program.

To get a sense of how we can eliminate allocations, consider this simple example.

  let rec fold l ~init ~f =
    match l with
    | [] -> init
    | hd :: tl -> fold tl ~init:(f init hd) ~f

  let pow_sum l n =
    fold l ~init:0 ~f:(fun x y -> pow x n + pow y n)

In the current compiler, every invocation of pow_sum will allocate a closure for the function f. With Flambda, this code should be allocation free, since the fold can be inlined into pow_sum, f can be inlined into the body of the now inlined fold, and the fold can be specialized by adding an extra argument, thus making the closure allocation unnecessary.
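To make the specialization concrete, here's a hand-written sketch of roughly what Flambda arrives at. The definition of pow is invented for illustration, since the snippet above leaves it unspecified:

```ocaml
(* A stand-in for the unspecified [pow] above, for illustration only. *)
let pow x n =
  let rec go acc i = if i = 0 then acc else go (acc * x) (i - 1) in
  go 1 n

(* Roughly the result of inlining [fold] into [pow_sum] and
   specializing it on [f]: [n] is passed along directly, so no closure
   needs to be allocated. *)
let pow_sum l n =
  let rec fold_spec l acc =
    match l with
    | [] -> acc
    | hd :: tl -> fold_spec tl (pow acc n + pow hd n)
  in
  fold_spec l 0
```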

Here's another example. Imagine you wanted to sum the first and last elements of a list. You could do this using functions in the list module and a simple pattern match, as follows.

  let first_plus_last l =
    match List.hd l, List.last l with
    | None, _ | _, None -> None
    | Some x, Some y -> Some (x + y)

Without any inlining, this function creates three heap-allocated values, which are the various options being returned.

Now, let's consider what happens if we rewrite this code using the option monad.

  let (>>=) o f =
    match o with
    | None -> None
    | Some x -> f x

  let first_plus_last l =
    List.hd l >>= fun x ->
    List.last l >>= fun y ->
    Some (x + y)

Or, equivalently, using our ppx_let syntax:

  let first_plus_last l =
    let%bind x = List.hd l in
    let%bind y = List.last l in
    Some (x + y)

Without any inlining, this implementation will create five heap-allocated values: one for each of the two closures on the right-hand side of the binds, one each for the options returned from List.hd and List.last, and one for the final Some.

It's of course possible to write a version that only allocates one object, for the return value.

  let first_plus_last l =
    match l with
    | [] -> None
    | x :: _ ->
      let rec last_exn = function
        | [] -> assert false
        | [y] -> y
        | _ :: tl -> last_exn tl
      in
      Some (x + last_exn l)

But this is obviously uglier and harder to read than either of the earlier versions.

With Flambda (and the appropriate flags, i.e., -O3 -unbox-closures), these examples all allocate the same amount. That's because, once the inlining is done and all the code is in one place, the compiler can do things like observe that the option returned from List.hd is immediately deconstructed and is never used for anything else. As a result, the allocation of the options can be removed entirely.

This example highlights why these kinds of compiler optimizations are valuable. It's not that they're strictly necessary for good performance --- usually, pretty good performance can be achieved with sufficiently ugly code. But what good optimizations can do for us is let us write simpler, easier to understand code, without sacrificing performance.

A basis for future optimizations

In addition to the optimizations that will be there when 4.03 lands, Flambda will also make it easier to build new optimizations. One example comes from a project that was done last summer by Will Crichton, one of our interns. The aim of the project was to make the kind of allocation removal I described above more predictable.

From how I described it above, it might seem like Flambda's ability to remove allocations relies on the ability to inline the allocating functions in their entirety. That would be problematic, because what is and isn't inlined can be a bit unpredictable. That's because there have to be some heuristic thresholds, since inlining everything can lead to excessively large executables.

As such, depending on inlining makes it harder to predict the allocation behavior of your code, since it now hinges on the heuristics for determining when a function should be inlined. The situation isn't quite so grim --- some of the optimizations that come along with Flambda (like -unbox-closures) don't depend so critically on inlining. But still, many of Flambda's optimizations do depend on inlining, and as such whether they hit becomes harder to predict.

But we can make this better. The specific project that Will worked on had to do with allocations that come from returning small objects that are immediately deconstructed. With Flambda as it is today, such allocations are only removed if both the construction and deconstruction of the value appear together in the code the optimizer sees, which often requires inlining. Will's project made this more robust by changing the calling conventions in Flambda, specifically by adding a first-class multi-argument return to the Flambda IR.

With a multi-argument return in place, a function that returns a small tuple, say, could instead be compiled to return the components of the tuple as individual values to be passed back on the stack. Then, this function can be wrapped with a small function that picks those values off the stack and allocates the necessary object.

This small wrapper function is by design small enough to be inlined reliably. Once that inlining is done, both the construction and deconstruction of the value will be visible to the optimizer, and they can be collapsed. This should work whether or not the main function is inlined.
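The shape of the transformation can be mimicked in OCaml source (names here are invented; the real change lives in Flambda's IR and calling convention, not in source code):

```ocaml
(* Original: allocates a pair on every call. *)
let min_and_max a b = if a < b then (a, b) else (b, a)

(* Transformed: the worker hands back both components without
   allocating (a continuation stands in here for the multi-argument
   stack return), and a tiny wrapper re-boxes them. The wrapper is
   small enough to be inlined reliably, so a caller that immediately
   deconstructs the pair exposes both the construction and the
   deconstruction to the optimizer. *)
let min_and_max_flat a b k = if a < b then k a b else k b a
let min_and_max' a b = min_and_max_flat a b (fun lo hi -> (lo, hi))
```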

This is really just one example. Our hope and expectation is that Flambda will be the basis of many improvements in the compilation of OCaml over the next few years.


The performance impact we've seen from our experiments with Flambda seems pretty promising so far. On the real-world applications we've tested, it's pretty normal to see allocations reduced by 20-30%. We've found similarly sized improvements in application latency as well. And we think these numbers will improve yet more as more optimizations are added on top of Flambda.

Beyond improving existing programs, we're already seeing how Flambda allows us to write prettier and cleaner code without compromising on performance. For example, Flambda allows us to make freer use of OCaml's functors, which are a great abstraction tool, but one that imposed significant costs pre-Flambda. With Flambda, functors can simply be inlined away.

Flambda makes things easier in other ways too. For example, we're in the process of developing a new internal protocol for zero-allocation messaging, which requires a lot of code generation, via PPX. Being able to rely on Flambda greatly simplifies that code generation, since it lets us write a fairly naive code generator that still produces efficient code. We can write code that computes offsets into the message naively, and Flambda, via a combination of inlining and constant folding, moves the computation of these offsets to compile time, so the runtime field access becomes a simple lookup.
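As a sketch of the idea (the field layout and names below are invented for illustration, not our actual protocol): the generated accessor computes its offset by summing the sizes of earlier fields, and Flambda can fold that sum down to a constant:

```ocaml
(* Invented layout: a 4-byte header, then an 8-byte symbol, then a
   one-byte price field. The generated accessor computes the price's
   offset naively; with Flambda, inlining and constant folding reduce
   [price_offset] to the constant 12 at compile time, leaving a simple
   indexed load. *)
let header_size = 4
let symbol_size = 8
let price_offset = header_size + symbol_size

let get_price (msg : Bytes.t) = Char.code (Bytes.get msg price_offset)
```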

Even if you're perfectly happy with OCaml's performance as is, Flambda is still an exciting change. That's because various upcoming language improvements like modular implicits (a feature that brings some of the same benefits as Haskell's typeclasses) will only really perform acceptably well with a good inliner in place. So in addition to making your current programs run faster, Flambda also enables new abstractions that can make your future programs easier to read and write.

One downside of all this is that it reduces the predictability of OCaml's performance, which is an important property of the language. Part of how we hope to address this is by improving Flambda's heuristics to be more predictable, but that's unlikely to be enough on its own. That's why OCaml 4.03 comes with new annotations that let the programmer require or prohibit inlining for a particular function.
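Concretely, 4.03 lets you attach attributes both to definitions and to individual call sites (the functions below are invented examples):

```ocaml
(* [@@inline] asks for a definition to be inlined at its call sites;
   [@@inline never] prohibits inlining it. *)
let add x y = x + y [@@inline]
let heavy x = x * x + x [@@inline never]

(* [@inlined] requests inlining of one particular call. *)
let use a b =
  let s = (add [@inlined]) a b in
  s + heavy s
```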

Over time, we hope to find a good balance, where the heuristics do a pretty good and pretty predictable job by default, but where we give programmers hooks to control things more precisely where it matters.


Getting Flambda this far has been a big project, and a lot of people were involved. OCamlPro did a lot of the heavy lifting, with Pierre Chambart writing most of the compiler code, and Louis Gesbert and Michel Laporte working on benchmarking. Jane Street jumped in too, with Mark Shinwell and Leo White contributing mightily to the code and design as part of the review and stabilization process, and Jeremie Dimino helping with testing. And a number of other people on the core team (notably Alain Frisch, Gabriel Scherer, Runhang Li, Damien Doligez) contributed in a variety of ways to the final product. Thanks to all involved!