A solution to the ppx versioning problem

Ppx is a preprocessing system for OCaml in which rewriters map over the OCaml abstract syntax tree (AST), interpreting special syntax fragments in order to generate code.

Ppx rewriters get to work on the same AST definition as the compiler, which has many advantages:

  • The AST corresponds (almost) exactly to the OCaml language. This is not completely true as the AST can represent programs that you can't write, but it's quite close.

  • Given that the compiler and pre-processor agree on the data-type, they can communicate with each other using the unsafe [Marshal] module, which is a relatively cheap and fast way of serializing and deserializing OCaml values (a minimal sketch of this follows the list).

  • Finally, the biggest advantage for the user is that the locations in the original code are exactly preserved, which is a requirement to get usable error messages. This is not so great for the generated code, as the best one can do is reuse some locations from the original source code and hope for the best. In practice the user sometimes gets non-sensical errors, but this is a commonly accepted trade-off.
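To illustrate the [Marshal] point above, here is a minimal sketch, assuming compiler-libs for the Parsetree type, of how two processes could exchange an AST value; the real compiler/ppx protocol also involves magic numbers and the input file name, which are omitted here:

(* Minimal sketch: serializing a Parsetree value between processes. *)
let write_ast (oc : out_channel) (ast : Parsetree.structure) : unit =
  Marshal.to_channel oc ast []

let read_ast (ic : in_channel) : Parsetree.structure =
  (* Unsafe: Marshal trusts that the incoming value has the expected type. *)
  (Marshal.from_channel ic : Parsetree.structure)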

There is however one drawback to all this: the compiler AST is not stable, and code using it is at the mercy of its evolution. We got lucky with the 4.04 release of OCaml, but the 4.03 one was quite disruptive. Even before releases, whenever the AST definition changes during a development cycle, many ppx rewriters might not be usable for a while, which makes testing a lot harder.

Several ideas have been flying around, such as adding a layer to convert between different versions of the AST. While this would work, it has the drawback that you need this layer for every variant of the AST. And when you want to make a patch modifying the AST, you'll need to do the extra work of updating this layer first.

In this blog post we show how we managed to solve the ppx compatibility problem in a way that improves the user experience and lets us produce releases that don't depend on ppx rewriters at all.

We did this work while working on Base, our upcoming standard library. In the end, it's likely we'll use only the third of the methods described below for Base, while the others will be used to improve the user experience with the rest of our packages.

What do other code generators do?

Ppx is not the first system for generating code that is a mix of user-written code and machine-generated code. A typical class of generators that get it right, i.e. that preserve locations and are independent of the AST definition, are lexer/parser generators, and not only the ones distributed with the compiler.

Let's take the example of lexer generators (parser generators work basically the same way). The users write a series of rules, consisting of a regular expression and an action to take if the input matches it:

rule token = parse
| "+"  { PLUS  }
| "-"  { MINUS }
| '0'..'9'+ as s { INT (int_of_string s) }
| " "* { token lexbuf (* skip blanks *) }

This code is written in a .mll file and the generator then produces a .ml file with code for the lexing engine interleaved with the user written actions.

In order to keep the locations of the user-written code pointing to the right place in the .mll file, the generator produces:

# 42 "lexer.mll"
         token lexbuf (* skip blanks *)

The OCaml compiler interprets the line starting with a # and updates its current location to point to the beginning of line 42 in file lexer.mll. This is called a line directive.

To go back into the generated code, the lexer generator produces:

# 100 "lexer.ml"

Here, 100 corresponds to the real line number in lexer.ml.

With this method, when there is an error in the user-written code, it points to the lexer.mll file, while when there is an error in the generated code it points to the lexer.ml file. Even if the generated code might not be particularly easy to understand, at least you get to see the real code the compiler chokes on.

Another big advantage is that when using a debugger, you can follow the execution through the generated code.

Can we do the same for ppx?

At first glance, it seems that ppx rewriters work in a very different way, but the result is the same: only parts of the file are generated, and the rest is taken verbatim from what the user wrote. In fact, compared to the lexer case, most of the resulting code is user-written.

There is however some work to do to get the same result as with lexer generators. First you have to distinguish the generated code from the user code.

If you take a ppx rewriter as a black box, then the only way is to apply some kind of tree diff between the input and the output. In our ppx framework however, we know exactly what fragments of the AST are rewritten by plugins and we know the rewriting is always local. This makes the job a lot simpler and probably faster as well, so we chose to take advantage of this information.

The method

It works this way: while mapping the AST, we collect all the fragments of generated code along with the location of the code they replace in the original file. At the end we sort them in file order and make sure there is no overlap. Every fragment is pretty-printed to a string.

What we end up with is a list of text substitutions: beginning position, end position, replacement text. The next step is to simply apply these substitutions to the original file. If you read the blog post about how we switched from camlp4 to ppx, you'll notice the resemblance here.
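As an illustration, here is a rough sketch of what applying such a substitution list might look like; this is not the actual ppx_driver code, it works on plain byte offsets, and it leaves out the emission of line directives:

(* Apply a sorted, non-overlapping list of substitutions to the original
   source text. *)
type substitution = {
  start : int;    (* byte offset where the replaced range begins *)
  stop  : int;    (* byte offset just past the replaced range *)
  repl  : string; (* pretty-printed generated code *)
}

let apply_substitutions (original : string) (substs : substitution list) : string =
  let buf = Buffer.create (String.length original) in
  let pos =
    List.fold_left
      (fun pos { start; stop; repl } ->
         (* copy the untouched text, then the replacement *)
         Buffer.add_substring buf original pos (start - pos);
         Buffer.add_string buf repl;
         stop)
      0 substs
  in
  Buffer.add_substring buf original pos (String.length original - pos);
  Buffer.contents buf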

This is what the transformation looks like:

(* ----- input ----- *)
type t = int [@@deriving sexp_of]

let f x = x + 1

let g x = [%sexp_of: t] x

(* ----- output ----- *)
# 1 "foo.ml"
type t = int [@@deriving sexp_of]
# 4 "foo.ml.pp"
let sexp_of_t = sexp_of_int
#2 "foo.ml"

let f x = x + 1

let g x =
# 11 "foo.ml.pp"
sexp_of_t
# 5 "foo.ml"
                        x

The result for [@@deriving sexp_of] is not bad at all. For code rewritten inside expressions, the result is not as good, given that it breaks those expressions up. But given that extensions are often sparse in our source files, this is still acceptable.

This mode can be selected with ppx_driver-based rewriters by passing the -reconcile flag.

Solving the compatibility problem

With this mode, one can first generate a .ml.pp file and feed that to the compiler. Given that the concrete syntax of the language breaks much less often than the internal AST definition, a working ppx is likely to work for a very long time.

We'll soon start releasing a separate package that snapshots one version of the lexer/parser/AST/printer of the OCaml compiler. This package will have its own release schedule and will typically be updated soon after each release of OCaml. This will give ppx authors time to upgrade their code when it breaks, while still allowing people to try out the new compiler with their favorite packages.

Mode for release tarballs

In addition to the mode described above, ppx_driver has a second mode, -reconcile-with-comments, where the result is similar to the one with line directives, except that the generated code is enclosed in comment markers:

type t = int [@@deriving sexp_of]
(* GENERATED CODE BEGIN *)
let sexp_of_t = sexp_of_int
(* GENERATED CODE END *)

let f x = x + 1

let g x =
(* GENERATED CODE BEGIN *)
sexp_of_t
(* GENERATED CODE END *)
                        x

This mode is intended for release tarballs. One can replace all the files in place with the pre-processed version using -reconcile-with-comments. The result is readable and has the big advantage that you don't need to depend on the ppx rewriter, which means the package is faster to install for users.

Jane Street packages will eventually move to this scheme, either for the next stable release or the one after that. One technical issue with this method is that to take full advantage of it, the runtime support libraries of the various ppx rewriters must be installable without the rewriter itself. Splitting the packages in opam is fine, but splitting the repository is not desirable, as often both components make strong assumptions about each other.

For Jane Street packages, we'll need to update our release system so that it supports generating two opam packages from one repository.

ppx as a verification tool only

While these new methods improve the ppx story in general, for Base we wanted to go even further and allow users to build Base without the need for ppx at all, both for the release and for the development versions. Not only to cut down the dependencies, but also to provide a better experience in general. For instance if you are working on a patched compiler and need the development version of Base, you shouldn't need all the ppx rewriters that might not work for some reason.

We explored various bootstrap stories, and while they worked, they were not very nice, especially for such an important library. Its development and build processes should be straightforward.

We even looked into not using ppx at all. While this is OK for many ppx rewriters that are mostly syntactic sugar, it is more problematic for [@@deriving ...]. It's not so much that the code is hard to write by hand: most data-structures in Base are either simple datatypes or require hand-written combinators anyway. But it is a pain to review. This code is very mechanical, and you have to make sure that the constant strings correspond to the constructor/field names and check other things where the machine can do much better than a human.

In the end we found a solution that keeps the best of both worlds, i.e. being able to build the original source code without pre-processing while avoiding having to write and review this boilerplate code.

The idea is to use ppx in the same way that we write expect tests; the tool only checks that what comes after the type definition corresponds to what the rewriters derive from it. In case of a mismatch, it produces a .corrected file, just like expect tests.

We are currently experimenting with this method for Base. It's possible that we'll have some marker to delimit the end of the generated code. In the end the code could look like this:

type t = A | B [@@deriving sexp_of]

let sexp_of_t = function
  | A -> Sexp.Atom "A"
  | B -> Sexp.Atom "B"

[@@@end_of_derived_code]

Given that the compiler ignores attributes it doesn't understand, this code compiles just fine without any pre-processing.

When running the ppx rewriter in this expect mode, the generated AST is matched against the source AST without taking locations into account, so that means you can reformat the code as you wish and even add comments.

The challenge now is to update our ppx rewriters so that they produce code that we are happy to show. Until now we didn't focus too much on that, but we have a good idea about how to do it. The plan is to move more of the logic of the various deriving systems into proper functions instead of generating more code. Note that this is an improvement in general, as proper functions are a lot easier to understand and maintain than code generators.

Conclusion

In this blog post we described a simple and clean method to decouple ppx rewriters from the release schedule of the compiler. This method has the advantage that once the ppx is written, it is likely to work for a long time, and especially to work out of the box with development compilers.

Moreover, this method is better for users, as errors point to the real code the compiler sees, and when debugging they can follow the execution through generated code without trouble.

All this is currently implemented in ppx_core/ppx_driver. Our github repositories haven't been updated in a while, as the Base refactoring disrupted our public release process quite a bit. These new features should be published in the coming weeks and will be part of the next stable release of our packages, planned for the beginning of December.

Observations of a functional programmer

I was recently invited to do the keynote at the Commercial Users of Functional Programming workshop, a 15-year-old gathering which is attached to ICFP, the primary academic functional programming conference.

You can watch the whole video here, but it's a bit on the long side, so I thought an index might be useful. (Also, because of my rather minimal use of slides, this is closer to a podcast than a video...)

Anyway, here are the sections:

Hope you enjoy!

What the interns have wrought, 2016

Now that the interns have mostly gone back to school, it's a good time to look back at what they did while they were here. We had a bumper crop -- more than 30 dev interns between our London, New York and Hong Kong offices -- and they worked on just about every corner of our code-base.

In this post, I wanted to talk about just one of those areas: building efficient, browser-based user interfaces.

Really, that's kind of a weird topic for us. Jane Street is not a web company, and is not by nature a place that spends a lot of time on pretty user interfaces. The software we build is for our own consumption, so the kind of spit and polish you see in consumer-oriented UIs is just not that interesting here.

But we do care about having functional, usable user interfaces. The work we do is very data driven, and that rewards having UIs that are good at presenting up-to-date data clearly and concisely. And we want to achieve that while keeping complexity of developing these UIs low.

Historically, almost all of our UIs have been text-based. While I love the command line, it does impose some limitations. For one thing, terminals don't offer any (decent) graphical capabilities. But beyond that obvious constraint, getting all the pieces you need for a decent UI to work well in a terminal, from scrolling to text-entry widgets, requires a lot of work that just isn't necessary in a browser.

So this year, we've finally started pushing to make it easy for us to write browser-based applications, in particular relying on OCaml's shockingly good JavaScript back-end. This has allowed us to write web applications in OCaml, using our usual tools and libraries. As I've blogged about previously, we've also been exploring how to use Incremental, a framework for building efficient on-line computations, to make browser UIs that are both pleasant to write and performant.

That's roughly where we were at the beginning of the summer: some good ideas about what designs to look at, and a few good foundational libraries. So the real story is what our interns, Aaron Zeng and Corwin de Boor, did to take us farther.

Web Tables

Aaron Zeng's project was to take all of the ideas and libraries we'd been working on and put them to use in a real project. That project was web-tables, a new user-interface for an existing tool developed on our options desk, called the annotated signal publisher. This service provides a simple way for traders to publish and then view streams of interesting tabular data often based on analysis of our own trading or trading that we see in the markets.

The publisher fed its data to Catalog, our internal pub-sub system. From Catalog, the data could be viewed in Excel, or in our terminal-based Catalog browser. But neither of these approaches worked as well or as flexibly as we wanted.

Enter web-tables. What we wanted was pretty simple: the ability to display tabular data from the annotated signal publisher with customizable formatting, filtering and sorting. This involved breaking a lot of new ground, from figuring out how to do the sorting and filtering in an efficient and incremental way, to fixing performance issues with our RPC-over-websockets implementation, to figuring out a deployment and versioning story that let people easily create and deploy new views of their data.

One of the great things about the project is how quickly it was put into use. The options guys started using web-tables before the project was even really finished, and there was a tight feedback loop between Aaron and his mentor Matt Russell, who was using the tool on a daily basis.

Optimizing rendering

Aaron's web-tables work used incr_dom, a small framework that sets up the basic idioms for creating UIs using Incremental. As part of that work, we discovered some limitations of the library that made it hard to hit the performance goals we were aiming for. Corwin de Boor's project was to fix those limitations.

The key to building an efficient UI for displaying a lot of data is figuring out what work you can avoid. To this end, Corwin wanted to build UIs that logically contained thousands or even millions of rows, while only actually materializing DOM nodes corresponding to the hundred or so rows that are actually in view.

In order to figure out which nodes to render, he had to first figure out which nodes would be visible, based on the location of the browser's viewport. This in turn required a way of looking up data-points based on where that data would be expected to render on the screen.

Corwin did this by building a data structure for storing the expected height of each object that could be rendered, while allowing one to query for a node based on the sum of the heights of all the nodes ahead of it. He did this by taking an existing splay tree library and rewriting it so it could be parameterized with a reduction function that would be used to aggregate extra information along the spine of the splay tree. By integrating the reduction into the splay tree itself, the necessary data could be kept up to date efficiently as the splay tree was modified.
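To give a flavor of the idea, here is a toy sketch; it is not the actual splay tree library and does no splaying or rebalancing, but it shows how caching an aggregate (here, the total height of a subtree) at each node lets you find the row covering a given vertical offset without scanning the whole sequence:

(* Each node caches the total height of its subtree. *)
type 'a tree =
  | Leaf
  | Node of { left : 'a tree; row : 'a; height : float; total : float; right : 'a tree }

let total = function Leaf -> 0. | Node n -> n.total

(* Smart constructor that maintains the cached aggregate. *)
let node left row height right =
  Node { left; row; height; right; total = total left +. height +. total right }

(* Find the row containing vertical position [offset], measured from the top. *)
let rec find_at ~offset = function
  | Leaf -> None
  | Node { left; row; height; right; _ } ->
    let above = total left in
    if offset < above then find_at ~offset left
    else if offset < above +. height then Some row
    else find_at ~offset:(offset -. above -. height) right

(* Three rows of heights 20., 10. and 30.; offset 25. falls in the second row. *)
let example = node (node Leaf "row 0" 20. Leaf) "row 1" 10. (node Leaf "row 2" 30. Leaf)
let (_ : string option) = find_at ~offset:25. example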

Corwin also spent a lot of time improving incr_dom itself, taking inspiration from other systems like React and the Elm language. We even corresponded a bit with Jordan Walke and Evan Czaplicki, the authors of React and Elm respectively.

One thing that came out of this was a neat trick for making the incr_dom API cleaner by using a relatively new feature of OCaml called open types. The details are a little technical (you can see the final result here and here), but I think what we ended up with is a bit of an advance on the state of the art.

There were a lot of other bits and pieces, like improving the handling of keyboard events in js_of_ocaml, creating a new incremental data-structure library called incr_select for more efficiently handling things like focus and visibility, and restructuring the incr_dom APIs to make them simpler to understand and open up new opportunities for optimization.

By the end, Corwin was able to build a demo that smoothly scrolled over a million rows of continuously updating data, all while keeping update times below 5ms. We're now looking at how to take all of this work and feed it into real applications, including Aaron's work on web-tables.

Sound like fun?

If this sounds like interesting stuff, you should consider applying for a summer internship with us!

Jane Street internships are a great learning experience, and a lot of fun to boot. You even get a chance to travel: interns get to visit Hong Kong, London or New York (depending on where they started!) as part of their summer.

And the projects I described here are really just a taste. Here are some other projects interns worked on this summer:

  • Building a low-latency order router.
  • Adding data representation optimizations to the OCaml compiler
  • A service for simulating a variety of network failures, to make it easier to test distributed applications.
  • Making it possible to write Emacs extensions in OCaml (this was actually Aaron Zeng's second project)

and that's still just a small sample. One thing I love about the work at Jane Street is the surprising variety of problems we find ourselves needing to solve.

Unraveling of the tech hiring market

Recruiting talented people has always been challenging.

In some years that meant competing with a hot new company that aggressively courted every fresh graduate with promises of stock options and IPO glory.  In other years there wasn’t a specific company so much as an entire rising industry looking for people (I’m looking at you cloud services, driverless cars, and peer-to-peer sharing).  Either way, we understood the yearly back and forth.  Our job was to explain to candidates how we stacked up, and more importantly, why a career at Jane Street might be the right choice for many of them.

But this year I got to learn a new name for a new challenge.  “Unraveling”.

I first encountered it in a book I was reading for fun: "Who Gets What, and Why", by the Nobel Prize-winning economist Alvin Roth.  He does a lovely job explaining the idea of a matching market.  In a matching market each person wants only one of each item, each item is unique, and each item can be given to at most one person at a time.  Jobs are a classic matching market, and just like any market, matching markets can work well, or poorly.

Unraveling is one of the primary things that makes a matching market fail.  When a market unravels, matches start to happen earlier and earlier, to the point where people no longer get a complete view of their options.  In the book Roth relates the story of a person who stepped off of a plane to find three voicemails on his phone.  The first offered him a job, the second urged him to respond soon, and the last rescinded the offer because he hadn't responded quickly enough.

We call them exploding offers, and this year they have gotten completely out of hand as companies race to the bottom in their efforts to recruit the next wave of interns and fresh graduates.

Colleges try to impose deadline limits explicitly to stop unraveling, and in the past these have largely been honored.  The cheating and fudging, such as it was, was kept to the fringes.  But this year it seems like the seal is broken, and we've seen major companies delivering internship and full-time offers with 2 week (and less) hard deadlines.  Other companies now routinely deliver expiring bonus offers for signing early.  Many of these offers circumvent or outright break the guidelines set down by schools, and if past matching markets are a model for this one, next year will come with even earlier offers and worse conditions.

This unraveling has been the subject of a lot of discussion, both internally at Jane Street and with the various schools we recruit at, who see it - rightly - as bad for their students.  How can someone make a thoughtful decision about where they want to build a career without the time to interview at more than one or two places?  Unfortunately, most of this discussion is out of the public light, and so the unraveling continues.

We can't control the actions of others, but we also don’t have to follow the herd, so we'd like to be clear:

Jane Street is committed to making sure that you have the time and information you need to decide on an offer from us.  Our offer letters do have good-until dates as a matter of professional practice, but we try to work with every candidate to choose a date that works for them.  We are also happy to extend the date if something unexpected comes up, or, frankly, if someone just needs more time.

Choosing where to start your career is a big decision and we hope you have the time to make a good one.

Do you love dev tools? Come work at Jane Street.

In the last few years, we've spent more and more effort working on developer tools, to the point where we now have a tools-and-compilers group devoted to the area, for which we're actively hiring.

The group builds software supporting around 200 developers, sysadmins and traders on an OCaml codebase running into millions of lines of code. This codebase provides the foundation for the firm's business of trading on financial markets around the world.

Software that the group develops, much of which is written in-house, includes:
- build, continuous integration and code review systems;
- preprocessors and core libraries;
- editor enhancements and integration.

The group also devotes significant time to working on the OCaml compiler itself, often in collaboration with external parties, with work being released as open source. Recent work includes work on the Flambda optimization framework and the Spacetime memory profiler.

Candidates need to be familiar with a statically typed functional language and possess some amount of experience (within industry or otherwise) in this kind of infrastructure.

We're looking for candidates for both our New York and London offices. Benefits and compensation are highly competitive.

If you are interested, please email tools-and-compilers-job@janestreet.com with a CV and cover letter.

Let syntax, and why you should use it

Earlier this year, we created ppx_let, a ppx rewriter that introduces a syntax for working with monadic and applicative libraries like Command, Async, Result and Incremental. We've now amassed about six months of experience with it, and we've seen enough to recommend it to a wider audience.

For those of you who haven't seen it, let syntax lets you write this:

  1. let%bind <var> = <expr1> in <expr2>

instead of this:

  1. <expr1> >>= fun <var> -> <expr2>

with analogous support for the map operator, via let%map. The choice of monad is made by opening the appropriate Let_syntax module, e.g., you might open Deferred.Result.Let_syntax, or Incr.Let_syntax. Note that Async.Std now opens Deferred.Let_syntax by default.

There's also support for match statements, e.g.:

  1. match%bind <expr0> with
  2. | <pattern1> -> <expr1>
  3. | <pattern2> -> <expr2>

is equivalent to:

  1. <expr0> >>= function
  2. | <pattern1> -> <expr1>
  3. | <pattern2> -> <expr2>

There's also support for parallel binds and maps, using the and syntax. So this:

  1. let%map <var1> = <expr1> and <var2> = <expr2> in <expr3>

is (roughly) equivalent to this:

  1. map3 <expr1> <expr2> ~f:(fun <var1> <var2> -> <expr3>)

This pattern generalizes to arbitrarily wide maps. It's implemented using map and the both operator, which sacrifices some performance in exchange for generality, vs the explicit mapN operators.
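To make that concrete, here is a rough hand-expansion, using the option type as a stand-in for the monad; the real ppx_let output may differ in detail, but the shape is the same:

(* Stand-ins for the Let_syntax operations, specialized to option. *)
let map o ~f = match o with None -> None | Some x -> Some (f x)
let both a b = match a, b with Some x, Some y -> Some (x, y) | _, _ -> None

(* let%map x = e1 and y = e2 and z = e3 in x + y + z
   expands to roughly: *)
let demo e1 e2 e3 =
  map (both e1 (both e2 e3)) ~f:(fun (x, (y, z)) -> x + y + z)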

Advantages

My experience with the new syntax has been quite positive. Here's my summary of the wins.

Parallel binds

For libraries like Command.Param and Incremental, where multi-way map functions (like `map2` and `map3`) are important, it's been a pretty big win in terms of the comprehensibility of the resulting code. This tends to be the case for applicatives like Command.Param, which are just monads without bind. The big advantage is that by writing:

  1. let%map x1 = some_very long expression
  2. and x2 = some_other long expression
  3. and x3 = yet another_thing
  4. in
  5. x1 + x2 / x3

we get to put the variable names directly next to the expressions they're being bound to. Using an explicit mapN operator, the result is more awkward:

  1. map3
  2. (some_very long expression)
  3. (some_other long expression)
  4. (yet another_thing)
  5. ~f:(fun x1 x2 x3 -> x1 + x2 / x3)

This is error prone, since it's easy to mix up the variables and the expressions. To avoid the corresponding issue in the original Command library, we used some fancy combinators and the dreaded step operator, leading to some hard to understand code. The let-syntax equivalents are materially easier to use.

Variables first

Using a let-like syntax lets you put the variable before the definition, which follows the pattern of ordinary OCaml code, and makes it a bit easier to read. This cleans up some otherwise awkward patterns that are pretty common in our code. In particular, instead of this:

  1. begin
  2. <expr1>;
  3. let <var> = <expr2> in
  4. <expr3>
  5. end
  6. >>= fun meaningful_variable_name ->
  7. <expr4>

You can write this:

  1. let%bind meaningful_variable_name =
  2. <expr1>;
  3. let <var> = <expr2> in
  4. <expr3>
  5. in
  6. <expr4>

which flows a bit more naturally, in part because the meaningful variable name comes first, and in part because the extra begin and end are dropped.

Connecting bind to let

Let binds are a lot like monadic binds, even before you add in any special syntax. i.e., this

  1. <expr1> >>= fun x -> expr2

is a lot like this.

  1. let x = <expr1> in <expr2>

This is why monads are sometimes described as "programmable let-binds" (or, relatedly, "programmable semicolons", which are just let-binds with a unit argument.)

I've found this to be a useful analogy in understanding monads, and the analogy is made clearer with let syntax. We have some preliminary reports of this making monadic code more approachable for beginners, which lines up with my intuition.

The similarity between ordinary lets and monadic lets also makes diffs easier to read. e.g., in Async, if some function goes from being synchronous to deferred, the change at the call point would now be from this

  1. let x = some_synchronous_thing () in
  2. more things

to this.

  1. some_asynchronous_thing ()
  2. >>= fun x ->
  3. more things

With let-syntax, we would instead change it to this.

  1. let%bind x = some_asynchronous_thing () in
  2. more things

i.e., the only thing that would change would be the addition of %bind. The resulting diff is more targeted, making the substance of the change a bit easier to see, and making refactoring that adds or removes blocking easier to do and understand.

Disadvantages

It's not all wine and roses. There are some downsides to let-syntax:

It's new and different

Enough said.

It's kinda ugly

This is a matter of taste, but I've heard some distaste for the percent sign itself. That's something forced on us by PPX, but I don't exactly disagree.

Also, the %bind and %map are a little wordy. There's been some talk of adding the ability to define alternate let syntaxes in OCaml proper, which would allow you to write something like this.

  1. let* x = some_asynchronous_thing () in
  2. let* y = some_other_thing () in
  3. let+ z = a_third thing in
  4. x + y + z

Here, let* would be equivalent to let%bind, and let+ is equivalent to let%map. Again, it's not clear to me that this would all in be a win.

I personally find the new syntax all in less ugly than using infix operators everywhere, but again, tastes vary.

Unit binds aren't great

In particular, because we have no "monadic semicolon" in the syntax, you have to go from this:

  1. <expr1>
  2. >>= fun () ->
  3. <expr2>

to

  1. let%bind () = <expr1> in
  2. <expr2>

which is not ideal, since it's not parallel to the normal semicolon syntax for this outside of the monad. We've looked at making it possible to do something like:

  1. <expr1> ;%bind
  2. <expr2>

which would be more parallel with ordinary OCaml syntax, but that's not yet possible, and it's not clear it's a net win.

It changes how you think about interleaving in Async

In Async, when you write:

  1. load_something ()
  2. >>= fun x ->
  3. process_thing x

you can think of the point where interleaving can happen as the place where the bind operator is found. With let-syntax:

  1. let%bind x = load_something () in
  2. process_thing x

the location is different, and somewhat less obvious. My experience has been that this was easy to adjust to and hasn't tripped me up in practice, but it's a concern.

Idioms

A few thoughts on how to use let syntax effectively.

Let syntax for variables, infix for point-free

One might wonder whether there's any use for the infix operators once you are using Let_syntax. I believe the answer is yes. In particular, the style we've adopted is to use let syntax when binding a variable.

  1. let%bind x = some_expression in

and infix operators when going point-free, i.e., when not binding variables.

  1. let v = some_function x >>| ok_exn in

One special case of this is binding unit, where some people prefer to use the following pattern, since we don't have a nice monadic semi-colon yet.

  1. let%bind x = some_operation in
  2. some_other_operation >>= fun () ->
  3. let%bind y = yet_another_thing () in
  4. a_final_thing y

rather than:

  1. let%bind x = some_operation in
  2. let%bind () = some_other_operation in
  3. let%bind y = yet_another_thing () in
  4. a_final_thing y

Mixing monads

One change we made recently was to add the return function and the monadic infix operators to the Let_syntax module that one opens to choose a monad. This has the useful property of causing one to basically switch cleanly from one monad to another when you open the Let_syntax module. Mixing multiple monads in the same scope is hard to think about.

Command.Param and Deferred

One interesting case that comes up is mixing the Command.Param syntax with the Deferred syntax. This one is pretty easy to solve, because you don't typically need to mix them together. It's just that in the body of the command, you often want Deferred, but in the definition of the command line parser, you want to use Command.Param. This can be handled by doing a local open of Command.Param.Let_syntax or Deferred.Let_syntax as necessary.

Deferred and Deferred.Result

A more complicated case is choosing between the Deferred and Deferred.Result monads. In Async, there are infix operators that let you use both sets of bind and map operators (basically, with question-marks at the end of the ordinary infix operators for the Deferred.Result operators.)

Mixing these operators together in a single scope can be a little awkward, often leaving people to add and remove question-marks until things compile. With let syntax, you really have to pick a single monad, which is easier to read, but then requires some changes in behavior. In particular, you often need to move things from one monad to another. For example, if you're in the Deferred monad and get a result of type Deferred.Or_error.t, you might want to do something like this:

  1. let open Deferred.Let_syntax in
  2. let%bind v = some_operation x y >>| ok_exn in

Here, mapping ok_exn over the result will take the error and raise it, if necessary. Similarly, if you're using an operation that's in the ordinary Deferred monad but you're operating in the Deferred.Result monad, you might want to lift that operation up, i.e.:

  1. let open Deferred.Result.Let_syntax in
  2. let%bind v = some_other_operation x y |> Deferred.map ~f:(fun x -> Ok x) in

This is something of a mouthful, so we just added the Deferred.ok function, so on our latest release you can write:

  1. let open Deferred.Result.Let_syntax in
  2. let%bind v = some_other_operation x y |> Deferred.ok in

This idiom is useful whether or not you're using let syntax.

ppx_core: context-free rewriters for better semantics and faster compilation

At Jane Street, we have always been heavy users of pre-processors, first with camlp4 and now ppx. Pre-processing makes the infrastructure a bit more complex, but it saves us a lot of time by taking care of a lot of tedious boilerplate code and in some cases makes the code a bit prettier.

All in all, our standard set has 19 rewriters:

  • ppx_assert
  • ppx_bench
  • ppx_bin_prot
  • ppx_compare
  • ppx_custom_printf
  • ppx_enumerate
  • ppx_expect
  • ppx_fail
  • ppx_fields_conv
  • ppx_here
  • ppx_inline_test
  • ppx_js_style*
  • ppx_let
  • ppx_pipebang
  • ppx_sexp_conv
  • ppx_sexp_message
  • ppx_sexp_value
  • ppx_typerep_conv
  • ppx_variants_conv

These rewriters fall into 3 big categories:

  1. type driven code generators: ppx_sexp_conv, ppx_bin_prot, ...
  2. inline tests and benchmarks: ppx_inline_test, ppx_expect, ppx_bench
  3. convenience: ppx_sexp_value, ppx_custom_printf, ...

The first category is the one that definitely justifies the use of pre-processors, until we get something better in the language itself.

With such a high number of code transformations, there is an important question of how they compose with each other. For instance, what happens if one ppx generates code that is then rewritten by another ppx?

Since the switch from camlp4 to ppx a year ago, category 1 transformers have been handled all at once as a whole-AST mapping pass by ppx_type_conv, while all the other ones were implemented as separate passes. With the previous list that means 13 passes, given that 7 of them are ppx_type_conv plugins. This means that the output depended on the order in which the various passes were applied.

Intuitively, one would think it's not a big deal, given that it is quite rare for a ppx to produce code that would be rewritten by another ppx. Still we ran into several issues over time:

  • Some ppx rewriters - such as ppx_inline_test that rewrites [%%test ...] extensions - capture a pretty-print of their payload, for debugging purposes. Depending on when ppx_inline_test is applied, the payload won't be the same, as it might have been expanded by other ppx rewriters, which is confusing for users.
  • A few ppx rewriters treat the payload of a specific extension point as a DSL to be interpreted. This is the case for ppx_sexp_value and ppx_sexp_message. If another ppx messes with the payload before them, the result is unspecified. We had such an issue with ppx_here: inside [%sexp ...], [%here] is interpreted by ppx_sexp_value and ppx_sexp_message and produces "<filename>:<line>:<column>", while outside it is interpreted by ppx_here and produces a record of type Lexing.position.

Initially we dealt with these issues by using a specific order in the default set of rewriters, but that's more of a dirty hack than a real solution. Often developers are not aware of this and might end up using a wrong order when using a custom set of rewriters. Moreover, this worked because we have control over the order with Jenga, but in open source packages using oasis, ocamlbuild and ocamlfind we have no control over the final ordering.

But apart from the semantic problems, there is an obvious performance problem: all the transformations are local, but still we are doing 12 passes over the entire AST. What a waste of CPU time!

The different ways of composing ppx rewriters

Before jumping into the subject of this post, we recall a few of the various methods one can use to compose ppx rewriters.

Via separate process

The default method, which was adopted early by the community, is to define each transformation as a separate executable. To compose them, one just has to call all the executables one by one. The main advantage of this approach is that each transformation is a black box and can do whatever dirty hacks it wants.

This is what you get when you are using a ppx by just putting the package name in your build system without doing anything special.

Via a driver

Another approach, which we developed at Jane Street, is to link all the transformations into a single executable. For this to work properly all transformations must use the same framework. Technically they all register themselves with ppx_driver via a call to Ppx_driver.register_transformation. Ppx_driver is then responsible for composing them.

There are several advantages to the second approach: since ppx_driver has knowledge of all transformations, it can do extended checks such as making sure that all attributes have been interpreted. This helps detect typos, which in practice saves a lot of debugging time. But what really interests us in this post is that it can use more clever composition methods.

Code transformations using ppx_driver can still export a single executable compatible with the first method, which is why all Jane Street ppx rewriters can be used with both methods.

ppx_driver has an ocamlbuild plugin to simplify building custom drivers.

Context free transformations

Given that all transformations are local, it was clear that they should be defined as such; i.e. if all you want to do is turn [%sexp "blah"] into Sexp.Atom "blah", you don't need to visit the whole AST yourself. You just need to instruct whatever framework you are using that you want to rewrite [%sexp ...] extension points.

Context-free extension expander

We started with this idea a few months ago by adding an API in ppx_core to declare context-free extension expanders. For instance, this shows how you would declare a ppx that interprets an extension [%foo ...] inside expressions:

  1. open Ppx_core.Std
  2.  
  3. let ext =
  4. Extension.declare "foo" Expression
  5. Ast_pattern.(...)
  6. (fun ~path ~loc <parsed-payload...> -> <expansion>)
  7.  
  8. let () = Ppx_driver.register "foo" ~extensions:[ext]

The Ast_pattern.(...) bit describes what the extension expects as its payload.

Since ppx_driver knows about all the local extension expanders, it can expand them all in one pass over the AST. Moreover it can detect ambiguities and error out in such cases.

There was a choice to make as to whether to rewrite the AST in a bottom-up or top-down manner. We chose top-down, to allow extension expanders to interpret their payload before anyone else, so that they can correctly implement a DSL.

This solved most of the initial issues and reduced the number of passes to 7:

  • all extension expanders
  • ppx_type_conv
  • ppx_custom_printf
  • ppx_expect
  • ppx_fail
  • ppx_pipebang
  • ppx_js_style

ppx_expect wasn't initially defined as a context-free extension expander for technical reasons.

Making everything context-free

Recently we went even further and added a Context_free module to Ppx_core to cover all of our transformations. It doesn't support all possible rewritings, but it supports enough to implement a lot of common ones:

  • context-free extension expanders
  • some specific support to implement type-driven code generators
  • support for ppx rewriters interpreting a function application at pre-processing time, such as ppx_custom_printf that interprets !"<format>"

With this we reduced the number of passes to only 2:

  • context free transformations
  • ppx_js_style

ppx_js_style is still done in a separate pass for simplicity. It is run last to ensure we don't generate code that doesn't match our coding rules.

Now, whatever order developers specify their ppx in their build system, they will get the exact same output.

Seeing the exact passes

Ppx_driver got a new option to print what passes it will execute, for instance with ppx-jane, which is a standard driver containing all of the Jane Street ppx rewriters linked in (available in the ppx_jane package in opam):

$ ppx-jane -print-passes
<builtin:freshen-and-collect-attributes>
<bultin:context-free>
<builtin:check-unused-attributes>
<builtin:check-unused-extensions>

$ ppx-jane -print-passes -no-check
<bultin:context-free>

Safety checks are implemented as additional passes; that's why we see more than one pass by default.

Numbers

No performance comparison was done when introducing context free extension expanders, but we did some for the second stage, when we changed all ppx rewriters to use Context_free; processing a file with the resulting driver was twice as fast (check passes included).

But how does this compare to the more traditional method of running each rewriter in a separate process? To find out we did some benchmarking, taking one of the biggest ml files in core_kernel (src/command.ml) and comparing the two methods. We put a type error on the first line to be sure we stop just after pre-processing.

For reference, following are the numbers for calling ocamlfind ocamlc on the file with no pre-processing:

$ time ocamlfind ocamlc -c command.ml
File "command.ml", line 1, characters 12-15:
Error: This expression has type char but an expression was expected of type
         int

real 0m0.022s
user 0m0.016s
sys  0m0.006s

To preprocess the file with ppx_jane as a single driver executable, one just has to pass a single -ppx option, or a -pp option, given that ppx_driver can be used either as a -ppx or as a -pp:

# via -ppx
$ time ocamlfind ocamlc \
  -ppx 'ppx-jane -as-ppx -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.095s
user 0m0.074s
sys  0m0.020s

# via -pp
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.091s
user 0m0.066s
sys  0m0.024s

# via -pp, with checks disabled
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-check -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.070s
user 0m0.051s
sys  0m0.018s

# via -pp, without merging passes
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-merge -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.229s
user 0m0.206s
sys  0m0.022s

Using the other method turned out to be quite painful: given that the various ppx rewriters cannot share command line arguments, the arguments had to be specified more than once:

$ time ocamlfind ocamlc -package ppx_jane \
  -ppxopt "ppx_inline_test,-inline-test-lib blah -inline-test-drop" \
  -ppxopt "ppx_bench,-inline-test-lib blah -bench-drop" \
  -ppxopt "ppx_expect,-inline-test-lib blah" \
  -c command.ml 2> /dev/null

real 0m0.339s
user 0m0.233s
sys  0m0.098s

So, unsurprisingly, the single-pass, single-executable method is a lot faster.

Availability

This code is available on github. The context-free extension point API is already available in opam. The newer one is only in the git repository for ppx_core and ppx_driver. You can try them out by using our development opam repository. You should have a look at this if you care about how your rewriters are composed and/or if you care about compilation speed.

  • ppx_js_style is not currently released; it is an internal ppx that we use to enforce our coding standards.

Seven Implementations of Incremental

We finally got a decent recording of one of my favorite talks. This one is about our Incremental library (which I wrote about here), and in particular about the story of how we got to the present, quite performant, implementation.

It's not clear from the talk, but the work on this library wasn't done by me: The initial version was implemented by Stephen Weeks and Milan Stanojevic, and most of the intermediate versions were implemented by Nick Chapman and Valentin Gatien-Baron.

The high quality org-mode slides, though, are all me.

OCaml 4.03: Everything else

In my previous post I wrote about Flambda, which is the single biggest feature coming to OCaml in this release. In this post, I'll review the other features of 4.03 that caught my eye.

Inline records

Variants are my favorite thing about OCaml, and in this release, they're getting better. You've always been able to define variants with multiple arguments, e.g.:

  1. type shape =
  2. | Circle of float * float * float
  3. | Rect of float * float * float * float

But, as with this example, it can sometimes be a little hard to figure out what the meaning of the individual fields is, since they don't have labels. We can make this better by replacing our multi-argument variants with single-argument variants containing appropriately named records, as follows.

  1. type circle = { center_x: float;
  2. center_y: float;
  3. radius: float;
  4. }
  5.  
  6. type rect = { x_lo: float;
  7. y_lo: float;
  8. x_hi: float;
  9. y_hi: float;
  10. }
  11.  
  12. type shape =
  13. | Circle of circle
  14. | Rect of rect

This works, but the separation of the record type from the variant definition is a little awkward. Beyond that, this approach imposes extra runtime costs. A multi-argument variant takes up just a single heap-allocated block, while a single argument variant containing a record takes two blocks.

With 4.03, we can have the best of both worlds by defining variants containing inline records. Here's what they look like.

  1. type shape =
  2. | Circle of { center_x: float;
  3. center_y: float;
  4. radius: float;
  5. }
  6. | Rect of { x_lo: float;
  7. y_lo: float;
  8. x_hi: float;
  9. y_hi: float;
  10. }

And we can write code that uses these types as follows:

  1. let area = function
  2. | Circle c -> 3.14159 *. c.radius *. c.radius
  3. | Rect r -> (r.x_hi -. r.x_lo) *. (r.y_hi -. r.y_lo)

Note, however, that the values r and c aren't quite first-class. For example, we can't use them outside the context of the match statement in which they're found. So this function:

  1. let maybe_rect = function Circle _ -> None | Rect r -> Some r

fails with the error This form is not allowed as the type of the inlined record could escape.

Even with this complexity, inlined records are really convenient.

Another advantage of inline records is that they allow us to express variants with mutable fields. This is useful in a variety of circumstances. In Core, we fake this using Obj.magic for our mutable AVL tree implementation, and now we can remove these hacks. Similar uses of Obj.magic were removed from OCaml's own imperative queue module in this release as well.
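For instance, here is a hedged sketch of the kind of type this enables; the type below is made up for illustration and is not Core's actual AVL tree:

(* A variant with mutable fields, thanks to inline records: no separate record
   type and no extra indirection for the Node case. *)
type 'a avl =
  | Empty
  | Node of { mutable left : 'a avl
            ; key : 'a
            ; mutable right : 'a avl
            ; mutable height : int
            }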

Uchar and result

A couple of useful types have been added to the standard library: Uchar.t and result. Uchar.t represents a unicode character, and is effectively just an int under the covers.

The result type is used for signaling success or failure, and has the following definition.

  1. type ('a,'b) result = Ok of 'a | Error of 'b

Both of these are in some sense trivial, but they're there as a coordination point between different libraries. Lots of OCaml libraries have some analogue of result, and each OCaml Unicode library has its own character type. By including these in the standard library, it provides an easy point of agreement between different external libraries, while adding almost no complexity to the core distribution itself.

Better unboxing for externals

A small but valuable change: it's now possible to write C bindings through which one can pass unboxed versions of a number of different types, including floats, Int64's, Int32's and Nativeint's. This was previously possible only for calls that took only floats and returned a float. This makes it easier to write efficient, zero-allocation C bindings in a wider variety of cases.
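For example, here is roughly what such a declaration can look like in 4.03; the C stub names are hypothetical:

(* The first stub name is the bytecode version, which still works on boxed
   floats; the second, native-code stub receives and returns raw C doubles.
   [@@noalloc] additionally promises the stub does not allocate or raise. *)
external fast_hypot : float -> float -> float
  = "fast_hypot_boxed" "fast_hypot_unboxed"
  [@@unboxed] [@@noalloc]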

GC Latency improvements

I talked about most of this here, so I won't go into great detail now. But the news is that the changes that Damien Doligez did during his stay at Jane Street have finally made it upstream.

Ephemerons

To a first approximation, you can think of a GC as a machine that determines what memory can be reclaimed by figuring out what data is reachable, and then reclaiming everything else. The simplest notion of reachability counts everything reachable by following pointers, starting at the GC roots, which are mostly just the values on the call stack.

This notion of reachability is a bit too restrictive, however. In particular, in some circumstances you want to keep pointers to objects without preventing those objects from being reclaimed. This is useful for some kinds of caching, where you want to cache previously computed values for as long as they're referenced, but no longer.

OCaml has long had an answer to this problem, which is the notion of a weak pointer. Weak pointers are references that aren't counted when computing reachability. When an object that is pointed to by a weak reference is collected, the weak reference itself is nulled out.

Weak references are good enough for some purposes (e.g. hash-consing), but they can be awkward for many use cases. One basic use-case for which weak references are an awkward fit is memoizing a function, where one wants to keep entries in the table as long as the input to the function in question is still reachable.

You could imagine just keeping the key of the hash-table in a weak pointer, and then using a finalizer to remove the entry in the table once the key gets collected. But this fails if there is a reference from the output of the function back to the key, which is common enough.

Ephemerons were proposed back in '97 by Barry Hayes to solve just this problem. The idea is pretty simple: an ephemeron has multiple keys and a single data entry. When determining whether the keys are alive, one doesn't count references from values that are reachable only via the ephemeron, so the references from the data back to the key don't keep the key alive. Also, once any of the ephemeron's keys are collected, the key and the data element are removed immediately. Weak pointers are now just a special case of ephemerons --- a weak pointer is effectively just an ephemeron with no data.

Ephemerons don't come up that much, but if you need to build a memoization table, ephemerons make the task simpler and the result less leak-prone.
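As a rough sketch of that use case, assuming the Ephemeron module that ships with 4.03 (this is illustrative, not a production-quality memoizer):

(* A memo table whose entries can be reclaimed once the key is no longer
   reachable from anywhere other than the table itself. *)
module Cache = Ephemeron.K1.Make (struct
  type t = int list            (* the memoized function's input type, say *)
  let equal = ( = )
  let hash = Hashtbl.hash
end)

let memoize (f : int list -> int) : int list -> int =
  let table = Cache.create 16 in
  fun key ->
    match Cache.find table key with
    | data -> data
    | exception Not_found ->
      let data = f key in
      Cache.add table key data;
      data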

Licenses, Github and OCamlbuild

A few organizational changes are landing in 4.03 that I think are worth noting. First, OCaml development is officially moving from Subversion to Git, with Github as the primary coordination point for OCaml development.

Second, OCaml's license is changing from the somewhat awkward and rarely used QPL, to LGPLv2 with a linking exception. The latter had already been used for various libraries distributed with the compiler. But, acknowledging the fact that more and more of the guts of the compiler are being used as libraries for a variety of reasons, it was recently decided to move everything to LGPLv2.

And finally, ocamlbuild is being moved out of the core OCaml distribution. This is one in a series of decisions to break out software that was previously bundled together with OCaml. This allows the core team to focus more on the core compiler itself, and makes the exported projects freer to make changes on whatever schedule they see fit.

I think keeping the OCaml distribution lean is an excellent strategic decision, allowing the core team to focus on the compiler itself, and allowing tooling and libraries to be worked on independently.

Together, these are all small changes. But they're part of a trend towards making OCaml development simpler and more agile.

What's missing

There are a number of much anticipated features that haven't made it into this release. In particular, the multicore GC, which at one point had been expected to land in 4.03, has been pushed back, likely to 4.04. Algebraic effects, which are in part motivated by the multicore GC, also didn't make it.

And finally, modular implicits, which are intended to provide typeclass-like functionality for OCaml, are not in this release either. That said, they're being actively worked on, and I'm expecting there will be more news about those in the next six months.

All in, it's been a pleasure watching this release take shape. And just as important as what got in is what didn't. Despite a greatly increased rate of change, the process is quite conservative --- nonsense features just don't make it in. My gratitude goes out to the core team for their safekeeping of the language.

A better inliner for OCaml, and why it matters

OCaml 4.03 is branched and a first release candidate is imminent, so it seems like a good time to take stock of what's coming.

This post will focus on just one of those features: Flambda, a new IR (intermediate representation) in the depths of the compiler designed to allow for better inlining, paired with a set of optimizations leveraging that IR.

Why inlining matters

If your expectations about inlining come from a language like C, you might not be all that excited by Flambda. In C, after all, the benefits of inlining are relatively small, mostly allowing one to avoid function call overhead. That's useful, but has limited impact.

In a language like OCaml, inlining is about more than just function call overhead. That's because inlining grows the amount of code that the optimizer can see at a given point, and that makes other optimizations more effective. And there are simply more optimizations available in a language like OCaml than in one like C. In particular, many allocations can be optimized away, which can have a rather big effect on the performance of a program.

To get a sense of how we can eliminate allocations, consider this simple example.

  1. let rec fold l ~init ~f =
  2. match l with
  3. | [] -> init
  4. | hd :: tl -> fold tl ~init:(f init hd) ~f
  5.  
  6. let pow_sum l n =
  7. fold l ~init:0 ~f:(fun x y -> pow x n + pow y n)

In the current compiler, every invocation of pow_sum will allocate a closure for the function f. With Flambda, this code should be allocation free, since the fold can be inlined into pow_sum, f can be inlined into the body of the now inlined fold, and the fold can be specialized by adding an extra argument, thus making the closure allocation unnecessary.
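To make that concrete, here is roughly the shape of code the optimizer can reduce pow_sum to after inlining and specialization; this is a hand-written sketch, not actual Flambda output, with pow standing in for an integer power function as in the example above:

(* An integer power function, standing in for the [pow] used above. *)
let rec pow x n = if n = 0 then 1 else x * pow x (n - 1)

(* After inlining [fold] and [f] and specializing on [n], no closure needs to
   be allocated: *)
let pow_sum l n =
  let rec loop acc = function
    | [] -> acc
    | hd :: tl -> loop (pow acc n + pow hd n) tl
  in
  loop 0 l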

Here's another example. Imagine you wanted to sum the first and last elements of a list. You could do this using functions in the list module and a simple pattern match, as follows.

  1. let first_plus_last l =
  2. match List.hd l, List.last l with
  3. | None, _ | _, None -> None
  4. | Some x, Some y -> Some (x + y)

Without any inlining, this function creates three heap-allocated values, which are the various options being returned.

Now, let's consider what happens if we rewrite this code using the option monad.

let (>>=) o f =
  match o with
  | None -> None
  | Some x -> f x

let first_plus_last l =
  List.hd l >>= fun x ->
  List.last l >>= fun y ->
  Some (x + y)

Or, equivalently, using our ppx_let syntax:

let first_plus_last l =
  let%bind x = List.hd l in
  let%bind y = List.last l in
  Some (x + y)

Without any inlining, this implementation will create five heap-allocated values: one for each of the two closures on the right-hand side of the binds, one each for the options returned from List.hd and List.last, and one for the final Some.

It's of course possible to write a version that only allocates one object, for the return value.

let first_plus_last l =
  match l with
  | [] -> None
  | x :: _ ->
    let rec last_exn = function
      | [] -> assert false
      | [y] -> y
      | _ :: tl -> last_exn tl
    in
    Some (x + last_exn l)

But this is obviously uglier and harder to read than either of the earlier versions.

With Flambda (and the appropriate flags, i.e., -O3 -unbox-closures), these examples all allocate the same amount. That's because, once the inlining is done and all the code is in one place, the compiler can do things like observe that the option returned from List.hd is immediately deconstructed and never used for anything else. As a result, the allocation of the options can be removed entirely.
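The core transformation is easiest to see in miniature. Once the construction and deconstruction of an option end up next to each other, the match on a known constructor can be resolved at compile time and the allocation dropped. The following hand-written before/after pair is only an illustration of that idea, not real Flambda output:

(* What the optimizer sees after inlining: an option built and
   immediately taken apart. *)
let before x = match Some x with None -> 0 | Some y -> y + 1

(* What it can reduce the code to: no option is ever allocated. *)
let after x = x + 1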

This example highlights why these kinds of compiler optimizations are valuable. It's not that they're strictly necessary for good performance --- usually, pretty good performance can be achieved with sufficiently ugly code. But what good optimizations can do for us is let us write simpler, easier to understand code, without sacrificing performance.

A basis for future optimizations

In addition to the optimizations that will be there when 4.03 lands, Flambda will also make it easier to build new optimizations. One example comes from a project that was done last summer by Will Crichton, one of our interns. The aim of the project was to make the kind of allocation removal I described above more predictable.

From how I described it above, it might seem like Flambda's ability to remove allocations relies on its ability to inline the allocating functions in their entirety. That would be problematic, because what is and isn't inlined can be a bit unpredictable: inlining decisions depend on heuristic thresholds, since inlining everything can lead to excessively large executables.

As such, depending on inlining makes it harder to predict the allocation behavior of your code, since it now hinges on the heuristics for determining when a function should be inlined. The situation isn't quite so grim --- some of the optimizations that come along with Flambda (like -unbox-closures) don't depend so critically on inlining. But still, many of Flambda's optimizations do depend on inlining, and as such whether they hit becomes harder to predict.

But we can make this better. The specific project Will worked on had to do with allocations that come from returning small objects that are immediately deconstructed. With Flambda as it is today, such allocations are only removed if both the construction and deconstruction of the value are together, which often requires inlining. Will's project made this more robust by changing the calling conventions in Flambda, specifically by adding a first-class multi-argument return to the Flambda IR.

With a multi-argument return in place, a function that returns a small tuple, say, could instead be compiled to return the components of the tuple as individual values to be passed back on the stack. Then, this function can be wrapped with a small function that picks those values off the stack and allocates the necessary object.

This small wrapper function is by design small enough to be inlined reliably. Once that inlining is done, both the construction and deconstruction of the value will be visible to the optimizer, and they can be collapsed. This should work whether or not the main function is inlined.
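The multi-argument return itself lives at the Flambda IR level, so it can't be written directly in OCaml source. A rough source-level analogue, using a continuation to stand in for the unboxed return, looks like this (the names here are made up for illustration):

(* Worker: "returns" its result without boxing it, modeled here by
   calling one of two continuations instead of allocating an option. *)
let hd_unboxed l ~some ~none =
  match l with
  | [] -> none ()
  | x :: _ -> some x

(* Tiny wrapper: reboxes the result into an option. Being small, it can
   be inlined reliably, so a caller that immediately matches on the
   option exposes both construction and deconstruction to the optimizer,
   whether or not the worker itself is ever inlined. *)
let hd l = hd_unboxed l ~some:(fun x -> Some x) ~none:(fun () -> None)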

This is really just one example. Our hope and expectation is that Flambda will be the basis of many improvements in the compilation of OCaml over the next few years.

Impact

The performance impact we've seen from our experiments with Flambda seems pretty promising so far. On real-world applications we've tested, it's pretty normal to see allocations reduced by 20-30%. We've found similarly sized improvements in application latency as well. And we think that these numbers will improve yet more as more optimizations are added on top of Flambda.

Beyond improving existing programs, we already are seeing how Flambda allows us to write prettier and cleaner code without compromising on performance. For example, Flambda allows us to make freer use of OCaml's functors, which are a great abstraction tool, but one that imposed significant costs pre-Flambda. With Flambda, functors can simply be inlined away.
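As a small illustration of the functor point, consider a module that abstracts over a numeric type. Pre-Flambda, every call through the functor's output goes through an indirection; Flambda can specialize and inline that indirection away. This is a hypothetical example, not code from any of our libraries:

module type Num = sig
  type t
  val zero : t
  val ( + ) : t -> t -> t
end

module Sum (N : Num) = struct
  let sum l = List.fold_left N.( + ) N.zero l
end

module Int_sum = Sum (struct
  type t = int
  let zero = 0
  let ( + ) = ( + )
end)

(* With Flambda, Int_sum.sum can compile down to a direct loop over ints,
   with no indirect calls through the functor argument. *)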

Flambda makes things easier in other ways too. For example, we're in the process of developing a new internal protocol for zero-allocation messaging, which requires a lot of code generation, via PPX. Being able to rely on Flambda greatly simplifies that code generation, since it lets us write a fairly naive code generator and rely on Flambda to make the generated code efficient. We can write code that computes offsets into the message naively, and Flambda, via a combination of inlining and constant folding, moves the computation of these offsets to compile time, so the runtime field access becomes a simple lookup.
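Here's a hypothetical sketch of the kind of naive accessor code such a generator might emit; the field names and layout are made up. Each offset is computed from the previous one, and inlining plus constant folding collapses the arithmetic to a compile-time constant, so the runtime access is a single read at a fixed position:

let header_size = 8
let field_a_offset = header_size
let field_a_len = 4
let field_b_offset = field_a_offset + field_a_len

(* After constant folding, field_b_offset is just 12, so this accessor
   reduces to one read at a fixed offset into the buffer. *)
let get_field_b (buf : Bytes.t) = Bytes.get buf field_b_offset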

Even if you're perfectly happy with OCaml's performance as is, Flambda is still an exciting change. That's because various upcoming language improvements like modular implicits (a feature that brings some of the same benefits as Haskell's typeclasses) will only really perform acceptably well with a good inliner in place. So in addition to making your current programs run faster, Flambda also enables new abstractions that can make your future programs easier to read and write.

One downside of all this is that it reduces the predictability of OCaml's performance, which is an important property of the language. Part of how we hope to address this is by improving Flambda's heuristics to be more predictable, but that's unlikely to be enough on its own. That's why OCaml 4.03 comes with new annotations that let the programmer require or prohibit inlining for a particular function.
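For reference, here is a sketch of how those annotations look in source code, assuming 4.03's attribute syntax: an [@@inline] attribute on a definition requests (or, with never, forbids) inlining of that function, and an [@inlined] attribute at a call site applies to that particular call.

(* Always inline this function (use [@@inline never] to forbid inlining). *)
let add x y = x + y [@@inline always]

(* Request inlining of this specific call. *)
let succ z = (add [@inlined]) z 1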

Over time, we hope to find a good balance, where the heuristics do a pretty good and pretty predictable job by default, but where we give programmers hooks to control things more precisely where it matters.

Thanks

Getting Flambda this far has been a big project, and a lot of people were involved. OCamlPro did a lot of the heavy lifting, with Pierre Chambart writing most of the compiler code, and Louis Gesbert and Michel Laporte working on benchmarking. Jane Street jumped in too, with Mark Shinwell and Leo White contributing mightily to the code and design as part of the review and stabilization process, and Jeremie Dimino helping with testing. And a number of other people on the core team (notably Alain Frisch, Gabriel Scherer, Runhang Li, Damien Doligez) contributed in a variety of ways to the final product. Thanks to all involved!
