Caveat Configurator: how to replace configs with code, and why you might not want to

We have a new tech talk coming up on May 17th, from our very own Dominick LoBraico. This one is about how to represent configurations with programs. In some sense, this is an obvious idea. Lots of programmers have experienced the dysphoria that comes from watching your elegant little configuration format metamorphize into a badly constructed programming language with miserable tools. This happens because, as you try to make your configs clearer and more concise, you often end up walking down the primrose path of making your config format ever more language-like. But you never really have the time to make it into a proper language.

The obvious alternative is to just use a real language, one that comes with decent tooling and well-designed abstractions. (And ideally, a functional language, because they tend to be better at writing clear, declarative code.)

This talk discusses what happens when you try to put this obvious idea into practice, which is less, well, obvious. I like this kind of topic because it represents the kind of hard-won knowledge you can only get by trying something and screwing it up a few times.

So please join us! You can register here.

Trivial meta-programming with cinaps

From now and then, I found myself having to write some mechanical and repetitive code. The usual solution for this is to write a code generator; for instance in the form of a ppx rewriter in the case of OCaml code. This however comes with a cost: code generators are harder to review than plain code and it is a new syntax to learn for other developers. So when the repetitive pattern is local to a specific library or not widely used, it is often not worth the effort. Especially if the code in question is meant to be reviewed and maintained by several people.

Then there is the possibility of using a macro pre-processor such as cpp or cppo which is the equivalent of cpp but for OCaml. This can help in some cases but this has a cost as well:

  • macros generally make the code harder to read
  • errors tends to be harder to understand since they don't point where you'd expect
  • you can say goodbye to merlin

In fact, when the repetitive pattern is specific to one particular case and of reasonable size, committing and reviewing the generated code is acceptable. That's the problem Cinaps tries to solve.

What is cinaps?

Cinaps is an application that reads input files and recognize special syntactic forms. Such forms are expected to embed some OCaml code printing something to stdout. What they print is compared against what follow these special forms. The rest works exactly the same as expectation tests.

The special form is (*$ <ocaml-code> *) for ml source files, /*$ <ocaml-code> */ for C source files and #|$ <ocaml-code> |# for S-expression files.

For instance:

$ cat
let x = 1
(*$ print_newline ();
    List.iter (fun s -> Printf.printf "let ( %s ) = Pervasives.( %s )\n" s s)
      ["+"; "-"; "*"; "/"] *)
let y = 2

$ cinaps
File "", line 5, characters 0-1:
  let x = 1
  (*$ print_newline ();
      List.iter (fun s -> Printf.printf "let ( %s ) = Pervasives.( %s )\n" s s)
        ["+"; "-"; "*"; "/"] *)
+|let ( + ) = Pervasives.( + )
+|let ( - ) = Pervasives.( - )
+|let ( * ) = Pervasives.( * )
+|let ( / ) = Pervasives.( / )
  let y = 2

$ echo $?
$ cp
$ cinaps
$ echo $?

Real example

What follows is a real example where using Cinaps made the code much easier to write and maintain. However, I changed the names for this blog post since this code is not released publicly. Note also that this example shows one way we usually write C bindings at Jane Street. It is not meant as a model of how to write C bindings, and the excellent ctypes library should be the default choice in most cases. However, this code pre-dates ctypes and migrating it would be quite a lot of work.

The example itself is part of a C binding that I wrote a few years ago. While doing so I used Core.Flags in order to represent a few C enumerations on the OCaml side. Core.Flags is a module providing a nice abstraction for representing C flags.

The OCaml code looks like what you'd expect from code using Core.Flags:

  1. module Open_flags = struct
  2. external get_rdonly : unit -> Int63.t = "mylib_O_RDONLY" [@@noalloc]
  3. external get_wronly : unit -> Int63.t = "mylib_O_WRONLY" [@@noalloc]
  4. external get_rdwr : unit -> Int63.t = "mylib_O_RDWR" [@@noalloc]
  5. external get_nonblock : unit -> Int63.t = "mylib_O_NONBLOCK" [@@noalloc]
  6. external get_append : unit -> Int63.t = "mylib_O_APPEND" [@@noalloc]
  7. external get_creat : unit -> Int63.t = "mylib_O_CREAT" [@@noalloc]
  8. external get_trunc : unit -> Int63.t = "mylib_O_TRUNC" [@@noalloc]
  9. external get_excl : unit -> Int63.t = "mylib_O_EXCL" [@@noalloc]
  10. external get_noctty : unit -> Int63.t = "mylib_O_NOCTTY" [@@noalloc]
  11. external get_dsync : unit -> Int63.t = "mylib_O_DSYNC" [@@noalloc]
  12. external get_sync : unit -> Int63.t = "mylib_O_SYNC" [@@noalloc]
  13. external get_rsync : unit -> Int63.t = "mylib_O_RSYNC" [@@noalloc]
  15. let rdonly = get_rdonly ()
  16. let wronly = get_wronly ()
  17. let rdwr = get_rdwr ()
  18. let nonblock = get_nonblock ()
  19. let append = get_append ()
  20. let creat = get_creat ()
  21. let trunc = get_trunc ()
  22. let excl = get_excl ()
  23. let noctty = get_noctty ()
  24. let dsync = get_dsync ()
  25. let sync = get_sync ()
  26. let rsync = get_rsync ()
  28. include Flags.Make(struct
  29. let known =
  30. [ rdonly , "rdonly"
  31. ; wronly , "wronly"
  32. ; rdwr , "rdwr"
  33. ; nonblock , "nonblock"
  34. ; append , "append"
  35. ; creat , "creat"
  36. ; trunc , "trunc"
  37. ; excl , "excl"
  38. ; noctty , "noctty"
  39. ; dsync , "dsync"
  40. ; sync , "sync"
  41. ; rsync , "rsync"
  42. ]
  43. let remove_zero_flags = false
  44. let allow_intersecting = false
  45. let should_print_error = true
  46. end)
  47. end

And there are about 3 modules like this in this file, plus the corresponding stubs in the C file. Writing this code initially was no fun, and adding new flags now that the C library has evolved is still no fun.

The rest of this section explains how to make it more fun with cinaps.

Setting up and using cinaps

First I add a rule in the build system to call cinaps appropriately. I use a few settings specific to our jenga based builds and it is currently not possible to replicate this outside of Jane Street, but assuming you have a Makefile, you can write:

.PHONY: cinaps
    cinaps -i src/*.ml src/*.c

Now whenever you call make cinaps, all the files will be updated in place. You can then do git diff to see what changed.

Then I write a file src/cinaps_helpers. It is plain OCaml source file, however it is not suffixed with .ml so that it is not confused with a regular module of the library. It contains the various bits that are common between the ml/C files in the library:

  1. (* -*- tuareg -*- *)
  3. let stub_prefix = "mylib_"
  4. let stub name = stub_prefix ^ name
  6. let open_flags =
  7. [ "O_RDONLY"
  8. ; "O_WRONLY"
  9. ; "O_RDWR"
  10. ; "O_NONBLOCK"
  11. ; "O_APPEND"
  12. ; "O_CREAT"
  13. ; "O_TRUNC"
  14. ; "O_EXCL"
  15. ; "O_NOCTTY"
  16. ; "O_DSYNC"
  17. ; "O_SYNC"
  18. ; "O_RSYNC"
  19. ]
  21. let other_flags =
  22. [ ...
  23. ]
  26. let yet_other_flags =
  27. [ ...
  28. ]
  30. let all_flags = open_flags @ other_flags @ yet_other_flags
  32. open StdLabels
  33. open Printf
  34. let pr fmt = printf (fmt ^^ "\n")
  36. let flags_module module_name flags ~prefix ~allow_intersection =
  37. <code to print an Open_flags like module>

Now, in my original .ml file, I can write:

  1. (*$ #use "cinaps_helpers" $*)
  3. (*$ flags_module "Open_flags" open_flags ~prefix:"O_" ~allow_intersecting:false *)
  4. module Open_flags = struct
  5. external get_rdonly : unit -> Int63.t = "mylib_O_RDONLY" [@@noalloc]
  6. external get_wronly : unit -> Int63.t = "mylib_O_WRONLY" [@@noalloc]
  7. external get_rdwr : unit -> Int63.t = "mylib_O_RDWR" [@@noalloc]
  8. external get_nonblock : unit -> Int63.t = "mylib_O_NONBLOCK" [@@noalloc]
  9. external get_append : unit -> Int63.t = "mylib_O_APPEND" [@@noalloc]
  10. external get_creat : unit -> Int63.t = "mylib_O_CREAT" [@@noalloc]
  11. external get_trunc : unit -> Int63.t = "mylib_O_TRUNC" [@@noalloc]
  12. external get_excl : unit -> Int63.t = "mylib_O_EXCL" [@@noalloc]
  13. external get_noctty : unit -> Int63.t = "mylib_O_NOCTTY" [@@noalloc]
  14. external get_dsync : unit -> Int63.t = "mylib_O_DSYNC" [@@noalloc]
  15. external get_sync : unit -> Int63.t = "mylib_O_SYNC" [@@noalloc]
  16. external get_rsync : unit -> Int63.t = "mylib_O_RSYNC" [@@noalloc]
  18. let rdonly = get_rdonly ()
  19. let wronly = get_wronly ()
  20. let rdwr = get_rdwr ()
  21. let nonblock = get_nonblock ()
  22. let append = get_append ()
  23. let creat = get_creat ()
  24. let trunc = get_trunc ()
  25. let excl = get_excl ()
  26. let noctty = get_noctty ()
  27. let dsync = get_dsync ()
  28. let sync = get_sync ()
  29. let rsync = get_rsync ()
  31. include Flags.Make(struct
  32. let known =
  33. [ rdonly , "rdonly"
  34. ; wronly , "wronly"
  35. ; rdwr , "rdwr"
  36. ; nonblock , "nonblock"
  37. ; append , "append"
  38. ; creat , "creat"
  39. ; trunc , "trunc"
  40. ; excl , "excl"
  41. ; noctty , "noctty"
  42. ; dsync , "dsync"
  43. ; sync , "sync"
  44. ; rsync , "rsync"
  45. ]
  46. let remove_zero_flags = false
  47. let allow_intersecting = false
  48. let should_print_error = true
  49. end)
  50. end
  51. (*$*)

And cinaps will check that the text between the (*$ ... *) and (*$*) forms is what is printed by flags_module "Open_flags" .... I write something similar in the .c file. Note the initial (*$ ... $*) form, which is not expected to print anything and is only used for its other side effects.

Adding new flags become trivial: add it to the list in src/cinaps_helper and execute make cinaps.

Pushing the system

Now I decide that I don't like the fact that all my constant flags are initialized at runtime and I want them to be static constant on the ml side. A simple way to do this is to write a C program that include the right headers and output a .ml file defining these constants. I use cynaps to write this C file as well:

  1. /*$ #use "cinaps_helpers" $*/
  3. #include <stdio.h>
  5. #include <sys/types.h>
  6. #include <sys/stat.h>
  7. #include <fcntl.h>
  9. int main()
  10. {
  11. printf("open Core\n");
  12. printf("let mk = Int63.of_int_exn\n");
  13. /*$
  14.   printf "\n";
  15.   let len = longest all_flags in
  16.   List.iter all_flags ~f:(fun f ->
  17.   pr {| printf("let _%-*s = mk %%d\n", %-*s);|} len f len f );
  18.   printf " " */
  19. printf("let _O_RDONLY = mk %d\n", O_RDONLY );
  20. printf("let _O_WRONLY = mk %d\n", O_WRONLY );
  21. printf("let _O_RDWR = mk %d\n", O_RDWR );
  22. printf("let _O_NONBLOCK = mk %d\n", O_NONBLOCK);
  23. printf("let _O_APPEND = mk %d\n", O_APPEND );
  24. printf("let _O_CREAT = mk %d\n", O_CREAT );
  25. printf("let _O_TRUNC = mk %d\n", O_TRUNC );
  26. printf("let _O_EXCL = mk %d\n", O_EXCL );
  27. printf("let _O_NOCTTY = mk %d\n", O_NOCTTY );
  28. printf("let _O_DSYNC = mk %d\n", O_DSYNC );
  29. printf("let _O_SYNC = mk %d\n", O_SYNC );
  30. printf("let _O_RSYNC = mk %d\n", O_RSYNC );
  31. /*$*/
  32. return 0;
  33. }

Updating the various flag modules in the the ml code is as simple as editing src/cinaps_helpers and doing make cinaps:

  1. (*$ flags_module "Open_flags" open_flags ~prefix:"O_" ~allow_intersecting:false *)
  2. module Open_flags = struct
  3. let rdonly = Consts._O_RDONLY
  4. let wronly = Consts._O_WRONLY
  5. let rdwr = Consts._O_RDWR
  6. let nonblock = Consts._O_NONBLOCK
  7. let append = Consts._O_APPEND
  8. let creat = Consts._O_CREAT
  9. let trunc = Consts._O_TRUNC
  10. let excl = Consts._O_EXCL
  11. let noctty = Consts._O_NOCTTY
  12. let dsync = Consts._O_DSYNC
  13. let sync = Consts._O_SYNC
  14. let rsync = Consts._O_RSYNC
  16. include Flags.Make(struct
  17. let known =
  18. [ Consts._O_RDONLY , "rdonly"
  19. ; Consts._O_WRONLY , "wronly"
  20. ; Consts._O_RDWR , "rdwr"
  21. ; Consts._O_NONBLOCK , "nonblock"
  22. ; Consts._O_APPEND , "append"
  23. ; Consts._O_CREAT , "creat"
  24. ; Consts._O_TRUNC , "trunc"
  25. ; Consts._O_EXCL , "excl"
  26. ; Consts._O_NOCTTY , "noctty"
  27. ; Consts._O_DSYNC , "dsync"
  28. ; Consts._O_SYNC , "sync"
  29. ; Consts._O_RSYNC , "rsync"
  30. ]
  31. let remove_zero_flags = false
  32. let allow_intersecting = false
  33. let should_print_error = true
  34. end)
  35. end
  36. (*$*)

Tweak: indenting the generated code

You can either write cinaps code that produce properly indented code, or you can use the styler option:

.PHONY: cinaps
    cinaps -styler ocp-indent -i src/*.ml src/*.c

History behind the name

I initially wrote this tool while I did some work on the ocaml-migrate-parsetree project. ocaml-migrate-parsetree was started by Alain Frisch and continued by Frederic Bour and aims at providing a solid and stable base for authors of ppx rewriters or other tools using the OCaml frontend. I helped a bit during development and did some testing on a large scale while rebasing our ppx infrastructure on top it.

Due to its nature, this project contains a lot of repetitive code that cannot be factorized other than by using some kind of meta-programming. Initially we had a small pre-preprocessor that was interpreting a made-up syntax and was working like cpp does. The syntax was yet another DSL and the generated code was generated on the fly. This made the .ml and .mli files harder to understand since you had to decode this DSL in order to understand what the code was.

Cinaps replaced this tool and the name was chosen to emphasize that it is not a preprocessor. It means "Cinaps Is Not A Preprocessing System".


Cinaps is published on github and is part of the upcoming v0.9 Jane Street release. The version that is published doesn't yet support the C/S-expression syntaxes but once the stable release has gone through, an updated version of Cinaps supporting these syntaxes will be released.

One more talk, two more videos

I'm happy to announce our next public tech talk, called Seven Implementations of Incremental, on Wednesday, April 5th, presented by yours truly. You can register here.

The talk covers the history of Incremental, a library for building efficient online algorithms. The need to update computations incrementally is pretty common, and we've found Incremental to be useful in creating such computations in a number of different domains, from constructing efficient financial calculations to writing responsive, data-rich web UIs.

The ideas behind Incremental aren't new with us; there is a lot of prior art, most notably the work from Umut Acar's work on self-adjusting computations, on which Incremental is most directly modeled.

But there's a big gap between the academic version of an idea and a production ready instantiation, and this talk is about crossing that gap. It discusses the 7 different implementations we went through and the various mistakes we made along the way towards the current one we use in production.

So join us. I hope you enjoy seeing what we learned about building this kind of system, as well as hearing about the hilarious pratfalls along the way.

On another note, we have finally posted videos from our two previous talks, including Brian Nigito's talk on the architecture of the modern exchange, and Arjun Guha's talk on taming Puppet. And, of course, you can subscribe to our channel while you're there.

What a Jane Street dev interview is like

Are you thinking about applying to Jane Street for a developer role? Or already have a phone interview scheduled but unsure what to expect? Read on as we walk through an example phone interview with you.

We want to give you some insight into what a typical Jane Street phone interview looks like and give you a chance to prepare. We're going to take a look at a question we call "Memo" which we used to ask regularly (but of course don't ask anymore so no need to memorize anything on this page!). As such this post is meant to be a specific case analysis. If you haven't yet seen it, we recommend reading this blog post for a general overview what we are looking for in candidates.

Getting started

To allow us to work on the question together, we'll use a shared online editor. We'll provide you with the link to use (either before hand or during the interview).

We expect you to write code in a real programming language in the interview, not pseudo-code. You can use any programming language you'd like, but we strongly recommend you use the one you're most familiar with and will let you solve the problem in the best way (it's fine to change your mind as you're exploring the problem!). You should be comfortable with basic data structures and APIs in the language of your choice.

Note, there are no bonus points for using a functional language like OCaml. Please don't use OCaml just because you think it will make us happy. We want to see you at your best, so you should use the language you're most comfortable with. If OCaml isn't your strongest language, then don't use it.

If during the interview you realize you have seen a question before, then please let us know and we can do a different question. As you'll see in this post, knowing a question in advance greatly reduces what we can learn about you!

Part 1 - Basic coding

Have you heard about memoization? Can you carefully describe what it is? If you haven't heard about it, don't worry. We'll bring you up to speed. (A good introduction is on Wikipedia.)

Now let's say there is a function f of type int -> int whose output only depends on the input. f is very expensive to compute. We'd like you to write a memoized version of this function, i.e. another function g of the same type, that returns the same values – g(x) = f(x) for all x – but only does the expensive computation once for each input value.

A typical first solution we're looking for at this stage uses a hash-table to store calculated results. A possible solution in OCaml might be:

  1. let memo f =
  2. let results = Hashtbl.create 256 in
  3. (fun input ->
  4. match Hashtbl.find results input with
  5. | None ->
  6. let result = f input in
  7. Hashtbl.add results ~key:input ~data:result;
  8. result
  9. | Some result -> result)

(As I said above, you're not required or expected to write in OCaml but in this blog post I'm going to follow my own advice and use the language I'm most familiar with, which is OCaml. You might also object that this does a test-and-set without a lock, so can't possibly be thread-safe. Nice spot! For the purpose of this question let's ignore re-entry to focus on the core ideas.)

Whichever APIs or data structures you end up using for your solution: you should be prepared to talk about how they work and what the complexity of various operations is.

Part 2 - Reasoning about and improving your code

Can you think of any issues we could run into when using the function from part 1? For example, let's say we run your function in production and notice our application performs significantly worse than before. Quite the opposite from what we hoped memoization would do! Can you see what the problem might be?

The big problem is memory usage. Our application might call f with lots of different inputs and each result will be stored in the hashtable forever – a memory leak! Can you come up with some ideas to improve upon this?

A reasonable approach to control the memory usage is to bound the size of the hash-table and to evict things from it in a FIFO fashion. What trade-offs does FIFO have versus other eviction schemes? How could you modify your memo function to implement FIFO? Let's aim for O(1) complexity when evicting from the cache.

There are a few different ways to do this. A good solution keeps a separate queue: when adding a new result to your hashtable, if the size grows beyond the bound, then dequeue from the queue and remove that element from the hashtable.

Besides being able to write code to do this, we look for how you communicate your thoughts on the problem and ideas to improve it. We don't necessarily expect every candidate to immediately jump to the O(1) solution, but we're interested in the process of talking through this problem and what you can come up with.

Part 3 - Going further

As you probably realize, FIFO can be very inefficient in some use-cases. Let's say we want to do LRU (least recently used) instead. We still want the implementation to be as efficient as possible. How can you implement this?

Once more, there are multiple ways to do this. The easiest solution stores timestamps with each value in the hashtable and linearly scans through the table when evicting. This is O(n). It's possible to improve to O(log n) using a min-heap or even to O(1) using a doubly-linked list.

Side-note: implementing the most efficient solution One way to get to O(1) is to maintain a doubly linked list and make the values in the hash-table point to elements in that list. Then when looking up a cached value in the hash-table, you can slice it out of its current position in the list and put it at the top in O(1). You maintain a separate pointer to the bottom of the doubly linked list to evict in O(1).

Few candidates come up with the doubly-linked list immediately, so don't worry if this is not something you thought of straight away. While we might ask you to implement parts of your solution, this part is intended as a discussion and test your ideas for improving complexity. We'll guide you to the right level of abstraction, so you don't have to worry too much about which details to include.

We also have various other extensions for this question that make it possible to keep going further as far as time allows.

What we look for

Again, for a good overview see this post.

While interviewing, we try to evaluate how well you would fit in our work environment by collaboratively solving a problem. This means the journey through the interview is a lot more important than the snapshot of the solution at the end of it. We are more interested in seeing how you approach a difficult problem than just capturing a boolean flag whether you can come up with the solution.

As a concrete example, we might prefer a candidate that wrote careful and clear code, communicated well and had good insights and ideas along the way, but didn't get as far through the problem compared to another candidate that immediately solved every part but was sloppy and hard to follow.

This makes it impossible to give a rigid set of requirements of what needs to be achieved during the interview. Nevertheless, to give you some rough guidelines:

Every successful candidate should achieve a bug-free solution for part 1 relatively quickly. If some of part 1 sounds unfamiliar to you, it might be better to hold off applying to give yourself more time to prepare.

Most candidates that we pass also complete part 2 fully in the time of the interview. Strong candidates finish part 3 with a complete solution, but not finishing this part doesn't mean that you will be rejected! As said above, it's what happens during the interview that really counts, not the result at the end.

Sharing questions afterwards

We are interested in seeing you approach a difficult problem and walk through the process of deriving a solution with you. If you already know the question in advance (either by finding it online, or by talking to friends who have previously applied), it reduces the efficacy of the interview: We now only get to test if you can write code for a problem you've already understood and solved, which is a small subset of things we are interested in learning about. For example, if you've read through this post, we wouldn't learn much from asking you Memo!

Thus we would like to ask you not to share the interview question with anybody else (in person or online).

What happens next

We usually try to get back to you about the outcome of the interview within one week. During busy periods this might take a bit longer, but not hearing from us after 10 days is unexpected and you should definitely poke us.

If the interview went well, we invite you to come to one of our offices (New York, London or Hong Kong) for a day of onsite interviews. These proceed in much the same way as the phone interview – all questions are technical.

Ready to apply?

If you read this post and feel ready to apply, then simply go here. We have a very straightforward application process – all we need is your resume or CV. We look forward to interviewing you!

Jane Street Tech Talks: Verifying Puppet Configs

Our first Jane Street Tech Talk went really well! Thanks to everyone who came and made it a fun event.

Now it's time for another. We're planning for the series to feature a combination of voices from inside and outside Jane Street. This one is of the latter variety: on March 6th, Arjun Guha will be presenting On Verification for System Configuration Languages, which is about using static verification techniques for catching bugs in Puppet configs.

I've known Arjun for years, and he's a both a good hacker and a serious academic with a real knack for finding good ways of applying ideas from programming languages to systems problems. Also, he has excellent taste in programming languages...

I hope you'll come! You can sign up here.

How to Build an Exchange

UPDATE: We are full up. Tons of people signed up for the talk, and we're now at the limit of what we feel like we can support in the space. Thanks for all the interest, and if you didn't get into this one, don't worry, we have more talks coming!

We're about to do the first of what will hopefully become a series of public tech talks in our NY office.

The first talk is on February 2nd, and is an overview of the architecture of a modern exchange. The talk is being given by Brian Nigito, and is inspired by our work on JX, a crossing engine built at Jane Street. But Brian's experience is much broader, going all the way back to the Island ECN, which in my mind marks the birth of the modern exchange.

I did some work on JX, and one of the things I was struck by is the role that performance plays in the design. In particular, JX uses a simple replication scheme based on reliable multicast that relies critically on the components of the system having high throughput and low, deterministic latencies.

This is a situation where performance engineering is done not so much for reducing end-to-end latency, but instead to act as a kind of architectural superpower, making it possible to build systems in a simpler and more reliable way than would be possible otheriwse.

Anyway, I think it's a fascinating topic. If you're interested in coming, you can go here to get the details and sign up. (We have a registration step so we can get people through building security.)

A brief trip through Spacetime

Spacetime is a new memory profiling facility for OCaml to help find space leaks and unwanted allocations. Whilst still a little rough around the edges, we've found it to be a very useful tool. Since there's not much documentation for using spacetime beyond this readme, I've written a little intro to give people an idea of how to use it.

Generating a profile

As an example of Spacetime in action let's get a profile for the js_of_ocaml compiler. First we'll need a Spacetime-enabled OCaml compiler:

$ opam switch 4.04.0+spacetime
$ eval `opam config env`

Using this compiler we build the executable. In this case we just let opam build it for us:

$ opam install js_of_ocaml

Now we run the executable, using the environment variable OCAML_SPACETIME_INTERVAL to turn on profiling and specify how frequently Spacetime should inspect the OCaml heap (in milliseconds).

$ OCAML_SPACETIME_INTERVAL=1000 js_of_ocaml core_kernel.cma

Executables with Spacetime enabled run more slowly and use more system memory than usual, but the contents of the OCaml heap should be unaffected.

Running the executable produces a file in the current directory:

$ ls | grep spacetime

8045 was the pid of the process. If your executable forks then there may be multiple Spacetime files.

Now that we have a Spacetime profile, we need to install the prof_spacetime profile viewer:

$ opam switch 4.04.0
$ eval `opam config env`
$ opam install prof_spacetime

Next we process the profile:

$ prof_spacetime process -e .opam/4.04.0+spacetime/bin/js_of_ocaml spacetime-8045
Processing series...done

$ ls | grep spacetime

This can take a while – a couple of minutes in this case. The -e option is used to pass prof_spacetime the executable that produced the profile. This option is not strictly necessary, but without it the profile wouldn't include locations for C code.

Web viewer

Now we are ready to look at the profile, we'll start by using the web viewer:

$ prof_spacetime serve -p spacetime-8045.p
Processing series...done
Serving on

Live words

Navigating to the appropriate address in a browser, we are greeted by an exciting colourful graph:

Spacetime graph

This graph shows the number of live words in the program over time, divided up by the source location at which the words were allocated. If we place our mouse over a section of the graph it will display the source location for that section:

Spacetime mouseover

and by clicking on a section we are taken to a new graph:

Spacetime subgraph

This graph contains only those live words allocated at the clicked source location. It is divided up by the source location of the call to the function which performed the allocation – i.e. the next frame up the backtrace. This is a key feature of Spacetime: not only can you see that these words were allocated by, you can see which call to allocated them. By continuing to click on these graphs we can get the entire backtrace when some words were allocated:

Spacetime subgraph

Clicking on the (top of stack) link returns us to the original graph.

Allocated words

The live graphs ("Live words" and "Live blocks") are useful for locating space leaks. For removing unwanted allocations, the allocation graph is more useful. By clicking the "All allocated words" link we are shown a new graph:

Spacetime subgraph

This graph shows the cumulative total of allocations in the program, divided up by the source location of those allocations. Holding your mouse over a section will display the location of that section. Clicking on a section will take you to a new graph containing only the allocations from that section, divided up by the location of the next frame up the backtrace.

Terminal viewer

Now we'll try the terminal viewer:

$ prof_spacetime view -p spacetime-8045.p

which launches us into a lambda-term style terminal view:

Spacetime subgraph

This shows the live words at a particular time (1.017844s) in the program's execution divided up by the source location at which the words were allocated. The ← and → keys move between different points in time. The ↑ and ↓ keys select different rows. Pressing return on a row loads a new view:

Spacetime subgraph

This shows the live words allocated at the selected source location (memory.c: 552) divided up by the source location of the call to the function containing the allocation – i.e. the next frame up the backtrace. Pressing backspace takes us back to the previous view.

Finally, pressing tab switches between the three different modes: live words, live blocks and allocated words. Use the q key to exit the viewer.

A solution to the ppx versioning problem

Ppx is a preprocessing system for OCaml where one maps over the OCaml abstract syntax tree (AST) to interpret some special syntax fragments to generate code.

Ppx rewriters get to work on the same AST definition as the compiler, which has many advantages:

  • The AST corresponds (almost) exactly to the OCaml language. This is not completely true as the AST can represent programs that you can't write, but it's quite close.

  • Given that the compiler and pre-processor agree on the data-type, they can communicate between each other using the unsafe [Marshal] module, which is a relatively cheap and fast way of serializing and deserializing OCaml values.

  • Finally, the biggest advantage for the user is that the locations in the original code are exactly preserved, which is a requirement to get usable error messages. This is not so great for the generated code, as the best one can do is reuse some locations from the original source code and hope for the best. In practice the user sometimes gets non-sensical errors, but this is a commonly accepted trade-off.

There is however one drawback to all this, the compiler AST is not stable and code using it is at the mercy of its evolution. We got lucky with the 4.04 release of OCaml but the 4.03 one was quite disruptive. Even before releases, whenever the AST definition changes during a development cycle, many ppx might not be usable for a while, which make testing a lot harder.

Several ideas have been flying around, such as adding a layer to convert between different versions of the AST. While this would work, it has the drawback that you need this layer for every variant of the AST. And when you want to make a patch modifying the AST, you'll need to do the extra work of updating this layer first.

In this blog post we show how we managed to solve the ppx compatiblity problem in a way that improves the user experience and lets us produce releases that don't depend on ppx rewriters at all.

We did this work while working on Base, our upcoming standard library. In the end, it's likely we'll use only the third of the methods described below for Base, while the others will be used to improve the user experience with the rest of our packages.

What do other code generators do?

Ppx is not the first system for generating code that is a mix of user-written code and machine generated code. A typical class of generators that get it right, i.e. that preserve locations and are independant from the AST definition, are lexer/parser generators, and not only the ones distributed with the compiler.

Let's take the example of lexer generators (parser generators work basically the same way). The users write a series of rules, consisting of a rular expression and an action to take if the input matches it:

rule token = parse
| "+"  { PLUS  }
| "-"  { MINUS }
| '0'..'9'+ as s { INT (int_of_string s) }
| " "* { token lexbuf (* skip blanks *) }

This code is written in a .mll file and the generator then produces a .ml file with code for the lexing engine interleaved with the user written actions.

In order to keep the locations of the user-written code pointing to the right place in the .mll file, the generator produces:

# 42 "lexer.mll"
         token lexbuf (* skip blanks *)

The OCaml compiler interprets the line starting with a # and updates its current location to point to the begining of line 42 in file lexer.mll. This is called a line directive.

To go back into the generated code, the lexer generator produces:

# 100 ""

Where 100 correspond to the real line number in

With this method, when there is an error in the user-written code, it points to the lexer.mll file, while when there is an error in the generated code it points to the file. Even if the generated code might not be particularly easy to understand, at least you get to see the real code the compiler chokes on.

Another big advantage is that when using a debugger, you can follow the execution through the generated code.

Can we do the same for ppx?

At first glance, it seems that ppx rewriters work in a very different way, but the result is the same: only parts of the file is generated and the rest is taken as if from what the user wrote. In fact, compared to the lexer case, most of the resulting code is user written.

There is however some work to do to get the same result as with lexer generators. First you have to distinguish the generated code from the user code.

If you take a ppx rewriter as a black box, then the only way is to apply some kind of tree diff between the input and the output. In our ppx framework however, we know exactly what fragments of the AST are rewritten by plugins and we know the rewriting is always local. This makes the job a lot simpler and probably faster as well, so we chose to take advantage of this information.

The method

It works this way: while mapping the AST, we collect all the fragments of generated code with the location of the code they replace in the original file. At the end we sort them in the order of the file and make sure there is no overlap. Every fragment is pretty printer to a string.

What we end up with is a list of text substitutions: beginning position, end position, replacemen text. The next step is to simply apply these substitutions to the original file. If you read the bog post about how we switched from camlp4 to ppx, you'll notice the resemblance here.

This is what the transformation looks like:

(* ----- input ----- *)
type t = int [@@deriving sexp_of]

let f x = x + 1

let g x = [%sexp_of: t] x

(* ----- output ----- *)
# 1 ""
type t = int [@@deriving sexp_of]
# 4 ""
let sexp_of_t = sexp_of_int
#2 ""

let f x = x + 1

let g x =
# 11 ""
# 5 ""

The result for [@@deriving sexp_of] is not bad at all. For code rewritten inside expressions, the result is not as good given that it breaks those expressions up. But given than extensions are often sparse in our source files, this is still acceptable.

This mode can be selected with ppx_driver based rewriter by passing the flag -reconcile.

Solving the compatiblity problem

With this mode, one can first generate a .ml.pp file and feed that to the compiler. Given that the concrete syntax of the language breaks much less often than the internal AST definition, a working ppx is likely to work for a very long time.

We'll soon start releasing a separate package that snapshots one version of the lexer/parser/AST/printer of the OCaml compiler. This package will have its own release schedule and will typically be updated soon after each relase of OCaml. This will give time for ppx authors to upgrade their code when it breaks while still allowing people to try out the new compiler with their favorite packages.

Mode for release tarballs

In addition to the mode described above, ppx_driver has a second mode -reconcile-with-comments where the result is similar to the one with line directives expect than the generated code is enclosed with comment markers:

type t = int [@@deriving sexp_of]
let sexp_of_t = sexp_of_int

let f x = x + 1

let g x =

This mode is intended for release tarballs. One can replace all the files in-place by the pre-processed version using =-reconcile-with-comments=. The result is readable and has the big advantage that you you don't need to depend on the ppx rewriter, which means the package is faster to install for users.

Jane Street packages will eventually move to this scheme, either for the next stable release or the one after that. One technical issue with this method is that to take full advantage of it, the runtime support libraries of the various ppx rewriters must be installable without the rewriter itself. Splitting the packages in opam is fine but splitting the repository is not desirable as often both components make strong assumption about the other.

For Jane Street packages, we'll need to update our release system so that it supports generating two opam packages from one repository.

ppx as a verfication tool only

While these new methods improve the ppx story in general, for Base we wanted to go even further and allow users to build Base without the need for ppx at all, both for the release and for the development versions. Not only to cut down the dependencies, but also to provide a better experience in general. For instance if you are working on a patched compiler and need the development version of Base, you shouldn't need all the ppx rewriters that might not work for some reason.

We explored various bootstrap story, and while they worked they were not very nice, especially for such an important library. Its development and build processes should be straightforward.

We even looked into not using ppx at all. While this is OK for many ppx rewriters that are mostly syntactic sugars, it is more problematic for [@@deriving ...]. It's not so much that the code is hard to write by hand, most data-structures in Base are either simple datatypes or require and written combinators anyway, but it is a pain to review. This code is very mechanical and you have to make sure that the constant strings correspond to the constructor/field names and other things where the machine can do much better than a human.

In the end we found a solution to keep the best of both worlds, /i.e./ being able to build the original source code without pre-processing and avoid having to write and review this boilerplate code.

The idea is to use ppx in the same way that we write expect tests; the tool only checks that what's comes after the type-definition correspond to what the rewriters derive from it. In case of mismatch it produces a .corrected file just like expect tests.

We are currently experimenting with this method for Base. It's possible that we'll have some marker to delimit the end of the generated code. In the end the code could look like this:

type t = A | B [@@deriving sexp_of]

let sexp_of_t = function
  | A -> Sexp.Atom "A"
  | B -> Sexp.Atom "B"


Given that the compiler ignores attributes it doesn't understand, this code compiles just fine without any pre-processing.

When running the ppx rewriter in this expect mode, the generated AST is matched against the source AST without taking locations into account, so that mean that you can reformat the code as you wish and even add comments.

The challenge now is to update our ppx rewriters so that they produce code that we are happy to show. Until now we didn't focus too much on that, but we have a good idea about how to do it. The plan is to move more of the logic of the various deriving system into proper functions instead of generating more code. Note that this is an improvement in general as proper functions are a lot easier to understand and maintain than code generators.


In this blog post we described a simple and clean method to decouple ppx rewriters from the release schedule of the compiler. This method has the advantage that once the ppx is written is likely to work for a long time and especially to work out of the box with development compilers.

Moreover, this method has is better for users as errors point to the real code the compiler sees and when debugging they can follow the execution through generated code without trouble.

All this is currently implemented in ppx_core/ppx_driver. Our github repositories haven't been updated in a while has the Base refactoring disrupted our public release process quite a bit. These new features should be published in the coming weeks and we'll be part of the next stable release of our packages, planned for the beginning of December.

Observations of a functional programmer

I was recently invited to do the keynote at the Commercial Users of Functional Programming workshop, a 15-year-old gathering which is attached to ICFP, the primary academic functional programming conference.

You can watch the whole video here, but it's a bit on the long side, so I thought an index might be useful. (Also, because of my rather minimal use of slides, this is closer to a podcast than a video...)

Anyway, here are the sections:

Hope you enjoy!

What the interns have wrought, 2016

Now that the interns have mostly gone back to school, it's a good time to look back at what they did while they were here. We had a bumper crop -- more than 30 dev interns between our London, New York and Hong Kong offices -- and they worked on just about every corner of our code-base.

In this post, I wanted to talk about just one of those areas: building efficient, browser-based user interfaces.

Really, that's kind of a weird topic for us. Jane Street is not a web company, and is not by nature a place that spends a lot of time on pretty user interfaces. The software we build is for our own consumption, so the kind of spit and polish you see in consumer oriented UIs are just not that interesting here.

But we do care about having functional, usable user interfaces. The work we do is very data driven, and that rewards having UIs that are good at presenting up-to-date data clearly and concisely. And we want to achieve that while keeping complexity of developing these UIs low.

Historically, almost all of our UIs have been text-based. While I love the command line, it does impose some limitations. For one thing, terminals don't offer any (decent) graphical capabilities. But beyond that obvious constraint, getting all the pieces you need for a decent UI to work well in a terminal, from scrolling to text-entry widgets, requires a lot of work that just isn't necessary in a browser.

So this year, we've finally started pushing to make it easy for us to write browser-based applications, in particular relying on OCaml's shockingly good JavaScript back-end. This has allowed us to write web applications in OCaml, using our usual tools and libraries. As I've blogged about previously, we've also been exploring how to use Incremental, a framework for building efficient on-line computations, to make browser UIs that are both pleasant to write and performant.

That's roughly where we were at the beginning of the summer: some good ideas about what designs to look at, and a few good foundational libraries. So the real story is what our interns, Aaron Zeng and Corwin de Boor, did to take us farther.

Web Tables

Aaron Zeng's project was to take all of the ideas and libraries we'd been working on and put them to use in a real project. That project was web-tables, a new user-interface for an existing tool developed on our options desk, called the annotated signal publisher. This service provides a simple way for traders to publish and then view streams of interesting tabular data often based on analysis of our own trading or trading that we see in the markets.

The publisher fed its data to Catalog, our internal pub-sub system. From Catalog, the data could be viewed in Excel, or in our terminal-based Catalog browser. But neither of these approaches worked as well or as flexibly as we wanted.

Enter web-tables. What we wanted was pretty simple: the ability to display tabular data from the annotated signal publisher with customizable formatting, filtering and sorting. This involved breaking a lot of new ground, from figuring out how do the sorting and filtering in an efficient and incremental way, to fixing performance issues with our RPC-over-websockets implementation, to figuring out a deployment and versioning story that let people easily create and deploy new views of their data.

One of the great things about the project is how quickly it was put into use. The options guys started using web-tables before the project was even really finished, and there was a tight feedback loop between Aaron and his mentor Matt Russell, who was using the tool on a daily basis.

Optimizing rendering

Aaron's web-tables work used incr_dom, a small framework that sets up the basic idioms for creating UIs using Incremental. As part of that work, we discovered some limitations of the library that made it hard to hit the performance goals we wanted to. Corwin de Boor's project was to fix those limitations.

The key to building an efficient UI for displaying a lot of data is figuring out what work you can avoid. To this end, Corwin wanted to build UIs that logically contained thousands or even millions of rows, while only actually materializing DOM nodes corresponding to the hundred or so rows that are actually in view.

In order to figure out which nodes to render, he had to first figure out which nodes would be visible, based on the location of the browser's viewport. This in turn required a way of looking up data-points based on where that data would be expected to render on the screen.

Corwin did this by building a data structure for storing the expected height of each object that could be rendered, while allowing one to query for a node based on the sum of the heights of all the nodes ahead of it. He did this by taking an existing splay tree library and rewriting it so it could be parameterized with a reduction function that would be used to aggregate extra information along the spine of the splay tree. By integrating the reduction into splay tree itself, the necessary data could be kept up to date efficiently as the splay tree was modified.

Corwin also spent a lot of time improving incr_dom itself, taking inspiration from other systems like React and the Elm language. We even corresponded a bit with Jordan Walke and Evan Czaplicki, the authors of React and Elm respectively.

One thing that came out of this was a neat trick for making the incr_dom API cleaner by using a relatively new feature of OCaml called open types. The details are a little technical (you can see the final result here and here), but I think what we ended up with is a bit of an advance on the state of the art.

There were a lot of other bits and pieces, like improving the handling of keyboard events in js_of_ocaml, creating a new incremental data-structure library called incr_select for more efficiently handling things like focus and visibility, and restructuring the incr_dom APIs to make them simpler to understand and open up new opportunities for optimization.

By the end, Corwin was able to build a demo that smoothly scrolled over a million rows of continuously updating data, all while keeping update times below 5ms. We're now looking at how to take all of this work and feed it into real applications, including Aaron's work on web-tables.

Sound like fun?

If this sounds like interesting stuff, you should consider applying for a summer internship with us!

Jane Street internships are a great learning experience, and a lot of fun to boot. You even get a chance to travel: interns get to visit Hong Kong, London or New York (depending on where they started!) as part of their summer.

And the projects I described here are really just a taste. Here are some other projects interns worked on this summer:

  • Building a low-latency order router.
  • Adding data representation optimizations to the OCaml compiler
  • A service for simulating a variety of network failures, to make it easier to test distributed applications.
  • Making it possible to write Emacs extensions in OCaml (this was actually Aaron Zeng's second project)

and that's still just a small sample. One thing I love about the work at Jane Street is the surprising variety of problems we find ourselves needing to solve.