How to fail – introducing Or_error.t

There are a bunch of different ways of handling errors in OCaml. If you've just started learning about functional programming, you'll no doubt have come across the famed option type. Here's its definition:

  1. type 'a option = Some of 'a | None

Functions that might fail return an option instead of the corresponding "unwrapped" type. For example, the function to get the first element of the list has type List.hd : 'a list -> 'a option – it might fail if the list is empty. Ask a functional programming fanatic what the best things about static expressive type systems are, and they're bound to mention the option type very high up the list. The option type makes it explicit in the type system what functions might fail. If you're writing such a function it gives you a mechanism to force your caller to check your error case. All the other classic ways of handling errors – returning null pointers, returning some special value (like -1), setting global error variables or raising exceptions – require the caller to remember to do a thing. With option types, the compiler checks you haven't forgotten. It's impossible to understate the significance of this. This is going to be a much longer blog post if I get started down this path of evangelism, so let's move on :) But as you start to crank out your project, you'll quickly realise that option has its limitations. For example, consider the following function:

  1. val Hashtbl.of_alist : ('a * 'b) list -> ('a, 'b) Hashtbl.t option

(Eagle-eyed readers will spot that this is not how the hashtable type works in core, because we eschew polymorphic comparison as much as possible. But that's another blog post .) This function can fail in the case where you have a duplicate key. But this is a user-experience disaster waiting to happen. I'm picturing a gigantic config file, involving (the sexp representation of) a large hashtable with hundreds of keys; you launch your program, and are presented with an error that says "duplicate keys". Taking a peek under the hood:

  1. let hashtbl =
  2. match Hashtbl.of_alist alist_from_config_file with
  3. | Some table -> table
  4. | None ->
  5. (* I really want to give the user a better error, but I don't have
  6.   enough information! *)
  7. failwith "Duplicate keys"
  8. in

How tragic.

The alternatives

One of the disadvantages of having such an expressive type system is that there's so many ways to skin a cat, and you end up with inconsistent and incomposable code. Here are some of the ways that we solved the option problem.

1. Define a new variant type that lists all the failure cases.

Defining types in ocaml is cheap, right? So why not define one for each function that might fail, which lists all of the ways it can fail, and use that?

  1. type ('a, 'b) of_alist_result =
  2. | Ok of ('a, 'b) Hashtbl.t
  3. | Duplicate_keys of 'a list
  5. val Hashtbl.of_alist : ('a * 'b) list -> ('a, 'b) of_alist_resultn

Well, unsurprisingly, this turns out pretty lexically heavy and annoying to do. However, polymorphic variants make it cheaper by relieving the burden of naming the type:

  1. val Hashtbl.of_alist 
  2. : ('a * 'b) list -> [ `Ok of ('a, 'b) Hashtbl.t | `Duplicate_keys of 'a list ]

(This also makes the client-side code more legible as they can just say "| `Ok table -> ..." rather than "| Hashtbl.Ok table -> ...".) This has some nice things going for it. Another disadvantage of option is that, if your function can fail in one of many different ways, you have no way to communicate that to your caller. All you can do is return None. But here we can list out the ways, and given them clear names, which will appear in the caller's code. The main disadvantage of this approach is that it's not composable. You end up writing code that looks like this:

  1. match thing_that_might_fail_1 () with
  2. | `Failure_case_1 -> (* somehow report error *)
  3. | `Ok x ->
  4. match thing_that_might_fail_2 x with
  5. | `Failure_case_2
  6. | `Failure_case_3 -> (* report error here too *)
  7. | `Ok y ->
  8. match thing_that_might_fail_3 x y with
  9. | `Failure_case_4 -> (* same treatment *)
  10. | `Ok z ->
  11. ...

What I want to say here is: just do these things in order; if any of them fail, stop and tell me that error; at each stage I might want to use the results of the previous (successful) results when computing the next. Commonly, you want to treat all the errors in a similar way: maybe convert them to a string and log to a file, or something. But you're forced to write very verbose code. Similarly, you really start to miss "generic" functions like which can operate on the results of any potentially-failing computation and transform it in some way. With this approach, you have to write a new mapping function for each function that might fail! It seems we're expressing too much in the type system here: if the types were a little less specific about the ways our functions might go wrong (and, as we said, we often don't care about the details, as long as there's some way of getting a string of the error out), we'd have an easier time dealing with failure in general.

2. Use the Result type

Let's extend the option type:

  1. type ('a, 'b) Result.t =
  2. | Ok of 'a
  3. | Error of 'b

That is, either the calculation was successful, and we got the 'a that we wanted, or it failed, in which case I have a 'b to tell me some more about what went wrong. The question is: what should we use in the 'b type? 2a('a, [ `Failure_case_1 | `Failure_case_2 ]) Result.t – one option is just to "lift" the `Ok constructor from all our individual variants into Result, and still use polymorphic variants to list the error cases. This makes it possible to use those "generic" functions like This is pretty good. However, you still end up writing the "ever-indenting" style of code as above, 2b. ('a, exn) Result.t – the next idea is, instead of using an explicit variant, we'll use exn. Exceptions in ocaml are actual values: you can throw them around between functions, store them in tables etc. (Then there's a function called "raise" that lets you actually throw an exception value.) All exceptions have type exn, which is a bit like an "open ended variant type". If you have an exn in your hands, you can match on it just as normal:

  1. match my_exception with
  2. | Not_found -> ...
  3. | Invalid_arg -> ...
  4. | _ -> ...

But, anyone can add to this variant by saying "exception My_exception". (So match statements on exceptions will always need to have an "underscore case".) And there's a generic way to convert exceptions into strings. The fact that all of our errors have the same error type makes it possible to use the Result monad to improve our verbose mess from above:

  1. let result =
  2. let open Result.Monad_infix in
  3. thing_that_might_fail_1 () >>= fun x ->
  4. thing_that_might_fail_2 x >>= fun y ->
  5. thing_that_might_fail_3 x y
  6. in
  7. match result with
  8. | Ok z -> (* huzzah *)
  9. | Error e -> log_error e (* the error handling code exists only once, here *)

There are a number of small-ish problems with this type:

  • One has to define an exception constructor for each error case. This ends up being non-local to the code.
  • The code to convert an exn to a sexp is very complicated indeed, and has cases that can leak space.
  • Matching on exceptions, although possible, should be discouraged. There's no way apart from comments for a function to indicate which exceptions it might throw. So if client code begins matching on a certain exception, that function can never use a different constructor if, for example, it wants to add some extra information to the error. This is why we're stuck in core with throwing "legacy" exceptions like Not_found in a few places. Now, ('a, exn) Result.t does not require you to match on exceptions, but it does at least make it possible, and we'd like to discourage it.

2c. ('a, string) result – The idea here is: well, someone is likely to want to convert this error to a string eventually, so let's just represent errors as strings. If we want to include other ocaml values to provide context (e.g. the list of 'a values that were the duped keys), then we convert them to strings (probably by converting to sexps and from there to strings) and build up the string we want. We don't have any of the disadvantages of the above use of exceptions. And of course, we can still use the Result monad. The cons here are quite subtle. The trouble is: sometimes, we actually don't want to consume any error that might get produced. And constructing large number of error strings can be very expensive. You have to do all the conversion to sexps and strings upfront, regardless of whether someone wants to consume it in the end or not.

Introducing Error.t and Or_error.t

There was a clear need for unity. Having different libraries (or different functions within the same library...) using different conventions is really annoying – you end up doing a lot of converting between different error types, which aside from being code noise, is a needless performance penalty, and most importantly, can make errors a lot harder to read. This last disadvantage is the so-called "tagging problem". Failures deep down in software often need to bubble up a few layers before it gets written to a log or presented to the user or crashes the program or whatever. All of those layers might want to add a bit more context. If you are using lots of different ways of representing errors, it becomes impossible to read the resulting errors: sexps containing strings are converted themselves to strings, which escapes all the quote characters; if this process iterates, you can end up with snippets that look like \\\\\\\\\\\\\\\\" ..., and the error is quite illegible without a sed script to strip out the duplicate slashes :) So, it's likely that using any of the above solutions consistently would have been better than doing nothing. But, instead, we chose to define a new type, and push hard to get it adopted everywhere. That type is Or_error.t:

  1. type 'a Or_error.t = ('a, Error.t) Result.t

The key here is the new type Error.t. This is, to a first approximation, a lazy string. That is, we still get the advantages that tagging is easy, but we don't have to pay the up-front costs of constructing a string. We've done benchmarks, and constructing an Error.t is "pretty cheap" 1. Here's how it's done:

  1. Error.create "problem with foo" (4, Some "baz") <:sexp_of< int * string option >>

That is, you give it a string, some additional values as context, and the sexp converter, for which you can use the convenient quotation syntax afforded by pa_sexp. Here are a few more handy snippets:

  1. Error.create "values were not equal"
  2. (`from_system_a thing_a, `from_system_b thing_b)
  3. <:sexp_of< [ `from_system_a of Thing.t ] * [ `from_system_b of Thing.t ] >>
  4. (* using polymorphic variants to make things more readable *)
  5. Error.of_string "couldn't find a foo" (* no additional values *)
  6. Error.tag error "call to my_deeper_library failed" (* tagging *)
  7. Error.tag_arg error "call to my_deeper_library failed" 
  8. (4, Some "baz") <:sexp_of< int * string option >> (* tagging with arguments *)
  9. Or_error.error_string "couldn't find a foo" (* shorthand for Error (Error.of_string ...) *)
  10. Or_error.error "problem with foo" (4, Some "baz") <:sexp_of< int * string option >>
  11. (* shorthand for Error (Error.create ...) *)
  12. Or_error.tag or_error "call to my_deeper_library failed"
  13. (* shorthand for grabbing the error out, tagging it, then wrapping it up again *)
  14. Or_error.tag_arg error "call to my_deeper_library failed"
  15. (4, Some "baz") <:sexp_of< int * string option >> (* similarly *)

And, of course, the convenient monadic syntax is possible with Or_error.t. Also, opening Core.Std brings some names into scope:

  1. ok_exn (* short for Or_error.ok_exn *)
  2. error (* short for Or_error.error *)

Having these two available without module qualifiers emphasises our choice of Or_error as the default way of doing errors. In particular, having a short name for ok_exn is very convenient – in the past, we've often defined pairs of functions in Core, one called foo that returned an option or an error, and one called foo_exn that calls Option.value_exn or Result.ok_exn on the return value. But, having ok_exn in the global scope reduces the need for this. 1: It's not completely free, since it's still an allocation, so beware of using it in your hottest loops – you might consider statically allocating some errors, which helps a lot, although it restricts you to Error.of_string rather than Error.create.

A larger example

Here's a simple configuration file for a minesweeper clone:

  1. open Core.Std
  2. open Async.Std
  4. module Dimensions = struct
  5. type t = { width : int; height : int }
  6. let area { width; height } = width * height
  7. let both_nonnegative { width; height } =
  8. Or_error.combine_errors_unit
  9. [ if width > 0 then Ok () else Or_error.error "width <= 0" width <:sexp_of< int >>;
  10. if height > 0 then Ok () else Or_error.error "height <= 0" height <:sexp_of< int >>
  11. ]
  12. end
  14. type t =
  15. { play_grid_in_blocks : Dimensions.t;
  16. block_size_in_px : Dimensions.t;
  17. num_mines : int;
  18. background_color : [ `White | `Black ]
  19. } with sexp
  21. let validate { play_grid_in_blocks;
  22. block_size_in_px;
  23. num_mines;
  24. background_color = _
  25. } =
  26. let open Or_error.Monad_infix in
  27. Dimensions.both_nonnegative playable_area_in_blocks
  28. >>= fun () ->
  29. Dimensions.both_nonnegative block_size_in_px
  30. >>= fun () ->
  31. begin
  32. let playable_area = Dimensions.area play_grid_in_blocks in
  33. if playable_area >= 4 then
  34. Ok playable_area
  35. else
  36. Or_error.error_string "playable area must be at least 4 blocks"
  37. end
  38. >>= fun playable_area ->
  39. begin
  40. if num_mines <= playable_area then Ok () else
  41. Or_error.error "too many mines" (num_mines, `playable_area playable_area)
  42. <:sexp_of< int * [ `playable_area of int ] >>
  43. end
  44. >>= fun () ->
  45. Ok t
  47. let load file =
  48. Reader.load_sexp_exn file <:of_sexp< t >>
  49. >>= fun ->
  50. Deferred.return
  51. (Or_error.tag_arg (validate t) "invalid configuration" t <:sexp_of< t >>)

Points to make:

  • If you're writing in async too, as is typical inside Jane Street, then you have to be careful as to when the monadic infix operators (>>= and >>|) mean async-monad and when they mean Result-monad. I favour local opens of one of the Monad_infix modules, like in validate above, to keep things clear.
  • val Or_error.combine_errors_unit : unit Or_error.t list -> unit Or_error.t is a really nice function. If you have a bunch of checks, none of which depend on the result of any other checks, then you can stick them all in a list and call this function, which will return Ok if all of them are Ok, and one single Error, representing all the errors, if not.
  • I used the record-deconstruction syntax in validate to check I was validating all the fields. (Recent versions of OCaml will give you a warning unless you either mention all fields or put "; _" at the end of your pattern, and if you're at all sensible you convert compiler warnings into hard errors :)) Note that there's nothing to validate for background_color, but the compiler forced me to explicitly say so. This is a powerful trick.
  • Using polymorphic variants as labels, as in the validation of num_mines, is really nice, but you do have to repeat the name of the variant in the sexp converter. A price worth paying, though.
  • Tagging is really awesome. Every time one writes  | Error _ -> (* construct another error *)", that's a red flag that you should be tagging the original error and propagating it instead. (In fact, every time you say "| Error _" is probably a mistake – at least there should be a comment on why you're throwing away this error.) But, there is no clear contract on who should be doing the tagging, caller or callee. E.g. is it up to load to say "invalid configuration", or the caller of load? Sadly, no clear convention has as of yet established itself.

The importance of consistency

One place where Or_error does not apply is: it's occasionally crucial to be able to enumerate your error cases and force the caller to go through them one by one. This is not the common case – generally callers just want to know if their call was successful or not, and if not, be able to tell a human why not (i.e. convert the error to a string). But there are times when it's worth spelling it out. In that case, we either use 1. or 2(a) – it doesn't much matter, because this will only be for a small minority of your code. Also, it's often okay to use option. Some functions have a sufficiently small scope that they can only fail in one way, and that way is obvious given the function's name. For example, in Core, List.hd still returns an option – it's clear that the only error case is the empty list. Moreover the caller can probably give a better idea of what's gone wrong: rather than the error saying "empty list", it could say something more like "item foo didn't match any classification filters" or something. I don't think it's obvious that Error.t is the best type to use in all circumstances. It does seem to hit the sweet spot among all the aforementioned options, but we probably could have been successful pushing, e.g., "Or_exn.t". But, consistency is extremely important. It's really nice that you can make a function call to two libraries and sequence them together using the Result monad. It can be worth using Or_error in some situations where it might be marginally preferable to use a different approach. We've found it to be a big win for consistency, composability and readability of errors.

Generic mapping and folding in OCaml

Haskell has a function fmap which can map over a number of different datatypes. For example, fmap can map a function over both a List and a Maybe (the equivalent of an option in OCaml):

Prelude> fmap (+ 1) [1,2]
Prelude> fmap (+ 1) (Just 3)
Just 4

Unfortunately, the equivalent is impossible in OCaml. That is, there's no way to define an OCaml value fmap so that the two expressions:

# fmap [1;2]    ~f:((+) 1)
# fmap (Some 3) ~f:((+) 1)

both typecheck and evaluate to the right value.

Even if we eliminate the complexity of type inference by specifying the type explicitly, we can't define fmap so that the two expressions:

# fmap ([1;2]  : _ list)   ~f:((+) 1)
# fmap (Some 3 : _ option) ~f:((+) 1)

typecheck and evaluate to the right value.

However, the Generic module in Jane Street's Core_extended library will let us do exactly that with just a trivial syntactic change. But before continuing, I'll warn you that the Generic module is not necessarily something you'd want to use in real world code; it falls much more in the "cute trick" category. But with that caveat, let's look at our example using Generic:

# open Core.Std;;
# open Core_extended.Generic;;

# map ([1;2] >: __ list) ~f:((+) 1);;
- : int list = [2; 3]
# map (Some 3 >: __ option) ~f:((+) 1);;
- : int option = Some 4    

Note that, after opening the Generic module, all we did to the previous example was change : to >: and _ to __. (Also, the Generic module calls the mapping function map instead of fmap, but that's inconsequential.)

Of course, the trick is that >:, __, list, and option are actually values defined by the Generic module in such a way that their intended usage looks like a type annotation.

Note that these "types" are nestable as you would expect real types to be:

# map ([None; Some 3] >: __ option list) ~f:((+) 1);;
- : int option list = [None; Some 4]        

This means that you can change what map does just by changing the "type" you assign to its argument:

# map ([None; Some 3] >: __ option list) ~f:(fun _ -> ());;
- : unit option list = [None; Some ()]
# map ([None; Some 3] >: __ list) ~f:(fun _ -> ());;
- : unit list = [(); ()]

The Generic module also defines a generic fold function so that you can accumulate values at any "depth" in your value:

# fold ([[Some 3; None]; [Some 5; Some 2]] >: __ option list list) ~init:0 ~f:(+);;
- : int = 10

Not every "type" formable is __ followed by some sequence of options and lists: for example, Generic also provides string (considered as a container of characters):

# map ([Some "foo"; None; Some "bar"] >: string option list) ~f:Char.uppercase;;
- : string option list = [Some "FOO"; None; Some "BAR"]

Note that the fact that the "types" are nestable means that these values must have unusual definitions: in particular, __ (and string) are functions which must be able to take a variable number of arguments. Indeed, these values are defined using a technique sweeks wrote about in a blog post on variable argument functions: the f and z in sweeks's post are analogous here to __ and >: respectively.

Here's the definition of the primitive values we've used so far (Generic actually defines a few more):

let __ k = k (fun f x -> f x)

let ( >: ) x t y = t (fun x -> x) y x

let map x ~f = x f

let string k = k (fun f -> ~f)

let list map k = k (fun f -> ~f:(map f))

let option map k = k (fun f -> ~f:(map f))

The types of these turn out to be extremely unhelpful, and you can't really use them to figure out how to use these values. For example, here is the type of >: (and this isn't just the inferred type of the above definition, this is the type which must actually be exposed to use >:):

val ( >: ) : 'a -> (('b -> 'b) -> 'c -> 'a -> 'd) -> 'c -> 'd

Finally, is this module actually used? The answer is no. As far as I know, it's used nowhere in Jane Street's codebase. But it's still kind of cute.

Breaking down FRP

As anyone who has looked into functional reactive programming (FRP) knows, there are lots of competing approaches to it, and not a lot of conceptual clarity about how they relate to each other. In this post, I'll try to shed some light, and in particular give you some guide posts for understanding how the different versions of FRP relate to each other. Plus, I'll show some connections to a similar technique called self-adjusting computation (SAC).

The analysis here should mostly be credited to Evan Czaplicki, who gave a talk at Jane Street a week ago. Any confusions and mistakes are, of course, my own. Also, thanks to Jake McArthur for filling me in on a few key details.

In all of this I'm basically going to talk only about discrete FRP. A lot of the people involved in FRP think of continuous-time semantics as really important, but that's not the focus here. (I am curious, however, if anyone can explain why people think continuous time semantics are important when writing programs for a largely discrete computer.)

First, some basics. Roughly speaking, FRP systems are meant to make it easier to write programs that react to external events by providing a natural way to program with and reason about time-varying signals.

Now, time to oversimplify.

An FRP program effectively ties signals together in a dependency graph, where each signal is either an external input or a derived signal that feeds off of other signals that have already been defined. A key aspect of a system like this is that the language runtime uses the dependency graph to minimize the amount of work that needs to be done in response to an external event.

Here are some properties that you might want from your FRP system:

  • History-sensitivity, or the ability to construct calculations that react not just to the current state of the world, but also to what has happened in the past.
  • Efficiency. This comes in two forms: space efficiency, mostly meaning that you want to minimize the amount of your past that you need to remember; and computational efficiency, meaning that you want to minimize the amount of the computation that must be rerun when inputs change.
  • Dynamism, or the ability to reconfigure the computation over time, as the inputs to your system change.
  • Ease of reasoning. You'd like the resulting system to have a clean semantics that's easy to reason about.

It turns out you can't have all of these at the same time, and you can roughly categorize different approaches to FRP by which subset they aim to get. Let's walk through them one by one.

Pure Monadic FRP

This approach gives you dynamism, history-sensitivity and ease of reasoning, but has unacceptable space efficiency.

As you might expect, the signal combinators in pure monadic FRP can be described as a monad. That means we have access to the usual monadic operators, the simplest of which is return, which creates a constant signal.

  1. val return : 'a -> 'a signal

You also have map, which lets you transform a signal by applying a function to it at every point in time.

  1. val map: 'a signal -> ('a -> 'b) -> 'b signal

Operators like map2 let you take multiple signals and combine them together, again by applying a function to the input signals at every point in time to produce the output signal.

  1. val map2 : 'a signal -> 'b signal -> ('a -> 'b -> 'c) -> 'c signal

Note that all of the above essentially correspond to building up a static set of dependencies between signals. To finish our monadic interface and to add dynamism, we need one more operator, called join.

  1. val join: 'a signal signal -> 'a signal

Nested signals are tricky. In a nested signal, you can think of the outer signal as choosing between different inner signals. When these are collapsed with join, you essentially get a signal that can change its definition, and therefore its dependencies, in response to changing inputs.

We're still missing history sensitivity. All of the operators thus far work on contemporaneous values. We don't have any operators that let us use information from the past. foldp is an operator that does just that, by folding forward from the past.

  1. val foldp: 'a signal -> init:'acc -> ('acc -> 'a -> 'acc) -> 'acc signal

With foldp, history is at our disposal. For example, we can write a function that takes a signal containing an x/y position, and returns the largest distance that position has ever been from the origin. Here's the code.

  1. let max_dist_to_origin (pos : (float * float) signal) : float signal =
  2. foldp pos ~init:0. ~f:(fun max_so_far (x,y) ->
  3. Float.(max max_so_far (sqrt (x * x + y * y))))

Here, max_so_far acts as a kind of state variable that efficiently summarizes the necessary information about the past.

foldp seems like it should be implementable efficiently, but we run into trouble when we try to combine history sensitivity and dynamism. In particular, consider what happens when you try to compute max_dist_to_origin on the signal representing the position of the mouse. And in particular, what if we only decide to run this computation at some point in the middle of the execution of our program? We then have two choices: either (max_dist_to_origin
always has the same meaning, or, its meaning depends on when it was called.

In pure monadic FRP, we make the choice to always give such an expression the same meaning, and thus preserve equational reasoning. We also end up with something that's impossible to implement efficiently. In particular, this choice forces us to remember every value generated by every input forever.

There are various ways out of this performance trap. In the following, I'll describe the different escape paths chosen by different styles of FRP systems.

Pure Applicative FRP

The idea behind applicative FRP is simple enough: just drop the join operator, thus giving up dynamism. This means that you end up constructing static dependency graphs. Without the ability to reconfigure, you don't run into the question of what happens when an expression like (max_dist_to_origin mouse_pos) is evaluated at multiple points in time.

This is the approach that Elm takes, and seems like the primary approach that is taken by practical systems concerned with describing UI interactions.

There's a variant on pure applicative FRP called, confusingly, Arrowized FRP. Effectively, Arrowized FRP lets you create a finite collection of static graphs which you can switch between. If those static graphs contain history-dependent computations, then all the graphs will have to be kept running at all times, which means that, while it can be more efficient then applicative FRP, it's not materially more expressive.

Impure Monadic FRP

Impure monadic FRP basically gives up on equational reasoning. In other words, the meaning of (max_dist_to_origin mouse_pos) depends on when you call it. Essentially, evaluating an expression that computes a history-sensitive signal should be thought of as an effect in that it returns different results depending on when you evaluate it.

Loosing equational reasoning is not necessarily the end of the world, but my experience programming in this style makes me think that it really is problematic. In particular, reasoning about when a computation was called in a dynamic dependency graph is really quite tricky and non-local, which can lead to programs whose semantics is difficult to predict.

Self-Adjusting Computations

Self-adjusting computations are what you get when you give up on history-sensitivity. In particular, SAC has no foldp operator. The full set of monadic operators, however, including join are in place, which means that you do have dynamism. This dynamism is quite valuable, it turns out. Among other things, it allows you to build a highly configurable computation that can respond to reconfiguration efficiently.

As you might expect, the lack of history-sensitivity makes SAC less suitable for writing user interfaces. Indeed, SAC was never intended for such applications; its original purpose was for building efficient on-line algorithms, i.e., algorithms that could be updated efficiently when the problem changes in a small way.

SAC is also easy to reason about, in that all an SAC computation is doing is incrementalizing an otherwise ordinary functional program. You get full equational reasoning as long as you avoid effects within your SAC computation.

History sensitivity for SAC

At Jane Street, we have our own SAC library called Incremental, which is used for a variety of different applications. In practice, however, a lot of our SAC applications do require some amount of history-sensitivity. The simplest and easiest approach to dealing with history within SAC is to create inputs that keep track of whatever history is important to your application. Then, the SAC computation can use that input without any complications.

Thus, if you there's an input whose minimum and maximum value you want to be able to depend on in your calculation, you simply set up calculations outside of the system that create new inputs that inject that historical information into your computation.

You can keep track of history in a more ad-hoc way by doing side-effects within the functions that are used to compute a given node. Thus, we could write a node that computes the maximum value of a given input using a reference, as follows.

  1. let max_of_signal i =
  2. let max = ref None in
  3. map i (fun x ->
  4. match !max with
  5. | None -> max := Some x; x
  6. | Some y ->
  7. let new_max = Float.max x y in
  8. max := new_max;
  9. new_max)

But this kind of trick is dangerous, particularly because of the optimizations that are implemented in Incremental and other SAC implementations. In particular, Incremental tries to avoid computing nodes whose values are not presently in use, and as such, signals that are deemed unnecessary are not kept up-to-date. Thus, if you create a signal by calling (max_of_signal s), and then keep it around but don't hook it into your final output, the computation will stop running and will thus stop receiving updates. Then, if you pull it back into your computation, it will have a value that reflects only part of the true history.

There are some tricks for dealing with this in Incremental. In particular, we have an operator called necessary_if_alive, which forces the node in question to remain alive even if it's not necessary at the moment. That helps, but there are still complicated corner cases. Our preferred approach to dealing with such cases is to statically declare the set of history-sensitive signals, and make sure that those are alive and necessary at all times.

Broader lessons

This is I think a theme in FRP systems: history is made tractable by limiting dynamism. From the work I've done with SAC systems, I think the usual approach in the FRP world is backwards: rather than start with a static system that supports history and extend it with increased dynamism, I suspect it's better to start with a highly dynamic system like SAC and carefully extend it with ways of statically adding history-sensitive computations. That said, it's hard to be confident about any of this, since this is a rather subtle area where no one has all the answers.

Async Parallel


Parallel is a library for spawning processes on a cluster of machines, and passing typed messages between them. The aim is to make using another processes as easy as possible. Parallel was built to take advantage of multicore computers in OCaml, which can't use threads for parallelism due to it's non reentrant runtime.


Parallel is built on top of Async, an OCaml library that provides cooperative concurrency. So what do we want an async interface to parallel computations to look like? (more…)

10 tips for writing comments (plus one more)

A few words about what we're striving for in our comments, particularly in Core. (more…)

A module type equivalence surprise

I usually think of two module types S1 and S2 as being equivalent if the following two functors type check:

  1. module F12 (M : S1) = (M : S2)
  2. module F21 (M : S2) = (M : S1)

And by equivalent, I mean indisinguishable -- one should be able to use S1 anywhere one uses S2, and vice versa, with exactly the same type checker and semantic behavior.

However, I found an an example today with two module types that are equivalent, both in my internal mental model of equivalence and in my formal definition that F12 and F21 type check, but that one can distinguish using module type of. Here are the module types:

  1. module type S1 = sig module N : sig type t end type u = N.t end
  3. module type S2 = sig
  4. type u
  5. module N : sig
  6. type t = u
  7. end
  8. end
  10. module F12 (M : S1) = (M : S2)
  11. module F21 (M : S2) = (M : S1)

And here is a context that distinguishes them: F1 type checks, but F2 does not:

  1. module F1 (A : S1) = struct
  2. module F (B : sig type t end) = (B : module type of A.N)
  3. end
  4. module F2 (A : S2) = struct
  5. module F (B : sig type t end) = (B : module type of A.N)
  6. end

What's going on is that in F1, module type of A.N decides to abstract t, because it doesn't have a definition. But in F2, module type of A.N does not abstract t, because it is defined to be u.

Since I thought of S1 and S2 as equivalent, I would have preferred that module type of not abstract t in both cases, and thus that both F1 and F2 be rejected. But I don't see anything unsound about what OCaml is doing.

RWO tidbits: the runtime

This is my favorite tweet about Real World OCaml.

It is indeed pretty rare for a language introduction to spend this much time on the runtime. One reason we included it in RWO is that OCaml's simple and efficient runtime is one of its real strengths - it makes OCaml simple to reason about from a performance perspective, and simple to use in a wide variety of contexts. Also, critically, the runtime is also simple enough to explain! (more…)

RWO tidbits: Benign effects

Now that Real World OCaml is out, I thought it would be fun to have a series of posts highlighting different interesting corners of the book.

Today: the section on benign effects.

Benign effects are uses of imperative programming that by and large preserve the functional nature of the code you write. Typically, benign effects are used to improve performance without changing the semantics of your code. I think there are a lot of fun and interesting ideas here, my favorite one being an elegant approach to dynamic programming, as exemplified by a simple implementation of a function for computing the edit distance between two strings.

And of course, if you enjoy the book, you can get a hardcopy on Amazon.

The making of Real World OCaml

It's taken a good long while, but Real World OCaml is finally done. You can read it for free at, or buy a hardcopy or an ebook version.

The idea for the book was born in a bar in Tokyo in 2011. After a talk-filled day at ICFP, a few of us, Ashish Agarwal and Marius Eriksen, Anil, and myself, went out drinking. We were all bellyaching over the lack of a high-quality OCaml book in English, and, emboldened by the Guinness and egged on by Ashish and Marius, Anil and I decided that we were just going to have to go ahead and write the book ourselves. We brought Jason into the project soon after.

From the very beginning, Anil and I had different visions for the book. From my perspective, it was a simple proposition: there was no good book available, and we were going to write one. The goal was to document the state of the art so as to make it more accessible.

Anil's vision was grander. He argued that writing a book is an opportunity to discover all of the things that are just too embarrassing to explain. And, once discovered, you needed to find a way to get all of those things fixed before the book was published. It was Anil's grander vision that we followed, and to good effect. It's not that we fixed the problems ourselves. We solved some problems directly, but by and large, we ended up getting help from those who were better positioned than us to fix the problems at hand.

Here are a few examples of pieces that came together over those two years.

  • OPAM. The core work on OPAM was done by Thomas Gazagnaire at OCamlPro (funded by Jane Street), with Anil collaborating closely. This is probably the biggest improvement of the bunch. It's hard to overstate how transformational OPAM is. Before OPAM, installing OCaml libraries was a complex chore. Now it's a joy.
  • The Core release process. We decided early on to base the book on Core, Jane Street's alternative to OCaml's standard library. But the release process around Core was too monolithic and too hard for external collaborators. We reorganized the process completely, moving to weekly releases to github, with the repos broken out into manageable components. Most of this work was done by Jeremie Dimino and Yury Sulsky at Jane Street.
  • Ctypes. Anil was in charge of writing the section on C bindings, and he decided that the standard way was just too painful to explain. That was the germ of ctypes, a library, written by Jeremy Yallop, that lets you build C bindings entirely from within OCaml, with a easy and simple interface.
  • Short paths. One of the real pains associated with working with Core is the heavy use that Core makes of the module system. OCaml's error messages, unfortunately, do not cope with this terribly well, leading to absurd types in error messages, so you might see a type rendered as Core.Std.Int.Map.Key.t Core.Std.Option.Monad_infix where it could just as well have been rendered as int option. We worked with Jacques Garrigue to implement a better heuristic for picking type names (suggested by Stephen Weeks, who implemented the same heuristic for Mlton). This is now available in the 4.01 version of the compiler, via the -short-paths patch.

This period also saw the founding of OCaml Labs, a lab at Cambridge University devoted to improving OCaml as a platform, with Anil at the head.

And we're not done. OCaml Labs and OCamlPro are still working on improving OPAM and building out a curated OCaml Platform that's built on top of OPAM. There's ongoing work on improving documentation generation so we can have better online API docs. Jacques Garrigue and Leo White are working on adding module aliases to OCaml which will greatly improve compilation time for libraries like Core that need to manage large namespaces.

And there are more projects that are improving OCaml and its surrounding infrastructure that have no direct connection to Real World OCaml, like:

  • Frederic Bour and Thomas Refis' work on Merlin, a tool for providing IDE-like tooling from within editors like VI and Emacs.
  • Mark Shinwell's work on improving GDB support for OCaml,
  • Fabrice le Fessant's work improving OCaml's interactions with performance tools like perf.
  • Work that Ashish Agarwal, Christophe Troestler, Esther Baruk, and now many others, have poured into improving the website.

Real World OCaml is of course a personal milestone for Jason, Anil and myself (and for Leo White, Jeremy Yallop and Stephen Weeks, who made some key contributions to the text). But viewed from a broader perspective, it's just one part of the increasing swirl of activity in the OCaml community.

OCaml has been a great language for a very long time. Now, we're growing a social and technological infrastructure around it to match.

Patch review vs diff review, revisited

I've been thinking about code review a lot recently.

Code review is a key part of our dev process, and has been from the beginning. From our perspective, code is more than a way of getting things done, it's a way of expressing your intent. Code that's easy to read and understand is likely to be more robust, more flexible, and critically, safer. And we care about safety a lot, for obvious reasons.

But the importance of code review doesn't mean that we've always done a good job of organizing it. I'll talk a bit more about how we used to do code review, how we do it now, and the impact that those changes have had.

The bad old world

Our old code review process was what you might call batch-oriented. We'd prepare a set of changes for a release, and then, after someone gave it a quick look-over, combine these changes together in a branch. We'd then read over these changes very carefully, with multiple people reading each file, making comments, requesting changes, and fixing those changes, until the code was in a releasable state.

This was a big and expensive process, involving many people, and quite a lot of work and coordination. Given the time it took, we focused our code review on our so-called critical path systems, i.e., the ones that are involved in sending orders to the market.

The management task was complex enough that we wrote a tool called cr for managing and tracking the reading of these diffs, parceling out responsibility for different files to different people. We've actually blogged about this before, here and here.

Batch-oriented review worked well when we and our codebase were smaller, but it did not scale. By combining multiple changes into a single branch, you were stuck reading a collection of unrelated changes, and the breadth of the changes made fitting it all into your head harder. Even worse, when you throw a bunch of changes together, some are going to take longer than others, so the release is blocked until the slowest change gets beaten into shape.

The end result is that, while we found code review to be indispensable in creating high quality code that we felt comfortable connecting to the markets, the overhead of that review kept on getting higher and higher as we grew.

We needed a better solution.

Feature-based review

Another approach to code review, and a more common one, is patch-based review. In patch review, someone proposes a change to the current release in the form of a patch, and it is the patch itself that is reviewed. Once it passes review, it can be applied to the tree. Patch-based review is great in that it gives you independence: one patch taking a while doesn't block other patches from getting out.

We avoided patch-based review initially because we were worried about the complexities of dealing with conflicting patches. Indeed, one issue with patch-based review is that the state of the tree when the patch is reviewed is likely not the state of the tree when the patch is applied. Even when this doesn't lead to textual conflicts, this should leave you a little nervous, since a patch that is correct against one version of the tree is not necessarily correct against a changed tree.

And then, what do you do when there's a conflict and the patch no longer applies cleanly? You can rebase the patch against the new state of the tree, and then re-review the patch from scratch. But humans are really bad at investing mental energy in boring work, and carefully re-reviewing a patch you've already mostly read is deadly dull.

Moreover, when do you decide that there's a conflict? When dealing with patches that involve file moves and renames, even deciding what it means for a patch written previously to still apply cleanly is a tricky question.

Also, squaring patch-based review with a git or hg-based workflow can be tricky. There's something quite nice about the github-style pull-request workflow; but the semantics of merging are pretty tricky, and you need to be careful that what you read corresponds with the actual changes that are made by the merge.

For all the problems, the virtues of patch-based review are clear, and so about six months ago we started a project to revamp our cr tool to make it suitable for doing patch-like review. The new version of cr is now organized around what we call features, which are essentially hg bookmarks (similar to git branches) augmented with some associated metadata. This metadata includes

  • An English description of the change
  • A base-revision that the changes should be read against
  • An owner
  • A collection (usually just one other than the owner) of full-feature reviewers.

The workflow for a developer goes something like this:

  • create a new feature by running cr feature create. You'll select a name for the feature and write the initial description. The base-revision will automatically be chosen as the most recent release.
  • Write the code, using hg in the ordinary way, making commits as you go and pushing the result to a shared, multi-headed repo that has all of the features people are working on.
  • When you think the code is in a good state, get the feature enabled for review. At this point, you'll need to get a full-feature reviewer selected. It's this person's job to read every change that goes into this feature.
  • The full feature reviewer then reads the diff from the base-revision to the tip, adding comments, requesting fixes, and reading diffs forward until they're happy, at which point, it's seconded.
  • Once it's seconded, the feature is enabled for review by anyone who is signed up for review for the specific files you touched. How many file reviewers you are depends on the nature of the project. In our most safety-critical systems, every file has three reviewers. In some other systems, there are no file reviewers at all.

The remaining work needs to be done by the release manager. A release manager can create a new release based on a set of features that:

  • are fully reviewed, and have no outstanding reviewer complaints to be resolved.
  • compile cleanly on their own and pass their automated tests
  • have as their base revision the previous release
  • can be merged together cleanly

Checking that things "can be merged together cleanly" is actually tricky, since you can't just trust hg's notion of a merge. cr has its own merge logic that is more conservative than what hg and git do. The biggest worry with hg is that it tries to guess at a reasonable base-point for the 3-way merge (usually the greatest common ancestor of the two heads to be merged). Usually this works well, but it's easy to construct crazy cases where on one branch you make changes that are just silently dropped in the merge. There is also some rather surprising behavior that can come into play when files are moved, copied or deleted as part of the mix.

cr, on the other hand, will always choose the base-point of the features to be merged as the base-point for the 3-way merge. This way, the diffs that are reviewed are also the diffs that are used for constructing the merged node. Also, cr has some extra sanity conditions on what merges it's willing to try. This all greatly reduces the scope for surprising results popping out the other side of a merge.

If the base-revision of a given feature is against an older release, then you need to rebase the review before it can be released, i.e., update the base-revision to the most recent release. Among other things requires you to merge your changes with tip. If there are conflicts, you then either need to review the resolution of the conflicts, or you simply need to reread the diffs from scratch. The last bit is pretty rare, but it's an important escape hatch.

How'd it go?

The new code-review process has had a dramatic effect on our world. The review process for our main repository used to take anywhere from a couple of weeks to a three months to complete a cycle. Today, those releases go out every week, like clockwork. Everyone knows that if they can get their feature cleaned up and ready, they can always get it out that week. Indeed, if you're following our open-source releases on github, you'll see that new packages have shown up once a week for the last 16 weeks.

Feature-baaed review has led to a significant increase in the rate of change of our critical-path systems. Code review is now considerably less painful, and most importantly, it's easier than ever to say no to a feature that isn't ready to go. In old-style batch review, there was a lot of pressure to not hold up a release polishing some small bit, which sometimes lead you to release code that wasn't really ready. Now, that problem has largely vanished.

The barrier to entry for people who want to contribute to critical path systems has also been lowered. This has also contributed to us being able to get projects out the door faster.

But the most striking result I've seen is from our post-trade group, which operates outside of the review process used for the critical-path systems. The post-trade team is responsible for our infrastructure that handles everything after a transaction is done, like tracking the clearing and settlement of our trades or managing our trading books and records.

Post-trade has historically had a more relaxed approach to code review --- they do it, but not on all parts of the system, and not in a particularly strict way. In the last few months, however, they switched over to using the new feature-based workflow, and even though they're doing a lot more code review (which takes serious time), their overall process has become faster and more efficient. We think that's largely do to having a well-managed workflow for managing and merging independent features, even without whatever the benefits of review itself.

I'm now pushing to get feature-based review adopted throughout the firm. Obviously, not all code needs to be scrutinized to the same level --- having three reviewers for every file is sometimes sensible, sometimes overkill --- but ensuring that no change can get in unless one other person reads it and thinks it's reasonable is a good baseline rule. Review has a lot of benefits: it improves the quality of the code, gives you a great opportunity for training, and helps spread knowledge. Those benefits make sense everywhere we have people programming.

Maybe the biggest lesson in this for me is the importance of thinking through your processes, focusing on the biggest bottlenecks, and doing what you can to fix them.