Self Adjusting DOM and Diffable Data

In my last post, I gave some simple examples showing how you could use self adjusting computations, or SAC, as embodied by our Incremental library, to incrementalize the computation of virtual dom nodes. In this post, I'd like to discuss how we can extend this approach to more realistic scales, and some of the extensions to Incremental itself that are required to get there.

Along the way, we'll discuss the way that diffable data structures relate to self adjusting computations, and how we can use ordinary pure data types and functional updates to those data types as the basis for building efficient incremental computations.

First, let's bring back the code of the example from that post, which is a simple component that displays the state of a machine in a datacenter.

  module Machine = struct
    module Model = struct
      type display_mode = Temperature | Uptime
      type t = { temperature: float;
                 uptime: Time.Span.t;
                 selected: bool;
                 mode: display_mode;
               }
      [@@deriving fields]
    end

    let view model =
      let mode = Incr.map model ~f:Model.mode in
      let temperature = Incr.map model ~f:Model.temperature in
      let selected = Incr.map model ~f:Model.selected in
      let uptime = Incr.map model ~f:Model.uptime in
      let text =
        match%bind mode with
        | Temperature ->
          let%map x = temperature in sprintf "%F degrees" x
        | Uptime ->
          let%map x = uptime in Time.Span.to_string x
      in
      let%map text = text and selected = selected in
      Vdom.text
        (if selected then [Vdom.class_ "selected"] else [])
        text
  end

It's worth mentioning that in the service of exercising the machinery, I've over-incrementalized this example. It would very likely be cheaper to just generate the necessary vdom in its entirety every time the model changes. That said, the above is still a good example of how to use Incremental, and it's worth understanding how it works.

The first thing view does is project out different parts of the model into separate incrementals. The key observation is that, even when model changes, if Model.mode model doesn't change, then the corresponding incremental will cut off the computation there, preventing dependent computations from refiring. So, whenever the model changes, we check each field of the model to see whether it changed before launching the dependent computations.

By default, this cutoff happens based on physical equality; but one can also specify more expansive semantic notions of equality when that's important.
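
For example, if small jitters in the temperature shouldn't count as changes, we could loosen the cutoff on the projected incremental. The following is just a sketch: it assumes Incremental exposes a set_cutoff function and a Cutoff.of_equal-style constructor, so check the mli of your version for the exact names.

  let temperature model =
    let temperature = Incr.map model ~f:Model.temperature in
    (* Treat changes smaller than 0.05 degrees as no change at all, so
       that downstream vdom nodes don't refire on negligible updates. *)
    Incr.set_cutoff temperature
      (Incr.Cutoff.of_equal (fun t1 t2 -> Float.abs (t1 -. t2) < 0.05));
    temperature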

Mapping over maps, incrementally

The projection approach works well for simple fixed-size records, but it breaks down for more complex models with variable-sized components. To see why, let's consider what happens if we add one more layer, an outer model consisting of a collection of named machines. We do this using OCaml's Map data structure, which is essentially a functional dictionary.

  module Model = struct
    type t = Machine.Model.t String.Map.t
  end

How do we construct an incremental view for this? We can no longer simply project out all of the individual machine models into a fixed set of incrementals, since the number of machines in the map is not known in advance.

What we need here is a way of efficiently mapping over an incremental map. (The naming collision between the Incr.map function and the Map.t data structure is an unfortunate source of confusion here.) Here's a signature that captures what we need:

  val Incr.Map.map
    :  ('k,'v,'cmp) Map.t Incr.t
    -> f:('v Incr.t -> 'w Incr.t)
    -> ('k,'w,'cmp) Map.t Incr.t

If you're not used to reading OCaml signatures, this can take a moment to digest. First, note that ('k,'v,'cmp) Map.t denotes a map whose keys have type 'k and whose data has type 'v. (You can ignore the 'cmp parameter.)

So, what does this function do? A good rule of thumb for Incremental is that if you want to understand the meaning of an incremental function (ignoring performance), you can just drop all the incremental bits. If we do that for the above type signature, we get this.

  val Map.map
    :  ('k,'v,'cmp) Map.t
    -> f:('v -> 'w)
    -> ('k,'w,'cmp) Map.t

This type tells us that the function is, as the name suggests, a map: a function that transforms the elements of a container, producing a new instance of that container.

That tells us what Incr.Map.map is supposed to compute, but what about the performance of that computation? In particular, what is the structure of the incremental graph this generates?

The following picture illustrates the desired incremental graph. The idea is that there is a path in the graph for every key/data pair in the map. When a given key is updated, an update should be passed down the corresponding incremental path. When a key is added or removed, then paths should be added or removed accordingly. And, rather than recomputing map' from scratch on every update, we want to do the corresponding update, insert or delete into map'.

The incremental graph generated by Incr.Map.map

This function is enough to build an efficient version of our view. In particular, we can write our view function as follows:

  let view (model : Model.t Incr.t) : Vdom.t Incr.t =
    let%map machine_vdoms = Incr.Map.map model ~f:Machine.view in
    Vdom.node "div" [] (Map.data machine_vdoms)

Note that at the end, we do need to construct a complete list of all the displayed machines, and that does have to be reconstructed in its entirety on every change. That's because the virtual dom API uses lists, which can't be updated incrementally in an efficient way. If the virtual dom were modified to take a tree-like structure for nodes, this could be handled efficiently.

That said, I don't think this is a serious practical problem, since in most UIs, the cost of allocating such nodes is a small part of the overall cost of reconstructing the virtual dom.

Diffable data and incrementality

In order to implement Incr.Map.map efficiently, we need some special support both from our Map datatype and from Incremental itself. From Map, we need a way to efficiently compute a set of updates, inserts and deletes that will take us from one map to another. Happily, the Map module in Core_kernel contains a function that can help.

  val Map.symmetric_diff
    :  ('k, 'v, 'cmp) Map.t
    -> ('k, 'v, 'cmp) Map.t
    -> data_equal:('v -> 'v -> bool)
    -> ('k * [ `Left of 'v | `Right of 'v | `Unequal of 'v * 'v ]) Sequence.t

Here, the Left case corresponds to data that's in the first map but not in the second, and so represents a deletion. Similarly, the Right case corresponds to an addition, and the Unequal case to an update of the value.

A key property of Map.symmetric_diff is that it takes advantage of physical equality of nodes by short-circuiting. This means that if you take a map, make some modifications to it, and then compute the symmetric diff between the original map and the new one you just created, the computational effort is proportional to the work that was required to generate the second map in the first place. In other words, you can efficiently read back the set of changes applied to a map.
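
To make the shape of the result concrete, here's a toy example with made-up values, written against the Core idioms of the era, where Map.add replaces any existing binding:

  let () =
    let m  = String.Map.of_alist_exn [ "a", 1; "b", 2 ] in
    let m' = Map.add (Map.remove m "b") ~key:"c" ~data:3 in
    (* Prints that "b" was removed and "c" was added. *)
    Map.symmetric_diff m m' ~data_equal:Int.equal
    |> Sequence.iter ~f:(fun (key, change) ->
      match change with
      | `Left _         -> printf "%s was removed\n" key
      | `Right _        -> printf "%s was added\n" key
      | `Unequal (_, _) -> printf "%s was changed\n" key)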

With Map.symmetric_diff, we have a way to efficiently determine what has changed in the map, and thereby which incrementals need to be updated, added or removed. This diffing is reminiscent of both the diffing inherent in the virtual dom approach, and the enter/update/exit pattern that characterizes the D3 visualization library.

Implementing Incr.Map.map

Map.symmetric_diff isn't quite enough to build Incr.Map.map. We also need some extra support from Incremental itself for controlling the pattern of updates, so that even though the node containing the input map has many dependent nodes, we only fire the nodes that are implicated by the symmetric diff. Indeed, we're still working on providing first-class support for this kind of primitive within Incremental.

Even without those hooks, we can still take some steps in the right direction, the first of which is to use mutable state to keep the old map to diff against. Essentially, we want a function with this signature:

  val Incr.diff_map
    : 'a Incr.t -> f:(old:('a * 'b) option -> 'a -> 'b) -> 'b Incr.t

Here, the function f gets access to the old input and old output of this same function when computing the new one. We can implement diff_map straightforwardly enough:

  let diff_map i ~f =
    let old = ref None in
    let%map a = i in
    let b = f ~old:!old a in
    old := Some (a,b);
    b

Using diff_map, we can implement a simpler form of Incr.Map.map which takes a function that transforms the elements of the map directly, rather than incrementally. It would have this signature:

  val Incr.Map.simple_map
    :  ('a, 'b, 'c) Map.t Incr.t
    -> f:('b -> 'd)
    -> ('a, 'd, 'c) Map.t Incr.t

And here's the implementation.

  let simple_map m ~f =
    diff_map m ~f:(fun ~old m ->
      match old with
      | None -> Map.map m ~f
      | Some (old_in, old_out) ->
        let diff = Map.symmetric_diff ~data_equal:phys_equal old_in m in
        Sequence.fold diff ~init:old_out ~f:(fun acc (key, change) ->
          match change with
          | `Left _ -> Map.remove acc key
          | `Right d | `Unequal (_, d) -> Map.add acc ~key ~data:(f d)))

The implementation is simple enough: when the map changes, we diff the old input and the new input. By folding over that diff, we can efficiently update the old output to produce the new output.

Implementing the full Incr.Map.map is more complicated: because f consumes and produces incrementals, you need to construct a more elaborate incremental graph --- effectively, you need to split the input map into per-key incrementals, transform them, and merge the results back together.

We can bridge the difference between Incr.Map.simple_map and Incr.Map.map with the following two functions.

  val Incr.Map.split
    : ('a,'b,'cmp) Map.t Incr.t -> ('a,'b Incr.t,'cmp) Map.t Incr.t

  val Incr.Map.join
    : ('a,'b Incr.t,'cmp) Map.t Incr.t -> ('a,'b,'cmp) Map.t Incr.t

Incr.Map.split takes an incremental map and produces an incremental map whose data is itself incremental. The idea is that the outer incremental updates only when a key is added or removed from the input map. Changes to the input map that change data of an existing key instead lead to an update to the corresponding inner incremental. Incr.Map.join is simply the inverse operation, taking an incremental map full of incrementals, and producing a simple incremental map.

Using these two functions together, we can convert our simple_map into a full implementation of Incr.Map.map, as follows.

  let map m ~f =
    Incr.Map.join (simple_map ~f (Incr.Map.split m))

While there are hacks to approximate them using Incr.map and Incr.bind, the ordinary incremental interface isn't enough to build a version of Incr.Map.join and Incr.Map.split with the right performance characteristics. For that reason, we've started work on an Expert interface within Incremental that lets you create incremental nodes with precise control over their dependencies and when they refire.

Why virtual-dom?

Almost everyone I've discussed these ideas with, both inside Jane Street and beyond, has asked the same question: given that you have Incremental, why bother with a virtual dom at all? After all, Incremental provides a mechanism for tracking which parts of the computation have changed. Why should one have two layers doing something so similar?

We've made things even more duplicative by diffing not just the virtual dom but also our own internal data structures as a way of creating more efficient incremental computations on top of ordinary functional data types.

But despite the seeming redundancy of this approach, I think it has real value. That's because, although the optimizations provided by Incremental are valuable, programming with Incremental is a worse experience than doing ordinary functional programming with immutable values. The big win of virtual dom is that it lets you do most of your programming in a typical functional style with reasonable performance. You only need to resort to using Incremental when the performance provided by this simple approach isn't sufficient.

But diffability isn't just useful on the output of your incremental computation. It's also helpful on the inputs, and that's where the diffability of Core_kernel's maps comes in handy. That way, we can write the code for updating our model in an ordinary functional style, but still use Incremental to efficiently scaffold our view functions on top.

Summing up

We have early versions of all of this infrastructure internally at this point. In particular, we've built fast versions of Incr.Map.map and friends, and used them to create some simple dynamic web applications. The results so far are pretty promising: by paying careful attention to how the input data changes and incrementalizing accordingly, we can build applications that display thousands of rows, with hundreds of cells changing every second, while still responding to every update within a handful of milliseconds.

I doubt our performance is as good as more widely used and likely better optimized frameworks like React and Mercury. But the thing that has struck me about all of this work is how easy and natural it has been. With only a small amount of effort, we've been able to leverage existing tools, from Incremental to Async to virtual-dom, to write highly responsive applications in a simple and natural style, and one that fits naturally within our existing OCaml libraries and tools.

Another interesting question is how this all relates to the use of FRP for UIs. I've written a bit about the difference between FRP and SAC, but the short version is that FRP is more about modeling temporal processes, and SAC is purely about optimization.

From my perspective, SAC fits more cleanly into the goal of simply rendering virtual-dom, and I can't see as much value in having time-oriented operations as a first-class construct. And the fact that SAC is focused purely on optimization allows it to do a better job of that. In particular, the dynamism of operations like bind makes it possible to precisely control the dependency structure of the system, and do things like quiesce parts of the UI that are not in view. This is harder to do in an FRP system without introducing memory leaks.

All of which makes me think that self adjusting computation is a useful idiom for UI programming, and perhaps one that should be adopted more widely.

Self Adjusting DOM

I've been thinking recently about how to structure dynamic web applications, and in particular about the role that incremental computation should play.

In this post, I'd like to outline an approach we've been experimenting with internally which uses Incremental, a general purpose library for building so-called self adjusting computations. Self adjusting computations are essentially graph-structured computations that can be updated efficiently when their inputs change.

I'll describe this all in terms of OCaml, which is the language we're doing these experiments in (courtesy of the excellent js_of_ocaml), but the ideas should be applicable to other languages.

Elm in OCaml

Let's start by outlining a simple, non-incremental approach, inspired by the Elm Architecture. In that architecture, there are three primary elements from which an application is composed: the model, the view and a set of actions.

The model represents the logical state of the application. This includes things like the data being displayed on the page, the current page being displayed, and the part of the page that's in focus. But it omits most presentation details, and doesn't specify the concrete choice of dom elements used.

Actions are the things that can happen to the model. The arrival of new data to be displayed, or the click of a mouse, are translated into actions which are in turn interpreted to update the state. These are important, but don't affect the incrementality story much, so we won't discuss them in detail.

The view is a function that takes the current model and produces a vdom (virtual dom) tree, an immutable datastructure that represents the intended state of the dom. It is the view function that makes the concrete presentation choices that are left unsaid by the model. The view determines some of the dynamic behavior of the application as well, since within the vdom you can register callbacks on keypresses or mouse clicks that enqueue actions to later be applied to the state.

We can wrap this up into a module signature, as follows.

  module type App_intf = sig
    module Model : sig
      type t
    end

    module Action : sig
      type t
      val apply : t -> Model.t -> Model.t
    end

    val view : Model.t -> (Action.t -> unit) -> Vdom.t
  end

Note that the second argument to view is a function that can be used for scheduling actions to occur. This function is used to build callbacks which are attached to the vdom.

We can combine this with a simple interface for starting up an application:

  val start_app
    :  (module App_intf with type Model.t = 'model)
    -> init:'model
    -> unit

which is responsible for running the display loop.
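
To make that concrete, here's a minimal sketch of the display loop that start_app might run. Vdom.render, Vdom.diff, Vdom.patch and every_animation_frame are hypothetical stand-ins for a virtual-dom binding and a browser scheduling hook, not the API of any particular library.

  let start_app (type model)
        (module App : App_intf with type Model.t = model) ~(init : model) =
    let model = ref init in
    let pending = Queue.create () in
    let schedule action = Queue.enqueue pending action in
    let vdom = ref (App.view !model schedule) in
    let dom = Vdom.render !vdom in
    every_animation_frame (fun () ->
      if not (Queue.is_empty pending) then begin
        (* Fold the pending actions into the model... *)
        Queue.iter pending ~f:(fun a -> model := App.Action.apply a !model);
        Queue.clear pending;
        (* ...then rebuild the vdom from scratch, diff it against the
           previous version, and patch the real dom. *)
        let vdom' = App.view !model schedule in
        Vdom.patch dom (Vdom.diff !vdom vdom');
        vdom := vdom'
      end)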

This isn't quite enough; in particular, I've omitted the necessary hooks for kicking off asynchronous processes for doing things like grabbing data from a server. But the model above is a pretty good start, and gives you a sense of the structure of an Elm-style application.

This approach isn't too bad from a performance perspective. In particular, start_app minimizes the amount of churn to the dom by, on every action, diffing the newly generated vdom against the previous instantiation to produce a minimal patch to apply to the dom proper. That matters because modifications to the dom itself are quite expensive.

But this approach doesn't scale to big, complex UIs. That's because, even though most actions change the model in only small ways, the full vdom tree has to be created every time. If the vdom is large and complicated, this is a serious problem.

Incremental Basics

Let's see how we can use Incremental to optimize generation of the vdom.

Imagine we want to display information about the state of a set of machines in our datacenter. For each machine, assume we have two pieces of data: the temperature, and whether that particular server is currently selected. Now, given incremental values representing the data for just one machine, how can we use Incremental to produce the vdom?

Incremental provides operators for building computations on top of incremental values. For example, the function Incr.map2 has this signature.

  val Incr.map2 : 'a Incr.t -> 'b Incr.t -> f:('a -> 'b -> 'c) -> 'c Incr.t

This lets us build a computation that takes two incremental inputs, and combines them to make a single new incremental. We can use this for generating an incremental vdom to display our machine.

  let view temperature selected =
    Incr.map2 temperature selected
      ~f:(fun temperature selected ->
        Vdom.text
          (if selected then [Vdom.class_ "selected"] else [])
          (sprintf "%F degrees" temperature))

We can write this a bit more naturally using the ppx_let syntax extension. Essentially, ppx_let allows us to encode maps using ordinary let binding syntax.

  let view temperature selected =
    let%map temperature = temperature and selected = selected in
    Vdom.text
      (if selected then [Vdom.class_ "selected"] else [])
      (sprintf "%F degrees" temperature)

The key point here is that the code regenerating the text node will only be rerun when necessary, i.e., when the value of either selected or temperature has changed. In a complex view with lots of incremental inputs and lots of vdom nodes built on top of them, judicious use of map allows you to recompute only the vdom nodes that are in need of an update.

Using bind

It turns out that map isn't enough. One limitation of map is that the dependencies it introduces are static: e.g., Incr.map2 a b ~f will produce a node that reruns f every time a or b changes.

But sometimes, you want dynamic rather than static dependencies. For a trivial example, imagine that our machine view has different inputs that might be used in different situations: say, there's a mode that determines whether uptime or temperature is displayed. We could implement such a view on top of map as follows.

  let view temperature uptime selected mode =
    let%map temperature = temperature
    and uptime = uptime
    and selected = selected
    and mode = mode
    in
    Vdom.text
      (if selected then [Vdom.class_ "selected"] else [])
      (match mode with
       | Temperature -> sprintf "%F degrees" temperature
       | Uptime -> Time.Span.to_string uptime)

Here, the appearance of the dom node is dynamic, but the dependencies are not. Even if you're in Temperature mode, you'll still recompute the vdom when the uptime changes.

On this small scale, the extra cost is trivial. But as you consider larger more complicated views, the ability to control dependencies precisely can have real value.

In this case, we can control the dependencies using bind. Here's the signature of bind:

  val bind : 'a Incr.t -> ('a -> 'b Incr.t) -> 'b Incr.t

The signature is deceptively simple, but it lets you do something powerful. In particular, Incr.bind i f returns an incremental whose dependencies, and even whose interior nodes, are chosen dynamically by f.

Here's a simple rewrite of the code above that takes advantage of bind.

  let view temperature uptime selected mode =
    let text =
      Incr.bind mode (function
        | Temperature ->
          let%map x = temperature in sprintf "%F degrees" x
        | Uptime ->
          let%map x = uptime in Time.Span.to_string x)
    in
    let%map text = text and selected = selected in
    Vdom.text
      (if selected then [Vdom.class_ "selected"] else [])
      text

Here, bind lets us create a text incremental that depends either on the temperature or on the uptime, depending on the mode. We can write this with our syntax extension, using its specialized match syntax.

  let view temperature uptime selected mode =
    let text =
      match%bind mode with
      | Temperature ->
        let%map x = temperature in sprintf "%F degrees" x
      | Uptime ->
        let%map x = uptime in Time.Span.to_string x
    in
    let%map text = text and selected = selected in
    Vdom.text
      (if selected then [Vdom.class_ "selected"] else [])
      text

One thing that's nice about ppx_let is that it makes it easier to separate thinking about what your code does from how it's incrementalized. If you ignore the %map and %bind annotations, what you're left with is enough to understand the meaning of the computation that's being incrementalized. The annotations are only important for understanding the performance characteristics of the incremental recomputation.

Decomposing incrementals

Here's an updated version of our App_intf which includes Incremental.

  module type App_intf = sig
    module Model : sig
      type t
    end

    module Action : sig
      type t
      val apply : t -> Model.t -> Model.t
    end

    val view : Model.t Incr.t -> (Action.t -> unit) -> Vdom.t Incr.t
  end

The only change is the view function, which instead of taking a Model.t and returning Vdom.t now takes a Model.t Incr.t and returns a Vdom.t Incr.t. And instead of calling this function on every update, we simply call it once at the beginning. The start_app function would be responsible for repeatedly updating the Model.t Incr.t as the model changes, and can then read off the new vdom node from the Vdom.t Incr.t.
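
Here's a sketch of that driver in the incremental world, assuming Incremental's variable and observer interface (Incr.Var.create, Incr.Var.watch, Incr.Var.set, Incr.Var.value, Incr.observe, Incr.stabilize, Incr.Observer.value_exn); every_animation_frame and apply_patch_to_dom are hypothetical stand-ins, as before.

  let start_app (type model)
        (module App : App_intf with type Model.t = model) ~(init : model) =
    let model_v = Incr.Var.create init in
    let pending = Queue.create () in
    let schedule action = Queue.enqueue pending action in
    (* view is called exactly once, on the incremental model. *)
    let vdom = Incr.observe (App.view (Incr.Var.watch model_v) schedule) in
    every_animation_frame (fun () ->
      Queue.iter pending ~f:(fun a ->
        Incr.Var.set model_v (App.Action.apply a (Incr.Var.value model_v)));
      Queue.clear pending;
      (* Stabilization propagates only the changed parts of the graph. *)
      Incr.stabilize ();
      apply_patch_to_dom (Incr.Observer.value_exn vdom))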

This all sounds good on the surface, but there's a fly in this ointment, which is that my earlier examples were built on the assumption that the different inputs to the computation were already broken down into separate incremental values. But here, we have one big incremental value containing the entire model. How do we apply Incremental effectively in this case?

Let's revisit our server-display example from before. Now, instead of assuming we have a different incremental for each property of a server, imagine we have one incremental representing the full state of one server. We can use the following as our model type:

  module Model = struct
    type display_mode = Temperature | Uptime
    type t = { temperature: float;
               uptime: Time.Span.t;
               selected: bool;
               mode: display_mode;
             }
    [@@deriving fields]
  end

The deriving annotation above provides us with accessor functions for each field, which will be useful shortly.

It's easy enough to write a view function that returns an incremental that recomputes in its entirety every time the model changes, as shown below.

  let view model =
    let%map model = model in
    let text =
      match model.mode with
      | Temperature ->
        sprintf "%F degrees" model.temperature
      | Uptime ->
        Time.Span.to_string model.uptime
    in
    Vdom.text
      (if model.selected then [Vdom.class_ "selected"] else [])
      text

And for this tiny example, that's likely the correct thing to do. But we'll want to incrementalize more precisely for real world examples, so let's see how we could do that in this case.

What we effectively need to do to incrementalize this is to convert our one big incremental model into a number of smaller incrementals. We can do that by projecting out individual components of the model using map.

For example, if we write:

  Incr.map model ~f:(fun m -> m.mode)

or, equivalently

  Incr.map model ~f:Model.mode

We'll get an incremental that contains only the mode. Critically, incrementals that depend on this one will only update when the mode actually changes, not when, say, the temperature is updated. That's because Incremental cuts off computations when the new output is physically equal to the old one: even if the model changes, each projected incremental will only propagate the computation if its data has changed.

We can use this approach to first project out the fields of the model record into different incrementals, and then build our computation on top of those. That looks like this:

  let view model =
    let mode = Incr.map model ~f:Model.mode in
    let temperature = Incr.map model ~f:Model.temperature in
    let selected = Incr.map model ~f:Model.selected in
    let uptime = Incr.map model ~f:Model.uptime in
    let text =
      match%bind mode with
      | Temperature ->
        let%map x = temperature in sprintf "%F degrees" x
      | Uptime ->
        let%map x = uptime in Time.Span.to_string x
    in
    let%map text = text and selected = selected in
    Vdom.text
      (if selected then [Vdom.class_ "selected"] else [])
      text

And this has basically the right incremental structure.

This is a good start, but we still don't really have the full story. In particular, the above approach to incrementalization makes sense when our overall data is organized as simple static structures like records. But it's not clear what to do when we have more complex data structures. For example, what if we had a collection of machines stored in a map or a set? We don't yet have a way of efficiently handling this kind of complex abstract data type using Incremental.

More on that in my next post.

Incremental computation and the web

I've recently been thinking about the world of JavaScript and web applications. That's odd for me, since I know almost nothing about the web. Indeed, Jane Street's use of web technologies is quite minimal -- nearly all of our user interfaces are text based, and all told we've been pretty happy with that.

But there are real limitations to console apps, and if you need something richer that's cross-platform, the web is pretty appealing. For us it's made yet more appealing by the fact that OCaml, our language of choice, compiles into JavaScript via js_of_ocaml.

So recently, when a few people internally got interested in trying out JavaScript-based UIs, I dug in a little to try to understand the landscape, and help us figure out what approach to take.

Virtual DOM, all at once

I started by trying to understand more about the approaches taken in the wider world for building JavaScript apps. One idea that struck me as particularly interesting was virtual DOM. Virtual DOM showed up first in Facebook's React, but has since inspired other implementations, like Matt Esch's virtual-dom library, which in turn is the basis of the Mercury web framework and Elm's Html library. Other libraries have wrapped and extended React, like ClojureScript's Om framework.

To understand the appeal of virtual DOM, you first need to understand what the world looks like without it. JavaScript applications in the browser are fundamentally tied to the DOM, which is the tree of objects that reflects the structure of the HTML that the page is built from. The DOM is wired into the browser, and mutating the DOM is how you change what's shown on the screen, a necessity for a dynamic web app.

But working directly with DOM can be awkward. For one thing, it encourages you to write your display logic twice: once to create the initial state of your page, and then again for the code that updates the DOM in response to external events.

But it's worse than just having to write your logic twice: the second time is also trickier. That's because, for performance reasons, you need to minimize changes to the DOM, since those changes can cause expensive reflows and the like. Doing this kind of change minimization by hand can be pretty painful.

The goal of virtual DOM is to let you write your display logic just once, and to do so in a convenient, straight-ahead style. The idea is simple: instead of modifying the DOM directly, you create immutable trees that represent the DOM you'd like to have, a virtual DOM. Every time your state changes, you recompute the virtual DOM, and then diff the new virtual DOM against the old. The resulting patch can then be applied to the actual DOM, which minimizes changes to the DOM proper.

This makes it easier to express your display logic cleanly. Elm uses virtual DOM as part of the Elm architecture. In that approach, the state of the application is kept in a model value, which abstractly represents the state of the application, omitting presentation details. A separate view function is used to convert the model into a virtual DOM tree that describes the page that should be shown to the user.

The application is made dynamic by adding an action type which summarizes the kinds of changes that can be made to the model. Actions are enqueued either from callbacks embedded in the virtual DOM, or by other jobs that communicate with the outside world via web requests and the like. When the action is applied to the model, a DOM update is done by recomputing the virtual DOM, and then diffing and patching the real DOM accordingly.

Virtual DOM, incrementally

As described, the above approach involves computing the virtual DOM from scratch after every action. This is done even if the change to the DOM implied by the action is small, which is the common case. Essentially, every key press and mouse click causes the entire virtual DOM to be recomputed.

In a world where DOM updates are the only expense that matters, this isn't so bad. And for sufficiently small web applications, that's almost right. But once you're creating large, dynamic UIs, this simple story falls apart, and the cost of recreating the virtual DOM every time matters.

That's why in all of these virtual DOM APIs and frameworks, there's some form of incrementalization built in, a way to avoid paying the full cost of rebuilding the virtual DOM when the logical changes are small.

In React, for example, the state is organized into a set of hierarchical components, each with its own render function. These components are structured to match the structure of the HTML that they generate, with the idea that you'll only have to re-render the few components whose input data has actually changed. React effectively memoizes the render computation at the component level.

Elm, rather than tying the incrementalization directly to a framework-level notion of component, lets you introduce memoization in the construction of individual virtual DOM nodes. To do this, Elm's Html module exposes a set of "lazy" functions with roughly these signatures (shown with OCaml syntax).

  val lazy1 : ('a -> Html.t) -> 'a -> Html.t
  val lazy2 : ('a -> 'b -> Html.t) -> 'a -> 'b -> Html.t
  val lazy3 : ('a -> 'b -> 'c -> Html.t) -> 'a -> 'b -> 'c -> Html.t

Here, the first argument is the render function, and the remaining arguments are the values to be passed to the render function.

The idea is that a call to one of these lazy functions won't call the render function immediately. Instead, it creates a special node that stores the render function and its arguments for later. The render function is only called as part of the process of diffing two virtual DOM trees. When the diff gets to the point of comparing two such nodes, it first compares the things that the node was built from, i.e., the render function and its arguments. If they're the same, then the diff is empty. If they differ, then the render function is run to compute more of the tree, and the diffing process continues from there.

It's worth noting that forcing the render function for a given node to run will create more of the virtual DOM tree, but it won't necessarily build everything below that node. In particular, the tree that's created may contain yet more lazy nodes, which won't be forced until the diffing process gets to them.

By making enough nodes lazy in this way, you can incrementalize the computation of the entire virtual dom tree, only forcing the recomputation of parts of the virtual dom that could have changed given the changes in the underlying data model.
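
For example, in a list view you might wrap each row in a lazy node. The following sketch is written in OCaml syntax against an Elm-style API; render_row and Html.ul are hypothetical stand-ins.

  (* A row's vdom is recomputed only if that row's data changed, since
     lazy1 compares the function and its argument before forcing. *)
  let view rows =
    Html.ul [] (List.map rows ~f:(fun row -> lazy1 render_row row))

Note that this only helps if render_row is the same, physically equal, function on every call, which foreshadows the modularity caveat below.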

Elm's approach has some limitations. While it doesn't limit memoization to a particular notion of a component, it does tie it to nodes in the DOM tree. This can be limiting, since it prevents you from sharing other parts of the computation that don't result concretely in DOM nodes.

It's also a little anti-modular, in that you basically need to call your lazy function on simple values and top-level functions, so ordinary functional programming modularization techniques, which often rely on passing around closures, don't work as well as you'd hope.

Beyond virtual DOM

Virtual DOM isn't the only approach to simplifying the process of programming the DOM. Another example I ran across is Mike Bostock's amazing D3 library. D3 has some of the same goals as virtual DOM, in that it aims to provide a nice way to construct complex web pages based on some more abstract data model. Like virtual DOM, D3's approach lets you specify the view logic once, while producing a view that responds efficiently to changing data. D3 is doing this in the service of data visualization, but the approach it takes is not limited to that domain.

Where virtual DOM encourages you to think of your view calculation as an all at once affair, D3 makes you think about incrementalization explicitly where it matters. In particular, when you specify how the view changes in response to data, you do so by explicitly specifying what happens in three cases: enter, update, and exit. The enter case corresponds to new data points arriving, update corresponds to data points that are changing, and exit corresponds to data being removed.

These transformations are specified using a spiffed-up version of the DOM selectors API, which lets you select a collection of nodes by stating conditions that those nodes satisfy. You can then specify ways of transforming those nodes, and, somewhat surprisingly, specify the creation of nodes that don't exist yet. This is done using the append operation, and is all part of what's called data binding in the D3 world.

If this sounds confusing, well, I found it confusing too. But the D3 approach has some good things going for it. For one thing, it gives you a natural way of thinking about animations, since you can specify simple animations to run on the enter/exit/update actions, which is more awkward in virtual DOM based approaches.

To borrow an analogy from circuits, virtual DOM is level-triggered, meaning the view depends only on the current value of the state; but D3 is edge-triggered, meaning that the display logic can depend on the precise transition that's occurring. This is a real difference in the models, but I'm not sure how important it is in practice.

To some degree, you can get around this issue on the virtual DOM side by expressing more time-dependent information in the model. Also, you can add edge-triggered events on top of your virtual DOM, which React does. That said, it's not as front-and-center in the Virtual DOM API as it is with D3, where edge-triggered animations are an easy and natural part of the design.

Incrementality everywhere

Given that incrementality seems to show up in one form or another in all of these web frameworks, it's rather striking how rarely it's talked about. Certainly, when discussing virtual DOM, people tend to focus on the simplicity of just blindly generating your virtual DOM and letting the diff algorithm sort out the problems. The subtleties of incrementalization are left as a footnote.

That's understandable, since for many applications you can get away without worrying about incrementalizing the computation of the virtual DOM. But it's worth paying attention to nonetheless, since more complex UIs need incrementalization, and the incrementalization strategy affects the design of a UI framework quite deeply.

The other benefit of thinking about incrementalization as a first class part of the design is it can lead you in new directions. In that vein, I've been experimenting with using self-adjusting computations, as embodied by our Incremental library, as another approach to incrementalizing computation of the virtual DOM.

Self-adjusting computation is a general-purpose approach to building efficient on-line computations, developed by Umut Acar in his dissertation. Thinking about Incremental in the context of GUI development has led us to some new ideas about how to build efficient JavaScript GUIs, and about how Incremental itself should work. I hope to write more about that in an upcoming post.

(You can check out the next post here.)

Thanks

Most of what I've written here comes from talking to people who know a lot more about the web than I do, and I wanted to thank them. I had some very enlightening conversations with Jordan Walke about React's design and history. I've also talked a bunch to Evan Czaplicki about Elm, and I'm indebted to Spiros Eliopoulos, who helped me learn a bit about D3, in part through his ocaml-d3 bindings. Also thanks to Hugo Heuzard and Izzy Meckler for writing a bunch of useful code and helping me learn about js_of_ocaml and more generally about various details of JavaScript and modern browsers.

Why OCaml?

Here's a talk I gave this past summer during our internship program about why we use OCaml. It spends a lot of time on how OCaml fits into the space of programming language designs, and why we think OCaml is in a real sweet spot in that design space, especially for the kind of work we do at Jane Street.

Warning: it's a very informal talk, with lots of questions and answers from the audience, not all of which are clearly audible, for which I apologize. Still, I hope people will get something out of it.

Testing with expectations

Testing is important, and it's hard to get people to do as much of it as they should. Testing tools matter because the smoother the process is, the more tests people will write.

Especially in the functional programming world, most of the talk about testing tools is focused on tools for property-based testing, like the various and sundry quickcheck-style systems. These are great, but sometimes, you don't want to write down properties --- what you want is to write your tests in terms of simple, concrete scenarios.

We've recently added support for what we call expect tests, a kind of test optimized for this style of testing. Expect tests allow you to write test scenarios without having to manually write out the output generated by the code you're testing. Instead, that output is captured and recorded automatically for you, in a way that makes it easy to integrate into the source of the test.

Our expect tests were inspired by Mercurial's unified test format. Unified tests are designed for testing command-line tools like hg, and so are specialized to the shell.

Here's an example using cram, an implementation of the unified test idea that is independent of Mercurial. Let's say we want to test the UNIX sort command. We might start by writing a test file, simple.t, that looks like this.

Dump some lines into a file

  $ cat > colors << HERE
  > red
  > yellow
  > green
  > HERE

sort the file and dump to stdout

  $ sort colors

If you then run cram on this, cram will show you that the test failed by showing you a diff.

expect-test $ cram simple.t
!
--- simple.t
+++ simple.t.err
@@ -10,5 +10,11 @@
 sort the file and dump to stdout

   $ sort colors
+  green
+  red
+  yellow

It also creates a new file, simple.t.err, which contains the output of running the script, intermixed with the script itself. You can accept the new version just by moving the err file over the original.

mv simple.t.err simple.t

If you run cram now, you'll see that the tests pass.

$ cram simple.t
.
# Ran 1 tests, 0 skipped, 0 failed.

If we break the tests somehow, then the diff will show us exactly what failed. For example, if we replace sort with cat, here's what Cram will show us:

bash-3.2$ cram simple.t 
!
--- simple.t
+++ simple.t.err
@@ -10,7 +10,7 @@
 sort the file and dump to stdout

   $ cat colors
-  green
   red
   yellow
+  green

# Ran 1 tests, 0 skipped, 1 failed.

Note how good the diff is for seeing how your test failed.

Starting with the development of Iron last year, we started using cram tests pretty extensively for command-line programs. We found it to be a very productive idiom, but it's pretty awkward to apply outside of the command-line domain. That's why we started thinking about how to get the benefits of cram, but in OCaml.

Breaking out of the shell

Unified tests are great for three reasons:

  • they let you combine the scenario and the output of that scenario (and comments) into one readable file
  • they help you construct the file automatically
  • they display test failures as easy-to-interpret diffs.

None of these advantages is tied to using the shell. To bring this to OCaml, though, we needed to figure out a reasonable way of embedding these tests in an OCaml program, without breaking all of the tooling. We did this by leveraging OCaml's annotations, which let us get the data we need in place without breaking from the ordinary syntax of an OCaml program. That means that tools like merlin and ocp-indent and editor modes like tuareg will work without incident.

We can write the OCaml analogue of our cram test by creating the following file, named simple.ml.

  open Core.Std

  let%expect_test "simple sort" =
    let sorted = List.sort ~cmp:String.compare ["red";"yellow";"green"] in
    [%sexp_of: string list] sorted |> Sexp.to_string_hum |> print_endline;
    [%expect {| |}]

Here, the let%expect_test introduces a new test, and registers it with our inline test framework. [%expect {| |}] introduces a section where output is captured, and multiple such declarations can go in a single test.
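
For example, a single test can interleave prints and captures, with each [%expect] consuming the output produced since the previous one. A small sketch:

  let%expect_test "multiple captures" =
    print_endline "first";
    [%expect {| first |}];
    print_endline "second";
    [%expect {| second |}]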

Since we haven't actually filled in the output, running the test will fail. Here's the diff it would show.

open Core.Std

  let%expect_test "simple sort" =
    let sorted = List.sort ~cmp:String.compare ["red";"yellow";"green"] in
    [%sexp_of: string list] sorted |> Sexp.to_string_hum |> print_endline;
-   [%expect {| |}]
+   [%expect {| (green red yellow) |}]

As with cram, a new file will have been generated, in this case called simple.ml.corrected, containing the updated test file. As with cram, you can accept the new test results by just copying the generated file over the original.

The above example is simple, but expect tests really shine when you start doing bigger and more complicated scenarios. And the ability to do this in ordinary OCaml code means you can use it for a much wider set of applications.

The source hasn't been released yet, but it will come out as part of our ordinary public release process, and we hope others will give it a spin when it does come out.

Quickcheck for Core

Automated testing is a powerful tool for finding bugs and specifying correctness properties of code. Haskell's Quickcheck library is the most well-known automated testing library, based on over 15 years of research into how to write property-based tests, generate useful sources of inputs, and report manageable counterexamples. Jane Street's Core library has not had anything comparable up until now; version 113.00 of Core finally has a version of Quickcheck, integrating automated testing with our other facilities, like s-expression reporting for counterexample values, and support for asynchronous tests using Async.


Motivation

There are at least two other Quickcheck implementations on OPAM. Why write implementation N+1?

Before working at Jane Street, I did some research relating to randomized testing (Eastlund 2009, Klein et al. 2012). In both cases, users of the software packages involved found it easy to write tests that use random data, but hard to write random distributions that actually produce values that are useful to test.

This Quickcheck clone started as an attempt to address those concerns by building tools for tuning random distributions of values. One way I've done that is by building in tools for "remembering" sets of chosen values; for example, random tests won't repeat values, and it is easy to build distributions that generate sets or sorted lists and maintain uniqueness. This design is still in its early stages; I don't know if these are the sorts of tools that will be needed most, and I'm eager to get feedback from more users. The library has also evolved to integrate Quickcheck-style testing with the Jane Street Core and Async libraries. Over time I hope this will also produce useful random distributions for more of the Core types, so they will be easily testable.


Example

For those unfamiliar with Haskell's Quickcheck, the library allows users to write tests of some property of a function, then test that property on many automatically-generated input values. For example, we might want to test that the optimized implementation of list-append in Core is associative:

  1. TEST_UNIT "associativity of list append" =
  2. Quickcheck.test Quickcheck.Generator.(tuple3 (list int) (list int) (list int))
  3. ~sexp_of:<:sexp_of>
  4. ~f:(fun (xs, ys, zs) ->
  5. <:test_eq>
  6. (List.append xs (List.append ys zs))
  7. (List.append (List.append xs ys) zs))

The test above randomly generates three lists of integers, appends them together two different ways, and tests that the results are equal. This process is repeated with different randomly chosen lists each time, until an error is reported or the default trial count is reached. Let's break down the parts of the code here.

  • TEST_UNIT is camlp4 syntax for unit tests.
  • Quickcheck.test is the main entry point for running a test using Quickcheck. It takes two required arguments. The first is a generator, specifying the probability distribution of values to choose from when generating inputs for the test. The second is a function that consumes the generated input values and runs a test. The function returns () if successful and raises an exception otherwise.
  • Quickcheck.Generator.(tuple3 (list int) (list int) (list int)) constructs the generator that we want to use here. Most of the functions in the Quickcheck.Generator module are named after the types they generate; here, default probability distributions for lists of ints, combined using tuple3.
  • We provide the optional named argument ~sexp_of to Quickcheck.test. This argument is used to render the first generated value that triggers an error. The <:sexp_of< ... >> expression is camlp4 syntax for the default sexp conversion for a type.
  • The final argument to Quickcheck.test is a function that takes the tuples of lists produced by our generator, appends them two different ways, and compares the output. <:test_eq< ... >> is camlp4 syntax for an equality test.

The example above uses the s-expression conversions and camlp4 syntax extensions that are common in Jane Street's libraries, but these things are not necessary for using Quickcheck. Quickcheck.test just needs a generator built from the functions in Quickcheck.Generator and a function that raises an exception on failure, and it will return () if successful or raise an exception describing the nature of the failure if not.


Generators

The primary data structure used by Quickcheck is the generator, or 'a Quickcheck.Generator.t. This corresponds to an implementation of the Arbitrary type class in Haskell's Quickcheck. Primarily, a generator represents a random distribution of values of type 'a, although in our implementation there is a little more metadata besides that under the hood. The Quickcheck.Generator module provides default distributions of several types, and tools for creating more distributions or customizing the existing ones.

In our example above, we generated three lists of integers using the following expression.

  Quickcheck.Generator.(tuple3 (list int) (list int) (list int))

Looking at the implementation of Core.Std.List.append, we can see that the implementation works in chunks of 5 elements, and changes behavior after 1000 chunks. So we might want to change our generator to make sure we get lists of the lengths we want to test.

  let open Quickcheck.Generator in
  let list_int = list int ~length:(`Between_inclusive (4900,5100)) in
  tuple3 list_int list_int list_int

Some experimentation might show us that this still doesn't hit the list lengths we want as often as we want. The Quickcheck.Generator.int_between function, however, is documented as stressing boundary conditions, so we should be able to use it to get values at the upper and lower ends of the range we want. Here, it helps us that generators form a monad: if we combine generators using monadic bind, we get a weighted composition of their probability distributions. We can use that to first generate lengths for our lists, then use those randomly-generated lengths to build generators for the lists themselves.

  let open Quickcheck.Generator in
  let list_int =
    int_between ~lower_bound:(Incl 4900) ~upper_bound:(Incl 5100)
    >>= fun len ->
    list int ~length:(`Exactly len)
  in
  tuple3 list_int list_int list_int

Now we have a generator for three lists of integers, each list with a length between 4900 and 5100 inclusive, weighted toward the ends of that range. This is probably sufficient for our purposes. But if we want to go even further, deciding we need a very specific probability distribution, we can build one from scratch ourselves. Here is a rather idiosyncratic example that demonstrates the tools available in Quickcheck.Generator.

  let open Quickcheck.Generator in
  let rec ranges_of_five_between lower upper =
    if upper - lower < 10
    then of_list (List.range lower upper ~start:`inclusive ~stop:`inclusive)
    else
      weighted_union
        [ 5., singleton (lower + 0)
        ; 4., singleton (lower + 1)
        ; 3., singleton (lower + 2)
        ; 2., singleton (lower + 3)
        ; 1., singleton (lower + 4)
        ; 1., of_fun (fun () -> ranges_of_five_between (lower + 5) (upper - 5))
        ; 1., singleton (upper - 4)
        ; 2., singleton (upper - 3)
        ; 3., singleton (upper - 2)
        ; 4., singleton (upper - 1)
        ; 5., singleton (upper - 0)
        ]
  in
  let list_int =
    ranges_of_five_between 4900 5100
    >>= fun len ->
    list int ~length:(`Exactly len)
  in
  tuple3 list_int list_int list_int

This example uses a few more functions from Quickcheck.Generator. The of_list function takes a list of values and produces a generator that makes a uniform choice among them. weighted_union creates a probability distribution representing a weighted choice among the probability distributions of the associated sub-generators. singleton produces constant-valued generators, and of_fun produces a lazily-evaluated (but not memoized) generator. (Memoizing during random testing causes some unfortunate space leaks; it is important to be able to release resources after a batch of tests.) While this peculiar generator is probably not of practical use, it shows that when we need to, we can dig down into the interface and build whatever probability distribution we want.

Of course, it is also useful to construct generators for new types.

  type bst = Leaf | Node of bst * int * bst

  let gen : bst Quickcheck.Generator.t =
    let open Quickcheck.Generator in
    recursive (fun self ->
      let node =
        self >>= fun l ->
        int  >>= fun k ->
        self >>| fun r ->
        Node (l, k, r)
      in
      union [ singleton Leaf; node ])

The function Quickcheck.Generator.recursive is a fixed-point generator that helps build simply-recursive generators that need to invoke themselves and don't have additional arguments. The function union is like weighted_union, but for uniform choice.


Observers

In Haskell's Quickcheck, there is a duality between the type class Arbitrary for generating random values and Coarbitrary for observing inputs to random functions. Our version of Quickcheck mirrors Generator with Observer. Most tests using Quickcheck do not need an observer, but if you want to generate a random input for a higher-order function, you will need an observer for the function's input type.

  1. TEST_UNIT "function composition" =
  2. let open Quickcheck.Generator in
  3. Quickcheck.test
  4. (tuple3
  5. (fn Quickcheck.Observer.int char)
  6. (fn Quickcheck.Observer.string int)
  7. string)
  8. ~f:(fun (f, g, x) ->
  9. <:test_eq< char >>
  10. ((Fn.compose f g) x)
  11. (f (g x)))

Here, Quickcheck.Generator.fn creates a generator for functions. It takes two arguments: an observer for the function's input type and a generator for the function's output type.

Think of an observer as a "generator of decision trees". For instance, Quickcheck.Observer.int might randomly generate any of the following decision trees:

?

 x>5
 / \
?   ?

  x>4
  / \
 x>2 ?
 / \
?   ?

These decision trees control how a randomly-generated function will use its input. The generator for the function's output is used to fill each of the ?s in with a concrete value. The result is a family of functions operating on the appropriate types, making randomly-chosen observations on the input and producing randomly-chosen outputs based on those observations.
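
For instance, the x>5 tree above, with its ?s filled in from a char generator, corresponds to a randomly generated function along these lines:

  (* The outputs 'q' and 'z' stand in for whatever the char generator
     happened to produce at each leaf. *)
  fun x -> if x > 5 then 'q' else 'z'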

If you need to build an observer for a custom type, there are tools for that as well.

  type bst = Leaf | Node of bst * int * bst

  let obs : bst Quickcheck.Observer.t =
    let open Quickcheck.Observer in
    recursive (fun self ->
      unmap (variant2 unit (tuple3 self int self))
        ~f:(function
          | Leaf -> `A ()
          | Node (l, k, r) -> `B (l, k, r))
        ~f_sexp:(fun () -> Atom "variant2_of_bst"))

As with generators, there is a fixed point function Quickcheck.Observer.recursive that helps for simply-recursive types. The function unmap transforms an input of some new type into an input for which we already have an observer. Variant types can be transformed to polymorphic variants, which have default observers variant2 through variant6. Records and constructor arguments can be transformed to tuples, which have default observers tuple2 through tuple6.
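
Records work the same way. Here's a sketch for a hypothetical two-field record, unmapped to a tuple in the same style as the bst observer above:

  type point = { x : int; y : int }

  let point_obs : point Quickcheck.Observer.t =
    let open Quickcheck.Observer in
    unmap (tuple2 int int)
      ~f:(fun { x; y } -> (x, y))
      ~f_sexp:(fun () -> Atom "tuple2_of_point")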


Work in Progress

Our OCaml adaptation of Quickcheck is new and still evolving. We already have some changes to the library internally which will be released over time, such as moving default generators and observers out of the Quickcheck module and into the modules for each type. For example, Quickcheck.Generator.int becomes Int.gen.
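
With that change, the generator from our first example would presumably read along these lines (a sketch, assuming a List.gen that takes an element generator):

  Quickcheck.Generator.tuple3 (List.gen Int.gen) (List.gen Int.gen) (List.gen Int.gen)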

There are still some pragmatic lessons to learn about how best to use our formulation of the library, how to calibrate our default distributions, and what other distributions we might want to provide. As always, we hope to get feedback from anyone who tries out this library so that we can improve it.


Happy testing!

rsync rounds timestamps to the nearest second

I'm not sure how I've managed to use rsync for so many years without ever noticing this, but hey, you learn something new every day!

[cperl@localhost ~]$ rpm -q --qf '%{name}-%{version}-%{release}\n' rsync
rsync-3.0.6-12.el6

[cperl@localhost ~]$ touch foo
[cperl@localhost ~]$ stat --format "%y" foo
2015-09-24 14:07:05.349357260 -0400
 
[cperl@localhost ~]$ rsync -a foo bar
[cperl@localhost ~]$ stat --format "%y" bar
2015-09-24 14:07:05.000000000 -0400
 
[cperl@localhost ~]$ cp -a foo baz
[cperl@localhost ~]$ stat --format "%y" baz
2015-09-24 14:07:05.349357260 -0400

No (functional) experience required

Jane Street is a serious functional programming shop. We use OCaml, a statically typed functional language, for almost everything, and we have what is probably the largest OCaml codebase anywhere.

This leads lots of people to think that they shouldn't even bother applying, under the assumption that we are only interested in hiring deep functional programming gurus. I think people come to this conclusion in part because they think of functional languages, especially those with fancy type systems, as arcane tools that can only be used effectively after deep study.

To the contrary, one of the reasons we started building production systems with OCaml was that it was relatively easy to understand, even for people with no formal CS background. Since then, we've had good experiences taking students with no functional experience at all and getting them to the point of being able to complete a project in just a few weeks. We also have a very successful "OCaml Bootcamp" program, where, over four weeks, we train all of the incoming traders and many other non-developer hires on OCaml and our development tools and libraries. By the end, most of them are able to create useful applications.

All of this is to say that we don't go out of our way to hire people who are already familiar with functional programming. In practice, it's just not that hard for strong programmers to pick it up after they start.

That said, a solid majority of the developers we hire do come in with functional programming experience --- but that's because of their preferences, not ours. Programmers with an interest in functional languages have an extra reason to want to work here, and so we get an unusually high number of good applicants from that pool.

There's a more general lesson here: using well-loved tools is a good way of attracting (and retaining) great developers.

Introducing Incremental

I'm pleased to announce the release of Incremental (well-commented mli here), a powerful library for building self-adjusting computations, i.e., computations that can be updated efficiently when their inputs change.

At its simplest, you can think of a self-adjusting computation as a fancy spreadsheet. In a spreadsheet, each cell contains either simple data, or an equation that describes how the value in this cell should be derived from values in other cells. Collectively, this amounts to a graph-structured computation, and one of the critical optimizations in Excel is that when some of the cells change, Excel only recomputes the parts of the graph that depend on those changed cells.

What makes self-adjusting computation (or SAC) different from a spreadsheet is its dynamism. The structure of the computational graph in an SAC can change at runtime, in response to the changing input data.

This dynamism gives you quite a bit of flexibility, which can be used in different ways. Here are a few.

On-line combinatorial algorithms. Incremental is based on the work of Umut Acar et al. on self-adjusting computations (that's where the term comes from), and there, the main interest was in building efficient on-line versions of various combinatorial algorithms. In many cases, they could match the asymptotic complexity of custom on-line algorithms by fairly simple incrementalizations of all-at-once algorithms.

Incremental GUI construction. One simple and natural way to model a GUI application is to structure your display as a function that generates a view from some more abstract data model.

Having a function that constructs a view from scratch at every iteration is simple, but prohibitively expensive. But if you can write this function so that it produces an incrementalized computation, then you have a solution that is both simple and efficient. We've used this technique in a number of our UIs, to good effect.

This might remind you of how functional reactive programming (FRP) is used for construction of GUIs in languages like Elm. SAC and FRP have different semantics --- FRP is mostly concerned with time-like computations, and SAC is mostly about optimizing DAG-structured computations --- but they are nonetheless closely related, especially at the implementation level. You can see my post here for a description of the broader conceptual landscape that includes both FRP and SAC.

Configurable computations. An example that comes from our own work is risk calculations. Calculating measures of risk of a portfolio involves combining data from a complex collection of interdependent models. Each of these models is dependent both on live data from the markets, and on configurations determined by users.

A config change could merely tweak a coefficient, or it could change the overall structure of the computation, say by changing the list of factors used by a given model. Incremental allows you to build a computation that can update efficiently in response to both simple data changes as well as more structural config changes, in one unified framework.

A taste of Incremental

It's hard to give a compelling example of Incremental in action in just a few lines of code, because what makes Incremental really useful is how it helps you build large and complex computations. Nonetheless, small examples can give you a sense of how the library works.

To that end, let's walk through a few small examples. To begin, we need to instantiate the Incremental functor.

open Core.Std
module Inc = Incremental_lib.Incremental.Make ()

Each instance thus generated is its own independent computational world. The Incremental functor is generative, meaning it mints fresh types each time it's applied, which prevents values from different incremental worlds from being mixed accidentally.
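
As a hypothetical illustration (module names ours), instantiating the functor twice yields types that cannot be mixed:

module Inc_a = Incremental_lib.Incremental.Make ()
module Inc_b = Incremental_lib.Incremental.Make ()

let x = Inc_a.Var.watch (Inc_a.Var.create 1)
(* [Inc_b.map x ~f:succ] would be rejected by the type-checker:
   [x] has type [int Inc_a.t], not [int Inc_b.t]. *)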

An Incremental computation always starts at its variables. Modifications to variables are how updates to input data are communicated to Incremental.

Let's write down a few variables corresponding to the dimensions of a rectangular prism.

module Var = Inc.Var

(* dimensions of a rectangular prism *)
let width_v  = Var.create 3.
let depth_v  = Var.create 5.
let height_v = Var.create 4.

We can use Var.watch to get the (trivial) incremental computations associated with each variable.

let width  = Var.watch width_v
let depth  = Var.watch depth_v
let height = Var.watch height_v

The following is an incremental computation of the base area of the prism, and of the volume.

let base_area =
  Inc.map2 width depth ~f:( *. )
let volume =
  Inc.map2 base_area height ~f:( *. )

In order to get information out of an incremental computation, we need to explicitly mark which nodes we want data from by creating observer nodes. Because it knows which nodes are observed, the framework can track what parts of the computation are still necessary to the results.

let base_area_obs = Inc.observe base_area
let volume_obs = Inc.observe volume

In order to force the computation to run, we need to explicitly call Inc.stabilize. Here's some code that uses stabilize to run the computation and then gets the information from the observers.

let () =
  let v = Inc.Observer.value_exn in
  let display s =
    printf "%20s: base area: %F; volume: %F\n"
      s (v base_area_obs) (v volume_obs)
  in
  Inc.stabilize ();
  display "1st stabilize";
  Var.set height_v 10.;
  display "after set height";
  Inc.stabilize ();
  display "2nd stabilize"

If we run this, we'll see the following output:

       1st stabilize: base area: 15.; volume: 60.
    after set height: base area: 15.; volume: 60.
       2nd stabilize: base area: 15.; volume: 150.

Note that setting the height isn't enough to change the observed values; we need a stabilization to make that happen.

That's a fairly trivial computation, and there certainly isn't much to incrementalize. Let's try something a little more complicated: a function for merging together an array of incrementals, using some commutative and associative operator like addition or max.

let rec merge ar ~f =
  if Array.length ar <= 1 then ar.(0)
  else
    let len = Array.length ar in
    let len' = len / 2 + len % 2 in
    let ar' =
      Array.init len' ~f:(fun i ->
        if i * 2 + 1 >= len then ar.(i*2)
        else Inc.map2 ar.(i*2) ar.(i*2+1) ~f)
    in
    merge ar' ~f

Because this is done using a binary tree as the dependency graph, the cost of updating a single element is O(log n), where n is the size of the array. We can use this, for example, to compute an average:

let average ar =
  let sum = merge ar ~f:(+.) in
  Inc.map sum ~f:(fun s -> s /. float (Array.length ar))

This works, but we can do better, performance-wise at least, if our merge operation has an inverse. In that case, the sum can in principle be maintained in constant time, by removing the old value before adding in the new one. Incremental has a function for taking advantage of this structure.

let sum ar =
  Inc.unordered_array_fold ~f:(+.) ~f_inverse:(-.) ar

Now, let's say we want to do something a little more dynamic: in particular, what if we want to compute the average of a prefix of the given array? For that, we need to use the bind function, which allows us to produce new incremental nodes within an incremental computation.

let average_of_prefix ar length =
  Inc.bind length (fun length ->
    average (Array.init length ~f:(fun i -> ar.(i))))

The type of this function is float Inc.t array -> int Inc.t -> float Inc.t, so the length of the prefix is a fully fledged part of the incremental computation. As a result, the dependency structure of this computation changes dynamically, e.g., if the value of length is 7, then the computation only depends on length and the first 7 elements of the array.
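
Here's a sketch of average_of_prefix in action; the variable names are ours.

let cells = Array.init 10 ~f:(fun i -> Var.create (float i))
let length_v = Var.create 3

let avg_obs =
  Inc.observe
    (average_of_prefix (Array.map cells ~f:Var.watch) (Var.watch length_v))

let () =
  Inc.stabilize ();
  printf "%F\n" (Inc.Observer.value_exn avg_obs);  (* 1., i.e. (0.+1.+2.)/3. *)
  Var.set length_v 7;
  Inc.stabilize ();
  printf "%F\n" (Inc.Observer.value_exn avg_obs)   (* 3., i.e. (0.+...+6.)/7. *)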

Hopefully this gives you enough of a sense of what Incremental is about to start thinking about where it might be useful for you. Note that the overhead of Incremental is not inconsiderable --- on my laptop, firing a single node takes on the order of 30ns, which is far more than, say, summing numbers together. Incremental tends to be useful when the computation that is put into a single node is large relative to that overhead, or when the computational graph is large relative to the sub-graph that needs to be recomputed. Our experience has been that there are plenty of applications in this space that can benefit from Incremental.

Converting a code base from camlp4 to ppx

As with many projects in the OCaml world, at Jane Street we have been working on migrating from camlp4 to ppx. After having developed equivalent ppx rewriters for our camlp4 syntax extensions, the last step is to actually translate the source code of all our libraries and applications from the camlp4 syntax to the standard OCaml syntax with extension points and attributes.

For instance to translate code using pa_ounit and pa_test, we have to rewrite:

TEST = <:test_result< int >> ~expect:42 (f x)

to:

let%test _ = [%test_result: int] ~expect:42 (f x)

For small to medium projects, it is enough to take a couple of hours and translate the source code by hand. But at Jane Street, where we have a huge OCaml code base making extensive use of camlp4, that is simply not realistic. So we needed a tool to do it for us.

Writing a tool to automatically convert the syntax

Since the output of such a tool has to be accepted as the new source code that is committed to our repository, it must preserve the layout of the original file as much as possible and, of course, keep the comments. This means that any approach based on an AST pretty-printer would be extremely complex.

The path we chose is to textually replace the foreign syntaxes in the original file with the new ones. One could imagine doing that with a tool such as sed, awk, or perl; however, doing it properly would be tedious, and it would be pretty much impossible to be 100% sure it would never translate things it is not supposed to. Plus, writing perl is not as fun as writing OCaml programs.

Instead, there is an easy way to find the foreign syntaxes: using camlp4 itself. To substitute the text of a foreign syntax, the only thing we need to know is its location in the original file, and camlp4 can help us with that.

Writing dummy camlp4 syntax extensions

The idea is to write, for each camlp4 syntax extension, a dummy one that defines the same grammar productions as the real one, but instead of generating code simply records substitutions at certain locations.

Then we do the following:

  • parse a file with camlp4 and our dummy syntax extensions
  • apply all the recorded substitutions to the original file

This approach has the advantage of interpreting the original file in exactly the same way as our regular syntax extensions do, giving us good confidence that we did not change any syntactic constructions by mistake.

To do so we define this API:

(** [replace loc repl] records a text substitution that replaces the
    portion of text pointed by [loc] by [repl]. *)
val replace : Loc.t -> string -> unit
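
The application step is then just string surgery. Here's a sketch of how it might look, assuming substitutions are collected as (Loc.t * string) pairs and that Loc.start_off and Loc.stop_off give byte offsets into the original file; apply_substitutions is our name for it, not part of the tool:

let apply_substitutions contents substitutions =
  (* Apply from the end of the file backward, so that earlier offsets
     stay valid as replacements change the length of the text. *)
  let sorted =
    List.sort substitutions ~cmp:(fun (l1, _) (l2, _) ->
      compare (Loc.start_off l2) (Loc.start_off l1))
  in
  List.fold sorted ~init:contents ~f:(fun acc (loc, repl) ->
    let start = Loc.start_off loc in
    let stop  = Loc.stop_off  loc in
    String.sub acc ~pos:0 ~len:start
    ^ repl
    ^ String.sub acc ~pos:stop ~len:(String.length acc - stop))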

Then writing a dummy camlp4 syntax extension is pretty easy. For instance, for a subset of pa_ounit:

EXTEND Gram
  GLOBAL: str_item;

  test: [[ "TEST" -> replace _loc "let%test" ]];

  name_equal:
    [[ `STRING _; "=" -> ()
     |            "=" -> replace _loc "_ ="
     ]];

  str_item:
    [[ test; name_equal; expr -> <:str_item< >>
    ]];
END

On-the-fly conversion and diffing the generated code

Since this tool was convenient to use, we used it to check that our newly written ppx rewriters did the same thing as the old camlp4 syntax extensions:

  • for a given OCaml source file of a library or application, we converted it using camlp4-to-ppx and saved the result
  • we processed the original file using camlp4 and the translated one using our ppx rewriters
  • in both cases we saved the output of -dparsetree (a human-readable version of the internal OCaml AST) and -dsource (pretty-printed code)
  • we diffed the camlp4 and ppx outputs of -dparsetree, as well as the outputs of -dsource

This was all quite easy to do with jenga. We kept looking at the generated diffs until they were all empty.

We have quite a lot of code in our camlp4 syntax extensions, and converting them to ppx was a long, mechanical, and thus error-prone job. Given that, this diffing turned out to be really helpful for finding errors.

The camlp4 syntax is not quite the OCaml syntax

While using this, we noticed that quite a few syntaxes accepted by camlp4 are not accepted by OCaml; for instance:

let _ x = x

let f l = List.map l ~f:fun x -> x + 1

These were quite easy to fix automatically as well using camlp4-to-ppx.
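
For the record, the standard-OCaml forms of the two examples above would be something like this (our reconstruction, not output from the tool):

let _ = fun x -> x

let f l = List.map l ~f:(fun x -> x + 1)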

Github repo and extension

We published a slightly modified version of this tool on github.

The method we used doesn't work out of the box with all syntax extensions. For instance to convert code using lwt.syntax some more work needs to be done on camlp4-to-ppx. But it is a good starting point.
