We just had an interesting discussion around the office about staging and curried functions. I was reading over a new string-escaping routine, which had the following signature:

val escape : escapeworthy:Char.t list -> escape_char:Char.t -> string -> string

And the code had a big comment explaining that the performance of this was much better if you first partially applied the function, and then fed in the strings to be escaped one by one. In other words, this:

List.map l ~f:(fun s -> String.escape ~escapeworthy:[a;z] ~escape_char:_ s)

is 2-10x slower than this:

let my_escape = String.escape ~escapeworth:[a;z] ~escape_char:_ in List.map l ~f:my_escape

That’s because when my_escape is computed, a bunch of work is done just once that can be shared over multiple calls to my_escape, whereas a fully-applied call to String.escape will redo that computation each time. But there’s no way of really seeing this by just looking at the type signature, and it’s not obvious from the call-point either. In this case, the issues is just one of performance, but the difference between true staging and mere currying can be a semantic one as well. Consider this function for creating a unique-id allocator.

let make_id_allocator () =
  let ctr = ref 0 in
  (fun () -> incr ctr; !ctr)

The signature of this function looks a little odd at first glance:

val make_id_allocator : unit -> unit -> int

This is truly a staged function, and there is an important semantic difference between partial and full applications. In particular, if we always use it as a full application, like this:

let x = make_id_allocator () ()
let y = make_id_allocator () ()

the result will always be 1. Partial application, however, can give us different answers:

let alloc = make_id_allocator ()
let x = alloc ()
let y = alloc ()

Here, x is1 and y is 2.

The basic problem here is that the type signatures don’t tell you when there is true staging in a function, verses when the function is simply curried. In some ways, I prefer the state of affairs in SML, where the default is that function arguments are tuples, and curried functions are a special case.

But we can recover some of that explicitness with a minimum of fuss, and that’s just what we’re about to do in Core. We’ve done this by adding a new type, called Staged.t:

open Core.Std

module Staged : sig
  type 'a t
  val stage : 'a -> 'a t
  val unstage : 'a t -> 'a
end = struct
  type 'a t = 'a
  let stage = Fn.id
  let unstage = Fn.id
end

and we put stage and unstage at the top-level. Now, we can write our ID allocator this way:

let make_id_allocator () =
  let ctr = ref 0 in
  stage (fun () -> incr ctr; !ctr)

which has this signature:

val make_id_allocator : unit -> (unit -> unit) Staged.t

And we can use it like this:

let alloc = unstage (make_id_allocator ())
  let x = alloc ()
  let y = alloc ()

But what we can’t do is just use the function without noticing it’s staged. If we want to rerun the initial stage every time, we have to do this:

let x = unstage (make_id_allocator ()) ()
let y = unstage (make_id_allocator ()) ()

which makes it quite clear what’s going on. Similarly, with the original string escaping function, we can change the definition to this:

val escape : escapeworthy:Char.t list -> escape_char:Char.t -> (string -> string) Staged.t

and now, it’s much clearer that this code:

List.map l ~f:(fun s -> unstage (String.escape ~escapeworthy:[a;z] ~escape_char:_) s)

is doing repeated work, and it pushes you in the direction of writing this instead:

List.map l ~f:(unstage (String.escape ~escapeworthy:[a;z] ~escape_char:_ s))

One thing that I find satisfying about this example is that it shows how in a language like OCaml, you can often overcome what feel like deficiencies in the language itself simply by writing better libraries.