OCaml with Jane Street extensions is available from our public opam repo. Only a slice of the features described in this series are currently implemented.

In part one, we discussed how OCaml’s locality mode enables safe stack allocation. In part two, we explored how the uniqueness and linearity modes represent ownership. In this (final) post, we leverage modes to define a statically data-race-free API in multicore OCaml.


Data Races

With the recent release of OCaml 5, shared-memory parallelism has become a first-class language feature. OCaml programs now span multiple domains, each of which corresponds to an operating-system thread. Higher-level code can schedule work onto domains using async-await semantics.

Let’s consider extending Async to support parallel tasks. Using a simplified interface, we’ll spawn two invocations of a function before waiting on their results:

val parallel : (unit -> 'a) -> 'a Deferred.t
val await : 'a Deferred.t -> 'a

let count () =
    let i = ref 0 in
    let incr () = for _ = 1 to 100_000 do i := !i + 1 done in
    let p1, p2 = parallel incr, parallel incr in
    let (), () = await p1, await p2 in
    print_int !i
;;

Naively, combining parallelism with mutability leads to data races. When one domain writes to a memory location while others access it, readers will observe unspecified results. In languages like C++, we’re forced to conclude that data races are undefined behavior. On the other hand, OCaml’s memory model makes races less dangerous—despite producing unpredictable results, they never invalidate type safety, so cannot crash the program.

So, what happens if we run the above code?

> 118533

Data races still cause bugs! Correctly authoring a parallel-safe mutable data structure requires careful usage of atomics, mutexes, or other traditional synchronization primitives. Getting it wrong can cause nondeterministic failures that are difficult to debug.

Rust is comparatively stricter: data races are unrepresentable, as all mutations must go through exclusively owned references. However, Rust’s underlying ownership system results in complicated code, especially in the presence of concurrency.

Thanks to OCaml’s garbage collector, most code does not need to worry about ownership. This allows users to avoid the complexity of tracking ownership in the type system. In this post, we seek to make data races unrepresentable without losing this ability. First, we’ll use modes to prohibit sharing mutable data between domains, only allowing mutation via ownership-based exclusive references. Second, we’ll use capsules to safely encapsulate shared mutability without tracking fine-grained ownership constraints.

The Sync Mode

The sync mode describes values that can be freely shared between domains. Values are sharable if they do not allow concurrent writes, so:

  • Immutable values are always sync, as no writes ever occur.
  • Exclusively mutable values are also sync, since all writes must go through exclusive references.
  • Other mutable values are unsync. We have no control over when they may be written to, so they can’t be safely shared between domains.
type x = { x : int }
type y = { exclusively mutable y : int }
type z = { mutable z : int }

let sync x = { x = 0 }
let sync y = { y = 0 }
let sync z = { z = 0 }
7 | let sync z = { z = 0 }
                 ^^^^^^^^^
Error: found an unsync value where a sync value was expected.

Like uniqueness, a sync parameter represents a promise by the caller.

  • A sync parameter requires the caller to provide sharable values. The callee is allowed to share the value with other domains.

  • An unsync parameter does not encode a promise from the caller; the callee must not share the value with another domain.

A sync variable may be used as if it were unsync, so sync is the sub-mode. Hence, unsync values may reference sync values, but not vice versa. For example, a closure capturing an unsync value must also be unsync.

Usage

Armed with our new mode axis, we can make parallel require a sync closure. We’ll want to schedule tasks that capture unique variables, so the closure will also be at mode once. Lastly, since the function’s result will be passed back to our domain, it too must be sync.

val parallel : (unit -> 'a @ sync) @ sync once -> 'a Deferred.t

This signature involves several modes, so let’s translate it to a new version of the “@” syntax. For better readability, we’ll write one annotation that may expand function arrows.

val parallel : (unit -> 'a) -> 'a Deferred.t
             @ sync once (. -> sync) -> .

Compiling our example program now results in an error:

let count () =
    let i = ref 0 in
    let incr () = for _ = 1 to 100_000 do i := !i + 1 done in
    let () = await (parallel incr) in
    print_int !i
;;
6 | let () = await (parallel incr) in
                             ^^^^
Error: found an unsync value where a sync value was expected.

The closure incr captures a mutable reference, so it can’t be passed to parallel.

The sync mode is sufficient to prevent data races. Like Rust, shared values may only be mutated via exclusive references. At runtime, we’ll find that our program partitions its mutable values per domain:

Diagram of mutable values partitioned between domains.

In this diagram, red circles are mutable; blue circles are exclusively-mutable or immutable. Each domain contains a collection of unsync values, each of which points toward some red circle. All domains may reference sync values, which only see blue circles. Critically, there are no pointers into a domain: we cannot traverse from one domain’s unsync values to another’s.

Exclusive Mutability

In our new model, we can only share exclusively mutable data structures. Naively, we could convert our example program to use an exclusively mutable counter—but we’ll find that exclusivity requires tracking the same precise ownership constraints we’d find in Rust. For example:

type counter = { exclusively mutable i : int }

let count () =
    let c = { i = 0 } in
    let incr () = for _ = 1 to 100_000 do &c.i <- &c.i + 1 done in
    let () = await (parallel incr) in
    print_int c.i
;;
6 | let () = await (parallel incr) in
                             ^^^^
Error: this value escapes its region.

Here, incr borrows the counter, making it a local closure—and parallel tasks must be global. We can instead explicitly mark incr as global, causing the counter to be moved into the closure.

let count () =
    let c = { i = 0 } in
    let global incr () = for _ = 1 to 100_000 do &c.i <- &c.i + 1 done in
    let () = await (parallel incr) in
    print_int c.i
;;
5 | print_int c.i
              ^
Error: c is used uniquely so cannot be used twice.

But now we can’t reference the counter outside incr—its ownership has been transferred. To fix this, we can return the counter from the parallel task.

let count () =
    let c = { i = 0 } in
    let global incr () = for _ = 1 to 100_000 do &c.i <- &c.i + 1 done; c in
    let c = await (parallel incr) in
    print_int c.i
;;
val count : unit -> unit

> 100000

Since parallel tasks cannot borrow, we’re forced to explicitly move the counter between domains—it’s never actually shared. You might have seen this coming: if multiple domains reference an exclusively-mutable value, it can’t be mutated.

Diagram of an exclusively mutable value shared between domains.

Technically, we got what we asked for: we can write parallel code without fear of data races. However, if we can never share exclusive references, exclusive mutability isn’t useful in practice. Therefore, we will introduce mutexes.

Mutexes

Traditionally, mutable state can be safely shared between domains via careful use of locks. The simplest kind of lock is a mutex, which assures that only one domain can access a value at a time.

In OCaml, a mutex interface might look like the following:

module Mutex : sig
    type 'a t

    val create : 'a -> 'a t
               @ unique sync -> .

    val with_lock : 'a t -> ('a -> 'b) -> 'b
                  @ . -> local (local exclusive -> .) -> .
end

Calling with_lock locks the mutex, borrows the wrapped value, runs the callback, and finally unlocks the mutex. If the mutex was already locked, we know some other domain is accessing the value, so we first wait for it to release the mutex. Therefore, mutexes can be safely shared across domains.

We can illustrate a mutex as a green circle. Its single outgoing pointer may only be traversed by one domain at a time, so it provides an exclusive reference.

Diagram of mutual exclusion between two domains.

Importantly, mutexes are created from unique sync values. Stipulating sync assures that no unsync value—which may be visible to another domain—is ever made accessible via the mutex. Uniqueness additionally guarantees that the mutex holds the only pointer to its contents.

type counter = { exclusively mutable i : int }

let count () =
    let mutex = Mutex.create { i = 0 } in
    let worker () =
        for _ = 1 to 100_000 do
            Mutex.with_lock mutex (fun counter -> counter.i <- counter.i + 1)
        done
    in
    let p1, p2 = parallel worker, parallel worker in
    let (), () = await p1, await p2 in
    let result = Mutex.with_lock mutex (fun counter -> counter.i) in
    print_int result
;;
> 200000

The addition of locks gives us capabilities roughly equivalent to safe Rust: we can trade off exclusive borrows without explicitly moving the underlying value. However, this API is not very flexible. First, we must create a mutex for each shared value. Second, we’re still restricted to sync data—we can’t just wrap an existing mutable data structure in a mutex.

Shared Mutability

Restricting ourselves to exclusive mutability means we can’t use shared-ownership data structures. To see why, let’s attempt to implement a mutable circularly linked list.

type 'a node = { data : 'a
               ; exclusively mutable next : 'a node }

let create data =
    let rec node = { data; next = node } in
    node
;;
val create : 'a -> 'a node

Since create does not return a unique list, we’ll never be able to mutate the result—even within a domain, we can never exclusively reference a cyclic value.

Diagram of an exclusively mutable cycle.

This problem is endemic in Rust, where shared mutation requires unsafe code. OCaml’s garbage collector enables a safe solution—mutable fields—but the resulting data structure isn’t sync. If we try to use a mutable value in parallel, we’ll get an error:

type 'a node = { data : 'a
               ; mutable next : 'a node }

let count () =
    let rec head = { data = 0; next = head } in
    let append () = head.next <- { data = 1; next = head } in
    let () = await (parallel append) in
    print_int head.next.data
;;
7 | let () = await (parallel append) in
                             ^^^^^^
Error: found an unsync value where a sync value was expected.

Rust works around this issue by allowing shared-ownership data structures to present an exclusively-mutable API. This abstraction allows the type to be protected by a mutex: its underlying shared mutability is encapsulated in its unsafe implementation.

Diagram of an unsafe Rust shared ownership data structure.

In OCaml, we need a way to encapsulate unrestricted mutability without using unsafe language features. For this purpose, we introduce capsules.

Capsules

In our initial design, each domain was allowed to reference a collection of unsync data. For example, a mutable circularly linked list:

Diagram of a mutable cycle.

There are no pointers into our domain, so no other domains can access our list.

We can relax this restriction by creating collections of unsync data that aren’t associated with a particular domain. We call these collections capsules. Capsules allow incoming green pointers, which may only be traversed by one domain at a time.

Diagram of two capsules.

To enforce exclusivity, we’ll need a way to indicate which domain is currently accessing which capsule. This is the purpose of keys, which have a one-to-one correspondence with capsules.

module Capsule : sig

  module Key : sig
    type 'k t

    type packed = Key : 'k t -> packed
  end

  val create : unit -> Key.packed
             @ . -> unique
  (* ... *)
end

When we create a capsule, we get back a Key.packed, which hides the underlying type 'k. We know that this 'k exists, but that’s all we know—so we call 'k an existential type.

Concretely, creating a capsule mints a brand-new 'k that is associated with only the returned key. This mechanism uniquely identifies keys based on their creation site, allowing us to distinguish between them. For example:

val require_same_key : 'k Capsule.Key.t -> 'k Capsule.Key.t -> unit

let distinct_keys () =
    let Key key0 = Capsule.create () in
    let Key key1 = Capsule.create () in
    require_same_key key0 key1
;;
6 | require_same_key key0 key1
                          ^^^^
Error: this expression has type $Key_'k1 Capsule.Key.t
       but an expression was expected of type $Key_'k0 Capsule.Key.t

We can never pass two different keys to require_same_key, because distinct keys necessarily have distinct types.

Pointers

When a domain has exclusive access to a key, it may traverse any green pointer into the associated capsule. We can model this behavior with the following signature, where a capsule pointer of type ('a, 'k) t references a value of type 'a and requires the 'k Key.t to traverse.

module Capsule : sig

  (* ... *)

  module Ptr : sig

    type ('a, 'k) t

    val create : ('a -> 'b) -> 'a -> ('b, 'k) t
               @ local once sync (sync -> .) -> sync -> .

    val map : 'k Key.t -> ('a -> 'b) -> ('a, 'k) t -> ('b, 'k) t
            @ local exclusive -> local once sync -> . -> .

    val extract : 'k Key.t -> ('a -> 'b) -> ('a, 'k) t -> 'b
                @ local exclusive -> local once sync (. -> sync) -> . -> sync
  end

  val destroy : 'k Key.t -> ('a, 'k) Ptr.t -> 'a
              @ unique -> . -> .
end

The first three functions each require a local once sync callback. The annotation indicates that the callback will not be stored, will only be called once, and cannot close over unsync values.

That’s a lot to take in—let’s unpack each new operation.

  • create initializes a capsule containing the result of a function. Because we’re making the function’s input available to another capsule, it must be sync. Its output, however, may be unsync, allowing us to create mutable state.

     (* Create a mutable reference inside a capsule. *)
     let pointer = Capsule.Ptr.create (fun i -> ref i) 0
    
    Diagram of creating a capsule from a sync value.

    Creating a pointer does not require a key, since it does not provide access to any preexisting data.

  • map executes a function inside an existing capsule, arbitrarily restructuring the data therein. It requires an exclusive reference to the key, implying that only one domain can run map at a time.

    For example, we can wrap our reference in another reference, creating a mutable chain:

    let Key key = Capsule.create ()
    let pointer = Capsule.Ptr.create (fun i -> ref i) 0
    (* Wrap the reference in another reference. *)
    let pointer = Capsule.Ptr.map &key (fun r -> ref r) pointer
    
    Diagram of creating mutable state inside a capsule.

    Critically, map also allows us to create additional pointers into the same capsule. In this example, map returns a pointer to our newly created state, which we may access using the same key.

  • extract also executes a function inside a capsule, but directly returns its result to the caller. The returned value must be sync, as it is now visible to the calling domain.

    let Key key = Capsule.create ()
    let pointer = Capsule.Ptr.create (fun i -> ref i) 0
    (* Return an immutable value read from the reference. *)
    let result = Capsule.Ptr.extract &key (fun i -> !i) pointer
    
    Diagram of extracting a sync value from a capsule.

    Conveniently, we can match the type of each function with the mode of its callback. Creation inputs a sync value, mapping a value to a pointer; extraction outputs a sync value, mapping a pointer to a value. Similarly, map neither inputs nor outputs a sync value, so maps pointers to pointers.

  • destroy merges a pointer into the caller’s capsule. We must provide the associated key uniquely, guaranteeing that no other domain can traverse this pointer again.

    let Key key = Capsule.create ()
    let pointer = Capsule.Ptr.create (fun i -> ref i) 0
    (* Merge the reference into the enclosing capsule. *)
    let i = Capsule.destroy key pointer
    
    Diagram of destroying a capsule pointer.

    In this diagram, destroy recolors a green pointer to black by consuming its associated key.

Capsules provide two important capabilities that mutexes do not.

  • Aggregating ownership: a mutex always protects a unique value, that is, a single pointer. Capsules associate a semantic key with arbitrarily many pointers.

  • Protecting shared mutable state: mutexes only contain sync values. Capsules enable us to allocate shared mutable data structures without tracking their ownership at the type level.

Putting it All Together

We’ve built a way to trade off exclusive references (mutexes), and a way to associate a key with a dynamic data structure (capsules). Combining these approaches—storing keys in mutexes—lets us create a statically data-race-free version of any existing data structure!

For example, a hash table:

module Locked_hashtbl = struct
    type t = Table :
        { table : ((int, string) Hashtbl.t, 'k) Capsule.Ptr.t
          lock  : 'k Capsule.Key.t Mutex.t } -> t

    let create () =
        let Key key = Capsule.create () in
        let table = Capsule.Ptr.create Hashtbl.create () in
        let lock = Mutex.create key in
        Table { table; lock }
    ;;

    let add_exn ( Table { table; lock } ) k v =
        Mutex.with_lock lock (fun key ->
            Capsule.Ptr.extract key (fun table ->
                Hashtbl.add_exn table k v) table)
    ;;

    let length ( Table { table; lock } ) =
        Mutex.with_lock lock (fun key ->
            Capsule.Ptr.extract key Hashtbl.length table)
    ;;
end

The implementation is almost as concise as wrapping a table with a mutex in C++ or Java, but far less error-prone. Mixing up locking is a compilation error—once it compiles, we can be sure that our code is free of data races.

let count () =
    let table = Locked_hashtbl.create () in
    let worker start () =
        for i = start to start + 99_999 do
            Locked_hashtbl.add_exn table i (Int.to_string i)
        done
    in
    let p1, p2 = parallel (worker 0), parallel (worker 100_000) in
    let (), () = await p1, await p2 in
    let result = Locked_hashtbl.length table in
    print_int result
;;
> 200000

Performance-minded readers might be concerned about the runtime overhead of this approach—but with two clever observations, we may compile it to a simple low-level lock.

  • Keys do not contain information; we either have one or we don’t. That means they take zero bits to store, and don’t have to exist at runtime at all.

  • The mutex is essentially a key-option, hence only takes on two values—locked and unlocked. At runtime, the mutex is a bool; acquire/release are atomic test-and-set instructions.

More practically, we could implement locking using futexes. Fortunately, separating the concepts of keys and locks lets users swap between lock implementations without rewriting their data structures.

Afterword

Given the initial success of OCaml 5, we’re optimistic about the future of writing high-performance parallel programs in OCaml. We believe data-race-free OCaml will enable users to write correct and efficient code, without complicating the single-core experience.

At Jane Street, we’ve been using locality in production for over a year, and have recently merged compiler support for the unique and once modes. Once they’ve been extensively tested in our internal environment, we hope to contribute these features to upstream OCaml. Instructions for installing OCaml with Jane Street extensions using opam are available on GitHub.

This post concludes Oxidizing OCaml, but it’s not the end for modes. We’re planning to extend the design presented here with sync’s dual mode axis, allowing for more fine-grained restrictions on data access. Similarly, we’re exploring modal representations for algebraic effects, off-heap data, and compile-time computations—as well as mode-polymorphic functions.