This is the last in my series of posts about new features in OCaml 4.02. So far, I've discussed how OCaml is getting more like Lisp because of extension points, how module aliases will massively speed up compilation of Core and similar libraries, and how you can simplify your error handling by catching exceptions with match statements.

Below, I've summarized the other features that strike me as worth mentioning, but don't seem worth their own blog post.

Goodbye Camlp4

This is in some sense related to the extension-points change: camlp4 has been evicted from the compiler, and is now its own independent project, with Jeremie Dimino as the primary maintainer.

I think this is good news for OCaml and for camlp4. Updating camlp4 to match new functionality in the compiler is hard, and in the past, new compiler releases often came out with camlp4 broken in some subtle way. (In 4.01.0, for example, the new open! syntax was broken in camlp4.)

Marrying camlp4 and OCaml together slows OCaml down, and also means that when OCaml gets released, we often get stuck with a broken camlp4. Now, because they have separate release cycles, it will be possible to fix camlp4 bugs when they arise, meaning we won't get stuck with incompatible bugs for long.

A key enabler of this disentanglement is opam. Having a decent package manager makes it simpler and easier to deal with a more disaggregated world. Hopefully, this will lead to a nimbler compiler development process.

Open Types

Ordinary OCaml variants are closed, which is to say that once you define a variant, you can't extend it with new cases. But OCaml does have another almost-variant that's open, which is to say you can add new cases to it after it has been defined: the exn type.

Open types are useful for more than just exceptions, though. They tend to come up in certain kinds of modular designs where you want a single type to act as a kind of meeting point between values that come from different places.

In the past, when we've needed open types, we've basically abused the exception type to get this functionality. In 4.02, you can simply declare new open variant types. Interestingly, these new open types are a bit more powerful than the old exception model, in that a new open type can have type parameters, and the constructors can be GADTs.
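Here's a minimal sketch of what declaring and extending an open type looks like. The event type and its constructors are made up for illustration; the point is just that the type has a parameter, the constructors are added after the fact, and they use GADT-style return types.

```ocaml
(* An open variant type with one type parameter; ".." marks it as open. *)
type 'a event = ..

(* Constructors can be added later, possibly from other modules.
   These are GADT-style: each one fixes the type parameter. *)
type 'a event += Click : int * int -> unit event
type 'a event += Key : char -> char event

(* Matches on an open type always need a catch-all case, because more
   constructors may be added after this function is written. *)
let describe : type a. a event -> string = function
  | Click (x, y) -> Printf.sprintf "click at (%d, %d)" x y
  | Key c -> Printf.sprintf "key %c" c
  | _ -> "unknown event"
```

Note the wildcard case: just as with exceptions, the compiler can never know that you've covered every constructor.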

Better format with GADTs

OCaml's printf-style format strings are both great and terrible. They're great because they're type-safe: the compiler checks the arguments you pass against the format string.

# printf "a string: %s, an int: %i\n" "three" 3;;
a string: three, an int: 3
- : unit = ()
# printf "a string: %s, an int: %i\n" "three" 3.5;;
Characters 44-47:
  printf "a string: %s, an int: %i\n" "three" 3.5;;
                                              ^^^
Error: This expression has type float but an expression was expected of type
         int

This type-safety comes at the cost of some complexity, though. First, OCaml has to parse the format string at compile time and convert it to an object that understands the types of the values it needs to consume. That's not so bad, but unfortunately, before 4.02, this was done with a special-purpose type that didn't fit neatly into the type system. Perhaps because of this, there have been many bugs over the years associated with format types.

In addition, printing with format types was horribly slow. OCaml 4.02 solves both of these problems with a rewrite of the format types on top of GADTs.
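To get a feel for why GADTs are a good fit here, consider the following toy encoding of a format as an ordinary data structure. This is only a sketch of the idea and not the actual definition in the standard library; the names fmt, render and to_string are invented for the example.

```ocaml
(* A toy format type. The first parameter is the type of the function
   that consumes the arguments; the second is the final result type. *)
type ('a, 'r) fmt =
  | End : ('r, 'r) fmt
  | Lit : string * ('a, 'r) fmt -> ('a, 'r) fmt
  | Str : ('a, 'r) fmt -> (string -> 'a, 'r) fmt
  | Int : ('a, 'r) fmt -> (int -> 'a, 'r) fmt

(* Walk the format, accumulating output and taking one argument for
   each Str or Int node. The continuation k receives the final string. *)
let rec render : type a r. (string -> r) -> string -> (a, r) fmt -> a =
  fun k acc fmt ->
    match fmt with
    | End -> k acc
    | Lit (s, rest) -> render k (acc ^ s) rest
    | Str rest -> fun s -> render k (acc ^ s) rest
    | Int rest -> fun i -> render k (acc ^ string_of_int i) rest

let to_string fmt = render (fun s -> s) "" fmt

(* Roughly what "a string: %s, an int: %i" would turn into. *)
let example = Lit ("a string: ", Str (Lit (", an int: ", Int End)))

let result = to_string example "three" 3
```

Because the format is now a plain value of a plain type, the type checker needs no special machinery to work out that to_string example expects a string and then an int.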

Immutable strings

This one was a surprise. One unfortunate bit of historical cruft in the language is that the default string type in OCaml is mutable. Nobody really likes this, but it seemed too painful to change, since changing it would obviously break lots of old code.

What the Caml team did instead was to make immutable strings opt-in. In particular, there is a new module Bytes which is intended for dealing with mutable byte buffers, whose underlying type Bytes.t is the same as String.t. And there's now a flag which, when you turn it on, breaks the type equality between Bytes.t and String.t, and also disables the mutation operators in String. This gives us a migration path towards making strings immutable. It will take a while for it to push through, but I do expect lots of people to make the flip, including us at Jane Street.
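As a rough sketch of what migrated code looks like, here's a small function that does its mutation on a Bytes.t and converts at the boundaries. I'm assuming the flag in question is the compiler's -safe-string option; the function replace_first is just an illustration.

```ocaml
(* Return a copy of s with its first character replaced by c.
   All mutation happens on a private Bytes.t buffer. *)
let replace_first (s : string) (c : char) : string =
  if String.length s = 0 then s
  else begin
    let b = Bytes.of_string s in  (* copy s into a fresh mutable buffer *)
    Bytes.set b 0 c;              (* mutate the buffer, not the string *)
    Bytes.to_string b             (* copy back out to a string *)
  end

let original = "hello"
let changed = replace_first original 'j'
```

Code written this way compiles with or without the flag, which is what makes an incremental migration possible.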

Generative functors

For you SML fans, OCaml now has generative in addition to applicative functors. Applicative functors have the property that when run repeatedly on the same input module, they generate the same types in the output. This is sometimes useful, but it's sometimes not at all what you want. For example, consider this case.

  module Unique_id (Unit : sig end) : sig
    type t
    val allocate : unit -> t
  end = struct
    type t = int
    let id = ref 0
    let allocate () = incr id; !id
  end

This is supposed to generate a new unique-id module with a distinct type every time it's called. But if you call it on the same module, you'll get the same type, which is totally wrong, as you can see:

# module Empty = struct end;;
module Empty : sig  end
# module Id1 = Unique_id (Empty);;
module Id1 : sig type t = Unique_id(Empty).t val allocate : unit -> t end
# module Id2 = Unique_id (Empty);;
module Id2 : sig type t = Unique_id(Empty).t val allocate : unit -> t end
# Id1.allocate () = Id2.allocate ();;
- : bool = true

This is clearly not what we want. If we had used distinct (but structurally identical) modules as inputs, however, we would have had no problem.

# module Id1 = Unique_id(struct end);;
module Id1 : sig type t val allocate : unit -> t end
# module Id2 = Unique_id(struct end);;
module Id2 : sig type t val allocate : unit -> t end
# Id1.allocate () = Id2.allocate ();;
Characters 18-33:
  Id1.allocate () = Id2.allocate ();;
                    ^^^^^^^^^^^^^^^
Error: This expression has type Id2.t but an expression was expected of type
         Id1.t

Generative functors work like the second case every time, which for this kind of functor makes more sense. We can mark a functor as generative by having a dummy argument of the form (). So, we can redo our example as follows:

  module Unique_id () : sig
    type t
    val allocate : unit -> t
  end = struct
    type t = int
    let id = ref 0
    let allocate () = incr id; !id
  end

And now, every invocation of this functor produces a fresh type.

# module Id1 = Unique_id ();;
module Id1 : sig type t val allocate : unit -> t end
# module Id2 = Unique_id ();;
module Id2 : sig type t val allocate : unit -> t end
# Id1.allocate () = Id2.allocate ();;
Characters 18-33:
  Id1.allocate () = Id2.allocate ();;
                    ^^^^^^^^^^^^^^^
Error: This expression has type Id2.t but an expression was expected of type
         Id1.t

The other benefit of generative functors is that they lift the annoying restriction on unpacking first class modules within applicative functors.
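Here's a small sketch of that second benefit. The Counter signature and the current reference are hypothetical, but the shape is typical: a first-class module is picked at run time, and a functor unpacks it. Because the unpacked module has an abstract type, this body is only accepted when the functor is generative.

```ocaml
module type Counter = sig
  type t
  val zero : t
  val succ : t -> t
  val to_int : t -> int
end

(* A first-class module stored in a ref, so it can be swapped at run time. *)
let current : (module Counter) ref =
  ref (module struct
    type t = int
    let zero = 0
    let succ n = n + 1
    let to_int n = n
  end : Counter)

(* The () argument marks the functor as generative, which is what
   permits unpacking a first-class module in its body. *)
module Make () : Counter = (val !current : Counter)

module C1 = Make ()
module C2 = Make ()

let two = C1.(to_int (succ (succ zero)))
```

As with Unique_id above, C1.t and C2.t are distinct types, so values from one can't be mixed with the other.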

Optimizations

A few good optimizations landed as well. One of them derived from work done by Phil Denys, who was an intern at Jane Street when he implemented some division-by-a-constant optimizations. Another came from our own Vlad Brankov, who eliminated some unnecessary float boxing associated with let bindings. And there are a number of others, improving the compilation of optional arguments, access to values in nested modules, and more. We'll see the results of these more clearly when we get to building our whole tree with the new compiler and running our benchmarks.

Summing up

That's not quite everything, but it's close. Notably, there's the usual collection of small bugfixes and tweaks which didn't seem worth mentioning individually. But really this covers most of the interesting changes.

All told, it's a pretty serious release. I think it's a sign of how much energy is being poured into the language. Indeed, the speed of change is high enough that it raises other concerns: is OCaml moving too fast? Is it accreting features at such a rate that the language is going to get too complicated?

I think the answer is no. The changes that have been coming seem to me to be overwhelmingly thoughtful and conservative. Indeed, some of the changes, like extension points, or the new GADT-based format strings, are, all in all, simplifications.

There's still some time until this all gets released. There are bugs that are being actively tracked down, and there's a lot of work to be done to test this release. But from what I understand, we should see a final release some time this summer.