(OCaml 4.02 is entering a feature freeze, which makes it a good time to stop and take a look at what to expect for this release. This is part of a series of posts where I'll describe the features that strike me as notable. This is part 2.)

OCaml has a bit of a namespace problem.

In particular, OCaml has no good way of organizing modules into packages. One sign of the problem is that you can't build an executable that contains two modules with the same name. This is a pretty awkward restriction, and it becomes unworkable quickly as your codebase grows.

Other than prefixing all of your module names with a package name (e.g., Core_kernel_list, Core_kernel_int, Core_kernel_array, which gets old fast), the only solution right now is something called packed modules. OCaml can pack a collection of individual modules into a single synthetic "packed" module. Importantly, different packs included in the same executable are allowed to contain modules of the same name.

In practice, a packed module is a lot like what you'd get if you named all of your modules distinctly, and then used a single module to pack together all your other modules, giving them shorter and more usable names in the process. Thus, for Core_kernel, we could name all our modules uniquely, and then provide a single renaming module to allow people to use those modules conveniently, like this:

  module List = Core_kernel_list
  module Array = Core_kernel_array
  module Int = Core_kernel_int
  ...

And then user code could use these short names by opening the module:

  open Core_kernel

  let drop_zeros l = List.filter l ~f:(fun x -> x <> 0)

In the above, List refers to Core_kernel's list, not the List module that ships with the compiler. The longer names would only show up within the Core_kernel package.
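To make the naming scheme concrete, here is a minimal single-file sketch. It assumes nested modules as stand-ins for what would really be separate compilation units (core_kernel_list.ml and so on), and the bodies of Core_kernel_list and Core_kernel_int, including the is_zero helper, are hypothetical placeholders rather than the real Core_kernel code.

```ocaml
(* Stand-ins for separate files; in a real package each of these would be
   its own compilation unit with a long, globally unique name. *)
module Core_kernel_list = struct
  (* Labelled variant of filter. Here List still means the compiler's List. *)
  let filter l ~f = List.filter f l
end

module Core_kernel_int = struct
  (* Hypothetical helper, only here so the example has two modules. *)
  let is_zero x = x = 0
end

(* The renaming module: short names for the long ones. *)
module Core_kernel = struct
  module List = Core_kernel_list
  module Int = Core_kernel_int
end

(* User code sees only the short names. *)
open Core_kernel

let drop_zeros l = List.filter l ~f:(fun x -> not (Int.is_zero x))
```

After the open, List and Int resolve to the package's modules, so the labelled ~f argument is accepted even though the compiler's own List.filter takes no label.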

Packed modules basically automate this process for you, with the one improvement that you get to use the short names within the package you're building as well as outside of it.

We use packed modules extensively at Jane Street, and they've been a real help in organizing our large and complex codebase. But packs turn out to be highly problematic. In particular, they lead to three distinct problems:

  • slow compilation of individual files
  • large executable sizes
  • coarse dependency tracking, leading to slow incremental rebuilds

The slow compilation of individual files comes from the cost of interacting with a large module like Core_kernel. Core_kernel is large because it effectively contains a full copy of every module in the Core_kernel package. That's because a line like this:

  module List = Core_kernel_list

doesn't simply make Core_kernel.List an alias to Core_kernel_list; it makes a full copy of the module. Indeed, the above line is equivalent to the following.

  module List = struct include Core_kernel_list end

Packed modules also increase your executable size, since OCaml includes code at the compilation unit granularity. Because packed modules are compilation units, referring to even a single module of Core_kernel requires you to link all of Core_kernel into your executable.

The coarse dependency problem comes from the fact that a packed module depends on all the modules that are included in it, and so once you depend on anything in the pack, you depend on everything there. For us, that means that changing a single line of the most obscure module in Core_kernel forces us to rebuild essentially our entire tree.

Module aliases, along with a few related improvements to the compiler, let us work around all of these problems. In particular, in 4.02, the following statement

  module List = Core_kernel_list

is in fact an alias rather than a copy. This means that opening Core_kernel would only introduce a bunch of aliases, which does not require a lot of work from the compiler.
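One way to see the alias in the type system is that, from 4.02 on, a signature can record it directly. The sketch below assumes a stand-in Core_kernel_list defined in the same file (its contents and the lengths_agree helper are hypothetical), and is meant only to illustrate the syntax, not how Core_kernel itself is laid out.

```ocaml
(* Stand-in for a separately compiled module. *)
module Core_kernel_list = struct
  let length = List.length
end

(* The signature states that List is an alias for Core_kernel_list,
   rather than spelling out a copy of its contents. *)
module Core_kernel : sig
  module List = Core_kernel_list
end = struct
  module List = Core_kernel_list
end

(* Both paths name the same module, so they behave identically. *)
let lengths_agree l =
  Core_kernel.List.length l = Core_kernel_list.length l
```

Because the signature only says "List is Core_kernel_list", the type checker has nothing to expand until someone actually uses a member of Core_kernel.List.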

Executable size will be improved because we'll be able to move to having a package be structured as a module containing a set of aliases, rather than as a pack. That means we no longer have a single large compilation unit for the entire package, and so, using some improved dependency handling in the compiler, we can link in only the modules that we actually use.

Finally, the dependency-choke-point problem will be fixed by having a tighter understanding of dependencies. The fact that I depend on Core_kernel, which contains a collection of aliases to many other modules like Core_kernel_list or Core_kernel_array, doesn't mean I truly depend on all those modules. For example, if I don't use (and so don't link in) Core_kernel_array, then I don't need to recompile when Core_kernel_array changes.

Module aliases have other uses, in particular having to do with changes to the semantics of functors. But for us, the improvements to compilation speed and executable size are the big story.