Parametric polymorphism is a basic mechanism in ML for writing code that is generic, *i.e.*, that can be used on multiple different types. To get the basic idea of what parametric polymorphism is, think about the following simple example.

```ocaml
module M : sig
  (* Takes a list and returns a stuttered version,
     e.g., [1;2;3] is mapped to [1;1;2;2;3;3] *)
  val double : 'a list -> 'a list
end = struct
  let rec double = function
    | [] -> []
    | hd :: tl -> hd :: hd :: double tl
end
```

In the type signature for `double`, the expression `'a` is a type variable, meaning that this function can be used with an arbitrary type plugged in for `'a`. The reason the type variable shows up is that the code of `double` doesn't depend in any way on the properties of the elements of the list.

At first glance, parametric polymorphism doesn't seem all that powerful. After all, how useful is it to write functions that can be generalized over all possible types? More often than not you want to write functions that can be used over some narrower universe of types with particular properties. Object-oriented languages provide this functionality with subtyping, and Haskell lets you get at it with type-classes. What do you do in ML?

It turns out that ML does allow you to write functions like this in a quite straightforward and ordinary way. A simple example can be found by examining the signature of a standard sort function:

```ocaml
val sort : cmp:('a -> 'a -> int) -> 'a list -> 'a list
```

A function with this signature can be used on a list of any type, provided that you can also provide a comparison function for that type. In other words, a polymorphic function can take advantage of the idiosyncratic capabilities of the types it deals with, but the ability to take advantage of those capabilities must be passed in along with the values in question.
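As a quick illustration, here is a minimal sketch of such a function in use; this `sort` is defined inline for the example rather than taken from any particular library:

```ocaml
(* A polymorphic sort parameterized by an explicit comparison function. *)
let sort ~cmp l = List.sort cmp l

let () =
  (* The same generic function, specialized by different comparisons. *)
  assert (sort ~cmp:compare [3; 1; 2] = [1; 2; 3]);
  assert (sort ~cmp:(fun a b -> compare b a) [3; 1; 2] = [3; 2; 1])
```

The comparison function is the only type-specific capability the sort needs, so it is the only thing that must be passed in alongside the list.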

This is used in small ways throughout any non-trivial ML codebase. But we can use this in a more structured way by creating what are sometimes called *type-indexed values*. A type-indexed value is a value used to represent a set of capabilities associated with a type. Here's an example of a simple type-indexed value for capturing number-ness. In what follows, the type-indexed value is `Num.Type.t`, and the rest of the `Num` module is just utility functions to make the interface pretty.

```ocaml
module Num : sig
  module Type : sig
    type 'a t
    val int   : int t
    val float : float t
  end
  val ( + )       : 'a Type.t -> 'a -> 'a -> 'a
  val ( - )       : 'a Type.t -> 'a -> 'a -> 'a
  val ( * )       : 'a Type.t -> 'a -> 'a -> 'a
  val neg         : 'a Type.t -> 'a -> 'a
  val zero        : 'a Type.t -> 'a
  val sum         : 'a Type.t -> 'a list -> 'a
  val sum_product : 'a Type.t -> 'a list -> 'a list -> 'a
end = struct
  module Type = struct
    type 'a t = {
      plus : 'a -> 'a -> 'a;
      mul  : 'a -> 'a -> 'a;
      neg  : 'a -> 'a;
      zero : 'a;
    }
    let int =
      { plus = Int.( + ); mul = Int.( * ); neg = Int.neg; zero = Int.zero }
    let float =
      { plus = Float.( + ); mul = Float.( * ); neg = Float.neg; zero = Float.zero }
  end
  open Type
  let ( + ) typ x y = typ.plus x y
  let ( * ) typ x y = typ.mul x y
  let neg typ x = typ.neg x
  let zero typ = typ.zero
  (* Some derived operations *)
  let ( - ) typ x y = typ.plus x (typ.neg y)
  let sum typ l = List.fold_left typ.plus typ.zero l
  let sum_product typ l1 l2 = sum typ (List.map2 typ.mul l1 l2)
end
```
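To see the mechanics in isolation, here is a self-contained, pared-down sketch of the same idea, with just enough structure to sum; the names `num`, `int_num`, and `float_num` are invented for this example:

```ocaml
(* A pared-down type-indexed value: the capabilities needed to sum. *)
type 'a num = { plus : 'a -> 'a -> 'a; zero : 'a }

let int_num   = { plus = ( + );  zero = 0 }
let float_num = { plus = ( +. ); zero = 0. }

(* A generic function: polymorphic in 'a, given the capabilities. *)
let sum typ l = List.fold_left typ.plus typ.zero l

let () =
  assert (sum int_num [1; 2; 3] = 6);
  assert (sum float_num [1.; 2.; 3.] = 6.)
```

The point is that `sum` is written once, and the type-specific operations travel alongside the values as an ordinary first-class value.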

You'll note that the definitions above of `Type.int` and `Type.float` are basically boilerplate. Because the modules in question themselves have a fairly standardized interface, we could instead use a functor to create these type-indexed values without the boilerplate:

```ocaml
module type Arith = sig
  type t
  val ( + )  : t -> t -> t
  val ( * )  : t -> t -> t
  val neg    : t -> t
  val zero   : t
end

module Build_type (M : Arith) = struct
  let typ = { Type. plus = M.( + ); mul = M.( * ); neg = M.neg; zero = M.zero }
end

let int    = let module Z = Build_type (Int)        in Z.typ
let native = let module Z = Build_type (Native_int) in Z.typ
```
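To make the pattern concrete, here is a self-contained sketch of the same functor trick, simplified down to two operations; the names `num` and `Build_num` are invented for illustration:

```ocaml
type 'a num = { plus : 'a -> 'a -> 'a; zero : 'a }

module type Arith = sig
  type t
  val ( + ) : t -> t -> t
  val zero : t
end

(* The functor turns any module matching Arith into a type-indexed value. *)
module Build_num (M : Arith) = struct
  let typ = { plus = M.( + ); zero = M.zero }
end

let int_num =
  let module Z =
    Build_num (struct
      type t = int
      let ( + ) = Stdlib.( + )
      let zero = 0
    end)
  in
  Z.typ

let () = assert (int_num.plus 1 2 = 3)
```

Each new type costs one short functor application instead of a hand-written record.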

This is yet another advantage one gets from having standardized interfaces.

If type-indexed values look similar to Haskell's type-classes, it's because they are. In my limited understanding of Haskell, the implementation is similar as well, in that under the cover, Haskell passes around dictionaries of functions which play the same role that the `Type.t` values play here.

The number typeclass described above is just an example, and not something I've felt the need for in practice. But here are some places where we've used type-indexed values to good effect:

### Serialization

The latest (unreleased) version of the `bin_prot` macros that we use for binary serialization and deserialization now come with a type-indexed value that ties together all the little bits that you need to use the library. Before we did that, one could only instantiate useful bin-prot functionality using the module language. Now, we can do it using ordinary polymorphic functions.

### Little languages

Sometimes we design domain-specific languages embedded in the type system. It is often useful to have values representing the different types that can be generated in the language. For example, we use this as part of a set of SQL bindings to represent types that we know how to convert to and from SQL.
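As an illustrative sketch (not the actual SQL bindings, and using a GADT, which is a feature of modern OCaml), a value representing a type in such a little language might look like this:

```ocaml
(* Values of this GADT represent the types the little language knows
   how to convert; the constructors and names here are invented. *)
type 'a sql_type =
  | Int  : int sql_type
  | Text : string sql_type
  | Bool : bool sql_type

(* Render a value as a SQL literal, guided by its type representation. *)
let to_sql : type a. a sql_type -> a -> string = fun t v ->
  match t with
  | Int  -> string_of_int v
  | Text -> "'" ^ v ^ "'"
  | Bool -> if v then "TRUE" else "FALSE"

let () =
  assert (to_sql Int 42 = "42");
  assert (to_sql Bool true = "TRUE")
```

The type-indexed value here is the `sql_type` witness itself: it carries just enough information to drive a type-safe conversion.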

### Containers

We've started experimenting with type-indexed values representing the container-hood of a given object. This is a little trickier than the previous examples, since the type-indexed value has two type parameters: one for the type of the container, and one for the type of the elements of the container. In the end, this lets you write generic functions, like a `max` that works over any container, and use them to find the maximum element of a list (where the type-indexed value has type `('a, 'a list) Container.t`), an array (`('a, 'a array) Container.t`), or a string (`(char, string) Container.t`).
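Here is a self-contained sketch of what such a two-parameter type-indexed value might look like; the type and function names are invented for illustration:

```ocaml
(* 'elt is the element type, 'c the container type. The polymorphic
   record field lets one container value serve any accumulator type. *)
type ('elt, 'c) container =
  { fold : 'acc. 'c -> init:'acc -> f:('acc -> 'elt -> 'acc) -> 'acc }

let list_c  = { fold = (fun l ~init ~f -> List.fold_left f init l) }
let array_c = { fold = (fun a ~init ~f -> Array.fold_left f init a) }
let string_c =
  { fold =
      (fun s ~init ~f ->
        let acc = ref init in
        String.iter (fun c -> acc := f !acc c) s;
        !acc) }

(* Maximum element of any container, None if it is empty. *)
let max_elt c t =
  c.fold t ~init:None ~f:(fun acc x ->
    match acc with
    | None -> Some x
    | Some y -> Some (if compare x y > 0 then x else y))

let () =
  assert (max_elt list_c [1; 5; 3] = Some 5);
  assert (max_elt array_c [| 2.; 7. |] = Some 7.);
  assert (max_elt string_c "abz" = Some 'z')
```

One `max_elt` serves lists, arrays, and strings, with the container-hood passed in explicitly.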

Type-indexed values obviously have their downsides: they can be somewhat inconvenient syntactically, since you need to pass them along explicitly; and they sacrifice some performance, since you end up calling closures where you could otherwise call static functions that might be inlined. But overall, they are a flexible and elegant way of writing generic code in ML.

For information, we have something quite similar to your Arith in Batteries Included — the documentation for these modules is not on-line yet, but that will be part of release 0.1.

As for the containers, well, we don’t have that yet, but I like very much the idea of [('a, 'b) Container.t].

I would be curious to learn more about the motivation behind this approach. What is the advantage over simply instantiating `Arith` with modules `Int`, `Float`, etc.? The main benefit of Haskell type classes seems to be that the dictionary is inserted automatically. As you point out, this isn’t done here (and is probably impossible to achieve through programming tricks in OCaml).

If you want to write generic code with modules, you use functors. If you want to do the same with type-indexed values (or really any ordinary kind of value) then you rely on parametric polymorphism. The big obvious advantage of parametric polymorphism is that it is considerably lighter-weight. Imagine having to instantiate a functor every time you wanted to use a new kind of list. Even worse, you’d have to start pushing functors everywhere in your code to be able to write your list code generically.

I think the key advantage of type-indexed values is that they provide a structured approach for writing code that is polymorphic over complex types.

We actually have a version of this issue in our own codebase. We use functorized versions of hashtables throughout our codebase, both for performance and to get more control over the equality and hash functions used. But it’s become clear to us that the functor approach is a mistake, precisely because it doesn’t allow one to write polymorphic code. Our intent is to switch over to using hashtables that are instantiated by passing in a type-indexed value that has what you need to construct the hashtable, including pretty-printers for the types (so you can have decent error messages) and equality and hashing functions.
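A sketch of the kind of type-indexed value described here; the type and field names are invented for illustration, not our actual interface:

```ocaml
(* The capabilities needed to build a hashtable over 'a, plus a
   pretty-printer for decent error messages. Illustrative only. *)
type 'a hashable = {
  hash      : 'a -> int;
  equal     : 'a -> 'a -> bool;
  to_string : 'a -> string;
}

let int_hashable =
  { hash = Hashtbl.hash; equal = ( = ); to_string = string_of_int }

(* A toy membership check using the packaged equality function. *)
let mem h key keys = List.exists (fun k -> h.equal k key) keys

let () =
  assert (mem int_hashable 2 [1; 2; 3]);
  assert (not (mem int_hashable 9 [1; 2; 3]))
```

A hashtable constructor taking such a value stays an ordinary polymorphic function, where the functorized version would force every caller to instantiate a functor.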

There’s a lot of meat to chew on here.

First, any idea as to how much performance is given up using this record-of-functions approach versus functorized code? In an ideal world, there’d be an OCamlton whole-program compiler and both approaches would have negligible (zero?) overhead, but we don’t live there, and neither functors nor records of functions are compiled away.

Second, if I read you right, you think this approach is generally superior to functorized libraries. If that’s true, then ML is not really at a sweet spot in language design at all. A better ML would be based on type classes, since your type-indexed values are just the translation of type classes to ML without the convenience they provide. You could dispense with functors altogether, and just provide a Modula-style module system for scoping/namespace control. Is that a correct assessment of your opinions?

I don’t think that type-indexed values are generally superior to functorized libraries. I think both approaches have their uses, and sometimes one is the right thing, and sometimes the other (although I must admit I’m having a hard time coming up with clear guidelines as to when one should use one or the other. Perhaps Steven would have something better developed to say about it.)

As for ML being a sweet spot in language design, I stand my ground, but I don’t want to overstate the point. I never meant to suggest that ML is the best of all possible languages; merely that it’s close to a local optimum. There are various small tweaks and improvements one can make, but to make a really big leap forward (like the leap from Java to ML) appears to be very difficult.

That said, it’s quite possible that a better language could be designed that featured type-classes more prominently. I’m probably too ignorant about Haskell to say this, but it is nonetheless my belief that Haskell is farther from the sweet spot because of laziness (which makes it too hard to reason about performance, particularly space performance) and the requirement to explicitly reflect all side-effects in the type system, which in my mind is too strict a constraint in practice.

I wasn’t suggesting that Haskell is near a sweet spot. I happen to share your opinion about laziness and imperative features; I might even go a step further and say that I wish OCaml were more like SML and had a defined evaluation order. I was suggesting that the module system of ML could be simplified by removing functors, and that type classes could be added, and that something like that would be a better language. I haven’t used Haskell enough to feel sure about that. I do know that I’ve long wanted some kind of overloading in ML.

If there are examples where you think functors are better, that would make the discussion more interesting. I’ve used functors (every OCaml programmer has, because the standard library makes use of them), but I have to admit, almost all of the code I write is functor-free. I may even have used classes more than functors, but I’d be way happier if OCaml had a decent record system (like SML#) and did away with classes.

Anyways, this is yet another motivation for me to try and use Haskell in earnest, for some real projects, so that I can really learn the weaknesses of type classes. Unfortunately, I always end up having to use union-find, or corner-stitched tiles, or some other data structure that cries out for mutability in the implementation :-).

It seems your approach is similar to the “mixed” approach using classes and modules in Xavier Leroy’s ICFP99 slides.

Also, can you please give an example where functors make it impossible to write polymorphic code?

I don’t think my opinions have changed too much, but OCaml has changed. The advent of first-class packaged modules makes it possible to use functors in a polymorphic context, which is pretty neat. Indeed, type-indexed values can now be created naturally from the modules relating to the type in question. We’re just starting to think seriously about how to use this feature in our codebase, but I expect it to shift the balance between polymorphism and functors, and in particular, I think it will make it easier to combine the two approaches in the same codebase, and to use functors in some spots without having to functorize everything.
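A minimal sketch of what this looks like with first-class modules; the `Arith` signature and `sum` function here are illustrative:

```ocaml
module type Arith = sig
  type t
  val ( + ) : t -> t -> t
  val zero : t
end

(* A first-class module, indexed by the type it implements: effectively
   a type-indexed value created directly from a module. *)
type 'a arith = (module Arith with type t = 'a)

let sum (type a) (m : a arith) (l : a list) : a =
  let module M = (val m) in
  List.fold_left M.( + ) M.zero l

let int_arith : int arith =
  (module struct
    type t = int
    let ( + ) = Stdlib.( + )
    let zero = 0
  end)

let () = assert (sum int_arith [1; 2; 3] = 6)
```

No record of functions needs to be maintained separately: the module itself is packed up and passed to an ordinary polymorphic function.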

We have by the way made the change I mentioned above to our hashtables, and it does appear to be a clear improvement.

Back to this blog entry after now more than two years. I guess you have now more experience with this in your code base, so I was wondering if you still have the same opinion about this “functors vs param. polymorphism” problem. Did you notice some particular caveats with the parametric polymorphism approach?

On the relation between Haskell type-classes and your Ocaml module solution, Oleg Kiselyov sums it up nicely by saying, “The only difference is in the shape of the arrow”. Except that it’s much less convenient to have to manually perform the dictionary-passing transform (described in Phil Wadler’s first paper on this topic).

The upside comes especially in cases like you described, where you want to apply one generic function to a container, and another to its elements. You can address these independently:

```haskell
class Foldable f where
  fold_left :: a -> (a -> b -> a) -> f b -> a

class Ord a where
  (<) :: a -> a -> Bool

fmax :: (Ord a, Foldable f) => f a -> Maybe a
fmax xs = fold_left Nothing emax xs
  where emax Nothing  y = Just y
        emax (Just x) y = Just (if x < y then y else x)
```

And thankfully, after the “proof obligation” is settled once, you can forget about it.

```haskell
fmax [1,2,3,4]
  => Just 4
fmax (Branch (Leaf 1, Branch (Branch (Leaf 2, Leaf 3), Leaf 4)))
  => Just 4
```

This kind of overloading is tremendously valuable. Somebody must have already ported type-classes to Ocaml. If not, I think I’ll give it a shot — it seems like a worthwhile project.