GADTs allow one to statically enforce stronger program invariants than are otherwise possible in a Hindley-Milner style type system. This post retells the story of how to "roll your own" GADTs using an explicit type of equality constraints. More interestingly, we discuss a particularly versatile definition of type equality in Haskell that can now be transcribed into OCaml due to the recent addition of first class modules.

## GADTs

The acronym GADT stands for Generalized Algebraic DataType. One definition of a GADT is a parameterized algebraic datatype in which type parameters may vary from one constructor to the next. Consider the following example (adapted from the Haskell Wiki's GADT page, which is also a good starting point for papers on this sort of thing):

data Term x where Pair :: Term a -> Term b -> Term (a, b) Fst :: Term (a, b) -> Term a Snd :: Term (a, b) -> Term b

This type is like a variant type in OCaml except that (1) the constructors' type signatures are given explicitly as opposed to the standard compact BNF grammar-like syntax for algebraic datatypes, and (2) the constructors do not uniformly return `Term x`

. Instead, the type parameter `x`

in the return type varies from constructor to constructor.

In this particular example, GADTs are used to ensure that well-typed values of type `Term a`

always represent values of type `a`

. In other words, by embedding the type system of the defined language (`Term`

in this example) into the type system of the defining language (Haskell in this example), we are guaranteed that type errors in the defined language are caught by the type checker of the defining language. The following idealized interpreter session shows this behavior.

```
ghci> :t (\x -> Pair (Lit 5) (Fst x))
forall a b. Term (a, b) -> Term (Int, a)
ghci> Fst (Lit 5)
Error: ... expected type "(a, b)" but inferred type "Int" ...
```

Pattern matching rules for GADTs are also stronger than those for vanilla datatypes in that the type checker may discover additional constraints about a type that hold in the context of a particular pattern matching branch. Consider the following evaluator:

eval :: Term a -> a eval (Lit i) = i eval (Pair x y) = (eval x, eval y) eval (Fst xy) = x where (x, y) = eval xy eval (Snd xy) = y where (x, y) = eval xy

In the second clause of the definition of `eval`

, the type checker learns from the type of the constructor `Pair`

that `a = (b, c)`

for some `b`

and `c`

. This information is used to justify returning a value of type `(b, c)`

where a value of type `a`

was expected.

The statically typed evaluator is a well-worn example of GADTs. A perhaps less-worn example is type-preserving rewrite rules.

simplify :: Term a -> Term a simplify (Fst (Pair x \_)) = simplify x simplify (Snd (Pair \_ y)) = simplify y simplify other = other

ghci> eval (Fst (Pair (Lit 5) (Lit 8))) 5 ghci> simplify (Fst (Pair (Lit 5) (Lit 8))) (Lit 5)

Regarding other applications of GADTs, a future post will recount how to write programs exhibiting "intensional polymorphism" using a GADT of type representations a la Stephanie Weirich and then show a nice way to represent a significant subset of OCaml types this way.

Popular language extensions allow one to program directly with GADTs in Haskell as shown here. However, plain Haskell allows us to "roll our own" GADTs with a little extra work.

## Encoding GADTs with type equality

The above description of pattern matching with GADTs hints at a useful fact: the extra power afforded by GADTs has to do with tracking and applying additional type equality constraints. What proper GADTs do for us automatically, we may do for ourselves if we have a way to manipulate type equalities.

To this end, assume we have a type `Equal a b`

that stands for the proposition that `a`

and `b`

are equal types. Furthermore, assume that this type supports the following operations.

efl :: Equal a a symm :: Equal a b -> Equal b a trans :: Equal a b -> Equal b c -> Equal a c lift :: Equal a b -> Equal (f a) (f b) coerce :: Equal a b -> a -> b

Viewed through the propositions-as-types lens, the types of `refl`

, `symm`

, and `trans`

say that `Equal`

is an equivalence relation, while the type of `lift`

says that every type constructor `f`

preserves `Equal`

.

Finally the `coerce`

function says what we can *do* with values of type `Equal a b`

, namely coerce values from type `a`

into an equivalent type `b`

. Since `a`

and `b`

are equal, we expect such a coercion to behave like the identity function. In particular, `coerce`

should in no way inspect its second argument.

The above definition of `Term`

may then be encoded in terms of `Equal`

as follows:

data Term x | forall a b. Pair (Equal x (a, b)) (Term a) (Term b) | forall b. Fst (Term (x, b)) | forall a. Snd (Term (a, x))

The last three constructors here make use of existentially quantified type variables (somewhat confusingly introduced with the keyword `forall`

). This definition yields constructors with the following types

Pair :: Equal x (a, b) -> Term a -> Term b -> Term x Fst :: Term (a, b) -> Term a Snd :: Term (a, b) -> Term b

Partially applying Lit and Pair constructors to `refl`

yields more constructor functions with more natural types.

pair :: Term a -> Term b -> Term (a, b) num = Lit refl pair = Pair refl

Note that these types are the same as those declared for constructors of the GADT version of `Term`

.

When pattern matching, we will find it useful to explicitly manipulate equality constraints in the form of values of type `Equal a b`

for particular types `a`

and `b`

. The `eval`

example becomes

eval :: Term a -> a eval (Lit eq i) = coerce (symm eq) i eval (Pair eq a b) = coerce (symm eq) (eval a, eval b) eval (Fst a) = x where (x, y) = eval a eval (Snd a) = y where (x, y) = eval a

The use of equality values here makes precise the informal reasoning given above for why a GADT-aware type checker accepted the previous version of `eval`

. The burden on the programmer is non-trivial, but we may be willing to pay the price in order to squeeze more assurances from the type system.

## User-defined type equality

How might one implement `Equal`

? Amazingly, there is a GADT-free definition for `Equal`

involving polymorphism over type constructors rather than over types, namely

data Equal a b = Coerce (forall f. f a -> f b)

As a logical formula, this definition says that `a`

equals `b`

if every property `f`

that holds of `a`

also holds of `b`

. The type variable `f`

in this definition ranges over type constructors (like OCaml's `option`

and `list`

) rather than types (like OCaml's `int`

and `string`

).

Reflexivity and transitivity are easy to prove with this definition.

efl :: Equal a a refl = Coerce id trans :: Equal a b -> Equal b c -> Equal a c trans (Coerce f) (Coerce g) = Coerce (g . f)

(Note that the infix dot is Haskell's function composition operator). Symmetry is more difficult, so we leave it for later. Coercion may be defined by instantiating `f`

to the identity type constructor.

ewtype Id a = Id { unId :: a } coerce :: Equal a b -> a -> b coerce (Coerce f) = unId . f . Id

The idiom used in the definition of `Id`

gives us both an injection function `Id :: a -> Id a`

as well as a projection function `unId :: Id a -> a`

. The keyword `newtype`

means that this type definition is treated as a type synonym at runtime, so that `Id`

and `unId`

are both implemented as the identity function and therefore serve only to guide the typechecker. The definition of `lift`

is a variation on this theme.

ewtype Compose f1 f2 a = Compose { unCompose :: f1 (f2 a) } lift :: Equal a b -> Equal (f a) (f b) lift (Coerce f) = Coerce (unCompose . f . Compose)

Though the definition of `Equal`

seems very asymmetric, one may define `symm`

by instantiating the argument coercion to the property `f c = Equal c a`

, which must hold of `b`

since it holds of `a`

(via `refl`

).

ewtype FlipEqual a c = Flip { unFlip : Equal c a } symm :: Equal a b -> Equal b a symm (Coerce f) = (unFlip . f . Flip) refl

## Transcription into OCaml

How might we transcribe this implementation of `Equal`

into OCaml? The most difficult aspect seems to be the universal quantification over a type constructor rather than over a regular type. Parameterization over a type constructor is something that can be done in OCaml only via a functor. Fortunately, OCaml 3.12's new first class modules allow us to embed modules and functors into the value world.

After making this initial step, I found transcribing the remaining Haskell definitions into OCaml is a useful exercise in learning to program with first class modules. Following Stephen's lead in an earlier post, I will give the reader a chance to work out the remaining definitions on his/her own before posting my solution.

## Comparison with previous implementations

Defining type equality in OCaml is not a new idea. Oleg Kiselyov has a clever implementation that only requires explaining away a single unreachable exception-raising expression. Similarly, a simple pair-of-coercions implementation of type equality features in one of the examples of first class modules in the OCaml 3.12 reference manual. So what value is added by importing the implementation used in pure Haskell?

The answer lies in the `lift`

combinator, which is not supported by previous OCaml implementations of type equality. Without it, consider what one would have to do to coerce an `'a list`

into a `'b list`

given a value of type `('a, 'b) Equal.t`

. It seems clear that one must essentially map an `'a`

-to-`'b`

coercion across the list, therefore copying the spine. The implementation presented here, however, allows one to do the coercion without ever inspecting the list.

For lists, this may not be so bad. But in general, the occurrence of the type parameter `'a`

may be arbitrarily deep in the definition of a type constructor, and the deeper the occurrence, the more costly the traversal.

As a side note, I’ve heard that full-on, well-supported GADTs will likely arrive in OCaml 3.13. So these tricks, while cool, will likely not be necessary.

This is neat. I just realized today that first-class modules could be used to encode type equality without requiring higher-kinded type variables, so I wrote it up, and then searched and found your version. I’m amazed at how similar my code is to yours, but I have a suggestion: Your current implementation oftrans composes the coercions of its two arguments, which means that if you combine a bunch of them, it grows and the coercion itself also takes longer. I could imagine wanting to maintain type equality witnesses in a tail-recursive loop using polymorphic recursion, but I don’t want transitive composition to grow without bound. Instead, you can use the first coercion to coerce the second, like this:

let trans (type a) (type b) (type c) t1 t2 =

let module T2 = (val t2 : Equal with type fst = b and type snd = c) in

let module C2 = T2.Coerce (struct type ‘a t = (a, ‘a) equal end) in

C2.f t1

This space safe, because its result is physically equal to its first argument. It’s also a lot shorter. Many of them can be shorter, actually. Here’s my version:

module type TYPE_EQUALITY = sig

type (‘a, ‘b) eq

val coerce : (‘a, ‘b) eq -> ‘a -> ‘b

val uncoerce : (‘a, ‘b) eq -> ‘b -> ‘a

val refl_eq : (‘a, ‘a) eq

val symm_eq : (‘a, ‘b) eq -> (‘b, ‘a) eq

val trans_eq : (‘a, ‘b) eq -> (‘b, ‘c) eq -> (‘a, ‘c) eq

module type TYCON = sig type ‘a t end

module SubstEq (C: TYCON) : sig

val subst_eq : (‘a, ‘b) eq -> (‘a C.t, ‘b C.t) eq

end

end

module TypeEquality : TYPE_EQUALITY = struct

module type TYCON = sig type ‘a t end

module type EQ = sig

type a

type b

module Coerce (C: TYCON) : sig

val coerce : a C.t -> b C.t

end

end

type (‘a, ‘b) eq = (module EQ with type a = ‘a and type b = ‘b)

let coerce (type a) (type b) eq =

let module Eq = (val eq : EQ with type a = a and type b = b) in

let module M = Eq.Coerce (struct type ‘a t = ‘a end) in

M.coerce

let uncoerce (type a) (type b) eq =

let module Eq = (val eq : EQ with type a = a and type b = b) in

let module M = Eq.Coerce (struct type ‘a t = ‘a -> a end) in

M.coerce (fun x -> x)

let refl_eq (type t) =

(module struct

type a = t

type b = t

module Coerce (C: TYCON) = struct

let coerce x = x

end

end : EQ with type a = t and type b = t)

let symm_eq (type a) (type b) eq =

let module Eq = (val eq : EQ with type a = a and type b = b) in

let module M = Eq.Coerce (struct type ‘a t = (‘a, a) eq end) in

M.coerce refl_eq

let trans_eq (type a) (type b) (type c) eq1 eq2 =

let module Eq = (val eq2 : EQ with type a = b and type b = c) in

let module M = Eq.Coerce (struct type ‘a t = (a, ‘a) eq end) in

M.coerce eq1

module SubstEq (D: TYCON) = struct

let subst_eq (type a) (type b) eq =

let module Eq = (val eq : EQ with type a = a and type b = b) in

let module M = Eq.Coerce (struct type ‘a t = (a D.t, ‘a D.t) eq end) in

M.coerce refl_eq

end

end

It’s also possible to make subtyping witnesses that respect variance, with the following signature (though I’m not sure it’s especially useful):

module type SUBTYPING = sig

type (‘a, ‘b) st

val coerce : (‘a, ‘b) st -> ‘a -> ‘b

val refl_st : (‘a, ‘a) st

val trans_st : (‘a, ‘b) st -> (‘b, ‘c) st -> (‘a, ‘c) st

module type TYCON_PLUS = sig type +’a t end

module SubstPlus (C: TYCON_PLUS) : sig

val subst_plus : (‘a, ‘b) st -> (‘a C.t, ‘b C.t) st

end

module type TYCON_MINUS = sig type -’a t end

module SubstMinus (C: TYCON_MINUS) : sig

val subst_minus : (‘a, ‘b) st -> (‘b C.t, ‘a C.t) st

end

end

Here is an implementation of the remaining combinators.

Note that the lift combinator is implemented as a functor. This could be pushed down into the value world, but I think the result would be no more useful.

[ocaml]

module Solution : sig

val refl : (‘a, ‘a) equal

val symm : (‘a, ‘b) equal -> (‘b, ‘a) equal

val trans : (‘a, ‘b) equal -> (‘b, ‘c) equal -> (‘a, ‘c) equal

val coerce : (‘a, ‘b) equal -> ‘a -> ‘b

module Lift :

functor (F : sig type ‘a t end) -> sig

val f : (‘a, ‘b) equal -> (‘a F.t, ‘b F.t) equal

end

end = struct

module type F = sig type ‘a t end

let coerce (type a) (type b) t =

let module T = (val t : Equal with type fst = a and type snd = b) in

let module C = T.Coerce(struct type ‘a t = ‘a end) in

C.f

;;

let refl (type a) =

(module struct

type fst = a

type snd = a

module Coerce = functor (F:F) -> struct

let f x = x

end

end : Equal with type fst = a and type snd = a)

;;

let trans (type a) (type b) (type c) t1 t2 =

let module T1 = (val t1 : Equal with type fst = a and type snd = b) in

let module T2 = (val t2 : Equal with type fst = b and type snd = c) in

(module struct

type fst = a

type snd = c

module Coerce = functor (F:F) -> struct

let f =

let module C1 = T1.Coerce(F) in

let module C2 = T2.Coerce(F) in

fun x ->

C2.f (C1.f x)

end

end : Equal with type fst = a and type snd = c)

;;

let symm (type a) (type b) t =

let module Equals_a = struct

type ‘c t = (module Equal with type fst = ‘c and type snd = a)

end

in

let module T = (val t : Equal with type fst = a and type snd = b) in

let module Coerce = T.Coerce(Equals_a) in

Coerce.f refl

;;

module Lift (G:F) = struct

let f (type a) (type b) t =

let module T = (val t : Equal with type fst = a and type snd = b) in

(module struct

type fst = a G.t

type snd = b G.t

module Coerce = functor (F:F) -> struct

module X = T.Coerce(struct type ‘a t = ‘a G.t F.t end)

include X

end

end : Equal with type fst = a G.t and type snd = b G.t)

end

end

[/ocaml]

ja: Leibniz equality can be made injective if you have type families, as Oleg explains. (Although Oleg’s explanation uses multiple type instances, you can make do with a single instance in practice. Details left as an exercise to the reader!)

I don’t know if it’s possible to add support for injectivity to the Leibniz encoding in OCaml, though.

The higher-order definition of type equality is beautiful, but it is not yet known how to use it to write all of the programs that can be written with GADTs because of the injectivity problem. See section 5.1 of Vytiniotis and Weirich’s “Parametricity, Type Equality and Higher-order Polymorphism” or section 2.4 of Cheney and Hinze’s “First-Class Phantom Types.

Oleg Kiselyov and Jeremy Yallop have a paper at the upcoming ML Workshop that is very much related to this post. It is available online at http://okmij.org/ftp/ML/first-class-modules/