More expressive GADT encodings via first class modules

GADTs allow one to statically enforce stronger program invariants than are otherwise possible in a Hindley-Milner style type system. This post retells the story of how to “roll your own” GADTs using an explicit type of equality constraints. More interestingly, we discuss a particularly versatile definition of type equality in Haskell that can now be transcribed into OCaml due to the recent addition of first class modules.

GADTs

The acronym GADT stands for Generalized Algebraic DataType. One definition of a GADT is a parameterized algebraic datatype in which type parameters may vary from one constructor to the next. Consider the following example (adapted from the Haskell Wiki’s GADT page, which is also a good starting point for papers on this sort of thing):

data Term x where
  Lit :: Int -> Term Int
  Pair :: Term a -> Term b -> Term (a, b)
  Fst :: Term (a, b) -> Term a
  Snd :: Term (a, b) -> Term b

This type is like a variant type in OCaml except that (1) the constructors’ type signatures are given explicitly as opposed to the standard compact BNF grammar-like syntax for algebraic datatypes, and (2) the constructors do not uniformly return Term x. Instead, the type parameter x in the return type varies from constructor to constructor.

In this particular example, GADTs are used to ensure that well-typed values of type Term a always represent values of type a. In other words, by embedding the type system of the defined language (Term in this example) into the type system of the defining language (Haskell in this example), we are guaranteed that type errors in the defined language are caught by the type checker of the defining language. The following idealized interpreter session shows this behavior.

ghci> :t (\x -> Pair (Lit 5) (Fst x))
forall a b. Term (a, b) -> Term (Int, a)
ghci> Fst (Lit 5)
Error: ... expected type "(a, b)" but inferred type "Int" ...

Pattern matching rules for GADTs are also stronger than those for vanilla datatypes in that the type checker may discover additional constraints about a type that hold in the context of a particular pattern matching branch. Consider the following evaluator:

eval :: Term a -> a
eval (Lit i) = i
eval (Pair x y) = (eval x, eval y)
eval (Fst xy) = x where (x, y) = eval xy
eval (Snd xy) = y where (x, y) = eval xy

In the second clause of the definition of eval, the type checker learns from the type of the constructor Pair that a = (b, c) for some b and c. This information is used to justify returning a value of type (b, c) where a value of type a was expected.

The statically typed evaluator is a well-worn example of GADTs. A perhaps less-worn example is type-preserving rewrite rules.

simplify :: Term a -> Term a
simplify (Fst (Pair x _)) = simplify x
simplify (Snd (Pair _ y)) = simplify y
simplify other = other

ghci> eval (Fst (Pair (Lit 5) (Lit 8)))
5
ghci> simplify (Fst (Pair (Lit 5) (Lit 8)))
(Lit 5)

Regarding other applications of GADTs, a future post will recount how to write programs exhibiting “intensional polymorphism” using a GADT of type representations a la Stephanie Weirich and then show a nice way to represent a significant subset of OCaml types this way.

Popular language extensions allow one to program directly with GADTs in Haskell as shown here. However, plain Haskell allows us to “roll our own” GADTs with a little extra work.

Encoding GADTs with type equality

The above description of pattern matching with GADTs hints at a useful fact: the extra power afforded by GADTs has to do with tracking and applying additional type equality constraints. What proper GADTs do for us automatically, we may do for ourselves if we have a way to manipulate type equalities.

To this end, assume we have a type Equal a b that stands for the proposition that a and b are equal types. Furthermore, assume that this type supports the following operations.

refl :: Equal a a
symm :: Equal a b -> Equal b a
trans :: Equal a b -> Equal b c -> Equal a c
lift :: Equal a b -> Equal (f a) (f b)
coerce :: Equal a b -> a -> b

Viewed through the propositions-as-types lens, the types of refl, symm, and trans say that Equal is an equivalence relation, while the type of lift says that every type constructor f preserves Equal.

Finally the coerce function says what we can do with values of type Equal a b, namely coerce values from type a into an equivalent type b. Since a and b are equal, we expect such a coercion to behave like the identity function. In particular, coerce should in no way inspect its second argument.

The above definition of Term may then be encoded in terms of Equal as follows:

data Term x
 = Lit (Equal x Int) Int
 | forall a b. Pair (Equal x (a, b)) (Term a) (Term b)
 | forall b. Fst (Term (x, b))
 | forall a. Snd (Term (a, x))

The last three constructors here make use of existentially quantified type variables (somewhat confusingly introduced with the keyword forall). This definition yields constructors with the following types

Lit :: Equal x Int -> Int -> Term x
Pair :: Equal x (a, b) -> Term a -> Term b -> Term x
Fst :: Term (a, b) -> Term a
Snd :: Term (a, b) -> Term b

Partially applying Lit and Pair constructors to refl yields more constructor functions with more natural types.

num :: Int -> Term Int
pair :: Term a -> Term b -> Term (a, b)

num = Lit refl
pair = Pair refl

Note that these types are the same as those declared for constructors of the GADT version of Term.

When pattern matching, we will find it useful to explicitly manipulate equality constraints in the form of values of type Equal a b for particular types a and b. The eval example becomes

eval :: Term a -> a
eval (Lit eq i) = coerce (symm eq) i
eval (Pair eq a b) = coerce (symm eq) (eval a, eval b)
eval (Fst a) = x where (x, y) = eval a
eval (Snd a) = y where (x, y) = eval a

The use of equality values here makes precise the informal reasoning given above for why a GADT-aware type checker accepted the previous version of eval. The burden on the programmer is non-trivial, but we may be willing to pay the price in order to squeeze more assurances from the type system.

User-defined type equality

How might one implement Equal? Amazingly, there is a GADT-free definition for Equal involving polymorphism over type constructors rather than over types, namely

data Equal a b = Coerce (forall f. f a -> f b)

As a logical formula, this definition says that a equals b if every property f that holds of a also holds of b. The type variable f in this definition ranges over type constructors (like OCaml’s option and list) rather than types (like OCaml’s int and string).

Reflexivity and transitivity are easy to prove with this definition.

refl :: Equal a a
refl = Coerce id

trans :: Equal a b -> Equal b c -> Equal a c
trans (Coerce f) (Coerce g) = Coerce (g . f)

(Note that the infix dot is Haskell’s function composition operator). Symmetry is more difficult, so we leave it for later. Coercion may be defined by instantiating f to the identity type constructor.

newtype Id a = Id { unId :: a }

coerce :: Equal a b -> a -> b
coerce (Coerce f) = unId . f . Id

The idiom used in the definition of Id gives us both an injection function Id :: a -> Id a as well as a projection function unId :: Id a -> a. The keyword newtype means that this type definition is treated as a type synonym at runtime, so that Id and unId are both implemented as the identity function and therefore serve only to guide the typechecker. The definition of lift is a variation on this theme.

newtype Compose f1 f2 a = Compose { unCompose :: f1 (f2 a) }

lift :: Equal a b -> Equal (f a) (f b)
lift (Coerce f) = Coerce (unCompose . f . Compose)

Though the definition of Equal seems very asymmetric, one may define symm by instantiating the argument coercion to the property f c = Equal c a, which must hold of b since it holds of a (via refl).

newtype FlipEqual a c = Flip { unFlip : Equal c a }

symm :: Equal a b -> Equal b a
symm (Coerce f) = (unFlip . f . Flip) refl

Transcription into OCaml

How might we transcribe this implementation of Equal into OCaml? The most difficult aspect seems to be the universal quantification over a type constructor rather than over a regular type. Parameterization over a type constructor is something that can be done in OCaml only via a functor. Fortunately, OCaml 3.12’s new first class modules allow us to embed modules and functors into the value world.

module type Equal = sig
  type fst
  type snd
  module Coerce :
    functor (F : sig type 'a t end) -> sig
      val f : fst F.t -> snd F.t
  end
end

type ('a, 'b) equal = (module Equal with type fst = 'a and type snd = 'b)

After making this initial step, I found transcribing the remaining Haskell definitions into OCaml is a useful exercise in learning to program with first class modules. Following Stephen’s lead in an earlier post, I will give the reader a chance to work out the remaining definitions on his/her own before posting my solution.

Comparison with previous implementations

Defining type equality in OCaml is not a new idea. Oleg Kiselyov has a clever implementation that only requires explaining away a single unreachable exception-raising expression. Similarly, a simple pair-of-coercions implementation of type equality features in one of the examples of first class modules in the OCaml 3.12 reference manual. So what value is added by importing the implementation used in pure Haskell?

The answer lies in the lift combinator, which is not supported by previous OCaml implementations of type equality. Without it, consider what one would have to do to coerce an 'a list into a 'b list given a value of type ('a, 'b) Equal.t. It seems clear that one must essentially map an 'a to 'b coercion across the list, therefore copying the spine. The implementation presented here, however, allows one to do the coercion without ever inspecting the list.

For lists, this may not be so bad. But in general, the occurrence of the type parameter 'a may be arbitrarily deep in the definition of a type constructor, and the deeper the occurrence, the more costly the traversal.