Is it possible in OCaml to implement a universal type, into which any other type can be embedded? More concretely, is possible to implement the following signature?

module type Univ = sig type t val embed: unit -> ('a -> t) * (t -> 'a option) end

The idea is that `t`

is the universal type and that `embed ()`

returns a pair `(inj, prj)`

, which inject to and project from the universal type. Projection is partial (returns an option) because injection is not surjective.

Here is an example of how to use `Univ'.

module Test (U : Univ) = struct let (of_int, to_int) = U.embed () let (of_string, to_string) = U.embed () let r : U.t ref = ref (of_int 13) let () = begin assert (to_int !r = Some 13); assert (to_string !r = None); r := of_string "foo"; assert (to_int !r = None); assert (to_string !r = Some "foo"); end end

Try it for yourself and see if you can implement `module Univ : Univ`

so that `Test (Univ)`

passes. No `Obj.magic`

or other unsafe features allowed!

It is amazing to see you financial people do such theory! I am delighted. This exercise has connection to realizability toposes. I have three remarks.

First, if projection can map into ‘a option then it can map into ‘a because every type is inhabited, say by a diverging term. So you do not really need the option type. But it is probably better to do it that way, if you can, because returning None is friendlier than diverging.

Second, how is this supposed to play with diverging terms? Are we embedding terms or values? Presumably values. Since Ocaml is call-by-value it might be hard to embed terms that denote diverging values.

Third, what does it mean that injection followed by projection is the identity when we embed higher-order types, such as (int->int)->int? In your example above you indicate how it is supposed to work with int and string. Can you give us an example of how it is supposed to work with a function type, where comparison = does not work?

Hiding the type in a reference should do the trick (this is not thread-safe, though!):

module U : Univ = struct

type t = (unit -> unit) * (unit -> unit)

`let embed () =`

let r = ref None in

let put x = (fun () -> r := Some x), (fun () -> r := None) in

let get (f, g) = f (); let res = !r in g (); res in

put, get

end

It would be very tempting to use an exception defined in a local module, but unfortunately the type of the exception’s argument cannot depend on a type variable…

I got a plain solution which passed the given tests, but I wouldn’t consider it universal, as in my solution an an ‘a type value encoded by an ‘a type encoder can’t be decoded by any other ‘a type decoders except the one coming from the same pair. Could you confirm that? or could you give the expected behavior of the following code?

let (of_int, to_int) = U.embed ()

let e = of_int 1

let (of_int’, to_int’) = U.embed ()

let e’ = of_int’ 1

assert (to_int e’ = Some 1);

assert (to_int’ e = Some 1);

And I think (but am not entirely sure) that that is the only behavior that is possible to implement in OCaml.

I am not going to answer, because it would ruin the fun for others. Actually, Stephen was the first to tell me about this little neat trick. Having a universal type turns out to be pretty powerful. You can easily build a type Univ.t list and then you can store arbitrarily typed data into that list.

The consequence is a type safe property list working for any type. When you get the hang of these, many of the usual tricks from dynamically typed languages become possible. But in a type-safe way! The option type on the projection will protect you.

However, I better not ruin Stephens later blog posts. There is more juice to be squeezed from this lemon.

Andrej, yes, an option is much better than diverging (or raising an

exception), because the partiality is expressed in the type system and

the user is forced to think about it. And yeah, since OCaml is call

by value, embedding values is the only thing that makes sense to me.

Embedding higher-order types works just like ground types. No use of

polymorphic comparison is required.

d2bg, I don’t know how to do it so that different calls to embed at

the same type are able to share. And in fact, in the applications

I’ve used this for, one would not want that. So, your example code

would raise an assertion failure.

Alain’s solution using the ref cell is pretty much what I had in mind.

Although I think it can be done a little more efficiently. Alain’s

solution allocates two closures and a 2-tuple per injection, and Some

variant per projection. I think it’s possible to allocate one closure

and a two-tuple per injection and nothing per projection. Also note

how Alain’s solution clears the ref cell by calling `g()` after the

value is extracted. This is essential to avoid a potential space

leak, and is preserved in my (slightly) more efficient solution.

Alain’s solution is very nice. Why do we need two-tuple? It seems to me that

`unit -> unit`

is already universal, with one closure and one ref per embed:module U : Univ = struct

type t = unit -> unit

let embed () =

let r = ref None in

let put x = (fun () -> r := Some x) in

let get f = f (); let x = !r in r := None; x in

put, get

end

I thought you were asking for a solution in which different calls to

`U.embed`

at the same type share embedded values. That seems a bit harder, especially if we’re supposed to do it with a single polymorphic embedding-projection pair. The definition of universal type I am acquainted with asks for afamilyof embeddings-projections, one for each type, whereas you asked for a single polymorphic embedding-projection pair (and the projection does not return an option). With a family we can do it, of course, as we just copy Alain’s solution, separately at each type.By the way, how is the universal type supposed to behave with respect to polymorphic types? Consider this:

let id x = x ;;

let inj, proj = U.embed () ;;

let Some f = proj (inj id) ;;

let x = f 10 ;;

let Some g = proj (inj id) ;;

let y = g “foo” ;;

Ok, second try! This version allocates one closure and a Some variant (not a two-tuple) per injection, and nothing per projection.

type t = bool -> unit

`let embed () =`

let r = ref None in

let put x = let sx = Some x in fun b -> r := if b then sx else None in

let get f = f true; let res = !r in f false; res in

put, get

This is essential to avoid a potential space leak…

I don’t understand that part of your comment. Clearing the ref cell is necessary to have a correct behavior; otherwise, a failed projection would leave some garbage that could cause a further wrong projection not to fail.

But I don’t think that not clearing the ref cell would cause a space leak: the ref cell will not survive any longer than the closure (which must contain the value put into the cell anyway).

This is nice, possibly better than what I had in mind (I’ll post code tomorrow).

As to the space leak, I agree that your solution requires clearing the cell for correctness. I was mixing it up with my solution, which uses a unique id (unit ref) per embedding and avoids storing in the ref cell unless it dynamically knows that it’s the right cell. In that case, the clear is purely to avoid the leak. I do think it would be a leak though, because both the inject and project closures are closed over the ref cell, and should be constant size, while the ref cell could hold a value of arbitrary size (completely unconnected to other values currently live in the program).

Yes, of course, you’re right about the space leak!

Consider the following.

let i1, p1 = U.embed ()

let i2, p2 = U.embed ()

let u1 = i1 13

let u2 = i2 “foo”

let () = assert (p2 u1 = None)

let () = assert (p1 u2 = None)

With your implementation, I believe the the second assert will fail, because `p1 u2` will return `Some 13`. The problem is that `p2 u1` stores in the ref cell for the first embedding (call it r1), but then clears the ref cell for the second embedding (call it r2). Then, `p1 u2` stores into r2, fetches from r1 (which still holds `Some 13`), and clears r1.

Here is my implementation of `Univ`. It is very similar to Alain’s

using the hidden ref cell, except it uses a unique id per embedding

and only stores and clears the ref cell if it knows it will succeed.

module Univ : Univ = struct

type t = {

id : unit ref;

store : unit -> unit;

}

let embed () =

let id = ref () in

let r = ref None in

let put a =

let o = Some a in

{ id = id; store = (fun () -> r := o); }

in

let get t =

if id == t.id then (t.store (); let a = !r in r := None; a) else None

in

(put, get)

end

The tradeoff between this and Alain’s solution is that this one

allocates an extra tuple per injection, but does fewer tests per

projection, and doesn’t store if the projection returns None (Alain’s

solution does two stores in either case.

Another possible difference might be thread safety. In either

solution, once can use a single mutex for the entire module to make

the mutation thread safe. However, I think my solution may be easier

to extend using a mutex per embedding, which would have less

contention. The idea is to wrap `t.store (); let a = !r in r :=

None;` in a critical section that holds the per-embedding mutex. For

my solution, the single mutex suffices because we know that `t.store

()` and `r := None` deal with the same ref cell. For Alain’s

solution, one would need to lock the mutex for both embeddings (which

could deadlock), at which point one might as well just test whether

one has the right embedding, leading naturally back to my solution.

Alain, what do you think?

I am not aware of any other way to implement the `Univ` signature in

OCaml (including using polymorphic variants and exceptions), but would

be happy to see one. It is possible to implement `Univ` in SML using

exceptions, but unfortunately not in OCaml because it doesn’t have

local exception declarations. One could use exceptions in OCaml to

implement a functor that has a similar flavor to `Univ`, but as far as

I know, it is not possible to meet the `Univ` signature. One nice

thing about using exceptions (in either language) is that one gets

thread safety automatically since no references are used.

Maybe just having polymorphic (non-local) exceptions would be sufficient, with something like this:

module type Univ =

sig

type t

val embed : unit -> (‘a -> t) ( (t -> a’ option)

end

module U : Univ =

struct

type t = unit -> unit

exception Message of (t -> ‘a) * ‘a

let embed () =

let rec put x () = raise (Message (get, x))

and get f =

try f () ; None

with Message (get’, y) ->

if get’ == get then Some y else None

in

put, get

end

Here

`get`

plays a double role, the other one is as a unique token (a new closure is generated for`get`

at every call to`embed`

, so this should work). Since ocaml does not have polymorphic exceptions (does SML?) this won’t work. Here is an “approximation” using functors:module type Univ =

sig

type t

module Make : functor (Any : sig type t end) ->

sig

val embed : unit -> (Any.t -> t) * (t -> Any.t option)

end

end

module U : Univ =

struct

type t = unit -> unit

module Make (Any : sig type t end) =

struct

exception Message of (t -> Any.t option) * Any.t

let embed () =

let rec put x () = raise (Message (get, x))

and get f =

try

f () ; None

with Message (get’, y) -> if get’ == get then Some y else None

in

put, get

end

end

However, once we have the exception wrapped inside a functor it becomes local anyway and we can use the solution you had in mind.

SML does have local exceptions.

There is no need to use ‘try’ and ‘raise’ to implement a universal

type with exceptions. One simply takes the universal type to be

`exn`. In OCaml:

module type Univ’ = sig

type t

module Embed (M : sig type t end) : sig

val inj : M.t -> t

val prj : t -> M.t option

end

end

module Univ’ : Univ’ = struct

type t = exn

module Embed (M : sig type t end) = struct

exception E of M.t

let inj m = E m

let prj = function E m -> Some m | _ -> None

end

end

In SML, one could use the same approach with local exceptions and

match the original `Univ` signature.

This trick seems to have been invented a number of times by different people. Davide Sangiorgi and I discussed a variant here:

[http://www.cis.upenn.edu/~bcpierce/papers/polybisim.ps](http://www.cis.upenn.edu/~bcpierce/papers/polybisim.ps)

One interesting twist was that the same trick worked both in polymorphic lambda-calculi with references (ML, etc.) and in the polymorphic pi-calculus (e.g., the Pict language) with channels playing the role of reference cells.

Here is another solution:

module U : Univ = struct

type t = exn

let embed () =

let new type s in

let module M = struct exception E of s end in

(fun x -> M.E x), (function M.E x -> Some x | _ -> None)

end

It relies on the “let new type t in e” construction developed at LexiFi, that I’ve mentioned somewhere else on this blog. This new construction should appear in a later official version of OCaml.

This solution is thread-safe and it should be quite efficient. One can of course move the allocation from the projection function into the injection function by declaring “exception E of s option”.

The construction “let new type t in e” creates a new abstract type “t” visible in the scope of the sub-expression “e”. If this sub-expression “e” is well-typed and the type “t” does not escape to the environment, then the type for “let new type t in e” is obtained from the type for “e” by replacing “t” by a fresh type variable.

Using our first-class modules, one can also write something more modular:

module type S = sig type t exception E of t end

type ‘a prop = (module S with type t = ‘a)

let create (type s) () =

let module M = struct type t = s exception E of t end in

(module M : S with type t = s)

let inj (type s) p x =

let module M = (val p : S with type t = s) in

M.E x

let proj (type s) p y =

let module M = (val p : S with type t = s) in

match y with M.E x -> Some x | _ -> None

let embed () = let p = create () in inj p, proj p

Here, we use first-class modules to represent first-class exception

constructors(which can be used in pattern matching).