One of the annoyances of using monads in OCaml is that the syntax is awkward. You can see why if you look at a simple example. Assume that you're using the usual `option`

monad. If we define `>>=`

to be the bind operator, you might end up with a piece of code that looks like this:

let f x_opt y_opt z_opt = x_opt >>= (fun x -> y_opt >>= (fun y -> z_opt >>= (fun z -> return (x + y + z))))

This is awful for two reasons: the indentation is absurdly deep, and secondly, and there are too many closing parens at the end. One solution to this is

pa_monad, a `camlp4`

syntax extension that lets you write the same piece of code like this:

let f x_opt y_opt z_opt = perform x <-- x_opt; y <-- y_opt; z <-- z_opt; return (x + y + z)

This is much less painful to look at, but introducing new syntax has a cost, and it's worth seeing if we can make things more livable without resorting to a syntax extension. We can make our original code less eye-gougingly awful by just changing the indentation:

let f x_opt y_opt z_opt = x_opt >>= (fun x -> y_opt >>= (fun y -> z_opt >>= (fun z -> return (x + y + z))))

That still leaves the parens. But as it happens, the parens can simply be dropped! This was passed on to me by Adam Chlipala, who got it in turn from Xavier Leroy at POPL this year. Our code can thus be written as follows:

let f x_opt y_opt z_opt = x_opt >>= fun x -> y_opt >>= fun y -> z_opt >>= fun z -> return (x + y + z)

This seems clean enough to actually use in practice. Now all I need is to coerce tuareg into indenting my code this way, and I'll be happy...

I really like letting M-q do all my formatting work for me. I don’t see how you can coerce tuareg to handle this when you’re going to have to have a distinct name for

`>>=`

at any given type.tuareg’s default formatting for your last example isn’t too bad, though:

let f x_opt y_opt z_opt =

x_opt >>= fun x ->

y_opt >>= fun y ->

z_opt >>= fun z ->

return (x + y + z)

Our habit is to only use>>= as the bind operator. So, if I had some stretch of code where I wanted to use the option monad, I would prefix that code with:

let (>>=) = Option.(>>=) in

let (>>|) = Option.(>>|) in

let return = Option.return in

….

and then go from there. It would be nice to have a local open, since then I could just write:

open Option.Monad_infix

in a local scope. Given that we use fixed operators from monadic bind and return, getting tuareg to detect that shouldn’t be too hard.

Monads can be quite useful at times. I’ve just recently been tinkering on

a small iterator combinator library for SML (see interface

and implementation)

although I got the

original idea way back in 2005. Iterators are higher-order functions

having types of the form

(‘a -> unit) -> unit

and usually invoke the given effect with elements drawn from a data

structure or generated using some algorithm. For example, OCaml’s

`List.iter`

function for iterating over a list matches the typeif you flip the arguments. It turns out that iterators form a monad with

plus with the following operations:

let return x f = f x

let (>>=) aM a2bM f = aM (fun a -> a2bM a f)

let zero _ = ()

let (< |>) aM bM f = (aM f ; bM f)

Of course, as you can see from the SML library, many more useful

combinators, both based on the monad operations and as new primitive

operations, can be defined. In fact, here are a few more combinators

needed by the examples below.

let (>>) aM bM = aM >>= fun _ -> bM

let guard b = if b then return () else zero

let filter p aM = aM >>= fun a -> if p a then return a else zero

let rec unfold g s f =

match g s with

| None -> ()

| Some (x, s) -> (f x ; unfold g s f)

let upToBy l u d = unfold (fun l -> if l < u then Some (l, l+d) else None) l

let upTo l u = upToBy l u 1

The Iterator monad in an eager language is in many ways similar to the List monad

in a lazy language. Another related concept is eager

comprehensions as in SRFI-42.

Here are a couple of examples from Sebastian Egner’s paper

on eager comprehensions translated to use the above iterator combinators.

First a program that prints out prime numbers:

let eratosthenes n =

let p = Array.make n true in

let isPrime i = p.(i) in

let primes = filter isPrime (upTo 2 n) in

(primes >>= fun k -> upToBy (2*k) n k)

(fun i -> p.(i) < - false) ;

primes

let n = int_of_string Sys.argv.(1)

let () =

(eratosthenes n)

(Printf.printf "%d\n")

And another program that prints out Pythagorean triples:

let sq x = x * x

let pythagoras n =

upTo 1 (n+1) >>= fun a ->

upTo a (n+1) >>= fun b ->

guard (sq a + sq b < = sq n) >>

upTo b (n+1) >>= fun c ->

guard (sq a + sq b = sq c) >>

return (a, b, c)

let n = int_of_string Sys.argv.(1)

let () =

(pythagoras n)

(fun (a, b, c) -> Printf.printf “%d %d %d\n” a b c)

Now we finally get to the subject of my post. In OCaml, the abstraction

of iterator combinators seems to come with a significant penalty. For

example, the program that prints out Pythagorean triples using iterator

combinators takes about 15 times longer to print the triples up to (n =)

1000 on my machine than the same algorithm implemented using OCaml’s

built-in for-loops. It seems that performance can be improved only

marginally by manually optimizing the combinators. In contrast, the same

example in SML, without any manual optimization of the combinators, and

compiled with MLton, runs as fast as the

OCaml program written using for-loops. (Follow the links to the SML

versions of the eratosthenes

and pythagoras

examples.)

While I believe that iterator combinators are a useful abstraction and

allow concise expression of non-trivial iterators (these examples

demonstrate only a small part of the functionality), I’m not sure whether

one would really want to use them in OCaml. Losing an order of magnitude

in performance due to abstraction penalty is not insignificant. This is

why I personally love to program in SML/MLton. You just compile your

whole program and you are done.

Yep. OCaml can’t inline functions passed as parameters but SML can. As far as I know there are no plans to change this.

So in OCaml any time you decide to use higher order functions, you pay a price. I guess it’s only really relevant in tight loops like your examples above, but it’s sad sometime to think I can either code elegantly or fast, but not both.

There is pa_openin