Trivial meta-programming with cinaps

Every now and then, I find myself having to write some mechanical and repetitive code. The usual solution is to write a code generator; for OCaml code, that would typically be a ppx rewriter. This however comes with a cost: code generators are harder to review than plain code, and they are a new syntax for other developers to learn. So when the repetitive pattern is local to a specific library or not widely used, it is often not worth the effort, especially if the code in question is meant to be reviewed and maintained by several people.

Then there is the possibility of using a macro pre-processor such as cpp, or cppo, its OCaml equivalent. This can help in some cases, but it has costs as well:

  • macros generally make the code harder to read
  • errors tend to be harder to understand since they don't point where you'd expect
  • you can say goodbye to merlin

In fact, when the repetitive pattern is specific to one particular case and of reasonable size, committing and reviewing the generated code is acceptable. That's the problem Cinaps tries to solve.

What is cinaps?

Cinaps is an application that reads input files and recognizes special syntactic forms. Such forms are expected to embed some OCaml code printing something to stdout. What they print is compared against what follows these special forms. The rest works exactly like expectation tests.

The special form is (*$ <ocaml-code> *) for ml source files, /*$ <ocaml-code> */ for C source files and #|$ <ocaml-code> |# for S-expression files.

For instance:

$ cat file.ml
let x = 1
(*$ print_newline ();
    List.iter (fun s -> Printf.printf "let ( %s ) = Pervasives.( %s )\n" s s)
      ["+"; "-"; "*"; "/"] *)
(*$*)
let y = 2

$ cinaps file.ml
---file.ml
+++file.ml.corrected
File "file.ml", line 5, characters 0-1:
  let x = 1
  (*$ print_newline ();
      List.iter (fun s -> Printf.printf "let ( %s ) = Pervasives.( %s )\n" s s)
        ["+"; "-"; "*"; "/"] *)
+|let ( + ) = Pervasives.( + )
+|let ( - ) = Pervasives.( - )
+|let ( * ) = Pervasives.( * )
+|let ( / ) = Pervasives.( / )
  (*$*)
  let y = 2

$ echo $?
1
$ cp file.ml.corrected file.ml
$ cinaps file.ml
$ echo $?
0

Real example

What follows is a real example where using Cinaps made the code much easier to write and maintain. However, I changed the names for this blog post since this code is not released publicly. Note also that this example shows one way we usually write C bindings at Jane Street. It is not meant as a model of how to write C bindings, and the excellent ctypes library should be the default choice in most cases. However, this code pre-dates ctypes and migrating it would be quite a lot of work.

The example itself is part of a C binding that I wrote a few years ago. While doing so I used Core.Flags in order to represent a few C enumerations on the OCaml side. Core.Flags is a module providing a nice abstraction for representing C flags.
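
To give an idea of the abstraction, here is roughly how a module produced by Flags.Make is used. This is a hedged sketch: the exact names in the Core.Flags interface, such as (+) for union and do_intersect, may differ between versions:

(* Sketch: combining and testing flags with a Flags.Make-generated module. *)
let flags = Open_flags.(rdonly + nonblock)
let () = assert (Open_flags.do_intersect flags Open_flags.nonblock)
let () = assert (not (Open_flags.do_intersect flags Open_flags.append))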

The OCaml code looks like what you'd expect from code using Core.Flags:

module Open_flags = struct
  external get_rdonly   : unit -> Int63.t = "mylib_O_RDONLY"   [@@noalloc]
  external get_wronly   : unit -> Int63.t = "mylib_O_WRONLY"   [@@noalloc]
  external get_rdwr     : unit -> Int63.t = "mylib_O_RDWR"     [@@noalloc]
  external get_nonblock : unit -> Int63.t = "mylib_O_NONBLOCK" [@@noalloc]
  external get_append   : unit -> Int63.t = "mylib_O_APPEND"   [@@noalloc]
  external get_creat    : unit -> Int63.t = "mylib_O_CREAT"    [@@noalloc]
  external get_trunc    : unit -> Int63.t = "mylib_O_TRUNC"    [@@noalloc]
  external get_excl     : unit -> Int63.t = "mylib_O_EXCL"     [@@noalloc]
  external get_noctty   : unit -> Int63.t = "mylib_O_NOCTTY"   [@@noalloc]
  external get_dsync    : unit -> Int63.t = "mylib_O_DSYNC"    [@@noalloc]
  external get_sync     : unit -> Int63.t = "mylib_O_SYNC"     [@@noalloc]
  external get_rsync    : unit -> Int63.t = "mylib_O_RSYNC"    [@@noalloc]

  let rdonly   = get_rdonly   ()
  let wronly   = get_wronly   ()
  let rdwr     = get_rdwr     ()
  let nonblock = get_nonblock ()
  let append   = get_append   ()
  let creat    = get_creat    ()
  let trunc    = get_trunc    ()
  let excl     = get_excl     ()
  let noctty   = get_noctty   ()
  let dsync    = get_dsync    ()
  let sync     = get_sync     ()
  let rsync    = get_rsync    ()

  include Flags.Make(struct
      let known =
        [ rdonly   , "rdonly"
        ; wronly   , "wronly"
        ; rdwr     , "rdwr"
        ; nonblock , "nonblock"
        ; append   , "append"
        ; creat    , "creat"
        ; trunc    , "trunc"
        ; excl     , "excl"
        ; noctty   , "noctty"
        ; dsync    , "dsync"
        ; sync     , "sync"
        ; rsync    , "rsync"
        ]
      let remove_zero_flags = false
      let allow_intersecting = false
      let should_print_error = true
    end)
end

And there are about 3 modules like this in this file, plus the corresponding stubs in the C file. Writing this code initially was no fun, and adding new flags now that the C library has evolved is still no fun.

The rest of this section explains how to make it more fun with cinaps.

Setting up and using cinaps

First I add a rule in the build system to call cinaps appropriately. I use a few settings specific to our jenga-based builds, which cannot currently be replicated outside of Jane Street, but assuming you have a Makefile, you can write:

.PHONY: cinaps
cinaps:
    cinaps -i src/*.ml src/*.c

Now whenever you call make cinaps, all the files will be updated in place. You can then do git diff to see what changed.

Then I write a file src/cinaps_helpers. It is a plain OCaml source file, but it is not suffixed with .ml so that it is not confused with a regular module of the library. It contains the various bits that are common between the ml/C files of the library:

(* -*- tuareg -*- *)

let stub_prefix = "mylib_"
let stub name = stub_prefix ^ name

let open_flags =
  [ "O_RDONLY"
  ; "O_WRONLY"
  ; "O_RDWR"
  ; "O_NONBLOCK"
  ; "O_APPEND"
  ; "O_CREAT"
  ; "O_TRUNC"
  ; "O_EXCL"
  ; "O_NOCTTY"
  ; "O_DSYNC"
  ; "O_SYNC"
  ; "O_RSYNC"
  ]

let other_flags =
  [ ...
  ]

let yet_other_flags =
  [ ...
  ]

let all_flags = open_flags @ other_flags @ yet_other_flags

open StdLabels
open Printf
let pr fmt = printf (fmt ^^ "\n")

let flags_module module_name flags ~prefix ~allow_intersecting =
  <code to print an Open_flags like module>
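
The body of flags_module is elided above. To make the workflow concrete, here is a hypothetical sketch of what such a printing function could look like; the ocaml_name helper and the exact output format are assumptions for illustration, not our actual code:

(* Hypothetical sketch of [flags_module]: prints a module in the style of
   the [Open_flags] module shown earlier. *)
let flags_module module_name flags ~prefix ~allow_intersecting =
  (* "O_RDONLY" -> "rdonly", assuming every flag starts with [prefix] *)
  let ocaml_name f =
    let n = String.length prefix in
    String.lowercase_ascii (String.sub f ~pos:n ~len:(String.length f - n))
  in
  pr "module %s = struct" module_name;
  List.iter flags ~f:(fun f ->
    pr "  external get_%s : unit -> Int63.t = %S [@@noalloc]"
      (ocaml_name f) (stub f));
  pr "";
  List.iter flags ~f:(fun f ->
    pr "  let %s = get_%s ()" (ocaml_name f) (ocaml_name f));
  pr "";
  pr "  include Flags.Make(struct";
  pr "    let known =";
  List.iteri flags ~f:(fun i f ->
    pr "      %c %s, %S" (if i = 0 then '[' else ';')
      (ocaml_name f) (ocaml_name f));
  pr "      ]";
  pr "    let remove_zero_flags = false";
  pr "    let allow_intersecting = %b" allow_intersecting;
  pr "    let should_print_error = true";
  pr "  end)";
  pr "end"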

Now, in my original .ml file, I can write:

(*$ #use "cinaps_helpers" $*)

(*$ flags_module "Open_flags" open_flags ~prefix:"O_" ~allow_intersecting:false *)
module Open_flags = struct
  external get_rdonly   : unit -> Int63.t = "mylib_O_RDONLY"   [@@noalloc]
  external get_wronly   : unit -> Int63.t = "mylib_O_WRONLY"   [@@noalloc]
  external get_rdwr     : unit -> Int63.t = "mylib_O_RDWR"     [@@noalloc]
  external get_nonblock : unit -> Int63.t = "mylib_O_NONBLOCK" [@@noalloc]
  external get_append   : unit -> Int63.t = "mylib_O_APPEND"   [@@noalloc]
  external get_creat    : unit -> Int63.t = "mylib_O_CREAT"    [@@noalloc]
  external get_trunc    : unit -> Int63.t = "mylib_O_TRUNC"    [@@noalloc]
  external get_excl     : unit -> Int63.t = "mylib_O_EXCL"     [@@noalloc]
  external get_noctty   : unit -> Int63.t = "mylib_O_NOCTTY"   [@@noalloc]
  external get_dsync    : unit -> Int63.t = "mylib_O_DSYNC"    [@@noalloc]
  external get_sync     : unit -> Int63.t = "mylib_O_SYNC"     [@@noalloc]
  external get_rsync    : unit -> Int63.t = "mylib_O_RSYNC"    [@@noalloc]

  let rdonly   = get_rdonly   ()
  let wronly   = get_wronly   ()
  let rdwr     = get_rdwr     ()
  let nonblock = get_nonblock ()
  let append   = get_append   ()
  let creat    = get_creat    ()
  let trunc    = get_trunc    ()
  let excl     = get_excl     ()
  let noctty   = get_noctty   ()
  let dsync    = get_dsync    ()
  let sync     = get_sync     ()
  let rsync    = get_rsync    ()

  include Flags.Make(struct
      let known =
        [ rdonly   , "rdonly"
        ; wronly   , "wronly"
        ; rdwr     , "rdwr"
        ; nonblock , "nonblock"
        ; append   , "append"
        ; creat    , "creat"
        ; trunc    , "trunc"
        ; excl     , "excl"
        ; noctty   , "noctty"
        ; dsync    , "dsync"
        ; sync     , "sync"
        ; rsync    , "rsync"
        ]
      let remove_zero_flags = false
      let allow_intersecting = false
      let should_print_error = true
    end)
end
(*$*)

And cinaps will check that the text between the (*$ ... *) and (*$*) forms is what is printed by flags_module "Open_flags" .... I write something similar in the .c file. Note the initial (*$ ... $*) form, which is not expected to print anything and is used only for its side effects (here, loading the helper file).

Adding new flags becomes trivial: add them to the list in src/cinaps_helpers and run make cinaps.

Pushing the system

Now I decide that I don't like the fact that all my constant flags are initialized at runtime, and I want them to be static constants on the ml side. A simple way to do this is to write a C program that includes the right headers and outputs a .ml file defining these constants. I use cinaps to write this C file as well:

/*$ #use "cinaps_helpers" $*/

#include <stdio.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
  printf("open Core\n");
  printf("let mk = Int63.of_int_exn\n");
  /*$
    printf "\n";
    let len = longest all_flags in
    List.iter all_flags ~f:(fun f ->
      pr {|  printf("let _%-*s = mk %%d\n", %-*s);|} len f len f);
    printf "  " */
  printf("let _O_RDONLY   = mk %d\n", O_RDONLY  );
  printf("let _O_WRONLY   = mk %d\n", O_WRONLY  );
  printf("let _O_RDWR     = mk %d\n", O_RDWR    );
  printf("let _O_NONBLOCK = mk %d\n", O_NONBLOCK);
  printf("let _O_APPEND   = mk %d\n", O_APPEND  );
  printf("let _O_CREAT    = mk %d\n", O_CREAT   );
  printf("let _O_TRUNC    = mk %d\n", O_TRUNC   );
  printf("let _O_EXCL     = mk %d\n", O_EXCL    );
  printf("let _O_NOCTTY   = mk %d\n", O_NOCTTY  );
  printf("let _O_DSYNC    = mk %d\n", O_DSYNC   );
  printf("let _O_SYNC     = mk %d\n", O_SYNC    );
  printf("let _O_RSYNC    = mk %d\n", O_RSYNC   );
  /*$*/
  return 0;
}
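
Compiling and running this C program produces the .ml file for the Consts module used below. Its content would look something like this (the values are platform-dependent; the ones shown are typical Linux values, for illustration only):

open Core
let mk = Int63.of_int_exn
let _O_RDONLY   = mk 0
let _O_WRONLY   = mk 1
let _O_RDWR     = mk 2
let _O_NONBLOCK = mk 2048
...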

Updating the various flag modules in the ml code is then as simple as editing src/cinaps_helpers and running make cinaps:

(*$ flags_module "Open_flags" open_flags ~prefix:"O_" ~allow_intersecting:false *)
module Open_flags = struct
  let rdonly   = Consts._O_RDONLY
  let wronly   = Consts._O_WRONLY
  let rdwr     = Consts._O_RDWR
  let nonblock = Consts._O_NONBLOCK
  let append   = Consts._O_APPEND
  let creat    = Consts._O_CREAT
  let trunc    = Consts._O_TRUNC
  let excl     = Consts._O_EXCL
  let noctty   = Consts._O_NOCTTY
  let dsync    = Consts._O_DSYNC
  let sync     = Consts._O_SYNC
  let rsync    = Consts._O_RSYNC

  include Flags.Make(struct
      let known =
        [ Consts._O_RDONLY   , "rdonly"
        ; Consts._O_WRONLY   , "wronly"
        ; Consts._O_RDWR     , "rdwr"
        ; Consts._O_NONBLOCK , "nonblock"
        ; Consts._O_APPEND   , "append"
        ; Consts._O_CREAT    , "creat"
        ; Consts._O_TRUNC    , "trunc"
        ; Consts._O_EXCL     , "excl"
        ; Consts._O_NOCTTY   , "noctty"
        ; Consts._O_DSYNC    , "dsync"
        ; Consts._O_SYNC     , "sync"
        ; Consts._O_RSYNC    , "rsync"
        ]
      let remove_zero_flags = false
      let allow_intersecting = false
      let should_print_error = true
    end)
end
(*$*)

Tweak: indenting the generated code

You can either write cinaps code that produces properly indented code, or you can use the -styler option:

.PHONY: cinaps
cinaps:
    cinaps -styler ocp-indent -i src/*.ml src/*.c

History behind the name

I initially wrote this tool while doing some work on the ocaml-migrate-parsetree project. ocaml-migrate-parsetree was started by Alain Frisch and continued by Frederic Bour, and aims at providing a solid and stable base for authors of ppx rewriters and other tools using the OCaml frontend. I helped a bit during development and did some large-scale testing while rebasing our ppx infrastructure on top of it.

Due to its nature, this project contains a lot of repetitive code that cannot be factored out other than with some kind of meta-programming. Initially we had a small pre-processor that interpreted a made-up syntax, working the way cpp does. The syntax was yet another DSL and the code was generated on the fly. This made the .ml and .mli files harder to understand, since you had to decode this DSL in order to understand what the code was.

Cinaps replaced this tool, and its name was chosen to emphasize that it is not a preprocessor: it stands for "Cinaps Is Not A Preprocessing System".

Status

Cinaps is published on github and is part of the upcoming v0.9 Jane Street release. The published version doesn't yet support the C/S-expression syntaxes, but once the stable release has gone through, an updated version of Cinaps supporting these syntaxes will be released.

A solution to the ppx versioning problem

Ppx is a preprocessing system for OCaml where one maps over the OCaml abstract syntax tree (AST) to interpret some special syntax fragments to generate code.

Ppx rewriters get to work on the same AST definition as the compiler, which has many advantages:

  • The AST corresponds (almost) exactly to the OCaml language. This is not completely true as the AST can represent programs that you can't write, but it's quite close.

  • Given that the compiler and pre-processor agree on the data-type, they can communicate with each other using the unsafe [Marshal] module, which is a relatively cheap and fast way of serializing and deserializing OCaml values.

  • Finally, the biggest advantage for the user is that the locations in the original code are exactly preserved, which is a requirement for usable error messages. This is not so great for the generated code, as the best one can do is reuse some locations from the original source code and hope for the best. In practice the user sometimes gets nonsensical errors, but this is a commonly accepted trade-off.

There is however one drawback to all this: the compiler AST is not stable, and code using it is at the mercy of its evolution. We got lucky with the 4.04 release of OCaml, but the 4.03 one was quite disruptive. Even before releases, whenever the AST definition changes during a development cycle, many ppx rewriters can become unusable for a while, which makes testing a lot harder.

Several ideas have been floating around, such as adding a layer to convert between different versions of the AST. While this would work, it has the drawback that you need this layer for every variant of the AST. And when you want to make a patch modifying the AST, you'll need to do the extra work of updating this layer first.

In this blog post we show how we managed to solve the ppx compatibility problem in a way that improves the user experience and lets us produce releases that don't depend on ppx rewriters at all.

We did this work while working on Base, our upcoming standard library. In the end, it's likely we'll use only the third of the methods described below for Base, while the others will be used to improve the user experience with the rest of our packages.

What do other code generators do?

Ppx is not the first system for generating code that is a mix of user-written code and machine-generated code. A typical class of generators that get it right, i.e. that preserve locations and are independent from the AST definition, are lexer/parser generators, and not only the ones distributed with the compiler.

Let's take the example of lexer generators (parser generators work basically the same way). The user writes a series of rules, each consisting of a regular expression and an action to take if the input matches it:

rule token = parse
| "+"  { PLUS  }
| "-"  { MINUS }
| '0'..'9'+ as s { INT (int_of_string s) }
| " "* { token lexbuf (* skip blanks *) }

This code is written in a .mll file and the generator then produces a .ml file with code for the lexing engine interleaved with the user written actions.

In order to keep the locations of the user-written code pointing to the right place in the .mll file, the generator produces:

# 42 "lexer.mll"
         token lexbuf (* skip blanks *)

The OCaml compiler interprets the line starting with a # and updates its current location to point to the beginning of line 42 in file lexer.mll. This is called a line directive.

To go back into the generated code, the lexer generator produces:

# 100 "lexer.ml"

Where 100 corresponds to the real line number in lexer.ml.

With this method, when there is an error in the user-written code, it points to the lexer.mll file, while when there is an error in the generated code it points to the lexer.ml file. Even if the generated code might not be particularly easy to understand, at least you get to see the real code the compiler chokes on.

Another big advantage is that when using a debugger, you can follow the execution through the generated code.

Can we do the same for ppx?

At first glance, ppx rewriters seem to work in a very different way, but the result is the same: only parts of the file are generated, and the rest is taken verbatim from what the user wrote. In fact, compared to the lexer case, most of the resulting code is user-written.

There is however some work to do to get the same result as with lexer generators. First you have to distinguish the generated code from the user code.

If you take a ppx rewriter as a black box, then the only way is to apply some kind of tree diff between the input and the output. In our ppx framework however, we know exactly what fragments of the AST are rewritten by plugins and we know the rewriting is always local. This makes the job a lot simpler and probably faster as well, so we chose to take advantage of this information.

The method

It works this way: while mapping the AST, we collect all the fragments of generated code along with the location of the code they replace in the original file. At the end we sort them in file order and make sure there is no overlap. Every fragment is pretty-printed to a string.

What we end up with is a list of text substitutions: beginning position, end position, replacement text. The next step is simply to apply these substitutions to the original file. If you read the blog post about how we switched from camlp4 to ppx, you'll notice the resemblance here.
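
Applying such a list of substitutions is a simple fold over the file contents. The function below is an illustrative sketch rather than the actual driver code; it assumes the substitutions are sorted by position, non-overlapping, and expressed as byte offsets:

(* Apply sorted, non-overlapping substitutions [(start, stop, replacement)]
   to [contents], copying the untouched text in between. *)
let apply_substitutions contents substs =
  let buf = Buffer.create (String.length contents) in
  let pos =
    List.fold_left
      (fun pos (start, stop, repl) ->
         Buffer.add_substring buf contents pos (start - pos);
         Buffer.add_string buf repl;
         stop)
      0 substs
  in
  Buffer.add_substring buf contents pos (String.length contents - pos);
  Buffer.contents buf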

This is what the transformation looks like:

(* ----- input ----- *)
type t = int [@@deriving sexp_of]

let f x = x + 1

let g x = [%sexp_of: t] x

(* ----- output ----- *)
# 1 "foo.ml"
type t = int [@@deriving sexp_of]
# 4 "foo.ml.pp"
let sexp_of_t = sexp_of_int
# 2 "foo.ml"

let f x = x + 1

let g x =
# 11 "foo.ml.pp"
sexp_of_t
# 5 "foo.ml"
                        x

The result for [@@deriving sexp_of] is not bad at all. For code rewritten inside expressions, the result is not as good, given that it breaks those expressions up. But given that extensions are often sparse in our source files, this is still acceptable.

This mode can be selected with ppx_driver-based rewriters by passing the -reconcile flag.

Solving the compatibility problem

With this mode, one can first generate a .ml.pp file and feed that to the compiler. Given that the concrete syntax of the language breaks much less often than the internal AST definition, a working ppx is likely to work for a very long time.
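
Concretely, assuming a Makefile and a driver executable ppx.exe that accepts an input file, a -o flag for the output file, and the -reconcile flag shown above (the exact command line is an assumption here, not a documented interface), the rules could look like:

%.ml.pp: %.ml
    ./ppx.exe -reconcile $< -o $@

%.cmo: %.ml.pp
    ocamlfind ocamlc -c -impl $< -o $@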

We'll soon start releasing a separate package that snapshots one version of the lexer/parser/AST/printer of the OCaml compiler. This package will have its own release schedule and will typically be updated soon after each release of OCaml. This will give time for ppx authors to upgrade their code when it breaks, while still allowing people to try out the new compiler with their favorite packages.

Mode for release tarballs

In addition to the mode described above, ppx_driver has a second mode, -reconcile-with-comments, where the result is similar to the one with line directives except that the generated code is enclosed with comment markers:

type t = int [@@deriving sexp_of]
(* GENERATED CODE BEGIN *)
let sexp_of_t = sexp_of_int
(* GENERATED CODE END *)

let f x = x + 1

let g x =
(* GENERATED CODE BEGIN *)
sexp_of_t
(* GENERATED CODE END *)
                        x

This mode is intended for release tarballs. One can replace all the files in place by the pre-processed version using -reconcile-with-comments. The result is readable and has the big advantage that you don't need to depend on the ppx rewriter, which means the package is faster to install for users.

Jane Street packages will eventually move to this scheme, either for the next stable release or the one after that. One technical issue with this method is that to take full advantage of it, the runtime support libraries of the various ppx rewriters must be installable without the rewriter itself. Splitting the packages in opam is fine, but splitting the repository is not desirable, as often both components make strong assumptions about each other.

For Jane Street packages, we'll need to update our release system so that it supports generating two opam packages from one repository.

ppx as a verification tool only

While these new methods improve the ppx story in general, for Base we wanted to go even further and allow users to build Base without the need for ppx at all, both for the release and for the development versions. Not only to cut down the dependencies, but also to provide a better experience in general. For instance if you are working on a patched compiler and need the development version of Base, you shouldn't need all the ppx rewriters that might not work for some reason.

We explored various bootstrap stories, and while they worked, they were not very nice, especially for such an important library. Its development and build processes should be straightforward.

We even looked into not using ppx at all. While this is OK for many ppx rewriters that are mostly syntactic sugar, it is more problematic for [@@deriving ...]. It's not so much that the code is hard to write by hand; most data-structures in Base are either simple datatypes or require hand-written combinators anyway. But it is a pain to review: this code is very mechanical, and you have to check that the constant strings correspond to the constructor/field names and other things where the machine can do much better than a human.

In the end we found a solution that keeps the best of both worlds, i.e. being able to build the original source code without pre-processing while avoiding having to write and review this boilerplate code.

The idea is to use ppx in the same way we write expect tests: the tool only checks that what comes after the type definition corresponds to what the rewriters derive from it. In case of a mismatch, it produces a .corrected file, just like expect tests.

We are currently experimenting with this method for Base. It's possible that we'll have some marker to delimit the end of the generated code. In the end the code could look like this:

type t = A | B [@@deriving sexp_of]

let sexp_of_t = function
  | A -> Sexp.Atom "A"
  | B -> Sexp.Atom "B"

[@@@end_of_derived_code]

Given that the compiler ignores attributes it doesn't understand, this code compiles just fine without any pre-processing.

When running the ppx rewriter in this expect mode, the generated AST is matched against the source AST without taking locations into account, which means you can reformat the code as you wish and even add comments.
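
One simple way to implement such a location-insensitive comparison is to erase all locations before comparing structurally. The sketch below uses compiler-libs' Ast_mapper and is illustrative only, not necessarily how our implementation works:

(* Erase every location in a structure, then compare structurally. *)
let strip =
  { Ast_mapper.default_mapper with
    location = (fun _ _loc -> Location.none) }

let equal_modulo_locations a b =
  Ast_mapper.(strip.structure strip a = strip.structure strip b)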

The challenge now is to update our ppx rewriters so that they produce code that we are happy to show. Until now we didn't focus too much on that, but we have a good idea of how to do it. The plan is to move more of the logic of the various deriving systems into proper functions, instead of generating more code. Note that this is an improvement in general, as proper functions are a lot easier to understand and maintain than code generators.
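
As a hypothetical illustration of this plan (made up for this post, not the actual output of any of our rewriters), compare a fully inlined comparison function with one where the generator only emits glue around a runtime combinator:

type t = { x : int; y : string }

(* What a naive generator inlines: correct, but tedious to review. *)
let compare_inlined a b =
  match compare a.x b.x with
  | 0 -> compare a.y b.y
  | n -> n

(* A runtime combinator, written once in a support library. *)
let lexicographic cmps a b =
  List.fold_left (fun acc cmp -> if acc <> 0 then acc else cmp a b) 0 cmps

(* What the generator emits instead: much closer to what a human would
   write and review. *)
let compare_via_combinator =
  lexicographic
    [ (fun a b -> compare a.x b.x)
    ; (fun a b -> compare a.y b.y)
    ]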

Conclusion

In this blog post we described a simple and clean method to decouple ppx rewriters from the release schedule of the compiler. This method has the advantage that once a ppx rewriter is written, it is likely to work for a long time, and especially to work out of the box with development compilers.

Moreover, this method is better for users, as errors point to the real code the compiler sees, and when debugging they can follow the execution through generated code without trouble.

All this is currently implemented in ppx_core/ppx_driver. Our github repositories haven't been updated in a while, as the Base refactoring disrupted our public release process quite a bit. These new features should be published in the coming weeks and will be part of the next stable release of our packages, planned for the beginning of December.

ppx_core: context-free rewriters for better semantics and faster compilation

At Jane Street, we have always been heavy users of pre-processors, first with camlp4 and now ppx. Pre-processing makes the infrastructure a bit more complex, but it saves us a lot of time by taking care of a lot of tedious boilerplate code, and in some cases makes the code a bit prettier.

All in all, our standard set has 19 rewriters:

  • ppx_assert
  • ppx_bench
  • ppx_bin_prot
  • ppx_compare
  • ppx_custom_printf
  • ppx_enumerate
  • ppx_expect
  • ppx_fail
  • ppx_fields_conv
  • ppx_here
  • ppx_inline_test
  • ppx_js_style*
  • ppx_let
  • ppx_pipebang
  • ppx_sexp_conv
  • ppx_sexp_message
  • ppx_sexp_value
  • ppx_typerep_conv
  • ppx_variants_conv

These rewriters fall into 3 big categories:

  1. type driven code generators: ppx_sexp_conv, ppx_bin_prot, ...
  2. inline tests and benchmarks: ppx_inline_test, ppx_expect, ppx_bench
  3. convenience: ppx_sexp_value, ppx_custom_printf, ...

The first category is the one that definitely justifies the use of pre-processors, until we get something better in the language itself.

With such a high number of code transformations, there is an important question of how they compose with each other. For instance what happens if the output of a ppx generates some code that is rewritten by another ppx?

Since the switch from camlp4 to ppx about a year ago, category 1 transformers have been handled all at once as a whole-AST mapping pass by ppx_type_conv, while all the other ones were implemented as separate passes. With the previous list, that means 13 passes, given that 7 of them are ppx_type_conv plugins. This means that the output depended on the order in which the various passes were applied.

Intuitively, one would think it's not a big deal, given that it is quite rare for a ppx to produce code that would be rewritten by another ppx. Still we ran into several issues over time:

  • Some ppx rewriters - such as ppx_inline_test, which rewrites [%%test ...] extensions - capture a pretty-print of their payload for debugging purposes. Depending on when ppx_inline_test is applied, the payload won't be the same, as it might have been expanded by other ppx rewriters, which is confusing for users.
  • A few ppx rewriters interpret the payload of a specific extension point as a DSL. This is the case for ppx_sexp_value and ppx_sexp_message. If another ppx messes with the payload before them, the result is unspecified. We had such an issue with ppx_here: inside [%sexp ...], [%here] is interpreted by ppx_sexp_value and ppx_sexp_message and produces "<filename>:<line>:<column>", while outside it is interpreted by ppx_here and produces a record of type Lexing.position.

Initially we dealt with these issues by using a specific order in the default set of rewriters, but that's more of a dirty hack than a real solution. Developers are often not aware of this and might end up using a wrong order when using a custom set of rewriters. Moreover, this worked because we control the order with Jenga, but in open-source packages using oasis, ocamlbuild and ocamlfind, we have no control over the final ordering.

But apart from the semantic problems, there is an obvious performance problem: all the transformations are local, but still we are doing 12 passes over the entire AST. What a waste of CPU time!

The different ways of composing ppx rewriters

Before jumping into the subject of this post, we recall a few of the various methods one can use to compose ppx rewriters.

Via separate processes

The default method, adopted early by the community, is to define each transformation as a separate executable. To compose them, one just calls all the executables one after the other. The main advantage of this approach is that each transformation is a black box and can do whatever dirty hacks it wants.

This is what you get when you are using a ppx by just putting the package name in your build system without doing anything special.

Via a driver

Another approach, which we developed at Jane Street, is to link all the transformations into a single executable. For this to work properly, all transformations must use the same framework. Technically, they all register themselves with ppx_driver via a call to Ppx_driver.register_transformation; ppx_driver is then responsible for composing them.

There are several advantages to the second approach: since ppx_driver has knowledge of all transformations, it can do extended checks, such as making sure that all attributes have been interpreted. This helps detect typos, which in practice saves a lot of debugging time. But what really interests us in this post is that it can use more clever composition methods.

Code transformations using ppx_driver can still export a single executable compatible with the first method; that's why all Jane Street ppx rewriters can be used with both methods.

ppx_driver has an ocamlbuild plugin to simplify building custom drivers.

Context-free transformations

Given that all transformations are local, it was clear that they should be defined as such; i.e. if all you want to do is turn [%sexp "blah"] into Sexp.Atom "blah", you don't need to visit the whole AST yourself. You just need to instruct whatever framework you are using that you want to rewrite [%sexp ...] extension points.

Context-free extension expander

We started with this idea a few months ago by adding an API in ppx_core to declare context-free extension expanders. For instance, this is how you would declare a ppx that interprets an extension [%foo ...] inside expressions:

open Ppx_core.Std

let ext =
  Extension.declare "foo" Expression
    Ast_pattern.(...)
    (fun ~path ~loc <parsed-payload...> -> <expansion>)

let () = Ppx_driver.register_transformation "foo" ~extensions:[ext]

The Ast_pattern.(...) bit describes what the extension expects as its payload.
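
To make this concrete, here is a hedged sketch of a complete expander that rewrites [%foo "blah"] into the string constant "BLAH". The Ast_pattern and Ast_builder combinators are named as in later versions of this API, so treat the details as assumptions rather than the exact interface of that era:

open Ppx_core.Std

(* Accept exactly one string constant as payload, and expand to the
   uppercased string literal, keeping the extension's location. *)
let ext =
  Extension.declare "foo" Expression
    Ast_pattern.(single_expr_payload (estring __))
    (fun ~path:_ ~loc s ->
       Ast_builder.Default.estring ~loc (String.uppercase_ascii s))

let () = Ppx_driver.register_transformation "foo" ~extensions:[ext]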

Since ppx_driver knows about all the local extension expanders, it can expand them all in one pass over the AST. Moreover it can detect ambiguities and error out in such cases.

There was a choice to make as to whether to rewrite the AST in a bottom-up or top-down manner. We chose top-down, to allow extension expanders to interpret their payload before anyone else, so they can correctly implement a DSL.

This solved most of the initial issues and reduced the number of passes to 7:

  • all extension expanders
  • ppx_type_conv
  • ppx_custom_printf
  • ppx_expect
  • ppx_fail
  • ppx_pipebang
  • ppx_js_style

ppx_expect wasn't initially defined as a context-free extension expander for technical reasons.

Making everything context-free

Recently we went even further and added a Context_free module to Ppx_core to cover all of our transformations. It doesn't support all possible rewritings, but it supports enough to implement a lot of common ones:

  • context-free extension expanders
  • some specific support to implement type-driven code generators
  • support for ppx rewriters interpreting a function application at pre-processing time, such as ppx_custom_printf that interprets !"<format>"

With this we reduced the number of passes to only 2:

  • context free transformations
  • ppx_js_style

ppx_js_style is still done in a separate pass for simplicity. It is run last to ensure we don't generate code that doesn't match our coding rules.

Now, whatever order developers specify their ppx rewriters in their build system, they will get the exact same output.

Seeing the exact passes

Ppx_driver got a new option to print the passes it will execute; for instance with ppx-jane, which is a standard driver containing all of the Jane Street ppx rewriters linked in (available in the ppx_jane package in opam):

$ ppx-jane -print-passes
<builtin:freshen-and-collect-attributes>
<builtin:context-free>
<builtin:check-unused-attributes>
<builtin:check-unused-extensions>

$ ppx-jane -print-passes -no-check
<builtin:context-free>

Safety checks are implemented as additional passes; that's why we see more than one pass by default.

Numbers

No performance comparison was done when we introduced context-free extension expanders, but we did one for the second stage, when we changed all ppx rewriters to use Context_free: processing a file with the resulting driver was twice as fast (check passes included).

But how does this compare to the more traditional method of running each rewriter in a separate process? To find out, we did some benchmarking by taking one of the biggest ml files in core_kernel (src/command.ml) and comparing the two methods. We put a type error on the first line to be sure we stop just after pre-processing.

For reference, following are the numbers for calling ocamlfind ocamlc on the file with no pre-processing:

$ time ocamlfind ocamlc -c command.ml
File "command.ml", line 1, characters 12-15:
Error: This expression has type char but an expression was expected of type
         int

real 0m0.022s
user 0m0.016s
sys  0m0.006s

To preprocess the file with ppx_jane as a single driver executable, one just has to pass a single -ppx option, or a -pp option, given that ppx_driver can be used either as a -ppx or as a -pp:

# via -ppx
$ time ocamlfind ocamlc \
  -ppx 'ppx-jane -as-ppx -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.095s
user 0m0.074s
sys  0m0.020s

# via -pp
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.091s
user 0m0.066s
sys  0m0.024s

# via -pp, with checks disabled
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-check -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.070s
user 0m0.051s
sys  0m0.018s

# via -pp, without merging passes
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-merge -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.229s
user 0m0.206s
sys  0m0.022s

Using the other method turned out to be quite painful: given that the various ppx rewriters cannot share command-line arguments, these had to be specified more than once:

$ time ocamlfind ocamlc -package ppx_jane \
  -ppxopt "ppx_inline_test,-inline-test-lib blah -inline-test-drop" \
  -ppxopt "ppx_bench,-inline-test-lib blah -bench-drop" \
  -ppxopt "ppx_expect,-inline-test-lib blah" \
  -c command.ml 2> /dev/null

real 0m0.339s
user 0m0.233s
sys  0m0.098s

So, unsurprisingly, the single-pass, single-executable method is a lot faster.

Availability

This code is available on github. The context-free extension point API is already available in opam. The newer one is only in the git repositories of ppx_core and ppx_driver. You can try them out by using our development opam repository. You should have a look at this if you care about how your rewriters are composed and/or about compilation speed.

  • ppx_js_style is not currently released; it is an internal ppx that we use to enforce our coding standards.

Converting a code base from camlp4 to ppx

As with many projects in the OCaml world, at Jane Street we have been working on migrating from camlp4 to ppx. After having developed equivalent ppx rewriters for our camlp4 syntax extensions, the last step is to actually translate the source code of all our libraries and applications from the camlp4 syntax to the standard OCaml syntax with extension points and attributes.

For instance to translate code using pa_ounit and pa_test, we have to rewrite:

TEST = <:test_result< int >> ~expect:42 (f x)

to:

let%test _ = [%test_result: int] ~expect:42 (f x)

For small to medium projects, it is enough to take a couple of hours and translate the source code by hand. But at Jane Street, where we have a huge OCaml code base making extensive use of camlp4, that is simply not realistic. So we needed a tool to do it for us.

Writing a tool to automatically convert the syntax

Since the output of such a tool has to be accepted as the new source code that is committed in our repository, it must preserve the layout of the original file as much as possible and, of course, keep the comments. This means that any approach using an AST pretty-printer would be extremely complex.

The path we chose is to textually substitute the foreign syntaxes in the original file with the new ones. One could imagine doing that with a tool such as sed, awk, perl, ...; however, doing it properly would be tedious, and it would be pretty much impossible to be 100% sure it would never translate things it is not supposed to. Plus, writing perl is not as fun as writing OCaml programs.

Instead, there is an easy way to find the foreign syntaxes: using camlp4 itself. To substitute the text of foreign syntaxes, the only thing we need to know is their location in the original file, and camlp4 can help us with that.

Writing dummy camlp4 syntax extensions

The idea is to write, for each camlp4 syntax extension, a dummy one that defines the same grammar productions as the real one, but instead of generating code simply records substitutions at certain locations.

Then we do the following:

  • parse a file with camlp4 and our dummy syntax extensions
  • apply all the recorded substitutions to the original file

This approach has the advantage of interpreting the original file in exactly the same way as our regular syntax extensions, giving us good confidence that we did not change the syntactic constructions by mistake.

To do so we define this API:

(** [replace loc repl] records a text substitution that replaces the
    portion of text pointed by [loc] by [repl]. *)
val replace : Loc.t -> string -> unit
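
A minimal implementation of this API just accumulates triples of offsets and replacement text, to be applied to the original file afterwards. This is a sketch, assuming camlp4's Loc.start_off and Loc.stop_off offset accessors:

(* Record (start offset, stop offset, replacement) triples; they are
   sorted and applied to the original file at the end. *)
let substitutions : (int * int * string) list ref = ref []

let replace loc repl =
  substitutions :=
    (Loc.start_off loc, Loc.stop_off loc, repl) :: !substitutions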

Then writing a dummy camlp4 syntax extension is pretty easy. For instance for a subset of pa_ounit:

EXTEND Gram
  GLOBAL: str_item;

  test: [[ "TEST" -> replace _loc "let%test" ]];

  name_equal:
    [[ `STRING _; "=" -> ()
     |            "=" -> replace _loc "_ ="
     ]];

  str_item:
    [[ test; name_equal; expr -> <:str_item< >>
    ]];
END

On the fly conversion and diffing the generated code

Since this tool was convenient to use, we also used it to check that our newly written ppx rewriters did the same thing as the old camlp4 syntax extensions:

  • for a given OCaml source file of a library or application, we converted it using camlp4-to-ppx and saved the result
  • we processed the original file using camlp4 and the translated one using our ppx rewriters
  • in both cases we saved the output of -dparsetree (a human-readable version of the internal OCaml AST) and -dsource (pretty-printed code)
  • we diffed the camlp4 and ppx outputs of -dparsetree, as well as the outputs of -dsource

This was all quite easy to do with jenga. We kept looking at the generated diffs until they were all empty.

We have quite a lot of code in our camlp4 syntax extensions, and converting them to ppx was a long, mechanical, and thus error-prone job. This diffing turned out to be really helpful for finding errors.

The Camlp4 Syntax is not quite the OCaml syntax

While using this tool we noticed that quite a few syntaxes accepted by camlp4 are not accepted by OCaml, for instance:

let _ x = x

let f l = List.map l ~f:fun x -> x + 1

These were quite easy to fix automatically as well using camlp4-to-ppx.

Github repo and extension

We published a slightly modified version of this tool on github.

The method we used doesn't work out of the box with all syntax extensions. For instance, to convert code using lwt.syntax, some more work needs to be done on camlp4-to-ppx. But it is a good starting point.
