At Jane Street, we have always been heavy users of pre-processors, first with camlp4 and now ppx. Pre-processing makes the infrastructure a bit more complex, but it save us a lot of time by taking care of a lot of tedious boilerplate code and in some case makes the code a bit prettier.

All in all, our standard set has 19 rewriters:

  • ppx_assert
  • ppx_bench
  • ppx_bin_prot
  • ppx_compare
  • ppx_custom_printf
  • ppx_enumerate
  • ppx_expect
  • ppx_fail
  • ppx_fields_conv
  • ppx_here
  • ppx_inline_test
  • ppx_js_style*
  • ppx_let
  • ppx_pipebang
  • ppx_sexp_conv
  • ppx_sexp_message
  • ppx_sexp_value
  • ppx_typerep_conv
  • ppx_variants_conv

These rewriters fall into 3 big categories:

  1. type driven code generators: ppx_sexp_conv, ppx_bin_prot, ...
  2. inline tests and benchmarks: ppx_inline_test, ppx_expect, ppx_bench
  3. convenience: ppx_sexp_value, ppx_custom_printf, ...

The first category is the one that definitely justify the use of pre-processors, until we get something better in the language itself.

With such a high number of code transformations, there is an important question of how they compose with each other. For instance what happens if the output of a ppx generates some code that is rewritten by another ppx?

Since the switch from camlp4 to ppx a year ago really, category 1 transformers were handled all at once as a whole-AST mapping pass by ppx_type_conv while all the other one were implemented as separate passes. With the previous list that means 13 passes, given that 7 of them are ppx_type_conv plugins. This means that the output depended on the order in which the various passes were applied.

Intuitively, one would think it's not a big deal, given that it is quite rare for a ppx to produce code that would be rewritten by another ppx. Still we ran into several issues over time:

  • Some ppx rewriters - such as ppx_inline_test that rewrites [%%test ...] extensions - captures a pretty-print of their payload, for debugging purposes. Depending on when ppx_inline_test is applied, the payload won't be the same, as it might have been expanded by other ppx rewriters, which is confusing for users.
  • A few ppx rewriters interpret the payload of a specific extension point as a DSL to be interpreted. This is the case of ppx_sexp_value and ppx_sexp_message. If another ppx messed with the payload before them, the result will be unspecified. We had such an issue with ppx_here: inside [%sexp ...], [%here] is interpreted by ppx_sexp_value and ppx_sexp_message and produces "<filename>:<line>:<column>", while outside it is interpreted by ppx_here and produces a record of type Lexing.position

Initially we dealt with these issues by using a specific order in the default set of rewriters, but that's more of a dirty hack than a real solution. Often developers are not aware of this and might end up using a wrong order when using a custom set of rewriters. Moreover this worked because we have control over the order with Jenga, but in opensource packages using oasis, ocamlbuild and ocamlfind we have no control over the final ordering.

But apart from the semantic problems, there is an obvious performance problem: all the transformations are local, but still we are doing 12 passes over the entire AST. What a waste of CPU time!

The different ways of composing ppx rewriters

Before jumping into the subject of this post, we recall a few of the various methods one can use to compose ppx rewriters.

Via separate process

The default method, that was adopted early by the community is to define each transformation as a separate executable. To compose them, one just has to call all the executables one by one. The main advantage of this approach is that each transformation is a black box and can do whatever dirty hacks it wants.

This is what you get when you are using a ppx by just putting the package name in your build system without doing anything special.

Via a driver

Another approach, that we developed at Jane Street is to link all the transformations into a single executable. For this to work properly all transformations must use the same framework. Technically they all register themselves with ppx_driver via a call to Ppx_driver.register_transformation. Ppx_driver is then responsible for composing them.

There are several advantages of the second approach: since ppx_driver has knowledge of all transformations, it can do extended checks such as making sure that all attributes have been interpreted. This helps detect typos, which in practice saves a lot of debugging time. But what really interest us in this post is that it can use more clever composition methods.

Code transformations using ppx_driver can still export a single executable compatible with the first method, that's why all Jane Street ppx rewriters can be used with both methods.

ppx_driver has an ocamlbuild plugin to simplify building custom drivers.

Context free transformations

Given that all transformations are local, it was clear that they should be defined as such; i.e. if all you want to do is turn [%sexp "blah"] into Sexp.Atom "blah", you don't need to visit the whole AST yourself. You just need to instruct whatever framework you are using that you want to rewrite [%sexp ...] extension points.

Context-free extension expander

We started with this idea a few month ago by adding an API in ppx_core to declare context-free extension expanders. For instance, this shows how you would declare a ppx that interpret an extension [%foo ...] inside expressions:

  1. open Ppx_core.Std
  2.  
  3. let ext =
  4. Extension.declare "foo" Expression
  5. Ast_pattern.(...)
  6. (fun ~path ~loc <parsed-payload...> -> <expansion>)
  7.  
  8. let () = Ppx_driver.register "foo" ~extensions:[ext]

The Ast_pattern.(...) bit describes what the extension expects as its payload.

Since ppx_driver knows about all the local extension expanders, it can expand them all in one pass over the AST. Moreover it can detect ambiguities and error out in such cases.

There was a choice to make as to whether rewrite the AST in a bottom-up or top-down manner. We choose top-down, to allow extension expanders to interpret their payload before anyone else, and so they can correctly implement a DSL.

This solved most of the initial issues and reduced the number of passes to 7:

  • all extension expanders
  • ppx_type_conv
  • ppx_custom_printf
  • ppx_expect
  • ppx_fail
  • ppx_pipebang
  • ppx_js_style

ppx_expect wasn't initially defined as a context-free extension expander for technical reasons.

Making everything context-free

Recently we went even further and added a Context_free module to Ppx_core to cover all of our transformations. It doesn't support all possible rewriting but support enough to implement a lot of common ones:

  • context-free extension expanders
  • some specific support to implement type-driven code generators
  • support for ppx rewriters interpreting a function application at pre-processing time, such as ppx_custom_printf that interprets !"<format>"

With this we reduced the number of passes to only 2:

  • context free transformations
  • ppx_js_style

ppx_js_style is still done in a separate pass for simplicity. It is run last to ensure we don't generate code that doesn't match our coding rules.

Now, whatever order developers specify their ppx in their build system, they will get the exact same output.

Seeing the exact passes

Ppx_driver got a new option to print what passes it will execute, for instance with ppx-jane which a standard driver containing all of the Jane Street ppx rewriters linked in (available in the ppx_jane package in opam):

$ ppx-jane -print-passes
<builtin:freshen-and-collect-attributes>
<bultin:context-free>
<builtin:check-unused-attributes>
<builtin:check-unused-extensions>

$ ppx-jane -print-passes -no-check
<bultin:context-free>

Safety checks are implemented as additional passes, that's why we see more than one passes by default.

Numbers

No performance comparison was done when introducing context free extension expanders, but we did some for the second stage, when we changed all ppx rewriters to use Context_free; processing a file with the resulting driver was twice as fast (check passes included).

But how does this compare to the more traditional method of running each rewriter in a separate process? To find out we did some benchmark by taking one of the biggest ml file in core_kernel (src/command.ml) and comparing the two methods. We put a type error on the first line to be sure we stop just after pre-processing.

For reference, following are the numbers for calling ocamlfind ocamlc on the file with no pre-processing:

$ time ocamlfind ocamlc -c command.ml
File "command.ml", line 1, characters 12-15:
Error: This expression has type char but an expression was expected of type
         int

real 0m0.022s
user 0m0.016s
sys  0m0.006s

To preprocess the file with ppx_jane as a single driver executable, one just has to pass one -ppx option, or a -pp option given that ppx_driver can be used either as a -ppx either as a -pp:

# via -ppx
$ time ocamlfind ocamlc \
  -ppx 'ppx-jane -as-ppx -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.095s
user 0m0.074s
sys  0m0.020s

# via -pp
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.091s
user 0m0.066s
sys  0m0.024s

# via -pp, with checks disabled
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-check -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.070s
user 0m0.051s
sys  0m0.018s

# via -pp, without merging passes
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-merge -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null 

real 0m0.229s
user 0m0.206s
sys  0m0.022s

Using the other method turned out to be quite painful, given that the various ppx cannot share command line arguments, they had to be specified more than once:

$ time ocamlfind ocamlc -package ppx_jane \
  -ppxopt "ppx_inline_test,-inline-test-lib blah -inline-test-drop" \
  -ppxopt "ppx_bench,-inline-test-lib blah -bench-drop" \
  -ppxopt "ppx_expect,-inline-test-lib blah" \
  -c command.ml 2> /dev/null

real 0m0.339s
user 0m0.233s
sys  0m0.098s

So without surprise the single pass in a single executable method is really a lot faster.

Availability

This code is available on github. The context-free extension point API is already available in opam. The newer one is only in the git repository for ppx_core and ppx_driver. You can try them out by using our development opam repository. You should have a look at this if you care about how your rewriters are composed and/or if you care about compilation speed.

  • ppx_js_style is not currently released; it is an internal ppx that we use to enforce our coding standards.