No (functional) experience required

Jane Street is a serious functional programming shop. We use OCaml, a statically typed functional language for almost everything and have what is probably the largest OCaml codebase anywhere.

This leads lots of people to think that they shouldn’t even bother applying, under the assumption that we are only interested in hiring deep functional programming gurus. I think people get to this conclusion in part because they think of functional languages, especially those with fancy type systems, as arcane tools that can only be used effectively after deep study.

To the contrary, one of the reasons we started building production systems with OCaml was that it was relatively easy to understand, even for people with no formal CS background. Since then, we’ve had good experiences taking students with no functional experience at all and getting them to the point of being able to complete a project in just a few weeks. We also have a very successful “OCaml Bootcamp” program, where over four weeks, we train all of the incoming traders and many other non-developer hires on OCaml and our development tools and libraries. By the end, most of them are able to create useful applications.

» Continue Reading.

Introducing Incremental

I’m pleased to announce the release of Incremental (well commented mli here), a powerful library for building self-adjusting computations, i.e., computations that can be updated efficiently when their inputs change.

At its simplest, you can think of a self-adjusting computation as a fancy spreadsheet. In a spreadsheet, each cell contains either simple data, or an equation that describes how the value in this cell should be derived from values in other cells. Collectively, this amounts to a graph-structured computation, and one of the critical optimizations in Excel is that when some of the cells change, Excel only recomputes the parts of the graph that depend on those changed cells.

» Continue Reading.

Converting a code base from camlp4 to ppx

As with many projects in the OCaml world, at Jane Street we have been working on migrating from camlp4 to ppx. After having developed equivalent ppx rewriters for our camlp4 syntax extensions, the last step is to actually translate the code source of all our libraries and applications from the camlp4 syntax to the standard OCaml syntax with extension points and attributes.

For instance to translate code using pa_ounit and pa_test, we have to rewrite:

TEST = <:test_result< int >> ~expect:42 (f x)

to:

let%test _ = [%test_result: int] ~expect:42 (f x)

For small to medium projects it is enough to just take a couple hours to translate the source code by hand. But at Jane Street where we have a huge OCaml code base making extensive use of camlp4, it is simply not realistic. So we needed a tool to do that for us.

» Continue Reading.

CPU Registers and OCaml

Even though registers are a low-level CPU concept, having some knowledge about them can help write faster code. Simply put, a CPU register is a storage for a single variable. CPU can keep data in memory or cache or in registers and registers are often much faster. Furthermore, some operations are possible only when the data is in registers. Hence, the OCaml compiler tries to keep as many variables as it can in the registers.

Code with more than 13 variables is slow?

Consider this primitive statistics computation:

let stats xs ys =
  let len = Array.length xs in
  if len Array.length ys then
    raise not_of_same_length;
  let sx = ref 0 in
  let sy = ref 0 in
  let sxx = ref 0 in
  let syy = ref 0 in
  let sxy = ref 0 in
  let sax = ref 0 in
  let say = ref 0 in
  for i = 0 to len - 1 do
    let x = Array.unsafe_get xs i in
    let y = Array.unsafe_get ys i in
    let ax = abs x in
    let ay = abs y in
    let xx = x * x in
    let yy = y * y in
    let xy = x * y in
    sx := !sx + x;
    sy := !sy + y;
    sxx := !sxx + xx;
    syy := !syy + yy;
    sxy := !sxy + xy;
    sax := !sax + ax;
    say := !say + ay;
  done;
  !sx, !sy, !sax, !say, !sxx, !syy, !sxy

Rearranging just a few lines produces code 1.5-2x faster:

» Continue Reading.

Reverse web proxy in ~50 lines of BASH

In the spirit of reinventing the wheel for fun, I hacked this together as a quick challenge to myself last week. It’s a little rough around the edges, but I thought it was too cute not to share. If you have any bug fixes, please post them in the comments.

#!/bin/bash

set -e -u

function usage() {
  echo "USAGE: ${0} parent {port} {lower_bound_backend_port} {upper_bound_backend_port}"
}

[[ $# < 1 ]] && usage && exit 1

mode=${1};shift
case ${mode} in
  parent)
    PORT=${1};shift
    LOWER=${1};shift
    UPPER=${1};shift
    socat TCP-LISTEN:${PORT},fork,reuseaddr "EXEC:${0} child ${LOWER} ${UPPER}"
    ;;
  child)
    LOWER=${1};shift
    UPPER=${1};shift
    COUNT=0
    PORT=$(shuf -i ${LOWER}-${UPPER} -n 1)
    let "RANGE = UPPER - LOWER"
    SUCCESS=false
    while ((COUNT <= RANGE)) || ${SUCCESS}; do 
      set +e
      if socat STDIN TCP:127.0.0.1:${PORT},connect-timeout=2; then
        SUCCESS=true
        break
      else
        echo "unable to connect to port ${PORT}" >&2
      fi
      set -e
      let COUNT+=1
      let PORT+=1
      if ((PORT > UPPER )); then
        let 'PORT = LOWER'
      fi
    done
    if ! ${SUCCESS}; then
      echo "HTTP/1.1 500 Internal Server Error"
      echo
      echo "All REDACTED servers are down.  Please report to REDACTED@janestreet.com."
      exit 1
    fi
    ;;
  *)
    usage
    exit 1
    ;;
esac

Building a lower-latency GC

We’ve been doing a bunch of work recently on improving the responsiveness of OCaml’s garbage collector. I thought it would be worth discussing these developments publicly to see if there was any useful feedback to be had on the ideas that we’re investigating.

The basic problem is a well-known one: GCs can introduce unpredictable pauses into your application, and depending on how your GC is configured, these pauses can be quite long. Unpredictable latencies are a problem in a wide variety of applications, from trading systems to web stacks.

One approach people often take is to avoid using the allocator altogether: pool all your objects, and never allocate anything else. You can even keep many of your pooled objects outside of the heap.

» Continue Reading.

Faster OCaml to C calls

The official OCaml documentation “Interfacing C with OCaml” doesn’t document some interesting performance features.

C functions with no OCaml allocation

A C call can allocate OCaml data and pass it back to OCaml, for example using caml_copy_string(s). Between the C call allocating OCaml data and passing it back, it has to make sure that OCaml’s Garbage Collector doesn’t collect it, as the Garbage Collector can be triggered during the C call. There’s an intricate mechanism which assures that, part of which are CAMLparam, CAMLlocal and CAMLreturn.

This mechanism can be bypassed if the C call is not going to allocate any OCaml data. This can yield performance benefits especially in shorter functions. To bypass it, CAMLparam, CAMLlocal and CAMLreturn should not be used and the primitive should be declared with “noalloc”.

» Continue Reading.

Why GADTs matter for performance

When GADTs (Generalized Algebraic Data Types) landed in OCaml, I wasn’t particularly happy about it. I assumed that it was the kind of nonsense you get when you let compiler writers design your programming language.

Which is to say that the standard GADT examples all seem to be about the kinds of things that compiler writers do, like embed domain-specific languages or build typed abstract-syntax trees. But it didn’t seem particularly relevant for the kind of systems programming that I think about.

But it became apparent pretty quickly that I was wrong. In particular, since GADTs have landed, at Jane Street we’ve found lots of examples where GADTs are important for performance, of all things. The theme in these examples is that GADTs enable you to tweak your memory representations in ways that would otherwise be painful or impossible to do safely in OCaml.

» Continue Reading.

A lighter Core

We recently released a version of our open source libraries with a much anticipated change — Async_kernel, the heart of the Async concurrent programming library, now depends only on Core_kernel rather than on Core.

This sounds like a dull, technical change, and it kind of is. But it’s also part of a larger project to make our libraries more lightweight and portable, and so suitable for a wider array of users and applications.

We’ve actually been working on these issues for a while now, and this seems like a good time to review some of the changes we’ve made over the years, and what’s still to come.

» Continue Reading.

Centralizing distributed version control, revisited

7 years ago, I wrote a blog post about how we at Jane Street were using our distributed version control system (hg, though the story would be the same for git) in a partially centralized way. Essentially, we built a centralized repo and a continuous integration system whose job was to merge in new changesets. The key responsibility of this system was to make sure that a change was rejected unless it merged, compiled and tested cleanly.

This half-distributed, half-centralized approach let us enjoy the benefits of a DVCS, while still getting the coherence and easy of sharing that comes from having a central authoritative source.

» Continue Reading.

4