At Jane Street, we often write OCaml programs that communicate with each other over the network, so we need to build lots of little protocols for those programs to use. Macro systems like sexplib and binprot make generating such protocols simpler. The basic workflow is to create a module that contains types corresponding to the messages in the protocol. Macros can then be used to generate the serialization and deserialization functions. Just share the protocol module between the different programs that need to communicate with each other, and, poof, you have a protocol. This is a highly convenient idiom, and it makes it much easier to quickly throw together a networked application. But things get more complicated when you need to start changing the protocol. In some cases, you can upgrade the entire system in one fell swoop. Then you can just modify your protocol, install the new system, and you’re off to the races.
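To make the shared-module idiom concrete, here is a minimal sketch. The `Protocol` module, its `message` type, and the line-based wire format are all made up for illustration. The two conversion functions are written by hand here only to keep the example free of dependencies; with sexplib or binprot they would be generated from the type definition instead.

```ocaml
(* Hypothetical protocol module, shared by every program that speaks the
   protocol. The message type and the text format are assumptions made for
   this sketch, not a real Jane Street protocol. *)
module Protocol = struct
  type message =
    | Ping
    | Put of { key : string; value : int }

  (* Serialize a message to one line of text.
     Assumes keys contain no spaces. *)
  let to_string = function
    | Ping -> "ping"
    | Put { key; value } -> Printf.sprintf "put %s %d" key value

  (* Parse a line back into a message; [None] on malformed input. *)
  let of_string line =
    match String.split_on_char ' ' line with
    | [ "ping" ] -> Some Ping
    | [ "put"; key; value ] ->
      (match int_of_string_opt value with
       | Some value -> Some (Put { key; value })
       | None -> None)
    | _ -> None
end
```

Both ends link against the same module, so a change to `message` shows up as a compile error in every program that has not been updated to handle it.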

The complicated (and more common) case is when you can’t afford to upgrade the entire system at once. Then you need to deal with version mismatches. The main approach we’ve taken to this problem is to make components support multiple versions of a given protocol at once. To do this, we keep around the modules that describe old versions of the protocol, and associate an explicit version number with each protocol module. We then write conversion functions that translate between different versions of the protocol, so that one program can speak several of them. When two components need to communicate, they first negotiate the version of the protocol they will speak to each other, picking the highest version that they both support.
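The negotiation step can be sketched in a few lines. This assumes, purely for illustration, that versions are plain integers and that each side advertises the full list it supports; the function name `negotiate` is hypothetical.

```ocaml
(* Pick the highest protocol version present in both lists, or [None] if
   the two sides have no version in common. Representing versions as ints
   and advertising them as lists are assumptions of this sketch. *)
let negotiate ~ours ~theirs =
  List.filter (fun v -> List.mem v theirs) ours
  |> List.fold_left
       (fun best v ->
         match best with
         | Some b when b >= v -> best
         | _ -> Some v)
       None
```

For example, `negotiate ~ours:[1; 2; 3] ~theirs:[2; 3; 4]` evaluates to `Some 3`. Returning an option forces the caller to decide what to do when there is no common version, rather than failing somewhere deeper in the connection setup.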

This approach works reasonably well, but it has some downsides. The translation functions are somewhat tedious to write, and therefore error-prone. And even though the idea sounds simple, it’s hard to get the details right. We’ve had to play around with a few different approaches to writing the conversion functions. One approach is to write, for each version, an upgrade function to its successor and a downgrade function back from it. You can then achieve any conversion by chaining these functions together. Another approach we’ve tried is to define a separate set of types that serve as an internal model of the communication protocol, and to write conversion functions between each supported version and the model. Both approaches are workable, but each has its own advantages and disadvantages.
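Here is a minimal sketch of the chaining approach. The three versions of a request type, their fields, and the defaults used when upgrading are all invented for the example.

```ocaml
(* Three hypothetical versions of one request message. Each version after
   the first knows how to convert to and from its predecessor. *)
module V1 = struct
  type t = { key : string }
end

module V2 = struct
  (* V2 adds an optional timeout. *)
  type t = { key : string; timeout_ms : int option }

  (* Upgrade: fill the new field with a default. *)
  let of_v1 (r : V1.t) : t = { key = r.V1.key; timeout_ms = None }

  (* Downgrade: drop the field V1 does not know about (lossy). *)
  let to_v1 (r : t) : V1.t = { V1.key = r.key }
end

module V3 = struct
  (* V3 adds a trace identifier. *)
  type t = { key : string; timeout_ms : int option; trace_id : string }

  let of_v2 (r : V2.t) : t =
    { key = r.V2.key; timeout_ms = r.V2.timeout_ms; trace_id = "" }

  let to_v2 (r : t) : V2.t = { V2.key = r.key; timeout_ms = r.timeout_ms }
end

(* Any other conversion is a chain of the single-step functions. *)
let v1_to_v3 r = V3.of_v2 (V2.of_v1 r)
let v3_to_v1 r = V2.to_v1 (V3.to_v2 r)
```

With chaining, adding a version means writing one new pair of functions, but a conversion across k versions runs k steps. In the model-based alternative, each version would instead carry one pair of functions to and from a shared internal type, so every conversion is exactly two steps.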

This is a problem that I’m sure many people have grappled with. I’ve just given a quick overview of how we deal with it. I’d love to see other people’s comments on how they’ve approached the same issues.