Unraveling of the tech hiring market

Recruiting talented people has always been challenging.

In some years that meant competing with a hot new company that aggressively courted every fresh graduate with promises of stock options and IPO glory.  In other years there wasn’t a specific company so much as an entire rising industry looking for people (I’m looking at you cloud services, driverless cars, and peer-to-peer sharing).  Either way, we understood the yearly back and forth.  Our job was to explain to candidates how we stacked up, and more importantly, why a career at Jane Street might be the right choice for many of them.

But this year I got to learn a new name for a new challenge.  “Unraveling”.

I first encountered it in a book I was reading for fun: "Who Gets What, and Why", by the Nobel Prize-winning economist Alvin Roth.  He does a lovely job explaining the idea of a matching market.  In a matching market each person wants only one of each item, each item is unique, and each item can be given to at most one person at a time.  Jobs are a classic matching market, and just like any market, matching markets can work well, or poorly.

Unraveling is one of the primary things that makes a matching market fail.  When a market unravels  matches start to happen earlier and earlier, to the point where people no longer get a complete view of their options.  In the book Roth relates the story of a person who stepped off of a plane to find three voicemails on his phone.  The first offered him a job, the second urged him to respond soon, and the last rescinded the offer because he hadn't responded quickly enough.

We call them exploding offers, and this year they have gotten completely out of hand as companies race to the bottom in their efforts to recruit the next wave of interns and fresh graduates.

Colleges try to impose deadline limits explicitly to stop unraveling, and in the past these have largely been honored.  The cheating and fudging, such as it was, was kept to the fringes.  But this year it seems like the seal is broken, and we've seen major companies delivering internship and full-time offers with 2 week (and less) hard deadlines.  Other companies now routinely deliver expiring bonus offers for signing early.  Many of these offers circumvent or outright break the guidelines set down by schools, and if past matching markets are a model for this one, next year will come with even earlier offers and worse conditions.

This unraveling has been the subject of a lot of discussion, both internally at Jane Street and with the various schools we recruit at, who see it - rightly - as bad for their students.  How can someone make a thoughtful decision about where they want to build a career without the time to interview at more than one or two places?  Unfortunately, most of this discussion is out of the public light, and so the unraveling continues.

We can't control the actions of others, but we also don’t have to follow the herd, so we'd like to be clear:

Jane Street is committed to making sure that you have the time and information you need to decide on an offer from us.  Our offer letters do have good-until dates as a matter of professional practice, but we try to work with every candidate to choose a date that works for them.  We are also happy to extend the date if something unexpected comes up, or, frankly, if someone just needs more time.

Choosing where to start your career is a big decision and we hope you have the time to make a good one.

13 Virtues

Very early on in his life, while on lengthy voyage from London to Philadelphia, Ben Franklin created a system of thirteen virtues to live his life by. He spent the remainder of his days giving special focus to one virtue per week in a 13 week cycle, as well as noting the virtues he failed to live up to at the end of each day.

Over time he credited the system with making him more productive and more fulfilled.

My aspirations are not so lofty, but in the spirit of the new year, I present Ben's thirteen virtues as they relate to code review and discussion. Nothing here is meant to be taken as gospel, but together they give me a path towards the type of collaboration we value at Jane Street.

My simple hope is to waste less time, produce better code, and have fewer arguments over the next 12 months than I did over the last.

Temperance

Review thoroughly, but not to the point of exhaustion. Be mindful of the limits of review.

Silence

Say only things that clearly benefit the code; avoid trifling comments and tangents.

Order

Create the structure (files, modules and types) necessary to give every concept a clear place. Be suspicious of catchall modules.

Resolution

Respond to feedback and change requests quickly. Momentum is important.

Frugality

Don't waste people's time with frivolous review, careless comments, or code that isn't ready for review. Attention is expensive.

Industry

Prefer to respond with working code over additional commentary. Focus review on immediately productive outcomes instead of uncertain concerns.

Sincerity

Come to discussions with an innocent mind. Engage in code review with the clear goal of helping.

Justice

Weigh code decisions on the evidence at hand today, and not on personal preferences, prejudices, or obsolete past constraints.

Moderation

Avoid extremes in both style and approach. Incorporate strong views slowly.

Cleanliness

Spend time to lay out code in a clear and visually pleasing way. When reviewing, leave the code neater than you found it.

Tranquility

Don't become impassioned or incensed over trifles. Engage in all conversation with an open balanced tone and a sense of patience.

Chastity

Proliferate new ideas through the code base cautiously and be aware that even a good idea may not work in all places.

Humility

Take a modest view of your own contributions and feedback. Be unpretentious and respectful in your comments. Accept that you may be wrong.

Interviewing At Jane Street

Welcome to our version of the seemingly obligatory post about technical interviews. This topic has been covered by a lot of people already, so I'm going to do my best to not repeat all of the copious advice already out there.

Like many companies, we are looking for extremely talented technical people, and we have a challenging interview process that we think does a good job of selecting people who will do well here.

That said, we know that we miss lots of good people too. Some of that is because of the awkwardness of the interview process itself: time is short, the questions are weirdly artificial, and, of course, people get nervous. It's made even worse by the fact that programming on a whiteboard, a web browser, or even just on a computer that isn't yours is a bit like playing classical guitar with mittens on. It can put even an accomplished person off their game.

Missing out on good people makes us sad.

That's what this post is for. We hope that by talking a bit about what we're looking for, the ways people do poorly, and how we think you might be able to prepare, that we'll reduce the mitten handicap - at least a bit.

What Are We Looking For?

From our perspective, the main thing we want to figure out when we interview someone is: are they someone we want to work with?

That seems obvious enough, but it's a point that can get lost in the puzzles and whiteboard coding of an interview. Really, we think of our interviews as little simulations of what it's like to work together with the candidate. And while at the time, it may seem to the candidate that the interview is all about solving the problem, it's really not. We're much more interested in learning about how you work than we are in whether you actually finish whatever problem we put in front of you.

It's not that technical skill is irrelevant --- far from it. But it's only part of the story. Just as important to us is whether we can have a productive and fun discussion with the candidate.

To that end, we try to avoid algorithm bingo and puzzles with clever "aha" solutions. We prefer more open-ended problems that have no single right answer, since they give us more freedom to work together, and to see how the candidates' thinking plays out.

That sounds nice enough, but it's a bit high-level and hand-wavy. So here's a more concrete list of suggestions that follow from our overall approach.

Be nice

The smartest, best person in the world won't get hired at Jane Street if they aren't courteous and pleasant to talk to most of the time. Almost nothing will end your interview faster than being rude, pushy, or obnoxious.

Remember, we are looking for someone to work with, not just someone who can win an argument.

Be clear

And by clear, we mean simple and to the point. Use words and examples that get the core idea across to the widest technical audience possible.

Avoid showy, highly technical or deeply obscure terms of art, especially if you don't fully understand them. In the best case we'll likely just ask exactly what you meant by "hylomorphism", which wastes precious time. In the worst case it will become clear that you should have said "metamorphism" instead, which is just embarrassing for everyone involved.

Know what you don't know

Naturally we like it when candidates are good at solving the problems we put in front of them. But just as important, perhaps more important, is their ability to think reasonably about their own level of understanding.

In other words, we really like people who can express appropriate levels of confidence: admitting ignorance when they're unsure, and speaking confidently when they have the basis to do so. At a basic level this means being willing to say, "I don't know" rather than guessing and hoping when we ask you about something you aren't familiar with.

Know your language

Code is a wonderful language for communicating complex ideas because it provides a clear, concise and unambiguous way of expressing them. But, like any foreign language, it takes a lot of time and practice to get really comfortable.

We need you to be comfortable with it, because we communicate ideas in code a lot.

Now, comfortable doesn't mean that you have to be the world's best coder, or that you need to have memorized your favorite algorithms book. It means that you should be able to read and write code in at least one language without constant access to reference materials for common things, such as:

  • Standard control structures (loops/if-then/etc.)
  • Function, module, class, type, etc. definitions
  • Common data types like arrays, lists, hash tables/maps/dictionaries
  • Exceptions and other error handling techniques

Also, pick coding tools you understand well. Don't pick a functional language to make us happy. We'd much prefer you use a language that you know well. Similarly, when picking which features of the language to use, pick the ones you understand best. We're not going to be impressed with your elegant use of Python decorators if you don't really understand the details of what they do.

In other words, pick a simple, clunky solution that you understand over a fancy, elegant one that you don't.

Remember CS 101

We've hired plenty of successful people who didn't have a traditional college background in CS, and we certainly don't require a masters or a PhD. That said, we need you to have a solid working knowledge of core computer science concepts, including:

  • Abstraction layers like functions, objects, and modules

  • Basic algorithms and data structures, including binary search, sorting, hashing, breadth/depth first search, hashtables, binary trees and heaps.

  • Techniques for estimating CPU and memory costs, including big-O notation.

So if you can't for the life of you recall what amortized analysis is, and you can't nimbly bang out the code for a depth-first search it's probably worth reviewing some of this material.

Think about real computers

Depending on your point of view it's either a sad or beautiful fact that the most elegant code can only run on top of the giant, complex, odd stack of parts and abstractions that is a real computer. Since we need programs to actually run, we need people who understand the inner workings of that behemoth.

Now, this doesn't mean that we quiz every candidate about deep kernel internals, or the differences between SDRAM vs SGRAM. But for some jobs in systems development we do expect a fair amount of detailed knowledge, and in general it's a big plus if in your thinking you can take into account things like cache effects, IO patterns, memory representations, and the capabilities of real CPUs.

What We Don't Look For

  • Perfection. Our questions are often designed to be open ended enough that even the best people we've seen couldn't answer them fully in the time alloted. We want to keep the conversation going to learn everything we can, and we don't expect that you'll answer everything 100% perfectly.

  • We don't ask developers mental math, or math olympiad questions despite what you might have read online. Dev interviews are about programming.

  • We don't ask developers logic puzzles about pirates, people who only tell the truth, or which door the tiger is behind. Dev interviews are about programming.

How do people do poorly?

The most common issue is, of course, that some candidates just don't have the background for the job they are applying for. The solution to that is to learn more, and practice more. But there are a few other less obvious reasons that interviews with otherwise technically good people can go awry.

They're careless

One of the most common pieces of negative feedback from our interviewers is that the candidate was careless. This usually doesn't mean that the candidate didn't make progress, or didn't have good insights. It means that the candidate didn't think carefully and systematically about how their program might go wrong.

We care that you make progress, but we are rarely concerned about finishing a problem. In fact, many of the problems are designed to go on for far longer than the average interview length. It's better to step back and check your work carefully before you claim that it is finished than to rush.

They talk more than they code

Some candidates do a good job of talking through the problem and explaining their thinking, but never quite get around to concretely answering the question. We want to hear your ideas, but we also want to see you produce concrete solutions, which almost always involves writing code. There are lots of things that we can't learn about a candidate without seeing their code and talking through the details.

Take some time at the beginning to think about the solution and to talk about your plans, but make sure you start writing code - even if it isn't the code you would write if you had more time.

They don't generalize

We try to keep our interviews interactive, and we'll often stop candidates to ask about something they have just done, or to point out something that we think might be confusing or incorrect. We understand that we've seen these problems over and over again, and that you are coming to them fresh, so you shouldn't worry just because we've found a problem with your solution.

What you should do is generalize the advice. If we point out that you missed a case, consider other cases you might have missed. If we remind you of an invariant you forgot, find a way to protect yourself from making the mistake in other places in your code.

They say one thing and do another

We love it when a plan comes together, but it's extra hard to watch when a good plan falls apart on execution. If you hear a question, and discuss a plan of attack with your interviewer, do what you claim you will do. Don't change the plan in the middle, or drop it entirely in favor of a better idea without some discussion. You have a very limited amount of time to describe your solution in code, and executing a decent plan well is better than producing a Frankenstein's monster of 3 different plans that doesn't quite come to life.

If you do get partway through and start to lose faith step back and talk about it. Explain exactly why you are concerned, and whether you think it might be fatally flawed at the core, or just not ideal. If there really is a fatal flaw and you've seen it, we'll help you get out of the jam, and we'll appreciate that you articulated it. If it's just not quite perfect we'll likely encourage you to continue.

So, what can you do to prepare?

This part is short and sweet. Build something - from scratch - in a language you like. Don't stop short. Build the whole thing.

Now, show it to the smartest people you know, get feedback, tear it down and build it again with what you've learned.

Repeat with a new problem.

We are looking for people to build real things with us, and practice really does make perfect.

Clearly Failing

The Parable Of The Perfect Connection

Every programmer in the Intertube connected era eventually has to write, or at least use, an API for a network service - something like a database, a message queue, or web service. And, each and every one of them begins with the enthusiasm of the recently inducted as they realize that they can reach out their hand and control something else. And, each and every one of them experiences that moment of frustration and anger when they realize that their buddy out in cyberspace is a bit of a flake.

Now, we aren't talking about a seriously unreliable friend. In fact, your buddy isn't really unreliable at all. He's there 99.9% of the time, and even when he's out for a quick coffee break he tends to come back quickly. Besides, you don't have any real control over him. He's maintained by some other people in a bunker far far away. Those people are confusing, hard to reach, and don't seem to care about your problems. So, you do what countless programmers have done in the past...

You write a loop.

let rec connect_until_success host_and_port =
  connect host_and_port
  >>= function
  | Ok t -> t
  | Error _ ->
    after (sec 5.)
    >>= fun () ->
    connect_until_success host_and_port

Because you are feeling helpful you enshrine the loop in an API to help other people, because, after all, your buddy is pretty reliable, and it would be a shame if other people had to deal with all the nasty complexity that you've just programmed away.

There are a lot of classic twists and variations on this core storyline:

  • count the number of failures and give up after x tries (x is usually 3 or 1)

  • back off exponentially so you don't "hammer" the service

  • don't wait at all and actually hammer the service in a tight loop because latency is important

  • log the error, because someone will look at the logs carefully. Then retry.

  • keep careful track of the time of the last failure, and always retry, unless the last retry was "recent", because one blip makes sense but not two.

  • return an elaborate type that encompasses all possible failure modes, including the fact that we are retrying. Maybe deliver that information in a side channel stream of updates.

  • forget giving people a connect method at all. Just give them a query method and handle the pesky connection details away from prying eyes. You get bonus points if the API doesn't look like you can ever fail.

Hidden Failure Is Still Just Failure

Sadly, the problem isn't in the cleverness of the technical wizardry you use to cover up for your buddy, it's the fact that covering up failure is just another form of failing.

The connection failed. Not telling the world outside of your API is like hiding a bad grade from your parents. They might not catch you once or twice, but you still got the bad grade, and eventually they are going to notice that something is very very wrong - likely after things have really gone off the rails.

Which leads us to three useful principles of failure that apply to self-healing network connections, and most other failure besides.

Fail Quickly, Clearly, and Cleanly

When you design an API, or a system, or even a big complex collection of systems, and you think about how it should fail, make sure that the failure is:

  • Quick: Taking too long to fail is a cardinal sin. Don't retry a thousand times, don't get an hour deep into a computation only to realize that one of the config parameters is bad, and don't forget to add a timeout when the other side might never respond. The sooner you can tell the outside world that you have failed the sooner it can react.

  • Clear: Make sure that your failure behavior is clear, well documented, and can't be missed in a decently written program. It should be obvious from a read of the API and shouldn't require a dip into the underlying code to understand. Beyond that, don't mumble when you fail (I'm looking at you errno in C). Similarly, don't go on about all the little nuances surrounding your failure with a 20 case variant response. Most API consumers only care about the binary state of failure in the code. The details are generally uninteresting outside of debug logs and human readable messages.

  • Clean: Clean up anything and everything you can after you fail, as aggressively as you can. That means close your file descriptors, free your memory, kill your child process. Work harder than normal to make the cleanup portion of your code simple and obviously correct. But still remember to be quick. Do your work after you tell everyone that you have failed if there is any chance that you won't succeed. Don't be that function/program/system that never responds again because it hung trying to clean up before it reported the error.

How Should It Look?

Something like the following API, comments and all.

This makes heavy use of some nice things from our publicly released libraries. If you aren't already familiar with them you can take a deeper look here.

If you want the TLDR version, you really only need to understand Deferred and Or_error to get the gist.

A Deferred is a value that will get filled in at some point in the future (these are sometimes called promises), and when you read it here it just means that the function doesn't return immediately - usually because some network communication needs to happen to get the result.

Or_error is a fancy way of saying, "this might work, or it might give you an error". Returning an Or_error forces the caller to check for an error case in a very clear and explicit way. It's our standard way in an API to indicate that a function might not succeed because, unlike a comment about an exception that might be thrown, or a special return value (like NULL), Or_error can't be missed.

So, if you see something like:

response Or_error.t Deferred.t

You can read it as, "this won't return immediately, and when it does it will either be an error, or a response".

type t

(** connect to the service, returning t or an error if the connection could not
    be established. *)
val connect : ?timeout:Time.Span.t -> ... -> t Or_error.t Deferred.t

(** a simple helper function that calls connect with the original parameters.
    The passed in t is always closed when reconnect is called.  Multiple calls
    to reconnect on the same t will result in multiple connections. *)
val reconnect : t -> t Or_error.t Deferred.t

(** connects to the service and runs the provided function if successful.
    If the connection fails or [f] raises an Error is returned.  [close] is
    automatically called on [t] when [f] completes or raises. *)
val with_t
  :  ?timeout:Time.Span.t
  -> ...
  -> f:(fun t -> 'a Deferred.t)
  -> 'a Or_error.t Deferred.t

(** If timeout is not given it defaults to a sensible value. *)
val query : t -> ?timeout -> ... -> response Or_error.t Deferred.t

val query_exn : t -> ?timeout -> ... -> response Deferred.t

(** If timeout is not given it defaults to a sensible value.  The returned
    reader will be closed when the underlying connection is closed, either by
    choice or error.  It is a good idea for the update type to express the closed
    error to differentiate a normal close from an error close.  *)
val pipe_query
  :  t
  -> ?timeout:Time.Span.t
  -> ...
  -> update Pipe.Reader.t Or_error.t Deferred.t

val pipe_query_exn : t -> ?timeout -> ... -> update Pipe.Reader.t Deferred.t

(** close is idempotent and may be called many times.  It will never raise or
    block.  Once close has been called all future queries will return Error
    immediately.  A query in flight will return error as soon as possible. *)
val close : t -> unit

(** fulfilled when t is closed for any reason *)
val closed : t -> unit Deferred.t

(** closed is an error state.  Once a connection is in an error state it will
    never recover. *)
val state : t -> unit Or_error.t

Seriously, Never?

Up until now I've been making the case for try once, fail quickly and clearly, and I think that much, if not most of the time, it's the argument that should hold. But the world is a complex place. Sometimes things fail, and somebody somewhere has to try again. So where should that happen, and what should we consider when we start talking about retry logic?

How will this stack?

Loops stack poorly and lead to confusing non-linear behavior. This means that you should usually confine retry logic to a component near the bottom or the top of your stack of abstractions. Near the bottom is nice, because, like TCP, everyone can rely on the behavior. Near the top is nice because you have the most knowledge of the whole system there and can tune the behavior appropriately. Most network service API's are in the middle somewhere.

Can I opt out?

TCP sits on top of UDP and provides a solid retry mechasnism that works really well for most of the world, but it would be a mistake in design to only expose the TCP stack. If you are going to provide a self-healing connection/query system as part of your API, make sure to build and expose the low level simple API too. This lets clients with needs you didn't anticipate interact in the way that they want.

Love shouldn't be forever

It's more likely to be a mistake to try forever than to retry once, or for a set period of time. It's one thing to protect a client against a transient failure, but when the transient error lasts for minutes or hours, it's probably time to give up.

Your resource usage should be bounded

Loops, especially loops that create and clean up resources, have a tendency to consume more than their fair share. This is especially true when the loop is trying to cover for an error case, where things like resource cleanup might not work entirely as advertised. So, it's on the writer of a loop to test it heavily and to have strong bounds on how much CPU, memory, file handles, bound ports, etc. a single self-healing connection can take. Getting this right is hard, and you should be nervous about doing it quickly.

How bad is failure?

It's much easier to justify a looping retry if it's the only thing keeping a large complex system from crashing completely, and it's correspondingly harder to justify when it covers just one more case that any client needs to deal with anyway. For instance, a retry loop on my database connection might cleanly cover the occasional intermitent outage, but there are probably real reasons that the database might be out (network failure, bad credentials, maintenance window), and my program likely has to handle this case well anyway.

Not all failure is created equal

Some failures justify a retry. Some failures don't. It's important in retry logic to avoid big try/with blocks that catch any and every error on the assumption that any query or connection will eventually succeed. Retrying because my connection closed is different than retrying my malformed query. Sadly you can't always tell the difference between the two cases, but that doesn't mean you shouldn't make an effort.

You still have to consider failure

You can use a retry loop to limit errors above a certain abstraction boundary, or to limit the impact of small glitches, but you can't recover gracefully from all of the errors all of the time. When you add a retry loop to your system at any level stop to consider what should happen when the error is a real error and isn't transient. Who is going to see it? What should they do about it? What state will clients be in?

It's easier to solve a specific problem than a general one

It's much easier to come up with retry logic that makes sense for a particular application in a particular environment than it is to come up with retry logic that is generically good for all clients. This should push you to confine retry logic to clients/API's that have a single well considered role and to keep it out of API's that may be used in many different contexts.

Quick, Clear, and Clean still (mostly) apply

Even when you are considering retry logic, make sure you think about getting stuck (quick), getting debug information about your state to the outside world (clear), and keeping resource usage bounded (clean).

4