Null is a pervasive concept in computing. Virtually all programming languages have a way of expressing nothing, nullity, no answer. But handling nulls correctly turns out to be tricky, and in many of the contexts where nulls appear, you'll also find confusing and error-prone semantics surrounding them.

The heart of the problem is that, in an attempt to make programming with null easier, nulls are often propagated implicitly through computations, allowing the programmer to write code that deals with nulls without explicitly contemplating how nulls should be dealt with.

My experience has been that this is a mistake; that if you want robust, easy-to-reason-about code, the programmer must think explicitly about how to handle null cases, and that programming languages would do well to provide programmers with good support for the requisite case analysis.

The point can be illustrated by considering some of the contexts in which null arises.

IEEE floating-point arithmetic

This is an oft-forgotten case, but an important one. The null value in this case is NaN, which stands for not-a-number. NaN is the value you get when a basic arithmetic operation has no reasonable answer, e.g. zero times infinity.

In the IEEE standard, NaN behaves quite differently from other floats. The standard requires that applying the primitive arithmetic operations to NaN always produces NaN. At first glance, this seems sensible. What's 3 times I-don't-know? Clearly it's I-don't-know!

But what about primitive operations that don't return floats, like comparisons? According to the standard, comparison functions return false when one of their arguments is NaN. Now this should make you nervous, since it's not at all obvious that the value of I-don't-know > 4 should be false.

Indeed, this behavior violates a bunch of fundamental invariants. For example, it's simply not the case that x > y is equivalent to not (x <= y) when x or y is NaN. Also, the comparison functions don't give a total order on floats --- see what happens to your favorite binary tree when you throw in some NaNs. Reflexivity doesn't even hold, since NaN = NaN is false!
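These broken invariants are easy to check directly. Here's a quick OCaml sketch (using the built-in nan constant) asserting the behaviors described above:

```ocaml
(* Demonstrating the invariants that NaN breaks.  [nan] is OCaml's
   built-in not-a-number constant. *)
let () =
  assert (not (nan > 4.));     (* I-don't-know > 4 is false... *)
  assert (not (nan <= 4.));    (* ...but so is its supposed negation *)
  assert (not (nan = nan))     (* even reflexivity fails *)
```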

To see how weird the results of this can be, consider the following function for computing the max of two numbers (the example is in OCaml, but the same behavior will show up in any language that provides IEEE-compliant floating-point semantics):

  let max x y =
    if x > y then x else y

If either x or y is NaN, the test will return false, and y will be chosen. Thus, max NaN 3 is 3, and max 3 NaN is NaN. Note that max doesn't have the null-propagation property that the IEEE standard wanted for primitive float-producing operations. max doesn't even have the symmetry property you would expect: max x y is not the same as max y x.
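If you wanted max to propagate NaN the way the float-producing primitives do, you'd have to test for NaN explicitly. Here's one possible sketch (max_propagating is our name, not standard), relying on the fact that NaN is the only float that is not equal to itself:

```ocaml
(* A max that restores the null-propagation property by checking for
   NaN up front.  NaN is the only float for which [x <> x] holds. *)
let max_propagating x y =
  if x <> x || y <> y then nan
  else if x > y then x else y
```

Unlike the naive version, this one is symmetric: both max_propagating nan 3. and max_propagating 3. nan evaluate to NaN.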

Here's another example. Consider the following two functions, which are intended to validate whether the provided float is between 1 and 10.

  type result = Fail | Pass

  let check1 x =
    if x < 1. || x > 10. then Fail else Pass

  let check2 x =
    if x >= 1. && x <= 10. then Pass else Fail

The two functions certainly look equivalent, and if you're not clued in to the strange semantics of NaN, you might be surprised to discover that check1 returns Pass for NaN, whereas check2 returns Fail.
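One way to make the two checks agree is to decide the NaN case explicitly before comparing. In this sketch, classify_float from the standard library detects NaN, and we choose (arbitrarily, but deliberately) to fail it:

```ocaml
type result = Fail | Pass

(* Handle the null case explicitly: a NaN is neither in range nor out
   of range, so we make a deliberate choice to fail it. *)
let check3 x =
  if classify_float x = FP_nan then Fail
  else if x >= 1. && x <= 10. then Pass else Fail
```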

SQL's NULL

Much ink has been spilled about the problems associated with SQL's NULL. The wikipedia page on NULL is a pretty good summary, so instead of trying to give a complete picture here, I'll just give a taste of how NULL works in SQL, and what the associated problems are.

At first glance, SQL's NULL looks a lot like NaN. If you call a simple function and give it NULL as one of its arguments, the result will be NULL as well: e.g., 3 + NULL is NULL. But unlike NaN, SQL's NULL is not restricted to floating-point numbers; you can have a NULL in a column of any type.

Let's see what happens when we start using comparisons in SQL. It turns out that SQL propagates a kind of NULL into boolean expressions as well. This seems more consistent than the behavior with NaN. Indeed, the odd way that NaN propagates into comparison functions is what sank our implementation of max, and so you might expect that max implemented in SQL will work properly. Let's try. Assume we have a table t which contains some NULLs:

| i    | j    |
|------+------|
|    1 |    2 |
| NULL |    2 |
|    1 | NULL |

If we compute the max of these two columns, we should expect the result to be:

| i    | j    | max  |
|------+------+------|
|    1 |    2 |    2 |
| NULL |    2 | NULL |
|    1 | NULL | NULL |

Here's some SQL code for computing the max of the two columns.

  SELECT
    i, j,
    CASE WHEN i > j THEN i ELSE j END AS max
  FROM t

Sadly, we see the exact same bizarre behavior we got out of our floating-point max function.

| i    | j    | max  |
|------+------+------|
|    1 |    2 |    2 |
| NULL |    2 |    2 |
|    1 | NULL | NULL |

Why is this? Because SQL doesn't hold its ground when it comes to the way it handles NULL in conditional expressions. A NULL condition is treated as equivalent to false, where the more consistent behavior would be for the entire CASE expression to evaluate to NULL.

The root of the problem here is the attempt to pick a reasonable default behavior in the presence of NULL in a way that doesn't just scuttle the entire computation. Saving such a computation isn't hopeless --- these null-handling heuristics often produce reasonable answers. But they can produce unreasonable answers as well, and the end result is that the behavior of SQL in the presence of NULL is inconsistent and confusing.

The above example is really just the tip of the iceberg. You can see examples of strange behavior with aggregate functions, selects and joins as well.

Fundamentally, both SQL's NULL and the floating-point NaN fail because the choice of how to salvage a calculation that encounters null depends on things that cannot be known by the heuristic.

One good aspect of SQL's NULL handling is that SQL provides ways of enforcing the constraint that given columns contain no NULLs. Given the odd behavior of NULLs, it's an essential feature.

Null references

Another common form of null is the null reference (or null pointer). Null references appear in most mainstream statically typed programming languages, including C, C++, C#, and Java. In the following, I'll talk about Java's null references, but the same basic issues show up in many languages.

Unlike IEEE floating-point arithmetic and SQL, Java does not try to automatically salvage computations that encounter nulls. Instead, any computation that tries to make a method call on a null reference will result in an exception. In some sense, exceptions are their own kind of null, a special value that a function can return when it can't return anything else. (Non-termination is yet another way for a computation to refrain from returning an ordinary value.)

The problem with null references is that while their handling is explicit at the level of values, it's implicit at the level of types. Java's type system allows for all object references to potentially be null references, even though for most variables most of the time, null will never show up. This is unlike SQL, where some values can be declared as non-NULL.

Because the type system gives you no clue as to the presence of nulls, null reference exceptions are rather ubiquitous in Java programs. Indeed, elaborate systems like ESC/Java (ESC stands for "Extended Static Checking") have been developed to enforce various static checks, prominent among them the elimination of runtime null reference exceptions.

As a side note, it's interesting that Java has such a loosey-goosey approach to null object references, when it has a fairly strict approach to exception handling. Indeed, Java requires one to be quite strict and explicit when dealing with so-called checked exceptions.

ML's option type

The problem with Java's null references is different from the problem with SQL and IEEE floating point arithmetic. In those two cases, the system tries to "do the right thing" when null comes up, and that right thing turns out to sometimes be wrong; the problem with Java is that it doesn't provide sufficient tools in the language to ease the programmer's life in dealing with nulls.

Languages like ML and Haskell that are based on the Hindley-Milner type system, on the other hand, provide quite powerful tools for null handling. Rather than talk about such languages collectively, I'll talk in terms of the instance I know best, OCaml. In OCaml, null is not lurking in every type. If a variable has type int array, then that variable is always populated with an int array, and never with null.

When you need a variable whose value might or might not be there, it is modeled explicitly in the type system. One common way of doing so is using what's called an option. The option type is not a primitive of the language, but an ordinary data type, which can be declared as follows:

  type 'a option =
    | None
    | Some of 'a

This is a tagged union with two cases: None, indicating the lack of a value; and Some, indicating the presence of a specified value. To see how this might be used in practice, consider the following simple hashtable interface:

  module Table : sig
    type ('key, 'data) t

    val create : unit -> ('key, 'data) t
    val find : ('key, 'data) t -> 'key -> 'data option
    val replace : ('key, 'data) t -> 'key -> 'data -> unit
  end

Notice that find returns an optional value. The reason is that find may fail to find an entry corresponding to the given key. This fact is modeled in the type by the fact that the return value is of type 'data option rather than simply 'data.

In addition to allowing one to express in the type system that a value may be missing, OCaml also provides powerful tools for doing the corresponding case analysis. Here's a simple example that deals with options explicitly using a match statement.

  let increment_count table key =
    let current_count =
      match Table.find table key with
      | None -> 0
      | Some x -> x
    in
    Table.replace table key (current_count + 1)

Here we explicitly state that we want the null case (where the key can't be found in the table of counts) to be treated as if the table had an entry with value equal to zero.

Explicit null handling like the kind shown above can be done in, say, SQL, using the COALESCE function. But in OCaml, the case analysis is obligatory --- you can't silently use an optional value without contemplating what will happen in the None case. Consider the following implementation of increment_count:

  let increment_count table key =
    Table.replace table key (Table.find table key + 1)

While the analogous code would compile in SQL, the OCaml compiler will reject it, noting that the expression Table.find table key was used as if it were an int, but was in fact an int option. This may seem cumbersome, but in reality, the explicit separation of cases where null is and is not possible is a great relief, since it frees the programmer from worrying about the cases not flagged by the compiler.

The requirement to handle nulls explicitly doesn't mean that we can't reduce the amount of boilerplate required. If we have a particular null-handling policy in mind, we can write helper functions to automate it. For example, we can write a function:

  let option_get ~if_none option =
    match option with
    | Some x -> x
    | None -> if_none

This allows us to rewrite our increment_count function as follows:

  let increment_count table key =
    let current_count = option_get ~if_none:0 (Table.find table key) in
    Table.replace table key (current_count + 1)

Indeed, in Jane Street's Core library there is an Option module devoted to helper functions of this sort, including an option monad, which is a quite general and powerful technique for dealing with option-generating computations.
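To give a flavor of what the option monad looks like, here is a minimal sketch of its bind operation (Core's actual interface differs in its details), chained over two hypothetical option-returning lookup functions:

```ocaml
(* A minimal option bind: apply [f] to the contents of [opt] if there
   are any, otherwise short-circuit to None. *)
let bind opt f =
  match opt with
  | None -> None
  | Some x -> f x

(* Two hypothetical option-returning lookups, backed by assoc lists. *)
let find_id name = List.assoc_opt name [ "alice", 1; "bob", 2 ]
let find_role id = List.assoc_opt id [ 1, "admin" ]

(* Chain the lookups without writing out the None case twice. *)
let role_of name = bind (find_id name) find_role
```

If either lookup comes up empty, the whole chain evaluates to None; the None-handling is still explicit, but it's written once, in bind, rather than at every use site.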

OCaml's approach is far from perfect. While OCaml is quite explicit about the use of options, it is quite the opposite when it comes to exceptions. Exceptions, like null references in Java, lurk in every function, and this is a very real problem. Indeed, the inability to track exceptions in the type system has led us to try to avoid exceptions except for truly exceptional conditions (and to use options instead).
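A standard way of doing that is to confine exceptions at a boundary, converting exception-raising calls into option-returning ones. A minimal sketch (try_opt is our name for it):

```ocaml
(* Convert an exception-raising computation into an option-returning
   one.  Catching [_] is deliberately coarse; real code would catch
   only the specific exceptions it expects. *)
let try_opt f = try Some (f ()) with _ -> None

(* For example, wrapping the exception-raising [int_of_string].  The
   prime avoids clashing with the stdlib's own [int_of_string_opt]. *)
let int_of_string_opt' s = try_opt (fun () -> int_of_string s)
```

Once wrapped, callers are forced by the type system to handle the failure case, just as with any other option.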


If you're interested in working at a place where functional programming meets the real world, you should think about applying to Jane Street. We're always looking to hire great programmers with an interest in functional programming.