A couple of months ago, Pascal noticed some missed opportunities in OCaml's float unboxing optimizations. In some cases, code that looked like it should compile down to a sequence of allocation-free floating-point operations turned out to involve quite a lot of allocation, with floats getting boxed and then immediately unboxed for no purpose. The fact that the compiler missed this particular optimization forced us in a few spots to do some ugly manual inlining, and generally made us sad.

But we are sad no more! We filed a bug report, and it just got fixed in OCaml's CVS. You can see the details here. Now all we're waiting for is a fix to the missed optimization for equality on polymorphic variants.

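    (* assumed: the record type is not shown in the original snippet *)
    type point = {x : float; y : float}

    let fmin (x:float) y = if x > y then y else x

    let fmax (x:float) y = if y > x then y else x

    let min_point {x=x1;y=y1} {x=x2;y=y2} =
      let x = fmin x1 x2 in
      let y = fmin y1 y2 in
      {x=x; y=y}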

Can you explain why x and y inside min_point are boxed?

    (function camlTest__min_point_69 (param/110: addr param/111: addr)
     (let
       (x/74
          (let
            (y/138 (load float64u param/111) y/119 (alloc 2301 y/138)
             x/139 (load float64u param/110) x/120 (alloc 2301 x/139))
            (if (>f x/139 y/138) y/119 x/120))
        y/75
          (let
            (y/136 (load float64u (+a param/111 8)) y/121 (alloc 2301 y/136)
             x/137 (load float64u (+a param/110 8)) x/122 (alloc 2301 x/137))
            (if (>f x/137 y/136) y/121 x/122)))
       (alloc 4350 (load float64u x/74) (load float64u y/75))))
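For readers who want to check for this kind of boxing without reading a Cmm dump (the output of `ocamlopt -dcmm`), one rough approach is to count the minor-heap words allocated per call. The sketch below is only an illustration: the `point` type and the `words_per_call` helper are assumptions made here, not part of the original snippet, and it relies on `Gc.minor_words` and `Sys.opaque_identity`, which exist only in OCaml versions much newer than the one discussed in this thread. The numbers it prints depend on the compiler version, the flags, and the word size.

```ocaml
(* Assumed record type: two float fields, stored flat inside the record. *)
type point = { x : float; y : float }

let fmin (x : float) y = if x > y then y else x

let min_point { x = x1; y = y1 } { x = x2; y = y2 } =
  let x = fmin x1 x2 in
  let y = fmin y1 y2 in
  { x; y }

(* Hypothetical helper: average number of minor-heap words allocated
   by one call of [f], measured over [iterations] calls. *)
let words_per_call ~iterations f =
  let before = Gc.minor_words () in
  for _i = 1 to iterations do
    (* opaque_identity keeps the optimizer from discarding the call *)
    ignore (Sys.opaque_identity (f ()))
  done;
  let after = Gc.minor_words () in
  (after -. before) /. float_of_int iterations

let () =
  let a = { x = 1.0; y = 5.0 } and b = { x = 2.0; y = 3.0 } in
  let w = words_per_call ~iterations:1_000_000 (fun () -> min_point a b) in
  Printf.printf "%.1f words per call\n" w
```

On a 64-bit system the result record itself accounts for three words (one header plus two floats), so a figure noticeably above that suggests intermediate floats are being boxed.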

As a general rule, I would say the compiler does not do much optimization to unbox floats. The only case that gets optimized is “unbox (box v) -> v”, i.e. in Cmm notation `load float64u (alloc 2301 x/123) -> x/123`.

The first patch that I proposed (see there) got integrated into the compiler and will unbox floats across a let, i.e. `load float64u (let (x/123 (alloc 2301 y/234))) -> let (x/123 y/234)`.

The second patch I proposed was not integrated. In your example, it would have moved the allocs from the lets into the branches of the if, allocating only one value instead of two. That is not enough to unbox completely, though. To do that, the compiler would have to notice that both branches of the if are allocs and factor them out (`x/74` and `y/75` would now be allocs). Then, as another pass, it would have to notice that `x/74` and `y/75` always get unboxed and hence could be simplified. In short, three different kinds of mechanical transforms are needed in order to fully optimize your code…

So this is quite complicated… It’s a pity. Trying to improve floating-point performance on the 32-bit x86 platform, I have merged the floating-point SSE2 code generator from the amd64 ocamlopt back end into the i386 one. It works OK, but unnecessary boxing still degrades performance. Well, I will try your second patch and see if things get better.
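The two rewrites that the reply above says the compiler does perform can be modeled on a toy expression type. This is a sketch under stated assumptions, not the compiler's code: the `expr` type and the `simplify` and `unbox` functions are invented here for illustration, `Box` stands in for `alloc 2301`, `Unbox` stands in for `load float64u`, and the let rule is read as "push the load into the body of the let".

```ocaml
(* Toy model of a few Cmm forms; NOT the compiler's real Cmm type. *)
type expr =
  | Var of string
  | Box of expr                  (* models (alloc 2301 e) *)
  | Unbox of expr                (* models (load float64u e) *)
  | Let of string * expr * expr  (* let x = e1 in e2 *)
  | If of expr * expr * expr     (* if c then e1 else e2 *)

(* Simplify subterms first, then try to cancel each Unbox. *)
let rec simplify e =
  match e with
  | Var _ -> e
  | Box e1 -> Box (simplify e1)
  | Let (x, e1, e2) -> Let (x, simplify e1, simplify e2)
  | If (c, t, f) -> If (simplify c, simplify t, simplify f)
  | Unbox e1 -> unbox (simplify e1)

and unbox e =
  match e with
  | Box v -> v                                   (* unbox (box v) -> v *)
  | Let (x, d, body) -> Let (x, d, unbox body)   (* unbox across a let *)
  | other -> Unbox other                         (* anything else is left boxed *)

let () =
  (* The basic rule removes the box entirely. *)
  assert (simplify (Unbox (Box (Var "v"))) = Var "v");
  (* The let rule reaches a box at the end of a let body. *)
  assert (simplify (Unbox (Let ("t", Var "a", Box (Var "t"))))
          = Let ("t", Var "a", Var "t"));
  (* With only these two rules, a box behind an if is not removed. *)
  let blocked = Unbox (If (Var "c", Box (Var "a"), Box (Var "b"))) in
  assert (simplify blocked = blocked)
```

The last assertion mirrors the situation in `min_point`: the boxed values sit behind an if, so neither rule applies. The toy also says nothing about a variable such as `x/74` that is bound once and unboxed later, which is the third transform mentioned above.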