OCaml for the recovering Java programmer, part 1: objects and subtyping

It’s said that the fox knows many tricks, but the hedgehog knows one big trick. If Java is the hedgehog, with objects as its one big trick, then OCaml is the fox, with lots of different tools for structuring code. Many of the things you’d use objects for in Java have simpler, cleaner, or safer alternatives in OCaml: tuples and records for structuring data, higher-order functions in place of one-method anonymous inner classes, parametric polymorphism for collections instead of pervasive downcasts (although this has improved with the introduction of Java generics), and functors and signatures in place of (compile-time) parameterization of code with interfaces.
Nonetheless, sometimes you want objects—as I did recently when interfacing with some object-oriented native code—and you can get them in OCaml too (objects are of course the O). But they aren’t quite the objects that you’re used to in Java. In Java, you can put two objects with a common superclass into a single List. I tried to do that in OCaml and got a mysterious type error. It took me some time, a little research, and a little profanity, but I got my code working and learned some things.
One difference between Java and OCaml is nominal vs. structural subtyping. In Java, one class is a subclass of another only if you declare it to be so (e.g. Cat extends Pet); what matters is the names of the classes involved (hence “nominal”). In OCaml what matters is the methods that an object supports (its structure, hence “structural”); if you declare classes pet with a legs:int method and cat with legs:int and snooty:bool methods, then cat is a subtype of pet even though you have declared no relationship between them (as with “duck typing”, but statically checked).
A second difference is that in Java subtyping coercions happen automatically, while in OCaml you must request them explicitly with the :> operator. In Java you can write
Pet p = new Cat();
while in OCaml you must write
let (p : pet) = (new cat : cat :> pet)
(In many cases you may omit the : cat part; see the manual for precise details.)
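Here is a minimal sketch of both differences. The method bodies (4 legs, snooty cats) are made up for illustration; only the class and method names come from the text above.

```ocaml
(* Two classes with no declared relationship: cat does not inherit pet. *)
class pet = object
  method legs = 4
end

class cat = object
  method legs = 4
  method snooty = true
end

(* A class type is just an abbreviation for an object type, so any object
   with exactly the right methods has that type, whichever class (if any)
   created it. *)
let anonymous : pet = object method legs = 2 end
let c : < legs : int; snooty : bool > = new cat

(* Going from cat to pet takes an explicit coercion. *)
let p : pet = (new cat :> pet)

(* Without the coercion the type checker refuses:
   let bad : pet = new cat      (pet has no method snooty) *)

let () = Printf.printf "%d %d %b\n" anonymous#legs p#legs c#snooty
```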
This is why I couldn’t put my cats and pets in the same list. There’s only one type variable in 'a list; it can be instantiated with cat or pet but not both simultaneously. However, you can explicitly coerce the cats to pets and put them all in a pet list, and that’s what I ended up doing.
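A sketch of that workaround, with a hypothetical bird class thrown in to show that any class with a legs:int method can join the list:

```ocaml
class pet = object method legs = 4 end
class cat = object method legs = 4 method snooty = true end
(* bird is a made-up class, not part of the original code. *)
class bird = object method legs = 2 method sings = true end

(* Each element is coerced to pet on the way in, so the list has the
   single element type pet. *)
let pets : pet list = [ new pet; (new cat :> pet); (new bird :> pet) ]

(* Only the pet interface is visible on the elements from here on. *)
let total_legs = List.fold_left (fun acc p -> acc + p#legs) 0 pets
```

Writing [ new pet; new cat ] without the coercion is rejected, because the two elements have different types.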
But hold on, this sounds kind of annoying. The whole point of subtyping is subsumption, the ability to pass a cat to a function expecting a pet. It would be a pain if you had to explicitly coerce the cat to a pet. In fact you don’t need the coercion in OCaml when making a function call, but the way this is accomplished is completely different from Java. In Java, the argument object is implicitly coerced to the supertype at the function call site; in OCaml the function is polymorphic, and a “row” variable is instantiated at the function call site.
Before we explain row variables, let’s review ordinary parametric polymorphism. Consider the identity function in OCaml:
let id o = o
The type of id is 'a -> 'a, where 'a is a type variable which may be instantiated with any type. If you write id 3, 'a is instantiated with int, and the type of id at this call is int -> int. So the type of the result is int (and in general will always be the same as the type of the argument).
Contrast with this similar Java function (leaving aside generics):
Object id(Object o) { return o; }
The type of this function (in OCaml terms) is Object -> Object. If you pass it an Integer it is implicitly coerced to Object at the call site. The type of the result is always Object no matter what the argument type is.
Now consider the following function:
let print_legs o = print_int o#legs; o
You can see that whatever o is, it must have a legs:int method. And because o is returned, the result type should be the same as the argument type.
Typing this into the top level shows the type of print_legs to be (<legs:int; ..> as 'a) -> 'a. The syntax .. indicates an anonymous row variable, which may be instantiated with any collection of methods, such as the empty collection, or foo:float; bar:(unit -> unit), or snooty:bool. The syntax as 'a names the entire argument type so it can be referred to in the result type.
Say we want to pass a cat to print_legs. The type cat is an abbreviation for <legs:int; snooty:bool>. The anonymous row variable may be instantiated with snooty:bool, giving print_legs the type <legs:int; snooty:bool> -> <legs:int; snooty:bool>, or equivalently cat -> cat. Or we can pass a pet by instantiating the row variable with the empty row, giving print_legs the type <legs:int> -> <legs:int>, or pet -> pet.
It’s important that the argument is not coerced; rather the row variable is instantiated at the call site to match the argument. So the result has the same type as the argument, just as with the identity function above. The Java equivalent:
Pet print_legs(Pet p) { System.out.print(p.legs()); return p; }
always returns a Pet even if you pass it a Cat.
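The OCaml side of this contrast is easy to check. In the sketch below (method bodies are again invented for illustration), the result of print_legs keeps the full type of its argument, so the snooty method is still available afterwards:

```ocaml
class pet = object method legs = 4 end
class cat = object method legs = 4 method snooty = true end

(* Inferred type: (< legs : int; .. > as 'a) -> 'a *)
let print_legs o =
  print_int o#legs;
  o

(* Row variable instantiated with snooty:bool. The result is still a cat,
   so calling snooty on it typechecks. *)
let c = print_legs (new cat)
let is_snooty : bool = c#snooty

(* Row variable instantiated with the empty row. *)
let p : pet = print_legs (new pet)

(* If the cat is explicitly coerced first, the extra method is lost:
   (print_legs (new cat :> pet))#snooty      (rejected by the type checker) *)
```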
One last thing: suppose we need to coerce a cat list (or some other parametric type containing cats). Clearly cat list should be a subtype of pet list (we are spared Java’s ArrayStoreException here because lists are immutable). For a type whose definition is visible, such as list, OCaml works out the variance from the definition. But when a type 'a t is abstract, OCaml does not know how it should vary with 'a. If 'a t = 'a list then it should be covariant (cat list is a subtype of pet list because cat is a subtype of pet); if 'a t = 'a -> unit then it should be contravariant (a pet -> unit function may be used anywhere a cat -> unit function is expected). OCaml lets us declare the variance of a type parameter: (+'a) t for covariance, (-'a) t for contravariance. (An abstract type declared without a variance annotation, in the standard library or anywhere else, is treated as invariant, but you can add the annotations for your own types.)
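A sketch of how variance annotations look in practice. Box and Sink are hypothetical modules invented for this example; their signatures hide the type definitions, so the + and - annotations are the only variance information visible from outside.

```ocaml
class pet = object method legs = 4 end
class cat = object method legs = 4 method snooty = true end

(* list is a concrete type, so its covariance is inferred and the whole
   list can be coerced in one step. *)
let cats : cat list = [ new cat; new cat ]
let pets = (cats : cat list :> pet list)

(* Hypothetical abstract container, declared covariant with +. *)
module Box : sig
  type +'a t
  val make : 'a -> 'a t
  val get : 'a t -> 'a
end = struct
  type 'a t = { value : 'a }
  let make x = { value = x }
  let get b = b.value
end

let cat_box : cat Box.t = Box.make (new cat)
let pet_box = (cat_box : cat Box.t :> pet Box.t)

(* Hypothetical abstract consumer, declared contravariant with -. *)
module Sink : sig
  type -'a t
  val make : ('a -> unit) -> 'a t
  val feed : 'a t -> 'a -> unit
end = struct
  type 'a t = 'a -> unit
  let make f = f
  let feed s x = s x
end

let pet_sink : pet Sink.t = Sink.make (fun p -> print_int p#legs)
(* The coercion runs the other way: a sink for pets serves as a sink for cats. *)
let cat_sink = (pet_sink : pet Sink.t :> cat Sink.t)
let () = Sink.feed cat_sink (new cat)
```

If the + is dropped from Box's signature, the coercion to pet Box.t should be rejected, since the type checker then has to assume the parameter is invariant.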
I don’t expect that I’ll need the OCaml object system much but it’s nice to understand better how it works.