
Test Driving Generics

Posted by dwalend on July 14, 2004 at 9:53 PM PDT

I'm impressed that people can blog while attending JavaOne. My head's just clearing up from all the new ideas slamming into the old ones. To relax on the way home, I started stitching generics into JDigraph, a general library for representing directed graphs. The effort mostly went smoothly, with a few hiccups.

Scouting

The documentation for generics in the jdk 1.5 beta is pretty thin. "This long-awaited enhancement to the type system allows a type or method to operate on objects of various types while providing compile-time type safety. It adds compile-time type safety to the Collections Framework and eliminates the drudgery of casting. Refer to JSR 14." That's right, it says read the spec for JSR-14. JSR-14 was pretty easy to read even though it's primarily for compiler writers. However, it didn't have any advice on how or when to use generics. I wound up unzipping the jdk 1.5 src.zip file and looking at the examples in java.util's collection kit.

Bags -- Something Easy First

The net.walend.collection package has the Bag interface and MapBag implementation. Bag is a multiset -- a Collection that can contain zero or more of an Object in no particular order. I used generics to create a Bag that contains a specific type of thing, much the way the rest of the collections kit works. It was pretty straightforward. I changed the Bag interface to Bag<Elem> extending Collection<Elem>, and the MapBag class to MapBag<Elem> implementing Bag<Elem>. MapBag uses an internal Counter class to track how many times an Elem appears in the Bag, so I made the internal Map a Map<Elem,Counter>. Then I replaced all the Object parameters and return types with Elem, and all the inbound Collections with Collection<Elem>s. From there, I let the compiler do most of the work, pointing out places where Collection's use of generics didn't match my own. The only remotely tricky part was the inner class BagIterator. It turns out that the outer class' type parameters are available to the inner class, so I caused myself some grief by declaring BagIterator<Elem> instead of just leaving it alone. I wound up changing it back.
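Here is a rough sketch of that shape. It is not the JDigraph source: the countOf method, the field names, and the use of AbstractCollection as a base class are assumptions made to keep the example short. The point to notice is that BagIterator declares no type parameter of its own and simply uses the outer Elem.

```java
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, trimmed-down Bag: a Collection that tracks duplicates.
interface Bag<Elem> extends Collection<Elem> {
    int countOf(Object o); // assumed method name, for illustration only
}

class MapBag<Elem> extends AbstractCollection<Elem> implements Bag<Elem> {
    // Mutable count of how many times one Elem is in the Bag.
    private static final class Counter { int count; }

    private final Map<Elem, Counter> counts = new HashMap<Elem, Counter>();
    private int size = 0;

    @Override public boolean add(Elem e) {
        Counter c = counts.get(e);
        if (c == null) { c = new Counter(); counts.put(e, c); }
        c.count++;
        size++;
        return true;
    }

    @Override public int countOf(Object o) {
        Counter c = counts.get(o);
        return c == null ? 0 : c.count;
    }

    @Override public int size() { return size; }

    @Override public Iterator<Elem> iterator() { return new BagIterator(); }

    // Inner (non-static) class: Elem here is the outer class's Elem.
    // Writing "class BagIterator<Elem>" would declare a new, unrelated Elem
    // that hides the outer one.
    private class BagIterator implements Iterator<Elem> {
        private final Iterator<Map.Entry<Elem, Counter>> entries =
                counts.entrySet().iterator();
        private Elem current;
        private int remaining = 0; // copies of current still to hand out

        @Override public boolean hasNext() {
            return remaining > 0 || entries.hasNext();
        }

        @Override public Elem next() {
            if (remaining == 0) {
                // entries.next() throws NoSuchElementException when exhausted
                Map.Entry<Elem, Counter> entry = entries.next();
                current = entry.getKey();
                remaining = entry.getValue().count;
            }
            remaining--;
            return current;
        }

        @Override public void remove() {
            throw new UnsupportedOperationException("sketch only");
        }
    }
}
```

A MapBag<String> built this way iterates with a plain for-each loop and hands back Strings with no casts.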

The other thing I stumbled on was how exactly the methods in the implementation have to match the ones in the interface. I had thought that <?> was basically a no-op that I could leave out, and that <? extends E> was the same as <E>. I may be missing some subtlety here; they seem to mean the same thing to me. But they don't look the same to the compiler. Where Collection asks for <?>, I had to use <?>, and where Collection specifies <? extends E>, I had to use <? extends E> instead of <E>.
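A small sketch of where the two spellings part ways for callers. The class and method names here are made up for illustration; only the parameter shapes mirror Collection.addAll and Collection.containsAll.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class WildcardDemo {
    // Accepts only a source whose element type is exactly E.
    static <E> void addAllExact(Collection<E> target, Collection<E> source) {
        for (E e : source) target.add(e);
    }

    // Accepts a source of E or of any subtype of E, like Collection.addAll.
    static <E> void addAllWildcard(Collection<E> target,
                                   Collection<? extends E> source) {
        for (E e : source) target.add(e);
    }

    // Collection<?> accepts a collection of anything; its elements can only
    // be read as Object, which is all contains() needs.
    static boolean containsAllOf(Collection<?> haystack, Collection<?> needles) {
        for (Object o : needles) {
            if (!haystack.contains(o)) return false;
        }
        return true;
    }

    static List<Number> example() {
        List<Number> numbers = new ArrayList<Number>();
        List<Integer> ints = Arrays.asList(1, 2, 3);
        addAllWildcard(numbers, ints);   // fine: Integer is a subtype of Number
        // addAllExact(numbers, ints);   // does not compile: E cannot be both
        //                               // Number and Integer
        return numbers;
    }
}
```

So <? extends E> is the looser of the two: it lets a Collection<Integer> flow into a Collection<Number>, where plain <E> would refuse it.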

The crew updating the collections kit at Sun seems to have used generics only where they could save a cast. For example, Collection's boolean contains(Object o) did not change to boolean contains(E o). The compiler won't balk at a developer coding a contains() call with the wrong type, but will balk if code calls add() with the wrong type. This change breaks code only when the type safety is an issue. I'd argue that calling contains with the wrong type is probably a problem in the code, but it's a tough choice either way. The other odd things were the toArray() methods. Collection.toArray() still returns Object[], not E[]. Collection.toArray(T[] array) returns an array of Ts (also not Es). The source code has a nice trick for getting the class of T out and constructing an array of Ts.
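The trick, as I read it, goes through reflection: since "new T[n]" is not legal, the code asks the array that was passed in for its component type and builds a new array from that. The sketch below shows the idea with made-up names (ToArrayDemo, newArrayLike); it is not a copy of the JDK source.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

class ToArrayDemo {
    // Builds a T[] of the requested length from the runtime component type
    // of a sample array. The cast is unchecked but safe, because the new
    // array really has the sample's component type.
    @SuppressWarnings("unchecked")
    static <T> T[] newArrayLike(T[] sample, int length) {
        return (T[]) Array.newInstance(
                sample.getClass().getComponentType(), length);
    }

    static List<String> names() {
        List<String> names = new ArrayList<String>();
        names.add("node");
        // names.add(Integer.valueOf(42));  // does not compile: add takes E
        return names;
    }

    // contains takes Object, so the wrong type compiles and is simply
    // not found in a well-typed list.
    static boolean containsWrongType(List<String> names) {
        return names.contains(Integer.valueOf(42));
    }

    // toArray() gives back Object[]; toArray(T[]) gives back T[].
    static String[] asStrings(List<String> names) {
        return names.toArray(new String[0]);
    }
}
```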

Heaps -- Something More Complex

The net.walend.collection package also defines a Heap interface, and a Fibonacci Heap implementation. A Fibonacci heap is a fairly complex animal. I wanted to make sure I could use generics to create general implementations of algorithms, and use generics to unite a family of specific classes designed to work together. (My long-term hope is to use generics to define semirings, but that's a different blog.) My Heap code required the Heap and HeapComparator interfaces, plus the FibHeap and HeapMember classes. Heap has three type variables: <Key>, the key in the heap; <Comp>, a HeapComparator that works on <Key>s; and <Memb>, a HeapMember that works with <Key>, <Comp> and <Memb> (oddly enough). Now someone can use a FibHeap that uses Doubles as <Key>s, largest double first. Someone else can use a FibHeap that uses BigDecimals as <Key>s, smallest first. And someone else can use Strings as <Key>s, last in alphabet first. All three use the same FibHeap algorithm code.

The Heap system works well, but has a strange feel to it. <Key> can be any class (provided the class can support HeapComparator's getMinimumPossible() method). <Comp> has to be a HeapComparator that works with <Key>s. <Memb> is the strange one; <Memb> has to know a bit about the Heap that contains it, plus it has to know about itself. I didn't find a way to say that this specific HeapMember must be <Memb>. I might have just missed a way to make this circular declaration.
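For what it's worth, here is one way the three type variables might be declared with a self-referential bound on <Memb>. This is a sketch under assumptions, not the JDigraph declarations: every method other than getMinimumPossible() is invented, and LargestDoubleFirst and DoubleMember are hypothetical helper classes. The bound narrows <Memb> to HeapMembers parameterized by themselves, but as far as I can tell nothing in it forces "this" to be a <Memb>.

```java
import java.util.Comparator;

interface HeapComparator<Key> extends Comparator<Key> {
    Key getMinimumPossible(); // the key that sorts ahead of every other key
}

// Memb is bounded by a type that mentions Memb itself.
interface HeapMember<Key,
                     Comp extends HeapComparator<Key>,
                     Memb extends HeapMember<Key, Comp, Memb>> {
    Key getKey();
    Memb getParent(); // hypothetical: heap structure links are typed as Memb
}

interface Heap<Key,
               Comp extends HeapComparator<Key>,
               Memb extends HeapMember<Key, Comp, Memb>> {
    Memb insert(Key key);  // hypothetical
    Memb peekMin();        // hypothetical
    Comp getComparator();  // hypothetical
}

// Doubles as keys, largest double first.
final class LargestDoubleFirst implements HeapComparator<Double> {
    public int compare(Double a, Double b) { return b.compareTo(a); }
    public Double getMinimumPossible() { return Double.POSITIVE_INFINITY; }
}

// A member type that names itself as its own Memb.
final class DoubleMember
        implements HeapMember<Double, LargestDoubleFirst, DoubleMember> {
    private final Double key;
    private DoubleMember parent; // stays null in this sketch

    DoubleMember(Double key) { this.key = key; }

    public Double getKey() { return key; }
    public DoubleMember getParent() { return parent; }
}
```

With this shape, a heap for the doubles case would be a Heap<Double, LargestDoubleFirst, DoubleMember>, and the same interface could be parameterized for BigDecimals or Strings.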

Both interfaces and superclasses in the type parameter list use the "extends" keyword (not "implements"). In this system of objects with three type parameters, one-letter type parameter names quickly gave me a headache. The java.util package uses mostly T and E. I had to leave that behind to keep things straight. There's an art to naming the type parameters that I have yet to master.
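A short illustration of "extends" doing double duty in bounds, using longer parameter names in the spirit described above. BoundsDemo and its methods are hypothetical names, not part of JDigraph.

```java
import java.util.List;

class BoundsDemo {
    // Comparable is an interface, yet the bound is written with "extends".
    // Assumes a non-empty list.
    static <Item extends Comparable<Item>> Item largest(List<Item> items) {
        Item best = items.get(0);
        for (Item item : items) {
            if (item.compareTo(best) > 0) best = item;
        }
        return best;
    }

    // A class bound comes first; interface bounds follow, joined with "&".
    static <Num extends Number & Comparable<Num>> double largestAsDouble(
            List<Num> nums) {
        return largest(nums).doubleValue();
    }
}
```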

Digraphs -- A Big System of Classes

The net.walend.digraph and net.walend.digraph.path packages define three families of directed graph representations. This package shows the boundary for sane use of interfaces. I want to be able to have generified Objects for Nodes and Edges (where they exist). Digraph<Node> is the top-level interface for directed graphs. GEDigraph<Node> (all the edges are identical) and CEDigraph<Node,Edge> (edges can repeat, but are individual objects and have to be represented as such) extend Digraph<Node>. Interfaces that describe mutable versions of GEDigraph and CEDigraph extend those interfaces, followed by implementations. The path subpackage includes interfaces for Path<Node> (which extends Digraph<Node>), GEPath<Node> (which extends Path<Node> and GEDigraph<Node>) and CEPath<Node,Edge> (which extends Path<Node> and CEDigraph<Node,Edge>), plus mutable versions of these interfaces, and implementations built on List and a Digraph of Paths (useful for path algorithms). It's a big system. Generics should let me eventually convert the net.walend.measured package from shortest (double) path algorithms that operate on CEDigraphs into semiring algorithms that operate on any Digraph.
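Laid out as declarations, the interface families look roughly like this. The interface names and their extends relationships come from the description above; every method, and the ListGEPath class, are placeholders invented so the sketch compiles and runs.

```java
import java.util.ArrayList;
import java.util.List;

interface Digraph<Node> {
    boolean containsNode(Node node);           // hypothetical
    boolean containsEdge(Node from, Node to);  // hypothetical
}

// All edges are identical, so there is no Edge type parameter.
interface GEDigraph<Node> extends Digraph<Node> { }

// Edges are individual objects.
interface CEDigraph<Node, Edge> extends Digraph<Node> {
    Edge getEdge(Node from, Node to);          // hypothetical
}

interface Path<Node> extends Digraph<Node> {
    Node getHead();                            // hypothetical
    Node getTail();                            // hypothetical
}

interface GEPath<Node> extends Path<Node>, GEDigraph<Node> { }

interface CEPath<Node, Edge> extends Path<Node>, CEDigraph<Node, Edge> { }

// A toy List-backed path; assumes a non-empty list of nodes.
final class ListGEPath<Node> implements GEPath<Node> {
    private final List<Node> nodes;

    ListGEPath(List<Node> nodes) { this.nodes = new ArrayList<Node>(nodes); }

    public boolean containsNode(Node node) { return nodes.contains(node); }

    // True when "to" directly follows "from" somewhere along the path.
    public boolean containsEdge(Node from, Node to) {
        for (int i = 0; i + 1 < nodes.size(); i++) {
            if (nodes.get(i).equals(from) && nodes.get(i + 1).equals(to)) {
                return true;
            }
        }
        return false;
    }

    public Node getHead() { return nodes.get(0); }
    public Node getTail() { return nodes.get(nodes.size() - 1); }
}
```

Code written against Digraph<Node> accepts a ListGEPath<String> without caring which family it came from.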

While retrofitting net.walend.digraph, I found that starting at the most general interface (Digraph) and working my way toward the most specific worked best. I think that's a side effect of generics being grafted onto Java late; a subclass or subinterface can simply ignore the super's generics and still compile. I was able to work with just the <Node> type parameter first, then move through the classes where the <Edge> type parameter should exist. However, I was not able to mix the two approaches well; if a subclass used one type parameter, it needed to recognize both. To mitigate that, I found I could set the extra type parameter to <Object> and change it later. I was disappointed that Throwables can't have type parameters. NodeMissingException can hold the missing node as an Object, but not as a Node. And I found a strange interaction between inner classes, type parameters and instanceof that I still haven't puzzled through. The outer class' type parameters carry through to the inner classes, and work fine for everything I tried except instanceof. I had to add a type parameter to the NodePair inner class in AbstractCEHashMap and AbstractGEHashMap.
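A sketch of both restrictions. The class bodies are invented for illustration (HashDigraphSketch stands in for the real classes), and the static nested NodePair<N> is one possible workaround, not necessarily the one used in JDigraph. My best guess at the instanceof trouble is that an inner class of a generic class is itself a parameterized type (Outer<Node>.Inner), and instanceof only accepts types that still exist at run time; an unbounded wildcard such as NodePair<?> qualifies.

```java
// class NodeMissingException<Node> extends RuntimeException { }
//   does not compile: a generic class may not extend Throwable.
class NodeMissingException extends RuntimeException {
    private final Object node; // the best available type for the missing node

    NodeMissingException(Object node) {
        super("Missing node: " + node);
        this.node = node;
    }

    Object getNode() { return node; }
}

class HashDigraphSketch<Node> {
    // Static nested class with its own type parameter, so its type does not
    // depend on the enclosing HashDigraphSketch<Node>.
    static final class NodePair<N> {
        final N from;
        final N to;

        NodePair(N from, N to) {
            this.from = from;
            this.to = to;
        }

        @Override public boolean equals(Object o) {
            // NodePair<?> can be checked at run time; NodePair<N> cannot.
            if (!(o instanceof NodePair<?>)) return false;
            NodePair<?> other = (NodePair<?>) o;
            return from.equals(other.from) && to.equals(other.to);
        }

        @Override public int hashCode() {
            return 31 * from.hashCode() + to.hashCode();
        }
    }

    // The outer Node is still usable when creating pairs.
    NodePair<Node> pair(Node from, Node to) {
        return new NodePair<Node>(from, to);
    }
}
```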

Overall, generics worked well and promise to let us build powerful libraries of general algorithms.

Reading Version Control With Subversion, Pilato, Collins-Sussman, and Fitzpatrick (Thanks, Helen. When do we get Subversion on java.net?)

Hearing 9th Symphony, Beethoven (on Boston Common)