
The tense relationship between JPA, enums and generics

Posted by saintx on April 19, 2008 at 11:10 AM PDT

In the last two months, I've come to understand in excruciating detail the various tradeoffs between using generics and enums in my JPA-ready entity library. Most recently, I've been inspired to write down some of my notes, to save myself and others some headache in the future.

First of all, as always, a pattern is necessary to illustrate the problems I've seen. Consider the task of persisting a unit definition. Here are some examples of instances of a Unit object: kilogram, second, meter, candela.

Clearly, these objects would all benefit from having a name, a description, and an ID field. But consider these as well: joule, watt, newton. Now, the first three units were SI base units; the second three can be defined in terms of the first three. For example, a newton is equal to m·kg·s⁻². So it becomes clear that units need to be definable in terms of an underlying terminology.

So, let's say we have a Term class, with a coefficient, a radix and an exponent. The newton Unit instance would now contain a List of three Term instances. The first term has a coefficient of 1, a radix of the "meter" instance, and an exponent of 1. The second term is much like the first, save for the fact that its radix is equal to the "kilogram" instance of the Unit class. The third follows a similar pattern, but in addition to referencing the "second" Unit instance, it also has an exponent of -2.

Now, for consistency and convenience, we'll define base units in terms of themselves. So, a "meter" instance contains a list of one Term, with a coefficient of 1, a radix of the "meter" instance, and an exponent of 1.

Finally, we can also give the Unit class a boolean field, in order to distinguish between base units and derived units. This very basic definition of units and terms will suffice for what I'm trying to explain.
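The two entities described so far can be sketched roughly as follows. This is a hypothetical, pre-annotation illustration of the shape of the model, not the actual entity library; field and class names are my own:

```java
import java.util.List;

// A unit such as "meter" or "newton", defined by a list of terms.
class Unit {
    String id;
    String name;
    String description;
    boolean base;       // true for SI base units like the meter
    List<Term> terms;   // a base unit's sole term references itself

    Unit(String name, boolean base) {
        this.name = name;
        this.base = base;
    }
}

// One factor of a unit definition: coefficient * radix^exponent.
class Term {
    double coefficient;
    Unit radix;         // the unit this term is expressed in
    int exponent;

    Term(double coefficient, Unit radix, int exponent) {
        this.coefficient = coefficient;
        this.radix = radix;
        this.exponent = exponent;
    }
}
```

With this shape, the newton from the example above is a non-base Unit holding three Terms whose radixes are the meter, kilogram, and second instances, with exponents 1, 1, and -2.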

Now, so far, we have two entities, Unit and Term, and a very simple Many-to-Many relationship between them. But here's where the trouble starts.

Part of the purpose of capturing the concept of a Unit in an object-oriented manner is so that we can use units to create constraints on logical behavior that are richer and more efficient than we could accomplish were they mere text fields. We also want to capture, in some way, the relationships between different units.

To illustrate the first point, consider that you're writing an application where you want to add together a collection of units to determine their sum total. Now, imagine your assumption is that you're trying to get a combined measurement of mass. If you have an object that represents "3 kilograms" and another that represents "5 kilograms" you can easily accomplish this. But what if in addition to these, someone slipped in an object representing "4 seconds"? What is 3kg + 5kg + 4s?

Now we're dealing with dimensional analysis. There are rules that stipulate that you cannot add these together; doing so would break your logic. So, what you want to do--what you need to do--is to somehow capture the idea that your "kilogram" units are tied to the concept of "mass", that your "second" objects are tied to the concept of "time", that your "tesla" objects are tied to the concept of "magnetic flux density", and so forth.
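The constraint being described can be sketched in miniature. This Measurement class and its string-valued "scope" field are purely illustrative, a stand-in for whatever mechanism ends up tying a unit to its concept:

```java
// Refuse to add measurements whose units quantify different concepts.
class Measurement {
    final double value;
    final String unit;   // e.g. "kilogram"
    final String scope;  // e.g. "mass" -- the concept the unit quantifies

    Measurement(double value, String unit, String scope) {
        this.value = value;
        this.unit = unit;
        this.scope = scope;
    }

    Measurement plus(Measurement other) {
        if (!scope.equals(other.scope)) {
            throw new IllegalArgumentException(
                "cannot add " + scope + " to " + other.scope);
        }
        return new Measurement(value + other.value, unit, scope);
    }
}
```

With this in place, 3 kg + 5 kg succeeds, and adding 4 s throws rather than silently corrupting the total. The catch, of course, is that this check happens at runtime; the rest of the post is about trying to push it to compile time.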

But what are these concepts "time", "mass", "magnetic flux density", "photoelastic work", "molar entropy", and so forth? Well, it turns out there is some fog in the answer to that question. Depending on which poorly written Wikipedia article you find, these concepts are collectively called "quantities", "dimensions", "magnitudes" or even "quantitative properties of particles". It turns out there is no highly rigorous name for them, but since object-oriented programming is nothing if not nominalist and Aristotelian in nature, they needed to be named. In order not to limit my future use of the above terms, I decided against adopting any of them and named these objects according to their relationship to the Unit. Since these concepts serve to limit the scope of what the unit can be used to quantify, I refer to them as the Quantitative Scopes of a unit. Please, if you're a physicist studying dimensional analysis, don't be upset.

Because, whatever these are called, we are now straying into trouble with Java. Case in point: how does one best represent a quantitative scope?

Well, generics give us one tantalizing option. I'd like to be able to create a new Unit&lt;Mass&gt;(), and keep it in a Set&lt;Unit&lt;Mass&gt;&gt;. I think this would afford me the best and easiest way to constrain the use of these units. However, that leaves us with the difficult problem of how "Mass" is represented.

We have only two options here: class or interface. A class is problematic, because every instance of "Mass" would be identical to every other. So, it makes more sense to use interfaces. Ah, but herein lies the rub, because our original goal was to make these objects persistent. And interfaces, bless their bytecode, certainly do not fit this bill.

So, it seems objects are the only option. But this, once again, leaves us with the problem of instance control. I could rattle off 126 examples of a QuantitativeScope object, each differing from the other only in name. But if we define each as a class, then presumably we'd have 126 database tables filled with carbon-copied records, which is just not going to happen. Thus, the quandary. Interfaces cannot be made persistent, but classes are the wrong instrument to accomplish the goal.

Well, what about enums? It's an idea--an enum would nicely solve the persistence problem, but it doesn't save us on the application layer, because enum constants are values, not types, and so cannot be used as generic type arguments. Furthermore, enum types cannot themselves be given generic type parameters. This makes sense, given what enums are for, but it leads us back to the same problem. How, given all three of these tools, are we to accomplish the goal of being able to discriminate easily between units of different quantitative scope while not abandoning the ability to persist the objects?

I came up with an ugly hack to solve this problem. The good news is that it makes the best use of the available technology that I'm able to determine. The bad news is that it's an ugly hack and it fills me with doubt about Java and JPA. But I'm invested in making this work, so here goes:

First, I created a "Scope" enum, with 126 different values in it: Scope.Mass, Scope.Time, etc. Then, I made 126 corresponding interfaces, "Mass", "Time", etc., that extend a base "QuantitativeScope" interface. Third, I made a generified "Graft" class that serves to bind one of these interfaces "Q extends QuantitativeScope" to one of these enum-valued objects. Finally, I defined a library class containing 126 public static final instances of this "Graft" class, each mapped to the appropriate "Scope" enum-valued object. With these graft instances, I was set.
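A condensed sketch of that graft arrangement, assuming just two scopes instead of 126 (the class and member names here are my reconstruction, not the actual library):

```java
// The persistable half: a plain enum, one constant per scope.
enum Scope { MASS, TIME }

// The compile-time half: marker interfaces.
interface QuantitativeScope {}
interface Mass extends QuantitativeScope {}
interface Time extends QuantitativeScope {}

// A Graft binds one marker interface (via its type parameter)
// to one Scope constant (via its field).
final class Graft<Q extends QuantitativeScope> {
    private final Scope scope;
    private Graft(Scope scope) { this.scope = scope; }
    Scope getScope() { return scope; }

    // One public constant per scope, pairing enum value with interface.
    static final Graft<Mass> MASS = new Graft<>(Scope.MASS);
    static final Graft<Time> TIME = new Graft<>(Scope.TIME);
}
```

The private constructor gives the instance control that plain classes lacked: the library's constants are the only Grafts that can exist, much like enum values themselves.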

Now, when defining a Unit, I can pass one of these "Graft" objects to the UnitFactory. It can get both the generic type from this Graft object, and assign the Graft.getScope() enum-valued object to a "scope" field in the Unit class. When I persist the Unit into JPA, the "scope" field, which is defined as @Enumerated(EnumType.STRING), goes into the database. When I get a collection of Unit objects back out of the database, I can run them through a sieve and inspect each of their Unit.getScope() values in a switch statement, then place them into appropriately generified Set objects. This piece works sort of like a coin sorter, but when I'm done I can ask for the Set&lt;Unit&lt;Mass&gt;&gt; and know that my results are reliable.
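The coin-sorter step might look something like the following sketch, again with illustrative names and only two scopes. Note the unchecked casts inside the switch: they are safe only because the scope field was assigned from a Graft in the first place, and they are exactly the ugly part this workaround cannot avoid:

```java
import java.util.HashSet;
import java.util.Set;

enum Scope { MASS, TIME }
interface QuantitativeScope {}
interface Mass extends QuantitativeScope {}
interface Time extends QuantitativeScope {}

class Unit<Q extends QuantitativeScope> {
    final String name;
    final Scope scope;   // persisted as @Enumerated(EnumType.STRING)
    Unit(String name, Scope scope) { this.name = name; this.scope = scope; }
    Scope getScope() { return scope; }
}

// Routes units fresh out of the database into generified sets.
class UnitSieve {
    final Set<Unit<Mass>> masses = new HashSet<>();
    final Set<Unit<Time>> times = new HashSet<>();

    @SuppressWarnings("unchecked")
    void sort(Unit<?> unit) {
        switch (unit.getScope()) {
            case MASS: masses.add((Unit<Mass>) unit); break;
            case TIME: times.add((Unit<Time>) unit); break;
        }
    }
}
```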

The main problem with this workaround is that I had to duplicate a lot of data and encase it into interfaces, a large enum, and a graft object library in order to make it function. There are other lingering problems with this approach as well, and I'm sure that I'll uncover more and more of them as I continue.

What this has taught me is that enums and generics are exceedingly tricky to use. Although with generics, types can now be used as compile-time constraints on behavior, they don't help you at runtime (thanks to erasure), and are therefore tough to work into a persistence application. This worsens the intrinsic impedance mismatch between the application layer and the persistence layer in application design. Further, enums in Java behave like pseudotypes, somewhere between interfaces and classes, but because their constants cannot be used as generic type arguments, they aggravate the impedance mismatch even further when worked into a persistent application design. If I could have used the enum-valued objects in the generic fields, this would be a non-problem.

Finally, the best solution for my particular problem might have nothing to do with enums or generics after all. What I'm trying to replicate through this design is actually Invariants, Preconditions and Postconditions on method behavior, class definitions, and collection compositions. There are languages such as VDM-SL and Eiffel that wonderfully exemplify this sort of language feature, and old tools such as iContract that might make it useful in Java, but it's a shame that these useful tools are not built into the language itself.

Comments

No, this is not really a flaw. You can design immutable objects to be enumerated types, for example. Enumerated types are pushed into the datastore as either their numeric index or a string-based representation, so no subclassing is used for them and they don't need to be annotated. A common pattern for EJB/JPA applications is to move persistence data from runtime classes into a persistence-capable POJO bean, then save the POJO into the datastore. On the way back, you can marshal the data from the entity beans and use it to reconstitute the immutable object, which you use in your data layer. OR, if all of the data for the immutable object is included in the source code, then you don't even need to reconstitute its fields. I'd just make this an enum and call it a day.
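The pattern this comment describes, stripped to its essentials, might be sketched like so (both class names are hypothetical):

```java
// Immutable runtime class -- never touched by JPA directly.
final class Color {
    final String name;
    Color(String name) { this.name = name; }
}

// Persistence-capable POJO bean; in a real application its fields
// would carry the JPA annotations instead of the immutable class.
class ColorBean {
    String name;

    static ColorBean from(Color c) {
        ColorBean b = new ColorBean();
        b.name = c.name;
        return b;
    }

    // On the way back out of the datastore, rebuild the immutable object.
    Color reconstitute() { return new Color(name); }
}
```

The immutable class stays free of persistence concerns, at the cost of maintaining a shadow bean and the copy code in both directions.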

The date/time classes being proposed in JSR-310 are immutable. They mostly have rather too many values to be represented as enums, and having a complete set of shadow classes to persist them would be tedious.

Cool! I was just looking at JodaTime this weekend, scratching around for Date performance topics, so thanks for the reference. Guess the true answer to this is "it depends". I tend to regard this particular example as a limitation of JSR-310 / JodaTime. OOC, do their objects have JDBC bindings yet? I want an object that can be moved into a database, and Date is already well supported, so I'm using Date for now. Performance isn't free, so if you need a more performant logic layer, you might have some impedance mismatch between it and the data layer and pay for it in code complexity. There's give and take with everything. Knowing your options and picking the best one for your goal seems like the way to go. If I found myself storing immutable data in a database for the sake of performance on the logic layer, I might reevaluate that decision ... something seems off. I'm interested in your opinion on the subject, though.

Just in case I'm giving the wrong impression, I really just want a programming language that hides 100% of persistence for you. The lights go out, you turn the machine back on, it keeps going from where it left off, and you don't know how the data is preserved. That is my dreamworld. JPA is nowhere near this ideal--I don't like putting annotations into my Java code, and testing the annotations is tedious. It has been forcing me to build a better, tighter, more consistent data model, though. So it's good for that.

Isn't the main flaw that JPA doesn't have a means (absent vendor extensions) to register proxies which enable the persisting of otherwise unpersistable classes? Particularly troublesome is the lack of means to extend the set of immutable classes which can be persisted. Immutable classes are highly desirable, but just don't work with JPA.

Couldn't you make use of something like:

@Inheritance(strategy=InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name="QuantitativeScopeType", discriminatorType=DiscriminatorType.STRING)
...
@DiscriminatorValue("Mass")

for your QuantitativeScope subclasses, thus only having one table, and not the 126 (sic) that drive you towards not using objects in the first place?
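Filled out, the suggestion amounts to something like this sketch: one abstract entity at the root of the hierarchy, all subclasses mapped into its single table, distinguished only by the discriminator column. The class and column names are illustrative, and this uses the javax.persistence API current at the time:

```java
import javax.persistence.*;

// All 126 scope subclasses would share this one table.
@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "QuantitativeScopeType",
                     discriminatorType = DiscriminatorType.STRING)
abstract class QuantitativeScope {
    @Id @GeneratedValue
    Long id;
}

@Entity
@DiscriminatorValue("Mass")
class Mass extends QuantitativeScope {}

@Entity
@DiscriminatorValue("Time")
class Time extends QuantitativeScope {}
```

This keeps the class-per-scope design on the Java side while collapsing the 126 tables into one; whether it also solves the instance-control problem from the post is a separate question.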

(Apologies, but formatting seems broken)