O/R Mapping and Performance
I am a huge fan of O/R mappers like Hibernate and JDO. They insulate developers from the database mechanics, speeding up development and boosting productivity. They also add a layer of insulation above the database itself, which aids portability. However, insulating developers from the database layer completely is not always a good thing. Developers still need to understand the performance impact of the code they write, and a number of issues can arise when they do not. Here are a few that I have noticed on a recent project:
- Improperly configured or non-existent lazy initialization. An object with multiple dependencies and relationships with other objects may trigger a raft of SELECTs when instantiated. These other objects, in turn, may trigger more statements to initialize themselves, and so on. Objects within a large inheritance hierarchy are also vulnerable, as are objects that have a one-to-many (e.g. parent-child) relationship where the cardinality on the 'many' side is huge. Lazy initialization can counter this. Most modern ORM frameworks use dynamic proxying or bytecode enhancement to hand back proxies to objects that may not be fully initialized yet. As the object is used, further parts of the object graph are loaded into memory transparently. This avoids loading many child objects up front that are then never referenced. Even better, lazy initialization should be configurable per persistent entity, so each persistent class can have its own lazy loading policy. One size rarely fits all in such situations.
- Inability to use database-specific optimizations. I know one of the major pluses of O/R mappers is that they obviate the need to write straight SQL. Sometimes, however, writing some straight SQL is the best way to alleviate performance issues. Every database vendor has product-specific optimizations, and accessing these can sometimes make a huge difference. Purists may be unhappy with it, but in the real world, being able to access the underlying query code is a huge plus. I'm glad that JDO 2.0 is taking this factor into consideration.
- Cache configuration. A good caching policy is vital. If many objects are relationship-heavy and, say, a security framework loads an associated security profile for each object instantiated from the database, the number of database round-trips will be huge. Tailoring a cache layer to absorb the most frequently fetched objects will save a lot of time. Preferably, the ORM framework will give you the flexibility to plug in the cache framework of your choice. Hibernate, for example, even allows you to plug in a cluster-enabled cache (OSCache has some clustering features). The ability to mark a cache as read-only and use it as a store for immutable objects will enable further optimizations.
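
To make the lazy initialization point concrete, here is a minimal sketch of the dynamic proxying idea using only the JDK's java.lang.reflect.Proxy. It is not how Hibernate or JDO implement it internally; the Order interface, LoadedOrder class, loadFromDatabase method and selectCount counter are all hypothetical names invented for illustration, and the "database" is just a counter that records how many simulated SELECTs were issued.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: every name here is hypothetical, not Hibernate or JDO API.
class LazyLoadingSketch {

    /** Hypothetical entity contract; JDK dynamic proxies need an interface. */
    public interface Order {
        long getId();
        List<String> getLineItems();
    }

    /** Fully initialized entity, as a real SELECT would produce it. */
    static final class LoadedOrder implements Order {
        private final long id;
        private final List<String> lineItems;

        LoadedOrder(long id, List<String> lineItems) {
            this.id = id;
            this.lineItems = lineItems;
        }

        @Override public long getId() { return id; }
        @Override public List<String> getLineItems() { return lineItems; }
    }

    /** Counts simulated database round-trips. */
    static final AtomicInteger selectCount = new AtomicInteger();

    /** Stand-in for a SELECT against the database. */
    static Order loadFromDatabase(long id) {
        selectCount.incrementAndGet();
        return new LoadedOrder(id, Arrays.asList("item-a", "item-b"));
    }

    /** Returns a proxy that defers the simulated SELECT until data is needed. */
    static Order lazyOrder(final long id) {
        InvocationHandler handler = new InvocationHandler() {
            private Order target; // stays null until the first real access

            @Override
            public synchronized Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                // The identifier is already known, so no round-trip is needed.
                if (method.getName().equals("getId") && method.getParameterCount() == 0) {
                    return id;
                }
                if (target == null) {
                    target = loadFromDatabase(id); // first touch triggers the load
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the entity's own exception
                }
            }
        };
        return (Order) Proxy.newProxyInstance(
                Order.class.getClassLoader(), new Class<?>[] {Order.class}, handler);
    }

    public static void main(String[] args) {
        Order order = lazyOrder(42L);
        System.out.println("SELECTs after creating proxy: " + selectCount.get());
        System.out.println("id = " + order.getId() + ", SELECTs: " + selectCount.get());
        System.out.println("items = " + order.getLineItems() + ", SELECTs: " + selectCount.get());
        order.getLineItems();
        System.out.println("SELECTs after second access: " + selectCount.get());
    }
}
```

In this sketch, creating the proxy and reading its id cost nothing; the single simulated SELECT happens only when the line items are first touched, and later calls reuse the loaded object. A real framework layers session handling, collection proxies and per-entity configuration on top of the same basic idea.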