Oracle Benchmarks BDB vs Apache Derby
First of all, it's a great compliment that Oracle would consider putting the effort into running the benchmarks and writing the white paper. That must mean they're getting peppered with questions about why they should choose BDB over Derby.
The white paper claims (without any source code or details) that BDB Java Edition consistently outperforms Derby by a factor of three to ten. I actually believe their measurements are suspect -- see the end of this blog post for some juicy details. It's also problematic that they don't provide a link to the code and don't describe their test runs in any real detail. But let's assume for now that BDB is faster than Derby.
They admit that "in some areas, this comparison is apples-to-apples, and in other cases apples-to-oranges." Well, no kidding -- BDB does not provide any SQL support, and thus doesn't have to pay for the overhead of a SQL layer. But if you want SQL, then this is kind of a problem.
If you don't want SQL, then it's worth considering BDB. But I think you need to be very clear about the path you're taking prior to jumping over to BDB:
- You may not need SQL today, but think about whether you may need it later. I have seen way too many projects that started with simple key/value storage, only to add query support, secondary indexes, and so on as the project grew in complexity.
- The BDB license is clear: if your product that uses BDB is closed-source, or you want indemnification, then you need to pony up $$$ for the commercial license. If your product is open source, you can use the BDB open source license, a GPL-style copyleft license: if you use it, you have to open source your software as well.
- The API for BDB is non-standard and provided by a single vendor: Oracle. Are you comfortable making your product dependent on that single vendor?
The other interesting point they make in the white paper is that BDB JE's support for native storage of Java objects provides big performance benefits over the Java Persistence implementations provided by Hibernate and others, because you don't have to map between Java and SQL. Again, a good point on the face of it, but they neglect to mention that the BDB object interface is completely non-standard and is a wide-open door to vendor lock-in, whereas JPA and JDO are standards with multiple competing, and often open source, implementations.
I think there is a real need for simple, transactional key/value data storage for Java, where SQL and querying are not needed. Right now the only real player in this game is BDB, but it is non-standard and owned by a single vendor.
It would be great if we could define a Java standard for key/value storage so that users don't get locked into a particular solution. Something similar to JPA or JDO, but significantly simpler: an EntityManager that does simple get/put operations on Java objects and provides transactional semantics. No query support, and none of the overhead that comes with it.
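Here is a rough sketch of what such an API might look like in plain Java. Every name in it (`KeyValueStore`, `KvTransaction`, `InMemoryKeyValueStore`) is hypothetical: no such standard exists, and this is not BDB's API. The toy in-memory implementation is there only so the sketch runs; it buffers writes per transaction and applies them together on commit, but it has no durability and no conflict detection between concurrent transactions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical standard API: transactional get/put/delete, no query support.
interface KeyValueStore<K, V> {
    KvTransaction<K, V> begin();
}

interface KvTransaction<K, V> extends AutoCloseable {
    Optional<V> get(K key);
    void put(K key, V value);
    void delete(K key);
    void commit();
    void rollback();

    // Narrowed to throw no checked exception; implementations roll back if still open.
    @Override
    void close();
}

// Toy implementation for illustration only: memory-resident (not durable),
// and concurrent commits simply overwrite each other (no conflict detection).
final class InMemoryKeyValueStore<K, V> implements KeyValueStore<K, V> {
    private final Map<K, V> committed = new HashMap<>();

    @Override
    public KvTransaction<K, V> begin() {
        return new Tx();
    }

    private final class Tx implements KvTransaction<K, V> {
        // Buffered writes; a null value marks a pending delete.
        private final Map<K, V> pending = new HashMap<>();
        private boolean open = true;

        @Override
        public Optional<V> get(K key) {
            checkOpen();
            if (pending.containsKey(key)) {
                return Optional.ofNullable(pending.get(key)); // read own writes
            }
            synchronized (committed) {
                return Optional.ofNullable(committed.get(key));
            }
        }

        @Override
        public void put(K key, V value) {
            checkOpen();
            pending.put(key, Objects.requireNonNull(value, "value"));
        }

        @Override
        public void delete(K key) {
            checkOpen();
            pending.put(key, null);
        }

        @Override
        public void commit() {
            checkOpen();
            synchronized (committed) { // apply all buffered writes as one unit
                for (Map.Entry<K, V> e : pending.entrySet()) {
                    if (e.getValue() == null) {
                        committed.remove(e.getKey());
                    } else {
                        committed.put(e.getKey(), e.getValue());
                    }
                }
            }
            open = false;
        }

        @Override
        public void rollback() {
            checkOpen();
            pending.clear();
            open = false;
        }

        @Override
        public void close() {
            if (open) {
                rollback();
            }
        }

        private void checkOpen() {
            if (!open) {
                throw new IllegalStateException("transaction already finished");
            }
        }
    }
}
```

Because `close()` rolls back any transaction that was never committed, wrapping each transaction in try-with-resources gives the usual all-or-nothing behavior without any query machinery.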
JavaSpaces defines something very much like this, except that it is not a standard. Perhaps it could be taken in that direction...
Maybe Oracle would be willing to work with the Java community to offer their expertise in this area and help define such a standard. Then other folks could provide alternate implementations, customers would have the freedom to leave, and then we'd really be comparing apples to apples...
Under the Hood...
OK, first of all, as Mike Matrigali observed on the derby-dev list, on the first graph the difference is 83 vs. 89 operations/sec, a mere 7%. But the graph's vertical axis doesn't start at 0, which makes it look like a big difference.
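The arithmetic, for the record: the relative gap is small, but a truncated axis magnifies it. The baseline $b = 80$ below is purely hypothetical (the value where the graph's axis actually starts isn't given here); it only illustrates the effect, with the apparent ratio being the ratio of the visible bar heights.

$$\frac{89 - 83}{83} = \frac{6}{83} \approx 0.072 \quad (\text{about } 7\%)$$

$$\text{apparent ratio} = \frac{89 - b}{83 - b}, \qquad b = 80 \;\Rightarrow\; \frac{9}{3} = 3$$

So with that hypothetical baseline, a 7% difference is drawn as bars whose heights differ by a factor of three.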
They even say in the text, "while both product are disk bound, JE is still significantly faster than Derby." What bunk!
The next graph does show a significant difference (and does start at 0), but it was produced with the disk write cache enabled.
They even call them "non-durable writes." Hm, that doesn't give me a warm fuzzy. As I wrote in "the story of the write cache and half a worm," having the write cache enabled is really exciting in terms of performance, but it does have drawbacks, such as potential loss or corruption of data. No biggie...
Reading on, it turns out that all subsequent performance comparisons are done with the write cache enabled. Hm...
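To make the durable/non-durable distinction concrete, here is a minimal sketch using plain Java NIO (not BDB's API) of a log append that either is or isn't forced to storage. The class name `LogWriter`, the method `appendRecord`, and its `durable` flag are illustrative assumptions. The durable path calls `FileChannel.force`, which asks the OS to write the data to the storage device before returning; the non-durable path returns as soon as the bytes are in OS buffers. Even `force` can't guarantee the data survives a power loss if the drive itself acknowledges writes from a volatile write cache, which is exactly the risk described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of any storage product.
final class LogWriter {
    private LogWriter() {}

    // Appends one record. If durable is true, blocks until the OS reports the
    // data written to the storage device; otherwise the record may still sit in
    // OS (or drive) caches when this returns and can be lost in a crash.
    static void appendRecord(FileChannel channel, byte[] record, boolean durable)
            throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            channel.write(buf); // a single write() may be partial
        }
        if (durable) {
            channel.force(true); // true: also flush metadata such as the new file length
        }
    }

    static FileChannel openLog(Path path) throws IOException {
        return FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }
}
```

The only difference between the two modes is the `force` call, and that call is where a disk-bound workload spends its time, which is why benchmark numbers with and without it are not comparable.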
It goes to show, again, that it's dangerous to count on others to do your performance measurements, especially others with an agenda. I would recommend that you do your own performance tests before you commit yourself to BDB, and Oracle, for what could be a very long ride...
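If you do run your own tests, even a crude harness helps avoid the obvious traps: warm up the JIT before measuring, repeat the measurement rather than trusting a single run, and record which durability setting each number was taken with. Below is a minimal sketch; `MicroBench.opsPerSecond` and its parameters are hypothetical names, and for serious work a dedicated tool such as JMH (the OpenJDK microbenchmark harness) is a better choice.

```java
import java.util.Arrays;

// Hypothetical timing helper for illustration only.
final class MicroBench {
    private MicroBench() {}

    // Runs op `warmup` times untimed (to let the JIT settle), then times
    // `runs` batches of `opsPerRun` calls and returns the median ops/sec.
    static double opsPerSecond(Runnable op, int warmup, int runs, int opsPerRun) {
        if (runs <= 0 || opsPerRun <= 0) {
            throw new IllegalArgumentException("runs and opsPerRun must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            op.run();
        }
        double[] rates = new double[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < opsPerRun; i++) {
                op.run();
            }
            long elapsed = Math.max(1L, System.nanoTime() - start); // avoid divide-by-zero
            rates[r] = opsPerRun * 1e9 / elapsed;
        }
        Arrays.sort(rates);
        return rates[runs / 2]; // median (upper median when runs is even)
    }
}
```

Reporting the median of several runs keeps one lucky or unlucky run from deciding the result, and running the same harness against each product with the same durability setting is what keeps the comparison apples to apples.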