
JPA is great but damned slow

Posted by mkarg on January 3, 2010 at 7:20 AM PST

I did some experiments with JPA, which is a really cool and simple API for entity persistence. In fact, writing an entity bean is as simple as writing a POJO and adding a couple of annotations such as @Entity and @Id (the latter marks the primary-key field). That's it. Cool. :-)

See this sample code:

import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class MySample {

    @Id // primary-key field
    private int x;

    public int getX() {
        return this.x;
    }

    public void setX(int x) {
        this.x = x;
    }
}

Yes, it's just that simple. No deployment descriptor (DD) is needed any more (though one can optionally be added for administrators' customizations). Accessing objects is just as simple, as the following sample shows:

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.TypedQuery;

EntityManagerFactory emf = Persistence.createEntityManagerFactory("pu1");
EntityManager em = emf.createEntityManager();

em.getTransaction().begin();

// INSERT INTO MySample is just as easy as:
MySample mySample = new MySample();
mySample.setX(12345); // Alternatively an entity can use @GeneratedValue on its @Id field to have an autogenerated PK!
em.persist(mySample);

// SELECT FROM MySample is just as easy as:
// This is not SQL but JPQL: entity and field names are MAPPED to tables and columns internally!
TypedQuery<MySample> q = em.createQuery("SELECT OBJECT(m) FROM MySample m", MySample.class);
for (final MySample aSample : q.getResultList()) {
    System.out.println(aSample.getX());
}

em.getTransaction().rollback(); // undo the INSERT; this is only a demo
em.close();
emf.close();

(Sorry if there are any typos)

Sounds cool? It IS cool!

But then I compared the performance of a simple benchmark like the one above with plain JDBC 4. Guys, this simplicity really is expensive: I measured that JPA takes ten times as long as plain JDBC 4! Why? The answer is simple: JPA has a lot of power inside. You can do a great many things in one line of code that would need ten or twenty lines in JDBC. I used TPTP (the Eclipse profiler) to find out how JPA works internally, and then it was all clear to me. That flexibility has to come from somewhere, and the price is thousands of lines of code executed even for the simplest SELECT statement.
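Single-shot timings of a trivial statement are easily dominated by JIT compilation, class loading, and first-connection costs, so a ratio like the tenfold one above is more trustworthy when both the JDBC and the JPA variant are warmed up and averaged over many runs. A minimal sketch of such a harness, assuming each variant is wrapped in a `Runnable`; `MicroBench` and `averageNanos` are hypothetical names, not part of any library:

```java
/** Hypothetical helper: averages the wall-clock time of a task after a warm-up phase. */
final class MicroBench {

    private MicroBench() {
    }

    /**
     * Runs {@code task} {@code warmup} times without measuring (so the JIT can compile
     * hot paths and caches can fill), then returns the mean duration in nanoseconds
     * over {@code iterations} measured runs.
     */
    static double averageNanos(Runnable task, int warmup, int iterations) {
        if (warmup < 0 || iterations <= 0) {
            throw new IllegalArgumentException("warmup >= 0 and iterations > 0 required");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // warm-up runs are not timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) iterations;
    }
}
```

For example, dividing `MicroBench.averageNanos(jpaVariant, 1_000, 10_000)` by `MicroBench.averageNanos(jdbcVariant, 1_000, 10_000)` gives the slowdown factor, where `jpaVariant` and `jdbcVariant` are hypothetical `Runnable`s around your own code. This is still a naive harness; a dedicated tool such as JMH additionally guards against dead-code elimination and JVM-level noise.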

So what can we learn? JPA is a pretty cool thing, since we can write a single annotation instead of 10 or 20 lines of JDBC. But we have to take care not to access more rows than needed, since in my measurement every row costs about 10 times as much CPU time with JPA as with JDBC. As with everything, it is a tradeoff between development time and runtime. One must take good care not to write silly code (like the sample above: who actually wants to load all rows of a table instead of using a WHERE clause to select only the needed ones?). If that rule is followed, I think the performance drawbacks will not be a problem in real-life applications.

Comments

Real World Snapshot

The benchmark was a real-world measurement: we replaced JDBC with JPA in a piece of our application's code and triggered that code from the benchmark driver. We wanted to see what happens when the exact same scenario runs with JDBC compared to JPA, so it was not a synthetic test but a direct comparison inside our application. The goal was to learn what would happen once our application was migrated from JDBC to JPA; it was not intended as a clean lab benchmark of JPA. Now that the application has been migrated to JPA, we have noticed that JPA does add some overhead compared to JDBC, but the nice features you get pay off that overhead. As measured in the real world, the application actually is slower, but that is not really a problem. (BTW, the slowdown arises from the fact that some queries are simple in JDBC SQL but complex in JPA QL, which results in more complex SQL generated by the ORM. And no, there is no solution; we have already discussed it with the JPA Spec Lead at Sun and the RI team at Oracle.)

It also depends on which JPA provider you use.

Not all JPA implementations perform equally: some are faster and some are slower. See the JPA Performance Benchmark.


My comments:

Whether you use JDBC (local) or JTA transactions makes a lot of difference; which one did you use?

A benchmark with only one INSERT and one SELECT is too simplistic. Please put some thousands of rows in there.
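On the JDBC side, thousands of rows are usually loaded with a batched `PreparedStatement` inside one local transaction rather than with one auto-committed round trip per row, and a fair comparison should do the same. A minimal sketch using only `java.sql`; the table name `my_sample` with an integer column `x` is a hypothetical schema for the `MySample` entity, and the batch size is left to the caller:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper: bulk-loads test rows with JDBC batching inside one local transaction. */
final class BatchInsert {

    private BatchInsert() {
    }

    /**
     * Inserts the values 0..rowCount-1 into the assumed table my_sample(x),
     * calling executeBatch() once per batchSize rows, and commits once at the end.
     */
    static void insertRows(Connection conn, int rowCount, int batchSize) throws SQLException {
        if (rowCount < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rowCount >= 0 and batchSize > 0 required");
        }
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one local (JDBC) transaction for the whole load
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO my_sample (x) VALUES (?)")) {
            for (int i = 0; i < rowCount; i++) {
                ps.setInt(1, i);
                ps.addBatch();
                if ((i + 1) % batchSize == 0) {
                    ps.executeBatch(); // send a full batch; round trips per batch depend on the driver
                }
            }
            if (rowCount % batchSize != 0) {
                ps.executeBatch(); // send the remaining rows
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // undo the partial load
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Running the JPA variant with a comparable number of `persist` calls in one transaction would then compare like with like.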

With JDBC you can use Statement.getGeneratedKeys() to retrieve database-generated keys, just so you know.
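As the JDBC counterpart of `@GeneratedValue`, the key is requested when the statement is prepared and read back from the `ResultSet` returned by `getGeneratedKeys()`. A minimal sketch, assuming a hypothetical table `my_sample` whose `id` column is generated by the database (identity or auto-increment) plus an integer column `x`; whether a driver supports generated keys at all is driver-specific:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper: inserts one row and returns the key the database generated for it. */
final class GeneratedKeys {

    private GeneratedKeys() {
    }

    static long insertAndGetId(Connection conn, int x) throws SQLException {
        String sql = "INSERT INTO my_sample (x) VALUES (?)"; // id column is filled by the DB
        try (PreparedStatement ps = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
            ps.setInt(1, x);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (!keys.next()) {
                    throw new SQLException("driver returned no generated key");
                }
                return keys.getLong(1); // first generated column
            }
        }
    }
}
```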

I have done corporate tests comparing JPA and JDBC, and (assuming the queries and table relationships are well designed) the BIG differences between JPA and pure JDBC statements are listed below:

* JPA cache plays a significant role (for good or bad)
* JOIN clauses and subselects
* Row locks
* SELECT FOR UPDATE
* Unoptimized ORM dialects
* In-memory pagination

If the SQL needs to be optimized, that can be done at the ORM dialect layer; otherwise, stick to pure JDBC if an ORM is not needed.


Not very meaningful (usually...)

Did you compare total times or just the time spent in the JVM? A good way to do this kind of benchmark is to run the DBMS server on a separate machine; then you can measure the "pure" CPU overhead of JDBC and JPA on the main machine. For real-world application performance, it's meaningless to make your Java persistence code even 10X faster if that is still a tiny fraction of the cost inside the DBMS (except, of course, if your app happens to perform enormous numbers of such trivial queries).
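This argument is essentially Amdahl's law: only the Java-side share of each request grows, while the time spent inside the DBMS stays the same. A minimal sketch of the arithmetic; the class name and all example timings are hypothetical, not measurements from the post or this comment:

```java
/** Hypothetical helper: end-to-end slowdown when only the Java-side persistence cost grows. */
final class EndToEnd {

    private EndToEnd() {
    }

    /**
     * @param dbmsMillis     time spent inside the DBMS per request (unchanged)
     * @param javaMillis     Java-side persistence time per request with plain JDBC
     * @param overheadFactor how many times slower the Java side becomes (e.g. 10 for JPA)
     * @return ratio of total time after the change to total time before it
     */
    static double slowdown(double dbmsMillis, double javaMillis, double overheadFactor) {
        double before = dbmsMillis + javaMillis;
        if (before <= 0) {
            throw new IllegalArgumentException("total time must be positive");
        }
        double after = dbmsMillis + overheadFactor * javaMillis;
        return after / before;
    }
}
```

With a hypothetical 5 ms in the DBMS and 0.2 ms of JDBC-side work, a tenfold Java overhead yields (5 + 2) / 5.2, about 1.35, i.e. roughly 35% slower end to end rather than 10x. Only when the Java side dominates (say 0.5 ms in each) does the factor approach the raw ratio: (0.5 + 5) / 1.0 = 5.5.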

You must also find a way to factor in ORM optimizations like caching of generated code and mapping metadata. This is similar to the prepared-statement caches used in JDBC drivers, but one level above. Good JPA implementations will certainly do more work per query, but they will try to reuse part of that work whenever possible.
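The reuse described here can be pictured as a memoizing cache keyed by the query string: the expensive translation (query parsing, SQL generation, metadata lookup) runs once per distinct query, and later executions only pay for a map lookup. A minimal sketch; `QueryPlanCache` and the `translate` function passed to it are hypothetical stand-ins, not the internals of any particular JPA provider:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache: translates each distinct query string once and reuses the result. */
final class QueryPlanCache<P> {

    private final ConcurrentMap<String, P> plans = new ConcurrentHashMap<>();
    private final Function<String, P> translate; // the expensive step being amortized

    QueryPlanCache(Function<String, P> translate) {
        this.translate = translate;
    }

    /** Returns the cached plan for the query, running the translation only on first use. */
    P planFor(String query) {
        return plans.computeIfAbsent(query, translate);
    }

    int size() {
        return plans.size();
    }
}
```

Binding values as parameters (`WHERE m.x = :x`) instead of concatenating them into the query text keeps the key stable, so repeated executions hit the cache. A real provider would also need to bound the cache and invalidate entries when mapping metadata changes; this sketch does neither.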

For less trivial queries, e.g. something that returns or updates a thousand records, the ORM overheads will also typically be much smaller proportionally. And you will find that ORMs try to merge many operations together, using outer joins to batch many loads into one and bulk updates for DML, which further reduces these overheads for complex transactions containing many persistence operations. All in all, real-world application performance will often be indistinguishable from hand-written JDBC (and often better, as not all JDBC/SQL programmers will work hard and well enough to hand-tune everything they can). There are still scenarios where JDBC will be faster, but these are a tiny minority for most apps (and you will most likely identify such scenarios not with a Java profiler but by inspecting your DBMS's execution statistics).