
Understanding the cache of TopLink Essentials (GlassFish JPA)

Posted by guruwons on September 7, 2006 at 8:38 PM PDT

Introduction

The hype around JPA has died down, and the time has come to apply it to real applications.

To use JPA properly, knowing the JPA spec is not enough, because the spec doesn't cover every aspect and behavior can differ slightly between persistence providers. The 2nd-level cache is a prime example: it is not covered by the JPA spec, but most providers offer one. To get good performance and the results you expect, you should understand how the cache works.

I will talk about the cache extension of TopLink Essentials (the GlassFish JPA RI); I think understanding it is essential to using JPA properly in GlassFish.

This article is based on GlassFish V2 b15.

Persistence Context

(If you already know the persistence context well, you can skip this section.)

The persistence context is a key concept in JPA. It is similar to a first-level cache; to be more accurate, it is not a cache but the working set of managed entities, which is synchronized to the database on flush or commit. (I will not explain the persistence context in detail here; if you need more, refer to other materials.)

Entities in the persistence context are never evicted unless the persistence context is cleared, either explicitly or implicitly (for example, when the EntityManager is closed at transaction completion). Likewise, the persistence context is not refreshed unless you explicitly invoke the EntityManager.refresh() method.

The persistence context maintains exactly one entity instance per persistent identity (primary key), as shown below.

// in the same persistence context
Employee e1 = em.find(Employee.class, 100L);
...
Employee e2 = em.find(Employee.class, 100L);
...
Employee e3 = (Employee)
    em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setParameter("id", 100L)
    .getSingleResult();

// e1 == e2 && e2 == e3
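The identity guarantee can be pictured as an in-memory identity map keyed by primary key. This is a minimal sketch, not TopLink code: ToyEmployee, ToyPersistenceContext, and the Map standing in for the EMP table are hypothetical names made up for illustration, and the selects counter only tracks simulated SELECT statements.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy entity; not the article's Employee class.
class ToyEmployee {
    final long id;
    String address;

    ToyEmployee(long id, String address) {
        this.id = id;
        this.address = address;
    }
}

// Toy persistence context: an identity map holding at most one instance per primary key.
// The "database" Map (id -> address) is a stand-in for the EMP table.
class ToyPersistenceContext {
    private final Map<Long, ToyEmployee> managed = new HashMap<>();
    private final Map<Long, String> database;
    int selects = 0; // counts simulated SELECT statements

    ToyPersistenceContext(Map<Long, String> database) {
        this.database = database;
    }

    // Like find(): reuse the managed instance, or load it once and keep it.
    ToyEmployee find(long id) {
        ToyEmployee e = managed.get(id);
        if (e == null && database.containsKey(id)) {
            selects++;
            e = new ToyEmployee(id, database.get(id));
            managed.put(id, e);
        }
        return e;
    }

    // Like a JPQL query: always "runs SQL", but a row whose key is already
    // managed resolves to the existing instance rather than a new object.
    ToyEmployee queryById(long id) {
        selects++;
        if (!database.containsKey(id)) {
            return null;
        }
        return managed.computeIfAbsent(id, k -> new ToyEmployee(k, database.get(k)));
    }
}
```

Calling find(100L) twice and then queryById(100L) on the same ToyPersistenceContext returns one instance three times while issuing two simulated SELECTs, which mirrors e1 == e2 && e2 == e3 above.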

When you use a container-managed, transaction-scoped EntityManager, the scope of the persistence context is a transaction, so this first-level cache is only used within a transaction.

Understanding the persistence context alone is not enough; you also need to know the vendor-specific 2nd-level cache. Let's continue...

Session Cache

TopLink Essentials provides a 2nd-level cache called the session cache.
The session cache is maintained in an internal server session, and one internal server session is created per persistence unit. The session cache is shared by all clients. See the following picture.

[Figure: multiread.gif (from the TopLink Developer Guide, "Server and Client Sessions")]

In JPA, a client session corresponds to an EntityManager (persistence context), so all EntityManagers from the same persistence unit share the session cache.

Is the session cache shared between applications? No. A persistence unit is maintained per Java EE application, so the scope of the session cache is a single Java EE application; it is not shared between applications. Also, TopLink Essentials doesn't provide a distributed cache, so the session cache cannot be shared by a clustered application across several application server instances.

The session cache is turned on by default, so you can use it now without any extra configuration and get its performance benefits.

When you get entities that are not in the persistence context, TopLink uses the entities cached in the session cache (clones of the cached entities are returned). However, EntityManager.find() and a SELECT query behave differently. EntityManager.find() checks the session cache before going to the database, but a SELECT query doesn't check the cache first and always goes to the database. Even though the query goes to the database, it avoids rebuilding the entity and reuses the one in the session cache (some performance gain).

Normally, when a transaction completes, its persistence context (EntityManager) is closed and a new persistence context is used in the next transaction. The session cache is useful in this situation.

For example,

//EntityManager is created and closed for a transaction
Long id = 100L;

//FIRST TRANSACTION
begin();//create a new em. start a transaction.
Employee e1 = em.find(Employee.class, id);
commitAndClose();//commit the transaction. close em.

//SECOND TRANSACTION
begin();
Employee e2 = em.find(Employee.class, id);
commitAndClose();

//THIRD TRANSACTION
begin();
Employee e3 = (Employee)
    em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setParameter("id", id)
    .getSingleResult();
commitAndClose();

If Employee#100 was never retrieved from the database before the FIRST transaction, the FIRST em.find() goes to the database, builds an entity, and stores it in the session cache. The SECOND em.find() gets a cache hit, so it doesn't go to the database. The THIRD query always goes to the database but doesn't rebuild the entity; the entity in the session cache is used.
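The outcome described above can be reproduced with a toy model. It is a sketch under simplifying assumptions, not TopLink's implementation: ToySessionCache plays the role of the server session's shared cache, each ToySessionEntityManager has its own persistence context, entities are represented by a hypothetical SessionEmployee class, and a Map stands in for the EMP table. The counters only track simulated SELECTs and entity builds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy entity used only in this sketch.
class SessionEmployee {
    final long id;
    String address;

    SessionEmployee(long id, String address) {
        this.id = id;
        this.address = address;
    }

    SessionEmployee copy() {
        return new SessionEmployee(id, address);
    }
}

// Toy shared "session cache": one per persistence unit, shared by all EntityManagers.
class ToySessionCache {
    final Map<Long, String> database;                       // stand-in for the EMP table
    final Map<Long, SessionEmployee> cache = new HashMap<>(); // cached originals
    int selects = 0;                                        // simulated SELECT statements
    int builds = 0;                                         // entities built from rows

    ToySessionCache(Map<Long, String> database) {
        this.database = database;
    }

    // Build from the row once and keep the original in the shared cache.
    SessionEmployee buildAndCache(long id) {
        builds++;
        SessionEmployee original = new SessionEmployee(id, database.get(id));
        cache.put(id, original);
        return original;
    }
}

// Toy EntityManager: its own persistence context, backed by the shared cache.
class ToySessionEntityManager {
    private final ToySessionCache session;
    private final Map<Long, SessionEmployee> context = new HashMap<>();

    ToySessionEntityManager(ToySessionCache session) {
        this.session = session;
    }

    // find(): persistence context, then session cache, then database.
    SessionEmployee find(long id) {
        SessionEmployee e = context.get(id);
        if (e != null) return e;
        SessionEmployee original = session.cache.get(id);
        if (original == null) {
            if (!session.database.containsKey(id)) return null;
            session.selects++;
            original = session.buildAndCache(id);
        }
        e = original.copy(); // the client works on a clone
        context.put(id, e);
        return e;
    }

    // Query: always issues a SELECT, but reuses a cached entity instead of rebuilding it.
    SessionEmployee queryById(long id) {
        session.selects++;
        if (!session.database.containsKey(id)) return null;
        SessionEmployee e = context.get(id);
        if (e != null) return e;
        SessionEmployee original = session.cache.get(id);
        if (original == null) original = session.buildAndCache(id);
        e = original.copy();
        context.put(id, e);
        return e;
    }
}
```

Running the FIRST, SECOND, and THIRD steps with three fresh ToySessionEntityManager instances gives two simulated SELECTs and a single build, matching the description above.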

If entities in the persistence context are modified or removed, the changes are synchronized to the session cache after the transaction commits, so the state of the session cache stays up to date.
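A minimal sketch of the commit-time merge, assuming a simplified unit of work that only tracks address updates and removals by primary key. ToyUnitOfWork and both Maps (one for the EMP table, one for the shared cache) are hypothetical; TopLink's real merge also handles relationships, versions, and failures, none of which is modelled here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy unit of work: changes become visible in the database and the shared
// cache only when commit() runs. Not TopLink's implementation.
class ToyUnitOfWork {
    private final Map<Long, String> database;     // stand-in for the EMP table (id -> address)
    private final Map<Long, String> sessionCache; // stand-in for the shared cache (id -> address)
    private final Map<Long, String> updates = new HashMap<>();
    private final Set<Long> removals = new HashSet<>();

    ToyUnitOfWork(Map<Long, String> database, Map<Long, String> sessionCache) {
        this.database = database;
        this.sessionCache = sessionCache;
    }

    // Changes are only recorded here; neither the database nor the shared cache sees them yet.
    void update(long id, String address) {
        removals.remove(id);
        updates.put(id, address);
    }

    void remove(long id) {
        updates.remove(id);
        removals.add(id);
    }

    // On commit: write to the database first, then merge the same changes into
    // the shared cache so later EntityManagers see the committed state.
    void commit() {
        database.putAll(updates);
        database.keySet().removeAll(removals);
        sessionCache.putAll(updates);
        sessionCache.keySet().removeAll(removals);
        updates.clear();
        removals.clear();
    }
}
```

Until commit() runs, other readers of the shared cache still see the old state; afterwards they see the committed changes.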

Cache Options

There are several properties related to the session cache.

  • toplink.cache.type.xxx – the type of the session cache.
  • toplink.cache.size.xxx – the size of the session cache.
  • toplink.cache.shared.xxx – whether or not the session cache is shared.

(xxx is an entity name or "default".)

These properties are explained well in the TopLink JPA Extensions for Caching documentation. Check it out!

If there are external changes...?

This is no problem if your application is the only one that modifies the database. But if the database is changed externally (e.g. by another application, through SQL/JDBC, or manually), the session cache becomes out of date. Whenever you read entities with EntityManager.find() or a JPQL query, your application may get outdated entities.

Let’s see the following example.

//EntityManager is created and closed for a transaction
Long id = 100L;

//FIRST TRANSACTION
begin();//create a new em. start a transaction.
Employee e1 = em.find(Employee.class, id);
println("address1 = " + e1.getAddress());
commitAndClose();//commit the transaction. close em.

//LET'S UPDATE THROUGH JDBC (EXTERNAL CHANGE!)
Statement stmt = connection.createStatement();
stmt.executeUpdate("UPDATE EMP SET address = 'New' WHERE EMPID = " + id);
stmt.close();

//SECOND TRANSACTION
begin();
Employee e2 = em.find(Employee.class, id);
println("address2 = " + e2.getAddress());
commitAndClose();

//THIRD TRANSACTION
begin();
Employee e3 = (Employee)
    em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setParameter("id", id)
    .getSingleResult();
println("address3 = " + e3.getAddress());
commitAndClose();

There are three transactions, and I modified the address of Employee#100 through JDBC between the FIRST and SECOND transactions. The session cache doesn't know about this kind of external change, so the result is as follows.

### address1 = Old
### address2 = Old
### address3 = Old

In the THIRD transaction the query actually goes to the database (issuing a SELECT statement), but the current mechanism does not update the session cache even in this case, so the outdated entity from the cache is returned.
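The stale reads above can be reproduced with a toy model in which the shared cache always wins over the row a query reads. This is an illustrative sketch, not TopLink code; StaleCacheDemo, its two Maps, and externalUpdate() (standing in for the JDBC UPDATE) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of why external changes stay invisible: the shared cache is
// consulted first by find() and is never updated by a plain query.
class StaleCacheDemo {
    final Map<Long, String> database = new HashMap<>();     // stand-in for the EMP table
    final Map<Long, String> sessionCache = new HashMap<>(); // id -> cached address

    // find(): session cache first, database only on a miss.
    String find(long id) {
        return sessionCache.computeIfAbsent(id, database::get);
    }

    // Query: the SELECT really reads the current row, but if the id is already
    // cached, the cached state is returned and the cache is NOT updated.
    String query(long id) {
        String row = database.get(id);
        if (row == null) return null;
        return sessionCache.computeIfAbsent(id, k -> row);
    }

    // External JDBC-style update: touches only the database, never the cache.
    void externalUpdate(long id, String address) {
        database.put(id, address);
    }
}
```

Calling find(), then externalUpdate(100L, "New"), then find() and query() yields "Old" three times, even though the simulated table now holds "New".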

So be aware that even the first retrieval within a transaction may not return fresh data from the database.

Is this what you expect? If not, how can you avoid it? There are several ways to handle this situation, which I will explain in the following sections.

Getting fresh results from database

There are several ways to get fresh data from the database. The first is to use refresh operations; the others are explained in the following sections.

There are two kinds of refresh operations: the portable EntityManager.refresh() and the TopLink-specific refresh hint.

1. EntityManager.refresh()

Use a find and refresh combination as shown below. This is a simple and portable way to get fresh data.

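Employee e = em.find(Employee.class, id);
if (e != null) { // refresh(null) would throw IllegalArgumentException
  try {
    em.refresh(e);
  } catch(EntityNotFoundException ex){
    e = null; // the row was removed externally
  }
}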

EntityNotFoundException should be caught around refresh() because the entity may have been removed externally.

This approach has some issues. refresh() requires a transaction (in the case of a container-managed EntityManager), so you need to start a transaction even if you are only doing read-only operations. Another issue is that it may issue two SELECT statements, one in find() and one in refresh(), if the entity is not in the session cache. If find() gets a fresh entity from the database, refresh() is redundant, but there is no way to tell whether find() returned a fresh result, so you have to call refresh() anyway. However, this only happens the first time.

2. TopLink-specific refresh hint

TopLink provides the query hint "toplink.refresh"; use it in a query as shown below.

try {
    e = (Employee)em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
        .setHint("toplink.refresh", "true")
        .setParameter("id", id)
        .getSingleResult();
} catch (NoResultException ex) {
    e = null;
}

NoResultException should be caught because Query.getSingleResult() can throw it if there is no such entity.

If portability doesn't matter, this is a better way than the find and refresh combination, because it doesn't require a transaction and triggers just one SELECT statement. It can also be used to retrieve a group of entities, as shown below.
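To contrast the refresh hint with a plain query, the toy model below adds a refresh flag that loosely mirrors "toplink.refresh"="true": the query still issues one SELECT, but the row's data overwrites the cached state. RefreshHintDemo and its Maps are hypothetical; the real hint also refreshes the persistence context, which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a plain query vs. a query with a refresh flag.
class RefreshHintDemo {
    final Map<Long, String> database = new HashMap<>();     // stand-in for the EMP table
    final Map<Long, String> sessionCache = new HashMap<>(); // id -> cached address
    int selects = 0;                                        // simulated SELECT statements

    // find(): cache first; SELECT only on a miss.
    String find(long id) {
        String cached = sessionCache.get(id);
        if (cached != null) return cached;
        selects++;
        String row = database.get(id);
        if (row != null) sessionCache.put(id, row);
        return row;
    }

    // One SELECT either way. With refresh=true the row overwrites the cached
    // state; without it, an already cached entity wins and stays stale.
    String query(long id, boolean refresh) {
        selects++;
        String row = database.get(id);
        if (row == null) return null; // getSingleResult() would throw NoResultException here
        if (refresh) {
            sessionCache.put(id, row);
            return row;
        }
        return sessionCache.computeIfAbsent(id, k -> row);
    }
}
```

After an external change, query(100L, false) still returns the cached "Old", query(100L, true) returns "New", and a subsequent find() also sees "New" because the shared cache was updated.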

List list = em.createQuery("SELECT e FROM Employee e WHERE e.name = :name")
    .setHint("toplink.refresh", "true")
    .setParameter("name", name)
    .getResultList();

For convenience, it is better to define this kind of query as a named query and write a utility method, as shown below.

@Entity
@NamedQuery(name="Employee.freshFindById", 
  query="SELECT e FROM Employee e WHERE e.id = :id",
  hints=@QueryHint(name="toplink.refresh", value="true"))
public class Employee {
...
    // static, so it can be called without an Employee instance;
    // assumes a Long primary key, as in the earlier examples
    public static Employee freshFind(EntityManager em, Long primaryKey){
        Employee e;
        try {
            e = (Employee)em.createNamedQuery("Employee.freshFindById")
                .setParameter("id", primaryKey)
                .getSingleResult();
        } catch (NoResultException ex) {
            e = null; // no such employee
        }
        return e;
    }
...
}

I placed the freshFind() utility method in the entity class, but it can be placed in another class if you wish.

Pessimistic locking

Pessimistic locking is another way to guarantee that loaded entities are fresh. It is more powerful because it also locks the rows until the transaction completes, preventing other transactions from locking or modifying them.

Pessimistic locking is not part of the JPA standard, but TopLink provides it through the query hint "toplink.pessimistic-lock", as shown below.

e = (Employee)em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setHint("toplink.pessimistic-lock", "Lock")
    .setParameter("id", primaryKey)
    .getSingleResult();

The value "Lock" issues "SELECT ... FOR UPDATE" and "LockNoWait" issues "SELECT ... FOR UPDATE NOWAIT".

Turning on pessimistic locking automatically turns on "toplink.refresh=true", so the query always goes to the database and updates both the session cache and the persistence context; you will always get fresh entities. It also guarantees that the locked rows are not modified in the database until the transaction completes.
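The difference between "Lock" and "LockNoWait" can be illustrated with an in-memory analogy of row locks held by transactions. This is only a sketch of the semantics; in reality the database enforces the locks. ToyRowLocks and the transaction names are hypothetical, and the IllegalStateException stands in for the error the database raises when NOWAIT finds the row already locked.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory analogy of row locks taken by SELECT ... FOR UPDATE [NOWAIT].
class ToyRowLocks {
    private final Map<Long, String> owners = new HashMap<>(); // row id -> owning transaction

    // "Lock": wait until the row is free (or already ours), then take it.
    synchronized void lock(long id, String tx) throws InterruptedException {
        while (owners.containsKey(id) && !owners.get(id).equals(tx)) {
            wait();
        }
        owners.put(id, tx);
    }

    // "LockNoWait": fail immediately if another transaction holds the row.
    synchronized void lockNoWait(long id, String tx) {
        String owner = owners.get(id);
        if (owner != null && !owner.equals(tx)) {
            throw new IllegalStateException("row " + id + " is locked by " + owner);
        }
        owners.put(id, tx);
    }

    // Commit or rollback: release every row the transaction holds and wake waiters.
    synchronized void release(String tx) {
        owners.values().removeIf(tx::equals);
        notifyAll();
    }
}
```

lock() blocks until the owning transaction calls release(), while lockNoWait() fails at once; that is the trade-off between waiting on contention and reporting it immediately.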

It is a good pattern to define this as a named query, as shown below.

@NamedQuery(name="Employee.lockedFindById", 
  query="SELECT e FROM Employee e WHERE e.id = :id",
  hints=@QueryHint(name="toplink.pessimistic-lock", value="Lock"))

As you know, pessimistic locking can become a performance bottleneck, so use it carefully or consider optimistic locking instead.

Disabling shared session cache

Disabling the shared cache is the last resort. It is not recommended, but some cases require it. Think of a web tier that only reads from the database and needs the latest content on every request. It doesn't even need a transaction (normally it uses a container-managed EntityManager outside a transaction, so entities are not managed). In this case I don't need the session cache and want to do a simple find/query without extra work such as refreshing.

This can be done by turning off the shared-cache property "toplink.cache.shared.xxx" (xxx is the entity name).

If you disable the shared cache, the shared session cache in the server session is not used for the specified entities; isolated client session caches are used instead. An isolated cache is a session cache owned by a client session. Isolated caches are a TopLink concept, but in JPA they have no practical significance, because the first-level cache, the persistence context, is used while the EntityManager (client session) is alive.

For example, if you set "toplink.cache.shared.Employee=false", no Employee entities are stored in the shared cache. See the following configuration in persistence.xml.

<persistence ...>
  <persistence-unit name="HR">
  ...
    <properties>
    ...
      <property name="toplink.cache.shared.Employee" value="false"/>
      <property name="toplink.cache.shared.Department" value="false"/>

    </properties>
  </persistence-unit>
</persistence>

CAUTION: If the entity has relationships, the associated entities should be set to false too. In this example, Department, which is associated with Employee, is set to false as well.
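Following this caution by hand is error-prone when relationships form chains or cycles. A hypothetical helper can collect every entity reachable from the ones you want uncached and produce the matching "toplink.cache.shared.<Entity>"="false" properties to copy into persistence.xml. This is a sketch under the assumption that you describe the relationships yourself in a Map; it does not read TopLink metadata, and following relationships transitively is a conservative assumption rather than a documented rule.

```java
import java.util.*;

// Hypothetical helper: computes which entities must have their shared cache
// disabled, following relationships transitively from the given roots.
class SharedCacheProperties {
    static Map<String, String> disableShared(Collection<String> roots,
                                             Map<String, List<String>> relationships) {
        Set<String> reachable = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            String entity = todo.pop();
            if (reachable.add(entity)) { // first visit: also visit its associations
                todo.addAll(relationships.getOrDefault(entity, Collections.emptyList()));
            }
        }
        Map<String, String> props = new TreeMap<>();
        for (String entity : reachable) {
            props.put("toplink.cache.shared." + entity, "false");
        }
        return props;
    }
}
```

For example, with Employee related to Department and Department related back to Employee, starting from Employee yields properties for both entities, the same pair as in the persistence.xml above.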

There is also a "toplink.cache.shared.default" property that applies to all entities, as shown below.

 <property name="toplink.cache.shared.default" value="false"/>

However, this is not recommended[*]. It also has a bug, so it doesn't currently work.

Setting the cache type "toplink.cache.type.default" or "toplink.cache.type.xxx" to "NONE" has a similar effect, but this setting can cause infinite recursion if there is a cycle of eagerly loaded relationships[*]. So it is not recommended at all.

Future enhancements

The cache feature of TopLink Essentials is powerful, but it has some limitations compared to commercial TopLink or Hibernate.

Cache synchronization between clustered applications

If the applications are different or clustered, the session cache is not shared. TopLink provides cache synchronization, and Hibernate provides a clustered 2nd-level cache.

Cache invalidation policy

Currently, session cache entries are held only through weak or soft references (see the cache-type options), so we cannot invalidate the cache on a time or daily basis, or per query. TopLink provides several invalidation policies, but TopLink Essentials currently lacks them.
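For comparison, a time-based invalidation policy can be sketched generically as a cache whose entries expire after a time-to-live. TtlCache is a hypothetical class, not a TopLink API, and the clock is injected as a LongSupplier so expiry can be tested without sleeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Generic time-to-live cache: an entry read after ttlMillis has elapsed is
// treated as invalid, evicted, and must be reloaded by the caller.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    // Returns null if the key is absent or its entry has expired.
    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        if (clock.getAsLong() - e.loadedAt >= ttlMillis) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }
}
```

With new TtlCache<>(60_000L, System::currentTimeMillis), a cached entry is treated as stale at most a minute after it was loaded.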

Query cache

In TopLink Essentials every query goes to the database. In some cases query results don't change, so caching them is good for performance. TopLink provides this feature, but TopLink Essentials currently lacks it.
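A query cache can likewise be sketched generically: results are keyed by the JPQL string plus its parameter values, and any write clears the whole cache, which is the simplest safe invalidation rule. ToyQueryCache and its resultsFor() method are hypothetical and not part of TopLink or JPA; the Supplier stands in for actually executing the query.

```java
import java.util.*;
import java.util.function.Supplier;

// Generic query-result cache keyed by (JPQL string, parameter values).
class ToyQueryCache<R> {
    private final Map<List<Object>, List<R>> results = new HashMap<>();
    int executions = 0; // how many times the "database" was actually queried

    List<R> resultsFor(String jpql, List<Object> params, Supplier<List<R>> database) {
        List<Object> key = new ArrayList<>(params.size() + 1);
        key.add(jpql);
        key.addAll(params);
        List<R> cached = results.get(key);
        if (cached == null) {
            executions++;
            cached = Collections.unmodifiableList(new ArrayList<>(database.get()));
            results.put(key, cached);
        }
        return cached;
    }

    // Call whenever the underlying data changes.
    void invalidate() {
        results.clear();
    }
}
```

Finer-grained invalidation, for example clearing only queries that touch the modified entity type, would keep more results cached at the cost of tracking which queries depend on which tables.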

I hope these features are added to TopLink Essentials in the future.

References
