Wonseok Kim

Wonseok Kim's Blog

Understanding the cache of TopLink Essentials (GlassFish JPA)

Posted by guruwons on September 07, 2006 at 08:38 PM

Introduction

The hype around JPA has died down, and the time has come to apply it to real applications.

To use JPA properly, knowing the JPA spec is not enough: the spec doesn't cover every aspect, and behaviour can differ slightly between persistence providers. The 2nd-level cache is a prime example. It is not covered by the JPA spec, but most providers offer one. To get good performance and the results you expect, you need to understand the cache.

I will talk about the cache extension of TopLink Essentials (the GlassFish JPA RI), which I think is essential knowledge for using JPA properly in GlassFish.

This article is based on GlassFish V2 b15.

Persistence Context

(If you already know persistence contexts well, you can skip this section.)

The persistence context is a key concept in JPA. It is similar to a first-level cache. To be more accurate, a persistence context is not a cache but the working set of managed entities, which is synchronized to the database on flush or commit. (I will not explain the persistence context in detail here; if you need more, please refer to other materials.)

Entities in the persistence context are never evicted unless the persistence context is cleared explicitly or implicitly (for example, when the EntityManager is closed as the transaction completes). Also, entities in the persistence context are not refreshed from the database unless you explicitly invoke the EntityManager.refresh() method.

A persistence context maintains exactly one entity instance per persistent identity (primary key), as shown below.

// in the same persistence context
Employee e1 = em.find(Employee.class, 100);
...
Employee e2 = em.find(Employee.class, 100);
...
Employee e3 = (Employee)
    em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setParameter("id", 100)
    .getSingleResult();

// e1 == e2 && e2 == e3

When using a container-managed, transaction-scoped EntityManager, the scope of the persistence context is a transaction. So this first-level cache lives only for the duration of a transaction.

Understanding the persistence context alone is not enough. You also need to know the vendor-specific 2nd-level cache. Let's continue...

Session Cache

TopLink Essentials provides a 2nd-level cache called the session cache. The session cache is maintained in an internal server session, and one internal server session is created per persistence unit. The session cache is shared by all clients. See the following picture.

[Figure: multiread.gif, from the TopLink Developer Guide, "Server and Client Sessions"]

In JPA, a client session corresponds to an EntityManager (persistence context). So all EntityManagers created from the same persistence unit share the session cache.

Is the session cache shared between applications? No. A persistence unit is maintained per Java EE application, so the scope of the session cache is one Java EE application. Also, TopLink Essentials doesn't provide a distributed cache, so the cache cannot be shared between clustered applications running on several appserver instances.

The session cache is turned on by default, so you can use it without any extra configuration and get the performance benefits of a 2nd-level cache right away.

When you retrieve an entity that is not in the persistence context, TopLink looks it up in the session cache. On a cache hit it doesn't need to go to the database; a clone of the cached entity is returned. Normally, when a transaction completes, its persistence context (or EntityManager) is closed, and a new persistence context is used in the next transaction. The session cache is useful in this situation.

//FIRST TRANSACTION
begin();//create a new em. start a transaction.
Employee e1 = em.find(Employee.class, 100);
commitAndClose();//commit the transaction. close em.

//SECOND TRANSACTION
begin();
Employee e2 = em.find(Employee.class, 100);
commitAndClose();

In the above example, if Employee#100 is retrieved from the database in the first transaction, it is stored in the session cache and reused in subsequent transactions.

If entities in the persistence context are modified or deleted, the changes are synchronized to the session cache after the transaction commits, so the session cache stays up to date.
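The interplay of the two levels can be sketched with a toy model. This is not TopLink code: the class names (TwoLevelCacheSketch, ServerSession, PersistenceContext) and the "database" map are hypothetical stand-ins, and the commit logic is deliberately simplified. It only illustrates the behaviour described above: one instance per identity inside a context, clones handed out from the shared cache, and changes merged back on commit.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these classes exist in TopLink Essentials. */
final class TwoLevelCacheSketch {

    /** Stand-in for an entity; copy() mimics the clone handed to a persistence context. */
    static final class Employee {
        final int id;
        String address;
        Employee(int id, String address) { this.id = id; this.address = address; }
        Employee copy() { return new Employee(id, address); }
    }

    /** Stand-in for the server session: one shared cache per persistence unit. */
    static final class ServerSession {
        final Map<Integer, String> database = new HashMap<>();      // id -> address column
        final Map<Integer, Employee> sharedCache = new HashMap<>(); // the "session cache"
        int selectCount = 0;

        Employee load(int id) {
            Employee cached = sharedCache.get(id);
            if (cached == null) {
                selectCount++;                    // only a cache miss reaches the database
                String address = database.get(id);
                if (address == null) return null; // no such row
                cached = new Employee(id, address);
                sharedCache.put(id, cached);
            }
            return cached.copy();                 // callers never see the shared instance
        }
    }

    /** Stand-in for a transaction-scoped EntityManager and its persistence context. */
    static final class PersistenceContext {
        private final ServerSession session;
        private final Map<Integer, Employee> managed = new HashMap<>();
        PersistenceContext(ServerSession session) { this.session = session; }

        Employee find(int id) {
            Employee e = managed.get(id);         // first level: one instance per identity
            if (e == null) {
                e = session.load(id);             // second level, then the database
                if (e != null) managed.put(id, e);
            }
            return e;
        }

        /** Simplified commit: writes every managed entity, then merges it into the shared cache. */
        void commitAndClose() {
            for (Employee e : managed.values()) {
                session.database.put(e.id, e.address);
                session.sharedCache.put(e.id, e.copy());
            }
            managed.clear();
        }
    }

    public static void main(String[] args) {
        ServerSession session = new ServerSession();
        session.database.put(100, "Old");

        PersistenceContext tx1 = new PersistenceContext(session);
        Employee e1 = tx1.find(100);              // miss: one SELECT
        System.out.println(e1 == tx1.find(100));  // prints true: same context, same instance
        e1.address = "Changed in tx1";
        tx1.commitAndClose();

        PersistenceContext tx2 = new PersistenceContext(session);
        Employee e2 = tx2.find(100);              // hit: served from the shared cache
        System.out.println(e2.address + ", selects = " + session.selectCount);
        tx2.commitAndClose();
    }
}
```

In this model the second transaction sees the committed change without a second SELECT, which is the benefit the session cache is meant to give.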

Cache Options

There are several properties related to session cache.

  • toplink.cache.type.xxx – the type of session cache.
  • toplink.cache.size.xxx – the size of session cache.
  • toplink.cache.shared.xxx – whether or not the session cache is shared

(xxx is the entity name or "default")

These properties are well explained in the TopLink JPA Extensions for Caching. Check it out!
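As a small illustration of the naming scheme, the following sketch builds the three property names for one entity. The helper class CachePropertiesSketch is hypothetical, and the values "Weak" and 500 are only illustrative; please check the allowed values in the TopLink JPA Extensions for Caching document before using them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: only assembles property names and values, it does not talk to TopLink. */
final class CachePropertiesSketch {

    /** Builds the three cache properties for one entity name (or "default"). */
    static Map<String, String> cacheProperties(String entityName, String type, int size, boolean shared) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("toplink.cache.type." + entityName, type);
        props.put("toplink.cache.size." + entityName, Integer.toString(size));
        props.put("toplink.cache.shared." + entityName, Boolean.toString(shared));
        return props;
    }

    public static void main(String[] args) {
        // "Weak" and 500 are example values, not recommendations.
        Map<String, String> props = cacheProperties("Employee", "Weak", 500, true);
        props.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The same name/value pairs would normally be written as <property> elements in persistence.xml. Passing such a map to Persistence.createEntityManagerFactory(String, Map) should also be possible, assuming the provider honours properties supplied that way.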

If there are external changes...?

There is no problem if this application is the only one that modifies the database. But if the database is changed externally (e.g. by another application, by SQL/JDBC, or by hand), the session cache becomes out of date. Whenever you read entities with EntityManager.find() or a JPQL query, your application may get stale entities.

Let’s see the following example.

//EntityManager is created and closed for a transaction
Long id = 100L;

//FIRST TRANSACTION
begin();//create a new em. start a transaction.
Employee e1 = em.find(Employee.class, id);
println("address1 = " + e1.getAddress());
commitAndClose();//commit the transaction. close em.

//LET'S UPDATE THROUGH JDBC (EXTERNAL CHANGE!)
Statement stmt = connection.createStatement();
stmt.executeUpdate("UPDATE EMP SET address = 'New' WHERE EMPID = " + id);
stmt.close();

//SECOND TRANSACTION
begin();
Employee e2 = em.find(Employee.class, id);
println("address2 = " + e2.getAddress());
commitAndClose();

//THIRD TRANSACTION
begin();
Employee e3 = (Employee)
    em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setParameter("id", id)
    .getSingleResult();
println("address3 = " + e3.getAddress());
commitAndClose();

There are three transactions, and I modified the address of Employee#100 between the FIRST and SECOND transactions through JDBC. The session cache is unaware of this kind of external change, so the result is as follows.

### address1 = Old
### address2 = Old
### address3 = Old

In the THIRD transaction the query actually goes to the database (a SELECT statement is issued), but the current mechanism does not update the session cache even in this case, and the stale entity from the cache is returned. (This is an interesting issue which needs more investigation.)

So you should be aware that even the first retrieval in a transaction may not return fresh data from the database.

Is this what you expect? If not, how can you avoid it? There are several ways to handle this situation, which I will explain in the following sections.
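The observed behaviour can be reproduced with a toy model. This is an assumption-laden sketch, not TopLink's implementation: StaleReadSketch and its maps are hypothetical, and the refresh flag stands in for a refresh request such as the "toplink.refresh" hint discussed later in this article. The model encodes two rules: a find is answered from the cache when possible, and a query always runs a SELECT but keeps the already cached state unless a refresh is requested.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a shared cache that is unaware of external database changes. */
final class StaleReadSketch {
    final Map<Long, String> database = new HashMap<>();     // id -> address, the "real" data
    final Map<Long, String> sessionCache = new HashMap<>(); // id -> cached address
    int selectCount = 0;

    /** Models em.find(): a cache hit never reaches the database. */
    String find(long id) {
        String cached = sessionCache.get(id);
        return cached != null ? cached : select(id, false);
    }

    /** Models a query by id: always runs a SELECT, but keeps the cached state unless refresh is set. */
    String query(long id, boolean refresh) {
        return select(id, refresh);
    }

    private String select(long id, boolean refresh) {
        selectCount++;
        String row = database.get(id);
        if (row == null) return null;                        // no such row
        if (refresh || !sessionCache.containsKey(id)) {
            sessionCache.put(id, row);                       // cache is (re)built from the row
        }
        return sessionCache.get(id);                         // otherwise the cached state wins
    }

    public static void main(String[] args) {
        StaleReadSketch s = new StaleReadSketch();
        s.database.put(100L, "Old");
        System.out.println("address1 = " + s.find(100L));         // Old, loaded and cached
        s.database.put(100L, "New");                               // external change
        System.out.println("address2 = " + s.find(100L));         // Old, cache hit
        System.out.println("address3 = " + s.query(100L, false)); // Old, SELECT ran but cache wins
        System.out.println("refreshed = " + s.query(100L, true)); // New
        System.out.println("selects = " + s.selectCount);         // 3
    }
}
```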

Getting fresh results from database

There are several ways to get fresh data from the database. The first is to use refresh operations; the others are explained in the following sections.

There are two kinds of refresh operations: the portable EntityManager.refresh() and the TopLink-specific refresh hint.

1. EntityManager.refresh()

Use a combination of find and refresh, as shown below. This is a simple and portable way to get fresh data.

Employee e = em.find(Employee.class, id);
if (e != null) { // find() returns null if there is no such entity
  try {
    em.refresh(e); // reload the state from the database
  } catch(EntityNotFoundException ex){
    e = null; // the row was removed externally
  }
}

EntityNotFoundException should be caught around refresh() because the entity may have been removed externally.

This approach has some issues. refresh() requires a transaction (with a container-managed EntityManager), so you need to start a transaction even for read-only operations. Another issue is that it may issue two SELECT statements, one in find() and one in refresh(), if the entity is not in the session cache. If find() already fetched a fresh entity from the database, refresh() is redundant, but there is no way to tell whether find() returned a fresh result, so you have to call refresh() anyway. However, this happens only the first time.

2. TopLink-specific refresh hint

TopLink provides the query hint "toplink.refresh". Use it in a query as shown below.

try {
    e = (Employee)em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
        .setHint("toplink.refresh", "true")
        .setParameter("id", id)
        .getSingleResult();
} catch (NoResultException ex) {
    e = null;
}

NoResultException should be caught because Query.getSingleResult() throws it if there is no such entity.

If portability doesn't matter, this is a better way than the find and refresh combination because it doesn't require a transaction and triggers just one SELECT statement. It can also be used to retrieve a group of entities, as shown below.

List list = em.createQuery("SELECT e FROM Employee e WHERE e.name = :name")
    .setHint("toplink.refresh", "true")
    .setParameter("name", name)
    .getResultList();

For convenience, you can define this kind of query as a named query and write a utility method like the one below.

@NamedQuery(name="Employee.freshFindById", 
  query="SELECT e FROM Employee e WHERE e.id = :id", 
  hints=@QueryHint(name="toplink.refresh", value="true"))
public class Employee {
...
    // Static utility: finds an Employee with a query that always refreshes from the database.
    public static Employee freshFind(EntityManager em, Integer primaryKey){
        Employee e;
        try {
            e = (Employee)em.createNamedQuery("Employee.freshFindById")
                .setParameter("id", primaryKey)
                .getSingleResult();
        } catch (NoResultException ex) {
            e = null; // no such entity in the database
        }
        return e;
    }
}

I placed the freshFind() utility method in the entity class, but you can place it in any other class you like.

Pessimistic locking

Pessimistic locking is another way to guarantee that loaded entities are fresh. It is more powerful because it also locks the rows until the transaction completes, preventing other transactions from changing those rows.

Pessimistic locking is not part of the JPA standard, but TopLink provides this feature through the query hint "toplink.pessimistic-lock", as shown below.

e = (Employee)em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setHint("toplink.pessimistic-lock", "Lock")
    .setParameter("id", primaryKey)
    .getSingleResult();

The value "Lock" issues "SELECT ... FOR UPDATE" and "LockNoWait" issues "SELECT ... FOR UPDATE NOWAIT".

If you turn on pessimistic locking, it automatically turns on "toplink.refresh=true", so the query always goes to the database and updates the session cache and the persistence context. You will always get fresh entities. It also guarantees that the entities are not modified in the database until the transaction completes.

It is a good pattern to define this as a named query, as shown below.

@NamedQuery(name="Employee.lockedFindById", 
  query="SELECT e FROM Employee e WHERE e.id = :id", 
  hints=@QueryHint(name="toplink.pessimistic-lock", value="Lock"))

As you know, pessimistic locking can become a performance bottleneck, so use it carefully or consider using optimistic locking instead.

Disabling session cache

Disabling the session cache is the last resort. It is not recommended, but in some cases it is required. Think of a web tier that only reads from the database and needs the latest contents on every request. It doesn't even need a transaction. (Normally it uses a container-managed EntityManager outside a transaction, so entities are not managed.) In this case you don't need the session cache and want to do a simple find/query without extra work like refreshing.

This can be done by turning off the shared-cache property "toplink.cache.shared.xxx" (xxx is the entity name).

For example, if you set "toplink.cache.shared.Employee=false", no Employee entities are stored in the shared cache. See the following configuration in persistence.xml.

<persistence ...>
  <persistence-unit name="HR">
  ...
    <properties>
    ...
      <property name="toplink.cache.shared.Employee" value="false"/>
      <property name="toplink.cache.shared.Department" value="false"/>

    </properties>
  </persistence-unit>
</persistence>

CAUTION: If the entity has relationships, the shared cache should be turned off for the associated entities too. In this example, Department is also set to false.

There is also a "toplink.cache.shared.default" property that applies to all entities, as shown below.

 <property name="toplink.cache.shared.default" value="false"/>

But this is not recommended[*]. It also has a bug at the moment, so it doesn't work.

Setting the cache type "toplink.cache.type.default" or "toplink.cache.type.xxx" to "NONE" has a similar effect, but this setting can result in infinite recursion if there is a cycle of eagerly loaded relationships[*]. So it is not recommended at all.

Future enhancements

The cache feature of TopLink Essentials is powerful, but it has some limitations compared to commercial TopLink or Hibernate.

Cache synchronization between clustered applications

If the applications are different or clustered, the session cache is not shared. Commercial TopLink provides cache synchronization, and Hibernate also provides a clustered 2nd-level cache.

Cache invalidation policy

Currently the session cache is controlled by weak or soft references (see the cache-type options), so we cannot invalidate cached entities on a time or daily basis, or per query. Commercial TopLink provides several invalidation policies, but TopLink Essentials currently lacks them.
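To make clear what a time-based invalidation policy means, here is a minimal sketch of a time-to-live cache in plain Java. It is not an API of TopLink or TopLink Essentials: the class TimeToLiveCacheSketch is hypothetical, and the clock is injected only so that the example is deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical time-to-live cache: entries older than the limit are treated as invalid. */
final class TimeToLiveCacheSketch<K, V> {

    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long timeToLiveMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in real use

    TimeToLiveCacheSketch(long timeToLiveMillis, LongSupplier clock) {
        this.timeToLiveMillis = timeToLiveMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    /** Returns the cached value, or null when it is absent or older than the time to live. */
    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        if (clock.getAsLong() - e.loadedAt >= timeToLiveMillis) {
            entries.remove(key); // invalidated: the caller has to reload from the database
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, in milliseconds
        TimeToLiveCacheSketch<Integer, String> cache =
            new TimeToLiveCacheSketch<>(1000L, () -> now[0]);
        cache.put(100, "Old");
        now[0] = 999L;
        System.out.println(cache.get(100)); // Old: still within the time to live
        now[0] = 1000L;
        System.out.println(cache.get(100)); // null: the entry has expired
    }
}
```

With such a policy, stale entries caused by external changes would disappear on their own after the configured time, at the cost of extra database reads.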

Query cache

In TopLink Essentials every query goes to the database. In some cases query results don't change, so caching them would be good for performance. Commercial TopLink provides this feature, but TopLink Essentials currently lacks it.

I hope these features will be added to TopLink Essentials in the future.

Comments

  • Thanks for putting this out there. I had it on my to-do list as it is important information and can be confusing to new users. Overall it is good and reads well.

    The following are just some minor comments:

    I know it is somewhat confusing with client session and server sessions both having a cache. To make things more clear I tend to avoid saying 'session cache' as it is not always clear which one you mean. As an example I would re-name your section "Disabling session cache" to be something like "Disabling Shared Cache" or "Disabling L2 Cache"

    Isolated cache = client session cache
    Shared cache = server session cache (L2)
    Bug 1054: EM.clear() does not clear entities cached in the client session's isolated cache. It does clear the EM's associated UnitOfWork used to manage the persistence context tracking any changes made. This means that after a clear if you are using isolated caching the next find for an already cached entity will not go to the database as expected.

    Cache hits on the isolated (client) or shared server session caches which completely avoid a database call are only for em.find calls. All queries will go to the database and then when processing results the session caches will be used to avoid re-building entities that are already cached (still an important performance gain). This explains what you are seeing in the THIRD transaction of your section: "If there are external changes...?". Your blog mentions that you will get a cache hit but does not make it clear that is currently only for finds.

    As far as the additional functionality concerning caching features in Oracle TopLink that do not yet exist in TopLink Essentials we appreciate the feedback and are evaluating enhancement requests of this nature as we lay out our roadmap for this product. Please ensure that all of your requests have enhancement requests filed with detailed explanation of the customer need so that we can prioritize them appropriately.

    Doug Clarke
    Principal Product Manager
    Oracle TopLink
    TopLink Blog

    Posted by: djclarke on September 08, 2006 at 07:58 AM


