Wonseok Kim

Wonseok Kim's Blog

Understanding the cache of TopLink Essentials(GlassFish JPA)

Posted by guruwons on September 07, 2006 at 08:38 PM | Comments (26)

Introduction

The hype around JPA has died down, and the time has come to apply it to real applications.

To use JPA properly, knowing the JPA spec is not enough: the spec doesn't cover every aspect, and behaviour can differ a little between persistence providers. The 2nd-level cache is a prime example. It is not covered by the JPA spec, but most providers offer one. To increase performance and get the results you expect, you should understand the cache.

I will talk about the cache extension of TopLink Essentials (the GlassFish JPA RI), which I think is essential to using JPA properly in GlassFish.

This article is based on GlassFish V2 b15.

Persistence Context

(If you know Persistence Context well, you can skip this section.)

The persistence context is a key concept in JPA. It is similar to a first-level cache. To be more accurate, the persistence context is not a cache but the working set of managed entities, which is synchronized to the database on flush or commit. (I will not explain the persistence context in detail here; refer to other materials if you need to.)

Entities in the persistence context are never evicted unless the persistence context is cleared explicitly or implicitly (the EntityManager is closed when the transaction completes). Also, the persistence context is not refreshed unless you explicitly invoke the EntityManager.refresh() method.

The persistence context maintains one entity instance per persistent identity (primary key), as shown below.

// in the same persistence context
Employee e1 = em.find(Employee.class, 100);
...
Employee e2 = em.find(Employee.class, 100);
...
Employee e3 = (Employee)
    em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setParameter("id", 100)
    .getSingleResult();

// e1 == e2 && e2 == e3

When using a container-managed, transaction-scoped EntityManager, the scope of the persistence context is a transaction. So this first-level cache lives only for the duration of the transaction.

It's not enough to understand only the persistence context. You also need to know the vendor-specific 2nd-level cache. Let's continue...

Session Cache

TopLink Essentials provides a 2nd-level cache called the session cache. The session cache is maintained in an internal server session, and one server session is created per persistence unit. The session cache is shared by all clients. See the following picture.

[Figure: multiread.gif, from the TopLink Developer Guide, "Server and Client Sessions"]

In JPA, a client session corresponds to an EntityManager (persistence context). So all EntityManagers from the same persistence unit share the session cache.

Is the session cache shared between applications? No. A persistence unit is maintained per Java EE application, so the scope of the session cache is one Java EE application. Also, TopLink Essentials doesn't provide a distributed cache, so the cache cannot be shared between clustered applications running on several appserver instances.

The session cache is turned on by default, so you can use it right away without any extra configuration. With this 2nd-level cache you can get performance benefits.

When you read entities that are not in the persistence context, TopLink uses cached entities from the session cache (clones of the cached entities are returned). But EntityManager.find() and a SELECT query behave differently. EntityManager.find() checks the session cache before going to the database, whereas a SELECT query doesn't check the cache first and always goes to the database. Even so, the query avoids rebuilding an entity that is already cached and reuses the entity in the session cache (some performance gain).

Normally, when a transaction completes, the persistence context (or EntityManager) is closed and a new persistence context is used in the next transaction. The session cache is useful in this situation.

For example,

//EntityManager is created and closed for a transaction
Long id = 100L;

//FIRST TRANSACTION
begin();//create a new em. start a transaction.
Employee e1 = em.find(Employee.class, id);
commitAndClose();//commit the transaction. close em.

//SECOND TRANSACTION
begin();
Employee e2 = em.find(Employee.class, id);
commitAndClose();

//THIRD TRANSACTION
begin();
Employee e3 = (Employee)
    em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setParameter("id", id)
    .getSingleResult();
commitAndClose();

If Employee#100 was never retrieved from the database before the FIRST transaction, the FIRST em.find() goes to the database, builds an entity and stores it in the session cache. The SECOND em.find() gets a cache hit, so it doesn't go to the database. The THIRD query always goes to the database, but it doesn't rebuild the entity; the one in the session cache is used.

If entities in the persistence context are modified or deleted, the changes are synchronized to the session cache after the transaction commits, so the state of the session cache is kept up to date.
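To make the difference between find() and a query concrete, here is a small toy model in plain Java. It is only a sketch of the behaviour described above: the class, fields and methods are hypothetical, plain maps stand in for the database and the session cache, and nothing here is TopLink's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not TopLink's actual classes.
final class SessionCacheSketch {
    final Map<Long, String> database = new HashMap<>();     // stand-in for the EMP table (id -> address)
    final Map<Long, String> sessionCache = new HashMap<>(); // stand-in for the shared session cache
    int selects;        // simulated SELECT statements
    int entitiesBuilt;  // times a row was converted into an entity

    // Models em.find() in a new persistence context: check the cache before the database.
    String find(long id) {
        String cached = sessionCache.get(id);
        return cached != null ? cached : query(id);
    }

    // Models a JPQL query by id: always issues a SELECT, but reuses an already cached entity.
    String query(long id) {
        selects++;
        String row = database.get(id);
        if (row == null) {
            return null; // no such row
        }
        String cached = sessionCache.get(id);
        if (cached != null) {
            return cached; // the row just read is ignored; the cached entity is reused
        }
        entitiesBuilt++;
        sessionCache.put(id, row);
        return row;
    }

    public static void main(String[] args) {
        SessionCacheSketch s = new SessionCacheSketch();
        s.database.put(100L, "Old");
        s.find(100L);   // FIRST transaction: cache miss, one SELECT, entity built
        s.find(100L);   // SECOND transaction: cache hit, no SELECT
        s.query(100L);  // THIRD transaction: SELECT issued, cached entity reused
        // prints: selects=2, entitiesBuilt=1
        System.out.println("selects=" + s.selects + ", entitiesBuilt=" + s.entitiesBuilt);
    }
}
```

Three reads cost two SELECTs and one entity build in this model, which mirrors the three transactions in the example.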

Cache Options

There are several properties related to session cache.

  • toplink.cache.type.xxx - the type of session cache.
  • toplink.cache.size.xxx - the size of session cache.
  • toplink.cache.shared.xxx - whether or not the session cache is shared.

(xxx is an entity name or "default")

These properties are well explained in the TopLink JPA Extensions for Caching. Check it out!
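As a rough illustration of how these names fit together, the sketch below assembles three cache properties and prints them as persistence.xml property elements. The class and method names are hypothetical, and the values SoftWeak and 1000 are what I believe the extensions reference lists as defaults, so check them against that document before relying on them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class CachePropertiesSketch {
    // Builds example cache settings; the values are assumptions, not recommendations.
    static Map<String, String> cacheProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("toplink.cache.type.default", "SoftWeak");   // cache type for all entities (assumed default)
        props.put("toplink.cache.size.default", "1000");       // cache size for all entities (assumed default)
        props.put("toplink.cache.shared.Employee", "false");   // per-entity override: xxx = Employee
        return props;
    }

    public static void main(String[] args) {
        // Print each setting in the form used inside <properties> in persistence.xml.
        for (Map.Entry<String, String> e : cacheProperties().entrySet()) {
            System.out.println("<property name=\"" + e.getKey() + "\" value=\"" + e.getValue() + "\"/>");
        }
    }
}
```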

If there are external changes...?

There is no problem if your application is the only one that modifies the database. But if there is an external change to the database (e.g. by another application, by SQL/JDBC or by hand), the session cache will be out of date. Whenever you read entities with EntityManager.find() or a JPQL query, your application may get outdated entities.

Let’s see the following example.

//EntityManager is created and closed for a transaction
Long id = 100L;

//FIRST TRANSACTION
begin();//create a new em. start a transaction.
Employee e1 = em.find(Employee.class, id);
println("address1 = " + e1.getAddress());
commitAndClose();//commit the transaction. close em.

//LET'S UPDATE THROUGH JDBC (EXTERNAL CHANGE!)
Statement stmt = connection.createStatement();
stmt.executeUpdate("UPDATE EMP SET address = 'New' WHERE EMPID = " + id);
stmt.close();

//SECOND TRANSACTION
begin();
Employee e2 = em.find(Employee.class, id);
println("address2 = " + e2.getAddress());
commitAndClose();

//THIRD TRANSACTION
begin();
Employee e3 = (Employee)
    em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setParameter("id", id)
    .getSingleResult();
println("address3 = " + e3.getAddress());
commitAndClose();

There are three transactions, and I modified the address of Employee#100 between the FIRST and SECOND transactions through JDBC. The session cache doesn't know about this kind of external change, so the result is as follows.

### address1 = Old
### address2 = Old
### address3 = Old

In the THIRD transaction the query actually goes to the database (it issues a SELECT statement), but the current mechanism does not update the session cache even in this case, and the outdated entity from the cache is returned.

So you should know that even the first retrieval in a transaction may not be fresh from the database.

Is this what you expect? If not, how can you avoid it? There are several ways to handle this situation. I will explain them in the following sections.

Getting fresh results from database

There are several ways to get fresh data from the database. The first is to use refresh operations. The others are explained in the following sections.

There are two kinds of refresh operations: the portable EntityManager.refresh() and the TopLink-specific refresh hint.

1. EntityManager.refresh()

Use a find-and-refresh combination as shown below. This is a simple and portable way to get fresh data.

Employee e = em.find(Employee.class, id);
if (e != null) { // find() returns null when there is no such entity
  try {
    em.refresh(e);
  } catch(EntityNotFoundException ex){
    e = null; // the row was removed externally
  }
}

EntityNotFoundException should be caught around refresh() because the entity may have been removed externally.

This approach has some issues. refresh() requires a transaction (in the case of a container-managed EntityManager), so you need to start a transaction even if you are doing only read-only operations. Another issue is that it may issue two SELECT statements, one in find() and one in refresh(), if the entity is not in the session cache. If find() got a fresh entity from the database, refresh() is redundant, but there is no way to tell whether find() returned a fresh result, so you have to call refresh() anyway. This double SELECT happens only the first time, though.

2. TopLink-specific refresh hint

TopLink provides the query hint "toplink.refresh". Use it in a query as shown below.

try {
    e = (Employee)em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
        .setHint("toplink.refresh", "true")
        .setParameter("id", id)
        .getSingleResult();
} catch (NoResultException ex) {
    e = null;
}

NoResultException should be caught because Query.getSingleResult() throws it if there is no such entity.

If portability doesn't matter, this is a better way than the find-and-refresh combination because it doesn't require a transaction and it triggers just one SELECT statement. It can also be used to retrieve a group of entities, as shown below.

List list = em.createQuery("SELECT e FROM Employee e WHERE e.name = :name")
    .setHint("toplink.refresh", "true")
    .setParameter("name", name)
    .getResultList();

For convenience, you can define this kind of query as a named query and write a utility method like the one below.

@Entity
@NamedQuery(name="Employee.freshFindById",
  query="SELECT e FROM Employee e WHERE e.id = :id",
  hints=@QueryHint(name="toplink.refresh", value="true"))
public class Employee {
...
    // static, so callers don't need an Employee instance to look one up
    public static Employee freshFind(EntityManager em, Integer primaryKey){
        Employee e;
        try {
            e = (Employee)em.createNamedQuery("Employee.freshFindById")
                .setParameter("id", primaryKey)
                .getSingleResult();
        } catch (NoResultException ex) {
            e = null; // no such entity in the database
        }
        return e;
    }
}

I placed the freshFind() utility method in the entity class, but you can place it in another class if you wish.
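To see what the refresh hint changes, here is another toy model in plain Java. Again this is only a sketch under my own assumptions: the names are hypothetical, maps stand in for the database and the session cache, and the boolean parameter plays the role of the "toplink.refresh" hint. It is not how TopLink is implemented internally.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not TopLink's actual classes.
final class RefreshHintSketch {
    final Map<Long, String> database = new HashMap<>();     // stand-in for the EMP table (id -> address)
    final Map<Long, String> sessionCache = new HashMap<>(); // stand-in for the shared session cache

    // Models a query by id; 'refresh' plays the role of the "toplink.refresh" hint.
    String query(long id, boolean refresh) {
        String row = database.get(id); // the SELECT always happens
        if (row == null) {
            return null; // no such row
        }
        String cached = sessionCache.get(id);
        if (cached == null || refresh) {
            sessionCache.put(id, row); // build from the row, overwriting any stale entry
            return row;
        }
        return cached; // without the hint, the cached entity is reused as-is
    }

    public static void main(String[] args) {
        RefreshHintSketch s = new RefreshHintSketch();
        s.database.put(100L, "Old");
        System.out.println(s.query(100L, false)); // Old
        s.database.put(100L, "New");              // external change, e.g. through JDBC
        System.out.println(s.query(100L, false)); // Old (stale entity from the cache)
        System.out.println(s.query(100L, true));  // New (cache overwritten from the row)
        System.out.println(s.query(100L, false)); // New (later reads see the refreshed entry)
    }
}
```

Note that in this model one refreshing query also repairs the cache for later non-refreshing reads, until the next external change.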

Pessimistic locking

Pessimistic locking is another way to guarantee that loaded entities are fresh. It is more powerful because it also locks the rows until the transaction completes, preventing other transactions from modifying the same rows.

Pessimistic locking is not part of the JPA standard, but TopLink provides it through the query hint "toplink.pessimistic-lock", as shown below.

e = (Employee)em.createQuery("SELECT e FROM Employee e WHERE e.id = :id")
    .setHint("toplink.pessimistic-lock", "Lock")
    .setParameter("id", primaryKey)
    .getSingleResult();

The value "Lock" issues "SELECT ... FOR UPDATE" and "LockNoWait" issues "SELECT ... FOR UPDATE NOWAIT".

If you turn on pessimistic locking, it automatically turns on "toplink.refresh=true", so the query always goes to the database and updates the session cache and the persistence context. You will always get fresh entities. It also guarantees that the entities are not modified in the database until the transaction completes.

It is a good pattern to define this as a named query, as shown below.

@NamedQuery(name="Employee.lockedFindById", 
  query="SELECT e FROM Employee e WHERE e.id = :id", 
  hints=@QueryHint(name="toplink.pessimistic-lock", value="Lock"))

As you know, pessimistic locking can become a performance bottleneck. So use it carefully, or consider using optimistic locking.

Disabling shared session cache

Disabling the shared cache is the final way. It is not recommended, but in some cases it is required. Think of a web tier that only reads from the database and needs the latest contents on every request. It doesn't even need a transaction. (Normally it uses a container-managed EntityManager outside a transaction, so entities are not managed.) In this case I don't need the session cache and want to do a simple find/query without extra work such as refreshing.

This can be done by turning the shared-cache property "toplink.cache.shared.xxx" off (xxx is an entity name).

If you disable the shared cache, the shared session cache in the server session is not used for the specified entities; isolated client session caches are used instead. An isolated cache is the session cache that a client session owns. The isolated cache is a TopLink concept, but in JPA it has little practical meaning because the first-level cache, the persistence context, is used while the EntityManager (client session) is alive.

For example, if you set "toplink.cache.shared.Employee=false", no Employee entities are stored in the shared cache. See the following configuration in persistence.xml.

<persistence ...>
  <persistence-unit name="HR">
  ...
    <properties>
    ...
      <property name="toplink.cache.shared.Employee" value="false"/>
      <property name="toplink.cache.shared.Department" value="false"/>

    </properties>
  </persistence-unit>
</persistence>

CAUTION: If the entity has relationships, the shared cache should be disabled for the associated entities too. In this example, Department is also set to false.

There is also a "toplink.cache.shared.default" property that applies to all entities, as shown below.

 <property name="toplink.cache.shared.default" value="false"/>

But this is not recommended[*]. It also has a bug, so it doesn't work at the moment.

Setting the cache type "toplink.cache.type.default" or "toplink.cache.type.xxx" to "NONE" has a similar effect, but this setting can result in infinite recursion if there is a cycle of eagerly loaded relationships[*]. So it is not recommended at all.

Future enhancements

The cache feature of TopLink Essentials is powerful, but it has some limitations compared to the commercial TopLink or Hibernate.

Cache synchronization between clustered applications

If applications are different or clustered, the session cache is not shared. The commercial TopLink provides cache synchronization, and Hibernate also provides a clustered 2nd-level cache.

Cache invalidation policy

Currently the session cache is controlled by weak or soft references (see the cache-type options), so we cannot invalidate the cache on a time or daily basis, or per query. TopLink provides several invalidation policies, but TopLink Essentials currently lacks them.

Query cache

In TopLink Essentials every query goes to the database. In some cases query results don't change, so caching them would be good for performance. TopLink provides this feature, but TopLink Essentials currently lacks it.

I hope these features will be added to TopLink Essentials in the future.

Comments
Comments are listed in date ascending order (oldest first)

  • Thanks for putting this out there. I had it on my to-do list, as it is important information and can be confusing to new users. Overall it is good and reads well.

    The following are just some minor comments:

    I know it is somewhat confusing with client session and server sessions both having a cache. To make things more clear I tend to avoid saying 'session cache' as it is not always clear which one you mean. As an example I would re-name your section "Disabling session cache" to be something like "Disabling Shared Cache" or "Disabling L2 Cache"


    Isolated cache == client session cache

    Shared cache = server session cache (L2)


    Bug 1054: EM.clear() does not clear entities cached in the client session's isolated cache. It does clear the EM's associated UnitOfWork used to manage the persistence context tracking any changes made. This means that after a clear if you are using isolated caching the next find for an already cached entity will not go to the database as expected.


    Cache hits on the isolated (client) or shared server session caches which completely avoid a database call are only for em.find calls. All queries will go to the database and then when processing results the session caches will be used to avoid re-building entities that are already cached (still an important performance gain). This explains what you are seeing in the THIRD transaction of your section: "If there are external changes...?". Your blog mentions that you will get a cache hit but does not make it clear that is currently only for finds.

    As far as the additional functionality concerning caching features in Oracle TopLink that do not yet exist in TopLink Essentials we appreciate the feedback and are evaluating enhancement requests of this nature as we lay out our roadmap for this product. Please ensure that all of your requests have enhancement requests filed with detailed explanation of the customer need so that we can prioritize them appropriately.

    Doug Clarke
    Principal Product Manager
    Oracle TopLink
    TopLink Blog

    Posted by: djclarke on September 08, 2006 at 07:58 AM

  • Thanks, Doug, for kindly pointing this out.
    I modified the contents of the Session Cache, Disabling shared session cache and some other sections as you suggested.

    Please comment if anything still looks strange.

    Posted by: guruwons on September 08, 2006 at 05:47 PM

  • Nice post -- Sahoo

    Posted by: ss141213 on September 10, 2006 at 10:14 PM


  • Thanks for this post. This is the first time of my life I have just started to see how the cache hierarchy could help to solve transaction isolation ! Wouah !


    This being said, have you got any idea about the future enhancements roadmap for TopLink Essentials (~ persistence in GlassFish)? According to the GlassFish v2 roadmap, there is nothing in v2 about that. So it looks like users will have to wait for v3 at least... so at least 2 more years if they wait for the final v3!

    Any news ? Thanks.

    Posted by: dmdevito on September 28, 2006 at 06:01 AM


  • Can we imagine that select queries go up to the database, AND update the session cache (L2 cache) if needed !? If such queries hit the database, they fetch data from the base up to TopLink, so TopLink layer can treat such received data to update the session cache, no !?

    Well, is it possible to detect, through a trick, that a returned piece of data has changed and that the session cache has to create a NEW POJO to store the new data ? I mean, not to update the POJO in the session cache as it could be shared in numerous L1 caches. Such a trick could be given through a 'date' column or a period of refresh...

    It would avoid the "toplink.refresh" and try-catch madness.


    Posted by: dmdevito on September 28, 2006 at 06:26 AM

  • Hi, dmdevito

    Select queries don't update the cache unless 'toplink.refresh' is true, as I said above in the Session Cache section.

    "Although a query goes to database, it avoids rebuilding an entity and the entity in the session cache is reused(some performance gain)."

    Of course, cached entities can expire according to the toplink.cache.type policy (which depends on garbage collection).

    I guess there is some misunderstanding. TopLink doesn't return the same (shared) objects from the session cache; it always returns clones. So even if the cache is updated, it doesn't affect objects already retrieved into persistence contexts.

    I'm not sure what you want, but if you use a @Version field (date or integer) you can detect that it has changed. Or if you want some kind of automatic refresh period, it could be implemented by a cache invalidation policy in the future.

    I don't know whether there is a plan for TopLink in GlassFish v2. I asked the TopLink team on the following mailing list.
    https://glassfish.dev.java.net/servlets/BrowseList?listName=persistence

    - Wonseok

    Posted by: guruwons on September 29, 2006 at 06:21 AM

  • You write that

    will disable the session cache for entity Employee. First off, it is unclear, whether it should be Employee or com.mypackage.Employee in this configuration. Second, it does not state whether it works for EntityManager.find only or also for named queries. In my case it does not work for named queries.

    Posted by: ulim on October 30, 2006 at 01:44 AM

  • My HTML was garbled.

    You write that

    property name="toplink.cache.shared.Employee" value="false"/

    will disable the session cache for entity Employee.

    Posted by: ulim on October 30, 2006 at 01:46 AM

  • It can be toplink.cache.shared.[entity-name] or toplink.cache.shared.[fully-qualified-class-name]. So both are right.

    This property is applied to metadata-level at processing time and it should work in EM.find() and JPA queries. If you see wrong behaviour, could you send it to persistence at glassfish.dev.java.net? Then, I can check it.

    -Wonseok

    Posted by: guruwons on October 30, 2006 at 02:19 AM

  • In my case, cache is used when I make em.persist(entity) and then make query in the same JVM/app - old data is returned from cache in this case.
    But the cache is NOT used when I do NOT make em.persist() - in this case each query return latest data from database.

    I am preparing a couple of tests...

    Posted by: vicnov on November 16, 2006 at 09:57 AM

  • hi kim, nice reading.

    I have a problem while refreshing the master-detail tables, says (Employee and EmployeeDetail)

    I used .setHint("toplink.refresh", "true") for my NamedQuery. It can refresh the data in Employee; however, it cannot refresh the data in EmployeeDetail.

    How can I solve this? Thanks

    Posted by: no9876543210000 on March 13, 2007 at 02:34 AM

  • There is a cascade bug in the toplink.refresh hint. The current workaround is to use em.refresh(). :-(

    Posted by: guruwons on March 13, 2007 at 04:34 AM

  • Hi Kim,
    How can i re-write the code to use em.refresh(), if my code is em.createNamedQuery("Employee.findByKeys").setHint("toplink.refresh", "true").setParameter("location", location).getResultList();?
    Thanks

    Posted by: no9876543210000 on March 13, 2007 at 08:20 PM

  • I think there are two options:

    (1) use em.refresh() instead of toplink.refresh hint

    // assume that Employee has CascadeType.REFRESH for relationships
    List employees = em.createNamedQuery(x)...getResultList();
    for(Object o : employees) {
        em.refresh(o);
    }
    // this will trigger many SELECTs :-(


    (2) use both

    List employees = em.createNamedQuery(x).setHint("toplink.refresh", "true")...getResultList();
    for(Object o : employees) {
        // manually refresh only the relationships of interest
        EmployeeDetail d = ((Employee)o).getEmployeeDetail();
        if(d != null) {
            em.refresh(d);
        }
    }
    // this does not require CascadeType.REFRESH and triggers fewer SELECTs, but is lengthy

    Posted by: guruwons on March 13, 2007 at 09:50 PM

  • Hi Kim,

    I tried method 1, but it returns "Can not refresh not managed object".
    Thanks

    Posted by: no9876543210000 on March 14, 2007 at 01:11 AM

  • A non-managed object is one that is not in the persistence context. It seems that you're using a container-managed EntityManager in the GlassFish container. In that case you need to wrap the code in a transaction.
    Begin a UserTransaction before em.createNamedQuery().
    The Java EE 5 Tutorial will help.

    Posted by: guruwons on March 14, 2007 at 02:37 AM

  • Thanks Kim, it works. I hope the bug fix will be released soon. Thanks again.

    Posted by: no9876543210000 on March 14, 2007 at 11:39 PM

  • According to JSR 220, find will synchronise with the database and return the entity. Then why do we need to refresh it explicitly?

    Can you assist me by providing some use case?

    Posted by: sauravsaurav on June 07, 2007 at 01:50 AM

  • Hi, sauravsaurav.

    I couldn't find anything in the spec saying that em.find() synchronizes with the database. Could you elaborate?

    em.find() looks up an entity by primary key in the current persistence context (PC), so the result could be outdated. The issue here is that even when the entity is not in the PC, the 2nd-level cache (shared cache) will return the entity to improve performance (this is not a spec violation).

    Therefore, it's not good to assume that the entity returned by find() always reflects the current state of the database.

    Posted by: guruwons on June 13, 2007 at 06:16 PM

  • Hi Kim

    Nice and useful post!!! I have some questions... Would you mind answering?
    I saw that the session cache is enabled by default... but does it work with Java SE applications, or only within Java EE 5 containers (web or EJB)?

    Posted by: fernandofranzini on September 05, 2007 at 07:27 AM

  • Hi Fernando, the session cache is being used also in Java SE mode. This is not something provided by Java EE container, but internal to TopLink. - Wonseok

    Posted by: guruwons on September 05, 2007 at 09:58 PM

  • About this suggestion of yours:


    List employees = em.createNamedQuery(x)...getResultList();
    for(Object o : employees) {
        em.refresh(o);
    }

    This approach has the potential for a severe error: refreshing the entities could change them, so that they would not match the query anymore. Say your query selects a list of all employees from a certain department. Suppose these are 17 employees, but one of them is just now transferred to department Y, you refresh and return the result: a list of 16 employees from department X and one from department Y.
    Therefore, you can only refresh entities, which you selected from an unchangeable field such as the primary key. For all other entities you have to decide whether it is better to return outdated data or wrong data.

    Posted by: ulim on October 24, 2007 at 07:05 AM

  • Good point, ulim. You're correct.
    I think we don't need the workaround anymore because the cascade refresh hint bug has already been fixed.

    Posted by: guruwons on October 24, 2007 at 08:26 AM

  • Hi,
    I'm sure I am not alone in wanting to minimise round trips to the database for slowly changing data. If I understand your blog correctly, the secondary cache should be ideal for this purpose, *but* my application uses Stateless EJBs driven by e.g. WebServices, and therefore when doing a lookup they are much more likely to be starting from e.g. EMPLOYEE.FIRSTNAME and EMPLOYEE.LASTNAME as opposed to an internally generated primary key number like EMPLOYEE.ID. So, quite often the EJB will have to do a "select .." which will then retrieve an ID; however, I lose the benefit of the ID at the end of that business method when the EJB is returned to the pool, and the next time the EJB is called it has to start again with a "select ..".

    Do you have any good tips how a stateless EJB could be more successful at utilising primary keys which are unknown to the external interface?
    Is it really the case that in order to get a performant second level cache I will have to upgrade to a commercial persistence provider?

    Posted by: freddiefishcake on December 11, 2007 at 09:39 AM

  • Hello, I have a problem refreshing the EntityManager and maybe you can help me.

    I have a Swing application that I use on two computers in a LAN. When I do an update on one computer and then read from the other, the query gets the updated result but the object still has the old value.

    -apellidos is a String
    -aseguradora is an Object
    - cirujano is an Object

    This is the code I use to read:

    entityManager.getTransaction().begin();
    Query query = entityManager.createQuery(createQuerySearchPacientes(apellidos, aseguradora, cirujano));
    query.setHint("toplink.refresh", "true");

    if ((apellidos != null) && (apellidos.length() > 0)) {
        apellidos += "%";
        query.setParameter("apellidos", apellidos);
    }
    if ((aseguradora != null) && (aseguradora.isValid())) {
        query.setParameter("idAseguradora", aseguradora);
    }
    if ((cirujano != null) && (cirujano.getIdCirujano() > 0)) {
        query.setParameter("idCirujanoPrivado", cirujano);
    }
    List pacientes = query.getResultList();
    entityManager.getTransaction().commit();

    Can you tell me where the problem is?

    Thanks a lot.

    Posted by: ricardo123 on January 12, 2008 at 03:07 PM

  • Please post questions regarding the cache to the persistence@glassfish.dev.java.net mailing list. That is a better place for Q&A.

    http://www.nabble.com/java.net---glassfish-persistence-f13455.html

    https://glassfish.dev.java.net/servlets/SummarizeList?listName=persistence

    Posted by: guruwons on January 12, 2008 at 11:48 PM


