
Let's keep those cache entries for a little bit longer

Posted by rah003 on March 26, 2010 at 10:35 AM PDT

Have you got Magnolia 4.3? Set up multiple sites? All right, you are all done. The only thing left is to watch the load on the server and see how many requests you can serve. You might have noticed that after activating content, the load on the public instance is a bit higher even though there is no increase in traffic.

Why? Simply because after activating the piece of content, the cache on the public instance has been flushed. So far this is actually nothing else but what you want: you activated new content, and you indeed want to show the world what is there, not the stale old stuff. Nevertheless, when using multiple sites, you might have also noticed that you are taking the hit for the other sites as well, not just the one for which you activated the content. And in most cases this is completely unnecessary, since each site has its own content and changes to one site do not affect the others.

So how do you get out of this bind? If you dig a bit into the way the cache works (which you might do by reading older entries here and the official documentation), the obvious solution you will most likely come up with is to write a custom flush policy that, when new content is activated, clears from the cache only those entries that belong to the same subtree as the activated content. I've been there too. But imagine this: you've got, say, 10 sites, each with approximately 500 pages, and each page on average consists of 15 resources. That gives you a total of about 75,000 cache entries. Do you really want to iterate through all of them just to figure out which ones should be flushed and which should stay cached? You might, but it might cost you as much as you would save by not having to re-cache all the content.

The next best idea one gets after this is to keep a mapping between the subtrees and the cache keys. That would work, but it requires lots of custom coding: you need to record the mapping when entries are put into the cache, you need to make sure the mapping is persistent, and it must get cleaned when the cache is flushed. That's sort of doable, but too much work, isn't it?

Rather than writing that amount of code, I thought there must be a better way. And indeed there is. All it takes is a bit of configuration and just about 10 lines of code.

The trick is to use multiple caches rather than one and to cache each subtree into a separate cache, so that on activation we just need to figure out which subtree cache to flush completely, without having to touch a single cache entry on its own. How cool is that?

Now to do this, I needed multiple cache filter configurations: one per subtree and one default for everything else. You also need to configure the bypasses properly, so that each subtree cache filter listens only for its own subtree while the default one listens for everything else, meaning everything except those requests handled by the subtree caches. I know I'm repeating it again, but I really, really want to get this point across: you don't want to have two cache filters caching the same things.
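The attached screenshot (cache-filter-config.png) shows the real setup; as a rough sketch only, with hypothetical site names siteA and siteB (node names and the bypass voters are illustrative, check your own filter chain for the exact classes), the chain could be shaped something like this:

/server/filters
  cache-siteA                  (a CacheFilter using the siteA cache configuration)
    cacheConfigurationName = siteA
    bypasses
      notSiteA                 (voter bypassing everything outside /siteA)
  cache-siteB                  (same pattern, for /siteB)
  cache                        (the default cache filter)
    bypasses
      siteA                    (bypass what cache-siteA already handles)
      siteB                    (bypass what cache-siteB already handles)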

[Screenshot: cache-filter-config.png — the cache filter configuration]

The second bit that goes with the multiple cache filters is multiple cache configurations. Since 4.3, those got simpler than ever. Just create one node for each subtree cache in /modules/cache/config/configurations and, in that node, create a nodeData "extends" pointing to the default configuration: "../default".
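Again referring to the attached screenshot (cache-config.png), with the hypothetical site names from above the configurations would look roughly like this:

/modules/cache/config/configurations
  default
  siteA
    extends = ../default
  siteB
    extends = ../default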

[Screenshot: cache-config.png — the per-site cache configurations]

That's it for the configuration. Now to the last bit: the code. We need to change the flush policy. The default FlushAllListeningPolicy would flush everything from the cache as we want, but it would also kick in on each content update. What we want here is to kick in only when an update happens in the subtree that the given cache is concerned with.

Rather than going through the events manually, it is much more convenient to let the observation code do this and to register the cache observation only for the relevant part of the content. Since the observed path is set on startup, we need to override the start() method of the policy to do so. And of course, if we register the listener ourselves, we need to make sure we deregister it properly on shutdown as well. And that's it, really.

 

public void start(Cache cache) {
  ...
  // Cleans the given cache when an observed event arrives.
  final CacheCleaner cacheCleaner = new CacheCleaner(cache, repository);
  // Defer events so a burst of activations results in a single flush.
  final EventListener listener = ObservationUtil.instanciateDeferredEventListener(cacheCleaner, 5000, 30000);
  // By convention, each subtree cache is named after the subtree it serves;
  // the default cache observes the whole repository.
  final String path = "default".equals(cache.getName()) ? "" : cache.getName();
  try {
    ObservationUtil.registerChangeListener(repository, "/" + path, listener);
  } catch (Exception e) {
    ...
  }
  // Remember the listener so it can be deregistered on shutdown.
  registeredListeners.put(repository, listener);
  ...
}
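The snippet above only shows the registration side. A minimal sketch of the matching stop() method, assuming registeredListeners is the same map used in start(), might look like this:

public void stop(Cache cache) {
  // Deregister the listener we registered in start().
  final EventListener listener = registeredListeners.remove(repository);
  if (listener != null) {
    ObservationUtil.unregisterChangeListener(repository, listener);
  }
}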

Looking at the code, you might have noticed that the "default" cache is treated differently: rather than registering for the subtree (which is assumed to match the name of the cache in this case), it registers for all content updates. Why? Simply to make sure that the default cache is still flushed on each update, since it might contain content related to any and all sites. That's why.

 

As usual, the complete code will be in the SVN as soon as I get back home. I hope this gives you inspiration for writing your own policy to make caching more effective in your specific environment.

 

Attachments:
cache-filter-config.png (35.82 KB)
cache-config.png (40.08 KB)

Comments

Let's keep those cache entries for a little bit longer

Am I right that info.magnolia.module.cache.FlushSubpathCacheListeningPolicy has been renamed and moved to the Advanced Cache Module as info.magnolia.module.advancedcache.SiteAwareFlushAllListeningPolicy?