Magnolia Cache in a clustered environment
I might have mentioned something about cache in Magnolia here before. Today let's look at another aspect of it.
While Magnolia in general follows the well-known and well-understood publish/subscribe model when it comes to page activation (activation always goes from the authoring instance to the public instance), there is one notable exception to this model: public generated content. This is content like forums, page comments, etc. It is created on the public instance and resides on the public instance only. No big deal, you could say. Such public generated content lives in another workspace, completely separated from the web content and only loosely connected to it in the case of features like page comments.
Yeah right. Let's add cache to the picture.
The cache on the public instance automatically flushes itself when a new piece of content is published, to make sure users are not served a stale version of the page. Still no issue here. Also, when a new page comment is created, the commenting module instructs the cache to flush the page the comment was made on, since it knows where the comment is coming from.
Still looks kind of OK. Let's add multiple public instances to the picture.
Yuck! If I have forum or page comments deployed on multiple public instances and they are not aware of each other and of each other's content, we've got an issue. The solution to this is quite simple: let's just connect our public instances into a cluster. Is that possible? Yes, why not ... as long as we use a JCR implementation that is clusterable, everything should be fine. There are various reasons why one might not want to cluster all the workspaces, but instead use multiple repositories and cluster, for example, only the forum/commenting workspace in a separate repository. You can do both in Magnolia, and it is not the point of this exercise, so I will not say more on this subject. What I want to explore today is how this affects the cache ... which is not clustered in either case.
For plain activation of content, full or partial clustering is not an issue. When fully clustered, only one node of the cluster is subscribed to the author instance, and the author will publish content only to this one public instance. When content gets published, all cluster nodes are notified about the new content via events distributed to all nodes of the cluster, and all nodes flush their caches as they should. When publishing content to a non-clustered workspace, all of the cluster nodes need to be subscribed, so the author instance publishes content to each of them, and again all of the public instances get their own event notifications and flush the cache upon receiving the appropriate event.
Now about public generated content, and more specifically about page comments. We definitely want to cluster the workspace in which they are stored, so we can share them across all cluster nodes. If it is not clear from the above, page comments are not part of the page itself, but are stored separately and only reference the page to which they belong. So when a new comment is added, there is nothing for the cache to flush based on the event notification, since the comment is not a page that anyone can see directly. Hence the original solution of telling the cache directly "hey, go and flush page xyz since there is a new comment on it, even though you don't know about it". Unfortunately this approach breaks as soon as clustering comes into play. The commenting module can discover only the cache that is local to it (running in the same cluster node), so it can at most flush the given page from the cache at that node, but not from the others. Now you see where the problem lies. We need to notify all the nodes across the cluster and flush the affected page from each of them.
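To make the failure concrete, here is a tiny Java simulation of it. This is not Magnolia code. The classes PublicNode, SharedComments and LocalFlushDemo are names I made up for this sketch, and the "cache" is just a map of page path to rendered text. It only assumes what was said above: comments are shared across nodes, each node has its own cache, and the direct flush reaches only the node that received the comment.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one public instance with its own, unshared page cache.
class PublicNode {
    final String name;
    final Map<String, String> pageCache = new HashMap<>();

    PublicNode(String name) { this.name = name; }

    // Serve a page: reuse the cached rendering if present, otherwise render and cache it.
    String serve(String pagePath, SharedComments comments) {
        return pageCache.computeIfAbsent(pagePath,
                p -> p + " with " + comments.count(p) + " comment(s)");
    }
}

// Hypothetical stand-in for the clustered comments workspace, visible to every node.
class SharedComments {
    private final Map<String, Integer> counts = new HashMap<>();

    int count(String pagePath) { return counts.getOrDefault(pagePath, 0); }

    // The comment is stored for all nodes, but the direct flush can only
    // reach the cache of the node that received the request.
    void addComment(String pagePath, PublicNode receiver) {
        counts.merge(pagePath, 1, Integer::sum);
        receiver.pageCache.remove(pagePath);
    }
}

class LocalFlushDemo {
    static List<String> run() {
        SharedComments comments = new SharedComments();
        PublicNode a = new PublicNode("public-a");
        PublicNode b = new PublicNode("public-b");

        // Both nodes render and cache the page before any comment exists.
        a.serve("/news/article", comments);
        b.serve("/news/article", comments);

        // A visitor posts a comment through node A.
        comments.addComment("/news/article", a);

        // Node A re-renders the page; node B still serves its cached copy.
        return Arrays.asList(a.serve("/news/article", comments),
                             b.serve("/news/article", comments));
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Node A shows the page with one comment, while node B keeps serving the version with zero comments until something else happens to flush its cache.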
What we could do, as an extreme solution, is turn caching off for all cluster nodes. After all, we have a cluster and can add extra nodes if we need to scale up, so why bother with caching? Somehow I don't think that is a good idea, but yeah, it would be a solution to the problem.
Another option would be to cluster the cache as well. The default implementation of the Magnolia cache uses Ehcache underneath, so this is quite possible, but it would complicate the configuration of the instances. I was looking for a painless solution to the issue.
The solution I chose in the end was to use the existing event notification mechanism, which works well in a clustered environment and is already used for flushing pages from the cache on activation. So now there is a flush policy that listens for updates in the forum/commenting workspace. If it detects an update, it figures out whether there is a related page, and if so, it flushes that page from the cache. No big deal. As usual in such cases, figuring out the right solution to the issue took longer than actually implementing it.
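The idea can be sketched in a few lines of plain Java. Again, this is an illustration and not the actual Magnolia flush policy: WorkspaceListener, ClusteredWorkspace, CommentFlushPolicy and EventFlushDemo are hypothetical names, the cache is reduced to a set of cached page paths, and I am assuming for the sake of the example that a comment is stored under "/comments" + page path + "/" + comment id. The real module may resolve the related page quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical listener interface; stands in for the repository's event notifications.
interface WorkspaceListener {
    void onContentStored(String workspace, String nodePath);
}

// Hypothetical stand-in for the clustered workspace: every store is announced
// to the listeners registered by all cluster nodes.
class ClusteredWorkspace {
    private final String name;
    private final List<WorkspaceListener> listeners = new ArrayList<>();

    ClusteredWorkspace(String name) { this.name = name; }

    void register(WorkspaceListener listener) { listeners.add(listener); }

    void store(String nodePath) {
        for (WorkspaceListener listener : listeners) {
            listener.onContentStored(name, nodePath);
        }
    }
}

// One policy instance per cluster node; it flushes only that node's own cache.
class CommentFlushPolicy implements WorkspaceListener {
    // Assumed layout for this sketch only: "/comments" + page path + "/" + comment id.
    static final String PREFIX = "/comments";
    private final Set<String> localCache;

    CommentFlushPolicy(Set<String> localCache) { this.localCache = localCache; }

    // Returns the related page path, or null if the stored node is not a page comment.
    static String relatedPage(String nodePath) {
        if (!nodePath.startsWith(PREFIX + "/")) return null;
        int lastSlash = nodePath.lastIndexOf('/');
        if (lastSlash <= PREFIX.length()) return null;
        return nodePath.substring(PREFIX.length(), lastSlash);
    }

    @Override
    public void onContentStored(String workspace, String nodePath) {
        if (!"forum".equals(workspace)) return;
        String page = relatedPage(nodePath);
        if (page != null) localCache.remove(page);
    }
}

class EventFlushDemo {
    static List<Set<String>> run() {
        // Two nodes, each with the same two pages cached.
        Set<String> cacheA = new HashSet<>(Arrays.asList("/news/article", "/about"));
        Set<String> cacheB = new HashSet<>(Arrays.asList("/news/article", "/about"));

        ClusteredWorkspace forum = new ClusteredWorkspace("forum");
        forum.register(new CommentFlushPolicy(cacheA));
        forum.register(new CommentFlushPolicy(cacheB));

        // A comment on /news/article is stored; every node hears about it.
        forum.store("/comments/news/article/c1");
        return Arrays.asList(cacheA, cacheB);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

After the comment is stored, both caches have dropped /news/article and kept /about, and the commenting code never had to touch a cache directly.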
Below is the diagram showing the interaction of page comments and cache in a clustered Magnolia environment. As you can see, when content is stored in the workspace, an event notification is distributed to listeners on all cluster nodes. The appropriate flush policy, which understands what is being stored in the forum workspace (as we use the forum workspace to store page comments) and how it affects pages, ensures that the cache is updated as appropriate. And we have no need for direct commenting-cache interaction.