Jan Haderka's Blog
June 2008 Archives

Caching strategies for Magnolia 3.6
Posted by rah003 on June 26, 2008 at 09:19 AM

The stable release of Magnolia 3.6 will soon be out, and it will ship with a brand new cache system implemented from scratch. The intention was not to reinvent the wheel. Instead, the new cache system decouples Magnolia's presentation layer from the underlying cache mechanisms and allows any cache engine of choice to be plugged in. Magnolia 3.6 will be distributed with EHCache as the default cache engine, but anyone who is willing to implement a custom cache wrapper can run Magnolia with any other cache engine from the upcoming 3.6 release onwards.

With the new cache system in place, we, the Magnolia developers, started to look into which advanced cache strategies should be included: strategies that would allow server administrators to quickly achieve the best cache configuration for their particular use case with minimal effort. New strategies will be available in Magnolia 3.6 Enterprise Edition (EE), with more to be added in subsequent versions of the EE. We have a bunch of ideas about which strategies we would like to implement, based on custom solutions implemented for clients in the past. Some of the cache strategies are more difficult to implement than others. Which ones we deliver with Magnolia EE 3.6 depends on how many obstacles we hit while implementing them and how well we can test those strategies before the release. The picture below shows a graph of what we ended up with on the whiteboard while brainstorming possibilities for advanced strategies to be delivered with Magnolia 3.6.

One option is a more granular cache to minimize server load by re-using previously generated output of single page elements.
Instead of re-creating a page's complete HTML by retrieving all content elements every time a page is requested, the granular cache would fetch all the content pieces only once and keep them as gzipped cache entries. A complete page is then assembled by re-using unchanged cached content and re-building only the modified content pieces. Implementing the granular cache mechanism is actually quite tricky, because Magnolia allows content to appear as a page element in various other places of a website for dynamic content reuse. For example, the title of a page in one part of the site can be used in the navigation menu of some only remotely relevant part of the same site, just to provide a link to it. Tracking all the places where content elements appear is nearly impossible without constructing a complete content usage graph within Magnolia.

Hence, a basic cache flushing strategy would be to simply flush all cached pages of a website whenever content has been updated. The main problem with such an approach is that the user who visits the website immediately after the cache has been flushed will be the one who suffers the performance penalty, because the server now needs time to generate all the new cache entries. While this approach to caching is quite safe, it is also quite heavyweight. Rather than relying on this brute force solution, Magnolia EE 3.6 will be delivered with different re-caching strategies that will be more appropriate for different scenarios.

Another typical issue is the fact that many concurrent requests affect frequently accessed entries. (Isn't that why they are called frequently accessed after all? ;) ) It is absolutely correct to trigger the generation of cache entries upon the first request, since they do not exist yet. However, more requests might hit the server for the requested cache entry while it is still being generated.
In this case it would be a waste of computing power to trigger the creation of the same entry again for those other requests. In fact, requests issued after the first one won't get their results any faster than the first request can be served. Additionally, if they triggered the retrieval of the complete representation of a page composed from all the pieces in the repository, they would increase the load on the server and slow it down. Mind you, we are talking about millisecond differences here, but it still matters. For this reason, the cache system included in Magnolia 3.6 blocks all subsequent requests to the cache until a cache entry is ready, and then Magnolia serves it to the visitors. Then again, you don't want your server to hang forever in case such an entry can't be generated, no matter whether that is due to a page that doesn't exist, content that is corrupted, or even a broken template. Since we implement scenarios for Magnolia EE where access to the cache is distributed over multiple modules and comes from multiple threads, we made an extra effort to make sure the new cache system never ever ends up in a deadlock.

As for the new cache strategies considered for inclusion, let's look at a few examples of what we are considering implementing.

Serve Old Content While Re-caching Strategy: Instead of flushing the cache on update, old content is kept and served until the updated cache entry is created. This way the first request for content after an update triggers the generation of a new entry, while all subsequent requests for the same entry are served the old content until it is re-cached.

Eager Re-cache Strategy: This strategy serves already cached content to website visitors while re-caching the most accessed pages (e.g. the top 100 high traffic pages) in the background before flushing the cache. The idea behind this is to have highly demanded content served by Magnolia at the highest possible speed at all times.
With the Eager Re-cache, visitors will not have to wait for changed content to be cached, because this has been done in the background before they accessed the page. The ability to configure a specific number of entries to be re-cached makes it possible for server administrators to balance the trade-off between time-to-publishing and a performance penalty for users.

Another example would be what we call a Content Driven Timeout Cache Strategy, which allows the author of a page to mark it as expired based on a time set by the author. Such a scenario is already possible to implement today, but it requires collaboration between a Magnolia template developer and the page author instead of leaving the decision making power completely in the hands of the author. Why would you use such a timeout cache strategy? Just think of a situation where a page includes the results of a news feed and the content editor wants to make sure it is updated at least once a day, even though there is no actual content update since the data is being fetched from a third-party RSS feed. Or imagine the opposite scenario that could be handled with the same strategy: a page that triggers data mining mechanisms which are quite expensive in terms of server load to generate the output, while it has no dependency on any other content. You would be able to mark such a page as expiring at specified intervals only and ignoring all other content updates, knowing that the page is not affected by authors' modifications at all.

In sum, the new Magnolia cache implementation - apart from being more efficient in storing and serving the content - also allows for more advanced handling of the cached entries in the Magnolia Enterprise Edition.
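To make the blocking behaviour and the Serve Old Content While Re-caching strategy described above more concrete, here is a minimal Java sketch. It is not Magnolia's actual cache code: the class `ServeOldWhileRecachingCache`, its methods `markOutdated` and `get`, and the `generator` callback are all hypothetical names made up for illustration. The sketch assumes that the first request builds the entry in its own thread, that later requests receive the old content if there is any, and that requests with nothing to fall back on wait with a timeout so they can never hang forever.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Hypothetical sketch; the class and method names are not Magnolia's API. */
final class ServeOldWhileRecachingCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
    private final Set<K> outdated = ConcurrentHashMap.newKeySet();
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> generator;
    private final long timeoutMillis;

    ServeOldWhileRecachingCache(Function<K, V> generator, long timeoutMillis) {
        this.generator = generator;
        this.timeoutMillis = timeoutMillis;
    }

    /** Called on content update: keep the old entry, but flag it for re-caching. */
    void markOutdated(K key) {
        outdated.add(key);
    }

    V get(K key) {
        V old = entries.get(key);
        if (old != null && !outdated.contains(key)) {
            return old; // up-to-date hit (or a re-cache is already running)
        }
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> running = inFlight.putIfAbsent(key, mine);
        if (running == null) {
            return generate(key, mine); // first request: build the entry in this thread
        }
        if (old != null) {
            return old; // someone else is re-caching: serve the old content meanwhile
        }
        try {
            // No old content to fall back on: block, but never forever.
            return running.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + key, e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("no cache entry available for " + key, e);
        }
    }

    private V generate(K key, CompletableFuture<V> mine) {
        // Clear the flag first, so an update arriving during generation re-flags the entry.
        boolean wasOutdated = outdated.remove(key);
        try {
            V value = generator.apply(key);
            entries.put(key, value);
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            if (wasOutdated) {
                outdated.add(key); // generation failed: the old entry is still outdated
            }
            mine.completeExceptionally(e); // wake up waiters instead of letting them hang
            throw e;
        } finally {
            inFlight.remove(key, mine);
        }
    }
}
```

A caller would invoke `markOutdated("/home")` when content is published and keep calling `get("/home")` as before. In rare races two threads may both regenerate the same entry, which costs some work but never serves a wrong result; a production implementation would also need eviction and a policy for the Eager Re-cache variant, both of which are left out here.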

SwingX 0.9.3 Released
Posted by rah003 on June 10, 2008 at 11:45 AM

Instead of concentrating on new features, the 0.9.3 release delivers bug fixes for the most pressing issues and should provide users with stable and usable code. This is all the more important as there will be some API changes made as part of the API cleanup planned for release 0.9.4. So hopefully 0.9.3 will provide the necessary stability for those who won't be able or willing to jump on the wagon and accommodate the API changes in their code immediately. Binaries should be in the central Maven repo within a week. In the meantime you can get the release from the downloads page.

Magnolia's New Transactional Activation Module
Posted by rah003 on June 05, 2008 at 10:27 AM

I have never written anything about Magnolia on this blog, so for those of you who have never heard of it, here comes a brief summary. If you know what Magnolia is about, feel free to skip to the next paragraph. Magnolia is a Content Management System (CMS). It is Open Source, and there is a Community Edition available for those who need a CMS and are happy to work things out themselves and rely on the developer community for help. For corporate clients who require more reliable support there is an Enterprise Edition. Magnolia is developed in Java and is based on JSR-170 (Java Content Repository) compatible stores. By default it comes with the JSR-170 reference implementation, JackRabbit, but it can be set up with any JSR-170 compatible repository. You can think of Magnolia as the glue between the web server and the backend repository.

[Figure: Activation loop.]

When multiple public websites are served from one authoring instance, you want to make sure that they stay in sync while new pieces of content are published. This is exactly what the new transactional activation module of the upcoming Magnolia 3.6 release is about. This module ensures a transaction-safe publishing process for a multi-site setup.
In effect, this means that Magnolia will publish updated content to either all or none of your public sites. Of course one could object that if one of the public sites goes down, you still want to publish on the remaining ones so your readers get the news you want to publish, but that is a different matter. If one of the servers goes down, you can still remove it from the list of subscribers and happily continue publishing. The point is that removing one public server from the list becomes a conscious decision of the admin within Magnolia, and adding it back coincides with making sure all the content is republished.

So how can activation of content be transactional? Or even better, what is the transaction in this context? The definition of a transaction is simple: copy a piece of content (or a bunch of them) from an authoring instance to all subscribed public instances. So Magnolia tries to copy the content to each and every subscriber. If all content has been propagated successfully, everything is fine and Magnolia issues a "commit" command, during which all the temporary data is cleared. In case some of the subscribed Magnolia instances don't respond in time or return errors, the authoring instance asks all the other subscribers, via a "rollback" command, to restore their previous state.

Each activation transaction has multiple phases. It starts with a transmission phase during which content is transmitted from a Magnolia authoring instance to a public instance, and it ends with a collection phase during which feedback (or lack thereof) is collected from all public instances. On each public instance, those phases can look different depending on the exact content of a transaction. Let's look at those different scenarios in more detail:
Now let us take a look at the implementation details. Since Magnolia uses a JCR-based back-end storage, the obvious choice would be to use the versioning capabilities of the repository, which actually works like a charm. You want to publish a new version of the content and preserve the existing one? All you have to do is create a new version of the content, and you can easily restore the previous one with the help of the rollback functionality. However, there is a catch. Once content has been deleted, it cannot be restored. Hence, while versioning with a JCR would work nicely for activation, it will not work in the case of deactivation. Once the latest version of a piece of content has been deleted, it doesn't matter that we still have previous versions of it in the repository, since the JCR will not allow us to restore any older version.

Another option would be to move the content to some parking area of the site and move it back in case of rollback. That would work, but ... hey, this is your website we are talking about! You surely don't want to have some secret stash of content there. Not only does it not look nice, but such content is potentially exposed to the whole wide world and can be linked to by other web content, or manipulated by mistake by some administrator who would not know where it came from if he logged in right in the middle of a Magnolia multi-site transaction. So what Magnolia does instead is make this temporary storage place yet another secure workspace. This way it can stash all the old versions of content safely away without polluting a live Magnolia website, and it can easily restore them during rollback if necessary.

The only open issue that remains is that there could still be a brief period of inconsistency on a publicly visible Magnolia server between the moment the transaction was initiated and the moment it is rolled back (in the aforementioned scenario of a failure to deliver to one of the other public servers).
Fortunately, Magnolia solves this problem thanks to the cache mechanism included on each public server. Content is served to website visitors from the cache, and the cache is flushed only after the activation process is completely finished. So during the process of activation, a public server keeps displaying old cached content to the outside world until all activation work is done.

Happy publishing.
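
For readers who prefer code, the all-or-nothing activation described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's real implementation: the `Subscriber` interface, its `receive`, `commit` and `rollback` methods, the `InMemorySubscriber` stand-in and the `TransactionalActivator` class are all hypothetical names. The `stash` map plays the role of the separate secure workspace, and an error response or exception stands in for an instance that fails or does not answer in time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical subscriber contract; these method names are illustrative, not Magnolia's API. */
interface Subscriber {
    boolean receive(String path, String content); // transmission phase; false signals an error
    void commit(String path);                     // drop the temporary data
    void rollback(String path);                   // restore the previous state
}

/** In-memory stand-in for a public instance with a separate stash for old content. */
final class InMemorySubscriber implements Subscriber {
    final Map<String, String> live = new HashMap<>();
    final Map<String, String> stash = new HashMap<>(); // plays the role of the extra workspace
    boolean failing; // simulates an instance that returns errors

    public boolean receive(String path, String content) {
        if (failing) {
            return false;
        }
        stash.put(path, live.get(path)); // a null value means "did not exist before"
        live.put(path, content);
        return true;
    }

    public void commit(String path) {
        stash.remove(path);
    }

    public void rollback(String path) {
        if (!stash.containsKey(path)) {
            return; // nothing was received, so there is nothing to undo
        }
        String old = stash.remove(path);
        if (old == null) {
            live.remove(path);
        } else {
            live.put(path, old);
        }
    }
}

final class TransactionalActivator {
    private final List<? extends Subscriber> subscribers;

    TransactionalActivator(List<? extends Subscriber> subscribers) {
        this.subscribers = subscribers;
    }

    /** Publishes to all subscribers or to none; returns true when committed. */
    boolean activate(String path, String content) {
        List<Subscriber> reached = new ArrayList<>();
        for (Subscriber subscriber : subscribers) {
            boolean ok;
            try {
                ok = subscriber.receive(path, content);
            } catch (RuntimeException e) {
                ok = false; // treat an exception like an error response
            }
            if (!ok) {
                reached.forEach(s -> s.rollback(path)); // undo on everyone already updated
                return false;
            }
            reached.add(subscriber);
        }
        reached.forEach(s -> s.commit(path));
        return true;
    }
}
```

With two subscribers of which one fails, `activate("/news", "new")` returns `false` and the healthy subscriber still holds its old content. Once the failing instance recovers or is removed from the list, the same call commits on all remaining subscribers.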