The Source for Java Technology Collaboration



Jan Haderka's Blog

Jan Haderka is an independent software developer and technology consultant focusing on desktop and enterprise applications. He has been writing software for a number of years, and since 1995 he has been doing so for a living, working for various companies from small startups to big corporations. Lately he has been involved in a number of projects, SwingLabs and Magnolia among others.



Caching strategies for Magnolia 3.6

Posted by rah003 on June 26, 2008 at 09:19 AM | Permalink | Comments (1)

The stable release of Magnolia 3.6 will soon be out, and it will ship with a brand-new cache system implemented from scratch. The intention was not to reinvent the wheel. Instead, the new cache system decouples Magnolia's presentation layer from the underlying cache mechanism and allows any cache engine of choice to be plugged in. Magnolia 3.6 will be distributed with EHCache as the default cache engine, but from the 3.6 release onwards anyone willing to implement a custom cache wrapper can run Magnolia with any other cache engine.
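Here is a minimal sketch of that kind of decoupling. The presentation layer talks only to an engine-neutral interface. `CacheEngine` and `MapCacheEngine` are hypothetical names invented for this illustration and are not Magnolia's actual API. An EHCache-based wrapper would implement the same interface by delegating to EHCache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical engine-neutral contract the presentation layer would code against. */
interface CacheEngine {
    Object get(Object key);

    void put(Object key, Object value);

    void remove(Object key);

    void clear();
}

/** Trivial map-backed wrapper; a wrapper for another engine would delegate the same way. */
class MapCacheEngine implements CacheEngine {
    private final Map<Object, Object> store = new ConcurrentHashMap<>();

    public Object get(Object key) {
        return store.get(key);
    }

    public void put(Object key, Object value) {
        store.put(key, value);
    }

    public void remove(Object key) {
        store.remove(key);
    }

    public void clear() {
        store.clear();
    }
}
```

Under this design, switching engines means supplying a different `CacheEngine` implementation. Code that depends only on the interface stays unchanged.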

With the new cache system in place, we, the Magnolia developers, started to look into which advanced cache strategies should be included: strategies that would let server administrators quickly reach the best cache configuration for their particular use case with minimal effort. New strategies will be available in Magnolia 3.6 Enterprise Edition (EE), and more will be added in subsequent versions of the EE. We have a bunch of ideas for strategies we would like to implement, based on custom solutions built for clients in the past. Some of the cache strategies are more difficult to implement than others. Which ones we deliver with Magnolia EE 3.6 depends on how many obstacles we hit while implementing them and how well we can test those strategies before the release.

The picture below shows the graph we ended up with on the whiteboard while brainstorming possibilities for advanced strategies to be delivered with Magnolia 3.6.

[Figure: whiteboard graph of cache strategy ideas (cache-thumb.png)]

One option is a more granular cache that minimizes server load by re-using previously generated output of single page elements. Without it, a page's complete HTML is re-created by retrieving all content elements every time the page is requested. The granular cache would instead fetch each content piece only once and keep it as a gzipped cache entry. It would then assemble the complete page by re-using unchanged cached content and re-building only the modified pieces.
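Below is a minimal, single-threaded sketch of the granular idea. It is not Magnolia's implementation. The class `FragmentCache`, the fragment ids and the `Supplier<String>` renderers (standing in for template rendering) are all hypothetical. Each page element is stored as a gzipped byte array. Invalidating one element forces only that element to be re-rendered the next time the page is assembled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical granular cache: one gzipped entry per page element (single-threaded sketch). */
class FragmentCache {
    private final Map<String, byte[]> entries = new HashMap<>();
    private int renders = 0; // how often a fragment actually had to be rendered

    /** Returns a fragment, rendering and gzipping it only if it is not cached yet. */
    String fragment(String id, Supplier<String> renderer) {
        byte[] gz = entries.get(id);
        if (gz == null) {
            renders++;
            gz = gzip(renderer.get());
            entries.put(id, gz);
        }
        return gunzip(gz);
    }

    /** Drops one modified fragment; all other fragments stay cached. */
    void invalidate(String id) {
        entries.remove(id);
    }

    /** Assembles a page from its fragments in layout order. */
    String page(List<String> layout, Map<String, Supplier<String>> renderers) {
        StringBuilder html = new StringBuilder();
        for (String id : layout) {
            html.append(fragment(id, renderers.get(id)));
        }
        return html.toString();
    }

    int renders() {
        return renders;
    }

    private static byte[] gzip(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // complete only after the gzip stream has been closed
    }

    private static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FragmentCache cache = new FragmentCache();
        Map<String, Supplier<String>> renderers = Map.of(
                "nav", () -> "<ul>nav</ul>",
                "body", () -> "<p>body</p>");
        List<String> layout = List.of("nav", "body");
        cache.page(layout, renderers); // renders both fragments
        cache.invalidate("body");      // only the body content changed
        // prints "<ul>nav</ul><p>body</p> renders=3"
        System.out.println(cache.page(layout, renderers) + " renders=" + cache.renders());
    }
}
```

The assembly step is the easy part. The hard part, as the following paragraph explains, is knowing which fragment ids to invalidate when a piece of content changes.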

Implementing the granular cache mechanism is actually quite tricky, because Magnolia allows content to appear as a page element in various other places of a website for dynamic content reuse. For example, the title of a page in one part of the site can be used in the navigation menu of some only remotely related part of the same site, just to provide a link to it. Tracking all the places where content elements appear is nearly impossible without constructing a complete content usage graph within Magnolia. Hence, a basic cache flushing strategy is simply to flush all cached pages of a website whenever any of its content has been updated.

The main problem with such an approach falls on the user who visits the website immediately after the cache has been flushed. That user suffers the performance penalty, because the server now needs time to generate all the new cache entries. While this approach to caching is quite safe, it is also quite heavyweight. Rather than relying on this brute-force solution, Magnolia EE 3.6 will be delivered with different re-caching strategies that are more appropriate for different scenarios.

Another typical issue is that many concurrent requests hit frequently accessed entries. (Isn't that why they are called frequently accessed, after all? ;) ) It is absolutely correct to trigger the generation of a cache entry upon the first request, since the entry does not exist yet. However, more requests for the same entry may hit the server while it is still being generated. Triggering the creation of the same entry again for those other requests would be a waste of computing power. In fact, requests issued after the first one won't get their results any faster than the first request can be served. Additionally, if each of them triggered the retrieval of the complete representation of a page composed from all the pieces in the repository, they would increase the load on the server and slow it down. Mind you, we are talking about millisecond differences here, but it still matters. For this reason, the cache system included in Magnolia 3.6 blocks all subsequent requests for an entry until that cache entry is ready, and then serves it to the visitors.

Then again, you don't want your server to hang forever if such an entry can't be generated, whether because the page doesn't exist, the content is corrupted or the template is broken. In Magnolia EE, access to the cache is distributed over multiple modules and comes from multiple threads. We therefore made an extra effort to make sure the new cache system never ends up in a deadlock.
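The blocking behaviour and the bounded wait described above can be sketched with standard `java.util.concurrent` classes. This is an illustration under assumptions, not Magnolia's cache code, and `BlockingCache` is a hypothetical name. The first request for a key installs a `FutureTask` and runs it. Concurrent requests for the same key wait on that future instead of rendering again, but only up to a timeout. A failed generation is removed so that a later request can retry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical blocking cache: each entry is generated once, waiters give up after a timeout. */
class BlockingCache<K, V> {
    private final ConcurrentMap<K, Future<V>> entries = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    final AtomicInteger generations = new AtomicInteger(); // how often a generator actually ran

    BlockingCache(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    V get(K key, Function<K, V> generator)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<V> future = entries.get(key);
        if (future == null) {
            FutureTask<V> task = new FutureTask<>(() -> {
                generations.incrementAndGet();
                return generator.apply(key);
            });
            future = entries.putIfAbsent(key, task);
            if (future == null) { // first request: this thread generates the entry
                future = task;
                task.run();
            }
        }
        try {
            // Later requests block here instead of rendering the page again,
            // but never longer than the timeout. (The generating thread itself
            // is not bounded in this sketch.)
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // Missing page, corrupt content, broken template: forget the failed
            // entry so that a later request can try again.
            entries.remove(key, future);
            throw e;
        }
    }
}
```

Because no lock is held while waiting, a waiting thread cannot deadlock on the cache itself. The worst case for a waiter is a `TimeoutException`.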

As for the new cache strategies considered for inclusion, let's look at a few examples of what we are considering implementing.

Serve Old Content While Re-caching Strategy: Instead of flushing the cache on update, old content is kept and served until the updated cache entry is created. This way, the first request for content after an update triggers the generation of a new entry, while all subsequent requests for the same entry are served the old content until it is re-cached.
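A minimal sketch of this strategy, assuming a simple in-memory cache. `StaleWhileRecachingCache`, `markStale` and the `Supplier<String>` renderer are hypothetical names, not Magnolia EE code. An update only marks the entry as stale. The first request that notices wins a compare-and-set and regenerates the entry. Requests arriving meanwhile skip regeneration and get the old content.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical "serve old content while re-caching" cache. */
class StaleWhileRecachingCache {
    private static final class Entry {
        volatile String content;
        volatile boolean stale;
        final AtomicBoolean recaching = new AtomicBoolean();

        Entry(String content) {
            this.content = content;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    /** Called on content update: the old entry is kept, only marked as stale. */
    void markStale(String key) {
        Entry e = entries.get(key);
        if (e != null) {
            e.stale = true;
        }
    }

    String get(String key, Supplier<String> renderer) {
        Entry e = entries.computeIfAbsent(key, k -> new Entry(renderer.get()));
        // Only the first request that sees the stale entry re-generates it ...
        if (e.stale && e.recaching.compareAndSet(false, true)) {
            try {
                e.stale = false; // an update arriving meanwhile marks it stale again
                e.content = renderer.get();
            } catch (RuntimeException ex) {
                e.stale = true;  // keep the old content and retry on a later request
            } finally {
                e.recaching.set(false);
            }
        }
        // ... while requests arriving during re-generation get the old content.
        return e.content;
    }
}
```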

Eager Re-cache Strategy: This strategy serves already cached content to website visitors while re-caching the most accessed pages (e.g. the top 100 high-traffic pages) in the background before flushing the cache. The idea is to have highly demanded content served by Magnolia at the highest possible speed at all times. With Eager Re-cache, visitors do not have to wait for changed content to be cached, because this has been done in the background before they access the page. The number of entries to be re-cached is configurable. This lets server administrators balance the trade-off between time-to-publish and a performance penalty for users.
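The following is a sketch of the eager re-cache idea under simple assumptions. Hit counts are tracked per path, and on publish the `topN` hottest paths are rendered into a fresh map while visitors are still served from the old one. The fresh map is then swapped in, which flushes everything that was not re-cached. `EagerRecache`, `onPublish` and `topN` are hypothetical names. `onPublish` would run in a background thread in practice.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical eager re-cache: warm the top N pages before replacing the cache. */
class EagerRecache {
    private volatile Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> hits = new ConcurrentHashMap<>();
    private final int topN; // trade-off: time-to-publish vs. visitors waiting for renders

    EagerRecache(int topN) {
        this.topN = topN;
    }

    String get(String path, Function<String, String> renderer) {
        hits.computeIfAbsent(path, p -> new LongAdder()).increment();
        return cache.computeIfAbsent(path, renderer);
    }

    /** Called after content was published. */
    void onPublish(Function<String, String> renderer) {
        List<String> hottest = hits.entrySet().stream()
                .sorted(Comparator.comparingLong(
                        (Map.Entry<String, LongAdder> e) -> e.getValue().sum()).reversed())
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        Map<String, String> fresh = new ConcurrentHashMap<>();
        for (String path : hottest) {
            fresh.put(path, renderer.apply(path)); // visitors still use the old cache meanwhile
        }
        cache = fresh; // swap: everything that was not re-cached is flushed
    }

    boolean isCached(String path) {
        return cache.containsKey(path);
    }
}
```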

Another example would be what we call the Content Driven Timeout Cache Strategy, which allows the author of a page to mark the page as expired after a time set by the author. Such a scenario can already be implemented today, but it requires collaboration between a Magnolia template developer and the page author instead of leaving the decision-making power completely in the hands of the author.

Why would you use such a timeout cache strategy? Just think of a page that includes the results of a news feed. The content editor wants to make sure it is updated at least once a day, even though there is no actual content update, because the data is fetched from a third-party RSS feed. Or imagine the opposite scenario, which could be handled with the same strategy: a page that triggers data mining mechanisms which are quite expensive in terms of server load, while its output has no dependency on any other content. You would be able to mark such a page as expiring only at specified intervals and to ignore all other content updates, knowing that the page is not affected by authors' modifications at all.
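Both scenarios can be sketched with two author-controlled settings per page: a maximum age and a flag for ignoring regular content updates. `TimeoutCache` and `PageSettings`, with its fields `maxAge` and `ignoreContentUpdates`, are hypothetical names for this illustration only. The current time is passed in explicitly to keep the sketch deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical content-driven timeout cache: expiry is controlled by page settings. */
class TimeoutCache {
    /** Settings an author would set on the page; these names are assumptions. */
    static final class PageSettings {
        final Duration maxAge;              // null means no time-based expiry
        final boolean ignoreContentUpdates; // true: only the timeout refreshes the page

        PageSettings(Duration maxAge, boolean ignoreContentUpdates) {
            this.maxAge = maxAge;
            this.ignoreContentUpdates = ignoreContentUpdates;
        }
    }

    private static final class Entry {
        final String html;
        final Instant created;

        Entry(String html, Instant created) {
            this.html = html;
            this.created = created;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    String get(String path, PageSettings settings, Instant now, Supplier<String> renderer) {
        Entry e = entries.get(path);
        boolean expired = e == null
                || (settings.maxAge != null && now.isAfter(e.created.plus(settings.maxAge)));
        if (expired) {
            e = new Entry(renderer.get(), now);
            entries.put(path, e);
        }
        return e.html;
    }

    /** Usual flush on content update, except for pages whose author opted out. */
    void onContentUpdate(Map<String, PageSettings> settingsByPath) {
        entries.keySet().removeIf(path -> {
            PageSettings s = settingsByPath.get(path);
            return s == null || !s.ignoreContentUpdates;
        });
    }
}
```

The RSS page would use something like `new PageSettings(Duration.ofDays(1), true)`. A regular page would keep `maxAge` null and be flushed on every content update as before.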

In sum, the new Magnolia cache implementation - apart from being more efficient in storing and serving the content - also allows for more advanced handling of the cached entries in the Magnolia Enterprise Edition.

SwingX 0.9.3 Released

Posted by rah003 on June 10, 2008 at 11:45 AM | Permalink | Comments (9)

Instead of concentrating on new features, the 0.9.3 release delivers bug fixes for the most pressing issues and should provide users with stable and usable code. This is all the more important because some API changes will be made as part of the API cleanup planned for release 0.9.4. So hopefully 0.9.3 will provide the necessary stability for those who won't be able or willing to jump on the bandwagon and accommodate the API changes in their code immediately.

Binaries should be in the central Maven repository within a week.

In the meantime, you can get the release from the downloads page.



Magnolia's New Transactional Activation Module

Posted by rah003 on June 05, 2008 at 10:27 AM | Permalink | Comments (2)

I haven't written anything about Magnolia on this blog yet, so here is a brief summary for those of you who have never heard of it. If you know what Magnolia is about, feel free to skip to the next paragraph. Magnolia is a Content Management System (CMS). It is Open Source. There is a Community Edition for those who need a CMS, are happy to work things out themselves and rely on the developer community for help. For corporate clients who require more reliable support, there is an Enterprise Edition.

Magnolia is developed in Java and is based on JSR-170 (Java Content Repository) compatible stores. By default it comes with the JSR-170 reference implementation, Jackrabbit, but it can be set up with any JSR-170 compatible repository. You can think of Magnolia as the glue between the web server and the backend repository.

[Figure: activation loop]

When having multiple public websites served from one authoring instance, you want to make sure that they stay in sync while publishing new pieces of content. This is exactly what the new transactional activation module of the upcoming Magnolia 3.6 release is about. This module ensures a transaction-safe publishing process for a multi-site setup. In effect, this means that Magnolia will publish updated content to either all or none of your public sites.

Of course, one could object that if one of the public sites goes down, you still want to publish to the remaining ones so that your readers get the news, but that is a different matter. If one of the servers goes down, you can still remove it from the list of subscribers and happily continue publishing. The point is that removing a public server from the list becomes a conscious decision of the admin within Magnolia. Adding it back then goes hand in hand with making sure all the content is republished.

So how can activation of content be transactional? Or better yet, what is a transaction in this context? The definition is simple: copy a piece of content (or a bunch of them) from an authoring instance to all subscribed public instances. Magnolia tries to copy the content to each and every subscriber. If all content has been propagated successfully, everything is fine and Magnolia issues a "commit" command, during which all the temporary data is cleared. If some of the subscribed Magnolia instances don't respond in time or return errors, the authoring instance sends all the other subscribers a "rollback" command asking them to restore their previous state.
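The all-or-nothing logic on the authoring side can be sketched as follows. This is a simplified, sequential illustration, not Magnolia's activation module. The `Subscriber` interface and its `transmit`/`commit`/`rollback` methods are hypothetical. It assumes that a timeout or error on a public instance surfaces as an exception from `transmit`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical coordinator on the authoring instance: all-or-nothing activation. */
class ActivationCoordinator {
    /** What a public instance must support; names are assumptions, not Magnolia's API. */
    interface Subscriber {
        void transmit(String path, String content) throws Exception; // may time out or fail

        void commit(String path);   // clear temporary data

        void rollback(String path); // restore the previous state
    }

    private final List<Subscriber> subscribers;

    ActivationCoordinator(List<Subscriber> subscribers) {
        this.subscribers = subscribers;
    }

    /** Returns true if content reached every subscriber, false if the others were rolled back. */
    boolean activate(String path, String content) {
        List<Subscriber> transmitted = new ArrayList<>();
        for (Subscriber s : subscribers) {
            try {
                s.transmit(path, content);
                transmitted.add(s);
            } catch (Exception e) {
                // One failing subscriber aborts the whole transaction.
                for (Subscriber done : transmitted) {
                    done.rollback(path);
                }
                return false;
            }
        }
        for (Subscriber s : transmitted) {
            s.commit(path);
        }
        return true;
    }
}
```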


Each activation transaction has multiple phases. It starts with a transmission phase, during which content is transmitted from a Magnolia authoring instance to the public instances. It ends with a collection phase, during which feedback (or lack thereof) is collected from all public instances. On each public instance, those phases can look different depending on the exact content of the transaction.

Let's look at those different scenarios in more detail:

  • activation of new content
    • transmit phase: receive the content and store it in the website workspace
    • commit action: nothing
    • rollback action: delete the content from the website workspace

  • activation of a new version of previously activated content
    • transmit phase: receive the content, move the existing version to the temporary storage place and replace the current version in the website workspace with the received content
    • commit action: delete the previous version of the content from temporary storage
    • rollback action: delete the new version of the content from the website workspace and restore the previous version from temporary storage

  • deactivation of content
    • transmit phase: move the content from the website workspace to temporary storage
    • commit action: delete the content from temporary storage
    • rollback action: restore the content from temporary storage back to the website workspace
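The three scenarios above translate almost line by line into code on the receiving side. In this sketch two plain maps stand in for the website workspace and the separate temporary storage workspace. `PublicInstance` and its methods are hypothetical names, and real JCR workspaces are replaced by maps for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical receiving side: a website workspace plus a separate temporary workspace. */
class PublicInstance {
    final Map<String, String> website = new HashMap<>();
    private final Map<String, String> temp = new HashMap<>();   // not reachable from the site
    private final Map<String, Boolean> wasNew = new HashMap<>(); // what rollback must undo

    /** Transmit phase for activation of new content or of a new version. */
    void activate(String path, String content) {
        String previous = website.put(path, content);
        if (previous == null) {
            wasNew.put(path, true);   // new content: nothing to park
        } else {
            temp.put(path, previous); // park the previous version
            wasNew.put(path, false);
        }
    }

    /** Transmit phase for deactivation: move the content out of the website workspace. */
    void deactivate(String path) {
        String previous = website.remove(path);
        if (previous != null) {
            temp.put(path, previous);
            wasNew.put(path, false);
        }
    }

    /** Commit: only the temporary data is cleared. */
    void commit(String path) {
        temp.remove(path);
        wasNew.remove(path);
    }

    /** Rollback: undo whatever the transmit phase did. */
    void rollback(String path) {
        Boolean isNew = wasNew.remove(path);
        if (isNew == null) {
            return;                               // nothing was transmitted
        }
        if (isNew) {
            website.remove(path);                 // activation of new content
        } else {
            website.put(path, temp.remove(path)); // new version or deactivation
        }
    }
}
```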

Now let us take a look at the implementation details. Since Magnolia uses JCR-based back-end storage, the obvious choice would be to use the versioning capabilities of the repository, which actually work like a charm. You want to publish a new version of the content and preserve the existing one? All you have to do is create a new version of the content, and you can easily restore the old one with the help of the rollback functionality. However, there is a catch: once content has been deleted, it cannot be restored. Hence, while JCR versioning would work nicely for activation, it will not work for deactivation. Once the latest version of a piece of content has been deleted, it doesn't matter that previous versions of it still exist in the repository, since the JCR will not allow us to restore any of them.

Another option would be to move the content to some parking area of the site and move it back in case of rollback. That would work, but ... hey, this is your website we are talking about! You surely don't want some secret stash of content there. Not only does it not look nice, but such content is potentially exposed to the whole wide world. It can be linked by other web content, or manipulated by mistake by an administrator who logs in right in the middle of a Magnolia multi-site transaction and doesn't know where it came from. So what Magnolia does instead is make this temporary storage place yet another secure workspace. This way it can stash all the old versions of content safely away without polluting a live Magnolia website, and it can easily restore them during rollback if necessary.


One issue remains open. There could still be a brief period of inconsistency on a publicly visible Magnolia server, between the moment the transaction is initiated and the moment it is rolled back (in the scenario mentioned above, where delivery to one of the other public servers fails). Fortunately, Magnolia solves this problem thanks to the cache mechanism included on each public server. Content is served to website visitors from the cache, and the cache is flushed only after the activation process has completely finished. So during activation, a public server keeps displaying the old cached content to the outside world until all activation work is done.
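A minimal sketch of that interplay, assuming the cache is flushed by a hook called once commit or rollback has completed. `ActivationAwareCache` and `activationFinished` are hypothetical names, and a map stands in for the website workspace. In this simplified form, only pages that were already cached before the transaction started are shielded from intermediate workspace states.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical public-side cache that is flushed only once activation is over. */
class ActivationAwareCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> website; // stands in for the website workspace

    ActivationAwareCache(Map<String, String> website) {
        this.website = website;
    }

    /** Visitors are served from the cache; the workspace is read only on a miss. */
    String serve(String path) {
        return cache.computeIfAbsent(path, p -> website.getOrDefault(p, "404"));
    }

    /** Called after commit or rollback, never in between. */
    void activationFinished() {
        cache.clear();
    }
}
```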

Happy publishing.



World Edition SVN

Posted by rah003 on April 16, 2008 at 01:09 PM | Permalink | Comments (0)

A few months ago the drive in my notebook stopped working and I lost some data, which I was very sorry about. That's when I started to look for a backup solution. Recently I found this little (size-wise) disk: the WD MyBook World Edition 2TB. Since the memory of losing my data is still fresh in my mind, I configured the drive to run in RAID-1 mode, even though it meant reducing its capacity to 1TB only. "1TB only", indeed: it reminds me of the beginning of the '90s, when I got a desktop with a super-king-size 80MB HDD.

Anyway, after discovering that there was Linux running inside the box and finding out how to enable ssh (thank you, Martin), it was just a matter of time before I tried to install an svn server on this neat box.

Installing svn was not as simple as I originally imagined, but it was not too complex either. If you get a box with the same Linux (which should be any of the World Edition drives, whatever the size), what's written below can save you from stumbling across the same problems I had and should make the installation smooth and simple.

If you don't know your Linux, you might screw up your World Edition hard drive big time. If you proceed, it is at your own risk, so proceed with caution!

Get started - Apache2

Log in to your box using ssh as whatever user you created during the installation of your box. Then execute the following to get Apache installed in the default location (/usr/local/apache2):

wget http://www.apache.org/dist/httpd/httpd-2.2.8.tar.gz
tar -xvzf httpd-2.2.8.tar.gz
cd httpd-2.2.8
./configure --enable-dav --enable-so --enable-maintainer-mode
make
sudo make install
cd ..

The installation itself takes quite a while, so don't be nervous. I would be very happy if WD offered boxes with a more powerful version of the little ARM brain they use, but the truth is that what they put inside is enough to do the job.

Let there be light - Neon

After successfully installing Apache2, you are ready to get started with Subversion:

wget http://subversion.tigris.org/downloads/subversion-1.4.6.tar.bz2
tar -xvjf subversion-1.4.6.tar.bz2

You will also need the Subversion dependencies (they unpack into the same folder as Subversion itself):

wget http://subversion.tigris.org/downloads/subversion-deps-1.4.6.tar.bz2
tar -xvjf subversion-deps-1.4.6.tar.bz2

Now we need to get rid of apr, since we will use the one that came with Apache2. This is important, as mixing different versions would result in segfaults later:

cd subversion-1.4.6
rm -rf apr
rm -rf apr-util

Build neon separately. Why? First, it is recommended in the Subversion installation guide, and second, I could not make it work when building it together with Subversion itself:

cd neon
# provide configure with expat location under apache
LDFLAGS="-L/usr/local/apache2/lib" CPPFLAGS="-I/usr/local/apache2/include" ./configure --with-ssl
make
sudo make install
cd ..
rm -rf neon

Big Finale - Subversion

Now we have all the mandatory prerequisites ready and can start building Subversion itself:

LDFLAGS="-L/usr/local/apache2/lib" CPPFLAGS="-I/usr/local/apache2/include" ./configure --with-ssl --with-apr=/usr/local/apache2 --with-apr-util=/usr/local/apache2 --with-apxs=/usr/local/apache2/bin/apxs
make

Before running make install, you need to update the dynamic linker configuration. Create /etc/ld.so.conf (or update it if you already have one) and add the following lines:

/usr/local/lib
/usr/local/apache2/lib

Afterwards, update the linker cache by executing:

sudo ldconfig

Finally, install Subversion:

sudo make install

Post Mortem

Now that Apache and Subversion are successfully installed, all that's left is to configure them. Add the following to the end of /usr/local/apache2/conf/httpd.conf:

<Location /svn>
    DAV svn
    SVNParentPath /shares/internal/SVN
    AuthType Basic
    AuthName "Subversion repository"
    AuthUserFile /etc/svn/passwd
    Require valid-user
</Location>

Other configuration steps you should perform:

  • change Listen from 80 to something else, e.g. 78
  • change User from daemon to apache
  • change Group from daemon to apache

Now that everything is configured, we need to create the user, the repository parent folder and the password file that the configuration refers to. To do so, execute:

sudo adduser apache
sudo mkdir -p /shares/internal/SVN
sudo chown -R apache:apache /shares/internal/SVN
sudo mkdir /etc/svn
sudo /usr/local/apache2/bin/htpasswd -c /etc/svn/passwd your_svn_user_name

With all the above done, the only remaining step is to start Apache:

sudo /usr/local/apache2/bin/apachectl start

Now you can create as many svn repositories under /shares/internal/SVN as you want. To do so, just run:

cd /shares/internal/SVN
sudo svnadmin create NewRepoName
sudo chown -R apache:apache NewRepoName

Each repository created this way will be accessible at http://ip_of_your_box:78/svn/NewRepoName

Phoenix - Come to life on restart

An optional step is to have svn start automatically when your box restarts. To enable this, just run:

cd /etc/init.d
sudo vi S98Apache

And add the following content to the file:

#!/bin/sh
#
# Start apache
#

start() {
        echo "Starting Apache"
        /usr/local/apache2/bin/apachectl start
}

stop() {
        echo "Stopping Apache"
        /usr/local/apache2/bin/apachectl -k stop
}

restart() {
        /usr/local/apache2/bin/apachectl -k restart
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    cleanup)
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac

exit $?

After saving the file, don't forget to make it executable by running:

sudo chmod 755 S98Apache

That's it. It is that simple. It took me a few hours to put the whole setup together and make it work, so hopefully this will save someone else some time. I should also mention that the installation was not done on a clean box: I already had some software installed from a previous installation of Firefly. Notably, I had installed sqlite3, which Subversion picked up during configuration and installation, but you should still be OK without it.

Good Luck!


