
JRuby Performance on Glassfish V3 -- Part 1

Posted by eileeny on December 10, 2009 at 10:12 AM PST

One of the new features of Glassfish V3 is directory deployment of Ruby applications. This makes it much easier to develop and deploy Ruby applications on Glassfish, since developers no longer need to package Ruby apps as WARs using goldspike or warbler, as was required for Glassfish V2. However, there are other good reasons to run your Rails applications on Glassfish V3. One of those reasons is performance.

This blog will be broken into three parts. The first part discusses the general runtime characteristics of JRuby and how to tune for them. The second part will discuss Glassfish V3 tunings in particular. The third part will compare Glassfish V3 performance against some other popular Rails web servers.

Performance tuning for JRuby

Command line options to optimize heap size and garbage collection can improve JRuby performance by as much as 40%, so it's important to recognize the runtime characteristics of JRuby applications in order to tune the VM properly. If you're not familiar with the Sun VM, especially its heap management and garbage collection strategies, you may want to check out one of the many garbage collection tuning guides before reading this blog.

JRuby uses the client compiler by default, with a default heap size of -Xmx500m and a stack size of -Xss1024k. For development purposes, these values are probably sufficient. For maximum runtime performance, though, it's generally better to use the server compiler, which is enabled with the -server flag. It may also be necessary to adjust the default heap size to minimize application-disrupting garbage collections. In addition, you should upgrade to the latest JVM and JRuby versions when you can. Bug fixes and performance improvements in new releases can be significant, so upgrading is an easy win for the developer.
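After changing flags, it can help to confirm what the JVM actually picked up. The following is a minimal sketch using only standard Java APIs; the class name JvmSettingsCheck and its methods are made up for illustration and are not part of JRuby or Glassfish.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports the settings the running JVM picked up.
class JvmSettingsCheck {

    // VM name; on HotSpot it normally contains "Client" or "Server".
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    // Maximum heap the VM will attempt to use, in megabytes (reflects -Xmx).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // JVM options as reported by the runtime, such as -Xmx768m.
    // Switches handled by the launcher itself may not appear here.
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("VM name:  " + vmName());
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        System.out.println("Flags:    " + jvmFlags());
    }
}
```

Running it with the same options you intend to give JRuby, for example java -server -Xmx768m JvmSettingsCheck, should show whether the server VM and the larger heap are in effect.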

JRuby runtimes are generally characterized by lots of short-lived objects. For the default garbage collector used by the Sun JVM, this means increasing the young generation size relative to the tenured generation size, so that short-lived objects die in the young generation instead of being promoted to the tenured generation. The young generation size can be set explicitly using -XX:NewSize=<value> -XX:MaxNewSize=<value>, but I generally prefer the -XX:NewRatio=<value> flag, which sizes the young generation relative to the size of the heap. -XX:NewRatio=2 instructs the VM to set a 1:2 ratio between the young and tenured generations, resulting in a young generation which is approximately 1/3 of the total heap. This is set by default for Glassfish V3.
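The NewRatio arithmetic can be sketched in a few lines. NewRatio is the size of the tenured generation divided by the size of the young generation, so the young generation gets heap / (NewRatio + 1). The class and method names below are illustrative only, and the real VM rounds and aligns these sizes, so treat the numbers as approximations.

```java
// Hypothetical helper: estimates generation sizes from -Xmx and -XX:NewRatio.
class YoungGenSizing {

    // NewRatio = tenured / young, so young = heap / (NewRatio + 1).
    static long youngGenMb(long heapMb, int newRatio) {
        if (heapMb <= 0 || newRatio <= 0) {
            throw new IllegalArgumentException("heapMb and newRatio must be positive");
        }
        return heapMb / (newRatio + 1);
    }

    // Whatever is left of the heap goes to the tenured generation.
    static long tenuredMb(long heapMb, int newRatio) {
        return heapMb - youngGenMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 768;
        for (int ratio : new int[] {2, 8}) {
            System.out.println("NewRatio=" + ratio
                    + " young=" + youngGenMb(heapMb, ratio) + " MB"
                    + " tenured=" + tenuredMb(heapMb, ratio) + " MB");
        }
    }
}
```

With a 768 MB heap, NewRatio=2 gives a young generation of roughly 256 MB, while NewRatio=8 leaves it with only about a ninth of the heap.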

JRuby is also characterized by a high number of classes loaded in memory. This is usually the reason the VM will complain about OutOfMemory errors, especially when running with multiple runtimes. The easy fix for this situation is to increase the size of the permanent generation, where loaded classes reside. Even without multiple runtimes, it's advisable to increase the PermGen size to avoid putting pressure on it and triggering a full garbage collection. For Glassfish, we generally recommend 20M per runtime, but this could vary depending on the complexity of your application. Glassfish V3 sets -XX:MaxPermSize=192m. On other JRuby servers, I like to size the PermGen at about 50M per runtime. To set the PermGen size:

-XX:PermSize=50m -XX:MaxPermSize=50m
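With more than one runtime, the per-runtime figure has to be multiplied out before it goes on the command line. Here is a small sketch of that calculation; it assumes the per-runtime estimate scales linearly, and PermGenFlags is a hypothetical name used only for this example. The flags themselves apply to the PermGen-era HotSpot VMs discussed here.

```java
// Hypothetical helper: builds PermGen flags from a per-runtime estimate.
class PermGenFlags {

    // Assumes PermGen demand grows linearly with the number of runtimes.
    static String flagsFor(int runtimes, int mbPerRuntime) {
        if (runtimes <= 0 || mbPerRuntime <= 0) {
            throw new IllegalArgumentException("runtimes and mbPerRuntime must be positive");
        }
        int totalMb = runtimes * mbPerRuntime;
        // Setting the initial and maximum size to the same value avoids resizing.
        return "-XX:PermSize=" + totalMb + "m -XX:MaxPermSize=" + totalMb + "m";
    }

    public static void main(String[] args) {
        System.out.println(flagsFor(1, 50));
        System.out.println(flagsFor(4, 50));
    }
}
```

For four runtimes at 50M each, this yields -XX:PermSize=200m -XX:MaxPermSize=200m.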

Finally, it's always a good idea to check what's going on in the heap with one of the many tools that are available. For a graphical interface, use jconsole. Otherwise, jstat is a good low-overhead way to track your memory utilization:

jstat -gcutil <pid> <time interval>
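If you would rather look at the same information from inside the process, the standard java.lang.management API exposes each memory pool. The sketch below is not a replacement for jstat; HeapPools is a made-up class name, and the pool names it prints depend on the VM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarizes the usage of each JVM memory pool.
class HeapPools {

    static List<String> poolReport() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long usedKb = usage.getUsed() / 1024;
            long max = usage.getMax(); // -1 when the maximum is undefined
            String maxText = (max < 0) ? "undefined" : (max / 1024) + " KB";
            lines.add(pool.getName() + ": used " + usedKb + " KB, max " + maxText);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : poolReport()) {
            System.out.println(line);
        }
    }
}
```

A young generation pool that is constantly near its maximum, or a permanent generation pool close to its limit, is a hint that the corresponding flag deserves another look.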

On the Glassfish gem with four runtimes, Java command line tuning produced these results on an internal workload similar to the Olio project:

Compiler         | Heap Size      | PermGen Size | NewRatio    | GC Algorithm            | Score
default (client) | default (500m) | default (8)  | default     | default                 | 78
server           | default        | 192m         | default (8) | default                 | 91
server           | 768m           | 256m         | default (8) | default                 | 97
server           | 768m           | 378m         | 2           | default                 | 113
server           | 1024m          | 256m         | 2           | -XX:+UseParallelOldGC   | 99
server           | 1024m          | 256m         | 2           | -XX:+UseConcMarkSweepGC | 80

As you can see, I also tried some other types of garbage collection, but these did not improve performance on this workload. As with all performance tuning recommendations, it's best to take things with a grain of salt until you've proven them on your own applications and systems.
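When comparing collectors on your own workload, it helps to record how often each one ran and how long it spent. As a rough sketch using the standard GarbageCollectorMXBean API (GcStats is a hypothetical class name, and the collector names reported vary by VM and by the GC flags chosen):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports collection counts and times per collector.
class GcStats {

    static List<String> collectorReport() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are -1 when the collector does not report them.
            long count = gc.getCollectionCount();
            long timeMs = gc.getCollectionTime();
            lines.add(gc.getName() + ": " + count + " collections, " + timeMs + " ms total");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : collectorReport()) {
            System.out.println(line);
        }
    }
}
```

Capturing this report at the end of a benchmark run for each flag combination gives a simple way to see whether a different collector really reduced time spent in garbage collection.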
