A Case Study of JVM HotSpot Flags
Just recently I was engaged to assist with an application that wasn’t behaving. The application, running in a 1.7.0_45 JVM, relied heavily on a 3rd party SaaS framework. That vendor provided my client with a list of 26 different JVM flags that should be set. When faced with this long list of flags I couldn’t resist asking why all the flags, and why these flags. After all, there are more than 700 product flags defined in the JVM and, to be honest, I’ve only a vague idea of the effect most of them may have on a runtime. Take the flag AggressiveOpts for example. At JavaOne I was involved in a discussion about flags with a group that included a couple of members of the HotSpot team. We all had a guess at what AggressiveOpts currently does, but it was only when we checked the source code that we realized that all of our guesses were incorrect. Meanwhile, the vendor’s recommendations hadn’t changed since JDK 1.5.0. Let’s look at these flags to see which are needed, which need further investigation, and which are redundant.
-server -ea -Xmx4G -Xms4G -XX:PermSize=300M -XX:MaxPermSize=300M -XX:NewRatio=2 -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:InitialTenuringThreshold=15 -XX:MaxTenuringThreshold=15
-XX:SurvivorRatio=4 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
-XX:TargetSurvivorRatio=90 -XX:+DisableExplicitGC -XX:+CMSClassUnloadingEnabled
-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -XX:+AggressiveOpts
-XX:+UseBiasedLocking -XX:+UseCompressedOops -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled
What an eyesore! Just looking at all those flags gave me a headache, and that was before I started to sort out what effect they may be having on the runtime. First things first, let’s identify the redundant flags by figuring out what the default settings are. To do that I ran
java -XX:+PrintFlagsFinal in a Unix shell and searched through the output (way too long to post here). The table below summarizes the default values for each flag.
|Flag||Default value|
|-server||TRUE in 64-bit JVM, FALSE in 32-bit JVM|
|-Xmx4G -Xms4G||1/4 of physical RAM (maximum) / 1/64 of physical RAM (initial)|
|-XX:PermSize=300M||20M for my machine|
|-XX:MaxPermSize=300M||82M for my machine|
|-XX:+UseParNewGC||FALSE, TRUE when using CMS|
|-XX:+CMSParallelRemarkEnabled||N/A, FALSE when using CMS|
|-XX:InitialTenuringThreshold=15||7 or 15 but 15 in this case|
|-XX:MaxTenuringThreshold=15||6 or 15 but 15 in this case|
|-XX:+UseCMSInitiatingOccupancyOnly||N/A, FALSE when using CMS|
|-XX:CMSInitiatingOccupancyFraction=70||N/A, 69 when using CMS|
|-XX:+CMSClassUnloadingEnabled||N/A, FALSE when using CMS|
|-XX:+UseCompressedOops||TRUE (64-bit JVM only)|
|-XX:+ExplicitGCInvokesConcurrent||FALSE (only applicable with CMS)|
|-XX:+CMSPermGenSweepingEnabled||Deprecated since 1.6|
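The same values can also be read from inside a running HotSpot JVM, which is handy when you want to check what a specific deployment actually ended up with. The sketch below uses the HotSpot-specific HotSpotDiagnosticMXBean; the class name FlagDefaults and the choice of flags are mine for illustration, and it assumes a HotSpot-based JVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the effective value of a few HotSpot flags.
class FlagDefaults {

    /** Returns "name = value (origin)" for a single HotSpot flag. */
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist.
        VMOption option = bean.getVMOption(flagName);
        // The origin tells you whether the value is a default, was chosen
        // ergonomically, or was set on the command line.
        return option.getName() + " = " + option.getValue()
                + " (" + option.getOrigin() + ")";
    }

    public static void main(String[] args) {
        String[] flags = {"MaxHeapSize", "NewRatio", "SurvivorRatio", "MaxTenuringThreshold"};
        for (String flag : flags) {
            System.out.println(describe(flag));
        }
    }
}
```

Flags reported with a default origin and the same value the vendor asked for are the candidates for removal.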
The first sensible thing to do is eliminate the deprecated flag and those flags whose value is set to the default. Doing so leaves us with:
-ea -Xmx4G -Xms4G -XX:PermSize=300M -XX:MaxPermSize=300M -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70 -XX:TargetSurvivorRatio=90 -XX:+DisableExplicitGC
-XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+ExplicitGCInvokesConcurrent
With less clutter it becomes easier to see that CMSClassUnloadingEnabled has been duplicated. Let’s get rid of the duplicate before sorting through the remaining 15 to see what they do and whether they are needed.
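Spotting a repeated flag by eye gets harder as the list grows, so it can be worth automating. Here is a minimal sketch that reports flags appearing more than once; the class name DuplicateFlags is hypothetical, the input is an excerpt of the vendor’s list, and it only catches exact repeats (the same flag given twice with different values would need extra parsing).

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: finds flags that are repeated verbatim on a command line.
class DuplicateFlags {

    /** Returns every flag that appears more than once, in order of first repeat. */
    static Set<String> findDuplicates(String commandLine) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String flag : commandLine.trim().split("\\s+")) {
            // Set.add returns false when the element is already present.
            if (!seen.add(flag)) {
                duplicates.add(flag);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Excerpt of the vendor-supplied flags from the start of the article.
        String vendorFlags =
                "-XX:TargetSurvivorRatio=90 -XX:+DisableExplicitGC -XX:+CMSClassUnloadingEnabled "
              + "-XX:+UseBiasedLocking -XX:+UseCompressedOops -XX:+ExplicitGCInvokesConcurrent "
              + "-XX:+CMSClassUnloadingEnabled";
        System.out.println(findDuplicates(vendorFlags)); // prints [-XX:+CMSClassUnloadingEnabled]
    }
}
```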
DisableExplicitGC, ExplicitGCInvokesConcurrent, and RMI Properties
One of the ways to programmatically trigger a garbage collection is to call System.gc(). However, it’s rarely a good idea to do this. Calling for a collection generates a lot of overhead, and if you call it at some random point in time it’s very likely that you won’t get much in return for the call. In other words, the ROI on making the call is often very low. So what do you do if someone has decided to make that call but it’s in code you can’t touch? One way of dealing with this problem is to set DisableExplicitGC. Another is simply to have the call invoke a Concurrent Mark Sweep cycle instead of a Full GC. You can do that by setting ExplicitGCInvokesConcurrent. Setting both flags doesn’t make sense, as the former will override the latter. Why both were set in this case is hard to say, but one could speculate that it may be in response to RMI triggering a periodic Full GC. Unfortunately you can’t turn that off, but you can throttle it back by setting the properties sun.rmi.dgc.client.gcInterval and sun.rmi.dgc.server.gcInterval. Below is the code that sets the default value for the client (sun.rmi.transport.DGCClient). From this you can see that the default value is 3600000 milliseconds (for both the client and server), which means these properties do not need to be explicitly set.
/** maximum interval between complete garbage collections of local heap */
private static final long gcInterval = // default 1 hour
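The general pattern here, a numeric system property with a fallback default, can be illustrated with the public Long.getLong API. This is a sketch of the pattern only, not the JDK’s own code; the class and method names are hypothetical, and the one-hour default is the value quoted above.

```java
// Hypothetical illustration of "system property with a default"; not JDK source.
class RmiGcIntervalDefaults {

    // One hour in milliseconds, the default quoted in the article.
    static final long ONE_HOUR_MS = 3600000L;

    /** Client-side interval: the property value if set, otherwise one hour. */
    static long clientGcInterval() {
        return Long.getLong("sun.rmi.dgc.client.gcInterval", ONE_HOUR_MS);
    }

    /** Server-side interval: the property value if set, otherwise one hour. */
    static long serverGcInterval() {
        return Long.getLong("sun.rmi.dgc.server.gcInterval", ONE_HOUR_MS);
    }

    public static void main(String[] args) {
        System.out.println("client interval (ms): " + clientGcInterval());
        System.out.println("server interval (ms): " + serverGcInterval());
    }
}
```

Run without any -D options, both lines report the default, which is why passing -Dsun.rmi.dgc.client.gcInterval=3600000 changes nothing.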
Xmx, Xms, NewRatio, SurvivorRatio and TargetSurvivorRatio
The -Xmx flag sets the maximum size of Java heap. Every JVM needs some way of specifying a maximum heap size, and HotSpot also accepts the older -mx spelling of the same setting. This configuration also sets the minimum heap size with -Xms, and in this case sets it equal to the maximum. Configuring heap this way will prevent it from resizing. The downside of fixing min heap size to max heap size is that the JVM will not be able to adapt to changes in load. This ability to adapt allows the JVM to run more efficiently as the demand for memory changes. However, there are times when you can’t seem to get the JVM to settle on a configuration that allows it to run smoothly, and in those cases you may want to consider fixing min heap to max heap. I’d only resort to that after first proving that the adaptive sizing is not working as expected.
Java heap is further broken up into a young and an old generation. In this case the size of young will be defined by NewRatio, which has a default value of 2, meaning old is twice the size of young. Java heap is fixed at 4096M, which results in young being 1365.3M and old being 2730.7M. Young is further split into Eden and two Survivor spaces. The size of the Survivor spaces is determined by the SurvivorRatio flag. In this case that value is 4, meaning Eden is four times the size of a single Survivor space, so each of the Survivor spaces is roughly 228M, leaving Eden with roughly 910M. One of the tricks used to tune GC is to control the frequency of young generation collections. The time between collections is approximately the size of Eden divided by the allocation rate, so if you know your allocation rates you can control GC frequency by sizing Eden accordingly. A full discussion of how to tune is outside the scope of this article, but suffice it to say that you’ll want to use garbage collection logs to help you make memory pool sizing decisions.
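The sizing arithmetic can be reproduced in a few lines. This is a back-of-the-envelope sketch, assuming old = NewRatio × young and Eden = SurvivorRatio × one Survivor space; the class name and the 100 MB/s allocation rate are made up for illustration, and a real JVM rounds pool sizes to its own alignment, so the numbers in a GC log will differ slightly.

```java
// Hypothetical calculator for the pool sizes implied by NewRatio and SurvivorRatio.
class GenerationSizing {

    /** Young size: heap / (NewRatio + 1), because old = NewRatio * young. */
    static double youngMb(double heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    /** One Survivor space: young / (SurvivorRatio + 2), because Eden = SurvivorRatio * survivor. */
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden: whatever is left of young after the two Survivor spaces. */
    static double edenMb(double youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    /** Approximate seconds between young collections at a steady allocation rate. */
    static double secondsBetweenYoungGcs(double edenMb, double allocationMbPerSec) {
        return edenMb / allocationMbPerSec;
    }

    public static void main(String[] args) {
        double heap = 4096;                      // -Xmx4G -Xms4G
        double young = youngMb(heap, 2);         // -XX:NewRatio=2
        double survivor = survivorMb(young, 4);  // -XX:SurvivorRatio=4
        double eden = edenMb(young, 4);
        System.out.println("young    (MB): " + young);
        System.out.println("old      (MB): " + (heap - young));
        System.out.println("survivor (MB): " + survivor);
        System.out.println("eden     (MB): " + eden);
        // Assumed allocation rate of 100 MB/s, purely for illustration.
        System.out.println("seconds between young GCs: " + secondsBetweenYoungGcs(eden, 100));
    }
}
```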
Another aspect of Survivor spaces is TargetSurvivorRatio. This (poorly named?) flag indicates, as a percentage, how much of a Survivor space should be occupied after a young gen collection. If the amount of data after a collection exceeds this threshold, then older data will be tenured to meet the target occupancy. At the time of writing the efficacy of this setting is unclear, and thus we would want to benchmark to see if it has any effect on GC efficiency.
Technically speaking, Perm space is not considered to be a memory pool. In JDK 8 Perm space has been removed in favour of Metaspace, and so the flags PermSize and MaxPermSize will be deprecated in 8 (ignored with a warning message). In this case the flags have been set so that Perm space is 300M and it will never resize. Having it resize to the size needed rarely has any negative side effects, but not allowing it to resize may result in longer than necessary pause times for collections of old space. Thus setting a maximum Perm space size is OK; fixing it to that size generally isn’t. Again, you’ll want to use the GC logs to understand how big Perm should be.
UseConcMarkSweepGC, CMSParallelRemarkEnabled, UseCMSInitiatingOccupancyOnly and CMSInitiatingOccupancyFraction
By default the JVM will choose PSYoung, the parallel collector for young, and PSOld, the parallel collector for perm, old and young spaces. By specifying UseConcMarkSweepGC the JVM will use the mostly concurrent Concurrent Mark Sweep (CMS) collector. The CMS collector will not work with PSYoung and instead works with a different young gen parallel collector known as ParNew. If you specify UseConcMarkSweepGC it will automatically configure the JVM to use the ParNew collector.
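If you want to confirm which collectors a given set of flags actually selected, the standard GarbageCollectorMXBean API will tell you from inside the process. A small sketch follows; the class name ActiveCollectors is hypothetical, and the names printed are whatever the running JVM reports, which vary with JVM version and flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the collectors the running JVM has selected.
class ActiveCollectors {

    /** Names of the collectors as reported by the JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time may be -1 if the collector does not report them.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time (ms)=" + gc.getCollectionTime());
        }
    }
}
```

Running it once with and once without -XX:+UseConcMarkSweepGC on a JVM that supports CMS should show the young and old collectors changing together.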
The CMS collector is known as a Just-In-Time collector in that it will start before heap is full and must finish before heap fills. There are two techniques used to control when the collection cycle starts. Both rely on the CMSInitiatingOccupancyFraction (IOF). On each collection of young the JVM will track (and estimate) the amount of data that will be promoted. If the amount promoted plus the current occupancy of old space exceeds the IOF, then a CMS cycle will be triggered. Having old space fill before the CMS cycle completes is known as a concurrent mode failure (CMF). The JVM will revert to a full, single-threaded, stop-the-world collection to handle this failure. As you can imagine, having the JVM start the collection at the right time is critical to good application performance. In this case the flag UseCMSInitiatingOccupancyOnly tells the JVM to use the IOF without regard to recent rates of promotion. The danger here is that if rates of promotion suddenly become higher than expected you risk triggering CMFs. However, it might be best to ignore recent rates of promotion if your application promotes in bursts. In this case the CMSInitiatingOccupancyFraction has been set to 70%, which is only 1% greater than the default, so it’s likely that neither of these flags makes any difference.
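To make the difference between the two triggering modes concrete, here is a toy model of the decision as described in the paragraph above. It is deliberately simplified and is not HotSpot’s actual algorithm; the class, method and parameter names are all hypothetical, and the sample numbers are invented.

```java
// Toy model of the CMS start decision described in the text; NOT HotSpot's real logic.
class CmsTriggerModel {

    /**
     * @param oldUsedMb                  current occupancy of old space
     * @param oldCapacityMb              total size of old space
     * @param expectedPromotionMb        estimate of data about to be promoted
     * @param initiatingOccupancyPercent value of CMSInitiatingOccupancyFraction
     * @param occupancyOnly              true models UseCMSInitiatingOccupancyOnly
     */
    static boolean shouldStartCycle(long oldUsedMb, long oldCapacityMb,
                                    long expectedPromotionMb,
                                    int initiatingOccupancyPercent,
                                    boolean occupancyOnly) {
        // With occupancyOnly the promotion estimate is ignored entirely.
        long projectedMb = occupancyOnly ? oldUsedMb : oldUsedMb + expectedPromotionMb;
        double projectedPercent = 100.0 * projectedMb / oldCapacityMb;
        return projectedPercent > initiatingOccupancyPercent;
    }

    public static void main(String[] args) {
        // Invented numbers: old space of 2731M with 1800M used, 200M expected to be promoted.
        System.out.println(shouldStartCycle(1800, 2731, 200, 70, false)); // prints true
        System.out.println(shouldStartCycle(1800, 2731, 200, 70, true));  // prints false
    }
}
```

In this example the promotion estimate pushes the projected occupancy past 70%, so the adaptive mode starts a cycle earlier than the occupancy-only mode would. That head start is what protects you when promotion rates spike.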
Finally, remark, the 4th phase of the CMS, is by default single threaded. You can parallelize it using the flag CMSParallelRemarkEnabled. That said, there are indications that dirty card rescanning, one of the steps of the remark phase, can be a source of false sharing. There are also other data structures that will be contended for so it’s not a given that a parallelized remark will be faster than the single threaded version. The decision to use the parameter should come from a benchmark or measurements in your production system.
By default the CMS collector will collect in Perm space but it will not unload classes. If you are frequently loading classes that may be unloaded in the future, then you risk running out of Perm space. For this reason you will most likely need to set CMSClassUnloadingEnabled. Be aware that it will result in longer concurrent and remark (pause) collection times.
The AggressiveOpts flag is one that turns on (or off) flags that resulted in very good performance on a set of benchmarks. As you can imagine, this set of flags can change from build to build. That said, the list of optimizations it applies hasn’t changed in a long time, and in fact the only optimizations it enables are: aggressive elimination of auto boxing, setting a very large auto boxing cache, and resetting the BiasedLockingStartupDelay setting. The decision to use the parameter should come from a benchmark or measurements in your production system.
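The auto boxing cache is the easiest of these to observe from plain Java. The sketch below checks whether Integer.valueOf hands back a shared instance for a given value; the class name is hypothetical. Only the range -128 to 127 is guaranteed to be cached, so the result for larger values depends on how the JVM was configured (I believe the relevant HotSpot flag is -XX:AutoBoxCacheMax, but treat that as an assumption to verify against your JVM).

```java
// Hypothetical probe for the Integer auto boxing cache.
class AutoBoxCacheDemo {

    /** True if two boxings of the same value yield the very same object. */
    static boolean isCached(int value) {
        // Reference comparison on purpose: we are testing object identity, not equality.
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println("127 cached:  " + isCached(127));   // always true per the Java spec
        System.out.println("1000 cached: " + isCached(1000));  // depends on the cache size in effect
    }
}
```

Comparing the second line with and without the flag under test is a quick way to see whether the larger cache is actually in effect on your build.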
Of the 26 flags we were very quickly able to eliminate 11 of them as either being at the default setting or a duplication. With a deeper look we were able to devise criteria to evaluate the efficacy of the remaining flags and come up with a recommended starting point. That starting point was:
-ea -mx4G -XX:MaxPermSize=300M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:SurvivorRatio=4 -XX:+DisableExplicitGC
Now that the configuration is a bit more approachable, I’ll use the GC logs obtained either from production or from a suitable QA/benchmark environment to evaluate whether these settings should stand as is. As well, I’ll start looking to see if any of the other flags should be turned back on and, if so, what values should be used.
I think it’s fantastic that we have a technology that is so configurable, so flexible. But this flexibility is a double-edged sword, and one shouldn’t just jump blindly into using all of that configurability. One thing I do know is that your application’s performance does depend on how it’s configured. Messing up on even one flag can have a detrimental effect on the performance of your application, and getting it wrong is far easier than getting it right. And quite often, the JVM does get it right, right out of the box.