Memory Leak in Eclipse 4.2????
More than a few days ago a friend pinged me complaining that the recent Eclipse release was quite sluggish. Since she had taken my performance tuning seminar she knew exactly how to get and read a GC log but, as we all do, she was looking for a second opinion. After looking at the log for a minute it became quite apparent that the default configuration left the IDE starved for memory. After a bit more analysis we came to the conclusion that the 4.2 version of Eclipse is very likely suffering from a memory leak. I thought I’d jot down a few notes about the process that led us to that conclusion.
The first thing I did was to load the GC log up into Censum, a tool that I wrote for analyzing GC logs. One of the first views is a summary of what the collectors have been up to. The summary for Rabea’s log file is shown in figure 1.
Figure 1. Garbage Collection Summary
The first thing that struck me was the high Full GC to GC ratio. Overall there was very little time spent in the infamous stop-the-world GC pause. That said, when GC did run, it was performing the more expensive full collection 89% of the time. This in itself is a clear indication that something’s wrong. To find out why the JVM was so frequently running full collections I looked at heap occupancy after collection (figure 2) along with measures of tenuring behaviour.
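As a rough cross-check of that ratio outside of Censum, the events can be counted straight from the log. The sketch below is not how Censum works. It assumes the classic HotSpot log format in which a young collection line contains "[GC" and a full collection line contains "[Full GC"; the class name, method name, and sample lines are made up for illustration.

```java
import java.util.List;

final class FullGcRatio {
    // Fraction of logged GC events that are full collections.
    // Assumption: one event per line, tagged "[GC" or "[Full GC".
    static double fullGcRatio(List<String> logLines) {
        int young = 0;
        int full = 0;
        for (String line : logLines) {
            if (line.contains("[Full GC")) {
                full++;
            } else if (line.contains("[GC")) {
                young++;
            }
        }
        int total = young + full;
        return total == 0 ? 0.0 : (double) full / total;
    }

    public static void main(String[] args) {
        // Invented sample lines, not taken from the log discussed here.
        List<String> sample = List.of(
            "1.204: [GC 65536K->12040K(251392K), 0.0312 secs]",
            "9.871: [Full GC 180224K->171008K(251392K), 0.9120 secs]",
            "10.902: [Full GC 181000K->172544K(251392K), 0.9341 secs]",
            "12.115: [Full GC 182100K->173900K(251392K), 0.9555 secs]");
        System.out.printf("Full GC share: %.0f%%%n", 100 * fullGcRatio(sample));
    }
}
```

Logs from other collectors or with other logging flags tag their events differently, so the two string matches would need adjusting.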
Figure 2. Heap Occupancy after GC
The line at the top of the chart indicates the size of Java heap. In this case the JVM is consuming 512 megs. The blue dots are scavenges while the reds represent full collections. The clusters of red dots aren’t a good sign. Looking at occupancy we can see a slow but steady rise over time. Also, the clusters of full collections on the right hand side of the chart are much denser. These are typical signs of a memory leak. As heap fills up, the JVM struggles to free memory. This behavior is evidenced by the transition from mostly scavenges to mostly full collections.
That said, we do need to be careful when making a leak diagnosis. After all, this is an IDE and by its nature, as you write more code, the JVM will require more heap to store it. That said, I doubt that even the most prolific developers can develop this much code so quickly. But it’s hard for me to say how much metadata is also being generated so I’m not going to treat it as a leak just yet. Instead I dug a little deeper to see if anything else was happening.
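One way to put a number on a "slow but steady rise" is to fit a straight line through heap occupancy after each full collection and look at the slope. This is only a sketch of that idea, using an ordinary least-squares fit; the class and method names and the sample values are hypothetical, and a positive slope on its own does not prove a leak.

```java
final class OccupancyTrend {
    // Least-squares slope of post-GC occupancy (MB) against time (seconds).
    // A result above zero means occupancy is growing by that many MB per second.
    static double slope(double[] seconds, double[] occupancyMb) {
        if (seconds.length != occupancyMb.length || seconds.length < 2) {
            throw new IllegalArgumentException("need two or more paired samples");
        }
        int n = seconds.length;
        double meanT = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanT += seconds[i];
            meanY += occupancyMb[i];
        }
        meanT /= n;
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            double dt = seconds[i] - meanT;
            num += dt * (occupancyMb[i] - meanY);
            den += dt * dt;
        }
        if (den == 0) {
            throw new IllegalArgumentException("timestamps must not all be equal");
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Invented samples: occupancy after four full collections.
        double[] t = {0, 600, 1200, 1800};
        double[] mb = {210, 240, 265, 300};
        System.out.println("MB per second: " + slope(t, mb));
    }
}
```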
Figure 3. Tenuring Summary
Figure 3 contains information about how survivor spaces are being utilized. What we’re looking for in particular is the number of times that a collection in Eden yielded more surviving objects than would fit into the survivor space. In this case 39.14% of all collections have resulted in action having to be taken to avoid flooding the active survivor space. This problem is known as premature promotion. Premature promotion puts extra pressure on tenured space and that will result in more frequent full collections. The solution to premature promotion is to make the survivor spaces bigger.
Figure 4 clarifies this problem further. It shows the tenuring thresholds that HotSpot calculates before the start of each young gen scavenge. Note that the deviations of the tenuring threshold from the default value of 15 correspond to the clusters of full collections.
Figure 4. Calculated Tenuring Thresholds
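To see why a flooded survivor space drags the threshold down, here is a simplified sketch of an age-table style calculation: add up the bytes of survivors by age, youngest first, and stop at the first age where the running total exceeds a target share of the survivor space. This is my approximation of the idea, not HotSpot’s actual code, and details differ between collectors. The 50% target share and the cap of 15 are assumed values, and all names are hypothetical.

```java
final class TenuringSketch {
    // bytesByAge[i] holds the bytes of surviving objects whose age is i + 1.
    // Returns the first age at which the cumulative size exceeds the desired
    // survivor occupancy, capped at maxThreshold.
    static int tenuringThreshold(long[] bytesByAge, long survivorCapacity,
                                 int targetSurvivorPercent, int maxThreshold) {
        long desired = survivorCapacity * targetSurvivorPercent / 100;
        long total = 0;
        for (int age = 1; age <= bytesByAge.length && age < maxThreshold; age++) {
            total += bytesByAge[age - 1];
            if (total > desired) {
                return age; // survivors this old and older get promoted
            }
        }
        return maxThreshold; // everything fits, objects may age fully
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Invented age table: 3 MB of survivors at each of ages 1 to 3.
        long[] ages = {3 * mb, 3 * mb, 3 * mb};
        System.out.println(tenuringThreshold(ages, 16 * mb, 50, 15));
    }
}
```

With a larger survivor space the running total crosses the target later or not at all, so the threshold stays at its maximum and fewer objects are promoted early.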
The solution to this problem is to increase the size of the survivor space. But to do that you will have to take memory from eden. That would have the undesirable effect of increasing GC frequency, which is very likely to only exacerbate the premature promotion problem. So we must increase survivor space sizes without taking any space from eden. To do that we’ll need to make the young generation bigger. Again, we have a zero-sum condition that we need to consider. If we simply make young bigger, we’ll make tenured smaller. That can also cause more full collections. To avoid this, we will have to increase the size of heap as part of the process of increasing the size of survivors while maintaining the size of eden.
As a starting point, we decided to set -mx to 750m. This will have the effect of increasing all spaces according to default ratios. Let’s look at the same 4 charts to see if it made a difference.
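To make the proportional scaling concrete, here is a small arithmetic sketch. It assumes the spaces are carved up by two fixed ratios, tenured to young (NewRatio) and eden to one survivor space (SurvivorRatio), and it plugs in 2 and 8 as illustrative values. The ratios actually in effect depend on the JVM version, the collector, and whether adaptive sizing is on, so treat the numbers as an illustration rather than what this particular JVM did. The class and method names are hypothetical.

```java
final class HeapLayoutSketch {
    // Returns {young, eden, one survivor space, tenured} in MB for a heap
    // split by fixed ratios:
    //   newRatio      = tenured / young
    //   survivorRatio = eden / one survivor space
    static double[] layoutMb(double heapMb, int newRatio, int survivorRatio) {
        double young = heapMb / (newRatio + 1);
        double survivor = young / (survivorRatio + 2); // young = eden + 2 survivors
        double eden = young - 2 * survivor;
        double tenured = heapMb - young;
        return new double[] {young, eden, survivor, tenured};
    }

    public static void main(String[] args) {
        for (double heap : new double[] {512, 750}) {
            double[] s = layoutMb(heap, 2, 8); // assumed ratios, see above
            System.out.printf(
                "%.0f MB heap: young %.1f, eden %.1f, survivor %.1f x2, tenured %.1f%n",
                heap, s[0], s[1], s[2], s[3]);
        }
    }
}
```

Under these assumed ratios, going from 512 MB to 750 MB grows each survivor space from about 17 MB to 25 MB while eden and tenured grow as well, which is the point of raising the whole heap instead of moving memory between spaces.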
Figure 5. Garbage Collection Summary
Starting from the summary we should notice that the Full GC to GC ratio has been reduced to 24%. Although that ratio is much lower, it’s still much higher than it ideally should be. Let’s look at the other charts to see what can be done to induce further reductions. Since premature promotion was a problem in the past, let’s start there.
Figure 6. Tenuring Summary
At a rate of 3.99%, we surely can declare victory over the premature promotion problem. So there must be something else at work here.
Figure 7. Heap Occupancy after Collection
In this view we can see that there are two periods of activity where heap utilization increases. More importantly, heap occupancy doesn’t decrease. The data points in the upper right hand corner indicate that even after a period of 50,000 seconds, the collector hasn’t been able to recover memory. Now we must remember that in this case Eclipse is being used as an IDE and IDEs do tend to collect data over time. Hence I would expect that a memory profile of an IDE would look like a memory leak. In this case however, the amount of memory being consumed strongly suggests a memory leak. Next step? We need to instrument Eclipse to see what’s going on. That story is for another post.