
Solution to Performance Puzzler with a Stacktrace

Posted by kcpeppe on October 10, 2011 at 7:21 AM PDT

It's after JavaONE and, as promised, here is the answer to the performance puzzler with a stack trace. But first a word about JavaONE. The physical layout was the same as last year and, as a result, the conference suffered from some of the same problems. This year Oracle took a lot of advice from an advisory committee, and the result was that the conference made very good use of the poor physical layout it is unfortunately stuck with. That said, the conference had a lot more energy than in past years. Oracle's delivery of Java 7 along with the release of JavaFX seems to have revived interest in Java. Better yet, Java on the iPad, Twitter joining OpenJDK, loads of good talks from IBM... all in all a very positive experience. Now to fold Google and Apache back into the conference and all will be good. I had fun presenting my GC log talk, and it was great to hear people shouting out the answers to the puzzlers. Instead of a slide drop, I'm thinking of releasing them here as more puzzlers. Just for fun.

Back to the puzzle at hand. I received quite a few emails with suggestions, guesses, and, in a couple of cases, the correct hypothesis along with the next step required to confirm it. Let's work through the process to see how to arrive at that hypothesis, starting with the information provided.

The application was running on OS X Lion, burning 135% CPU when it should have been idle, and it had previously been running normally. The first thing to do with the thread dump is filter out all the threads that could not possibly be burning CPU. The first thread at the top of the dump is an "Attach Listener" and it's "waiting on condition". Threads "waiting on condition" are parked and cannot possibly be burning CPU, so we can filter it from our list of "threads of interest". Digging deeper, we find a number of other threads "waiting on condition"; let's get rid of those as well. In addition to threads parked for that reason, you will find threads parked in TIMED_WAITING (on object monitor), threads sleeping, and some threads marked runnable that are actually blocked in the native method socketAccept. Let's filter out all of those and see what we're left with.
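For anyone playing along at home, the dump itself can be captured with jstack (or by sending the JVM a SIGQUIT), and the uninteresting stanzas can be stripped out with ordinary text tools before you start reading. The sketch below is only illustrative: it assumes the dump has been saved to threads.txt, <pid> stands in for the JVM's process id, and the filter patterns simply mirror the states we just ruled out.

    jstack <pid> > threads.txt     # jstack ships with the JDK
    kill -3 <pid>                  # alternative: the dump goes to the JVM's stdout

    # treat each blank-line separated stanza as one thread and drop any
    # stanza that mentions a state we have already ruled out
    awk 'BEGIN { RS=""; ORS="\n\n" }
         !/waiting on condition|TIMED_WAITING|sleeping|socketAccept/' threads.txt

Running something like that against the full dump leaves the short list that follows.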


 

"AWT-AppKit" daemon prio=5 tid=7f846325e800 nid=0x7fff70b3e960 runnable [00000000]
    java.lang.Thread.State: RUNNABLE

"Poller SunPKCS11-Darwin" daemon prio=1 tid=7f84638cf800 nid=0x1140a4000 runnable [1140a3000]
    java.lang.Thread.State: RUNNABLE
    at sun.security.pkcs11.wrapper.PKCS11.C_GetSlotInfo(Native Method)
    at sun.security.pkcs11.SunPKCS11.initToken(SunPKCS11.java:767)
    at sun.security.pkcs11.SunPKCS11.access$100(SunPKCS11.java:42)
    at sun.security.pkcs11.SunPKCS11$TokenPoller.run(SunPKCS11.java:700)
    at java.lang.Thread.run(Thread.java:680)

"Low Memory Detector" daemon prio=5 tid=7f8463094000 nid=0x113538000 runnable [00000000]
    java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=9 tid=7f8463092000 nid=0x11322f000 runnable [00000000
    java.lang.Thread.State: RUNNABLE


"VM Thread" prio=9 tid=7f8463806000 nid=0x112c62000 runnable

"Gang worker#0 (Parallel GC Threads)" prio=9 tid=7f8463002800 nid=0x10e44a000 runnable

"Gang worker#1 (Parallel GC Threads)" prio=9 tid=7f8463003000 nid=0x10e54d000 runnable

"Gang worker#2 (Parallel GC Threads)" prio=9 tid=7f8463003800 nid=0x10e650000 runnable

"Gang worker#3 (Parallel GC Threads)" prio=9 tid=7f8463004000 nid=0x10e753000 runnable

"Concurrent Mark-Sweep GC Thread" prio=9 tid=7f846307f800 nid=0x11249a000 runnable

"VM Periodic Task Thread" prio=10 tid=7f84630a5800 nid=0x11363b000 waiting on condition

"Exception Catcher Thread" prio=10 tid=7f8463001800 nid=0x10b272000 runnable

 

As is the case with all things performance, you tend to end up collecting lots of information but finding very little of it to be of any use. To further frustrate efforts, it's common that the useful data only provides some direction toward the answer and not the answer itself. In this case we've been able to whittle the data down to 12 threads, all in the service of the VM. This shorter list is much easier on the eye. Let's see if we can shorten it even further. First, BlueDragon is a server application, so there's nothing for the AWT-AppKit thread to be doing. Even so, it's a single thread and as such can't be responsible for more than 100% CPU. The same can be said for the Poller SunPKCS11-Darwin, Signal Dispatcher, VM Thread, and Exception Catcher threads.

Whenever I'm offered a set of data that is conflicting or confusing, I tend to drop back to first principles and the things I can count on. First, the CPU burn was caused by the JVM. Second, no application or application server threads were responsible. Hence the only possible conclusion is that threads native to the JVM were responsible. While it was possible that the native threads I just ruled out were responsible, each of them by itself could not account for 135%; that would require at least two different problems in the new deployment environment, and while not impossible, it's more likely that a single problem is responsible. The only single activity that is multi-threaded in this case is GC. The next obvious step is to turn on GC logging to see what the collector is up to. In the process of setting the GC logging flags, the settings "-Xmx1024m -Xmn1024m" were discovered. Setting the young generation to the same size as the maximum heap caused GC to run continuously. Mystery solved!
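For completeness, here is roughly what that looks like on the command line. The GC logging flags are the standard HotSpot options of that era; the corrected -Xmn value below is only an illustration of the idea (young gen well below max heap), not the setting that was actually applied.

    # the settings discovered in the new deployment: young gen sized to the whole heap
    -Xmx1024m -Xmn1024m

    # GC logging flags added to the startup script to see what the collector is up to
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log

    # one illustrative fix: make young gen a fraction of max heap so tenured space exists
    -Xmx1024m -Xmn256m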