Binod's Blog

SailFin work and BTrace

Posted by binod on June 12, 2008 at 05:37 AM

Lately I have been spending a significant amount of time analyzing issues coming from the SailFin performance team and the system testing team. One of them was particularly tricky. The testing team was running a test 24x7 on a 10-instance SailFin cluster with CLB etc. There were 5-odd machines running many SIPp clients pumping traffic to SailFin. They started observing a memory leak, and I went about debugging it. The jmap dump they produced (huge, in GBs) showed the possibility of a leak of SipSession objects.

The problem with memory leaks is often the difficulty of confirming that the leak is in a particular data structure in a particular part of the code. That was the case here too. So I took the help of BTrace.

I wrote a BTrace script without much difficulty. I ran the server under load for quite some time and then stopped the traffic so that the SipSessions would get cleaned up. Then I attached the BTrace script to the running Java process. I didn't have to compile the script; I just used the Java file directly. Here is the script, ConcurrentHashMapTrace.java:

package com.sun.btrace.binod;

import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;

@BTrace public class ConcurrentHashMapTrace {
    // Probe every call to ConcurrentHashMap.put in the target JVM.
    @OnMethod(
        clazz="java.util.concurrent.ConcurrentHashMap",
        method="put"
    )
    public static void onPut(java.util.concurrent.ConcurrentHashMap me, Object key, Object value) {
        // toString() of SipSessionDialogImpl contains the string SipSession,
        // so this filters for the map that holds the sessions.
        if (indexOf(str(value), "SipSession") > -1) {
            println(value);
            // Number of entries in the map receiving the session.
            println(size(me));
        }
    }
}

Then I ran the command: btrace <pid> ConcurrentHashMapTrace.java. This attaches the BTrace script to the JVM and waits. Then I sent just one INVITE message to the server. The BTrace script printed the size of the ConcurrentHashMap that holds the sessions, and it was more than I expected for a server with no other traffic. That confirmed the leak. I also used modified versions of the script to make sure that there was no other related object leak (e.g. SipApplicationSessions). After figuring out where the leak was, it was a matter of code inspection, further debugging and then checking in the fix.

Next, I am trying to see if System.gc is ever called during the test run, so that we can try removing the DisableExplicitGC flag from the GC settings. Another BTrace script...
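
For readers who want to see why a single INVITE after the traffic has stopped is enough to expose the leak, here is a small plain-Java sketch of the same reasoning. It is not SailFin code and does not use BTrace: FakeSipSession, sizeAfterOneInvite and the "every Nth session is never removed" leak are all made up for illustration. Only the ConcurrentHashMap and the "SipSession" string filter mirror the script above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LeakCheckSketch {

    /** Hypothetical stand-in for a session whose toString() contains "SipSession". */
    static final class FakeSipSession {
        private final String id;

        FakeSipSession(String id) {
            this.id = id;
        }

        @Override
        public String toString() {
            return "SipSession[" + id + "]";
        }
    }

    /**
     * Simulates a load run followed by one INVITE and returns the map size
     * seen right after that final put, as the probe would report it.
     * leakEvery is an assumption of this sketch: every leakEvery-th session
     * is never removed (0 means no leak).
     */
    static int sizeAfterOneInvite(int calls, int leakEvery) {
        Map<String, Object> sessions = new ConcurrentHashMap<>();

        // Load phase: each call creates a session and then ends.
        for (int i = 1; i <= calls; i++) {
            String callId = "call-" + i;
            sessions.put(callId, new FakeSipSession(callId));
            boolean leaked = leakEvery > 0 && i % leakEvery == 0;
            if (!leaked) {
                sessions.remove(callId); // normal cleanup when the call ends
            }
        }

        // Traffic has stopped. Send a single INVITE and apply the same
        // filter as the probe before looking at the size.
        Object value = new FakeSipSession("probe");
        sessions.put("probe", value);
        if (String.valueOf(value).indexOf("SipSession") > -1) {
            return sessions.size();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("healthy: " + sizeAfterOneInvite(1000, 0));
        System.out.println("leaky:   " + sizeAfterOneInvite(1000, 10));
    }
}
```

With a healthy cleanup path the size after the lone INVITE is just that one session; with the simulated leak, the leftovers from the load phase show up in the same number.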