Skip to main content

Crash course on JVM crash analysis

Posted by kohsuke on February 19, 2009 at 5:32 PM PST

One cannot call oneself a Java geek if you haven't done JVM crash dump analysis. I mean, a C programmer would laugh at you if you tell them you don't know how to look at the stack dump. Well, I just had the pleasure of doing an analysis, so I'll show you how to do it, in case you don't know how :-) — This is on Windows, BTW.

Everything starts with the hs_err_pid*.log file that JVM creates upon a crash. First you look at the cause of the death, which is at the very top of the crash dump file:

#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x6d962f80, pid=3628, tid=628

...

siginfo: ExceptionCode=0xc0000005, reading address 0x00000000

So it's trying to read from a NULL pointer, resulted in an access violation. This is the native equivalent of NullPointerException.

Now that we know why it's dead, next we check where it's dead. For this, you look at the reconstructed stack trace:

<br />Stack: [0x0ae60000,0x0aeb0000],  sp=0x0aeae810,  free space=314k<br />Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)<br />V  [jvm.dll+0xd2f80]<br />C  [hudson-remoting38602.jnidispatch.dll+0x764c]<br />j  com.sun.jna.Native.initIDs()V+0<br />j  com.sun.jna.Native.<clinit>()V+70<br />v  ~StubRoutines::call_stub<br />

Unlike Java, in the native land the stack trace reconstruction cannot be reliably done, so this may contain some wrong information (in fact it turns out it was missing one stack frame in this case.) But it's always good approximation to start with.

From this we can see that my native code (which is com.sun.jna.Native.initIDs() — this is a native method) called into JNI and that killed it. So now the question is, which line in initIDs() called which JNI method with what parameters? The initIDs method is 20-30 lines long, so it's not obvious which part of this is offset 0x764c.

So now we need to disassemble the DLL to pin-point where the offset 0x764c is. The tool you use for this purpose is dumpbin, and you can install it through free Visual C++ Express Edition.

First, check the loading address, as below. In this case, this DLL wants to be loaded to 0x10000000, so all the disassembly you'll see starts from 0x10000000. In other words, jnidispatch.dll+0x764c should appear as 0x1000764c.

$ dumpbin /headers jnidispatch.dll | less
...
OPTIONAL HEADER VALUES
             10B magic # (PE32)
            2.56 linker version
            A800 size of code
            4A00 size of initialized data
             800 size of uninitialized data
            1060 entry point (10001060)
            1000 base of code
            C000 base of data
        10000000 image base (10000000 to 1004BFFF)

Now disassemble the DLL:

$ dumpbin /exports /disam jnidispatch.dll > jnidispatch.asm

... and look for 1000764c:

  10007633: 8B 45 EC           mov         eax,dword ptr [ebp-14h]
  10007636: 8B 84 85 68 FF FF  mov         eax,dword ptr [ebp+eax*4+FFFFFF68h]
            FF
  1000763D: 89 44 24 04        mov         dword ptr [esp+4],eax
  10007641: 8B 45 08           mov         eax,dword ptr [ebp+8]
  10007644: 89 04 24           mov         dword ptr [esp],eax
  10007647: E8 50 EF FF FF     call        1000659C
  1000764C: 89 C2              mov         edx,eax
  1000764E: 8B 45 08           mov         eax,dword ptr [ebp+8]
  10007651: 8B 08              mov         ecx,dword ptr [eax]
  10007653: 89 54 24 0C        mov         dword ptr [esp+0Ch],edx
  10007657: 8B 45 F0           mov         eax,dword ptr [ebp-10h]
  1000765A: 89 44 24 08        mov         dword ptr [esp+8],eax
  1000765E: 8B 45 F4           mov         eax,dword ptr [ebp-0Ch]
  10007661: 89 44 24 04        mov         dword ptr [esp+4],eax
  10007665: 8B 45 08           mov         eax,dword ptr [ebp+8]
  10007668: 89 04 24           mov         dword ptr [esp],eax
  1000766B: 8B 81 68 02 00 00  mov         eax,dword ptr [ecx+00000268h]
  10007671: FF D0              call        eax
  10007673: 83 EC 10           sub         esp,10h
  10007676: 8D 45 EC           lea         eax,[ebp-14h]
  10007679: FF 00              inc         dword ptr [eax]
  1000767B: E9 31 FF FF FF     jmp         100075B1
  10007680: 8D 65 F8           lea         esp,[ebp-8]
  10007683: 5E                 pop         esi
  10007684: 5F                 pop         edi
  10007685: 5D                 pop         ebp
  10007686: C2 08 00           ret         8

OK, so it's calling into 0x1000659c, and now we know that the reconstructed stack frame in the hs_err_pid*.log file missed this frame.
Also, we notice that the "ret 8" statement is very close, so this must be almost at the very end of the initIDs function (for some reasons, Visual C++ compiler never seem to produce multiple ret statements for a function.)

So this must be the assembly for the following line in the C source code:

      (*env)->SetStaticObjectField(env, cls, fid, newJavaPointer(env, types[i]));
    }
  }

In reality, you should spend some time looking at the native code to make sure the mapping between C code and assembly is correct. If you have the *.pdb file produced by the linker for the DLL, this step is substantially simplified because of the debug information produced in the PDB file. dumpbin is capable of loading it and using it for disassembly if it's available side-by-side with the DLL.

Now, let's see what's in the function 0x1000659c. The corresponding C code is the following, so if our guess is right, it should be a fairly short function:

jobject 
newJavaPointer(JNIEnv *env, void *p)
{
    jobject obj = NULL;
    if (p != NULL) {
      obj = (*env)->NewObject(env, classPointer, MID_Pointer_init, A2L(p));
    }
    return obj;
}

And indeed, the disassembly is simple, and matches this C code exactly:

  1000659C: 55                 push        ebp
  1000659D: 89 E5              mov         ebp,esp
  1000659F: 83 EC 18           sub         esp,18h
  100065A2: C7 45 FC 00 00 00  mov         dword ptr [ebp-4],0
            00
  100065A9: 83 7D 0C 00        cmp         dword ptr [ebp+0Ch],0
  100065AD: 74 35              je          100065E4
  100065AF: 8B 45 08           mov         eax,dword ptr [ebp+8]
  100065B2: 8B 08              mov         ecx,dword ptr [eax]
  100065B4: 8B 45 0C           mov         eax,dword ptr [ebp+0Ch]
  100065B7: BA 00 00 00 00     mov         edx,0
  100065BC: 89 44 24 0C        mov         dword ptr [esp+0Ch],eax
  100065C0: 89 54 24 10        mov         dword ptr [esp+10h],edx
  100065C4: A1 40 F4 00 10     mov         eax,dword ptr ds:[1000F440h]
  100065C9: 89 44 24 08        mov         dword ptr [esp+8],eax
  100065CD: A1 30 F2 00 10     mov         eax,dword ptr ds:[1000F230h]
  100065D2: 89 44 24 04        mov         dword ptr [esp+4],eax
  100065D6: 8B 45 08           mov         eax,dword ptr [ebp+8]
  100065D9: 89 04 24           mov         dword ptr [esp],eax
  100065DC: 8B 41 70           mov         eax,dword ptr [ecx+70h]
  100065DF: FF D0              call        eax
  100065E1: 89 45 FC           mov         dword ptr [ebp-4],eax
  100065E4: 8B 45 FC           mov         eax,dword ptr [ebp-4]
  100065E7: C9                 leave
  100065E8: C3                 ret

So at this point it looks like the NewObject call crashed. So now, what are the actual parameter values that we passed? Was one of them null?

To answer this question, we go back to the hs_err_pid*.log and look at the stack top:

Top of Stack: (sp=0x0aeae810)
0x0aeae810:   6d9634fa 0aeae8e4 1000c068 06be9978
0x0aeae820:   0adf4800 0ad80be8 0000021e 6da7ab04
0x0aeae830:   0adf4800 00000006 00000004 0aeae8e4
0x0aeae840:   1000c068 0aeae948 06be9978 0adf4800
0x0aeae850:   0ad80be8 0000021e 6da7ab04 100065e1
0x0aeae860:   0adf4914 00000000 00000000 1000e610
0x0aeae870:   00000000 00000000 0aeae948 1000764c
0x0aeae880:   0adf4914 1000e610 0aeae890 0ad80bf4

In C, parameters are passed through stack, so if we are lucky and the JVM didn't use that many stack before it dies, we can see our parameters in this list.
The key to find it is to look for a known value, which is the return address of the call. See, we know that the newJavaPointer method will return to jnidispatch.dll+0x764c. So if we can locate this in the stack, we can locate the parameters.

To do this, we first check the DLL table to see what address our DLL is actually loaded into. This information is also captured in hs_err_pid*.log:

Dynamic libraries:
0x00400000 - 0x00423000 C:\jdk1.6.0_10\bin\java.exe
0x7c900000 - 0x7c9b0000 C:\WINDOWS\system32\ntdll.dll
0x7c800000 - 0x7c8f4000 C:\WINDOWS\system32\kernel32.dll
0x77dd0000 - 0x77e6b000 C:\WINDOWS\system32\ADVAPI32.dll
0x77e70000 - 0x77f01000 C:\WINDOWS\system32\RPCRT4.dll
0x7c340000 - 0x7c396000 C:\jdk1.6.0_10\jre\bin\msvcr71.dll
0x6d890000 - 0x6dae6000 C:\jdk1.6.0_10\jre\bin\client\jvm.dll
0x77d40000 - 0x77dd0000 C:\WINDOWS\system32\USER32.dll
0x77f10000 - 0x77f58000 C:\WINDOWS\system32\GDI32.dll
0x76b40000 - 0x76b6d000 C:\WINDOWS\system32\WINMM.dll
0x6d340000 - 0x6d348000 C:\jdk1.6.0_10\jre\bin\hpi.dll
0x76bf0000 - 0x76bfb000 C:\WINDOWS\system32\PSAPI.DLL
0x6d840000 - 0x6d84c000 C:\jdk1.6.0_10\jre\bin\verify.dll
0x6d3e0000 - 0x6d3ff000 C:\jdk1.6.0_10\jre\bin\java.dll
0x6d880000 - 0x6d88f000 C:\jdk1.6.0_10\jre\bin\zip.dll
0x10000000 - 0x1004c000 C:\WINDOWS\Temp\hudson-remoting38602.jnidispatch.dll
0x77c10000 - 0x77c68000 C:\WINDOWS\system32\msvcrt.dll

So we can see that our DLL is loaded to 0x10000000, exactly where it wanted to be loaded to (but this is not always the case.) So, jnidispatch.dll+0x764c=0x1000764c. Do we see it in the top of the stack? Yes!

Top of Stack: (sp=0x0aeae810)
0x0aeae810:   6d9634fa 0aeae8e4 1000c068 06be9978
0x0aeae820:   0adf4800 0ad80be8 0000021e 6da7ab04
0x0aeae830:   0adf4800 00000006 00000004 0aeae8e4
0x0aeae840:   1000c068 0aeae948 06be9978 0adf4800
0x0aeae850:   0ad80be8 0000021e 6da7ab04 100065e1
0x0aeae860:   0adf4914 00000000 00000000 1000e610
0x0aeae870:   00000000 00000000 0aeae948 1000764c
0x0aeae880:   0adf4914 1000e610 0aeae890 0ad80bf4

From here, we can see the two parameters to the newJavaPointer, which are env=0x0adf4914, and p=0x1000e610. Looks OK so far.

Then you mentally trace through the code. 0x0aeae948 must be the result of push ebp, then sub esp,18h must have put esp to 0x0aeae85c.

  1000659C: 55                 push        ebp
  1000659D: 89 E5              mov         ebp,esp
  1000659F: 83 EC 18           sub         esp,18h

Then by comparing a bunch of mov statements and C code, you can see that the parameters are the following. The next item in the stack is 100065e1 which corresponds to the return address from the NewObject call, so we've got everything accounted for:

                              0xadf4914
                              |    0x00000000    0x00000000        0x000000001000e610
                              |    |             |                 |
                              v    v             v                 v
      obj = (*env)->NewObject(env, classPointer, MID_Pointer_init, A2L(p));

And indeed, we are passing NULLs! Aha!

So, I hope you (and in particular me in 6 months) get the idea of how to do this. You saw that hs_err_id*.log file is full of useful information.

And please, for the love of God, create the map file and the PDB file and keep them somewhere. The forensic analysis like this would be so much easier if you have them.

Related Topics >>

Comments

Crash course on JVM crash

I think there may be a typo in your example:

0x0aeae948 must be the result of <tt>push ebp</tt>, then <tt>sub esp,18h</tt> must have put <tt>esp</tt> to 0x0aeae85c.

I think esp should be 0x0aeae860. That would leave esp in the right place to push the eip (0x0aeae85c) when you call into NewObject.

tyop in second dumpbin command

In the following line: dumpbin /exports /disam jnidispatch.dll > jnidispatch.asm replace "/disam" with "/disasm"
thanks for the blog post!

A JVM crash needs your help

Dear guru I encountered the JVM crash problem as follows:- --------------- T H R E A D --------------- Current thread (0x55495cc8): JavaThread "javawsApplicationMain" [_thread_in_native, id=3572] siginfo: ExceptionCode=0xc0000005, reading address 0x57c81000 Registers: EAX=0x57c81000, EBX=0x00000000, ECX=0x569b57ca, EDX=0x59bbca02 ESP=0x564bd5bc, EBP=0x564bd5bc, ESI=0x564bd988, EDI=0x00000000 EIP=0x77c47fd4, EFLAGS=0x00010206 Top of Stack: (sp=0x564bd5bc) 0x564bd5bc: 564bd5d0 77c369a6 57c80ff8 59bbca12 0x564bd5cc: 564bd988 564bd604 59b56e5c 00000000 0x564bd5dc: 57c80ff8 564bd924 569b9de0 57c80ff8 0x564bd5ec: 00000001 00000000 42c444a8 6d6dfadd 0x564bd5fc: 00000016 6d6c8b07 564bd618 59b56cd8 0x564bd60c: 59bbc9a0 564bd924 564bd988 564bd62c 0x564bd61c: 59a67cc9 00000000 564bd9ce 001199e6 0x564bd62c: 564bd924 59a67ed0 564bd958 564bd8f4 Instructions: (pc=0x77c47fd4) 0x77c47fc4: 5b 5d c3 cc cc cc cc cc 8b ff 55 8b ec 8b 45 08 0x77c47fd4: 66 8b 08 40 40 66 85 c9 75 f6 2b 45 08 d1 f8 48 Stack: [0x563c0000,0x564c0000), sp=0x564bd5bc, free space=1013k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [MSVCRT.dll+0x37fd4] C [MSVCRT.dll+0x269a6] C [fpSpru70.ocx+0x126e5c] C [fpSpru70.ocx+0x126cd8] C [fpSpru70.ocx+0x37cc9] C [fpSpru70.ocx+0x37ed0] C [fpSpru70.ocx+0x126c74] C [fpSpru70.ocx+0xce429] C [fpSpru70.ocx+0xccc47] C [fpSpru70.ocx+0x5bb0f] C [fpSpru70.ocx+0x5b59a] C [fpSpru70.ocx+0x10159b] C [MFC42u.DLL+0xce09] C [MFC42u.DLL+0xc6ec] C [swt-win32-3139.dll+0x2cc56] J org.eclipse.swt.internal.ole.win32.COM.VtblCall(IIILorg/eclipse/swt/internal/ole/win32/GUID;IILorg/eclipse/swt/internal/ole/win32/DISPPARAMS;ILorg/eclipse/swt/internal/ole/win32/EXCEPINFO;[I)I J org.eclipse.swt.ole.win32.OleAutomation.invoke(II[Lorg/eclipse/swt/ole/win32/Variant;[ILorg/eclipse/swt/ole/win32/Variant;)I J org.eclipse.swt.ole.win32.OleAutomation.setProperty(ILorg/eclipse/swt/ole/win32/Variant;)Z J com.mot.motos.spread.Spread.drawRow(I)V Do you think it is eclipse.swt problem or fpSpru70.ocx one? Any hints for me to troubleshoot the case? Thank you Karl

A JVM crash needs your help

Thanks for the article. It is exactly what I am looking for. I have few questions.
I am not able to locate the arguments passed to my failed method in the stack trace[Similar to newjavaPointer] in your example.
I am not even getting the return address in the stack trace.
But rather, i get the below PC information in the crash dump file.
Registers:
EAX=0x80004001, EBX=0x0a7ebd90, ECX=0x00000000, EDX=0x0f217627
ESP=0x0fe3f2fc, EBP=0x0fe3f37c, ESI=0x0a7ebd90, EDI=0x0ebffc00
EIP=0x0f21762d, EFLAGS=0x00010206
Top of Stack: (sp=0x0fe3f2fc)
0x0fe3f2fc: 00000001 00000000 0fe3f2b4 00000000
0x0fe3f30c: 00000000 00000000 0ee3d028 0ee3d018
0x0fe3f31c: 0ee3d008 0ee3acb8 0ebf9e84 0ebf9e80
0x0fe3f32c: bcb25000 01cb7b25 00000001 0ebf85a0
0x0fe3f33c: 0ee39f10 0ee3acb8 00c001c0 80004001
0x0fe3f34c: 0ee3a5c8 0ebf8580 0ee3d028 00000000
0x0fe3f35c: 00000000 00000000 6da15f44 0ec00600
0x0fe3f36c: 0ebf8560 00172884 0ee3d008 0ee3d018
Instructions: (pc=0x0f21762d)
0x0f21761d: 08 8b 55 f4 52 8b 41 44 ff d0 89 45 cc 8b 4d dc
0x0f21762d: 83 39 00 75 06 c6 45 cb 01 eb 24 8b 55 dc 81 3a
Stack: [0x0fe00000,0x0fe40000], sp=0x0fe3f2fc, free space=252k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [test.dll+0x762d]
C [test.dll+0x93d7]

the test dll address range is 0x0f210000 - 0x0f22e000
Do you have any suggestions to identify the invalid argument values that got passed to the function causing the issue?

Hi, there will also be a core, if it is on a unix, then possibley the core can be created by the java executable. then gdb java coreFile will also show useful information.

Great posting. Looks like someone already pointed you at the troubleshooting guide. I REALLY appreciate when someone takes the time to create valuable and informative postings like this. Thank you for taking the time to do this. -kto

Thanks for the pointer. It's more like the format reference, not so much of a trouble-shooting guide, but still it's useful.

There is a detailed troubleshooting guide at http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/felog.html It is worth anyone time to learn more.

I got an question offline about where you can find hs_err_pid*.log files. According to vmError.cpp, they are created in the current directory of the JVM, and if that's not writable for some reason it goes to the tmp folder. The source code also appears to recognize "-XX:ErrorFile" JVM launch option, although I've never used it myself. More interesting discovery that I did was the -XX:OnError switch, which allows you to run an arbitrary program upon a failure, to collect more information.

Eduardo --- yes, you are right. And I'm sure those electronics engineers would laugh at us if we tell them we don't know how to use oscilloscope and those physics guys will laugh at us if we tell them computers are 0s and 1s. cowwoc --- the way x86 works is that when your CPU executes the "call" instruction, it puts the return address into the stack, which is the next instruction. In this case, CALL is at 10007647 and its next instruction is 1000764C. So yes, I could have highlighted 764C, but it's not that that instruction is crucial --- it can be anything. The CALL instruction is more interesting, because it'll take us where the CPU went, and that's why I highlighted it.

You said we should look at address 1000764c but then you highlighted 10007647 in the disassembled code. Is this a typo?

Ah, but, have you toggled in a new instruction into the console of your computer? :-) Seriously, you are a lot of fun. I'm very happy you are in my team! - eduard/o

<p>Hi,</p> <p>THanks for you post. It's very useful. But I ...

Hi,

THanks for you post. It's very useful. But I do have an issue regarding this process. I don't find a hs_err_pid*.log file in my folder. Actually, not all the times. It's generated some times and sometimes it's not. Is there a definite way to make sure that this log file is generated?

Regards,

Murali V