
Async Thread Dumps on CVM

Posted by mlam on June 22, 2007 at 1:45 AM PDT

There are times in the course of development when your application just seems to hang forever. At those times, you wish you had some way of knowing where the hang is occurring. If you're running on JavaSE, chances are you'll have a lot of advanced tools that make life easy for you. But if you're running on an embedded device, your options are suddenly severely limited. For the phoneME Advanced VM (CVM), there's a way to get help with this even when there is no advanced debugging support on your device.

What I'll be showing here is an old trick to get an asynchronous dump of the stacks of all the threads that are currently alive in the VM. First of all, you need to know that this is a hack, i.e. it's not good, clean code. That's why I haven't committed it to the source repository, and won't be doing so. The reasons it is a hack are explained below under "Why this is a HACK!!!". But even though it is a hack, it is useful when you need it. Many of my colleagues and customers have asked me for the code patch for this hack to help with debugging hangs in their applications. I figure you might find it helpful too.

So, here it is ...

The Code Patch

Step 1: In src/linux/javavm/runtime/sync_md.c, add the following function:

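/* BEGIN for Debug Use only */
#include "javavm/include/interpreter.h"

/* Warning: This thread dumper is only for the use of debugging code.
   There's a risk that it can potentially crash the VM if invoked at
   the wrong time.  Hence, this is not to be incorporated into a
   production build.  It is only for assisting in debugging efforts
   when needed.
*/
static void threadDumpHandler(int sig)
{
    CVMExecEnv *ee = CVMgetEE();
    CVMBool success;
    int threadCount = 1;

    /* Try-lock only: blocking on the mutex in a signal handler could
       deadlock the thread that received the signal. */
    success = CVMsysMutexTryLock(ee, &CVMglobals.threadLock);
    if (!success) {
        /* Assumption: bail out with a message if the lock is busy. */
        CVMconsolePrintf("Thread dump skipped: threadLock is busy\n");
        return;
    }

    CVMconsolePrintf("\nStart thread dump:\n");
    CVM_WALK_ALL_THREADS(ee, threadEE, {
        CVMconsolePrintf("Thread %d: ee 0x%x\n", threadCount, threadEE);
        /* Assumed call: check the CVMdumpStack arguments against
           interpreter.h in your source tree. */
        CVMdumpStack(&threadEE->interpreterStack, 0, 0, 0);
        threadCount++;
    });
    CVMconsolePrintf("End thread dump\n\n");

    CVMsysMutexUnlock(ee, &CVMglobals.threadLock);
}
/* END for Debug Use only */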

Step 2: In linuxSyncInit(), add:

            /* BEGIN for Debug Use only */
            {SIGQUIT, threadDumpHandler, SA_RESTART},
            /* END for Debug Use only */

If you look in src/linux/javavm/runtime/sync_md.c, you'll note that JVMPI also uses the SIGQUIT signal. So you need to make sure that there is no conflict: either don't use JVMPI at the same time, or call the threadDumpHandler function from the JVMPI signal handler function instead. Here, "using JVMPI at the same time" means that you built CVM with the CVM_JVMPI=true option.

How Does it Work?

Basically, in the above patch, we're setting up a signal handler in Linux for the signal SIGQUIT. When CVM receives the SIGQUIT signal, it will call the threadDumpHandler function, and that function will iterate through all the live threads and dump their stacks.

Note that the trigger mechanism used here is a signal on Linux. You should be able to use this for other OSes as well provided that you can set up an asynchronous request handler whereby CVM can receive a request from the user. That request handler should call threadDumpHandler. Disclaimer: Some finessing may be necessary if you try to use this for other OSes. I've only tested this code on Linux.

How to Use it

After building CVM with this hack added, you can run CVM as usual (with the arguments that you normally specify). Whenever you want a thread stack dump after that, open another terminal window and get the process ID for CVM. One way to do this is by typing "ps -ef | grep cvm" at the command line. That should list the processes that are running cvm.

Then, issue a "kill -QUIT <pid>", where <pid> is the process ID of the CVM instance you want a thread dump from. This will send a SIGQUIT signal to CVM and trigger the thread dump. You can request this dump as many times as you like, and at different times, to see if there are changes in the stack traces. Bear in mind that there is a chance that the request may crash the VM because this is a hack (and not a clean solution).

If you don't want to use SIGQUIT as the trigger signal, you can choose a different signal in the code, and issue a different kill command from the terminal.

Why this is a HACK!!!

As mentioned many times above, this trick is a hack i.e. it is not clean code that you would want to put into your production VM. Only include this code for your personal debugging use. Note that because it is a hack, it can crash the VM if you're not lucky when using it.

Here are all the reasons why this trick is a hack:

  1. This mechanism uses a signal handler. The mechanism also requires that we lock the threadLock mutex. This is because we need to iterate over the list of live threads and we can't have any of these threads dying on us (and being freed) while we're trying to dump their stack.

    However, according to the Linux man pages on pthread_mutex_lock, mutex functions are not async-signal safe, and calling them from a signal handler may deadlock the calling thread.

  2. The stack dump mechanism, CVMdumpStack, makes use of CVMconsolePrintf, which prints to stderr using fprintf. I don't know if fprintf is reentrant or not from a signal handler (I didn't see anything in the man pages). I suspect that it is not.

    In general, it is not good practice to do a lot of IO work (like printing to stderr) from signal handlers anyway. Also, printing to stderr will be synchronized somewhere underneath fprintf. Hence, the mutex locking problem also applies here.

  3. If you've built CVM with CVM_JIT=true, then chances are some (or all) of your threads are running JIT compiled code. When running in JIT compiled mode, the threads do not always flush their stack context to the thread's stack data structure. Some of the context values are simply kept in registers, and only flushed to the stack when absolutely needed.

    An asynchronous inspection of the thread's stack data structure (as is done by this dumper mechanism) will not necessarily see the topmost methods in the thread's execution. This is because their information may not have been flushed to the stack. Again, this is because the compiled code has no need to flush its context to the stack if it is able to operate out of registers.

    Hence, the stack dump may not be precise. On the bright side, in practice, it will tend to get you close enough to where the thread is actually executing. At the very least, it takes you to its caller (or its caller's caller, etc).

WARNING! Again, I caution you: DO NOT put this code in the production VM that you deploy in your products. It can crash your VM. That's why I don't want to commit it into the source repository (for fear that someone will enable it without knowing what the consequences are).

Using the Thread Stack Dump Info

Is this information enough to prove a deadlock, or solve your problem completely? No, not necessarily. All it is guaranteed to do is to give you more information about what your application is doing at that moment in time when you requested the thread dumps.

To actually determine whether you have a deadlock will require some additional info regarding the state of the monitors that a thread is blocked on. You will also need to know who owns those monitors. That is a topic for another day. The thread dumps may be enough to suggest a deadlock as the source of your application's hang, and thereby justify further investigation in that direction. Alternatively, they can show that you don't have a deadlock at all.

Final Words

Again, remember that this is a hack! Use it with caution, and do not include it in your deployed products. In spite of its imperfect (and hacky) nature, I do hope that it will help you out should you find yourself in the sticky situation of having to debug hangs.

Have a nice day. =)

Personal Update

I've been really busy with projects for work ... so much that I haven't had much time to sit back and think of more relevant subjects to write about. As such, I guess I've been feeling a little bit uninspired in the blogging department. Couple that with my now very limited to non-existent free time, and the result is ... very infrequent blog updates. For this, I do sincerely apologize.

While I often feel guilty about not updating the blog regularly, I also don't want to just write entries that wouldn't provide you with something useful. I doubt you'll really want to hear about what I have for lunch each week (or some such mundane details). So instead, if you have a question or a request for a discussion on any specific subject that interests you, please ask me about it by entering a comment in my blog. That will inspire me to write as I do prefer to talk about things that are relevant to you, the Java developer.

Till next time then.

Tags: CVM, Java, J2ME, JavaME, JIT, phoneME, phoneME Advanced, embedded systems
