Skip to main content

ThreadLocal + Thread Pool = bad idea (or: dealing with an apparent Glassfish memory leak)

Posted by jjviana on June 10, 2010 at 10:00 AM PDT

One of the not-so-great things about developing Java web applications is the fact that, after a few redeployments, sooner or later the web container JVM needs to be restarted due to Out Of Memory errors. This has been true in every combination of development environment and web server I have used so far, and until last week Netbeans 6.8 + Glassfish 3.0.1 was no exception.

The cause of Out Of Memory errors in web containers are many, but the most common of all is a ClassLoader leak. Each web application is loaded under its own ClassLoader and once an application is undeployed all references to its classes should be cleared, which makes them (and all objects referenced by them) eligible for garbage collection.

The problem is sometimes (I would say most of the time) some reference fails to be cleared when a web application is undeployed. When that happens the ClassLoader associated with the application cannot be garbage colledted and a number of objects referenced by its loaded classes remain in memory. After some redeployments these objects clutter the heap space causing a OOM exception.

After a few months of development I got fed up with Glassfish 3.0.1 getting slower and slower and finally crashing after a few redeployments. I used jvisualvm to analyze the heap contents and found out that as I expected the used memory  got bigger after each redeployment.

After some analysis it became clear that most of the retained heap was being held in place by a reference coming from the class com.sun.enterprise.security.authorize.HandlerData , which in turn was being refrred to by a ThreadLocal instance.

ThreadLocal variables are, as the name suggests, variables bound to the Thread scope: each thread has a separate copy of the variable. This kind of variable is very useful to propagate context information between application layers.

The problem in this case seems to be that during application deployment some code in Glassfish is creating a ThreadLocal variable holding a reference to an instance of com.sun.enterprise.security.authorize.HandlerData, which in turn seems to hold a reference to a lot of other objects (mostly related to proxy classes generated during EJB and CDI deployment).  Glassfish doesn't seem to clean this reference after the deployment finishes, which wouldn't be much of a problem if the thread that deploys the application terminates.

Unfortunately the application deployment thread never dies, because it is not created solely for the purpose of application deployment. It is instead taken from the Glassfish thread pool and is returned to the pool once it finishes the deployment task. That means the ThreadLocal reference never gets cleaned and over time causes the heap to overflow.

I didn't have the time to go through the Glassfish source code and find the exact place where this reference is being leaked in order to fix it. However, I did have time to add a web applcation context listener to my application that cleans the problem reference from the thread pool:

@WebListener
public class ThreadLocalCleanup implements ServletContextListener {

    public void contextInitialized(ServletContextEvent sce) {
    }

    public void contextDestroyed(ServletContextEvent sce) {


        try {
            System.err.println("Starting thread locals cleanup..");
            cleanThreadLocals();
            System.err.println("End thread locals cleanup");


        } catch (Throwable t) {
            t.printStackTrace();
        }
    }

    private void cleanThreadLocals() throws NoSuchFieldException, ClassNotFoundException, IllegalArgumentException, IllegalAccessException {

        Thread[] threadgroup = new Thread[256];
        Thread.enumerate(threadgroup);

        for (int i = 0; i < threadgroup.length; i++) {
            if (threadgroup[i] != null) {
                cleanThreadLocals(threadgroup[i]);
            }
        }
    }

    private void cleanThreadLocals(Thread thread) throws NoSuchFieldException, ClassNotFoundException, IllegalArgumentException, IllegalAccessException {

        Field threadLocalsField = Thread.class.getDeclaredField("threadLocals");
        threadLocalsField.setAccessible(true);

        Class threadLocalMapKlazz = Class.forName("java.lang.ThreadLocal$ThreadLocalMap");
        Field tableField = threadLocalMapKlazz.getDeclaredField("table");
        tableField.setAccessible(true);

        Object fieldLocal = threadLocalsField.get(thread);
        if (fieldLocal == null) {
            return;
        }
        Object table = tableField.get(fieldLocal);

        int threadLocalCount = Array.getLength(table);

        for (int i = 0; i < threadLocalCount; i++) {
            Object entry = Array.get(table, i);
            if (entry != null) {
                Field valueField = entry.getClass().getDeclaredField("value");
                valueField.setAccessible(true);
                Object value = valueField.get(entry);
                if (value != null) {
                    if (value.getClass().getName().equals("com.sun.enterprise.security.authorize.HandlerData")) {
                        valueField.set(entry, null);
                    }
                }

            }
        }


    }
}

The code (inspired in this great blog post)  is quite messy because it needs to resort to reflection in order to access the private ThreadLocal members. It goes through all threads in the application server thread group and cleans up any ThreadLocal references to the class com.sun.enterprise.security.authorize.HandlerData.

After I put this class in my web application, no more need for Glassfish restarts even after dozens of redeployments, and jvisualvm shows that memory does get cleaned up after each deployment cycle. Fantastic!

I will take some time to report and maybe find a fix for this bug in the Glassfish project, but this episode left me wondering: ThreadLocals were probably never meant to be used in conjunction with thread pools, but that ends up happening anyway and is the cause of many memory leaks found in production systems.So why doesn't the thread pool implementations perform this ThreadLocal cleanup every time a thread is returned to the pool? I can't think of any use case in which one would like recycled threads to retain the ThreadLocal references. Implementing this logic in the thread pool would automatically fix many bugs like these.

Maybe the lack of public visibility into the ThreadLocal implementation is to blame? It would be worth adding a Thread.cleanThreadLocals() call to the API to make sure thread pools can perform this recycling in a secure and portable way.

 

 

 

Comments

happens in the best families

ThreadLocals are the simplest choice to pass info through the JEE layers, but must be used with care, since they are source of leaks.
A simple solution would be a ThreadLocal constructor that indicates to the ThreadPool that it should clear the variable when the Thread goes back to the pool, something like this:

public ThreadLocal( boolean cleanWhenBackToPool )

Until it, we must clean them by hand, using ThreadLocal.remove() method with try/finally blocks.

Great Detective Work

Great detective work! I've tried to tell people that ThreadLocal comes with this kind of added complexity, but now I have a blog to prove it.

Oh dear, haven't the developers heard of SoftReference!

You don't need to clear expensive ThreadLocals if you wrap the data the ThreadLocal in a SoftReference; if memory gets low the SoftReferences will get clearedby the GC.
This is not hard, so developers should be hang their heads in shame.
I'm astonished that ThreadLocal still does not provide a constructor parameter to specify the type of reference used to store values!

I extended ThreadLocal to make a SoftReference version, for just this annoyance:

public abstract class SoftThreadLocal<T> extends ThreadLocal<T> {
// Encapsulation required because Generics is stupid about references, and there is no interface for ThreadLocal,
//so I can't simply extend a single ThreadLocal, grr! Java has so many brittle design mistakes in it.
private final ThreadLocal<SoftReference<T>> local = new ThreadLocal<SoftReference<T>>();

@Override
public T get() {
SoftReference<T> ref = local.get();
T result = null;
if (null != ref) {
result = ref.get();
}
if (null == result) {
result = initialValue();
ref = new SoftReference<T>(result);
local.set(ref);
}
return result;
}

@Override
public void set(T value) {
if (null == value) {
remove();
} else {
local.set(new SoftReference<T>(value));
}
}

@Override
public void remove() {
local.remove();
}
}


BTW, this comment system is broken, it is too much work to get comments and code to format correctly here, because it ignore spaces and end of lines, and generics < and > have to be converted into XML lt and gt entities,and I have to add loads of br's, no way can I be bothered to add loads of nbsp entities for leading spaces, grr!

This site needs a proper text editor pane to do the hard work for you, no way should I have to type raw XHTML, because this site is too lazy to provide a proper comment editor, like AJAX sites do!

<p>Soft references do not fully&nbsp; solve the problem. ...

Soft references do not fully solve the problem. When you store a SoftReference<T> in ThreadLocal, the T may go away after a while, but the SoftReference itself remains until you call ThreadLocal.remove(). If T is a class from your web app, this may help to avoid a classloader leak, but the (expired) SoftReferences still accumulate in ThreadLocal.