Scott Oaks's Blog

Dynamically sizing threadpools

Posted by sdo on June 07, 2007 at 10:15 AM | Comments (2)

Almost every thread pool implementation takes great pains to make sure that it can dynamically resize the number of threads it utilizes: you specify the minimum number of threads you want, the maximum number, and the thread pool in its wisdom will automatically configure itself to have the optimal number of threads for your workload. At least, that's the theory...

But what about in practice? I'd argue that its utility is very limited, and that in many cases, a dynamically-resizing threadpool will actually harm the performance of your system.

First, a quick review of why we have threadpools. From a performance perspective, the most important task of a threadpool is to throttle the number of simultaneous tasks running on your system. I know that you may think that the purpose of a threadpool is to allow you to conveniently run multiple things at once. It does that, but more importantly, it prevents you from running too many things at once. If you need to run 100 CPU-bound tasks on a machine with 4 CPUs, you will get optimal throughput if you run only 4 tasks at a time: each task fully utilizes a CPU while it is running. Since you can't run more than 4 tasks at once, you won't get any better throughput by having more threads -- in fact, if you add more threads to the saturated system, your throughput will go down: the threads will compete with each other for CPU and other system resources, and the operating system will spend more time than necessary managing the competing threads.
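As a rough illustration of the throttling described above, here is a minimal sketch. The class name ThrottleDemo, the method maxConcurrency, and the busyWork stand-in task are all hypothetical names invented for this example; the only library calls are from java.util.concurrent. It queues many CPU-bound jobs on a fixed-size pool and records how many were ever running at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; the names are illustrative only.
class ThrottleDemo {

    // Runs `tasks` short CPU-bound jobs on a pool of `threads` threads and
    // returns the highest number of jobs observed running at the same time.
    static int maxConcurrency(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the high-water mark
                busyWork();
                running.decrementAndGet();
            });
        }

        pool.shutdown(); // accept no new work; queued tasks still run
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    // Stand-in for a CPU-bound task (an assumption for this sketch).
    static long busyWork() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("peak concurrent tasks: " + maxConcurrency(cpus, 100));
    }
}
```

However many tasks are submitted, the peak never exceeds the pool size: the extra tasks simply wait in the pool's queue until a thread is free.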

In the real world, of course, tasks are never 100% CPU-bound, so you'll usually want more threads than CPUs to get optimal use of your system. How many more is a function of your workload: how much time it spends waiting for external resources like a database, and so on. But there will be an optimal number, usually far lower than the number of simultaneous tasks you can handle (particularly if those tasks represent jobs coming in from remote users -- e.g. a web or application server handling thousands of connections). The determining rule is this: if you have more tasks to perform AND you have idle CPU time, then it makes sense to add more threads to the pool. If you have more tasks to perform but no idle CPU time, then it is counter-productive to add threads to the pool. And that's my problem with dynamically resizing threadpools: if they choose to add threads because there are tasks waiting (even though there is no available CPU time), they will hurt your performance rather than help it.

Conceivably, you could use some native code to figure out the idle CPU time on your system and have a threadpool that takes that information into account. That would be better, but even that is insufficient. Say you have an application server accessing a remote database using JPA. Now if the database becomes a bottleneck, you'll have idle CPU time on your application server, and it will have tasks that are waiting. But adding threads to run those tasks will again make things worse: it will increase the work needed to be done by the already-saturated database, and your overall throughput will suffer. In the final analysis, you are the only one that will have all the necessary information to know if it is productive to increase the size of your thread pool.
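The rule from the last two paragraphs can be written down as a small sketch. Everything here is hypothetical: the class PoolSizingRule, its method names, and the 0.90 threshold used in main are assumptions made for illustration, not part of any real pool. The load estimate uses the standard OperatingSystemMXBean.getSystemLoadAverage() call, which reports run-queue length rather than true idle CPU time, so treat it as a crude proxy at best. Note that the external-resource flag has to come from you; the JVM has no way to know it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper; names and thresholds are illustrative only.
class PoolSizingRule {

    // Adding a thread helps only if work is waiting, the CPU has headroom,
    // and no external resource (e.g. the database) is already saturated.
    // A negative cpuLoad means "unknown", and is treated as "do not add".
    static boolean shouldAddThread(int queuedTasks, double cpuLoad,
                                   double cpuBusyThreshold,
                                   boolean externalResourceSaturated) {
        boolean workWaiting = queuedTasks > 0;
        boolean cpuIdle = cpuLoad >= 0 && cpuLoad < cpuBusyThreshold;
        return workWaiting && cpuIdle && !externalResourceSaturated;
    }

    // Rough per-CPU load estimate from the JVM's management API.
    // Returns -1.0 on platforms where the load average is unavailable.
    static double approximateCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double loadAverage = os.getSystemLoadAverage(); // negative if unavailable
        return loadAverage < 0 ? -1.0 : loadAverage / os.getAvailableProcessors();
    }

    public static void main(String[] args) {
        double load = approximateCpuLoad();
        System.out.println("approximate per-CPU load: " + load);
        System.out.println("add a thread? " + shouldAddThread(10, load, 0.90, false));
    }
}
```

Even in this toy form, the third input shows the limit of any automatic policy: a pool can measure its own queue and perhaps the local CPU, but not whether the database behind it is the real bottleneck.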

So you are responsible for setting the maximum size of the threadpool to a reasonable value, so that the system will never attempt to run too many threads at once. Given you've done that, is there a point in having a minimum number of threads? The claim is that there is, because it can save on system resources. But I would argue that the impact of that is really minimal. Each thread has a stack and so consumes a certain amount of memory. But if the thread is idle and the machine doesn't have enough physical memory to handle everything on the system, that idle memory will simply be paged out to disk. Even if the thread exits, the memory it used for its stack still belongs to the JVM process -- the JVM might reuse that memory for something else, but in general, the memory cannot be returned to the operating system for use by other processes. So the memory issue doesn't really have much impact. Depending on the application, it's conceivable that having fewer threads may help slightly, because a thread that is reused might happen to still have some important data in the CPU cache (whereas an idle thread selected to run a task won't have any data in the CPU cache), but the effects of that in the real world are pretty much non-existent. So it doesn't hurt to have a minimum number of threads, but you get no real advantage from that either.

One area that can be very subtle in this regard is the ThreadPoolExecutor, which is configured with two sizes: a core size and an absolute maximum. (By default the core size also acts as the minimum, since idle core threads are kept alive.) In general, a thread is added for each new task until the pool is running the desired core number of threads. After that, new tasks go into the queue, and everything chugs along nicely even though a certain number of tasks may be waiting there. Now say that the system can't keep up with the task queue: the queue is bounded, and it fills up. In response to this, the executor will start adding threads (up to the absolute maximum). But if the system is CPU-bound, or if it is bottlenecked on an external resource, adding those threads is exactly the wrong thing to do. And because this happens only under circumstances such as an increased load, it might be something that you fail to catch in normal testing: during normal testing, you'll usually run with the core number of threads and may not even notice that you've misconfigured the maximum number of threads to a value the system cannot handle.

The converse of this argument is that the thread pool executor can add new threads when a burst of traffic comes, and as long as there are resources available to execute those threads, the executor can handle the additional tasks (and then, once the burst is over, the extra threads can exit and reduce system resource usage). But given the minimal-at-best effect that has on system resources, handling a burst like that doesn't make a lot of sense to me, particularly given the potential for increasing load on the system at exactly the wrong time.
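To see when a ThreadPoolExecutor actually grows, here is a minimal sketch. The class ExecutorGrowthDemo and its method are hypothetical names for this example, and the sizes (core 2, maximum 4, queue capacity 3) are arbitrary assumptions. Every task blocks on a latch so that none finishes while the pool size is being recorded after each submission.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; names and sizes are illustrative only.
class ExecutorGrowthDemo {

    // Submits `submissions` blocking tasks and records the pool size after
    // each one. Keep submissions <= max + queueCapacity, or the default
    // rejection policy will throw RejectedExecutionException.
    static List<Integer> poolSizesAfterEachSubmit(int core, int max,
                                                  int queueCapacity, int submissions) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        List<Integer> sizes = new ArrayList<>();
        try {
            for (int i = 0; i < submissions; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // hold the thread until the demo is done
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                sizes.add(pool.getPoolSize());
            }
        } finally {
            release.countDown(); // let every task finish
            pool.shutdown();
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Dynamic pool: core 2, maximum 4, queue of 3.
        System.out.println(poolSizesAfterEachSubmit(2, 4, 3, 7)); // [1, 2, 2, 2, 2, 3, 4]
        // Static pool: core == maximum, so the pool never grows under load.
        System.out.println(poolSizesAfterEachSubmit(4, 4, 3, 7)); // [1, 2, 3, 4, 4, 4, 4]
    }
}
```

In the first run the pool stays at the core size while the queue absorbs three tasks, and only starts adding threads once the queue is full, which is precisely the moment the system is already falling behind. Setting the core size equal to the maximum, as in the second run, gives a statically sized pool.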

All of that is why I always choose to ignore dynamically sizing threadpools, and just configure all my pools with a static size.

Comments

  • I like the SEDA approach. Be able to measure throughput. Then when adding a thread to the pool see if that improved throughput; if not take it away. http://www.eecs.harvard.edu/~mdw/proj/seda/

    Posted by: cliffwd on June 29, 2007 at 12:23 PM

  • I tend to agree that dynamically adding threads when the system is already saturated would be a bad idea.

    I have a related posting here: http://forum.java.sun.com/thread.jspa?forumID=534&threadID=5185432 But from the angle of a bounded vs. unbounded blocking queue.

    In short, it asks how to appropriately manage tasks that your system can't keep up with, such that you don't run out of memory. Furthermore, add the restriction that the predefined policies (AbortPolicy, CallerRunsPolicy, etc.) are not appropriate - e.g. you can't reject tasks entirely and you can't run them in the calling thread.

    It seems like the SEDA link in the previous comment aims to solve this problem, although I haven't really delved into HOW it does that.

    Posted by: justinmiller on July 02, 2007 at 06:13 AM
