
ab considered harmful

Posted by sdo on March 23, 2007 at 3:09 PM PDT


For the fifth time this year, I've been contacted by a distraught user
claiming that glassfish doesn't scale or run well based on results seen
from ab (the Apache
Benchmark). And so again, I've had to explain why ab is a terrible
tool to use to measure the performance of your application (or web)
server.

To be fair, glassfish does have some out-of-the-box settings that make
its benchmark test results less than ideal. Jeanfrancois has this
excellent blog that describes the basic settings you need to change
before even beginning to do serious performance analysis. I'm hopeful
that we'll have better profiles by the time FCS rolls around so that a
performance-based profile is easily available to end users. [There are
some conflicts between optimal settings for developers and production,
which is one cause of our problem here, not to mention some historical
baggage we have for backward compatibility. But that's a topic for
another day.]

But once you have a reasonably configured appserver, ab is still not
the best tool to use to measure your performance. The biggest problem
is that ab is a single-threaded process, and you're typically
interested in measuring the performance of your multi-CPU machine
running the multi-threaded appserver. You can (I hope) see the inherent
problem: you have 1 CPU of client-side resources and, say, 4 CPUs of
server-side resources. Which side will become the bottleneck first? The
client side -- meaning all you've accomplished is measuring the
performance of ab itself.

This all depends on what you're measuring, of course. Lately, using ab
to measure the retrieval of a single static image seems to be all the
rage, and this is the worst possible test. Let's say that it takes the
appserver 50% longer to process the request for http://host/foo.gif
than it takes for ab to send the request and parse the response to make
sure it came back correctly (and drain the socket of all the data).
Even that is unrealistic, but what it means is that you'll end up using
1.5 CPUs on your appserver by the time your client gets saturated.
Nothing you do to the appserver will make this better; the bottleneck
is ab.
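
The arithmetic behind that example can be written down as a rough sketch. The function names here are made up for illustration, and the model assumes CPU time per request is the only cost that matters on either side:

```python
# Back-of-the-envelope model of the single-image example (illustrative names).

def server_cpus_busy(server_cost_ratio: float, client_cpus: float = 1.0) -> float:
    """CPUs' worth of server work generated once the client is saturated.

    server_cost_ratio is server CPU time per request divided by client CPU
    time per request (1.5 means the server takes 50% longer than the client).
    """
    return client_cpus * server_cost_ratio


def client_is_bottleneck(server_cost_ratio: float, server_cpus: float,
                         client_cpus: float = 1.0) -> bool:
    """True if the client runs out of CPU before the server does."""
    return server_cpus_busy(server_cost_ratio, client_cpus) < server_cpus


if __name__ == "__main__":
    busy = server_cpus_busy(1.5)  # single-threaded client: 1 CPU
    print(f"server CPUs busy at client saturation: {busy}")
    print(f"server utilization on 4 CPUs: {busy / 4:.1%}")
    print(f"client is the bottleneck: {client_is_bottleneck(1.5, 4)}")
```

Under these assumptions, a 4-CPU server never gets past 1.5 busy CPUs, no matter how it is tuned.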

So now you're thinking: what if I have multiple CPUs on my client and I
use that -c option to ab: the option that's supposed to send
"concurrent" requests. Won't that scale? Unfortunately not, because the
"concurrent" requests are still processed sequentially by ab. ab has
only a single thread available to it, so all it does is send multiple
requests (one after the other), read any responses that have been sent
back (still only one at a time), send any new requests, and so on. It
is still limited to utilizing at most a single CPU.

And what of the timings you get out of this? The single ab thread sends
a request at time 0. Then if it has other responses to process, it will
do so. Say there are 10 more responses to process (which means draining
the socket of data, and sending the next request on the socket), and
then say ab takes 10 milliseconds for each request. Only then will it
again look for a response to the original request. If the response to
the original request is waiting for ab, ab will report that it took 110
milliseconds for that request to be processed. But that's only because
ab itself spent 100 milliseconds handling other details; it has
erroneously charged all of that time it spends sequentially processing
data to the pending response. Client-side overhead in any
load-generating tool is a problem, but the sequential design of ab
makes the problem much worse in ab than in other load generators.
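
Here is a toy model of that accounting. It is not ab's actual code: it assumes the client stamps the finish time only after it has drained the response, and the 5 millisecond server time is an invented number used to make the point.

```python
# Toy model of latency as seen by a strictly sequential client (assumptions:
# the request goes out at t=0, and the finish time is recorded only after the
# client has drained the response from the socket).

def reported_latency_ms(server_ms: float, queued_responses: int,
                        handle_ms: float) -> float:
    """Latency the client reports for one request.

    server_ms: when the response actually arrives at the client.
    queued_responses: other responses the client handles first.
    handle_ms: client time to drain one response and send the next request.
    """
    client_free_at = queued_responses * handle_ms
    # The client can only start reading once it is free AND the data is there.
    start_read = max(client_free_at, server_ms)
    return start_read + handle_ms


if __name__ == "__main__":
    # Hypothetical server that answers in 5 ms.
    print(reported_latency_ms(server_ms=5, queued_responses=10, handle_ms=10))
    print(reported_latency_ms(server_ms=5, queued_responses=0, handle_ms=10))
```

With 10 queued responses at 10 milliseconds each, the model reports 110 milliseconds for a response that was ready after 5; with nothing queued it reports 15. The difference is entirely client-side.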

Finally, what about those responses? If you run ab -c 100, there are
100 channels open to the server, and ab will report how much throughput
comes through those 100 channels. But it won't tell you anything about
fairness: 100 responses could come from one channel, or 1 response
could come from each channel, and ab will give you the same answer. In
fact, given its sequential design, an application server that responds
unfairly to requests will show better response times in ab than an
application server that responds to requests fairly. But somehow, I
don't think the actual users of the first application server will be
all too happy (well, one of them will be quite happy indeed!).
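
To make the fairness point concrete, here is a small sketch that compares two made-up per-channel response counts with the same total. It uses Jain's fairness index as one possible measure; that is my choice for illustration, not something ab reports and not necessarily how any other tool defines fairness.

```python
# Same aggregate throughput, very different fairness (illustrative data).

def throughput(per_channel_counts, elapsed_s):
    """Responses per second summed over all channels."""
    return sum(per_channel_counts) / elapsed_s


def jain_index(per_channel_counts):
    """Jain's fairness index: 1.0 is perfectly fair, 1/n is one winner."""
    n = len(per_channel_counts)
    total = sum(per_channel_counts)
    squares = sum(x * x for x in per_channel_counts)
    if squares == 0:
        raise ValueError("no responses recorded on any channel")
    return (total * total) / (n * squares)


if __name__ == "__main__":
    unfair = [100] + [0] * 99   # one channel gets every response
    fair = [1] * 100            # every channel gets one response
    for name, counts in (("unfair", unfair), ("fair", fair)):
        print(name, throughput(counts, elapsed_s=1.0), jain_index(counts))
```

Both runs show 100 responses per second, which is all a throughput number can tell you; only the per-channel breakdown separates them.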

Are there alternatives to ab? I'm quite happy with faban, an open-source
benchmarking toolkit developed by some of my colleagues. It is
multi-threaded, can access arbitrary URLs, and measures fairness among
other things. It is trickier to set up than ab, though in a future blog
I'll explore how it can be used as an ab alternative. Until then, if
someone offers you ab, just say no.
