What does it mean to be faster?
As a performance engineer, I'm often asked which X is faster (for a variety of X). The answer to that question always depends on your perspective.
Today, I'll talk about the answer in terms of hardware and application servers.
People quite often measure the performace of their appserver on, say, their laptop and a 6-core, 24-thread Sun Fire T1000
and are surprised that the cheaper laptop can serve single requests much faster
than the more expensive server.
There are technical reasons for this that I won't delve into -- there are architecture guides that go into all that. Rather I want to explore the question
of which of these machines is actually faster, particularly in a Java EE context. In an appserver, you typically want to process multiple requests at the same time. So looking at the speed of a single request isn't really interesting: what
is the speed of multiple requests?
To answer this, I took a simple program that does a long-running nonsense calculation. Running this on my laptop and 24-thread T1000, I see the following times
(in seconds) to calculate X items:
As you'd expect, the performance of the laptop degrades linearly, to where it takes 16.6 seconds to perform 24 calculations. The performance of the T1000 isn't
a linear scale, but even though it takes twice as as the laptop long to perform
a single calculation, it can perform 24 calculations in one-third of the time of the laptop.
In the context of an appserver, think of the calculation as the time required for the business methods of your app. I've walked through this explanation a number of times, and often I'm told that the business method is the critical part of
the app, and it must be done in .6 seconds for each user -- and hence the throughput of the T1000 isn't important. And that's fine: if you need to calculate a single method in .6 seconds, then you must use the single-threaded machine. But if you need to calculate two of those at the same time, then you'll need to get two of those machines, and if you need to calculate 24 of them, you'll need to get 24 machines.
So this brings us back to our question: which machine is faster? And it depends
on what you need. If you need to only do one calculation at a time, then the laptop is faster. If you need to do 3 or more calculations at the same time, then the T1000 is faster. Which is faster for you will depend on your application, your traffic model, and many other variables. As always, the best thing is to try your application, but if that's not feasible, be very careful about extrapolating whatever data you do have: you cannot simply extrapolate performance data from
a simple (single-threaded) model to a complex system.