
JAX-WS RI 2.1 benchmark details

Posted by kohsuke on February 2, 2007 at 10:41 AM PST

In this post I'm going to talk about the details of the benchmark Bharath did (kudos to him and the rest of the performance team.) For more about the JAX-WS RI 2.1 release in general, please refer to Vivek's post.

Summary

[Chart: benchmark summary, requests per second normalized to Axis2 = 100%]

The basic idea of the benchmark is to have many clients send many requests to the server concurrently. The server echoes the data back to the client, and we measure how many requests per second the server processes. The test is repeated with 15 different test payloads: the echo(Void|Integer|Float|String|Date|Struct) tests use a small payload, where the data is just one int, string, and so on. The echoSynthetic...K tests use a binary payload, where 1K, 4K, 8K, and 12K represent the size of the binary data. Finally, the echoArray and echoOrder tests use significantly larger payloads.
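To make the echo pattern concrete, here is a minimal sketch of what such an echo operation could look like as a JAX-WS endpoint. The class name, operation name, and URL are illustrative placeholders, not the actual benchmark code, and the benchmark deployed its services into a container rather than the standalone publisher shown here.

    import javax.jws.WebMethod;
    import javax.jws.WebService;
    import javax.xml.ws.Endpoint;

    // Illustrative echo endpoint: it simply returns whatever payload it receives.
    @WebService
    public class EchoService {

        @WebMethod
        public String echoString(String value) {
            return value;  // echo the data back to the client unchanged
        }

        public static void main(String[] args) {
            // Publish on the JAX-WS RI's lightweight HTTP server for local testing;
            // in the benchmark the service was deployed to Glassfish instead.
            Endpoint.publish("http://localhost:8080/echo", new EchoService());
        }
    }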

The chart above shows the summary. "Number of requests processed per second" is normalized so that Axis2 always reads 100%. Depending on the data point, you can see that the JAX-WS RI is 30% to 100% faster.

On smaller payloads, such as echoString and echoInteger, you tend to see a larger difference. This is because the relative weight of the databinding is small, so the test reveals the true difference in the web service layer proper. On larger payloads, more of the time is spent on the databinding side, so the difference tends to become smaller.

Setup

[Photo: SunFire x4600 server]

The client machine is a SunFire x4600 with 15.5 GB of memory and 8 Opteron CPUs, running JDK 1.5.0_10-b03 on Solaris 10. We used this monster just to make sure we had enough clients to keep the server busy all the time. We ran a total of 32 threads on this machine, each using JAX-WS to send SOAP requests to the server as fast as possible. We verified that the server CPU was fully saturated.
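As an illustration of what each client thread does, here is a rough sketch using the standard JAX-WS Dispatch API. The QNames, endpoint URL, and payload below are made-up placeholders, not the actual benchmark code.

    import java.io.StringReader;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.xml.namespace.QName;
    import javax.xml.transform.Source;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.ws.Dispatch;
    import javax.xml.ws.Service;
    import javax.xml.ws.soap.SOAPBinding;

    // Illustrative client worker: fire echo requests in a tight loop and
    // count how many complete, so the harness can compute requests per second.
    public class ClientWorker implements Runnable {
        private static final QName SERVICE = new QName("http://example.org/echo", "EchoService");
        private static final QName PORT    = new QName("http://example.org/echo", "EchoPort");

        private final AtomicLong completed;

        public ClientWorker(AtomicLong completed) {
            this.completed = completed;
        }

        public void run() {
            Service service = Service.create(SERVICE);
            service.addPort(PORT, SOAPBinding.SOAP11HTTP_BINDING, "http://server:8080/echo");
            Dispatch<Source> dispatch =
                    service.createDispatch(PORT, Source.class, Service.Mode.PAYLOAD);

            String request = "<echoString xmlns='http://example.org/echo'>"
                           + "<arg0>hello</arg0></echoString>";
            while (!Thread.currentThread().isInterrupted()) {
                dispatch.invoke(new StreamSource(new StringReader(request)));
                completed.incrementAndGet();
            }
        }
    }

In a setup like this, 32 such workers would run concurrently and the shared counter would be sampled to derive the throughput numbers.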

The server machine is another SunFire x4600. It has the same amount of memory, the same OS, and the same JDK, except that it has only 4 CPUs. We used Glassfish v2 milestone 4 as the container, with -server -Xms2g -Xmx2g as the JVM options. Glassfish is a JavaEE 5 container, which includes StAX, so we are using its StAX implementation, SJSXP.

We tried Axis2 1.1.1 with XMLBeans and JAX-WS RI 2.1 with JAXB. We tried Axis Data Binding (ADB) first, but we noticed that under high load it fails with what looks like concurrency-related data corruption. So we decided to move on to XMLBeans, which is listed next to ADB in their quick start guide. We'll see if we can figure out what's going on with Axis+ADB in the future. It could be a Glassfish problem, who knows.

Each test was run for 2 minutes. The first minute is just warm-up, and the measurement only covers the second minute. So one complete test run takes 2 minutes x 15 tests x 2 toolkits = 1 hour. Our harness runs this 4 times and throws away the first two runs as additional warm-up. The data shown below are the result of the last 2 runs (out of 4).
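To make the measurement window concrete, here is a hedged sketch of how a harness could derive TPS for one test from a shared request counter. The Harness class, measureTps method, and counter are illustrative assumptions, not the actual harness code.

    import java.util.concurrent.atomic.AtomicLong;

    public class Harness {

        // Illustrative measurement window: ignore the first (warm-up) minute,
        // then count the requests completed during the second minute.
        static double measureTps(AtomicLong completed) throws InterruptedException {
            Thread.sleep(60 * 1000);            // warm-up minute, discarded
            long before = completed.get();
            Thread.sleep(60 * 1000);            // measured minute
            long after = completed.get();
            return (after - before) / 60.0;     // transactions per second
        }
    }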

We plan to make the benchmark code available on java.net, so stay tuned.

Results

The raw numbers are shown below:

    test case           Axis2 TPS    Axis2 stddev    JAX-WS TPS    JAX-WS stddev
    echoVoid            11712.373    90.376          19620.673     48.772
    echoInteger          8753.796    13.278          17798.787     18.859
    echoFloat            8840.666    22.376          17613.225     56.008
    echoString           8728.365    17.779          17696.861      7.601
    echoDate             8175.369    27.88           17137.887     19.917
    echoStruct           7761.562    26.211          16753.703     90.206
    echoSynthetic1K      6599.274    24.458          12754.212    162.251
    echoSynthetic4K      4004.815     3.561           7701.041     18.397
    echoSynthetic8K      2773.538     0.541           4867.424     38.1
    echoSynthetic12K     2071.586     2.372           3343.501     48.81
    echoArray40          1640.998     0.489           2375.238      1.419
    echoArray80           923.301     1.065           1258.433      7.039
    echoArray120          643.322     0.166            850.133      3.641
    echoOrder200          516.847     0.218            715.522      2.389
    echoOrder500          210.802     0.022            284.707      0.192

'TPS' stands for 'transactions per second' and represents the number of requests that were processed per second. 'stddev' is the standard deviation between the different runs, so you can use it to judge how many digits of the TPS figures you can trust.

Analysis

We've been working on this for a long time now, and coincidentally another group posted another web service stack benchmark just a few days ago. While we have only had very limited time to look at it, we noticed that their benchmark, despite being run on a 4-way Xeon system, records roughly 3,000 reqs/sec (for example, on the echoVoid test). Our benchmark recorded more than 10,000 reqs/sec, even for Axis2, on a 4-way Opteron system. While one cannot really compare reqs/sec across different systems in a meaningful way, we nevertheless wonder if their Xeon system could have done much better than 3,000 reqs/sec.

It's also clear we've got more work to do here. We need to get to the bottom of the ADB issue for one thing. We also want to test the scalability of these stacks.

In the end, however, what really matters to you is your own application with your own data. So we want you to compare the toolkits yourself, with your own use cases, and let us know what your findings are.
