Jean-Francois Arcand

Jean-Francois Arcand's Blog

Can a Grizzly run faster than a Coyote?

Posted by jfarcand on March 19, 2006 at 07:50 PM | Comments (10)

Does a coyote run faster than a grizzly? Or, put differently, can an NIO-based HTTP connector be as fast as a traditional blocking IO HTTP connector or a C HTTP connector? The next few paragraphs compare Tomcat 5.5.16, using both the Coyote HTTP11 connector and the Tomcat Native (version 1.1.2) connector (aka APR), with the GlassFish Grizzly NIO-powered HTTP connector. Grizzly is an NIO extension of the HTTP11 implementation.

But don't be fooled by the numbers I'm going to publish. My goal here is to dispel the myth that NIO non-blocking sockets cannot be used with the HTTP protocol. This blog is not against Tomcat, and I'm still part of the Tomcat community (although I'm not helping much these days, I'm very interested in Costin's work on NIO). OK, enough ranting.....

First, if my numbers don't match your real-life application, I would be interested to hear about it. If you like the APR/OpenSSL functionality and think it should be included in GlassFish, I will be more than happy to port it to GlassFish. But let's wait for the numbers before saying yes :-)

Oh... BTW, some Grizzly numbers have already been published as part of the SJSAS PE 8.2 specJ2004 results. All results can be found here.

Now let's get down to serious business....

Differences between Tomcat and GlassFish
First, in order to compare the two, let's explore the differences between the products. Since GlassFish is a J2EE Container, the bundled WebContainer has to support more extensions than Tomcat. Those extensions mainly consist of supporting EJB and the Java Authorization Contract for Containers (JACC). Both extensions have an impact on performance because, internally, extra event notifications need to happen (in Catalina, that means LifecycleEvent).

A perfect integration would mean no performance regression when extensions are added, but unfortunately having to support EJB adds a small performance hit. Fortunately, the Java Authorization Contract for Containers doesn't impact performance. Hence the best comparison would have been JBoss versus GlassFish, or Tomcat with Grizzly in front of it. But I'm too lazy to install JBoss....


Out-of-the-box differences

The main differences are:
+ GlassFish has Single Sign-On enabled; Tomcat doesn't
+ GlassFish has access logging enabled; Tomcat doesn't
+ GlassFish starts the JVM with java -client; Tomcat doesn't set either flag.

Let's turn off the differences in GlassFish by adding the following to domain.xml, just before the closing </http-service> tag:


<property name="accessLoggingEnabled" value="false" />
<property name="sso-enabled" value="false" />
</http-service>

and start both products using java -server. For all the benchmarks I'm using Mustang (Java SE 6 beta):


Java(TM) SE Runtime Environment (build 1.6.0-beta2-b75)
Java HotSpot(TM) Server VM (build 1.6.0-beta2-b75, mixed mode)

Also, Grizzly has a different cache mechanism based on NIO, whereas Tomcat caches static files in memory using its naming implementation. The Grizzly cache is configured using:


<http-file-cache file-caching-enabled="true" file-transmission-enabled="false" globally-enabled="true" hash-init-size="0" max-age-in-seconds="600" max-files-count="1024" medium-file-size-limit-in-bytes="9537600" medium-file-space-in-bytes="90485760" small-file-size-limit-in-bytes="1048" small-file-space-in-bytes="1048576"/>

And I've turned off the onDemand mechanism:

-Dcom.sun.enterprise.server.ss.ASQuickStartup=false

so that GlassFish starts fully.


Let's push ahead, folks, let's push ahead!

ApacheBench (ab)
Let's start with a well-known stress tool called ab (google ApacheBench if you don't know it). I will use ab to compare the performance of:

+ small, medium, and very large static files (large and medium give similar results)
+ basic Servlet
+ basic JSP (the default index.jsp page from Tomcat)

I'm going to compare Tomcat HTTP11, Tomcat APR and GlassFish. I have tried to come up with the best possible configuration for Tomcat. I got the best numbers with:


<Connector useSendFile="true" sendfileSize="500" port="8080" maxHttpHeaderSize="8192" pollerSize="500" maxThreads="500" minSpareThreads="25" maxSpareThreads="250" enableLookups="false" redirectPort="8443" acceptCount="5000" connectionTimeout="20000" disableUploadTimeout="true" />

I'm configuring more threads than the expected number of simultaneous connections (300) here so HTTP11 can be tested properly (since HTTP11 uses one thread per connection). I've also tried adding:

firstReadTimeout="0" pollTime="0"

but the results weren't as good as with the defaults.

For Grizzly, I've used:


<http-listener id="http-listener-1" ... acceptor-thread="2" .../>
<request-processing header-buffer-length-in-bytes="4096" initial-thread-count="5" request-timeout-in-seconds="20" thread-count="10" thread-increment="10"/>

Yes, you read that correctly. Tomcat needs 500 threads where Grizzly needs only 10. NIO non-blocking is fascinating, isn't it?
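To make the thread-count difference concrete, here is a minimal sketch of the non-blocking pattern. It is not Grizzly's code: the class name TinySelectorServer, the canned response and the single-buffer request handling are all assumptions made for illustration. The point is only that one thread and one java.nio Selector can accept and service many connections, whereas a blocking connector parks one thread per connection.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical illustration (not Grizzly): one thread, one Selector, many connections.
class TinySelectorServer implements Runnable {
    private static final byte[] RESPONSE =
        "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nOK"
            .getBytes(StandardCharsets.US_ASCII);

    private final Selector selector;
    private final ServerSocketChannel server;

    TinySelectorServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 picks a free port
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    void close() throws IOException {
        selector.close(); // wakes up the loop in run(), which then exits
        server.close();
    }

    @Override
    public void run() {
        ByteBuffer in = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select(); // the only place this thread blocks
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        serve((SocketChannel) key.channel(), in);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector closed or listener failed: leave the loop
        }
    }

    // Reads whatever has arrived and answers with the canned response.
    // A real connector would parse the request and register OP_WRITE
    // instead of looping on a partial write.
    private static void serve(SocketChannel client, ByteBuffer in) {
        try {
            in.clear();
            int n = client.read(in);
            if (n > 0) {
                ByteBuffer out = ByteBuffer.wrap(RESPONSE);
                while (out.hasRemaining()) client.write(out);
            }
            if (n != 0) client.close(); // replied, or the peer closed (n == -1)
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }
}
```

To try it, run new Thread(new TinySelectorServer(8080)).start() and point a browser or ab at port 8080. A production connector hands request processing to a small worker pool so that a slow Servlet cannot stall the selector thread.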

The ab command I've used is:

% ab -q -n1000 -c300 -k http://perf-v4.sfbay.sun.com:8080/XXX

Ready to see numbers? Not yet, here is the machine setup:

server

OS : Red Hat Enterprise Linux AS release 4 (Nahant Update 2)
ARCH : x86
Type : x86
CPU : 2x3.2GHz
Memory : 4 GB

client

OS : RedHat Enterprise Linux AS 4.0
ARCH : x86
Type : x86
CPU : 2x1.4GHz
Memory : 4GB

OK, now the numbers. For each test, I ran the ab command 50 times and took the mean (for the large static resource, I ran it between 10 and 13 times because it takes a while). The numbers don't change if I remove the ramp-up time for each Connector, so I decided not to remove it.
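The averaging step is plain arithmetic, sketched below. The class AbRuns, its method names and the sample values are made up for illustration; they are not the measurements reported in this post. Each value stands for one figure taken from one ab run, for example the "Requests per second" line that ab prints.

```java
import java.util.Arrays;

// Hypothetical helper: average one figure per ab run over repeated runs.
class AbRuns {
    /** Arithmetic mean of the per-run results. */
    static double mean(double[] runs) {
        if (runs.length == 0) throw new IllegalArgumentException("no runs");
        return Arrays.stream(runs).sum() / runs.length;
    }

    /** Mean after dropping the first warmup runs (the ramp-up period). */
    static double meanWithoutRampUp(double[] runs, int warmup) {
        return mean(Arrays.copyOfRange(runs, warmup, runs.length));
    }

    public static void main(String[] args) {
        double[] runs = {1000.0, 1200.0, 1100.0, 1100.0}; // made-up samples
        System.out.println("mean, all runs:       " + mean(runs));
        System.out.println("mean, without run #1: " + meanWithoutRampUp(runs, 1));
    }
}
```

Comparing the two means is one way to check whether ramp-up time matters for a given Connector.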

Small static file (2k)

% ab -q -n1000 -c300 -k http://perf-v4.sfbay.sun.com:8080/tomcat.gif

4k.jpg
Grizzly: 4104.32 APR: 4377.2 HTTP11: 4448.08
Here HTTP11 is doing a very good job, whereas Grizzly seems to be out hunting and only maybe servicing requests. Why? My next blog will explain a problem I'm seeing with MappedByteBuffer, small files and FileChannel.transferTo.
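For readers who haven't used the two NIO calls named above, here is a minimal sketch of each way to push a file to a channel. It is not Grizzly's file cache code; the class FileSend and its method names are invented for illustration, and both methods assume a blocking target channel.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of two NIO ways to send a static file.
class FileSend {
    /** Maps the file into memory, then writes the mapped bytes to the target. */
    static long sendMapped(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            long written = 0;
            while (buf.hasRemaining()) {
                written += target.write(buf);
            }
            return written;
        }
    }

    /** Lets the channel implementation move the bytes, avoiding a user-space copy where the OS allows it. */
    static long sendTransferTo(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0;
            long size = fc.size();
            while (pos < size) {
                // transferTo may move fewer bytes than requested, hence the loop
                pos += fc.transferTo(pos, size - pos, target);
            }
            return pos;
        }
    }
}
```

A cache would keep the MappedByteBuffer around and reuse it across requests rather than mapping the file on every call as this sketch does.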

Medium static file (14k)

% ab -q -n1000 -c300 -k http://perf-v4.sfbay.sun.com:8080/grizzly2.gif

small.jpg
Grizzly: 746.27 APR: 749.63 HTTP11: 745.65
OK, here Grizzly does better: it edges out HTTP11 and is within half a percent of APR (thanks, MappedByteBuffer).

Very large static file (954k)

% ab -q -n1000 -c300 -k http://perf-v4.sfbay.sun.com:8080/images.jar

large.jpg
Grizzly: 11.88 APR: 10.5 HTTP11: 10.6
Hmm... here APR has connection errors (a mean of 10), as does HTTP11 (a mean of 514), and keep-alive wasn't honored for all connections. Grizzly is fine on that run.

Simple Servlet

% ab -q -n1000 -c300 -k http://perf-v4.sfbay.sun.com:8080/tomcat-test/ServletTest

Servlet.jpg
Grizzly: 10929.93 APR: 10600.71 HTTP11: 10764.67
Interesting numbers... but I can't say whether it's the Connector or Catalina (the Servlet Container). GlassFish's Catalina is based on Tomcat 5.0.x (plus several performance improvements).

Simple JSP

% ab -q -n1000 -c300 -k http://perf-v4.sfbay.sun.com:8080/index.jsp

JSP.jpg
Grizzly: 1210.49 APR: 1201.09 HTTP11: 1191.57
The result here is amazing because Tomcat supports JSP 2.0, whereas GlassFish supports JSP 2.1. Kin-Man, Jacob and Jan (to name a few) don't seem to have introduced performance regressions :-).

One thing I don't like about ab is that it doesn't expose outliers: some requests might take 0.5 seconds, others 5 seconds. All requests are counted, however long they took to be serviced. I would prefer a Connector that avoids outliers (or at least I don't want to be the outlier when I log on to my bank account!). Let's see what the second benchmark can tell us:

Real world benchmark
The purpose of my second benchmark is to stress the server with a real-world application that contains complex Servlets, JSPs and database transactions. I think ab is a good indicator of performance, but it focuses more on throughput than on scalability. The next benchmark simulates an e-commerce site. Customers can browse through a large inventory of items, put those items into shopping carts, purchase them, open new accounts, get account status: all the basic things you'd expect. There is also an admin interface for updating prices, adding inventory, and so on. Each customer will execute a typical scenario at random; each hit on the website is counted as an operation.

The benchmark measures the maximum number of users that the website can handle, assuming that 90% of the responses must come back within 2 seconds and that the average think time of the users is 8 seconds. The results are:

j-tpcw.jpg
Grizzly: 2850 APR: 2110 HTTP11: 1610

Note: I tried APR with 10 threads and with 1000 threads, but the best results were obtained using 500 threads.
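The pass criterion of this benchmark (90% of the responses within 2 seconds) can be sketched as a percentile check. This is not the benchmark's own code: the class ResponseTimeCheck, the nearest-rank percentile definition and the sample data are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration of a "90% of responses within 2 seconds" check.
class ResponseTimeCheck {
    /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
    static double percentile(double[] seconds, double p) {
        if (seconds.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = seconds.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** True when the 90th percentile response time is at most 2 seconds. */
    static boolean passes(double[] seconds) {
        return percentile(seconds, 90.0) <= 2.0;
    }

    public static void main(String[] args) {
        // Made-up response times: nine fast requests and one 5-second outlier.
        double[] samples = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.9, 5.0};
        System.out.println("90th percentile: " + percentile(samples, 90.0));
        System.out.println("passes: " + passes(samples));
    }
}
```

Unlike a mean, this criterion bounds how many slow requests are tolerated: the load can only be raised until more than one response in ten takes longer than 2 seconds.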

Conclusion
Draw your own conclusions ;-) My goal here was to demonstrate that NIO (non-blocking socket) HTTP Connectors are ready for prime time. I hope the myth is dispelled....

Once I have a chance, I would like to compare Grizzly with Jetty (since Jetty has an NIO implementation). I just need to find someone who knows Jetty and can help me configure it properly :-)

Finally, during this exercise I found a bug with the 2.4 kernel that made performance really bad. Also, it is quite possible that you will run benchmarks where Tomcat performs better. If so, I would be interested to learn how Tomcat was configured.....


Comments
Comments are listed in date ascending order (oldest first)

  • I thought we found that there is no need to switch off on-demand initialization and the performance issue was due to the VM bug you mentioned in the conclusion.

    So, what is the use of switching off on-demand initialization?

    Posted by: binod on March 19, 2006 at 09:49 PM

  • In doing some perf benchmarks and looking at more rich UI interactions over HTTP via XML, JSON, or DWR, there's a lot more information being posted from the client. We've benchmarked some pretty poor performance from popular containers, including Tomcat, but I'm wondering if you could add numbers as they relate to posting 500B to 5KB transmissions up to the server?

    Posted by: jhook on March 19, 2006 at 09:50 PM

  • To Binod: We closed the bug after we found the 2.4 problem. I did all the tests before that and didn't want to respin everything. There is no performance hit from the onDemand stuff :-)

    Posted by: jfarcand on March 20, 2006 at 01:53 PM

  • To Jacob: Can you elaborate more on the way you want the numbers? Do you want to measure performance of POSTing 500b to 5KB or GETing those? With ab it is only GET. If you have a link to the benchmark you are using, I can run it on GlassFish.

    Posted by: jfarcand on March 20, 2006 at 01:56 PM

  • it's out of curiosity, we're trying to analyze possible slow downs with posting large amounts of info and then processing, but it seems that simply requesting the parameter that contains large amounts of data caused a bottleneck in the way that the inputstream is handled by the containers.
    I guess a good test would be to post a parameter value of varying sizes: 500b to 5KB of XML or random data and then on the servlet, simply just get the parameter and ACK back to the client. There's a lot of focus on download performance from these connectors, but with AJAX, DWR, JSF, etc, it'd be nice to see if there's differences with upload performance with posting parameter information.

    Posted by: jhook on March 20, 2006 at 02:12 PM

  • Jacob: all Servlet API provided methods to get parameters are inherently slow. This also includes readLine. I suppose you were talking about that.

    The benchmark itself is bogus, but it's to be expected when a benchmark is done by a competitor (I like the last test where the APRized HTTP manages exactly 500 more connections than the regular HTTP). It's also quite irrelevant, everything is most likely fast enough anyway.

    My problem is the "My goal here was to demonstrate that NIO (using non blocking sockets) HTTP Connector are ready for prime time. I hope the myth is over....". The problem is that the myth, as far as I can tell, is not really a myth. NIO has a significant number of implementation (due to its complexity ?) issues, or platform specific issues, like the one you found during your testing. If you run into an issue like this, you have to either upgrade your VM/OS, or work around it in some way. This is not very production friendly. In addition to this, NIO doesn't give access to many native features, and you may have trouble scaling well to a larger amount of threads, which some applications may require.

    People suggested that I give APR/OpenSSL a try, and it indeed worked well enough, giving real independence on the critical IO code, while consolidating all the work done by the ASF on httpd.

    Posted by: remm on March 20, 2006 at 03:36 PM

  • Remy, why do you think the benchmarks are bogus? I've used ab in part because every time performance is discussed on tomcat-dev, ab is used as an indicator, so I gave it a try even if I don't like ab..... The last benchmark doesn't use ab but a JMeter-like implementation, which is closer to the real world, I think. About the "by a competitor": well, it was clear from the beginning that I was ready to report whatever numbers I got. The tomcat.gif result is a clear example of that.

    About the VM bug, right now we think it is a RedHat thread pool problem (I was using an rh8 with getconf GNU_LIBPTHREAD_VERSION = linuxthreads-0.10). When I use a 2.4 kernel with an updated thread pool, I don't get the problem. It is most probably an OS problem (still, your point about upgrading the OS is valid).

    Posted by: jfarcand on March 20, 2006 at 03:54 PM

  • Remy: With the parameter fetching, I'm glad that this is a confirmed issue, but I was wondering if there's any way to fix this within the shackles of the connector implementations? I guess I look at what people are doing on the net now with rich clients, both browser and RPC, and I'm wondering where these containers can go with increasing the responsiveness in the first part of these transactions. I would almost consider this an evolutionary change in requirements, with container developers actively seeking Web 2.0 nirvana (or whatever).

    Posted by: jhook on March 20, 2006 at 05:07 PM

  • Jacob: I think we're doing all we can parsing wise, but the operation itself is expensive. You have to URL decode (cheap), char convert (not cheap), parse (cheap), create Strings and all the other objects (not that cheap). If you stream uploaded data (file upload), it should be fast, but if you want to send 1000 URL encoded form parameters, there's going to be performance issues, I'm afraid.

    About the validity of the benchmark, as you know it's fairly complicated business. If you want to benchmark "fairly" you have to use either OOB settings, or, if you do not, learn to tweak the other implementation inside out. The configuration used here looks really wrong, and of course the usage of Mustang doesn't seem a very good idea for now either.

    Posted by: remm on March 21, 2006 at 02:48 AM

  • Tomcat's HTTP 1.1 connector has horrible performance for "real world" applications. I have spent a lot of time inspecting it and comparing performance to major commercial application servers.

    Fundamentally, the combination of blocking I/O and using the same thread pool to handle I/O and servlets is the issue. I/O might get the best performance at 500 threads (for blocked I/O), but the servlet pool probably is most optimal with 20 threads. You either have horrible I/O issues due to too few threads but decent servlet performance, or tons of threads thrashing around in the servlet engine unoptimally.

    If you really want to stress Coyote out, do the following

    use a servlet or jsp that spins for 50ms (as if it is doing work), sleeps for 100ms (as if it is talking to a database), and spits out about 35k of html.

    Use any benchmarking tool that behaves like real users (http keep alives with reasonable intervals between requests at a minimum).

    With a good http connector you can go up to 2000 http connections easily. With Coyote, good luck maintaining peak throughput past 150.

    I have not investigated this issue on Tomcat 5.5 with the new connector, perhaps some of the above has changed.

    Posted by: sputnik78 on September 20, 2006 at 12:49 PM


