A First Look at V3 Performance
For most of the year, I've been working on session replication code for Sailfin. When I came back to work with the Glassfish performance team, I found that we had some pretty aggressive goals around performance, particularly considering that Glassfish V3 has a completely new architecture, includes rewrites of major sections of code, and implements the new Java EE 6 specification. In those terms, Glassfish V3 is essentially a .0 release, and I was convinced we'd see major performance regressions from the excellent performance we achieved with Glassfish V2.
Color me surprised; in the end, we met or exceeded all of our goals for V3 performance. For the most part, our performance tests are based on customer applications, industry benchmarks, and other proprietary code that we can't open source (nor share results of). But I can discuss some of them, and in this blog we'll look at our first set of sanity tests. These exercise the basic servlet operations of the web container; we'll look at three such tests:
- HelloServlet -- the "Hello, world" of servlets; it simply prints out 4 lines of HTML in response to each request, incrementing a global counter to keep track of the total number of requests.
- HelloServletBeanJsp -- that same servlet, but now for each call it instantiates a simple JavaBean and then forwards the request (and bean) to a JSP page (i.e., the standard MVC model for servlets).
- HelloSessions -- the hello servlet that keeps track of a session counter (in a session attribute) instead of a global counter.
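The only real difference between the first and third tests is where the counter lives. That distinction can be sketched in plain Java (the names here are hypothetical and the real tests are servlets, which I've elided -- this is just the counting logic):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the two counting styles: HelloServlet bumps one global counter
// shared by all requests, while HelloSessions keeps a counter per session.
public class CounterSketch {
    static final AtomicLong globalCount = new AtomicLong();              // HelloServlet style
    static final ConcurrentHashMap<String, AtomicLong> sessionCounts =
            new ConcurrentHashMap<>();                                   // HelloSessions style

    static long hitGlobal() {
        return globalCount.incrementAndGet();
    }

    static long hitSession(String sessionId) {
        // one counter per session id, created on first access
        return sessionCounts
                .computeIfAbsent(sessionId, id -> new AtomicLong())
                .incrementAndGet();
    }

    public static void main(String[] args) {
        hitGlobal();
        hitGlobal();
        hitSession("A");
        hitSession("A");
        hitSession("B");
        System.out.println(globalCount.get() + " "
                + sessionCounts.get("A") + " " + sessionCounts.get("B"));
        // prints: 2 2 1
    }
}
```

The session variant does strictly more work per request -- session lookup, attribute get/set -- which is why it's a separate test.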
Our goal here was that V3 would be at least as fast as V2 on these tests and remain ahead of the pack among open source application servers. The application servers are hosted on a Sun X4100, which is a 4 core (2 chip) AMD box running Solaris 10. The load is driven using the Faban HTTP Benchmarking program (fhb), which can drive load to a single URL from multiple clients (each client running in a separate thread -- an important consideration in a load generator). As a first pass, we run 20 users with no think time to see how much total load we can generate in the server:
I've normalized the chart to V2 performance. What we see is that even on the simplest test -- the HelloServlet -- V3 manages to increase total server throughput by a few percentage points. And while I was concerned about the effects of a new architecture, the OSGi classloading architecture and the reworking of the Glassfish classloading structure let us take care of a long-standing issue in the V2 classloader -- so now every time we call Beans.instantiate() (or do anything else related to class loading), we operate much more quickly. When it comes to session management, V2 and V3 come out the same.
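That Beans.instantiate() point deserves a word: the call first probes the classloader for a serialized "&lt;name&gt;.ser" resource and only then falls back to loading the class and invoking its no-arg constructor, so every call exercises the classloader's lookup path. A minimal illustration:

```java
import java.beans.Beans;

// Beans.instantiate first looks for a serialized "java/lang/Object.ser"
// resource on the classloader, and only when that lookup fails does it load
// the class and call its no-arg constructor -- so a slow classloader resource
// lookup penalizes every single call.
public class InstantiateDemo {
    public static void main(String[] args) throws Exception {
        ClassLoader cl = InstantiateDemo.class.getClassLoader();
        Object o = Beans.instantiate(cl, "java.lang.Object");
        System.out.println(o.getClass().getName());
        // prints: java.lang.Object
    }
}
```

Frameworks and containers call this sort of thing constantly, which is why a classloader fix shows up even in a trivial servlet benchmark.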
The other columns in the chart represent JBoss 5.1 and Tomcat 6.0.20; our goal was to beat those containers on these tests, and we did. Take that with a grain of salt, however: I am not an expert in those containers, and there may be container tunings that I missed. In fact, these tests are done with only a small amount of tuning:
- JVM options for all products are set to -server -Xmx2000m -Xms2000m -XX:NewRatio=2 -Xss128k. Using Sun's JDK 6 (6u16) means that ergonomics will kick in and use the parallel collector with 4 GC threads on this machine.
- The thread pool size for all products is set to 10 (both min and max; I'm not a fan of dynamically resizing thread pools).
- The server will honor all keep-alive requests (fhb specifies this automatically) and allow up to 300000 requests from a single client before closing the socket (maxKeepAliveRequests).
- The server will use 2 acceptor threads and a backlog of 300000 requests (that tuning is really needed only for the scalability test discussed below).
- For JBoss, I followed the recommendation to use the Tomcat APR connector. As far as I can tell, Netty is not integrated into JBoss 5, though if you know otherwise, I'd love a link to the details.
- For Tomcat, I used the Http11NioProtocol connector.
- For JSPs, genStrAsCharArray is set to true and development is set to false in default-web.xml.
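For reference, those two JSP settings go on the Jasper JspServlet's init-params in default-web.xml. This is the usual shape of that entry (double-check it against the default-web.xml your server ships; surrounding elements are elided):

```xml
<servlet>
  <servlet-name>jsp</servlet-name>
  <servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
  <init-param>
    <!-- skip development-mode recompilation checks on every request -->
    <param-name>development</param-name>
    <param-value>false</param-value>
  </init-param>
  <init-param>
    <!-- emit template text as char arrays rather than Strings -->
    <param-name>genStrAsCharArray</param-name>
    <param-value>true</param-value>
  </init-param>
</servlet>
```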
Happy with a simple throughput test, I proceeded to some scalability tests. For these tests, we also use fhb -- but in this case we run multiple copies of fhb, each with 2000 users and a 1 second think time. This allows us to vary the number of users and test within a pre-defined response time (at most 1 second, or the client will fall behind its desired think time). The number of connections that we can run in each test varies with the work involved: the HelloServlet test had an initial throughput of almost 42,000 operations per second, so we were able to test up to 56,000 connected users with 28 copies of fhb (distributed among 7 x4100 machines, each core essentially driving 2000 users). The test involving forwarding to a JSP does almost twice the work, so we can run only 32,000 users within these timing constraints; for the session test we can run 40,000 users.
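Each of those copies boils down to a single fhb invocation along these lines (the flags are as I remember them from the Faban docs, and the host/path are placeholders -- check fhb's usage output before copying this):

```shell
# one of the 28 load-driver copies: 2000 client threads, 1-second think
# time (-W takes milliseconds), 60s ramp-up / 300s steady / 60s ramp-down
fhb -c 2000 -W 1000 -r 60/300/60 http://server:8080/HelloServlet
```

The throughput runs earlier in this entry are the same command with -c 20 and no -W flag.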
Here are the results: So despite all my initial qualms, V3 performed admirably; it handled those 56,000 simultaneous clients without breaking a sweat. [Well, if a CPU can sweat, it might have -- it was quite busy. :-)] There are no results from Tomcat or JBoss for this test; both failed in the configurations I had with these large numbers of users. In fact, they failed with even smaller numbers of users; I didn't test below 10,000, but neither could handle even that load. Again, this is possibly due to my lack of knowledge about how to configure the products. Though I'm not convinced of that -- Tomcat failed because it had severe GC problems caused by a finalizer in the org.apache.tomcat.util.net.NioBlockingSelector$KeyReference class, and JBoss failed because of severe lock contention around a lock in the org.apache.tomcat.util.net.AprEndpoint class. Still, there might be a workaround for both issues.
At any rate, I'm a happy camper today: Glassfish V3 is going out the door with excellent performance characteristics, thanks to lots of hard work along the way by the engineering community -- thanks guys!