Template- vs. C++-Interpreter shootout
The default interpreter that comes with the Hotspot VM is the so-called "Template Interpreter". It is called that because it is created at runtime (every time the Hotspot starts) from a set of assembler templates which are translated into real machine code. Note that although this is code generation at runtime, it should not be confused with the ability of the Hotspot to do Just In Time (JIT) compilation of computationally expensive program parts.
While a JIT compiler compiles a whole method (or even several methods together, if we consider inlining) into executable machine code, the template interpreter, although generated at runtime, is still just an interpreter. It interprets a Java program bytecode by bytecode. The advantage of the template interpreter approach is that most of the code executed for every single bytecode is pure machine code, and so is the dispatching from one bytecode to the next. Moreover, this technique allows a very tight adaptation of the interpreter to the actual processor architecture, so the same binary will still run on an old 80486 while it may well use the latest and greatest features of the newest processor generation if available.
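The dispatch idea can be modelled in plain C++ as a table that holds one handler address per bytecode and is filled once at startup. This is only an analogy under stated assumptions: the real template interpreter emits machine code for each entry, whereas the sketch below uses ordinary functions, and the opcodes and all names (`TableVm`, `build_dispatch_table`, `run_table_interpreter`) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes for illustration; not real JVM bytecode values.
constexpr std::uint8_t kPush = 0, kAdd = 1, kHalt = 2;
constexpr std::size_t kNumOpcodes = 3;

// Minimal interpreter state: code, program counter, operand stack.
struct TableVm {
    explicit TableVm(const std::vector<std::uint8_t>& c) : code(c) {}
    const std::vector<std::uint8_t>& code;
    std::size_t pc = 0;
    std::vector<int> stack;
    bool halted = false;
};

using Handler = void (*)(TableVm&);

// Each handler stands in for one generated code "template".
void handle_push(TableVm& vm) {
    if (vm.pc >= vm.code.size()) throw std::runtime_error("truncated push");
    vm.stack.push_back(vm.code[vm.pc++]);  // one-byte immediate operand
}

void handle_add(TableVm& vm) {
    if (vm.stack.size() < 2) throw std::runtime_error("stack underflow");
    int b = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() += b;
}

void handle_halt(TableVm& vm) { vm.halted = true; }

// Built once, analogous to generating the interpreter at VM startup.
std::array<Handler, kNumOpcodes> build_dispatch_table() {
    std::array<Handler, kNumOpcodes> table{};
    table[kPush] = handle_push;
    table[kAdd] = handle_add;
    table[kHalt] = handle_halt;
    return table;
}

int run_table_interpreter(const std::vector<std::uint8_t>& code) {
    static const std::array<Handler, kNumOpcodes> table = build_dispatch_table();
    TableVm vm(code);
    while (!vm.halted) {
        if (vm.pc >= code.size()) throw std::runtime_error("missing halt");
        std::uint8_t opcode = code[vm.pc++];
        if (opcode >= table.size()) throw std::runtime_error("unknown opcode");
        table[opcode](vm);  // dispatch is a table lookup plus an indirect call
    }
    if (vm.stack.empty()) throw std::runtime_error("empty stack at halt");
    return vm.stack.back();
}
```

Calling `run_table_interpreter({kPush, 40, kPush, 2, kAdd, kHalt})` evaluates 40 + 2. In the generated interpreter the table entries would point at machine code tailored to the CPU the VM is running on, which an ahead-of-time compiled C++ function cannot be.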
Besides the slightly increased startup time caused by generating the interpreter at every VM start, the second drawback of the template interpreter approach is that the interpreter itself is quite complicated. It requires, for example, a kind of built-in runtime assembler which translates the code templates into machine code. Therefore, porting the template interpreter to a new processor architecture is not an easy task and requires quite a profound knowledge of the underlying architecture.
In the earlier Java days (around JDK 1.4) a second interpreter existed besides the template interpreter: the so-called C++ Interpreter. It was probably named that way because the main interpreter loop was implemented as a huge switch statement in C++. Despite its name, however, even the C++ Interpreter isn't completely implemented in C++. It still contains large parts, for example the frame manager, which are written in assembler. It doesn't rely on recursive C++ method invocations to realize function calls in Java, but instead uses the frame manager just mentioned, which controls the stack manually. But despite these issues, the C++ interpreter is probably still easier to port to a new architecture than the template interpreter.
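To make the "huge switch statement" concrete, here is a minimal sketch of a switch-based interpreter loop. It is not taken from the Hotspot sources: the four opcodes, their encodings and the function name `run_switch_interpreter` are assumptions made up for illustration, and the operand stack is a plain `std::vector` rather than a hand-managed frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical mini instruction set; these are NOT real JVM opcode values.
enum Opcode : std::uint8_t { OP_PUSH, OP_ADD, OP_MUL, OP_RETURN };

// Interprets the code one bytecode at a time and returns the value on top
// of the operand stack when OP_RETURN is reached.
int run_switch_interpreter(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;  // explicit operand stack, no C++ recursion
    std::size_t pc = 0;      // index of the next bytecode to execute
    while (pc < code.size()) {
        switch (static_cast<Opcode>(code[pc++])) {
        case OP_PUSH:  // a one-byte immediate operand follows the opcode
            if (pc >= code.size()) throw std::runtime_error("truncated push");
            stack.push_back(code[pc++]);
            break;
        case OP_ADD: {
            if (stack.size() < 2) throw std::runtime_error("stack underflow");
            int b = stack.back();
            stack.pop_back();
            stack.back() += b;
            break;
        }
        case OP_MUL: {
            if (stack.size() < 2) throw std::runtime_error("stack underflow");
            int b = stack.back();
            stack.pop_back();
            stack.back() *= b;
            break;
        }
        case OP_RETURN:
            if (stack.empty()) throw std::runtime_error("empty stack");
            return stack.back();
        default:
            throw std::runtime_error("unknown opcode");
        }
    }
    throw std::runtime_error("fell off the end of the code");
}
```

For example, `run_switch_interpreter({OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_RETURN})` evaluates (2 + 3) * 4. The loop itself is portable C++, which is the porting advantage described above; how well the switch is compiled is left entirely to the C++ compiler.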
In Java 1.4 the C++ interpreter was used for the Itanium port of the Hotspot. But after Sun abandoned support for the Itanium architecture, things went quiet around the C++ Interpreter, although it was still present in the Hotspot sources. With the advent of OpenJDK, the demand from the developer community for a working example of the C++ interpreter grew (see BugID: 6571248), and so the C++ interpreter was finally reactivated in build 20 of OpenJDK (at least for the i486 and the SPARC architecture).
The C++ interpreter basically worked out of the box for the 32-bit x86 debug build and for the 32-bit opt and debug builds on SPARC. If you would like to try the opt build on a 32-bit x86 platform, you'll currently have to apply this small patch: bytecodeInterpreter.patch. To make the C++ interpreter 64-bit clean on SPARC, a few more changes have to be made, but I succeeded in getting it running (at least for the JVM98 and the DaCapo benchmark suites) by applying these patches: bytecodeInterpreter_sparc.hpp.patch, cppInterpreter_sparc.cpp.patch, parseHelper.cpp.patch. After applying the patches you can build the Hotspot VM with the C++ interpreter instead of the usual template interpreter by setting the environment variable CC_INTERP in the shell where the build is started.
Besides the expected porting effort, performance will probably be one of the other main reasons for deciding for or against one of the two interpreters. I have therefore run the DaCapo performance test suite with both interpreters in interpreter-only mode (-Xint) and in mixed mode (-Xmixed) together with the C2 server JIT compiler. The tests were executed with a 32-bit VM on Linux/x86 and with a 32-bit and a 64-bit VM on Solaris/SPARC. The results can be seen in the following tables.

| Benchmark | Template Interpreter (32 bit) | C++ Interpreter (32 bit) | (Tmpl*100)/C++ (32 bit) | Template Interpreter (64 bit) | C++ Interpreter (64 bit) | (Tmpl*100)/C++ (64 bit) |
|---|---|---|---|---|---|---|
| antlr | 126516 ms | 257359 ms | 49.16% | 131355 ms | 289253 ms | 45.41% |
| bloat | 327444 ms | 851316 ms | 38.46% | 352711 ms | 956596 ms | 36.87% |
| chart | 250255 ms | 600670 ms | 41.66% | 265860 ms | 677299 ms | 39.25% |
| eclipse | 1003766 ms | 2180171 ms | 46.04% | 1041304 ms | 2454685 ms | 42.42% |
| fop | 19114 ms | 44072 ms | 43.37% | 20614 ms | 49592 ms | 41.57% |
| hsqldb | 67514 ms | 159739 ms | 42.27% | 76838 ms | 186426 ms | 41.22% |
| jython | 184255 ms | 445747 ms | 41.34% | 197455 ms | 504520 ms | 39.14% |
| luindex | 317580 ms | 726604 ms | 43.71% | 325140 ms | 809468 ms | 40.17% |
| lusearch | 57484 ms | 139343 ms | 41.25% | 61858 ms | 158497 ms | 39.03% |
| pmd | 153715 ms | 376361 ms | 40.84% | 164771 ms | 430127 ms | 38.31% |
| xalan | 69368 ms | 171061 ms | 40.55% | 75989 ms | 196171 ms | 38.74% |

CmdLine: java [-d64 | -d32] -Xint -server -Xms256m -Xmx256m -jar dacapo-2006-10-MR2.jar -s default -n 7 antlr bloat chart eclipse fop hsqldb jython lusearch luindex pmd xalan

Best out of 3 runs (only interpreter - no need to warm up)

System: SunOS 5.10, Sun-Fire-V440, 16GB memory, 4 UltraSPARC-IIIi CPUs at 1281 MHz

| Benchmark | Template Interpreter (32 bit) | C++ Interpreter (32 bit) | (Tmpl*100)/C++ (32 bit) | Template Interpreter (64 bit) | C++ Interpreter (64 bit) | (Tmpl*100)/C++ (64 bit) |
|---|---|---|---|---|---|---|
| antlr | 37962 ms | 39326 ms | 96.53% | 37339 ms | 45151 ms | 82.70% |
| bloat | 12018 ms | 24324 ms | 49.41% | 13403 ms | 29218 ms | 45.87% |
| chart | 14344 ms | 17339 ms | 82.73% | 16610 ms | 20054 ms | 82.83% |
| eclipse | 139999 ms | 172798 ms | 81.02% | 154389 ms | 195541 ms | 78.95% |
| fop | 3036 ms | 3700 ms | 82.05% | 3382 ms | 4018 ms | 84.17% |
| hsqldb | 11258 ms | 15007 ms | 75.02% | 16359 ms | 20612 ms | 79.37% |
| jython | 9792 ms | 15659 ms | 62.53% | 11562 ms | 18601 ms | 62.16% |
| luindex | 80190 ms | 83652 ms | 95.86% | 82075 ms | 86279 ms | 95.13% |
| lusearch | 6692 ms | 8671 ms | 77.18% | 7731 ms | 9742 ms | 79.36% |
| pmd | 11364 ms | 16937 ms | 67.10% | 17218 ms | 23836 ms | 72.24% |
| xalan | 7901 ms | 9768 ms | 80.89% | 10517 ms | 13019 ms | 80.78% |

CmdLine: java [-d64 | -d32] -Xmixed -server -Xms256m -Xmx256m -jar dacapo-2006-10-MR2.jar -s default -n 7 antlr bloat chart eclipse fop hsqldb jython lusearch luindex pmd xalan

Best out of 7 runs (mixed mode - need enough warm-up)

System: SunOS 5.10, Sun-Fire-V440, 16GB memory, 4 UltraSPARC-IIIi CPUs at 1281 MHz

| Benchmark | Template Interpreter (-Xint) | C++ Interpreter (-Xint) | (Tmpl*100)/C++ (-Xint) | Template Interpreter (-Xmixed) | C++ Interpreter (-Xmixed) | (Tmpl*100)/C++ (-Xmixed) |
|---|---|---|---|---|---|---|
| antlr | 58452 ms | 107494 ms | 54.38% | 31660 ms | 35035 ms | 90.37% |
| bloat | 136235 ms | 335865 ms | 40.56% | 6201 ms | 17728 ms | 34.98% |
| chart | 90805 ms | 209499 ms | 43.34% | 7574 ms | 11154 ms | 67.90% |
| fop | 8381 ms | 19088 ms | 43.91% | 1489 ms | 1956 ms | 76.12% |
| hsqldb | 32907 ms | 68857 ms | 47.79% | 4629 ms | 7192 ms | 64.36% |
| jython | 83621 ms | 188785 ms | 44.29% | 4403 ms | 8259 ms | 53.31% |
| luindex | 161362 ms | 344860 ms | 46.79% | 67150 ms | 73282 ms | 91.63% |
| lusearch | 33548 ms | 86230 ms | 38.91% | 4425 ms | 7198 ms | 61.48% |
| pmd | 69562 ms | 161983 ms | 42.94% | 5574 ms | 9899 ms | 56.31% |
| xalan | 49219 ms | 115101 ms | 42.76% | 5335 ms | 7449 ms | 71.62% |

CmdLine: java [-Xmixed | -Xint] -Xms512m -Xmx512m -server -jar dacapo-2006-10-MR2.jar -converge antlr bloat chart eclipse fop hsqldb jython lusearch luindex pmd xalan

System: SLES 9, kernel 2.6.5-7.283-bigsmp, 4GB memory, 4 Intel P4/Xeon CPUs at 3.06GHz

Although the numbers should be treated with some caution because of possible measurement inaccuracies, all in all the results can be interpreted as follows. In interpreter mode (-Xint), the C++ interpreter reaches roughly 37 to 54 percent of the performance of the template interpreter. In mixed mode (-Xmixed), a VM that runs with the C++ interpreter reaches roughly 35 to 97 percent of the performance of a VM that runs with the template interpreter. The sometimes still huge differences in mixed mode, where most of the "hot" code should be compiled anyway, may be explained in part by the lack of interpreter profiling in the C++ interpreter (the C++ interpreter runs with -XX:-ProfileInterpreter). This may lead to less optimal code generation, but the details need further evaluation.
If you want more information about the current status of the C++ interpreter, you should probably follow the C++ Interpreter threads on the OpenJDK Hotspot mailing list. You can also read Gary Benson's online diary, where he writes about his experience of porting OpenJDK to PowerPC using the C++ interpreter.





