
Template- vs. C++-Interpreter shootout

Posted by simonis on November 16, 2007 at 3:22 AM PST

The Template-Interpreter

The default interpreter that comes with the HotSpot VM is the so-called "template interpreter". It is called that because it is basically created at runtime (every time HotSpot starts) from a set of assembler templates which are translated into real machine code. Note that although this is code generation at runtime, it should not be confused with HotSpot's ability to do Just-In-Time (JIT) compilation of computationally expensive parts of a program.

While a JIT compiler compiles a whole method (or even several methods together, if we consider inlining) into executable machine code, the template interpreter, although generated at runtime, is still just an interpreter: it interprets a Java program bytecode by bytecode. The advantage of the template interpreter approach is that most of the code that gets executed for every single bytecode is pure machine code, as is the dispatching from one bytecode to the next, which can also be done in native machine code. Moreover, this technique allows a very tight adaptation of the interpreter to the actual processor architecture, so the same binary will still run on an old 80486 while it can also use the latest and greatest features of the newest processor generation if available.

Besides the slightly increased startup time, the second drawback of the template interpreter approach is that the interpreter itself is quite complicated. It requires, for example, a kind of built-in runtime assembler which translates the code templates into machine code. Therefore, porting the template interpreter to a new processor architecture is not an easy task and requires quite a profound knowledge of the underlying architecture.

The C++-Interpreter

In the earlier Java days (around JDK 1.4) a second interpreter existed beside the template interpreter: the so-called C++ interpreter. It was probably named that way because the main interpreter loop is implemented as a huge switch statement in C++. Despite its name, however, even the C++ interpreter isn't completely implemented in C++. It still contains large parts, such as the frame manager, which are written in assembler. It doesn't rely on recursive C++ method invocations to realize function calls in Java; instead it uses the frame manager just mentioned, which manages the stack manually. But despite these caveats, the C++ interpreter is probably still easier to port to a new architecture than the template interpreter.

In Java 1.4, the C++ interpreter was used for the Itanium port of HotSpot. But after Sun abandoned support for the Itanium architecture, things got quiet around the C++ interpreter, although it was still present in the HotSpot sources. With the advent of OpenJDK, the demand from the developer community for a working example of the C++ interpreter grew (see BugID 6571248), and so the C++ interpreter was finally reactivated in build 20 of OpenJDK (at least for the i486 and SPARC architectures).

The C++ interpreter was basically working out of the box for the 32-bit x86 debug build and for the 32-bit opt and debug builds on SPARC. If you would like to try the opt build on a 32-bit x86 platform, you currently have to apply this small patch: bytecodeInterpreter.patch. To make the C++ interpreter 64-bit clean on SPARC, a few more changes have to be made, but I succeeded in getting it running (at least for the JVM98 and DaCapo benchmark suites) by applying these patches: bytecodeInterpreter_sparc.hpp.patch, cppInterpreter_sparc.cpp.patch, parseHelper.cpp.patch. After applying the patches, you can build the HotSpot VM with the C++ interpreter instead of the usual template interpreter by setting the environment variable CC_INTERP in the shell where the build is started.
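As a rough sketch, such a build might look like the following. Only the CC_INTERP variable comes from the text above; the directory layout and make targets are assumptions typical for OpenJDK trees of that era and may differ in yours.

```shell
# Sketch only: enable the C++ interpreter for a HotSpot build.
# CC_INTERP is the switch mentioned in the text; the directory and
# make target below are assumed and may differ in your source tree.
export CC_INTERP=true
cd hotspot/make
gmake jvmg          # debug build; use the "product" target for an opt build
```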

Template- vs. C++-Interpreter shootout

Besides the expected porting effort, performance will probably be one of the main reasons for deciding for or against one of the two interpreters. I have therefore run the DaCapo benchmark suite with both interpreters in interpreter-only mode (-Xint) and in mixed mode (-Xmixed) together with the C2 server JIT compiler. The tests were executed with a 32-bit VM on Linux/x86 and with a 32- and a 64-bit VM on Solaris/SPARC. The results can be seen in the following tables.

Table 1: Interpreted execution (-Xint) on Solaris/Sparc
| Benchmark | Template Interpreter (32 bit) | C++ Interpreter (32 bit) | (Tmpl*100)/C++ | Template Interpreter (64 bit) | C++ Interpreter (64 bit) | (Tmpl*100)/C++ |
| --------- | ----------------------------- | ------------------------ | -------------- | ----------------------------- | ------------------------ | -------------- |
| antlr     | 126516 ms  | 257359 ms  | 49.16% | 131355 ms  | 289253 ms  | 45.41% |
| bloat     | 327444 ms  | 851316 ms  | 38.46% | 352711 ms  | 956596 ms  | 36.87% |
| chart     | 250255 ms  | 600670 ms  | 41.66% | 265860 ms  | 677299 ms  | 39.25% |
| eclipse   | 1003766 ms | 2180171 ms | 46.04% | 1041304 ms | 2454685 ms | 42.42% |
| fop       | 19114 ms   | 44072 ms   | 43.37% | 20614 ms   | 49592 ms   | 41.57% |
| hsqldb    | 67514 ms   | 159739 ms  | 42.27% | 76838 ms   | 186426 ms  | 41.22% |
| jython    | 184255 ms  | 445747 ms  | 41.34% | 197455 ms  | 504520 ms  | 39.14% |
| luindex   | 317580 ms  | 726604 ms  | 43.71% | 325140 ms  | 809468 ms  | 40.17% |
| lusearch  | 57484 ms   | 139343 ms  | 41.25% | 61858 ms   | 158497 ms  | 39.03% |
| pmd       | 153715 ms  | 376361 ms  | 40.84% | 164771 ms  | 430127 ms  | 38.31% |
| xalan     | 69368 ms   | 171061 ms  | 40.55% | 75989 ms   | 196171 ms  | 38.74% |

Table 2: Mixed mode execution (-Xmixed) on Solaris/Sparc
| Benchmark | Template Interpreter (32 bit) | C++ Interpreter (32 bit) | (Tmpl*100)/C++ | Template Interpreter (64 bit) | C++ Interpreter (64 bit) | (Tmpl*100)/C++ |
| --------- | ----------------------------- | ------------------------ | -------------- | ----------------------------- | ------------------------ | -------------- |
| antlr     | 37962 ms  | 39326 ms  | 96.53% | 37339 ms  | 45151 ms  | 82.70% |
| bloat     | 12018 ms  | 24324 ms  | 49.41% | 13403 ms  | 29218 ms  | 45.87% |
| chart     | 14344 ms  | 17339 ms  | 82.73% | 16610 ms  | 20054 ms  | 82.83% |
| eclipse   | 139999 ms | 172798 ms | 81.02% | 154389 ms | 195541 ms | 78.95% |
| fop       | 3036 ms   | 3700 ms   | 82.05% | 3382 ms   | 4018 ms   | 84.17% |
| hsqldb    | 11258 ms  | 15007 ms  | 75.02% | 16359 ms  | 20612 ms  | 79.37% |
| jython    | 9792 ms   | 15659 ms  | 62.53% | 11562 ms  | 18601 ms  | 62.16% |
| luindex   | 80190 ms  | 83652 ms  | 95.86% | 82075 ms  | 86279 ms  | 95.13% |
| lusearch  | 6692 ms   | 8671 ms   | 77.18% | 7731 ms   | 9742 ms   | 79.36% |
| pmd       | 11364 ms  | 16937 ms  | 67.10% | 17218 ms  | 23836 ms  | 72.24% |
| xalan     | 7901 ms   | 9768 ms   | 80.89% | 10517 ms  | 13019 ms  | 80.78% |

Table 3: Interpreted and mixed mode execution on Linux/x86
| Benchmark | Template Interpreter (-Xint) | C++ Interpreter (-Xint) | (Tmpl*100)/C++ | Template Interpreter (-Xmixed) | C++ Interpreter (-Xmixed) | (Tmpl*100)/C++ |
| --------- | ---------------------------- | ----------------------- | -------------- | ------------------------------ | ------------------------- | -------------- |
| antlr     | 58452 ms  | 107494 ms | 54.38% | 31660 ms | 35035 ms | 90.37% |
| bloat     | 136235 ms | 335865 ms | 40.56% | 6201 ms  | 17728 ms | 34.98% |
| chart     | 90805 ms  | 209499 ms | 43.34% | 7574 ms  | 11154 ms | 67.90% |
| fop       | 8381 ms   | 19088 ms  | 43.91% | 1489 ms  | 1956 ms  | 76.12% |
| hsqldb    | 32907 ms  | 68857 ms  | 47.79% | 4629 ms  | 7192 ms  | 64.36% |
| jython    | 83621 ms  | 188785 ms | 44.29% | 4403 ms  | 8259 ms  | 53.31% |
| luindex   | 161362 ms | 344860 ms | 46.79% | 67150 ms | 73282 ms | 91.63% |
| lusearch  | 33548 ms  | 86230 ms  | 38.91% | 4425 ms  | 7198 ms  | 61.48% |
| pmd       | 69562 ms  | 161983 ms | 42.94% | 5574 ms  | 9899 ms  | 56.31% |
| xalan     | 49219 ms  | 115101 ms | 42.76% | 5335 ms  | 7449 ms  | 71.62% |

Although the numbers should be treated with some caution because of possible measurement inaccuracies, the results can be summarized as follows. In interpreted mode (-Xint), the C++ interpreter reaches between 35 and 50 percent of the performance of the template interpreter. In mixed mode (-Xmixed), a VM running the C++ interpreter reaches from 45 up to 90 percent of the performance of a VM running the template interpreter. The sometimes still large differences between the two interpreters in mixed mode, where most of the "hot" code should be compiled anyway, may be partly explained by the lack of interpreter profiling in the C++ interpreter (it runs with -XX:-ProfileInterpreter). This may lead to less optimal code generation by the JIT compiler, but the details need further evaluation.

If you want more information about the current status of the C++ interpreter, you should probably follow the C++ interpreter threads on the OpenJDK HotSpot mailing list. You can also read Gary Benson's online diary, where he writes about his experiences porting the OpenJDK to PowerPC using the C++ interpreter.
