
Compiling the HotSpot VM with Clang

Posted by simonis on February 10, 2011 at 9:22 AM PST

Updated Feb. 22nd 2011: after some very good feedback from (among others) Mark Wielaard, Florian Weimer, Roman Divacky and Chris Lattner himself, I decided to re-run my tests with new compiler versions (Clang trunk rev. 125563 and GCC 4.5.2) and an improved Clang configuration which now finally fully enables precompiled header support for the Clang build.

At FOSDEM 2011 I heard Chris Lattner's very nice "LLVM and Clang" keynote. The claims he made in his talk were very impressive: he spoke about Clang being a "production quality" "drop-in replacement" for GCC with superior code generation and improved compile speed. Already during the talk I decided that it would be interesting to verify his claims on the HotSpot VM, which in general is not known as the world's simplest C++ project. Below you can find my experiences with Clang and a new Clang patch for the OpenJDK if you want to do some experiments with Clang yourself.

GCC compatibility

GCC is the standard C/C++ compiler on Linux and available on virtually any Unix platform. Any serious challenger should therefore have at least a GCC compatibility mode to ease its adoption. Clang claims to be fully GCC compatible, so I just created a new Clang configuration by changing some files and creating some new, Clang-specific ones from their corresponding GCC counterparts:

> hg status -ma
M make/linux/makefiles/buildtree.make
M src/os_cpu/linux_x86/vm/os_linux_x86.cpp
M src/share/vm/adlc/output_c.cpp
M src/share/vm/utilities/globalDefinitions.hpp
A make/linux/makefiles/clang.make
A make/linux/platform_amd64.clang
A src/share/vm/utilities/globalDefinitions_clang.hpp

and started a new build (for a general description of the HotSpot build process see either the README-builds file or the more detailed but slightly outdated explanation in my previous blog):

> ALT_BOOTDIR=/share/software/Java/jdk1.6.0_20 \
  ALT_OUTPUTDIR=../output_x86_64_clang_dbg \
  make jvmg USE_CLANG=true

One of the very first observations is the really HUGE number of warnings issued by the compiler. Don't get me wrong here - I really regard this as a major feature of Clang, especially the clear and well-arranged fashion in which the warnings are presented (e.g. syntax colored, with macros nicely expanded). But for the current HotSpot code base this is really too much. Especially the issue "6889002: CHECK macros in return constructs lead to unreachable code" leads to a bunch of repeated warnings for every single compilation unit, which makes the compilation output nearly unreadable. So before I started to eliminate the warnings step by step, I decided to turn the warnings off altogether in order to get a first impression of the overall compatibility and performance:

> ALT_BOOTDIR=/share/software/Java/jdk1.6.0_20 \
  ALT_OUTPUTDIR=../output_x86_64_clang_dbg \

Except for the -fcheck-new option, Clang seems to understand all the compiler options used during the HotSpot build process. For -fcheck-new a warning is issued stating that the option will be ignored, so I just removed it from make/linux/makefiles/clang.make. I have also removed obvious workarounds for some older GCC versions from the new Clang files, which were derived from their corresponding GCC counterparts. The following compiler options have been used in the dbg and opt builds respectively:

dbg-options: -fPIC -fno-rtti -fno-exceptions -m64 -pipe -fno-omit-frame-pointer -g -MMD -MP -MF
opt-options: -fPIC -fno-rtti -fno-exceptions -m64 -pipe -fno-omit-frame-pointer -O3 -fno-strict-aliasing -MMD -MP -MF

Besides this, I only had to change the source code of two files to make the HotSpot compilable by Clang. The first change was necessary only because the ADLC part of the make does not honor the general warning settings of the HotSpot build and always runs with -Werror. Here's the small patch which prevents a warning because of an assignment being used as a Boolean value:

--- a/src/share/vm/adlc/output_c.cpp    Tue Nov 23 13:22:55 2010 -0800
+++ b/src/share/vm/adlc/output_c.cpp    Wed Feb 09 16:39:30 2011 +0100
@@ -3661,7 +3661,7 @@
     // Insert operands that are not in match-rule.
     // Only insert a DEF if the do_care flag is set
-    while ( comp = comp_list.post_match_iter() ) {
+    while ( (comp = comp_list.post_match_iter()) ) {
       // Check if we don't care about DEFs or KILLs that are not USEs
       if ( dont_care && (! comp->isa(Component::USE)) ) {
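
The idiom behind this patch can be shown in isolation. The following is only a minimal sketch: `Component`, `ComponentList` and `count_components` are hypothetical stand-ins and not the real ADLC classes. It shows how the extra pair of parentheses tells the compiler that the assignment inside the loop condition is intentional, which is the usual way to silence this kind of warning in both GCC and Clang.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an ADLC component (not the real class).
struct Component {
  int id;
};

// Hypothetical list whose iterator returns the next element,
// or a null pointer once the list is exhausted.
class ComponentList {
 public:
  explicit ComponentList(const std::vector<Component>& items)
      : items_(items), pos_(0) {}

  Component* post_match_iter() {
    return pos_ < items_.size() ? &items_[pos_++] : nullptr;
  }

 private:
  std::vector<Component> items_;
  std::size_t pos_;
};

// Counts the elements by walking the list with an assignment in the
// loop condition.
int count_components(ComponentList& list) {
  int n = 0;
  Component* comp;
  // Without the inner parentheses the compiler warns that '=' may have
  // been written where '==' was meant; with them the intent is explicit.
  while ((comp = list.post_match_iter())) {
    ++n;
  }
  return n;
}
```

The behavior of the loop is unchanged; only the diagnostic goes away.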

Updated Feb. 22nd 2011: I decided to leave the file output_c.cpp untouched and instead change the ADLC make file adlc.make to use the same warning flags as the main HotSpot make instead of using -Werror.

--- a/make/linux/makefiles/adlc.make    Wed Feb 16 11:24:17 2011 +0100
+++ b/make/linux/makefiles/adlc.make    Tue Feb 22 12:59:37 2011 +0100
@@ -60,7 +60,7 @@

# CFLAGS_WARN holds compiler options to suppress/enable warnings.
# Compiler warnings are treated as errors
-CFLAGS_WARN = -Werror


The second change was necessary because of a strange inline assembler syntax which was used to assign the value of a register directly to a variable:

diff -r f95d63e2154a src/os_cpu/linux_x86/vm/os_linux_x86.cpp
--- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Tue Nov 23 13:22:55 2010 -0800
+++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Wed Feb 09 16:45:40 2011 +0100
@@ -101,6 +101,10 @@
   register void *esp;
   __asm__("mov %%"SPELL_REG_SP", %0":"=r"(esp));
   return (address) ((char*)esp + sizeof(long)*2);
+#elif CLANG
+  intptr_t* esp;
+  __asm__ __volatile__ ("movq %%"SPELL_REG_SP", %0":"=r"(esp):);
+  return (address) esp;
   register void *esp __asm__ (SPELL_REG_SP);
   return (address) esp;
@@ -183,6 +187,9 @@
   register intptr_t **ebp;
   __asm__("mov %%"SPELL_REG_FP", %0":"=r"(ebp));
+#elif CLANG
+  intptr_t **ebp;
+  __asm__ __volatile__ ("movq %%"SPELL_REG_FP", %0":"=r"(ebp):);
   register intptr_t **ebp __asm__ (SPELL_REG_FP);
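
As a standalone illustration of the Clang branch above, here is a minimal sketch of reading the stack pointer through extended inline assembly with an explicit output operand instead of a register variable. The function name `current_stack_pointer` is made up for this example, the register name is hard-coded to `rsp` instead of using the `SPELL_REG_SP` macro, and the code assumes a GCC-compatible compiler; on targets other than x86-64 it falls back to `__builtin_frame_address(0)`.

```cpp
// Returns (an approximation of) the current stack pointer.
// Assumption: GCC-compatible compiler (GCC or Clang).
inline void* current_stack_pointer() {
#if defined(__x86_64__)
  void* sp;
  // "=r"(sp) asks the compiler to pick any general purpose register
  // for the result; the mov copies rsp into it.
  __asm__ __volatile__("movq %%rsp, %0" : "=r"(sp));
  return sp;
#else
  // Fallback for other targets: the frame address of this function,
  // which is close to, but not identical with, the stack pointer.
  return __builtin_frame_address(0);
#endif
}
```

Because the value is produced by an ordinary output operand, the compiler is free to choose the register, which is the form Clang accepted in the patch above.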

Updated Feb. 22nd 2011: to compile the newest HotSpot tip revision, another small change was necessary to overcome a problem with the look-up of a non-dependent method name in dependent base classes (see M. Cline's C++ FAQ 35.19 for a nice explanation). This was wrongly accepted by GCC (see GCC bug 47752) but is correctly rejected by Clang. The problem is tracked as bug 7019689 and will hopefully be fixed soon in the HotSpot code base:

diff -r 55b9f498dbce -r c83e921b1bf7 src/share/vm/utilities/hashtable.hpp
--- a/src/share/vm/utilities/hashtable.hpp      Thu Feb 10 16:24:29 2011 -0800
+++ b/src/share/vm/utilities/hashtable.hpp      Wed Feb 16 11:09:16 2011 +0100
@@ -276,7 +276,7 @@

   int index_for(Symbol* name, Handle loader) {
-    return hash_to_index(compute_hash(name, loader));
+    return this->hash_to_index(compute_hash(name, loader));
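
The language rule behind this fix can be reproduced with a few lines. This is a minimal sketch with hypothetical class names (`BasicTable`, `Table`), not the real HotSpot hashtable classes: a member function inherited from a base class that depends on a template parameter is not found by unqualified look-up, so the call has to be made dependent, for example with `this->`.

```cpp
// Hypothetical base class template providing the helper method.
template <typename T>
class BasicTable {
 public:
  explicit BasicTable(unsigned int size) : size_(size) {}

  int hash_to_index(unsigned int hash) const {
    return static_cast<int>(hash % size_);
  }

 private:
  unsigned int size_;
};

// Derived class template whose base class depends on T.
template <typename T>
class Table : public BasicTable<T> {
 public:
  explicit Table(unsigned int size) : BasicTable<T>(size) {}

  int index_for(unsigned int hash) const {
    // A plain hash_to_index(hash) is a non-dependent name: it is looked
    // up when the template is defined, and the dependent base class
    // BasicTable<T> is not searched at that point. Writing this-> makes
    // the name dependent, so look-up is deferred until instantiation.
    return this->hash_to_index(hash);
  }
};
```

Removing `this->` from `index_for` should make a conforming compiler reject the template, which matches what Clang reported for hashtable.hpp.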

In summary, the overall compatibility can be rated as very good. Taking into account that the newly built VM could successfully run the SPECjbb2005 * benchmark, it seems that the code generation also went mostly well, although more in-depth tests are probably required to ensure full correctness (well - at least the same level of correctness known from GCC).

Compilation performance and code size

After the build succeeded, I started to do some benchmarking. I measured the time needed for full debug and opt builds with one and three parallel build threads respectively. As you can see in table 1, the results are very clear: Clang 2.8 is always significantly (between two and three times) slower than GCC 4.4.3:

Table 1: Resulting code size and user (wall) time for a complete HotSpot server (C2) build compared to GCC 4.4.3

time, one build thread
       GCC 4.4.3 1    GCC 4.5.2 1    Clang 2.8 2    Clang trunk 3    Clang trunk 4
dbg    4m46s          4m38s  97%     16m40s 349%    7m42s 162%       4m07s  86%
opt    5m04s          4m55s  97%     10m45s 212%    3m10s  63%

time, three build threads
       GCC 4.4.3 1    GCC 4.5.2 1    Clang 2.8 2    Clang trunk 3    Clang trunk 4
dbg    3m17s          3m05s  94%     9m41s  294%    4m54s 149%       2m48s  85%
opt    3m05s          3m03s  99%     6m12s  201%    2m01s  65%

size 5
       GCC 4.4.3 1    GCC 4.5.2 1    Clang 2.8 2    Clang trunk 3    Clang trunk 4
dbg    135Mb          122Mb  90%     797Mb  591%    798Mb 591%       306Mb 228%
opt    12Mb           13Mb  103%     12Mb    98%    12Mb   95%
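
The percentages in the table relate each measurement to the GCC 4.4.3 result in the same row. A small sketch of that arithmetic follows; the helper names `parse_seconds` and `relative_percent` are invented for this example, and it assumes the times are written in the `NmSSs` form used in the table.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Parses a duration written like "16m40s" into seconds.
// Returns -1 if the text does not match that pattern.
inline int parse_seconds(const std::string& text) {
  int minutes = 0;
  int seconds = 0;
  char m = 0;
  char s = 0;
  if (std::sscanf(text.c_str(), "%d%c%d%c", &minutes, &m, &seconds, &s) != 4 ||
      m != 'm' || s != 's') {
    return -1;
  }
  return minutes * 60 + seconds;
}

// Build time of `measured` as a percentage of `baseline`,
// rounded to the nearest integer.
inline long relative_percent(const std::string& measured,
                             const std::string& baseline) {
  return std::lround(100.0 * parse_seconds(measured) /
                     parse_seconds(baseline));
}
```

For example, the single-threaded debug build with the Clang trunk 3 configuration took 7m42s (462 seconds) against 4m46s (286 seconds) for GCC 4.4.3, which gives 162%. Values recomputed this way from the rounded times can differ by a point from the table, which was presumably derived from the unrounded measurements.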

Honestly speaking, these numbers were somewhat disappointing for me - especially after Chris Lattner's talk at FOSDEM. I haven't researched the reasons in more depth, but I suspect the shiny results presented at the conference are mainly based on the fact that they focus more on Objective-C than on C++ and that they have been measured against the older 4.0 and 4.2 versions of GCC. This assumption was also confirmed after looking at the Clang Performance page.

Updated Feb. 22nd 2011: I've written the previous paragraph under the impression of my first measurements. It turned out however, that the Clang build was not using precompiled headers properly. This is because Clang is not fully GCC compatible with respect to precompiled header files. GCC transparently searches for a precompiled version of directly included header files whereas Clang only considers a precompiled version for headers which are included explicitly on the command line as prefix headers with the -include option (see the Precompiled Headers section of the Clang Users Manual). The HotSpot project uses a precompiled header file which is directly included in most of the source files, but for the reasons just mentioned, this has no effect with Clang - it just uses the bare header file instead of the precompiled version.

To successfully enable PCH support for Clang, I had to change the Clang configuration such that it emits corresponding "-include precompiled.hpp" compiler flags for the files (and only for them) which include precompiled.hpp directly. This didn't work correctly with Clang 2.8, where it led to strange errors during compilation, but with a brand new trunk version from SVN (rev. 125563) the problems were gone. As you can see in the columns labeled "Clang trunk3", this roughly doubled the compilation speed in the debug build and made the opt build more than three times faster! Compared to GCC 4.4.3, this still ranks Clang at about 150% for the debug build, but for the opt build Clang now considerably outperforms GCC and uses only 65% of the time required by GCC for the full build.

Another point that concerned me during the first measurements was the size of the resulting shared library. While the size was basically the same for the opt build, the Clang debug build produces a huge, ~700MB file which is nearly seven times larger than the result produced by GCC. I haven't looked into this deeper either - perhaps some Clang/LLVM wizard can comment on this topic? It turned out that this was a known problem which can be partially worked around by using the -flimit-debug-info flag. As you can see in the columns labeled "Clang trunk4", this not only reduces the size of the resulting shared library by about 50%, it also makes the debug build up to 15% faster compared to the corresponding GCC build.

Runtime performance

After I had successfully compiled the HotSpot, I decided to run some benchmarks to see what the code quality of the Clang-generated HotSpot is. Because I know that for the SPEC JVM98 benchmark the VM spends most of the time (about 98% if we have a proper warm-up phase) in compiled code, I decided to use SPECjbb2005 * instead, which at least does a lot of garbage collection, and the GC is implemented in C++ in the HotSpot VM.

For the tests I used an early access version of JDK 7 (b122) with a recent HotSpot 20. The exact version of the JDK I used is:

> java -version
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b122)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b03, mixed mode)
> java -Xinternalversion
Java HotSpot(TM) 64-Bit Server VM (20.0-b03) for linux-amd64 JRE (1.7.0-ea-b122),
built on Dec 16 2010 01:03:29 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)

As you can see, the original HotSpot was compiled with GCC 4.3.0 while I used 4.4.3 on my local machine. The SPECjbb2005 benchmark was configured to use 16 warehouses. I have compared the scores of the two versions compiled by me with GCC and Clang respectively with the score achieved by the original HotSpot version from the early access binary package:

Table 2: SPECjbb2005 score
JDK 1.7.0-ea-b122, HotSpot 64-Bit Server VM (20.0-b03)
GCC 4.3.0      GCC 4.4.3      GCC 4.5.2      Clang 2.8      Clang trunk
108997 100%    108792 100%    109494 100%    104612 96%     103808 95%

Again, the Clang-compiled code loses to its GCC counterpart. It is approximately 4% slower. One feature which was actively promoted at the FOSDEM presentation was link time optimization (LTO). Unfortunately I couldn't get this running with Clang 2.8 on my Linux box. I searched the web a bit and found the following interesting blog: "Using LLVM's link-time optimization on Ubuntu Karmic". However, it only describes how to get LTO working with llvm-gcc, which is a GCC front end based on LLVM. Clang itself only seems to support LTO on Mac OS X out of the box.

Updated Feb. 22nd 2011: I also did performance measurements for the two new compiler versions, but here the results didn't change significantly, so I just added the new numbers for reference. (Notice that the results oscillated +/-1% during benchmarking, so the actual differences shouldn't be taken too seriously.)


Updated Feb. 22nd 2011: While the overall GCC compatibility is excellent and the compile times are impressive, the performance of the generated code is still lagging behind a recent GCC version. Nevertheless, Clang has an excellent C/C++ front end which produces very comprehensive warnings and error messages. If you are developing macro-intensive C or heavily templatized C++ code, this feature alone can save you much more time than you lose through longer compile times. Taking into consideration Clang's nice design and architecture and the fact that it must still be considered quite new, I think it may become a serious challenger for the good old GCC in the future.


* Please note that the SPECjbb2005 results published on this page come from non-compliant benchmark runs and should be considered as published under the "Research and Academic Usage" paragraph of the "SPECjbb2005 Run and Reporting Rules".




Interesting. Yep, the way precompiled headers are done in HotSpot causes trouble for every compiler, and I had my troubles with Sun Studio. I think Sun Studio still has limitations on where pch files can be located... Anyway, interesting numbers for Clang. Given that the JIT matters a lot more than the C compiler for Java benchmarks in the long run, those 5% seem like a big difference. Thanks for this write-up.