
Hacking javac

Posted by tball on September 16, 2006 at 3:55 PM PDT

hack (hăk) n., A non-obvious solution to an interesting problem.

This definition is on the front of a tee-shirt I have from O'Reilly Media to promote their Hacks Series, which includes one of my favorite books, Swing Hacks by fellow bloggers Joshua Marinacci and Chris Adamson. The reason I like the Hacks Series is that even for subjects I know fairly well, these books describe interesting solutions that hadn't occurred to me. It's that constant discovery of "non-obvious solutions" which has kept me so interested in programming over the years.

Take compilers, for example: they just compile source code into object code, right? Hacking them is generally frowned upon because, as Ken Thompson discussed in his ACM Turing Award lecture, Reflections on Trusting Trust, they are a great place to hide a trojan horse. But if the compiler is designed to function as a tool library, as javac is, much more interesting (and benign) hacks are possible. Mustang (excuse me, Java 6) has three related APIs which all Java tool hackers should check out: JSR-199: Java™ Compiler API, JSR-269: Pluggable Annotation Processing API, and the Tree API (com.sun.source.tree and com.sun.source.util).

The Compiler API is deceptively simple: it gives you the ability to programmatically invoke javac (actually, any tool that implements the Tool interface such that ServiceLoader can find it) and compile one or more source files to class files. That's nice if you are writing an appserver container, perhaps, but not very hack-inspiring, right?
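To make that concrete, here is a minimal sketch of invoking the compiler from code. The class name CompileOnce and the temp-directory layout are my own choices for illustration, and the file handling uses java.nio.file, which arrived after Java 6; treat it as one way to do it rather than the canonical one.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; only JavaCompiler and ToolProvider come from JSR-199.
class CompileOnce {
    /** Writes a one-line class to a temp directory, compiles it there, and returns javac's exit code. */
    static int compileHello() {
        // getSystemJavaCompiler() returns null on a runtime that ships without a compiler.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available; run on a JDK");
        }
        try {
            Path dir = Files.createTempDirectory("compile-once");
            Path src = dir.resolve("Hello.java");
            Files.write(src, "public class Hello { }".getBytes(StandardCharsets.UTF_8));
            // Tool.run(in, out, err, args...): null streams fall back to System.in/out/err.
            // The arguments are the ones you would pass on the javac command line.
            return compiler.run(null, null, null, "-d", dir.toString(), src.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("javac exit code: " + compileHello());
    }
}
```

An exit code of zero means the compile succeeded, just as it would from the command line.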

As Robin Williams said in the Disney cartoon Aladdin (I have small kids), "Wrong! But thanks for playing." The first hack leverages JSR-199's DiagnosticListener interface, which lets you listen to the errors and warnings created by the compiler. NetBeans uses this technique to display errors while you are editing Java sources; compilation is run regularly in the background but only the diagnostic events are used. JSR-199 replaces the old trick of parsing error strings with Diagnostic instances, which provide accurate source position information and locale-independent error IDs alongside locale-specific error text.
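Here is a rough sketch of the diagnostics hack, assuming javac is the system compiler. It uses DiagnosticCollector, the stock DiagnosticListener implementation that ships with JSR-199. DiagnosticsDemo and StringSource are hypothetical names; StringSource follows the string-backed file object pattern from the JavaCompiler javadoc. Class files for a clean compile would land in the system temp directory.

```java
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DiagnosticsDemo {
    /** A source "file" backed by a String, so nothing has to exist on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles the source and returns whatever the compiler reported. */
    static List<Diagnostic<? extends JavaFileObject>> check(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
        // Send any class files to the temp directory instead of the working directory.
        List<String> options = Arrays.asList("-d", System.getProperty("java.io.tmpdir"));
        compiler.getTask(null, null, collector, options, null,
                Collections.singletonList(new StringSource(className, code))).call();
        return collector.getDiagnostics();
    }

    public static void main(String[] args) {
        String broken = "class Broken {\n  int f() { return \"oops\"; }\n}\n";
        for (Diagnostic<? extends JavaFileObject> d : check("Broken", broken)) {
            // getCode() is the locale-independent ID; getMessage() is the localized text.
            System.out.println(d.getKind() + " " + d.getCode()
                    + " at line " + d.getLineNumber() + ": " + d.getMessage(null));
        }
    }
}
```

An editor would feed the line and column information into its error stripe instead of printing it.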

Still bored? How about creating your own scripting language? Rather than write a complicated interpreter, write a (hopefully) simpler compiler which outputs Java source files. Then use the Compiler API with a custom ClassLoader to dynamically load these classes on the fly, as if they were interpreted. Think the process is slow? The Jackpot rule language parser generates Java sources and uses this hack to compile them (look in $HOME/.jackpot). We used to conditionally compile scripts only if they had changed, but found that javac is so fast that caching didn't make a difference (we just keep the files for troubleshooting). If you don't want to write anything to disk, a related hack involves implementing the JavaFileManager interface to use memory instead of files -- javac doesn't care about files since it only uses streams supplied by whatever JavaFileManager implementation you provide.
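The no-disk variant might look something like the sketch below. This is not Jackpot's code: InMemoryCompiler, StringSource, ByteCode, MemoryFileManager and the generated Greeter class are all made up for illustration. It wraps the standard file manager with ForwardingJavaFileManager rather than implementing JavaFileManager from scratch, and it assumes javac is available as the system compiler.

```java
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class InMemoryCompiler {
    /** Source text held in a String. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** Class file bytes captured in memory instead of on disk. */
    static final class ByteCode extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteCode(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    /** Delegates reads to the standard manager but redirects every output file to memory. */
    static final class MemoryFileManager extends ForwardingJavaFileManager<StandardJavaFileManager> {
        final Map<String, ByteCode> classes = new HashMap<>();
        MemoryFileManager(StandardJavaFileManager delegate) { super(delegate); }
        @Override
        public JavaFileObject getJavaFileForOutput(Location location, String className,
                                                   JavaFileObject.Kind kind, FileObject sibling) {
            ByteCode out = new ByteCode(className);
            classes.put(className, out);
            return out;
        }
    }

    static Class<?> compileAndLoad(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        MemoryFileManager manager = new MemoryFileManager(compiler.getStandardFileManager(null, null, null));
        Boolean ok = compiler.getTask(null, manager, null, null, null,
                Collections.singletonList(new StringSource(className, source))).call();
        if (!ok) throw new IllegalArgumentException("Compilation failed: " + className);
        // Custom loader that defines classes straight from the captured bytes.
        ClassLoader loader = new ClassLoader(InMemoryCompiler.class.getClassLoader()) {
            @Override protected Class<?> findClass(String name) throws ClassNotFoundException {
                ByteCode code = manager.classes.get(name);
                if (code == null) throw new ClassNotFoundException(name);
                byte[] b = code.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        try {
            return loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stands in for "script" output: generated source that implements a known interface. */
    @SuppressWarnings("unchecked")
    static String runGreeter() {
        String src = "public class Greeter implements java.util.function.Supplier<String> {"
                + " public String get() { return \"hello from generated code\"; } }";
        try {
            Object greeter = compileAndLoad("Greeter", src).getDeclaredConstructor().newInstance();
            return ((Supplier<String>) greeter).get();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Having the generated class implement an interface the host already knows (Supplier here) keeps reflection down to a single newInstance() call.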

JSR-269 is officially the "Pluggable Annotation Processing API", and while it does that very well it also enables lots of other hacks. A general way to think of JSR-269 is that it gives you access to all of the types and elements (symbols) in any set of source files, not just its annotations. Its javax.lang.model.util package has easily extendable visitor and scanner base classes which can be used by tools to inspect projects at the semantic level. One group of tools which can use this hack includes error checkers that validate design rules, such as that any class which overrides Object.equals() also overrides Object.hashCode(). One advantage of using JSR-269 types and elements rather than your own parser for these sorts of tests is that while you can infer many properties from a parse tree, the javac semantic model knows the correct properties.
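As an illustration of the equals()/hashCode() rule, here is a small sketch of a processor that claims no annotations and just walks the elements. To stay short it uses ElementFilter from javax.lang.model.util rather than the scanner base classes. EqualsHashCodeCheck, Checker and StringSource are hypothetical names, and running with -proc:only (so no class files are written) is my choice, not a requirement.

```java
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.type.TypeMirror;
import javax.lang.model.util.ElementFilter;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class EqualsHashCodeCheck {
    /** Processes every root element, annotated or not ("*"), and warns on mismatched overrides. */
    @SupportedAnnotationTypes("*")
    static final class Checker extends AbstractProcessor {
        @Override public SourceVersion getSupportedSourceVersion() { return SourceVersion.latestSupported(); }

        @Override public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment round) {
            check(ElementFilter.typesIn(round.getRootElements()));
            return false; // claim nothing, so other processors still see the annotations
        }

        private void check(Iterable<TypeElement> types) {
            TypeMirror object = processingEnv.getElementUtils().getTypeElement("java.lang.Object").asType();
            for (TypeElement type : types) {
                boolean equals = false;
                boolean hashCode = false;
                for (ExecutableElement m : ElementFilter.methodsIn(type.getEnclosedElements())) {
                    // Compare the real parameter type, not its spelling in the source.
                    if (m.getSimpleName().contentEquals("equals") && m.getParameters().size() == 1
                            && processingEnv.getTypeUtils().isSameType(m.getParameters().get(0).asType(), object)) {
                        equals = true;
                    }
                    if (m.getSimpleName().contentEquals("hashCode") && m.getParameters().isEmpty()) {
                        hashCode = true;
                    }
                }
                if (equals != hashCode) {
                    processingEnv.getMessager().printMessage(Diagnostic.Kind.WARNING,
                            type.getQualifiedName() + " overrides only one of equals(Object) and hashCode()", type);
                }
                check(ElementFilter.typesIn(type.getEnclosedElements())); // recurse into nested types
            }
        }
    }

    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** Runs the checker over one source string and returns the reported messages. */
    static List<String> findings(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
        JavaCompiler.CompilationTask task = compiler.getTask(null, null, collector,
                Arrays.asList("-proc:only"), null, Collections.singletonList(new StringSource(className, source)));
        task.setProcessors(Collections.singletonList(new Checker()));
        task.call();
        List<String> messages = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : collector.getDiagnostics()) {
            messages.add(d.getMessage(null));
        }
        return messages;
    }
}
```

The isSameType() call is the point of the exercise: an equals(Point) overload is not an override of Object.equals(), and the semantic model can tell the difference where a name match on the parse tree cannot.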

If you want to dig deeper than types and elements, the Tree API makes this sort of hacking simpler. Because the Tree API is not defined by a JSR, the JSR-199 and JSR-269 APIs cannot directly refer to it. So if you just look at those APIs, it seems impossible to inspect class members beyond their declarations. The semi-secret hack here is that javac's implementation returns a CompilationTask instance which is also an instance of com.sun.source.util.JavacTask. This class is much more interesting to tool hackers, since it gives you access to the parse trees and all JSR-269 information, plus control over javac's execution. Don't need to generate class files? Just invoke JavacTask.analyze() instead of JavacTask.generate(). This class also provides access to the Types and Elements utility classes, which make hacking even easier. So if you know you are using javac as the tool provider, you can just cast the task instance to JavacTask and have fun.
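A quick sketch of the cast, assuming javac really is the tool provider (the cast fails otherwise). TreeDemo and StringSource are hypothetical names. The scanner lists method invocations inside method bodies, which is exactly what you can't reach from declarations alone. The strings it collects come from toString() on javac's tree nodes, so treat their exact format as an implementation detail.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TreeDemo {
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** Returns the method-select expression of every invocation in the source, in source order. */
    static List<String> invokedMethods(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Only valid when the provider is javac; another Tool implementation would fail this cast.
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                Collections.singletonList(new StringSource(className, source)));
        List<String> names = new ArrayList<>();
        try {
            // Scan the trees straight after parsing, so they contain only what was written.
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethodInvocation(MethodInvocationTree node, Void unused) {
                        names.add(node.getMethodSelect().toString());
                        return super.visitMethodInvocation(node, unused);
                    }
                }.scan(unit, null);
            }
            // Attribute the trees without generating class files.
            task.analyze();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String src = "class Demo { void run() { helper(); System.out.println(\"hi\"); } void helper() { } }";
        System.out.println(invokedMethods("Demo", src));
    }
}
```

I scan before calling analyze() because javac may add synthesized members, such as default constructors, once it starts entering and attributing the trees.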

These APIs provide a read-only model of Java source code, so it is difficult to modify source code programmatically (you can overwrite the files, of course, but comments and formatting get blown away). For hacking source files without angry mobs coming after you, you'll want the Jackpot API, which extends these APIs to provide model transforming and formatted source rewriting. These four APIs work together to define a toolkit to create just about any Java language-aware tool. What sort of "non-obvious solutions" can you create with them?
