The Source for Java Technology Collaboration



Tom Ball

Tom Ball's Blog

Hacking javac

Posted by tball on September 16, 2006 at 03:55 PM | Comments (20)

hack (hăk) n., A non-obvious solution to an interesting problem.

This definition is on the front of a tee-shirt I have from O'Reilly Media to promote their Hack Series, which includes one of my favorite books, Swing Hacks by fellow java.net bloggers Joshua Marinacci and Chris Adamson. The reason I like the Hack Series is that even for subjects I know fairly well, these books describe interesting solutions which I hadn't thought of. It's that constant discovery of "non-obvious solutions" which has kept me so interested in programming over the years.

Take compilers, for example: they just compile source code into object code, right? Hacking them is generally frowned upon because, as Ken Thompson discussed in his ACM Turing Award lecture, Reflections on Trusting Trust, they are a great place to hide a Trojan horse. But if the compiler is designed to function as a tool library, as javac is, much more interesting (and benign) hacks are possible. Mustang (excuse me, Java 6) has three related APIs which all Java tool hackers should check out: JSR-199: Java Compiler API, JSR-269: Pluggable Annotation Processing API, and the Tree API (com.sun.source.tree and com.sun.source.util).

The Compiler API is deceptively simple: it gives you the ability to programmatically invoke javac (actually, any tool that implements the Tool interface such that ServiceLoader can find it) and compile one or more source files to class files. That's nice if you are writing an appserver container, perhaps, but not very hack-inspiring, right?
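
For reference, here is a minimal sketch of that "deceptively simple" case, written against a current JDK rather than the Java 6 builds discussed in this post. ToolProvider.getSystemJavaCompiler() returns javac (or null when no compiler is installed), and Tool.run() takes the same arguments as the javac command line. The class and method names (CompileFile, compile) are made up for illustration.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CompileFile {
    /** Compiles the given source files into outDir; returns javac's exit code (0 means success). */
    static int compile(Path outDir, Path... sources) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            // A runtime without the compiler has no system Java compiler.
            throw new IllegalStateException("no system Java compiler available");
        }
        String[] args = new String[sources.length + 2];
        args[0] = "-d";
        args[1] = outDir.toString();
        for (int i = 0; i < sources.length; i++) {
            args[i + 2] = sources[i].toString();
        }
        // Tool.run(in, out, err, args): null streams default to System.in/out/err.
        return javac.run(null, null, null, args);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hack");
        Path src = dir.resolve("Hello.java");
        Files.write(src, "public class Hello { }".getBytes(StandardCharsets.UTF_8));
        int rc = compile(dir, src);
        System.out.println("exit code " + rc + ", Hello.class written: "
                + Files.exists(dir.resolve("Hello.class")));
    }
}
```

Running main should report exit code 0 and a Hello.class file next to the source in the temporary directory.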

As Robin Williams said in the Disney cartoon Aladdin (I have small kids), "Wrong! But thanks for playing." The first hack leverages JSR-199's DiagnosticListener interface, which lets you listen to the errors and warnings created by the compiler. NetBeans uses this technique to display errors while you are editing Java sources; compilation is run regularly in the background, but only the diagnostic events are used. JSR-199 improves on the old trick of parsing error strings by delivering Diagnostic instances, which provide accurate source position information and locale-independent error IDs along with locale-specific error text.
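
A sketch of the diagnostics-only approach, assuming a current JDK: the source text comes from a String (the StringSource helper and the DiagnosticsDemo class are hypothetical names), a DiagnosticCollector gathers everything javac reports, and each Diagnostic is read for its kind, line number, locale-independent code, and localized message. Because a clean compile still writes class files, the sketch points -d at the temporary directory.

```java
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class DiagnosticsDemo {
    /** A compilation unit whose text comes from a String instead of a file. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                    Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles the source and returns one line per diagnostic javac reported. */
    static List<String> check(String className, String code) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
        // If the code compiles cleanly, its class files land in the temp directory.
        List<String> options = Arrays.asList("-d", System.getProperty("java.io.tmpdir"));
        javac.getTask(null, null, collector, options, null,
                Collections.singletonList(new StringSource(className, code))).call();
        List<String> report = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : collector.getDiagnostics()) {
            // getCode() is locale-independent; getMessage() is localized text.
            report.add(d.getKind() + " line " + d.getLineNumber()
                    + " [" + d.getCode() + "] " + d.getMessage(Locale.ENGLISH));
        }
        return report;
    }

    public static void main(String[] args) {
        check("Broken", "class Broken { int x = y; }").forEach(System.out::println);
    }
}
```

An editor would key its error markers off getLineNumber() (or getStartPosition()/getEndPosition()) and getCode() rather than parsing the message text.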

Still bored? How about creating your own scripting language? Rather than write a complicated interpreter, write a (hopefully) simpler compiler which outputs Java source files. Then use the Compiler API with a custom ClassLoader to dynamically load these classes on the fly, as if they were interpreted. Think the process is slow? The Jackpot rule language parser generates Java sources and uses this hack to compile them (look in $HOME/.jackpot). We used to compile scripts only if they had changed, but found that javac is so fast that caching didn't make a difference (we just keep the files for troubleshooting). If you don't want to write anything to disk, a related hack involves implementing the JavaFileManager interface to use memory instead of files: javac doesn't care about files, since it only uses the streams supplied by whatever JavaFileManager implementation you provide.
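
A minimal sketch of the in-memory variant, assuming a current JDK: a ForwardingJavaFileManager hands javac a byte-array-backed JavaFileObject for every class file it writes, and a small ClassLoader defines classes from those arrays. MemoryCompiler, StringSource, and ByteCode are hypothetical names; this is an illustration of the technique, not the Jackpot code.

```java
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class MemoryCompiler {
    /** Source text held in a String. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                    Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Collects the bytecode javac writes for one class. */
    static final class ByteCode extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteCode(String className) {
            super(URI.create("bytes:///" + className.replace('.', '/') + Kind.CLASS.extension),
                    Kind.CLASS);
        }
        @Override
        public OutputStream openOutputStream() {
            return bytes;
        }
    }

    /** Compiles one source string entirely in memory and loads the named class. */
    static Class<?> compileAndLoad(String className, String code) throws ClassNotFoundException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        Map<String, ByteCode> output = new HashMap<>();
        // Reads (class path, platform classes) go to the standard manager; writes stay in memory.
        JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(
                javac.getStandardFileManager(null, null, null)) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                    String name, JavaFileObject.Kind kind, FileObject sibling) {
                ByteCode file = new ByteCode(name);
                output.put(name, file);   // name is a binary name such as "pkg.Outer$Inner"
                return file;
            }
        };
        boolean ok = javac.getTask(null, fm, null, null, null,
                Collections.singletonList(new StringSource(className, code))).call();
        if (!ok) {
            throw new IllegalArgumentException("compilation failed: " + className);
        }
        ClassLoader loader = new ClassLoader(MemoryCompiler.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                ByteCode file = output.get(name);
                if (file == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] b = file.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        Class<?> c = compileAndLoad("Greeter",
                "public class Greeter { public static String greet(String n) { return \"Hello, \" + n; } }");
        System.out.println(c.getMethod("greet", String.class).invoke(null, "javac"));
    }
}
```

Running main should print "Hello, javac". Because the loader only defines classes from the captured bytes, recompiling a changed script with a fresh loader simply yields a new Class object.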

JSR-269 is officially the "Pluggable Annotation Processing API", and while it does that very well, it also enables lots of other hacks. A general way to think of JSR-269 is that it gives you access to all of the types and elements (symbols) in any set of source files, not just their annotations. Its javax.lang.model.util package has easily extendable visitor and scanner base classes which tools can use to inspect projects at the semantic level. One group of tools which can use this hack is error checkers that validate design rules, such as requiring that any class which overrides Object.equals() also overrides Object.hashCode(). One advantage of using JSR-269 types and elements rather than your own parser for these sorts of tests is that while you can infer many properties from a parse tree, the javac semantic model knows the correct properties.
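
As an illustration of the equals/hashCode rule, here is a sketch of an annotation processor that ignores annotations altogether (it declares support for "*", so javac runs it on every source) and walks the root elements with ElementFilter from javax.lang.model.util. The class name, the checkSource() driver, and the warning text are hypothetical; the driver runs the checker through the Compiler API with -proc:only so that no class files are generated.

```java
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.type.TypeMirror;
import javax.lang.model.util.ElementFilter;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Warns about classes that override equals(Object) without overriding hashCode(). */
@SupportedAnnotationTypes("*")   // run on every source file, annotated or not
class EqualsHashCodeChecker extends AbstractProcessor {
    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment round) {
        for (TypeElement type : ElementFilter.typesIn(round.getRootElements())) {
            check(type);
        }
        return false;   // claim nothing, so other processors still see any annotations
    }

    private void check(TypeElement type) {
        TypeMirror object = processingEnv.getElementUtils().getTypeElement("java.lang.Object").asType();
        boolean equals = false;
        boolean hashCode = false;
        for (ExecutableElement m : ElementFilter.methodsIn(type.getEnclosedElements())) {
            String name = m.getSimpleName().toString();
            if (name.equals("equals") && m.getParameters().size() == 1
                    && processingEnv.getTypeUtils().isSameType(m.getParameters().get(0).asType(), object)) {
                equals = true;
            } else if (name.equals("hashCode") && m.getParameters().isEmpty()) {
                hashCode = true;
            }
        }
        if (equals && !hashCode) {
            processingEnv.getMessager().printMessage(Diagnostic.Kind.WARNING,
                    type.getSimpleName() + " overrides equals() but not hashCode()", type);
        }
        for (TypeElement nested : ElementFilter.typesIn(type.getEnclosedElements())) {
            check(nested);   // member classes are enclosed elements too
        }
    }

    /** Runs the checker over one in-memory source with -proc:only (no class files). */
    static List<String> checkSource(String className, String code) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavaCompiler.CompilationTask task = javac.getTask(null, null, diagnostics,
                Collections.singletonList("-proc:only"), null, Collections.singletonList(source));
        task.setProcessors(Collections.singletonList(new EqualsHashCodeChecker()));
        task.call();
        List<String> messages = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            messages.add(d.getKind() + ": " + d.getMessage(Locale.ENGLISH));
        }
        return messages;
    }
}
```

The check only looks at methods declared in the class itself, so a class that inherits hashCode() from a superclass other than Object would be flagged; a stricter checker could walk the superclasses or use Elements.overrides().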

If you want to dig deeper than types and elements, the Tree API makes this sort of hacking simpler. Because the Tree API is not defined by a JSR, the JSR-199 and JSR-269 APIs cannot directly refer to it. So if you just look at those APIs, it seems impossible to inspect class members beyond their declarations. The semi-secret hack here is that javac's javax.tools.JavaCompiler implementation returns a CompilationTask instance which is also an instance of com.sun.source.util.JavacTask. This class is much more interesting to tool hackers, since it gives you access to the parse trees and all JSR-269 information, plus control over javac's execution. Don't need to generate class files? Just invoke JavacTask.analyze() instead of JavacTask.generate(). This class also provides access to the Types and Elements utility classes, which make hacking even easier. So if you know you are using javac as the tool provider, you can just cast the task instance to JavacTask and have fun.
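
A sketch of that cast, assuming javac is the system compiler (with any other tool provider the cast would throw ClassCastException): parse() returns the compilation unit trees, analyze() attributes them without writing class files, and getElements()/getTypes() expose the JSR-269 utilities from the same task. JavacTaskDemo and inspect() are hypothetical names.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import javax.lang.model.element.Element;
import javax.lang.model.util.Elements;
import javax.lang.model.util.Types;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class JavacTaskDemo {
    static List<String> inspect(String className, String code) throws IOException {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        // Only valid when javac is the provider: its CompilationTask is also a JavacTask.
        JavacTask task = (JavacTask) javac.getTask(null, null, null, null, null,
                Collections.singletonList(source));
        List<String> result = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitMethodInvocation(MethodInvocationTree node, Void unused) {
                    result.add("call " + node.getMethodSelect());
                    return super.visitMethodInvocation(node, unused);
                }
            }.scan(unit, null);
        }
        // analyze() attributes the trees and returns the top-level types; no class files are written.
        for (Element type : task.analyze()) {
            result.add("type " + type);
        }
        // The JSR-269 utility objects come from the same task.
        Elements elements = task.getElements();
        Types types = task.getTypes();
        boolean sub = types.isSubtype(elements.getTypeElement("java.lang.String").asType(),
                elements.getTypeElement("java.lang.CharSequence").asType());
        result.add("String <: CharSequence: " + sub);
        return result;
    }
}
```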

These APIs provide a read-only model of Java source code, so it is difficult to modify source code programmatically (you can overwrite the files, of course, but comments and formatting get blown away). For hacking source files without angry mobs coming after you, you'll want the Jackpot API, which extends these APIs to provide model transforming and formatted source rewriting. These four APIs work together to define a toolkit to create just about any Java language-aware tool. What sort of "non-obvious solutions" can you create with them?


Comments
Comments are listed in date ascending order (oldest first)

  • Tom: I've been wondering for a long time now if it wouldn't be possible to plug in a compiler for another JVM-compatible language, such that references from a Java source file to a class in that other language would be resolved by deferring to the second compiler. For example, Rhino has a bytecode compiler--would it be possible using these new APIs that a reference to a class written in Rhino be compiled on-the-fly by having Javac defer to the Rhino compiler when the Rhino class needs to be compiled as a dependency? Hope this isn't too silly a question. Regards, Patrick

    Posted by: pdoubleya on September 17, 2006 at 11:55 PM

  • It's not a silly question. The hack is to define a javax.tools.JavaFileManager so that its getJavaFileForInput() method invokes Rhino (or any other JVM-based language which supports classfile generation) when a JavaFileObject.Kind.CLASS type FileObject instance is requested.

    Posted by: tball on September 18, 2006 at 08:13 AM

  • Tom, would Jackpot be a reasonable starting point for a major refactoring of JavaDoc? I had to do some evil things to get at javadoc's URLs. Thanks, Dave

    Posted by: dwalend on September 18, 2006 at 10:32 AM

  • I'm not sure I follow: the javac and Jackpot APIs are for inspecting and changing Java source files. Do you mean refactoring JavaDoc's sources? If so, my guess is that they will be open-sourced along with javac since the two share so much code. If you want to change the standard JavaDoc generation or the files it generates, I found it best to modify the standard doclet source rather than the tool or its output.

    Posted by: tball on September 19, 2006 at 07:19 AM

  • Note to self: tools are packaged in /lib/tools.jar (took me a while :). Patrick

    Posted by: pdoubleya on September 20, 2006 at 11:31 AM

  • Continuing on Patrick's questions, would it be possible just to provide compiled class files as needed on the fly? Say, if one prefers to generate class files directly. I agree it would be nice to have a standard way of connecting javac to other languages so that one compile process could compile across multiple languages as needed (instead of the island of compiling one language at a time and having previously compiled dependencies sitting in separate jar files), which would give more opportunities for a piecemeal transition to "Java Mark II" if one wants to.

    Posted by: tompalmer on September 20, 2006 at 11:46 AM

  • Java has some verbosities and annoyances that the "agile" suite of jvm languages (jruby, jython, groovy, ...) attempt to remove. But it's always seemed to me that a preprocessor of some sort could fix them.

    Examples include:

    "raw" strings so \\ not needed in regexp strings.
    Multi-line strings delimited by ''' or """, possibly with substitutions.
    Foo foo = new Foo(a,b,c) -> Foo foo = new(a,b,c) [i.e. infer type for common cases somehow]
    Improved literals: Map m = ["foo":1, "bar":77];
    functors: ArrayList al = blist.map(MAP(b){b.foo()+1})

    ... where the last is some sort of template for our anonymous class stunts like comparators and so on.

    ArrayList al = blist.map(new Mapper() {
        Base map(Base b) {
            return b.foo() + 1;
        }
    });


    Do you think these tools you mention are up to that sort of task? Do you know any tools that might be? I really like java a bunch, but boy do I swear at the awkwardnesses. I suppose that's why we build IDEs!

        -- Owen

    Owen Densmore    505-988-3787 http://backspaces.net
    Redfish Group:   505-995-0206 http://redfish.com  http://friam.org/

    Posted by: backspaces on September 20, 2006 at 02:34 PM

  • Compiling classes all the time is very possible; I suspect the scripting project (https://scripting.dev.java.net/) uses this technique to make javac into a JSR-223 compatible script engine.

    As for language changes, I strongly prefer using other languages to preprocessing extended Java source -- adding to Java makes it less accessible to newcomers, while a smaller, domain-specific language might be more accessible. The number of alternative languages on the JVM is amazing; check out http://www.robert-tolksdorf.de/vmlanguages for a fairly complete list. Many engineers share your desire for a more elegant language (even if there are conflicting definitions of elegance!), so creating a new language is much easier these days when the JVM is used as the runtime. Check out the "Various OO Languages" section for some interesting alternatives to Java.

    Posted by: tball on September 20, 2006 at 10:25 PM

  • Tom: this was actually the idea I had, that one could start by coding a project in Groovy or JavaScript, then as needed introduce Java classes, and all have them compiled within one process. The way I see it, when a Java class is compiled, the compiler wants to verify that a reference to another class or class member is valid according to the JLS. If we can trap that check, so that instead of trying to load/compile the Java class referenced, we actually compile the Groovy or JS class and return that bytestream, we seem to have what we need. I want to play around with this new API and see how difficult it is in practice, also see what happens if Foo.js references other JS that is not strongly typed at all--whether it can all be compiled and bundled up together. Regards, Patrick

    Posted by: pdoubleya on September 21, 2006 at 01:51 AM

  • Tom: Just found this:

    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6389769

    I wonder how many of these could be implemented using these new facilities?

    Owen

    Posted by: backspaces on September 21, 2006 at 09:52 AM

  • Hi Tom:
    It all sounds great, but the question is what libs do I need to use it? I am running NB 5.5 Beta 2 and Mustang Beta 2 (build 86). I installed the Jackpot and Jackpot Dev modules. I've added the Java Tree API to my project. Then I call getTask on the compiler and cast the result to a com.sun.source.util.JavacTask. However, when I try to use the Iterable returned from analyze, I get a compiler error saying that analyze's return type is void.

    --Phil

    Posted by: pventura on September 23, 2006 at 12:54 PM

  • >I get a compiler error saying that analyze's return type is void.

    You'll want the latest Mustang build, since these APIs have changed quite a bit since beta 2, based on last-minute expert group feedback. On the surface one could reasonably assume that a beta is higher quality than a weekly build, but the post-beta builds continue to improve in quality, so they are safer to work with.

    Posted by: tball on September 23, 2006 at 01:20 PM

  • Hi Tom! I am using jsr199 and jsr269 for (yet another) DBC implementation in java and love this post. But now I am stuck with two questions you might answer.
    I am wondering, from a maintainability point of view, what it means to me that the Tree-API isn't a javax-package?
    Further, I tried to resolve type-bindings for my parse tree which did not work. However, I am not even sure if JavacTask.getTypeMirror is the right place to start with....

    cheers, johannes

    Posted by: riejo on November 09, 2006 at 05:21 AM

  • The Tree API is considered public but not part of the Java specification, like other tools APIs such as javadoc's doclet API and the debugger API (JPDA). These API are available as part of Sun's JDK, but are not part of the official Java platform and therefore may not be part of other vendors' Java implementations. History has shown that the JDK tools group considers backwards compatibility very important when enhancing their APIs, so third-party tools shouldn't break between releases without plenty of warning (like the original debugger interface being replaced in 1.2).

    To get the type associated with a tree, one way is to get its tree path using com.sun.source.util.Trees.getPath() and pass that to Trees.getTypeMirror(). Not all trees have types (statements and blocks, for example), so be prepared to handle a null return. An easier way to get the type for a tree that has an associated element (class, variable, and method trees) is to call "trees.getElement(path).asType();" (Trees.getElement() takes the tree's TreePath, not the tree itself).

    Posted by: tball on November 09, 2006 at 03:00 PM
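
A sketch of both approaches from the reply above, assuming a current JDK. It scans the attributed trees with TreePathScanner: variable trees go through trees.getElement(path).asType(), while method invocations use Trees.getPath() followed by Trees.getTypeMirror(). TreeTypes and types() are hypothetical names. javac may add synthetic trees during analysis (such as a default constructor and its super() call), so the output can include entries that do not appear in the source text.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;
import javax.lang.model.element.Element;
import javax.lang.model.type.TypeMirror;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TreeTypes {
    static List<String> types(String className, String code) throws IOException {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) javac.getTask(null, null, null, null, null,
                Collections.singletonList(source));
        Iterable<? extends CompilationUnitTree> units = task.parse();
        task.analyze();                      // trees carry types only after attribution
        Trees trees = Trees.instance(task);
        List<String> out = new ArrayList<>();
        for (CompilationUnitTree unit : units) {
            new TreePathScanner<Void, Void>() {
                @Override
                public Void visitVariable(VariableTree node, Void unused) {
                    // Trees with an element: ask the element for its type.
                    Element element = trees.getElement(getCurrentPath());
                    out.add(node.getName() + " : " + element.asType());
                    return super.visitVariable(node, unused);
                }

                @Override
                public Void visitMethodInvocation(MethodInvocationTree node, Void unused) {
                    // Expressions: look up the path, then its type (null for untyped trees).
                    TreePath path = trees.getPath(unit, node);
                    TypeMirror type = trees.getTypeMirror(path);
                    out.add(node + " : " + type);
                    return super.visitMethodInvocation(node, unused);
                }
            }.scan(new TreePath(unit), null);
        }
        return out;
    }
}
```

Inside a TreePathScanner, getCurrentPath() already holds the path, so the getPath() call is only needed when starting from a bare tree.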

  • Hi Tom.
    I wonder if an annotation processor can figure out what classpath elements have been used to launch javac?
    E.g:
    javac -cp .:my.processor.jar:../random.jar foo/Bar.java

    What I need is .:my.processor.jar:../random.jar in my processing environment. Is there a way to achieve this? Am I missing something because I am still stuck with Apple's preview release of Java 6?

    Cheers and Happy Holidays, Johannes

    Posted by: riejo on December 22, 2006 at 06:24 AM

  • I'm not an annotation processing expert. If the -cp or -classpath options are not in the map returned by ProcessingEnvironment.getOptions(), I think you are out of luck getting the actual option text. You should still be able to load any files or classes in your jar files, however. From your annotation processor, try running "getClass().getClassLoader().getResourceAsStream()" to open an input stream for any jar file entry. That's the theory, anyway.

    Posted by: tball on December 22, 2006 at 11:27 AM

  • Is there a way to just compile one Java file at a time, the way InstantJ does? I just want to look at one file at a time and find out what package it is in, what classes it contains, and what interfaces those implement. Further along, it may even be nice to find out more such introspection info.

    Posted by: bblfish on April 20, 2007 at 08:13 AM

  • Sure there is (that's how the NetBeans editor checks for syntax errors): have the file list passed to JavaCompiler.getTask() only hold that single file. If you only need the class and interface names (no type resolution), then calling JavacTask.parse() will only parse that file. If you need type resolution (JavacTask.analyze()), however, the JavaFileManager instance you use needs to return JavaFileObjects for the source and class paths.

    Posted by: tball on April 20, 2007 at 09:11 AM
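
A sketch of the parse-only case described in the reply above, assuming a current JDK; ParseOnly and outline() are hypothetical names. Since parse() does no type resolution, the names in an implements clause come back exactly as written rather than as resolved types.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ParseOnly {
    /** Lists the package and top-level types of one source file without resolving any names. */
    static List<String> outline(String fileName, String code) throws IOException {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        // The file list holds just this one file; parse() stops after syntax analysis.
        JavacTask task = (JavacTask) javac.getTask(null, null, null, null, null,
                Collections.singletonList(source));
        List<String> out = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            ExpressionTree pkg = unit.getPackageName();     // null for the default package
            out.add("package " + (pkg == null ? "<default>" : pkg));
            for (Tree decl : unit.getTypeDecls()) {
                if (decl instanceof ClassTree) {            // skips stray top-level ';'
                    ClassTree type = (ClassTree) decl;
                    List<String> interfaces = new ArrayList<>();
                    for (Tree t : type.getImplementsClause()) {
                        interfaces.add(t.toString());       // unresolved, as written
                    }
                    out.add(type.getKind() + " " + type.getSimpleName() + " implements " + interfaces);
                }
            }
        }
        return out;
    }
}
```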

  • Mhh thanks for the info Tom. I was using the analyze() method. It was clearly the parse() method I should have been using.
    I am trying to relate source .java files to the .class files generated from them, as I explain in the thread "Extracting class info from source, and relating them" on the baetle mailing list. I now really understand the importance of the classpath in resolving names :-)

    But now that you mention it, I should in fact be able to get most of what I am looking for using the JavacTask.parse() method. I should be able to relate .java source files to the .class files:

    for all the top level classes in the source (easy: concatenate packageName + "." + className)
    for all the inner class files (as they are usually named packageName.className$InnerClass)
    as for the anonymous classes, I don't think one can rely on the numbering as being ordered. But one could go the other way: starting from a list of my.package.Foo$1 class names, one should be able to easily work out which sources they came from, once one knows which top level .class files relate to which java files.


    The other solution, and probably the more solid one, would be to create a patched compiler called by ant or maven that would write all information out somewhere during compilation. And javax.tools.* would come in useful here again.

    I am afraid that may be a bit more work, though....

    Posted by: bblfish on April 20, 2007 at 12:24 PM
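
A sketch of the naming scheme from the comment above, working purely from the parse tree: top-level classes get the package prefix, and member classes are joined to their enclosing class with '$'. Anonymous and local classes are deliberately skipped, since javac assigns their numbered names. BinaryNames and binaryNames() are hypothetical names, and the sketch assumes a current JDK.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BinaryNames {
    /** Predicts class-file binary names for the top-level and member classes in one source. */
    static List<String> binaryNames(String fileName, String code) throws IOException {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) javac.getTask(null, null, null, null, null,
                Collections.singletonList(source));
        List<String> names = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {     // syntax only, no classpath needed
            ExpressionTree pkg = unit.getPackageName();
            String prefix = pkg == null ? "" : pkg + ".";
            for (Tree decl : unit.getTypeDecls()) {
                if (decl instanceof ClassTree) {
                    ClassTree type = (ClassTree) decl;
                    collect(type, prefix + type.getSimpleName(), names);
                }
            }
        }
        return names;
    }

    private static void collect(ClassTree type, String binaryName, List<String> names) {
        names.add(binaryName);   // e.g. my.pkg.Foo corresponds to my/pkg/Foo.class
        for (Tree member : type.getMembers()) {
            // Only member classes: anonymous and local classes live inside
            // initializers or method bodies and get compiler-assigned numbers.
            if (member instanceof ClassTree) {
                ClassTree nested = (ClassTree) member;
                collect(nested, binaryName + "$" + nested.getSimpleName(), names);
            }
        }
    }
}
```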

  • >I wonder if an annotation processor can figure out
    >what classpath elements have been used to launch javac

    Wouldn't it be a simple matter of using JMX to retrieve the classpath? See:

    http://blogs.sun.com/jmxetc/entry/a_small_program_that_prints

    Posted by: hchar on June 06, 2007 at 01:41 PM
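
A minimal sketch of the JMX route; JvmClassPath and entries() are hypothetical names. One caveat worth checking before relying on it for the annotation processor question: RuntimeMXBean.getClassPath() reports the class path of the JVM that is running javac (it is equivalent to the java.class.path system property), which is generally not the value passed to javac's own -cp option.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.Arrays;
import java.util.List;

class JvmClassPath {
    /** Entries of the running JVM's class path, as reported by its RuntimeMXBean. */
    static List<String> entries() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        // Same value as System.getProperty("java.class.path").
        return Arrays.asList(runtime.getClassPath().split(File.pathSeparator));
    }

    public static void main(String[] args) {
        entries().forEach(System.out::println);
    }
}
```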




