
Using Tatoo as front end of javac

Posted by forax on December 7, 2008 at 8:28 AM PST

The OpenJDK compiler grammar project provides a way to use an ANTLR parser as the front end of javac.
As you perhaps already know, I am one of the core developers of Tatoo, an innovative LR parser generator. In order to demonstrate that Tatoo is a great parser generator tool, let's do the same.

Here is the JLS 1.0 grammar taken by Tatoo as input: jls.ebnf (things between curly braces are names used to associate semantics). The semantics are specified as a class that implements an interface generated from the input grammar: TreeGrammarEvaluator.java
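
To make the idea concrete, here is a minimal sketch of the pattern on a toy expression grammar. It is not the code Tatoo generates: ExprGrammarEvaluator, IntEvaluator and EvaluatorDemo are hypothetical names, and the parser is replaced by a method that issues the calls a bottom-up parser would make while reducing "1 + 2".

```java
// Hypothetical stand-in for a generated interface: one method per
// production name written between curly braces in the grammar.
interface ExprGrammarEvaluator<T> {
    T number(String text);   // called for a production tagged { number }
    T add(T left, T right);  // called for a production tagged { add }
}

// The semantics: a hand-written class implementing the generated interface.
class IntEvaluator implements ExprGrammarEvaluator<Integer> {
    @Override public Integer number(String text) { return Integer.parseInt(text); }
    @Override public Integer add(Integer left, Integer right) { return left + right; }
}

class EvaluatorDemo {
    // Simulates the calls a bottom-up parser would make while reducing "1 + 2":
    // the operands are reduced first, then the enclosing addition.
    static <T> T reduceOnePlusTwo(ExprGrammarEvaluator<T> evaluator) {
        T left = evaluator.number("1");
        T right = evaluator.number("2");
        return evaluator.add(left, right);
    }

    public static void main(String[] args) {
        System.out.println(reduceOnePlusTwo(new IntEvaluator()));
    }
}
```

Because the grammar only refers to names, the same grammar can be paired with another implementation of the interface, for example one that builds tree nodes instead of computing values.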

The whole prototype is on the Tatoo SVN. I've patched the compiler grammar project's javac (which is already a patched javac) a little bit so that a parser factory can be specified on the command line. The following command compiles Test.java using the parser created with Tatoo as the front end.

  java -cp classes:../../lib/tatoo-runtime.jar:lib/javac.jar \
       com.sun.tools.javac.Main \
       -XDparser=fr.umlv.tatoo.samples.java.javac.TatooParserFactory \
       Test.java
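
For readers curious how a class name given on the command line can end up selecting the parser, the sketch below shows one plausible shape using reflection. It is an assumption, not the actual patch: ParserFactory, DefaultParserFactory and ParserFactoryLoader are hypothetical names, and the options are modelled as a plain map holding the key "parser".

```java
import java.util.Map;

// Hypothetical factory contract; the real javac types are not used here.
interface ParserFactory {
    String describe();
}

// Hypothetical fallback used when no "parser" option is given.
class DefaultParserFactory implements ParserFactory {
    @Override public String describe() { return "default hand-written parser"; }
}

class ParserFactoryLoader {
    // Looks up the "parser" option and instantiates the named class
    // reflectively; falls back to the default factory when it is absent.
    static ParserFactory load(Map<String, String> options) {
        String className = options.get("parser");
        if (className == null) {
            return new DefaultParserFactory();
        }
        try {
            Class<? extends ParserFactory> type =
                Class.forName(className).asSubclass(ParserFactory.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot create parser factory " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(Map.of()).describe());
    }
}
```

The factory class only needs to be on the classpath and to have a no-argument constructor, which is why the command above adds the Tatoo runtime and the sample classes to -cp.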

Currently, I've only implemented a Java 1.0 grammar, but supporting later versions is just a matter of time, particularly because Tatoo allows several grammar versions to be specified in the same grammar file. The computation of element positions seems OK, and the error recovery is pretty basic (as you will see below).

An example with an unknown type

class Test {
  public static void main(Stringz[] args) {
    System.out.println("Hello Tatoo");
  }
}

Output of javac+Tatoo

Test.java:2: cannot find symbol
symbol  : class Stringz
location: class Test
  public static void main(Stringz[] args) {
                          ^
1 error

Output of javac

Test.java:2: cannot find symbol
symbol  : class Stringz
location: class Test
  public static void main(Stringz[] args) {
                          ^
1 error

OK, the two outputs are identical.

An example with a grammar error

class Test {
  [] int[] foo;
  public static void main(String[] args) {
    System.out.println("Hello Tatoo");
  }
}

Output of javac+Tatoo

parse error on terminal null with stack 192,212,213,216,222,223, expected [rcurl, _boolean, _byte, _short, _char,
 _int, _long, _float, _double, _void, _static, _synchronized, _abstract, _native, _final, _volatile, _transient,
 _public, _private, _protected, identifier]
discarding character for lexer error recovery "[" (91)
...
discarding character for lexer error recovery "]" (93)

Output of javac

Test.java:2: illegal start of type
  [] int[] foo;
  ^
Test.java:2: ';' expected
  [] int[] foo;
   ^
2 errors

OK, Tatoo's parser emits lots of junk. But if you look more closely, it basically prints all the tokens that are valid at that point. Also note that the hand-written parser is able to report that the error is an illegal start of type, which is really meaningful information. I think it's possible to derive that kind of information from the grammar, but it will require extra work to get it at runtime.
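
As a rough illustration of what that extra work could look like, the sketch below turns a set of expected terminals into a shorter message. It is only a heuristic written for this post, not something Tatoo provides: ParseErrorMessages is a hypothetical class, and the rule "every terminal that can start a type is expected, so report an illegal start of type" is an assumption. The terminal names are the ones printed in the output above, except "semicolon", which is made up for the usage example.

```java
import java.util.Set;
import java.util.TreeSet;

class ParseErrorMessages {
    // Terminals that can begin a type in the Java 1.0 grammar, using the
    // names printed by the parser above (assumed to be the full list).
    static final Set<String> TYPE_STARTERS = Set.of(
        "_boolean", "_byte", "_short", "_char", "_int",
        "_long", "_float", "_double", "_void", "identifier");

    // Maps the set of expected terminals to a human-readable message.
    static String describe(Set<String> expected) {
        if (expected.containsAll(TYPE_STARTERS)) {
            return "illegal start of type";
        }
        if (expected.size() == 1) {
            return "'" + expected.iterator().next() + "' expected";
        }
        // Sort the names so the message is stable from run to run.
        return "unexpected token, expected one of " + new TreeSet<>(expected);
    }

    public static void main(String[] args) {
        System.out.println(describe(Set.of("semicolon")));
    }
}
```

Fed with the expected set from the parse error above, describe returns the same "illegal start of type" wording as javac, because that set contains every type-starting terminal.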

Cheers,
Rémi
