
Using Tatoo as front end of javac

Posted by forax on December 7, 2008 at 8:28 AM PST

The OpenJDK Compiler Grammar project provides a way to use an ANTLR parser as the front end of javac.


As you may already know, I am one of the core developers of Tatoo,
an innovative LR parser generator.
To demonstrate that Tatoo is a great parser generator,
let's do the same.

Here is the JLS 1.0 grammar given to Tatoo as input:
jls.ebnf
(names between curly braces are used to attach semantics).
The semantics are specified as a class that implements an interface
generated from the input grammar:
TreeGrammarEvaluator.java
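For readers who haven't seen this pattern, here is a minimal sketch of how a grammar-derived evaluator interface is used. It assumes a toy grammar, expr ::= expr '+' expr {plus} | NUMBER {num}, rather than the JLS one, and the names `ExprEvaluator`, `num` and `plus` are hypothetical: the real interface generated by Tatoo (TreeGrammarEvaluator.java) has different names and many more methods. The parser itself is replaced by a method that replays, by hand, the reductions an LR parser would perform on `1 + 2 + 3`.

```java
public class EvaluatorSketch {
    // Hypothetical generated interface: one callback per named production
    // of the toy grammar  expr ::= expr '+' expr {plus} | NUMBER {num}
    interface ExprEvaluator<E> {
        E num(String literal);
        E plus(E left, E right);
    }

    // Hand-written semantics #1: build a tree, rendered here as an s-expression
    static final class TreeBuilder implements ExprEvaluator<String> {
        public String num(String literal) { return literal; }
        public String plus(String left, String right) { return "(+ " + left + " " + right + ")"; }
    }

    // Hand-written semantics #2: compute the value directly
    static final class Calculator implements ExprEvaluator<Integer> {
        public Integer num(String literal) { return Integer.parseInt(literal); }
        public Integer plus(Integer left, Integer right) { return left + right; }
    }

    // Stand-in for the generated parser: calls the evaluator at each reduction,
    // in the order an LR parser reduces "1 + 2 + 3" (left-associative)
    static <E> E parseOnePlusTwoPlusThree(ExprEvaluator<E> ev) {
        E one = ev.num("1");
        E two = ev.num("2");
        E sum = ev.plus(one, two);
        E three = ev.num("3");
        return ev.plus(sum, three);
    }

    public static void main(String[] args) {
        System.out.println(parseOnePlusTwoPlusThree(new TreeBuilder())); // (+ (+ 1 2) 3)
        System.out.println(parseOnePlusTwoPlusThree(new Calculator()));  // 6
    }
}
```

Because the parser only talks to the interface, the same grammar can drive a tree builder (as needed for javac) or a direct evaluator without touching the parsing code.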

The whole prototype is in the
Tatoo SVN.
I've patched the Compiler Grammar project's javac
(which is itself already a patch of javac)
a little bit so that a parser factory can be specified
on the command line.
The following command compiles Test.java using the
parser generated by Tatoo as the front end.

  java -cp classes:../../lib/tatoo-runtime.jar:lib/javac.jar \
       com.sun.tools.javac.Main \
       -XDparser=fr.umlv.tatoo.samples.java.javac.TatooParserFactory \
       Test.java
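The -XD option is javac's hidden-option mechanism: -XDkey=value stores value under key in javac's option table, and a bare -XDflag stores the flag under its own name. Here is a minimal sketch of how a patched compiler could turn -XDparser=... into a parser factory, assuming a hypothetical `ParserFactory` interface and reflective instantiation through a no-argument constructor. The actual hook in the patched javac is not shown in this post, and `AlternativeParserFactory` merely stands in for TatooParserFactory.

```java
import java.util.HashMap;
import java.util.Map;

public class ParserFactorySketch {
    // Hypothetical hook type; not the patched javac's real interface
    public interface ParserFactory {
        String name();
    }

    // Used when no -XDparser option is given
    public static final class DefaultParserFactory implements ParserFactory {
        public String name() { return "javac hand-written parser"; }
    }

    // Stand-in for an alternative front end such as TatooParserFactory
    public static final class AlternativeParserFactory implements ParserFactory {
        public String name() { return "alternative parser"; }
    }

    // Collect hidden options modeled on javac's -XD handling:
    // "-XDkey=value" maps key to value, a bare "-XDflag" maps flag to itself
    static Map<String, String> hiddenOptions(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("-XD")) continue;
            String rest = arg.substring(3);
            int eq = rest.indexOf('=');
            if (eq < 0) options.put(rest, rest);
            else options.put(rest.substring(0, eq), rest.substring(eq + 1));
        }
        return options;
    }

    // Instantiate the factory named by the "parser" option, or fall back to the default
    static ParserFactory loadFactory(Map<String, String> options) throws ReflectiveOperationException {
        String className = options.get("parser");
        if (className == null) return new DefaultParserFactory();
        Class<?> type = Class.forName(className);
        return type.asSubclass(ParserFactory.class).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String[] cmd = { "-XDparser=" + AlternativeParserFactory.class.getName(), "Test.java" };
        System.out.println(loadFactory(hiddenOptions(cmd)).name()); // alternative parser
    }
}
```

Falling back to a default factory keeps the normal javac behaviour when the option is absent, so the patch does not change anything for ordinary command lines.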

Currently, I've only implemented a Java 1.0 grammar,
but the rest is just a matter of time, particularly because
Tatoo allows multiple grammar versions to be specified
in the same grammar file.
The computation of element positions seems OK, but
the error recovery is pretty basic (as you will see below).
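Conceptually, a versioned grammar tags each production with the language version that introduced it, and a newer version inherits everything from the version it extends. The sketch below models that idea with hypothetical `Version` and `Production` classes. It is not Tatoo's grammar syntax, only an illustration, under that inheritance assumption, of why adding Java 1.1 (here, member classes) can reuse the 1.0 rules instead of duplicating them.

```java
import java.util.ArrayList;
import java.util.List;

public class GrammarVersionsSketch {
    // A language version that may extend an older one (e.g. 1.1 extends 1.0)
    static final class Version {
        final String name;
        final Version parent; // null for the base version
        Version(String name, Version parent) { this.name = name; this.parent = parent; }

        // True if this version is 'other' or (transitively) extends it
        boolean includes(Version other) {
            for (Version v = this; v != null; v = v.parent) {
                if (v == other) return true;
            }
            return false;
        }
    }

    // A production tagged with the version that introduced it
    static final class Production {
        final String rule;
        final Version since;
        Production(String rule, Version since) { this.rule = rule; this.since = since; }
    }

    // Keep only the productions a parser for 'target' should contain
    static List<String> productionsFor(List<Production> grammar, Version target) {
        List<String> active = new ArrayList<>();
        for (Production p : grammar) {
            if (target.includes(p.since)) active.add(p.rule);
        }
        return active;
    }

    public static void main(String[] args) {
        Version java10 = new Version("1.0", null);
        Version java11 = new Version("1.1", java10);
        List<Production> grammar = List.of(
            new Production("memberDecl ::= fieldDecl", java10),
            new Production("memberDecl ::= methodDecl", java10),
            new Production("memberDecl ::= classDecl", java11)); // member classes, new in 1.1
        System.out.println(productionsFor(grammar, java10).size()); // 2
        System.out.println(productionsFor(grammar, java11).size()); // 3
    }
}
```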

An example with an unknown type

class Test {
  public static void main(Stringz[] args) {
    System.out.println("Hello Tatoo");
  }
}

Output of javac+Tatoo

Test.java:2: cannot find symbol
symbol  : class Stringz
location: class Test
  public static void main(Stringz[] args) {
                          ^
1 error

Output of javac

Test.java:2: cannot find symbol
symbol  : class Stringz
location: class Test
  public static void main(Stringz[] args) {
                          ^
1 error

OK, identical.
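Identical output means the Tatoo-based parser records the same source positions as javac's own parser: a diagnostic carries a character offset, and the line number and caret column are computed from it. Here is a simplified sketch of that computation, assuming '\n' line endings and no tabs (javac's real formatter handles more cases, such as tabs). The `format` helper is hypothetical and leaves out the "symbol" and "location" lines.

```java
public class CaretSketch {
    // Render "file:line: message", the offending source line, and a caret
    // under the column of 'offset' (0-based character offset into 'source')
    static String format(String file, String source, int offset, String message) {
        int lineStart = source.lastIndexOf('\n', offset - 1) + 1;
        int lineEnd = source.indexOf('\n', offset);
        if (lineEnd < 0) lineEnd = source.length();
        int line = 1; // 1-based line number: count newlines before the line start
        for (int i = 0; i < lineStart; i++) {
            if (source.charAt(i) == '\n') line++;
        }
        int column = offset - lineStart; // 0-based column within the line
        return file + ":" + line + ": " + message + "\n"
            + source.substring(lineStart, lineEnd) + "\n"
            + " ".repeat(column) + "^";
    }

    public static void main(String[] args) {
        String source = "class Test {\n"
            + "  public static void main(Stringz[] args) {\n"
            + "    System.out.println(\"Hello Tatoo\");\n"
            + "  }\n"
            + "}\n";
        // Offset a parser would record for the unknown type name
        int offset = source.indexOf("Stringz");
        System.out.println(format("Test.java", source, offset, "cannot find symbol"));
    }
}
```

If the parser recorded a different offset for the type (for instance the start of the whole parameter instead of the type name), the caret would move even though the message stayed the same, which is why matching javac's positions matters.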

An example with a grammar error

class Test {
  [] int[] foo;
  public static void main(String[] args) {
    System.out.println("Hello Tatoo");
  }
}

Output of javac+Tatoo

parse error on terminal null with stack 192,212,213,216,222,223, expected [rcurl, _boolean, _byte, _short, _char,
_int, _long, _float, _double, _void, _static, _synchronized, _abstract, _native, _final, _volatile, _transient,
_public, _private, _protected, identifier]
discarding character for lexer error recovery "[" (91)
...
discarding character for lexer error recovery "]" (93)

Output of javac

Test.java:2: illegal start of type
  [] int[] foo;
  ^
Test.java:2: ';' expected
  [] int[] foo;
   ^
2 errors

OK, Tatoo's parser emits lots of junk.
But if you look more closely, it basically
prints all the tokens that are valid at that point.
Also note that the hand-written parser is able to report
that the error is an illegal start of type, which is really
meaningful information.
I think it's possible to derive that kind of information
from the grammar, but it would require extra work
to make it available at runtime.
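Both points can be illustrated in a few lines. In an LR parser, the "expected" list comes from the action table: it is the set of terminals that have a non-error action in the state where the parse failed. A more meaningful message could then be chosen by checking which grammar category that set covers. This is a minimal sketch with a hypothetical, hand-written table row (terminal names borrowed from the output above, plus a made-up `lbracket`) and a hypothetical "illegal start of type" heuristic; it is not how Tatoo or javac actually produce their messages.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class ExpectedTokensSketch {
    enum Kind { SHIFT, REDUCE, ERROR }

    // Terminals with a non-error action in one state's row of the action table
    static Set<String> expected(Map<String, Kind> row) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Kind> e : row.entrySet()) {
            if (e.getValue() != Kind.ERROR) result.add(e.getKey());
        }
        return result;
    }

    // Hypothetical heuristic: if every token that can start a type is valid here,
    // the unexpected token most likely sits where a type should begin
    static String describe(String found, Set<String> expected, Set<String> typeStart) {
        if (expected.containsAll(typeStart)) return "illegal start of type";
        return "unexpected " + found + ", expected " + expected;
    }

    public static void main(String[] args) {
        // Hypothetical row for the state reached after "class Test {"
        Map<String, Kind> row = new LinkedHashMap<>();
        row.put("rcurl", Kind.SHIFT);
        row.put("_int", Kind.SHIFT);
        row.put("_public", Kind.SHIFT);
        row.put("identifier", Kind.SHIFT);
        row.put("lbracket", Kind.ERROR);
        Set<String> exp = expected(row);
        System.out.println(exp); // [rcurl, _int, _public, identifier]
        System.out.println(describe("lbracket", exp, Set.of("_int", "identifier"))); // illegal start of type
    }
}
```

The table already contains everything needed; the extra work is deciding, per grammar category, which expected sets deserve a dedicated message.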

Cheers,


Rémi
