Using Tatoo as a front end for javac
The OpenJDK compiler grammar project provides a way to use an ANTLR parser as the front end of javac.
As you perhaps already know, I am one of the core developers of Tatoo,
an innovative LR parser generator.
In order to demonstrate that Tatoo is a great parser generator tool,
let's do the same.
Here is the JLS 1.0 grammar taken by Tatoo as input: jls.ebnf (the things between curly braces are names used to attach semantics). The semantics is specified as a class that implements an interface generated from the input grammar: TreeGrammarEvaluator.java
The whole prototype is in the Tatoo SVN repository. I've patched the compiler grammar's javac (which is itself a patched javac) a little bit to be able to specify a parser factory on the command line. The following command compiles Test.java using the parser created with Tatoo as the front end.
java -cp classes:../../lib/tatoo-runtime.jar:lib/javac.jar \
  com.sun.tools.javac.Main \
  -XDparser=fr.umlv.tatoo.samples.java.javac.TatooParserFactory \
  Test.java
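To make the idea of a parser factory chosen on the command line more concrete, here is a minimal sketch of how such an option might be resolved by reflection. It is not the actual patch: the ParserFactory interface, the DefaultParserFactory class and the ParserFactoryLoader class are hypothetical stand-ins, and only the -XDparser= prefix is taken from the command above.

```java
public class ParserFactoryLoader {
    /** Hypothetical stand-in for the factory type; not javac's real API. */
    public interface ParserFactory {
        String describe();
    }

    /** Hypothetical fallback used when no -XDparser option is given. */
    public static class DefaultParserFactory implements ParserFactory {
        @Override
        public String describe() {
            return "default";
        }
    }

    static final String PREFIX = "-XDparser=";

    /** Returns the class name carried by -XDparser=..., or null if absent. */
    static String parserOption(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(PREFIX)) {
                return arg.substring(PREFIX.length());
            }
        }
        return null;
    }

    /** Instantiates the requested factory, or the fallback if none is requested. */
    static ParserFactory load(String[] args) {
        String className = parserOption(args);
        if (className == null) {
            return new DefaultParserFactory();
        }
        try {
            // The class must have a public no-argument constructor.
            return Class.forName(className)
                    .asSubclass(ParserFactory.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot create parser factory " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(args).describe());
    }
}
```

The point of going through a factory is that the rest of the compiler never needs to know which parser implementation it is talking to.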
Currently, I've only implemented a Java 1.0 grammar,
but supporting later versions is just a matter of time, particularly because
Tatoo allows multiple grammar versions to be specified
in the same grammar file.
The computation of element positions seems OK, and
the error recovery is pretty basic (as you will see below).
An example with an unknown type
class Test {
  public static void main(Stringz[] args) {
    System.out.println("Hello Tatoo");
  }
}
Output of javac+Tatoo
Test.java:2: cannot find symbol
symbol : class Stringz
location: class Test
public static void main(Stringz[] args) {
^
1 error
Output of javac
Test.java:2: cannot find symbol
symbol : class Stringz
location: class Test
public static void main(Stringz[] args) {
^
1 error
OK, identical.
An example with a grammar error
class Test {
  [] int[] foo;
  public static void main(String[] args) {
    System.out.println("Hello Tatoo");
  }
}
Output of javac+Tatoo
parse error on terminal null with stack 192,212,213,216,222,223, expected [rcurl, _boolean, _byte, _short, _char, _int, _long, _float, _double, _void, _static, _synchronized, _abstract, _native, _final, _volatile, _transient, _public, _private, _protected, identifier]
discarding character for lexer error recovery "[" (91)
...
discarding character for lexer error recovery "]" (93)
Output of javac
Test.java:2: illegal start of type
[] int[] foo;
^
Test.java:2: ';' expected
[] int[] foo;
^
2 errors
OK, Tatoo's parser emits lots of junk. But if you look more closely, it basically prints all the tokens that are valid at that point. Also note that the hand-written parser is able to report that the error is an illegal start of type, which is really meaningful information. I think it's possible to derive that kind of information from the grammar, but it will require extra work to make it available at runtime.
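As a rough illustration of the kind of derivation meant here, the sketch below turns a set of expected terminals into a shorter message. It is only a hand-written heuristic under stated assumptions: the ExpectedTokensDiagnostic class and its describe method are hypothetical and not part of Tatoo, and the rule "all primitive type keywords plus identifier are expected, so a type could start here" is a guess at one possible classification, not something computed from the grammar.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class ExpectedTokensDiagnostic {
    // Terminal names as they appear in the "expected [...]" list printed by the Tatoo parser.
    static final Set<String> PRIMITIVE_TYPES = new LinkedHashSet<>(Arrays.asList(
        "_boolean", "_byte", "_short", "_char", "_int", "_long", "_float", "_double"));

    /** Maps a set of expected terminals to a short, human-readable message (heuristic). */
    static String describe(Set<String> expected) {
        // If every primitive type keyword and an identifier are acceptable,
        // assume the parser was waiting for the start of a type.
        if (expected.containsAll(PRIMITIVE_TYPES) && expected.contains("identifier")) {
            return "illegal start of type";
        }
        // A single expected terminal gives a javac-like "'x' expected" message.
        if (expected.size() == 1) {
            return "'" + expected.iterator().next() + "' expected";
        }
        // Otherwise fall back to listing everything.
        return "unexpected token, expected one of " + expected;
    }

    public static void main(String[] args) {
        Set<String> expected = new LinkedHashSet<>(Arrays.asList(
            "rcurl", "_boolean", "_byte", "_short", "_char", "_int", "_long",
            "_float", "_double", "_void", "_static", "_synchronized", "_abstract",
            "_native", "_final", "_volatile", "_transient", "_public", "_private",
            "_protected", "identifier"));
        System.out.println(describe(expected));
    }
}
```

A real implementation would more likely precompute such categories from the grammar's non-terminals at generation time rather than pattern-match on terminal names at runtime.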
Cheers,
Rémi