
Language Evolution: introduction of new keywords

Posted by forax on October 9, 2006 at 5:22 AM EDT

When you want to add features to a language without breaking backward compatibility, a widespread idea is that you can't add new keywords.

That is why we currently see weird proposals in the Java space that try to reuse old keywords to express new kinds of abstraction, for example synchronized (closure v0.2, section 3) or for (Neal Gafter's blog about for).

Why does introducing a new keyword break already written code?

When you specify a new keyword, you need to change the lexer so that it recognizes a given sequence of characters as a new token. As a consequence, the lexer no longer recognizes this sequence as an identifier, and existing code that uses it as a name stops compiling.
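To make the breakage concrete, here is a minimal sketch of a lexer that applies one global keyword set to the whole program. The class GlobalKeywordLexer, its tiny keyword sets and the whitespace-only tokenization are assumptions made for illustration; this is not how javac is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy lexer: all names here are illustrative, not taken from javac.
final class GlobalKeywordLexer {
    // The same keyword set applies everywhere in the program.
    private final Set<String> keywords;

    GlobalKeywordLexer(Set<String> keywords) {
        this.keywords = keywords;
    }

    // Splits the input on whitespace and tags each word as KEYWORD or IDENTIFIER.
    List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String word : source.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields no tokens
            }
            String kind = keywords.contains(word) ? "KEYWORD" : "IDENTIFIER";
            tokens.add(kind + "(" + word + ")");
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed, heavily reduced keyword sets before and after adding "enum".
        GlobalKeywordLexer before = new GlobalKeywordLexer(Set.of("class", "int"));
        GlobalKeywordLexer after = new GlobalKeywordLexer(Set.of("class", "int", "enum"));

        String oldCode = "Enumeration enum";
        System.out.println(before.tokenize(oldCode)); // [IDENTIFIER(Enumeration), IDENTIFIER(enum)]
        System.out.println(after.tokenize(oldCode));  // [IDENTIFIER(Enumeration), KEYWORD(enum)]
    }
}
```

With the extended keyword set, the word enum in the old code comes out as a keyword token, so a parser expecting a variable name at that position has to reject the program.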

One magic solution is to use one or more special characters to differentiate keywords from identifiers. Many scripting languages use '$', '#', etc. to tag variables; Perl6 is the best example.
Scripting languages use special characters not only to simplify the lexing process but also to help their runtime system choose between overloaded operations. So adding a new keyword is not a major problem for those languages.

Java is a strongly typed language, so it doesn't need such special characters, and we are stuck as long as we keep thinking of lexers the way lex works. The problem comes from the lexer, so the solution is to change how the lexer works.

Contextual keywords

Let me take an example: "enum" is a new keyword introduced in 1.5 to declare enumerated types. So the lexer of a 1.5 compiler now recognizes "enum" as a keyword throughout the whole program.
But in fact, a language designer only needs "enum" to be recognized as a keyword in a type declaration, not in a block of code.

The solution is to use a lexer that implements contextual keywords, i.e. a lexer that lets the parser activate or deactivate the rules used to recognize tokens depending on the parser state.

enum Foo {                                 // keyword
  public static void main(String[] args) {
    Enumeration enum=... // identifier
  }
}
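The idea can be sketched in a few lines of Java. Everything below is a hypothetical illustration and not Tatoo's actual API: the ContextualLexer class, the two Context values, the keyword sets, and the brace-counting stand-in for a real parser state are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of contextual keywords; this is not Tatoo's actual API.
final class ContextualLexer {
    // Parser states, reduced to the two that matter for the example.
    enum Context { DECLARATION, BLOCK }

    // Keywords that the "parser" activates in each context (assumed sets).
    static Set<String> activeKeywords(Context context) {
        return context == Context.DECLARATION
                ? Set.of("class", "interface", "enum", "static", "void")
                : Set.of("if", "for", "while", "return");
    }

    // A word is a keyword only if the current context activates it.
    static String classify(String word, Context context) {
        String kind = activeKeywords(context).contains(word) ? "KEYWORD" : "IDENTIFIER";
        return kind + "(" + word + ")";
    }

    // Stand-in for a parser: brace depth 2 or more means we are inside a block of code.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int depth = 0;
        for (String word : source.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (word.equals("{")) { depth++; continue; }
            if (word.equals("}")) { depth--; continue; }
            Context context = depth >= 2 ? Context.BLOCK : Context.DECLARATION;
            tokens.add(classify(word, context));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Whitespace-separated, simplified version of the enum Foo example.
        String source = "enum Foo { static void main { Enumeration enum } }";
        System.out.println(tokenize(source));
        // [KEYWORD(enum), IDENTIFIER(Foo), KEYWORD(static), KEYWORD(void),
        //  IDENTIFIER(main), IDENTIFIER(Enumeration), IDENTIFIER(enum)]
    }
}
```

The first enum is read while the lexer is in the declaration context and becomes a keyword; the second is read inside the method body, where the enum rule is not active, and stays an identifier. A real implementation would take the context from the parser's state rather than from a brace counter.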

With two colleagues, I've written a new parser generator named Tatoo that generates this kind of lexer.
The tutorial is in French for now because we haven't had much time and our students need it, but a translation will be available soon. Slides in PDF and an article from PPPJ'06 are available in English.

Tatoo contains other innovative features such as grammar versioning, full NIO support (push lexer/parser), lexing without Unicode decoding, and an AST generator. I will blog about those features later.
