
Language Evolution: introducing new keywords

Posted by forax on October 9, 2006 at 2:22 AM PDT

When you want to add features to a language
without breaking backward compatibility,
a widespread idea is that you can't add new keywords.

That is why we currently see some weird proposals in the Java space
that try to reuse old keywords to express
new kinds of abstraction, for example
synchronized (closures proposal v0.2, section 3) or for (see Neal Gafter's blog entry about for).


Why does introducing a new keyword break already-written code?

When you specify a new keyword, you need to change the lexer to
recognize a sequence of characters as a new token.
Thus the lexer no longer recognizes this sequence as an
identifier, and any existing program that used that word as a
variable, method or type name stops compiling.
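To make the breakage concrete, here is a minimal sketch of a lex-style lexer in Java. The class ClassicLexer, its token labels and the whitespace splitting are hypothetical simplifications; the point is only that every word is classified against one global keyword set, so adding "enum" to that set turns a former identifier into a keyword everywhere in the program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical lex-style lexer: each word is checked against a single,
// global keyword set, regardless of where it appears in the program.
public class ClassicLexer {
    private final Set<String> keywords;

    ClassicLexer(Set<String> keywords) {
        this.keywords = keywords;
    }

    // Splits on whitespace (a simplification) and labels each word.
    List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String word : source.trim().split("\\s+")) {
            tokens.add((keywords.contains(word) ? "KEYWORD:" : "IDENTIFIER:") + word);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String code = "Enumeration enum";
        // Keyword sets are illustrative subsets, not the full Java lists.
        ClassicLexer before = new ClassicLexer(Set.of("class", "for", "synchronized"));
        ClassicLexer after = new ClassicLexer(Set.of("class", "for", "synchronized", "enum"));
        System.out.println(before.tokenize(code)); // [IDENTIFIER:Enumeration, IDENTIFIER:enum]
        System.out.println(after.tokenize(code));  // [IDENTIFIER:Enumeration, KEYWORD:enum]
    }
}
```

With the older keyword set, `Enumeration enum` lexes as two identifiers; once "enum" is added, the second word comes out as a keyword, and a parser expecting a variable name would reject the declaration.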

One magic solution is to use one or more special characters to
differentiate keywords from identifiers.
Many scripting languages use '$', '#', etc. to tag variables;
Perl6
is the best example.
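A rough sketch of why sigils help, assuming a hypothetical language in which every variable must start with '$' (real Perl is more complicated, since bare words can also be function names): the lexer decides from the first character alone, so the set of bare-word keywords can grow without ever colliding with a variable name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sigil-based lexer: variables must start with '$',
// so bare words are free to be used by the language as keywords.
public class SigilLexer {

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String word : source.trim().split("\\s+")) {
            if (word.startsWith("$")) {
                // The sigil alone decides: this is a variable.
                tokens.add("VARIABLE:" + word.substring(1));
            } else {
                // Bare words are reserved for the language itself.
                tokens.add("KEYWORD:" + word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "$enum" stays a variable even if a later version adds an "enum" keyword.
        System.out.println(tokenize("my $enum")); // [KEYWORD:my, VARIABLE:enum]
    }
}
```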


Scripting languages use special characters not only
to simplify lexing but also to help their runtime system
choose between overloaded operations.
So adding a new keyword is not a major problem
for those languages.

Java is a strongly typed language, so it doesn't need
such special characters, and
we will stay stuck as long as we keep designing lexers like
lex, i.e. as a separate pass that classifies words without
regard to where the parser is.
The problem comes from the lexer, so the solution is to change
how the lexer works.

Contextual keywords

Let me take an example: "enum" is a new keyword introduced in Java 1.5
to declare enumerated types.
So the lexer of a 1.5 compiler
now recognizes "enum" as a keyword throughout the whole program.


But in fact, a language designer only needs "enum"
to be recognized as a keyword where a type declaration
can start, not inside a block of code.

The solution is to use a lexer that implements contextual
keywords, i.e. a lexer that lets the parser activate or
deactivate the rules used to recognize tokens, depending on
the parser state.

enum Foo {                                 // keyword (type declaration)
  ;                                        // no constants, so ';' precedes members
  public static void main(String[] args) {
    Enumeration enum = ...;                // identifier (inside a block)
  }
}
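Here is a toy sketch of the idea in Java. It is not Tatoo's API or generated code; ContextualLexer, its whitespace-separated input and the single "enum" rule are assumptions made for illustration. The parser tracks brace depth and, before requesting each token, tells the lexer which keyword rules are active: "enum" is a keyword only at depth 0, where a type declaration may start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical contextual-keyword lexer: the parser passes the set of
// active keyword rules each time it asks for the next token.
public class ContextualLexer {
    private static final Set<String> TYPE_LEVEL = Set.of("enum");
    private static final Set<String> NONE = Set.of();

    private final String[] words;
    private int position;

    ContextualLexer(String source) {
        // Assumption for brevity: tokens are separated by whitespace.
        this.words = source.trim().split("\\s+");
    }

    boolean hasNext() {
        return position < words.length;
    }

    // Only the keyword rules in activeKeywords may fire for this token.
    String next(Set<String> activeKeywords) {
        String word = words[position++];
        if (word.equals("{") || word.equals("}") || word.equals(";") || word.equals("=")) {
            return "PUNCT:" + word;
        }
        return (activeKeywords.contains(word) ? "KEYWORD:" : "IDENTIFIER:") + word;
    }

    // Toy "parser": enables the "enum" rule only outside any braces.
    static List<String> parse(String source) {
        ContextualLexer lexer = new ContextualLexer(source);
        List<String> tokens = new ArrayList<>();
        int depth = 0;
        while (lexer.hasNext()) {
            String token = lexer.next(depth == 0 ? TYPE_LEVEL : NONE);
            if (token.equals("PUNCT:{")) depth++;
            if (token.equals("PUNCT:}")) depth--;
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("enum Foo { Enumeration enum ; }"));
        // [KEYWORD:enum, IDENTIFIER:Foo, PUNCT:{, IDENTIFIER:Enumeration, IDENTIFIER:enum, PUNCT:;, PUNCT:}]
    }
}
```

A real Java grammar would also enable the rule where a member declaration may start, since enums can be nested inside classes; this sketch ignores that case to keep the parser state down to a single counter.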

With two colleagues, I've written a new parser generator
named Tatoo
that generates this kind of lexer.


The tutorial is in French at the moment because we don't have
much time and our students need it,
but a translation will be available soon.
Slides (PDF) and an article from PPPJ'06
are available in English.

Tatoo has other innovative features, such as grammar versioning,
full NIO support (push lexer/parser),
lexing without Unicode decoding, and an AST generator.
I will blog about those features later.
