The backslash of ResourceBundle

Posted by felipegaucho on July 16, 2007 at 1:48 AM PDT

A few days ago I received a simple Java task: take a comma-separated values (CSV) file and create a Properties file for every language column. The fields of the CSV file were delimited by semicolons, and the data was typed as in the example below. Notice that the first token of each line in the CSV file represents a key, and the other tokens should be split into different files, one for each language.

Original CSV file (text.csv):

    key;pt_BR;en_US
    person.name;nome;name:
    person.birthday;aniversario:;birthday:;
    person.salary;salario=;salary=;

Expected generated files:

text_en_US.properties

    person.name=name:
    person.birthday=birthday:
    person.salary=salary=

text_pt_BR.properties

    person.name=nome
    person.birthday=aniversario:
    person.salary=salario=

To solve this problem, I decided to create an ANT task with the following algorithm:

  1. Read the CSV file with a CSV parser.
  2. Create a java.util.Properties object for each language defined in the first row.
  3. For each row of the CSV contents (ignoring the header):
    • read the first column as the key
    • for each of the other columns, put the value under that key in the Properties object of the corresponding language
  4. Invoke store(OutputStream out, String comments) on each Properties object to write its file.
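The steps above can be sketched in plain Java. This is only an illustration under some assumptions, not the ANT task itself: the class name CsvToProperties and the method split are hypothetical, and a simple String.split on the semicolon stands in for a real CSV parser (so quoted fields are not handled).

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not the original ANT task: splits a semicolon-delimited
// sheet into one Properties object per language column.
class CsvToProperties {

    static Map<String, Properties> split(String csvText) {
        String[] lines = csvText.split("\\R");
        // Header row: the first column is the key, the others name the languages.
        String[] header = lines[0].split(";");
        Map<String, Properties> byLanguage = new LinkedHashMap<>();
        for (int col = 1; col < header.length; col++) {
            byLanguage.put(header[col], new Properties());
        }
        for (int row = 1; row < lines.length; row++) {
            if (lines[row].trim().isEmpty()) {
                continue; // skip blank lines
            }
            // String.split drops trailing empty tokens, so a trailing ';' is harmless.
            String[] cells = lines[row].split(";");
            if (cells.length == 0) {
                continue; // a line made only of delimiters carries no key
            }
            String key = cells[0];
            for (int col = 1; col < header.length && col < cells.length; col++) {
                byLanguage.get(header[col]).setProperty(key, cells[col]);
            }
        }
        return byLanguage;
    }

    public static void main(String[] args) throws IOException {
        String csv = "key;pt_BR;en_US\n"
                + "person.name;nome;name:\n"
                + "person.birthday;aniversario:;birthday:;\n"
                + "person.salary;salario=;salary=;\n";
        for (Map.Entry<String, Properties> entry : split(csv).entrySet()) {
            System.out.println("text_" + entry.getKey() + ".properties");
            // Step 4: store(...) writes the entries; a real task would pass a FileOutputStream.
            entry.getValue().store(System.out, null);
        }
    }
}
```

The main method prints to System.out instead of writing files, which is enough to see what store(...) produces for each language.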

Quite simple, and it worked the very first time I ran it. But when I checked the generated files, I got a surprise: they included unexpected backslashes before some characters. Checking the API documentation, I got other surprises about the store(..) method:

  • The stream is written using the ISO 8859-1 character encoding.
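    • SURPRISE: with store(OutputStream, String) you cannot change the charset of the output :( (Since Java 6 there is also store(Writer, String), which leaves the encoding to the Writer you pass in.)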
  • For each entry the key string is written, then an ASCII =, then the associated element string. Each character of the key and element strings is examined to see whether it should be rendered as an escape sequence. The ASCII characters \, tab, form feed, newline, and carriage return are written as \\, \t, \f \n, and \r, respectively. Characters less than \u0020 and characters greater than \u007E are written as \uxxxx for the appropriate hexadecimal value xxxx. For the key, all space characters are written with a preceding \ character. For the element, leading space characters, but not embedded or trailing space characters, are written with a preceding \ character. The key and element characters #, !, =, and : are written with a preceding backslash to ensure that they are properly loaded.
    • SURPRISE: when you store a Properties object in a file, several transformations are applied in order to escape special characters.
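A small experiment makes the escaping visible. The sketch below is an illustration only: the class StoreEscapingDemo and its helper storedLine are hypothetical names, and the output is captured in memory rather than written to a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class showing what Properties.store(OutputStream, String) writes.
class StoreEscapingDemo {

    // Returns the single entry line that store(...) writes for one key/value pair,
    // skipping the comment lines (such as the date stamp) that start with '#'.
    static String storedLine(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // store(OutputStream, ...) always writes ISO 8859-1.
        String text = new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        for (String line : text.split("\\R")) {
            if (!line.isEmpty() && !line.startsWith("#")) {
                return line;
            }
        }
        throw new IllegalStateException("no entry written");
    }

    public static void main(String[] args) {
        System.out.println(storedLine("person.name", "name:"));     // person.name=name\:
        System.out.println(storedLine("person.salary", "salary=")); // person.salary=salary\=
        // A character above \u007E comes out as a \\uxxxx escape in the file.
        System.out.println(storedLine("person.city", "S\u00e3o Paulo"));
        // Only the leading space of a value is escaped.
        System.out.println(storedLine("person.title", " leading space"));
    }
}
```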

The escape sequence rendering has been well documented since Java 1.2, so I assume it was my fault not to know about it. But wait: I have been developing software in Java for more than ten years now, and I really don't remember i18n files generated with backslashes, so I decided to ask about it on mailing lists. My suspicion was confirmed: almost nobody really knows about these minor tricks in internationalization Properties. Digging into the Java source code, I figured out the reason for this unawareness: the method getBundle(String baseName) of the class ResourceBundle accepts these special characters in a value with or without the backslash, and since most i18n files are created by hand, we never notice the minor details.
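The tolerance on the reading side can be checked without any files on the classpath. The following is a sketch under assumptions: it uses PropertyResourceBundle with a StringReader in place of ResourceBundle.getBundle, and the class name LoadToleranceDemo and helper lookup are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;

// Hypothetical demo: reads the same entry written by hand and written by store(...).
class LoadToleranceDemo {

    // Parses the given properties text as a bundle and returns the value for key.
    static String lookup(String fileContent, String key) {
        try {
            return new PropertyResourceBundle(new StringReader(fileContent)).getString(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String handWritten = "person.salary=salary=\n";  // no backslash
        String generated = "person.salary=salary\\=\n";  // file text: salary\=
        System.out.println(lookup(handWritten, "person.salary")); // salary=
        System.out.println(lookup(generated, "person.salary"));   // salary=
    }
}
```

Both forms yield the same value because only the first unescaped = or : on a line separates the key from the value; after that, the separator characters are ordinary text whether or not they carry a backslash.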

From my original example, we have the following output:

Original CSV file (text.csv):

    key;pt_BR;en_US
    person.name;nome:;name:
    person.birthday;aniversario:;birthday:;
    person.salary;salario=;salary=;

Properties.store(...) generated files:

text_en_US.properties

    person.name=name\:
    person.birthday=birthday\:
    person.salary=salary\=

text_pt_BR.properties

    person.name=nome\:
    person.birthday=aniversario\:
    person.salary=salario\=

Interesting, a good reason to check the javadoc sometimes :)