
Do Not Forget the Encoding Flag

Posted by gsporar on November 27, 2005 at 10:23 AM PST

I originally intended to write this blog entry in August, but I got busy. Back then I was working on my demos for NetBeans Day Beijing. I had used the nice visual editor in Project Matisse to build a simple name-and-address form. The strings are loaded from a .properties file. Since I wanted to do the demo in both English and simplified Chinese, I sent that .properties file to Jasper Liu over in Sun's Beijing office. He was kind enough to translate the strings for me.

The file he sent back to me was not immediately usable by a Java program. To display non-ASCII characters from a .properties file, Java wants their UTF-16 values written as ASCII \uXXXX escape sequences; the file Jasper sent me contained characters encoded in UTF-8. But that's not a problem - the JDK's native2ascii utility is designed specifically for this situation. So I fired it up and passed in these parameters:

native2ascii demo_from_jasper.properties demo_zh_CN.properties

I put the new demo_zh_CN.properties file onto the CLASSPATH and used command-line parameters to change the language to zh and the region to CN. I started up my program and got incorrect output. No Chinese characters displayed. Just gibberish. Hmmm... not what I wanted.

Years ago I worked on a fairly large project that had full internationalization support. One of our first customers ran the program in Spanish. And we had a simplified Chinese version. So I thought: "I know how to do this. I've done it before." But too much time had passed - I was overlooking a subtle issue. But what was it?

In my effort to find the answer I searched high and low on the internet. I made sure I had done everything on the checklists:

1. Install operating system support for East Asian character sets.
2. Use Unicode escape sequences in the .properties file.
3. Append the appropriate language and region codes to the file name.
4. Set the locale correctly at run time.
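As a rough illustration of how items 2 through 4 fit together, here is a minimal Java sketch. It is not code from the original demo and needs Java 8 or later. It assumes a bundle base name of "demo", matching the file names in this post, and a hypothetical "name" key. ResourceBundle.Control.toBundleName shows which file name the default lookup derives for the zh_CN locale, and Properties.load shows the \uXXXX escapes being turned back into characters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;
import java.util.ResourceBundle;

class BundleNaming {

    // Items 3 and 4: given a Locale, the default ResourceBundle.Control derives
    // the bundle name by appending _language_COUNTRY to the base name.
    static String bundleName(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.toBundleName(baseName, locale);
    }

    // Item 2: Properties.load decodes backslash-u escape sequences back into
    // the characters they stand for.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader will not really fail, but load() declares IOException
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Locale.SIMPLIFIED_CHINESE is language "zh", country "CN"
        System.out.println(bundleName("demo", Locale.SIMPLIFIED_CHINESE));

        // Hypothetical key; the value is written the way it would appear in the
        // converted .properties file: two escapes for two CJK characters
        String value = loadValue("name=\\u59d3\\u540d", "name");
        System.out.println(String.format("U+%04X U+%04X",
                (int) value.charAt(0), (int) value.charAt(1)));
    }
}
```

Running it prints demo_zh_CN and then U+59D3 U+540D: the escaped text in the file becomes two real characters once loaded.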

But it still didn't work. What was especially frustrating was that I had just gotten a different application up and running in simplified Chinese a few days before. I had followed the same process with it: ask Jasper to translate the strings, convert the file to Unicode escapes, etc. And it worked great.

In frustration, I called up a former coworker and asked for his advice. We had worked together on that application years ago and he had done more of the testing/debugging of the simplified Chinese version. Unfortunately, he was stumped.

Luckily, Jasper was able to figure out the problem: when I ran native2ascii I had omitted the -encoding flag. As I mentioned, the file he sent me was encoded in UTF-8. So to invoke native2ascii correctly, I should have used:

native2ascii -encoding UTF8 demo_from_jasper.properties demo_zh_CN.properties

Without that flag, native2ascii falls back to the platform's default encoding, which on my machine was not UTF-8.
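To see why the missing flag produces gibberish, here is a minimal Java sketch of a native2ascii-style conversion. It is an illustration, not the JDK tool's actual code, and the class and method names are hypothetical. It decodes the input bytes with an explicitly chosen charset (the step that -encoding controls) and then writes each non-ASCII character as a \uXXXX escape. ISO-8859-1 stands in here for a non-UTF-8 default; the real default depends on the platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EscapeSketch {

    // Decode the raw bytes with the given charset, then emit plain ASCII,
    // escaping every char above 0x7F as backslash-u plus four hex digits.
    static String toAsciiEscapes(byte[] input, Charset inputCharset) {
        String text = new String(input, inputCharset); // the step -encoding controls
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // ASCII passes through unchanged
            } else {
                // one escape per UTF-16 code unit
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two CJK characters, saved the way a UTF-8 editor would store them
        byte[] utf8Bytes = "label=\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        // Right charset: one escape per character
        System.out.println(toAsciiEscapes(utf8Bytes, StandardCharsets.UTF_8));

        // Wrong charset: every UTF-8 byte is taken as a separate character
        System.out.println(toAsciiEscapes(utf8Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

With UTF-8 the first line comes out as label=\u4e2d\u6587, one escape per Chinese character. With ISO-8859-1 each of the six UTF-8 bytes becomes its own character, giving label=\u00e4\u00b8\u00ad\u00e6\u0096\u0087: perfectly valid escapes that decode to gibberish.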

What made this particularly embarrassing is that just a few days earlier I had correctly invoked native2ascii when converting the first .properties file that Jasper had sent me. The difference was that before running it the first time, I looked at the documentation. The second time I thought, "It hasn't been years since I did this - it was just the other day. I know what I'm doing!" Alas, such was not the case.