Changing project encodings in NetBeans 6.5 M1

Posted by joconner on August 4, 2008 at 10:11 PM PDT

I reported earlier that NetBeans 6.1's project charset encoding feature would allow an unsuspecting user to destroy file data. That's still true, through no fault of NetBeans, really. It's just a matter of fact -- if you start out with UTF-8 and convert your project files to ASCII, ISO-8859-1, or any other charset that covers only a subset of Unicode, you will lose any characters that are not also in the target charset.
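To make the loss concrete, here is a minimal sketch in plain Java of what a save-and-reopen cycle does to a string once the target charset can't represent every character. The class and method names (`EncodingLoss`, `roundTrip`) are my own for illustration; this is not how NetBeans converts files internally, only the standard `String.getBytes(Charset)` behavior, which silently substitutes the charset's replacement byte for anything unmappable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLoss {
    // Hypothetical helper: encode text into the target charset, then decode it
    // back, mimicking "save in the new encoding, then reopen the file".
    static String roundTrip(String text, Charset target) {
        // getBytes(Charset) replaces unmappable characters with the charset's
        // default replacement byte ('?' for US-ASCII) instead of failing.
        byte[] bytes = text.getBytes(target);
        return new String(bytes, target);
    }

    public static void main(String[] args) {
        String original = "Jos\u00E9"; // escaped so this file compiles in any encoding

        // UTF-8 and ISO-8859-1 both contain U+00E9, so nothing is lost.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));      // true
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1).equals(original)); // true

        // US-ASCII does not, so the accented letter comes back as '?'.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII)); // Jos?
    }
}
```

Once the file has been written this way, the original character is gone; changing the encoding back to UTF-8 afterward won't recover it.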

But NetBeans isn't going to let you hang yourself, at least not without warning you first. NetBeans 6.5 M1 adds a warning dialog that alerts you when you change from the default UTF-8 encoding. Now, instead of blindly following your request to change charsets, NetBeans will tell you the following:

And, of course, you then have the option to cancel your setting before saving it. Good, very good.

Just out of curiosity, I tried the same thing in Eclipse. When I tried to save, Eclipse said this:

Eclipse would not allow me to save the file until I changed the encoding back to Cp1252...or, as it suggests, until I removed the offending character. That's certainly one reasonable way to approach the problem.
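I don't know how Eclipse implements that check, but a pre-save test of this kind can be sketched with the standard `CharsetEncoder.canEncode` method. The class and method names below (`UnmappableFinder`, `unmappableOffsets`) are hypothetical; the idea is simply to walk the text one code point at a time and record where the target charset falls short.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UnmappableFinder {
    // Hypothetical helper: return the UTF-16 offsets of every character
    // that the target charset cannot encode.
    static List<Integer> unmappableOffsets(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint); // 2 for surrogate pairs
            if (!encoder.canEncode(text.subSequence(i, i + length))) {
                offsets.add(i);
            }
            i += length;
        }
        return offsets;
    }

    public static void main(String[] args) {
        String source = "Jos\u00E9";
        // An IDE could refuse to save (or warn) whenever this list is non-empty.
        System.out.println(unmappableOffsets(source, StandardCharsets.US_ASCII));   // [3]
        System.out.println(unmappableOffsets(source, StandardCharsets.ISO_8859_1)); // []
    }
}
```

Reporting offsets rather than a plain yes/no lets the editor point at the offending character, which is what makes the "remove it and try again" suggestion workable.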

There is a third way to do this, one that I like slightly better. One could simply \uXXXX-encode the characters that are not in the target charset. So, for example, if you start in Cp1252, type the word "José" in a String literal or variable name, then change the project to ASCII encoding, the IDE could simply \u-encode the string as "Jos\u00E9". Better? Maybe. After all, the IDE doesn't have to display "Jos\u00E9" to you. It could continue to display "José" in the editor regardless of the underlying "encoding" of the character. When you edit your file, you don't typically care whether the file is UTF-8, UTF-16, ASCII, or even \uXXXX-escaped -- just as long as the characters display correctly and are not lost.
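Here is a rough sketch of that third option, assuming the conversion runs over the whole file text at save time. The names (`UnicodeEscaper`, `escapeUnmappable`) are made up for this example, and no IDE is claimed to work this way. Characters the target charset can encode pass through untouched; everything else is written as one escape per UTF-16 code unit, which is the form the Java compiler reads back.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class UnicodeEscaper {
    // Hypothetical helper: replace every character the target charset cannot
    // encode with Java-style Unicode escapes, leaving the rest unchanged.
    static String escapeUnmappable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            if (encoder.canEncode(text.subSequence(i, i + length))) {
                out.appendCodePoint(codePoint);
            } else {
                // Emit one escape per UTF-16 code unit, so a supplementary
                // character becomes two escapes (its surrogate pair).
                for (int j = i; j < i + length; j++) {
                    out.append(String.format("\\u%04X", (int) text.charAt(j)));
                }
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "Jos" followed by the six-character escape for U+00E9.
        System.out.println(escapeUnmappable("Jos\u00E9", StandardCharsets.US_ASCII));
    }
}
```

The result is pure ASCII on disk, yet it compiles to exactly the same string, so nothing is lost. The editor would only need to run the reverse mapping when loading the file to keep showing "José".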

What do you think of these options -- 1, 2, or 3? Or do you prefer something else from your IDE of choice?


Well, obviously No. 2 is the best of those you present because it doesn't change anything without your permission. Practically, a combination of 2 and 3 would be more useful, i.e., No. 2 with the added option of converting offending chars to Unicode escapes. Also, if you think about the problem not as a technical exercise but as something an end user might find useful, you could offer some more choices, e.g., changing "©" in comments to "(c)" (this is by far the most common example I've found).

You're right, option 3 is by far the best, IMHO.