Another solution for non-UTF8 source files in NetBeans 6.1?
Recently I mentioned a potential problem when saving source files in a non-Unicode charset encoding. The risk of data loss is significant for large projects. After thinking about the problem a little more, I have a possible solution, one that lets you save to a non-Unicode encoding without losing data.
Are you familiar with the \u notation for non-ASCII characters in property files? I think the same encoding can work for non-ASCII characters in any Java source file. I'm not suggesting that it should be the preferred representation. I think the \u notation is only tolerable, something to be avoided whenever possible. However, in this situation, saving files that were once in UTF-8 to a non-Unicode encoding, it might be the only option for storing the files without data loss.
Here's how it would work. First, NetBeans 6.1 uses UTF-8 for a project's default source code and configuration file encoding, an excellent choice by the way. Now imagine that your source code has the Euro currency symbol in it. That's Unicode code point U+20AC, and the character itself is this: €. If you can't see the actual character (maybe you don't have a capable font?), here's the image instead:
[Image: the Euro sign]
Now, let's imagine that you need to change your project encoding for some reason. Maybe you choose ISO-8859-1, which doesn't contain the Euro symbol. You can still represent the Euro character, but you'll have to encode it with the \u notation. Wouldn't it be nice if NetBeans did this for you, writing \u20AC to your file instead of converting the character to a meaningless ? question mark? I think that would be better, and it's entirely possible. It doesn't prevent NetBeans from converting the file to the target encoding as requested by the user, and it allows NetBeans to prevent data loss by using the \u encoding for characters that are not in the target charset.
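To make the idea concrete, here is a minimal sketch of the conversion in Java. It is only an illustration, not how NetBeans actually saves files. The class name UnicodeEscaper and the method escapeUnmappable are made up for this example. It relies on the standard CharsetEncoder.canEncode check to decide, one char at a time, whether the target charset can hold the character. Anything the charset can't hold is written as a \u escape instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, named for this example only.
public class UnicodeEscaper {

    // Returns a copy of text in which every char that the target charset
    // cannot encode is replaced by a backslash-u escape with four hex digits.
    public static String escapeUnmappable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (encoder.canEncode(c)) {
                out.append(c);
            } else {
                // A lone surrogate is never encodable by itself, so a
                // supplementary character comes out as two escapes,
                // which is the form Java source expects.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The string literal below contains the Euro sign (U+20AC).
        String line = "String price = \"100 \u20AC\";";
        String safe = escapeUnmappable(line, StandardCharsets.ISO_8859_1);

        // The escaped text now survives the trip through ISO-8859-1.
        byte[] bytes = safe.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Running main prints the original line with the Euro sign replaced by \u20AC, and no question mark appears. Characters that ISO-8859-1 does contain, such as é, pass through unchanged, so the file stays as readable as the target encoding allows.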
So, what do you think? Maybe the NetBeans team can get this into the 6.1 product before final release?
Also posted to joconner.com.