John O'Conner

John O'Conner's Blog

International Domain Names

Posted by joconner on March 29, 2007 at 02:36 AM | Comments (3)

The Java SE 6 release provides an interesting new class: java.net.IDN. It's small, simple...very focused on a single task. That task has two parts:


  1. to convert domain names containing practically any Unicode characters to an ASCII Compatible Encoding (ACE)
  2. to convert ACE names back into their full Unicode form (a UTF-16 Java String)

To support these two operations, not surprisingly, the class has two static methods:


  1. toASCII
  2. toUnicode


The toASCII method converts the non-ASCII Unicode characters in a name to an ACE form using an algorithm called Punycode. Yeah, I snickered at the name too. The results always look surprising, but don't worry...the algorithm is well defined, so the same input always produces the same output. So, for example, if you want to use the domain name 日本語.jp, the toASCII method would produce the ACE equivalent, xn--wgv71a119e.jp. The toUnicode method converts the ACE name back to its Unicode form.
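Here's a minimal sketch of that round trip, using the 日本語.jp example from above. The class name IdnRoundTrip and its two helper methods are hypothetical names of mine; only IDN.toASCII and IDN.toUnicode come from the JDK. I wrote the Japanese characters as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.net.IDN;

// Hypothetical demo class; only java.net.IDN is part of the JDK.
final class IdnRoundTrip {

    // Unicode domain name -> ASCII Compatible Encoding (Punycode labels).
    static String toAce(String unicodeName) {
        return IDN.toASCII(unicodeName);
    }

    // ACE domain name -> Unicode form suitable for display.
    static String toDisplay(String aceName) {
        return IDN.toUnicode(aceName);
    }

    public static void main(String[] args) {
        // "\u65e5\u672c\u8a9e.jp" is 日本語.jp written with Unicode escapes.
        String unicodeName = "\u65e5\u672c\u8a9e.jp";

        String ace = toAce(unicodeName);
        System.out.println(ace); // xn--wgv71a119e.jp

        // Converting back yields the Unicode name again.
        System.out.println(toDisplay(ace).equals(unicodeName)); // true
    }
}
```

Only the labels that contain non-ASCII characters get the xn-- prefix; the plain jp label passes through untouched.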

So why do you need this? It turns out that the internet's core infrastructure, including domain name servers and name resolvers, just doesn't handle non-ASCII characters very well. At the very least, it's safe to say that it doesn't purposefully support non-ASCII characters. However, people want the bigger Unicode character range for their domain names. So, RFC 3490 allows for internationalized, Unicode names...but with a hitch. We have to pass ACE names to DNS servers and name resolvers. Your apps can display 日本語.jp, but those same apps have to convert to ACE when they pass the name off to DNS, etc. So that's it. That's why java.net.IDN is useful.
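To make that display-versus-lookup split concrete, here's a rough sketch of how an app might handle it. IdnLookup, toLookupName, and resolve are hypothetical names I made up for illustration; the JDK calls are IDN.toASCII and InetAddress.getByName. Whether the lookup in main succeeds depends on your network and on whether the name is actually registered, so treat it as an illustration rather than a guaranteed result.

```java
import java.net.IDN;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class: keep the Unicode name for display,
// convert to ACE only at the point where the name goes to the resolver.
final class IdnLookup {

    // The name to hand to DNS. Plain ASCII names come back unchanged.
    static String toLookupName(String displayName) {
        return IDN.toASCII(displayName);
    }

    // Resolve a (possibly internationalized) name using its ACE form.
    static InetAddress resolve(String displayName) throws UnknownHostException {
        return InetAddress.getByName(toLookupName(displayName));
    }

    public static void main(String[] args) {
        String displayName = "\u65e5\u672c\u8a9e.jp"; // 日本語.jp, shown to the user
        System.out.println("Looking up " + toLookupName(displayName));
        try {
            System.out.println(resolve(displayName));
        } catch (UnknownHostException e) {
            // Expected when offline or when the name does not resolve.
            System.out.println("Could not resolve: " + e.getMessage());
        }
    }
}
```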

Java SE 6 has several new internationalization features. IDN support is just one. To read more about this and other new i18n features, take a look at the article International Enhancements in Java SE 6.


Comments

  • Let's hope that the browser manufacturers can mitigate any potential extra risk that this might open up with respect to phishing. Especially since the punycode versions of the domain name might be communicated in e-mail, etc.

    Posted by: doj on March 29, 2007 at 01:54 PM

  • I remember Firefox used to display the punycode string in the address bar. I thought it was a nice approach, letting users see whether that's the exact site they intended to visit.

    Now it displays the Unicode URL as-is, so the torrent of i18n phishing is near.

    Posted by: weijun on March 29, 2007 at 05:36 PM

  • Firefox does display the punycode in the address bar, and perhaps even more importantly, in the status bar before you follow the link. At least mine does; maybe it's configurable now.

    But is that good enough? Maybe IDNs should have a special style or something.

    It's a tough problem. I speak Japanese and I have a lot of sympathy for Shift-JIS folks trying to live in an ISO-8859-1 world, but I still wish I didn't have to explain to grandma why paypal.com is not paypal.com.

    Posted by: erickson on March 30, 2007 at 10:34 AM


