The Source for Java Technology Collaboration



Simon Brown

Simon Brown's Blog

Displaying international characters in JSP

Posted by simongbrown on March 03, 2004 at 02:18 PM | Comments (13)

I've been having lots of "fun" over the past few days trying to figure out how to get JSP pages to display international characters properly. I've tried HTTP meta tags and JSP page encodings, and seemed to be getting nowhere. If I have understood all the reading that I've done, there are a couple of things you should do to tell the web browser that you wish to display international (e.g. Japanese) characters.

  • Specify the content type and character set from within your JSP.
    <%@ page contentType="text/html; charset=UTF-8" %>
  • Use an HTTP meta tag as a hint to the browser (I don't think this is essential, but it all helps).
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
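
A Unicode charset is needed in the first place because ISO-8859-1 (Latin-1) has no code points for Japanese at all. A minimal sketch using only java.nio.charset makes that concrete; the class name CharsetCheck and the sample text are illustrative, not part of any framework:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCheck {

    // Returns true if every character of text can be encoded in the given charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // U+65E5 U+672C U+8A9E: the Japanese word "nihongo" (Japanese language)
        String japanese = "\u65E5\u672C\u8A9E";
        System.out.println("UTF-8:      " + canRepresent(japanese, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + canRepresent(japanese, StandardCharsets.ISO_8859_1));
    }
}
```

It should print true for UTF-8 and false for ISO-8859-1, which is why the charset declared in the page directive matters.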

After trying this and seeing that it worked for most people, I was rather confused to see that my pages still displayed international characters as junk. I checked and double-checked all my headers, flushed my browser caches and even tried different browsers (IE and Safari on Mac). Still no joy. In fact, checking the page's character encoding in IE revealed that it was still Latin-1.
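
The junk itself is easy to reproduce: if bytes written as UTF-8 are decoded as ISO-8859-1, every byte turns into a separate Latin-1 character. The sketch below (hypothetical class name, standard library only) simulates that mismatch. Because ISO-8859-1 maps each byte to exactly one character, it also shows that the original text can be recovered by re-encoding. That repair is a workaround, not a substitute for declaring the right charset.

```java
import java.nio.charset.StandardCharsets;

class Mojibake {

    // Simulates a page whose UTF-8 bytes are decoded as if they were ISO-8859-1.
    static String misread(String original) {
        byte[] sent = original.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    // Reverses misread(): recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 3 Japanese characters, 3 UTF-8 bytes each
        String japanese = "\u65E5\u672C\u8A9E";
        String junk = misread(japanese);
        System.out.println(junk.length());                 // 9: one character per byte
        System.out.println(repair(junk).equals(japanese)); // true
    }
}
```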

After scanning around for anything else that looked remotely locale-related, I realised that I was using the JSTL <fmt:setLocale> tag to set a default locale for the <fmt:formatDate> tags. Changing the locale passed to this tag changed the actual character encoding of the web page. However, the characters still showed as junk, albeit different junk!

A quick scan through the JSTL specification for the <fmt:setLocale> tag revealed the answer (or at least what seems to be the answer).

  "As a result of using this action, browser-based locale setting capabilities are disabled."
I downloaded the source code for the JSTL tags, and using this tag does in fact set the locale of the response, which appears to take precedence over the charset settings above. Commenting this tag out fixed all the problems. Except one... now my dates were all formatted according to the default locale of the JVM, and the JSTL <fmt:formatDate> tag doesn't allow you to specify a locale purely for formatting purposes. Thankfully, you can set a default locale for the formatting actions with the following code, which uses the javax.servlet.jsp.jstl.core.Config class.
  Config.set(request, Config.FMT_LOCALE, someLocale);  // someLocale: any java.util.Locale
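
The point of Config.FMT_LOCALE is that the formatting locale no longer depends on the JVM default. The same idea can be shown with the standard library alone: java.time's DateTimeFormatter takes an explicit Locale, so Locale.setDefault() has no effect on its output. This is only an analogy using java.time, not how JSTL is implemented, and LocaleFormatting is a hypothetical class name:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class LocaleFormatting {

    // Formats a date with an explicit locale, independent of Locale.getDefault().
    static String format(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate posted = LocalDate.of(2004, 3, 3);
        Locale.setDefault(Locale.GERMANY);             // the JVM default changes...
        System.out.println(format(posted, Locale.US)); // ...but this prints March 3, 2004
        System.out.println(format(posted, Locale.JAPAN));
    }
}
```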

Now there was just one last thing: submitting information via an HTML form. Most browsers don't appear to send a charset in the request, even though they encode the form data using the encoding of the page it came from. In this case, the request character encoding defaults to ISO-8859-1, so there's a potential mismatch between the form data being sent (in UTF-8) and the values retrieved from the request (decoded as ISO-8859-1) by the getParameter() method of HttpServletRequest. To fix this, all you need to do is explicitly set the character encoding of the request before reading any parameters.

  request.setCharacterEncoding("UTF-8");
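
To see what setCharacterEncoding() changes, here is a minimal stand-in for the container's form parsing, written against java.net.URLDecoder instead of the servlet API. The class and method names are hypothetical. The same URL-encoded body yields different parameter values depending on the charset used to decode it:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormDecoding {

    // Parses an application/x-www-form-urlencoded body with the given charset.
    // Simplification: keeps only the last value for a repeated name.
    static Map<String, String> parse(String body, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(decode(name, charset), decode(value, charset));
        }
        return params;
    }

    private static String decode(String s, Charset charset) {
        try {
            return URLDecoder.decode(s, charset.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // not expected for standard charsets
        }
    }

    public static void main(String[] args) {
        // U+65E5 U+672C ("Nihon", Japan) as a browser sends it from a UTF-8 page
        String body = "word=%E6%97%A5%E6%9C%AC";
        String asLatin1 = parse(body, StandardCharsets.ISO_8859_1).get("word");
        String asUtf8 = parse(body, StandardCharsets.UTF_8).get("word");
        System.out.println(asLatin1.length());             // 6 junk characters (the default)
        System.out.println(asUtf8.equals("\u65E5\u672C")); // true, as after setCharacterEncoding("UTF-8")
    }
}
```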

Is this the total solution to displaying international characters in JSP? I hope so but I need to test this on other platforms and JSP containers. Hopefully I will read this blog entry next week and everything will still be correct.


Comments
Comments are listed in date ascending order (oldest first)

  • how to work everywhere
    Your solution (setting the character encoding for the page using meta tags and HTTP headers, plus forcing the request encoding to UTF-8) is used by mvnforum (www.mvnforum.org), a powerful and easy-to-use open source web forum I recently installed on my intranet.

    But I had to patch the forum to "remove" this solution: I speak Portuguese, and accented characters were not being accepted on input. I found no way to force the web browser (IE6 on Windows and Mozilla 1.5 on Linux) to send the request encoded as UTF-8. So the output is Unicode and works well, but the input is the default Latin-1.

    But the author of the forum lives in Vietnam. I suppose he needs UTF-8 just as you need it for Chinese, and the question is: how can the same application (say, the same web site) behave correctly for both Latin languages like Portuguese and oriental languages like Chinese?

    By the way, if your browser is sending UTF-8 requests (or Latin-1 requests), shouldn't this be auto-detected by the web container, based on information sent by the browser? I think hardwiring any locale or encoding setting is a bad idea, because what works for some users won't work for others.

    Posted by: flozano on March 05, 2004 at 05:27 AM

  • how to work everywhere
    When you want to convince the browser to send the form data encoded in UTF-8, make the form itself UTF-8 encoded. This works well with IE, Mozilla/Gecko and Opera.

    Posted by: grayman on March 07, 2004 at 04:37 AM

  • how to work everywhere
    Unfortunately Safari (Mac OS X) doesn't seem to respect the accept-charset attribute of the form tag. You are right though, this works in IE.

    Posted by: simongbrown on March 07, 2004 at 06:48 AM

  • how to work everywhere
    Try encoding the page that contains the form in UTF-8.

    Posted by: grayman on March 07, 2004 at 07:15 AM

  • The fmt:setLocale behaviour is probably a bug in Tomcat 4: once the page encoding is set, it should not be overridden. I do find it odd that locale and response encoding are connected. A web page in English can be encoded in several ways, and the location of the user or the server has nothing to do with that.

    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4880792

    Posted by: tcowan on December 03, 2004 at 09:34 AM

  • I am setting the response encoding with pResponse.setContentType("iso-2022-jp"); for Japanese characters.
    The data in the database is in UTF-8 format.
    I am getting junk data for Japanese characters.
    Please advise.

    Posted by: lekhaphijo on August 05, 2006 at 10:25 PM

  • I had a servlet I wrote for feedback/booking forms a few years ago, and never managed to get it to send e-mail that preserved the UTF-8 encoding. Then, a few days ago, I got one generated e-mail that displayed correctly: it was from an IE5 Mac OS user! So I decided to look again and found your article. Well, I was missing the

    request.setCharacterEncoding("UTF-8");

    you mentioned!
    So the data sent is correct, but only the old IE5 declares the encoding; indeed, packet sniffing shows:

    Content-type: application/x-www-form-urlencoded; charset=UTF-8

    vs:

    Content-Type: application/x-www-form-urlencoded

    So, thanks Simon!
    lekhaphijo, why not try the method Simon suggested and use UTF-8 throughout? With that I had no problem passing all kinds of characters, including Chinese, Arabic, etc.

    Posted by: giulianog on January 09, 2007 at 02:57 PM

  • I have a problem entering Arabic text in a JSP page. I have followed all the points mentioned above, and the Arabic text on the page displays correctly. But the Arabic request parameter values passed to the next page arrive as garbled text. The request object is encoded as UTF-8, but the parameter value is junk. What am I missing?

    Posted by: swe029 on February 11, 2007 at 06:29 AM

  • Small addition: the http-equiv meta tag serves the browser when it renders a page read from the file system rather than from an HTTP response; in this case, a "saved" page...

    Posted by: stefaanh on March 28, 2007 at 03:10 PM

  • Hi all,
    I'm having trouble displaying Chinese words in a JSP page, although I have done all the steps Simon describes. My scenario is this: 1.jsp gets the Unicode text (\u521B\u9020\u7528\u6237) from a .properties file (the Chinese words show properly in 1.jsp) and redirects to 2.jsp. 2.jsp uses request.getParameter() to get the value of the Chinese words, but displays them all as funny symbols (different from what I see in 1.jsp). What should I do? I need somebody's help.
    **giulianog, I need your guidance since you can display and input Chinese words successfully. Thanks.

    Posted by: landoa on April 09, 2007 at 09:03 PM

  • landoa, if you are using Tomcat (or JBoss), you should be able to solve your problem by adding the following attribute setting to the Connector tag in Tomcat's server.xml: URIEncoding="UTF-8"

    Posted by: cbattis on April 23, 2007 at 05:43 PM

  • Hi all,
    I am reading a field entered in Arabic on jsp1 from jsp2 using request.getParameter(). I have made all the changes Simon suggested regarding the content type and character set, and I also call request.setCharacterEncoding("utf-8"). Even so, the field value appears as junk in jsp2. Please help.

    Posted by: santosh_savanur on May 17, 2007 at 11:09 PM
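
Several of the comments above report parameters that still arrive garbled, and flozano asks whether the container could detect the request encoding instead of hardwiring it. The sketch below is one possible heuristic, not what Tomcat or any other container actually does, and every name in it is hypothetical. It uses the charset declared in the Content-Type header if there is one (as in giulianog's IE5 capture). Otherwise it percent-decodes the value to raw bytes and treats them as UTF-8 only if they form valid UTF-8, falling back to ISO-8859-1. It can guess wrong for Latin-1 text that happens to be valid UTF-8, and it does not replace request.setCharacterEncoding() for request bodies or cbattis's URIEncoding setting for query strings on Tomcat.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetGuess {

    // Returns the charset named in a Content-Type header, or null if none is usable.
    static Charset declaredCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.isSupported(name) ? Charset.forName(name) : null;
                } catch (IllegalArgumentException e) { // illegal charset name
                    return null;
                }
            }
        }
        return null;
    }

    // True if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Hypothetical heuristic for decoding one URL-encoded form value.
    static String decodeValue(String rawValue, String contentType) {
        byte[] bytes;
        try {
            // ISO-8859-1 maps each %XX escape to exactly one char, so this recovers the raw bytes.
            bytes = URLDecoder.decode(rawValue, "ISO-8859-1").getBytes(StandardCharsets.ISO_8859_1);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        Charset charset = declaredCharset(contentType);
        if (charset == null) {
            charset = isValidUtf8(bytes) ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1;
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String form = "application/x-www-form-urlencoded";
        String expected = "\u00E7\u00E3o"; // Portuguese "cao" with cedilla and tilde
        System.out.println(decodeValue("%C3%A7%C3%A3o", form + "; charset=UTF-8").equals(expected)); // declared
        System.out.println(decodeValue("%C3%A7%C3%A3o", form).equals(expected)); // valid UTF-8
        System.out.println(decodeValue("%E7%E3o", form).equals(expected));       // Latin-1 fallback
    }
}
```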




