
Writing CSV files as UTF-8 for Excel

Posted by joconner on March 24, 2010 at 12:13 AM PDT

Yesterday a coworker complained that Excel wasn't displaying a CSV (comma-separated values) file correctly. Our application allows the user to send a report via email, and it provides the report as a CSV file. Because the report can contain multilingual text, we decided to encode it in UTF-8. Unfortunately, when users click on the file to open it, usually in Excel, all of the multi-byte UTF-8 characters display incorrectly.

The problem was immediately clear to me: Excel was opening the UTF-8 encoded files but incorrectly identifying them as Latin-1 encoded files. In the absence of any charset identification, Excel must guess a file's content encoding. In our environment, many host PCs use en_US locales with Latin-1 as the typical charset, and Excel uses that default to read and display CSV files.
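To make the misidentification concrete, here is a minimal sketch of what happens when UTF-8 bytes are decoded as Latin-1. The class and method names are hypothetical, and ISO-8859-1 stands in for the "Latin-1" default described above; the charset Excel actually picks on a given machine may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how UTF-8 bytes look when misread as Latin-1.
final class MojibakeDemo {

    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1,
    // mimicking a reader that guesses the wrong charset.
    static String decodeAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8.
        String misread = decodeAsLatin1("\u00e9");
        // Each byte becomes its own character: U+00C3 followed by U+00A9.
        System.out.println(misread.length());
    }
}
```

A single accented letter turns into two unrelated characters, which is the kind of garbage our users were seeing for every multi-byte character.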

My solution was to use the byte order mark (BOM) to identify the CSV file as a Unicode file. I instructed my colleague to prepend the U+FEFF character to the file. The Java application that writes the file uses a FileWriter that encodes to UTF-8, so it was simple to output the BOM as the first character in the file; in UTF-8 that character is encoded as the three bytes EF BB BF.
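The fix can be sketched roughly as follows. This is not our application's actual code: the class name, method names, and sample data are made up for illustration, and the sketch uses an OutputStreamWriter with an explicit charset rather than a FileWriter, because older FileWriter constructors use the platform default encoding instead of letting you name one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes CSV text as UTF-8 with a leading BOM.
final class BomCsvWriter {

    // U+FEFF, the byte order mark; encoded in UTF-8 as EF BB BF.
    static final char BOM = '\uFEFF';

    // Writes the BOM first, then the CSV body, both encoded as UTF-8.
    static void writeCsv(OutputStream out, String csvBody) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(BOM);
        writer.write(csvBody);
        writer.flush();
    }

    // Convenience wrapper that returns the encoded bytes in memory.
    static byte[] toBytes(String csvBody) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeCsv(buffer, csvBody);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes("name,city\nJos\u00e9,M\u00e1laga\n");
        // Print the first three bytes in hex to confirm the BOM is present.
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            hex.append(String.format("%02X ", bytes[i] & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

To write a real file, pass a FileOutputStream to `writeCsv` in place of the in-memory buffer. The important detail is that the BOM is the very first character written, before any header row.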

Now when our customers double-click on these files, Excel opens the file, notices the BOM, and automatically selects UTF-8 as the file's charset encoding, so the previously mangled characters display correctly. An easy solution resolved the problem.

Maybe you can give your applications a hint about plain text files as well. Writing the BOM to your file can help Unicode-enabled applications know how to decode your Unicode files.
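On the reading side, a Unicode-enabled application might use the hint along these lines. This is only a sketch of the general idea, not a description of what Excel does internally; the class name and the Latin-1 fallback are assumptions chosen to mirror the scenario in this post.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reader: uses a leading UTF-8 BOM, if present, to pick the charset.
final class BomAwareDecoder {

    // True when the data starts with EF BB BF, the UTF-8 encoding of U+FEFF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Decodes as UTF-8 (skipping the BOM) when the hint is present;
    // otherwise falls back to ISO-8859-1 as an assumed locale default.
    static String decode(byte[] data) {
        if (hasUtf8Bom(data)) {
            return new String(data, 3, data.length - 3, StandardCharsets.UTF_8);
        }
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 0xC3, (byte) 0xA9};
        byte[] withoutBom = {(byte) 0xC3, (byte) 0xA9};
        // With the BOM the two payload bytes decode to one character;
        // without it they are misread as two.
        System.out.println(decode(withBom).length());
        System.out.println(decode(withoutBom).length());
    }
}
```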
 


Comments

CSV as utf-8

Hi, I tried this option, but I'm still having the same problem. After adding the BOM, the CSV displays fine when I open it from within Excel (Open File, then choose the CSV). But when I double-click the CSV, I get junk characters. Please help with a solution.