Evan Summers

Evan Summers's Blog

Refactoring Translations

Posted by evanx on May 26, 2006 at 05:20 AM | Comments (3)


Introduction.
"There is no problem that cannot be solved by the use of high explosives."

Recently I was tasked with making an app translatable. It was a relatively small Swing app, of around 200 classes.

That means moving strings, like exception messages, into a resource bundle. I had some fun with a phased approach, which I present here.


Moving the strings
"The best armor is staying out of gun-shot."

The first phase was moving the string literals in the code into a "message class" as below.

public class IMessage {
    public static String systemErrorLogin = "~ logging in";
    public static String systemBusyCommunicatingWithServer = "Communicating with the server...";
    public static String systemErrorOccurred = "An error has occurred";
    public static String systemErrorOccuredFormatTilde = "An error has occurred while %s";
    public static String systemSendOpLogoffReq = "~ sending logoff message";
    public static String systemUpdateError = "~ updating application";
    public static char systemLoginMnemonic = 'L';
    public static String[] periodOptions = {"Today", "Yesterday", ...};
    ...
}

Note that we allow different types in the message class, ie. char for mnemonics, and string arrays for combo boxes.

(Incidentally, in the above example, we use a notation where a tilde at the beginning of an exception message is substituted with "An error has occurred while..." It's lazy, and that's what I really dig about it, man!)
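
The tilde substitution itself is not shown in this post, so here is a minimal sketch of how it might look. The names TildeMessages and expand are hypothetical, not taken from the application; only the format string matches systemErrorOccuredFormatTilde above.

```java
// A minimal sketch of the tilde notation. TildeMessages and expand are
// hypothetical names, not part of the original application.
class TildeMessages {
    // Same text as systemErrorOccuredFormatTilde in the message class.
    static final String ERROR_FORMAT = "An error has occurred while %s";

    // Expands a leading tilde into the standard error prefix.
    // Messages without a leading tilde are returned unchanged.
    static String expand(String message) {
        if (message != null && message.startsWith("~")) {
            return String.format(ERROR_FORMAT, message.substring(1).trim());
        }
        return message;
    }
}
```

With this helper, expand("~ logging in") gives "An error has occurred while logging in", so an exception dialog could expand the message just before display.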

The application code becomes "stringless" as follows.

   public void run() {
      try {
         login();
      } catch (Exception e) {
         e.printStackTrace();
         gui.showExceptionDialog(e, IMessage.systemErrorLogin);
      }
   }

An advantage of this approach is that the "keys" we are choosing are refactorable field names, eg. systemErrorLogin, rather than string references. Once all strings have been refactored out into this message class, we can review the keys for naming consistency, spelling, etcetera. Renaming them is safe and easy, eg. using NetBeans' refactorings.

By the way, this is in keeping with my "Bean Curd" blog, where I argue that "string references" (eg. resource bundle keys in this case) hinder refactoring, and so we should aim for applications with "no string references attached."


Generating the resource bundle
"The best tank terrain is that without anti-tank weapons."

The second phase is to generate the resource bundle from this message class. We use the field name as the key, and use reflection to generate the resource bundle as follows.

   public void generateResourceBundleContent() throws Exception {
      IMessage messages = new IMessage();
      for (Field field : IMessage.class.getFields()) {
         // iterate through all the fields in the message class 
         String key = field.getName();
         Object value = field.get(messages);
         if (field.getType() == String.class) {
            // output a regular string message  
            logger.println(key + " = " + value);
         } else if (field.getType() == String[].class) {
            // output a string array, eg. combo box items 
            String[] array = (String[]) value;
            for (int index = 0; index < array.length; index++) {
               logger.println(key + index + " = " + array[index]);
            }
         } else if (field.getType() == char.class) {
            // output a char, eg. a mnemonic 
            logger.println(key + " = " + value);
         }
      }
   }   

In the above method, we generate the content to be cut and pasted into our resource bundle file, eg. myapp_en.properties. Note that we handle string arrays by appending the element's index to the key, eg. periodOptions0, periodOptions1.
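
One wrinkle with printing "key = value" lines directly is that values containing backslashes or line breaks need escaping in a properties file. A possible variation, sketched below, collects the pairs into a java.util.Properties object and lets its store method handle the escaping. BundleGenerator and DemoMessages are hypothetical stand-ins for illustration, and the sketch assumes all message fields are public and static.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Properties;

// Hypothetical sketch: generate bundle content via java.util.Properties.
class BundleGenerator {
    // Hypothetical stand-in for the real message class.
    public static class DemoMessages {
        public static String systemErrorLogin = "~ logging in";
        public static char systemLoginMnemonic = 'L';
        public static String[] periodOptions = {"Today", "Yesterday"};
    }

    // Collects the public static fields of a message class as key/value pairs.
    static Properties toProperties(Class<?> messageClass) {
        Properties properties = new Properties();
        for (Field field : messageClass.getFields()) {
            if (!Modifier.isStatic(field.getModifiers())) continue;
            String key = field.getName();
            Object value;
            try {
                value = field.get(null); // static field, so no instance is needed
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value instanceof String[]) {
                // string arrays become key0, key1, ... as in the method above
                String[] array = (String[]) value;
                for (int index = 0; index < array.length; index++) {
                    properties.setProperty(key + index, array[index]);
                }
            } else if (value instanceof String || value instanceof Character) {
                properties.setProperty(key, value.toString());
            }
        }
        return properties;
    }

    // Renders the pairs in properties syntax; store() does the escaping.
    static String toPropertiesText(Class<?> messageClass) {
        StringWriter writer = new StringWriter();
        try {
            toProperties(messageClass).store(writer, null);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return writer.toString();
    }
}
```

Note that store also writes a timestamp comment line, and the entries come out in no particular order, so the output may want sorting before it is committed.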


Loading the resource bundle
"Samuel Morse must have lost his mind if he believes in this Dots and Dashes idea himself!" A Government Official (1842).

Now we can translate the resource bundle file into other languages, eg. myapp_de.properties for German. When our application starts up, we need to load the messages from the resource bundle for the current locale. Firstly, to simplify the processing, we find it convenient to read the resource bundle into a Map, as follows.

    public static final Map<String, String> resourceMap = new HashMap<String, String>();

    public void loadMessages() throws Exception {
       for (Enumeration<String> it = resourceBundle.getKeys(); it.hasMoreElements();) {
          String key = it.nextElement();
          String value = resourceBundle.getString(key);
          resourceMap.put(key, value);
        }
        logger.exiting(resourceMap.size());
    }
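
For experimenting with this step outside the application, a self-contained sketch might look as follows. BundleMaps is a hypothetical helper name; it parses properties text with PropertyResourceBundle so that no file on the classpath is needed, and copies the entries into a typed Map using keySet.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical helper for trying out the bundle-to-map step in isolation.
class BundleMaps {
    // Copies every key/value pair of a bundle into a typed map.
    static Map<String, String> toMap(ResourceBundle bundle) {
        Map<String, String> map = new HashMap<String, String>();
        for (String key : bundle.keySet()) {
            map.put(key, bundle.getString(key));
        }
        return map;
    }

    // Builds a bundle from properties text, eg. for a unit test.
    static ResourceBundle parse(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, BundleMaps.toMap(BundleMaps.parse("systemErrorLogin = ~ logging in\n")) yields a map with the single key systemErrorLogin.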

Next we load the resource bundle messages into our message class (which is otherwise still initialised to the original English strings). We use reflection, as follows.

   public void configureMessages() throws Exception {
      IMessage messages = new IMessage();
      for (Field field : IMessage.class.getFields()) {
         // iterate through all the fields in the message class          
         String key = field.getName();
         Object defaultValue = field.get(messages);
         Object resourceValue = null;
         if (field.getType() == String.class) {
            // we are looking for a regular string message
            resourceValue = resourceMap.get(key);
            if (resourceValue == null) { 
               throw new IRuntimeException(field);
            }
            field.set(messages, resourceValue);
         } else if (field.getType() == String[].class) {
            // we are looking for an array of strings in this case
            String[] defaultArray = (String[]) defaultValue; 
            List<String> stringList = new ArrayList<String>();
            for (int index = 0;; index++) {
               String string = resourceMap.get(key + index);
               if (string == null) break;
               stringList.add(string);
            }
            if (stringList.size() != defaultArray.length) {
              throw new IRuntimeException(field); 
            }
            resourceValue = stringList.toArray(new String[stringList.size()]);
            field.set(messages, resourceValue);
         } else if (field.getType() == char.class) {
            // we are looking for a char resource
            String resourceString = resourceMap.get(key);
            if (resourceString == null || resourceString.length() != 1) {
               throw new IRuntimeException(field);
            }  
            resourceValue = resourceString.charAt(0);
            field.set(messages, resourceValue);
         }
      }
   }

Note that in the above code, an exception is thrown if the resource bundle is inconsistent with the messages class, eg. a key is missing, or a string array has a different length. (An extra, unrecognised key in the bundle is simply ignored by this code.) This check should be performed as a unit test, but in any case we will know as soon as we run the application if our resource bundle is not as it should be.


Testing
"Airplanes suffer from so many technical faults that it is only a matter of time before any reasonable man realizes that they are useless." - Scientific American (1910)

Resource bundles with string literals as keys in the code, eg. getString("loginError"), are fragile. For example, a misspelt key for some obscure exception message might only be picked up (as a "dangling" string reference) when that exception occurs. That might only happen down the line, in production.

An advantage of the message class approach is that it enables unit testing of our resource bundles. For example, we can easily test that every one of our messages (as declared in the message class) is translated in our resource bundles, as follows.

   public void test() throws Exception {
      IMessage messages = new IMessage();
      for (Field field : IMessage.class.getFields()) {
         String key = field.getName();
         if (resourceMap.get(key) == null) {
            throw new IRuntimeException(key); 
         }
      }
   }

The above code sample is over-simplified, but hopefully illustrates the point.
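
A somewhat fuller check might compare the keys in both directions, taking the indexed keys of string arrays into account. The sketch below is one way to do that under the same conventions as above. BundleChecker and DemoMessages are hypothetical names, and the sketch assumes the message fields are public and static.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a two-way consistency check.
class BundleChecker {
    // Hypothetical stand-in for the real message class.
    public static class DemoMessages {
        public static String systemErrorLogin = "~ logging in";
        public static char systemLoginMnemonic = 'L';
        public static String[] periodOptions = {"Today", "Yesterday"};
    }

    // Keys the bundle should contain: the field name, or the field name
    // plus an index for each element of a string array.
    static Set<String> expectedKeys(Class<?> messageClass) {
        Set<String> keys = new TreeSet<String>();
        for (Field field : messageClass.getFields()) {
            if (!Modifier.isStatic(field.getModifiers())) continue;
            if (field.getType() == String[].class) {
                try {
                    String[] defaults = (String[]) field.get(null);
                    for (int index = 0; index < defaults.length; index++) {
                        keys.add(field.getName() + index);
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            } else if (field.getType() == String.class || field.getType() == char.class) {
                keys.add(field.getName());
            }
        }
        return keys;
    }

    // Messages declared in the class but absent from the bundle.
    static Set<String> missingKeys(Class<?> messageClass, Map<String, String> resourceMap) {
        Set<String> missing = expectedKeys(messageClass);
        missing.removeAll(resourceMap.keySet());
        return missing;
    }

    // Bundle entries that match no field in the message class.
    static Set<String> unrecognisedKeys(Class<?> messageClass, Map<String, String> resourceMap) {
        Set<String> extra = new TreeSet<String>(resourceMap.keySet());
        extra.removeAll(expectedKeys(messageClass));
        return extra;
    }
}
```

A unit test would then assert that both sets are empty for every translated bundle, and can print the offending keys when they are not.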


Merging messages
"No flying machine will ever fly from New York to Paris." - Orville Wright.

What may be useful is to generate the content of a message class from an existing resource bundle (eg. one produced using the NetBeans GUI designer, for labels and such), as below. Then we can cut and paste that content into our message class. In this case, we can identify name clashes, and also happily generate the resource bundle file in its entirety later (from the message class, as shown above).

   public void emitMessagesCode() throws Exception {
      for (String key : resourceMap.keySet()) {
         String value = resourceMap.get(key);
         logger.println("public static String " + key + " = \"" + value + "\";");
      }
   }
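
One thing to watch with the generator above is that a value containing a double quote or a backslash would produce source code that does not compile. A small escaping step, sketched here with the hypothetical names MessageCodeEmitter, escape and declaration, is one way around that.

```java
// Hypothetical sketch: emit a field declaration with an escaped literal.
class MessageCodeEmitter {
    // Escapes the characters that would break a Java string literal.
    static String escape(String value) {
        StringBuilder builder = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': builder.append("\\\\"); break;
                case '"':  builder.append("\\\""); break;
                case '\n': builder.append("\\n"); break;
                case '\r': builder.append("\\r"); break;
                case '\t': builder.append("\\t"); break;
                default:   builder.append(c);
            }
        }
        return builder.toString();
    }

    // Builds one line of message class source for a bundle entry.
    static String declaration(String key, String value) {
        return "public static String " + key + " = \"" + escape(value) + "\";";
    }
}
```

This sketch does not touch the keys. A bundle key that is not a valid Java identifier, eg. one containing a dot, would still need to be mapped to a field name by hand or by some naming rule.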


To Bundle, or not to bundle?
"We are not retreating - we are advancing in another direction." - Gen. Douglas MacArthur

Another option is to translate messages in code as below. The advantage of this approach is that the keys remain refactorable. And then translators can use NetBeans, and commit directly to the source code CVS, yay!

   public void installGerman() {
      IMessage.systemError = "Eine Störung trat auf";
      ...
   }
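
To pick the translation at startup, the in-code variant would need some dispatch on the current locale. A minimal sketch follows, assuming a hypothetical InCodeTranslations class and a cut-down DemoMessages class. The German text is written with a Unicode escape only to keep the source file ASCII.

```java
import java.util.Locale;

// Hypothetical sketch: choose an in-code translation by locale.
class InCodeTranslations {
    // Hypothetical stand-in for the real message class.
    public static class DemoMessages {
        public static String systemErrorOccurred = "An error has occurred";
    }

    static void installGerman() {
        DemoMessages.systemErrorOccurred = "Eine St\u00f6rung trat auf";
    }

    // Installs the translation for the given locale, if there is one.
    // Any other language keeps the English defaults.
    static void install(Locale locale) {
        if ("de".equals(locale.getLanguage())) {
            installGerman();
        }
    }
}
```

At startup the application would call install(Locale.getDefault()) before any message is displayed.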


Conclusion
"I have a catapult. You will agree to my terms, or I will fling an enormous rock at your city." - Latin literature.

We introduce a phased approach for translating an application. First, we move strings into a message class. This is achieved by cutting and pasting strings out of application classes into the message class. (Additionally, an existing resource bundle, as produced by the NetBeans GUI designer for example, might be merged into the message class, with the assistance of some trivial code generation.)

This first phase enables us to ensure neat and consistent naming of the keys we use to reference messages. For example, we can readily rename the message keys using IDE refactorings, to correct spelling mistakes and inconsistent naming conventions.

The next phase is to generate the master resource bundle from the message class. We use reflection on the message class to generate the key/value pairs, which we cut and paste into the master resource bundle file. After this stage, the resource bundle can be translated into multiple languages.

At startup, the application loads the resource bundle for the current locale, and uses reflection to configure the message class from the resource bundle. This offers a mechanism to ensure that the resource bundle is consistent, ie. there are no strings that remain untranslated.

In general, I argue that source code should contain no string literals whatsoever! The reason for this is that string literals are typically fragile references, which are not refactorable. This applies to strings that refer to field or method names, as discussed in my earlier blog "Explicit Reflection", and to string references to properties, as discussed in "Bean Curd (Chapter 1)". (Strings used in OR queries will be discussed in an upcoming blog, "Bean Curd 2: Native Query Beans.")

Clearly strings that are text messages are also undesirable, because they should be externalised for translation (in resource bundles).

And finally, string references to externalised messages in resource bundles, eg. getString("loginError"), are fragile and unable to be unit tested, and consequently dangerous.

So I think that covers all the evil strings that we might find lurking in our code? Let's root them out and banish them forever!

My Postscript Punt for today is... "Bin Bash Java (Chapter 1)." And not "Swing trounces Ajax" as usual ;)


Comments
Comments are listed in date ascending order (oldest first)


  • Interesting approach. I'm a bit concerned about scalability, though. The approach works fine for small applications as you pointed out in your introduction.

    For large apps however, having one central place for all strings becomes a maintenance nightmare as you can't see which string is used where in your code and if you still need it or not or if it is shared among several classes or not. Hence, I prefer to have a separate properties file for *every* class, even if that means that you cannot share string constants between classes. An additional advantage is that the separate properties file can be easily renamed in case you rename your class.

    Your approach could be adapted to this scheme too, of course.

    One question: Why are you using a separate hash map? A properties based ResourceBundle in fact is a map. So this seems to be a duplication. Maybe I have missed your point here...

    Regards,
    Christian

    Posted by: christian_schlichtherle on May 29, 2006 at 11:53 AM

  • Thanks evan for a great refactoring walk through. Just a minor bug in configureMessages() i.e. String[] resource values were not assigned to 'resourceValue' before setting the field...

    resourceValue = stringList;
    field.set(messages, resourceValue);

    Posted by: gmanwani on May 29, 2006 at 12:49 PM


  • Thanks Manwani, you're right :) That line got deleted somehow when I cut and pasted from the client's project, with their kind permission.

    Thanks for your comments, Christian. I put the ResourceBundle into a typed Map, because ResourceBundle isn't one, is it?

    Christian, you're right about the scalability. Which raises two points, one of which is unrelated in the sense that I haven't got the hang of this blogging system yet. For instance, I amended this blog some days ago to change the reference to a "single messages class" (inter alia) but the "permalinks" seem to get out of sync or something. Because references to the articles (eg. from the front page) are always "out of date" permalinks, different to the "up to date" version, ie. with amendments. I wish these important links ie. from the front page, were to the "editable" version, if for no other reason, so that one can correct spelling errors and such.

    Putting that bone aside, you bring me to the main point, which is, you're right! In this example, I refactored all messages into a single file, given the limited size of the system, for convenience. But as you suggest, limiting oneself to a single resource bundle is not good practice, and I should not advocate this. Incidentally, you can see where the strings are used, eg. using NetBeans' "Find Usages."

    If using multiple resource files, I would have a superclass for the messages classes, which would register each class in a static list, eg. for unit testing, and of course populate each message class from its corresponding resource bundle.

    Which brings me to something which has been under the surface of my thoughts until tonight when I was in the shower. Hey, why is it that one's thoughts seem to crystallise best in the shower or in the bath - must be because we are 70% water ourselves!? ;)

    This thought is that the messages file presented in this blog is a mirror of a resource bundle (and vice versa). An explicit, toolable, internalised reflection of an externalised configuration file.

    So the idea is that for each externalised file, like a resource bundle properties file, or an XML configuration file, we implement an explicit "mirror class" to populate via reflection from that externalised file. (In the case of a resource bundle, it would be a single class, as presented here in this blog, but for an XML configuration file, it would be a suite of classes, and we would populate an object graph eg. using JAXB.)

    The important point is that elements (eg. messages) are declared explicitly (so as to be readily toolable), where we use their field names as implicit references to the externalised configuration data (and so avoiding string references).

    So that is the "forest perspective" of this "tree article" :)

    That is to say, yes, we would have multiple message classes. In particular, we would have exactly as many message classes as resource bundles.

    They would be exact mirrors of the resource bundles, defining their "structure" (ie. keys, to be matched in the resource bundle file) and awaiting "content" (to be supplied by the resource bundle).

    I look forward to writing an article along these lines (ie. the structure of every configuration file is mirrored in the code explicitly, and instantiated and populated from externalised configuration file instances, using class and field names as implicit keys). It might be entitled "Bean Curd 3: Configuration Beans" unless I can think of a better name between now and then... ;)

    Maybe what I'm talking about is just JAXB (XML Binding) applied to XML configuration files? In which case I'll just be writing a blog about JAXB :)

    Posted by: evanx on May 29, 2006 at 02:00 PM




