
Unicode™: Write Once, Read Nowhere.

Posted by mkarg on February 13, 2010 at 12:19 PM PST

Back in the early '80s "of the past millennium" (as journalists call it these days; don't you feel as old as I do when reading that phrase? For me it is just "childhood" and does not feel so long ago. At least not a millennium ago.), when I was a young boy, I taught myself BASIC programming on my father's Sinclair ZX Spectrum 48K and started coding small arcade games (what else would ten-year-old boys do with a microcomputer? The web had not been invented back then.). Unfortunately, that wonder machine had everything except a way to do pixel or vector graphics in pure BASIC. You had to learn assembler for that. But first, at the age of ten I did not understand what that would be good for, and second, I simply had no assembler software, so I had to stick with BASIC. As arcade games are no fun when they consist solely of alphanumeric characters (not even back in those days), I had to find a way to get smart effects.

 

What I "invented" (later I learned that it was the standard solution to the problem) was to use normal characters but modify their glyphs (a glyph is the graphical representation of a character: the three strokes of an 'A' are its graphical representation, while 65 is the ASCII code of the same character). This could be done easily (I could hardly believe that the trick is actually described on the web). Using this trick, I was able to provide pleasant game graphics without the need to learn assembler. I just had to type lots of zeros and ones into the machine using its by now legendary rubber keyboard (don't tell my father, since it actually was his rubber keyboard, but I once used it as an eraser, which worked pretty well; much better than the idea I had at ten or twelve to attach a small light bulb directly, which effectively killed the Z80 CPU. Possibly the reason why my web site is called Head Crashing Informatics [www.headcrashing.eu]).
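For the curious, here is a minimal sketch of the "zeros and ones" idea, written in Java rather than Spectrum BASIC. It assumes a character cell of 8x8 pixels with one byte per pixel row; the class and method names are made up for illustration and have nothing to do with the original game code.

```java
class GlyphBitmap {
    /** Parses eight rows of '0'/'1' characters into eight row values (0..255). */
    static int[] parseRows(String... rows) {
        if (rows.length != 8) {
            throw new IllegalArgumentException("expected 8 rows");
        }
        int[] bytes = new int[8];
        for (int i = 0; i < 8; i++) {
            if (rows[i].length() != 8) {
                throw new IllegalArgumentException("each row needs 8 pixels");
            }
            bytes[i] = Integer.parseInt(rows[i], 2); // binary text -> row byte
        }
        return bytes;
    }

    /** Renders row values as text: '#' for a set pixel, '.' for a clear one. */
    static String render(int[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int b : bytes) {
            for (int bit = 7; bit >= 0; bit--) {
                sb.append(((b >> bit) & 1) == 1 ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A rough circle, typed in as zeros and ones.
        int[] circle = parseRows(
            "00111100",
            "01000010",
            "10000001",
            "10000001",
            "10000001",
            "10000001",
            "01000010",
            "00111100");
        System.out.print(render(circle));
    }
}
```

The character code stays the same; only the eight bytes that describe how it looks are replaced.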

 

I even kept that trick when I later actually did learn assembler and moved over to the more popular and powerful Commodore C64, which came with much better graphics support in its BASIC dialect (frankly speaking, I did not move to the C64 because of the improved BASIC but because of the plethora of computer games available for that machine: yes, I was part of that long-haired sneakers generation hanging around with my pals playing video games for hours). The trick was typical in the games industry and still worked well, even in conjunction with more sophisticated approaches like sprites.

 

When I got older, I forgot about computer games and did more "serious" programming. I bought "a real PC" around 1990, wrote business applications, studied informatics, and have made a living from developing "serious" software ever since. So far I have never needed to replace glyphs in a font.

 

So what the heck has that to do with Java? Read on.

 

Some months back a customer told me that he needed to type "ISO 1101" characters into a text field. Well, I had no clue what "ISO 1101" was or what the customer's problem was. I expect you don't either, so let me explain. Suppose you want people to check whether a produced part is actually round (and not elliptical), or actually even (and not wavy). You could write the English words in the task description, but there are two problems. First, not everybody can read (even in the so-called "First World"). Second, not everybody would understand what "round" means unless you write "round as opposed to elliptical". So the clever guys at ISO defined symbols for "roundness" (), "evenness" (⏥) and other geometric terms (don't be surprised if you do not see them here). In industrial design and production those symbols are just as common as the "male" symbol () is to everybody looking for a restroom. As the symbols have to be used together with alphanumeric characters in running text, there was a need for a font containing "ISO 1101" characters.

 

We did not expect "MS Sans Serif" to contain these characters (some European citizens are already happy when they find their particular umlauts in a font, so the chances of finding such specific symbols are low). So what to do? The customer told us that he had bought a special font containing only those symbols, so we had to add a second text field (since that other font did not contain any Latin characters, typographic symbols or numbers). While it was a strange solution, it actually worked and the customer was happy with it.

 

Another idea we had was to do what I did in childhood: copy the special "ISO 1101" characters into the "MS Sans Serif" font. Unfortunately, first, this is not allowed since it would infringe Microsoft's copyright, and second, there is not enough room in the Microsoft font to host all the new symbols without discarding other possibly useful characters. It was about that moment when I remembered that I had had exactly the same problem on my ZX Spectrum twenty years before, and I noticed that the actual cause of the problem is not the missing glyph but rather the fact that there are just eight bits to select one of them. So if you want to keep the original 256 characters, you simply have no code left over to select any additional glyphs. History is repeating itself, damn it!
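The eight-bit limit can be made concrete with a small Java sketch. It uses ISO-8859-1 merely as a stand-in for "some 8-bit character set" (an assumption; the code page of the original application is not stated here) and asks whether the flatness symbol U+23E5, the "evenness" symbol mentioned above, has a code in it. The class and method names are illustrative only.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class EightBitLimit {
    /** Number of distinct codes that fit into the given number of bits. */
    static int codeCount(int bits) {
        return 1 << bits;
    }

    /** True if the code point has a code in ISO-8859-1 (one byte per character). */
    static boolean fitsInLatin1(int codePoint) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        System.out.println(codeCount(8));          // 256
        System.out.println(fitsInLatin1('A'));     // true
        System.out.println(fitsInLatin1(0x23E5));  // false: no code left for it
    }
}
```

With all 256 codes already taken, a new glyph can only move in if an existing character moves out.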

 

As we are writing all new software solely in Java, we had the idea of rewriting the complete software from scratch, replacing the existing code with Java. As Java is based on Unicode™, and as Unicode™'s goal is to contain all symbols ever invented by mankind, all "ISO 1101" symbols should be found in a Unicode™ font. Unicode™ does in fact contain all of them, as can be checked on rainer-seitel.onlinehome.de/unicode-de.html#vorhandene (sorry, German only). You can imagine that we were really happy: a switch to Java was planned anyway, so the solution would come for free.

 

Write Once, Read Nowhere.

[Photo: Markus in Winter]

 

Have you ever tried to type "ISO 1101" characters into a Java program (or into any other Unicode™-enabled software)? Try it, here are some: . Don't be disappointed if you see nothing, just cryptic placeholders, or fewer than four symbols here. This is what many readers who are not running a Windows® machine will see.

 

The problem is that Java proclaims a "Write Once, Run Anywhere" paradigm which only covers "runnability" but does not define what actually has to be seen on the screen: Java does not guarantee that you will really see the glyphs for the characters defined by Unicode™; it only guarantees that the code representation ("the integer value") is processed unchanged. Neither does Unicode™. There is no law that says that an application that is able to process Unicode™ or claims Unicode™ compatibility must also be able to render all glyphs on the screen, nor that a Unicode™ font must contain all glyphs. As a result, virtually every font contains only a very limited subset of Unicode™ glyphs, typically only the most frequently used ones (it would be just too expensive to add cuneiform or Egyptian hieroglyphs to every font on earth). You'll have to search for a long time to find a font containing all "ISO 1101" characters, and you'll have to search even longer to find a JRE that comes bundled with that font.
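The "processed unchanged" half of that statement is easy to demonstrate. The sketch below sends a string containing U+23E5 through a UTF-8 encode/decode round trip and lists its code points; the class and method names are made up for illustration. Nothing in it involves a font, which is exactly the point: whether a glyph shows up is a separate question that Java leaves to the installed fonts (`java.awt.Font.canDisplay(int)` is the standard way to ask one particular font).

```java
import java.nio.charset.StandardCharsets;

class CodePointRoundTrip {
    /** Encodes the text as UTF-8 and decodes it again, as a file or network hop would. */
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Lists the code points of the text in U+ notation, independent of any font. */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String original = "\u23E5A";               // flatness symbol followed by 'A'
        String copy = roundTrip(original);
        System.out.println(copy.equals(original)); // true
        System.out.println(describe(copy));        // U+23E5 U+0041
    }
}
```

The integer value survives every hop, and the screen may still show an empty box.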

 

The end of the story.

 

So we're back where we started more than twenty years ago: if you want a smart GUI showing pleasant symbols and not only "ABC" and "123", you have to tweak your existing fonts with manually added glyphs. Sad, but true. BTW, for whatever reason, many fonts contain three symbols for different types of snowflakes: ❆❅❄. Maybe different types of snowflakes are more essential for mankind than geometric tolerances.

 

Speaking of snowflakes: as I am already disappointed and frustrated, let's invest some time in Sisyphean work. Winter is back in the Black Forest, so I'll grab my snow shovel and try to dig out my car. I am rather sure that I left it somewhere around here this morning...

 

Regards

Markus

 

Note: A list of all published articles (printed and online) is available on my web site Head Crashing Informatics (http://www.headcrashing.eu).

The winter photo actually shows me on a walk through the Black Forest today. Photo used by courtesy of Stefanie of inviticon (http://www.inviticon.eu).

Comments

Font Linking

Hi Markus,

I don't think you should be having this problem at all. Do you know about Font Linking (aka Font Merging or Composite Fonts)? It is the standard solution to this problem: when a graphics toolkit has to draw a glyph that does not exist in the current font, it should automatically use a fallback font that does contain the glyph. So the only requirement is that there is at least 1 installed font on the system that can display the glyph, and I think that's completely reasonable.

Now things get a bit tricky with Swing. Font Linking works out-of-the-box with all the logical fonts (e.g. the "Dialog" font or the "Monospaced" font) and with any font created by a standard L&F like the Windows L&F. Last I checked, Font Linking does NOT work if you create your own non-logical font, e.g. you create a new Tahoma 11pt Font object and assign that font to the text field. It also does NOT work if you use a L&F that uses custom fonts in this way, e.g. JGoodies Looks.

So the questions are (1) are you using a non-standard L&F or (2) are you setting a custom font on the text field in question?

Finally, you can use some internal Sun classes to apply Font Linking to any font (including custom created Font objects). See these posts in the JGoodies Looks mailing list for more info and sample code:

https://looks.dev.java.net/servlets/SearchList?list=users&searchText=wes...

Cheers,
Daniel

Font Linking would solve it in theory, but does not in practice

Font linking would indeed solve the problem in theory, which is what I explained as a possible solution in reply to a different comment (I didn't use the term font linking, but described that there should be a complete default font provided by the Unicode consortium as a fallback).

The problem is: first, this blog entry was not solely about Swing or Java, but about Unicode in general. On Windows there is also font linking, but it just did not work. In Swing, I am using the default platform PLAF (Windows PLAF) with the original font configuration (I neither modified nor replaced a font, nor even chose a particular font in my application). Why is it still not working? I can only assume that the fallback font is just not complete itself. So what is it good for then? Nothing!

The problem is the same: unless there is at least one font on earth that is complete, Unicode is not the solution. So Unicode as an idea is great, and font linking is a great idea too, but the third brick is missing: a 100% complete font. I just have no benefit from a solution that could in theory show me millions of symbols, but in reality shows me lots of missing glyphs.

For the internal Sun classes: I will never use anything that is not an official part of the system, as my idea is to improve the official system. I can do that by reporting problems like this one. If I just fix it silently with some unofficial tricks, then the problem will never get solved, since nobody takes notice of it (I am one of those pedantic guys who have written hundreds of bug reports in Sun's trackers so far).

Works in practice for me :)

Font Linking should work in practice, because it uses multiple fallback fonts. In a decent implementation, it should search every single font installed on the system in order to find the correct glyphs. This has worked for me in real-world apps, displaying all kinds of Asian, Western, and other characters in a single text area, using whatever fonts necessary.

So really not sure why it didn't work for you... perhaps there are some bugs in Sun's Font Linking implementation? Perhaps they only search a few fallback fonts and then give up? If I get a chance, I'll try out a test with the Unicode chars you mentioned. Note that you can paste your "strange" Unicode characters along with English characters into a regular Windows text field, so the Windows native implementation of Font Linking is working well in practice.

BTW, a bug has been filed to add Font Linking to the public API, but Sun considers this low priority: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6407157

Multiple fallback fonts

Can't a multiple fallback font mechanism remove the requirement for a huge single font? For example, if the top layer doesn't have a glyph, it searches in the next layer below it, and so forth. The mechanism should continue to search down the layers until it finds a proper one. This would avoid the need for one huge font and make it possible to install a new fallback font layer incrementally whenever a new one is created. Any views on this?
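A layered lookup of this kind can be sketched in a few lines of Java. This is not how Sun's or Windows' font linking is actually implemented; it is only an illustration of the idea, with each layer reduced to a name plus a "has a glyph for this code point?" test. The class name, the layer names and their code point ranges are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.IntPredicate;

class FallbackChain {
    // Ordered layers: layer name -> "can this layer draw the code point?"
    private final Map<String, IntPredicate> layers = new LinkedHashMap<>();

    /** Appends a layer below all layers added so far. */
    FallbackChain add(String name, IntPredicate hasGlyph) {
        layers.put(name, hasGlyph);
        return this;
    }

    /** Returns the name of the first layer that has a glyph, or empty if none does. */
    Optional<String> resolve(int codePoint) {
        for (Map.Entry<String, IntPredicate> layer : layers.entrySet()) {
            if (layer.getValue().test(codePoint)) {
                return Optional.of(layer.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FallbackChain chain = new FallbackChain()
            .add("base", cp -> cp < 0x100)                          // hypothetical Latin-only font
            .add("technical", cp -> cp >= 0x2300 && cp <= 0x23FF);  // hypothetical symbol font
        System.out.println(chain.resolve('A'));     // Optional[base]
        System.out.println(chain.resolve(0x23E5));  // Optional[technical]
        System.out.println(chain.resolve(0x4E2D));  // Optional.empty
    }
}
```

With real fonts, a layer's test could be `font::canDisplay` for a `java.awt.Font` instance. The last line of `main` shows where the disagreement in this thread lies: the chain only helps if some installed layer actually has the glyph.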

Font Hell

Wouldn't you agree that this would end in a "Font Hell" for both the application provider and the end user? How many fonts will you have to provide to ensure that all data possibly stored in your database can really be rendered on the screen? And how can you guarantee that really all glyphs are there at least once? Who will check this? Wouldn't it be much simpler if there were exactly one font, provided by the Unicode consortium, that we all can rely upon?

It's not a font hell at all. The OS -- if it's working ...

It's not a font hell at all. The OS -- if it's working properly -- does the font fallback as part of its text handling routines. (FWIW, all the funny characters display just fine on Mac OS X, which does font fallback beautifully.) The developer shouldn't need to worry about it, except to make sure that at least one font is installed containing the characters in question so that there's *something* to fall back to.

You're wildly exaggerating the difficulties here. On any OS with decent text handling, Unicode really does Just Work.

As the article itself proves, it is font hell: I just picked ...

As the article itself proves, it is font hell: I just picked some Windows 7 laptop and viewed it in IE 9, and guess what? The symbol for "evenness" (see above) is rendered as a rectangle with a question mark in it. At least Microsoft's OS does not pretend to ship all the symbols or to do the font fallback you describe. Maybe your OS does. Mine does not. Can you explain how a normal user can make sure they see the right symbol? I see only one solution: the author has to provide a font containing that character. And this is exactly what produces font hell.

Hmm ja

Oh come on, this is not fair. Unicode is solving most of today's encoding problems. Of course there will always be some special characters someone can think of that are not included in the default fonts of the Java VM. Do you know how big a font that includes all Unicode characters can get? So find a good one that covers your 5 needed symbols and don't whine. Unicode is a blessing and it was a big step forward. And yes, I need 3 different kinds of snowflakes for my Java ASCII art making use of all Unicode characters available ;-) Greetings from Lörrach. Have fun, - Bernd

This is not unfair

My complaints are not unfair. Let me explain: Unicode is great, actually. At least its idea is. It really solves all my problems. But just in theory. The problem is that unless I can have all the millions of glyphs on the screen, I have no benefit in reality. Why? Because I get files with Chinese characters, technical symbols, and so on, which are just not included in the font. So what is it actually good for if I cannot see them? Since there is no font on earth containing all of the glyphs, what have I won? Nothing. This is not unfair, it is just the actual outcome.

The problem is not Unicode itself, it is the licensing: the problem could easily be solved if there were a requirement that any Unicode font must be complete. To get there, the Unicode consortium would have to provide, for free, one single font containing all symbols in a default styling, which must be used to fill the blank places in the actual font unless it comes with its own glyphs. I do not see any obstacle to doing that, neither technical nor economic. Then Unicode would not just be great in theory but also in practice!

Greetings to Lörrach. If you're ever near Karlsruhe, drop me a note, so we can meet for a beer. ;-)