Unicode™: Write Once, Read Nowhere.
Back in the early '80s "of the past millennium" (as journalists call it these days - don't you feel as old as I do when reading that phrase? For me it is just "childhood" and does not feel so long ago. At least not a millennium ago.), when I was a young boy, I taught myself BASIC programming on my father's Sinclair ZX Spectrum 48K and started coding small arcade games (what else would ten-year-old boys do with a microcomputer? The web had not been invented back then.). Unfortunately, that wonder machine had everything except the possibility to do pixel or vector graphics in pure BASIC. You had to learn Assembler for that. But first, at the age of ten I did not understand what that was good for, and second, I simply had no Assembler software, so I had to stick with BASIC. As arcade games are no fun when they consist solely of alphanumeric characters (not even back in those days), I had to find a way to get smart effects.
What I "invented" (later I learned that it was the default solution to the problem) was to use normal characters, but modify their glyphs (a glyph is the graphical representation of a character: the three bars of an 'A' are the graphical representation of that character, while 65 is its ASCII representation). This could be done easily (I could hardly believe that the trick is actually described on the web). Using this trick, I was able to provide pleasant game graphics without the need to learn Assembler. I just had to type lots of zeros and ones into the machine using its by now legendary rubber keyboard (Don't tell my father - it actually was his rubber keyboard - but I once used it as an eraser, which worked pretty well, and much better than the idea I had at ten or twelve to attach a small light bulb directly to the machine, which effectively killed the Z80 CPU. Possibly the reason why my web site is called Head Crashing Informatics [www.headcrashing.eu]).
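For readers who never typed such zeros and ones themselves, here is a minimal Java sketch of the idea (not the original Spectrum BASIC). The character code stays the same; only the eight rows of bits describing its glyph are replaced. The bit pattern and the names `GlyphDemo`, `SHIP` and `render` are made up for illustration.

```java
public class GlyphDemo {
    // Hypothetical 8x8 glyph: one int per pixel row, one bit per pixel.
    static final int[] SHIP = {
        0b00011000,
        0b00011000,
        0b00111100,
        0b01111110,
        0b11111111,
        0b11011011,
        0b10000001,
        0b00000000
    };

    // Turns the rows into text: '#' for a set pixel, '.' for a clear one.
    static String render(int[] rows) {
        StringBuilder sb = new StringBuilder();
        for (int row : rows) {
            for (int bit = 7; bit >= 0; bit--) {
                sb.append(((row >> bit) & 1) == 1 ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(SHIP));
    }
}
```

Whatever code selects this glyph, the program only ever stores that code; the picture is entirely a matter of the bit table behind it.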
I even kept that trick when I later actually did learn Assembler and moved over to the more popular and powerful Commodore C64, which came with much better graphics support in its BASIC dialect (Frankly, I did not move to the C64 because of the improved BASIC but because of the plethora of computer games available for that machine: Yes, I was part of that long-haired sneakers generation hanging around with my pals playing video games for hours). The trick was typical in the games industry and still worked well, even in conjunction with more sophisticated approaches like sprites.
When I got older, I forgot about computer games and did more "serious" programming. I bought "a real PC" around 1990, wrote business applications, studied informatics, and have made a living from developing "serious" software ever since. So far I have never needed to replace glyphs in a font again.
So what the heck has that to do with Java? Read on.
Some months back a customer told me that he needed to type "ISO 1101" characters into a text field. Well, actually I had no clue what "ISO 1101" is or what the customer's problem was. I expect you don't either, so let me explain. Imagine that you want people to check whether a produced part is actually round (and not elliptic), or actually even (and not wavy). You could write the English words in the task description, but there would be two problems. First, not everybody can read (even in the so-called "First World"). Second, not everybody would understand what "round" means unless you write "round in contrast to elliptic". So the clever guys at ISO defined symbols for "roundness" (○), "evenness" (⏥) and other geometric terms (don't be surprised if you do not see them here). In industrial design and production those symbols are just as common as the "male" symbol (♂) is to everybody looking for a restroom. As the symbols have to be used together with alphanumeric characters in running text, there was a need for a font containing "ISO 1101" characters.
We did not expect "MS Sans Serif" to contain these characters (some European citizens are already happy if they find their particular umlauts in a font, so the chances of finding such specific symbols are low). So what to do? The customer came up with the information that he had bought a special font containing only those symbols, so we had to add a second text field (since that other font did not contain any Latin characters, typographic symbols or numbers). While it was a strange solution, it actually worked and the customer was happy with it.
Another idea we had was to do what I did in childhood: copy the special "ISO 1101" characters into the "MS Sans Serif" font. Unfortunately, first, this is not allowed since it would infringe Microsoft's copyright, and second, there is not enough room in the Microsoft font to host all the new symbols without discarding some other possibly useful character. It was about that moment when I remembered that I had had exactly the same problem on my ZX Spectrum twenty years before, and I noticed that the actual cause of the problem is not the missing glyph but rather the fact that there are just eight bits to select one of them. So when you want to keep the original 256 characters, you simply have no code left over to select any additional glyphs. History repeats itself, damn it!
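The arithmetic behind that dead end fits into a few lines of Java. This is only a sketch under assumptions: ISO-8859-1 stands in for "some 8-bit character set" (it is not the code page of the font in the story), U+232D is the Unicode code point of the cylindricity symbol, and the class and method names are invented.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EightBitLimit {
    // True if the code point fits into a single 8-bit character code (0..255).
    static boolean fitsInOneByte(int codePoint) {
        return codePoint >= 0 && codePoint <= 0xFF;
    }

    public static void main(String[] args) {
        int umlaut = 0x00E4;       // LATIN SMALL LETTER A WITH DIAERESIS
        int cylindricity = 0x232D; // CYLINDRICITY, a geometric tolerancing symbol

        System.out.println(fitsInOneByte(umlaut));
        System.out.println(fitsInOneByte(cylindricity));

        // Assumption: ISO-8859-1 as a representative 8-bit character set.
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        System.out.println(latin1.canEncode((char) umlaut));
        System.out.println(latin1.canEncode((char) cylindricity));
    }
}
```

With eight bits there are exactly 256 codes, and in an 8-bit font every one of them already belongs to some other character.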
As we write all new software solely in Java, we had the idea to rewrite the complete software from scratch, replacing the existing code with Java. As Java is based on Unicode™, and as Unicode™'s goal is to contain all symbols ever invented by mankind, all "ISO 1101" symbols should be found in a Unicode™ font. Unicode™ actually does contain all of them, as can be checked on rainer-seitel.onlinehome.de/unicode-de.html#vorhandene (sorry, German only). You can imagine that we were really happy: as a switch to Java was planned anyway, the solution would come for free.
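If you prefer to ask the JRE itself rather than a web page, a small sketch like the following does the job. The selection of code points is mine (a few tolerancing symbols from the Unicode "Miscellaneous Technical" block), not the customer's list, and the class name is invented. `Character.getName` and `Character.isDefined` are standard Java methods.

```java
public class SymbolLookup {
    // Assumed sample: cylindricity, symmetry, total runout, flatness.
    static final int[] SYMBOLS = { 0x232D, 0x232F, 0x2330, 0x23E5 };

    public static void main(String[] args) {
        for (int cp : SYMBOLS) {
            // Character.getName returns null for unassigned code points.
            System.out.printf("U+%04X %s defined=%b%n",
                    cp, Character.getName(cp), Character.isDefined(cp));
        }
    }
}
```

Note that this only tells you that the code point exists and has a name. It says nothing about whether anything can draw it.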
Write Once, Read Nowhere.
Have you ever tried to type "ISO 1101" characters into a Java program (or into any other Unicode™-enabled software)? Try it, here are some: ? ? ? ?. Don't be disappointed if you see either nothing, or just cryptic placeholders, or only some of the four symbols here. This is what many readers who are not running a Windows® machine will experience.
The problem is that the Java standard declares a "Write Once Run Anywhere" paradigm which only covers the "runnability" but does not define what actually has to be seen on the screen: Java does not guarantee that you will really see the glyphs defined by Unicode™, it only guarantees that the code representation ("the integer value") is processed unchanged. Neither does Unicode™. There is no law that says that an application that is able to process Unicode™ or claims Unicode™ compatibility is also able to render all glyphs on the screen, nor that a Unicode™ font must contain all glyphs. As a result, virtually every font contains only a very limited subset of Unicode™ glyphs, typically only the most frequently used ones (it would just be too expensive to add cuneiform or Egyptian hieroglyphs to every font on earth). You'll have to search for a long time to find a font containing all "ISO 1101" characters, and you'll have to search even longer to find a JRE that comes bundled with that font.
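At least Java lets you ask a font whether it has a glyph before you show a placeholder to the user. The following is a minimal sketch, assuming the logical "SansSerif" font and the same sample code points as before; the class `GlyphCheck` and its helper `missing` are invented names. `Font.canDisplay(int)` is the standard AWT call.

```java
import java.awt.Font;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class GlyphCheck {
    // Returns the code points for which the given check reports "no glyph".
    static List<Integer> missing(IntPredicate canDisplay, int... codePoints) {
        List<Integer> result = new ArrayList<>();
        for (int cp : codePoints) {
            if (!canDisplay.test(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Logical font name; the physical font behind it depends on the platform.
        Font font = new Font(Font.SANS_SERIF, Font.PLAIN, 12);
        int[] symbols = { 0x232D, 0x232F, 0x2330, 0x23E5 }; // assumed sample
        for (int cp : missing(c -> font.canDisplay(c), symbols)) {
            System.out.printf("No glyph for U+%04X in %s%n", cp, font.getFontName());
        }
    }
}
```

What this prints differs from machine to machine, which is exactly the point. For whole strings, `Font.canDisplayUpTo(String)` reports the index of the first character without a glyph, or -1 if all of them can be displayed.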
The end of the story.
So we're back where we started more than twenty years ago: If you want to have a smart GUI showing pleasant symbols and not only "ABC" and "123", you have to tweak your existing fonts with manually added glyphs. Sad, but true. BTW, for whatever reason, many fonts contain three symbols for different types of snowflakes: ❄❅❆. Maybe different types of snowflakes are more essential for mankind than geometric tolerances.
Speaking of snowflakes: as I am already disappointed and frustrated, let's invest some time in Sisyphean work. Winter is back in the Black Forest, so I'll grab my snow shovel and try to dig out my car. I am rather sure that I left it somewhere around here this morning...