Ben Galbraith's Blog

J2SE Archives


C# 3.0: Relational Language Operators, Type Inference, and More

Posted by javaben on September 13, 2005 at 02:01 PM | Permalink | Comments (7)

I'm here at Microsoft's Professional Developer Conference (PDC), the Big Redmond's irregularly scheduled conference where they introduce new technology to developers. It's a pretty surreal experience. I'm using the only Mac as far as the eye can see. It's kind of like JavaOne in a parallel universe.

This morning, Bill Gates and Jim Allchin got up and talked about a lot of the old stuff we've been hearing about for years now: Avalon/WPF for graphics, Indigo/WCF for services, etc. For those not following Microsoft, these are really cool technologies, but... old news.

The really interesting action happened when Anders Hejlsberg and Don Box got up and demonstrated Microsoft's LINQ project. It turns out that in a future release of .NET (3.0), Microsoft will embed relational operators into the language itself. To understand what this means, check out this C# code:

public void Linq1() {
    int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

    var lowNums =
        from n in numbers
        where n < 5
        select n;

    Console.WriteLine("Numbers < 5:");
    foreach (var x in lowNums) {
        Console.WriteLine(x);
    }
}

Err, that's interesting. But hang on, it turns out you can mix and match this code with data from relational databases, all using the same operators. Check this out:

public void Linq1() {
    int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

    DataContext context = ...;  // don't worry about how I get a ref to this
    Table<MyObject> myTable = context.GetTable<MyObject>();

    var lowMyObjects =
        from n in numbers, o in myTable
        where n < 5 && o.MyNumber == n
        select o;

    Console.WriteLine("MyObjects whose number is less than five:");
    foreach (var o in lowMyObjects) {
        Console.WriteLine(o.MyName);
    }
}

Wow. I can seamlessly combine my relational data with my object data. This is just scratching the surface of LINQ (for example, XML is also a first-class type of C# 3); check it out. They've given out pre-release bits; I'll definitely be playing with this stuff.

Oh, did you happen to notice this code snippet in one of the earlier examples?

var lowNums = ...

This is not VB code. C# 3.0 introduces type inference. That's also pretty interesting. Some folks have been asking for years why Java doesn't do this. That is, if an IDE can figure out how to automatically write the (Cast) operator, why can't the compiler?

I remember asking one of the compiler guys at JavaOne about why they don't introduce type inference in future versions of Java; his answer: "Sounds like you want to use a dynamic language." I'm sure glad some folks understand that you don't need to throw away strong typing to avoid writing the type of a variable over and over again, unnecessarily.

Check out some of the other new features, announced today and coming to some future release sometime in the distant future.

To be clear: this is not a "Run from Java and embrace .NET" post. Rather, I'm excited to see the Java space innovate to keep up with some of these and other really intriguing new C# features. It's fun to watch the game of Java/.NET leapfrog play out...

(Check out www.ajaxian.com later today for a blog on some of the Atlas/Ajax stuff that Microsoft announced today...)

(Once again, cross-posted to Married... with Children)



XML, Readers, and Streams: A Cautionary Tale

Posted by javaben on September 03, 2005 at 10:47 PM | Permalink | Comments (5)

(Note: this entry is cross-posted on my personal blog site -- galbraiths.org/blog.)

If a system's glitches can be compared to fish, I want to tell you about my white whale.

A while back, I was working on a system feature that read in some XML from the filesystem, XSLT'd it into HTML, and served it up to a browser. The XML had a bunch of characters from the higher Unicode ranges (i.e., >255), and wouldn't you know, when viewed in a browser, these characters showed up as garbled data. Not "The Box"--that ugly little placeholder used when a font doesn't contain a character for a given code point--but usually one to three seemingly random characters that had nothing to do with the character that was supposed to be displayed.

Classic encoding problem.

For the uninitiated in character encodings, let me fill you in real quick. Disks store bytes, not characters. A byte is a numeric value between 0 and 255. To store characters on disks, a convention is used to map the numeric values of bytes to characters. In the early days of computing, we kept things simple and said that there could be no more than 256 different types of characters stored in files. Lately, we've taken to storing over 60,000 different types of characters. How do we represent that many values with just a byte?

Actually, that depends. An exceedingly large number of different conventions exist for mapping >256 characters to bytes. What all of these systems have in common is that multiple bytes are used to represent a single character. Two bytes used together can represent 65,536 unique characters; with three bytes, bump that up to over 16 million.
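
To make the byte arithmetic concrete, here's a minimal sketch in Java that counts how many bytes a few sample characters take up in two common encodings. The class and method names (EncodedLengths, utf8Length, utf16Length) are made up for illustration, and it leans on java.nio.charset.StandardCharsets, which only arrived in Java 7, so treat it as a modern-day illustration rather than anything from the system I was working on.

```java
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    /** Number of bytes UTF-8 needs to store the given text. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes UTF-16 (big-endian, no byte-order mark) needs to store the given text. */
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Sample characters: Latin capital A, e with acute accent, euro sign.
        String[] samples = { "A", "\u00e9", "\u20ac" };
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8=%d bytes, UTF-16=%d bytes%n",
                    (int) s.charAt(0), utf8Length(s), utf16Length(s));
        }
        // How many distinct values fit in two bytes and in three bytes.
        System.out.println("2 bytes: " + (1 << 16) + " values; 3 bytes: " + (1 << 24) + " values");
    }
}
```

One thing the sketch makes visible: UTF-8 is variable-width, so plain ASCII stays at one byte per character and only the higher characters grow to two or three bytes.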

And therein lies the rub. Files don't indicate the encoding used within them. Indeed, there's no guarantee that the files store character values at all. The user must know what to expect within the file, and if it's character data, they must know what encoding was used to store it.

Back to the story. I knew it was an encoding glitch; multiple characters showing up in place of one is a classic symptom (because multiple bytes represented the character, but the parser treated each byte as a unique character). I immediately assumed that the browser or the servlet (or the web framework on top of it) was to blame. I spent a lot of time educating myself on how encodings work over the web. I threw hours at the problem here and there and came up empty handed each time.
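
For anyone who hasn't seen that symptom up close, here's a small sketch that reproduces it on purpose: it encodes a character as UTF-8 and then decodes the same bytes as ISO-8859-1, a single-byte encoding. The MojibakeDemo class and its misdecode method are hypothetical names for this illustration, and ISO-8859-1 is just one example of a wrong guess; it is not necessarily the encoding that bit me.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Encodes text as UTF-8, then wrongly decodes those bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        // Each byte is treated as its own character, which is the bug.
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent: one character, two bytes in UTF-8
        String garbled = misdecode(original);
        System.out.println("characters before: " + original.length());
        System.out.println("characters after:  " + garbled.length());
    }
}
```

Pure ASCII text survives this round trip untouched, which is exactly why the problem can hide for so long.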

And then, whilst reading through some of the backend code, I saw this innocuous little line:

Document document = new SAXBuilder().build(new FileReader(file));

See the problem? Look again. Notice the FileReader? I'm such an idiot. Here's the deal. XML files can contain any of thousands of different Unicode characters and can use a bunch of different encodings to map those to bytes. The encoding used on a particular XML document is indicated in the prolog, such as:

<?xml version="1.1" encoding="UTF-8"?>

I don't really use XML 1.1; I just put that in to piss off Elliotte. ;-) Note the encoding. Now, back to our FileReader. Readers in Java are nice because they handle converting bytes into characters automatically. But in order to do that, they have to know what encoding was used on the bytes they are being handed. If you don't specify an encoding, a Reader will use the operating system's default encoding.

Ahhh, and there's our problem. PCs, Macs, *nix, they all use different encoding schemes by default, and they ain't UTF-8 (actually, on some *nixs it might be, I dunno). My XML files were UTF-8 encoded. So when I used a Reader to parse my XML file, the XML parser was misinterpreting many of my characters.

This is the code I should have written:

Document document = new SAXBuilder().build(new FileInputStream(file));

If you hand an XML parser bytes, which is the currency of InputStreams, the parser handles converting those bytes to characters itself, and uses the encoding in the XML prolog to configure itself for that process. If you hand it characters... it's stuck using those characters and can't affect the decoding process one whit, since it occurs a level beneath it.
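
Here's a rough, self-contained sketch of the difference. To keep it runnable with nothing but the JDK, it swaps JDOM's SAXBuilder for the standard JAXP DocumentBuilder, and it parses an in-memory byte array instead of a file. The Reader is pinned to ISO-8859-1 to stand in for a platform default that isn't UTF-8 (a FileReader would pick up whatever the platform default happens to be), so the result is the same on every machine. The class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlDecodingDemo {
    // A UTF-8 encoded document whose root element contains "caf" plus e-acute.
    static final byte[] XML_BYTES =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>caf\u00e9</name>"
                    .getBytes(StandardCharsets.UTF_8);

    /** Parses the source and returns the text inside the root element. */
    static String rootText(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(source).getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    /** Hands the parser raw bytes, so it decodes them using the encoding in the prolog. */
    static String parseFromBytes() {
        return rootText(new InputSource(new ByteArrayInputStream(XML_BYTES)));
    }

    /** Hands the parser characters that were already decoded with the wrong charset. */
    static String parseFromReader() {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(XML_BYTES), StandardCharsets.ISO_8859_1);
        return rootText(new InputSource(reader));
    }

    public static void main(String[] args) {
        System.out.println("from bytes:  " + parseFromBytes().length() + " characters");
        System.out.println("from Reader: " + parseFromReader().length() + " characters");
    }
}
```

The byte-based parse yields the four characters I put in; the Reader-based parse yields five, because the damage was done before the parser ever saw the prolog.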

It turns out this is a rather insidious bug. Because most encodings are the same in how they assign the characters mapped to byte values 0-127 (since the ASCII standard was so pervasive), and because those are by far the most common characters for most folks here in the United States, you can go a long way with character encoding bugs like this and never know any different. But the day you add a higher value character... weird things happen.

Learn from me. Spare yourself the pain of wrestling with this one yourself. Make me feel my time was well spent. Never, ever use a Reader to parse in an XML file. There's already a great system for letting the parser handle the decoding; let it.



Suddenly, it all makes sense...

Posted by javaben on May 12, 2005 at 07:54 AM | Permalink | Comments (4)

Kudos to Dion for being the first I've seen to point this out. Says he:

Interestingly, it seems like it was lead by good 'ole Geir Magnusson.

Maybe nothing was read into that when the proposal came out. But now, Geir is an IBMer.

With the acquisition of Gluecode, GM Jr. now works for IBM. Should Harmony succeed--and as Dion points out, IBM certainly has the IP for that to happen--what becomes Sun's role?

Eclipse is the dominant IDE, Harmony would be the dominant VM, Geronimo could be the dominant app server... the JCP and Sun's reference implementations could become somewhat irrelevant to a large swath of the community.

I'm not a conspiracy theorist, and this whole line of reasoning is probably just bunk, but... things are getting interesting. Oh, right, forgot, the name "Eclipse" was just a coincidence, had no intended meaning towards Sun... ;-)



Java in Your Stereo?

Posted by javaben on January 17, 2004 at 04:19 PM | Permalink | Comments (7)

For the past few weeks, I've been playing with MacSense's new HomePod device. The HomePod is a compact MP3 player with WiFi built-in, a scroll-wheel interface not unlike that of Apple's iPod, and peer-to-peer media streaming software developed by Gloo Labs.

Unfortunately, the HomePod user interface does not work like the iPod. After playing with it for a few days, the differences between the iPod and the HomePod really began to bother me and my wife. If the HomePod were like any of the other consumer electronic devices in our home, we'd just live with it. But in this case, I can take matters into my own hands: the HomePod runs Java!

I downloaded the source code to the HomePod, tweaked the interface code in a few hours, FTP'd the new class files to the device, rebooted, and shazam! The HomePod user interface now behaves just like the iPod's. Cool! This is one of the first times I could take my career skills and actually do something useful around the house.

The HomePod isn't the first consumer device I've owned that allowed me to run Java on it -- I'm still recovering from the pains of MIDP 1.0 -- but it is the first Java device that I've enjoyed playing with. I think that enjoyment is related to the following factors:

  • The HomePod runs J2SE 1.3. I don't have to do HomePod development in a subset of the "real" Java that I'm used to working in (i.e., J2ME).
  • The HomePod exposes everything to developers. The HomePod's user interface, its networking code, the application that streams music from my servers to the HomePod, it's all in Java! How refreshing this is compared to today's Java-enabled cell phones which continue to expose a subset of handset functionality to Java developers. I can tweak this thing in every way imaginable. There are some components written in C (device drivers, audio codecs), but even in those cases, the source is still made available.

The advantage of owning a device I can modify became especially apparent to me when the Wall Street Journal's Walter Mossberg recently reviewed another WiFi streaming home MP3 player. While he generally gave the product high ratings, at the end of his review, he had a stinging criticism:

[This] system doesn't let you just select "All Tracks" and then play them randomly, as you can on an iPod portable player... [This feature's] omission is a real loss.

While I fixed the HomePod's lack of "iPod fidelity" in a hurry, if I had a closed-source media player like the one reviewed, I'd be stuck singing the blues with Walt.

After my glowing experience with the HomePod, you can bet I'll be shopping for more Java-powered open-source consumer devices in the future.




