The Source for Java Technology Collaboration
User: Password:



Ethan Nicholas's Blog

J2SE Archives


Java Secrets Revealed #1

Posted by enicholas on April 29, 2008 at 12:19 PM | Permalink | Comments (18)

I know, I know, it's been far too long since I've made an entry. My younger son is ten months old now, so I suppose I should probably stop using "new baby" as an excuse for my laziness...

Ahem.

Before I joined Sun, I thought I knew a lot about Java. I had been using it for a decade and had dug into its innards more times than I could count. Anytime I ran into inexplicable Swing weirdness or whatnot I wouldn't hesitate to dive into the JRE's source code and study it, or even recompile the classes with my own diagnostic code added. I wrote my own classloaders, I manipulated bytecode on the fly, I even wrote my own compiler for a JVM-targeted language. I had earned the right to call myself a guru.

Or so I thought.

Joining Sun nearly two years ago was a humbling experience. You see, it turns out that knowing a lot about Java works as a third-party developer is very different than, say, having to figure out how to rip the JRE apart and reassemble it on the fly without running programs noticing (Java Kernel, for the uninitiated). I have had to learn more about Java's inner workings than I ever really wanted to know, and maybe you'll find some of it interesting. Towards that end I'm going to pick a couple of random topics to blather about here, with the intent of hopefully making this a semi-regular feature.

Why can Java Web Start specify JRE versions, but the Java Plug-In can't?

If you have worked with both JNLP programs and applets, you are no doubt aware of the incongruities. JNLP programs can specify which JRE version they need to run with, their memory settings, command-line arguments, and so forth. Applets, on the other hand, are stuck with whichever JRE is registered with the web browser, and have no control over any JRE settings. (JRE Settings can be changed via the Java Control Panel, but cannot be specified by or for individual applets.)

The limitation arises because the JRE which handles applets runs inside the web browser. It lives within the browser process and address space, and as far as the OS is concerned is merely another chunk of the browser's code, just as with any other plug-in. And you can't simply load more than one JRE into the same OS process, because they would have conflicting symbol definitions, entry points, and so forth. It would be like trying to boot two different operating systems on the same computer, without the benefit of (very sophisticated) tools like VMWare.

To fix this, you've got to run the JRE in a separate process, but have the applets appear within the web browser window. This, of course, introduces all sorts of challenges and requires some clever engineering, but fortunately people smarter than me were assigned to the task. A group led by Ken Russell has done just that, resulting in what is officially (and wordily) named Next-Generation Java™ Plug-In Technology.

The new plug-in behaves much more like Web Start, in that you can use JNLP files to specify JRE versions, memory settings, and command line arguments. It's smart enough to consolidate multiple applets into the same JRE if their settings are compatible, or spawn additional JREs as needed to make everyone happy. It also has some extremely cool tricks up its metaphorical sleeve which we will be revealing at JavaOne.

What is Class Data Sharing?

Prior to joining Sun, I had read a paragraph about Class Data Sharing somewhere, but didn't know much about it. Since then I have found that pretty much nobody outside of Sun seems to know anything about it either. That's a shame, because it's actually quite neat.

One of the JRE's biggest jobs when booting up is classloading. Hundreds and hundreds of classes are needed just to get the JRE up and running, and not just the obvious ones like Class, Object, and String. You're also going to need URL and its entourage (for URLClassLoader), PrintStream and related I/O classes (for System.out and System.err), lots of different collection and utility classes, reflection support, charset support, and hundreds more.

There are two huge drawbacks to this: first, the JVM has to parse the Java class file for each of these classes, as well as resolve and link the symbols, and (for commonly-used methods) compile the methods using HotSpot. And, of course, all of this work happens every time the JRE starts up. Second, because each individual JRE is parsing and possibly compiling the code independently, they all end up with their own independent copies of the resulting memory structures.

To combat this problem, Java 5 introduced a new feature called Class Data Sharing. The idea is that the JRE does all of the basic classloading and parsing just once, and stores the resulting memory structures in a file (bin/<jvm>/classes.jsa, with jsa standing for Java Shared Archive). The next time the JRE boots, it simply maps this file into memory, and can skip all of the messy classloading. In addition to performance, another benefit is the fact that a big chunk of the mapped bytes can be shared by all running JREs, so they do not each need an independent copy of all of the code.

Of course, as with everything the devil is in the details. Some of the classes in the archive perform initialization which isn't guaranteed to alway be the same (the AWT classes, for example, will do different things depending upon your display configuration), and I'm told that there are enough such cases that the feature was a lot trickier to implement than it might sound. Plus you've got to detect the cases where the rt.jar file has been modified, or the boot class path has been overridden, or something else has changed which makes the inherent assumptions burned into the classes.jsa file incorrect, so that class data sharing can be disabled for that particular JRE invocation.

If you use diff or a similar tool to compare JRE directories from various machines running the same JRE version, you'll most likely find that the classes.jsa files, and only those files, are different. That's because classes.jsa is actually generated on your machine, instead of packaged with the installer. One of the last things the JRE installer does is run the magic incantation java -Xshare:dump, which causes the shared archive to be generated. That way we don't have to increase the size of the installer further, and I don't know for sure but I suspect that some aspects of the file may be machine-dependent which would necessitate this approach anyway.

Until next time...

Hopefully that little look inside wasn't too boring. Provided anyone is interested, I'll continue to share tidbits about the inner workings of Java in future installments. Unless of course I forget, or get sidetracked...



Java Kernel Unmasked

Posted by enicholas on May 24, 2007 at 08:40 AM | Permalink | Comments (35)

In my last entry, I briefly introduced the major features of the upcoming Consumer JRE. I'd like to now go into details on my pet project, code-named Java Kernel.

Overview

As previously mentioned, the idea is to create a 'minimal' JRE which has enough code to run System.out.println("Hello world!") and... well, that's about it. Every class or native library that isn't strictly necessary to boot up the JVM is excluded.

This minimal JRE has a few tricks up its sleeve, of course. It can detect when you try to access a class, such as javax.swing.JFrame, which isn't currently installed. It will then go download and install a "bundle" containing the required functionality. As far as your program can tell, nothing unusual happened -- it requested javax.swing.JFrame, it got javax.swing.JFrame. The only real difference is that (due to the required download) the classload took longer than usual.

User Interface

Naturally, we display a progress dialog for any downloads taking a meaningful amount of time. If you use a freshly-installed Kernel JRE to run a Java program, you'll see a dialog telling you that a few components are being downloaded, and then the program window will pop up and life will continue as normal.

You usually won't see any other progress dialogs -- most programs download everything they need before the main window shows up. Even with the ones that don't, Swing and AWT are by far the biggest bundles you will end up downloading, and both of them will be there before the main window appears. The other bundles are mostly quite small and won't involve an objectionable delay (and, of course, if the delay is short enough we don't pop up a dialog at all).

Other than this, the Kernel JRE looks and feels exactly like any other JRE.

Bundles

The Kernel JRE is currently divided into a hundred or so different bundles. These bundles generally follow package boundaries -- if you touch any class in (say) java.rmi, the entire java.rmi package will be downloaded. This means you'll end up downloading more classes than strictly necessary to run your program, but the alternative, downloading classes one-by-one, would be ridiculously slow due to all of the individual HTTP requests involved. We are trying to strike the proper balance between reducing the number of bytes downloaded and reducing the number of HTTP requests made.

Some bundles involve more than one package. javax.swing, for example, is entirely useless without javax.swing.event and several other packages. Since they are so tightly interconnected, they are packaged together into a single bundle. A few bundles don't cleanly follow package lines. In java.awt, for example, it makes sense to separate out the subset of AWT used by Swing programs. A Swing program isn't likely to touch AWT components like java.awt.Button, so we have a separate bundle (internally named java_awt_core) which includes only the AWT classes that a typical Swing program would use.

Still not small enough...

We've got other space-saving tricks, as well. Take a look at one of the core, absolutely essential files in Java 6: jvm.dll. This is (obviously) the JVM itself, needed to run all Java code. It's 2.3MB. And that doesn't include any classes, launchers, the installer, the Java Plug-In, Java Web Start, or any of the other essential JRE features. When you're trying to deliver an entire JRE in under 2MB, the fact that one of the required files is 2.3MB puts you at a pretty severe disadvantage.

Compression helps, obviously, but it takes more than a good compressor to squeeze things down this small. Java Kernel has its own version of jvm.dll, which omits a lot of optional features like JVMTI and additional garbage collectors. The current prototype's jvm.dll is a much more svelte 1.1MB. And when the Kernel JRE finishes downloading itself in the background, it will swap in the good old full client JVM, so you won't be without these optional features for long.

Background Downloading

The Kernel JRE will continue to download its missing bundles in the background, whether they were specifically requested or not. Over a broadband connection, this will only take a couple of minutes, so the window of time during which you might run into missing bundles is brief.

After the last bundle is downloaded, the Kernel JRE will reassemble itself into an exact replica of the "normal" JRE. All of the disparate bundles will be repackaged into a unified rt.jar file, the Kernel JVM mentioned above will be replaced with the traditional client JVM, and so forth. A "finished" Kernel JRE will be byte-for-byte identical to a "normal" offline JRE.

But what if I want to pre-download everything I need?

The single most frequently asked question is "Can I force the Kernel JRE to go ahead and download everything I need, so that there are no pauses or download progress dialogs while my program is running?"

I mentioned during my JavaOne session that we were well aware of the need for this, and working on a solution, but that we weren't ready to discuss it yet. I'm pleased to announce that the plans for this have been finalized (well, as final as anything gets in the software industry...) and I can reveal them now.

The JDK will include a tool which allows you to assemble a "custom bundle" containing all of the classes and files needed by your particular program. You determine the entire set of JRE classes needed by your program (for instance by running java -verbose or by using a static analyzer) and then use this list to create the bundle.

(Command names and options likely to change)

> java -verbose -jar MyProgram.jar > class_list.txt
> jkernel -create custom_bundle.zip -classes class_list.txt

You can then install this bundle into a freshly installed Kernel JRE:

> jkernel -install custom_bundle.zip

You can run the jkernel -install command as part of your program's installation or startup. With a custom bundle installed, you can rely upon the absolute minimum set of classes and files needed to support your program, and thus get the smallest possible download size.

This isn't yet optimal for applets or web start programs, as (unlike standalone programs) they don't have the ability to install the bundle before they start to execute, and thus before any bundles are automatically downloaded. Ideally I'd like the ability to simply specify "And my program needs this custom bundle, also" in the applet tag or JNLP file somewhere -- the only question is whether we'll be able to get this into the first release or not.

Results

Remember how the Java 6 jvm.dll is 2.3MB by itself?

The Kernel JRE's installer includes jvm.dll, the other native files and hundreds of classes needed to boot the JVM, the Java Plug-In, Java Web Start, java.exe, javaw.exe, javaws.exe, the installation code, and various support libraries needed to support the installer (such as unpack200).

And it's only 1.9MB.

If you build a custom bundle containing the classes required to run a typical Swing program, it comes out to about 1.5MB, for a total download of around 3.4MB for the JRE + custom bundle. Bigger programs might use as much as 4MB-5MB of the total JRE size, but it would be rare to exceed that.

Compared to the current JRE's size of somewhere between 10MB and 15MB, depending on how you measure it, hopefully you will agree that this is quite an improvement.

So, I'm sure you've got lots of questions for me. Shoot.



Announcing the Consumer JRE (again!)

Posted by enicholas on May 17, 2007 at 12:51 PM | Permalink | Comments (24)

When Steve Jobs announced the iPhone at MacWorld, Mac fans were understandably upset that no other announcements were made. There was nary a mention of Macs, Mac OS X, or iPods -- and disgruntled fans pointed to this as evidence that Apple was ignoring these products.

A few of the saner voices in the audience took the stance that since nothing could possibly have competed with the iPhone announcement, there was no point in Apple even trying to talk about anything else until the iPhone furor died down. After seeing what happened at JavaOne, I'm inclined to agree with this particular theory.

The "iPhone effect" has struck again -- only this time it's the "JavaFX effect". We announced a bunch of exciting things at JavaOne 2007, but the news of JavaFX has inspired so much coverage and discussion that it's hard for anything else to get any press time.

The other big announcement, the one you might not have seen much (or any) coverage of, was the Consumer JRE. The Consumer JRE is a release of Java 6 targeted at making the end-user experience better, meaning smaller downloads, faster installs, better graphics performance, smoother installation, faster startup, better reliability, and a bunch of other nice enhancements.

The best part is that I've seen several references to a "rumored" Consumer JRE release. Considering that we publicly announced the Consumer JRE in front of thousands of developers, I think we can safely move this particular 'rumor' into the "confirmed" column.

In case you missed my JavaOne session about the Consumer JRE, here's what you should know:

  • The Consumer JRE will be a Java 6 update release delivered in the first half of 2008.
  • It features performance and usability enhancements geared towards easier, better, faster end-user distribution.
  • Will include the Java Technology Deployment Toolkit, a suite of technologies enabling much simpler JRE detection and installation.
  • The JRE is being modularized, so that bits and pieces of it can be downloaded as needed. In the current prototype, the download needed to support a typical Swing program is between 3 and 4MB.
  • Java Quick Start Service will pre-load portions of the JRE into the system disk cache, substantially decreasing the average start-up time.
  • A new and improved installer will streamline, simplify, and speed up the installation process.
  • Future updates will be delivered in-place -- you will no longer inadvertently end up with fifteen different versions of the JRE on your system.
  • Some of these features may be delivered sooner than others.

I'll go into more details about these specific features in the near future. But for now, at least be aware that the Consumer JRE is anything but a rumor.

Also, take note of this poll. Despite the dearth of coverage, it sounds like at least some folks caught the announcement.



What you should know about Secure Static Versioning

Posted by enicholas on October 06, 2006 at 06:45 AM | Permalink | Comments (2)

Sun recently released a Java security advisory titled Java Plug-in and Java Web Start May Allow Applets and Applications to Run With Unpatched JRE. As with any security advisory, it's important that you take note and ensure that your software is up-to-date -- in this case, the problem is fixed in Java 1.5.0_06 or higher. But a blog entry by the Washington Post's Brian Krebs raises some concerns about this particular security advisory and suggests that merely patching your machine may not be enough.

The security advisory admittedly wasn't very clear on this point, but if you have Java 1.5.0_06 or higher installed (which I imagine most of you do by now), you are in no danger from this issue, even if you also have older versions of Java on your system. While that's obviously good news, the fix for this issue may have an impact on some Java developers. If you deploy Java applets or unsigned Web Start programs, keep reading.

Java has always allowed you to keep multiple versions installed on the same computer at the same time. Whether you love this particular feature or hate it, there's no denying that it can be useful at times. Certainly as a developer I appreciate being able to have Java 5 and Java 6 on the same machine at the same time, and I know enterprises prefer to be able to certify their internal software against exactly one version of Java without worrying about employees accidentally changing the Java version out from under them.

But as with all significant decisions, there is no single right answer. What's right for developers and enterprises isn't necessarily as great in the consumer market, where users would generally prefer not to leave old versions of the software lying around after an upgrade. Unfortunately we can't just suddenly start removing older versions -- because Java has always worked this way, many programs depend on this behavior. They may take the Java VM's location and "burn it in" to their launch scripts or registry entries, and removing the specific version on which a program depends would then cause it to break. Various ideas on breaking out of this vicious cycle have been tossed around, but for the time being, this is just the way things work.

Unfortunately, it also led to a security hole (again, fixed in Java 1.5.0_06).

Sun has always responded promptly to any security problems with Java, quickly releasing a patched version which fixes the problem. But since the previous unpatched and unsafe versions have been left on your hard drive, a malicious program could potentially get access to them. Theoretically, a Java applet could have requested a specific older version of Java and used a known (and long-since patched) security hole in it to compromise your system, even though you had dutifully downloaded the security update. Clearly, this was a situation that had to be addressed.

The solution is what is known internally as Secure Static Versioning. Basically it boils down to "untrusted code can't use unpatched older versions of Java". Previously, if you had several versions of Java installed on your system, a program could request one of the older versions in order to exploit it. Now, no matter which version of Java an untrusted program requests, it will receive the highest installed version. The practical upshot of this is that untrusted code will always be run in the most secure Java environment available.

Signed, trusted Web Start programs are allowed to request whatever version of Java they want -- they're trusted, so they have full access to your system anyway. Applets (even if signed) will always be run in the latest version. The discrepancy is due to technical differences between Web Start and the Plug-In; I'll go into the details in a future entry.

With the quick overview out of the way, let me sum up in an FAQ:


1. How much danger am I in?

If you have the latest JRE (1.5.0_06 or later) installed on your system, none at all. All current JREs include the Secure Static Versioning change, so applets and sandboxed Java Web Start programs are unable to access older versions of Java.

If you don't have Java 1.5.0_06 or higher, you would be wise to go ahead and install the latest version. I'm not aware of any exploits "in the wild", but it certainly doesn't hurt to be safe.

2. Do I need to remove old versions of Java?

No. Applets and sandboxed Java Web Start programs aren't allowed to access older versions of Java. They don't pose any threat in the first place, so removing them won't change anything. If you do decide to remove older versions of Java, you need to be sure that none of the applications on your computer are using them first.

3. Do I have to change anything about my Java programs?

No. If you have a Java applet or sandboxed Java Web Start program, it will always run in the latest available version of Java without any changes on your part. As long as you test your software against newer versions of Java, you shouldn't encounter any problems.

4. What does this mean for standalone Java programs?

Nothing. Just like native programs, Java programs which run outside of the Java sandbox already have full access to your system. Requiring them to run in the latest version of Java wouldn't do anything to increase security.


So, that's that. What might have sounded like a scary security issue (OMG YOU NEED TO UNINSTALL JAVA!!!!11!!!1) turns out to be quite mundane: install the latest patch release of Java and you're completely safe. This is of course great advice no matter what software we're discussing -- you are running with all of the latest patches for your operating system, web browser, Flash Player, and everything else on your system, aren't you?

Oh, and while we're on the subject of updates...

If you update using Java Update, rather than downloading via java.sun.com, you may find that you don't end up with the very latest update release. That's normal, and it's a result of the fact that Sun distinguishes the latest "consumer version" of Java from the latest "developer / enterprise version". If you go to java.sun.com, you get the latest developer / enterprise version, which is the very latest and greatest code available. If you go to www.java.com or use Java Update, you end up with the latest consumer version, which might be a couple of update releases behind the developer version (but is otherwise identical).

In order to avoid swamping end-users with too many updates, only two or three versions of Java a year are made available as consumer versions. Naturally, security fixes (such as the one under discussion here) always warrant a new consumer version, so you're safe no matter how you choose to update your copy of Java.



"Java Browser Edition": New name, first steps

Posted by enicholas on September 06, 2006 at 01:28 PM | Permalink | Comments (76)

"Java Browser Edition" History

Some time ago I proposed the idea of a Java Browser Edition. The basic idea was that the current Java Runtime Environment is simply too big, and most programs require only a small subset of the functionality. The "browser edition" that I suggested would enable you to install exactly the subset of Java that your particular program required, but would be able to download all of the other functionality on demand (and thus be fully compatible with J2SE). I wasn't quite prepared for the response that this entry generated. In addition to generating a lot of comments and further discussions, it ultimately played a role in my getting hired by the deployment team at Sun.

I was cautioned by several folks at Sun that the Browser Edition would simply never happen. It would never be approved as a feature in the first place, and even if it were approved, we would never be able to actually pull it off. I'm told that this basic idea has actually been attempted within Sun twice before, and in both cases the resulting size reduction wasn't enough to be worthwhile. The core VM, it seems, is simply too big, and trying to make it smaller is too hard. There has even been a detailed analysis of the idea which paints a rather bleak picture of the potential gains.

New Name: Java Kernel

The feature did in fact get submitted as a proposal for Java 7, under the name "Java Kernel" (the idea being that you download a small "kernel" of Java functionality, which is in turn capable of downloading the rest of it). And, amazingly enough, it was accepted. And, lucky me, I'm part of the team responsible for implementing it. After having been told that it's been tried a couple of times before and that it's basically impossible -- not a situation which inspires tremendous confidence.

Building a minimal JRE

The first thing I have to do is establish that this project is feasible. Remember that even though it has been approved, it could alway be un-approved (disapproved?) at any point in the future if things aren't looking good. So I figured I would start out by creating a simple, stripped-down JRE installer that contained only the functionality necessary to run System.out.println("Hello world!"), to get an estimate of the size reduction we could expect. Stripping out the unecessary classes is easy -- you just run Java with the -verbose option to get a list of all of the classes it loads while running the Hello World program. Those classes are all that we need to include in rt.jar.

The real problem is the rest of the functionality. My devel build of the Java 6 JRE contains 683 files totalling 119MB. Many of them are not necessary to run Hello World, but which ones? Determining which files were truly necessary and which weren't could be a tough job, so I made my computer determine it for me. I wrote a simple program which would iterate through all of the files in the JRE. It would remove a file and then attempt to run the Hello World program using this stripped-down JRE. If the test succeeded, the file was evidently unnecessary. If the test failed, the file was deemed necessary and restored.

After going through all of the files in this fashion, I was left with an extremely minimal JRE that could run Hello World and... well, that's about it. But it at least provided a starting point. Building a working installer from this JRE was itself a challenge, because several of the files that weren't necessary to run Hello World were still necessary to successful install the JRE, but I persevered and now have a fully functioning, minimal JRE.

Results

I built two JREs using this methodology: one with a program that prints "Hello World" to System.out, and one with a program that displays an empty java.awt.Frame. Here are the results:

Java 6 Runtime Environment:15.5MB
"Hello World" JRE:2.6MB
java.awt.Frame JRE:3.5MB

Things to note

Before you get excited, remember that this is just an experiment and that the JREs I built aren't the least bit useful. They don't include the Java Plug-In, Web Start, or indeed much of anything, and any "real" program will need at least some of these components. These JREs also do not have the ability to download the missing components, and will simply fail if an attempt is made to access missing functionality. The installers we ultimately ship with Java 7 may well be bigger than this.

Next steps

Despite the cautions above, I find these results extremely exciting. Keep in mind that so far we haven't done anything the least bit sophisticated -- just omitted unnecessary files and classes -- and we've already gotten the JRE below 3MB for a non-visual program. Classes compress extremely well, so this installer would stay under 3MB mark even with a lot of additional classes included. And there are still a lot of things we can do to improve the size further, such as break up big DLLs to get better granularity.

It's hard to say how big the final Java 7 installers will end up being, but my personal goal is to make an installer that can handle basic Java applets in under 3MB. This is a difficult goal, and it may end up being too optimistic, but we're going to get as close as we can. So, what do you think? If the Java installer were 3MB instead of 15MB, would you find the idea of using Java applets (or Java Web Start programs) more appealing?



All about intern()

Posted by enicholas on June 26, 2006 at 02:16 PM | Permalink | Comments (13)

Strings are a fundamental part of any modern programming language, every bit as important as numbers. So you'd think that Java programmers would go out of their way to have a solid understanding of them -- and sadly, that isn't always the case.

I was going through the source code to Xerces (the XML parser included in Java) today, when I found a very surprising line:

com.sun.org.apache.xerces.internal.impl.XMLScanner:395
protected final static String fVersionSymbol = "version".intern();

There are a number of strings defined like this, and every one of them is being interned. So what exactly is intern()? Well, as you no doubt know, there are two different ways to compare objects in Java. You can use the == operator, or you can use the equals() method. The == operator compares whether two references point to the same object, whereas the equals() method compares whether two objects contain the same data.

One of the first lessons you learn in Java is that you should usually use equals(), not ==, to compare two strings. If you compare, say, new String("Hello") == new String("Hello"), you will in fact receive false, because they are two different string instances. If you use equals() instead, you will receive true, just as you'd expect. Unfortunately, the equals() method can be fairly slow, as it involves a character-by-character comparison of the strings.

Since the == method compares identity, all it has to do is compare two pointers to see if they are the same, and obviously it will be much faster than equals(). So if you're going to be comparing the same strings repeatedly, you can get a significant performance advantage by reducing it to an identity comparison rather than an equality comparison. The basic algorithm is:

1) Create a hash set of Strings
2) Check to see if the String you're dealing with is already in the set
3) If so, return the one from the set
4) Otherwise, add this string to the set and return it

After following this algorithm, you are guaranteed that if two strings contain the same characters, they are also the same instance. This means that you can safely compare strings using == rather than equals(), gaining a significant performance advantage with repeated comparisons.

Fortunately, Java already includes an implementation of the algorithm above. It's the intern() method on java.lang.String. new String("Hello").intern() == new String("Hello").intern() returns true, whereas without the intern() calls it returns false.

So why was I so surprised to see protected final static String fVersionSymbol = "version".intern(); in the Xerces source code? Obviously this string will be used for many comparisons, doesn't it make sense to intern it?

Sure it does. That's why Java already does it. All constant strings that appear in a class are automatically interned. This includes both your own constants (like the above "version" string) as well as other strings that are part of the class file format -- class names, method and field signatures, and so forth. It even extends to constant string expressions: "Hel" + "lo" is processed by javac exactly the same as "Hello", and "Hel" + "lo" == "Hello" will return true.

So the result of calling intern() on a constant string like "version" is by definition going to be the exact same string you passed in. "version" == "version".intern(), always. You only need to intern strings when they are not constants, and you want to be able to quickly compare them to other interned strings.

There can also be a memory advantage to interning strings -- you only keep one copy of the string's characters in memory, no matter how many times you refer to it. That's the main reason why class file constant strings are interned: think about how many classes refer to (say) java.lang.Object. The name of the class java.lang.Object has to appear in every single one of those classes, but thanks to the magic of intern(), it only appears in memory once.

The bottom line? intern() is a useful method and can make life easier -- but make sure that you're using it responsibly.



Understanding Weak References

Posted by enicholas on May 04, 2006 at 05:06 PM | Permalink | Comments (19)

Some time ago I was interviewing candidates for a Senior Java Engineer position. Among the many questions I asked was "What can you tell me about weak references?" I wasn't expecting a detailed technical treatise on the subject. I would probably have been satisfied with "Umm... don't they have something to do with garbage collection?" I was instead surprised to find that out of twenty-odd engineers, all of whom had at least five years of Java experience and good qualifications, only two of them even knew that weak references existed, and only one of those two had actual useful knowledge about them. I even explained a bit about them, to see if I got an "Oh yeah" from anybody -- nope. I'm not sure why this knowledge is (evidently) uncommon, as weak references are a massively useful feature which have been around since Java 1.2 was released, over seven years ago.

Now, I'm not suggesting you need to be a weak reference expert to qualify as a decent Java engineer. But I humbly submit that you should at least know what they are -- otherwise how will you know when you should be using them? Since they seem to be a little-known feature, here is a brief overview of what weak references are, how to use them, and when to use them.

Strong references

First I need to start with a refresher on strong references. A strong reference is an ordinary Java reference, the kind you use every day. For example, the code:

StringBuffer buffer = new StringBuffer();

creates a new StringBuffer() and stores a strong reference to it in the variable buffer. Yes, yes, this is kiddie stuff, but bear with me. The important part about strong references -- the part that makes them "strong" -- is how they interact with the garbage collector. Specifically, if an object is reachable via a chain of strong references (strongly reachable), it is not eligible for garbage collection. As you don't want the garbage collector destroying objects you're working on, this is normally exactly what you want.

When strong references are too strong

It's not uncommon for an application to use classes that it can't reasonably extend. The class might simply be marked final, or it could be something more complicated, such as an interface returned by a factory method backed by an unknown (and possibly even unknowable) number of concrete implementations. Suppose you have to use a class Widget and, for whatever reason, it isn't possible or practical to extend Widget to add new functionality.

What happens when you need to keep track of extra information about the object? In this case, suppose we find ourselves needing to keep track of each Widget's serial number, but the Widget class doesn't actually have a serial number property -- and because Widget isn't extensible, we can't add one. No problem at all, that's what HashMaps are for:

serialNumberMap.put(widget, widgetSerialNumber);

This might look okay on the surface, but the strong reference to widget will almost certainly cause problems. We have to know (with 100% certainty) when a particular Widget's serial number is no longer needed, so we can remove its entry from the map. Otherwise we're going to have a memory leak (if we don't remove Widgets when we should) or we're going to inexplicably find ourselves missing serial numbers (if we remove Widgets that we're still using). If these problems sound familiar, they should: they are exactly the problems that users of non-garbage-collected languages face when trying to manage memory, and we're not supposed to have to worry about this in a more civilized language like Java.

Another common problem with strong references is caching, particular with very large structures like images. Suppose you have an application which has to work with user-supplied images, like the web site design tool I work on. Naturally you want to cache these images, because loading them from disk is very expensive and you want to avoid the possibility of having two copies of the (potentially gigantic) image in memory at once.

Because an image cache is supposed to prevent us from reloading images when we don't absolutely need to, you will quickly realize that the cache should always contain a reference to any image which is already in memory. With ordinary strong references, though, that reference itself will force the image to remain in memory, which requires you (just as above) to somehow determine when the image is no longer needed in memory and remove it from the cache, so that it becomes eligible for garbage collection. Once again you are forced to duplicate the behavior of the garbage collector and manually determine whether or not an object should be in memory.

Weak references

A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself. You create a weak reference like this:

WeakReference<Widget> weakWidget = new WeakReference<Widget>(widget);

and then elsewhere in the code you can use weakWidget.get() to get the actual Widget object. Of course the weak reference isn't strong enough to prevent garbage collection, so you may find (if there are no strong references to the widget) that weakWidget.get() suddenly starts returning null.

To solve the "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.

Reference queues

Once a WeakReference starts returning null, the object it pointed to has become garbage and the WeakReference object is pretty much useless. This generally means that some sort of cleanup is required; WeakHashMap, for example, has to remove such defunct entries to avoid holding onto an ever-increasing number of dead WeakReferences.

The ReferenceQueue class makes it easy to keep track of dead references. If you pass a ReferenceQueue into a weak reference's constructor, the reference object will be automatically inserted into the reference queue when the object to which it pointed becomes garbage. You can then, at some regular interval, process the ReferenceQueue and perform whatever cleanup is needed for dead references.

Different degrees of weakness

Up to this point I've just been referring to "weak references", but there are actually four different degrees of reference strength: strong, soft, weak, and phantom, in order from strongest to weakest. We've already discussed strong and weak references, so let's take a look at the other two.

Soft references

A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.

SoftReferences aren't required to behave any differently than WeakReferences, but in practice softly reachable objects are generally retained as long as memory is in plentiful supply. This makes them an excellent foundation for a cache, such as the image cache described above, since you can let the garbage collector worry about both how reachable the objects are (a strongly reachable object will never be removed from the cache) and how badly it needs the memory they are consuming.

Phantom references

A phantom reference is quite different than either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead. How is that different from WeakReference, though?

The difference is in exactly when the enqueuing happens. WeakReferences are enqueued as soon as the object to which they point becomes weakly reachable. This is before finalization or garbage collection has actually happened; in theory the object could even be "resurrected" by an unorthodox finalize() method, but the WeakReference would remain dead. PhantomReferences are enqueued only when the object is physically removed from memory, and the get() method always returns null specifically to prevent you from being able to "resurrect" an almost-dead object.

What good are PhantomReferences? I'm only aware of two serious cases for them: first, they allow you to determine exactly when an object was removed from memory. They are in fact the only way to determine that. This isn't generally that useful, but might come in handy in certain very specific circumstances like manipulating large images: if you know for sure that an image should be garbage collected, you can wait until it actually is before attempting to load the next image, and therefore make the dreaded OutOfMemoryError less likely.

Second, PhantomReferences avoid a fundamental problem with finalization: finalize() methods can "resurrect" objects by creating new strong references to them. So what, you say? Well, the problem is that an object which overrides finalize() must now be determined to be garbage in at least two separate garbage collection cycles in order to be collected. When the first cycle determines that it is garbage, it becomes eligible for finalization. Because of the (slim, but unfortunately real) possibility that the object was "resurrected" during finalization, the garbage collector has to run again before the object can actually be removed. And because finalization might not have happened in a timely fashion, an arbitrary number of garbage collection cycles might have happened while the object was waiting for finalization. This can mean serious delays in actually cleaning up garbage objects, and is why you can get OutOfMemoryErrors even when most of the heap is garbage.

With PhantomReference, this situation is impossible -- when a PhantomReference is enqueued, there is absolutely no way to get a pointer to the now-dead object (which is good, because it isn't in memory any longer). Because PhantomReference cannot be used to resurrect an object, the object can be instantly cleaned up during the first garbage collection cycle in which it is found to be phantomly reachable. You can then dispose whatever resources you need to at your convenience.

Arguably, the finalize() method should never have been provided in the first place. PhantomReferences are definitely safer and more efficient to use, and eliminating finalize() would have made parts of the VM considerably simpler. But, they're also more work to implement, so I confess to still using finalize() most of the time. The good news is that at least you have a choice.

Conclusion

I'm sure some of you are grumbling by now, as I'm talking about an API which is nearly a decade old and haven't said anything which hasn't been said before. While that's certainly true, in my experience many Java programmers really don't know very much (if anything) about weak references, and I felt that a refresher course was needed. Hopefully you at least learned a little something from this review.





Powered by
Movable Type 3.01D
 Feed java.net RSS Feeds