
Java Secrets Revealed #1

Posted by enicholas on April 29, 2008 at 12:19 PM PDT

I know, I know, it's been far too long since I've made an entry. My younger son is ten months old now, so I suppose I should probably stop using "new baby" as an excuse for my laziness...

Ahem.

Before I joined Sun, I thought I knew a lot about Java. I had been using it for a decade and had dug into its innards more times than I could count. Anytime I ran into inexplicable Swing weirdness or whatnot I wouldn't hesitate to dive into the JRE's source code and study it, or even recompile the classes with my own diagnostic code added. I wrote my own classloaders, I manipulated bytecode on the fly, I even wrote my own compiler for a JVM-targeted language. I had earned the right to call myself a guru.

Or so I thought.

Joining Sun nearly two years ago was a humbling experience. You see, it turns out that knowing a lot about how Java works as a third-party developer is very different from, say, having to figure out how to rip the JRE apart and reassemble it on the fly without running programs noticing (Java Kernel, for the uninitiated). I have had to learn more about Java's inner workings than I ever really wanted to know, and maybe you'll find some of it interesting. Toward that end, I'm going to pick a couple of random topics to blather about here, in the hope of making this a semi-regular feature.

Why can Java Web Start specify JRE versions, but the Java Plug-In can't?

If you have worked with both JNLP programs and applets, you are no doubt aware of the incongruities. JNLP programs can specify which JRE version they need to run with, their memory settings, command-line arguments, and so forth. Applets, on the other hand, are stuck with whichever JRE is registered with the web browser, and have no control over any JRE settings. (JRE settings can be changed via the Java Control Panel, but cannot be specified by or for individual applets.)
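As a rough illustration of what an applet is "stuck with", the sketch below simply reports the version and maximum heap of whatever JRE it happens to be running in. It relies only on the standard `java.version` system property and `Runtime.maxMemory()`; the class name `JreReport` is hypothetical.

```java
/** Hypothetical helper: reports the JRE this code is actually running in. */
final class JreReport {

    /** Version of the running JRE; the code itself did not get to choose it. */
    static String javaVersion() {
        return System.getProperty("java.version");
    }

    /** Maximum heap in megabytes, as fixed by -Xmx (or the JRE's default). */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Running on Java " + javaVersion()
                + " with a maximum heap of " + maxHeapMegabytes() + " MB");
    }
}
```

A JNLP program could ask for a different version or heap size in its JNLP file; an applet under the classic plug-in just gets whatever this reports.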

The limitation arises because the JRE that handles applets runs inside the web browser. It lives within the browser's process and address space and, as far as the OS is concerned, is merely another chunk of the browser's code, just like any other plug-in. And you can't simply load more than one JRE into the same OS process, because they would have conflicting symbol definitions, entry points, and so forth. It would be like trying to boot two different operating systems on the same computer without the benefit of (very sophisticated) tools like VMware.

To fix this, you've got to run the JRE in a separate process, but have the applets appear within the web browser window. This, of course, introduces all sorts of challenges and requires some clever engineering, but fortunately people smarter than me were assigned to the task. A group led by Ken Russell has done just that, resulting in what is officially (and wordily) named Next-Generation Java™ Plug-In Technology.
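The basic ingredient, a JRE running in its own OS process with its own settings instead of inside the browser, can be sketched with `ProcessBuilder`. This illustrates process isolation only; it is not how the new plug-in is actually implemented, and `ChildJvmLauncher`, `buildChild`, and the assumption that the child JRE lives under a given `java.home` are all hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: start a JRE in a separate OS process with its own settings. */
final class ChildJvmLauncher {

    /** Builds (but does not start) a command line for a child JVM with its own heap limit. */
    static ProcessBuilder buildChild(String javaHome, int maxHeapMb, List<String> programArgs) {
        List<String> command = new ArrayList<>();
        command.add(javaHome + File.separator + "bin" + File.separator + "java");
        command.add("-Xmx" + maxHeapMb + "m"); // per-process setting, independent of the parent
        command.addAll(programArgs);
        return new ProcessBuilder(command).inheritIO();
    }

    public static void main(String[] args) throws Exception {
        // The child gets its own address space, so its JRE cannot clash with ours.
        ProcessBuilder child = buildChild(System.getProperty("java.home"), 64,
                Arrays.asList("-version"));
        int exitCode = child.start().waitFor();
        System.out.println("Child JVM exited with " + exitCode);
    }
}
```

The hard part the plug-in team solved, making the child's applet appear inside the browser window, is of course not shown here.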

The new plug-in behaves much more like Web Start, in that you can use JNLP files to specify JRE versions, memory settings, and command line arguments. It's smart enough to consolidate multiple applets into the same JRE if their settings are compatible, or spawn additional JREs as needed to make everyone happy. It also has some extremely cool tricks up its metaphorical sleeve which we will be revealing at JavaOne.
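The consolidation behavior described above can be modeled, very loosely, as grouping applets by the settings they require and starting one JRE per distinct group. This is a hypothetical simplification (the real compatibility rules are presumably more nuanced than exact equality of version and heap size), and `AppletRequest` and `JvmPool` are invented names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical description of what one applet asks for (e.g. via its JNLP file). */
final class AppletRequest {
    final String name;
    final String jreVersion;
    final int maxHeapMb;

    AppletRequest(String name, String jreVersion, int maxHeapMb) {
        this.name = name;
        this.jreVersion = jreVersion;
        this.maxHeapMb = maxHeapMb;
    }

    /** Applets with equal keys are treated as compatible in this sketch. */
    String settingsKey() {
        return jreVersion + "|-Xmx" + maxHeapMb + "m";
    }
}

/** Hypothetical sketch: one JRE process per distinct set of settings. */
final class JvmPool {
    private final Map<String, List<String>> appletsByJvm = new LinkedHashMap<>();

    /** Places the applet in an existing compatible JRE, or "spawns" a new one. */
    void place(AppletRequest applet) {
        appletsByJvm.computeIfAbsent(applet.settingsKey(), k -> new ArrayList<>()).add(applet.name);
    }

    int jvmCount() {
        return appletsByJvm.size();
    }

    List<String> appletsIn(String settingsKey) {
        return appletsByJvm.getOrDefault(settingsKey, new ArrayList<>());
    }

    public static void main(String[] args) {
        JvmPool pool = new JvmPool();
        pool.place(new AppletRequest("chat", "1.6", 64));
        pool.place(new AppletRequest("game", "1.6", 64));  // compatible: shares chat's JRE
        pool.place(new AppletRequest("cad", "1.5", 256));  // different settings: new JRE
        System.out.println(pool.jvmCount() + " JRE process(es) needed");
    }
}
```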

What is Class Data Sharing?

Prior to joining Sun, I had read a paragraph about Class Data Sharing somewhere, but didn't know much about it. Since then I have found that pretty much nobody outside of Sun seems to know anything about it either. That's a shame, because it's actually quite neat.

One of the JRE's biggest jobs when booting up is classloading. Hundreds and hundreds of classes are needed just to get the JRE up and running, and not just the obvious ones like Class, Object, and String. You're also going to need URL and its entourage (for URLClassLoader), PrintStream and related I/O classes (for System.out and System.err), lots of different collection and utility classes, reflection support, charset support, and hundreds more.
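To see how much classloading has already happened by the time your own code runs, you can ask the JVM through the standard `java.lang.management` API. The class name below is hypothetical, the count varies by JRE version and platform, and the management API itself adds a few classes to the total.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

/** Reports how many classes the JVM has loaded by the time main() runs. */
final class StartupClassCount {

    static long loadedSoFar() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getTotalLoadedClassCount(); // includes any classes since unloaded
    }

    public static void main(String[] args) {
        System.out.println("Classes loaded before main() did any real work: " + loadedSoFar());
    }
}
```

Running any program with `java -verbose:class` prints the individual classes as they are loaded, which makes the "hundreds and hundreds" easy to see.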

There are two huge drawbacks to this: first, the JVM has to parse the Java class file for each of these classes, as well as resolve and link the symbols, and (for commonly-used methods) compile the methods using HotSpot. And, of course, all of this work happens every time the JRE starts up. Second, because each individual JRE is parsing and possibly compiling the code independently, they all end up with their own independent copies of the resulting memory structures.
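As a small taste of that parsing work, this sketch reads only the fixed-size header that begins every class file (the 0xCAFEBABE magic number, minor and major version, and constant-pool count, as laid out in the JVM specification). A real JVM must go on to parse the whole constant pool, fields, methods, and attributes of every class; `ClassHeader` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch: parse the fixed-size header at the start of a class file. */
final class ClassHeader {
    final int majorVersion;
    final int minorVersion;
    final int constantPoolCount;

    private ClassHeader(int majorVersion, int minorVersion, int constantPoolCount) {
        this.majorVersion = majorVersion;
        this.minorVersion = minorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassHeader parse(byte[] classBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        int magic = in.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a class file: bad magic 0x" + Integer.toHexString(magic));
        }
        int minor = in.readUnsignedShort();
        int major = in.readUnsignedShort();     // 49 = Java 5, 50 = Java 6
        int poolCount = in.readUnsignedShort(); // number of constant-pool entries + 1
        return new ClassHeader(major, minor, poolCount);
    }

    public static void main(String[] args) throws IOException {
        byte[] java5Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 49, 0, 16};
        ClassHeader h = parse(java5Header);
        System.out.println("major=" + h.majorVersion
                + ", constant pool entries=" + (h.constantPoolCount - 1));
    }
}
```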

To combat this problem, Java 5 introduced a new feature called Class Data Sharing. The idea is that the JRE does all of the basic classloading and parsing just once, and stores the resulting memory structures in a file (bin/<jvm>/classes.jsa, with jsa standing for Java Shared Archive). The next time the JRE boots, it simply maps this file into memory and can skip all of the messy classloading. In addition to faster startup, another benefit is that a big chunk of the mapped bytes can be shared by all running JREs, so they do not each need an independent copy of all of the code.
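The "do the work once, then just map the result" idea can be illustrated with `FileChannel.map`. This is an analogy only: it writes a made-up table of precomputed integers to a file, then maps that file read-only and reads values straight out of the mapping. The real classes.jsa holds HotSpot's internal class metadata in a private format, and the names `MiniArchive`, `dump`, `map`, and `entry` are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical analogy for Class Data Sharing: precompute once, memory-map on later runs. */
final class MiniArchive {

    /** "Dump" step: store the results of expensive work as a count followed by the values. */
    static void dump(Path file, int[] precomputed) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * precomputed.length);
        buf.putInt(precomputed.length);
        for (int value : precomputed) {
            buf.putInt(value);
        }
        buf.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    /** "Boot" step: map the file read-only; the mapping stays valid after the channel closes. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Reads entry i directly from the mapping, without parsing the rest of the file. */
    static int entry(MappedByteBuffer archive, int i) {
        return archive.getInt(4 + 4 * i);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mini", ".archive");
        dump(file, new int[] {7, 42, 99});    // done once, like -Xshare:dump
        MappedByteBuffer archive = map(file); // done on every later "boot"
        System.out.println("entry 1 = " + entry(archive, 1));
    }
}
```

Because the mapping is read-only and backed by the file, the operating system can let several processes share the same physical pages, which is the memory benefit described above.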

Of course, as with everything, the devil is in the details. Some of the classes in the archive perform initialization that isn't guaranteed to always be the same (the AWT classes, for example, will do different things depending upon your display configuration), and I'm told that there are enough such cases that the feature was a lot trickier to implement than it might sound. Plus, you've got to detect the cases where the rt.jar file has been modified, the boot class path has been overridden, or something else has changed that invalidates the assumptions burned into the classes.jsa file, so that class data sharing can be disabled for that particular JRE invocation.
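Detecting that the archive's baked-in assumptions no longer hold boils down to comparing a fingerprint recorded at dump time with the current environment. The sketch below is a hypothetical illustration of that idea: the fields chosen (a boot class path string plus the size and modification time of a jar such as rt.jar) are assumptions, not HotSpot's actual archive header. On a mismatch, the caller would simply fall back to ordinary classloading.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical fingerprint stored alongside a shared archive at dump time. */
final class ArchiveFingerprint {
    final String bootClassPath;
    final long jarSize;
    final long jarModifiedMillis;

    ArchiveFingerprint(String bootClassPath, long jarSize, long jarModifiedMillis) {
        this.bootClassPath = bootClassPath;
        this.jarSize = jarSize;
        this.jarModifiedMillis = jarModifiedMillis;
    }

    /** Captures the current state of the things the archive depends on. */
    static ArchiveFingerprint capture(String bootClassPath, Path runtimeJar) throws IOException {
        return new ArchiveFingerprint(bootClassPath,
                Files.size(runtimeJar),
                Files.getLastModifiedTime(runtimeJar).toMillis());
    }

    /** True only if nothing the archive relies on has changed; otherwise disable sharing. */
    boolean stillValidFor(ArchiveFingerprint current) {
        return bootClassPath.equals(current.bootClassPath)
                && jarSize == current.jarSize
                && jarModifiedMillis == current.jarModifiedMillis;
    }
}
```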

If you use diff or a similar tool to compare JRE directories from various machines running the same JRE version, you'll most likely find that the classes.jsa files, and only those files, are different. That's because classes.jsa is actually generated on your machine instead of being packaged with the installer. One of the last things the JRE installer does is run the magic incantation java -Xshare:dump, which generates the shared archive. That way we don't have to make the installer any bigger. I don't know for sure, but I suspect that some aspects of the file may be machine-dependent, which would necessitate this approach anyway.
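To check whether a running HotSpot JVM actually mapped its shared archive, look at the `java.vm.info` system property, which reads something like "mixed mode, sharing" when class data sharing is active (the same text appears in the output of `java -version`). This sketch assumes a HotSpot-based JRE; other VMs may not report sharing this way at all, and the class name is hypothetical.

```java
/** Reports whether this HotSpot JVM is using class data sharing, per java.vm.info. */
final class SharingCheck {

    /** HotSpot appends "sharing" to java.vm.info when the shared archive is in use. */
    static boolean isSharingEnabled(String vmInfo) {
        return vmInfo != null && vmInfo.contains("sharing");
    }

    public static void main(String[] args) {
        String vmInfo = System.getProperty("java.vm.info");
        System.out.println("java.vm.info = " + vmInfo);
        System.out.println("Class data sharing in use: " + isSharingEnabled(vmInfo));
    }
}
```

Running it with `-Xshare:off` should report false, while `-Xshare:on` makes the JVM refuse to start, rather than silently continuing, if the archive cannot be used.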

Until next time...

Hopefully that little look inside wasn't too boring. Provided anyone is interested, I'll continue to share tidbits about the inner workings of Java in future installments. Unless of course I forget, or get sidetracked...


Comments

I am interested too, will come back for more :-)

I see your point. I was thinking, a little narrowly, that the cache file exposes certain vulnerabilities that modifying rt.jar alone did not. However, it is all moot if any of the executables, etc., are modified. So, in practical terms, it is meaningless. Thanks for the thought experiment.

whartung is correct. The security model of any system inherently assumes that the system itself can be trusted. If you modify your operating system, for example, then it's trivial to add a backdoor to bypass whatever security checks it might perform. If you modify the JRE, you can bypass the JRE's verifier and security manager just as easily.

Once this assumption is broken -- the operating system or JRE (which is basically an operating system itself) has been arbitrarily modified -- security is gone. Not reduced, not compromised, just gone. The attacker might have disabled the verifier, defanged the SecurityManager, or even removed it altogether. That's why the first and foremost goal of any security system should be to keep you from modifying the system itself, because the second you can do that, you can completely disable it.

Because of that, the JRE's security model only considers the case of verified bytecode running on an unmodified JRE. To answer the question of "why do we need a verifier", it's because without a verifier bytecode is just as dangerous as native code -- you can treat an int as a pointer and write to arbitrary memory locations, such as on top of the JRE's own code. And the second you allow the JRE to be modified, there's no possible way to stop the attacker from bypassing any other security countermeasures we could devise.

Looks like this isn't available on the server JVM:

% sudo java -Xshare:dump
[sudo] password for kohsuke:
Error occurred during initialization of VM
Dumping a shared archive is not supported on the Server JVM.

whartung: One additional thing. I agree with your point that the security subsystem is designed to protect a running JVM. However, since this file is mapped directly into memory, modifying it affects every future run of that JVM, just as if the running JVM itself had been tampered with.

whartung: I don't disagree with what you are saying, but to play devil's advocate... Since the verifier handles stack overflows and underflows, would this make it possible to introduce buffer overflow attacks into a JVM? Or, since the verifier is responsible for enforcing method visibility, could this make it possible to override a method in Policy, ProtectionDomain, ClassLoader, etc. that is responsible for enforcing the configured security options (the policy file)? Again, I agree that this is nitpicking, but it is a change, and that change should at least be understood. It seems that we should just be aware that the cache file needs to be at least as secure as the policy file.

The security model is mostly geared toward loading "unsafe" code into a running JVM, rather than toward protecting jar files on the file system.

The simple fact that whoever builds a JVM has all sorts of powers to override and disable security points to that. Also, for example, JNI native code is implicitly unmanaged and insecure, yet we use it all day long on every JVM. So it's not as if this problem is "new".

Hmmm... I understand and agree with your point, but doesn't that raise the question of why we need a verifier in the first place? If you reduce all of Java's security layers to file system security, does Java offer any security advantages over other VMs and interpreted environments? At the very least, it would seem to make allowing user libraries to use this cache a very bad idea. Just a thought. Thanks for all of the hard work. LES

Hi Ethan, I'm very keen to see a simple example of manipulating Java bytecode on the fly. Could you give me some pointers?

woongiap: Take a look at the Byte Code Engineering Library (BCEL). It allows you to decode, modify, and create class files, and by combining BCEL with a ClassLoader you can perform on-the-fly bytecode magic. One neat trick, for example, is a ClassLoader that makes classes out of thin air when required...
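Here is a minimal sketch of that "classes out of thin air" trick. To stay free of third-party dependencies it does not use BCEL; instead it hand-assembles the bytes of the smallest useful class file (a public class extending `java.lang.Object` with no fields or methods) and passes them to `defineClass`. The class name `ThinAirClassLoader` and the rule that only names in a `thinair.` package get fabricated are hypothetical; a BCEL-based version would build the byte array with BCEL instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical sketch: fabricates an empty class for any requested name in "thinair.". */
final class ThinAirClassLoader extends ClassLoader {
    private static final String PREFIX = "thinair.";

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        if (!name.startsWith(PREFIX)) {
            throw new ClassNotFoundException(name);
        }
        byte[] bytes = emptyClassBytes(name.replace('.', '/'));
        return defineClass(name, bytes, 0, bytes.length);
    }

    /** Assembles a minimal class file: public class <name> extends Object, no members. */
    static byte[] emptyClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);       // magic
            out.writeShort(0);              // minor_version
            out.writeShort(49);             // major_version (Java 5 format, no stack maps needed)
            out.writeShort(5);              // constant_pool_count = 4 entries + 1
            out.writeByte(1);               // #1 CONSTANT_Utf8: this class's name
            out.writeUTF(internalName);     // writeUTF emits the u2 length + modified UTF-8 the format expects
            out.writeByte(7);               // #2 CONSTANT_Class -> #1
            out.writeShort(1);
            out.writeByte(1);               // #3 CONSTANT_Utf8: superclass name
            out.writeUTF("java/lang/Object");
            out.writeByte(7);               // #4 CONSTANT_Class -> #3
            out.writeShort(3);
            out.writeShort(0x0021);         // access_flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);              // this_class
            out.writeShort(4);              // super_class
            out.writeShort(0);              // interfaces_count
            out.writeShort(0);              // fields_count
            out.writeShort(0);              // methods_count
            out.writeShort(0);              // attributes_count
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected with an in-memory stream
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> ghost = new ThinAirClassLoader().loadClass("thinair.Ghost");
        System.out.println(ghost.getName() + " extends " + ghost.getSuperclass().getName());
    }
}
```

Since `loadClass` asks the parent loader first and only falls back to `findClass` when the parent fails, ordinary classes are unaffected; only names nobody else can find are conjured up.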

lstroud: Excellent question. Yes, classes.jsa bypasses the verifier, and yes that means security is a concern. However, this is true of any and all executable code on your system.

Basically, modifying any executable code -- be it notepad.exe, firefox.exe, rt.jar, or classes.jsa -- gives an attacker the ability to plant trojans. What's more, in a sanely-configured system executable code will only be modifiable by an administrator / root / whatever else the OS calls it, so if someone can modify the code it means that they already have root and therefore already own your machine, and security is already out the window.

The only proven, safe way to keep your system secure is to ensure that only administrators can modify executable code, and to keep the window of opportunity as small as possible by only elevating to administrator privilege level when you absolutely need to, relinquishing the privileges immediately when you no longer need them. Unix, Mac OS X, and Windows Vista all follow this security model; older versions of Windows are inherently unsafe because (in the real world, at least) everybody just runs as administrator all the time.

So, classes.jsa should definitely be kept secure, but it is no more a threat than any other chunk of code on your system.
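On a Unix-like system, one quick sanity check in the spirit of this advice is to confirm that the archive (or any other file of executable code) cannot be written by group or others. The sketch below uses the standard `java.nio.file` POSIX attribute API; the class name is hypothetical, and checking mode bits alone ignores ownership and ACLs, so treat it as a rough first check rather than a security audit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

/** Hypothetical check: is a code file (e.g. classes.jsa) writable by group or others? */
final class WritableByOthersCheck {

    static boolean writableByGroupOrOthers(Path file) throws IOException {
        // Throws UnsupportedOperationException on file systems without POSIX permissions.
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(file);
        return perms.contains(PosixFilePermission.GROUP_WRITE)
                || perms.contains(PosixFilePermission.OTHERS_WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // e.g. the path to classes.jsa in your JRE
        System.out.println(file + (writableByGroupOrOthers(file)
                ? " is writable by group or others: tighten its permissions"
                : " is not writable by group or others"));
    }
}
```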

oops...i can't spell:) tojans == trojans

I'm interested - keep sharing

very interesting article. will come back for more...

Out of curiosity, does the caching of the classes into memory bypass the verifier? Does this mean that the classes.jsa file needs to be the most ultra secure file on your machine to prevent tojans from being dumped in?

Cool stuff, Ethan!
You could probably evolve this into something like Covert Java, tricks from the underground :)
Cheers! -DW

Neat that rt.jar is cached. However, I've often wondered why an application (say, NetBeans) isn't cached the same way. On .NET, they can do this at the application level, which means that the next time the application is started, there is no need for compiling and very little verification is needed. I expect that could work for Java apps too: keep the compiled binaries in a cache associated with a hash of the local environment (CPU count, type, etc.). Then, if the environment hash is the same at the next startup, use the cache; if not, proceed normally and build a new cache?!
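The cache-key half of this suggestion might look roughly like the sketch below. Everything here is hypothetical and does not describe any actual JRE feature: `EnvironmentKey` is an invented name, and the properties hashed are simply the examples mentioned above (CPU count and type) plus the JVM version.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical cache key: reuse cached code only if the environment hash matches. */
final class EnvironmentKey {

    /** Hashes an environment description with SHA-256 and returns lowercase hex. */
    static String hash(String description) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(description.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Describes the current machine and JVM: CPU count, architecture, VM version. */
    static String currentDescription() {
        return Runtime.getRuntime().availableProcessors() + "|"
                + System.getProperty("os.arch") + "|"
                + System.getProperty("java.vm.version");
    }

    /** Use the cache only if it was built for an identical environment; otherwise rebuild it. */
    static boolean canReuseCache(String storedHash) {
        return hash(currentDescription()).equals(storedHash);
    }

    public static void main(String[] args) {
        String stored = hash(currentDescription()); // would be saved next to the cache
        System.out.println("Cache reusable on this machine: " + canReuseCache(stored));
    }
}
```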

Sun was supposed to open up Class Data Sharing to the outside world and allow applications to cache their own data, but this never happened. Surely there is a case for doing this, especially on the server end with application servers that take a while to load.

I personally am quite content to focus on the client-end instead (update 10 rocks!) but I just wanted to mention this in passing.