Why are there two of everything?
Posted by peterkessler on November 17, 2004 at 05:16 PM | Comments (4)
You might have noticed that in addition to the Tiger source
snapshots, we have just posted a Mustang source snapshot under
the Java Research License.
So now we have two code lines being actively worked on:
we continue to find bugs in Tiger and fix them in update releases,
and we have on-going development in Mustang.
That's bound to be confusing.
I'm a HotSpot virtual machine engineer, so I'm used to having
two (or more) code lines in progress. But if you are going
to rummage around in the HotSpot virtual machine sources
(under hotspot/src), I'd like to explain why we seem to have
two of a lot of things. I'll gradually work my way down through
the layers of this particular onion.
- We have to ship releases.
We have Tiger and Mustang (and all the other releases)
because we want to get stuff out into the hands of our users.
But we're never really "done", so development is continuous
while releases are periodic. What you see by looking
at the current Tiger and Mustang source snapshots is that early
in a release (Mustang) there isn't really much difference
from the previous release (Tiger). But new development
stopped on Tiger months ago. At any given time, we have (at
least) two releases in progress, one in active development,
the other(s) for bug fix updates.
- Backward compatibility.
Once you get into the sources for the virtual machine,
you'll find that we often have two implementations
of things. One reason for this is that we think backward
compatibility is really important. While we are working
on some new thing, we have to keep the old thing working,
and the easiest way to do that is to keep the old thing
around. While browsing through the sources, you'll find
a lot of code guarded by command-line switches you probably
didn't know about. (Look at all those command-line switches in
hotspot/src/share/runtime/globals.hpp!) Those are there
so we can do A-B comparisons for functionality, conformance,
performance, footprint, etc. Only when a new implementation
shows itself to be compatible and substantially better than
the old one do we throw the switch to use the new one. And
we usually leave the switch around for a release or two
in case someone wants to revert to the old behavior.
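A minimal C++ sketch of the A-B pattern described above, assuming a single boolean switch guarding one behavior. The flag `UseNewSumAlgorithm` and both `sum_to_*` functions are hypothetical, invented for illustration; they are not declarations from globals.hpp. The old implementation stays the default, and a conformance check runs both settings of the switch and compares the answers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical switch in the style of a VM command-line flag. It defaults to
// false: the old implementation stays in charge until the new one has shown
// itself to be compatible and better.
bool UseNewSumAlgorithm = false;

// Old implementation: a straightforward loop, kept around for compatibility.
static std::int64_t sum_to_old(std::int64_t n) {
  std::int64_t total = 0;
  for (std::int64_t i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

// New implementation: closed form, constant time.
static std::int64_t sum_to_new(std::int64_t n) {
  return n * (n + 1) / 2;
}

// Callers use one entry point; the flag picks which implementation runs.
std::int64_t sum_to(std::int64_t n) {
  assert(n >= 0);
  return UseNewSumAlgorithm ? sum_to_new(n) : sum_to_old(n);
}

// An A-B conformance check: run both settings of the switch and compare.
bool implementations_agree(std::int64_t n) {
  bool saved = UseNewSumAlgorithm;
  UseNewSumAlgorithm = false;
  std::int64_t a = sum_to(n);
  UseNewSumAlgorithm = true;
  std::int64_t b = sum_to(n);
  UseNewSumAlgorithm = saved;  // restore whatever the user asked for
  return a == b;
}
```

Keeping `sum_to_old` in the code base costs a little maintenance, but it is what lets the switch be flipped back for a release or two.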
- Dessert topping or floor wax.
One of the problems with being a successful Java virtual
machine is that people want to use you for everything, even
things you didn't exactly anticipate. While the Java platform
might have burst on the scene as a way of executing content
for small applets in web browsers (and people still use it for
that), people now also use it for running gigantic high-throughput
applications on big multiprocessors. Of course, we want to
make everyone happy, but that often means having alternate
implementations inside the virtual machine. You can see this
in the choice of the client versus server runtime compiler:
the client runtime compiler gives good startup and modest
performance, while the server runtime compiler is not as fast
to start up, but the code it generates runs significantly faster.
You can't use them both at the same time (yet), but if you are
looking around the code base, you'll find both runtime compilers
in there.
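The point that both runtime compilers live in the same code base, with one chosen per run, can be sketched in C++ as two implementations behind a common interface. The class names and `select_compiler` are hypothetical illustrations, not HotSpot's actual compiler classes; only the `-client` and `-server` option spellings come from the real launcher.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical common interface: both compilers are built into the VM.
class RuntimeCompiler {
 public:
  virtual ~RuntimeCompiler() = default;
  virtual std::string name() const = 0;
  virtual bool favors_startup() const = 0;
};

// Quick compiles: good startup, modest peak performance.
class ClientCompiler : public RuntimeCompiler {
 public:
  std::string name() const override { return "client"; }
  bool favors_startup() const override { return true; }
};

// Slower to start up, but generates significantly faster code.
class ServerCompiler : public RuntimeCompiler {
 public:
  std::string name() const override { return "server"; }
  bool favors_startup() const override { return false; }
};

// Exactly one compiler is chosen per run; the two are never mixed here.
std::unique_ptr<RuntimeCompiler> select_compiler(const std::string& option) {
  assert(option == "-client" || option == "-server");
  if (option == "-server") {
    return std::make_unique<ServerCompiler>();
  }
  return std::make_unique<ClientCompiler>();
}
```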
- One size does not fit all.
In that same style of offering different qualities of service,
we offer something like three different garbage collection algorithms.
The concurrent mark sweep algorithm provides lower pause times at
some cost in throughput, while the parallel collector offers
better throughput with occasional longer pauses. We're not going to
make that choice for our users. If you go looking for "the
garbage collector", you'll be disappointed (or maybe pleasantly
surprised) to find at least three of them in there.
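A sketch of leaving the collector choice to the user, assuming the command-line flags are scanned once at startup. `-XX:+UseConcMarkSweepGC` and `-XX:+UseParallelGC` are real HotSpot options; the enum, the parser, and the assumption that the fallback is a serial collector are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// "Serial" is this sketch's assumed fallback when no collector is requested.
enum class Collector { Serial, Parallel, ConcurrentMarkSweep };

// Hypothetical flag scan: the user, not the VM, picks the tradeoff.
Collector select_collector(const std::vector<std::string>& args) {
  Collector chosen = Collector::Serial;
  int requested = 0;
  for (const std::string& arg : args) {
    if (arg == "-XX:+UseConcMarkSweepGC") {
      chosen = Collector::ConcurrentMarkSweep;  // shorter pauses
      requested++;
    } else if (arg == "-XX:+UseParallelGC") {
      chosen = Collector::Parallel;  // better throughput, longer pauses
      requested++;
    }
  }
  // Requesting two collectors at once is rejected in this model.
  assert(requested <= 1);
  return chosen;
}
```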
- We learn, but slowly.
The HotSpot Java virtual machine is a work in progress.
Ideas that we had a while ago might have been appropriate
at the time, but things change. So the source base changes too. We
learn things about the interactions of the different parts of
the virtual machine (basically: compiler(s), garbage collector(s),
and runtime system) and we try to clean up the code. In some
parts of the virtual machine, that means changing the interfaces,
but our desire not to be disruptive means we often leave the
old interface and implementation in place for a release or two.
For example, the older garbage collectors use a "generation
framework" that is extremely flexible, but has some overhead.
The newest collector uses a less flexible interface that is
more efficient. We won't gratuitously convert the older
collectors (we'd risk breaking things, for no benefit to you,
our customers), so you'll find both programming styles in there
if you look.
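The two programming styles can be contrasted in a small C++ sketch, assuming a heap whose only job is to count collections. `Generation`, `FrameworkHeap`, and `FixedHeap` are hypothetical names, not the VM's classes. The framework style pays a virtual call and a loop per operation for the freedom to plug in any set of generations; the fixed style knows its two spaces at compile time.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Style 1 (hypothetical "generation framework"): any number of generations
// behind a virtual interface. Flexible, but every call is dynamically dispatched.
class Generation {
 public:
  virtual ~Generation() = default;
  virtual void collect() = 0;
  virtual int collections() const = 0;
};

class CountingGeneration : public Generation {
 public:
  void collect() override { count_++; }
  int collections() const override { return count_; }
 private:
  int count_ = 0;
};

class FrameworkHeap {
 public:
  void add_generation(std::unique_ptr<Generation> g) {
    assert(g != nullptr);
    gens_.push_back(std::move(g));
  }
  void collect_all() {
    for (auto& g : gens_) g->collect();  // one virtual call per generation
  }
  int total_collections() const {
    int total = 0;
    for (const auto& g : gens_) total += g->collections();
    return total;
  }
 private:
  std::vector<std::unique_ptr<Generation>> gens_;
};

// Style 2 (hypothetical): exactly two spaces, fixed at compile time and
// called directly. Less flexible, but nothing needs dynamic dispatch.
class FixedHeap {
 public:
  void collect_all() { young_collections_++; old_collections_++; }
  int total_collections() const { return young_collections_ + old_collections_; }
 private:
  int young_collections_ = 0;
  int old_collections_ = 0;
};
```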
- It depends on your point of view.
Sometimes you'll be prowling through the source and come
across things that look to be two of the same thing. For example,
hotspot/src/share/oops/oopsHierarchy.hpp shows what appear to be
similar hierarchies for oops and klasses.
But those are not alternate implementations, or us evolving the
interface, or anything like that. They are two faces of
the virtual machine's view of the data structures used to
represent Java objects (and a few VM internal data structures).
Simplifying somewhat: an oop is the Java reference to an object,
whereas the klass is the way we manipulate that object from the
C++ code inside the virtual machine. That's an example of where
you have to be able to hold both ideas in your head at the same
time, instead of looking at only the one you think you are
interested in.
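To make the "two faces" idea concrete, here is a heavily simplified C++ sketch of parallel oop and klass hierarchies. The names echo the style of oopsHierarchy.hpp, but the declarations, the one-word header, and the word sizes are assumptions made for illustration, not HotSpot's real layout. The oop side is just a pointer to memory whose header points at a klass; the klass side is the C++ object that knows how to interpret that memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, simplified hierarchies; not HotSpot's actual declarations.
struct oopDesc;
typedef oopDesc* oop;  // the Java reference to an object

// Klass side: how C++ code in the VM understands and manipulates objects.
class Klass {
 public:
  explicit Klass(std::string name) : name_(std::move(name)) {}
  virtual ~Klass() = default;
  const std::string& name() const { return name_; }
  // Size in (made-up) words of a particular object of this kind.
  virtual std::size_t object_size(oop obj) const = 0;
 private:
  std::string name_;
};

// Oop side: every object begins with a header that points at its klass.
struct oopDesc {
  const Klass* klass;
};

struct arrayOopDesc : oopDesc {
  std::size_t length;
};

class InstanceKlass : public Klass {
 public:
  InstanceKlass(std::string name, std::size_t field_words)
      : Klass(std::move(name)), field_words_(field_words) {}
  std::size_t object_size(oop) const override { return 1 + field_words_; }
 private:
  std::size_t field_words_;
};

class ArrayKlass : public Klass {
 public:
  using Klass::Klass;
  std::size_t object_size(oop obj) const override {
    // Only the klass knows this oop is really an array.
    auto* arr = static_cast<arrayOopDesc*>(obj);
    return 2 + arr->length;  // header + length word + one word per element
  }
};

// VM code holds an oop and asks its klass how to interpret it.
std::size_t size_of(oop obj) {
  assert(obj != nullptr && obj->klass != nullptr);
  return obj->klass->object_size(obj);
}
```

Note that `size_of` never inspects the oop's type on its own: the knowledge that a given oop is really an array lives entirely on the klass side, which is why the two hierarchies have to be read together.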
The HotSpot virtual machine is a collection of engineering
tradeoffs and compromises. As such you will often find more
than one way of doing things when you look through the sources.
I hope I've clarified some of the reasons for that. If not,
ask questions and I'll try to come up with answers.
Comments
-
Thanks for the hotspot insider information.
Now we have one big monolithic JRE that can be started as -client or -server. Would it make sense to create two separate JRE distributions for client and server?
Client JRE -> client JVM optimized for client-side + runtime classes needed for client
Server JRE -> server JVM optimized for server-side + runtime classes needed for server only (no ui, media stuff)
Posted by: sutanu on November 18, 2004 at 07:38 AM
-
If only it were that easy! We find lots of "client" customers who want the higher performance of the "server" runtime compiler, and lots of server applications want "client" features like faster startup, shorter garbage collection pauses, etc. The Java platform includes the full set of APIs, so we're not about to subset those.
The direction we are headed is rather away from what you are suggesting. We are integrating the different qualities of service into one virtual machine, and then using command line flags, heuristics, ergonomics, or dynamic monitoring and management to choose and adjust the virtual machine at runtime. The idea is that the user shouldn't have to choose, say, -client or -server. They should say what properties they want (short pauses, small footprint, etc.) and the virtual machine should choose the compiler, collector, and other options to maintain that quality over the execution of the application. We've started on that, but we still have a lot of improvements to make.
What's the real problem you are trying to solve by suggesting breaking up the JRE? Is it the download bundle size? The runtime footprint of the JRE? Class loading speed?
Posted by: peterkessler on November 18, 2004 at 10:27 AM
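The "say what properties you want" approach from the reply above can be sketched as a policy function, assuming goals arrive as simple booleans. `Goals`, `Configuration`, and `choose_configuration` are hypothetical, and the policy rules are invented for illustration; they are not HotSpot's ergonomics.

```cpp
#include <cassert>
#include <string>

// What the user asks for, instead of choosing -client or -server (hypothetical).
struct Goals {
  bool short_pauses = false;
  bool fast_startup = false;
  bool max_throughput = false;
};

// What the VM picks in order to meet those goals (hypothetical).
struct Configuration {
  std::string compiler;
  std::string collector;
};

// A toy policy mapping stated goals to a compiler and a collector.
Configuration choose_configuration(const Goals& goals) {
  Configuration config;
  // Use the server compiler only when throughput is requested and fast
  // startup is not.
  config.compiler =
      (goals.max_throughput && !goals.fast_startup) ? "server" : "client";
  // Pause-time goals win over throughput goals in this toy policy.
  if (goals.short_pauses) {
    config.collector = "concurrent mark sweep";
  } else if (goals.max_throughput) {
    config.collector = "parallel";
  } else {
    config.collector = "serial";
  }
  assert(!config.compiler.empty() && !config.collector.empty());
  return config;
}
```

A real policy would also adjust these choices while the application runs; this sketch only makes the initial selection.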
-
>> Is it the download bundle size? The runtime footprint of the JRE? Class loading speed?
All of them. This has been raised several times before via RFEs or the Mustang forum. I am just thinking one way to achieve this is to really break up the JRE into two profiles, client and server. One analogy would be the various J2ME profiles that exist now.
When I think more about it, I see there are a few separate things to consider.
JVM/HotSpot: The server HotSpot engine will be more sophisticated than the client one, so it is expected to have a bigger footprint. On the other hand, do we really need so many things in a client JVM? Besides a few specific use cases, can we realistically utilize so many GC schemes in the client? My point is, the client HotSpot can be leaner and optimized to run a client app just well enough.
Runtime classes: The server JRE will have a smaller set of runtime classes. Why do we need awt/swing, sound, imageio, etc. on the server? The client JRE will have a bigger set of runtime classes for UI needs.
rt.jar: There are many ways rt.jar can be broken up without compromising the “one java platform”. Think in terms of modules which can be loaded into the runtime only when needed. That will allow us to do incremental JRE upgrades in a more efficient manner.
I understand that from distribution or maintenance perspective, it is more logical to maintain a single JRE for all types of deployments. But every release adds more and more new stuff, so very soon footprint and class-loading speed will become a real bottleneck. So if we are looking for more performance, and efficiency in JRE installation and upgrade, it’s time to break-up the JRE.
Posted by: sutanu on November 18, 2004 at 11:54 AM
-
Hi Peter.
Usually there is very little info from the HotSpot team about directions and possible future optimizations of HotSpot compilers. I'm particularly interested in two things:
1.) tiered compilation, which elegantly solves that "faster startup or better running speed" problem, and could simplify the ergonomics of starting up a VM even further (no need to specify -client or -server anymore)
2.) Escape analysis - either performed by the JVM (probably difficult because of the dynamic nature of Java) or enabled through some programming practices, or even with programmer hints through the new metadata facility
Any plans for those two in Mustang/Dolphin timeframe?
Posted by: selendic1 on November 19, 2004 at 03:33 AM