Perception == Reality
Perception vs. Reality: How big is that process?
One of the issues we have run into in our work on the Java performance team is the difference between the real physical memory size of a process and the perceived size of that process.
"Perceived" footprint is the amount of RAM that the operating system reports a process as using (through tools such as "ps" and "top" on Unix/Linux, and "Task Manager" on Windows). In reality, the memory a process occupies consists of executable code, global data, stack, heap, and so on. To an end user running one of these process-viewing tools, however, the process can look much larger, because the tools sometimes count memory that is not actually resident in RAM at all.
The gap between perceived and real footprint is most significant when memory mapping is used, particularly on Solaris and Linux. Memory mapping is a technique that maps a file into the address space of a process so that file I/O, which would otherwise require "read()" and "write()" system calls, becomes routine memory access. For example, you can read the contents of the file simply by dereferencing a pointer into the mapped region.
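To make the contrast concrete, here is a minimal Java sketch that fetches the first byte of a file both ways: once by reading the file into a heap array, and once by mapping it and indexing the mapped buffer. The class name, the helper methods, and the temporary sample file are all made up for illustration; this is not how the JDK handles jar files internally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: compares conventional reads with memory mapping.
class MmapVersusRead {

    // Conventional I/O: copy the file's bytes into a heap array, then index it.
    static byte firstByteViaRead(Path file) {
        try {
            byte[] contents = Files.readAllBytes(file);
            return contents[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Memory mapping: map the file into the address space and index the
    // mapped buffer directly, much like dereferencing a pointer in C.
    static byte firstByteViaMmap(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return mapped.get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a small temporary file so the example is self-contained.
    static Path writeSampleFile() {
        try {
            Path file = Files.createTempFile("mmap-demo", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, "hello".getBytes(StandardCharsets.US_ASCII));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSampleFile();
        System.out.println((char) firstByteViaRead(file));
        System.out.println((char) firstByteViaMmap(file));
    }
}
```

Both methods return the same byte. The difference is that the mapped version reserves address space for the entire file up front, whether or not the rest of it is ever accessed.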
Memory mapping not only greatly simplifies programs that deal with file I/O, but also allows multiple processes to share the same underlying file data. The downside is that process-viewing tools such as "ps" and "top" include the full size of the memory-mapped region even if you never touch the file at all.
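On Linux, one rough way to see this gap for yourself is to compare a process's virtual size with its resident set size. The sketch below assumes a Linux system that exposes /proc/self/status with "VmSize" and "VmRSS" fields reported in kB; the class and method names are hypothetical. It is only an approximation of the perceived versus real distinction, not the measurement we used for the numbers in this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: extracts memory fields from /proc/<pid>/status text.
class FootprintReport {

    // Returns the value in kB of a field such as "VmSize" or "VmRSS",
    // or -1 if the field does not appear in the given lines.
    static long fieldInKb(List<String> statusLines, String field) {
        String prefix = field + ":";
        for (String line : statusLines) {
            if (line.startsWith(prefix)) {
                // A line looks like "VmSize:   204800 kB"; keep the number.
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Linux-style procfs. Other platforms simply skip the report.
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status is not available on this platform");
            return;
        }
        List<String> lines = Files.readAllLines(status);
        // VmSize covers all mapped address space, touched or not.
        System.out.println("virtual size (kB):  " + fieldInKb(lines, "VmSize"));
        // VmRSS covers only the pages currently resident in RAM.
        System.out.println("resident size (kB): " + fieldInKb(lines, "VmRSS"));
    }
}
```

Mapping a large file and never reading from it should raise the first number while leaving the second largely unchanged.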
Why does this matter to Java?
Java has historically memory mapped all of the jar files in the JDK into its address space on Unix platforms. One of these jar files is "rt.jar", which contains all of the class files in the JRE and is about 40 megabytes in size. Obviously, that makes Java look a lot bigger than it actually is.
Why do we care about perception?
The main reason we care about perceived footprint is that, in the eyes of end-users, Java processes look much larger than they actually are, which makes Java look like a resource hog. End users will not know or care about distinctions between perceived and real footprint, nor the processes for measuring footprint with memory mapped files; they will just see a Java process taking up what looks like a lot of memory. Why not give these users information that is closer to reality and avoid the speculation and misinformation that arises from the perceived footprint numbers?
What are we doing about it?
As of build 45 in Mustang, we have fixed this problem and made perception align better with reality. Instead of memory mapping jar files, we now read their contents from the files just as we have always done on Windows. We still memory map a very small portion of the file, called the "central directory", since that section contains information which is used frequently and can benefit from the sharing and mapping capabilities of mmap. But otherwise, the contents are simply read in as needed with normal file loading operations.
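The general shape of this approach can be sketched in Java: map only a small region at the end of a file, and fetch everything else with ordinary positional reads when it is needed. This is an illustration only. The class name, the method names, and the toy file layout (a body followed by a 5-byte "index") are assumptions, and the code does not parse a real jar central directory or mirror the actual JDK implementation.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: maps a small tail region, reads the rest on demand.
class TailMappedFile {

    // Maps only the last tailLength bytes of the file (or the whole file if
    // it is shorter). The mapping stays valid after the channel is closed.
    static ByteBuffer mapTail(Path file, int tailLength) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            long start = Math.max(0L, size - tailLength);
            return channel.map(FileChannel.MapMode.READ_ONLY, start, size - start);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads length bytes starting at offset with ordinary positional reads,
    // so only the requested range is ever brought into the process.
    static byte[] readRange(Path file, long offset, int length) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            while (buffer.hasRemaining()) {
                int n = channel.read(buffer, offset + buffer.position());
                if (n < 0) {
                    throw new EOFException("range extends past end of file");
                }
            }
            return buffer.array();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes a buffer as ASCII without disturbing its position.
    static String asString(ByteBuffer buffer) {
        return StandardCharsets.US_ASCII.decode(buffer.duplicate()).toString();
    }

    // Toy layout for the demo: 10 body bytes followed by a 5-byte "index".
    static Path writeSampleFile() {
        try {
            Path file = Files.createTempFile("tail-demo", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, "BODY-BYTESINDEX".getBytes(StandardCharsets.US_ASCII));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSampleFile();
        System.out.println(asString(mapTail(file, 5)));
        System.out.println(new String(readRange(file, 0, 4), StandardCharsets.US_ASCII));
    }
}
```

Only the small tail region contributes to the mapped size of the process; the body is paid for only when a range is actually read.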
What about the sharing benefits of mmap?
Actually, the contents of rt.jar are not shared very much in practice. For one thing, we introduced class data sharing in J2SE 5.0, which loads most of the core class content from a different file entirely and ignores most of the contents in rt.jar. But even without class data sharing, we would typically only use the mmap'd data once per process, at startup, and this sharing is not worth maintaining long-lasting data structures (or long-lasting footprint penalties).
By making this change, we brought the perceived footprint down by around 55% on Solaris and 25% on Linux, as measured with a set of internal benchmarks.
But Wait, There's More!
The main reason we pursued this project was to combat the perceived footprint issue, and by the numbers quoted above, we think that goal was achieved. But another huge benefit fell out of this work: we are actually seeing an 11% decrease in real footprint on Solaris and Linux. It turns out that rewriting this code and sharing the implementation with Windows (which always used read() instead of mmap()) made the implementation considerably more efficient and resulted in a substantial drop in real footprint.
Another benefit is simply more efficient use of the memory resources at our disposal. This change reduces the risk of address space exhaustion (resulting in java.lang.OutOfMemoryError) on 32-bit platforms when opening very large jar files. Also, mmap'd jar files no longer compete with other memory demands such as the regular Java heap and native memory.
Perception == Reality
25 - 55% drop in perceived process memory size? 11% drop in real memory footprint? No matter how you perceive it, the benefits are real.