Skip to main content

Fun performance regression with (Open)JDK 7u4

Posted by fabriziogiudici on May 14, 2012 at 3:18 PM PDT

Testing for features is not enough: you should always test also for performance. For instance, a test could assert that a given task is executed under a certain time (which is not always easy, and if you run tests under CI, that is on a server which performs multiple tasks, things can be slower or faster depending on available CPU). Testing for performance is important because sometime you can degrade the performance of your code during a refactoring, or just updating a library or the Java runtime.

A manual performance test is usually easier (and in simpler scenarios can be enough) because you can make sure that the computer you're running it on is always in a well-known state. My SolidBlue application computes batches of MD5 fingerprints of files for integrity verification purposes and I usually test it against my 400GB directory of photos.

SolidBlue is optimized and designed around an actor system. In particular, a class named FileLoaderActor receives messages notifying that a certain file has been discovered in a scan and reads its contents, firing another notification.

The code is optimized for having a single instance of the actor (magnetic discs usually perform better when you do a single operation a time) that tries to spend most of its time by loading from the disk, thus saturating the I/O channel. MD5 is computed by other actors in separate threads. So far I've been able to achieve 90% of the speed of my disk, which sounds good.

To my surprise, when I ported the application to OpenJDK 7u4 for Mac OS X, I've noticed a dramatic performance hit (roughly 10x slower). For the limited test I've tried, the same happens with the Oracle JDK 7u4.

Thanks to the help from the macosx-port-dev(@)openjdk.java.net mailing list, I've found that the problem is a regression of JDK 7 that, fortunately, can be worked around.

I perform I/O by means of NIO channels:

FileInputStream fis = new FileInputStream(file);
ByteBuffer byteBuffer = fis.getChannel().map(READ_ONLY, 0, file.lenght()).load();

This usually is the faster method for I/O, but it also gives the advantage that mapped byte buffers don't consume heap memory: file contents are instead mapped to virtual memory. This means that I can process very large files (e.g. a 500MB video), perhaps many at the same time, without having to allocate a large heap. This can happen because methods for computing MD5 can directly operate on a ByteBuffer, including a mapped one.

The load() method, according to the javadoc, "makes the best effort" to have file contents in physical memory. This happens by accessing a sequence of single bytes, each for a physical page (a bunch of consecutive bytes whose size is defined by the operating system). This approach also allows for a very fast I/O since disks are faster when you perform sequential I/O.

Note this: the task was performed by native code in JDK 6, which was translated to Java code in JDK 7:

Unsafe unsafe = Unsafe.getUnsafe();
int ps = Bits.pageSize();
int count = Bits.pageCount(length);
long a = mappingAddress(offset);
for (int i=0; i < count; i++) {
    unsafe.getByte(a);
    a += ps;
}

What I've discovered is that with OpenJDK 7u4 (at least on Mac OS X, but I think it's a general problem) load() doesn't do anything: a quick test reveals that the method returns immediately and no disk activity is recorded. From a functional point of view the application still works because the data in the ByteBuffer is loaded on demand by the MD5 code. But the MD5 library probably doesn't access data sequentially, thus triggering a very inefficient sequence of page loads.

It turns out that the "culprit" is HotSpot, which is able to detect that the accessed bytes aren't really used. Thus it doesn't generate native code for it! Of course, it's not a bug of HotSpot, which is just doing an advanced performance optimization; it's the code in the load() method that needs to be fixed.

A way to work around the problem is to add this command line option:

-XX:CompileCommand=exclude,java/nio/MappedByteBuffer,load

which instructs HotSpot not to compile the load() method to native code. Fun, isn't it? In this strange scenario disabling HotSpot makes for faster code :-)

 

You can reproduce the problem by testing the following code without and with the -XX option described above.

import java.util.Random;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import static java.nio.channels.FileChannel.MapMode.READ_ONLY;

/***********************************************************************************************************************
*
* A self-contained test file to exercise an OpenJDK 7u4 bug:
*
* http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7168505.
*
* @author  fritz
* @version $Id: IoPerformanceTest.java,v 21d49fdefe53 2012/05/14 12:52:11 fabrizio $
*
**********************************************************************************************************************/
public class IoPerformanceTest
  {
    private final static double MEGA =  1024 * 1024;
    private final static int MIN_FILE_SIZE =  10 * 1000 * 1000;
    private final static int MAX_FILE_SIZE = 100 * 1000 * 1000;
   
    private File testFileFolder;
   
    public static void main (final String ... args)
      throws Exception
      {
        final IoPerformanceTest test = new IoPerformanceTest();
        test.createTestFiles();
        test.test();
      }
           
    private void createTestFiles()
      throws IOException
      {
        System.err.println("Creating test files...");
        testFileFolder = new File(System.getProperty("java.io.tmpdir"));
        testFileFolder.mkdirs();
        final Random random = new Random(342345426536L);
       
        for (int f = 0; f < 20; f++)
          {
            final File file = new File(testFileFolder, "testfile" + f);
            System.err.println(">>>> creating " + file.getAbsolutePath());
            int size = MIN_FILE_SIZE + random.nextInt(MAX_FILE_SIZE - MIN_FILE_SIZE);
            final byte[] buffer = new byte[size];
            random.nextBytes(buffer);
            final FileOutputStream fos = new FileOutputStream(file);
            fos.write(buffer);
            fos.close();
          }
      }
   
    public void test()
      throws Exception
      {
        final long startTime = System.currentTimeMillis();
        long size = 0;
       
        for (int f = 0; f < 20; f++)
          {
            final File file = new File(testFileFolder, "testfile" + f).getAbsoluteFile();
            final FileInputStream fis = new FileInputStream(file);
            final ByteBuffer byteBuffer = nioRead(fis, (int)file.length());
            fis.close();
            size += file.length();
          }
       
        final long time = System.currentTimeMillis() - startTime;
        System.err.printf("Read %d MB, speed %d MB/sec\n", (int)(size / MEGA), (int)(((size / MEGA) / (time / 1000.0))));
      }
   
    private ByteBuffer nioRead (final FileInputStream fis, final int length)
      throws IOException
      {
        return fis.getChannel().map(READ_ONLY, 0, length).load();
      }
   
    private ByteBuffer ioRead (final FileInputStream fis, final int length)
      throws IOException
      {
        final byte[] bytes = new byte[length];
        fis.read(bytes);
        return ByteBuffer.wrap(bytes);
      }
  }
Related Topics >>

Comments

Thanks for reporting the issue - Alan Bateman has posted a ...

Thanks for reporting the issue - Alan Bateman has posted a fix for review on the nio-dev mailing list:
http://mail.openjdk.java.net/pipermail/nio-dev/2012-May/001672.html .

Dalibor Topic
Principal Product Manager
Java Platform Group @ Oracle

Excellent. Superfast and in the open. :-)

Excellent. Superfast and in the open. :-)