The Source for Java Technology Collaboration
User: Password:



David Walend

David Walend's Blog

How big is that boolean[]?

Posted by dwalend on November 14, 2007 at 06:29 PM | Comments (4)

My day job is software architect for a bunch of algorithm scientists who "feel the need for speed." We work to keep the system safe from the evils of premature optimization while finding hot spots that really speed things up. I won't bore you with the details, but we are computing a lot of solutions to a problem in advance, and looking up the answer in a big multidimensional boolean array. For this particular puzzle, the size and accuracy of the problem we can solve depends on how big a monster boolean array we can pack into memory.

We were blowing through the top of the JVM's heap violently via OutOfMemoryErrors, so I went digging with the profiler, and digging deeper with Google. From Sun's tutorial on primitive types, "This data type represents one bit of information, but its "size" isn't something that's precisely defined." That's really freaky, given that bytes, shorts, ints, and longs are precisely defined, that floats and doubles are precisely defined by IEEE 754, and that chars are precisely defined by Unicode.

Stack Frames

A peak at bytecodes showed me that booleans are bigger than a bit. In a stack frame, booleans take 32 bits. That makes sense; the usual JVM is a 32 bit machine. From a discussion forum posting, you can see that the operations are all integer operations (starting with 'i') in the bytecodes, with 31 unused bits in the register:

0 iconst_0
1 istore_1
2 iconst_0
3 istore_2

In the Heap

Arrays of booleans are also bigger than I thought. The profiler and cheezy experiments were telling me booleans were about 8 times bigger than they should be. Google came through on this one, too, in another forum. In my JVM, boolean array elements are bytes.

Other People's Bits

A little more digging showed me that a BitSet uses a long, and an EnumSet uses a long or arrays of longs to hold their bits, not boolean arrays. From BitSet.java:

    /**
     * The bits in this BitSet.  The ith bit is stored in bits[i/64] at
     * bit position i % 64 (where bit position 0 refers to the least
     * significant bit and 63 refers to the most significant bit).
     * INVARIANT: The words in bits[] above unitsInUse-1 are zero.
     *
     * @serial
     */
    private long bits[];  // this should be called unit[]

Arrays of booleans aren't java.util's choice. They won't be mine.

More Questions

Remaining questions: Why use longs and long[]s instead of ints and int[]s? Are the java.util programmers anticipating 64-bit machines? Is there something inherently more efficient about longs, which are usually less efficient? Or are the library writers trying to cover 33-64 members more efficiently and giving themselves twice the maximum upper size limit?

Does the weak definition of a boolean rate a bug or a request for enhancement? It seems to break write-once-run-anywhere because I can't predict how much memory my application requires. Should the compiler convert boolean[]s into the appropriate primitive or primitive array without us having to think about it? Do we want a sizeof() method?


Bookmark blog post: del.icio.us del.icio.us Digg Digg DZone DZone Furl Furl Reddit Reddit
Comments
Comments are listed in date ascending order (oldest first) | Post Comment

  • boolean[] will usually use the smallest addressable memory unit, typically a byte in most current machines. Using a bit would introduce difficulties in maintaining the concurrency expectations where multiple threads are modifying a boolean array. Every write to a bit array would in fact be a read/modify/write cycle and require synchronization of some sort.. Otherwise if two threads were modifying different bits that were held in the same 'word', the update from one thread might be lost.

    Posted by: mthornton on November 15, 2007 at 12:58 AM

  • If boolean[] used one bit per boolean, writes would have to be implemented as a read, logical operation then write. Concurrent writes on nearby bits would then be a problem.

    If you need multi-thread access, you can go faster than BitSet with external synchronisation. java.util.concurrent.atomic.AtomicIntegerArray/AtomicLongArray.compareAndSet can atomically check writes overwrite the value you expected. Your code can then retry if another thread got there first.

    Posted by: tackline on November 15, 2007 at 04:07 AM

  • Hi David

    You might be interested in a blog entry I just published. I had been meaning for sometime to publish such an article and when I came across your blog entry I felt the timing was right.
    How much does that Object cost in the VM?
    http://blog.jinspired.com/?p=174

    Kind regards,

    William Louth
    JXInsight Product Architect
    CTO, JINSPIRED

    Posted by: wlouth on November 18, 2007 at 02:05 AM

  • Well, what about stopping guessing? All this is very well defined. There is no 'usually'.
    3.3.4 The boolean Type
    Although the Java virtual machine defines a boolean type, it only provides very limited support for it. There are no Java virtual machine instructions solely dedicated to operations on boolean values. Instead, expressions in the Java programming language that operate on boolean values are compiled to use values of the Java virtual machine int data type. The Java virtual machine does directly support boolean arrays. Its newarray instruction enables creation of boolean arrays. Arrays of type boolean are accessed and modified using the byte array instructions baload and bastore

    Posted by: abies on February 24, 2008 at 10:33 AM



Only logged in users may post comments. Login Here.


Powered by
Movable Type 3.01D
 Feed java.net RSS Feeds