Skip to main content

Hashmap Implementation

Posted by tchangu on June 18, 2005 at 10:40 AM PDT

Always thought that the Hash implementations (table & map) used buckets that are prime in number. But recently, after digging around the source code to see what that default size resulted in a bit of a puzzling situation...

So let me start with a bit of rambling of my understanding about hashtables. They are created by using a hashing function to hash the keys into buckets along with associated value. It is possible that different keys might hash into the same bucket causing Hash collision. So the goal of hash table design is to spread out the key pairs across the table. When the hash value (from hashCode) is converted into a valid array index by performing a modulus operation against the size of the hashtable, having the hashtable size as a prime number results in uniform spread of indexes. Of course java.util.Hashtable doesn't deviate from this and can be seen in the following code snippet

int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % tab.length;

The java.util.Hashtable initializes the table with default value of 11 (a prime) as it can be seen here
public Hashtable() {
this(11, 0.75f);
}

The new Hashmap implementation, however, takes a different approach and, unfortunately, it took a long time for me to even look at it. The new implementation breaks the conventional thinking of needing a prime number for initializing the number of hash buckets. It now requires that size of the hash table be power of two!!!

Accordingly, the constructor for the hashmap is

public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR;
threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
table = new Entry[DEFAULT_INITIAL_CAPACITY];
init();
}

where DEFAULT_INITIAL_CAPACITY is 16!

Now scope this out. The get method

public V get(Object key) {
Object k = maskNull(key);
int hash = hash(k);
int i = indexFor(hash, table.length);

performs an extra hash call and the comment in the code specifies
"Returns a hash value for the specified object. In addition to
the object's own hashCode, this method applies a "supplemental
hash function," which defends against _poor_ quality hash functions.
This is critical because HashMap uses power-of two length
hash tables.

The shift distances in this function were chosen as the result
of an automated search over the entire four-dimensional search space.
"

and the code snippet is like this
static int hash(Object x) {
int h = x.hashCode();

h += ~(h << 9);
h ^= (h >>> 14);
h += (h << 4);
h ^= (h >>> 10);
return h;
}

I got really curious about those numbers. 9, 14, 4 & 10. What were the basis for those numbers? So not being able to contain my curiosity I wrote to Doug Lea and Doug kindly replied and his reply is...

"...The best answer is to read Donald Knuth's Art of Computer
Programming sections on hash tables where it explains
why, how, and when using power-of-two sizes works best.
In general, you need to use pre-conditioning functions
(as seen HashMap.hash) to make them work well, but given
these, they are faster than prime-number techniques...
"

So I am now waiting to check this book out see why power of two works? However I am not sure if I will be able to get an answer for those numbers!

Your thoughts???

Related Topics >>