Multithreaded Hash Table

Posted by elevy on April 17, 2007 at 8:49 AM PDT

If your application needs a "Hash Table" type of structure you have several options.

One is to use the java.util.Hashtable. In a multithreaded environment it is safe, but not very efficient: if several threads want to read from the hash table, they have to wait for each other. That wait is unnecessary, because there is no problem with multiple threads reading a structure concurrently.

Another option is the java.util.HashMap. It is not synchronized in any way, so in a multithreaded environment you have to ensure it is never modified concurrently, or you can corrupt its internal structure and run into serious memory problems.

I thought the solution was the static Collections.synchronizedMap method; I expected it to return a smarter implementation. But if you look at the source code you will realize that all it returns is a wrapper that synchronizes every call on a mutex, which happens to be the wrapper itself, so reads still cannot occur concurrently.
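What that wrapper does can be sketched in a few lines. This is a simplified sketch with my own names, not the real code (the actual class is an inner class of java.util.Collections and implements the full Map interface):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the wrapper Collections.synchronizedMap returns.
// Every operation, read or write, synchronizes on the same mutex.
class SynchronizedMapSketch<K, V> {
    private final Map<K, V> m;
    private final Object mutex;

    SynchronizedMapSketch(Map<K, V> m) {
        this.m = m;
        this.mutex = this; // the mutex is the wrapper object itself
    }

    public V get(Object key) {
        synchronized (mutex) { return m.get(key); } // readers serialize here too
    }

    public V put(K key, V value) {
        synchronized (mutex) { return m.put(key, value); }
    }
}
```

Because both get and put contend for the same monitor, two readers block each other exactly as with Hashtable.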

In the Jakarta Commons project there is an implementation called FastHashMap. It has a property called fast. If fast is true, reads are not synchronized, and writes perform the following steps:

  • Clone the current structure
  • Perform the modification on the clone
  • Replace the existing structure with the modified clone

This implementation works fine for applications that write everything they need into the structure at startup and then only perform reads. As you can see, writing into this structure is expensive, so if you need to do some writes every now and then, it can introduce a significant memory and performance cost.
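The three write steps above can be sketched as follows. This is a minimal copy-on-write sketch with my own names, not Jakarta's actual FastHashMap code:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal copy-on-write map sketch: reads are unsynchronized,
// writes clone the whole structure and swap it in.
class CopyOnWriteMapSketch<K, V> {
    private volatile Map<K, V> map = new HashMap<K, V>();

    public V get(Object key) {
        return map.get(key); // unsynchronized read of the current snapshot
    }

    public synchronized V put(K key, V value) {
        Map<K, V> clone = new HashMap<K, V>(map); // 1. clone the current structure
        V old = clone.put(key, value);            // 2. modify the clone
        map = clone;                              // 3. replace the structure
        return old;
    }
}
```

The volatile field ensures readers see the fully built clone, and the full copy on every put is exactly why frequent writes become expensive.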

I wrote a very simple implementation. It creates a ReentrantReadWriteLock; every read operation acquires the read lock, and every operation that modifies the structure acquires the write lock.

Here is an excerpt from the code:

    public class FastSynchronizedMap<K, V> implements Map<K, V>,
        Serializable {

        private final Map<K, V> m;
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // ...

        public V get(Object key) {
            lock.readLock().lock();
            V value = null;
            try {
                value = m.get(key);
            } finally {
                lock.readLock().unlock();
            }
            return value;
        }

        public V put(K key, V value) {
            lock.writeLock().lock();
            V v = null;
            try {
                v = m.put(key, value);
            } finally {
                lock.writeLock().unlock();
            }
            return v;
        }

        // ...
    }

Note the try/finally block: we want to guarantee that the lock is released no matter what problem is encountered in the body.

This implementation works well when you have mostly read operations and almost no writes.
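To see why the read-write lock pays off, here is a small self-contained demo (separate from the map class above, names are mine) showing that a second thread can acquire the read lock while another thread already holds it:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Demonstrates that read locks are shared: a second reader
// acquires the lock while the first reader still holds it.
public class ReadLockDemo {
    static boolean concurrentReadersAllowed() {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final CountDownLatch acquired = new CountDownLatch(1);
        lock.readLock().lock(); // this thread holds a read lock throughout
        Thread reader = new Thread(new Runnable() {
            public void run() {
                lock.readLock().lock(); // shared mode: acquired without waiting
                acquired.countDown();
                lock.readLock().unlock();
            }
        });
        reader.start();
        try {
            // true if the second reader got the lock while we still hold ours
            return acquired.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return false;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("second reader got the lock: " + concurrentReadersAllowed());
    }
}
```

With a plain synchronized map the second reader would block until the first one finished; here both proceed at once.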

When we insert into a hash table, the hash code of the key determines which section of the structure we modify. We could therefore create a lock for each section; that way, even concurrent writes can happen at the same time.
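The one-lock-per-section idea can be sketched like this. This is a toy version with my own names, far simpler than the real thing:

```java
import java.util.HashMap;
import java.util.Map;

// Toy lock-striping map: the key's hash picks a segment, and each
// segment has its own lock, so writes to different segments can
// proceed concurrently.
class StripedMapSketch<K, V> {
    private static final int SEGMENTS = 16;
    private final Object[] locks = new Object[SEGMENTS];
    private final Map<K, V>[] maps;

    @SuppressWarnings("unchecked")
    StripedMapSketch() {
        maps = (Map<K, V>[]) new Map[SEGMENTS];
        for (int i = 0; i < SEGMENTS; i++) {
            locks[i] = new Object();
            maps[i] = new HashMap<K, V>();
        }
    }

    private int segmentFor(Object key) {
        // mask off the sign bit so the index is never negative
        return (key.hashCode() & 0x7fffffff) % SEGMENTS;
    }

    public V put(K key, V value) {
        int s = segmentFor(key);
        synchronized (locks[s]) { return maps[s].put(key, value); }
    }

    public V get(Object key) {
        int s = segmentFor(key);
        synchronized (locks[s]) { return maps[s].get(key); }
    }
}
```

Two threads writing keys that hash to different segments never contend for the same lock.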

This is the approach followed by the ConcurrentHashMap in the concurrency API, and its implementation is interesting to look at.

There is an inner class called Segment, which is a "specialized version of hash tables". When you perform an insert or read operation on a ConcurrentHashMap, it first finds the Segment the key belongs to, then performs the operation on that Segment.

Interestingly enough, Segment uses a ReentrantLock (actually it extends ReentrantLock, I guess to save an instance). The lock is only used when writing to the table; reads take no lock at all.

If you are aware of the Java Memory Model, you may be asking yourself: what about visibility? Can you see a partially constructed object when you access it?

Actually, the value field of the HashEntry that holds the value is declared volatile. That way you are guaranteed to read the value only once it has been fully initialized.

And there is a check after performing the get to make sure a null reference is not being returned (this can happen if the compiler happens to reorder the operations). When a null is seen, the code takes the lock to guarantee proper visibility and then reads the value again.
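Put together, the read path described in the last few paragraphs looks roughly like this. This is my own heavily simplified sketch, not the real code: the whole bucket lookup is collapsed into a single entry field, and only the locking and volatile logic is kept:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the Segment read path: an unsynchronized volatile read,
// with a fallback that rereads under the lock if a null is observed.
class SegmentReadSketch<K, V> extends ReentrantLock {
    static class HashEntry<K, V> {
        final K key;
        volatile V value; // volatile: readers see a fully initialized value
        HashEntry(K key, V value) { this.key = key; this.value = value; }
    }

    HashEntry<K, V> entry; // stand-in for the real hash bucket lookup

    V get(Object key) {
        HashEntry<K, V> e = entry;
        if (e != null && e.key.equals(key)) {
            V v = e.value;                    // unsynchronized volatile read
            if (v != null)
                return v;
            return readValueUnderLock(e);     // rare case: recheck under the lock
        }
        return null;                          // key not present
    }

    V readValueUnderLock(HashEntry<K, V> e) {
        lock();
        try {
            return e.value;
        } finally {
            unlock();
        }
    }
}
```

Note how the class itself extends ReentrantLock, mirroring the trick mentioned above, and how the lock is only ever taken on the slow path.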

Which structure to use depends on the scenario. Understanding how each of them works is the first step toward making the right decision.

If you are working on a web application, loading some data into a hash table when the Servlet is initialized, and then just reading that data from concurrent requests, a HashMap should suffice. If you update the table from time to time in a very controlled way, an implementation like the one I proposed above will do the job. If you are going to update the data often, a ConcurrentHashMap will actually give you better performance. However, if you want to iterate over the data with guarantees that it is consistent, then maybe a Hashtable is the answer.

For those of you who are interested in concurrency, I would encourage you to look at the implementation of the ConcurrentHashMap. I would say that it is a nice source to learn from.
