Wednesday, 13 May 2015

How HashMap internally works in Java

Hash Map is one of the most used collection, though it will be surprising to know that maps themselves are not collections because they don't implement Collection interface. However collection view of a map can be obtained using entrySet() method. To obtain a collection-view of the keys, keySet() method can be used.

Coming to the HashMap implementation in Java, which is also a favorite Java Collections interview question, there are four things we should know about before going into the internals of how does HashMap work in Java -

  • HashMap works on the principal of hashing.
  • Map.Entry interface - This interface gives a map entry (key-value pair). HashMap in Java stores both key and value object, in bucket, as an object of Entry class which implements this nested interface Map.Entry.
  • hashCode() - HashMap provides put(key, value) for storing and get(key) method for retrieving Values from HashMap. When put() method is used to store (Key, Value) pair, HashMap implementation calls hashcode on Key object to calculate a hash that is used to find a bucket where Entry object will be stored. When get() method is used to retrieve value, again key object is used to calculate a hash which is used then to find a bucket where that particular key is stored.
  • equals() - equals() method is used to compare objects for equality. In case of HashMap key object is used for comparison, also using equals() method Map knows how to handle hashing collision (hashing collision means more than one key having the same hash value, thus assigned to the same bucket. In that case objects are stored in a linked list, refer figure for more clarity.
    Where hashCode method helps in finding the bucket where that key is stored, equals method helps in finding the right key as there may be more than one key-value pair stored in a single bucket.

** Bucket term used here is actually an index of array, that array is called table in HashMap implementation. Thus table[0] is referred as bucket0, table[1] as bucket1 and so on.

How important it is to have a proper hash code and equals method can be seen through the help of the following program -

public class HashMapTest {
    public static void main(String[] args) {
        Map <Key, String> cityMap = new HashMap<Key, String>();
        cityMap.put(new Key(1, "NY"),"New York City" );
        cityMap.put(new Key(2, "ND"), "New Delhi");
        cityMap.put(new Key(3, "NW"), "Newark");
        cityMap.put(new Key(4, "NP"), "Newport");

        System.out.println("size before iteration " + cityMap.size());
        Iterator <Key> itr = cityMap.keySet().iterator();
        while (itr.hasNext()){
            System.out.println(cityMap.get(itr.next()));     
        }
        System.out.println("size after iteration " + cityMap.size());    
    }
 
}

// This class' object is used as key
// in the HashMap
class Key{
 int index;
 String Name;
 Key(int index, String Name){
  this.index = index;
  this.Name = Name;
 }
 
 @Override
 // A very bad implementation of hashcode
 // done here for illustrative purpose only 
 public int hashCode(){
  return 5;
 }
 
 @Override
 // A very bad implementation of equals
 // done here for illustrative purpose only 
 public boolean equals(Object obj){
  return true;
 }
 
}

Output

size before iteration 1
Newport
size after iteration 1

Understanding the Code

Lets get through the code to see what is happening, this will also help in understanding how put works internally.

Notice that I am inserting 4 values in the HashMap, still in the output it says size is 1 and iterating the map gives me the last inserted entry. Why is that?

Answer lies in, how hashCode() and equals() method are implemented for the key Class. Have a look at the hashCode() method of the class Key which always returns "5" and the equals() method which is always returning "true".

When a value is put into HashMap it calculates a hash using key object and for that it uses the hashCode() method of the key object class (or its parent class). Based on the calculated hash value HashMap implementation decides which bucket should store the particular Entry object.

In my code the hashCode() method of the key class always returns "5". This effectively means calculated hash value, is same for all the entries inserted in the HashMap. Thus all the entries are stored in the same bucket.

Second thing, a HashMap implementation does is to use equals() method to see if the key is equal to any of the already inserted keys (Recall that there may be more than one entry in the same bucket). Note that, with in a bucket key-value pair entries (Entry objects) are stored in a linked-list (Refer figure for more clarity). In case hash is same, but equals() returns false (which essentially means more than one key having the same hash or hash collision) Entry objects are stored, with in the same bucket, in a linked-list.

In my code, I am always returning true for equals() method so the HashMap implementation "thinks" that the keys are equal and overwrites the value. So, in a way using hashCode() and equals() I have "tricked" HashMap implementation to think that all the keys (even though different) are same, thus overwriting the values.

In a nutshell there are three scenarios in case of put() -

  • Using hashCode() method, hash value will be calculated. Using that hash it will be ascertained, in which bucket particular entry will be stored.
  • equals() method is used to find if such a key already exists in that bucket, if no then a new node is created with the map entry and stored within the same bucket. A linked-list is used to store those nodes.
  • If equals() method returns true, which means that the key already exists in the bucket. In that case, the new value will overwrite the old value for the matched key.

Pictorial representation of how Entry (key-value pair) objects will be stored in table array

How get() methods works internally

As we already know how Entry objects are stored in a bucket and what happens in the case of Hash Collision it is easy to understand what happens when key object is passed in the get method of the HashMap to retrieve a value.

Using the key again hash value will be calculated to determine the bucket where that Entry object is stored, in case there are more than one Entry object with in the same bucket stored as a linked-list equals() method will be used to find out the correct key. As soon as the matching key is found get() method will return the value object stored in the Entry object.

In case of null Key

As we know that HashMap also allows null, though there can only be one null key in HashMap. While storing the Entry object HashMap implementation checks if the key is null, in case key is null, it always map to bucket 0 as hash is not calculated for null keys.

HashMap changes in Java 8

Though HashMap implementation provides constant time performance O(1) for get() and put() method but that is in the ideal case when the Hash function distributes the objects evenly among the buckets.

But the performance may worsen in the case hashCode() used is not proper and there are lots of hash collisions. As we know now that in case of hash collision entry objects are stored as a node in a linked-list and equals() method is used to compare keys. That comparison to find the correct key with in a linked-list is a linear operation so in a worst case scenario the complexity becomes O(n).

To address this issue in Java 8 hash elements use balanced trees instead of linked lists after a certain threshold is reached. Which means HashMap starts with storing Entry objects in linked list but after the number of items in a hash becomes larger than a certain threshold, the hash will change from using a linked list to a balanced tree, this will improve the worst case performance from O(n) to O(log n).

Points to note -

  • HashMap works on the principal of hashing.
  • HashMap uses the hashCode() method to calculate a hash value. Hash value is calculated using the key object. This hash value is used to find the correct bucket where Entry object will be stored.
  • HashMap uses the equals() method to find the correct key whose value is to be retrieved in case of get() and to find if that key already exists or not in case of put().
  • Hashing collision means more than one key having the same hash value, in that case Entry objects are stored as a linked-list with in a same bucket.
  • With in a bucket values are stored as Entry objects which contain both key and value.
  • In Java 8 hash elements use balanced trees instead of linked lists after a certain threshold is reached while storing values. This improves the worst case performance from O(n) to O(log n).

That's all for this topic how HashMap internally works in Java. If you have any doubt or any suggestions to make please drop a comment. Thanks!


Related Topics

  1. How ArrayList works internally in Java
  2. How Linked List class works internally in Java
  3. How HashSet works internally in Java
  4. fail-fast Vs fail-safe iterator in Java
  5. Overriding hashCode() and equals() method in Java
  6. How to loop through a map in Java
  7. ConcurrentHashMap in Java

You may also like -

31 comments:

  1. Good and different type of explanation for HashMap internal working.

    ReplyDelete
    Replies
    1. Indeed nice and clear explanation. You complement my post How get method of HashMap works in Java, let me know if you find it useful.

      Delete
    2. Really good and much informative information on internal working on HashMap.Thank you very much!!

      Delete
  2. its very nice and a very beautifully explained ,everyone can easily understand

    ReplyDelete
  3. Really a very good explanation... Simple and straight forward.

    ReplyDelete
  4. Thanks for simple and complete explanation!!!

    ReplyDelete
  5. Very nice explanation :) Thanks.

    ReplyDelete
  6. Thanks for writing, it is very useful...

    ReplyDelete
  7. Can you please explain how iterator works internally in HashMap.

    ReplyDelete
  8. Can you give me any real time example like hash map that when we will go for the array list and linked list.
    Note- Don't say for retrieval array list and manipulation linked list
    I need great example.

    ReplyDelete
    Replies
    1. Please go through the link Difference between ArrayList and LinkedList in Java to see the cases where these data structures work the best respectively. If you still have doubts please let me know. From my experience I can say in almost all of the cases ArrayList suffice. Only when you are updating the internal structure of the list extensively, usage of linkedlist can be considered.

      Delete
  9. Thanks for this. Pretty helpful.

    ReplyDelete
  10. Really, nice explanation...Thanks for your post..Please do post for important concepts..

    ReplyDelete
  11. This is the one of the best explainations I have found in the years. thanks for explaining it so well and easily, HashMap seems so easy now. :)

    ReplyDelete
  12. Very Good explanation...It's really helpful...

    ReplyDelete
  13. How in Hash map works in Java 8 in case of it uses tree inside bucket. Anyway because in a bucket the only way to check element exist or not by using equals and untill not traversing/checking all how can it be done, it means each object must implement comparable interface and it uses compare to and all element goes in sorted order. Please mention that.

    ReplyDelete
  14. very clear and crisp explanation.

    ReplyDelete
  15. very good explanation now i have clear my doubts

    ReplyDelete
  16. Thanks for the post buddy. It was informative.

    ReplyDelete
  17. Thanks for the post buddy. It was informative.

    ReplyDelete
  18. https://play.google.com/store/apps/details?id=brain.booster.app

    awesome app to increase brain power

    ReplyDelete
  19. i have a doubt if both keys are diiferent but values are same then how can hashmap avoid that.

    ReplyDelete
    Replies
    1. There is nothing to avoid in that case. I fail to understand how having different keys with same values pose any problem?

      Delete
  20. Can u please explain how balancer tree is worked internally in Java 8 to improve the worst case performance?

    ReplyDelete
  21. Nice explanation.Easy to understand. Thanks.

    ReplyDelete
  22. nice article
    www.codegeekslab.com

    ReplyDelete