hashCode Method

We know that a Map is a key-value mapping table that allows for quick lookup of corresponding values based on keys.

Taking HashMap as an example, observe the following code:

java

Map<String, Person> map = new HashMap<>();
map.put("a", new Person("Xiao Ming"));
map.put("b", new Person("Xiao Hong"));
map.put("c", new Person("Xiao Jun"));

map.get("a"); // Person("Xiao Ming")
map.get("x"); // null

The reason HashMap can directly retrieve a value based on a key is that it internally uses a space-for-time strategy. It stores all values in a large array and calculates the index where a value should be stored based on the key:

  ┌───┐
0 │   │
  ├───┤
1 │ ●─┼───▶ Person("Xiao Ming")
  ├───┤
2 │   │
  ├───┤
3 │   │
  ├───┤
4 │   │
  ├───┤
5 │ ●─┼───▶ Person("Xiao Hong")
  ├───┤
6 │ ●─┼───▶ Person("Xiao Jun")
  ├───┤
7 │   │
  └───┘

If the key is "a", the calculated index is always 1, so it returns the value Person("Xiao Ming"). If the key is "b", the index is always 5, returning Person("Xiao Hong"). This way, there's no need to traverse the entire array to read the value corresponding to a key.

The Problem with Keys

When we use a key to store and retrieve a value in a Map, an issue arises:

We put the key "a" as a string into the Map, but when retrieving the value, the key passed might not be the exact same key object.

In other words, two keys should have the same content but don't need to be the exact same object. Here's the test code:

java

import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        String key1 = "a";
        Map<String, Integer> map = new HashMap<>();
        map.put(key1, 123);

        String key2 = new String("a");
        map.get(key2); // 123

        System.out.println(key1 == key2); // false
        System.out.println(key1.equals(key2)); // true
    }
}

Because Map internally uses equals() to compare keys, similar to how List requires correctly overridden equals() for element lookup, correct usage of Map necessitates that the objects used as keys must properly override the equals() method.

We often use String as keys because String already correctly overrides the equals() method. However, if we use a custom class as a key, we must ensure that the equals() method is properly overridden.

Why `HashMap` Uses `hashCode()`

Let's reconsider why HashMap can directly calculate the index where a value is stored based on the key. If two key objects (determined equal by equals()) produce different indices, then the same key might retrieve different values each time.

The way HashMap calculates the index is by calling the hashCode() method of the key object, which returns an int. HashMap uses this method to directly locate the index corresponding to the key and then directly returns the value.

Therefore, to use Map correctly, you must ensure:

Override equals(): The object used as a key must correctly override the equals() method so that two key instances that are considered equal by equals() return true.
Override hashCode(): The object must also correctly override the hashCode() method, adhering strictly to the following rules:

If two objects are equal according to equals(), their hashCode() values must be equal.
If two objects are not equal according to equals(), their hashCode() values should, as much as possible, be different.

In other words, for two instances a and b:

If a and b are equal (a.equals(b) is true), then a.hashCode() must equal b.hashCode().
If a and b are not equal (a.equals(b) is false), then a.hashCode() and b.hashCode() should, as much as possible, be different.

The first rule is essential for correctness and must be guaranteed; otherwise, HashMap won't function properly.

The second rule, while not mandatory, helps ensure lookup efficiency. If different objects return the same hashCode(), it causes collisions within the Map, leading to longer chains of entries and reduced access speed.

Implementing `equals()` and `hashCode()`

We've already discussed how to correctly implement the equals() method in the section on writing the equals() method. Taking the Person class as an example:

java

public class Person {
    String firstName;
    String lastName;
    int age;
}

Identify the fields to compare:

firstName
lastName
age

For reference types, use Objects.equals() for comparison, and for primitive types, use ==.

Based on a correct implementation of equals(), we also need to implement hashCode() correctly, ensuring that instances with the same values for these fields produce the same hashCode():

java

public class Person {
    String firstName;
    String lastName;
    int age;

    @Override
    public int hashCode() {
        int h = 0;
        h = 31 * h + firstName.hashCode();
        h = 31 * h + lastName.hashCode();
        h = 31 * h + age;
        return h;
    }
}

Notice that the String class already correctly implements the hashCode() method. When calculating Person's hashCode(), repeatedly using 31 * h helps distribute different Person instances' hashCode() values evenly across the entire range of int.

However, similar to implementing equals(), if firstName or lastName is null, the above code will throw a NullPointerException. To handle this, we often use Objects.hash() when calculating hashCode():

java

@Override
public int hashCode() {
    return Objects.hash(firstName, lastName, age);
}

Principles for Writing equals() and hashCode():

Every field used in equals() for comparison must be used in hashCode() for calculation.
Fields not used in equals() must not be included in hashCode().

Additionally, note that there are no requirements for the objects used as values in a HashMap.

Since HashMap internally uses an array and calculates the index for storing a value using the key's hashCode(), the first question arises: The hashCode() method returns an int, which ranges up to approximately ±2.1 billion. Ignoring negative numbers, how large is the array inside HashMap?

In reality, when a HashMap is initialized, its default array size is only 16. Regardless of how large the hashCode() of any key is, it can be simplified by:

java

int index = key.hashCode() & 0xf; // 0xf = 15

This ensures the index is between 0 and 15, never exceeding the array's bounds. This algorithm is a simple implementation.

Second Question: What happens when more than 16 key-value pairs are added to a HashMap, and the array becomes full?

When the number of key-value pairs exceeds a certain threshold, HashMap automatically resizes its internal array, doubling its size each time (e.g., from 16 to 32). Consequently, the method for calculating the index based on the hashCode() changes to:

java

int index = key.hashCode() & 0x1f; // 0x1f = 31

Since resizing causes existing key-value pairs to be redistributed, frequent resizing can significantly impact HashMap's performance. If you know in advance that you'll be using a HashMap to store around 10,000 key-value pairs, it's better to specify the capacity when creating the HashMap:

java

Map<String, Integer> map = new HashMap<>(10000);

Although specifying a capacity of 10,000, the internal array size is always a power of two, so it will be initialized to the next power of two greater than 10,000, which is 16,384 (2¹⁴).

Final Question: If two different keys, such as "a" and "b", happen to have the same hashCode() (which is entirely possible since unequal instances are only required to have different hashCode() values as much as possible), what happens when we put:

java

map.put("a", new Person("Xiao Ming"));
map.put("b", new Person("Xiao Hong"));

Since both "a" and "b" calculate to the same array index, does the later "Xiao Hong" overwrite "Xiao Ming"?

Answer: Of course not! When using a Map, as long as the keys are different, their associated values do not interfere with each other. However, internally, HashMap may have different keys mapping to the same hashCode(), meaning the same array index. So, what does it do?

Suppose both "a" and "b" have a calculated index of 5. Instead of storing a single Person instance, HashMap actually stores a List at that index, containing two entries: one for "a" and one for "b".

  ┌───┐
0 │   │
  ├───┤
1 │   │
  ├───┤
2 │   │
  ├───┤
3 │   │
  ├───┤
4 │   │
  ├───┤
5 │ ●─┼───▶ List<Entry<String, Person>>
  ├───┤
6 │   │
  ├───┤
7 │   │
  └───┘

When retrieving, for example:

java

Person p = map.get("a");

Internally, HashMap finds the List<Entry<String, Person>> at the calculated index and then iterates through this list to find the Entry whose key field is "a", finally returning the corresponding Person instance.

We refer to the situation where different keys have the same hashCode() as a hash collision. In case of a collision, one simple solution is to use a List to store key-value pairs with the same hashCode(). Clearly, the higher the probability of collisions, the longer the List becomes, reducing the efficiency of Map's get() method. This is why it's crucial to adhere to the second rule:

Tip

If two objects are not equal, their hashCode() values should, as much as possible, not be equal.

The better the hashCode() method is implemented, the more efficiently HashMap operates.

Summary

To use HashMap correctly, the key's class must properly override both the equals() and hashCode() methods.
If a class overrides equals(), it must also override hashCode(). The overriding rules are:
- If equals() returns true, hashCode() must return the same value.
- If equals() returns false, hashCode() should, as much as possible, return different values.
Implementing the hashCode() method can be assisted by using the Objects.hash() method.

hashCode Method ​

The Problem with Keys ​

Why HashMap Uses hashCode() ​

Implementing equals() and hashCode() ​

Further Reading ​

Summary ​

hashCode Method

The Problem with Keys

Why `HashMap` Uses `hashCode()`

Implementing `equals()` and `hashCode()`

Further Reading

Summary