Appearance
hashCode Method
We know that a Map
is a key-value mapping table that allows for quick lookup of corresponding values based on keys.
Taking HashMap
as an example, observe the following code:
java
Map<String, Person> map = new HashMap<>();
map.put("a", new Person("Xiao Ming"));
map.put("b", new Person("Xiao Hong"));
map.put("c", new Person("Xiao Jun"));
map.get("a"); // Person("Xiao Ming")
map.get("x"); // null
The reason HashMap
can directly retrieve a value based on a key is that it internally uses a space-for-time strategy. It stores all values in a large array and calculates the index where a value should be stored based on the key:
┌───┐
0 │ │
├───┤
1 │ ●─┼───▶ Person("Xiao Ming")
├───┤
2 │ │
├───┤
3 │ │
├───┤
4 │ │
├───┤
5 │ ●─┼───▶ Person("Xiao Hong")
├───┤
6 │ ●─┼───▶ Person("Xiao Jun")
├───┤
7 │ │
└───┘
If the key is "a"
, the calculated index is always 1
, so it returns the value Person("Xiao Ming")
. If the key is "b"
, the index is always 5
, returning Person("Xiao Hong")
. This way, there's no need to traverse the entire array to read the value corresponding to a key.
The Problem with Keys
When we use a key to store and retrieve a value in a Map
, an issue arises:
We put the key "a"
as a string into the Map
, but when retrieving the value, the key passed might not be the exact same key object.
In other words, two keys should have the same content but don't need to be the exact same object. Here's the test code:
java
import java.util.HashMap;
import java.util.Map;
public class Main {
public static void main(String[] args) {
String key1 = "a";
Map<String, Integer> map = new HashMap<>();
map.put(key1, 123);
String key2 = new String("a");
map.get(key2); // 123
System.out.println(key1 == key2); // false
System.out.println(key1.equals(key2)); // true
}
}
Because Map
internally uses equals()
to compare keys, similar to how List
requires correctly overridden equals()
for element lookup, correct usage of Map
necessitates that the objects used as keys must properly override the equals()
method.
We often use String
as keys because String
already correctly overrides the equals()
method. However, if we use a custom class as a key, we must ensure that the equals()
method is properly overridden.
Why HashMap
Uses hashCode()
Let's reconsider why HashMap
can directly calculate the index where a value is stored based on the key. If two key objects (determined equal by equals()
) produce different indices, then the same key might retrieve different values each time.
The way HashMap
calculates the index is by calling the hashCode()
method of the key object, which returns an int
. HashMap
uses this method to directly locate the index corresponding to the key and then directly returns the value.
Therefore, to use Map
correctly, you must ensure:
- Override
equals()
: The object used as a key must correctly override theequals()
method so that two key instances that are considered equal byequals()
returntrue
. - Override
hashCode()
: The object must also correctly override thehashCode()
method, adhering strictly to the following rules:
- If two objects are equal according to
equals()
, theirhashCode()
values must be equal. - If two objects are not equal according to
equals()
, theirhashCode()
values should, as much as possible, be different.
In other words, for two instances a
and b
:
- If
a
andb
are equal (a.equals(b)
istrue
), thena.hashCode()
must equalb.hashCode()
. - If
a
andb
are not equal (a.equals(b)
isfalse
), thena.hashCode()
andb.hashCode()
should, as much as possible, be different.
The first rule is essential for correctness and must be guaranteed; otherwise, HashMap
won't function properly.
The second rule, while not mandatory, helps ensure lookup efficiency. If different objects return the same hashCode()
, it causes collisions within the Map
, leading to longer chains of entries and reduced access speed.
Implementing equals()
and hashCode()
We've already discussed how to correctly implement the equals()
method in the section on writing the equals()
method. Taking the Person
class as an example:
java
public class Person {
String firstName;
String lastName;
int age;
}
Identify the fields to compare:
firstName
lastName
age
For reference types, use Objects.equals()
for comparison, and for primitive types, use ==
.
Based on a correct implementation of equals()
, we also need to implement hashCode()
correctly, ensuring that instances with the same values for these fields produce the same hashCode()
:
java
public class Person {
String firstName;
String lastName;
int age;
@Override
public int hashCode() {
int h = 0;
h = 31 * h + firstName.hashCode();
h = 31 * h + lastName.hashCode();
h = 31 * h + age;
return h;
}
}
Notice that the String
class already correctly implements the hashCode()
method. When calculating Person
's hashCode()
, repeatedly using 31 * h
helps distribute different Person
instances' hashCode()
values evenly across the entire range of int
.
However, similar to implementing equals()
, if firstName
or lastName
is null
, the above code will throw a NullPointerException
. To handle this, we often use Objects.hash()
when calculating hashCode()
:
java
@Override
public int hashCode() {
return Objects.hash(firstName, lastName, age);
}
Principles for Writing equals()
and hashCode()
:
- Every field used in
equals()
for comparison must be used inhashCode()
for calculation. - Fields not used in
equals()
must not be included inhashCode()
.
Additionally, note that there are no requirements for the objects used as values in a HashMap
.
Further Reading
Since HashMap
internally uses an array and calculates the index for storing a value using the key's hashCode()
, the first question arises: The hashCode()
method returns an int
, which ranges up to approximately ±2.1 billion. Ignoring negative numbers, how large is the array inside HashMap
?
In reality, when a HashMap
is initialized, its default array size is only 16. Regardless of how large the hashCode()
of any key is, it can be simplified by:
java
int index = key.hashCode() & 0xf; // 0xf = 15
This ensures the index is between 0
and 15
, never exceeding the array's bounds. This algorithm is a simple implementation.
Second Question: What happens when more than 16 key-value pairs are added to a HashMap
, and the array becomes full?
When the number of key-value pairs exceeds a certain threshold, HashMap
automatically resizes its internal array, doubling its size each time (e.g., from 16 to 32). Consequently, the method for calculating the index based on the hashCode()
changes to:
java
int index = key.hashCode() & 0x1f; // 0x1f = 31
Since resizing causes existing key-value pairs to be redistributed, frequent resizing can significantly impact HashMap
's performance. If you know in advance that you'll be using a HashMap
to store around 10,000 key-value pairs, it's better to specify the capacity when creating the HashMap
:
java
Map<String, Integer> map = new HashMap<>(10000);
Although specifying a capacity of 10,000, the internal array size is always a power of two, so it will be initialized to the next power of two greater than 10,000, which is 16,384 (2¹⁴).
Final Question: If two different keys, such as "a"
and "b"
, happen to have the same hashCode()
(which is entirely possible since unequal instances are only required to have different hashCode()
values as much as possible), what happens when we put:
java
map.put("a", new Person("Xiao Ming"));
map.put("b", new Person("Xiao Hong"));
Since both "a"
and "b"
calculate to the same array index, does the later "Xiao Hong"
overwrite "Xiao Ming"
?
Answer: Of course not! When using a Map
, as long as the keys are different, their associated values do not interfere with each other. However, internally, HashMap
may have different keys mapping to the same hashCode()
, meaning the same array index. So, what does it do?
Suppose both "a"
and "b"
have a calculated index of 5
. Instead of storing a single Person
instance, HashMap
actually stores a List
at that index, containing two entries: one for "a"
and one for "b"
.
┌───┐
0 │ │
├───┤
1 │ │
├───┤
2 │ │
├───┤
3 │ │
├───┤
4 │ │
├───┤
5 │ ●─┼───▶ List<Entry<String, Person>>
├───┤
6 │ │
├───┤
7 │ │
└───┘
When retrieving, for example:
java
Person p = map.get("a");
Internally, HashMap
finds the List<Entry<String, Person>>
at the calculated index and then iterates through this list to find the Entry
whose key
field is "a"
, finally returning the corresponding Person
instance.
We refer to the situation where different keys have the same hashCode()
as a hash collision. In case of a collision, one simple solution is to use a List
to store key-value pairs with the same hashCode()
. Clearly, the higher the probability of collisions, the longer the List
becomes, reducing the efficiency of Map
's get()
method. This is why it's crucial to adhere to the second rule:
Tip
If two objects are not equal, their hashCode()
values should, as much as possible, not be equal.
The better the hashCode()
method is implemented, the more efficiently HashMap
operates.
Summary
To use
HashMap
correctly, the key's class must properly override both theequals()
andhashCode()
methods.If a class overrides
equals()
, it must also overridehashCode()
. The overriding rules are:- If
equals()
returnstrue
,hashCode()
must return the same value. - If
equals()
returnsfalse
,hashCode()
should, as much as possible, return different values.
- If
Implementing the
hashCode()
method can be assisted by using theObjects.hash()
method.