Skip to content
On this page

Set

We know that a Map is used to store key-value mappings. The objects used as keys must be unique and must correctly override the equals() and hashCode() methods.

If we only need to store unique keys without the need for a mapping value, we can use a Set.

A Set is used to store a collection of unique elements and primarily provides the following methods:

  • Adding an element to Set<E>: boolean add(E e)
  • Removing an element from Set<E>: boolean remove(Object e)
  • Checking if an element is contained: boolean contains(Object e)

Let’s look at some simple examples:

java
import java.util.*;

public class Main {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("abc")); // true
        System.out.println(set.add("xyz")); // true
        System.out.println(set.add("xyz")); // false, adding failed because the element already exists
        System.out.println(set.contains("xyz")); // true, the element exists
        System.out.println(set.contains("XYZ")); // false, the element does not exist
        System.out.println(set.remove("hello")); // false, removal failed because the element does not exist
        System.out.println(set.size()); // 2, there are two elements in total
    }
}

In fact, a Set is equivalent to a Map that only stores keys and does not store values. We often use a Set to remove duplicate elements.

Since the elements placed in a Set are similar to the keys in a Map, they must correctly implement the equals() and hashCode() methods; otherwise, the element cannot be correctly placed in the Set.

The most commonly used implementation class of Set is HashSet. In fact, HashSet is simply a straightforward wrapper around HashMap, and its core code is as follows:

java
public class HashSet<E> implements Set<E> {
    // Holds a HashMap:
    private HashMap<E, Object> map = new HashMap<>();

    // Value to put in HashMap:
    private static final Object PRESENT = new Object();

    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }
}

The Set interface does not guarantee order, whereas the SortedSet interface guarantees that elements are ordered:

  • HashSet is unordered because it implements the Set interface but does not implement the SortedSet interface.
  • TreeSet is ordered because it implements the SortedSet interface.

Here’s a diagram to illustrate:

       ┌───┐
       │Set│
       └───┘

    ┌────┴─────┐
    │          │
┌───────┐ ┌─────────┐
│HashSet│ │SortedSet│
└───────┘ └─────────┘


          ┌─────────┐
          │ TreeSet │
          └─────────┘

Now let’s look at the output of HashSet:

java
import java.util.*;

public class Main {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("apple");
        set.add("banana");
        set.add("pear");
        set.add("orange");
        for (String s : set) {
            System.out.println(s);
        }
    }
}

Note that the output order is neither the order of addition nor the sorted order of the String. In different versions of the JDK, this order may also vary.

If we replace HashSet with TreeSet, the output will be ordered, following the sorting order of the elements:

java
import java.util.*;

public class Main {
    public static void main(String[] args) {
        Set<String> set = new TreeSet<>();
        set.add("apple");
        set.add("banana");
        set.add("pear");
        set.add("orange");
        for (String s : set) {
            System.out.println(s);
        }
    }
}

The requirements for using TreeSet are the same as those for using TreeMap; the elements added must correctly implement the Comparable interface. If the Comparable interface is not implemented, a Comparator object must be passed when creating the TreeSet.

Exercise

In a chat application, when the sender sends a message and encounters a network timeout, it will automatically resend, which means the receiver might receive duplicate messages. When displaying to the user, it is necessary to remove duplicates first. Please practice using a Set to remove duplicate messages:

java
import java.util.*;

public class Main {
    public static void main(String[] args) {
        List<Message> received = List.of(
            new Message(1, "Hello!"),
            new Message(2, "Have you received your salary?"),
            new Message(2, "Have you received your salary?"),
            new Message(3, "Where to eat?"),
            new Message(3, "Where to eat?"),
            new Message(4, "Bye")
        );
        List<Message> displayMessages = process(received);
        for (Message message : displayMessages) {
            System.out.println(message.text);
        }
    }

    static List<Message> process(List<Message> received) {
        // TODO: Remove duplicate messages by sequence
        return received;
    }
}

class Message {
    public final int sequence;
    public final String text;
    public Message(int sequence, String text) {
        this.sequence = sequence;
        this.text = text;
    }
}

Summary

  • Set is used to store a collection of unique elements:
    • The elements placed in a HashSet must meet the same requirements as keys in a HashMap.
    • The elements placed in a TreeSet must meet the same requirements as keys in a TreeMap.
  • Using Set can help remove duplicate elements.
  • Traversing a SortedSet iterates according to the sorted order of the elements, and a custom sorting algorithm can also be defined.
Set has loaded