Skip to content
On this page

dict and set

dict

Python has a built-in dictionary type: dict. The full name is dictionary, and in other languages, it is also known as a map. It uses key-value storage and offers extremely fast lookup speeds.

For example, suppose you want to look up students' scores by their names. If you use a list, you would need two lists:

python
names = ['Michael', 'Bob', 'Tracy']
scores = [95, 75, 85]

Given a name, you would first find the corresponding position in names, then retrieve the corresponding score from scores. The longer the list, the slower the lookup.

If you use a dict, you only need a single "name"-"score" mapping table. You can directly look up the score by name, and the lookup speed does not slow down regardless of how large the table becomes. Here's how to create a dict in Python:

python
>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d['Michael']
95

Why is dict lookup so fast? Because the implementation of dict is similar to how dictionaries are organized. Suppose a dictionary contains 10,000 Chinese characters. To look up a specific character, one method is to flip through the dictionary from the first page until you find the character, similar to how elements are searched in a list. The larger the list, the slower the search.

The second method is to first check the dictionary's index table (such as the radical table) to find the page number corresponding to the character, then directly turn to that page to find the character. No matter which character you search for, the lookup speed is very fast and does not decrease as the dictionary size increases.

dict uses the second implementation method. Given a name like 'Michael', dict can internally calculate the "page number" where the score 95 is stored, which is the memory address where 95 is stored, and retrieve it directly. Therefore, the speed is very fast.

You can guess that this key-value storage method must calculate the storage location of the value based on the key when inserting data, so that the value can be directly retrieved based on the key.

Besides initializing a dict with key-value pairs, you can also add data by assigning a value to a key:

python
>>> d['Adam'] = 67
>>> d['Adam']
67

Since a key can only correspond to one value, assigning a value to the same key multiple times will overwrite the previous value:

python
>>> d['Jack'] = 90
>>> d['Jack']
90
>>> d['Jack'] = 88
>>> d['Jack']
88

If a key does not exist, dict will raise an error:

python
>>> d['Thomas']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Thomas'

To avoid errors when a key does not exist, there are two methods:

  1. Use the in operator to check if the key exists:

    python
    >>> 'Thomas' in d
    False
  2. Use the get() method provided by dict. If the key does not exist, it can return None or a default value you specify:

    python
    >>> d.get('Thomas')
    >>> d.get('Thomas', -1)
    -1

Note: When returning None, Python's interactive environment does not display the result.

To delete a key, use the pop(key) method. The corresponding value will also be removed from the dict:

python
>>> d.pop('Bob')
75
>>> d
{'Michael': 95, 'Tracy': 85}

Please note: The internal order of dict does not relate to the order in which keys were added.

Compared to list, dict has the following characteristics:

  • Fast lookup and insertion: The speed does not decrease as the number of keys increases.
  • High memory usage: It consumes a lot of memory, leading to significant memory waste.

In contrast, list has the opposite characteristics:

  • Lookup and insertion time increases with the number of elements.
  • Low memory usage: It occupies less space and wastes minimal memory.

Therefore, dict is a method that trades space for time.

dict can be used in many places where high-speed lookup is required. It is almost ubiquitous in Python code. It is crucial to use dict correctly. The first rule to remember is that dict keys must be immutable objects.

This is because dict calculates the storage location of the value based on the key. If the result of calculating the same key changes each time, the internal structure of the dict becomes completely disorganized. This key-based location calculation algorithm is known as the hash algorithm.

To ensure the correctness of hashing, objects used as keys must not change. In Python, strings and integers are immutable, so they can safely be used as keys. However, list is mutable and cannot be used as a key:

python
>>> key = [1, 2, 3]
>>> d[key] = 'a list'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

set

A set is similar to a dict in that it is a collection of keys, but it does not store values. Since keys cannot be duplicated, a set does not contain duplicate keys.

To create a set, list each element within {}:

python
>>> s = {1, 2, 3}
>>> s
{1, 2, 3}

Alternatively, you can provide a list as the input collection:

python
>>> s = set([1, 2, 3])
>>> s
{1, 2, 3}

Note: The input [1, 2, 3] is a list, while the output {1, 2, 3} indicates that the set contains the elements 1, 2, and 3. The displayed order does not indicate that the set is ordered.

Duplicate elements are automatically filtered out in a set:

python
>>> s = {1, 1, 2, 2, 3, 3}
>>> s
{1, 2, 3}

You can add elements to a set using the add(key) method. Adding the same element multiple times has no effect:

python
>>> s.add(4)
>>> s
{1, 2, 3, 4}
>>> s.add(4)
>>> s
{1, 2, 3, 4}

You can remove elements using the remove(key) method:

python
>>> s.remove(4)
>>> s
{1, 2, 3}

A set can be considered a mathematical set with unordered and unique elements. Therefore, two sets can perform mathematical operations like intersection and union:

python
>>> s1 = {1, 2, 3}
>>> s2 = {2, 3, 4}
>>> s1 & s2
{2, 3}
>>> s1 | s2
{1, 2, 3, 4}

The only difference between set and dict is that set does not store corresponding values. However, the principle behind set is the same as dict, so mutable objects cannot be added to a set. This is because it is impossible to determine whether two mutable objects are equal, making it impossible to ensure that there are no duplicate elements in the set. Try adding a list to a set to see if it raises an error.

Revisiting Immutable Objects

As mentioned above, str is an immutable object, while list is a mutable object.

For mutable objects like list, performing operations on the list will change its internal content. For example:

python
>>> a = ['c', 'b', 'a']
>>> a.sort()
>>> a
['a', 'b', 'c']

For immutable objects like str, what happens when you perform operations?

python
>>> a = 'abc'
>>> a.replace('a', 'A')
'Abc'
>>> a
'abc'

Although the string has a replace() method and does produce 'Abc', the variable a remains 'abc'. How should this be understood?

Let's modify the code as follows:

python
>>> a = 'abc'
>>> b = a.replace('a', 'A')
>>> b
'Abc'
>>> a
'abc'

Always remember that a is a variable, and 'abc' is the string object! Sometimes, we say that the content of object a is 'abc', but actually, a is a variable that points to the string object 'abc':

┌───┐     ┌───────┐
│ a │────▶│ 'abc' │
└───┘     └───────┘

When we call a.replace('a', 'A'), we are actually calling the replace method on the string object 'abc'. Although the method is named replace, it does not change the content of 'abc'. Instead, the replace method creates a new string 'Abc' and returns it. If we assign this new string to variable b, it becomes clear that a still points to the original string 'abc', while b points to the new string 'Abc':

┌───┐     ┌───────┐
│ a │────▶│ 'abc' │
└───┘     └───────┘
┌───┐     ┌───────┐
│ b │────▶│ 'Abc' │
└───┘     └───────┘

Therefore, for immutable objects, calling any method on the object does not change the object's content. Instead, these methods create new objects and return them, ensuring that the immutable object's content remains unchanged.

Summary

  • dict: A key-value storage structure that is highly useful in Python. The first rule to remember is that dict keys must be immutable objects, with strings being the most commonly used keys.

  • set: Similar to dict, but only stores keys without corresponding values. Sets are unordered and do not allow duplicate elements.

  • Immutable Objects: Objects like str and tuple are immutable, meaning their content cannot be changed after creation. Immutable objects can be safely used as keys in dict and elements in set.

  • Mutable Objects: Objects like list are mutable and cannot be used as keys in dict or elements in set because they are unhashable.

dict and set has loaded