Appearance
dict and set
dict
Python has a built-in dictionary type: dict
. The full name is dictionary, and in other languages, it is also known as a map. It uses key-value storage and offers extremely fast lookup speeds.
For example, suppose you want to look up students' scores by their names. If you use a list
, you would need two lists:
python
names = ['Michael', 'Bob', 'Tracy']
scores = [95, 75, 85]
Given a name, you would first find the corresponding position in names
, then retrieve the corresponding score from scores
. The longer the list, the slower the lookup.
If you use a dict
, you only need a single "name"-"score" mapping table. You can directly look up the score by name, and the lookup speed does not slow down regardless of how large the table becomes. Here's how to create a dict
in Python:
python
>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d['Michael']
95
Why is dict
lookup so fast? Because the implementation of dict
is similar to how dictionaries are organized. Suppose a dictionary contains 10,000 Chinese characters. To look up a specific character, one method is to flip through the dictionary from the first page until you find the character, similar to how elements are searched in a list
. The larger the list
, the slower the search.
The second method is to first check the dictionary's index table (such as the radical table) to find the page number corresponding to the character, then directly turn to that page to find the character. No matter which character you search for, the lookup speed is very fast and does not decrease as the dictionary size increases.
dict
uses the second implementation method. Given a name like 'Michael'
, dict
can internally calculate the "page number" where the score 95
is stored, which is the memory address where 95
is stored, and retrieve it directly. Therefore, the speed is very fast.
You can guess that this key-value storage method must calculate the storage location of the value based on the key when inserting data, so that the value can be directly retrieved based on the key.
Besides initializing a dict
with key-value pairs, you can also add data by assigning a value to a key:
python
>>> d['Adam'] = 67
>>> d['Adam']
67
Since a key can only correspond to one value, assigning a value to the same key multiple times will overwrite the previous value:
python
>>> d['Jack'] = 90
>>> d['Jack']
90
>>> d['Jack'] = 88
>>> d['Jack']
88
If a key does not exist, dict
will raise an error:
python
>>> d['Thomas']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'Thomas'
To avoid errors when a key does not exist, there are two methods:
Use the
in
operator to check if the key exists:python>>> 'Thomas' in d False
Use the
get()
method provided bydict
. If the key does not exist, it can returnNone
or a default value you specify:python>>> d.get('Thomas') >>> d.get('Thomas', -1) -1
Note: When returning None
, Python's interactive environment does not display the result.
To delete a key, use the pop(key)
method. The corresponding value will also be removed from the dict
:
python
>>> d.pop('Bob')
75
>>> d
{'Michael': 95, 'Tracy': 85}
Please note: The internal order of dict
does not relate to the order in which keys were added.
Compared to list
, dict
has the following characteristics:
- Fast lookup and insertion: The speed does not decrease as the number of keys increases.
- High memory usage: It consumes a lot of memory, leading to significant memory waste.
In contrast, list
has the opposite characteristics:
- Lookup and insertion time increases with the number of elements.
- Low memory usage: It occupies less space and wastes minimal memory.
Therefore, dict
is a method that trades space for time.
dict
can be used in many places where high-speed lookup is required. It is almost ubiquitous in Python code. It is crucial to use dict
correctly. The first rule to remember is that dict
keys must be immutable objects.
This is because dict
calculates the storage location of the value based on the key. If the result of calculating the same key changes each time, the internal structure of the dict
becomes completely disorganized. This key-based location calculation algorithm is known as the hash algorithm.
To ensure the correctness of hashing, objects used as keys must not change. In Python, strings and integers are immutable, so they can safely be used as keys. However, list
is mutable and cannot be used as a key:
python
>>> key = [1, 2, 3]
>>> d[key] = 'a list'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
set
A set
is similar to a dict
in that it is a collection of keys, but it does not store values. Since keys cannot be duplicated, a set
does not contain duplicate keys.
To create a set
, list each element within {}
:
python
>>> s = {1, 2, 3}
>>> s
{1, 2, 3}
Alternatively, you can provide a list
as the input collection:
python
>>> s = set([1, 2, 3])
>>> s
{1, 2, 3}
Note: The input [1, 2, 3]
is a list
, while the output {1, 2, 3}
indicates that the set
contains the elements 1
, 2
, and 3
. The displayed order does not indicate that the set
is ordered.
Duplicate elements are automatically filtered out in a set
:
python
>>> s = {1, 1, 2, 2, 3, 3}
>>> s
{1, 2, 3}
You can add elements to a set
using the add(key)
method. Adding the same element multiple times has no effect:
python
>>> s.add(4)
>>> s
{1, 2, 3, 4}
>>> s.add(4)
>>> s
{1, 2, 3, 4}
You can remove elements using the remove(key)
method:
python
>>> s.remove(4)
>>> s
{1, 2, 3}
A set
can be considered a mathematical set with unordered and unique elements. Therefore, two sets
can perform mathematical operations like intersection and union:
python
>>> s1 = {1, 2, 3}
>>> s2 = {2, 3, 4}
>>> s1 & s2
{2, 3}
>>> s1 | s2
{1, 2, 3, 4}
The only difference between set
and dict
is that set
does not store corresponding values. However, the principle behind set
is the same as dict
, so mutable objects cannot be added to a set
. This is because it is impossible to determine whether two mutable objects are equal, making it impossible to ensure that there are no duplicate elements in the set
. Try adding a list
to a set
to see if it raises an error.
Revisiting Immutable Objects
As mentioned above, str
is an immutable object, while list
is a mutable object.
For mutable objects like list
, performing operations on the list
will change its internal content. For example:
python
>>> a = ['c', 'b', 'a']
>>> a.sort()
>>> a
['a', 'b', 'c']
For immutable objects like str
, what happens when you perform operations?
python
>>> a = 'abc'
>>> a.replace('a', 'A')
'Abc'
>>> a
'abc'
Although the string has a replace()
method and does produce 'Abc'
, the variable a
remains 'abc'
. How should this be understood?
Let's modify the code as follows:
python
>>> a = 'abc'
>>> b = a.replace('a', 'A')
>>> b
'Abc'
>>> a
'abc'
Always remember that a
is a variable, and 'abc'
is the string object! Sometimes, we say that the content of object a
is 'abc'
, but actually, a
is a variable that points to the string object 'abc'
:
┌───┐ ┌───────┐
│ a │────▶│ 'abc' │
└───┘ └───────┘
When we call a.replace('a', 'A')
, we are actually calling the replace
method on the string object 'abc'
. Although the method is named replace
, it does not change the content of 'abc'
. Instead, the replace
method creates a new string 'Abc'
and returns it. If we assign this new string to variable b
, it becomes clear that a
still points to the original string 'abc'
, while b
points to the new string 'Abc'
:
┌───┐ ┌───────┐
│ a │────▶│ 'abc' │
└───┘ └───────┘
┌───┐ ┌───────┐
│ b │────▶│ 'Abc' │
└───┘ └───────┘
Therefore, for immutable objects, calling any method on the object does not change the object's content. Instead, these methods create new objects and return them, ensuring that the immutable object's content remains unchanged.
Summary
dict
: A key-value storage structure that is highly useful in Python. The first rule to remember is thatdict
keys must be immutable objects, with strings being the most commonly used keys.set
: Similar todict
, but only stores keys without corresponding values. Sets are unordered and do not allow duplicate elements.Immutable Objects: Objects like
str
andtuple
are immutable, meaning their content cannot be changed after creation. Immutable objects can be safely used as keys indict
and elements inset
.Mutable Objects: Objects like
list
are mutable and cannot be used as keys indict
or elements inset
because they are unhashable.