Generator
A list comprehension creates a list directly. However, memory is finite, so the size of a list is limited. Moreover, a list of one million elements occupies a large amount of memory, and if we only need the first few elements, the space taken by all the remaining ones is wasted.
So, if the elements of the list can be calculated based on some algorithm, can we continually calculate subsequent elements during the loop? This way, we do not need to create a complete list, thus saving a lot of space. In Python, this mechanism of calculating while looping is called a generator.
There are several ways to create a generator. The simplest is to change the [] of a list comprehension to (), which creates a generator:
python
>>> L = [x * x for x in range(10)]
>>> L
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> g = (x * x for x in range(10))
>>> g
<generator object <genexpr> at 0x1022ef630>
The only difference between creating L and g is the outer brackets: [] versus (). L is a list, while g is a generator.
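To make the space saving mentioned earlier concrete, the following sketch compares the size of a list with that of an equivalent generator object using sys.getsizeof. The names squares_list and squares_gen are illustrative only, and the exact byte counts depend on the Python version and platform, so none are shown here.

```python
import sys

# Build one million squares eagerly as a list.
squares_list = [x * x for x in range(1_000_000)]

# Describe the same sequence lazily; no square is computed yet.
squares_gen = (x * x for x in range(1_000_000))

# getsizeof measures only the container, not the integers it refers to,
# yet the list is still far larger than the generator object.
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True
```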
We can directly print each element of a list, but how can we print each element of a generator?
To print them one by one, we can use the next() function to get the next value from the generator:
python
>>> next(g)
0
>>> next(g)
1
>>> next(g)
4
>>> next(g)
9
>>> next(g)
16
>>> next(g)
25
>>> next(g)
36
>>> next(g)
49
>>> next(g)
64
>>> next(g)
81
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
As mentioned, a generator stores the algorithm rather than the values. Each call to next(g) computes the next element until the last one has been produced. When there are no more elements, next(g) raises a StopIteration exception.
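As a small aside, the built-in next() also accepts an optional second argument that is returned instead of raising StopIteration once the generator is exhausted. The sketch below uses None as the default value, which is an arbitrary choice for illustration.

```python
g = (x * x for x in range(3))

# The second argument is returned once the generator is exhausted,
# so no StopIteration reaches the caller.
print(next(g, None))  # 0
print(next(g, None))  # 1
print(next(g, None))  # 4
print(next(g, None))  # None
```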
Of course, calling next(g) over and over like this is cumbersome. The usual approach is to use a for loop, since generators are also iterable objects:
python
>>> g = (x * x for x in range(10))
>>> for n in g:
...     print(n)
...
0
1
4
9
16
25
36
49
64
81
Thus, after creating a generator, we rarely call next() directly. Instead, we iterate over it with a for loop, which handles StopIteration for us.
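Because a generator is an iterable, anything that consumes an iterable can take one, not just a for loop. The sketch below passes a generator to the built-in sum() and then shows that a generator can be traversed only once. The variable names are illustrative.

```python
g = (x * x for x in range(10))

# sum() iterates over the generator internally, just as a for loop would.
total = sum(g)
print(total)  # 285

# The generator is now exhausted, so a second pass yields nothing.
leftover = list(g)
print(leftover)  # []
```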
Generators are very powerful. If the algorithm for calculating the elements is complex and cannot be implemented with a simple list comprehension, we can also use functions.
For example, the famous Fibonacci sequence, where every number after the first two can be derived from the sum of the two preceding ones:
1, 1, 2, 3, 5, 8, 13, 21, 34, ...
The Fibonacci sequence is awkward to express with a list comprehension, but a function can print it easily:
python
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        print(b)
        a, b = b, a + b
        n = n + 1
    return 'done'
Note the assignment statement:
python
a, b = b, a + b
This is equivalent to:
python
t = (b, a + b) # t is a tuple
a = t[0]
b = t[1]
But we can assign the values without explicitly defining the temporary variable t.
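The reason the tuple form matters can be seen by comparing it with two separate assignments. In this sketch (the starting values 1 and 2 are arbitrary), the tuple form evaluates the whole right-hand side first, whereas the step-by-step form lets the second statement see the already-updated first variable.

```python
# Tuple assignment: the right-hand side is evaluated using the old a and b.
a, b = 1, 2
a, b = b, a + b
print(a, b)  # 2 3

# Separate assignments: y is computed from the new x, giving another result.
x, y = 1, 2
x = y
y = x + y
print(x, y)  # 2 4
```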
The fib function defined above prints the first max numbers of the Fibonacci sequence:
python
>>> fib(6)
1
1
2
3
5
8
'done'
On closer inspection, the fib function actually defines the rule for computing the Fibonacci sequence: starting from the first element, it can compute any subsequent element. This logic is very similar to that of a generator.
In other words, the function above is only one step away from a generator. To turn fib into a generator function, we just change print(b) to yield b:
python
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'
This is another way to define a generator. If a function definition contains the yield keyword, it is no longer a regular function but a generator function. Calling a generator function returns a generator:
python
>>> f = fib(6)
>>> f
<generator object fib at 0x104feaaa0>
Here, the most difficult concept to grasp is that a generator function does not execute like a regular function. A regular function runs sequentially and returns when it reaches a return statement or the last line of the function body. A generator function instead runs each time next() is called, pauses and hands back a value when it reaches a yield statement, and on the next call resumes from the point just after that yield.
For example, define a generator function that returns the numbers 1, 3, and 5 sequentially:
python
def odd():
    print('step 1')
    yield 1
    print('step 2')
    yield 3
    print('step 3')
    yield 5
To use this generator function, we first create a generator object and then call next() repeatedly to get each value:
python
>>> o = odd()
>>> next(o)
step 1
1
>>> next(o)
step 2
3
>>> next(o)
step 3
5
>>> next(o)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
As we can see, odd is not a regular function but a generator function. During execution it pauses whenever it reaches a yield and resumes on the next call. After the third yield there are no more yield statements to execute, so calling next(o) a fourth time raises StopIteration.
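Pausing at yield also preserves the function's local variables between calls. The following sketch uses a hypothetical countdown generator function (not part of the original examples) to show that the local variable current keeps its value across successive next() calls.

```python
def countdown(start):
    # current is a local variable; its value survives each pause at yield.
    current = start
    while current > 0:
        yield current
        current = current - 1

c = countdown(3)
print(next(c))  # 3
print(next(c))  # 2
print(next(c))  # 1
```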
Please note: calling a generator function creates a generator object, and calling it several times creates several independent generators. Some learners find that next() returns 1 every time:
python
>>> next(odd())
step 1
1
>>> next(odd())
step 1
1
>>> next(odd())
step 1
1
The reason is that each odd() call creates a new generator object. The code above actually creates three completely independent generators, and calling next() on each of them naturally returns the first value.
The correct way is to create a single generator object and then call next() repeatedly on that same object:
python
>>> g = odd()
>>> next(g)
step 1
1
>>> next(g)
step 2
3
>>> next(g)
step 3
5
Returning to the Fibonacci example: because yield sits inside the loop, execution pauses on every iteration. Of course, the loop needs an exit condition; otherwise the generator produces an infinite sequence.
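An infinite generator is not necessarily a mistake, as long as the consumer decides when to stop. The sketch below defines a hypothetical fib_forever generator function with no exit condition and uses itertools.islice from the standard library to take only the first six values.

```python
from itertools import islice

def fib_forever():
    # No exit condition: this generator never finishes on its own.
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

# islice stops after six values, so the endless loop is never a problem.
first_six = list(islice(fib_forever(), 6))
print(first_six)  # [1, 1, 2, 3, 5, 8]
```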
Similarly, once the function has become a generator function, we almost never use next() to get the next value; we iterate with a for loop instead:
python
>>> for n in fib(6):
...     print(n)
...
1
1
2
3
5
8
However, when we iterate over a generator with a for loop, we cannot obtain the value of the generator function's return statement. To capture it, we must catch the StopIteration exception, whose value attribute holds the return value:
python
>>> g = fib(6)
>>> while True:
...     try:
...         x = next(g)
...         print('g:', x)
...     except StopIteration as e:
...         print('Generator return value:', e.value)
...         break
...
g: 1
g: 1
g: 2
g: 3
g: 5
g: 8
Generator return value: done
We will discuss error handling in more detail later.
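As an alternative to catching StopIteration by hand, Python's yield from expression evaluates to the return value of the generator it delegates to. The sketch below repeats the fib generator function so that it runs on its own; fib_with_result and the results list are hypothetical helpers introduced only for this illustration.

```python
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'

def fib_with_result(max, results):
    # yield from re-yields every value of fib(max); when fib finishes,
    # the expression evaluates to fib's return value.
    value = yield from fib(max)
    results.append(value)

captured = []
numbers = list(fib_with_result(6, captured))
print(numbers)   # [1, 1, 2, 3, 5, 8]
print(captured)  # ['done']
```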
Exercise
Pascal's triangle is defined as follows:
          1
         / \
        1   1
       / \ / \
      1   2   1
     / \ / \ / \
    1   3   3   1
   / \ / \ / \ / \
  1   4   6   4   1
 / \ / \ / \ / \ / \
1   5   10  10  5   1
Each row can be viewed as a list. Try to write a generator that continually outputs the next row as a list:
python
def triangles():
    pass
# Expected output:
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]
# [1, 5, 10, 10, 5, 1]
# [1, 6, 15, 20, 15, 6, 1]
# [1, 7, 21, 35, 35, 21, 7, 1]
# [1, 8, 28, 56, 70, 56, 28, 8, 1]
# [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
n = 0
results = []
for t in triangles():
    results.append(t)
    n = n + 1
    if n == 10:
        break
for t in results:
    print(t)
if results == [
    [1],
    [1, 1],
    [1, 2, 1],
    [1, 3, 3, 1],
    [1, 4, 6, 4, 1],
    [1, 5, 10, 10, 5, 1],
    [1, 6, 15, 20, 15, 6, 1],
    [1, 7, 21, 35, 35, 21, 7, 1],
    [1, 8, 28, 56, 70, 56, 28, 8, 1],
    [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
]:
    print('Test passed!')
else:
    print('Test failed!')
Summary
Generators are a very powerful tool in Python. You can easily convert a list comprehension into a generator or implement complex logic using functions to create generators.
Generators work by computing the next element on each pass of a for loop and ending the loop under the appropriate condition. For a generator function, reaching a return statement or the end of the function body ends the generator, and the for loop ends with it.
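The effect of return on a for loop can be seen in a short sketch. The generator function until_negative below is a hypothetical example: it yields numbers from its input until it meets a negative one, at which point return ends the generator and the surrounding for loop stops.

```python
def until_negative(numbers):
    for n in numbers:
        if n < 0:
            # return ends the generator; the for loop below stops here.
            return
        yield n

collected = []
for n in until_negative([3, 1, -4, 1, 5]):
    collected.append(n)
print(collected)  # [3, 1]
```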
Please note the distinction between regular functions and generator functions. A regular function call directly returns a result:
python
>>> r = abs(6)
>>> r
6
A generator function call actually returns a generator object:
python
>>> g = fib(6)
>>> g
<generator object fib at 0x1022ef948>
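When it is unclear which kind of function we are dealing with, the standard library's inspect module can check. The sketch below repeats the fib generator function so that it runs on its own, then applies inspect.isgeneratorfunction and inspect.isgenerator.

```python
import inspect

def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'

# fib contains yield, so it is a generator function; abs is not.
print(inspect.isgeneratorfunction(fib))  # True
print(inspect.isgeneratorfunction(abs))  # False

# Calling fib produces a generator object rather than a computed result.
print(inspect.isgenerator(fib(6)))       # True
```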