Mastering Python Generators - A Comprehensive Guide with Examples
Generators are a powerful feature in Python that allow you to create iterators in a simple and memory-efficient way. In this comprehensive tutorial, we’ll dive deep into the world of generators, exploring their inner workings, use cases, and best practices with numerous examples and step-by-step code explanations.
GitHub Repo : Python Generators
What is a Generator?
A generator is a special type of function that returns an iterator. Unlike regular functions that use the return
statement to return a value and terminate, generators use the yield
keyword to produce a sequence of values, one at a time, and maintain their state between successive calls.
Here’s a simple example to illustrate the concept:
1
2
3
4
5
6
7
8
9
10
def gen_demo():
yield "first statement"
yield "second statement"
yield "third statement"
gen = gen_demo()
print(next(gen)) # Output: first statement
print(next(gen)) # Output: second statement
print(next(gen)) # Output: third statement
In the above example, gen_demo
is a generator function that yields three strings. When we call gen_demo()
, it returns a generator object gen
. We can then use the next()
function to retrieve the next value yielded by the generator.
Step-by-Step Explanation:
Function Definition: We define the
gen_demo()
function using thedef
keyword. This function is a generator, which means it will produce a sequence of values one at a time.Using
yield
: Insidegen_demo()
, instead of usingreturn
to send back a single result, we use theyield
keyword.yield
allows the function to return a value while saving its state so that it can resume from where it left off when called again.Creating the Generator: We call
gen_demo()
and assign the resulting generator object to the variablegen
. This does not immediately execute the function; it just prepares it for later use.Retrieving Values with
next()
: We use thenext(gen)
function to retrieve the next value yielded by the generatorgen
.First
next()
Call: Whennext(gen)
is first called, it starts executinggen_demo()
. The function runs until it reaches the firstyield
statement and returns “first statement”.Subsequent
next()
Calls: Each subsequent call tonext(gen)
causes the function to resume execution from where it was paused by the lastyield
statement. It continues until it encounters the nextyield
statement, returning the next value in the sequence.Sequence of Execution:
- First call: Executes until the first
yield
, returning “first statement”. - Second call: Resumes from where it left off, executes until the next
yield
, returning “second statement”. - Third call: Again resumes from where it paused, executes until the next
yield
, returning “third statement”.
- First call: Executes until the first
The Need for Generators
Before diving into the intricacies of generators, let’s understand why we need them in the first place. Consider the following example:
1
2
3
4
5
6
7
L = [x for x in range(100000)]
import sys
print(sys.getsizeof(L)) # Output: 824456
x = range(10000000)
print(sys.getsizeof(x)) # Output: 48
In the first case, we create a list L
with 100,000 elements. This operation allocates memory for the entire list, resulting in a sizeable memory footprint of 824,456 bytes. In contrast, the range
object in the second case doesn’t actually store all the values in memory; instead, it generates them on-the-fly as needed, occupying only 48 bytes.
This example highlights the memory efficiency of generators, which becomes increasingly important when dealing with large datasets or infinite streams of data.
Generator Functions vs Regular Functions
While regular functions use the return
statement to return a value and terminate, generator functions use the yield
keyword to produce a sequence of values, one at a time, and maintain their state between successive calls.
To better understand the difference between yield
and return
, let’s use the Python Tutor visualization tool:
1
2
3
4
5
6
7
8
9
def regular_function():
result = []
for i in range(3):
result.append(i)
return result
def generator_function():
for i in range(3):
yield i
In the regular function, the entire list [0, 1, 2]
is created and returned, while in the generator function, values are yielded one by one, preserving the state of the function between iterations.
Step-by-Step Explanation:
- Definition of
regular_function()
:- The
regular_function()
is defined using thedef
keyword. - Inside the function, an empty list named
result
is created.
- The
- Appending Values to
result
:- A
for
loop is used to iterate from 0 to 2 (inclusive). - During each iteration, the current value (0, 1, or 2) is appended to the
result
list.
- A
- Return the Resulting List:
- Once the loop finishes, the function returns the
result
list containing[0, 1, 2]
.
- Once the loop finishes, the function returns the
- Definition of
generator_function()
:- The
generator_function()
is also defined using thedef
keyword.
- The
- Yielding Values:
- Inside this function, a
for
loop is used to iterate from 0 to 2 (inclusive). - Instead of appending values to a list, each value (0, 1, 2) is yielded using the
yield
keyword during each iteration.
- Inside this function, a
- State Maintenance with
yield
:- The
yield
keyword allows the function’s state to be preserved between successive iterations of the loop. - After yielding a value, the function is paused until the next iteration, retaining its current state (e.g., the loop variable’s value and other local variables).
- The
In summary:
regular_function()
creates a list and populates it with values using a loop, then returns the complete list.generator_function()
yields values one by one during each iteration of a loop, maintaining its execution state between eachyield
statement. This function can be used as a generator to produce values lazily without generating the entire sequence upfront.
Creating Generators
There are several ways to create generators in Python:
Generator Functions: As shown earlier, you can define a generator function using the
yield
keyword.1 2 3
def square(num): for i in range(1, num+1): yield i**2
In this example, the
square
function is a generator function that yields the squares of numbers from 1 tonum
.
Step-by-Step Explanation:
- Definition of
square(num)
Function:- The
square(num)
function is defined using thedef
keyword, withnum
as a parameter representing the maximum value for iteration.
- The
- Iterating with a
for
Loop:- Inside the function, a
for
loop is used to iterate from 1 tonum
(inclusive). This loop will run for each integer value from 1 up to and includingnum
.
- Inside the function, a
- Using
yield
to Produce Squares:- During each iteration of the loop, the
yield
keyword is used to produce the square of the current value (i
). - For example, if the loop is currently at
i
,yield i * i
will produce the square ofi
.
- During each iteration of the loop, the
- State Maintenance with
yield
:- The
yield
keyword allows the function’s state to be preserved between successive iterations of the loop. - After yielding a value (in this case, the square of
i
), the function is paused until the next iteration, retaining its current state (such as the loop variable’s value and other local variables).
- The
In summary, the square(num)
function generates and yields the square of each integer from 1 up to num
lazily, one value at a time. This makes square(num)
a generator function, allowing you to iterate over the squares of numbers without having to compute and store all of them in memory at once. You can use this function in a loop or with other generator functions to process the yielded values as needed.
To use this generator function, you can create a generator object and iterate over it:
1
2
3
gen = square(10)
for i in gen:
print(i)
This will print the squares of numbers from 1 to 10:
1
2
3
4
5
6
7
8
9
10
1
4
9
16
25
36
49
64
81
100
Generator Expressions: Similar to list comprehensions, generator expressions provide a concise way to create generators. Here’s an example:
1 2 3
gen = (i**2 for i in range(1, 101)) for i in gen: print(i)
This will print the squares of numbers from 1 to 100.
Step-by-Step Explanation:
- Generator Expression Creation:
- The generator expression
(i**2 for i in range(1, 101))
is used to create a generator object namedgen
. - This expression defines a sequence where each value is the square of
i
fori
in the range1
to100
(inclusive).
- The generator expression
- Evaluation of
i**2
for Eachi
:- As the generator is iterated over, the expression
i**2
is evaluated for each value ofi
in the range1
to100
. - This means that
i**2
computes the square of each number in the specified range.
- As the generator is iterated over, the expression
- On-the-Fly Value Production:
- Unlike list comprehensions that eagerly create a list of all values, the generator expression produces values on-the-fly as they are needed.
- This lazy evaluation is a key feature of generator expressions. It conserves memory by generating values only when requested, rather than storing them all at once in memory.
- Iteration with a
for
Loop:- A
for
loop is used to iterate over the generator objectgen
. - During each iteration, the loop retrieves the next value (the square of the current number) from the generator and prints it.
- The loop continues until all values in the generator have been exhausted.
- A
In summary, the generator expression (i**2 for i in range(1, 101))
efficiently computes and yields the square of each number from 1 to 100, producing values as needed without precomputing and storing them in memory. This approach is suitable for scenarios where memory conservation and lazy evaluation are important considerations.
Iterating over Generators: You can create a generator by iterating over another generator or an iterable object. For example:
1 2 3 4 5 6 7
import os import cv2 def image_data_reader(folder_path): for file in os.listdir(folder_path): f_array = cv2.imread(os.path.join(folder_path, file)) yield f_array
This generator function reads image files from a specified folder path and yields their numpy arrays one by one, allowing you to process large datasets without loading them entirely into memory.
Step-by-Step Explanation:
- The
image_data_reader(folder_path)
function is defined with thedef
keyword, taking afolder_path
parameter. - Inside the function, a
for
loop iterates over the list of files in thefolder_path
directory, obtained usingos.listdir(folder_path)
. - For each file, the
cv2.imread()
function from the OpenCV library is used to read the image file and store its numpy array representation in thef_array
variable. - The
yield
keyword is used to produce thef_array
value, allowing the function to generate image arrays one by one, without loading all of them into memory at once.
- The
To use the image_data_reader
generator function, you can iterate over the generator object it returns. Here’s an example of how you can do this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
import os
import cv2
def image_data_reader(folder_path):
for file in os.listdir(folder_path):
f_array = cv2.imread(os.path.join(folder_path, file))
yield f_array
# Specify the folder path containing the image files
folder_path = "path/to/image/folder"
# Create a generator object
image_generator = image_data_reader(folder_path)
# Iterate over the generator to process each image array
for image_array in image_generator:
# Perform operations on the image array
# For example, display the image
cv2.imshow("Image", image_array)
cv2.waitKey(0)
cv2.destroyAllWindows()
Here’s a step-by-step explanation of how to use the generator:
- You call the
image_data_reader
function with thefolder_path
argument, which creates a generator object. - The generator object is assigned to the
image_generator
variable. - You can then iterate over the
image_generator
using afor
loop. - Inside the loop, each iteration yields the next image array (
image_array
) from the generator. - Within the loop, you can perform operations on the
image_array
, such as displaying the image using OpenCV’scv2.imshow()
function.
By using a generator, you can process large datasets of image files without loading all of them into memory at once, which can be more memory-efficient, especially when dealing with large datasets or limited system resources.
Note: Make sure to replace "path/to/image/folder"
with the actual path to the folder containing your image files.
Benefits of Using Generators
Using generators in Python offers several benefits:
Memory Efficiency: As demonstrated earlier, generators don’t store the entire sequence in memory, making them more memory-efficient, especially when dealing with large datasets or infinite streams of data.
Representing Infinite Streams: Generators can represent infinite streams of data, as they produce values on-the-fly. For example:
1 2 3 4 5
def all_even(): n = 0 while True: yield n n += 2
This generator function produces an infinite stream of even numbers.
Step-by-Step Explanation:
- The
all_even()
function is defined with thedef
keyword. - Inside the function, the variable
n
is initialized to 0. - An infinite
while True
loop is used to generate even numbers indefinitely. - Inside the loop, the current value of
n
is yielded using theyield
keyword. - The value of
n
is incremented by 2 to generate the next even number. - The loop continues indefinitely, generating and yielding even numbers on-the-fly.
To use this generator function, you can create a generator object and iterate over it (with care, as it will produce an infinite stream of values):
1 2 3 4 5
gen = all_even() print(next(gen)) # Output: 0 print(next(gen)) # Output: 2 print(next(gen)) # Output: 4 # ... and so on
- The
Chaining Generators: Generators can be chained together, allowing you to create complex data pipelines. For example:
1 2 3 4 5 6 7 8 9 10 11
def fibonacci_numbers(nums): x, y = 0, 1 for _ in range(nums): x, y = y, x+y yield x def square(nums): for num in nums: yield num**2 print(sum(square(fibonacci_numbers(10)))) # Output: 4895
In this example, the
fibonacci_numbers
generator produces the first 10 Fibonacci numbers, which are then squared by thesquare
generator, and finally summed up.Step-by-Step Explanation:
- The
fibonacci_numbers(nums)
function is defined to generate the firstnums
Fibonacci numbers. - Inside the function,
x
andy
are initialized to 0 and 1, respectively, representing the first two Fibonacci numbers. - A
for
loop iteratesnums
times, generating the next Fibonacci number by swapping the values ofx
andy
and adding them (x, y = y, x+y
). - The current Fibonacci number (
x
) is yielded using theyield
keyword. - The
square(nums)
function is defined to take an iterablenums
and yield the square of each number. - Inside the
square
function, afor
loop iterates over the elements innums
. - For each element
num
, the squarenum**2
is yielded using theyield
keyword. - The
sum(square(fibonacci_numbers(10)))
expression creates a generatorfibonacci_numbers(10)
that generates the first 10 Fibonacci numbers. - The
square
generator takes thefibonacci_numbers(10)
generator as input and squares each Fibonacci number. - The
sum
function consumes the squared Fibonacci numbers from thesquare
generator and calculates their sum, which is printed as the output (4895).
- The
Ease of Implementation: Generators are relatively easy to implement compared to creating custom iterator classes.
Representing Ranges and Sequences: Generators can be used to represent ranges and sequences in a memory-efficient way. For example:
1 2 3
def mera_range(start, end): for i in range(start, end): yield i
This generator function behaves like the built-in
range
function but uses a generator to produce values on-the-fly.Step-by-Step Explanation:
- The
mera_range(start, end)
function is defined with thedef
keyword, takingstart
andend
parameters. - Inside the function, a
for
loop iterates fromstart
toend-1
using the built-inrange
function. - For each iteration, the current value
i
is yielded using theyield
keyword.
To use this generator function, you can create a generator object and iterate over it:
1 2
for i in mera_range(15, 26): print(i)
This will print the numbers from 15 to 25 (inclusive):
1 2 3 4 5 6 7 8 9 10 11
15 16 17 18 19 20 21 22 23 24 25
- The
Advanced Topics
While the basics of generators are relatively straightforward, there are some advanced topics worth exploring:
Generator Delegation: Generators can delegate part of their operation to a different generator, allowing for more modular and reusable code.
1 2 3 4 5 6 7 8 9
def gen_range(start, end): while start < end: yield start start += 1 def gen_delegated(start, end, step): for i in gen_range(start, end): if i % step == 0: yield i
In this example, the
gen_delegated
generator delegates the task of generating numbers to thegen_range
generator and filters out values that are not divisible by thestep
value.Generator Pipelines: By chaining multiple generators together, you can create powerful data processing pipelines, enabling efficient data transformations and filtering.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
def integers(): i = 0 while True: yield i i += 1 def squares(nums): for n in nums: yield n ** 2 def odds(nums): for n in nums: if n % 2 != 0: yield n # Pipeline: Integers -> Squares -> Odds pipeline = odds(squares(integers())) for i in pipeline: if i > 100: break print(i)
In this example, we create a pipeline that generates integers, squares them, and filters out the even numbers. The pipeline is constructed by chaining the
odds
,squares
, andintegers
generators together.Coroutines and Async Generators: Python’s coroutines and async generators provide a way to write concurrent code using generators, making it easier to manage and reason about asynchronous operations.
Generator-based Algorithms: Many algorithms, such as graph traversal algorithms (e.g., breadth-first search, depth-first search), can be elegantly implemented using generators.
1 2 3 4 5 6 7 8
def bfs(graph, start): visited, queue = set(), [start] while queue: vertex = queue.pop(0) if vertex not in visited: visited.add(vertex) yield vertex queue.extend(graph[vertex] - visited)
This generator function implements the breadth-first search (BFS) algorithm for traversing a graph, yielding each visited vertex one by one.
Conclusion
Python generators are a powerful and versatile tool that offer memory efficiency, ease of implementation, and the ability to represent infinite streams of data. By understanding and leveraging generators, you can write more efficient and elegant code, especially when dealing with large datasets or infinite data streams.
Throughout this tutorial, we’ve covered the fundamental concepts of generators, their creation methods, benefits, and advanced topics with numerous examples and step-by-step code explanations. With the knowledge gained from this tutorial, you should now be well-equipped to incorporate generators into your Python projects and unlock their full potential.