Mastering Python Generators - A Comprehensive Guide with Examples
Generators are a powerful feature in Python that allow you to create iterators in a simple and memory-efficient way. In this comprehensive tutorial, we’ll dive deep into the world of generators, exploring their inner workings, use cases, and best practices with numerous examples and step-by-step code explanations.
GitHub Repo : Python Generators
What is a Generator?
A generator is a special type of function that returns an iterator. Unlike regular functions that use the return statement to return a value and terminate, generators use the yield keyword to produce a sequence of values, one at a time, and maintain their state between successive calls.
Here’s a simple example to illustrate the concept:
def gen_demo():
    yield "first statement"
    yield "second statement"
    yield "third statement"

gen = gen_demo()
print(next(gen))  # Output: first statement
print(next(gen))  # Output: second statement
print(next(gen))  # Output: third statement
In the above example, gen_demo is a generator function that yields three strings. When we call gen_demo(), it returns a generator object gen. We can then use the next() function to retrieve the next value yielded by the generator.
Step-by-Step Explanation:
- Function Definition: We define the gen_demo() function using the def keyword. This function is a generator, which means it will produce a sequence of values one at a time.
- Using yield: Inside gen_demo(), instead of using return to send back a single result, we use the yield keyword. yield allows the function to return a value while saving its state so that it can resume from where it left off when called again.
- Creating the Generator: We call gen_demo() and assign the resulting generator object to the variable gen. This does not immediately execute the function; it just prepares it for later use.
- Retrieving Values with next(): We use the next(gen) function to retrieve the next value yielded by the generator gen.
- First next() Call: When next(gen) is first called, it starts executing gen_demo(). The function runs until it reaches the first yield statement and returns “first statement”.
- Subsequent next() Calls: Each subsequent call to next(gen) causes the function to resume execution from where it was paused by the last yield statement. It continues until it encounters the next yield statement, returning the next value in the sequence.
- Sequence of Execution:
  - First call: Executes until the first yield, returning “first statement”.
  - Second call: Resumes from where it left off, executes until the next yield, returning “second statement”.
  - Third call: Again resumes from where it paused, executes until the next yield, returning “third statement”.
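The walkthrough above stops at the third call, so it is worth sketching what a fourth call does. Once a generator function runs past its last yield, next() raises StopIteration; passing a default as the second argument to next() avoids the exception. The snippet below reuses gen_demo from the example above; the variable names are only illustrative.

```python
def gen_demo():
    yield "first statement"
    yield "second statement"
    yield "third statement"

gen = gen_demo()

# Consume the three yielded values.
values = [next(gen) for _ in range(3)]

# A fourth call finds no further yield, so StopIteration is raised.
try:
    next(gen)
except StopIteration:
    exhausted = True
else:
    exhausted = False

# next() accepts a default that is returned instead of raising.
fallback = next(gen, "no more values")

print(values)
print(exhausted)  # True
print(fallback)   # no more values
```

A for loop handles StopIteration internally, which is why iterating over a generator with for simply ends when the values run out.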
The Need for Generators
Before diving into the intricacies of generators, let’s understand why we need them in the first place. Consider the following example:
import sys

L = [x for x in range(100000)]
print(sys.getsizeof(L))  # Output: 824456 (exact value varies by Python version and platform)

x = range(10000000)
print(sys.getsizeof(x))  # Output: 48
In the first case, we create a list L with 100,000 elements. This allocates memory for the entire list up front, giving a footprint of 824,456 bytes in this run (sys.getsizeof counts only the list's own array of references, so the integer objects it points to take additional memory). In contrast, the range object in the second case doesn’t store its values in memory; it computes them on-the-fly as needed and occupies only 48 bytes, even though it covers 10,000,000 values.
A range object is not itself a generator, but it relies on the same lazy evaluation that makes generators memory-efficient, which becomes increasingly important when dealing with large datasets or infinite streams of data.
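To tie the comparison directly to generators, here is a minimal sketch that measures a list comprehension against the equivalent generator expression. The names as_list and as_gen are illustrative, and the exact byte counts depend on the Python version and platform, so none are shown.

```python
import sys

n = 100_000

# The list comprehension builds all n elements immediately.
as_list = [x for x in range(n)]

# The generator expression only stores the state needed to produce the next value.
as_gen = (x for x in range(n))

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

print(list_size > gen_size)  # True

# Both produce the same values when consumed.
same_total = sum(as_gen) == sum(as_list)
print(same_total)  # True
```

The size of the generator object stays the same no matter how large n is, because no values are stored ahead of time.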
Generator Functions vs Regular Functions
While regular functions use the return statement to return a value and terminate, generator functions use the yield keyword to produce a sequence of values, one at a time, and maintain their state between successive calls.
To better understand the difference between yield and return, compare the following two functions (stepping through them in the Python Tutor visualization tool makes the difference easy to see):
def regular_function():
    result = []
    for i in range(3):
        result.append(i)
    return result

def generator_function():
    for i in range(3):
        yield i
In the regular function, the entire list [0, 1, 2] is created and returned, while in the generator function, values are yielded one by one, preserving the state of the function between iterations.
Step-by-Step Explanation:
- Definition of regular_function():
  - The regular_function() is defined using the def keyword.
  - Inside the function, an empty list named result is created.
- Appending Values to result:
  - A for loop is used to iterate from 0 to 2 (inclusive).
  - During each iteration, the current value (0, 1, or 2) is appended to the result list.
- Return the Resulting List:
  - Once the loop finishes, the function returns the result list containing [0, 1, 2].
- Definition of generator_function():
  - The generator_function() is also defined using the def keyword.
- Yielding Values:
  - Inside this function, a for loop is used to iterate from 0 to 2 (inclusive).
  - Instead of appending values to a list, each value (0, 1, 2) is yielded using the yield keyword during each iteration.
- State Maintenance with yield:
  - The yield keyword allows the function’s state to be preserved between successive iterations of the loop.
  - After yielding a value, the function is paused until the next iteration, retaining its current state (e.g., the loop variable’s value and other local variables).
In summary:
- regular_function() creates a list and populates it with values using a loop, then returns the complete list.
- generator_function() yields values one by one during each iteration of a loop, maintaining its execution state between each yield statement. This function can be used as a generator to produce values lazily without generating the entire sequence upfront.
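The difference becomes concrete when both functions are called. The sketch below reuses the two definitions from above; the variable names are illustrative.

```python
import types

def regular_function():
    result = []
    for i in range(3):
        result.append(i)
    return result

def generator_function():
    for i in range(3):
        yield i

# Calling the regular function runs the whole body and returns the finished list.
as_list = regular_function()

# Calling the generator function runs none of the body yet; it returns a generator object.
gen = generator_function()
is_generator = isinstance(gen, types.GeneratorType)

# Values are produced only on request.
first = next(gen)
rest = list(gen)

print(as_list)       # [0, 1, 2]
print(is_generator)  # True
print(first, rest)   # 0 [1, 2]
```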
Creating Generators
There are several ways to create generators in Python:
Generator Functions: As shown earlier, you can define a generator function using the yield keyword.
def square(num):
    for i in range(1, num+1):
        yield i**2
In this example, the square function is a generator function that yields the squares of numbers from 1 to num.
Step-by-Step Explanation:
- Definition of square(num) Function:
  - The square(num) function is defined using the def keyword, with num as a parameter representing the maximum value for iteration.
- Iterating with a for Loop:
  - Inside the function, a for loop is used to iterate from 1 to num (inclusive). This loop will run for each integer value from 1 up to and including num.
- Using yield to Produce Squares:
  - During each iteration of the loop, the yield keyword is used to produce the square of the current value (i).
  - For example, when i is 3, yield i**2 produces 9.
- State Maintenance with yield:
  - The yield keyword allows the function’s state to be preserved between successive iterations of the loop.
  - After yielding a value (in this case, the square of i), the function is paused until the next iteration, retaining its current state (such as the loop variable’s value and other local variables).
In summary, the square(num) function generates and yields the square of each integer from 1 up to num lazily, one value at a time. This makes square(num) a generator function, allowing you to iterate over the squares of numbers without having to compute and store all of them in memory at once. You can use this function in a loop or with other generator functions to process the yielded values as needed.
To use this generator function, you can create a generator object and iterate over it:
gen = square(10)
for i in gen:
    print(i)
This will print the squares of numbers from 1 to 10:
1
4
9
16
25
36
49
64
81
100
Generator Expressions: Similar to list comprehensions, generator expressions provide a concise way to create generators. Here’s an example:
gen = (i**2 for i in range(1, 101))
for i in gen:
    print(i)
This will print the squares of numbers from 1 to 100.
Step-by-Step Explanation:
- Generator Expression Creation:
  - The generator expression (i**2 for i in range(1, 101)) is used to create a generator object named gen.
  - This expression defines a sequence where each value is the square of i for i in the range 1 to 100 (inclusive).
- Evaluation of i**2 for Each i:
  - As the generator is iterated over, the expression i**2 is evaluated for each value of i in the range 1 to 100.
  - This means that i**2 computes the square of each number in the specified range.
- On-the-Fly Value Production:
  - Unlike list comprehensions that eagerly create a list of all values, the generator expression produces values on-the-fly as they are needed.
  - This lazy evaluation is a key feature of generator expressions. It conserves memory by generating values only when requested, rather than storing them all at once in memory.
- Iteration with a for Loop:
  - A for loop is used to iterate over the generator object gen.
  - During each iteration, the loop retrieves the next value (the square of the current number) from the generator and prints it.
  - The loop continues until all values in the generator have been exhausted.
In summary, the generator expression (i**2 for i in range(1, 101)) efficiently computes and yields the square of each number from 1 to 100, producing values as needed without precomputing and storing them in memory. This approach is suitable for scenarios where memory conservation and lazy evaluation are important considerations.
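Generator expressions are often passed straight to a function that consumes an iterable, such as sum() or any(). The short sketch below shows this, and also shows that a generator can be consumed only once. The variable names are illustrative.

```python
squares = (i**2 for i in range(1, 101))

# sum() pulls values one at a time; no intermediate list is built.
total = sum(squares)

# The generator is now exhausted, so a second pass yields nothing.
leftover = sum(squares)

# When the generator expression is the only argument, the extra parentheses can be dropped.
# any() stops pulling values as soon as it sees a true one.
has_large_square = any(i**2 > 50 for i in range(1, 101))

print(total)             # 338350
print(leftover)          # 0
print(has_large_square)  # True
```

If the values are needed more than once, either recreate the generator expression or store the results in a list.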
Iterating over Generators: You can create a generator by iterating over another generator or an iterable object. For example:
import os
import cv2

def image_data_reader(folder_path):
    for file in os.listdir(folder_path):
        f_array = cv2.imread(os.path.join(folder_path, file))
        yield f_array
This generator function reads image files from a specified folder path and yields their numpy arrays one by one, allowing you to process large datasets without loading them entirely into memory.
Step-by-Step Explanation:
- The image_data_reader(folder_path) function is defined with the def keyword, taking a folder_path parameter.
- Inside the function, a for loop iterates over the list of files in the folder_path directory, obtained using os.listdir(folder_path).
- For each file, the cv2.imread() function from the OpenCV library is used to read the image file and store its numpy array representation in the f_array variable.
- The yield keyword is used to produce the f_array value, allowing the function to generate image arrays one by one, without loading all of them into memory at once.
To use the image_data_reader generator function, you can iterate over the generator object it returns. Here’s an example of how you can do this:
import os
import cv2

def image_data_reader(folder_path):
    for file in os.listdir(folder_path):
        f_array = cv2.imread(os.path.join(folder_path, file))
        yield f_array

# Specify the folder path containing the image files
folder_path = "path/to/image/folder"

# Create a generator object
image_generator = image_data_reader(folder_path)

# Iterate over the generator to process each image array
for image_array in image_generator:
    # cv2.imread returns None for files it cannot decode, so skip those
    if image_array is None:
        continue
    # Perform operations on the image array
    # For example, display the image
    cv2.imshow("Image", image_array)
    cv2.waitKey(0)

cv2.destroyAllWindows()
Here’s a step-by-step explanation of how to use the generator:
- You call the image_data_reader function with the folder_path argument, which creates a generator object.
- The generator object is assigned to the image_generator variable.
- You can then iterate over the image_generator using a for loop.
- Inside the loop, each iteration yields the next image array (image_array) from the generator.
- Within the loop, you can perform operations on the image_array, such as displaying the image using OpenCV’s cv2.imshow() function.
By using a generator, you can process large datasets of image files without loading all of them into memory at once, which can be more memory-efficient, especially when dealing with large datasets or limited system resources.
Note: Make sure to replace "path/to/image/folder" with the actual path to the folder containing your image files.
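The same one-file-at-a-time pattern can be tried without OpenCV installed. The sketch below is a standard-library analogue, not part of the original example: text_file_reader is a hypothetical helper, and the temporary folder with two small text files exists only to make the snippet self-contained.

```python
import os
import tempfile

def text_file_reader(folder_path):
    # Hypothetical stdlib counterpart of image_data_reader:
    # yields (file name, file contents) one file at a time.
    for name in sorted(os.listdir(folder_path)):
        path = os.path.join(folder_path, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        with open(path, encoding="utf-8") as handle:
            yield name, handle.read()

# Build a throwaway folder with two sample files for the demonstration.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write(text)

    # Only one file's contents is read per iteration.
    contents = list(text_file_reader(folder))

print(contents)  # [('a.txt', 'alpha'), ('b.txt', 'beta')]
```

As with the image reader, only the file currently being processed is held by the generator; the caller decides whether to keep or discard each result.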
Benefits of Using Generators
Using generators in Python offers several benefits:
Memory Efficiency: As demonstrated earlier, generators don’t store the entire sequence in memory, making them more memory-efficient, especially when dealing with large datasets or infinite streams of data.
Representing Infinite Streams: Generators can represent infinite streams of data, as they produce values on-the-fly. For example:
def all_even():
    n = 0
    while True:
        yield n
        n += 2
This generator function produces an infinite stream of even numbers.
Step-by-Step Explanation:
- The all_even() function is defined with the def keyword.
- Inside the function, the variable n is initialized to 0.
- An infinite while True loop is used to generate even numbers indefinitely.
- Inside the loop, the current value of n is yielded using the yield keyword.
- The value of n is incremented by 2 to generate the next even number.
- The loop continues indefinitely, generating and yielding even numbers on-the-fly.
To use this generator function, you can create a generator object and iterate over it (with care, as it will produce an infinite stream of values):
gen = all_even()
print(next(gen))  # Output: 0
print(next(gen))  # Output: 2
print(next(gen))  # Output: 4
# ... and so on
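One safe way to take a bounded slice of an infinite generator is itertools.islice from the standard library, which stops pulling values after a given count. A minimal sketch, reusing all_even from above:

```python
from itertools import islice

def all_even():
    n = 0
    while True:
        yield n
        n += 2

# islice stops after five values, so the infinite loop is never a problem.
first_five = list(islice(all_even(), 5))

print(first_five)  # [0, 2, 4, 6, 8]
```

Calling list(all_even()) directly would never finish, because the generator has no end.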
Chaining Generators: Generators can be chained together, allowing you to create complex data pipelines. For example:
def fibonacci_numbers(nums):
    x, y = 0, 1
    for _ in range(nums):
        x, y = y, x+y
        yield x

def square(nums):
    for num in nums:
        yield num**2

print(sum(square(fibonacci_numbers(10))))  # Output: 4895
In this example, the fibonacci_numbers generator produces the first 10 Fibonacci numbers, which are then squared by the square generator and finally summed.
Step-by-Step Explanation:
- The fibonacci_numbers(nums) function is defined to generate the first nums Fibonacci numbers.
- Inside the function, x and y are initialized to 0 and 1, respectively, representing the first two Fibonacci numbers.
- A for loop iterates nums times. On each pass, the tuple assignment x, y = y, x+y advances the pair one step along the sequence.
- The current Fibonacci number (x) is yielded using the yield keyword.
- The square(nums) function is defined to take an iterable nums and yield the square of each number.
- Inside the square function, a for loop iterates over the elements in nums.
- For each element num, the square num**2 is yielded using the yield keyword.
- The sum(square(fibonacci_numbers(10))) expression creates a generator fibonacci_numbers(10) that generates the first 10 Fibonacci numbers.
- The square generator takes the fibonacci_numbers(10) generator as input and squares each Fibonacci number.
- The sum function consumes the squared Fibonacci numbers from the square generator and calculates their sum, which is printed as the output (4895).
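A chained pipeline does not run one stage to completion before starting the next; each value travels through every stage before the next value is produced. The sketch below makes that order visible by recording each step in a list. The log list and its message format are additions for illustration only; the two generators are otherwise the ones from the example above.

```python
log = []

def fibonacci_numbers(nums):
    x, y = 0, 1
    for _ in range(nums):
        x, y = y, x + y
        log.append(f"fib -> {x}")  # record each value as it is produced
        yield x

def square(nums):
    for num in nums:
        log.append(f"square -> {num ** 2}")  # record each value as it is squared
        yield num ** 2

total = sum(square(fibonacci_numbers(3)))

print(total)  # 6
for entry in log:
    print(entry)
# fib -> 1
# square -> 1
# fib -> 1
# square -> 1
# fib -> 2
# square -> 4
```

The interleaved log entries show that only one value is in flight at a time, which is why chained generators keep memory use low.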
Ease of Implementation: Generators are relatively easy to implement compared to creating custom iterator classes.
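To illustrate the difference in effort, here is a sketch that implements the same countdown twice: once as a hand-written iterator class and once as a generator function. Countdown and countdown are hypothetical names chosen for this comparison.

```python
class Countdown:
    """Hand-written iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        # State must be tracked manually, and the end signalled explicitly.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

def countdown(start):
    # The generator version keeps its state in ordinary local variables.
    while start > 0:
        yield start
        start -= 1

class_result = list(Countdown(3))
generator_result = list(countdown(3))

print(class_result)      # [3, 2, 1]
print(generator_result)  # [3, 2, 1]
```

The class needs __iter__, __next__, and an explicit StopIteration, while the generator function gets all three from Python automatically.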
Representing Ranges and Sequences: Generators can be used to represent ranges and sequences in a memory-efficient way. For example:
def mera_range(start, end):
    for i in range(start, end):
        yield i
This generator function behaves like the built-in range function when iterated over, but uses a generator to produce values on-the-fly.
Step-by-Step Explanation:
- The mera_range(start, end) function is defined with the def keyword, taking start and end parameters.
- Inside the function, a for loop iterates from start to end-1 using the built-in range function.
- For each iteration, the current value i is yielded using the yield keyword.
To use this generator function, you can create a generator object and iterate over it:
for i in mera_range(15, 26):
    print(i)
This will print the numbers from 15 to 25 (inclusive):
15
16
17
18
19
20
21
22
23
24
25
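The resemblance to range holds for iteration only. As a rough sketch of the differences: a range object can be iterated repeatedly and supports len() and indexing, while a generator is single-use and supports neither. The variable names below are illustrative.

```python
def mera_range(start, end):
    for i in range(start, end):
        yield i

r = range(15, 26)
g = mera_range(15, 26)

# range is a reusable sequence with a known length.
range_length = len(r)
range_first = r[0]
range_twice = (list(r) == list(r))

# A generator can be walked through only once.
first_pass = list(g)
second_pass = list(g)

# Generators have no len().
try:
    len(g)
except TypeError:
    generator_has_len = False
else:
    generator_has_len = True

print(range_length, range_first, range_twice)  # 11 15 True
print(len(first_pass), second_pass)            # 11 []
print(generator_has_len)                       # False
```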
Advanced Topics
While the basics of generators are relatively straightforward, there are some advanced topics worth exploring:
Generator Delegation: Generators can delegate part of their operation to a different generator, allowing for more modular and reusable code.
def gen_range(start, end):
    while start < end:
        yield start
        start += 1

def gen_delegated(start, end, step):
    for i in gen_range(start, end):
        if i % step == 0:
            yield i
In this example, the gen_delegated generator delegates the task of generating numbers to the gen_range generator and filters out values that are not divisible by the step value. When no filtering is needed, the same delegation can be written more compactly as yield from gen_range(start, end).
Generator Pipelines: By chaining multiple generators together, you can create powerful data processing pipelines, enabling efficient data transformations and filtering.
def integers():
    i = 0
    while True:
        yield i
        i += 1

def squares(nums):
    for n in nums:
        yield n ** 2

def odds(nums):
    for n in nums:
        if n % 2 != 0:
            yield n

# Pipeline: Integers -> Squares -> Odds
pipeline = odds(squares(integers()))
for i in pipeline:
    if i > 100:
        break
    print(i)
In this example, we create a pipeline that generates integers, squares them, and filters out the even numbers. The pipeline is constructed by chaining the odds, squares, and integers generators together.
Coroutines and Async Generators: Python’s coroutines and async generators provide a way to write concurrent code using generators, making it easier to manage and reason about asynchronous operations.
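As a minimal sketch of an async generator: it is defined with async def, uses yield like an ordinary generator, and is consumed with async for. The names ticker and collect are hypothetical, and asyncio.sleep(0) merely stands in for a real asynchronous operation such as a network call.

```python
import asyncio

async def ticker(count):
    # Async generator: yields values and can await between them.
    for i in range(count):
        yield i
        await asyncio.sleep(0)  # placeholder for real async work

async def collect():
    # Async generators are consumed with "async for" (here inside a comprehension).
    return [value async for value in ticker(3)]

result = asyncio.run(collect())

print(result)  # [0, 1, 2]
```

While ticker is waiting at an await, the event loop is free to run other tasks, which is what makes this style useful for concurrent I/O.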
Generator-based Algorithms: Many algorithms, such as graph traversal algorithms (e.g., breadth-first search, depth-first search), can be elegantly implemented using generators.
def bfs(graph, start):
    # graph maps each vertex to a set of its neighbours
    visited, queue = set(), [start]
    while queue:
        vertex = queue.pop(0)
        if vertex not in visited:
            visited.add(vertex)
            yield vertex
            queue.extend(graph[vertex] - visited)
This generator function implements the breadth-first search (BFS) algorithm for traversing a graph, yielding each visited vertex one by one.
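Here is a small usage sketch for the bfs generator. The sample graph is made up for illustration, and it stores each vertex's neighbours as a set because the function subtracts visited from graph[vertex]. Since sets are unordered, vertices at the same depth (B and C here) may come out in either order.

```python
def bfs(graph, start):
    # graph maps each vertex to a set of its neighbours
    visited, queue = set(), [start]
    while queue:
        vertex = queue.pop(0)
        if vertex not in visited:
            visited.add(vertex)
            yield vertex
            queue.extend(graph[vertex] - visited)

# Hypothetical sample graph: A links to B and C, which both link to D.
graph = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"D"},
    "D": set(),
}

order = list(bfs(graph, "A"))

# Because bfs is a generator, a caller can also stop early:
first_vertex = next(bfs(graph, "A"))

print(order[0], order[-1])  # A D
print(sorted(order))        # ['A', 'B', 'C', 'D']
print(first_vertex)         # A
```

Stopping after the first vertex does no further traversal work, which is the practical benefit of yielding vertices instead of returning a complete list.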
Conclusion
Python generators are a powerful and versatile tool that offer memory efficiency, ease of implementation, and the ability to represent infinite streams of data. By understanding and leveraging generators, you can write more efficient and elegant code, especially when dealing with large datasets or infinite data streams.
Throughout this tutorial, we’ve covered the fundamental concepts of generators, their creation methods, benefits, and advanced topics with numerous examples and step-by-step code explanations. With the knowledge gained from this tutorial, you should now be well-equipped to incorporate generators into your Python projects and unlock their full potential.
