What is an iterable?
Something fit for iterating over
→ we'll see a more formal definition for Python's iterable protocol
Already seen: Sequences and iteration
More general concept of iteration
Iterators → get next item, no indexes needed → consumables
Iterables
Consuming iterators manually
Relationship between sequence types and iterators
Infinite Iterables
Lazy Evaluation
Iterator Delegation
Iterating Sequences
We saw that in the last section → __getitem__
→ assumed indexing started at 0
→ iteration: __getitem__(0), __getitem__(1), etc
But iteration can be more general than sequential indexing
All we need is:
a bucket of items → collection, container
get next item → no concept of ordering needed
→ just a way to get items out of the container one by one
a specific order in which this happens is not required (but there can be one)
Example: Sets
Sets are unordered collections of items: s = {'x', 'y', 'b', 'c', 'a'}
Sets are not indexable: s[0] → TypeError (a set does not support indexing)
But sets are iterable:
for item in s:
    print(item)
Possible output: y, c, x, b, a
Note that we have no idea of the order in which the elements are returned in the iteration
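A quick check of the claims above, using the same set as the slide: indexing fails with a TypeError, yet a for loop still visits every element. Because the order is not guaranteed, this sketch compares sorted results instead of relying on any particular order (the exact error message also varies between Python versions).

```python
s = {'x', 'y', 'b', 'c', 'a'}

# Indexing a set is not supported
try:
    s[0]
except TypeError as ex:
    index_error = ex          # keep a reference to inspect it later
else:
    index_error = None

# Iteration works, but the order is an implementation detail
seen = []
for item in s:
    seen.append(item)

print(type(index_error).__name__)
print(sorted(seen))
```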
The concept of next
For general iteration, all we really need is the concept of "get the next item" in the collection
If a collection object implements a get_next_item method, we can get elements out of the collection, one after the other, this way:
get_next_item()
get_next_item()
get_next_item()
and we could iterate over the collection as follows:
for _ in range(10):
    item = coll.get_next_item()
    print(item)
But how do we know when to stop asking for the next item?
i.e. when all the elements of the collection have been returned
by calling get_next_item()?
→ the built-in StopIteration exception
Attempting to build an Iterable ourselves
Let's try building our own class, which will be a collection of squares of integers
We could make this a sequence, but we want to avoid the concept of indexing
In order to implement a next method, we need to know what we've already "handed out"
so we can hand out the "next" item without repeating ourselves
class Squares:
    def __init__(self):
        self.i = 0
    def next_(self):
        result = self.i ** 2
        self.i += 1
        return result
Iterating over Squares
sq = Squares()
for _ in range(5):
    item = sq.next_()
    print(item)
Output: 0, 1, 4, 9, 16
There are a few issues:
→ the collection is essentially infinite
→ cannot use a for loop, comprehension, etc
→ we cannot restart the iteration "from the beginning"
Refining the Squares Class
we first tackle the idea of making the collection finite
• we specify the size of the collection when we create the instance
• we raise a StopIteration exception if next_ has been called too many times
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length
    def next_(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
Iterating over Squares instances
sq = Squares(5)              # create a collection of length 5
while True:                  # start an infinite loop
    try:
        item = sq.next_()    # try getting the next item
        print(item)
    except StopIteration:    # catch the StopIteration exception → nothing left to iterate
        break                # break out of the infinite while loop; we're done iterating
Output: 0, 1, 4, 9, 16
Python's next() function
Remember Python's len() function? We could implement that function for our custom type by implementing the special method __len__
Python has a built-in function next(). We can implement that function for our custom type by implementing the special method __next__
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length
    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
Iterating over Squares instances
sq = Squares(5)
while True:
    try:
        item = next(sq)
        print(item)
    except StopIteration:
        break
Output: 0, 1, 4, 9, 16
We still have some issues:
• cannot iterate using for loops, comprehensions, etc
• once the iteration starts we have no way of re-starting it
• and once all the items have been iterated (using next) the
object becomes useless for iteration → exhausted
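A small check of these issues, reusing the __next__-based Squares from above: a for loop fails immediately with a TypeError because Python finds no __iter__ method, while next() works until the items run out, after which every call raises StopIteration. The three-item length is just an arbitrary choice for the sketch.

```python
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = self.i ** 2
        self.i += 1
        return result


sq = Squares(3)
try:
    for item in sq:           # the for loop first calls iter(sq), which fails
        print(item)
except TypeError as ex:
    for_loop_error = ex
else:
    for_loop_error = None

# next() still works, since __next__ exists...
first_items = [next(sq), next(sq), next(sq)]

# ...but once the items run out, the object is exhausted
try:
    next(sq)
    exhausted = False
except StopIteration:
    exhausted = True
```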
Code
Exercises
Where we're at so far…
We created a custom container type object with a __next__ method
But it had several drawbacks:
→ cannot use a for loop
→ once we start using next there's no going back
→ once we have reached StopIteration we're basically done with the object
Let's tackle the loop issue first
We saw how to iterate using __next__, StopIteration, and a while loop
This is actually how Python handles for loops in general
Somehow, we need to tell Python that our class has that __next__
method and that it will behave in a way consistent with using a
while loop to iterate
Python can see that we have a __next__ method, but how does it know that our __next__ raises StopIteration when the items run out?
The iterator Protocol
A protocol is simply a fancy way of saying that our class is going to implement certain
functionality that Python can count on
To let Python know that our class can be iterated over using __next__, we implement the iterator protocol
The iterator protocol is quite simple: the class needs to implement two methods:
→ __iter__: this method should just return the object (class instance) itself (it sounds weird, but we'll understand why later)
→ __next__: this method is responsible for handing back the next element from the collection and raising the StopIteration exception when all elements have been handed out
An object that implements these two methods is called an iterator
Iterators
An iterator is therefore an object that implements:
__iter__ → just returns the object itself
__next__ → returns the next item from the container, or raises StopIteration
If an object is an iterator, we can use it with for loops, comprehensions, etc
Python will know how to loop (iterate) over such an object
(basically using the same while loop technique we used)
Example
Let's go back to our Squares example, and make it into an iterator
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length
    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
    def __iter__(self):
        return self
sq = Squares(5)
for item in sq:
    print(item)
Output: 0, 1, 4, 9, 16
Still one issue though!
The iterator cannot be "restarted"
Once we have looped through all the items, the iterator has been exhausted
To loop a second time through the collection, we have to create a new instance and loop through that
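To see the exhaustion concretely, here is a sketch using the iterator version of Squares: because __iter__ now exists, comprehensions work, but a second pass over the same object yields nothing, while a fresh instance starts over.

```python
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = self.i ** 2
        self.i += 1
        return result

    def __iter__(self):
        return self           # an iterator is its own iterator


sq = Squares(5)
first_pass = [item for item in sq]     # comprehensions work now
second_pass = [item for item in sq]    # same, already exhausted object
fresh_pass = list(Squares(5))          # a new instance starts from the beginning
```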
Code
Exercises
Iterators
We saw that an iterator is an object that implements
__iter__ → returns the object itself
__next__ → returns the next element
The drawback is that iterators get exhausted → become useless for iterating again
→ become throw away objects
But there are two distinct things going on:
maintaining the collection of items (the container), e.g. creating it and mutating it (if mutable)
iterating over the collection
Why should we have to re-create the collection of items just to
iterate over them?
Separating the Collection from the Iterator
Instead, we would prefer to separate these two
Maintaining the data of the collection should be one object
Iterating over the data should be a separate object → iterator
That object is throw-away → but we don't throw away the collection
The collection is iterable
but the iterator is responsible for iterating over the collection
The iterable is created once
The iterator is created every time we need to start a fresh iteration
Example
class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'London']
        self._index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities[self._index]
            self._index += 1
            return item
Cities instances are iterators
Every time we want to run a new loop, we have to create a new
instance of Cities
This is wasteful, because we should not have to re-create the _cities
list every time
Example
So, let's separate the object that maintains the cities from the iterator itself
class Cities:
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']
    def __len__(self):
        return len(self._cities)
class CityIterator:
    def __init__(self, cities):
        self._cities = cities
        self._index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            etc…
Example
To use Cities and CityIterator together, here's how we would proceed:
cities = Cities()                       # create an instance of the container object
city_iterator = CityIterator(cities)    # create a new iterator, passing in the existing cities instance
for city in city_iterator:              # we can now use the iterator to iterate
    print(city)
At this point, city_iterator is exhausted
If we want to re-iterate over the collection, we need to create a new iterator
city_iterator = CityIterator(cities)
for city in city_iterator:
    print(city)
But this time, we did not have to re-create the collection – we just
passed in the existing one!
So far…
At this point we have:
a container that maintains the collection items
a separate object, the iterator, used to iterate over the collection
So we can iterate over the collection as many times as we want
we just have to remember to create a new iterator every time
It would be nice if we did not have to do that manually every time
and if we could just iterate over the Cities object instead of CityIterator
This is where the formal definition of a Python iterable comes in…
Iterables
An iterable is a Python object that implements the iterable protocol
The iterable protocol requires that the object implement a single method
__iter__ → returns a new instance of the iterator object used to iterate over the iterable
class Cities:
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']
    def __len__(self):
        return len(self._cities)
    def __iter__(self):
        return CityIterator(self)
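Putting the two classes together gives a runnable sketch. The elided part of CityIterator.__next__ is filled in here as an assumption: it indexes into the container's _cities list, mirroring the earlier all-in-one Cities. Because Cities.__iter__ hands out a new CityIterator each time, the same Cities object can be looped over repeatedly.

```python
class Cities:
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']

    def __len__(self):
        return len(self._cities)

    def __iter__(self):
        # a brand-new iterator every time an iteration starts
        return CityIterator(self)


class CityIterator:
    def __init__(self, cities):
        self._cities = cities    # the Cities instance itself, not a copy of its data
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._cities):    # uses Cities.__len__
            raise StopIteration
        # assumption: reach into the container's list to fetch the item
        item = self._cities._cities[self._index]
        self._index += 1
        return item


cities = Cities()
first = list(cities)
second = list(cities)     # works again: Cities is an iterable, not an iterator
```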
Iterable vs Iterator
An iterable is an object that implements
__iter__ → returns an iterator (in general, a new instance)
An iterator is an object that implements
__iter__ → returns itself (an iterator) (not a new instance)
__next__ → returns the next element
So iterators are themselves iterables
but they are iterables that become exhausted
Iterables on the other hand never become exhausted
because they always return a new iterator that is then used to iterate
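The same distinction can be observed with a built-in list, which is an iterable: each iter() call returns a new list iterator, an iterator's own __iter__ returns itself, and only the iterator gets exhausted.

```python
numbers = [1, 2, 3]                           # a list is an iterable

it = iter(numbers)                            # a new iterator over the list
same = iter(it) is it                         # an iterator returns itself
fresh = iter(numbers) is not iter(numbers)    # the list hands out new iterators

from_iterator = list(it)                      # consumes the iterator
again_iterator = list(it)                     # exhausted: nothing left
again_iterable = list(numbers)                # the list itself is never exhausted
```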
Iterating over an iterable
Python has a built-in function iter()
It calls the __iter__ method (we'll actually come back to this for sequences!)
When we try to iterate over an object, the first thing Python does is call iter() to obtain an iterator
It then starts iterating (using next, StopIteration, etc.) with the iterator returned by iter()
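A simplified model of what a for loop does with an object, written as an ordinary function. The name for_each is hypothetical, and the real for statement is implemented inside the interpreter rather than in Python code, but the steps (iter(), then next() until StopIteration) are the same ones described above.

```python
def for_each(iterable, action):
    """Roughly what `for item in iterable: action(item)` does."""
    iterator = iter(iterable)        # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)    # step 2: ask for the next item
        except StopIteration:        # step 3: stop once the iterator is exhausted
            break
        action(item)


collected = []
for_each(range(4), collected.append)
```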
Code
Exercises
Lazy Evaluation
Lazy evaluation is often used in class properties:
properties of classes may not always be populated when the object is created
the value of a property only becomes known when the property is requested → deferred
Example
class Actor:
    def __init__(self, actor_id):
        self.actor_id = actor_id
        self.bio = lookup_actor_in_db(actor_id)
        self._movies = None    # not loaded yet
    @property
    def movies(self):
        if self._movies is None:
            # only hit the database the first time movies is requested
            self._movies = lookup_movies_in_db(self.actor_id)
        return self._movies
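A runnable variant of the Actor example, with lookup_movies_in_db replaced by a hypothetical in-memory stub that records each call, so the deferred (and cached) lookup can be observed. The bio lookup is left out to keep the sketch self-contained.

```python
calls = []   # records every (hypothetical) database hit


def lookup_movies_in_db(actor_id):
    """Hypothetical stand-in for a slow database query."""
    calls.append(actor_id)
    return ['Movie A', 'Movie B']


class Actor:
    def __init__(self, actor_id):
        self.actor_id = actor_id
        self._movies = None          # nothing loaded at construction time

    @property
    def movies(self):
        if self._movies is None:     # first access: load and cache
            self._movies = lookup_movies_in_db(self.actor_id)
        return self._movies


actor = Actor(42)
calls_before = len(calls)            # no query yet
m1 = actor.movies
m2 = actor.movies                    # served from the cached attribute
```

On Python 3.8+, functools.cached_property offers a similar compute-once behaviour without the explicit None check.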
Application to Iterables
We can apply the same concept to certain iterables
We do not calculate the next item in an iterable until it is actually requested
Example
iterable → Factorial(n)
will return factorials of consecutive integers from 0 to n-1
do not pre-compute all the factorials
wait until next requests one, then calculate it
This is a form of lazy evaluation
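A minimal sketch of the Factorial(n) idea, assuming a separate iterator class (named FactorialIterator here for illustration) so the iterable can be reused. Each factorial is computed only when next() asks for it; nothing is precomputed.

```python
import math


class Factorial:
    """Iterable over 0!, 1!, ..., (n-1)!; nothing is computed up front."""

    def __init__(self, length):
        self.length = length

    def __iter__(self):
        return FactorialIterator(self.length)


class FactorialIterator:
    def __init__(self, length):
        self.length = length
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = math.factorial(self.i)   # computed only when requested
        self.i += 1
        return result


facts = Factorial(5)
```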
Application to Iterables
Another application of this might be retrieving a list of forum posts
Posts might be an iterable
each call to next returns a list of 5 posts (or some page size)
but uses lazy loading
→ every time next is called, go back to database and get next 5 posts
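A sketch of the paged-posts idea. The in-memory POSTS_TABLE and the fetch_posts function are hypothetical stand-ins for a real database query; the point is that a query is issued only when next() is called, and an empty page ends the iteration.

```python
# Hypothetical in-memory "database" standing in for a real posts table
POSTS_TABLE = [f'post {n}' for n in range(12)]
queries = []   # log of every (offset, limit) query actually issued


def fetch_posts(offset, limit):
    """Hypothetical DB call: return up to `limit` posts starting at `offset`."""
    queries.append((offset, limit))
    return POSTS_TABLE[offset:offset + limit]


class Posts:
    def __init__(self, page_size=5):
        self.page_size = page_size

    def __iter__(self):
        return PostsIterator(self.page_size)


class PostsIterator:
    def __init__(self, page_size):
        self.page_size = page_size
        self.offset = 0

    def __iter__(self):
        return self

    def __next__(self):
        page = fetch_posts(self.offset, self.page_size)   # query only on demand
        if not page:
            raise StopIteration
        self.offset += len(page)
        return page


pages = iter(Posts(page_size=5))
first_page = next(pages)
queries_after_first = len(queries)   # only one query issued so far
remaining = list(pages)
```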
Application to Iterables → Infinite Iterables
Using that lazy evaluation technique means that we can actually have infinite iterables
Since items are not computed until they are requested
we can have an infinite number of items in the collection
Don't try to use a for loop over such an iterable
unless you have some type of exit condition in your loop
→ otherwise infinite loop!
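A sketch of an infinite iterator (the class name InfiniteSquares is hypothetical) whose __next__ never raises StopIteration, consumed safely in two ways: a loop with an explicit exit condition, and itertools.islice, which stops after a fixed number of items.

```python
from itertools import islice


class InfiniteSquares:
    """Iterator over 0, 1, 4, 9, ... with no end; each square is computed on demand."""

    def __init__(self):
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):          # never raises StopIteration
        result = self.i ** 2
        self.i += 1
        return result


# Option 1: an explicit exit condition inside the loop
below_50 = []
for sq in InfiniteSquares():
    if sq >= 50:
        break
    below_50.append(sq)

# Option 2: take a fixed number of items with itertools.islice
first_five = list(islice(InfiniteSquares(), 5))
```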
Lazy evaluation of iterables is something that is used a lot in Python!
We'll examine that in detail in the next section on generators
Code
Exercises
Python provides many functions that return iterables or iterators
Additionally, these iterators typically perform lazy evaluation
You should always be aware of whether you are dealing with an iterable or an iterator
Why? If an object is an iterable (but not an iterator), you can iterate over it many times
If an object is an iterator, you can iterate over it only once
range(10) → iterable
zip(l1, l2) → iterator
enumerate(l1) → iterator
open('cars.csv') → iterator
dictionary .keys() → iterable
dictionary .values() → iterable
dictionary .items() → iterable
and many more…
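One practical test, given the definitions above: an object is an iterator if iter(obj) returns the object itself. This sketch applies that check to the examples in the list; an io.StringIO object is used as an in-memory stand-in for the file returned by open('cars.csv'), since both are file objects that support the same iterator protocol.

```python
import io


def is_iterator(obj):
    """An iterator's __iter__ returns the object itself."""
    return iter(obj) is obj


l1, l2 = [1, 2, 3], ['a', 'b', 'c']
d = {'a': 1, 'b': 2}
f = io.StringIO('line 1\nline 2\n')   # in-memory stand-in for open('cars.csv')

results = {
    'range': is_iterator(range(10)),
    'zip': is_iterator(zip(l1, l2)),
    'enumerate': is_iterator(enumerate(l1)),
    'file': is_iterator(f),
    'keys': is_iterator(d.keys()),
    'values': is_iterator(d.values()),
    'items': is_iterator(d.items()),
}
print(results)
```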
Code
Exercises
iter()
What happens when Python performs an iteration over an iterable?
The very first thing Python does is call the iter() function on the object we want to iterate
If the object implements the __iter__ method, that method is called
and Python uses the returned iterator
What happens if the object does not implement the __iter__ method?
Is an exception raised immediately?
Sequence Types
So how does iterating over a sequence type that implements only __getitem__ work?
I just said that Python always calls iter() first
You'll notice I did not say Python always calls the __iter__ method
I said it calls the iter() function!!
In fact, if obj is an object that only implements __getitem__
iter(obj) → returns an iterator type object!
Some form of magic at work?
Not really!
Let's think about sequence types and how we can iterate over them
Suppose seq is some sequence type that implements __getitem__ (but not __iter__)
Remember what happens when we request an index that is out of bounds from the
__getitem__ method? → IndexError
index = 0
while True:
    try:
        print(seq[index])
        index += 1
    except IndexError:
        break
Making an Iterator to iterate over any Sequence
This is basically what we just did!
class SeqIterator:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        try:
            item = self.seq[self.index]
            self.index += 1
            return item
        except IndexError:
            raise StopIteration()
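A check that the built-in iter() and the hand-written SeqIterator behave the same on an object that implements only __getitem__. SquaresSeq is a hypothetical example class; its __getitem__ raises IndexError past the end, which is the signal both iterators rely on.

```python
class SeqIterator:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            item = self.seq[self.index]
        except IndexError:
            raise StopIteration
        self.index += 1
        return item


class SquaresSeq:
    """Hypothetical sequence: only __getitem__, no __iter__ and no __len__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not 0 <= index < self.n:   # simplified: negative indices unsupported
            raise IndexError('index out of range')
        return index ** 2


seq = SquaresSeq(4)
via_builtin = list(iter(seq))        # iter() falls back to __getitem__
via_ours = list(SeqIterator(seq))    # our hand-written equivalent
```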
Calling iter()
So when iter(obj) is called:
Python first looks for an __iter__ method
→ if it's there, use it
→ if it's not
look for a __getitem__ method
→ if it's there create an iterator object and return that
→ if it's not there, raise a TypeError exception (not iterable)
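The lookup order above can be modelled in a few lines. my_iter is a hypothetical, simplified model rather than CPython's actual implementation (which is written in C and performs extra checks); it looks special methods up on the type, as Python does, and reuses the SeqIterator idea as the fallback.

```python
class SeqIterator:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            item = self.seq[self.index]
        except IndexError:
            raise StopIteration
        self.index += 1
        return item


def my_iter(obj):
    """Simplified model of iter(obj)."""
    cls = type(obj)                      # special methods are looked up on the type
    if hasattr(cls, '__iter__'):
        return cls.__iter__(obj)         # 1. use __iter__ if present
    if hasattr(cls, '__getitem__'):
        return SeqIterator(obj)          # 2. fall back to the sequence protocol
    raise TypeError(f"'{cls.__name__}' object is not iterable")   # 3. give up


class Letters:
    def __getitem__(self, index):
        return 'abc'[index]              # IndexError once index reaches 3


from_list = list(my_iter([1, 2, 3]))
from_getitem = list(my_iter(Letters()))
try:
    my_iter(42)
    int_error = None
except TypeError as ex:
    int_error = ex
```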
Testing if an object is iterable
Sometimes (very rarely!) you may want to know whether an object is iterable
But now you would have to check whether it implements __getitem__ or __iter__, and that __iter__ returns an iterator
Easier approach:
try:
    iter(obj)
except TypeError:
    # not iterable
    <code>
else:
    # is iterable
    <code>
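Wrapped into a reusable helper (the name is_iterable is hypothetical), the try/except approach gives the same answer as a for loop would, including for objects that are iterable only through __getitem__. By contrast, isinstance(obj, collections.abc.Iterable) only looks for __iter__, so it misses that case.

```python
from collections.abc import Iterable


def is_iterable(obj):
    """Let iter() decide, exactly as a for loop would."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class Letters:
    """Iterable only through the sequence protocol (__getitem__)."""

    def __getitem__(self, index):
        return 'abc'[index]


checks = (is_iterable([1, 2]), is_iterable(42), is_iterable(Letters()))
abc_check = isinstance(Letters(), Iterable)   # False: the ABC only checks for __iter__
```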
Code
Exercises
Iterating over the return values of a callable
Consider a callable that provides a countdown from some start value:
countdown() → 5
countdown() → 4
countdown() → 3
countdown() → 2
countdown() → 1
countdown() → 0
countdown() → -1
...
We now want to run a loop that will call countdown() until 0 is reached
We could certainly do that using a loop and testing the value to break out of the loop once 0 has been reached
while True:
    val = countdown()
    if val == 0:
        break
    else:
        print(val)
An iterator approach
We could take a different approach, using iterators, and we can also make it quite generic
Make an iterator that knows two things:
the callable that needs to be called
a value (the sentinel) that will result in a StopIteration if the callable returns that value
The iterator would then be implemented as follows:
when next() is called:
call the callable and get the result
if the result is equal to the sentinel → StopIteration
and "exhaust" the iterator
otherwise return the result
We can then simply iterate over the iterator until it is exhausted
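The steps above can be sketched as a small class. CallableIterator and make_countdown are hypothetical names; make_countdown builds a countdown callable with a closure. Once the sentinel has been seen, the iterator stays exhausted instead of calling the callable again.

```python
def make_countdown(start):
    """Hypothetical helper: returns a callable giving start, start-1, ... on each call."""
    current = start

    def countdown():
        nonlocal current
        value = current
        current -= 1
        return value

    return countdown


class CallableIterator:
    def __init__(self, func, sentinel):
        self.func = func
        self.sentinel = sentinel
        self.is_consumed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.is_consumed:            # stay exhausted once the sentinel was seen
            raise StopIteration
        result = self.func()
        if result == self.sentinel:
            self.is_consumed = True
            raise StopIteration
        return result


values = list(CallableIterator(make_countdown(5), 0))
```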
The first form of the iter() function
We have already studied the first form of the iter() function:
iter(iterable) → iterator for iterable
if the object does not implement __iter__ (the iterable protocol) but does implement the sequence protocol,
iter() creates an iterator for us (leveraging the sequence protocol)
Notice that the iter() function was able to generate an iterator for us automatically
The second form of the iter() function
iter(callable, sentinel)
This will return an iterator that will:
call the callable when next() is called
and either raise StopIteration if the result is equal to the sentinel value
or return the result otherwise
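The built-in two-argument form does the work of a hand-written callable iterator for us. This sketch reuses a hypothetical closure-based countdown, and also shows a common real-world pattern: reading fixed-size chunks from a file-like object until read() returns the empty string (io.StringIO stands in for a real file).

```python
import io
from functools import partial


def make_countdown(start):
    """Hypothetical helper: returns a callable giving start, start-1, ... on each call."""
    current = start

    def countdown():
        nonlocal current
        value = current
        current -= 1
        return value

    return countdown


countdown = make_countdown(5)
values = list(iter(countdown, 0))     # calls countdown() until it returns 0
after = countdown()                   # the sentinel call already consumed 0

# Read 4 characters at a time until read() returns '' (the sentinel)
stream = io.StringIO('abcdefghij')
chunks = list(iter(partial(stream.read, 4), ''))
```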
Coding
Exercises
Iterating a sequence in reverse order
If we have a sequence type, then iterating over the sequence in reverse order is quite simple:
for item in seq[::-1]:
    print(item)
This works, but is wasteful because it makes a copy of the sequence
for i in range(len(seq)):
    print(seq[len(seq) - i - 1])
This is more efficient, but the syntax is messy
for i in range(len(seq)-1, -1, -1):
    print(seq[i])
for item in reversed(seq):
    print(item)
This is cleaner and just as efficient, because it creates an iterator that will iterate backwards over the sequence; it does not copy the data like the first example
For reversed() to work this way on a custom sequence, both __getitem__ and __len__ must be implemented
We can override how reversed works by implementing the __reversed__ special method
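A check of the requirement above, using two hypothetical classes: SquaresSeq implements both __len__ and __getitem__, so reversed() works without any extra code, while GetItemOnly lacks __len__, so reversed() raises a TypeError.

```python
class SquaresSeq:
    """Hypothetical sequence: __len__ and __getitem__ only."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if not 0 <= index < self.n:   # simplified: negative indices unsupported
            raise IndexError('index out of range')
        return index ** 2


class GetItemOnly:
    def __getitem__(self, index):
        return 'abc'[index]


backwards = list(reversed(SquaresSeq(5)))   # uses __len__ and __getitem__

try:
    reversed(GetItemOnly())
    no_len_error = None
except TypeError as ex:
    no_len_error = ex
```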
Iterating an iterable in reverse
Unfortunately, reversed() will not work with custom iterables without a little bit of extra work
When we call reversed() on a custom iterable, Python will look for and call the __reversed__ method
That method should return an iterator that will be used to perform the reversed iteration
So basically we have to implement a reverse iterator ourselves
Just like the iter() function, when we call reversed() on an object, Python:
→ looks for and calls the __reversed__ method
→ if it's not there, uses __getitem__ and __len__ to create an iterator for us
→ raises a TypeError otherwise
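A sketch of __reversed__ on a custom iterable that has neither __len__ nor __getitem__, so reversed() would otherwise fail. As a simplification (an assumption, unlike the earlier CityIterator), this iterator receives the underlying list directly and takes a reverse flag, so one class serves both directions.

```python
class Cities:
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']

    def __iter__(self):
        return CityIterator(self._cities)

    def __reversed__(self):
        # same iterator class, walking the data from the end
        return CityIterator(self._cities, reverse=True)


class CityIterator:
    def __init__(self, cities, reverse=False):
        self._cities = cities
        self._step = -1 if reverse else 1
        self._index = len(cities) - 1 if reverse else 0

    def __iter__(self):
        return self

    def __next__(self):
        if not 0 <= self._index < len(self._cities):
            raise StopIteration
        item = self._cities[self._index]
        self._index += self._step
        return item


cities = Cities()
forwards = list(cities)
backwards = list(reversed(cities))   # calls Cities.__reversed__
```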
Card Deck Example
In the code exercises I am going to build an iterable containing a deck of 52 sorted cards
2 Spades … Ace Spades, 2 Hearts … Ace Hearts, 2 Diamonds … Ace Diamonds, 2 Clubs … Ace Clubs
But I don't want to create a list containing all the pre-created cards → Lazy evaluation
So I want my iterator to figure out the suit and card name for a given index in the sorted deck
SUITS = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
RANKS = [2, 3, …, 10, 'J', 'Q', 'K', 'A']
We assume the deck is sorted as follows:
iterate over SUITS
for each suit iterate over RANKS
card = combination of suit and rank
Card Deck Example
2S … AS 2H … AH 2D … AD 2C … AC
There are len(SUITS) suits → 4
There are len(RANKS) ranks → 13
The deck has a length of len(SUITS) * len(RANKS) → 52
Each card in this deck has a positional index: a number from 0 to len(deck) - 1 → 0 to 51
To find the suit index of a card at index i: i // len(RANKS)
To find the rank index of a card at index i: i % len(RANKS)
Examples:
5th card (6S) → index 4 → suit index 4 // 13 → 0, rank index 4 % 13 → 4
16th card (4H) → index 15 → suit index 15 // 13 → 1, rank index 15 % 13 → 2
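A minimal sketch of the lazy card deck, assuming ranks run 2 through 10 followed by J, Q, K, A, and using a namedtuple Card (a hypothetical representation) for each card. No list of 52 cards is ever built: each card is derived from its index with the // and % arithmetic above, and __reversed__ simply counts indices from the end.

```python
from collections import namedtuple

Card = namedtuple('Card', 'rank suit')

SUITS = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
RANKS = list(range(2, 11)) + ['J', 'Q', 'K', 'A']   # assumption: 2..10, then face cards


class CardDeck:
    def __init__(self):
        self.length = len(SUITS) * len(RANKS)    # 52

    def __len__(self):
        return self.length

    def __iter__(self):
        return CardDeckIterator(self.length)

    def __reversed__(self):
        return CardDeckIterator(self.length, reverse=True)


class CardDeckIterator:
    def __init__(self, length, reverse=False):
        self.length = length
        self.reverse = reverse
        self.i = 0                                # how many cards handed out so far

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        # position in the sorted deck; count from the end when reversed
        index = self.length - 1 - self.i if self.reverse else self.i
        self.i += 1
        suit = SUITS[index // len(RANKS)]         # which block of 13 cards
        rank = RANKS[index % len(RANKS)]          # position within that block
        return Card(rank, suit)


deck = CardDeck()
cards = list(deck)
```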
Code
Exercises