Iterators and Generators


Are you satisfied with your code? If yes, don't be, because nothing is perfect. Take another look and see whether you are using extra memory, doing extra processing, or writing more lines of code than necessary. Do your code that favour, because "Iterators and Generators" are going to help you.

This article is about two of Python's excellent features, iterators and generators, which make our code easier to understand, cleaner, and less memory hungry. We will go through examples of iterators and generators, explain how they work, and show how to build our own custom iterators and generators. This article is aimed at beginner-level Python programmers. If something is not clear, be patient and hopefully you will find your answer.

An iterator is an object which follows the iteration protocol. The iteration protocol consists of two methods: `__iter__()` and next().

The `__iter__()` method returns an iterator object, and the next() method returns the next element in the sequence; when there are no more elements to return, it should raise a StopIteration exception.

Built-in Iterators:

In Python, most data structures implement the iteration protocol. For example, lists, strings, tuples, dictionaries, sockets and files are iterable objects, and calling iter() on an iterable returns an iterator for it.

iterator = iter(iterable)

A simple example is much more understandable than words.

```
>>> names = ['John', 'Adam', 'Mark', 'Eve']
>>> iterator = iter(names)
>>> iterator.next()
'John'
>>> iterator.next()
'Adam'
>>> iterator.next()
'Mark'
>>> iterator.next()
'Eve'
```

But when we reach the end of the sequence, any further call to next() will raise a StopIteration exception.

```
>>> iterator.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```

StopIteration tells us that it is time to stop. Once we have iterated over the whole iterable, the iterator is exhausted and of no further use; it will keep raising StopIteration on every subsequent call.
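For intuition, this is exactly what a for loop does for us behind the scenes: it asks the iterable for an iterator and keeps calling next() until StopIteration is raised. A rough sketch (my own illustration, not from the listing above) of what `for name in names: print(name)` boils down to:

```
names = ['John', 'Adam', 'Mark', 'Eve']

# Roughly what "for name in names: print(name)" does under the hood.
iterator = iter(names)            # ask the iterable for an iterator
while True:
    try:
        name = next(iterator)     # fetch the next element
    except StopIteration:         # the iterator is exhausted, stop looping
        break
    print(name)
```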

Self-defined Iterators:

Now let's create our own class which implements the iteration protocol.

```
class Integrator(object):
    """Lazy evaluation for an integrator."""

    def __init__(self, limit):
        self.limit = limit
        self.sum = 0
        self.count = 0

    def __iter__(self):
        return self

    def next(self):
        if self.count < self.limit:
            self.count += 1
            self.sum += self.count
            return self.sum
        else:
            raise StopIteration
```

In the class above we have implemented an integrator, so let's use it.

```
>>> integrator = Integrator(5)
>>> for sum in integrator:
...     print(sum)
...
1
3
6
10
15
```

So now you may wonder what is special about that. The magic appears when we use these iterators with list(), tuple(), set(), sum(), min() and max().

```
>>> i = Integrator(10)
>>> list(i)
[1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
>>> i = Integrator(5)
>>> tuple(i)
(1, 3, 6, 10, 15)
>>> i = Integrator(10)
>>> max(i)
55
```

We can say that iterators make our code elegant, clean and more beautiful. Iterators also use less memory because of lazy evaluation, which is a great virtue: if we had 100,000 numbers or more, memory usage would be much higher with a list, but with an iterator we hold only one number at any time.

Let us see an example to prove the claim.

Make a file named ‘performance.py’ and write the lines below into it.

```
import time
import sys

start1 = time.time()
lst = list(range(10000000))
for item in lst:
    sqr = item ** 2
end1 = time.time()

start2 = time.time()
for item in xrange(10000000):
    sqr = item ** 2
end2 = time.time()

print('Memory Usage for Iterator is {} MB'.format(sys.getsizeof(xrange(10000000)) / 1000000))
print('Memory Usage for above list is {} MB'.format(sys.getsizeof(lst) / 1000000))
print('list version took {} seconds'.format(end1 - start1))
print('Iterator version took {} seconds'.format(end2 - start2))
```

In the code above we are just calculating squares of numbers, nothing special, purely for comparison purposes.

After running this code in a terminal we get the following output.

```
Memory Usage for Iterator is 0 MB
Memory Usage for above list is 90 MB
list version took 1.65862607956 seconds
Iterator version took 1.46054387093 seconds
```

So for 10 million numbers, the list version consumes 90 MB while the iterator version's memory use is negligible. The iterator version is also faster than the list version. Proved :)

Iterators also work well with infinite sequences like the Fibonacci series. We can produce the first 100,000 Fibonacci numbers without holding all of them in memory, keeping only two or three numbers at a time.
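As a rough sketch of that idea (my own example, written in the same style as the Integrator class above), a lazy Fibonacci iterator only ever keeps two numbers of the series in memory:

```
class Fibonacci(object):
    """Lazily produce the first `limit` Fibonacci numbers, two at a time in memory."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.a, self.b = 0, 1

    def __iter__(self):
        return self

    def next(self):               # would be __next__() in Python 3
        if self.count >= self.limit:
            raise StopIteration
        self.count += 1
        value = self.a
        self.a, self.b = self.b, self.a + self.b
        return value
```

For example, list(Fibonacci(10)) gives [0, 1, 1, 2, 3, 5, 8, 13, 21, 34] while never holding more than two numbers of the series at once.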

Similarly, when reading a large file, an iterator-based version has no need to load the whole file into memory.
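A minimal sketch of that idea (the file name and the process() helper are placeholders of mine, not from the article):

```
def process(line):
    # placeholder for whatever per-line work you need
    print(len(line))

# Lazy version: the file object is itself an iterator over lines,
# so only the current line is held in memory at any time.
with open('huge_log.txt') as f:   # 'huge_log.txt' is a made-up file name
    for line in f:
        process(line)

# An eager version such as f.readlines() would load every line into a list first.
```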

Let's take an example of an infinite sequence: simply all the natural numbers. We can use Python's built-in itertools.count(), which takes a start value (0 by default) and yields the next number on each next() call.

```
>>> import itertools
>>> counter = itertools.count()
>>> counter.next()
0
>>> counter.next()
1
>>> counter.next()
2
>>> counter.next()
3
```

It is a never-ending (infinite) series, but we can take as many values as we want without ever holding all of them.
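itertools also offers islice(), which is handy for taking just a slice of such an infinite iterator without ever materialising the whole thing. A quick sketch:

```
>>> import itertools
>>> counter = itertools.count(10)           # start counting from 10
>>> list(itertools.islice(counter, 5))      # take only the first five values
[10, 11, 12, 13, 14]
```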

Generators:

A generator is a factory which performs controlled iteration over a sequence in a very clean and elegant way, with fewer lines of code.

Generator Types:

There are two kinds of generators.

  1. Generator Functions

  2. Generator Expressions

1. Generator Functions, Virtue of yield:

Generator functions are simply functions which contain at least one yield statement. A generator function returns a generator object (without executing a single line of the function body). The returned generator object can then be used, in a controlled fashion, to iterate over the sequence the generator function produces.

Let’s take a simple example.

```
>>> def abc():
...     print('a is returned')
...     yield 'a'
...     print('b is returned')
...     yield 'b'
...     print('c is returned')
...     yield 'c'
...
>>> generator = abc()
>>> type(generator)
<type 'generator'>
>>> generator.next()
a is returned
'a'
>>> generator.next()
b is returned
'b'
>>> generator.next()
c is returned
'c'
>>> generator.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```

We can see that the first print() statement is executed only on the first next() call on the generator object.

In the example above, yield returns a value and pauses the function right there; the next call to next() resumes from the following statement, and so on.

Notice that although we did not implement the iteration protocol ourselves, we still got back a generator/iterator object.
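That is because the generator object already implements the iteration protocol for us. We can check this ourselves (a quick sketch, continuing the abc() example above):

```
>>> generator = abc()
>>> iter(generator) is generator     # __iter__() just returns the generator itself
True
>>> hasattr(generator, 'next')       # and it has a next() method (next()/__next__())
True
```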

We can rewrite the earlier Integrator example using a generator in a much simpler and easier way.

```
>>> def Integrator(limit):
...     sum = 0
...     count = 0
...     while count < limit:
...         count += 1
...         sum += count
...         yield sum
...
>>> i = Integrator(5)
>>> i.next()
1
>>> i.next()
3
>>> i.next()
6
>>> i.next()
10
>>> i.next()
15
>>> for sum in Integrator(5):
...     print(sum)
...
1
3
6
10
15
```

Please remember that once a generator raises a StopIteration exception it will keep doing so, so to iterate again we have to call the generator function again.
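For instance (a quick sketch using the generator version of Integrator above):

```
>>> i = Integrator(3)
>>> list(i)
[1, 3, 6]
>>> list(i)                 # the generator is exhausted, nothing is left
[]
>>> list(Integrator(3))     # calling the generator function again gives a fresh generator
[1, 3, 6]
```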

2. Generator Expressions:

Generator expressions look like comprehensions, but they create a generator object which produces the specified sequence lazily. A generator expression has a syntax similar to a list comprehension, but uses parentheses instead of square brackets.

generator = (expression(item) for item in iterable)

Using generator expressions we can do a lot of useful things in an elegant way, and in particular lazy evaluation means almost no memory usage. For example, we can produce the squares of the first one million numbers using lazy evaluation (generated only when needed).

```
>>> sqr_gen = (x*x for x in xrange(1000000))
>>> sqr_gen
<generator object <genexpr> at 0x7fabb14b8640>
>>>
```

The generator expression above describes a sequence of one million square numbers, but at this point none of the squares has been created. We have only captured the specification of the sequence in a generator object, which will evaluate it lazily as needed.

```
>>> list(sqr_gen)
[0, 1, 4, 9, 16, ..., 999988000036, 999990000025, 999992000016, 999994000009, 999996000004, 999998000001]
```

This list consumes megabytes of memory, and calling list() on the same generator object again will give us nothing, because, like an iterator, once it is exhausted it is no longer useful. Let's see.

```
>>> list(sqr_gen)
[]
>>>
```

Executing the same generator object produced an empty list. So if we want a fresh generator we have to evaluate the generator expression again; each evaluation of a generator expression creates a new, independent generator object. Let's see.

```
>>> gen1 = (x*x for x in xrange(100))
>>> gen2 = (x*x for x in xrange(100))
>>> gen1.next()
0
>>> gen1.next()
1
>>> gen2.next()
0
>>> gen2.next()
1
>>> gen2.next()
4
```

Each generator object has its own local variables and its own execution flow.

Now, if we want the sum of the squares of the first 10 million numbers, a list comprehension would use close to a hundred megabytes of memory, while the generator approach uses an insignificant amount (only some bytes).

```
>>> sum(x*x for x in xrange(10000000))
333333283333335000000L
>>>
```

Here the function's parentheses also serve as the generator expression's parentheses, so there is no need for a separate pair.
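One small caveat worth noting (my own example): this only works when the generator expression is the sole argument; if the call takes more arguments, the generator expression needs its own parentheses.

```
>>> sum(x*x for x in xrange(10))                          # sole argument: no extra parentheses
285
>>> sorted((x % 3 for x in xrange(10)), reverse=True)     # extra argument: wrap the genexp
[2, 2, 2, 1, 1, 1, 0, 0, 0, 0]
```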

Finally, we can say that:

  1. Every generator is an iterator, but the reverse is not always true.

  2. A generator is a lazy sequence factory.

Generators have the following benefits:

  1. Generators resume execution

  2. Can maintain local variables

  3. Can control complex flow

  4. Lazy evaluation

  5. Can reduce the memory burden
