Iterators

A Python iterator Python’s provides many handy utilities for working with iterators; the next lessons on itertools are dedicated to those functions and their Haskell equivalents. is a sort of sequence that, unlike a list, computes each subsequent value only as it is needed rather than producing the entire list all at once. For example, from a range, we can produce an iterator of numbers. Here we construct an iterator xs that will produce three values: We are using Python 3, in which the range function produces an iterator. (In Python 2, range produced a list, and xrange produced an iterator.)

>>> xs = iter(range(1, 4))

We can use the next function to demonstrate pulling values from the iterator one at a time. The first three times we apply the next function, xs will produce a value. The fourth time, it raises StopIteration to indicate that the iterator is exhausted.

>>> next(xs)
1

>>> next(xs)
2

>>> next(xs)
3

>>> next(xs)
StopIteration

We don’t often use the next function directly, though – Most often, we use an iterator via a for loop (and Python uses the iter and next functions under the hood).

>>> for x in range(1, 4):
...     print(x)
1
2
3

For these examples, the closest practical Haskell analogue is lists. Unlike Python lists, Haskell lists compute each subsequent value only as it is needed rather than producing the entire list all at once – much like Python iterators. enumFromTo We could also write this example using Haskell’s range notation. A list from 1 to 3 can be expressed as either enumFromTo 1 3 or [1..3].

λ> import Data.Foldable (for_)

λ> for_ (enumFromTo 1 3) print
1
2
3

When parts of a value aren’t computed until they’re needed, we call that laziness – not as an insult, but to say that we’re trying to avoid putting the machine through more work than necessary.

Infinity

The easiest sure way to tell whether a data structure is lazy is to construct something infinite. For Python iterators and Haskell lists, we can interact with some portion of the beginning of the sequence, as long as we don’t try to use the entire thing. itertools.count

>>> import itertools

>>> xs = itertools.count(1)

Here xs is an iterator that counts upward from 1, on to infinity. If we were to try converting this to a list, the REPL would crash from attempting to contemplate infinity.

>>> list(xs)    # Don't do this!

But we can take, itertools.islice for example, just the first ten elements from that iterator.

>>> list(itertools.islice(xs, 10))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Here’s a Haskell list that counts upward from 1: enumFrom Again we could have written this with range notation. A list from 1 to infinity can be expressed as either enumFrom 1 or [1..].

λ> xs = enumFrom 1

Again, if we try to evaluate xs, we have a problem.

λ> xs

If you try this, you can see GHCi make an admirable attempt to print the entire list. But it will never finish. (Use ctrl+C to stop it.)

But again, we can take take some portion of the beginning of the list.

λ> take 10 xs
[1,2,3,4,5,6,7,8,9,10]

Generators

Python provides a convienient way to define an iterator as a function written using the yield keyword. This feature is called generator syntax.

When a Python function returns, it can only return once and then it’s over. But when a function yields, it can keep going and continue to yield more things until it returns.

def f():
    yield 1
    yield 2
    yield 3

When we apply the function f, it returns an iterator:More specifically, it returns a generator; a generator is a kind of iterator.

>>> f()
<generator object f at 0x7f695c6342b0>

If we convert that to a list, we can see that the list contains the three values that were yielded by f.

>>> list(f())
[1, 2, 3]

Generator syntax does not have any direct analogue in the Haskell language, but it is strikingly similar to the pipes library which we will discuss further below.

Effects

The fun thing about generators is that they don’t only yield values; they can also have side effects as they run, which makes generator syntax much more than a way to express sequences. It’s a way to write programs that emit values; The iterator represents the stream of values that are emitted as the program runs.

As a simple example, we’ll write a generator that reads three lines from the terminal and yields each one in all caps.

import sys

def f():
    for _ in range(3):
        x = sys.stdin.readline()
        yield x.upper()

When we apply f and convert the result to a list to collect all the results, it looks like this:

>>> list(f())
one
two
three
['ONE\n', 'TWO\n', 'THREE\n']

This is what the for function Yes, for is a function, and not a language keyword. This is a recurring theme you’ll notice as you learn Haskell; a lot of things that are built-in language features or macros in other languages are ordinary functions in Haskell. does in Haskell. The underscore signifies the same thing in Haskell as it does in Python: that the argument is not being used.

import Data.Char (toUpper)
import Data.Traversable (for)

threeUppercase =
    for [1..3] (\_ -> do
        x <- getLine
        return (map toUpper x))

The type of for, The for function comes from the Data.Traversable module. It has many uses, and its fully-polymorphic type is much more general than the specialized type we show here. in this context, is:

for :: [Integer]                -- 1
    -> (Integer -> IO String)   -- 2
    -> IO [String]              -- 3
  1. The first argument is the list of integers we’re iterating over: [1..3].
  2. The second argument is an action, which returns a string, that will be performed for each integer in the list: read a line and capitalize each of the characters.
  3. The return value is an action which loops over the input and returns all of the results aggregated in a list.

When we run main in GHCi, we get a behavior quite similar to that of the Python generator.

λ> threeUppercase
one
two
three
["ONE","TWO","THREE"]

Pipes

Now we’re going to venture outside of the Haskell standard library, because when you use the pipes package, The pipes library Web servers, lesson 13 is about the pipes package and goes into more depth with a realistic example. writing Producers is strikingly similar to writing Python generators. There is even a function with the same name as the Python keyword yield, and it serves a similar role.

import Pipes
import qualified Pipes.Prelude

A pipe can await to receive a value, and it can yield to emit a value. We give a stage of a pipeline a different name depending on whether it awaits, receives, or both: A Producer only yields, a Consumer only awaits, and a Pipe does both. For our purpose here, to compare pipes to simple Python generators, we will only consider Producers.

Here is a simple Producer of Integers:

numbers =
  do
    yield 1
    yield 2
    yield 3

Among the many useful functions in the Pipes.Prelude module is toList, Pipes.Prelude.toList which runs a producer and returns a list of its results.

λ> Pipes.Prelude.toList numbers
[1,2,3]

Applying the toList function is a bit like when we had a Python generator function f and we invoked it as f() to run it and produce an iterator of its results.

Now let’s look at a producer that also does some I/O. We’ll rewrite threeUppercase using pipes:

threeUppercase =
    for_ [1..3] (\_ -> do
        x <- lift getLine
        yield (map toUpper x))
λ> Pipes.toListM threeUppercase
one
two
three
["ONE","TWO","THREE"]

It does the same thing as before. There are some new things in this producer that we haven’t seen before:

  • We have to lift lift getLine into the world of pipes. It may not be clear why we need lift, because we haven’t shown any of the types in this very brief pipes overview. Informally, think of lifting here as converting the type from “I/O” to “a pipe that involves I/O”.
  • This time we’re using toListM toListM instead of toList because our producer involves I/O. You’ll see this a lot in Haskell: something with side effects has a different type than something that doesn’t. The ‘M’ in ‘toListM’ stands for ‘monad’. Whereas before toList returned [String], here toListM returns IO [String]; the resulting list is within some monad (IO). It isn’t a list of strings; it’s an I/O action that produces a list of strings. This is an important distinction!

In the next lessons, we give a lot more examples on how Python iterator code translates into Haskell.

Join Type Classes for courses and projects to get you started and make you an expert in FP with Haskell.