# Folds

- 34 minutes

We made a lot of progress on our palindrome program, and it’s probably working more or less as you’d want a palindrome program to work, if you wanted a palindrome program for some reason. The core logic of determining whether a string is a palindrome wasn’t too difficult, but working through the rules for what we should properly consider a palindrome took more thinking and effort.

Now we’re moving on from the palindrome program; we’ll be doing similar, but different, things and working on some new concepts now.

So, let’s consider a case where we might need to *reject* string inputs that have spaces in them, rather than simply removing the spaces. For example, many web sites do not let you put spaces in the middle of, say, your first name; certainly you cannot have spaces in the field where you entered your email address. They probably don’t allow anything except alphabetic characters in your name inputs either, although some do. And they will certainly reject your input, as we did in our palindrome program, if you input an empty string for your name, email, password – any of the information that they require from you for registration. So, let’s imagine a case where our validation rules could possibly result in `Nothing`

– invalid input – in multiple ways, for example, if it’s an empty string or if it has spaces or weird characters in it, depending on the sort of input it is and our rules for that type of input.

We’re not going to worry about the I/O part of what this program would look like because we’re going to explore some different concepts in these upcoming sessions. So we don’t need more than one module right now. In the future, if you make a web app that has multiple modules handling different bits of the logic and I/O, then you might consider adding a password- or email-validation module to it, and it might have some rules in it that are like the ones we’re going to work on now.

## Filters versus folds

Let’s first focus on getting a function that will not just filter out punctuation and numbers, but reject strings that have anything other alphabetic characters.

So the check that we want to do returns `Nothing`

if *any* nonalphabetic characters are in the string; or, another way of saying that is it returns the string when *all* characters are alphabetic. We’re emphasizing those words because `any`

and `all`

are the names of two functions that we’re going to write in this lesson. They do exist in the `Prelude`

so we will, again, call our versions `myAny`

and `myAll`

. We don’t want to show you the type signature of the `Prelude`

versions of those functions just yet, although of course you know how to query them and may do so if you wish. We’ll work with a type signature for ours that is slightly different from the `Prelude`

one.

Let’s quickly summarize the two major recursive functions we’ve seen so far, `map`

and `filter`

.

`map`

can take a unary (one-argument) function, apply it to each element of a list, and return a new list of transformed elements.`λ> map (+1) [1, 2, 3] [2,3,4]`

`filter`

on the other hand, takes only certain kinds of input functions – those that return a`Bool`

value, which we often call*predicates*– and it returns*only*the elements in the input list that meet the condition, that the*predicate*is true of.`λ> filter (<3) [1, 2, 3] [1,2]`

That could end up being anything from an empty list to the entire original input list, depending on the predicate and the values. For example, with the same predicate but a different list, we can see how returning only the values that the predicate is true of means we get an empty list.

`λ> filter (<3) [3, 4, 5] []`

But there’s another important set of functions that are similar to both `map`

and `filter`

in some ways but also different. Sometimes you want to reduce the list to one summary result. If you were using a predicate to test each character, for example, sometimes you want to get an answer to your question: does *every* element meet this predicate? Or, perhaps, does *at least one* element in this least meet this predicate. For us, the question is, is *every* character in this string an alphabetic character?

These functions that reduce a list to a summary value are called *folds* in Haskell, although they are called `reduce`

or the like in other languages. There are several different fold functions built into the standard library of Haskell, and you’ll find even more of them in various libraries because they’re such a useful tool. Because of the task we have set ourselves, we want to reduce a list to a summary Boolean judgment – ultimately a summary `Maybe`

judgment, representing whether it’s a valid string or not, but we already know how to map a Boolean judgment to a `Maybe`

judgment.

This is the function we want to write.

This type tells us that we’ll take a string input and return either a valid string or an invalid string, something or `Nothing`

.

We’ll go back to our trusty `case`

statements. Our `myAll`

function will take as its first argument a predicate; since we want to test whether all characters are alphabetic, we’ll use the `isAlpha`

function from the `Data.Char`

module, so we’ll need to add that import to the top of our file if it isn’t there already. The second argument to `myAll`

will be the input string.

When the application of `myAll isAlpha`

to our input string returns `False`

(meaning at least one character in the string is not alphabetic), we’ll return `Nothing`

to mark an invalid input, and, for now, our `True`

case will return the string unchanged. And `myAll`

is what we’re going to write in this lesson, using folds.

We’ve used `isAlpha`

as the predicate here, but you could swap out any `Char -> Bool`

predicate without changing the rest of this function. We would like to note, as is important for programmers to think about, that the set of characters that this predicate returns `True`

for is quite a bit larger than the standard English alphabet.

```
λ> isAlpha 'è'
True
λ> isAlpha 'ж'
True
λ> isAlpha 'タ'
True
```

So, if you allow katakana or other non-English alphabets to be used in your first name field, then `isAlpha`

will still cover that case.

Before we get into folds, for the sake of completeness and comparison, we want to talk about how you would write these functions without folds, and to do that, we want to spend a few moments considering some facts about Boolean logic.

## A little Boolean logic

At their hearts, `any`

and `all`

, the functions under current consideration, are Boolean operations that happen to have some recursion built into them in order to work over lists; the `any`

function is related to Boolean OR: it’s True if *any* one element is `True`

. The `all`

function is related to Boolean AND and is `True`

only if *all* elements are `True`

and there are no `False`

values.

You might already be familiar with Boolean truth tables.

False | True | |
---|---|---|

False | False | False |

True | False | True |

False | True | |
---|---|---|

False | False | True |

True | True | True |

AND and OR are in some senses complementary functions. The AND function has an *identity* value of `True`

. Let’s talk about what an *identity value* is for a moment. An identity value is a value that, with respect to some operation, doesn’t change the value of the other input(s). AND is a binary function, meaning it takes two arguments. Saying `True`

is the identity value of that operation means that if `True`

is one of the two values, then the result value is always the same as the *other* input value. The only time the result is `True`

is when *both* arguments are `True`

. So we consider `True`

the identity value for this operation because whenever one input is `True`

, the result of the AND operation is whatever the other argument is.

Furthermore, with the AND operation, what we might call the “dominant” or “determining” value is `False`

. That means whenever we see one `False`

value, we don’t really need to know what the other value is, we know the result will be `False`

. So we can think of `False`

as a kind of “dominant” value for the AND operation, and `True`

is the identity value.

OR is sort of the opposite of that: `False`

is the identity, where the result is whatever the *other* value is, and `True`

is the *dominant* value, where the presence of a single `True`

means however many things you are ORing together, the result will be `True`

.

We’re going through all this because we want to think about which values are our base cases and which case is a recursion case and which one might be a branch that just terminates in a result. As we said, we’ll begin by writing these functions without folds, because it’s a useful comparison with the versions that use folds.

The first arguments given in the type signatures are the `(a -> Bool)`

functions that we call predicates. The predicates are applied to each element of a list, so the list is the second argument here. And these functions return a single `Bool`

value, the summary judgment of the list.

Whenever the list is empty, we have to return something. The empty list is our way of stopping this recursion, but we can’t return the empty list because it’s not a value of type `Bool`

, which is our return type. So, we have to return either `True`

or `False`

when we hit the empty list.

So, let’s think about what these cases should return. They are there to stop the recursion, but we don’t really want them to affect the final result. We’ll be doing a bunch of function applications, recursing over the list, and we don’t want the base case to change the final result of those applications. That gives us a hint that we probably want to use one of those identity values. So, what was the identity value for `all`

? The `all`

function is a recursive case of AND, so we want the identity value for AND. The identity value for AND is `True`

.

The `any`

function, on the other hand, is a recursive version of OR, and the identity value for OR is `False`

.

```
myAll :: (a -> Bool) -> [a] -> Bool
myAll _ [] = True
myAny :: (a -> Bool) -> [a] -> Bool
myAny _ [] = False
```

So, we’ve dealt with our base cases, and now we’ll write the second portion of the function, using a `case`

statement. We’ll have a predicate for our first argument, so we’ll call that parameter `pred`

, and then we’ll pattern match on the list to decompose it into its head and tail.

```
myAll :: (a -> Bool) -> [a] -> Bool
myAll _ [] = True
myAll pred (x:xs) =
case (pred x) of
myAny :: (a -> Bool) -> [a] -> Bool
myAny _ [] = False
myAny pred (x:xs) =
case (pred x) of
```

Then when we enter the `case`

statement, we can apply the `pred`

function to the first value (called `x`

) of the list.

Now we need to think through our two cases of applying `pred`

to `x`

. Predicates return `Bool`

values so there are only two. We’ll handle the “dominant” value for that operation first; so, for example, with the `all`

function, we can return a final judgment of `False`

if we ever encounter a single `False`

value. So let’s handle the “dominant” cases first.

```
myAll :: (a -> Bool) -> [a] -> Bool
myAll _ [] = True
myAll pred (x:xs) =
case (pred x) of
False -> False
myAny :: (a -> Bool) -> [a] -> Bool
myAny _ [] = False
myAny pred (x:xs) =
case (pred x) of
True -> True
```

Now each of them just needs the recursive case. Since we don’t need to return any of the intermediate values or reconstruct a list of `Bool`

values, all we’re doing is recursing until we hit one of the two stopping conditions. For `all`

, our recursion will stop if we hit a `False`

or if we keep hitting `True`

values until we get to the empty list constructor that is the tail of every list.

```
myAll :: (a -> Bool) -> [a] -> Bool
myAll _ [] = True
myAll pred (x:xs) =
case (pred x) of
False -> False
True -> myAll pred xs
myAny :: (a -> Bool) -> [a] -> Bool
myAny _ [] = False
myAny pred (x:xs) =
case (pred x) of
True -> True
False -> myAny pred xs
```

We wanted to walk through that to get us thinking about Boolean logic in terms of its stopping conditions and identity values because it will be relevant to our folding functions. Next, let us quickly show you the Haskell operators for Boolean AND and OR. These are both infix functions in Haskell.

```
λ> :type (&&)
(&&) :: Bool -> Bool -> Bool
λ> :type (||)
(||) :: Bool -> Bool -> Bool
λ> True && False
False
it :: Bool
λ> True && True
True it :: Bool
```

Try the `(||)`

operator out in your REPL.

These are the functions that, when folded over the list, will give us our `any`

and `all`

functions.

## Explaining folds

OK, so now, let’s go on to talk about folds themselves.

We’ve seen that we can write any function without them – we just wrote the function we said we would write with a fold without one, so what are we even talking about here?

We like to use folds in Haskell so that we don’t have to think about the recursive process and write the recursion out explicitly. Folds have that idea built into them already, allowing us to concentrate our brain power on other things than making sure we got all of our recursion correct.

All right, so let’s take a look at `foldr`

, the function that will do the recursion for us in our `all`

implementation. The `r`

in `foldr`

stands for “right” so this is a “right fold”. The rightness and leftness in question refer to the *associativity* of the two functions, and we will look a little at left folds but we aren’t going to dwell on them too much.

Let’s take a close-up look at the `foldr`

function itself in order to understand it better.

```
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f acc [] = acc
foldr f acc (x:xs) = f x (foldr f acc xs)
```

There are three differences from our `map`

implementation that should immediately stand out:

This takes a binary function – a function of two arguments – instead of the unary function

`map`

requires.The next thing that should stand out is the

`acc`

value.`acc`

is short for*accumulator*. It provides a start value for the function applications; as you recurse over a list, you’re applying the folding function to one element at a time, but the function is*binary*so it needs two arguments to evaluate. So the`acc`

value serves as one of the arguments to the very first application of the binary function – the other argument is the last value in the list. Furthermore, if the list that gets passed to`foldr`

is an empty list, the`acc`

value also serves as a default return value. It is often (not always) the identity value for the binary function we’re folding over the list.Finally, there is no consing the list back together after the function application. The function here is a binary function, so in the first step we apply it to the head of the list

*and also*to the result of doing more folding. One of its two arguments, therefore, involves recursive calls to`foldr`

.

Let’s compare what `map`

does to what `foldr`

does. With `map`

, we have a series of function applications consed back into a new list of results. With `foldr`

, on the other hand, the values of the list become the arguments to a series of binary function applications, resulting in one value.

`map :: (a -> b) -> [a] -> [b]`

`foldr :: (a -> b -> b) -> b -> [a] -> b`

The breaking up of the list into its elements makes them available in both cases to be arguments to a function, but in the case of `map`

, the function only takes one argument. In `foldr`

it needs two – so, because `foldr`

is right associative, the `acc`

or start value appears at the far right of the function applications. So, first we will add 3 and 0, our start value, and then work our way leftward – not rebuilding a list but just adding the elements together. In fact, the fold we just did there is one implementation of the `Prelude`

function `sum`

.

```
λ> sum [1, 2, 3]
6
```

`sum`

really is implemented with a fold in the `base`

library, although not with `foldr`

.

A left fold is a little different. For one thing, it does associate to the left, which means the start value will be at the far left of the associations. So, for example, if we had written `sum`

with `foldl`

, the evaluation would look like this, with the start value at the leftmost spot in the series of function applications.

`foldl :: (b -> a -> b) -> b -> [a] -> b`

Now, as you can see, that wouldn’t change the sum of these numbers, because addition is … well, it’s *associative*. But certainly there are operations where the difference between the left and right associativity of these folds could change the results.

We will take just a moment to look at the implementation of `foldl`

as well, because the difference with `foldr`

is quite interesting, although we won’t have time to say as much as we’d like to about it. After all, the boss wants this validation code finished by tomorrow.

```
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f acc [] = acc
foldl f acc (x:xs) = foldl f (f acc x) xs
```

The empty list case is quite the same as it was with `foldr`

, and, once again, the `acc`

value serves the same purpose as it did in `foldr`

and is once again, quite often the identity value of the operation.

The nonempty list case, though, is quite different! Here the recursive call is the *first* thing. `foldl`

calls itself immediately, before doing any function applications. That means that if you were using it with something like a Boolean operator that has a “dominant” value such that if you knew you had *one* of those values then you wouldn’t need to examine the whole list, you can’t stop – you have to go on, calling yourself, putting into place a function application for every element in the list. `foldr`

doesn’t have to do that; since the first thing `foldr`

does is arrange a function application, it could stop itself entirely if the first value it encountered gave away what the entire result would be. We haven’t talked much about the possibility of infinite lists in Haskell, but they are a possibility, and while it is not in general a good idea to try to fold up infinity, with `foldr`

and certain types of operations it is at least *possible* to obtain a result from folding an infinite list. Boolean operations with what we’ve been calling a “dominant” value are one of those types of operations. Watch!

A `Prelude`

function called `cycle`

can make an infinite list.

```
λ> cycle [1, 2, 3]
λ> all (<3) (cycle [1, 2, 3])
False
```

It can stop itself the first time it hits a value for which the predicate is `False`

– in this case, 3. But an `all`

written with `foldl`

would not be able to do that because it is forced to recurse over the entire list.

So, `foldr`

and its ability to stop its recursion as long as one particular case is met is very nice! Although it is VERY IMPORTANT to note that it will not stop if it never finds a “dominant” value (in this case, if it never finds a `False`

value) that it can return.

## How to use a right fold

OK, so let’s implement this thing shall we? We’ll start by writing down what we know, although this doesn’t yet work. We’ll work only on the `all`

function now; you should use this model to implement `any`

as an exercise.

The type signature for `myAll`

doesn’t need to change, and we know that we’ll still need a parameter representing the predicate. We also know that we’re going to use `foldr`

. The function that we’re folding over the list is Boolean AND, which we write in Haskell as `(&&)`

, and its identity value is `True`

so we’ll use that for our `acc`

value.

The next things that we know are:

We need to pass individual values from the list to the

`(&&)`

function.We need to apply the predicate function to values from the list to get the Boolean values that

`(&&)`

needs. The list is a list of`a`

– any type! – and it’s the predicate function,`a -> Bool`

, that transforms them into`Bool`

values. We recall that`(&&)`

takes two`Bool`

values as inputs.

The way we’re going to pass individual values to that function is with *lambdas*, which we write with a backslash (`\`

). We don’t need to pattern match them out of the list, because `foldr`

already has that logic built into it. If all we wanted to do was fold the `(&&)`

function over a list of Booleans, we could do that.

```
λ> foldr (&&) True [True, False, True]
False
```

But the need to apply the predicate function to each value to transform it from a `Char`

to a `Bool`

means that we have to explicitly pass them in. We can use one lambda to bind both variables.

Almost finished, but not quite! Now we need to figure out where we apply the predicate function. Since `foldr`

puts its `acc`

value to the far right, the right-side value is already a Boolean; the first one will be `True`

because that’s the value we gave it. So it’s the left one, `x`

, that needs the predicate function. So we apply `pred`

to `x`

.

This was an incredibly long-winded way to get our validation function, which we had called `rejectNonalphabetic`

, written, but now we have the pride in knowing we have handcrafted each function with loving care. Except the `isAlpha`

, of course. We’re not going to make you do that.

```
rejectNonalphabetic :: String -> Maybe String
rejectNonalphabetic string =
case (myAll isAlpha string) of
False -> Nothing
True -> Just string
```

Try it out with different inputs and try implementing some other folding functions, such as `product`

(like `sum`

, but multiplication) or `any`

, before you come back for the next video. We will be using `rejectNonalphabetic`

again later, so hold onto it!