- 34 minutes
We made a lot of progress on our palindrome program, and it’s probably working more or less as you’d want a palindrome program to work, if you wanted a palindrome program for some reason. The core logic of determining whether a string is a palindrome wasn’t too difficult, but working through the rules for what we should properly consider a palindrome took more thinking and effort.
Now we’re moving on from the palindrome program; we’ll be doing similar, but different, things and working on some new concepts now.
So, let’s consider a case where we might need to reject string inputs that have spaces in them, rather than simply removing the spaces. For example, many web sites do not let you put spaces in the middle of, say, your first name; certainly you cannot have spaces in the field where you entered your email address. They probably don’t allow anything except alphabetic characters in your name inputs either, although some do. And they will certainly reject your input, as we did in our palindrome program, if you input an empty string for your name, email, password – any of the information that they require from you for registration. So, let’s imagine a case where our validation rules could possibly result in
Nothing – invalid input – in multiple ways, for example, if it’s an empty string or if it has spaces or weird characters in it, depending on the sort of input it is and our rules for that type of input.
We’re not going to worry about the I/O part of what this program would look like because we’re going to explore some different concepts in these upcoming sessions. So we don’t need more than one module right now. In the future, if you make a web app that has multiple modules handling different bits of the logic and I/O, then you might consider adding a password- or email-validation module to it, and it might have some rules in it that are like the ones we’re going to work on now.
Filters versus folds
Let’s first focus on getting a function that will not just filter out punctuation and numbers, but reject strings that have anything other alphabetic characters.
So the check that we want to do returns
Nothing if any nonalphabetic characters are in the string; or, another way of saying that is it returns the string when all characters are alphabetic. We’re emphasizing those words because
all are the names of two functions that we’re going to write in this lesson. They do exist in the
Prelude so we will, again, call our versions
myAll. We don’t want to show you the type signature of the
Prelude versions of those functions just yet, although of course you know how to query them and may do so if you wish. We’ll work with a type signature for ours that is slightly different from the
Let’s quickly summarize the two major recursive functions we’ve seen so far,
mapcan take a unary (one-argument) function, apply it to each element of a list, and return a new list of transformed elements.
λ> map (+1) [1, 2, 3] [2,3,4]
filteron the other hand, takes only certain kinds of input functions – those that return a
Boolvalue, which we often call predicates – and it returns only the elements in the input list that meet the condition, that the predicate is true of.
λ> filter (<3) [1, 2, 3] [1,2]
That could end up being anything from an empty list to the entire original input list, depending on the predicate and the values. For example, with the same predicate but a different list, we can see how returning only the values that the predicate is true of means we get an empty list.
λ> filter (<3) [3, 4, 5] 
But there’s another important set of functions that are similar to both
filter in some ways but also different. Sometimes you want to reduce the list to one summary result. If you were using a predicate to test each character, for example, sometimes you want to get an answer to your question: does every element meet this predicate? Or, perhaps, does at least one element in this least meet this predicate. For us, the question is, is every character in this string an alphabetic character?
These functions that reduce a list to a summary value are called folds in Haskell, although they are called
reduce or the like in other languages. There are several different fold functions built into the standard library of Haskell, and you’ll find even more of them in various libraries because they’re such a useful tool. Because of the task we have set ourselves, we want to reduce a list to a summary Boolean judgment – ultimately a summary
Maybe judgment, representing whether it’s a valid string or not, but we already know how to map a Boolean judgment to a
This is the function we want to write.
This type tells us that we’ll take a string input and return either a valid string or an invalid string, something or
We’ll go back to our trusty
case statements. Our
myAll function will take as its first argument a predicate; since we want to test whether all characters are alphabetic, we’ll use the
isAlpha function from the
Data.Char module, so we’ll need to add that import to the top of our file if it isn’t there already. The second argument to
myAll will be the input string.
When the application of
myAll isAlpha to our input string returns
False (meaning at least one character in the string is not alphabetic), we’ll return
Nothing to mark an invalid input, and, for now, our
True case will return the string unchanged. And
myAll is what we’re going to write in this lesson, using folds.
isAlpha as the predicate here, but you could swap out any
Char -> Bool predicate without changing the rest of this function. We would like to note, as is important for programmers to think about, that the set of characters that this predicate returns
True for is quite a bit larger than the standard English alphabet.
λ> isAlpha 'è' True λ> isAlpha 'ж' True λ> isAlpha 'タ' True
So, if you allow katakana or other non-English alphabets to be used in your first name field, then
isAlpha will still cover that case.
Before we get into folds, for the sake of completeness and comparison, we want to talk about how you would write these functions without folds, and to do that, we want to spend a few moments considering some facts about Boolean logic.
A little Boolean logic
At their hearts,
all, the functions under current consideration, are Boolean operations that happen to have some recursion built into them in order to work over lists; the
any function is related to Boolean OR: it’s True if any one element is
all function is related to Boolean AND and is
True only if all elements are
True and there are no
You might already be familiar with Boolean truth tables.
AND and OR are in some senses complementary functions. The AND function has an identity value of
True. Let’s talk about what an identity value is for a moment. An identity value is a value that, with respect to some operation, doesn’t change the value of the other input(s). AND is a binary function, meaning it takes two arguments. Saying
True is the identity value of that operation means that if
True is one of the two values, then the result value is always the same as the other input value. The only time the result is
True is when both arguments are
True. So we consider
True the identity value for this operation because whenever one input is
True, the result of the AND operation is whatever the other argument is.
Furthermore, with the AND operation, what we might call the “dominant” or “determining” value is
False. That means whenever we see one
False value, we don’t really need to know what the other value is, we know the result will be
False. So we can think of
False as a kind of “dominant” value for the AND operation, and
True is the identity value.
OR is sort of the opposite of that:
False is the identity, where the result is whatever the other value is, and
True is the dominant value, where the presence of a single
True means however many things you are ORing together, the result will be
We’re going through all this because we want to think about which values are our base cases and which case is a recursion case and which one might be a branch that just terminates in a result. As we said, we’ll begin by writing these functions without folds, because it’s a useful comparison with the versions that use folds.
The first arguments given in the type signatures are the
(a -> Bool) functions that we call predicates. The predicates are applied to each element of a list, so the list is the second argument here. And these functions return a single
Bool value, the summary judgment of the list.
Whenever the list is empty, we have to return something. The empty list is our way of stopping this recursion, but we can’t return the empty list because it’s not a value of type
Bool, which is our return type. So, we have to return either
False when we hit the empty list.
So, let’s think about what these cases should return. They are there to stop the recursion, but we don’t really want them to affect the final result. We’ll be doing a bunch of function applications, recursing over the list, and we don’t want the base case to change the final result of those applications. That gives us a hint that we probably want to use one of those identity values. So, what was the identity value for
all function is a recursive case of AND, so we want the identity value for AND. The identity value for AND is
any function, on the other hand, is a recursive version of OR, and the identity value for OR is
So, we’ve dealt with our base cases, and now we’ll write the second portion of the function, using a
case statement. We’ll have a predicate for our first argument, so we’ll call that parameter
pred, and then we’ll pattern match on the list to decompose it into its head and tail.
Then when we enter the
case statement, we can apply the
pred function to the first value (called
x) of the list.
Now we need to think through our two cases of applying
x. Predicates return
Bool values so there are only two. We’ll handle the “dominant” value for that operation first; so, for example, with the
all function, we can return a final judgment of
False if we ever encounter a single
False value. So let’s handle the “dominant” cases first.
Now each of them just needs the recursive case. Since we don’t need to return any of the intermediate values or reconstruct a list of
Bool values, all we’re doing is recursing until we hit one of the two stopping conditions. For
all, our recursion will stop if we hit a
False or if we keep hitting
True values until we get to the empty list constructor that is the tail of every list.
We wanted to walk through that to get us thinking about Boolean logic in terms of its stopping conditions and identity values because it will be relevant to our folding functions. Next, let us quickly show you the Haskell operators for Boolean AND and OR. These are both infix functions in Haskell.
λ> :type (&&) (&&) :: Bool -> Bool -> Bool λ> :type (||) (||) :: Bool -> Bool -> Bool λ> True && False False it :: Bool λ> True && True Trueit :: Bool
(||) operator out in your REPL.
These are the functions that, when folded over the list, will give us our
OK, so now, let’s go on to talk about folds themselves.
We’ve seen that we can write any function without them – we just wrote the function we said we would write with a fold without one, so what are we even talking about here?
We like to use folds in Haskell so that we don’t have to think about the recursive process and write the recursion out explicitly. Folds have that idea built into them already, allowing us to concentrate our brain power on other things than making sure we got all of our recursion correct.
All right, so let’s take a look at
foldr, the function that will do the recursion for us in our
all implementation. The
foldr stands for “right” so this is a “right fold”. The rightness and leftness in question refer to the associativity of the two functions, and we will look a little at left folds but we aren’t going to dwell on them too much.
Let’s take a close-up look at the
foldr function itself in order to understand it better.
foldr :: (a -> b -> b) -> b -> [a] -> b foldr f acc  = acc foldr f acc (x:xs) = f x (foldr f acc xs)
There are three differences from our
map implementation that should immediately stand out:
This takes a binary function – a function of two arguments – instead of the unary function
The next thing that should stand out is the
accis short for accumulator. It provides a start value for the function applications; as you recurse over a list, you’re applying the folding function to one element at a time, but the function is binary so it needs two arguments to evaluate. So the
accvalue serves as one of the arguments to the very first application of the binary function – the other argument is the last value in the list. Furthermore, if the list that gets passed to
foldris an empty list, the
accvalue also serves as a default return value. It is often (not always) the identity value for the binary function we’re folding over the list.
Finally, there is no consing the list back together after the function application. The function here is a binary function, so in the first step we apply it to the head of the list and also to the result of doing more folding. One of its two arguments, therefore, involves recursive calls to
Let’s compare what
map does to what
foldr does. With
map, we have a series of function applications consed back into a new list of results. With
foldr, on the other hand, the values of the list become the arguments to a series of binary function applications, resulting in one value.
map :: (a -> b) -> [a] -> [b]
foldr :: (a -> b -> b) -> b -> [a] -> b
The breaking up of the list into its elements makes them available in both cases to be arguments to a function, but in the case of
map, the function only takes one argument. In
foldr it needs two – so, because
foldr is right associative, the
acc or start value appears at the far right of the function applications. So, first we will add 3 and 0, our start value, and then work our way leftward – not rebuilding a list but just adding the elements together. In fact, the fold we just did there is one implementation of the
λ> sum [1, 2, 3] 6
sum really is implemented with a fold in the
base library, although not with
A left fold is a little different. For one thing, it does associate to the left, which means the start value will be at the far left of the associations. So, for example, if we had written
foldl, the evaluation would look like this, with the start value at the leftmost spot in the series of function applications.
foldl :: (b -> a -> b) -> b -> [a] -> b
Now, as you can see, that wouldn’t change the sum of these numbers, because addition is … well, it’s associative. But certainly there are operations where the difference between the left and right associativity of these folds could change the results.
We will take just a moment to look at the implementation of
foldl as well, because the difference with
foldr is quite interesting, although we won’t have time to say as much as we’d like to about it. After all, the boss wants this validation code finished by tomorrow.
foldl :: (b -> a -> b) -> b -> [a] -> b foldl f acc  = acc foldl f acc (x:xs) = foldl f (f acc x) xs
The empty list case is quite the same as it was with
foldr, and, once again, the
acc value serves the same purpose as it did in
foldr and is once again, quite often the identity value of the operation.
The nonempty list case, though, is quite different! Here the recursive call is the first thing.
foldl calls itself immediately, before doing any function applications. That means that if you were using it with something like a Boolean operator that has a “dominant” value such that if you knew you had one of those values then you wouldn’t need to examine the whole list, you can’t stop – you have to go on, calling yourself, putting into place a function application for every element in the list.
foldr doesn’t have to do that; since the first thing
foldr does is arrange a function application, it could stop itself entirely if the first value it encountered gave away what the entire result would be. We haven’t talked much about the possibility of infinite lists in Haskell, but they are a possibility, and while it is not in general a good idea to try to fold up infinity, with
foldr and certain types of operations it is at least possible to obtain a result from folding an infinite list. Boolean operations with what we’ve been calling a “dominant” value are one of those types of operations. Watch!
Prelude function called
cycle can make an infinite list.
λ> cycle [1, 2, 3] λ> all (<3) (cycle [1, 2, 3]) False
It can stop itself the first time it hits a value for which the predicate is
False – in this case, 3. But an
all written with
foldl would not be able to do that because it is forced to recurse over the entire list.
foldr and its ability to stop its recursion as long as one particular case is met is very nice! Although it is VERY IMPORTANT to note that it will not stop if it never finds a “dominant” value (in this case, if it never finds a
False value) that it can return.
How to use a right fold
OK, so let’s implement this thing shall we? We’ll start by writing down what we know, although this doesn’t yet work. We’ll work only on the
all function now; you should use this model to implement
any as an exercise.
The type signature for
myAll doesn’t need to change, and we know that we’ll still need a parameter representing the predicate. We also know that we’re going to use
foldr. The function that we’re folding over the list is Boolean AND, which we write in Haskell as
(&&), and its identity value is
True so we’ll use that for our
The next things that we know are:
We need to pass individual values from the list to the
We need to apply the predicate function to values from the list to get the Boolean values that
(&&)needs. The list is a list of
a– any type! – and it’s the predicate function,
a -> Bool, that transforms them into
Boolvalues. We recall that
Boolvalues as inputs.
The way we’re going to pass individual values to that function is with lambdas, which we write with a backslash (
\). We don’t need to pattern match them out of the list, because
foldr already has that logic built into it. If all we wanted to do was fold the
(&&) function over a list of Booleans, we could do that.
λ> foldr (&&) True [True, False, True] False
But the need to apply the predicate function to each value to transform it from a
Char to a
Bool means that we have to explicitly pass them in. We can use one lambda to bind both variables.
Almost finished, but not quite! Now we need to figure out where we apply the predicate function. Since
foldr puts its
acc value to the far right, the right-side value is already a Boolean; the first one will be
True because that’s the value we gave it. So it’s the left one,
x, that needs the predicate function. So we apply
This was an incredibly long-winded way to get our validation function, which we had called
rejectNonalphabetic, written, but now we have the pride in knowing we have handcrafted each function with loving care. Except the
isAlpha, of course. We’re not going to make you do that.
Try it out with different inputs and try implementing some other folding functions, such as
sum, but multiplication) or
any, before you come back for the next video. We will be using
rejectNonalphabetic again later, so hold onto it!