[Explainer] Function pipeline monoid

After seeing much discussion around the following tweet by Uncle Bob, we’d like to explain what we think Bob is getting at when he says “pipeline of functions”.

We’re not going to discuss the second part of this tweet; we’ll just focus on monoids.

Monoid

First of all, let’s be clear: A monoid is

  1. a set, along with
  2. a binary associative operator (<>)
  3. that has an identity.

In Haskell, when we talk about sets in relation to structures like monoids, we mean types. Haskell has a typeclass called Monoid Monoid that defines a polymorphic binary associative operator and a polymorphic identity. The piece of code that ties a type to a particular implementation of that operator and identity is called an instance declaration.

Endo

One example of a monoid is Endo, Endo the type of functions a -> a – that is, any unary function whose argument and return types are the same.

For Endo, the <> operator works a bit like the pipe operator (|) in the Bash shell. To illustrate, here’s the Bash pipeline that Bob uses in his example:

$ echo -e "1\n5\n1\n5\n2\n1" > x

$ cat x | sort | uniq
1
2
5

Below we’ll define cat, sort, and uniq as Endo values in Haskell.

import Data.Function
import Data.Monoid
import qualified Data.List as List

x = [1,5,1,5,2,1]

cat x = Endo (const x)

sort = Endo List.sort

uniq = Endo $ fix $ \f -> \case
  x : y : zs | x == y -> f (y : zs)
  x : zs              -> x : f zs
  []                  -> []

We can then compose these functions in a chain of <> applications, and the resulting code looks a lot like the Bash. Although you’ll notice that the ordering is reversed, because function composition is “backwards.”

λ> appEndo (uniq <> sort <> cat x) []
[1,2,5]

So, perhaps we can say the pipe intuition is all right in this case. But Endo is not what most people mean when they talk about monoids; list concatenation is more representative.

How to think about highly abstract typeclasses

A typeclass is a generalization of some abstract properties. One of the goals of typeclasses was to make operators overloadable in a way that was extensible to new types, so that any time you think a type you’ve defined has a lawful implementation of a binary associative operation with an identity element, you can make it an instance of Monoid. However, trying to reduce “monoid” to one particular example is misleading.

Historically, mathematicians and logicians noticed that some operations over quite different types of things – addition and multiplication over, for example, integers; Boolean disjunction and conjunction over truth values; and union and intersection operations over sets – shared some common features. These are all classic monoids: they take two arguments and have an identity. The idea of monoids arose through abstracting those properties away from their implementations over those particular types and focusing more on what they have in common than on what is different about them.

Endo is a bit of an oddball among these. But it has the critical properties: The operator is associative, and it has an identity. The identity for the Endo monoid is the identity function, Endo (\x -> x). So it is one of the many monoids.

To understand what “monoid” means practically, we look at examples. But we can’t get lost in the examples or anchored to any one; we always have to come back to the definition, and the study of the particular properties that all of the examples share.