In abstract algebra, a semigroup is a set together with a binary operation. For set, in Haskell, you can more or less substitute the word type; there are ways in which types do not perfectly correspond to sets, but it is close enough for this purpose. A binary operation is a function that takes two arguments. The binary operation must be closed – that is, its two arguments and its return value but must all be values from the same set. A semigroup must also obey one law: the law of associativity. That is usually summarized as

x <> (y <> z) == (x <> y) <> z

If the <> operation is as yet unfamiliar to you, you may read it as +; addition is associative.Addition is associative over many types of numbers, but not all. Addition over the reals in mathematics is associative, but addition over floating point numbers is an example of non-associative addition.

Many sets have more than one such operation over them. Integers, for example, are semigroups under both a minimum and a maximum operation.

λ> :type min
min :: Ord a => a -> a -> a

λ> :type max
max :: Ord a => a -> a -> a

λ> min 5 3

λ> max 5 3

The Semigroup class

In Haskell, this algebra is represented by the Semigroup typeclass, which is in base since the release, and in Prelude since GHC 8.4. The Semigroup class is defined with one main operation, <>, and a couple of supporting functions.

class Semigroup a where
  (<>) :: a -> a -> a
  sconcat :: NonEmpty a -> a
  stimes :: Integral b => b -> a -> a
  {-# MINIMAL (<>) #-}

Because the relationship between a type in Haskell and a typeclass must be unique, it is common to use newtype wrappers to implement different semigroup operations over what is the same underlying type. For example, the module gives Min and Max types wrappers around a, which means they can be wrappers around a great number of types.The Data.Semigroup module provides many such newtypes.

newtype Min a = Min { getMin :: a }

newtype Max a = Max { getMax :: a }

Any type that might form semigroups under both such operations can be wrapped in these types and have two unique Semigroup instances.The newtypes in Data.Semigroup are not in Prelude by default, so they must be imported.

λ> import Data.Semigroup

λ> Min 'a' <> Min 'j'
Min {getMin = 'a'}

λ> Max 'a' <> Max 'j'
Max {getMax = 'j'}

λ> Min 4 <> Min 8
Min {getMin = 4}


One thing we like about typeclasses over some other strategies for handling overloading is that most typeclasses have laws that are derived from the laws of the algebras that they represent.While we do support typeclass laws as being generally a desirable thing, it is vital to understand that the compiler does not enforce those laws. We recommend property testing your instances, as with the hedgehog or QuickCheck libraries, to ensure they obey the laws that pertain them. If something compiles, it usually does “work”; the problem is that it might not work the way you expected it to. It is somewhat interesting to note that we do not talk about the fact that the operation must also be binary and closed as being laws, but those are the properties that are enforced by the compiler. For the Semigroup class, there is only one law, and that is the law of associativity.

Associativity in general is a property of some binary operations – that is, operators that take two arguments. Associativity refers to how chains of operations of the same precedence may be grouped when there aren’t parentheses to enforce certain groupings. In Haskell, the associativity of a lot of infix operators is given in their fixity declarations, along with their precedence, and they may be right or left associative; infixr is for right-associative operators, while infixl is for left-associative operators. Exponentiation, for example, is right associative:

λ> :info (^)
infixr 8 ^

That means that if we have a series of exponentiations, they will associate from the right. We can see that by comparing a version with no parentheses to versions that use parentheses to force either a right or left associativity.

λ> 2^3^4

λ> (2^3)^4

λ> 2^(3^4)

Similarly, the r in foldr and the l in foldl stand for right and left associativity, respectively.

But some operators are associative meaning the operations may be grouped arbitrarily without changing the result.It’s not the case that they are grouped arbitrarily for purposes of evaluation, which is why operators like (+) and (<>) still have fixity declarations. It does mean that in a chain of such operations if you thought they were going to left-associate but they actually right-associate, it won’t make any difference to the result. It also doesn’t mean arguments can be given in any order; commutativity (the order of arguments not affecting the result) and associativity are different properties. All semigroups are associative, but they are not all commutative.

For example, if we replaced the (^) operator above with the (+) operator, the parenthesization will not change the result.

λ> 2 + 3 + 4

λ> (2 + 3) + 4

λ> 2 + (3 + 4)

We know from its fixity declaration that the first two expressions should be the same in terms of evaluation order, but because the operation is associative, parenthesizing the expressions in a way that forces a different evaluation order can’t change the result.

The (<>) operator is associative like addition is. That means, for all a, b, c, and d, the following are all equal:

a <> b <> c <> d
(a <> b) <> (c <> d)
a <> (b <> c) <> d
a <> (b <> (c <> d))

And so on, any groupings you can think of; all will evaluate to the same final result. Associativity is a good friend to have for a few reasons. It is very nice not to have to worry about parenthesizing such chains of operations, of course, but more importantly associativity is often what allows us to break problems down into components and then recombine the results of those components. The guarantees we get from associativity are important both to monadic (and applicative) program composition as well as parallelism.

Sign up for access to the full page, plus the complete archive and all the latest content.