Semigroup And Monoid In Scala


functional programming, monoid, scala, semigroup

Semigroup And Monoid In Scala


Daniel Shin - October 22, 2015

This post is a sequel to Monad In Scala and Applicative vs Monad In Scala.

In order to learn what Monoid is, we first need to look at what Semigroup is first.

Semigroup

Semigroup is a typeclass that defines combine method (a.k.a append in Scalaz etc). Naming convention for such method seems to vary per each library, yet the essence is straightforward. It’s a method that combines two semigroup instances.

Int is an instance of Semigroup and its analogous combine method is + (add two integers).

String is an instance Semigroup and its analogous combine method is + (concatenate two strings).

Note the word analogous since they aren’t the methods defined for Semigroup per se, these standard library methods simply happen to resemble what Semigroup ’s combine method does.

So then now we wonder, what’s the use of this typeclass when all it does is to combine stuff, which is so common in programming that every language’s standard library already implements them from the get-go?

Like most other typeclass (except the ones like Monad), they compose. This is immensely useful.

Let’s look List[String].

Semigroup[List[String]].combine(List("hello"), List("world")) // List("hello", "world")
List("hello") ++ List("world") // List("hello", "world")

Okay both Semigroup version and standard library version do the same thing. What about Map[String, String]?

val map1 = Map("foo" -> "hello"); val map2 = Map("foo" -> "world")
Semigroup[Map[String, String]].combine(map1, map2) // Map("foo" -> "helloworld")
map1 ++ map2 // Map("foo" -> "world")

This is interesting. The reason that they differ is because Semigroup composes while Map’s ++ operator doesn’t. Before we dive deeper, there is an operator for combine in Semigroup: |+|. So from now on, we can call map1 |+| map2 instead.

When we call map1 |+| map2, whose type is Map[String, String], given that Semigroup knows how to combine Map[String, V] and String (that is, there is Semigroup defined for each), it is able to work on Map[String, String] as well (which is to say that they are composable). An abstract implementation of Map[K, V] is:

class MapSemigroup[K, V](implicit V: Semigroup[V]) extends Semigroup[Map[K, V]]  {
  def combine(x: Map[K, V], y: Map[K, V]): Map[K, V] = addMap(x, y)(V.combine)
}

All we need to understand is that it takes Semigroup[V], which in our case is Semigroup[String] and pass its combine method to opaque addMap function that does the work of combining. Thus, Semigroup[Map[String, String]] has access to both its own implementation of combine and also the implementation of combine of its V, enabling the composition.

Compare this to map1 ++ map2 which defines a monolithic definition of simply replacing the duplicate key’s values instead of applying custom implementation for those values. This is due to lack of compositionality.

To see its true usefulness, let me throw this at you:

Map("foo" -> Map("foo1" -> "v1"), "bar" -> Map("bar1" -> "v2")) |+| Map("foo" -> Map("foo1" -> "v3"))
// Map("foo" -> Map("foo1" -> "v1v3"), "bar" -> Map("bar1" -> "v2"))

Monoid

Conceptually, Monoid is to Semigroup like Applicative is to Apply. All it does is to define an additional method empty (a.k.a zero in Scalaz) which returns an identity value. When you combine identity value (which is in and of itself, a Monoid instance) with another Monoid instance, it simply returns the combined Monoid in return. It applies no change to it.

Think of an empty string "" as an example. Combining empty string with string "Hello" returns "Hello".

This is a simple extension to Semigroup but derived power from this is immense.

What this simple addition allows us to do is an ability to fall back to default for possibly empty values as well as an ability to fold.

It allows us to implement useful typeclass like Foldable, which is beyond the scope of this post.

Let’s look at example from algebra library:

// To be able to combine 3 strings

// contrived version in Seimgroup
Semigroup[String].combine("foo", Semigroup[String].combine("bar", "baz"))

// Monoid
Monoid[String].combineAll(List("foo", "bar", "baz"))

In algebra library, combineAll method internally calls fold. folding over Semigroup is not possible since we don’t have an initial value to start fold from. This is solved by using Monoid which now has an empty method which returns an identity value, which can happily be used as initial value.

In fact, fold method in Foldable (which is a typeclass that defines fold-like methods) requires an instance of Monoid through implicit parameters.

Monoid allows us to work at higher abstraction where we can combine an non-deterministic values (List is an example) through the use of default value, that is, empty. In Semigroup, values are fully deterministic in that we specify them upfront using notations like |+|. We cannot pass a varying number of values contained within, say, List to it. If the List happens to be empty, then it will simply fall back to its empty method.

In context of functional programming, non-determinism is quite abstract and takes longer post than this to explain. Though you can refer to wiki here.

Takeaway

Use Semigroup when you can explicitly define all values involved (**deterministic**) to combine. Think of simple addition of two integers.

Use Monoid when you cannot explicitly define all values involved (**non-deterministic**) to combine. Think of fold over a List of integers. (Foldable’s fold method is defined through Monoid’s empty and combine methods)

But for all purposes, there is rarely a reason to use lesser-compentent, yet more freeing, Semigroup over Monoid (Unlike the relationship between Applicative and Monad) since Monoid is in fact, a strict superset of Semigroup and can access all its methods without penalties.

You can safely default to Monoid for most.