Semigroup And Monoid In Scala
This post is a sequel to Monad In Scala and Applicative vs Monad In Scala.
In order to learn what Monoid
is, we first need to look at what Semigroup
is first.
Semigroup
Semigroup
is a typeclass that defines combine
method (a.k.a append
in Scalaz etc). Naming convention for such method seems to vary per each library, yet the essence is straightforward. It’s a method that combines two semigroup
instances.
Int
is an instance of Semigroup
and its analogous combine method is +
(add two integers).
String
is an instance Semigroup
and its analogous combine method is +
(concatenate two strings).
Note the word analogous since they aren’t the methods defined for Semigroup
per se, these standard library methods simply happen to resemble what Semigroup
’s combine method does.
So then now we wonder, what’s the use of this typeclass when all it does is to combine stuff, which is so common in programming that every language’s standard library already implements them from the get-go?
Like most other typeclass (except the ones like Monad
), they compose. This is immensely useful.
Let’s look List[String]
.
Semigroup[List[String]].combine(List("hello"), List("world")) // List("hello", "world")
List("hello") ++ List("world") // List("hello", "world")
Okay both Semigroup version and standard library version do the same thing. What about Map[String, String]
?
val map1 = Map("foo" -> "hello"); val map2 = Map("foo" -> "world")
Semigroup[Map[String, String]].combine(map1, map2) // Map("foo" -> "helloworld")
map1 ++ map2 // Map("foo" -> "world")
This is interesting. The reason that they differ is because Semigroup
composes while Map’s ++
operator doesn’t. Before we dive deeper, there is an operator for combine
in Semigroup
: |+|
. So from now on, we can call map1 |+| map2
instead.
When we call map1 |+| map2
, whose type is Map[String, String]
, given that Semigroup
knows how to combine Map[String, V]
and String
(that is, there is Semigroup
defined for each), it is able to work on Map[String, String]
as well (which is to say that they are composable
). An abstract implementation of Map[K, V]
is:
class MapSemigroup[K, V](implicit V: Semigroup[V]) extends Semigroup[Map[K, V]] {
def combine(x: Map[K, V], y: Map[K, V]): Map[K, V] = addMap(x, y)(V.combine)
}
All we need to understand is that it takes Semigroup[V]
, which in our case is Semigroup[String]
and pass its combine
method to opaque addMap
function that does the work of combining. Thus, Semigroup[Map[String, String]]
has access to both its own implementation of combine
and also the implementation of combine
of its V
, enabling the composition.
Compare this to map1 ++ map2
which defines a monolithic definition of simply replacing the duplicate key’s values instead of applying custom implementation for those values. This is due to lack of compositionality.
To see its true usefulness, let me throw this at you:
Map("foo" -> Map("foo1" -> "v1"), "bar" -> Map("bar1" -> "v2")) |+| Map("foo" -> Map("foo1" -> "v3"))
// Map("foo" -> Map("foo1" -> "v1v3"), "bar" -> Map("bar1" -> "v2"))
Monoid
Conceptually, Monoid
is to Semigroup
like Applicative
is to Apply
. All it does is to define an additional method empty
(a.k.a zero
in Scalaz) which returns an identity value. When you combine identity value (which is in and of itself, a Monoid
instance) with another Monoid
instance, it simply returns the combined Monoid
in return. It applies no change to it.
Think of an empty string ""
as an example. Combining empty string with string "Hello"
returns "Hello"
.
This is a simple extension to Semigroup
but derived power from this is immense.
What this simple addition allows us to do is an ability to fall back to default for possibly empty values as well as an ability to fold
.
It allows us to implement useful typeclass like Foldable
, which is beyond the scope of this post.
Let’s look at example from algebra library:
// To be able to combine 3 strings
// contrived version in Seimgroup
Semigroup[String].combine("foo", Semigroup[String].combine("bar", "baz"))
// Monoid
Monoid[String].combineAll(List("foo", "bar", "baz"))
In algebra library, combineAll
method internally calls fold
. fold
ing over Semigroup
is not possible since we don’t have an initial value to start fold
from. This is solved by using Monoid
which now has an empty
method which returns an identity value, which can happily be used as initial value.
In fact, fold
method in Foldable
(which is a typeclass that defines fold
-like methods) requires an instance of Monoid
through implicit parameters.
Monoid
allows us to work at higher abstraction where we can combine an non-deterministic values (List
is an example) through the use of default value, that is, empty
. In Semigroup
, values are fully deterministic in that we specify them upfront using notations like |+|
. We cannot pass a varying number of values contained within, say, List
to it. If the List
happens to be empty, then it will simply fall back to its empty
method.
In context of functional programming, non-determinism is quite abstract and takes longer post than this to explain. Though you can refer to wiki here.
Takeaway
Use Semigroup
when you can explicitly define all values involved (**deterministic**) to combine. Think of simple addition of two integers.
Use Monoid
when you cannot explicitly define all values involved (**non-deterministic**) to combine. Think of fold
over a List of integers. (Foldable
’s fold
method is defined through Monoid
’s empty
and combine
methods)
But for all purposes, there is rarely a reason to use lesser-compentent, yet more freeing, Semigroup
over Monoid
(Unlike the relationship between Applicative
and Monad
) since Monoid
is in fact, a strict superset of Semigroup
and can access all its methods without penalties.
You can safely default to Monoid
for most.