As you might have to expected, especially since I wrote it in the last blog post, we will use our new best friend the -semiring to do so. We acutally come to the premier example of a semiring, the regular expression. You migth know regular expressions from other programming languages, which is as helpful as it is unfortunate. Why is that you might ask and the answer is simple: There is a singificant difference between regular expressions used in theoretical computer science and regular expressions as implemented in programming languages. The later are strictly more powerful. The nice part is they are also a syntactical super set of the first and by simply dropping this extra syntax we get exactly the former. So for you, this might result in some unlearning, I can’t help that, sorry.
So for everybody not knowing regular expressions from computer science curriculum, let’s give a briefe explanation.
First we need a finite set of “letters” called alphabet usually for illustrative purposes one uses the regular letters of a usual alphabet, like . But any alphabet will do, we could for example use the natural numbers up to some number, the edges of a finite graph or all subsets of as letters, it doesn’t matter.
Let be any letter of the alphabet , then we can give the following recursive definition of the regular expressions as follows:
Let’s go through this and decipher what it means.
- this is simple. If we write nothing, it is a valid regular expression. And since we can’t leave blank space in a definition and think that everybody picks up that this is a significant syntactical element we write this as .
- we need to construct a word that has no letters, to do that we use the symbol . You might ask what is the difference between nothing and a a word of no letters. It is the same difference as between and one is a set containing something of no value, the other is simply empty, same goes here.
- If we have a letter , writing just that letter is a valid regular expression.
- If we have two regular expressions already, we can either use the left or the right, appropriatly this is usually called alternative or choice. The plus-sign is somewhat of a convention for regular expressions, but you might have already guessed what will make it’s way into our semiring in the end.
- this is called concatenation or sequential compositon it simply states that we can look for something that matches the first regular expression followed by something that matches the second regular expression. If there is no danger of misreading it we usually ommit the just like with multiplication.
- this finally is arbitrary iteration, if we have a regular expression we can match it an arbitrary amount of times.
Usually you define some form of rules that say we can ommit some of the parens by saying binds stronger than which in turn binds stronger than . Since this just looks like basic arithmetic I trust you can follow.
Let’s give an example, suppose we have this regular expression:
Then let’s first check that it is a valid regular expression for the alphabet . After stating this, we know that the individual letters are valid regular expressions. Next we combine and with and get same for and now we can apply and get . As a last step we combine and and get . Ommiting the ’s we get that the regular expression is valid.
Now what does it “mean”
Let’s assume we have a graph and the edges are labeled with letters so there is an edge between two nodes labeld with , one labeled and so on, like in this graph.
Then the regular expression above describes all pathes from to . We first have to take the edge labeled which is labeld then either the edges labeled followed by the edge labeled or the edges labeled and .
I think you see how this is useful for our case. So let’s go on and define a semiring.
The Regular Expression -Semiring
Acutally regular expressions form a family of -semirings, one for each underlying alphabet. So let’s call our alphabet , which is the usual name for the alphabet in computer science. Then
is a -semiring with the rules above and two corner cases which give rise to the following two (standard) definitions:
Let’s do a quick check if our properties hold.
- since is the same as or this holds.
- , by almost the same argument, if we first decide between or and then between the result of that decission or it’s the same as deciding the other way around.
- , this is by the definition given above.
- the concatenation of and yields that concatenated with yields which is the same as first concatenating and to and then prepending .
- concatenating the empty word to anything will not change anything, so this is ok too.
- that’s by the definition above.
- , this is simply either first doing and then deciding between or or first deciding to go or so this holds true to.
- , almost the same as the one before, either decided between and and then do or deciding to do or should be the same.
Ok, we are satisfied, this is a semiring. Now for the part, which this time is a little more interesting than before.
If we want to describe all ways from one node to another, there might be loops on the way. For example let’s modify the graph from above slightly that it looks like this.
Now on a way from to we would be allowed to take the loop labeled arbitrarily often, or writing it in the syntax of regular expression . We have to take this into account for our -semiring definition of regular expressions. So let’s do it.
I have a small problem since now we have two different operations, one from the semiring and one from the regular expressions. To keep those apart I will use for the semiring and for the regular expression star. I hope this isn’t to confusing for you.
Let us define a .
Even though this is a bit tricky I think you are by now fully capable of checking that this will actually give us a valid -semiring.
Now for a Haskell implementation. Since I don’t want to conflict with other operators I use Or, Concat and Star instead of and .
data StarSemiringExpression a = Var a | Or (StarSemiringExpression a) (StarSemiringExpression a) | Concat (StarSemiringExpression a) (StarSemiringExpression a) | Star (StarSemiringExpression a) | None | Empty newtype RE a = RE (StarSemiringExpression a) re :: a -> RE a re = RE . Var instance Semiring (RE a) where zero = RE None one = RE Empty RE None <+> x = x x <+> RE None = x RE Empty <+> RE Empty = RE Empty RE Empty <+> RE (Star a) = RE (Star a) RE (Star a) <+> RE Empty = RE (Star a) RE x <+> RE y = RE (x `Or` y) RE Empty <.> x = x x <.> RE Empty = x RE None <.> _ = RE None _ <.> RE None = RE None RE x <.> RE y = RE (x `Concat` y) instance StarSemiring (RE a) where star (RE None) = RE Empty star (RE Empty) = RE Empty star (RE (Star x)) = star (RE x) star (RE x) = RE (Star x)
The helper function re in combination with leaving out the actual alphabet for our regular expression allows us to use any type, as I’ve shown above, and implicitly creating the alphabet from the “letters” that are used in the regular expression. (For mathmatically inclined readers: Yes, this is only ok, as long as we only use finite regular expression, but I would doubt that you can write an infinite one ;-) )
There is one thing left to do to put this to use, we need another small helper function:
reGraph :: (Ix i) => Matrix i (Maye a) -> Matrix (Re a) reGraph = fmap (maybe zero re)
This simply takes a matrix, where there might be an entry at any index and transforms that into a regular expression of one letter and takes the entry itself as letter, if there is no entry present at that point it just uses .
What about the Matrix
What remains is to clear up is what our operation matrixes does now. What we saw in the post from last week is that the algorithm minimized or maximized some property we calculated. This time it’s a little different. Operations like , and “discard” one of their operatands but the operation from regular expressions doesn’t it creates an alternative. So for any node we can use as an intermediate what we get is “Either use the path we already know OR the path we can construct by using the intermediate node we are currently testing.” Now there might not be a path leading from our start to our target node using the specific intermediate node, then we get back a as an alternative path, which is the only value that is acutally discarded by . So what we get in the end is a regular expression specifying all pathes we can take to get from a start to a target node.
Oh by the way with this we implemented another well known algorithm with the same 7 lines of code I presented in the last blog post. It’s called the McNaughton-Yamada algorithm or the Kleene-Construction and is taught to probably every computer science major on the planet. (And forgotten about 5 minutes later ;-) )
In the last blog post we started to put our -semiring knowledge to use on graphs and found that the same 7 lines of code got us various properties, depending on which semiring we plugged in. This blog post showed how we can describe path between nodes and how we can get all path between two given nodes. Again with the very same 7 lines of code. What remains is to put both of those together to get all path exibiting a given property and that is what we’ll do in the next installment.