Last active
January 13, 2017 10:27
-
-
Save luciferous/06a70c4a20b519673c9ec4a139fa5bfb to your computer and use it in GitHub Desktop.
Revisions
-
luciferous revised this gist
Jan 13, 2017 . 1 changed file with 76 additions and 88 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -16,36 +16,37 @@ This post is sort of a continuation of the [Comonad Tutorial](http://blog.higher At [work](http://innovation.verizon.com/), we develop and use a Scala library called [Quiver](http://github.com/oncue/quiver) for working with [graphs](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)). In this library, a graph is a recursively defined immutable data structure. A graph, with node IDs of type `V`, node labels `N`, and edge labels `E`, is constructed in one of two ways. It can be empty: ``` haskell empty :: Graph v n e ``` Or it can be of the form `c & g`, where `c` is the _context_ of one node of the graph and `g` is the rest of the graph with that node removed: ``` haskell data Context v n e = Context { inEdges :: Vector (e, v) , vertex :: v , label :: n , outEdges :: Vector (e, v) } (&) :: Graph v n e -> Graph v n e (&) g = undefined ``` By the same token, we can decompose a graph on a particular node: ``` haskell decomp :: Graph v n e -> v -> Option (GDecomp v n e) ``` Where a `GDecomp` is a `Context` for the node `v` (if it exists in the graph) together with the rest of the graph: ``` haskell data GDecomp v n e = GDecomp { ctx :: Context v n e , rest :: Graph v n e } ``` ## Recursive decomposition @@ -62,35 +63,31 @@ If we decompose on the node `a`, we get a view of the graph from the perspective Quiver can arbitrarily choose a node for us, so we can look at the context of some "first" node: ``` haskell decompAny :: Graph v n e -> Maybe (GDecomp v n e) ``` We can keep decomposing the remainder recursively, to perform an arbitrary calculation over the entire graph: ``` haskell fold :: Graph v n e -> b -> ((Context v n e, b) -> b) -> b fold g b f = undefined ``` The implementation of `fold` will be something like: ``` haskell fromMaybe b $ do (GDecomp ctx rest) <- decompAny g return f (ctx, fold rest b f) ``` For instance, if we wanted to count the edges in the graph `g`, we could do: ``` haskell do (Context(ins, _, _, outs), b) <- fold g 0 return size ins + size outs + b ``` The recursive decomposition will guarantee that our function doesn't see any given edge more than once. For the graph `g` above, `(g fold b)(f)` would look something like this: @@ -103,12 +100,9 @@ Let's now say that we wanted to find the maximum [degree](https://en.wikipedia.o A first stab might be: ``` haskell maxDegree :: Graph v n e -> Int maxDegree g = fold g 0 $ \(Context ins _ _ outs) -> max (size ins + size outs) z ``` But that would get the incorrect result. In our graph `g` above, the nodes `b`, `d`, and `f` have a degree of 3, but this fold would find the highest degree to be 2. The reason is that once our function gets to look at `b`, its edge to `a` has already been removed, and once it sees `f`, it has no edges left to look at. @@ -119,20 +113,17 @@ This was the issue that came up at work. This behaviour of `fold` is both correc That is, we often want to consider each node in the context of the entire graph we started with. In order to express that with `fold`, we have to decompose the original graph at each step: ``` haskell maxDegree :: Graph v n e -> Int maxDegree g = fold g 0 $ \(c, z) -> max z $ fromMaybe 0 $ do (GDecomp (Context ins _ _ outs) _) <- decompose g (vertex c) return size ins + size outs ``` But what if we could have a combinator that _labels each node with its context_? ``` haskell contextGraph :: Graph v n e -> Graph v (Context v n e) e ``` Visually, that looks something like this: @@ -141,37 +132,37 @@ Visually, that looks something like this: If we now fold over `contextGraph(g)` rather than `g`, we get to see the whole graph from the perspective of each node in turn. We can then write the `maxDegree` function like this: ``` haskell maxDegree :: Graph v n e -> Int maxDegree g = fold (contextGraph g) 0 $ \(c, z) -> max z (insSize c + outsSize c) where insSize = size . ins . label outsSize = size . outs . label ``` ## Two different comonads This all sounds suspiciously like a comonad! Of course, `Graph` itself is not a comonad, but `GDecomp` definitely is. The `counit` just gets the label of the node that's been `decomp`ed out: ``` haskell type GDecomp2 v e n = GDecomp v n e instance Comonad (GDecomp2 v e) where extract g = label (ctx g) extend g f = undefined ``` The `cobind` can be implemented in one of two ways. There's the "successive decompositions" version: ``` haskell extend :: GDecomp v a e -> GDecomp v a e -> b extend g f = GDecomp newCtx newRest where newCtx = (ctx g) { label = f g } newRest = fromMaybe empty $ do gdecomp <- decompAny (rest g) let (GDecomp c r) = extract gdecomp f return c & r ``` Visually, it looks like this: @@ -190,29 +181,26 @@ That one looks visually more like this: Its `cobind` takes a computation focused on one node of the graph (that is, on a `GDecomp`), repeats that for every other decomposition of the original graph in turn, and stores the results in the respective node labels: ``` haskell extend :: GDecomp v a e -> GDecomp v a e -> b extend g f = GDecomp newCtx newRest where newCtx = (ctx g) { label = f g } newRest = fold rest empty $ \(c, acc) -> c { label = newLabel } newLabel = f (get (decomp orig (vertex c))) & acc ``` This is useful for algorithms where we want to label every node with some information computed from its neighborhood. For example, some clustering algorithms start by assigning each node its own cluster, then repeatedly joining nodes to the most popular cluster in their immediate neighborhood, until a fixed point is reached. As a simpler example, we could take the average value for the labels of neighboring nodes, to apply something like a low-pass filter to the whole graph: ``` haskell lopass :: Graph v Int e -> Graph v Int e lopass g = fromMaybe g $ do d <- decompAny g return $ cobind d $ \x -> let neighbors = fmap (label . get . decomp g) (inEdges x ++ outEdges x) in (sum neighbors + label x) / (length neighbors + 1) ``` The difference between these two comonad instances is essentially the same as the difference between [`NonEmptyList`](https://github.com/scalaz/scalaz/blob/series/7.3.x/core/src/main/scala/scalaz/NonEmptyList.scala) and the nonempty list [`Zipper`](https://github.com/scalaz/scalaz/blob/series/7.3.x/core/src/main/scala/scalaz/Zipper.scala). -
luciferous created this gist
Jan 13, 2017 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,220 @@ --- layout: post title: "A comonad of graph decompositions" date: 2016-04-02 13:02:54 -0400 comments: true author: Rúnar categories: scala comonads commentIssueId: 22 --- I want to talk about a comonad that came up at [work](http://innovation.verizon.com/) the other day. Actually, two of them, as the data structure in question is a comonad in at least two ways, and the issue that came up is related to the difference between those two comonads. This post is sort of a continuation of the [Comonad Tutorial](http://blog.higher-order.com/blog/2015/06/23/a-scala-comonad-tutorial/), and we can call this "part 3". I'm going to assume the reader has a basic familiarity with comonads. ## Inductive Graphs At [work](http://innovation.verizon.com/), we develop and use a Scala library called [Quiver](http://github.com/oncue/quiver) for working with [graphs](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)). In this library, a graph is a recursively defined immutable data structure. A graph, with node IDs of type `V`, node labels `N`, and edge labels `E`, is constructed in one of two ways. It can be empty: ``` scala def empty[V,N,E]: Graph[V,N,E] ``` Or it can be of the form `c & g`, where `c` is the _context_ of one node of the graph and `g` is the rest of the graph with that node removed: ``` scala case class Context[V,N,E]( inEdges: Vector[(E,V)], vertex: V, label: N, outEdges: Vector[(E,V)] ) { def &(g: Graph[V,N,E]): Graph[V,N,E] = ??? } ``` By the same token, we can decompose a graph on a particular node: ``` scala g: Graph[V,N,E] v: V g decomp v: Option[GDecomp[V,N,E]] ``` Where a `GDecomp` is a `Context` for the node `v` (if it exists in the graph) together with the rest of the graph: ``` scala case class GDecomp[V,N,E](ctx: Context[V,N,E], rest: Graph[V,N,E]) ``` ## Recursive decomposition Let's say we start with a graph `g`, like this:  I'm using an _undirected_ graph here for simplification. An undirected graph is one in which the edges don't have a direction. In Quiver, this is represented as a graph where the "in" edges of each node are the same as its "out" edges. If we decompose on the node `a`, we get a view of the graph from the perspective of `a`. That is, we'll have a `Context` letting us look at the label, vertex ID, and edges to and from `a`, and we'll also have the remainder of the graph, with the node `a` "broken off":  Quiver can arbitrarily choose a node for us, so we can look at the context of some "first" node: ``` scala g: Graph[V,N,E] g.decompAny: Option[GDecomp[V,N,E]] ``` We can keep decomposing the remainder recursively, to perform an arbitrary calculation over the entire graph: ``` scala f: (Context[V,N,E], B) => B b: B (g fold b)(f): B ``` The implementation of `fold` will be something like: ``` scala g.decompAny map { case GDecomp(ctx, rest) => f(ctx, rest.fold(b)(f)) } getOrElse b ``` For instance, if we wanted to count the edges in the graph `g`, we could do: ``` scala (g fold 0) { case (Context(ins, _, _, outs), b) => ins.size + outs.size + b } ``` The recursive decomposition will guarantee that our function doesn't see any given edge more than once. For the graph `g` above, `(g fold b)(f)` would look something like this:  ## Graph Rotations Let's now say that we wanted to find the maximum [degree](https://en.wikipedia.org/wiki/Degree_(graph_theory) of a graph. That is, find the highest number of edges to or from any node. A first stab might be: ``` scala def maxDegree[V,N,E](g: Graph[V,N,E]): Int = g.fold(0) { case (Context(ins, _, _, outs), z) => (ins.size + outs.size) max z } ``` But that would get the incorrect result. In our graph `g` above, the nodes `b`, `d`, and `f` have a degree of 3, but this fold would find the highest degree to be 2. The reason is that once our function gets to look at `b`, its edge to `a` has already been removed, and once it sees `f`, it has no edges left to look at. This was the issue that came up at work. This behaviour of `fold` is both correct and useful, but it can be surprising. What we might expect is that instead of receiving successive decompositions, our function sees "all rotations" of the graph through the `decomp` operator:  That is, we often want to consider each node in the context of the entire graph we started with. In order to express that with `fold`, we have to decompose the original graph at each step: ``` scala def maxDegree[V,N,E](g: Graph[V,N,E]): Int = g.fold(0) { (c, z) => g.decompose(c.vertex).map { case GDecomp(Context(ins, _, _, outs), _) => ins.size + outs.size }.getOrElse(0) max z } ``` But what if we could have a combinator that _labels each node with its context_? ``` scala def contextGraph(g: Graph[V,N,E]): Graph[V,Context[V,N,E],E] ``` Visually, that looks something like this:  If we now fold over `contextGraph(g)` rather than `g`, we get to see the whole graph from the perspective of each node in turn. We can then write the `maxDegree` function like this: ``` scala def maxDegree[V,N,E](g: Graph[V,N,E]): Int = contextGraph(g).fold(0) { (c, z) => z max (c.label.ins.size + c.label.outs.size) } ``` ## Two different comonads This all sounds suspiciously like a comonad! Of course, `Graph` itself is not a comonad, but `GDecomp` definitely is. The `counit` just gets the label of the node that's been `decomp`ed out: ``` scala def gdecompComonad[V,E] = new Comonad[λ[α => GDecomp[V,α,E]]] { def counit[A](g: GDecomp[V,A,E]): A = g.ctx.label def cobind[A,B](g: GDecomp[V,A,E])( f: GDecomp[V,A,E] => B): GDecomp[B] = ??? } ``` The `cobind` can be implemented in one of two ways. There's the "successive decompositions" version: ``` scala def cobind[A,B](g: GDecomp[V,A,E])( f: GDecomp[V,A,E] => B): GDecomp[B] = GDecomp(g.ctx.copy(label = f(g)), g.rest.decompAny.map { val GDecomp(c, r) = cobind(_)(f) c & r } getOrElse empty) ``` Visually, it looks like this:  It _exposes the substructure_ of the graph by storing it in the labels of the nodes. It's very much like the familiar `NonEmptyList` comonad, which replaces each element in the list with the whole sublist from that element on. So this is the comonad of _recursive folds over a graph_. Really its action is the same as as just `fold`. It takes a computation on one decomposition of the graph, and extends it to all sub-decompositions. But there's another, comonad that's much more useful _as a comonad_. That's the comonad that works like `contextGraph` from before, except instead of copying the context of a node into its label, we copy the whole decomposition; both the context and the remainder of the graph. That one looks visually more like this:  Its `cobind` takes a computation focused on one node of the graph (that is, on a `GDecomp`), repeats that for every other decomposition of the original graph in turn, and stores the results in the respective node labels: ``` scala def cobind[A,B](g: GDecomp[V,A,E])( f: GDecomp[V,A,E] => B): GDecomp[B] = { val orig = g.ctx & g.rest GDecomp(g.ctx.copy(label = f(g)), rest.fold(empty) { (c, acc) => c.copy(label = f(orig.decomp(c.vertex).get)) & acc }) } ``` This is useful for algorithms where we want to label every node with some information computed from its neighborhood. For example, some clustering algorithms start by assigning each node its own cluster, then repeatedly joining nodes to the most popular cluster in their immediate neighborhood, until a fixed point is reached. As a simpler example, we could take the average value for the labels of neighboring nodes, to apply something like a low-pass filter to the whole graph: ``` scala def lopass[V,E](g: Graph[V,Int,E]): Graph[V,Int,E] = g.decompAny.map { d => cobind(d) { x => val neighbors = (x.inEdges ++ x.outEdges).map { n => g.decomp(n).get.ctx.label } (neighbors.sum + x.label) / (neighbors.length + 1) }} getOrElse g ``` The difference between these two comonad instances is essentially the same as the difference between [`NonEmptyList`](https://github.com/scalaz/scalaz/blob/series/7.3.x/core/src/main/scala/scalaz/NonEmptyList.scala) and the nonempty list [`Zipper`](https://github.com/scalaz/scalaz/blob/series/7.3.x/core/src/main/scala/scalaz/Zipper.scala). It's this latter "decomp zipper" comonad that I decided to ultimately include as the `Comonad` instance for `quiver.GDecomp`.