### 19Graphs

In From Acyclicity to Cycles we introduced a special kind of sharing: when the data become cyclic, i.e., there exist values such that traversing other reachable values from them eventually gets you back to the value at which you began. Data that have this characteristic are called graphs.Technically, a cycle is not necessary to be a graph; a tree or a DAG is also regarded as a (degenerate) graph. In this section, however, we are interested in graphs that have the potential for cycles.

Lots of very important data are graphs. For instance, the people and connections in social media form a graph: the people are nodes or vertices and the connections (such as friendships) are links or edges. They form a graph because for many people, if you follow their friends and then the friends of their friends, you will eventually get back to the person you started with. (Most simply, this happens when two people are each others’ friends.) The Web, similarly is a graph: the nodes are pages and the edges are links between pages. The Internet is a graph: the nodes are machines and the edges are links between machines. A transportation network is a graph: e.g., cities are nodes and the edges are transportation links between them. And so on. Therefore, it is essential to understand graphs to represent and process a great deal of interesting real-world data.

Graphs are important and interesting for not only practical but also principled reasons. The property that a traversal can end up where it began means that traditional methods of processing will no longer work: if it blindly processes every node it visits, it could end up in an infinite loop. Therefore, we need better structural recipes for our programs. In addition, graphs have a very rich structure, which lends itself to several interesting computations over them. We will study both these aspects of graphs below.

#### 19.1Understanding Graphs

Consider again the binary trees we saw earlier [Re-Examining Equality]. Let’s now try to distort the definition of a “tree” by creating ones with cycles, i.e., trees with nodes that point back to themselves (in the sense of identical). As we saw earlier [From Acyclicity to Cycles], it is not completely straightforward to create such a structure, but what we saw earlier [Streams From Functions] can help us here, by letting us suspend the evaluation of the cyclic link. That is, we have to not only use rec, we must also use a function to delay evaluation. In turn, we have to update the annotations on the fields. Since these are not going to be “trees” any more, we’ll use a name that is suggestive but not outright incorrect:
data BinT:
| leaf
| node(v, l :: ( -> BinT), r :: ( -> BinT))
end
Now let’s try to construct some cyclic values. Here are a few examples:
rec tr = node("rec", lam(): tr end, lam(): tr end)
t0 = node(0, lam(): leaf end, lam(): leaf end)
t1 = node(1, lam(): t0 end, lam(): t0 end)
t2 = node(2, lam(): t1 end, lam(): t1 end)
Now let’s try to compute the size of a BinT. Here’s the obvious program:
fun sizeinf(t :: BinT) -> Number:
cases (BinT) t:
| leaf => 0
| node(v, l, r) =>
ls = sizeinf(l())
rs = sizeinf(r())
1 + ls + rs
end
end
(We’ll see why we call it sizeinf in a moment.)

Do Now!

What happens when we call sizeinf(tr)?

It goes into an infinite loop: hence the inf in its name.

There are two very different meanings for “size”. One is, “How many times can we traverse an edge?” The other is, “How many distinct nodes were constructed as part of the data structure?” With trees, by definition, these two are the same. With a DAG the former exceeds the latter but only by a finite amount. With a general graph, the former can exceed the latter by an infinite amount. In the case of a datum like tr, we can in fact traverse edges an infinite number of times. But the total number of constructed nodes is only one! Let’s write this as test cases in terms of a size function, to be defined:
check:
size(tr) is 1
size(t0) is 1
size(t1) is 2
size(t2) is 3
end

It’s clear that we need to somehow remember what nodes we have visited previously: that is, we need a computation with “memory”. In principle this is easy: we just create an extra data structure that checks whether a node has already been counted. As long as we update this data structure correctly, we should be all set. Here’s an implementation.

fun sizect(t :: BinT) -> Number:
fun szacc(shadow t :: BinT, seen :: List<BinT>) -> Number:
if has-id(seen, t):
0
else:
cases (BinT) t:
| leaf => 0
| node(v, l, r) =>
ls = szacc(l(), ns)
rs = szacc(r(), ns)
1 + ls + rs
end
end
end
szacc(t, empty)
end

The extra parameter, seen, is called an accumulator, because it “accumulates” the list of seen nodes.Note that this could just as well be a set; it doesn’t have to be a list. The support function it needs checks whether a given node has already been seen:
fun has-id<A>(seen :: List<A>, t :: A):
cases (List) seen:
| empty => false
if f <=> t: true
else: has-id(r, t)
end
end
end

How does this do? Well, sizect(tr) is indeed 1, but sizect(t1) is 3 and sizect(t2) is 7!

Do Now!

Explain why these answers came out as they did.

The fundamental problem is that we’re not doing a very good job of remembering! Look at this pair of lines:
ls = szacc(l(), ns)
rs = szacc(r(), ns)

The nodes seen while traversing the left branch are effectively forgotten, because the only nodes we remember when traversing the right branch are those in ns: namely, the current node and those visited “higher up”. As a result, any nodes that “cross sides” are counted twice.

The remedy for this, therefore, is to remember every node we visit. Then, when we have no more nodes to process, instead of returning only the size, we should return all the nodes visited until now. This ensures that nodes that have multiple paths to them are visited on only one path, not more than once. The logic for this is to return two values from each traversal—the size and all the visited nodes—and not just one.

fun size(t :: BinT) -> Number:
fun szacc(shadow t :: BinT, seen :: List<BinT>)
-> {n :: Number, s :: List<BinT>}:
if has-id(seen, t):
{n: 0, s: seen}
else:
cases (BinT) t:
| leaf => {n: 0, s: seen}
| node(v, l, r) =>
ls = szacc(l(), ns)
rs = szacc(r(), ls.s)
{n: 1 + ls.n + rs.n, s: rs.s}
end
end
end
szacc(t, empty).n
end

Sure enough, this function satisfies the above tests.

#### 19.2Representations

The representation we’ve seen above for graphs is certainly a start towards creating cyclic data, but it’s not very elegant. It’s both error-prone and inelegant to have to write lam everywhere, and remember to apply functions to () to obtain the actual values. Therefore, here we explore other representations of graphs that are more conventional and also much simpler to manipulate.

There are numerous ways to represent graphs, and the choice of representation depends on several factors:
1. The structure of the graph, and in particular, its density. We will discuss this further later (Measuring Complexity for Graphs).

2. The representation in which the data are provided by external sources. Sometimes it may be easier to simply adapt to their representation; in particular, in some cases there may not even be a choice.

3. The features provided by the programming language, which make some representations much harder to use than others.

Previously [(part "sets")], we have explored the idea of having many different representations for one datatype. As we will see, this is very true of graphs as well. Therefore, it would be best if we could arrive at a common interface to process graphs, so that all later programs can be written in terms of this interface, without overly depending on the underlying representation.

In terms of representations, there are three main things we need:
1. A way to construct graphs.

2. A way to identify (i.e., tell apart) nodes or vertices in a graph.

3. Given a way to identify nodes, a way to get that node’s neighbors in the graph.

Any interface that satisfies these properties will suffice. For simplicity, we will focus on the second and third of these and not abstract over the process of constructing a graph.

Our running example will be a graph whose nodes are cities in the United States and edges are known direct flight connections between them, reminiscent of the route maps found in the back of airlines’ in-flight magazines.

Here’s our first representation. We will assume that every node has a unique name (such a name, when used to look up information in a repository of data, is sometimes called a key). A node is then a key, some information about that node, and a list of keys that refer to other nodes:
data KeyedNode:
| keyed-node(key :: String, content, adj :: List<String>)
end

type KNGraph = List<KeyedNode>

type Node = KeyedNode
type Graph = KNGraph
(Here we’re assuming our keys are strings.)

Here’s a concrete instance of such a graph:
kn-cities :: Graph = block:
knEWR = keyed-node("nwk", "Newark", [list: "chi", "den", "saf", "hou"])
knORD = keyed-node("chi", "Chicago", [list: "nwk", "saf"])
knWOS = keyed-node("wor", "Worcester", [list: ])
knHOU = keyed-node("hou", "Houston", [list: "nwk", "saf"])
knDEN = keyed-node("den", "Denver", [list: "nwk", "saf"])
knSFO = keyed-node("saf", "San Francisco", [list: "nwk", "den", "hou"])
[list: knEWR, knORD, knWOS, knHOU, knDEN, knSFO]
end

Given a key, here’s how we look up its neighbor:
fun find-kn(key :: Key, graph :: Graph) -> Node:
matches = for filter(n from graph):
n.key == key
end
matches.first # there had better be exactly one!
end

Exercise

Convert the comment in the function into an invariant about the datum. Express this invariant as a refinement and add it to the declaration of graphs.

With this support, we can look up neighbors easily:
fun kn-neighbors(city :: Key,  graph :: Graph) -> List<Key>:
city-node = find-kn(city, graph)
end

When it comes to testing, some tests are easy to write. Others, however, might require describing entire nodes, which can be unwieldy, so for the purpose of checking our implementation it suffices to examine just a part of the result:
check:
ns = kn-neighbors("hou", kn-cities)

ns is [list: "nwk", "saf"]

map(_.content, map(find-kn(_, kn-cities), ns)) is
[list: "Newark", "San Francisco"]
end

In some languages, it is common to use numbers as names. This is especially useful when numbers can be used to get access to an element in a constant amount of time (in return for having a bound on the number of elements that can be accessed). Here, we use a list—which does not provide constant-time access to arbitrary elements—to illustrate this concept. Most of this will look very similar to what we had before; we’ll comment on a key difference at the end.

First, the datatype:
data IndexedNode:
end

type IXGraph = List<IndexedNode>

type Node = IndexedNode
type Graph = IXGraph
Our graph now looks like this:
ix-cities :: Graph = block:
inEWR = idxed-node("Newark", [list: 1, 4, 5, 3])
inORD = idxed-node("Chicago", [list: 0, 5])
inWOS = idxed-node("Worcester", [list: ])
inHOU = idxed-node("Houston", [list: 0, 5])
inDEN = idxed-node("Denver", [list: 0, 5])
inSFO = idxed-node("San Francisco", [list: 0, 4, 3])
[list: inEWR, inORD, inWOS, inHOU, inDEN, inSFO]
end
where we’re assuming indices begin at 0. To find a node:
fun find-ix(idx :: Key, graph :: Graph) -> Node:
index(graph, idx)
end
We can then find neighbors almost as before:
fun ix-neighbors(city :: Key,  graph :: Graph) -> List<Key>:
city-node = find-ix(city, graph)
end
Finally, our tests also look similar:
check:
ns = ix-neighbors(3, ix-cities)

ns is [list: 0, 5]

map(_.content, map(find-ix(_, ix-cities), ns)) is
[list: "Newark", "San Francisco"]
end

Something deeper is going on here. The keyed nodes have intrinsic keys: the key is part of the datum itself. Thus, given just a node, we can determine its key. In contrast, the indexed nodes represent extrinsic keys: the keys are determined outside the datum, and in particular by the position in some other data structure. Given a node and not the entire graph, we cannot know for what its key is. Even given the entire graph, we can only determine its key by using identical, which is a rather unsatisfactory approach to recovering fundamental information. This highlights a weakness of using extrinsically keyed representations of information. (In return, extrinsically keyed representations are easier to reassemble into new collections of data, because there is no danger of keys clashing: there are no intrinsic keys to clash.)

##### 19.2.3A List of Edges

The representations we have seen until now have given priority to nodes, making edges simply a part of the information in a node. We could, instead, use a representation that makes edges primary, and nodes simply be the entities that lie at their ends:
data Edge:
| edge(src :: String, dst :: String)
end

type LEGraph = List<Edge>

type Graph = LEGraph
Then, our flight network becomes:
le-cities :: Graph =
[list:
edge("Newark", "Chicago"),
edge("Newark", "Denver"),
edge("Newark", "San Francisco"),
edge("Newark", "Houston"),
edge("Chicago", "Newark"),
edge("Chicago", "San Francisco"),
edge("Houston", "Newark"),
edge("Houston", "San Francisco"),
edge("Denver", "Newark"),
edge("Denver", "San Francisco"),
edge("San Francisco", "Newark"),
edge("San Francisco", "Denver"),
edge("San Francisco", "Houston") ]
To obtain the set of neighbors:
fun le-neighbors(city :: Key, graph :: Graph) -> List<Key>:
neighboring-edges = for filter(e from graph):
city == e.src
end
names = for map(e from neighboring-edges): e.dst end
names
end
And to be sure:
check:
le-neighbors("Houston", le-cities) is
[list: "Newark", "San Francisco"]
end
However, this representation makes it difficult to store complex information about a node without replicating it. Because nodes usually have rich information while the information about edges tends to be weaker, we often prefer node-centric representations. Of course, an alternative is to think of the node names as keys into some other data structure from which we can retrieve rich information about nodes.

##### 19.2.4Abstracting Representations

We would like a general representation that lets us abstract over the specific implementations. We will assume that broadly we have available a notion of Node that has content, a notion of Keys (whether or not intrinsic), and a way to obtain the neighbors—a list of keys—given a key and a graph. This is sufficient for what follows. However, we still need to choose concrete keys to write examples and tests. For simplicity, we’ll use string keys [Links by Name].

#### 19.3Measuring Complexity for Graphs

Before we begin to define algorithms over graphs, we should consider how to measure the size of a graph. A graph has two components: its nodes and its edges. Some algorithms are going to focus on nodes (e.g., visiting each of them), while others will focus on edges, and some will care about both. So which do we use as the basis for counting operations: nodes or edges?

It would help if we can reduce these two measures to one. To see whether that’s possible, suppose a graph has $$k$$ nodes. Then its number of edges has a wide range:
• No two nodes are connected. Then there are no edges at all.

• Every two nodes is connected. Then there are essentially as many edges as the number of pairs of nodes: approximately $$k^2$$.

In general, we call a graph with significantly fewer edges than nodes a sparse graph, while one with significantly more edges than nodes is called dense.

The number of nodes can thus be significantly less or even significantly more than the number of edges. Were this difference a matter of constants, we could have ignored it; but it’s not. For sparse graphs, the number of nodes dominates the number of edges by a factor of $$k$$ (or even infinity, if there truly are zero edges, but such graphs are usually not very interesting or difficult to process); for extremely dense graphs, too, the ratio is one of $$k$$, but in the other direction.

Therefore, when we want to speak of the complexity of algorithms over graphs, we have to consider the sizes of both the number of nodes and edges. In a connected graphA graph is connected if, from every node, we can traverse edges to get to every other node., however, there must be at least as many edges as nodes, which means the number of edges dominates the number of nodes. Since we are usually processing connected graphs, or connected parts of graphs one at a time, we can bound the number of nodes by the number of edges.

#### 19.4Reachability

Many uses of graphs need to address reachability: whether we can, using edges in the graph, get from one node to another. For instance, a social network might suggest as contacts all those who are reachable from existing contacts. On the Internet, traffic engineers care about whether packets can get from one machine to another. On the Web, we care about whether all public pages on a site are reachable from the home page. We will study how to compute reachability using our travel graph as a running example.

##### 19.4.1Simple Recursion

At its simplest, reachability is easy. We want to know whether there exists a pathA path is a sequence of zero or more linked edges. between a pair of nodes, a source and a destination. (A more sophisticated version of reachability might compute the actual path, but we’ll ignore this for now.) There are two possibilities: the source and destintion nodes are the same, or they’re not.
• If they are the same, then clearly reachability is trivially satisfied.

• If they are not, we have to iterate through the neighbors of the source node and ask whether the destination is reachable from each of those neighbors.

This translates into the following function:
<graph-reach-1-main> ::=
 fun reach-1(src :: Key, dst :: Key, g :: Graph) -> Boolean: if src == dst: true else: loop(neighbors(src, g)) end end
where the loop through the neighbors of src is:
<graph-reach-1-loop> ::=
 fun loop(ns): cases (List) ns: | empty => false | link(f, r) => if reach-1(f, dst, g): true else: loop(r) end end end
We can test this as follows:
<graph-reach-tests> ::=
 check: reach = reach-1 reach("nwk", "nwk", kn-cities) is true reach("nwk", "chi", kn-cities) is true reach("nwk", "wor", kn-cities) is false reach("nwk", "hou", kn-cities) is true reach("nwk", "den", kn-cities) is true reach("nwk", "saf", kn-cities) is true end
Unfortunately, we don’t find out about how these tests fare, because some of them don’t complete at all. That’s because we have an infinite loop, due to the cyclic nature of graphs!

Exercise

Which of the above examples leads to a cycle? Why?

##### 19.4.2Cleaning up the Loop

Before we continue, let’s try to improve the expression of the loop. While the nested function above is a perfectly reasonable definition, we can use Pyret’s for to improve its readability.

The essence of the above loop is to iterate over a list of boolean values; if one of them is true, the entire loop evaluates to true; if they are all false, then we haven’t found a path to the destination node, so the loop evaluates to false. Thus:
fun ormap(fun-body, l):
cases (List) l:
| empty => false
if fun-body(f): true else: ormap(fun-body, r) end
end
end
With this, we can replace the loop definition and use with:
for ormap(n from neighbors(src, g)):
reach-1(n, dst, g)
end

##### 19.4.3Traversal with Memory

Because we have cyclic data, we have to remember what nodes we’ve already visited and avoid traversing them again. Then, every time we begin traversing a new node, we add it to the set of nodes we’ve already started to visit so that. If we return to that node, because we can assume the graph has not changed in the meanwhile, we know that additional traversals from that node won’t make any difference to the outcome.This property is known as idempotence.

We therefore define a second attempt at reachability that take an extra argument: the set of nodes we have begun visiting (where the set is represented as a graph). The key difference from <graph-reach-1-main> is, before we begin to traverse edges, we should check whether we’ve begun processing the node or not. This results in the following definition:
<graph-reach-2> ::=
 fun reach-2(src :: Key, dst :: Key, g :: Graph, visited :: List) -> Boolean: if visited.member(src): false else if src == dst: true else: new-visited = link(src, visited) for ormap(n from neighbors(src, g)): reach-2(n, dst, g, new-visited) end end end
In particular, note the extra new conditional: if the reachability check has already visited this node before, there is no point traversing further from here, so it returns false. (There may still be other parts of the graph to explore, which other recursive calls will do.)

Exercise

Does it matter if the first two conditions were swapped, i.e., the beginning of reach-2 began with
if src == dst:
true
else if visited.member(src):
false
? Explain concretely with examples.

Exercise

We repeatedly talk about remembering the nodes that we have begun to visit, not the ones we’ve finished visiting. Does this distinction matter? How?

##### 19.4.4A Better Interface

As the process of testing reach-2 shows, we may have a better implementation, but we’ve changed the function’s interface; now it has a needless extra argument, which is not only a nuisance but might also result in errors if we accidentally misuse it. Therefore, we should clean up our definition by moving the core code to an internal function:
fun reach-3(s :: Key, d :: Key, g :: Graph) -> Boolean:
fun reacher(src :: Key, dst :: Key, visited :: List<Key>) -> Boolean:
if visited.member(src):
false
else if src == dst:
true
else:
for ormap(n from neighbors(src, g)):
reacher(n, dst, new-visited)
end
end
end
reacher(s, d, empty)
end
We have now restored the original interface while correctly implementing reachability.

Exercise

Does this really gives us a correct implementation? In particular, does this address the problem that the size function above addressed? Create a test case that demonstrates the problem, and then fix it.

It is conventional for computer science texts to call these depth- and breadth-first search. However, searching is just a specific purpose; traversal is a general task that can be used for many purposes.

The reachability algorithm we have seen above has a special property. At every node it visits, there is usually a set of adjacent nodes at which it can continue the traversal. It has at least two choices: it can either visit each immediate neighbor first, then visit all of the neighbors’ neighbors; or it can choose a neighbor, recur, and visit the next immediate neighbor only after that visit is done. The former is known as breadth-first traversal, while the latter is depth-first traversal.

The algorithm we have designed uses a depth-first strategy: inside <graph-reach-1-loop>, we recur on the first element of the list of neighbors before we visit the second neighbor, and so on. The alternative would be to have a data structure into which we insert all the neighbors, then pull out an element at a time such that we first visit all the neighbors before their neighbors, and so on. This naturally corresponds to a queue (An Example: Queues from Lists).

Exercise

Using a queue, implement breadth-first traversal.

If we correctly check to ensure we don’t re-visit nodes, then both breadth- and depth-first traversal will properly visit the entire reachable graph without repetition (and hence not get into an infinite loop). Each one traverses from a node only once, from which it considers every single edge. Thus, if a graph has $$N$$ nodes and $$E$$ edges, then a lower-bound on the complexity of traversal is $$O([N, E \rightarrow N + E])$$. We must also consider the cost of checking whether we have already visited a node before (which is a set membership problem, which we address elsewhere: (part "sets")). Finally, we have to consider the cost of maintaining the data structure that keeps track of our traversal. In the case of depth-first traversal, recursion—which uses the machine’s stackdoes it automatically at constant overhead. In the case of breadth-first traversal, the program must manage the queue, which can add more than constant overhead.In practice, too, the stack will usually perform much better than a queue, because it is supported by machine hardware.

This would suggest that depth-first traversal is always better than breadth-first traversal. However, breadth-first traversal has one very important and valuable property. Starting from a node $$N$$, when it visits a node $$P$$, count the number of edges taken to get to $$P$$. Breadth-first traversal guarantees that there cannot have been a shorter path to $$P$$: that is, it finds a shortest path to $$P$$.

Exercise

Why “a” rather than “the” shortest path?

Do Now!

Prove that breadth-first traversal finds a shortest path.

#### 19.6Graphs With Weighted Edges

Consider a transportation graph: we are usually interested not only in whether we can get from one place to another, but also in what it “costs” (where we may have many different cost measures: money, distance, time, units of carbon dioxide, etc.). On the Internet, we might care about the latency or bandwidth over a link. Even in a social network, we might like to describe the degree of closeness of a friend. In short, in many graphs we are interested not only in the direction of an edge but also in some abstract numeric measure, which we call its weight.

In the rest of this study, we will assume that our graph edges have weights. This does not invalidate what we’ve studied so far: if a node is reachable in an unweighted graph, it remains reachable in a weighted one. But the operations we are going to study below only make sense in a weighted graph.We can, however, always treat an unweighted graph as a weighted one by giving every edge the same, constant, positive weight (say one).

Exercise

When treating an unweighted graph as a weighted one, why do we care that every edge be given a positive weight?

Exercise

Revise the graph data definitions to account for edge weights.

Exercise

Weights are not the only kind of data we might record about edges. For instance, if the nodes in a graph represent people, the edges might be labeled with their relationship (“mother”, “friend”, etc.). What other kinds of data can you imagine recording for edges?

#### 19.7Shortest (or Lightest) Paths

Imagine planning a trip: it’s natural that you might want to get to your destination in the least time, or for the least money, or some other criterion that involves minimizing the sum of edge weights. This is known as computing the shortest path.

We should immediately clarify an unfortunate terminological confusion. What we really want to compute is the lightest path—the one of least weight. Unfortunately, computer science terminology has settled on the terminology we use here; just be sure to not take it literally.

Exercise

Construct a graph and select a pair of nodes in it such that the shortest path from one to the other is not the lightest one, and vice versa.

We have already seen (Depth- and Breadth-First Traversals) that breadth-first search constructs shortest paths in unweighted graphs. These correspond to lightest paths when there are no weights (or, equivalently, all weights are identical and positive). Now we have to generalize this to the case where the edges have weights.

We will proceed inductively, gradually defining a function seemingly of this type
w :: Key -> Number
that reflects the weight of the lightest path from the source node to that one. But let’s think about this annotation: since we’re building this up node-by-node, initially most nodes have no weight to report; and even at the end, a node that is unreachable from the source will have no weight for a lightest (or indeed, any) path. Rather than make up a number that pretends to reflect this situation, we will instead use an option type:
w :: Key -> Option<Number>
When there is some value it will be the weight; otherwise the weight will be none.

Now let’s think about this inductively. What do we know initially? Well, certainly that the source node is at a distance of zero from itself (that must be the lightest path, because we can’t get any lighter). This gives us a (trivial) set of nodes for which we already know the lightest weight. Our goal is to grow this set of nodes—modestly, by one, on each iteration—until we either find the destination, or we have no more nodes to add (in which case our desination is not reachable from the source).

Inductively, at each step we have the set of all nodes for which we know the lightest path (initially this is just the source node, but it does mean this set is never empty, which will matter in what we say next). Now consider all the edges adjacent to this set of nodes that lead to nodes for which we don’t already know the lightest path. Choose a node, $$q$$, that minimizes the total weight of the path to it. We claim that this will in fact be the lightest path to that node.

If this claim is true, then we are done. That’s because we would now add $$q$$ to the set of nodes whose lightest weights we now know, and repeat the process of finding lightest outgoing edges from there. This process has thus added one more node. At some point we will find that there are no edges that lead outside the known set, at which point we can terminate.

It stands to reason that terminating at this point is safe: it corresponds to having computed the reachable set. The only thing left is to demonstrate that this greedy algorithm yields a lightest path to each node.

We will prove this by contradiction. Suppose we have the path $$s \rightarrow d$$ from source $$s$$ to node $$d$$, as found by the algorithm above, but assume also that we have a different path that is actually lighter. At every node, when we added a node along the $$s \rightarrow d$$ path, the algorithm would have added a lighter path if it existed. The fact that it did not falsifies our claim that a lighter path exists (there could be a different path of the same weight; this would be permitted by the algorithm, but it also doesn’t contradict our claim). Therefore the algorithm does indeed find the lightest path.

What remains is to determine a data structure that enables this algorithm. At every node, we want to know the least weight from the set of nodes for which we know the least weight to all their neighbors. We could achieve this by sorting, but this is overkill: we don’t actually need a total ordering on all these weights, only the lightest one. A heap [REF] gives us this.

Exercise

What if we allowed edges of weight zero? What would change in the above algorithm?

Exercise

What if we allowed edges of negative weight? What would change in the above algorithm?

For your reference, this algorithm is known as Dijkstra’s Algorithm.

#### 19.8Moravian Spanning Trees

At the turn of the milennium, the US National Academy of Engineering surveyed its members to determine the “Greatest Engineering Achievements of the 20th Century”. The list contained the usual suspects: electronics, computers, the Internet, and so on. But a perhaps surprising idea topped the list: (rural) electrification.Read more about it on their site.

##### 19.8.1The Problem

To understand the history of national electrical grids, it helps to go back to Moravia in the 1920s. Like many parts of the world, it was beginning to realize the benefits of electricity and intended to spread it around the region. A Moravian academia named Otakar Borůvka heard about the problem, and in a remarkable effort, described the problem abstractly, so that it could be understood without reference to Moravia or electrical networks. He modeled it as a problem about graphs.

Borůvka observed that at least initially, any solution to the problem of creating a network must have the following characteristics:
• The electrical network must reach all the towns intended to be covered by it. In graph terms, the solution must be spanning, meaning it must visit every node in the graph.

• Redundancy is a valuable property in any network: that way, if one set of links goes down, there might be another way to get a payload to its destination. When starting out, however, redundancy may be too expensive, especially if it comes at the cost of not giving someone a payload at all. Thus, the initial solution was best set up without loops or even redundant paths. In graph terms, the solution had to be a tree.

• Finally, the goal was to solve this problem for the least cost possible. In graph terms, the graph would be weighted, and the solution had to be a minimum.

Thus Borůvka defined the Moravian Spanning Tree (MST) problem.

##### 19.8.2A Greedy Solution

Borůvka had published his problem, and another Czech mathematician, Vojtěch Jarník, came across it. Jarník came up with a solution that should sound familiar:
• Begin with a solution consisting of a single node, chosen arbitrarily. For the graph consisting of this one node, this solution is clearly a minimum, spanning, and a tree.

• Of all the edges incident on nodes in the solution that connect to a node not already in the solution, pick the edge with the least weight.Note that we consider only the incident edges, not their weight added to the weight of the node to which they are incident.

• Add this edge to the solution. The claim is that for the new solution will be a tree (by construction), spanning (also by construction), and a minimum. The minimality follows by an argument similar to that used for Dijkstra’s Algorithm.

Jarník had the misfortune of publishing this work in Czech in 1930, and it went largely ignored. It was rediscovered by others, most notably by R.C. Prim in 1957, and is now generally known as Prim’s Algorithm, though calling it Jarník’s Algorithm would attribute credit in the right place.

Implementing this algorithm is pretty easy. At each point, we need to know the lightest edge incident on the current solution tree. Finding the lightest edge takes time linear in the number of these edges, but the very lightest one may create a cycle. We therefore need to efficiently check for whether adding an edge would create a cycle, a problem we will return to multiple times (Checking Component Connectedness). Assuming we can do that effectively, we then want to add the lightest edge and iterate. Even given an efficient solution for checking cyclicity, this would seem to require an operation linear in the number of edges for each node. With better representations we can improve on this complexity, but let’s look at other ideas first.

##### 19.8.3Another Greedy Solution

Recall that Jarník presented his algorithm in 1930, when computers didn’t exist, and Prim his in 1957, when they were very much in their infancy. Programming computers to track heaps was a non-trivial problem, and many algorithms were implemented by hand, where keeping track of a complex data structure without making errors was harder still. There was need for a solution that was required less manual bookkeeping (literally speaking).

In 1956, Joseph Kruskal presented such a solution. His idea was elegantly simple. The Jarník algorithm suffers from the problem that each time the tree grows, we have to revise the content of the heap, which is already a messy structure to track. Kruskal noted the following.

To obtain a minimum solution, surely we want to include one of the edges of least weight in the graph. Because if not, we can take an otherwise minimal solution, add this edge, and remove one other edge; the graph would still be just as connected, but the overall weight would be no more and, if the removed edge were heavier, would be less.Note the careful wording: there may be many edges of the same least weight, so adding one of them may remove another, and therefore not produce a lighter tree; but the key point is that it certainly will not produce a heavier one. By the same argument we can add the next lightest edge, and the next lightest, and so on. The only time we cannot add the next lightest edge is when it would create a cycle (that problem again!).

Therefore, Kruskal’s algorithm is utterly straightforward. We first sort all the edges, ordered by ascending weight. We then take each edge in ascending weight order and add it to the solution provided it will not create a cycle. When we have thus processed all the edges, we will have a solution that is a tree (by construction), spanning (because every connected vertex must be the endpoint of some edge), and of minimum weight (by the argument above). The complexity is that of sorting (which is $$[e \rightarrow e \log e]$$ where $$e$$ is the size of the edge set. We then iterate over each element in $$e$$, which takes time linear in the size of that set—modulo the time to check for cycles. This algorithm is also easy to implement on paper, because we sort all the edges once, then keep checking them off in order, crossing out the ones that create cycles—with no dynamic updating of the list needed.

##### 19.8.4A Third Solution

Both the Jarník and Kruskal solutions have one flaw: they require a centralized data structure (the priority heap, or the sorted list) to incrementally build the solution. As parallel computers became available, and graph problems grew large, computer scientists looked for solutions that could be implemented more efficiently in parallel—which typically meant avoiding any centralized points of synchronization, such as these centralized data structures.

In 1965, M. Sollin constructed an algorithm that met these needs beautifully. In this algorithm, instead of constructing a single solution, we grow multiple solution components (potentially in parallel if we so wish). Each node starts out as a solution component (as it was at the first step of Jarník’s Algorithm). Each node considers the edges incident to it, and picks the lightest one that connects to a different component (that problem again!). If such an edge can be found, the edge becomes part of the solution, and the two components combine to become a single component. The entire process repeats.

Because every node begins as part of the solution, this algorithm naturally spans. Because it checks for cycles and avoids them, it naturally forms a tree.Note that avoiding cycles yields a DAG and is not automatically guaranteed to yield a tree. We have been a bit lax about this difference throughout this section. Finally, minimality follows through similar reasoning as we used in the case of Jarník’s Algorithm, which we have essentially run in parallel, once from each node, until the parallel solution components join up to produce a global solution.

Of course, maintaining the data for this algorithm by hand is a nightmare. Therefore, it would be no surprise that this algorithm was coined in the digital age. The real surprise, therefore, is that it was not: it was originally created by Otakar Borůvka himself.

Borůvka, you see, had figured it all out. He’d not only understood the problem, he had:
• pinpointed the real problem lying underneath the electrification problem so it could be viewed in a context-independent way,

• created a descriptive language of graph theory to define it precisely, and

• even solved the problem in addition to defining it.

He’d just come up with a solution so complex to implement by hand that Jarník had in essence de-parallelized it so it could be done sequentially. And thus this algorithm lay unnoticed until it was reinvented (several times, actually) by Sollin in time for parallel computing folks to notice a need for it. But now we can just call this Borůvka’s Algorithm, which is only fitting.

As you might have guessed by now, this problem is indeed called the MST in other textbooks, but “M” stands not for Moravia but for “Minimum”. But given Borůvka’s forgotten place in history, I prefer the more whimsical name.

##### 19.8.5Checking Component Connectedness

As we’ve seen, we need to be able to efficiently tell whether two nodes are in the same component. One way to do this is to conduct a depth-first traversal (or breadth-first traversal) starting from the first node and checking whether we ever visit the second one. (Using one of these traversal strategies ensures that we terminate in the presence of loops.) Unfortunately, this takes a linear amount of time (in the size of the graph) for every pair of nodesand depending on the graph and choice of node, we might do this for every node in the graph on every edge addition! So we’d clearly like to do this better.

It is helpful to reduce this problem from graph connectivity to a more general one: of disjoint-set structure (colloquially known as union-find for reasons that will soon be clear). If we think of each connected component as a set, then we’re asking whether two nodes are in the same set. But casting it as a set membership problem makes it applicable in several other applications as well.

The setup is as follows. For arbitrary values, we want the ability to think of them as elements in a set. We are interested in two operations. One is obviously union, which merges two sets into one. The other would seem to be something like is-in-same-set that takes two elements and determines whether they’re in the same set. Over time, however, it has proven useful to instead define the operator find that, given an element, “names” the set (more on this in a moment) that the element belongs to. To check whether two elements are in the same set, we then have to get the “set name” for each element, and check whether these names are the same. This certainly sounds more roundabout, but this means we have a primitive that may be useful in other contexts, and from which we can easily implement is-in-same-set.

Now the question is, how do we name sets? The real question we should ask is, what operations do we care to perform on these names? All we care about is, given two names, they represent the same set precisely when the names are the same. Therefore, we could construct a new string, or number, or something else, but we have another option: simply pick some element of the set to represent it, i.e., to serve as its name. Thus we will associate each set element with an indicator of the “set name” for that element; if there isn’t one, then its name is itself (the none case of parent):
data Element<T>:
| elt(val :: T, parent :: Option<Element>)
end
We will assume we have some equality predicate for checking when two elements are the same, which we do by comparing their value parts, ignoring their parent values:
fun is-same-element(e1, e2): e1.val <=> e2.val end

Do Now!

Why do we check only the value parts?

We will assume that for a given set, we always return the same representative element. (Otherwise, equality will fail even though we have the same set.) Thus:We’ve used the name fynd because find is already defined to mean something else in Pyret.
fun is-in-same-set(e1 :: Element, e2 :: Element, s :: Sets)
-> Boolean:
s1 = fynd(e1, s)
s2 = fynd(e2, s)
identical(s1, s2)
end
where Sets is the list of all elements:
type Sets = List<Element>

How do we find the representative element for a set? We first find it using is-same-element; when we do, we check the element’s parent field. If it is none, that means this very element names its set; this can happen either because the element is a singleton set (we’ll initialize all elements with none), or it’s the name for some larger set. Either way, we’re done. Otherwise, we have to recursively find the parent:
fun fynd(e :: Element, s :: Sets) -> Element:
cases (List) s:
| empty => raise("fynd: shouldn't have gotten here")
if is-same-element(f, e):
cases (Option) f.parent:
| none => f
| some(p) => fynd(p, s)
end
else:
fynd(e, r)
end
end
end

Exercise

Why is this recursive in the nested cases?

What’s left is to implement union. For this, we find the representative elements of the two sets we’re trying to union; if they are the same, then the two sets are already in a union; otherwise, we have to update the data structure:
fun union(e1 :: Element, e2 :: Element, s :: Sets) -> Sets:
s1 = fynd(e1, s)
s2 = fynd(e2, s)
if identical(s1, s2):
s
else:
update-set-with(s, s1, s2)
end
end
To update, we arbitrarily choose one of the set names to be the name of the new compound set. We then have to update the parent of the other set’s name element to be this one:
fun update-set-with(s :: Sets, child :: Element, parent :: Element)
-> Sets:
cases (List) s:
| empty => raise("update: shouldn't have gotten here")
if is-same-element(f, child):
else:
end
end
end
Here are some tests to illustrate this working:
check:
s0 = map(elt(_, none), [list: 0, 1, 2, 3, 4, 5, 6, 7])
s1 = union(index(s0, 0), index(s0, 2), s0)
s2 = union(index(s1, 0), index(s1, 3), s1)
s3 = union(index(s2, 3), index(s2, 5), s2)
print(s3)
is-same-element(fynd(index(s0, 0), s3), fynd(index(s0, 5), s3)) is true
is-same-element(fynd(index(s0, 2), s3), fynd(index(s0, 5), s3)) is true
is-same-element(fynd(index(s0, 3), s3), fynd(index(s0, 5), s3)) is true
is-same-element(fynd(index(s0, 5), s3), fynd(index(s0, 5), s3)) is true
is-same-element(fynd(index(s0, 7), s3), fynd(index(s0, 7), s3)) is true
end

Unfortunately, this implementation suffers from two major problems:
• First, because we are performing functional updates, the value of the parent reference keeps “changing”, but these changes are not visible to older copies of the “same” value. An element from different stages of unioning has different parent references, even though it is arguably the same element throughout. This is a place where functional programming hurts.

• Relatedly, the performance of this implementation is quite bad. fynd recursively traverses parents to find the set’s name, but the elements traversed are not updated to record this new name. We certainly could update them by reconstructing the set afresh each time, but that complicates the implementation and, as we will soon see, we can do much better.

The bottom line is that pure functional programming is not a great fit with this problem. We need a better implementation strategy: Disjoint Sets Redux.