9 Sharing and Equality
9.1 The Dawn of a New Equality
data BinTree:
| leaf
| node(v, l :: BinTree, r :: BinTree)
end
a-tree =
node(5,
node(4, leaf, leaf),
node(4, leaf, leaf))
b-tree =
block:
four-node = node(4, leaf, leaf)
node(5,
four-node,
four-node)
end
check: |
a-tree is b-tree |
a-tree.l is a-tree.l |
a-tree.l is a-tree.r |
b-tree.l is b-tree.r |
end |
However, there is another sense in which these trees are not equivalent. concretely, a-tree constructs a distinct node for each child, while b-tree uses the same node for both children. Surely this difference should show up somehow, but we have not yet seen a way to write a program that will tell these apart.
check:
identical(a-tree, b-tree) is false
identical(a-tree.l, a-tree.l) is true
identical(a-tree.l, a-tree.r) is false
identical(b-tree.l, b-tree.r) is true
end
check:
a-tree is b-tree
identical(a-tree, b-tree) is false
a-tree.l is a-tree.r
identical(a-tree.l, a-tree.r) is false
end
9.2 The Cost of Evaluating References
From a complexity viewpoint, it’s important for us to understand how these references work. As we have hinted, four-node is computed only once, and each use of it refers to the same value: if, instead, it was evaluated each time we referred to four-node, there would be no real difference between a-tree and b-tree, and the above tests would not distinguish between them.
L = range(0, 100)
L1 = link(1, L)
L2 = link(-1, L)
check:
identical(L1.rest, L) is true
identical(L2.rest, L) is true
identical(L1.rest, L2.rest) is true
end
9.3 On the Internet, Nobody Knows You’re a DAG
Despite the name we’ve given it, b-tree is not actually a tree. In a tree, by definition, there are no shared nodes, whereas in b-tree the node named by four-node is shared by two parts of the tree. Despite this, traversing b-tree will still terminate, because there are no cyclic references in it: if you start from any node and visit its “children”, you cannot end up back at that node. There is a special name for a value with such a shape: directed acyclic graph (DAG).
Many important data structures are actually a DAG underneath. For instance, consider Web sites. It is common to think of a site as a tree of pages: the top-level refers to several sections, each of which refers to sub-sections, and so on. However, sometimes an entry needs to be cataloged under multiple sections. For instance, an academic department might organize pages by people, teaching, and research. In the first of these pages it lists the people who work there; in the second, the list of courses; and in the third, the list of research groups. In turn, the courses might have references to the people teaching them, and the research groups are populated by these same people. Since we want only one page per person (for both maintenance and search indexing purposes), all these personnel links refer back to the same page for people.
data Content:
| page(s :: String)
| section(title :: String, sub :: List<Content>)
end
people-pages :: Content =
section("People",
[ page("Church"),
page("Dijkstra"),
page("Haberman") ])
fun get-person(n): index(people-pages.sub, n);
theory-pages :: Content =
section("Theory", [get-person(0), get-person(1)])
systems-pages :: Content =
section("Systems", [get-person(1), get-person(2)])
site :: Content =
section("Computing Sciences",
[theory-pages, systems-pages])
check:
theory = index(site.sub, 0)
systems = index(site.sub, 1)
theory-dijkstra = index(theory.sub, 1)
systems-dijkstra = index(systems.sub, 0)
identical(theory-dijkstra, systems-dijkstra) is true
end
9.4 Constructing Data Values With Sharing
One of the annoying things about defining values with sharing, as we have above, is that we must manually sequence their definition: for instance, if we move the definition of site before that of theory-pages, we get an error, even though as readers of this program we might have preferred to read the site’s definition more hierarchically. As data grow larger, this becomes a nuisance that it would be better to leave to the language to manage.
graph: |
Gsite :: Content = |
section("Computing Sciences", |
[Gtheory-pages, Gsystems-pages]) |
Gtheory-pages :: Content = |
section("Theory", [get-person(0), get-person(1)]) |
Gsystems-pages :: Content = |
section("Systems", [get-person(1), get-person(2)]) |
Gpeople-pages :: Content = |
section("People", |
[ page("Church"), |
page("Dijkstra"), |
page("Haberman") ]) |
end |
check:
theory = index(Gsite.sub, 0)
systems = index(Gsite.sub, 1)
theory-dijkstra = index(theory.sub, 1)
systems-dijkstra = index(systems.sub, 0)
identical(theory-dijkstra, systems-dijkstra) is true
end
9.5 Cyclic Data
Earlier we talked about having to move data definitions one before another being a nuisance, and offered graph as a convenient solution. However, we glossed over this difficulty a little too quickly: is it only an inconvenience?
Actually, the problem can be much worse: if two definitions refer to one another, there is no suitable order for them!
graph:
W = link("white", G)
G = link("grey", W)
end
web-colors :: List<String> = W
check:
web-colors.first is "white"
web-colors.rest.first is "grey"
end
check:
web-colors.rest.rest.first is "white"
web-colors.rest.rest.rest.first is "grey"
end
check:
identical(web-colors, web-colors.rest.rest) is true
identical(web-colors.rest, web-colors.rest.rest.rest) is true
end
check:
(web-colors == web-colors.rest) is false
(web-colors == web-colors.rest.rest) is true
end
What happens if we invoke web-colors.length()?
9.6 Cyclic Lists Versus Streams
web-color-stream :: Stream<String> =
lz-link("white", fun():
lz-link("grey", fun():
web-color-stream;);)
check:
take(4, web-color-stream) is ["white", "grey", "white", "grey"]
end
Use graph to define ones.
Use graph to define nats and fibs.
Cyclic lists have on their side simplicity: for instance, they do not require the use of thunks. This simplicity comes at a price: they can only represent strictly repeating data, i.e., you cannot define nats or fibs as cyclic lists. In contrast, the function abstraction in a stream makes the function generative: each invocation can create a truly novel datum. Therefore, it is straightforward to implement cyclic lists as streams (we have already seen two examples).
That would seem to suggest that we could do away with cyclic lists. Indeed, if all graph could create was lists, this argument would hold. However, as we have already seen (<site-def-graph>), graph operates over all forms of structured data. As we will soon see (Graphs), we will create cyclic values using other structured data as well. Though we could define these data using thunks too, doing so would impose an enormous complexity on programs. Nevertheless, it is worth remembering that first-class functions have tremendous power (Shrinking the Language [EMPTY]).