On this page:
9.1 The Dawn of a New Equality
9.2 The Cost of Evaluating References
9.3 On the Internet, Nobody Knows You’re a DAG
9.4 Constructing Data Values With Sharing
9.5 Cyclic Data
9.6 Cyclic Lists Versus Streams

9 Sharing and Equality

    9.1 The Dawn of a New Equality

    9.2 The Cost of Evaluating References

    9.3 On the Internet, Nobody Knows You’re a DAG

    9.4 Constructing Data Values With Sharing

    9.5 Cyclic Data

    9.6 Cyclic Lists Versus Streams

9.1 The Dawn of a New Equality

Consider the following data definition and example values:

data BinTree:

  | leaf

  | node(v, l :: BinTree, r :: BinTree)

end

 

a-tree =

  node(5,

    node(4, leaf, leaf),

    node(4, leaf, leaf))

 

b-tree =

  block:

    four-node = node(4, leaf, leaf)

    node(5,

      four-node,

      four-node)

  end

In particular, it might seem that the way we’ve written b-tree is morally equivalent to how we’ve written a-tree, but we’ve created a helpful binding to avoid code duplication.

Because both a-tree and b-tree are bound to trees with 5 at the root and a left and right child each containing 4, we can indeed reasonably consider these trees equivalent. Sure enough:
<equal-tests> ::=

    check:

      a-tree is b-tree

      a-tree.l is a-tree.l

      a-tree.l is a-tree.r

      b-tree.l is b-tree.r

    end

However, there is another sense in which these trees are not equivalent. concretely, a-tree constructs a distinct node for each child, while b-tree uses the same node for both children. Surely this difference should show up somehow, but we have not yet seen a way to write a program that will tell these apart.

The way we can tell them apart is using Pyret’s identical function. This checks not only whether two values are structurally equivalent but whether they are the same value—i.e., the constructed at the same instant in time. With this, we can now write additional tests:

check:

  identical(a-tree, b-tree) is false

  identical(a-tree.l, a-tree.l) is true

  identical(a-tree.l, a-tree.r) is false

  identical(b-tree.l, b-tree.r) is true

end

Observe how these are the same values that were compared earlier (<equal-tests>), but the results are now different: some values that were true earlier are now false. In particular,

check:

  a-tree is b-tree

  identical(a-tree, b-tree) is false

  a-tree.l is a-tree.r

  identical(a-tree.l, a-tree.r) is false

end

We return to what identical really means later (What it Means to be Identical).

9.2 The Cost of Evaluating References

From a complexity viewpoint, it’s important for us to understand how these references work. As we have hinted, four-node is computed only once, and each use of it refers to the same value: if, instead, it was evaluated each time we referred to four-node, there would be no real difference between a-tree and b-tree, and the above tests would not distinguish between them.

Here’s another contrived example. Suppose we define

L = range(0, 100)

L1 = link(1, L)

L2 = link(-1, L)

Constructing a list clearly takes time at least proportional to the length; therefore, we expect the time to compute L to be considerably more than that for a single link operation. Therefore, the question is how long it takes to compute L1 and L2 after L has been computed: constant time, or time proportional to the length of L?

The answer, for Pyret, and for other call-by-value languages [REF] (including Java, C#, OCaml, Racket, etc.), is that these additional computations take constant time. That is, the value bound to L is computed once and bound to L; subsequent expressions refer to this value rather than reconstructing it. Thus:

check:

  identical(L1.rest, L) is true

  identical(L2.rest, L) is true

  identical(L1.rest, L2.rest) is true

end

9.3 On the Internet, Nobody Knows You’re a DAG

Despite the name we’ve given it, b-tree is not actually a tree. In a tree, by definition, there are no shared nodes, whereas in b-tree the node named by four-node is shared by two parts of the tree. Despite this, traversing b-tree will still terminate, because there are no cyclic references in it: if you start from any node and visit its “children”, you cannot end up back at that node. There is a special name for a value with such a shape: directed acyclic graph (DAG).

Many important data structures are actually a DAG underneath. For instance, consider Web sites. It is common to think of a site as a tree of pages: the top-level refers to several sections, each of which refers to sub-sections, and so on. However, sometimes an entry needs to be cataloged under multiple sections. For instance, an academic department might organize pages by people, teaching, and research. In the first of these pages it lists the people who work there; in the second, the list of courses; and in the third, the list of research groups. In turn, the courses might have references to the people teaching them, and the research groups are populated by these same people. Since we want only one page per person (for both maintenance and search indexing purposes), all these personnel links refer back to the same page for people.

Let’s construct a simple form of this. First a datatype to represent a site’s content:

data Content:

  | page(s :: String)

  | section(title :: String, sub :: List<Content>)

end

Let’s now define a few people:

people-pages :: Content =

  section("People",

    [ page("Church"),

      page("Dijkstra"),

      page("Haberman") ])

and a way to extract a particular person’s page:

fun get-person(n): index(people-pages.sub, n);

Now we can define theory and systems sections:

theory-pages :: Content =

  section("Theory", [get-person(0), get-person(1)])

systems-pages :: Content =

  section("Systems", [get-person(1), get-person(2)])

which are integrated into a site as a whole:

site :: Content =

  section("Computing Sciences",

    [theory-pages, systems-pages])

Now we can confirm that each of these luminaries needs to keep only one Web page current; for instance:

check:

  theory = index(site.sub, 0)

  systems = index(site.sub, 1)

  theory-dijkstra = index(theory.sub, 1)

  systems-dijkstra = index(systems.sub, 0)

  identical(theory-dijkstra, systems-dijkstra) is true

end

9.4 Constructing Data Values With Sharing

One of the annoying things about defining values with sharing, as we have above, is that we must manually sequence their definition: for instance, if we move the definition of site before that of theory-pages, we get an error, even though as readers of this program we might have preferred to read the site’s definition more hierarchically. As data grow larger, this becomes a nuisance that it would be better to leave to the language to manage.

Pyret therefore provides a special mechanism for conveniently defining instances of shared data: graph. For instance, we can write the above example as follows (we’ll add the prefix G to every definition to avoid name-clashes and to distinguish between the data created by the two different mechanisms):
<site-def-graph> ::=

    graph:

    Gsite :: Content =

      section("Computing Sciences",

        [Gtheory-pages, Gsystems-pages])

    Gtheory-pages :: Content =

      section("Theory", [get-person(0), get-person(1)])

    Gsystems-pages :: Content =

      section("Systems", [get-person(1), get-person(2)])

    Gpeople-pages :: Content =

      section("People",

        [ page("Church"),

          page("Dijkstra"),

          page("Haberman") ])

    end

Observe that we have almost completely reversed the order of definition. Despite that, not only does this program not signal any errors, our previous test adapted to these definitions also works:

check:

  theory = index(Gsite.sub, 0)

  systems = index(Gsite.sub, 1)

  theory-dijkstra = index(theory.sub, 1)

  systems-dijkstra = index(systems.sub, 0)

  identical(theory-dijkstra, systems-dijkstra) is true

end

Notice that all the names defined “inside” graph are available beyond it. Therefore, graph does not define a separate scoping block; rather, the names defined in it are available at the same level of scope as where graph was written.This is why graph does not indent its contents.

9.5 Cyclic Data

Do Now!

Earlier we talked about having to move data definitions one before another being a nuisance, and offered graph as a convenient solution. However, we glossed over this difficulty a little too quickly: is it only an inconvenience?

Actually, the problem can be much worse: if two definitions refer to one another, there is no suitable order for them!

Suppose, for instance, we are constructing tabular output for a Web page. We would like the rows of the table to alternate between white and grey. If the table had only two rows, we could map the row-generating function over a list of these two colors. Since we do not know how many rows it will have, however, we would like the list to be as long as necessary. Here is how we can define this list using graph:

graph:

W = link("white", G)

G = link("grey", W)

end

web-colors :: List<String> = W

where web-colors is bound to a cyclic list. First, it behaves as we would expect if it were simply defined as a regular list:

check:

  web-colors.first is "white"

  web-colors.rest.first is "grey"

end

But in addition, we can keep traversing further and further down:

check:

  web-colors.rest.rest.first is "white"

  web-colors.rest.rest.rest.first is "grey"

end

Despite this, the list does not consume an arbitrary amount of memory in the machine:

check:

  identical(web-colors, web-colors.rest.rest) is true

  identical(web-colors.rest, web-colors.rest.rest.rest) is true

end

Furthermore, in languages like Pyret and Racket that are designed to cope with cyclic data, the following is also true:

check:

  (web-colors == web-colors.rest) is false

  (web-colors == web-colors.rest.rest) is true

end

Note that asking whether the structure of two cyclic data is identical did not cause the language to go into an infinite loop.

Exercise

What happens if we invoke web-colors.length()?

9.6 Cyclic Lists Versus Streams

Hopefully, the definition of web-colors above reminded you of similar data structures you have seen earlier, such as ones (Streams From Functions). Indeed, using streams, here is another definition of the Web color list:

web-color-stream :: Stream<String> =

  lz-link("white", fun():

      lz-link("grey", fun():

          web-color-stream;);)

Sure enough:

check:

  take(4, web-color-stream) is ["white", "grey", "white", "grey"]

end

Therefore, we should ask what the difference is between cyclic lists and streams; in particular, if one can subsume the other, perhaps we don’t need one of the concepts.

Do Now!

Use graph to define ones.

Do Now!

Use graph to define nats and fibs.

Cyclic lists have on their side simplicity: for instance, they do not require the use of thunks. This simplicity comes at a price: they can only represent strictly repeating data, i.e., you cannot define nats or fibs as cyclic lists. In contrast, the function abstraction in a stream makes the function generative: each invocation can create a truly novel datum. Therefore, it is straightforward to implement cyclic lists as streams (we have already seen two examples).

That would seem to suggest that we could do away with cyclic lists. Indeed, if all graph could create was lists, this argument would hold. However, as we have already seen (<site-def-graph>), graph operates over all forms of structured data. As we will soon see (Graphs), we will create cyclic values using other structured data as well. Though we could define these data using thunks too, doing so would impose an enormous complexity on programs. Nevertheless, it is worth remembering that first-class functions have tremendous power (Shrinking the Language [EMPTY]).