On this page:
4.1 A Canonical Mutable Structure
4.2 What it Means to be Identical
4.3 From Identifiers to Variables
4.4 Interaction of Mutation with Closures:   Counters
4.4.1 Implementation Using Boxes
4.4.2 Implementation Using Variables
4.5 Stateful but Equal
4.5.1 A Hierarchy of Equality
4.5.2 Space and Time Complexity
4.5.2.1 Comparing Functions

4 State and Change

    4.1 A Canonical Mutable Structure

    4.2 What it Means to be Identical

    4.3 From Identifiers to Variables

    4.4 Interaction of Mutation with Closures: Counters

      4.4.1 Implementation Using Boxes

      4.4.2 Implementation Using Variables

    4.5 Stateful but Equal

      4.5.1 A Hierarchy of Equality

      4.5.2 Space and Time Complexity

        4.5.2.1 Comparing Functions

4.1 A Canonical Mutable Structure

As we have motivated ((part "union-find-functional")), sometimes it’s nice to be able to change the value of a datum rather than merely construct a new one with an updated value. The main advantage to changing it is that every value that refers to it can now see this change. The main disadvantage to changing it is that every value that refers to it can now see this change. Using this power responsibly is therefore an important programming challenge.

To put this idea in the simplest light, let us consider the simplest kind of mutable datum: one that has only one field. We call this a box, and treat it as a fresh container type. Boxes will support just three operations:
  1. box consumes a value and creates a mutable box containing that value.

  2. unbox consumes a box and returns the value contained in the box.

  3. set-box consumes a box, a new value, and changes the box to contain the value. All subsequent unboxes of that box will now return the new value.

In a typed language, we would require that the new value put in the box by set-box be type-consistent with what was there before. (Even in a language that isn’t statically typed, we would presumably expect the same to keep the programming process sane.) You can thus think of a box as equivalent to a Java container class with parameterized type, which has a single member field with a getter and setter: box is the constructor, unbox is the getter, and set-box is the setter. (Because there is only one field, its name is irrelevant.)

class Box<T> {

    private T the_value;

    Box(T v) {

        this.the_value = v;

    }

    T get() {

        return this.the_value;

    }

    void set(T v) {

        this.the_value = v;

    }

}

Correspondingly, here is a definition of a box in Pyret:

data Box: | box(ref v) where: n1 = box(1) n2 = box(2) n1!{v : 3} n2!{v : 4} n1!v is 3 n2!v is 4 end

Notice that in Pyret, because values are immutable by default, we have to explicitly declare the v field to be mutable using ref; mutable fields must be accessed using ! rather than ., the dot operator.The reason for using a different syntax is to warn the programmer that the value obtained from this field may change over time, so they should not make assumptions about its longevity. Because Pyret is single-threaded, however, they can assume the value will stay unchanged until either the next mutation in the same procedure, or until the next call to another procedure or return from this procedure—whichever comes soonest.

Do Now!

Why do we say “type-consistent” above, rather than “the same type”?

The values could be related by subtyping ((part "subtyping")).

Exercise

What does the comment about longevity mean? How does this apply to reusing values extracted from fields?

4.2 What it Means to be Identical

Now that we have seen mutation, we’re in a position to understand the meaning of Pyret’s identical operation (The Dawn of a New Equality). Consider a check of these two boxes and three references to them:
<three-boxes> ::=

    b0 = box("a value")

    b1 = box("a value")

    b2 = b1

Observe that b1 and b2 are referring to the same box, while b0 is referring to a different one. We can see this because the following tests pass:

check: b0!v == b1!v is true identical(b0, b1) is false identical(b1, b2) is true b1!v is b2!v end

In other words, b1 and b2 are aliases for each other: they are two different names for one and the same value. In contrast, neither name is an alias for the value referred to by b0.Since identical is transitive, it follows from the above two checks that b0 is not identical to b2.

Until now, this distinction had limited relevance. However, mutation introduces a wrinkle: if we change one value, all aliases to that value detect the change. We will perform this mutation in a slightly elaborate way, which we’ll explain in a moment:
<modify-b1> ::=

    hold-b1-value = b1!v

    b1!{v: "a different value"}

Observe that if you just copy these definitions and tests one after the other in your editor, tests that should succeed will fail. This is because Pyret moves all test blocks to the end of the program, so the checks are not running in the source location where they are written. Until now, this made not a whit of difference. Now that we have mutation, it makes a world of difference. Therefore, you will need to erase old tests as you add new code that modifies state. Because that is the point of state: statements that were previously true no longer are. With this, we now have:

check: b0!v == b1!v is false identical(b0, b1) is false identical(b1, b2) is true b1!v is b2!v end

after which we can restore the value:

b1!{v: hold-b1-value}

Continuing in this vein, symmetrically, we can also modify the value bound to b2 and make sure the change is visible via b1:

hold-b2-value = b2!v b2!{v: "yet another value"} check: b0!v == b1!v is false identical(b0, b1) is false identical(b1, b2) is true b1!v is b2!v b2!{v: hold-b2-value} end

In contrast, if we modify the content of the box referred to by b0, the only fact that held previously that no longer holds is this:

b0!{v: "yet another value"} check: b0!v == b1!v is false end

Thus, whereas previously the two boxes were structurally the same but not identical, now the are not even the former (while still not being the latter).

Now, why did we bother holding on to and restoring the value? It’s because at the end of each of these sequences, the values have all been restored, but by making the change, we have been able to observe which other names detect the change and which ones do not—in other words, which names are aliases and which are not—in still other words, we have implemented the essence of identical.

In practice, identical does not behave this way: it would be too disruptive—e.g., if this fake value was saved to persistent storage and then the system crashed—and it would anyway be observable if the system had multiple threads. It is also not the most efficient implementation possible. Nevertheless, it does demonstrate the basic idea behind identical: two values are identical precisely when, when you make changes to one, you see the changes manifest on the “other” (i.e., there is really only one value).

4.3 From Identifiers to Variables

As we have just seen, mutable values can be aliased, which means references can inadvertently have their values changed. Because these values can be passed around, it can be difficult to track all the aliases that might exist (because it would be infeasible for a value to retain “backward references”).

Therefore, in Pyret as in most languages, there is another form of mutable, called a variable. A variable is an identifier whose binding can be changed (as opposed to re-bound in a new scope); in Pyret, the syntax for changing the value of a variable is :=. Furthermore, variables must be declared explicitly by preceding their declaration with var:

var x = 0 x := 1

The var keyword forces you to understand that the = is not a true equality: it’s equal for now, but may not be in the future.

We’ve insisted on using the word “identifier” before because we wanted to reserve “variable” for what we’re about to study. In Java, when we say (assuming x is locally bound, e.g., as a method parameter)

x = 1;

x = 3;

we’re asking to change the value of x. After the first assignment, the value of x is 1; after the second one, it’s 3. Thus, the value of x varies over the course of the execution of the method.

Now, we also use the term “variable” in mathematics to refer to function parameters. For instance, in f(y) = y+3 we say that y is a “variable”. That is called a variable because it varies across invocations; however, within each invocation, it has the same value in its scope. Our identifiers until now have corresponded to this mathematical notion of a variable.If the identifier was bound to a box, then it remained bound to the same box value. It’s the content of the box that changed, not which box the identifier was bound to. In contrast, programming variables can vary even within each invocation, like the Java x above.

Henceforth, we will use variable when we mean an identifier whose value can change within its scope, and identifier when this cannot happen. If in doubt, we might play it safe and use “variable”; if the difference doesn’t really matter, we might use either one. It is less important to get caught up in these specific terms than to understand that they represent a distinction that matters (Mutation: Structures and Variables).

4.4 Interaction of Mutation with Closures: Counters

Suppose we want to create a function that counts how many times it has been invoked. Further, let us assume we can create new counters on need. Thus, mk-counter creates a fresh counter each time, each of which maintains its own count history. A sample use might look like this:

check: l1 = mk-counter() l1() is 1 l1() is 2 l2 = mk-counter() l2() is 1 l1() is 3 l2() is 2 end

Notice that each invocation of mk-counter makes a fresh counter, so the counters created by two separate invocations do not interfere with one another.

We now see how we can implement this using both mutable structures (specifically, boxes) and variables.

4.4.1 Implementation Using Boxes

Here is the implementation:

fun mk-counter(): ctr = box(0) lam(): ctr!{v : (ctr!v + 1)} ctr!v end end

Why does this work? It’s because each invocation of mk-counter creates a box only once, which it binds to ctr. The closure closes over this one box. All subsequent mutations affect the same box. In contrast, swapping two lines makes a big difference:

fun mk-broken-counter(): lam(): ctr = box(0) ctr!{v : (ctr!v + 1)} ctr!v end where: l1 = mk-broken-counter() l1() is 1 l1() is 1 l2 = mk-broken-counter() l2() is 1 l1() is 1 l2() is 1 end

In this case, a new box is allocated on every invocation, not of mk-broken-counter but of the function that it returns, so the answer each time is the same (despite the mutation inside the procedure). Our implementation of boxes should be certain to preserve this distinction.

The examples above hint at an implementation necessity. Clearly, whatever the environment closes over in the procedure returned by mk-counter must refer to the same box each time. Yet something also needs to make sure that the value in that box is different each time! Look at it more carefully: it must be lexically the same, but dynamically different. This distinction will be at the heart of a strategy for implementing state (Mutation: Structures and Variables).

4.4.2 Implementation Using Variables

The implementation using variables is virtually identical:

fun mk-counter(): var ctr = 0 lam(): ctr := ctr + 1 ctr end where: l1 = mk-counter() l1() is 1 l1() is 2 l2 = mk-counter() l2() is 1 l1() is 3 l2() is 2 end

And sure enough, if we swap the same critical two lines, we get the wrong behavior:

fun mk-broken-counter(): lam(): var ctr = 0 ctr := ctr + 1 ctr end where: l1 = mk-broken-counter() l1() is 1 l1() is 1 l2 = mk-broken-counter() l2() is 1 l1() is 1 l2() is 1 end

4.5 Stateful but Equal

Until now we have seen two notions of equality:
  • The binary operator ==, which is also used as the equality comparison by in when testing.

  • identical, also written as <=>.

However, we haven’t discussed exactly what == means, and it turns out there are two things it could mean, leaving us with three different notions of equality. To see this, refresh your memory with our three boxes.

We have already covered the meaning of identical. However, there is an intuitive level where it is unsatisfying: when our notion of equality is, “When printed (as output) or written (as input), would these two values look the same?”, where box("a value") and box("a value") are the “same”. That is, it would be nice to have an equality operator—call it E1such that

check: E1(b0, b1) is true E1(b1, b2) is true end

because all three look the same when written out as values.

However, as we just saw (What it Means to be Identical), these equivalences can be ephemeral. When we modify b1 (see above), clearly these no longer print identically:

> b0

box("a value")

> b1

box("a different value")

> b2

box("a different value")

so we would expect

check: E1(b0, b1) is false E1(b1, b2) is true end

There is in fact such an operator in Pyret: it is called equal-now, and written as a binary operator as =~ (the ~ is meant to be suggestive of hand-waving, because the value is equal now, but you shouldn’t assume it will be in the future). As the name and visual suggest, this is a fragile operator: you should not write programs that assume anything about the long-term equality of values that are equal at this moment, in case the values are mutable. However, it can still be useful to know whether, at this instant, two values will, in effect, “print the same”.

Exercise

Confirm that equal-now does indeed have the properties ascribed to E1 above.

However, this still does not enable us to distinguish between == and identical:

check: (b0 == b1) is false (b1 == b2) is true identical(b0, b1) is false identical(b1, b2) is true end

For that, it helps to have a slightly richer structure:

b0 = box("a value") b1 = box("a value") b2 = b1 l0 = [list: b0] l1 = [list: b1] l2 = [list: b2]

Note that we are allocating three different lists, though two of them share the same mutable box. Now we find something interesting. Unsurprisingly, the following is true:

check: identical(l0, l1) is false identical(l1, l2) is false end

while

check: equal-now(l0, l1) is true equal-now(l1, l2) is true end

However:

check: (l0 == l1) is false (l1 == l2) is true end

What might == represent that is interestingly different from both identical and equal-now? When it returns true, it is that the two values will “print the same” now and forever. How is this possible? It is because == recursively checks that the two arguments are structural until it gets to a mutable field; at that point, it checks that they are identical. If they are identical, then any change made to one will be reflected in the other (because they are in fact the same mutable field). That means their content, too, will always “print the same”. Therefore, we can now reveal the name given to ==: it is equal-always.

4.5.1 A Hierarchy of Equality

Observe that if two values v1 and v2 are equal-now, they are not necessarily equal-always; if they are equal-always, they are not necessarily identical. We have seen examples of both these cases above.

In contrast, if two values are identical, then they are certainly going to be equal-always. That is because their mutable fields reduce to identical, while the immutable parts—which will be traversed structurally—are guaranteed to yield equality, being in fact the same value. In turn, if they satisfy equal-always they are guaranteed to be equal-now, because the only difference is that equal-now structurally traverses the content of mutable fields, but if these are identical (as they must be, to be equal-always), they are certain to be structurally equal.

In most languages, it is common to have two equality operators, corresponding to identical (known as reference equality) and equal-now (known as structural equality). Pyret is rare in having a third operator, equal-always. For most programs, this is in fact the most useful equality operator: it is not overly bothered with details of aliasing, which can be difficult to predict; at the same time it makes decisions that stand the test of time, thereby forming a useful basis for various optimizations (which may not even be conscious of their temporal assumptions). This is why is in testing uses equal-always by default, and forces users to explicitly pick a different primitive if they want it.

4.5.2 Space and Time Complexity

identical always takes constant time. Indeed, some programs use identical precisely because they want constant-time equality, carefully structuring their program so that values that should be considered equal are aliases to the same value. Of course, maintaining this programming discipline is tricky.

equal-always and equal-now both must traverse at least the immutable part of data. Therefore, they take time proportional to the smaller datum (because if the two data are of different size, they must not be equal anyway, so there is no need to visit the extra data). The difference is that equal-always reduces to identical at references, thereby performing less computation than equal-now would.

For some programs, the cost of checking equality may be considerable. There are two common strategies such a program can employ:
  1. Use a quick check first, use the slower check only if the quick one fails. This corresponds to the following strategy:

    fun my-eq(v1, v2) -> Boolean: identical(v1, v2) or equal-always(v1, v2) end

    which has the following behavior: { check: my-eq(b0, b1) is false my-eq(b1, b2) is true my-eq(l0, l1) is false my-eq(l1, l2) is true end } It is worth noting that the above behavior is exactly the same as that of equal-always, which is the point: it is just often faster, when it can discharge the equality using identical without having to traverse the data. (Observe that this is a safe optimization because identical implies equal-always.)

  2. Use a different equality strategy entirely, if possible: Set Membership Again.

4.5.2.1 Comparing Functions

identical

To understand this, suppose we mutate the value in the box bound to b1 (or, equivalently, b2):

b1!{v: "a different value"}

We have discussed (The Dawn of a New Equality) the difference between these two operators. However, we have not exhausted the kinds of equality we need, and that Pyret provides.