### 11` `State and Change

#### 11.1` `A Canonical Mutable Structure

As we have motivated (Checking Component Connectedness), sometimes it’s nice to be able to change the value of a datum rather than merely construct a new one with an updated value. The main advantage to changing it is that every value that refers to it can now see this change. The main disadvantage to changing it is that every value that refers to it can now see this change. Using this power responsibly is therefore an important programming challenge.

box consumes a value and creates a mutable box containing that value.

unbox consumes a box and returns the value contained in the box.

set-box consumes a box, a new value, and changes the box to contain the value. All subsequent unboxes of that box will now return the new value.

class Box<T> { |

private T the_value; |

Box(T v) { |

this.the_value = v; |

} |

T get() { |

return this.the_value; |

} |

void set(T v) { |

this.the_value = v; |

} |

} |

data Box:

| box (mutable v)

where:

n1 = box(1)

n2 = box(2)

n1!{v : 3}

n2!{v : 4}

n1!v is 3

n2!v is 4

end

Why do we say “type-consistent” above, rather than “the same type”?

The values could be related by subtyping (Subtyping).

What does the comment about longevity mean? How does this apply to reusing values extracted from fields?

#### 11.2` `Implementing Disjoint Sets with Mutation

Here’s how we can use this to implement union-find afresh. We will try to keep things as similar to the previous version (Checking Component Connectedness) as possible, to enhance comparison.

data Element:

| elt (val, mutable parent :: Option<Element>)

end

fun is-in-same-set(e1 :: Element, e2 :: Element) -> Bool:

s1 = fynd(e1)

s2 = fynd(e2)

identical(s1, s2)

end

fun update-set-with(child :: Element, parent :: Element) -> Nothing:

child!{parent: some(parent)}

end

fun union(e1 :: Element, e2 :: Element):

s1 = fynd(e1)

s2 = fynd(e2)

if identical(s1, s2):

s1

else:

update-set-with(s1, s2);

end

fun fynd(e :: Element) -> Element:

cases (Option) e!parent:

| none => e

| some(p) => fynd(p);

end

##### 11.2.1` `Optimizations

Look again at fynd. In the some case, the element bound to e is not the set name; that is obtained by recursively traversing parent references. As this value returns, however, we don’t do anything to reflect this new knowledge! Instead, the next time we try to find the parent of this element, we’re going to perform this same recursive traversal all over again.

fun fynd(e :: Element) -> Element:

cases (Option) e!parent:

| none => e

| some(p) =>

new-parent = fynd(p)

e!{parent: some(new-parent)}

new-parent;

end

There is one more interesting idea we can apply. This is to maintain a rank of each element, which is roughly the depth of the tree of elements for which that element is their set name. When we union two elements, we then make the one with larger rank the parent of the one with the smaller rank. This has the effect of avoiding growing very tall paths to set name elements, instead tending towards “bushy” trees. This too reduces the number of parents that must be traversed to find the representative.

##### 11.2.2` `Analysis

This optimized union-find data structure has a remarkble analysis. In
the worst case, of course, we must traverse the entire chain of
parents to find the name element, which takes time proportional to the
number of elements in the set. However, once we apply the above
optimizations, we never need to traverse that same chain again! In
particular, if we conduct an amortized analysis over a sequence
of set equality tests after a collection of union operations, we find
that the cost for subsequent checks is very small—

#### 11.3` `What it Means to be Identical

b0 = box("a value")

b1 = box("a value")

b2 = b1

b0!v == b1!v is true

identical(b0, b1) is false

identical(b1, b2) is true

b1!v is b2!v

hold-b1-value = b1!v

b1!{v: "a different value"}

b0!v == b1!v is false

identical(b0, b1) is false

identical(b1, b2) is true

b1!v is b2!v

b1!{v: hold-b1-value}

hold-b2-value = b2!v

b2!{v: "yet another value"}

b0!v == b1!v is false

identical(b0, b1) is false

identical(b1, b2) is true

b1!v is b2!v

b2!{v: hold-b2-value}

b0!v == b1!v is false

Now, why did we bother holding on to and restoring the value? It’s
because at the end of each of these sequences, the values have all been
restored, but by making the change, we have been able to observe which
other names detect the change and which ones do not—

In practice, identical does not behave this way: it would be
too disruptive—

#### 11.4` `From Identifiers to Variables

As we have just seen, mutable values can be aliased, which means references can inadvertently have their values changed. Because these values can be passed around, it can be difficult to track all the aliases that might exist (because it would be infeasible for a value to retain “backward references”).

Therefore, in Pyret as in most languages, there is another form of mutable, called a variable. A variable is simply an identifier whose binding can be changed (as opposed to re-bound in a new scope); in Pyret, the syntax for changing the value of a variable is :=.

x = 1; |

x = 3; |

Now, we also use the term “variable” in mathematics to refer to function parameters. For instance, in f(y) = y+3 we say that y is a “variable”. That is called a variable because it varies across invocations; however, within each invocation, it has the same value in its scope. Our identifiers until now have corresponded to this mathematical notion of a variable.If the identifier was bound to a box, then it remained bound to the same box value. It’s the content of the box that changed, not which box the identifier was bound to. In contrast, programming variables can vary even within each invocation, like the Java x above.

Henceforth, we will use variable when we mean an identifier whose value can change within its scope, and identifier when this cannot happen. If in doubt, we might play it safe and use “variable”; if the difference doesn’t really matter, we might use either one. It is less important to get caught up in these specific terms than to understand that they represent a distinction that matters (Mutation: Structures and Variables).

#### 11.5` `Interaction of Mutation with Closures: Counters

check:

l1 = mk-counter()

l1() is 1

l1() is 2

l2 = mk-counter()

l2() is 1

l1() is 3

l2() is 2

end

We now see how we can implement this using both mutable structures (specifically, boxes) and variables.

##### 11.5.1` `Implementation Using Boxes

fun mk-counter():

ctr = box(0)

fun():

ctr!{v : (ctr!v + 1)}

ctr!v

end

end

fun mk-broken-counter():

fun():

ctr = box(0)

ctr!{v : (ctr!v + 1)}

ctr!v

end

where:

l1 = mk-broken-counter()

l1() is 1

l1() is 1

l2 = mk-broken-counter()

l2() is 1

l1() is 1

l2() is 1

end

The examples above hint at an implementation necessity. Clearly, whatever the environment closes over in the procedure returned by mk-counter must refer to the same box each time. Yet something also needs to make sure that the value in that box is different each time! Look at it more carefully: it must be lexically the same, but dynamically different. This distinction will be at the heart of a strategy for implementing state (Mutation: Structures and Variables).

##### 11.5.2` `Implementation Using Variables

fun mk-counter():

var ctr = 0

fun():

ctr := ctr + 1

ctr

end

end

check:

l1 = mk-counter()

l1() is 1

l1() is 2

l2 = mk-counter()

l2() is 1

l1() is 3

l2() is 2

end

fun mk-broken-counter():

fun():

var ctr = 0

ctr := ctr + 1

ctr

end

where:

l1 = mk-broken-counter()

l1() is 1

l1() is 1

l2 = mk-broken-counter()

l2() is 1

l1() is 1

l2() is 1

end

#### 11.6` `Set Membership Again

We have already seen solutions to set membership. First we saw how to
represent sets as lists (Representing Sets by Lists), then as
(balanced) binary trees (A Fine Balance: Tree Surgery).Don’t
confuse this with union-find, which is a different kind of problem on
sets (Implementing Disjoint Sets with Mutation). With this we were able to reduce
insertion and membership to logarithmic time in the number of
elements. Along the way, we also learned that the essence of using
these representations was to reduce any datatype to a comparable,
ordered element—

Let us now ask whether we can use these numbers in any other way. Suppose our set has only five elements, which map densely to the values between 0 and 4. We can then have a five element list of boolean values, where the boolean at each index of the list indicates whether the element corresponding to that position is in the set or not. Both membership and insertion, however, require traversing potentially the entire list, giving us solutions linear in the number of elements.

That’s not all. Unless we can be certain that there will be only five elements, we can’t be sure to bound the size of the representation. Also, we haven’t yet shown how to actually hash in a way that makes the representation dense; barring that, our space consumption gets much worse, in turn affecting time.

There is, actually, a relatively simple solution to the problem of reducing numbers densely to a range: given the hash, we apply modular arithmetic. That is, if we want to use a list of five elements to represent the set, we simply compute the hash’s modulo five. This gives us an easy solution to that problem.

[true, false, false, false, false]

[5, false, false, false, false]

[[5], empty, empty, empty, empty]

[[5, 10], empty, empty, empty, empty]

Good; now we have another way of representing sets so we can check for membership. However, in the worst case one of those lists is going to contain all elements in the set, and we may have to traverse the entire list to find an element in it, which means membership testing will take time linear in the number of elements. Insertion, in turn, takes time proportional to the size of the modulus because we may have to traverse the entire outer list to get to the right sub-list.

Can we improve on this?

##### 11.6.1` `Improving Access Time

Given that we currently have no way of ensuring we won’t get hash collisions, for now we’re stuck with a list of elements at each position that could be the size of the set we are trying to represent. Therefore, we can’t get around that (yet). But, we’re currently paying time in the size of the outer list just to insert an element, and surely we can do better than that!

Accessing the \(n\)th element of an array takes constant, not linear, time in \(n\). This is sometimes known as random-access, because it takes the same time to access any random element, as opposed to just a known element.

Arrays are updated by mutation. Thus, a change to an array is seen by all references to the array.

SIZE = 19

v = array-of(empty, SIZE)

fun find-bucket(n): n.modulo(SIZE);

fun get-bucket(n): array-get(v, find-bucket(n));

fun is-in(n): get-bucket(n).member(n);

fun set-bucket(n, new): array-set(v, find-bucket(n), new);

fun put(n):

when not(is-in(n)):

set-bucket(n, link(n, get-bucket(n)));

end

What impact do duplicate elements have on the complexity of operations?

The data structure we have defined above is known as a hash table (which is a slightly confusing name, because it isn’t really a table of hashes, but this is the name used conventionally in computer science).

##### 11.6.2` `Better Hashing

Using arrays therefore appears to address one issue: insertion. Finding the relevant bucket takes constant time, linking the new element takes constant time, and so the entire operation takes constant time...except, we have to also check whether the element is already in the bucket, to avoid storing duplicates. We have gotten rid of the traversal through the outer list representing the set, but the member operation on the inner list remains unchanged. In principle it won’t, but in practice we can make it much better.

Note that collisions are virtually inevitable. If we have uniformly distributed data, then collisions show up sooner than we might expect.This follows from the reasoning behind what is known as the birthday problem, commonly presented as how many people need to be in a room before the likelihood that two of them share a birthday exceeds some percentage. For the likelihood to exceed half we need just 23 people! Therefore, it is wise to prepare for the possibility of collisions.

The key is to know something about the distribution of hash values. For instance, if we knew our hash values are all multiples of 10, then using a table size of 10 would be a terrible idea (because all elements would hash to the same bucket, turning our hash table into a list). In practice, it is common to use uncommon prime numbers as the table size, since a random value is unlikely to have it as a divisor. This does not yield a theoretical improvement (unless you can make certain assumptions about the input, or work through the math very carefully), but it works well in practice. In particular, since the typical hashing function uses memory addresses for objects on the heap, and on most systems these addresses are multiples of 4, using a prime like 31 is often a fairly good bet.

##### 11.6.3` `Bloom Filters

Another way to improve the space and time complexity is to relax the
properties we expect of the operations. Right now, set membership
gives perfect answers, in that it answers true exactly when the
element being checked was previously inserted into the set. But
suppose we’re in a setting where we can accept a more relaxed notion
of correctness, where membership tests can “lie” slightly in one
direction or the other (but not both, because that makes the
representation almost useless). Specifically, let’s say that “no
means no” (i.e., if the set representation says the element isn’t
present, it really isn’t) but “yes sometimes means no” (i.e., if the
set representation says an element is present, sometimes it
might not be). In short, if the set says the element isn’t in it, this
should be guaranteed; but if the set says the element is present,
it may not be. In the latter case, we either need some
other—

Where is such a data structure of use? Suppose we are building a Web site that uses password-based authentication. Because many passwords have been leaked in well-publicized breaches, it is safe to assume that hackers have them and will guess them. As a result, we want to not allow users to select any of these as passwords. We could use a hash-table to reject precisely the known leaked passwords. But for efficiency, we could use this imperfect hash instead. If it says “no”, then we allow the user to use that password. But if it says “yes”, then either they are using a password that has been leaked, or they have an entirely different password that, purely by accident, has the same hash value, but no matter; we can just disallow that password as well.A related use is for filtering out malicious Web sites. The URL shortening system, bitly, uses it for this purpose.

Another example is in updating databases or memory stores. Suppose we have a database of records, which we update frequently. It is often more efficient to maintain a journal of changes: i.e., a list that sequentially records all the changes that have occurred. At some interval (say overnight), the journal is “flushed”, meaning all these changes are applied to the database proper. But that means every read operation has become highly inefficient, because it has to check the entire journal first (for updates) before accessing the database. Again, here we can use this faulty notion of a hash table: if the hash of the record locator says “no”, then the record certainly hasn’t been modified and we go directly to the database; if it says “yes” then we have to check the journal.

We have already seen a simple example implementation of this idea earlier, when we used a single list (or array) of booleans, with modular arithmetic, to represent the set. When the set said 4 was not present; this was absolutely true; but when it said 5 and 10 are both present, only one of these was present. The advantage was a huge saving in space and time: we needed only one bit per bucket, and did not need to search through a list to answer for membership. The downside, of course, was a hugely inaccurate set data structure, and one with correlated failure tied to the modulus.

There is a simple way to improve this solution: instead of having just one array, have several (but a fixed number of them). When an element is added to the set, it is added to each array; when checking for membership, every array is consulted. The set only answers affirmatively to membership if all the arrays do so.

Naturally, using multiple arrays offers absolutely no advantage if the arrays are all the same size: since both insertion and lookup are deterministic, all will yield the same answer. However, there is a simple antidote to this: use different array sizes. In particular, by using array sizes that are relatively prime to one another, we minimize the odds of a clash (only hashes that are the product of all the array sizes will fool the array).

This data structure, called a Bloom Filter, is a probabilistic data structure. Unlike our earlier set data structure, this one is not guaranteed to always give the right answer; but we get to save both space and time by changing the problem slightly and accepting incorrect answers. If we know something about the distribution of hash values, and we have some acceptable bound of error, we can design hash table sizes so that with high probability, the Bloom Filter will lie within the acceptable error bounds.

#### 11.7` `Avoiding Recomputation by Remembering Answers

We have on several instances already referred to a ☛ space-time tradeoff. The most obvious tradeoff is when a computation “remembers” prior results and, instead of recomputing them, looks them up and returns the answers. This is an instance of the tradeoff because it uses space (to remember prior answers) in place of time (recomputing the answer). Let’s see how we can write such computations.

##### 11.7.1` `An Interesting Numeric Sequence

Suppose we want to create properly-parenthesized expressions, and ignore all non-parenthetical symbols. How many ways are there of creating parenthesized expressions given a certain number of opening (equivalently, closing) parentheses?

If we have zero opening parentheses, the only expression we can create is the empty expression. If we have one opening parenthesis, the only one we can construct is “()” (there must be a closing parenthesis since we’re interested only in properly-parenthesized expressions). If we have two opening parentheses, we can construct “(())” and “()()”. Given three, we can construct “((()))”, “(())()”, “()(())”, “()()()”, and “(()())”, for a total of five. And so on. Observe that the solutions at each level use all the possible solutions at one level lower, combined in all the possible ways.

fun catalan(n):

if n == 0: 1

else if n > 0:

for fold(acc from 0, k from range(0, n)):

acc + (catalan(k) * catalan(n - 1 - k));

end

end

check: |

catalan(0) is 1 |

catalan(1) is 1 |

catalan(2) is 2 |

catalan(3) is 5 |

catalan(4) is 14 |

catalan(5) is 42 |

catalan(6) is 132 |

catalan(7) is 429 |

catalan(8) is 1430 |

catalan(9) is 4862 |

catalan(10) is 16796 |

catalan(11) is 58786 |

end |

Check at what value you start to observe a significant slowdown on your machine. Plot the graph of running time against input size. What does this suggest?

The reason the Catalan computation takes so long is precisely because of what we alluded to earlier: at each level, we depend on computing the Catalan number of all the smaller levels; this computation in turn needs the numbers of all of its smaller levels; and so on down the road.

Map the subcomputations of catalan to see why the computation time explodes as it does. What is the asymptotic time complexity of this function?

##### 11.7.1.1` `Using State to Remember Past Answers

Therefore, this is clearly a case where trading space for time is likely to be of help. How do we do this? We need a notion of memory that records all previous answers and, on subsequent attempts to compute them, checks whether they are already known and, if so, just returns them instead of recomputing them.

What critical assumption is this based on?

Naturally, this assumes that for a given input, the answer will
always be the same. As we have seen, functions with state violate
this liberally, so typical stateful functions cannot utilize this
optimization. Ironically, we will use state to implement this
optimization, so we will have a stateful function that always returns
the same answer on a given input—

data MemoryCell:

| mem(in, out)

end

var memory :: List<MemoryCell> = []

fun catalan(n :: Number) -> Number:

answer = list.find(fun(elt): elt.in == n;, memory)

cases (Option) answer:

| none =>

result =

if n == 0: 1

else if n > 0:

for fold(acc from 0, k from range(0, n)):

acc + (catalan(k) * catalan(n - 1 - k));

end

memory := link({in: n, out: result}, memory)

result

| some(v) => v.out

end

end

This process, of converting a function into a version that remembers its past answers, is called memoization.

##### 11.7.1.2` `From a Tree of Computation to a DAG

What we have subtly done is to convert a tree of computation into a
DAG over the same computation, with equivalent calls being
reused. Whereas previously each call was generating lots of recursive
calls, which induced still more recursive calls, now we are reusing
previous recursive calls—

This has an important complexity benefit. Whereas previously we were
performing a super-exponential number of calls, now we perform only
one call per input and share all previous calls—

##### 11.7.1.3` `The Complexity of Numbers

As we start to run larger computations, however, we may start to
notice that our computations are starting to take longer than linear
growth. This is because our numbers are growing arbitrarily
large—

##### 11.7.1.4` `Abstracting Memoization

Now we’ve achieved the desired complexity improvement, but there is still something unsatisfactory about the structure of our revised definition of catalan: the act of memoization is deeply intertwined with the definition of a Catalan number, even though these should be intellectually distinct. Let’s do that next.

In effect, we want to separate our program into two parts. One part defines a general notion of memoization, while the other defines catalan in terms of this general notion.

fun<T, U> memoize-1(f :: (T -> U)) -> (T -> U):

data MemoryCell:

| mem(in, out)

end

var memory :: List<MemoryCell> = []

fun(n):

answer = list.find(fun(elt): elt.in == n;, memory)

cases (Option) answer:

| none =>

result = f(n)

memory := link({in: n, out: result}, memory)

result

| some(v) => v.out

end

end

end

catalan =

memoize-1(

fun(n):

if n == 0: 1

else if n > 0:

for fold(acc from 0, k from range(0, n)):

acc + (catalan(k) * catalan(n - 1 - k));

end

end)

We don’t write fun catalan(...): ...; because the procedure bound to catalan is produced by memoize-1.

Note carefully that the recursive calls to catalan have to be to the function bound to the result of memoization, thereby behaving like an object (Objects). Failing to refer to this same shared procedure means the recursive calls will not be memoized, thereby losing the benefit of this process.

Each invocation of memoize-1 creates a new table of stored results. Therefore the memoization of different functions will each get their own tables rather than sharing tables, which is a bad idea!

Why is sharing memoization tables a bad idea? Be concrete.

##### 11.7.2` `Edit-Distance for Spelling Correction

Text editors, word processors, mobile phones, and various other devices now routinely implement spelling correction or offer suggestions on (mis-)spellings. How do they do this? Doing so requires two capabilities: computing the distance between words, and finding words that are nearby according to this metric. In this section we will study the first of these questions. (For the purposes of this discussion, we will define a “word” to be any list of characters.)

Think about how you might define the “distance between two words”. Does it define a metric space?

Will the definition we give below define a metric space over the set of words?

That the distance from a word to itself be zero.

That the distance from a word to any word other than itself be strictly positive. (Otherwise, given a word that is already in the dictionary, the “correction” might be a different dictionary word.)

That the distance between two words be symmetric, i.e., it shouldn’t matter in which order we pass arguments.

Observe that we have not included the triangle inequality relative to the properties of a metric. Why not? If we don’t need the triangle inequality, does this let us define more interesting distance functions that are not metrics?

we left out a character;

we typed a character twice; or,

we typed one character when we meant another.

There are several variations of this definition possible. For now, we will consider the simplest one, which assumes that each of these errors has equal cost. For certain input devices, we may want to assign different costs to these mistakes; we might also assign different costs depending on what wrong character was typed (two characters adjacent on a keyboard are much more likely to be a legitimate error than two that are far apart). We will return briefly to some of these considerations later (Nature as a Fat-Fingered Typist).

check: |

levenshtein([], []) is 0 |

levenshtein(["x"], ["x"]) is 0 |

levenshtein(["x"], ["y"]) is 1 |

# one of about 600 |

levenshtein( |

["b", "r", "i", "t", "n", "e", "y"], |

["b", "r", "i", "t", "t", "a", "n", "y"]) |

is 3 |

# http://en.wikipedia.org/wiki/Levenshtein_distance |

levenshtein( |

["k", "i", "t", "t", "e", "n"], |

["s", "i", "t", "t", "i", "n", "g"]) |

is 3 |

levenshtein( |

["k", "i", "t", "t", "e", "n"], |

["k", "i", "t", "t", "e", "n"]) |

is 0 |

# http://en.wikipedia.org/wiki/Levenshtein_distance |

levenshtein( |

["S", "u", "n", "d", "a", "y"], |

["S", "a", "t", "u", "r", "d", "a", "y"]) |

is 3 |

# http://www.merriampark.com/ld.htm |

levenshtein( |

["g", "u", "m", "b", "o"], |

["g", "a", "m", "b", "o", "l"]) |

is 2 |

# http://www.csse.monash.edu.au/~lloyd/tildeStrings/Alignment/92.IPL.html |

levenshtein( |

["a", "c", "g", "t", "a", "c", "g", "t", "a", "c", "g", "t"], |

["a", "c", "a", "t", "a", "c", "t", "t", "g", "t", "a", "c", "t"]) |

is 4 |

levenshtein( |

[ "s", "u", "p", "e", "r", "c", "a", "l", "i", |

"f", "r", "a", "g", "i", "l", "i", "s", "t" ], |

[ "s", "u", "p", "e", "r", "c", "a", "l", "y", |

"f", "r", "a", "g", "i", "l", "e", "s", "t" ]) |

is 2 |

end |

levenshtein :: (List<String>, List<String> -> Number) = |

if is-empty(s) and is-empty(t): 0 |

else if is-empty(s): t.length() |

else if is-empty(t): s.length() |

else: |

if s.first == t.first: |

levenshtein(s.rest, t.rest) |

else: |

min3( |

1 + levenshtein(s.rest, t), |

1 + levenshtein(s, t.rest), |

1 + levenshtein(s.rest, t.rest));; |

fun min3(a :: Number, b :: Number, c :: Number):

a.min(b).min(c)

end

This algorithm will indeed pass all the tests we have written above, but with a problem: the running time grows exponentially. That is because, each time we find a mismatch, we recur on three subproblems. In principle, therefore, the algorithm takes time proportional to three to the power of the length of the shorter word. In practice, any prefix that matches causes no branching, so it is mismatches that incur branching (thus, confirming that the distance of a word with itself is zero only takes time linear in the size of the word).

Observe, however, that many of these subproblems are the same. For instance, given “kitten” and “sitting”, the mismatch on the initial character will cause the algorithm to compute the distance of “itten” from “itting” but also “itten” from “sitting” and “kitten” from “itting”. Those latter two distance computations will also involve matching “itten” against “itting”. Thus, again, we want the computation tree to turn into a DAG of expressions that are actually evaluated.

fun<T, U, V> memoize-2(f :: (T, U -> V)) -> (T, U -> V):

data MemoryCell<T, U, V>:

| mem(in-1 :: T, in-2 :: U, out :: V)

end

var memory :: List<MemoryCell<T, U, V>> = []

fun(p, q):

answer = list.find(

fun(elt): (elt.in-1 == p) and (elt.in-2 == q);,

memory)

cases (Option) answer:

| none =>

result = f(p, q)

memory :=

link({in-1: p, in-2: q, out: result}, memory)

result

| some(v) => v.out

end

end

end

levenshtein :: (List<String>, List<String> -> Number) = |

memoize-2( |

fun(s, t): |

if is-empty(s) and is-empty(t): 0 |

else if is-empty(s): t.length() |

else if is-empty(t): s.length() |

else: |

if s.first == t.first: |

levenshtein(s.rest, t.rest) |

else: |

min3( |

1 + levenshtein(s.rest, t), |

1 + levenshtein(s, t.rest), |

1 + levenshtein(s.rest, t.rest));; |

end) |

The complexity of this algorithm is still non-trivial. First, let’s introduce the term suffix: the suffix of a string is the rest of the string starting from any point in the string. (Thus “kitten”, “itten”, “ten”, “n”, and “” are all suffixes of “kitten”.) Now, observe that in the worst case, starting with every suffix in the first word, we may need to perform a comparison against every suffix in the second word. Fortunately, for each of these suffixes we perform a constant computation relative to the recursion. Therefore, the overall time complexity of computing the distance between strings of length \(m\) and \(n\) is \(O(m \cdot n)\). (We will return to space consumption later [REF].)

Modify the above algorithm to produce an actual (optimal) sequence of edit operations. This is sometimes known as the traceback.

##### 11.7.3` `Nature as a Fat-Fingered Typist

We have talked about how to address mistakes made by humans. However, humans are not the only bad typists: nature is one, too!

When studying living matter we obtain sequences of amino acids and
other such chemicals that comprise molecules, such as DNA, that hold
important and potentially determinative information about the
organism. These sequences consist of similar fragments that we wish to
identify because they represent relationships in
the organism’s behavior or evolution.This section may
need to be skipped in
some states and countries.
Unfortunately, these sequences are never identical: like all
low-level programmers, nature slips up and sometimes makes mistakes in
copying (called—

The only difference between traditional presentations Levenshtein and Smith-Waterman is something we alluded to earlier: why is every edit given a distance of one? Instead, in the Smith-Waterman presentation, we assume that we have a function that gives us the gap score, i.e., the value to assign every character’s alignment, i.e., scores for both matches and edits, with scores driven by biological considerations. Of course, as we have already noted, this need is not peculiar to biology; we could just as well use a “gap score” to reflect the likelihood of a substitution based on keyboard characteristics.

##### 11.7.4` `Dynamic Programming

We have used memoization as our canonical means of saving the values of past computations to reuse later. There is another popular technique for doing this called dynamic programming. This technique is closely related to memoization; indeed, it can be viewed as the dual method for achieving the same end. First we will see dynamic programming at work, then discuss how it differs from memoization.

Dynamic programming also proceeds by building up a memory of answers, and looking them up instead of recomputing them. As such, it too is a process for turning a computation’s shape from a tree to a DAG of actual calls. The key difference is that instead of starting with the largest computation and recurring to smaller ones, it starts with the smallest computations and builds outward to larger ones.

We will revisit our previous examples in light of this approach.

##### 11.7.4.1` `Catalan Numbers with Dynamic Programming

MAX-CAT = 11

answers :: Array<Option<Number>> = array-of(none, MAX-CAT + 1)

fun catalan(n):

cases (Option) array-get(answers, n):

| none => raise("looking at uninitialized value")

| some(v) => v

end

end

fun fill-catalan(upper):

array-set(answers, 0, some(1))

when upper > 0:

for map(n from range(1, upper + 1)):

block:

cat-at-n =

for fold(acc from 0, k from range(0, n)):

acc + (catalan(k) * catalan(n - 1 - k));

array-set(answers, n, some(cat-at-n))

end

end

end

end

fill-catalan(MAX-CAT)

Notice that we have had to undo the natural recursive
definition—

##### 11.7.4.2` `Levenshtein Distance and Dynamic Programming

fun levenshtein(s1 :: List<String>, s2 :: List<String>): |

end |

s1-len = s1.length() |

s2-len = s2.length() |

answers = array2d(s1-len + 1, s2-len + 1, none) |

fun put(s1-idx :: Number, s2-idx :: Number, n :: Number): |

answers.set(s1-idx, s2-idx, some(n)) |

end |

fun lookup(s1-idx :: Number, s2-idx :: Number) -> Number: |

a = answers.get(s1-idx, s2-idx) |

cases (Option) a: |

| none => raise("looking at uninitialized value") |

| some(v) => v |

end |

end |

for each(s1i from range(0, s1-len + 1)): |

put(s1i, 0, s1i); |

for each(s2i from range(0, s2-len + 1)): |

put(0, s2i, s2i); |

for each(s1i from range(0, s1-len)): |

for each(s2i from range(0, s2-len)): |

end |

end |

Is this strictly true?

No, it isn’t. We did first fill in values for the “borders” of the table. This is because doing so in the midst of <levenshtein-dp/compute-dist> would be much more annoying. By initializing all the known values, we keep the core computation cleaner. But it does mean the order in which we fill in the table is fairly complex.

dist = |

if index(s1, s1i) == index(s2, s2i): |

lookup(s1i, s2i) |

else: |

min3( |

1 + lookup(s1i, s2i + 1), |

1 + lookup(s1i + 1, s2i), |

1 + lookup(s1i, s2i)) |

end |

put(s1i + 1, s2i + 1, dist) |

lookup(s1-len, s2-len) |

fun levenshtein(s1 :: List<String>, s2 :: List<String>):

s1-len = s1.length()

s2-len = s2.length()

answers = array2d(s1-len + 1, s2-len + 1, none)

for each(s1i from range(0, s1-len + 1)):

put(s1i, 0, s1i);

for each(s2i from range(0, s2-len + 1)):

put(0, s2i, s2i);

for each(s1i from range(0, s1-len)):

for each(s2i from range(0, s2-len)):

dist =

if index(s1, s1i) == index(s2, s2i):

lookup(s1i, s2i)

else:

min3(

1 + lookup(s1i, s2i + 1),

1 + lookup(s1i + 1, s2i),

1 + lookup(s1i, s2i))

end

put(s1i + 1, s2i + 1, dist)

end

end

lookup(s1-len, s2-len)

end

##### 11.7.5` `Contrasting Memoization and Dynamic Programming

Now that we’ve seen two very techniques for avoiding recomputation, it’s worth contrasting them. The important thing to note is that memoization is a much simpler technique: write the natural recursive definition; determine its space complexity; decide whether this is problematic enough to warrant a space-time trade-off; and if it is, apply memoization. The code remains clean, and subsequent readers and maintainers will be grateful for that. In contrast, dynamic programming requires a reorganization of the algorithm to work bottom-up, which can often make the code harder to follow and full of subtle invariants about boundary conditions and computation order.

That said, the dynamic programming solution can sometimes be more computationally efficient. For instance, in the Levenshtein case, observe that at each table element, we (at most) only ever use the ones that are from the previous row and column. That means we never need to store the entire table; we can retain just the fringe of the table, which reduces space to being proportional to the sum, rather than product, of the length of the words. In a computational biology setting (when using Smith-Waterman), for instance, this saving can be substantial. This optimization is essentially impossible for memoization.

Memoization |
| Dynamic Programming |

Top-down |
| Bottom-up |

Depth-first |
| Breadth-first |

Black-box |
| Requires code reorganization |

All stored calls are necessary |
| May do unnecessary computation |

Cannot easily get rid of unnecessary data |
| Can more easily get rid of unnecessary data |

Can never accidentally use an uninitialized answer |
| Can accidentally use an uninitialized answer |

Needs to check for the presence of an answer |
| Can be designed to not need to check for the presence of an answer |

From a software design perspective, there are two more considerations.

First, the performance of a memoized solution can trail that of dynamic programming when the memoized solution uses a generic data structure to store the memo table, whereas a dynamic programming solution will invariably use a custom data structure (since the code needs to be rewritten against it anyway). Therefore, before switching to dynamic programming for performance reasons, it makes sense to try to create a custom memoizer for the problem: the same knowledge embodied in the dynamic programming version can often be encoded in this custom memoizer (e.g., using an array instead of list to improve access times). This way, the program can enjoy speed comparable to that of dynamic programming while retaining readability and maintainability.

Second, suppose space is an important consideration and the dynamic programming version can make use of significantly less space. Then it does make sense to employ dynamic programming instead. Does this mean the memoized version is useless?

What do you think? Do we still have use for the memoized version?

Yes, of course we do! It can serve as an oracle [REF] for the dynamic
programming version, since the two are supposed to produce identical
answers anyway—

In short, always first produce the memoized version. If you need more performance, consider customizing the memoizer’s data structure. If you need to also save space, and can arrive at a more space-efficient dynamic programming solution, then keep both versions around, using the former to test the latter (the person who inherits your code and needs to alter it will thank you!).

We have characterized the fundamental difference between memoization and dynamic programming as that between top-down, depth-first and bottom-up, breadth-first computation. This should naturally raise the question, what about:

top-down, breadth-first

bottom-up, depth-first

orders of computation. Do they also have special names that we just happen to not know? Are they uninteresting? Or do they not get discussed for a reason?