On this page:
7.1 Making Lists and Taking Them Apart
7.2 Some Example Exercises
7.3 Structural Problems with Scalar Answers
7.3.1 my-len:   Examples
7.3.2 my-sum:   Examples
7.3.3 From Examples to Code
7.4 Structural Problems with List Answers
7.4.1 my-str-len:   Examples and Code
7.4.2 my-pos-nums:   Examples and Code
7.4.3 my-alternating:   First Attempt
7.4.4 my-running-sum:   First Attempt
7.5 Structural Problems with Sub-Domains
7.5.1 my-max:   Examples
7.5.2 my-max:   From Examples to Code
7.5.3 my-alternating:   Examples and Code
7.6 More Structural Problems with Scalar Answers
7.6.1 my-avg:   Examples
7.7 Structural Problems with Accumulators
7.7.1 my-running-sum:   Examples and Code
7.7.2 my-alternating:   Examples and Code
7.8 Dealing with Multiple Answers
7.8.1 uniq:   Problem Setup
7.8.2 uniq:   Examples
7.8.3 uniq:   Code
7.8.4 uniq:   Reducing Computation
7.8.5 uniq:   Example and Code Variations
7.8.6 uniq:   Why Produce a List?
7.9 Monomorphic Lists and Polymorphic Types

7 Processing Lists

    7.1 Making Lists and Taking Them Apart

    7.2 Some Example Exercises

    7.3 Structural Problems with Scalar Answers

      7.3.1 my-len: Examples

      7.3.2 my-sum: Examples

      7.3.3 From Examples to Code

    7.4 Structural Problems with List Answers

      7.4.1 my-str-len: Examples and Code

      7.4.2 my-pos-nums: Examples and Code

      7.4.3 my-alternating: First Attempt

      7.4.4 my-running-sum: First Attempt

    7.5 Structural Problems with Sub-Domains

      7.5.1 my-max: Examples

      7.5.2 my-max: From Examples to Code

      7.5.3 my-alternating: Examples and Code

    7.6 More Structural Problems with Scalar Answers

      7.6.1 my-avg: Examples

    7.7 Structural Problems with Accumulators

      7.7.1 my-running-sum: Examples and Code

      7.7.2 my-alternating: Examples and Code

    7.8 Dealing with Multiple Answers

      7.8.1 uniq: Problem Setup

      7.8.2 uniq: Examples

      7.8.3 uniq: Code

      7.8.4 uniq: Reducing Computation

      7.8.5 uniq: Example and Code Variations

      7.8.6 uniq: Why Produce a List?

    7.9 Monomorphic Lists and Polymorphic Types

We have already seen [From Tables to Lists] several examples of list-processing functions. They have been especially useful for advanced processing of tables. However, lists arise frequently in programs, and they do so naturally because so many things in our lives—from shopping lists to to-do lists to checklists—are naturally lists. As we already briefly discussed earlier [Lists as Anonymous Data],
  • some list functions are generic and operate on any kind of list: e.g., the length of a list is the same irrespective of what kind of values it contains;

  • some are specific at least to the type of data: e.g., the sum assumes that all the values are numbers (though they may be ages or prices or other information represented by numbers); and

  • some are somewhere in-between: e.g., a maximum function applies to any list of comparable values, such as numbers or strings.

This seems like a great variety, and we might worry about how we can handle this many different kinds of functions. Fortunately, and perhaps surprisingly, there is one standard way in which we can think about writing all these functions! Our mission is to understand and internalize this process.

7.1 Making Lists and Taking Them Apart

So far we’ve seen one way to make a list: by writing [list: …]. While useful, writing lists this way actually hides their true nature. Every list actually has two parts: a first element and the rest of the list. The rest of the list is itself a list, so it too has two parts…and so on.

Consider the list [list: 1, 2, 3]. Its head is 1, and the rest of it is [list: 2, 3]. For this second list, the head is 2 and the rest is [list: 3].

Do Now!

Take apart this third list.

For the third list, the head is 3 and the rest is [list: ], i.e., the empty list. In Pyret, we have another way of writing the empty list: empty.

Here, we’ve taken the lists apart manually. Naturally, Pyret has operations that let us do that. Lists are an instance of structured data, and in general there are two ways to take apart structured data: using cases, which we will see below, and using accessors. A list has two accessors: first and rest. We use an accessor by writing an expression, followed by a dot (.), followed by the accessor’s name. Thus:

l1 = [list: 1, 2, 3] h1 = l1.first l2 = l1.rest h2 = l2.first l3 = l2.rest h3 = l3.first l4 = l3.rest check: h1 is 1 h2 is 2 h3 is 3 l2 is [list: 2, 3] l3 is [list: 3] l4 is empty end

Thus, .first and .rest give us a way to take apart a list. Can we also put together a list piece-by-piece? This would be especially useful for building up a list. And indeed we can: the function (called a constructor) that makes lists is called link. It takes two arguments: a list element, and the rest of the list. Thus, l1 above is equivalent to a series of links followed by empty:

check: [list: 1, 2, 3] is link(1, link(2, link(3, empty))) end

Obviously, writing the link form is not very convenient to humans. But it will prove very valuable to programs!

Observe, in summary, that broadly speaking we have two kinds of lists. Some lists are empty. All other lists are non-empty lists, meaning they have at least one link. There may be more interesting structure to some lists, but all lists have this much in common. Specifically, a list is either
  • empty (written empty or [list: ]), or

  • non-empty (written link(…, …) or [list: ] with at least one value inside the brackets), where the rest is also a list (and hence may in turn be empty or non-empty, …).

7.2 Some Example Exercises

To illustrate our thinking, let’s work through a few concrete examples of list-processing functions. All of these will consume lists; some will even produce them. Since some of these functions already exist in Pyret, we’ll name them with the prefix my- to avoid errors.Be sure to use the my- name consistently, including inside the body of the function.
  • Compute the length of a list:

    my-len :: List<Any> -> Number

  • Compute the sum of a list (of numbers):

    my-sum :: List<Number> -> Number

  • Compute the maximum of a list (of numbers or strings):

    my-max :: List<Any> -> Any

  • Given a list of strings, convert each string to a number representing its length:

    my-str-len :: List<String> -> List<Number>

  • Given a list of numbers, generate a list of its positive numbers:If you want to be pedantic: its positive numbers with the same count and in the same order.

    my-pos-nums :: List<Number> -> List<Number>

  • Given a list of numbers, replace every element with the running sum, i.e., the sum of all the elements from the beginning of the list until that element (inclusive):

    my-running-sum :: List<Number> -> List<Number>

  • Given a list, keep every alternate element in it, starting from the first:

    my-alternating :: List<Any> -> List<Any>

  • Given a list of numbers, compute the average of the numbers:

    my-avg :: List<Number> -> List<Number>

To solve problems like this, there are two things we should do:
  • Construct examples of the function’s behavior.

  • Employ the template that suggests possible solutions.

Both steps sound simple but have several nuances, which we will explore.

7.3 Structural Problems with Scalar Answers

Let’s write out examples for a few of the functions described above. We’ll approach writing examples in a very specific, stylized way. First of all, we should always construct at least two examples: one with empty and the other with at least one link, so that we’ve covered the two very broad kinds of lists. Then, we should have more examples specific to the kind of list stated in the problem. Finally, we should have even more examples to illustrate how we think about solving the problem.

7.3.1 my-len: Examples

We have’t precisely defined what it means to be “the length” of a list. We confront this right away when trying to write an example. What is the length of the list empty?

Do Now!

What do you think?

Two common examples are 0 and 1. The latter, 1, certainly looks reasonable. However, if you write the list as [list: ], now it doesn’t look so right: this is clearly (as the name empty also suggests) an empty list, and an empty list has zero elements in it. Therefore, it’s conventional to declare that

my-len(empty) is 0

How about a list like [list: 7]? Well, it’s clearly got one element (7) in it, so

my-len([list: 7]) is 1

Similarly, for a list like [list: 7, 8, 9], we would say

my-len([list: 7, 8, 9]) is 3

Now let’s look at that last example in a different light. Consider the argument [list: 7, 8, 9]. Its first element is 7 and the rest of it is [list: 8, 9]. Well, 7 is a number, not a list; but [list: 8, 9] certainly is a list, so we can ask for its length. What is my-len([list: 8, 9])? It has two elements, so

my-len([list: 8, 9]) is 2

The first element of that list is 8 while its rest is [list: 9]. What is its length? Note that we asked a very similar question before, for the length of the list [list: 7]. But [list: 7] is not a sub-list of [list: 7, 8, 9], which we started with, whereas [list: 9] is. And using the same reasoning as before, we can say

my-len([list: 9]) is 1

The rest of this last list is, of course, the empty list, whose length we have already decided is 0.

Putting together these examples, and writing out empty in its other form, here’s what we get:

my-len([list: 7, 8, 9]) is 3 my-len([list: 8, 9]) is 2 my-len([list: 9]) is 1 my-len([list: ]) is 0

Another way we can write this (paying attention to the right side) is

my-len([list: 7, 8, 9]) is 1 + 2 my-len([list: 8, 9]) is 1 + 1 my-len([list: 9]) is 1 + 0 my-len([list: ]) is 0

From this, maybe you can start to see a pattern. For an empty list, the length is 0. For a non-empty list, it’s the sum of 1 (the first element’s “contribution” to the list’s length) to the length of the rest of the list. That is,

my-len([list: 7, 8, 9]) is 1 + my-len([list: 8, 9]) my-len([list: 8, 9]) is 1 + my-len([list: 9]) my-len([list: 9]) is 1 + my-len([list: ]) my-len([list: ]) is 0

That is, we can use the result of computing my-len on the rest of the list to compute the answer for the entire list.

Double-check all these and make sure you understand the calculations. It’ll prove central to how we write the program later!

7.3.2 my-sum: Examples

A similar logic applies to how we treat a function like my-sum. What do we want the sum of the empty list to be? Well, it may be entirely clear, so let’s move on for a moment. What is the sum of the list [list: 7, 8, 9]? Well, clearly we intend for this to be 24. Let’s see how that works out.

Setting aside the empty list for a moment, here are sums we can agree upon:

my-sum([list: 7, 8, 9]) is 7 + 8 + 9 my-sum([list: 8, 9]) is 8 + 9 my-sum([list: 9]) is 9

which is the same as

my-sum([list: 7, 8, 9]) is 7 + my-sum([list: 8, 9]) my-sum([list: 8, 9]) is 8 + my-sum([list: 9]) my-sum([list: 9]) is 9 + my-sum([list: ])

From this, we can see that the sum of the empty list must be 0:Zero is called the additive identity: a fancy way of saying, adding zero to any number N gives you N. Therefore, it makes sense that it would be the length of the empty list, because the empty list has no items to contribute to a sum. Can you figure out what the multiplicative identity is?

my-sum(empty) is 0

Observe, again, how we can use the result of computing my-sum of the rest of the list to compute its result for the whole list.

7.3.3 From Examples to Code

Given these examples, we can now turn them into code. We introduce the construct cases, which lets us tell apart different kinds of lists, and use it to provide answers for each kind of list.

The grammar for cases for lists is as follows:

cases (List) e: | empty => … | link(f, r) => … f … r … end

where most parts are fixed, but a few you’re free to change:
  • e is an expression whose value needs to be a list; it could be a variable bound to a list, or some complex expression that evaluates to a list.

  • f and r are names given to the first and rest of the list. You can choose any names you like, though in Pyret, it’s conventional to use f and r.

The right-hand side of every => is an expression.

Here’s how cases works in this instance. Pyret first evaluates e. It then checks that the resulting value truly is a list; otherwise it halts with an error. If it is a list, Pyret examines what kind of list it is. If it’s an empty list, it runs the expression after the => in the empty clause. Otherwise, the list is not empty, which means it has a first and rest; Pyret binds f and r to the two parts, respectively, and then evaluates the expression after the => in the link clause.

Exercise

Try using a non-list—e.g., a number—in the e position and see what happens!

Now let’s use cases to define my-len:

fun my-len(l): cases (List) l: | empty => 0 | link(f, r) => 1 + my-len(r) end end

This follows from our examples: when the list is empty my-len produces 0; when it is not empty, we add one to the length of the rest of the list (here, r).

Similarly, let’s define my-sum:

fun my-sum(l): cases (List) l: | empty => 0 | link(f, r) => f + my-sum(r) end end

Notice how similar they are in code, and how readily the structure of the data suggest a structure for the program. This is a pattern you will get very used to soon!

7.4 Structural Problems with List Answers

Now let’s tackle the functions that produce a list as the answer.

7.4.1 my-str-len: Examples and Code

As always, we’ll begin with some examples. Given a list of strings, we want the lengths of each string (in the same order). Thus, here’s a reasonable example:

my-str-len([list: "hi", "there", "mateys"]) is [list: 2, 5, 6]

As we have before, we should consider how the answers for each sub-problem of the above example:

my-str-len([list: "there", "mateys"]) is [list: 5, 6] my-str-len([list: "mateys"]) is [list: 6]

Or, in other words:

my-str-len([list: "hi", "there", "mateys"]) is link(2, [list: 5, 6]) my-str-len([list: "there", "mateys"]) is link(5, [list: 6]) my-str-len([list: "mateys"]) is link(6, [list: ])

which tells us that the response for the empty list should be empty:

my-str-len(empty) is empty

Note that for brevity we’re written the answers of converting each string (2, 5, and 6), each of which we obtain by applying string-length to the first element of the list at each point. Therefore, we can formulate a solution from this:

fun my-str-len(l): cases (List) l: | empty => empty | link(f, r) => link(string-length(f), my-str-len(r)) end end

7.4.2 my-pos-nums: Examples and Code

Do Now!

Construct the sequence of examples that we obtain from the input [list: 1, -2, 3, -4].

Here we go:

my-pos-nums([list: 1, -2, 3, -4]) is [list: 1, 3] my-pos-nums([list: -2, 3, -4]) is [list: 3] my-pos-nums([list: 3, -4]) is [list: 3] my-pos-nums([list: -4]) is [list: ] my-pos-nums([list: ]) is [list: ]

We can write this in the following form:

my-pos-nums([list: 1, -2, 3, -4]) is link(1, [list: 3]) my-pos-nums([list: -2, 3, -4]) is [list: 3] my-pos-nums([list: 3, -4]) is link(3, [list: ]) my-pos-nums([list: -4]) is [list: ] my-pos-nums([list: ]) is [list: ]

or, even more explicitly,

my-pos-nums([list: 1, -2, 3, -4]) is link(1, my-pos-nums([list: -2, 3, -4])) my-pos-nums([list: -2, 3, -4]) is my-pos-nums([list: 3, -4]) my-pos-nums([list: 3, -4]) is link(3, my-pos-nums([list: -4])) my-pos-nums([list: -4]) is my-pos-nums([list: ]) my-pos-nums([list: ]) is [list: ]

That is, when the first element is positive we link it into the result of computing my-pos-nums on the rest of the list; when the first element is negative, the result is just that of computing my-pos-nums on the rest of the list. This yields the following program:

fun my-pos-nums(l): cases (List) l: | empty => empty | link(f, r) => if f > 0: link(f, my-pos-nums(r)) else: my-pos-nums(r) end end end

Do Now!

Is our set of examples comprehensive?

Not really. There are many examples we haven’t considered, such as lists that end with positive numbers and lists with 0.

Exercise

Work through these examples and see how they affect the program!

7.4.3 my-alternating: First Attempt

Once again, we’re going to work from examples.

Do Now!

Work out the results for my-alternating starting from the list [list: 1, 2, 3, 4, 5, 6].

Here’s how they work out:
<alternating-egs-1> ::=

check: my-alternating([list: 1, 2, 3, 4, 5, 6]) is [list: 1, 3, 5] my-alternating([list: 2, 3, 4, 5, 6]) is [list: 2, 4, 6] my-alternating([list: 3, 4, 5, 6]) is [list: 3, 5] my-alternating([list: 4, 5, 6]) is [list: 4, 6] end

Wait, what’s that? The two answers above are each correct, but the second answer does not help us in any way construct the first answer. That means the way we’ve solved these problems until now is not enough, and we have more thinking to do. We’ll return to this later [my-alternating: Examples and Code and also my-alternating: Examples and Code].

7.4.4 my-running-sum: First Attempt

One more time, we’ll begin with an example.

Do Now!

Work out the results for my-running-sum starting from the list [list: 1, 2, 3, 4, 5].

Here’s what our first few examples look like:
<running-sum-egs-1> ::=

check: my-running-sum([list: 1, 2, 3, 4, 5]) is [list: 1, 3, 6, 10, 15] my-running-sum([list: 2, 3, 4, 5]) is [list: 2, 5, 9, 14] my-running-sum([list: 3, 4, 5]) is [list: 3, 7, 12] end

Again, there doesn’t appear to be any clear connection between the result on the rest of the list and the result on the entire list.

(That isn’t strictly true: we can still line up the answers as follows:

my-running-sum([list: 1, 2, 3, 4, 5]) is [list: 1, 3, 6, 10, 15] my-running-sum([list: 2, 3, 4, 5]) is [list: 2, 5, 9, 14] my-running-sum([list: 3, 4, 5]) is [list: 3, 7, 12]

and observe that we’re computing the answer for the rest of the list, then adding the first element to each element in the answer, and linking the first element to the front. In principle, we can compute this solution directly, but for now that may be more work than finding a simpler way to answer it.)

We’ll return to this function later, too [my-running-sum: Examples and Code].

7.5 Structural Problems with Sub-Domains

7.5.1 my-max: Examples

Now let’s find the maximum value of a list. Let’s assume for simplicity that we’re dealing with just lists of numbers. What kinds of lists should we construct? Clearly, we should have empty and non-empty lists…but what else? Is a list like [list: 1, 2, 3] a good example? Well, there’s nothing wrong with it, but we should also consider lists where the maximum at the beginning rather than at the end; the maximum might be in the middle; the maximum might be repeated; the maximum might be negative; and so on. While not comprehensive, here is a small but interesting set of examples:

my-max([list: 1, 2, 3]) is 3 my-max([list: 3, 2, 1]) is 3 my-max([list: 2, 3, 1]) is 3 my-max([list: 2, 3, 1, 3, 2]) is 3 my-max([list: 2, 1, 4, 3, 2]) is 4 my-max([list: -2, -1, -3]) is -1

What about my-max(empty)?

Do Now!

Could we define my-max(empty) to be 0? Returning 0 for the empty list has worked well twice already!

We’ll return to this in a while.

Before we proceed, it’s useful to know that there’s a function called num-max already defined in Pyret, that compares two numbers:

num-max(1, 2) is 2 num-max(-1, -2) is -1

Exercise

Suppose num-max were not already built in. Can you define it? You will find what you learned about Booleans handy. Remember to write some tests!

Now we can look at my-max at work:

my-max([list: 1, 2, 3]) is 3 my-max([list: 2, 3]) is 3 my-max([list: 3]) is 3

Hmm. That didn’t really teach us anything, did it? Maybe, we can’t be sure. And we still don’t know what to do with empty.

Let’s try the second example input:

my-max([list: 3, 2, 1]) is 3 my-max([list: 2, 1]) is 2 my-max([list: 1]) is 1

This is actually telling us something useful as well, but maybe we can’t see it yet. Let’s take on something more ambitious:

my-max([list: 2, 1, 4, 3, 2]) is 4 my-max([list: 1, 4, 3, 2]) is 4 my-max([list: 4, 3, 2]) is 4 my-max([list: 3, 2]) is 3 my-max([list: 2]) is 2

Observe how the maximum of the rest of the list gives us a candidate answer, but comparing it to the first element gives us a definitive one:

my-max([list: 2, 1, 4, 3, 2]) is num-max(2, 4) my-max([list: 1, 4, 3, 2]) is num-max(1, 4) my-max([list: 4, 3, 2]) is num-max(4, 3) my-max([list: 3, 2]) is num-max(3, 2) my-max([list: 2]) is …

The last one is a little awkward: we’d like to write

my-max([list: 2]) is num-max(2, …)

but we don’t really know what the maximum (or minimum, or any other element) of the empty list is, but we can only provide numbers to num-max. Therefore, leaving out that dodgy case, we’re left with

my-max([list: 2, 1, 4, 3, 2]) is num-max(2, my-max([list: 1, 4, 3, 2])) my-max([list: 1, 4, 3, 2]) is num-max(1, my-max([list: 4, 3, 2])) my-max([list: 4, 3, 2]) is num-max(4, my-max([list: 3, 2])) my-max([list: 3, 2]) is num-max(3, my-max([list: 2]))

Our examples have again helped: they’ve revealed how we can use the answer for each rest of the list to compute the answer for the whole list, which in turn is the rest of some other list, and so on. If you go back and look at the other example lists we wrote above, you’ll see the pattern holds there too.

However, it’s time we now confront the empty case. The real problem is that we don’t have a maximum for the empty list: for any number we might provide, there is always a number bigger than it (assuming our computer is large enough) that could have been the answer instead. In short, it’s nonsensical to ask for the maximum (or minimum) of the empty list: the concept of “maximum” is only defined on non-empty lists! That is, when asked for the maximum of an empty list, we should signal an error:

my-max(empty) raises ""

(which is how, in Pyret, we say that it will generate an error; we don’t care about the details of the error, hence the empty string).

7.5.2 my-max: From Examples to Code

Once again, we can codify the examples above, i.e., turn them into a uniform program that works for all instances. However, we now have a twist. If we blindly followed the pattern we’ve used earlier, we would end up with:

fun my-max(l): cases (List) l: | empty => raise("not defined for empty lists") | link(f, r) => num-max(f, my-max(r)) end end

Do Now!

What’s wrong with this?

Consider the list [list: 2]. This turns into

num-max(2, my-max([list: ]))

which of course raises an error. Therefore, this function never works for any list that has one or more elements!

That’s because we need to make sure aren’t trying to compute the maximum of the empty list. Going back to our examples, we see that what we need to do, before calling my-max, is check whether the rest of the list is empty. If it is, we do not want to call my-max at all. That is:

fun my-max(l): cases (List) l: | empty => raise("not defined for empty lists") | link(f, r) => cases (List) r: | empty => … | … end end end

We’ll return to what to do when the rest is not empty in a moment.

If the rest of the list l is empty, our examples above tell us that the maximum is the first element in the list. Therefore, we can fill this on:

fun my-max(l): cases (List) l: | empty => raise("not defined for empty lists") | link(f, r) => cases (List) r: | empty => f | … end end end

Note in particular the absence of a call to my-max. If the list is not empty, however, our examples above tell us that my-max will give us the maximum of the rest of the list, and we just need to compare this answer with the first element (f):

fun my-max(l): cases (List) l: | empty => raise("not defined for empty lists") | link(f, r) => cases (List) r: | empty => f | else => num-max(f, my-max(r)) end end end

And sure enough, this definition does the job!

7.5.3 my-alternating: Examples and Code

Looking back at my-alternating: First Attempt, we can see that every alternate example is one we want. The problem is, to get from one example to the one two below, we have to remove two elements, not just one. That is, we have to pretend our list has elements in pairs, not singles. In terms of examples, this would look as follows:

my-alternating([list: 1, 2, 3, 4, 5, 6]) is [list: 1, 3, 5] my-alternating([list: 3, 4, 5, 6]) is [list: 3, 5] my-alternating([list: 5, 6]) is [list: 5] my-alternating([list: ]) is [list: ]

Now it’s pretty easy to see how to construct a program: keep the first element, skip the second, and repeat. Let’s see how far we can get using the template:

fun my-alternating(l): cases (List) l: | empty => empty | link(f, r) => link(f, … r …) end end

Do Now!

Think about how to complete this definition.

Before we proceed, there is a small problem: our example is not good enough to cover all the cases we’ll encounter. Specifically, to traverse by two we must have two elements, but we might not: the list might have only an odd number of elements. That is, we might instead have

my-alternating([list: 1, 2, 3, 4, 5]) is [list: 1, 3, 5] my-alternating([list: 3, 4, 5]) is [list: 3, 5] my-alternating([list: 5]) is [list: 5]

What this means is: We won’t always terminate with the empty list. We have to be prepared to terminate with a list of one element. This suggests how we can complete the definition:

fun my-alternating(l): cases (List) l: | empty => empty | link(f, r) => cases (List) r: # note: deconstructing r, not l | empty => # the list has an odd number of elements [list: f] | link(fr, rr) => # fr = first of rest, rr = rest of rest link(f, my-alternating(rr)) end end end

In my-alternating: Examples and Code we’ll see another way of approaching this problem.

7.6 More Structural Problems with Scalar Answers

7.6.1 my-avg: Examples

Let’s now try to compute the average of a list of numbers. Let’s start with the example list [list: 1, 2, 3, 4] and work out more examples from it. The average of numbers in this list is clearly (1 + 2 + 3 + 4)/4, or 10/4.

Based on the list’s structure, we see that the rest of the list is [list: 2, 3, 4], and the rest of that is [list: 3, 4], and so on. The resulting averages are:

my-avg([list: 1, 2, 3, 4]) is 10/4 my-avg([list: 2, 3, 4]) is 9/3 my-avg([list: 3, 4]) is 7/2 my-avg([list: 4]) is 4/1

The problem is, it’s simply not clear how we get from the answer for the sub-list to the answer for the whole list. That is, given the following two bits of information:
  • The average of the remainder of the list is 9/3, i.e., 3.

  • The first number in the list is 1.

How do we determine that the average of the whole list must be 10/4? If it’s not clear to you, don’t worry: with just those two pieces of information, it’s impossible!

Here’s a simpler example that explains why. Let’s suppose the first value in a list is 1, and the average of the rest of the list is 2. Here are two very different lists that fit this description:

[list: 1, 2] # the rest has one element with sum 2 [list: 1, 4, 0] # the rest has two elements with sum 4

The average of the entire first list is 3/2, while the average of the entire second list is 5/3, and the two are not the same.

That is, to compute the average of a whole list, it’s not even useful to know the average of the rest of the list. Rather, we need to know the sum and the length of the rest of the list. With these two, we can add the first to the sum, and 1 to the length, and compute the new average.

In principle, we could try to make a average function that returns all this information. Instead, it will be a lot simpler to simply decompose the task into two smaller tasks. After all, we have already seen how to compute the length and how to compute the sum. The average, therefore, can just use these existing functions:

fun my-avg(l): my-sum(l) / my-len(l) end

Do Now!

What should be the average of the empty list? Does the above code produce what you would expect?

Just as we argued earlier about the maximum [Structural Problems with Sub-Domains], the average of the empty list isn’t a well-defined concept. Therefore, it would be appropriate to signal an error. The implementation above does this, but poorly: it reports an error on division. A better programming practice would be to catch this situation and report the error right away, rather than hoping some other function will report the error.

Exercise

Alter my-avg above to signal an error when given the empty list.

Therefore, we see that the process we’ve used—of inferring code from examples—won’t may not always suffice, and we’ll need more sophisticated techniques to solve some problems. However, notice that working from examples helps us quickly identify situations where this approach does and doesn’t work. Furthermore, if you look more closely you’ll notice that the examples above do hint at how to solve the problem: in our very first examples, we wrote answers like 10/4, 9/3, and 7/2, which correspond to the sum of the numbers divided by the length. Thus, writing the answers in this form (as opposed, for instance, to writing the second of those as 3) already reveals a structure for a solution.

7.7 Structural Problems with Accumulators

Now we are ready to tackle the problems we’ve left unfinished. They will require a new technique to solve.

7.7.1 my-running-sum: Examples and Code

Recall how we began in my-running-sum: First Attempt. Our examples [<running-sum-egs-1>] showed the following problem. When we process the rest of the list, we have forgotten everything about what preceded it. That is, when processing the list starting at 2 we forget that we’ve seen a 1 earlier; when starting from 3, we forget that we’ve seen both 1 and 2 earlier; and so on. In other words, we keep forgetting the past. We need some way of avoiding that.

The easiest thing we can do is simply change our function to carry along this “memory”, or what we’ll call an accumulator. That is, imagine we were defining a new function, called my-rs. It will consume a list of numbers and produce a list of numbers, but in addition it will also take the sum of numbers preceding the current list.

Do Now!

What should the initial sum be?

Initially there is no “preceding list”, so we will use the additive identity: 0. The type of my-rs is

my-rs :: Number, List<Number> -> List<Number>

Let’s now re-work our examples from <running-sum-egs-1> as examples of my-rs instead:

my-rs( 0, [list: 1, 2, 3, 4, 5]) is [list: 0 + 1] + my-rs( 0 + 1, [list: 2, 3, 4, 5]) my-rs( 1, [list: 2, 3, 4, 5]) is [list: 1 + 2] + my-rs( 1 + 2, [list: 3, 4, 5]) my-rs( 3, [list: 3, 4, 5]) is [list: 3 + 3] + my-rs( 3 + 3, [list: 4, 5]) my-rs( 6, [list: 4, 5]) is [list: 6 + 4] + my-rs( 6 + 4, [list: 5]) my-rs(10, [list: 5]) is [list: 10 + 5] + my-rs(10 + 5, [list: ]) my-rs(15, [list: ]) is empty

That is, my-rs translates into the following code:

fun my-rs(acc, l): cases (List) l: | empty => empty | link(f, r) => new-sum = acc + f link(new-sum, my-rs(new-sum, r)) end end

All that’s then left is to call it from my-running-sum:

fun my-running-sum(l): my-rs(0, l) end

Observe that we do not change my-running-sum itself to take extra arguments. There are multiple reasons for this. [FILL]

7.7.2 my-alternating: Examples and Code

Recall our effort in my-alternating: First Attempt, which we tackled in my-alternating: Examples and Code. There, we solved the problem by thinking of the list a little differently: we try, as much as possible, to skip two elements of the list at a time, so the first element we see is one we always want to keep as part of the answer. Here we will see another way to think about the same problem.

Return to the examples we’ve already seen [<alternating-egs-1>]. As we’ve already noted [my-alternating: Examples and Code], in effect we want the output from every alternate example. One option was to traverse the list essentially two elements at a time. Another is to traverse it just one element at a time, but keeping track of whether we’re at an odd or even elementi.e., add “memory” to our program. Since we just need to track that one piece of information, we can use a Boolean to do it. Let’s define a new function for this purpose:

my-alt :: List<Any>, Boolean -> List<Any>

The extra argument accumulates whether we’re at an element to keep or one to discard.

We can reuse the existing template for list functions. When we have an element, we have to consult the accumulator whether to keep it or not. If its value is true we link it to the answer; otherwise we ignore it. As we process the rest of the list, however, we have to remember to update the accumulator: if we kept an element we don’t wish to keep the next one, and vice versa.

fun my-alt(l, keep): cases (List) l: | empty => empty | link(f, r) => if keep: link(f, my-alt(r, false)) else: my-alt(r, true) end end

Finally, we have to determine the initial value of the accumulator. In this case, since we want to keep alternating elements starting with the first one, its initial value should be true:

fun my-alternating(l): my-alt(l, true) end

Exercise

Define my-max using an accumulator. What does the accumulator represent? Do you encounter any difficulty?

7.8 Dealing with Multiple Answers

Our discussion above has assumed there is only one answer for a given input. This is often true, but it also depends on how the problem is worded and how we choose to generate examples. We will study this in some detail now.

7.8.1 uniq: Problem Setup

Consider the task of writing uniq:uniq is the name of a Unix utility with similar behavior; hence the spelling of the name. given a list of values, it produces a collection of the same elements while avoiding any duplicates (hence uniq, short for “unique”).

Consider the following input: [list: 1, 2, 1, 3, 1, 2, 4, 1].

Do Now!

What is the sequence of examples this input generates? It’s really important you stop and try to do this by hand. As we will see there are multiple solutions, and it’s useful for you to consider what you generate. Even if you can’t generate a sequence, trying to do so will better prepare you for what you read next.

How did you obtain your example? If you just “thought about it for a moment and wrote something down”, you may or may not have gotten something you can turn into a program. Programs can only proceed systematically; they can’t “think”. So, hopefully you took a well-defined path to computing the answer.

7.8.2 uniq: Examples

It turns out there are several possible answers, because we have (intentionally) left the problem unspecified. Suppose there are two instances of a value in the list; which one do we keep, the first or the second? On the one hand, since the two instances must be equivalent it doesn’t matter, but it does for writing concrete examples and deriving a solution.

For instance, you might have generated this sequence:

examples: uniq([list: 1, 2, 1, 3, 1, 2, 4, 1]) is [list: 3, 2, 4, 1] uniq([list: 2, 1, 3, 1, 2, 4, 1]) is [list: 3, 2, 4, 1] uniq([list: 1, 3, 1, 2, 4, 1]) is [list: 3, 2, 4, 1] uniq([list: 3, 1, 2, 4, 1]) is [list: 3, 2, 4, 1] uniq([list: 1, 2, 4, 1]) is [list: 2, 4, 1] uniq([list: 2, 4, 1]) is [list: 2, 4, 1] uniq([list: 4, 1]) is [list: 4, 1] uniq([list: 1]) is [list: 1] uniq([list: ]) is [list: ] end

However, you might have also generated sequences that began with

uniq([list: 1, 2, 1, 3, 1, 2, 4, 1]) is [list: 1, 2, 3, 4]

or

uniq([list: 1, 2, 1, 3, 1, 2, 4, 1]) is [list: 4, 3, 2, 1]

and so on. Let’s work with the example we’ve worked out above.

7.8.3 uniq: Code

What is the systematic approach that gets us to this answer? When given a non-empty list, we split it into its first element and the rest of the list. Suppose we have the answer to uniq applied to the rest of the list. Now we can ask: is the first element in the rest of the list? If it is, then we can ignore it, since it is certain to be in the uniq of the rest of the list. If, however, it is not in the rest of the list, it’s critical that we link it to the answer.

This translates into the following program. For the empty list, we return the empty list. If the list is non-empty, we check whether the first is in the rest of the list. If it is not, we include it; otherwise we can ignore it for now.

This results in the following program:

fun uniq-rec(l :: List<Any>) -> List<Any>: cases (List) l: | empty => empty | link(f, r) => if r.member(f): uniq-rec(r) else: link(f, uniq-rec(r)) end end end

which we’ve called uniq-rec instead of uniq to differentiate it from other versions of uniq.

Exercise

Note that we’re using .member to check whether an element is a member of the list. Write a function member that consumes an element and a list, and tells us whether the element is a member of the list.

7.8.4 uniq: Reducing Computation

Notice that this function has a repeated expression. Instead of writing it twice, we could call it just once and use the result in both places:

fun uniq-rec2(l :: List<Any>) -> List<Any>: cases (List) l: | empty => empty | link(f, r) => ur = uniq-rec(r) if r.member(f): ur else: link(f, ur) end end end

While it may seem that we have merely avoided repeating an expression, by moving the computation uniq-rec(r) to before the conditional, we have actually changed the program’s behavior in a subtle way. We will discuss this later when we get to [REC tail calls].

You might think, because we replaced two function calls with one, that we’ve reduced the amount of computation the program does. It does not! The two function calls are both in the two branches of the same conditional; therefore, for any given list element, only one or the other call to uniq happens. In fact, in both cases, there was one call to uniq before, and there is one now. So we have reduced the number of calls in the source program, but not the number that take place when the program runs. In that sense, the name of this section was intentionally misleading!

However, there is one useful reduction we can perform, which is enabled by the structure of uniq-rec2. We currently check whether f is a member of r, which is the list of all the remaining elements. In our example, this means that in the very second turn, we check whether 2 is a member of the list [list: 1, 3, 1, 2, 4, 1]. This is a list of six elements, including three copies of 1. We compare 2 against two copies of 1. However, we gain nothing from the second comparison. Put differently, we can think of uniq(r) as a “summary” of the rest of the list that is exactly as good as r itself for checking membership, with the advantage that it might be significantly shorter. This, of course, is exactly what ur represents. Therefore, we can encode this intuition as follows:

fun uniq-rec3(l :: List<Any>) -> List<Any>: cases (List) l: | empty => empty | link(f, r) => ur = uniq-rec(r) if ur.member(f): ur else: link(f, ur) end end end

Note that all that changed is that we check for membership in ur rather than in r.

Exercise

Later [Predicting Growth] we will study how to formally study how long a program takes to run. By the measure introduced in that section, does the change we just made make any difference? Be careful with your answer: it depends on how we count “the length” of the list.

Observe that if the list never contained duplicates in the first place, then it wouldn’t matter which list we check membership in—but if we knew the list didn’t contain duplicates, we wouldn’t be using uniq in the first place! We will return to the issue of lists and duplicate elements in Sets Appeal.

7.8.5 uniq: Example and Code Variations

As we mentioned earlier, there are other example sequences you might have written down. Here’s a very different process:
  • Start with the entire given list and with the empty answer (so far).

  • For each list element, check whether it’s already in the answer so far. If it is, ignore it, otherwise extend the answer with it.

  • When there are no more elements in the list, the answer so far is the answer for the whole list.

Notice that this solution assumes that we will be accumulating the answer as we traverse the list. Therefore, we can’t even write the example with one parameter as we did before. We would argue that a natural solution asks whether we can solve the problem just from the structure of the data using the computation we are already defining, as we did above. If we cannot, then we have to resort to an accumulator. But because we can, the accumulator is unnecessary here and greatly complicated even writing down examples (give it a try!).

7.8.6 uniq: Why Produce a List?

If you go back to the original statement of the uniq problem [uniq: Problem Setup], you’ll notice it said nothing about what order the output should have; in fact, it didn’t even say the output needs to be a list (and hence have an order). In that case, we should think about whether a list even makes sense for this problem. In fact, if we don’t care about order and don’t want duplicates (by definition of uniq), then there is a much simpler solution, which is to produce a set. Pyret already has sets built in, and converting the list to a set automatically takes care of duplicates. This is of course cheating from the perspective of learning how to write uniq, but it is worth remembering that sometimes the right data structure to produce isn’t necessarily the same as the one we were given. Also, later [Sets Appeal], we will see how to build sets for ourselves (at which point, uniq will look familiar, since it is at the heart of set-ness).

7.9 Monomorphic Lists and Polymorphic Types

Earlier we wrote contracts like:

my-len :: List<Any> -> Number my-max :: List<Any> -> Any

These are unsatisfying for several reasons. Consider my-max. The contract suggests that any kind of element can be in the input list, but in fact that isn’t true: the input [list: 1, "two", 3] is not valid, because we can’t compare 1 with "two" or "two" with 3.

Exercise

What happens if we run 1 > "two" or "two" > 3?

Rather, what we mean is a list where all the elements are of the same kind,Technically, elements that are also comparable. and the contract has not captured that. Furthermore, we don’t mean that my-max might return any old type: if we supply it with a list of numbers, we will not get a string as the maximum element! Rather, it will only return the kind of element that is in the provided list.

In short, we mean that all elements of the list are of the same type, but they can be of any type. We call the former monomorphic: “mono” meaning one, and “morphic” meaning shape, i.e., all values have one type. But the function my-max itself can operate over many of these kinds of lists, so we call it polymorphic (“poly” meaning many).

Therefore, we need a better way of writing these contracts. Essentially, we want to say that there is a type variable (as opposed to regular program variable) that represents the type of element in the list. Given that type, my-max will return an element of that type. We write this syntactically as follows:

fun my-max<T>(l :: List<T>) -> T: … end

The notation <T> says that T is a type variable parameter that will be used in the rest of the function (both the header and the body).

Using this notation, we can also revisit my-len. Its header now becomes:

fun my-len<T>(l :: List<T>) -> Number: … end

Note that my-len did not actually “care” that whether all the values were of the same type or not: it never looks at the individual elements, much less at pairs of them. However, as a convention we demand that lists always be monomorphic. This is important because it enables us to process the elements of the list uniformly: if we know how to process elements of type T, then we will know how to process a List<T>. If the list elements can be of truly any old type, we can’t know how to process its elements.