15 Examples, Testing, and Program Checking

← prev up next →

15 Examples, Testing, and Program Checking

15.1 From Examples to Tests

15.2 More Refined Comparisons

15.3 When Tests Fail

15.4 Oracles for Testing

15.5 Testing Erroneous Programs

When we think through a problem, it is often useful to write down examples of what we are trying to do. For example (see what we did there?), if we’re asked to compute the [FILL]

There are, of course, many ways to write down examples. We could write them on a board, on paper, or even as comments in a computer document. These are all reasonable and indeed, often, the best way to begin working on a problem. However, if we can write our examples in a precise form that a computer can understand, we achieve two things:

When we’re done writing our purported solution, we can have the computer check whether we got it right.
In the process of writing down our expectation, we often find it hard to express with the precision that a computer expects. Sometimes this is because we’re still formulating the details and haven’t yet pinned them down, but at other times it’s because we don’t yet understand the problem. In such situations, the force of precision actually does us good, because it helps us understand the weakness of our understanding.

15.1 From Examples to Tests

failure of tests can be due to

- the program being wrong - the example itself being wrong

when we find a bug, we

- find an example that captures the bug - add it to the program’s test suite

so that if we make the same mistake again, we will catch it right away

15.2 More Refined Comparisons

Sometimes, a direct comparison via is isn’t enough for testing. We saw raises in the last section for testing errors. However, when doing some computations, especially involving math with approximations, we want to ask a different question. For example, consider these tests for distance-to-origin:

check:
  distance-to-origin(point(1, 1)) is ???
end

What can we check here? Typing this into the REPL, we can find that the answer prints as 1.4142135623730951. That’s an approximation of the real answer, which Pyret cannot represent exactly. But it’s hard to know that this precise answer, to this decimal place, and no more, is the one we should expect up front, and thinking through the answers is supposed to be the first thing we do!

Since we know we’re getting an approximation, we can really only check that the answer is roughly correct, not exactly correct. If we can check that the answer to distance-to-origin(point(1, 1)) is around, say, 1.41, and can do the same for some similar cases, that’s probably good enough for many applications, and for our purposes here. If we were calculating orbital dynamics, we might demand higher precision, but note that we’d still need to pick a cutoff! Testing for inexact results is a necessary task.

Let’s first define what we mean by “around” with one of the most precise ways we can, a function:

fun around(actual :: Number, expected :: Number) -> Boolean:
  doc: "Return whether actual is within 0.01 of expected"
  num-abs(actual - expected) < 0.01
where:
  around(5, 5.01) is true
  around(5.01, 5) is true
  around(5.02, 5) is false
  around(num-sqrt(2), 1.41) is true
end

The is form now helps us out. There is special syntax for supplying a user-defined function to use to compare the two values, instead of just checking if they are equal:

check:
  5 is%(around) 5.01
  num-sqrt(2) is%(around) 1.41
  distance-to-origin(point(1, 1)) is%(around) 1.41
end

Adding %(something) after is changes the behavior of is. Normally, it would compare the left and right values for equality. If something is provided with %, however, it instead passes the left and right values to the provided function (in this example around). If the provided function produces true, the test passes, if it produces false, the test fails. This gives us the control we need to test functions with predictable approximate results.

Exercise
Extend the definition of distance-to-origin to include polar points.

Exercise
This might save you a Google search: polar conversions. Use the design recipe to write x-component and y-component, which return the x and y Cartesian parts of the point (which you would need, for example, if you were plotting them on a graph). Read about num-sin and other functions you’ll need at the Pyret number documentation.

Exercise
Write a data definition called Pay for pay types that includes both hourly employees, whose pay type includes an hourly rate, and salaried employees, whose pay type includes a total salary for the year. Use the design recipe to write a function called expected-weekly-wages that takes a Pay, and returns the expected weekly salary: the expected weekly salary for an hourly employee assumes they work 40 hours, and the expected weekly salary for a salaried employee is 1/52 of their salary.

15.3 When Tests Fail

Suppose we’ve written the function sqrt, which computes the square root of a given number. We’ve written some tests for this function. We run the program, and find that a test fails. There are two obvious reasons why this can happen.

Do Now!
What are the two obvious reasons?

The two reasons are, of course, the two “sides” of the test: the problem could be with the values we’ve written or with the function we’ve written. For instance, if we’ve written

sqrt(4) is 1.75

then the fault clearly lies with the values (because \(1.75^2\) is clearly not \(4\)). On the other hand, if it fails the test

sqrt(4) is 2

then the odds are that we’ve made an error in the definition of sqrt instead, and that’s what we need to fix.

Note that there is no way for the computer to tell what went wrong. When it reports a test failure, all it’s saying is that there is an inconsistency between the program and the tests. The computer is not passing judgment on which one is “correct”, because it can’t do that. That is a matter for human judgment.For this reason, we’ve been doing research on peer review of tests, so students can help one another review their tests before they begin writing programs.

Actually...not so fast. There’s one more possibility we didn’t consider: the third, not-so-obvious reason why a test might fail. Return to this test:

sqrt(4) is 2

Clearly the inputs and outputs are correct, but it could be that the definition of sqrt is also correct, and yet the test fails.

Do Now!
Do you see why?

Depending on how we’ve programmed sqrt, it might return the root -2 instead of 2. Now -2 is a perfectly good answer, too. That is, neither the function nor the particular set of test values we specified is inherently wrong; it’s just that the function happens to be a relation, i.e., it maps one input to multiple outputs (that is, \(\sqrt{4} = \pm 2\)). The question now is how to write the test properly.

15.4 Oracles for Testing

In other words, sometimes what we want to express is not a concrete input-output pair, but rather check that the output has the right relationship to the input. Concretely, what might this be in the case of sqrt? We hinted at this earlier when we said that 1.75 clearly can’t be right, because squaring it does not yield 4. That gives us the general insight: that a number is a valid root (note the use of “a” instead of “the”) if squaring it yields the original number. That is, we might write a function like this:

fun is-sqrt(n):
  n-root = sqrt(n)
  n == (n-root * n-root)
end

and then our test looks like

check:
  is-sqrt(4) is true
end

Unfortunately, this has an awkward failure case. If sqrt does not produce a number that is in fact a root, we aren’t told what the actual value is; instead, is-sqrt returns false, and the test failure just says that false (what is-sqrt returns) is not true (what the test expects)—which is both absolutely true and utterly useless.

Fortunately, Pyret has a better way of expressing the same check. Instead of is, we can write satisfies, and then the value on the left must satisfy the predicate on the right. Concretely, this looks like:

fun check-sqrt(n):
  lam(n-root):
    n == (n-root * n-root)
  end
end

which lets us write:

check:
  sqrt(4) satisfies check-sqrt(4)
end

Now, if there’s a failure, we learn of the actual value produced by sqrt(4) that failed to satisfy the predicate.

Consider the following problem: given a word (such as a name), we would like to spell it using the symbols of atoms (ignoring upper- and lower-case). This function, call it elemental, consumes a string and produces a list of strings such that

each string in the output is an atomic symbol, and
the concatenation of the strings in the output yields the input.

For instance, consider my name; it can be spelled as [list: "S", "H", "Ri", "Ra", "M"] (for [FILL], respectively). Thus we would write:

check:
  elemental("Shriram") is [list: "S", "H", "Ri", "Ra", "M"]
end

Now consider another example: [FILL]. We can clearly see that this breaks down as

check:
  elemental("...") is [list: ...]
end

15.5 Testing Erroneous Programs

- use RAISES to check erroneous code

← prev up next →

1	Introduction
2	Acknowledgments
3	Getting Started
4	Naming Values
5	From Repeated Expressions to Functions
6	Conditionals and Booleans
7	Introduction to Tabular Data
8	Processing Tables
9	From Tables to Lists
10	Processing Lists
11	Introduction to Structured Data
12	Collections of Structured Data
13	Recursive Data
14	Interactive Games as Reactive Systems
15	Examples, Testing, and Program Checking
16	Functions as Data
17	Predicting Growth
18	Sets Appeal
19	Halloween Analysis
20	Sharing and Equality
21	Graphs
22	State, Change, and More Equality
23	Algorithms That Exploit State
24	Processing Programs: Parsing
25	Processing Programs: A First Look at Interpretation
26	Interpreting Conditionals
27	Interpreting Functions
28	Reasoning about Programs: A First Look at Types
29	Safety and Soundness
30	Parametric Polymorphism
31	Type Inference
32	Mutation: Structures and Variables
33	Objects: Interpretation and Types
34	Control Operations
35	Pyret for Racketeers and Schemers
36	Glossary

15.1	From Examples to Tests
15.2	More Refined Comparisons
15.3	When Tests Fail
15.4	Oracles for Testing
15.5	Testing Erroneous Programs