17 Safety and Soundness
Now that we’ve had a first look at a type system,A type system usually has three components: a language of types, a set of type rules, and an algorithm that enforces these rules. By presenting types via a checking function we have blurred the distinction between the second and third of these, but they should still be thought of as intellectually distinct: the former provides a declarative description while the latter an executable one. The distinction becomes relevant in (part "impl-subtyp"). we’re ready to talk about the way in which types offer some notion of predictability, a notion called type soundness. Intertwined with this are terms you have probably heard like safety (or type safety) as well as others such as strong typing (and conversely weak typing) and memory safety. We should understand all of them.
17.1 Safety
Many operations in a language are partial: given some domain
over which they are defined, they accept some but not all elements of
the domain. For instance, while addition—
halt with an error,
return a special value called “not-a-number” (NaN), or
return an imaginary number (e.g., 0+1i in Scheme).
What matters, then, is whether an operation precludes any values at all or not. If it does, then we can ask whether the language prevents it from being used with any precluded values. If the language does prevent it, then we call the language safe. If it does not, we call it unsafe. Of course, a language may be safe for some operations and unsafe for others; usually we apply the term “safe” to a language as a whole if all of its operations are safe.
A safe language offers a very important guarantee to programmers: that no operation will be performed on meaningless data. Sticking with numeric addition, in an unsafe language we might be permitted to add a number to a string, which will produce some value dependent on the precise representation of strings and might change if the representation of strings changes. (For instance, the string might be zero-terminated or might record its length, which alters what the first word of the string will be.) We might even be able to add a string to a function, a result that is certainly nonsensical (what meaningful number does the first word of the machine representation of a function represent?). Therefore, though safety does not at all ensure that computations will be correct, at least it ensures they will be meaningful: no nonsensical operations will have been performed.
Observe that we have not imposed any requirement whatsoever on how a language achieves safety. Usually, safety can be achieved dynamically through run-time checks: for instance, addition would check that its arguments are numeric, division would also ensure its second argument is not zero, and so on. In addition, a static type system can also ensure safety. Because this is a common source of confusion, we should be clear: safety only means that operations are not applied to meaningless values. It does not fix any particular implementation strategy for ensuring the property.
We will return to the issue of implementations below (Types, Time, and Space). But first, we have some important foundational material to address.
17.2 “Untyped” Languages
A language with no types at all. Of course all data have some representation that gives them meaning, but it means there is only one type in the language, and all data belong to that type. Furthermore, this datatype has no variants, because that would introduce type-based discrimination. For instance, all data might be a byte, or a number, or a string, or some other single, distinctive value. Typically, no operation can fail to take a particular kind of value, because that might imply some kind of type distinction, which by definition can’t exist. Note that this is a semantic notion of untypedness.
A language with a partitioning of its run-time values—
e.g., numbers are distinct from functions— but without static annotations or checking. Note that this is a syntactic notion of untypedness.
Because the two meanings are mutually contradictory, it would be useful to have two different names for these. Some people use the terms latently typed or dynamically typed for the latter category, to tell these apart.
Following modern convention, we will use the latter term, while recognizing that some others consider the term typed to only apply to languages that have static disciplines, so the phrase “dynamically typed” is regarded as an oxymoron. Note that our preceding discussion gives us a way out of this linguistic mess. A dynamically typed language that does not check types at run-time is not very interesting (the “types” may as well not exist). In contrast, one that does check at run-time already has a perfectly good name: safe (Safety). Therefore, it makes more sense to use the name dynamically safe for a language where all safety-checks are performed at run-time, and (with a little loss of precision) statically safe for one where as many safety-checks as possible are performed statically, with only the undecidable ones relegated to run-time.
17.3 The Central Theorem: Type Soundness
We have seen earlier (Program Termination) that certain
type languages can offer very strong theorems about their programs:
for instance, that all programs in the language terminate. In
general, of course, we cannot obtain such a guarantee (indeed, we
added general recursion precisely to let ourselves write unbounded
loops). However, a meaningful type system—
What theorem might we want of a type system? Remember that the type checker runs over the static program, before execution. In doing so, it is essentially making a prediction about the program’s behavior: for instance, when it states that a particular complex term has type Number, it is predicting that when run, that term will produce a numeric value. How do we know this prediction is sound, i.e., that the type checker never lies? Every type system should be accompanied by a theorem that proves this.
The type checker sees only program text, whereas the interpreter runs over actual data.
The type environment binds identifiers to types, whereas the interpreter’s environment binds identifiers to values or locations.
The type checker compresses (even infinite) sets of values into types, whereas the interpreter treats the elements of these sets distinctly.
The type checker always terminates, whereas the interpreter might not.
The type checker passes over the body of each expression only once, whereas the interpreter might pass over each body anywhere from zero to infinite times.
The central result we wish to have for a given type-system is called soundness. It says this. Suppose we are given an expression (or program) e. We type-check it and conclude that its type is t. When we run e, let us say we obtain the value v. Then v will also have type t.
The standard way of proving this theorem is to divide it in two parts, known as progress and preservation. Progress says that if a term passes the type-checker, it will be able to make a step of evaluation (unless it is already a value); preservation says that the result of this step will have the same type as the original. If we interleave these steps (first progress, then preservation; rinse and repeat), we can conclude that the final answer will indeed have the same type as the original, so the type system is indeed sound.
For instance, consider this expression: 5 + (2 * 3). It has
the type Number. In a sound type system, progress offers a
proof that, because this term types, and is not already a value, it
can take a step of execution—
The program may not produce an answer at all; it might loop forever. In this case, the theorem strictly speaking does not apply. However, we can still observe that every intermediate representation of the program has the same type as the whole expression, so the program is computing meaningfully even if it isn’t producing a value.
Any rich enough language has properties that cannot be decided statically (and others that perhaps could be, but the language designer chose to put off until run-time to reduce the burden on the programmer to make programs pass the type-checker). When one of these properties fails—
e.g., the array index being within bounds— there is no meaningful type for the program. Thus, implicit in every type soundness theorem is some set of published, permitted exceptions or error conditions that may occur. The developer who uses a type system implicitly signs on to accepting this set.
The latter caveat looks like a cop-out. However, it is actually a
strongly positive statement, in that says any exception not in this
set will provably not be raised. Of course, in languages
designed with static types in the first place, it is not clear (except
by loose analogy) what these exceptions might be, because there would
be no need to define them. But when we retrofit a type system onto an
existing programming language—
17.4 Types, Time, and Space
data Tree: | base | node(v :: Number, l :: Tree, r :: Tree) end
fun size(t :: Tree) -> Number: cases (Tree) t: | base => 0 | node(_, l, r) => 1 + size(l) + size(r) end end
Assume instead we are in a typed language. The type-checker will have ensured that there no non-Tree value could have been substituted for a Tree-typed identifier. Therefore, there is no need for the type tag at all.Type tags would, however, still be needed by the garbage collector, though other representations such as BIBOP [REF] can greatly reduce their space impact. However, the variant tags are still needed, and will be used to dispatch between the branches. In the example, only one bit is needed to tell apart base and node values. This same bit position can be reused to tell apart variants in some other type without causing any confusion, because the type checker is responsible for keeping the types from mixing.
In other words, if there are two different datatypes that each have two variants, in the dynamically-typed world all these four variants require distinct representations. In contrast, in the typed world their representations can overlap across types, because the static type system will ensure one type’s variants are never confused for that the another. Thus, types have a genuine space (saving representation) and time (eliminating run-time checks) performance benefit for programs.
It is conventional in computer science to have a ☛ space-time tradeoff. Instead, here we have a situation where we improve both space and time. This seems almost paradoxical! How is this possible?
This dual benefit comes at some cost to the developer, who must convince the static type system that their program does not induce type errors; due to the limitations of decidability, even programs that might have run without error might run afoul of the type system. Nevertheless, for programs for which this can be done, types provide a notable saving.
17.5 Types Versus Safety
Whether or not a language is typed, i.e., has static type checks.
Whether or not a language’s run-time system is safe, i.e., performs residual checks not done by a static system—
of which there might not even be one).
| Safe |
| Unsafe | |
Typed |
| ML, Java |
| C, C++ |
Not Typed |
| Python, Racket |
| machine code |
The entry for machine code is a little questionable because the language isn’t even typed, so there’s no classification to check statically. Similarly, there is arguably nothing to check for in the run-time system, so it must best be described as “not even unsafe”. However, in practice we do end up with genuine problems, such as security vulnerabilities that arise from being able to jump and execute from arbitrary locations that hold data.
That leaves the truly insidious corner, which languages like C and C++ inhabit. Here, the static type system gives the impression that values are actually segregated by type and checked for membership. And indeed they are, in the static world. However, once a programmer passes the type-checker there are no run-time checks. To compound the problem, the language offers primitives like arbitrary pointer arithmetic, making it possible to interpret data of one kind as data of another. As a result, we should have a special place of shame for languages that actively mislead programmers.
Construct examples of C or C++ interpreting data of one kind as data of another kind.
Historically, people have sometimes used the phrase strong typing to reflect the kind of type-checking that ML and Java use, and weak typing for the other kinds. However, these phrases are at best poorly defined.
If you have ever used the phrases “strong typing” or “weak typing”, define them.
That’s what I thought. But thank you for playing.
Indeed, the phrases are not only poorly defined, they are also wrong, because the problem is not with the “strength” of the type checker but rather with the nature of the run-time system that backs them. The phrases are even more wrong because they fail to account for whether or not a theorem backs the type system.
It is therefore better to express our intent by sticking to these concepts: safety, typedness, and soundness. Indeed, we should think of this as a continuum. With rare exceptions, we want a language that is safe. Often, we want a language that is also typed. If it is typed, we would like it to be sound, so that we know that the types are not lying. In all these cases, “strong” and “weak” typing do not have any useful meaning.