The term control refers to any programming language instruction that causes evaluation to proceed, because it “controls” the program counter of the machine. In that sense, sequential execution of instructions is “control”, as is even an arithmetic expression (and in the presence of state, this control is laid bare through the order in which effects occur); other forms of control found in all ordinary programming languages include function calls and returns. However, in practice we use the term to refer primarily to those operations that cause non-local transfer of control beyond that of mere functions and procedures, usually starting with exceptions. We will study such operations in this chapter.
As we study the following control operators, it’s worth remembering that even without them, we still have languages that are Turing-complete, so these control operations provide no more “power”. What control operators do, therefore, is change and potentially improve the way we express our intent, and thereby enhance the structure of programs. Thus, it pays to begin our study by focusing on program structure.
print(read-number("First number") + read-number("Second number"))
Now suppose we want to run this on a Web server. We immediately
encounter a difficulty: the structure of server-side Web programs is
such that they generate a single Web page—
Why do Web servers behave in such a strange way?
There are at least two reasons for this behavior: one perhaps historical, and the other technical. The historical reason is that Web servers were initially designed to serve pages, i.e., static content. Any program that ran had to generate its output to a file, from which a server could offer it. Naturally, developers wondered why that same program couldn’t run on demand. This made Web content dynamic. Terminating the program after generating a single piece of output was the simplest incremental step in transitioning the Web from “pages” to “programs”.
The more important reason—
Conceptually, therefore, the Web protocol was designed to be stateless: it would not store state on the server associated with intermediate computations. Instead, Web program developers would be forced to maintain all necessary state elsewhere, and each request would need to be able to resume the computation in full. In practice the Web has not proven to be stateless, but it still hews in this direction, and studying the structure of such programs is very instructive.
Communicating with a server is not, however, instantaneous: it takes some time; if the server faces a heavy load, it could take a long time; and indeed, it may never complete at all, depending on the state of the network and the server. (These are the same problems faced above by get-number, with a user taking the place of a server: the user may take a long time to enter a number, or may never do so at all.) If the send method suspended program execution, the entire (client-side) application would be blocked, indefinitely. You would not want to use such a program.
This callback needs to embody the rest of the processing of
that request. Thus, for entirely different reasons—
print(<the result from the first interaction> + read-number("Second number"))
It needs to be a syntactically valid program.
It needs to stay suspended until the request comes in.
It needs a way—such as a parameter—to refer to the value from the first interaction.
lam(v1): print(v1 + read-number("Second number")) end
On the Web, there is an additional wrinkle: each Web page with input elements needs to refer to a program stored on the Web, which will receive the data from the form and process it. This program is named in the action field of a form. Thus, imagine that the server generates a fresh label, stores the above function in a table associated with that label, and refers to the label in the action field. When (and if) the client actually submits the form the server extracts the associated function, supplies it with the form’s values, and thus resumes execution.
Is the solution above stateless?
read-number-suspend("First number",
  lam(v1):
    print(v1 + read-number("Second number"))
  end)
lam(v2): print(v1 + v2) end
read-number-suspend("First number",
  lam(v1):
    read-number-suspend("Second number",
      lam(v2):
        print(v1 + v2)
      end)
  end)
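The shape of this program can also be sketched in Python; here read_number_suspend and the canned answers list are hypothetical stand-ins for actual user interaction:

```python
# Sketch: a hypothetical read_number_suspend that hands the "input" to a
# callback instead of returning it. We simulate the user with canned answers.
answers = [3, 4]

def read_number_suspend(prompt, callback):
    # In a real Web setting this would emit a page and halt; here we
    # immediately supply the next canned answer to the callback.
    print(prompt)
    callback(answers.pop(0))

result = []

read_number_suspend("First number",
    lambda v1: read_number_suspend("Second number",
        lambda v2: result.append(v1 + v2)))

print(result[0])  # 7: the sum of the two canned "inputs"
```

Observe that, as in the program above, the second interaction is nested inside the callback of the first, so the callback embodies the entire rest of the computation.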
Ascribe types to the above computation. Also determine the type of the Web server and of the table holding these procedures.
We haven’t actually achieved statelessness yet, because we have this large table residing on the server, with no clear means to remove entries from it. It would be better if we could avoid the server state entirely. This means we have to move the relevant state to the client.
There are actually two ways in which the server holds state. One is
that we have reserved the right to create as many entries in the hash
table as we wish. This makes the server storage space proportional to
the number of interactions—
read-number-stateless("First number", prog-1)

fun prog-1(v1):
  read-number-stateless("Second number", prog-2)
end

fun prog-2(v2):
  print(v1 + v2)
end
The way to fix this problem is, instead of creating a closure after one step, to send v1 to the client to be stored there. Where do we store this? The browser offers two mechanisms for doing this: cookies and hidden fields. Which one do we use?
var cookie = "dummy initial value"

read-number-suspend("First number",
  lam(v1):
    cookie := v1
    read-number-suspend("Second number",
      lam(v2):
        print(cookie + v2)
      end)
  end)
var cookie = "dummy initial value"

read-number-stateless("First number", prog-1)

fun prog-1(v1):
  cookie := v1
  read-number-stateless("Second number", prog-2)
end

fun prog-2(v2):
  print(cookie + v2)
end
Unfortunately, this means every intermediate computation will share the same cookie variable. If we open up two concurrent windows and try to add different first numbers, the latest first number will always reside in cookie, so the other window is going to see unpredictable results.
In contrast, the Web offers another mechanism for storing information
on the client: the hidden field. Because they are local to each
page, and each page corresponds to a closure, they are precisely
analogous to a closure’s environment! Thus, instead of storing the
value of v1 in a single, global cookie, if we were to store it
in a hidden field in the response page, then two different response
pages would have different values in their hidden field, which would
be sent back to the server on the next request—
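We can simulate the difference outside the Web in a few lines of Python (with invented names): a single shared cookie variable is clobbered when two “windows” interleave, whereas closures, like hidden fields, keep each window’s first number separate:

```python
# Sketch: a single shared "cookie" versus per-interaction closures.
cookie = None

def window_step1(v1):
    global cookie
    cookie = v1          # both windows write the SAME variable

def window_step2(v2):
    return cookie + v2   # reads whatever was written last

# Window A and window B each complete their first interaction...
window_step1(1)   # window A: first number is 1
window_step1(10)  # window B: first number is 10

# ...then window A submits its second number, but sees B's first number:
print(window_step2(2))  # 12, not the 3 that window A expected

# The closure-based ("hidden field") version keeps v1 per window:
def step1_closure(v1):
    return lambda v2: v1 + v2

window_a = step1_closure(1)
window_b = step1_closure(10)
print(window_a(2))  # 3: window A's own first number survives
```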
The style of functions we’ve been writing has a name. Though we’ve
presented ideas in terms of the Web, we’re relying on a much older
idea: the functions are called continuations, and this style of
programs is called continuation-passing style
(CPS). We will take the liberty of using CPS as both a noun and a verb: a particular structure of code and the process that converts code into it. This is worth studying in its own right, because it is the basis for a variety of other non-trivial control operations.
Earlier, we converted programs so that no Web input operation was nested inside another. The motivation was simple: when the program terminates, all nested computations are lost. A similar argument applies, in a more local sense, in the case of XMLHttpRequest: any computation depending on the result of a response from a Web server needs to reside in the callback associated with the request to the server.
In fact, we don’t need to transform every expression. We only care about expressions that involve actual Web interaction. For example, if we computed a more complex mathematical expression than just addition, we wouldn’t need to transform it. If, however, we had a function call, we’d either have to be absolutely certain the function didn’t have any Web invocations inside it, or in the functions it invokes, or the ones they invoke...or else, to be defensive, we should transform them all. Therefore, we have to transform every expression that we can’t be sure performs no Web interactions.
The heart of our transformation is therefore to turn every function, f, into one with an extra argument. This extra argument is the continuation, which represents the rest of the computation. Instead of returning a value, f passes the value it would have returned to its continuation. Thus, the continuation is itself a function of one argument; this argument represents the value that would have been returned by f. A function returns a value to “pass it to the rest of the computation”; CPS makes this explicit, because invoking a continuation (in place of returning a value) precisely passes it to the function representing the rest of the computation.
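As a small Python sketch of this pattern (names invented for illustration), here is a function in direct style alongside CPS versions, where the extra argument k receives what would have been the return value:

```python
def add_direct(x, y):
    return x + y            # direct style: the result is "returned"

def add_cps(x, y, k):
    k(x + y)                # CPS: pass the result to the continuation

def square_cps(x, k):
    k(x * x)

# Compute (3 + 4) squared by threading "the rest of the computation"
# explicitly: each continuation says what to do with the value it receives.
result = []
add_cps(3, 4, lambda s: square_cps(s, lambda sq: result.append(sq)))
print(result[0])  # 49
```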
CPS is a general transformation, which we can apply to any program. Because it’s a program transformation, we can think of it as a special kind of desugaring that transforms programs within the same language: from the full language to a more restricted version that obeys the pattern we’ve been discussing. As a result, we can reuse an evaluator for the full language to also evaluate programs in the CPS subset.
Let us therefore implement CPS as a source-to-source transformation. Thought of as a function, it consumes and returns ExprC expressions, but the output expressions will have the peculiar structure we have seen above, and will therefore be a strict subset of all ExprC expressions.
Put differently, the comment about “strict subset” above means that certain ExprC expressions are not legal in the output that CPS generates. Provide examples.
fun cps(e :: ExprC) -> ExprC:
cases (ExprC) e:
Our representation in CPS will be to turn every expression into a procedure of one argument, the continuation. The converted expression will eventually either supply a value to the continuation or pass the continuation on to some other expression that will do so.
| numC(_) => fdC("k", appC(idC("k"), e))
| idC(_) => fdC("k", appC(idC("k"), e))
Extend the language to handle conditionals.
Extend the language to support mutable state as well. Does this have any impact on the CPS process, i.e., does it change the pattern of conversion?
| plusC(l, r) =>
  fdC("k",
    appC(cps(l),
      fdC("l-v",
        appC(cps(r),
          fdC("r-v",
            appC(idC("k"),
              plusC(idC("l-v"), idC("r-v"))))))))
Finally, we have function definition and application.
It’s tempting to think that, because functions are just values, they too can be passed unchanged to the continuation. Why is this not true?
Before proceeding, alter the underlying language to also permit two-argument function definitions and, correspondingly, applications. Name the definitions fd2C and the applications app2C.
| appC(f, a) =>
  fdC("k",
    appC(cps(f),
      fdC("f-v",
        appC(cps(a),
          fdC("a-v",
            appC(idC("k"),
              appC(idC("f-v"), idC("a-v"))))))))
Do you see why this is wrong?
| appC(f, a) =>
  fdC("k",
    appC(cps(f),
      fdC("f-v",
        appC(cps(a),
          fdC("a-v",
            app2C(idC("f-v"), idC("a-v"), idC("k")))))))
A function is itself a value, so it should be returned to the pending computation. The application case above, however, shows that we have to transform functions to take an extra argument, namely the continuation at the point of invocation. This leaves us with a quandary: which continuation do we supply to the body?
| fdC(v, b) =>
  fdC("k",
    appC(idC("k"),
      fd2C(v, "dyn-k",
        appC(cps(b), idC(???)))))
That is, in place of ???, which continuation do we supply: k or dyn-k?
Which continuation should we supply?
The former is the continuation at the point of closure creation. The latter is the continuation at the point of closure invocation. In other words, the former is “static” and the latter is “dynamic”. In this case, we need to use the dynamic continuation; otherwise something very strange would happen: the program would return to the point where the closure was created, rather than where it is being used! This would result in seemingly very strange program behavior, so we wish to avoid it. Observe that we are consciously choosing the dynamic continuation just as, where scope was concerned, we chose the static environment, but where state was concerned, we chose the “dynamic” (namely, most recent) store. Thus continuations are more like state than they are like lexical binding, a similarity we will return to later [REF].
| fdC(v, b) =>
  fdC("k",
    appC(idC("k"),
      fd2C(v, "dyn-k",
        appC(cps(b), idC("dyn-k")))))
After you have understood this material, replace "dyn-k" with "k", predict what should change, and check that it does.
fun icps(e):
  id-cps = fdC("v", idC("v"))
  interp(appC(cps(e), id-cps), mt-env)
end
icps(plusC(numC(5), appC(quad, numC(3)))) is numV(17)
icps(multC(appC(c5, numC(3)), numC(4))) is numV(20)
icps(plusC(numC(10), appC(c5, numC(10)))) is numV(15)
fdC("k",
  appC(
    fdC("k", appC(idC("k"), numC(1))),
    fdC("l-v",
      appC(
        fdC("k", appC(idC("k"), numC(2))),
        fdC("r-v",
          appC(idC("k"),
            plusC(idC("l-v"), idC("r-v"))))))))
f1 = lam(k):
  (lam(shadow k): k(1) end)(
    lam(l-v):
      (lam(shadow k): k(2) end)(
        lam(r-v): k(l-v + r-v) end)
    end)
end
check:
  f1(lam(x): x end) is 3
end
f2 = lam(k):
  (lam(k1): k1(1) end)(
    lam(l-v):
      (lam(k2): k2(2) end)(
        lam(r-v): k(l-v + r-v) end)
    end)
end

check:
  f2(lam(x): x end) is 3
end
(lam(k1): k1(1) end)(...)
There is an active line of research in creating better CPS transformations that produce fewer intermediate function terms; we’ve actually used one of the very oldest and least sophisticated. The trade-off is in simplicity of desugaring versus simplicity of output, with the two roughly inversely correlated.
At this point we have identified a problem in program structure; we hypothesized a better API for it; we transformed an example to use such an API; and then we generalized that transformation. But now we have a program structure so complex that it is unclear what use it could possibly be. The point of this transformation was to ensure that every sub-expression has an associated continuation, which an interaction-friendly primitive can use. Let’s see how to do that.
| read-numC(p :: ExprC)
| read-num-webC(p :: ExprC, k :: ExprC)
We will assume that cps does not need to handle read-num-webC (because the end-user is not expected to write this directly), while interp does not need to handle read-numC (because we want this interpreter to function even in a setting that periodically terminates input, so it cannot block waiting for a response).
| read-numC(p) =>
  fdC("k",
    appC(cps(p),
      fdC("p-v",
        read-num-webC(idC("p-v"), idC("k")))))
Now let us build an implementation of read-num-webC in the interpreter that properly simulates a program that halts.
var web-continuation = "nothing here yet"
| read-num-webC(p, k) =>
  prompt = num-to-string(interp(p, nv).n)
  cont = interp(k, nv)
  print('Web interaction: ' + prompt)
  web-continuation := cont
  raise('Program halted waiting for user input')
Introduce an error in cps and show how halting the program highlights it, while not doing so silently masks it.
"Program halted waiting for user input"
At this point, web-continuation contains a genuine, run-time closure (a closV value). This represents a continuation: a program value representing the rest of the computation. (Due to a bug in the current implementation, you can’t inspect the value of web-continuation directly; but you can access it from a function that closes over it.) The user now supplies an input in the imagined Web form; this is provided as the actual argument to the continuation.
fun run-wc(n):
  wc = web-continuation
  interp(appC(wc.f, numC(n)), wc.e)
end
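The halt-and-resume protocol can be sketched compactly in Python (the names and the Halted exception are invented for illustration): the “interpreter” stashes the continuation and raises to simulate program termination, and a later “request” resumes it:

```python
# Sketch of halt-and-resume: the continuation is saved in a global before
# the program is torn down, so a later input can restart the computation.
web_continuation = None

class Halted(Exception):
    pass

def read_num_web(prompt, k):
    global web_continuation
    print("Web interaction: " + prompt)
    web_continuation = k          # save the rest of the computation
    raise Halted("Program halted waiting for user input")

def run_wc(n):
    # "Form submission": resume the saved continuation with the input.
    return web_continuation(n)

out = []
try:
    read_num_web("First number",
        lambda v1: out.append(v1 + 10))
except Halted:
    pass                          # the "server" has halted; state is saved

run_wc(5)
print(out[0])  # 15
```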
Here, then, is the key lesson. By transforming the program into CPS
we were able to write a normal-looking
Modify the program to store each previous continuation with some kind of unique tag. Now that you have access to multiple continuations, simulate the effect of different browser actions such as reloading the page (re-invoking a continuation), going back (using a prior continuation), cloning a page (re-using a continuation), etc. Does your implementation still work?
Now that we’ve seen how CPS can be implemented through desugaring, we should ask whether it can be put in the core instead.
Recall that we’ve said that CPS applies to all programs. We have one program we are especially interested in: the interpreter. Sure enough, we can apply the CPS transformation to it, making available what are effectively the same continuations.
fun interp(e :: ExprC, nv :: List<Binding>, k):
cases (ExprC) e:
Note that we have not annotated k, and we’ve dropped the return annotation on interp. Fill them in.
| numC(n) => k(numV(n))
| idC(s) => k(lookup(s, nv))
| plusC(l, r) =>
  interp(l, nv,
    lam(l-v):
      interp(r, nv,
        lam(r-v):
          k(plus-v(l-v, r-v))
        end)
    end)
| fdC(_, _) => k(closV(e, nv))
| fd2C(_, _, _) => k(closV(e, nv))
| appC(f, a) =>
  interp(f, nv,
    lam(clos-v):
      interp(a, nv,
        lam(arg-v):
          interp(clos-v.f.body,
            xtnd-env(bind(clos-v.f.arg, arg-v), clos-v.e),
            k)
        end)
    end)
| app2C(f, a1, a2) =>
  interp(f, nv,
    lam(clos-v):
      interp(a1, nv,
        lam(arg1-v):
          interp(a2, nv,
            lam(arg2-v):
              interp(clos-v.f.body,
                xtnd-env(bind(clos-v.f.arg1, arg1-v),
                  xtnd-env(bind(clos-v.f.arg2, arg2-v), clos-v.e)),
                k)
            end)
        end)
    end)
By converting the interpreter to CPS we have given it access to an extra parameter: k, the continuation of the interpreter. Because the interpreter’s execution mimics the intended behavior of the interpreted program, the continuation of the interpreter reflects the rest of the behavior of the interpreted program: i.e., applying interp to an expression e with continuation k will result in k being given the value of e. We can therefore put k to work by exposing it to programs being interpreted.
| read-numC(p) =>
  interp(p, nv,
    lam(p-v):
      prompt = num-to-string(p-v.n)
      print('Web interaction: ' + prompt)
      web-continuation := k
      raise('Program halted waiting for user input')
    end)
fun run-wc(n):
  web-continuation(numV(n))
end
"Program halted waiting for user input"
When using CPS, the hard work was actually done in the program transformation. The interpreter as a whole was essentially unchanged from before; indeed, the main addition to the interpreter was effectively debugging support in the form of halting its execution, so we could make sure the continuation strategy was correct. Here, the transformation is of the interpreter itself, done one time, and the interpreter works to generate the continuations.
In particular, the continuation now closes over the rest of the behavior, not of the interpreted program but the interpreting one. Because the latter’s job, however, is to precisely mimic that of the former, we cannot observe this difference.
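To make the idea concrete outside our setting, here is a sketch of a tiny CPS’d interpreter in Python; the tuple-based expression representation and names are invented for illustration, not the chapter’s actual datatypes:

```python
# A minimal CPS'd interpreter sketch: expressions are tuples, environments
# are dicts, and every call to interp takes the continuation k as an extra
# argument; instead of returning, each case hands its value to k.
def interp(e, nv, k):
    tag = e[0]
    if tag == "num":
        k(e[1])
    elif tag == "id":
        k(nv[e[1]])
    elif tag == "plus":
        _, l, r = e
        interp(l, nv,
            lambda lv: interp(r, nv,
                lambda rv: k(lv + rv)))
    elif tag == "lam":
        _, arg, body = e
        k(("clos", arg, body, nv))   # closures are values, passed to k
    elif tag == "app":
        _, f, a = e
        interp(f, nv,
            lambda clos: interp(a, nv,
                lambda av: interp(clos[2],
                                  {**clos[3], clos[1]: av},
                                  k)))

result = []
# ((lam x: x + 1) applied to 4) + 2
prog = ("plus",
        ("app", ("lam", "x", ("plus", ("id", "x"), ("num", 1))),
                ("num", 4)),
        ("num", 2))
interp(prog, {}, lambda v: result.append(v))
print(result[0])  # 7
```

Because k is an ordinary parameter, a primitive added to this interpreter could stash k away and abort, exactly as read-numC does above.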
Many programming languages now have a notion of generators. A generator is like a procedure, in that one can invoke it in an application. Whereas a regular procedure always begins execution at the beginning, a generator resumes from where it last left off. Of course, that means a generator needs a notion of “exiting before it’s done”. This is known as yielding, namely returning control to whatever called it.
In some languages a generator is an object that is instantiated like any other object, and its execution is resumed by invoking a method (such as next in Python). In others it is just like a procedure, and indeed it is re-entered by applying it like a function. (In languages where values other than regular procedures can be used in an application, all such values are collectively called applicables.)
In some languages the yielding operation—such as Python’s yield—is available only inside the syntactic body of the generator. In others, such as Racket, yield is an applicable value bound in the body, but by virtue of being a value, it can be passed to abstractions, stored in data structures, and so on.
(generator (yield) (from)
  (rec (f (lambda (n)
            (seq
              (yield n)
              (f (+ n 1)))))
    (f from)))
(generator (y) (from)
  (rec (f (lambda (n)
            (seq
              (y n)
              (f (+ n 1)))))
    (f from)))
(generator (y) (from)
  (rec (f (lam (n)
            (seq
              ((yield-helper y) n)
              (f (+ n 1)))))
    (f from)))
Is yield a statement or expression? In many languages it is actually an expression, meaning it has a value: the one supplied when resuming the generator. This makes the generator more flexible because the user of a generator can use the parameter(s) to alter the generator’s behavior, rather than being forced to use state to communicate desired changes.
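Python illustrates this directly: yield is an expression, and its value is whatever the resumer supplies via the generator’s send method:

```python
def counter(start):
    n = start
    while True:
        step = yield n        # yield is an expression: its value is
        n = n + (step or 1)   # whatever the caller passed to send()

g = counter(10)
print(next(g))      # 10: run to the first yield
print(g.send(5))    # 15: that yield evaluated to 5, so we stepped by 5
print(next(g))      # 16: next() sends None, so we default to a step of 1
```

Thus the resumer can steer the generator without any shared mutable state.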
What happens at the end of the generator’s execution? In many languages, a generator raises an exception to signal its completion.
remember where in its execution it currently is, and
know where in its caller it should return to.
remember where in its execution its caller currently is, and
know where in its body it should return to.
As you might guess, these “where”s correspond to continuations.
Add generators to a CPS interpreter.
How do generators differ from coroutines and threads? Implement coroutines and threads using a similar strategy.
We have seen that Python’s generators do not permit any abstraction over yielding, whereas Racket’s do. Assuming this was intentional, why might Python have made such a design decision?
It’s a record of what remains to be done in the computation. So is the continuation.
It’s traditionally thought of as a list of stack frames. That is, each frame has a reference to the frames remaining after it finishes. Similarly, each continuation is a small procedure that refers to—and hence closes over—its own continuation. If we had chosen a different representation for program instructions, combining this with the data structure representation of closures, we would obtain a continuation representation that is essentially the same as the machine stack.
Each stack frame also stores procedure parameters. This is implicitly managed by the procedural representation of continuations, whereas this was done explicitly in the data structure representation (using bind).
Each frame also has space for “local variables”. In principle so does the continuation, though by desugaring local binding, we’ve effectively reduced everything to procedure parameters. Conceptually, however, some of these are “true” procedure parameters while others are local bindings turned into procedure parameters by desugaring.
The stack has references to, but does not close over, the heap. Thus changes to the heap are visible across stack frames. In precisely the same way, closures refer to, but do not close over, the store, so changes to the store are visible across closures.
Let’s use k to refer to the stack present before the function application begins to evaluate.
When we begin to evaluate the function position (f), we create a new stack frame, fdC("f-v", ...). This frame has one free identifier: k. Thus its closure needs to record one element of the environment, namely the rest of the stack.
The code portion of the stack frame represents what is left to be done once we obtain a value for the function: evaluate the argument, and perform the application, and return the result to the stack expecting the result of the application: k.
When evaluation of f completes, we begin to evaluate a, which also creates a stack frame: fdC("a-v", ...). This frame has two free identifiers: k and f-v. This tells us:
We no longer need the stack frame for evaluating the function position, but
we now need a temporary that records the value—hopefully a function value—of evaluating the function position.
The code portion of this second frame also represents what is left to be done: invoke the function value with the argument, in the stack expecting the value of the application.
Similarly, examining the CPS conversion of conditionals tells us that to evaluate the conditional expression we have to create a new stack frame. This frame closes over the stack expecting the value of the entire conditional. The frame makes a decision based on the value of the conditional expression, and invokes one of the other expressions. Once we have examined this value, the frame created to evaluate the conditional expression is no longer necessary, so evaluation can proceed in the original continuation.
Viewed through this lens, we can more easily provide an operational explanation for generators. Each generator has its own private stack, and when execution attempts to return past its end, our implementation raises an error. On invocation, a generator stores a reference to the stack of the “rest of the program”, and resumes its own stack. On yielding, the system swaps references to stacks. Coroutines, threads, and generators are all conceptually similar: they are all mechanisms to create “many little stacks” instead of having a single, global stack.
Observe that the stack patterns above add a frame to the current stack, perform some evaluation, and eventually always return to the current stack. In particular, observe that in an application, we need stack space to evaluate the function position and then the arguments, but once all these are evaluated, we resume computation using the stack we started out with before the application. In other words, function calls do not themselves need to consume stack space: we only need space to compute the arguments.
However, not all languages observe or respect this property. In languages that do, programmers can use recursion to obtain iterative behavior: i.e., a sequence of function calls can consume no more stack space than no function calls at all. This removes the need to create special looping constructs; indeed, loops can simply be expressed as syntactic sugar.
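Python itself does not respect this property, but a trampoline sketch (names invented for illustration) shows how a tail-recursive “loop” can nonetheless run in constant stack: a tail call returns a zero-argument thunk instead of calling, and a driver loop bounces on thunks.

```python
# Trampoline sketch: a tail call is represented as a zero-argument thunk,
# and the driver loop invokes thunks until a plain value appears, so the
# "recursion" never grows the Python call stack.
def trampoline(thunk):
    r = thunk
    while callable(r):
        r = r()
    return r

def count_down(n, acc):
    if n == 0:
        return acc
    # tail call: return a thunk instead of recurring directly
    return lambda: count_down(n - 1, acc + n)

# 100000 "recursive calls", far beyond Python's default recursion limit:
print(trampoline(lambda: count_down(100000, 0)))  # 5000050000
```

This is one way CPS’d programs, whose calls are all tail calls, can be executed even on hosts without proper tail-call support.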
Of course, this property does not apply in general. If a call to
f is performed to compute an argument to a call to g,
the call to f is still consuming space relative to the context
surrounding g. Thus, we should really speak of a relationship
between expressions: one expression is in tail position relative
to another if its evaluation requires no additional stack space beyond
the other. In our CPS desugaring, every expression that uses k as its continuation is in tail position.
Some languages have special support for tail recursion: when a procedure calls itself in tail position relative to its body. This is obviously useful, because it enables recursion to efficiently implement loops. However, it hurts “loops” that cannot be squeezed into a single recursive function. For instance, when implementing a scanner or other state machine, it is most convenient to have a set of functions each representing one state, and transitioning to other states by making (tail) function calls. It is onerous (and misses the point) to turn these into a single recursive function. If, however, a language recognizes tail calls as such, it can optimize these cross-function calls just as much as it does intra-function ones.
Scheme and Racket, in particular, promise to implement tail calls without allocating additional stack space. Though some people refer to this as “tail call optimization”, this term is misleading: an optimization is optional, whereas whether or not a language promises to properly implement tail calls is a semantic feature. Developers need to know how the language will behave because it affects how they program: they need to know how to structure their loops!
Because of this feature, observe something interesting about the program after CPS transformation: all of its function applications are themselves tail calls! Assuming the program might terminate at any call is tantamount to not using any stack space at all (because the stack would get wiped out).
Any program that consumes some amount of stack, when converted to CPS and run, suddenly consumes no stack space at all. Why?
As a corollary, does conversion to CPS reduce the overall memory footprint of the program?
Java’s native security model employs a mechanism called stack inspection (look it up if you aren’t familiar with it). What is the interaction between CPS and stack inspection? That is, if we were to CPS a program, would this affect its security behavior?
If not, why not?
If so, how, and what would you suggest doing to recover security assuming the CPS conversion was necessary?