19 Control Operations

    19.1 Control on the Web

      19.1.1 Program Decomposition into Now and Later

      19.1.2 A Partial Solution

      19.1.3 Achieving Statelessness

      19.1.4 Interaction with State

    19.2 Continuation-Passing Style

      19.2.1 Implementation by Desugaring

      19.2.2 Implementation in the Core

    19.3 Generators

    19.4 Continuations and Stacks

    19.5 Tail Calls

The term control refers to any programming language instruction that causes evaluation to proceed, because it “controls” the program counter of the machine. In that sense, sequential execution of instructions is “control”, as is even an arithmetic expression (especially in the presence of state, since it forces an ordering on the operations); other forms of control found in all ordinary programming languages include function calls and returns. However, in practice we use the term to refer primarily to those operations that cause non-local transfer of control beyond that of mere functions and procedures, usually starting with exceptions. We will study such operations in this chapter.

As we study the following control operators, it’s worth remembering that even without them, we still have languages that are Turing-complete, so these control operations provide no more “power”. What control operators do is change, and potentially improve, the way we express our intent, and thereby enhance the structure of programs. Thus, it pays to begin our study by focusing on program structure.

19.1 Control on the Web

Let us begin our study by examining the structure of Web programs. Consider the following program, which we’ll henceforth call our “addition server”. You should, of course, understand this as a stand-in for more sophisticated applications. For instance, the two prompts might ask for starting and ending points for a trip, and in place of addition we might compute a route or compute airfares. There might even be computation between the two steps: e.g., after entering the first city, the airline might prompt us with choices of where it flies from there.

print(read-number("First number")
  + read-number("Second number"))

This is based on a hypothetical read-number function that, when run, suspends program execution, prompts for a number and, when the user enters one, resumes computation. You might find it excessively pedantic that I’d mention the suspension and resumption of computation, but these facts will prove to be absolutely central to our study, so don’t gloss over them!

Now suppose we want to run this on a Web server. We immediately encounter a difficulty: the structure of server-side Web programs is such that they generate a single Web page—such as the one asking for the first number—and then halt. As a result, the rest of the program (which in this case prompts for the second number, then adds the two, and then prints that result) is lost.

Do Now!

Why do Web servers behave in such a strange way?

There are at least two reasons for this behavior: one perhaps historical, and the other technical. The historical reason is that Web servers were initially designed to serve pages, i.e., static content. Any program that ran had to generate its output to a file, from which a server could offer it. Naturally, developers wondered why that same program couldn’t run on demand. This made Web content dynamic. Terminating the program after generating a single piece of output was the simplest incremental step in transitioning the Web from “pages” to “programs”.

The more important reason—and the one that has stayed with us—is technical. Imagine our addition server has generated its first prompt. Recall that there is considerable pending computation: the second prompt, the addition, and the display of the result. This computation must suspend waiting for the user’s input. If there are millions of users, then millions of computations must be suspended (imagine threads running in virtual machines, each consuming memory for local data), creating an enormous performance problem. Furthermore, suppose a user does not actually complete the computation—analogous to searching at an on-line bookstore or airline site, but not completing the purchase. How does the server know when or even whether to terminate the computation? Until it does, the resources associated with that computation remain in use.

Conceptually, therefore, the Web protocol was designed to be stateless: it would not store state on the server associated with intermediate computations. Instead, Web program developers would be forced to maintain all necessary state elsewhere, and each request would need to be able to resume the computation in full. In practice, the Web has not proven to be stateless at all, but it still hews largely in this direction, and studying the structure of such programs is very instructive.

Now consider client-side Web programs: those that run inside the browser, written in or compiled to JavaScript. Suppose such a computation needs to communicate with a server. The primitive for this is called XMLHttpRequest. The programmer creates an instance of this primitive and invokes its send method to send a message to the server.

Communicating with a server is not, however, instantaneous: it takes some time; if the server faces a heavy load, it could take a long time; and indeed, it may never complete at all, depending on the state of the network and the server. (The user of read-number above faces the same problem: the user may take a long time to enter a number, or may never do so at all.) If the send method suspended the program’s execution, the entire (client-side) application would be blocked, indefinitely. This would lead to a vastly degraded user experience.

To keep the application responsive, the designers of XMLHttpRequest therefore had a choice. They could make JavaScript multi-threaded, but because the language also has state, programmers would have to confront all the problems of state and concurrency. In particular, beginners would have to wrestle with a combination of features that even experienced programmers do not use well, probably resulting in numerous deadlocked Web sites.

Instead, JavaScript is single-threaded: i.e., there is only one thread of execution at a time. (Due to the structuring problems this causes, there are now various proposals to, in effect, add “safe” threads to JavaScript. The ideas described in this chapter can be viewed as an alternative that offers similar structuring benefits.) When the send method is invoked, JavaScript instead suspends the current computation and returns control to an event loop, which can now invoke other suspended computations. Developers associate a callback with the send. When (and if) a response returns, this callback is added to the queue of suspended computations, thereby enabling it to resume.

This callback needs to embody the rest of the processing of that request. Thus, for entirely different reasons—not performance, but avoiding the problems of synchronization, non-atomicity, and deadlocks—the client-side Web has evolved to demand the same pattern of developers. Let us now better understand that pattern.

19.1.1 Program Decomposition into Now and Later

Let us consider what it takes to make our addition program work in a stateless setting, such as on a Web server. First we have to determine the first interaction. This is the prompt for the first number, because Pyret evaluates arguments from left to right. It is instructive to divide the program into two parts: what happens to generate the first interaction (which can run right now), and what needs to happen after it (which must be “remembered” somehow). The former is easy:

read-number("First number")

We’ve already explained in prose what’s left, but now it’s time to write it as a program. It seems to be something like:

print(<the result from the first interaction>
  + read-number("Second number"))

A Web server can’t execute the above, however, because it evidently isn’t a program. We instead need some way of writing this as one.

Let’s observe a few characteristics of this computation:
  • It needs to be a legitimate program.

  • It needs to stay suspended until the request comes in.

  • It needs a way—such as a parameter—to refer to the value from the first interaction.

Put together these characteristics and we have a clear representation—a function:

fun(v1):
  print(v1 + read-number("Second number"))
end

19.1.2 A Partial Solution

On the Web, there is an additional wrinkle: each Web page with input elements needs to refer to a program stored on the Web, which will receive the data from the form and process it. This program is named in the action field of a form. Thus, imagine that the server generates a fresh label, stores the above function in a table associated with that label, and refers to the table in the action field. If and when the client actually submits the form, the server extracts the associated function, supplies it with the form’s values, and thus resumes execution.
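
To make this concrete, here is a rough sketch (ours, not the text’s) of the bookkeeping such a server might perform. The helpers gen-label and send-form-page are hypothetical: the first generates a fresh label, and the second emits the form page whose action field names that label, then halts.

data Suspension: susp(label :: String, resume) end

var suspended = empty   # the server-side table of pending computations

fun suspend-to-web(prompt, rest):
  label = gen-label()             # hypothetical: generates a fresh label
  suspended := link(susp(label, rest), suspended)
  send-form-page(prompt, label)   # hypothetical: emits the page whose action
                                  # field names label, then halts
end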

Do Now!

Is the solution above stateless?

Let’s imagine that we have a custom Web server that maintains the above table. In such a server, we might have a special version of read-number, call it read-number-suspend, that records the rest of the program:

read-number-suspend("First number",
  fun(v1):
    print(v1 + read-number("Second number"))
  end)

Unfortunately, this is not sufficient. The moment we perform the second read-number, we’re back to having forgotten the rest of the computation. Therefore, the second one needs to be converted to use read-number-suspend, too. What is the rest of its computation?

fun(v2):
  print(v1 + v2)
end

where v1 is the value from the first computation. Putting together the pieces, the fully-translated program is

read-number-suspend("First number",
  fun(v1):
    read-number-suspend("Second number",
      fun(v2):
        print(v1 + v2)
      end)
  end)

Notice how the inner closure depends on being nested inside the outer one, so that v1 is bound in the addition. Also observe how the addition and printing got moved from happening “immediately” after the first number was provided to waiting until the second number was also available.

Exercise

Ascribe types to the above computation. Also determine the type of the Web server and of the table holding these procedures.

19.1.3 Achieving Statelessness

We haven’t actually achieved statelessness yet, because we have this large table residing on the server, with no clear means to remove entries from it. It would be better if we could avoid the server state entirely. This means we have to move the relevant state to the client.

There are actually two ways in which the server holds state. One is that we have reserved the right to create as many entries in the hash table as we wish, rather than a constant number (i.e., linear in the size of the program itself). The other is what we’re storing in the table: honest-to-goodness closures, of which there could be an unbounded number, not limited by the program size (and each of which might be closed over an arbitrary amount of state). We’ll see this more clearly soon.

Let’s start by eliminating the closure. Instead, let’s have each of the function arguments be named, top-level functions (which immediately forces us to have only a fixed number of them, bounded by the size of the program):

read-number-stateless("First number", prog-1)

fun prog-1(v1):
  read-number-stateless("Second number", prog-2)
end

fun prog-2(v2):
  print(v1 + v2)
end

Observe how each code block refers only to the name of the next procedure, rather than to a real closure. The value of the argument comes from the form. There’s just one problem: v1 in prog-2 is a free identifier!

The way to fix this problem is, instead of creating a closure after one step, to send v1 to the client to be stored there. Where do we store this? The browser offers two mechanisms for doing this: cookies and hidden fields. Which one do we use?

19.1.4 Interaction with State

One way to avoid this problem is to find a channel of communication between what follows the first and second prompts. Recall that we have noted that state provides such a channel of communication (The Design of Stateful Language Operations). Therefore, we could use a top-level variable to communicate the value of v1. To be suggestive, we’ll call this variable cookie:

var cookie = "dummy initial value"

read-number-suspend("First number",
  fun(v1):
    cookie := v1
    read-number-suspend("Second number",
      fun(v2):
        print(cookie + v2)
      end)
  end)

from which we can eliminate closures easily:

var cookie = "dummy initial value"

read-number-stateless("First number", prog-1)

fun prog-1(v1):
  cookie := v1
  read-number-stateless("Second number", prog-2)
end

fun prog-2(v2):
  print(cookie + v2)
end

Unfortunately, this means every intermediate computation will share the same cookie variable. If we open up two concurrent windows and try to add different first numbers, the latest first number will always reside in cookie, so the other window is going to see unpredictable results.

This, of course, is precisely what happens on the Web. (These problems are not hypothetical. For instance, see Section 2 of Modeling Web Interactions and Errors.) The browser’s cookies are merely a client-side implementation of the store. Thus, Web sites that store their information in cookies are susceptible to exactly this problem: two concurrent interactions with the site will end up interfering with one another. Therefore, the pervasive use of cookies on Web sites, induced by Web programming traditions, results in actively less usable sites.

In contrast, the Web offers another mechanism for storing information on the client: the hidden field. These are local to each page, and hence not shared across pages. In fact, they are therefore precisely analogous to the environment! Thus, instead of storing the value of v1 in a single, global cookie, if we were to store it in a hidden field in the response page, then two different response pages would have different values in their hidden field, which would be sent back to the server on the next request—thereby avoiding the interference problem entirely.
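
As a sketch of this idea (ours, not the text’s), imagine a hypothetical variant, read-number-stateless-with-data, that embeds an extra piece of data as a hidden field in the page it generates and hands that data back, alongside the user’s input, to the named resumption function:

read-number-stateless-with-data("First number", 0, prog-1)

fun prog-1(v1, unused):
  # v1 travels to the client as a hidden field, not as server state
  read-number-stateless-with-data("Second number", v1, prog-2)
end

fun prog-2(v2, hidden-v1):
  print(hidden-v1 + v2)
end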

19.2 Continuation-Passing Style

The style of functions we’ve been writing has a name. Though we’ve presented ideas in terms of the Web, we’re relying on a much older idea: the functions are called continuations, and this style of program is called continuation-passing style (CPS). (We will take the liberty of using CPS as both a noun and a verb: a particular structure of code, and the process that converts code into it.) This is worth studying in its own right, because it is the basis for studying a variety of other non-trivial control operations—such as generators.

Earlier, we converted programs so that no Web input operation was nested inside another. The motivation was simple: when the program terminates, all nested computations are lost. A similar argument applies, in a more local sense, in the case of XMLHttpRequest: any computation depending on the result of a response from a Web server needs to reside in the callback associated with the request to the server.

In fact, we don’t need to transform every expression. We only care about expressions that involve actual Web interaction. For example, if we computed a more complex mathematical expression than just addition, we wouldn’t need to transform it. If, however, we had a function call, we’d either have to be absolutely certain the function didn’t have any Web invocations inside it, or in the functions it invokes, or the ones they invoke...or else, to be defensive, we should transform them all. Therefore, we have to transform every expression that we can’t be sure performs no Web interactions.

The heart of our transformation is therefore to turn every function, f, into one with an extra argument. This extra argument is the continuation, which represents the rest of the computation. Instead of returning a value, f passes the value it would have returned to its continuation. Thus, the continuation is itself a function of one argument; this argument represents the value that would have been returned by f. A function returns a value to “pass it to the rest of the computation”; CPS makes this explicit, because invoking a continuation (in place of returning a value) precisely passes it to the function representing the rest of the computation.
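
As a tiny illustration (our own, not from the text), compare a direct-style function with a continuation-passing version of it:

fun square(x): x * x end

fun square-cps(x, k): k(x * x) end

# instead of print(square(5)), we write:
square-cps(5, fun(v): print(v) end)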

CPS is a general transformation, which we can apply to any program. Because it’s a program transformation, we can think of it as a special kind of desugaring: in particular, instead of transforming programs from a larger language to a smaller one (as macros do), or from one language to entirely another (as compilers do), it transforms programs within the same language: from the full language to a more restricted version that obeys the pattern we’ve been discussing. As a result, we can reuse an evaluator for the full language to also evaluate programs in the CPS subset.

19.2.1 Implementation by Desugaring

Let us therefore implement CPS as a source-to-source transformation. Thought of as a function, it consumes and returns ExprC expressions, but the output expressions will have the peculiar structure we have seen above, and will therefore be a strict subset of all ExprC expressions.

Exercise

Put differently, the comment about “strict subset” above means that certain ExprC expressions are not legal in the output of CPS. Provide examples.

<cps-trans> ::=

    fun cps(e :: ExprC) -> ExprC:
      cases (ExprC) e:
        <cps-trans-numC>
        <cps-trans-plusC>
        <cps-trans-idC>
        <cps-trans-fdC>
        <cps-trans-appC>
      end
    end

Our representation in CPS will be to turn every expression into a procedure of one argument, the continuation. The converted expression will eventually either supply a value to the continuation or will pass the continuation on to some other expression that will—by preserving this invariant inductively—supply it with a value. Applied to Pyret, all output from CPS will look like fun (k): ... ;. Since we are applying CPS to Paret instead, it will look like fdC("k", ...). Either way, note that lexical scope keeps these k’s from clashing with any other identifiers of the same name.

First let’s dispatch the easy case, which is atomic values. Because we already have a value, we are ready to “return” it, which we do by supplying it to the continuation:
<cps-trans-numC> ::=

    | numC(_) => fdC("k", appC(idC("k"), e))

and similarly:
<cps-trans-idC> ::=

    | idC(_) => fdC("k", appC(idC("k"), e))
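
For instance (a check we add here, not in the text), the converted version of an atomic expression is a function that hands that expression straight to whatever continuation it receives:

check:
  cps(numC(5)) is fdC("k", appC(idC("k"), numC(5)))
  cps(idC("x")) is fdC("k", appC(idC("k"), idC("x")))
end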

Exercise

Extend the language to handle conditionals.

Exercise

Extend the language to support mutable state as well. Does this have any impact on the CPS process, e.g., does it change the pattern of conversion?

Next, let’s handle binary operators. We have seen the essence of this transformation earlier, applied to the Web:
<cps-trans-plusC> ::=

    | plusC(l, r) =>
      fdC("k",
        appC(cps(l),
          fdC("l-v",
            appC(cps(r),
              fdC("r-v",
                appC(idC("k"), plusC(idC("l-v"), idC("r-v"))))))))

This assumes that the primitive operator, in this case addition, does not itself need to be transformed; on the Web, for instance, it’s a safe bet that performing arithmetic does not involve any Web interactions. (Unless, of course, the arithmetic is part of a cryptographic algorithm, in which case it may be necessary to notify the NSA of the results.)
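
To see the pattern at work, here is what the transformation produces for a tiny addition (a check we add here, not in the text):

check:
  expected = fdC("k",
    appC(fdC("k", appC(idC("k"), numC(1))),
      fdC("l-v",
        appC(fdC("k", appC(idC("k"), numC(2))),
          fdC("r-v",
            appC(idC("k"), plusC(idC("l-v"), idC("r-v"))))))))
  cps(plusC(numC(1), numC(2))) is expected
end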

Finally, we have function definition and application.

Do Now!

It’s tempting to think that, because functions are just values, they too can be passed unchanged to the continuation. Why is this not true?

Exercise

Before proceeding, alter the underlying language to also permit two-argument function definitions and, correspondingly, applications. Name the definitions fd2C and the applications app2C.

For an application we have to evaluate both the function and argument expressions. Once we’ve obtained these, we are ready to apply the function. Therefore, it is tempting to write
<cps-trans-appC-try-1> ::=

    | appC(f, a) =>
      fdC("k",
        appC(cps(f),
          fdC("f-v",
            appC(cps(a),
              fdC("a-v",
                appC(idC("k"), appC(idC("f-v"), idC("a-v"))))))))

Do Now!

Do you see why this is wrong?

The problem is that, though the function is a value, that value is a closure with a potentially complicated body: evaluating the body can, for example, result in further Web interactions, at which point the rest of the function’s body, as well as the pending k(...) (i.e., the rest of the program), will all be lost. To avoid this, we have to supply k to the function’s value, and let the inductive invariant ensure that k will eventually be invoked with the value of applying f-v to a-v:
<cps-trans-appC> ::=

    | appC(f, a) =>
      fdC("k",
        appC(cps(f),
          fdC("f-v",
            appC(cps(a),
              fdC("a-v",
                app2C(idC("f-v"), idC("a-v"), idC("k")))))))

A function is itself a value, so it should be returned to the pending computation. The application case above, however, shows that we have to transform functions to take an extra argument, namely the continuation at the point of invocation. This leaves us with a quandary: which continuation do we supply to the body?

<cps-trans-fdC-try-1> ::=

    | fdC(v, b) =>
      fdC("k",
        appC(idC("k"),
          fd2C(v, "dyn-k",
            appC(cps(b), ???))))

That is, in place of ???, which continuation do we supply: k or dyn-k?

Do Now!

Which continuation should we supply?

The former is the continuation at the point of closure creation. The latter is the continuation at the point of closure invocation. In other words, the former is “static” and the latter is “dynamic”. In this case, we need to use the dynamic continuation: otherwise, the program would return to the point where the closure was created rather than to where it is being used, which would be very strange behavior indeed, so we wish to avoid it. Observe that we are consciously choosing the dynamic continuation just as, where scope was concerned, we chose the static environment.

<cps-trans-fdC> ::=

    | fdC(v, b) =>
      fdC("k",
        appC(idC("k"),
          fd2C(v, "dyn-k",
            appC(cps(b), idC("dyn-k")))))

Testing any code converted to CPS is slightly annoying because all CPS terms expect a continuation. In a real environment, the initial continuation is one that simply either (a) consumes a value and returns it, or (b) consumes a value and prints it, or (c) consumes a value, prints it, and gets ready for another computation (as the prompt in a REPL does). All three of these are effectively just the identity function in various guises. Thus, the following definition is helpful for testing:

fun icps(e):
  id-cps = fdC("v", idC("v"))
  interp(appC(cps(e), id-cps), mt-env)
end

For instance,

icps(plusC(numC(5), appC(quad, numC(3)))) is numV(17)
icps(multC(appC(c5, numC(3)), numC(4))) is numV(20)
icps(plusC(numC(10), appC(c5, numC(10)))) is numV(15)

Lest you get lost in all this code, let me highlight the important lesson here: we’ve recovered our code structure. That is, we can write the program in direct style, with properly nested expressions, and a compiler—in this case, the CPS converter—takes care of making it work with a suitable underlying API. This is what good programming languages ought to do!

19.2.2 Implementation in the Core

Now that we’ve seen how CPS can be implemented through desugaring, we should ask whether it can be put in the core instead.

Recall that we’ve said that CPS applies to all programs. We have one program we are especially interested in: the interpreter. Sure enough, we can apply the CPS transformation to it, making available what are effectively the same continuations.

Rather than mindlessly applying the transformation, which would result in a very unwieldy (and unreadable) interpreter, we’ll clean things up a little as we go. Note first of all that the interpreter needs to take an additional argument, representing the rest of the computation:
<cps-interp> ::=

    fun interp(e :: ExprC, nv :: List<Binding>, k):
      cases (ExprC) e:
        <cps-interp-numC>
        <cps-interp-plusC>
        <cps-interp-idC>
        <cps-interp-fdC/fd2C>
        <cps-interp-appC>
        <cps-interp-app2C>
      end
    end

Exercise

Note that we have not annotated k, and we’ve dropped the return annotation on interp. Fill them in.

For constant values, we simply return them through the continuation:
<cps-interp-numC> ::=

    | numC(n) =>
      k(numV(n))

<cps-interp-idC> ::=

    | idC(s) =>
      k(lookup(s, nv))

For binary operations where the operator is a primitive, we have to follow the CPS pattern:
<cps-interp-plusC> ::=

    | plusC(l, r) =>
      interp(l, nv,
        fun(l-v):
          interp(r, nv,
            fun(r-v):
              k(plus-v(l-v, r-v)););)

Note that CPS also ends up enforcing an order of evaluation (in this case, left-to-right), just as mutation did.

For function definitions, we have to be careful. Earlier (<cps-trans-fdC>), we added a continuation parameter to closures. However, the fdC data structures are merely data; it is functions like interp that need to be given the extra parameter. Therefore, we can leave these alone:
<cps-interp-fdC/fd2C> ::=

    | fdC(_, _) =>
      k(closV(e, nv))
    | fd2C(_, _, _) =>
      k(closV(e, nv))

Finally, applications have to be converted to CPS as we have seen before:
<cps-interp-appC> ::=

    | appC(f, a) =>
      interp(f, nv,
        fun(clos-v):
          interp(a, nv,
            fun(arg-v):
              interp(clos-v.f.body,
                xtnd-env(bind(clos-v.f.arg, arg-v), clos-v.e),
                k););)

and similarly when there are two arguments:
<cps-interp-app2C> ::=

    | app2C(f, a1, a2) =>
      interp(f, nv,
        fun(clos-v):
          interp(a1, nv,
            fun(arg1-v):
              interp(a2, nv,
                fun(arg2-v):
                  interp(clos-v.f.body,
                    xtnd-env(bind(clos-v.f.arg1, arg1-v),
                      xtnd-env(bind(clos-v.f.arg2, arg2-v),
                        clos-v.e)),
                    k);););)
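
To run this interpreter, we again need to supply an initial continuation, and as before the identity function will do. Here is a small driver and test of our own; it reuses mt-env, the constructors, and plus-v from the surrounding interpreter, and assumes plus-v adds two numV values:

fun interp-top(e):
  interp(e, mt-env, fun(v): v end)
end

check:
  interp-top(plusC(numC(1), numC(2))) is numV(3)
end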

19.3 Generators

Many programming languages now have a notion of generators. A generator is like a procedure, in that one can invoke it in an application. Whereas a regular procedure always begins execution at the beginning, a generator resumes from where it last left off. Of course, that means a generator needs a notion of “exiting before it’s done”. This is known as yielding, namely returning control to whatever called it.

There are many variations among generators. The points of variation, predictably, have to do with how to enter and exit a generator:
  • In some languages a generator is an object that is instantiated like any other object, and its execution is resumed by invoking a method (such as next in Python). In others it is just like a procedure, and indeed it is re-entered by applying it like a function. (In languages where values in addition to regular procedures can be used in an application, all such values are collectively called applicables.)

  • In some languages the yielding operation—such as Python’s yield—is available only inside the syntactic body of the generator. In others, such as Racket, yield is an applicable value bound in the body, but by virtue of being a value, it can be passed to abstractions, stored in data structures, and so on.

Python’s design represents an extreme point in that a generator is simply any function that contains the keyword yield in its body. In addition, Python’s yield cannot be passed as a parameter to another function that performs the yielding on behalf of the generator.

There is also a small issue of naming. In many languages with generators, the yielder is automatically called yield: either as a keyword (as in Python) or as an identifier bound to an applicable value (as in Racket). Another possibility is that the user of the generator must indicate in the generator expression what name to give the yielder. (Curiously, Python expects users to determine what to call self or this in objects, but it does not provide the same flexibility for yield, because it has no other way to determine which functions are generators!) That is, a use might look like

(generator (yield) (from)
           (rec (f (lam (n)
                     (seq
                       (yield n)
                       (f (+ n 1)))))
             (f from)))

but it might equivalently be

(generator (y) (from)
           (rec (f (lam (n)
                     (seq
                       (y n)
                       (f (+ n 1)))))
             (f from)))

and if the yielder is an actual value, a user can also abstract over yielding:

(generator (y) (from)
           (rec (f (lam (n)
                     (seq
                       ((yield-helper y) n)
                       (f (+ n 1)))))
             (f from)))

where yield-helper will presumably perform the actual yielding.

There are actually two more design decisions:
  1. Is yield a statement or expression? In many languages it is actually an expression, meaning it has a value: the one supplied when resuming the generator. This makes the generator more flexible because the user of a generator can use the parameter(s) to alter the generator’s behavior, rather than being forced to use state to communicate desired changes.

  2. What happens at the end of the generator’s execution? In many languages, a generator raises an exception to signal its completion.

To implement generators, it will be especially useful to work from our CPS interpreter. Why? Remember how generators work: to yield, a generator must
  • remember where in its execution it currently is, and

  • know where in its caller it should return to.

while, when invoked, it should
  • remember where in its execution its caller currently is, and

  • know where in its body it should return to.

Observe the duality between invocation and yielding.

As you might guess, these “where”s correspond to continuations.

Exercise

Add generators to a CPS interpreter.

Exercise

How do generators differ from coroutines and threads? Implement coroutines and threads using a similar strategy.

Exercise

We have seen that Python’s generators do not permit any abstraction over yielding, whereas Racket’s do. Assuming this was intentional, why might Python have made such a design decision?

19.4 Continuations and Stacks

Surprising as it may seem, CPS conversion actually provides tremendous insight into the nature of the program execution stack. The first thing to understand is that every continuation is actually the stack itself. This might seem odd, given that stacks are low-level machine primitives while continuations are seemingly complex procedures. But what is the stack, really?
  • It’s a record of what remains to be done in the computation. So is the continuation.

  • It’s traditionally thought of as a list of stack frames. That is, each frame has a reference to the frames remaining after it finishes. Similarly, each continuation is a small procedure that refers to—and hence closes over—its own continuation. If we had chosen a different representation for program instructions, combining this with the data structure representation of closures, we would obtain a continuation representation that is essentially the same as the machine stack.

  • Each stack frame also stores procedure parameters. This is implicitly managed by the procedural representation of continuations, whereas this was done explicitly in the data structure representation (using bind).

  • Each frame also has space for “local variables”. In principle so does the continuation, though by using the macro implementation of local binding, we’ve effectively reduced everything to procedure parameters. Conceptually, however, some of these are “true” procedure parameters while others are local bindings turned into procedure parameters by a macro.

  • The stack has references to, but does not close over, the heap. Thus changes to the heap are visible across stack frames. In precisely the same way, closures refer to, but do not close over, the store, so changes to the store are visible across closures.

Therefore, traditionally the stack is responsible for maintaining lexical scope, which we get automatically because we are using closures in a statically-scoped language.

Now we can study the conversion of various terms to understand the mapping to stacks. For instance, consider the conversion of a function application (<cps-trans-appC>). How do we “read” this? As follows:
  • Let’s use k to refer to the stack present before the function application begins to evaluate.

  • When we begin to evaluate the function position (f), we create a new stack frame (fdC("f-v", ...)). This frame has one free identifier: k. Thus its closure needs to record one element of the environment, namely the rest of the stack.

  • The code portion of the stack frame represents what is left to be done once we obtain a value for the function: evaluate the argument, perform the application, and return the result to the stack expecting the result of the application: k.

  • When evaluation of f completes, we begin to evaluate a, which also creates a stack frame: fdC("a-v", ...). This frame has two free identifiers: k and f-v. This tells us:

    • We no longer need the stack frame for evaluating the function position, but

    • we now need a temporary that records the value—hopefully a function value—of evaluating the function position.

  • The code portion of this second frame also represents what is left to be done: invoke the function value with the argument, in the stack expecting the value of the application.

Similarly, examining the CPS conversion of conditionals would tell us that, to evaluate the conditional expression, we have to create a new stack frame. This frame closes over the stack expecting the value of the entire conditional. The frame makes a decision based on the value of the conditional expression, and invokes one of the other expressions. Once we have examined this value, the frame created to evaluate the conditional expression is no longer necessary, so evaluation can proceed in the original continuation.

Viewed through this lens, we can more easily provide an operational explanation for generators. Each generator has its own private stack, and when execution attempts to return past its end, our implementation raises an error. On invocation, a generator stores a reference to the stack of the “rest of the program”, and resumes its own stack. On yielding, the system swaps references to stacks. Coroutines, threads, and generators are all conceptually similar: they are all mechanisms to create “many little stacks” instead of having a single, global stack.

19.5 Tail Calls

Observe that the stack patterns above add a frame to the current stack, perform some evaluation, and eventually always return to the current stack. In particular, observe that in an application, we need stack space to evaluate the function position and then the arguments, but once all these are evaluated, we resume computation using the stack we started out with before the application. In other words, function calls do not themselves need to consume stack space: we only need space to compute the arguments.

However, not all languages observe or respect this property. In languages that do, programmers can use recursion to obtain iterative behavior: i.e., a sequence of function calls can consume no more stack space than no function calls at all. This removes the need to create special looping constructs; indeed, loops can simply be expressed as syntactic sugar.
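
For instance (our own example, in the style of the rest of this chapter), here is a loop written as a recursive function; in a language that handles tail calls properly, the recursive call consumes no additional stack because nothing remains to be done after it returns:

fun count-down(n):
  if n == 0:
    "done"
  else:
    count-down(n - 1)  # a tail call: no work is pending after it
  end
end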

Of course, this property does not apply in general. If a call to f is performed to compute an argument to a call to g, the call to f is still consuming space relative to the context surrounding g. Thus, we should really speak of a relationship between expressions: one expression is in tail position relative to another if its evaluation requires no additional stack space beyond the other. In our CPS transformation, every expression that uses k as its continuation—such as a function application after all the sub-expressions have been evaluated, or the then- and else-branches of a conditional—is in tail position relative to the enclosing application (and perhaps recursively further up). In contrast, every expression that has to create a new stack frame is not in tail position.

Some languages have special support for tail recursion: when a procedure calls itself in tail position relative to its body. This is obviously useful, because it enables recursion to efficiently implement loops. However, it hurts “loops” that cannot be squeezed into a single recursive function. For instance, when implementing a scanner or other state machine, it is most convenient to have a set of functions each representing one state, and transitioning to other states by making (tail) function calls. It is onerous (and misses the point) to turn these into a single recursive function. If, however, a language recognizes tail calls as such, it can optimize these cross-function calls just as much as it does intra-function ones.
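
As a sketch of such a state machine (ours, not the text’s), here are two “states”, each a function that transitions to the other by a tail call; with proper tail calls, this consumes constant stack no matter how long the input list is:

fun even-length(l):
  cases (List) l:
    | empty => true
    | link(_, rest) => odd-length(rest)   # tail call to the other state
  end
end

fun odd-length(l):
  cases (List) l:
    | empty => false
    | link(_, rest) => even-length(rest)  # tail call back to the first state
  end
end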

Scheme and Racket, in particular, promise to implement tail calls without allocating additional stack space. Though some people refer to this as “tail call optimization”, this term is misleading: an optimization is optional, whereas whether or not a language promises to properly implement tail calls is a semantic feature. Developers need to know how the language will behave because it affects how they program: they need to know how to structure their loops!

Because of this feature, observe something interesting about the program after CPS transformation: all of its function applications are themselves tail calls! Assuming the program might terminate at any call is tantamount to not using any stack space at all (because the stack would get wiped out).

Exercise

Any program that consumes some amount of stack, when converted to CPS and run, suddenly consumes no stack space at all. Why?

As a corollary, does conversion to CPS reduce the overall memory footprint of the program?

Exercise

Java’s native security model employs a mechanism called stack inspection (look it up if you aren’t familiar with it). What is the interaction between CPS and stack inspection? That is, if we were to CPS a program, would this affect its security behavior?

If not, why not?

If so, how, and what would you suggest doing to recover security assuming the CPS conversion was necessary?