Update: There is a bug which slipped in all of interpreter, CPS evaluator and CPS compiler: expressions like a() && b() or a() || b() always evaluate both sides. This is wrong, of course, whether or not b() is evaluated depends on the falsy value of a(). The fix is pretty simple (convert || and && binary operations to "if" nodes) but I didn't update these pages yet. Shame on me.

Transform code to continuation-passing style

In order to get unlimited recursion and continuations, we're going to proceed like we did for the interpreter: we implemented a “CPS evaluator”. That was itself written in continuation-passing style, taking an expression to compute and a callback to invoke with the result. But this time we're compiling, not evaluating, so it's not immediately obvious what this callback refers to.

Here are some example transformations that this code will do. I'm writing these in λanguage, rather than AST-s, as they are easier to understand this way, but let's be clear about it: the CPS transformer takes an AST as input, and returns a compatible and equivalent AST as output, with the difference that the output will be in continuation-passing style. It does not return a λanguage or JavaScript program as a string, that's code generator's job.

In the samples below, TC stands for the “toplevel continuation” — something that receives the result of the whole program. You can consider it “the final return”.

a = 5;
TC(a = 5);

a = foo(5);
foo(λ(R_foo){
  TC(a = R_foo);
}, 5);

a = foo(1) + bar(2);
foo(λ(R_foo){
  bar(λ(R_bar){
    TC(a = R_foo + R_bar);
  }, 2);
}, 1);

a = foo(1, bar(2, 3));
bar(λ(R_bar){
  foo(λ(R_foo){
    TC(a = R_foo);
  }, 1, R_bar);
}, 2, 3);

a = if foo(1) then bar(2) else baz(3);
foo(λ(R_foo){
  if R_foo then bar(λ(R_bar){
    TC(a = R_bar);
  }, 2) else baz(λ(R_baz){
    TC(a = R_baz);
  }, 3);
}, 1);

a = λ(a, b) a + b;
TC(a = λ(k, a, b) k(a + b));

Implementation

The to_cps function will take an AST as input and a callback function. The main entry point looks like this:

function to_cps(exp, k) {
    return cps(exp, k);

    function cps(exp, k) {
        switch (exp.type) {
          case "num"    :
          case "str"    :
          case "bool"   :
          case "var"    : return cps_atom   (exp, k);

          case "assign" :
          case "binary" : return cps_binary (exp, k);

          case "let"    : return cps_let    (exp, k);
          case "lambda" : return cps_lambda (exp, k);
          case "if"     : return cps_if     (exp, k);
          case "prog"   : return cps_prog   (exp, k);
          case "call"   : return cps_call   (exp, k);

          default:
            throw new Error("Dunno how to CPS " + JSON.stringify(exp));
        }
    }

    // NOTE, the functions defined below must be placed here.
}

It looks similar to the code generator, but all functions take that additional k which compiles the rest of the program — the continuation. The AST returned by k operates on the value of the current expression, hence we need to pass it to k — but think in more abstract terms than we did in the CPS evaluator: we don't run the program this time, we're just compiling it. So we don't work with values. We work with AST-s that represent values and computations.

So the cps_* functions must call the k (or pass it on) in order to “invoke” the rest of the program (really, to compile it and get back the AST node that represents the continuation of the current node), but they also must return an AST node for the very expression that they were invoked on. So that's another difference from the CPS evaluator, which just returned undefined on all branches. Here we return AST-s. I hope it'll be clear after going through the cps_* transformers.

Atomic values

We'll call “atomic” those values that need no other code in order to compute them. These are constants, but also variables. We just pass the same AST node to the continuation (and return its result!)

function cps_atom(exp, k) {
    return k(exp);
}

"binary" and "assign"

For binary operations we must invoke the continuation only after we compiled the "left" and "right", so we compile them, in this order, then invoke k with an AST node that represents the same binary operation, but this time left and right are (atomic) results:

function cps_binary(exp, k) {
    return cps(exp.left, function(left){
        return cps(exp.right, function(right){
            return k({ type     : exp.type,
                       operator : exp.operator,
                       left     : left,
                       right    : right });
        });
    });
}

"let" nodes

We convert "let" nodes to IIFE-s, and CPS-transform those instead. This means there will be no more "let" nodes left in the AST after the CPS transformer (which means we could remove the js_let from the code generator).

function cps_let(exp, k) {
    if (exp.vars.length == 0)
        return cps(exp.body, k);
    return cps({
        type: "call",
        args: [ exp.vars[0].def || FALSE ],
        func: {
            type: "lambda",
            vars: [ exp.vars[0].name ],
            body: {
                type: "let",
                vars: exp.vars.slice(1),
                body: exp.body
            }
        }
    }, k);
}

"lambda" nodes

For one thing, it should be clear that the continuation of a "lambda" node should get a "lambda" node, for example in this code: a = λ(x) x + 1;, the continuation of evaluating the λ node will be to assign it to a, so it must receive a function value.

But what does it mean to transform a function to CPS? The result of compiling this example should be: TC(a = λ(K, x){ K(x + 1) }); (where TC is the continuation of the assignment expression). So the "lambda" node itself is being added a new first argument to represent its continuation (K), and its body is CPS-transformed in such a way that it will call that K with the result value. We must “invent” a variable name (we won't just use “K”, of course) and must make sure that name won't clash with other names defined by the program. We could do that rigorously, but here I'll just pretend that if we prefix the generated names with "β_" we avoid any possible collision (and in fact we do, because β is not an allowed character in identifiers by our parser rules).

So, in summary, here are the steps:

  1. Invent an unique variable name (stored in cont).

  2. CPS-compile the body with a continuation that calls cont with the result.

  3. Invoke our own continuation (k) with a "lambda" node which has the updated argument list (includes cont as first argument) and the CPS-transformed body.

function cps_lambda(exp, k) {
    var cont = gensym("K");
    var body = cps(exp.body, function(body){
        return { type: "call",
                 func: { type: "var", value: cont },
                 args: [ body ] };
    });
    return k({ type: "lambda",
               name: exp.name,
               vars: [ cont ].concat(exp.vars),
               body: body });
}

The gensym function (stands for “generate symbol”) should return an unique variable name.

// these are global
var GENSYM = 0;
function gensym(name) {
    if (!name) name = "";
    name = "β_" + name;
    return name + (++GENSYM);
}

Compared to other cps_* functions (except for cps_atom), cps_lambda is the only one which invokes the k continuation from its main body rather than from a continuation. That's because "lambda"-s are actually atomic values.

"if" nodes

The compilation for "if" is straightforward:

function cps_if(exp, k) {
    return cps(exp.cond, function(cond){
        return {
            type: "if",
            cond: cond,
            then: cps(exp.then, k),
            else: cps(exp.else || FALSE, k),
        };
    });
}

So it compiles the condition, passing a continuation that returns the appropriate AST node (which is also an "if"). It then passes the k continuation for compiling both branches. There's a subtle problem here, as we'll see later.

"call" nodes

A "call" node must evaluate the function to call, and then the arguments sequentially. We must build an array with the appropriate AST nodes for each argument, so that we can compile the output "call" node. But as first argument we will insert a continuation that holds the rest of the program, as generated by k, so we'll have a make_continuation helper to create it.

function cps_call(exp, k) {
    return cps(exp.func, function(func){
        return (function loop(args, i){
            if (i == exp.args.length) return {
                type : "call",
                func : func,
                args : args
            };
            return cps(exp.args[i], function(value){
                args[i + 1] = value;
                return loop(args, i + 1);
            });
        })([ make_continuation(k) ], 0);
    });
}

The continuation is a function that must apply the rest of the program (whatever k compiles into) to the value returned by the "call" node:

function make_continuation(k) {
    var cont = gensym("R");
    return { type : "lambda",
             vars : [ cont ],
             body : k({ type  : "var",
                        value : cont }) };
}

"prog" node

That's simple: if the list of expressions is empty, then we return false (that is, call the rest of the program with the FALSE node). If we have only one expression, then we call the rest of the program with its value. And if there are more, compile the first, and recurse for the rest.

function cps_prog(exp, k) {
    return (function loop(body){
        if (body.length == 0) return k(FALSE);
        if (body.length == 1) return cps(body[0], k);
        return cps(body[0], function(first){
            return {
                type: "prog",
                prog: [ first, loop(body.slice(1)) ]
            };
        });
    })(exp.prog);
}

You can notice that "prog"-s resulted in the final AST will contain exactly two expressions (of which the second might be another "prog"). That resembles the way we defined lists. The AST is still compatible with our code generator, if a bit longer.

In the next section we'll see the CPS transformer in action and identify some of its problems.