Update: There is a bug that slipped into all of the interpreter, the CPS evaluator and the CPS compiler: expressions like a() && b() or a() || b() always evaluate both sides. This is wrong, of course; whether or not b() is evaluated should depend on the truthiness of a(). The fix is pretty simple (convert || and && binary operations to "if" nodes) but I didn't update these pages yet. Shame on me.
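The rewrite might look like this as an AST-level helper (a sketch: the node shapes follow the parser chapters, β_tmp stands for a freshly generated name, and a real implementation would apply this recursively to the whole tree):

function desugar_logical(exp) {
    if (exp.type != "binary") return exp;
    if (exp.operator == "&&")
        // a() && b()  =>  if a() then b()   (an "if" with no else yields false)
        return { type: "if", cond: exp.left, then: exp.right };
    if (exp.operator == "||")
        // a() || b()  =>  let (tmp = a()) if tmp then tmp else b()
        // so the left side is evaluated exactly once
        return {
            type: "let",
            vars: [{ name: "β_tmp", def: exp.left }],
            body: {
                type: "if",
                cond: { type: "var", value: "β_tmp" },
                then: { type: "var", value: "β_tmp" },
                else: exp.right
            }
        };
    return exp;
}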

Optimizer

— The devil is in the details —

Optimizing code in CPS form is the topic of much academic research. But even without knowing a lot of theory, we can still implement some obvious improvements. For the fib function, the speedup is about 3x versus the previous version.

The optimizer recursively walks the AST, like other code we wrote, and transforms expressions into simpler ones whenever possible, according to some rules. It repeats until it reaches what mathematicians call a “fixed point”: optimize(ast) == ast, that is, an AST which can no longer be reduced. To detect this it resets a changes variable to zero before each pass, and increments it for every AST reduction. When changes remains zero, we are done.

Examples

The fib function

Below you can see the sequence of transformations that the “fib” function goes through. I'm showing JavaScript code to make the steps easier to follow, but keep in mind that the optimizer receives an AST and returns an AST; make_js is only called once, at the end.
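The starting point is the familiar fib from the previous chapters:

fib = λ(n) if n < 2 then n else fib(n - 1) + fib(n - 2);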

The optimizer itself is not very optimized: it could do more in a single pass, but instead I opted to make a single modification at a time, returning expressions unchanged once changes > 0 (for reasons which I'll detail later).

let nodes

I won't show all the intermediate steps in the next examples, only the initial and final forms.

Input code:

(λ(){
  let (a = 2, b = 5) {
    print(a + b);
  };
})();

Transformation:

(function β_CC(β_K1) {
  GUARD(arguments, β_CC);
  (function β_CC(β_K2, a) {
    GUARD(arguments, β_CC);
    (function β_CC(β_K3, b) {
      GUARD(arguments, β_CC);
      print(function β_CC(β_R4) {
        GUARD(arguments, β_CC);
        β_K3(β_R4);
      }, a + b);
    })(function β_CC(β_R5) {
      GUARD(arguments, β_CC);
      β_K2(β_R5);
    }, 5);
  })(function β_CC(β_R6) {
    GUARD(arguments, β_CC);
    β_K1(β_R6);
  }, 2);
})(function β_CC(β_R7) {
  GUARD(arguments, β_CC);
  β_TOPLEVEL(β_R7);
});

which the optimizer reduces to:

(function β_CC(β_K1) {
  var a, b;
  GUARD(arguments, β_CC);
  a = 2, b = 5, print(β_K1, a + b);
})(β_TOPLEVEL);

There are no name clashes, so the a and b variables are not renamed. As you can see, to_cps added quite a bit of overhead to a simple let by compiling it to IIFE-s, but the optimizer is able to discard the cruft.
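To see where that cruft comes from: to_cps handles a let by compiling it into nested IIFE-s, one per variable (you can see the nesting in the CPS output above), roughly as if the input had been written like this:

(λ(a){
  (λ(b){
    print(a + b);
  })(5);
})(2);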

Input code:

(λ(){
  let (a = 2) {
    let (a = 3) {
      print(a);
    };
    print(a);
  };
})();

Transformation:

(function β_CC(β_K1) {
  GUARD(arguments, β_CC);
  (function β_CC(β_K2, a) {
    GUARD(arguments, β_CC);
    (function β_CC(β_K3, a) {
      GUARD(arguments, β_CC);
      print(function β_CC(β_R4) {
        GUARD(arguments, β_CC);
        β_K3(β_R4);
      }, a);
    })(function β_CC(β_R5) {
      GUARD(arguments, β_CC);
      print(function β_CC(β_R6) {
        GUARD(arguments, β_CC);
        β_K2(β_R6);
      }, a);
    }, 3);
  })(function β_CC(β_R7) {
    GUARD(arguments, β_CC);
    β_K1(β_R7);
  }, 2);
})(function β_CC(β_R8) {
  GUARD(arguments, β_CC);
  β_TOPLEVEL(β_R8);
});

which the optimizer reduces to:

(function β_CC(β_K1) {
  var a, β_K3, β_a$9;
  GUARD(arguments, β_CC);
  a = 2, β_K3 = function(β_R5) {
    print(β_K1, a);
  }, β_a$9 = 3, print(β_K3, β_a$9);
})(β_TOPLEVEL);

The inner a variable is renamed to β_a$9 in order to avoid the clash with the outer a.

Overview of the implementation

As already stated, the optimizer will descend the AST and simplify nodes, until no more reductions are possible. Unlike the other AST walkers we wrote, this one is “destructive”: it may modify the input AST, instead of creating a fresh one. The key reductions it makes are:

- unwrapping IIFE-s: the arguments and body of an immediately-called anonymous function are folded into the containing function;
- discarding functions whose only purpose is to call another function with the same arguments;
- dropping variables and assignments that are never actually used;
- evaluating constant expressions (binary operations on constants, "if"-s with a constant condition) at compile time;
- discarding side-effect-free expressions whose values aren't used.

Some optimizations may “cascade”: for instance, after replacing a variable with another, the former might become a candidate for discarding. For this reason it's important to repeat the process after any change is made.

make_scope

Because we're messing with the environment (sometimes introducing variables, other times removing them) we need a precise way to figure out the connections between variables, such as “what are all the references to variable x?”. I wrote a helper function, make_scope, to provide that. It walks the AST, creating Environment objects as appropriate, and annotates some nodes like this:
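Something like the following, though the exact property names here are my shorthand for what the optimizer relies on below:

// each "var" node gets an env property (the Environment in effect at
// that node) and a def property: a record shared by all occurrences
// of the same variable.
var def = {
    refs     : [],   // every "var" node that references this variable
    assigned : 0     // how many of those references are assignments
};
var node = { type: "var", value: "x", env: null, def: def };
def.refs.push(node); // each occurrence registers itself in refs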

This information enables the optimizer to do what it does. For example, if a variable is assigned exactly as many times as it is referenced, then its value is never actually read (each assignment counts as one reference), hence we can drop it. Or, when we need to replace a variable X with another one Y, we can walk X's refs and change the name to “Y” in each reference node.
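The first test is a one-liner under the annotations assumed above:

// every reference is an assignment target, so the value is never read
function is_dead(def) {
    return def.refs.length == def.assigned;
}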

The locs property that we add to "lambda" nodes will serve as a place for the optimizer to put the variables it unwraps from IIFE-s. make_scope will preserve it if already present (taking care to define those variables in the environment), or otherwise initialize it to an empty array. The JS code generator must then include a var statement for those names when present, which requires a small modification in js_lambda.
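The change to js_lambda might be as small as this (a sketch; code is the output string being built, as in the code generator chapter):

// right after emitting the argument list:
if (exp.locs && exp.locs.length > 0) {
    code += "var " + exp.locs.join(", ") + ";";
}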

My first take was to call make_scope once at the start, and have the optimizer maintain all the env and def extra properties for each change it made, with the advantage that we could do multiple reductions in a pass. But this complicated the code so much that I gave up. The current code calls make_scope before each optimizer pass, and turns the optimizer into a no-op after the first reduction (that's necessary so that we don't operate on obsolete scope information).

The optimize function

As said, it calls make_scope before entering each optimizer pass, and repeats until no more changes are made. Finally, before returning the optimized expression, it saves the toplevel scope in the env property. This is the entry point:

function optimize(exp) {
    do {
        var changes = 0;
        var defun = exp; // used by the opt_* functions (not shown here)
        make_scope(exp);
        exp = opt(exp);
    } while (changes);
    return exp; // exp.env holds the toplevel scope, set by the last make_scope call

    function opt(exp) {
        if (changes) return exp; // a reduction already happened in this pass; leave the rest for the next one
        switch (exp.type) {
          case "num"    :
          case "str"    :
          case "bool"   :
          case "var"    : return exp;
          case "binary" : return opt_binary (exp);
          case "assign" : return opt_assign (exp);
          case "if"     : return opt_if     (exp);
          case "prog"   : return opt_prog   (exp);
          case "call"   : return opt_call   (exp);
          case "lambda" : return opt_lambda (exp);
        }
        throw new Error("I don't know how to optimize " + JSON.stringify(exp));
    }

    ... // the opt_* functions here
}

We don't have to care about "let" nodes, since they will have been desugared to IIFE-s by the CPS transformer.

The “trivial” nodes (constants and variables) are returned as they are.

Each opt_* function will have to call opt recursively on members of the current expression.

opt_binary optimizes binary expressions: it calls opt on the left and right nodes, then checks whether both are constant; if so, it evaluates the expression at compile time and replaces the node with the result.
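A sketch of that constant folding, reusing apply_op from the interpreter chapter (is_constant is a small helper I'm assuming here):

function is_constant(exp) {
    return exp.type == "num" || exp.type == "str" || exp.type == "bool";
}

function opt_binary(exp) {
    exp.left = opt(exp.left);
    exp.right = opt(exp.right);
    if (is_constant(exp.left) && is_constant(exp.right)) {
        changes++;
        var val = apply_op(exp.operator, exp.left.value, exp.right.value);
        return { type: typeof val == "number" ? "num"
                     : typeof val == "string" ? "str" : "bool",
                 value: val };
    }
    return exp;
}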

opt_assign optimizes the left and right nodes too, then checks for potential reductions.
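One such reduction, sketched under the make_scope annotations assumed earlier: an assignment to a variable whose value is never read can be replaced by its right side (the value and any side effects are preserved; the dead variable itself is left for opt_lambda to drop):

function opt_assign(exp) {
    exp.left = opt(exp.left);
    exp.right = opt(exp.right);
    if (exp.left.type == "var"
        && exp.left.def.refs.length == exp.left.def.assigned) {
        changes++;
        return exp.right;
    }
    return exp;
}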

opt_if descends into the cond, then and else nodes. When the condition can be determined (it's a constant expression), it returns the then or the else branch, as appropriate.
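A sketch (FALSE is the { type: "bool", value: false } node; remember that in our language only false is falsy):

function opt_if(exp) {
    exp.cond = opt(exp.cond);
    exp.then = opt(exp.then);
    exp.else = opt(exp.else || FALSE);
    if (is_constant(exp.cond)) {
        changes++;
        return exp.cond.value !== false ? exp.then : exp.else;
    }
    return exp;
}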

opt_prog duplicates some work already done in the CPS transformer, such as discarding expressions that have no side effects. It's useful to do that here as well because optimizations cascade: even if to_cps returned tight code, after some other reduction we may end up with a side-effect-free expression whose value isn't used.
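A sketch, assuming "prog" nodes carry an array of expressions and a has_side_effects predicate (which must be conservative: calls and assignments count as effects):

function opt_prog(exp) {
    if (exp.prog.length == 0) { changes++; return FALSE; }
    if (exp.prog.length == 1) { changes++; return exp.prog[0]; }
    if (!has_side_effects(exp.prog[0])) {
        changes++;
        return { type: "prog", prog: exp.prog.slice(1) };
    }
    exp.prog = exp.prog.map(opt);
    return exp;
}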

opt_call optimizes function calls. When the thing we call is an anonymous "lambda" (an IIFE), it delegates to opt_iife (see the code), which unwraps that function's arguments and body into the containing function. This is probably the trickiest (and most effective) optimization we do.
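Schematically, using the let example above, unwrapping turns this:

(function β_CC(β_K3, b) {
    GUARD(arguments, β_CC);
    print(β_K3, a + b);
})(K, 5);

into this, with b moved into the containing function's locs (renamed first if it would clash):

β_K3 = K, b = 5, print(β_K3, a + b);

Further passes may then substitute or drop β_K3 as well, which is how the optimized outputs above end up so compact.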

opt_lambda optimizes away a function whose only purpose is to call another function with the same arguments. When that's not the case, it discards unused local variables from locs and descends into the body.
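This is the reduction that eliminates all the β_R continuation wrappers from the examples above. Schematically, a function that only forwards its arguments, like:

function β_CC(β_R4) {
    GUARD(arguments, β_CC);
    β_K3(β_R4);
}

can be replaced by β_K3 itself wherever it appears (valid only when the argument lists match exactly and the called function isn't one of the wrapper's own arguments).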

The code

I'm not going to detail all these functions. Click “Show code” below to see the whole thing (hidden initially, as I didn't want the size of the scroll bar to scare you away).