Wrapping up
In the interest of actually finishing this book howto, I
moved forward and added a few (non-essential, but good-to-have) features
and fixes to the λanguage without describing their implementation here.
Download the full code below.
The program is runnable with NodeJS. It reads the program in λanguage from STDIN and compiles and executes it. It produces the compiled result at STDERR and the program output at STDOUT. Example usage:
cat primitives.lambda program.lambda | node lambda.js
The final touches are:
-
Support for the negation operator (!). It's non-essential because it can be easily implemented as a function:
not = λ(x) if x then false else true; not(1 < 0) # ⇒ true
However, having a dedicated AST node for it allows us to generate more efficient code (a function call implies the
GUARD
-s and whatnot). -
I added a
js:raw
keyword. Using it you can insert arbitrary JavaScript code in the output (only complete expressions, not statements). Example use case is to easily define primitives from the λanguage:length = js:raw "function(k, thing){ k(thing.length); }"; arrayRef = js:raw "function(k, array, index){ k(array[index]); }"; arraySet = js:raw "function(k, array, index, newValue){ k(array[index] = newValue); }"; array = js:raw "function(k){ k([].slice.call(arguments, 1)); }";
I configured my syntax highlighter to use a reddish color for this keyword, because it's dangerous. Maybe not quite, but you have to know the implications of using it: the code you pass to
js:raw
will make it untouched and unchecked into the output JS. For instance the optimizer won't be able to see that you're accessing the local variable below, and it'll dropx = 10
:dumb = λ() { let (x = 10) js:raw "x + x"; };
→ dumb = function(β_K1) { β_K1(x + x); };
That's not a bug. It's supposed to happen.
-
Generate better code for boolean expressions — make_js, as we wrote it, could easily produce output like
(a < 0 ? true : false) !== false
, but that can obviously be simplified as justa < 0
. This probably doesn't add speed, but at least the output code looks less stupid. Fixed the semantics of
&&
and||
operators such thatfalse
is the only falsy value in our language.It generates
var
declarations for globals and evaluates the final code under"use strict"
(there appears to be a 5-10% speed improvement in strict mode).
Are we even close to a real language?
Believe it or not, λanguage is pretty close to Scheme. I promised you we won't be implementing a Lisp, and I kept my word. But the bigger part of the job is done: we have decent CPS transformer and optimizer. If we wanted to implement Scheme, all that's left is writing a Scheme parser that produces a compatible AST, and a pre-compiler pass to macro-expand. And a bunch of primitives for the core library. That shouldn't be too much work.
TODO
If we'd like to continue working on λanguage, the following should be on the radar.
Variable names
The JavaScript generator leaves variable names as they are, but that's
generally not a good idea (not to mention it's a plain bug, since we allow
in identifier names characters that JavaScript does not). We at least
should prefix globals and replace illegal characters. The prefix I'm
thinking about is “λ_
”.
Variable arguments lists
Any practical language will need something like JavaScript's
arguments
. We could easily add some syntax for it, for example:
foo = λ(first, second, rest...) {
## rest here is an array with arguments after second
};
but wait, what even is an array in our λanguage? (that's next on the list).
It might seem tempting to think that we can already use the “arguments”
name (since we keep the same variable names in JS), but that won't work
properly: both to_cps
and the optimizer will assume it's a
global, possible mess resulting.
To implement the syntax above without sacrificing much code size, we could
use the GUARD
function. Example output:
foo = function CC(first, second) {
var rest = GUARD(arguments, CC, 2); // returns arguments starting at index 2
};
Related to this feature, we also need an equivalent to
Function.prototype.apply
.
Arrays and objects
These are easy to define as primitive functions, as we did with
js:raw
above. However, implementing them at the syntax level
would allow generating more efficient code, as well as giving us a
familiar syntax; a[0] = 1
is no doubt nicer than
arraySet(a, 0, 1)
.
One thing I'd like to avoid is ambiguity. For instance, in JavaScript the
curly brackets are used both for representing bodies of code, and literal
objects. The rule is “when open curly bracket occurs in statement
position, then it's a code block; when in expression position, it's an
object literal”. But we don't have statements in λanguage (which is
actually a feature), and curly brackets denote a sequence of expressions.
I'd like to keep it that way, so I wouldn't use the { ... }
notation for object literals.
The “dot” notation
Related to the previous one, we should support the dot notation for accessing object properties. That's too ubiquitous to be ignored.
I'd expect it to be somewhat challenging to support “methods” like
JavaScript does (i.e. the this
keyword). The reason for this is
that all our functions are transformed to CPS (and all function calls will
insert the continuation as first argument). If we were to support JS-like
methods, how could we know that we're calling a function in CPS
(i.e. written in λanguage) and not a straight function (i.e. from some JS
library)? This requires some thought.
Syntax (separators)
The requirement to separate expressions with semicolons in "prog"
nodes is a bit too tight. For example the following is syntactically
invalid, according to the current parser rules:
if foo() {
bar()
} # ← error
print("DONE")
The problem is on the marked line. Even though it ends with a closing
curly bracket, there should be a semicolon following it (because the
if
is really an expression, and it ends at that closing bracket).
That's not quite intuitive when coming from JavaScript; it might seem
preferable to relax the rules and make the semicolon optional after an
expression that ends with a curly bracket. But it's tricky because the
following is also a valid program:
a = {
foo();
bar();
fib
}
(6);
Result of which would be to call foo()
, bar()
and then put
the result of fib(6)
into the variable a
. Silly syntax,
but you know what, most infix languages suffer from such weirdness, for
example the following is syntactically a valid JS program; you get no
parse error if you try it, although there will be obviously a run-time
error when you call foo()
:
function foo() {
a = {
foo: 1,
bar: 2,
baz: 3
}
(10);
}
Exceptions
We could provide an exception system on top of reset and shift operators, or other primitives.
Moving on…
Even without the features listed in TODO, our λanguage is pretty powerful and I'll conclude this document with some samples comparing how you'd implement trivial programs in NodeJS versus λanguage. Read on.