Writing a parser

Writing a parser is, depending on the language, a moderately complex task. In essence, it must transform a piece of code (which we inspect by looking at the characters) into an “abstract syntax tree” (AST). The AST is a structured in-memory representation of the program, and it's “abstract” in the sense that it does not care exactly what characters is the source code made of, but it faithfully represents the semantics of it. I wrote a separate page to describe our AST.

For example, for the following program text:

sum = lambda(a, b) {
  a + b;
print(sum(1, 2));

our parser will generate the following AST, as a JavaScript object:

  type: "prog",
  prog: [
    // first line:
      type: "assign",
      operator: "=",
      left: { type: "var", value: "sum" },
      right: {
        type: "lambda",
        vars: [ "a", "b" ],
        body: {
          // the body should be a "prog", but because
          // it contains a single expression, our parser
          // reduces it to the expression itself.
          type: "binary",
          operator: "+",
          left: { type: "var", value: "a" },
          right: { type: "var", value: "b" }
    // second line:
      type: "call",
      func: { type: "var", value: "print" },
      args: [{
        type: "call",
        func: { type: "var", value: "sum" },
        args: [ { type: "num", value: 1 },
                { type: "num", value: 2 } ]

The main difficulty in writing a parser consists in a failure to properly organize the code. The parser should operate at a higher level than reading characters from a string. A few advices on how to keep complexity manageable:

To keep it simple I've split my code in three parts, which are further divided into many small functions: