UglifyJS

JS compressor of world fame.

UglifyJS — the code generator

The code generator is a recursive process of getting back source code from an AST returned by the parser. Every AST node has a “print” method that takes an OutputStream and dumps the code from that node into it. The stream object supports a lot of options that control the output. You can specify whether you'd like to get human-readable (indented) output, the indentation level, whether you'd like to quote all properties in object literals etc.

SYNOPSIS

var stream = UglifyJS.OutputStream({ ...options... });
ast.print(stream);             // writes the generated code into the stream
var code = stream.toString();  // read the result back from the stream
alert(code);

The options argument is a JS object. Here are the defaults:

indent_start  : 0,     // start indentation on every line (only when `beautify`)
indent_level  : 4,     // indentation level (only when `beautify`)
quote_keys    : false, // quote all keys in object literals?
space_colon   : true,  // add a space after colon signs?
ascii_only    : false, // output ASCII-safe? (encodes Unicode characters as ASCII)
inline_script : false, // escape "</script"?
width         : 80,    // informative maximum line width (for beautified output)
max_line_len  : 32000, // maximum line length (for non-beautified output)
ie_proof      : true,  // output IE-safe code?
beautify      : false, // beautify output?
source_map    : null,  // output a source map
bracketize    : false, // use brackets every time?
comments      : false, // output comments?
semicolons    : true,  // use semicolons to separate statements? (otherwise, newlines)

Most of these should be obvious. The ones worth additional discussion are source_map and comments.
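
As a rough illustration of how these options combine, here are two option objects built only from the keys listed above. The variable names readable_options and embedded_options are made up for this sketch, and the particular values are just one plausible choice.

```javascript
// Human-readable output: indented, two spaces per level,
// keeping only comments whose body matches the regexp.
var readable_options = {
    beautify: true,
    indent_level: 2,
    comments: /@license|@preserve/
};

// Compact output meant to be embedded in an HTML page:
// non-ASCII characters are escaped and closing script tags are escaped.
var embedded_options = {
    ascii_only: true,
    inline_script: true
};
```

Either object can be passed to UglifyJS.OutputStream(...) as in the synopsis; any key left out keeps its default.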

Source map

The output stream keeps track of the current line/column in the output and can trivially generate a source mapping to the original code via Mozilla's source-map library. To use this functionality, you must load this library (it's automatically require-d by UglifyJS in the NodeJS version, but in a browser you must load it yourself) and make it available via the global MOZ_SourceMap variable.

Next, in the code generator options you'd pass a UglifyJS.SourceMap object (that's a thin wrapper around the source-map library), like this:

var source_map = UglifyJS.SourceMap({ ...source_map_options... });
var stream = UglifyJS.OutputStream({
    ...
    source_map: source_map
});
ast.print(stream);

var code = stream.toString();
var map = source_map.toString(); // json output for your source map

source_map_options is an optional JS object that you can pass to specify additional properties for your source map:

file : null, // the compressed file name
root : null, // the root URL to the original sources
orig : null, // the input source map

The orig option is useful when you compress code that was generated from some other source (possibly another programming language). If you have an input source map, pass it in this argument (either as a JS object or as a JSON string) and UglifyJS will generate a mapping that points back to the original source, as opposed to the compiled code that you are compressing.

Comments

The code generator can keep certain comments in the output. If you pass comments: true it'll keep all comments. You can pass a RegExp to retain only comments whose body matches that regexp. You can pass a function for custom filtering. For example, when --comments is passed with no argument the command-line tool will keep all comments containing "@license", "@preserve" or "@cc_on". Here is the function that it uses to filter them:

function(node, comment) {
    var text = comment.value;
    var type = comment.type;
    if (type == "comment2") {
        // multiline comment
        return /@preserve|@license|@cc_on/i.test(text);
    }
}

The code generator will pass two arguments: the node that the current comment is attached to, and the comment token.
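
To try out a filter without running the code generator, you can call it on hand-made comment tokens. The sketch below assumes only what is described above: a token has a type and a value holding the comment body. The keepComment name, the "!" rule, the mock tokens and the "comment1" type for single-line comments are assumptions made for illustration.

```javascript
// Custom filter: keep multiline comments that carry a license-style marker,
// plus any comment whose body starts with "!" (an illustrative extra rule).
function keepComment(node, comment) {
    var text = comment.value;
    if (comment.type == "comment2" && /@preserve|@license|@cc_on/i.test(text)) {
        return true;
    }
    return text.charAt(0) == "!";
}

// Mock comment tokens standing in for what the code generator would pass.
// "comment1" is assumed here to denote a single-line comment.
var tokens = [
    { type: "comment2", value: " @license MIT " },
    { type: "comment2", value: " just a note " },
    { type: "comment1", value: "! keep me" }
];

// The node argument is unused by this filter, so null is enough here.
var kept = tokens.filter(function(tok) {
    return keepComment(null, tok);
});
```

A function like this would then be passed as the comments option, i.e. UglifyJS.OutputStream({ comments: keepComment }).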

Note that some comments might still be lost, due to compressor optimizations that cut whole nodes from the tree (for example unused function declarations). The safest place to put comments that you want to keep is at toplevel (not nested in brackets or functions).

The OutputStream object

You don't need to know this.
I'm just dropping it here for whoever might find it useful.

I'm including a few notes about the OutputStream object, since it seems to be useful for all sorts of code generation, not only JavaScript. It implements a simple stream with functions to print text, to print a string (quotes it and escapes characters that need to be escaped in a string), to output indented blocks etc.

To start with, here's an example. The --ast-help argument to uglifyjs will output a description of the AST hierarchy. The output looks like this:

AST_Node (start end) "Base class of all AST nodes" {
    AST_Statement "Base class of all statements" {
        AST_Debugger "Represents a debugger statement"
        AST_Directive (value scope) 'Represents a directive, like "use strict";'
        AST_SimpleStatement (body) "A statement consisting of an expression, i.e. a = 1 + 2"
        AST_Block (body) "A body of statements (usually bracketed)" {
            AST_BlockStatement "A block statement"
            AST_Scope (directives variables functions uses_with uses_eval parent_scope enclosed cname) "Base class for all statements introducing a lexical scope" {
                AST_Toplevel (globals) "The toplevel scope"
                AST_Lambda (name argnames uses_arguments) "Base class for functions" {
                    AST_Function "A function expression"
                    AST_Defun "A function definition"
                }
            }
            AST_Switch (expression) "A `switch` statement"
            AST_SwitchBranch "Base class for `switch` branches" {
                AST_Default "A `default` switch branch"
                AST_Case (expression) "A `case` switch branch"
            }
            AST_Try (bcatch bfinally) "A `try` statement"
            AST_Catch (argname) "A `catch` node; only makes sense as part of a `try` statement"
            AST_Finally "A `finally` node; only makes sense as part of a `try` statement"
        }
        AST_EmptyStatement "The empty statement (empty block or simply a semicolon)"
        AST_StatementWithBody (body) "Base class for all statements that contain one nested body: `For`, `ForIn`, `Do`, `While`, `With`" {
            AST_LabeledStatement (label) "Statement with a label"
            AST_DWLoop (condition) "Base class for do/while statements" {
                AST_Do "A `do` statement"
                AST_While "A `while` statement"
            }
            AST_For (init condition step) "A `for` statement"
            AST_ForIn (init name object) "A `for ... in` statement"
            AST_With (expression) "A `with` statement"
            AST_If (condition alternative) "A `if` statement"
        }
        AST_Jump "Base class for “jumps” (for now that's `return`, `throw`, `break` and `continue`)" {
            AST_Exit (value) "Base class for “exits” (`return` and `throw`)" {
                AST_Return "A `return` statement"
                AST_Throw "A `throw` statement"
            }
            AST_LoopControl (label) "Base class for loop control statements (`break` and `continue`)" {
                AST_Break "A `break` statement"
                AST_Continue "A `continue` statement"
            }
        }
        AST_Definitions (definitions) "Base class for `var` or `const` nodes (variable declarations/initializations)" {
            AST_Var "A `var` statement"
            AST_Const "A `const` statement"
        }
    }
    AST_VarDef (name value) "A variable declaration; only appears in a AST_Definitions node"
    AST_Call (expression args) "A function call expression" {
        AST_New "An object instantiation.  Derives from a function call since it has exactly the same properties"
    }
    AST_Seq (car cdr) "A sequence expression (two comma-separated expressions)"
    AST_PropAccess (expression property) 'Base class for property access expressions, i.e. `a.foo` or `a["foo"]`' {
        AST_Dot "A dotted property access expression"
        AST_Sub 'Index-style property access, i.e. `a["foo"]`'
    }
    AST_Unary (operator expression) "Base class for unary expressions" {
        AST_UnaryPrefix "Unary prefix expression, i.e. `typeof i` or `++i`"
        AST_UnaryPostfix "Unary postfix expression, i.e. `i++`"
    }
    AST_Binary (left operator right) "Binary expression, i.e. `a + b`" {
        AST_Assign "An assignment expression — `a = b + 5`"
    }
    AST_Conditional (condition consequent alternative) "Conditional expression using the ternary operator, i.e. `a ? b : c`"
    AST_Array (elements) "An array literal"
    AST_Object (properties) "An object literal"
    AST_ObjectProperty (key value) "Base class for literal object properties" {
        AST_ObjectKeyVal "A key: value object property"
        AST_ObjectSetter "An object setter property"
        AST_ObjectGetter "An object getter property"
    }
    AST_Symbol (scope name thedef) "Base class for all symbols" {
        AST_SymbolDeclaration (init) "A declaration symbol (symbol in var/const, function name or argument, symbol in catch)" {
            AST_SymbolVar "Symbol defining a variable" {
                AST_SymbolFunarg "Symbol naming a function argument"
            }
            AST_SymbolConst "A constant declaration"
            AST_SymbolDefun "Symbol defining a function"
            AST_SymbolLambda "Symbol naming a function expression"
            AST_SymbolCatch "Symbol naming the exception in catch"
        }
        AST_Label (references) "Symbol naming a label (declaration)"
        AST_SymbolRef "Reference to some symbol (not definition/declaration)" {
            AST_LabelRef "Reference to a label symbol"
        }
        AST_This "The `this` symbol"
    }
    AST_Constant "Base class for all constants" {
        AST_String (value) "A string literal"
        AST_Number (value) "A number literal"
        AST_RegExp (value) "A regexp literal"
        AST_Atom "Base class for atoms" {
            AST_Null "The `null` atom"
            AST_NaN "The impossible value"
            AST_Undefined "The `undefined` value"
            AST_Infinity "The `Infinity` value"
            AST_Boolean "Base class for booleans" {
                AST_False "The `false` atom"
                AST_True "The `true` atom"
            }
        }
    }
}

As you can see, it resembles JavaScript, but it's not really JavaScript. The OutputStream object made it pretty simple to get the output. Here's the function that generates it:

exports.describe_ast = function() {
    var out = UglifyJS.OutputStream({ beautify: true });
    function doitem(ctor) {
        out.print("AST_" + ctor.TYPE);
        var props = ctor.SELF_PROPS.filter(function(prop){
            return !/^\$/.test(prop);
        });
        if (props.length > 0) {
            out.space();
            out.with_parens(function(){
                props.forEach(function(prop, i){
                    if (i) out.space();
                    out.print(prop);
                });
            });
        }
        if (ctor.documentation) {
            out.space();
            out.print_string(ctor.documentation);
        }
        if (ctor.SUBCLASSES.length > 0) {
            out.space();
            out.with_block(function(){
                ctor.SUBCLASSES.forEach(function(ctor, i){
                    out.indent();
                    doitem(ctor);
                    out.newline();
                });
            });
        }
    };
    doitem(UglifyJS.AST_Node);
    return out + "";
};
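
To see the mechanics in isolation, here is a toy stand-in for such a stream. It is not UglifyJS's implementation: TinyStream and its internals are hypothetical, and only the method names mirror the ones used in describe_ast above. It also always double-quotes strings via JSON.stringify, which is a simplification.

```javascript
// Toy indenting output stream (hypothetical, for illustration only).
function TinyStream(indentLevel) {
    var out = "";                 // text accumulated so far
    var depth = 0;                // current block nesting depth
    var step = indentLevel || 4;  // spaces per nesting level
    var self = {
        print: function(text) { out += text; },
        space: function() { out += " "; },
        newline: function() { out += "\n"; },
        // Emit the indentation for the current depth.
        indent: function() { out += new Array(depth * step + 1).join(" "); },
        // Quote and escape a string (always with double quotes here).
        print_string: function(str) { out += JSON.stringify(str); },
        with_parens: function(body) {
            out += "(";
            body();
            out += ")";
        },
        // Run body one level deeper, wrapped in braces.
        with_block: function(body) {
            out += "{";
            self.newline();
            depth++;
            body();
            depth--;
            self.indent();
            out += "}";
        },
        toString: function() { return out; }
    };
    return self;
}

var s = TinyStream(4);
s.print("Shape");
s.space();
s.with_block(function() {
    ["Circle", "Square"].forEach(function(name) {
        s.indent();
        s.print(name);
        s.space();
        s.print_string("A " + name.toLowerCase());
        s.newline();
    });
});
var rendered = s.toString();
```

The caller decides where indentation and newlines go, while with_block only tracks depth and emits the braces. That split is what keeps a recursive printer like doitem short.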

Construct an OutputStream as described in the synopsis above. The object exports the following methods:
