UglifyJS — the syntax tree (AST)
You can get a crude description of the AST with uglifyjs --ast-help. Each definition starts with the node name (e.g. AST_Node), followed by a list of its own properties in parens (if it has any), followed by a string description, followed by any subclasses (if there are any). Nodes inherit properties from their base classes; for example, since the start and end properties are defined in the base class AST_Node, every node contains those properties.
The parser will instantiate the most specific subclass; for example you will never find an object of type AST_Node in the AST; that's just the base class. You won't find an AST_Statement either, since every kind of statement has its own dedicated subclass.
The AST nodes
The node hierarchy is derived by introspection from the UglifyJS objects themselves. See below for some information on AST_Token; also take a look at the scope analyzer for more information about scope-related properties and SymbolDef.
The parser works in tandem with a tokenizer. The tokenizer is initialized with the source code text and reads one token at a time, producing an AST_Token object which has the following properties:
- type — the type of this token; can be "num", "string", "regexp", "operator", "punc", "atom", "name", "keyword", "comment1" or "comment2" ("comment1" and "comment2" are for single-line and multi-line comments, respectively).
- file — the name of the file this token originated from. Useful when compressing multiple files at once, to generate the proper source map.
- value — the "value" of the token; this is additional information and depends on the token type: for "num", "string" and "regexp" tokens it's their literal value; for "operator" it's the operator; for "punc" it's the punctuation sign (parens, comma, semicolon etc.); for "atom", "name" and "keyword" it's the name of the identifier; and for comments it's the body of the comment (excluding the initial "//" and "/*").
- line and col — the location of this token in the original code; line is a 1-based index and col is a 0-based index.
- pos and endpos — the zero-based start and end positions of this token in the original text.
- nlb — short for "newline before"; a boolean that tells us whether there was a newline before this token in the original source. It helps with automatic semicolon insertion. For multi-line comments in particular this will be true if there either was a newline before the comment, or if the comment itself contains a newline.
- comments_before — this doesn't apply to comment tokens, but for all other token types it will be an array of comment tokens that were found before.
The start and end properties of AST nodes are AST_Token objects and tell you where that node begins and ends. The AST_Toplevel is the single node that might start in one file and end in another (when parsing multiple files); the parser will properly update its end property.
Read more about the scope analyzer.