UglifyJS2: compression beats version 1
Thank you again for your contributions to my pledgie! ♥
Today I pushed out another batch of commits. I've spent quite some time
on the compressor and it does very well in my tests. On jQuery it beats
version 1 by 227 bytes after gzip (although
surprisingly, the size before gzip is bigger by 80
bytes).
Following there's description of some cases where it performs better than v1.
Skip to usage if you're not interested in reading about compression.
Minimize the number of statements
As you know, v1 will join consecutive simple statements with the comma operator, in the hope that a block of statements will become a single statement and we can drop the block brackets. Here's something that v1 can do:
if (foo) {
a = x();
b = y();
}
==>
foo && (a = x(), b = y());
However, the presence of a "non-simple" statement will render this optimization useless:
if (foo) {
a = x();
b = y();
for (; a < b; a++) console.log(a);
}
==>
if (foo) {
a = x(), b = y();
for (; a < b; a++) console.log(a);
}
The new compressor handles the above code like this:
if (foo) for (a = x(), b = y(); b > a; a++) console.log(a);
managing to save 3 bytes (the block brackets and one semicolon). Similar
savings occur if the statement is switch
, if
,
with
, return
or throw
(version one handled
the last two too).
Invent a “return” statement
One thing that compresses well is something like:
function f() {
if (foo) return bar;
return baz;
}
==>
function f() {
return foo ? bar : baz;
}
But sometimes the if
is not followed by anything:
function f() {
if (foo) return bar;
}
V1 will leave that as is. V2 prefers to think there's a return
undefined
there and will transform it into the following:
function f() {
return foo ? bar : void 0;
}
For this simple case it's a little more bytes in fact, but I found that in general it helps because it allows to reduce the number of statements. Here's a more complicated function and how it compresses in both versions (use the small button on the right which shows up when you hover the code to view it easier):
function setOpacity(el, o) {
if (o != null) {
if (o == "" && o != 0) {
is_ie
? el.style.filter = ""
: el.style.opacity = "";
} else {
is_ie
? el.style.filter = "alpha(opacity=" + Math.round(o * 100) + ")"
: el.style.opacity = o;
}
return o;
} else {
if (!is_ie)
return parseFloat(el.style.opacity);
else
if (/alpha\(opacity=([0-9.])+\)/.test(el.style.opacity))
return parseFloat(RegExp.$1);
}
}
==(v1)==>
function setOpacity(el, o) {
if (o != null) return o == "" && o != 0 ? is_ie ? el.style.filter = "" : el.style.opacity = "" : is_ie ? el.style.filter = "alpha(opacity=" + Math.round(o * 100) + ")" : el.style.opacity = o, o;
if (!is_ie) return parseFloat(el.style.opacity);
if (/alpha\(opacity=([0-9.])+\)/.test(el.style.opacity)) return parseFloat(RegExp.$1);
}
==(v2)==>
function setOpacity(el, o) {
return o != null ? (o == "" && o != 0 ? is_ie ? el.style.filter = "" : el.style.opacity = "" : is_ie ? el.style.filter = "alpha(opacity=" + Math.round(o * 100) + ")" : el.style.opacity = o, o) : is_ie ? /alpha\(opacity=([0-9.])+\)/.test(el.style.opacity) ? parseFloat(RegExp.$1) : void 0 : parseFloat(el.style.opacity);
}
Thanks to the imaginary return undefined
, version 2 is able to
get rid of three if
and two return
keywords, and makes it
a single statement; all at the price of a void 0
.
IF ... return/continue
Example for if...return:
function g() {
if (foo) return;
if (bar) return;
if (baz) return;
if (baa) return;
a();
b();
}
==>
function g() {
foo || bar || baz || baa || (a(), b());
}
It can only do this, however, if the if/return appears directly under the
function body (that is, not nested in another if
, for
etc.). A similar optimization occurs for if/continue, again, only if the
if
is directly in the block that continue
refers
to:
for (var i = 0; i < 5; ++i) {
if (i < 3) continue;
console.log(i);
}
==>
for (var i = 0; 5 > i; ++i) 3 > i || console.log(i);
Undefined
We now optimize typeof x == "undefined"
to x ===
undefined
. The undefined
value itself will be rendered as
void 0
, unless there's a variable named "undefined" in scope—if
so, AST_Undefined nodes will be rendered "undefined" and the mangler
in turn will reduce the name to a single character. This saves some bytes on
jQuery after gzip (also coupled with the “return undefined” statement that I
mentioned before).
Smarter sequences
Two smallish optimizations for the “sequences” (comma operator):
x, x ==> x
x = expression(), x ==> x = expression()
Seems like nobody will write code like this, but our compressor does, for example:
stuff += expression();
if (stuff) { foo() }
==>
if (stuff += expression(), stuff) { foo() }
==(after AST_Seq optimization)==>
if (stuff += expression()) { foo() }
==(and after AST_If optimization)==>
(stuff += expression()) && foo()
I have bigger plans for this feature. Basically, if between the moment a variable is initialized and the moment it's referenced we can detect that no code may have side effects on that variable, and if that variable is only referenced once, then we could safely put the initialization in place of the reference and discard the initialization statement. It's tricky to do and I think it will slow down the compressor considerably, but it might also provide considerable savings, so I plan to work on it.
Fine-tuning for gzip
When name mangling is requested, v2 takes the trouble of counting character frequencies for all unmangleable keywords / strings / properties (idea discussed in v1) and starts generating names with the most frequently used characters1. This saves ~130 bytes on jQuery after gzip, although the unzipped file size is the same.
I also tried to sort names by number of references, and generate first those names that are most frequently used. This would mean that the most frequently used names would be shorter and made of characters with high frequency numbers. It saves almost 1K before gzip, but adds 450 bytes after gzip!! So I thought maybe gzip prefers more frequently used stuff to have longer names (it does very well compressing repetitive long sequences after all) so I tried the reverse sort, but that adds 1K after gzip. :-\
Optimizing for gzip is tricky business.
Another small optimization is to always prefer to use ">"
and
">="
(no less-than operator). I stole this idea from Google
Closure. It saves 23 bytes on jQuery.
Usage
To use the new version with NodeJS you need source-map and optimist modules, and then just clone the repository and symlink the command-line tool somewhere in your $PATH, for example:
npm install source-map
npm install optimist
cd ~
git clone git://github.com/mishoo/UglifyJS2.git
mkdir -p ~/bin
cd ~/bin
ln -s ~/UglifyJS2/bin/uglifyjs2
The command-line tool outputs the following on --help:
uglifyjs2 [options] input1.js [input2.js ...]
Maximum compression settings are on by default.
Use a single dash to read input from the standard input.
Options:
--source-map Specify an output file where to generate source map
--source-map-root The root of the original source to be included in the source map
-p, --prefix Skip prefix for original filenames that appear in source maps
-o, --output Output file (default STDOUT)
--stats Display operations run time on STDERR
-v, --verbose Verbose
-b, --beautify Beautify output
-m, --no-mangle Don't mangle names
-c, --options Pass compressor options
I briefly described the command line options in the previous post; what's new since then is the -b (if you want to get beautified output), -m if you want to skip the mangling step and the -c to specify compressor options.
The following compressor options are available (I'm sorry, this list is a bit messy and it's possible at places that it does compression even when options are off):
sequences : join simple statements with the comma operator
properties : a["foo"] ==> a.foo
dead_code : drop unreachable code
drop_debugger : drop the debugger statement
unsafe : apply the "unsafe" transformations
conditionals : optimize if-s and conditionals
comparations : optimize comparations
evaluate : try to evaluate expressions
booleans : optimize booleans
loops : optimize loops
unused_func : drop unused functions (not from toplevel scope)
hoist_funs : hoist function declarations
hoist_vars : hoist var declarations
if_return : optimize if/return and if/continue
join_vars : join consecutive var statements
cascade : for now the sequence optimization, x = foo, x ==> x = foo
The argument of -c should be a comma-separated list of
prop=value, where prop is an option from the list above and
value is a JavaScript expression (note, it's evaluated! don't use
the CLI with uncontrolled data). For now all options are booleans and
they're all true
by default.
There is no universally optimal way to compress all scripts, so you might want to disable some options and test if you get better file size after gzip. For example on jQuery hoist_vars and join_vars don't help too much; they do make the size smaller before gzip, but it's slightly bigger after gzip. I found the gzip-optimal way to compress jquery is like this:
uglifyjs2 -c hoist_vars=false,join_vars=false jquery-1.8.1.js > jq.js
If you want you can also generate a source map, for example:
uglifyjs2 jquery-1.8.1.js \
-c hoist_vars=false,join_vars=false \
--source-map jq.js.map \
--source-map-root http://code.jquery.com/ \
-o jquery-1.8.1.min.js
Thanks to “node-optimist” you can now pass arguments in any order (i.e. you can begin with the list of files to compress). UglifyJS2 supports multiple files in the command line, which is important for generating an useful source map.
Issues
(besides many missing features that v1 has)
Currently we lack support for directives (they're simply read as
strings). I'll fix that shortly — done. Otherwise, please consider
this code “bleeding edge”—I can't be sure it's as stable as version 1, until
it will be in use by so many people.
The tests are still lagging behind :-(; the way I test the compression is, to me, more reassuring than a limited test suite; I'm compressing some 650K which is used in a big application; then load that application and start clicking around, and if it doesn't crash in a few minutes I declare myself happy.
Please use the Github issue tracker if you find any bugs!