Thoughts on CommonJS / RequireJS
This rant is to explain this tweet. The short version: I think they introduce more problems than they solve. While I admit there is a need to define packages in JavaScript, it should probably be addressed at the language level. In this fabulous class, Gerald Jay Sussman says “I consider being able to return procedural values, and therefore to have first-class procedures in general, as being essential to good modular programming” and goes on to say that first-class functions cover all modularity needs, including building package systems. That's certainly true, but no language I know of, not even Scheme, relies solely on that for defining a module system. Except JavaScript.
Single-file modules
This one bothers me the most about both CommonJS and RequireJS. Your module is not actually expected to fit in a single file, but if you need to split it across multiple files, you require those files the same way and you have to qualify all names. For example, UglifyJS contains a small utility library that defines stuff likely to be used in all the other files. Because UglifyJS2 doesn't use CommonJS, I have the luxury of just defining the stuff I need and accessing it as a global.
Globals are bad, yeah… some thoughts about this later. For now what I meant to say is that I'd like to have the ability to define global stuff and use it as such within my package. I'm not forcing my globals on users of UglifyJS, because there's a build script that puts everything into a lambda; you'd use that in order to use UglifyJS in a browser. As for using UglifyJS in NodeJS, there's one file that loads everything in a separate context and again, my globals won't hurt anybody.
// love:
var indentation = repeat_string(" ", indent_level);
// hate:
var utils = require("./utils");
var indentation = utils.repeat_string(" ", indent_level);
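To make the build-script point concrete, here's a minimal sketch (hypothetical file contents and names, not the actual UglifyJS build output) of what such a script produces: the package's files concatenated into one lambda, so package-internal “globals” are shared between files but invisible to the outside.

```javascript
// Hypothetical build output: files concatenated and wrapped in one lambda,
// so "globals" like repeat_string stay private to the package.
var MyLib = (function(){
    // --- contents of utils.js ---
    function repeat_string(str, i) {
        var out = "";
        while (i-- > 0) out += str;
        return out;
    }
    // --- contents of another file; uses repeat_string unqualified ---
    function indent(level) {
        return repeat_string("    ", level);
    }
    // only the public API escapes the lambda
    return { indent: indent };
})();

console.log(MyLib.indent(2));       // "        " (8 spaces)
console.log(typeof repeat_string);  // "undefined" -- stays private
```

Inside the lambda, every file sees `repeat_string` unqualified; outside, only `MyLib.indent` exists.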
Globals are underrated
Let's look at probably the most successful programming language of all time: C. In C every global is, well, global. #include some file and everything it defines becomes available, unqualified. Easy for everyone, and programmers grew accustomed to giving proper names to their libraries and prefixing every global, and that's not too bad!
Then came “modern OOP”, telling us that from the following two lines, the second is better:
set_dialog_size(obj, width, height);
obj.set_size(width, height);
I often prefer the former. It has several advantages:
- To the code reader, it's obvious that “obj” is some dialog object.
- A simple code scanner can easily find the location of set_dialog_size.
- In an interactive system[1] you can redefine the global function without restarting the system.
- In JS, the first line would minify to “a(b,c,d)” while the second would minify to “a.set_size(b,c)”. This can't be true, of course, if set_dialog_size were a real global; but think of repeat_string in my utility library: since the build script puts everything in a lambda, set_dialog_size is no longer a global and can thus be safely renamed.
The last point hopefully puts into perspective what I'm advocating for. I'm not simply “for globals”; that would be bad, of course. I'm advocating for “globals within one package”. The set_dialog_size function can be global within my package, but if it's really internal API, a build script will take care of hiding it from the users of my library.
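A quick sketch of that minification point (hypothetical names; this illustrates what a minifier is allowed to do, not its literal output):

```javascript
// Inside the build lambda, set_dialog_size is not a real global, so a
// minifier can safely shorten it to a one-letter name:
var dlg = (function(){
    function set_dialog_size(obj, width, height) {
        obj.width = width;
        obj.height = height;
    }
    var d = {};
    set_dialog_size(d, 640, 480);   // can become a(b,640,480)
    return d;
})();
console.log(dlg.width, dlg.height); // 640 480

// A method name, however, is a property; the minifier must leave
// "set_size" intact, because code it can't see might call it:
//     dlg.set_size(640, 480);  ->  b.set_size(640,480)
```

The free renaming is exactly what makes the package-internal-global style minify better than the method style.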
How's all this related to JavaScript “module systems”? Well, they prevent me from defining a global in one file and using it as such in another file. I have to export stuff in every file, and use require and qualify names. I don't like doing that; it kinda prevents me from thinking about the actual problem I need to solve. Globals are good; I won't poison your environment with my globals, but I'd like to be able to use unqualified names inside my package. In short, let's say exports is fine, but I'd also like to have an import facility. However, this has to be done at the language level; it's a syntax thing, because it puts names into the current scope; you can't do it from an external tool like RequireJS[2].
Asynchronous is evil
I wrote before about why I think asynchronous is evil. I got reminded of this evil by a recent issue in the source-map module. Basically, it was using an older version of require.js, which itself used a deprecated NodeJS API and produced a warning. In trying to update require.js I found out that my code (UglifyJS) didn't work anymore, although the error I got was completely unrelated to loading packages. The way a second-hand dependency can break your own program is mental. Investigating, I found the following:
$ node
> sm = require("source-map")
{} // ← empty
> sm // few seconds later...
{ SourceMapGenerator: [Function: SourceMapGenerator],
  SourceMapConsumer: [Function: SourceMapConsumer],
  SourceNode: [Function: SourceNode] } // ← it got stuff!
So require("source-map") did return an object, but it was initially empty. A little later, that same object got populated with the things that “source-map” exports. I was a bit perplexed, but quickly realized that the files source-map loaded with require.js were loaded asynchronously[3]. Unfortunately, there is no way to hide this evil; it has to propagate to the very surface of what you need to write. Let's say foo.js uses “source-map” like this (just an example, the actual API of source-map is not important here):
var SM = require("source-map");
...
exports.map = SM.makeSourceMap(...);
To make this code work with the new require.js, I'd have to convert it to this:
define("foo", [ "source-map" ], function(SM){
    ...
    return { map: SM.makeSourceMap(...) };
});
Fine, let's say I do that, but can I limit this change to “foo.js”? NO. I have to change modules that are using foo.js too, as well as modules that are using modules that use foo.js, all the way up:
var foo = require("foo");
console.log(foo.map); // becomes
require("foo", function(foo){
    console.log(foo.map);
});
In short, because require.js became asynchronous, I have to nest in a function all the code that's using modules that themselves are using require.js. Or, just add a setTimeout in the command line tool and hope for the best. What the …?
UPDATE: another example
Many times I've seen require being used in NodeJS apps like the following. It makes for a good example because we can see how far-reaching the decision to make something asynchronous is:
function compute_something(a, b) {
    var utils = require("./utils");
    return utils.compute_something(a, b);
}
"utils.js"
is loaded on demand, only if compute_something is called. The result of require is
memoized, meaning that if you call this function a second time, it won't actually load the module again, it'll just use
the previously fetched exports value. You have the great benefit of writing your code in a sequential way, and
returning a value from this function, only because require works synchronously. With an
asynchronous require you'd have to write the function like this:
function compute_something(a, b, callback) {
    require("./utils", function(utils){
        callback(utils.compute_something(a, b));
    });
}
Now I'd be totally fine with changing a single function, but that's not possible! You have to change every place where you use “compute_something”, to pass a callback instead of getting the return value. And all other functions that themselves use “compute_something” need to be changed accordingly. “Asynchronous” is VIRAL—if you have a nested piece of code and somewhere, deep down inside of it, you use a function that suddenly turns asynchronous, you can't abstract away that change—you have to make all code in the chain async, up to and including the toplevel. It's not just a matter of taste; it completely changes everyone's code! If that's not evil, I don't know what is.
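To see the virality concretely, here's a small self-contained sketch (hypothetical function names, with a stub standing in for the async compute_something) of how the callback climbs the call chain:

```javascript
// Stand-in for the asynchronous compute_something:
function compute_something(a, b, callback) {
    setTimeout(function(){ callback(a + b); }, 0);
}

// Every caller must now take a callback too...
function report(a, b, callback) {
    compute_something(a, b, function(result){
        callback("result: " + result);
    });
}

// ...all the way up to the toplevel, which also ends up asynchronous:
report(2, 3, function(line){
    console.log(line); // prints "result: 5", but only on a later tick
});
```

Nothing in `report` is inherently asynchronous; it was forced into callback style purely because something beneath it was.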
Another slight note: above we didn't care about error handling. Presumably if there's an error, “compute_something” will throw an exception. Since our “compute” function is rather small, we don't need to do any error handling in it—but we expect that upper-level code will catch the exception, if any. This model doesn't work in the “asynchronous” world—you have to keep track of errors at every async call. A more correct way of writing the above would be, perhaps:
function compute_something(a, b, callback) {
    require("./utils", function(utils){
        var error, result;
        try {
            result = utils.compute_something(a, b);
        } catch(ex) {
            error = ex;
        }
        callback(result, error);
    });
}
It's hard to tell what the right way is; essentially, a simple and good design necessarily has to become complex. You need to figure out a way of passing the error through, so let's say that if you pass a second argument (“error”), then the “callback” function will know what to do (probably, it will just call its own callback with the error argument). You have to take care of that at every step. Essentially, you give up try/catch, and that's a big loss.
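Here's what the calling side looks like under that convention (a sketch, with a stub standing in for the async compute_something above): every call site re-implements by hand what a single upper-level try/catch used to provide.

```javascript
// Stub for the async compute_something with the (result, error) convention:
function compute_something(a, b, callback) {
    setTimeout(function(){
        if (typeof a != "number") callback(undefined, new Error("bad input"));
        else callback(a + b, undefined);
    }, 0);
}

// Every caller must now check the error argument explicitly:
compute_something(2, 3, function(result, error){
    if (error) {
        console.error("failed:", error.message); // manual propagation
        return;
    }
    console.log("ok:", result); // prints "ok: 5"
});
```

If the caller forgets the `if (error)` check, the error is silently swallowed; an uncaught exception, by contrast, would have crashed loudly.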
Uncertainty
Before this mess, you might write code like this with jQuery:
<script src="jquery.js"></script>
<script>
  $(document).ready(function(){
    // do stuff when the page finished loading
  });
</script>
You know it works. Now, however, you'd write it like this:
<script>
  require("jquery", function($){
    $(document).ready(function(){
      // do stuff
    });
  });
</script>
Does that work? Well, I don't know; I can't tell now. Has the “window.onload” event already fired by the time jQuery finished loading? That depends on how require.js works: it could use document.write to load the scripts, or just create a <script> tag, which would be synchronous and thus sane; that would happen before “window.onload”, so it should work. If it uses an XMLHttpRequest, on the other hand, then it might not work. I should dig into jQuery too; maybe it doesn't rely on “window.onload”; maybe even if this event already fired, jQuery is smart enough to realize that it fired before we set our handler, and just runs the handler then. Worse, it might work on my development machine, but not on the production server with real network latency.
Lots of questions. See, web development was strikingly complicated already; now it got worse.
Luckily, I have the choice of not using RequireJS, but as soon as I'm using a package that's itself using RequireJS, the asynchronous malefic spirit starts plaguing my program.

I certainly know that asynchronous programming cannot be avoided at times, for example when dealing with user input. You can't do var answer = ask_user("What's your name?") for obvious reasons[4]. But here we're talking about loading programs. It's possible that you need to conditionally load scripts at run-time; in such a case, RequireJS is a great choice (and that's about the only case I can think of). But most of the time, I'd like to be able to rely on the result returned by require, in a sequential way.
Is speed a reason? Let's see: if your bandwidth is 100KB/s and you need to load 500KB worth of scripts, then that's sure as hell gonna take about 5 seconds, no matter whether you load them sequentially, in parallel, asynchronously or not. Furthermore, AFAIK <script> tags will load in parallel; it's just evaluation that happens in sequence, and that of course would happen with NameYourFavoritePackageManager too, because JS is single-threaded. So it can't be any good for speed.
And the boilerplate
So now I have to embed all my modules in code like this? (OK, I know that if one can rely that “require.js” is loaded then it's fine to just call “define” without checking for it first)
(typeof define == "function" && define.amd
    ? define
    : function(name, deps, code){
        // dunno what's gonna happen with deps here.
        this[name] = code();
    })
("MyModule", [ "jquery" ], function($){
    // here we get to finally write code
});
As I said, I'm no fan of CommonJS either, but the following makes more sense to me:
var $ = require("jquery");
// start writing code.
For an extreme example: Esmangle used to contain some mind blowing boilerplate in all files to support “asynchronous module definition”; fortunately the author discarded it. Take a look at the sheer amount of useless lines that were dropped with that patch.
The requirement to write weird code to make the package system happy means that something is wrong. If you can't automate it, if you can't put it aside so that you don't see it in every file, then something is wrong. Perhaps this isn't the right answer to the “module problem”. That's what I'm thinking. I sure don't like it.
Notes:

[2] … with statement for reasons that everybody knows already.

[3] “Dude, require is asynchronous. Pass a callback. And go turn your code upside-down.”

[4] prompt, confirm; other than that, it's impossible to implement a “modal” (blocking) dialog in JavaScript.