Thoughts on CommonJS / RequireJS

  • Published: 2012-11-13
  • Modified: 2013-10-21 18:23
  • By: Mishoo
  • Tags: javascript

This rant is to explain this tweet. The short version is, I think CommonJS and RequireJS introduce more problems than they solve. While I do admit there is a need to define packages in JavaScript, it should probably be addressed at the language level. In this fabulous class, Gerald Jay Sussman says “I consider being able to return procedural values, and therefore to have first-class procedures in general, as being essential to good modular programming” and goes on to say that first-class functions cover all modularity needs, including making package systems. That's certainly true, but no language I know, not even Scheme, relies solely on that for defining a module system. Except JavaScript.

Single-file modules

This one bothers me the most about both CommonJS and RequireJS. Your module is not actually expected to sit in a single file, but if you need to split it into multiple files, you require those files in the same way and you need to qualify all names. For example, in UglifyJS there's a small utility library that defines stuff likely to be used in all other files. Because UglifyJS2 doesn't use CommonJS, I have the luxury of just defining the stuff I need and accessing it as a global.

Globals are bad, yeah… some thoughts about this later. For now what I meant to say is that I'd like to have the ability to define global stuff and use it as such within my package. I'm not forcing my globals on users of UglifyJS, because there's a build script that puts everything into a lambda; you'd use that in order to use UglifyJS in a browser. As for using UglifyJS in NodeJS, there's one file that loads everything in a separate context and again, my globals won't hurt anybody.

// love:
var indentation = repeat_string(" ", indent_level);

// hate:
var utils = require("./utils");
var indentation = utils.repeat_string(" ", indent_level);
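
Here is a minimal sketch of the "load everything in a separate context" idea mentioned above, assuming Node's built-in vm module. The file contents and the make_indent name are made up for illustration; this is not the actual UglifyJS loader.

```javascript
// Sketch: evaluate "package-global" code in a private context.
const vm = require("vm");

// Stand-ins for the package's source files (hypothetical contents).
const sources = [
  'function repeat_string(str, n) { return new Array(n + 1).join(str); }',
  'function make_indent(level) { return repeat_string(" ", level * 2); }'
];

// A fresh context plays the role of the package's own "global" scope.
const sandbox = vm.createContext({});
sources.forEach(function (code, i) {
  vm.runInContext(code, sandbox, { filename: "file" + i + ".js" });
});

// Hand out only the names that users of the package should see.
const api = { make_indent: sandbox.make_indent };
```

Inside the context, make_indent calls repeat_string unqualified. Outside, only api.make_indent is visible and the caller's real global scope stays clean.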

Globals are underrated

Let's look at probably the most successful programming language of all time, C. In C every global is, well, global. #include some file and everything it defines becomes available, unqualified. Easy for everyone, and programmers grew accustomed to giving proper names to their libraries and prefixing every global, and that's not too bad!

Then came “modern OOP”, telling us that of the following two lines, the second is better:

set_dialog_size(obj, width, height);
obj.set_size(width, height);

I often prefer the former. It has several advantages:

The last point hopefully makes clear what I'm advocating for. I'm not simply “for globals”; that would be bad, of course. I'm advocating for “globals within one package”. The set_dialog_size function can be global within my package, but if it's really internal API, a build script will take care to hide it from the users of my library.
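
A build step of that kind can be sketched in a few lines. Everything here is hypothetical (the file names, their contents, the build function and the MyLib name), and a real build script would read the files from disk and write the bundle out for a script tag instead of evaluating it in place.

```javascript
// Hypothetical file contents, keyed by file name.
const files = {
  "utils.js": 'function repeat_string(str, n) { return new Array(n + 1).join(str); }',
  "dialog.js": 'function set_dialog_size(obj, w, h) { obj.width = w; obj.height = h; return obj; }',
  "main.js": 'function describe(obj) { return repeat_string("-", obj.width); }'
};

// Concatenate all files inside one function and return only the public names.
function build(files, public_names) {
  const body = Object.keys(files).map(function (name) {
    return "// " + name + "\n" + files[name];
  }).join("\n");
  const exported = public_names.map(function (name) {
    return name + ": " + name;
  }).join(", ");
  return "(function(){\n" + body + "\nreturn { " + exported + " };\n})()";
}

const bundle = build(files, ["set_dialog_size", "describe"]);

// Evaluate the bundle here only to show the result.
const MyLib = new Function("return " + bundle + ";")();
```

Within the bundle, describe uses repeat_string as a plain unqualified name, yet MyLib exposes only set_dialog_size and describe.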

How's all this related to JavaScript “module systems”? Well, they prevent me from defining a global in one file and using it as such in another file. I have to export stuff in every file, and use require and qualify names. I don't like doing that; it kinda prevents me from thinking about the actual problem that I need to solve. Globals are good; I won't poison your environment with my globals, but I'd like to be able to use unqualified names inside my package. In short, let's say exports is fine, but I'd also like to have an import facility. However, this has to be done at the language level; it's a syntax thing, because it puts names into the current scope; you can't do it from an external tool like RequireJS [2].
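
To illustrate why a library cannot provide this, here is a small sketch. The import_all helper is hypothetical: a runtime function can copy properties onto an object it is given, but it has no way to create bindings in its caller's scope.

```javascript
const utils = {
  repeat_string: function (str, n) { return new Array(n + 1).join(str); }
};

// A hypothetical runtime "import": all it can do is copy properties
// onto whatever object it is handed.
function import_all(target, module) {
  Object.keys(module).forEach(function (name) {
    target[name] = module[name];
  });
  return target;
}

function uses_import() {
  const scope = import_all({}, utils);
  return {
    // the unqualified name was never bound in this function's scope
    unqualified: typeof repeat_string,
    // the copy is reachable only through the object, so it is still qualified
    qualified: scope.repeat_string("ab", 2)
  };
}

// Destructuring does bind a local name, but each name is listed by hand.
const { repeat_string: repeat } = utils;
```

Calling uses_import() reports the unqualified name as "undefined". The only other option for a helper is to write onto the real global object, which is exactly the pollution a package wants to avoid.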

Asynchronous is evil

I wrote before about why I think asynchronous is evil. I got reminded of this evil by a recent issue in the source-map module. Basically, it was using an older version of require.js, which itself used a deprecated NodeJS API and produced a warning. In trying to update require.js I found out that my code (UglifyJS) didn't work anymore, although the error I got was completely unrelated to loading packages. The way a second-hand dependency can break your own program is mental. Investigating, I found the following:

$ node
> sm = require("source-map")
{}   // ← empty
> sm   // few seconds later...
{ SourceMapGenerator: [Function: SourceMapGenerator],
  SourceMapConsumer: [Function: SourceMapConsumer],
  SourceNode: [Function: SourceNode] }   // ← it got stuff!

So require("source-map") did return an object, but it was initially empty. However, a little later that same object got populated with the things that “source-map” exports. I was a bit perplexed but quickly realized that the files that source-map loaded with require.js were loaded asynchronously [3]. Unfortunately, there is no way to hide this evil; it has to propagate to the very surface of what you need to write. Let's say foo.js uses “source-map” like this (just an example, the actual API of source-map is not important here):

var SM = require("source-map");
...
exports.map = SM.makeSourceMap(...);

To make this code work with the new require.js, I'd have to convert it to this:

define("foo", [ "source-map" ], function(SM){
  ...
  return { map: SM.makeSourceMap(...) };
});

Fine, let's say I do that, but can I limit this change to “foo.js”? NO. I have to change modules that are using foo.js too, as well as modules that are using modules that use foo.js, all the way up:

var foo = require("foo");
console.log(foo.map);     // becomes

require(["foo"], function(foo){
  console.log(foo.map);
});

In short, because require.js became asynchronous, I have to nest in a function all the code that's using modules that themselves are using require.js. Or, just add a setTimeout in the command line tool and hope for the best. What the …?
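
The empty-then-populated object from the REPL session earlier can be reproduced with a small simulation. This is only a sketch: async_require is a made-up loader, not the require.js implementation, and a plain array stands in for the event loop so that the example is deterministic.

```javascript
// Stand-in for the event loop: callbacks queued here run "later".
const pending = [];
function defer(fn) { pending.push(fn); }
function run_pending() { while (pending.length) pending.shift()(); }

// Hypothetical loader: returns the exports object right away,
// but only fills it in from a deferred callback.
function async_require(factory) {
  const module_exports = {};
  defer(function () { Object.assign(module_exports, factory()); });
  return module_exports;
}

const sm = async_require(function () {
  return { SourceMapGenerator: function SourceMapGenerator() {} };
});

const before = Object.keys(sm).length;  // nothing there yet
run_pending();                          // "a few seconds later..."
const after = Object.keys(sm).length;   // same object, now populated
```

Any code that touches sm between the call and run_pending() sees an empty object, which is why the setTimeout trick only works when the timer happens to fire after the load.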

UPDATE: another example

Many times I've seen require being used in NodeJS apps like the following. It makes for a good example because we can see how far-reaching the decision to make something asynchronous is:

function compute_something(a, b) {
  var utils = require("./utils");
  return utils.compute_something(a, b);
}

"utils.js" is loaded on demand, only if compute_something is called. The result of require is memoized, meaning that if you call this function a second time, it won't actually load the module again, it'll just use the previously fetched exports value. You have the great benefit of writing your code in a sequential way, and returning a value from this function, only because require works synchronously. With an asynchronous require you'd have to write the function like this:

function compute_something(a, b, callback) {
  require("./utils", function(utils){
    callback(utils.compute_something(a, b));
  });
}

Now I'd be totally fine with changing a single function, but that's not possible! You have to change every place where you use “compute_something”, to pass a callback instead of getting the return value. And all other functions that themselves use “compute_something” need to be changed accordingly. “Asynchronous” is VIRAL—if you have a nested piece of code and somewhere, deep down inside of it, you use a function that suddenly turns asynchronous, you can't abstract away that change—you have to make all code in the chain async, up to and including the toplevel. It's not just a matter of taste; it completely changes everyone's code! If that's not evil, I don't know what is.
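
For reference, the synchronous, memoized behavior that keeps the original compute_something so simple can be sketched as below. The definitions registry, require_sync and load_count are hypothetical names used for illustration; Node's real require does considerably more (path resolution, file reading, wrapping).

```javascript
let load_count = 0;

// Hypothetical module registry: name -> function that builds the exports.
const definitions = {
  "./utils": function () {
    load_count++;
    return { compute_something: function (a, b) { return a + b; } };
  }
};

const cache = {};

// Synchronous and memoized: a module's body runs at most once.
function require_sync(name) {
  if (!Object.prototype.hasOwnProperty.call(cache, name)) {
    cache[name] = definitions[name]();
  }
  return cache[name];
}

function compute_something(a, b) {
  const utils = require_sync("./utils");  // loaded on the first call only
  return utils.compute_something(a, b);
}
```

Because require_sync returns a value, compute_something can return one too, and none of its callers need to know that a module was loaded along the way.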

Another slight note: above we didn't care about error handling. Presumably if there's an error, “compute_something” will throw an exception. Since our “compute” function is rather small, we don't need to do any error handling in it—but we expect that upper-level code will catch the exception, if any. This model doesn't work in the “asynchronous” world—you have to keep track of errors at every async call. A more correct way of writing the above would be, perhaps:

function compute_something(a, b, callback) {
  require("./utils", function(utils){
    var error, result;
    try {
      result = utils.compute_something(a, b);
    } catch(ex) {
      error = ex;
    }
    callback(result, error);
  });
}

Hard to tell what's the right way; essentially, simple and good design necessarily has to become complex. You need to figure out a way of passing the error through, so let's say that if you pass a second argument (“error”), then the “callback” function will know what to do (probably, it will just call its own callback with the error argument). You have to take care of that at every step. Essentially, you give up try/catch, and that's a big loss.
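
To make the "at every step" part concrete, here is a sketch with one more level on top, using the same (result, error) argument order as above. The load_utils and describe functions are hypothetical, and a plain array again stands in for the event loop.

```javascript
// Stand-in for the event loop: callbacks queued here run "later".
const pending = [];
function defer(fn) { pending.push(fn); }
function run_pending() { while (pending.length) pending.shift()(); }

// Hypothetical async loader; the loaded "module" throws on bad input.
function load_utils(callback) {
  defer(function () {
    callback({
      compute_something: function (a, b) {
        if (b === 0) throw new Error("division by zero");
        return a / b;
      }
    });
  });
}

function compute_something(a, b, callback) {
  load_utils(function (utils) {
    let error, result;
    try { result = utils.compute_something(a, b); } catch (ex) { error = ex; }
    callback(result, error);
  });
}

// One level up: try/catch is of no use here, so the error is forwarded by hand.
function describe(a, b, callback) {
  compute_something(a, b, function (result, error) {
    if (error) return callback(undefined, error);
    callback("result: " + result);
  });
}

const log = [];
function record(result, error) { log.push(error ? error.message : result); }
describe(6, 3, record);
describe(1, 0, record);
run_pending();
```

After run_pending(), log holds one success message and one error message. Every function between the failing call and the code that finally reports the error had to forward it explicitly.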

Uncertainty

Before this mess, you might write code like this with jQuery:

<script src="jquery.js"></script>
<script>
  $(document).ready(function(){
    // do stuff when the page finished loading
  });
</script>

You know it works. Now, however, you'd write it like this:

<script>
  require(["jquery"], function($){
    $(document).ready(function(){
      // do stuff
    });
  });
</script>

Does that work? Well, I don't know. Can't tell now. Has the “window.onload” event already fired when jQuery finishes loading? Depends on how require.js works: it could use document.write to load the scripts, or just create a <script> tag, that would be synchronous, thus, sane; that would happen before “window.onload”, so it should work. If it uses an XMLHttpRequest, on the other hand, then it might not work. I should dig into jQuery too: maybe it doesn't rely on “window.onload”; maybe even if this event already fired, jQuery is smart enough to realize that it fired before we set our handler; maybe it just runs the handler then. Worse, it might work on my development machine, but not on the production server with real network latency.

Lots of questions. See, web development was strikingly complicated already; now it got worse. Luckily, I have the choice of not using RequireJS, but as soon as I'm using a package that's itself using RequireJS, the asynchronous malefic spirit starts plaguing my program.
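
One defensive way to cope with the "did the event already fire?" question is to check the document's state before subscribing. The sketch below relies on the standard document.readyState property and the DOMContentLoaded event; the when_ready and fake_document helpers are made up for illustration, and this is not a claim about how jQuery does it internally.

```javascript
// `doc` is anything shaped like the browser's `document`.
function when_ready(doc, handler) {
  if (doc.readyState === "loading") {
    // still parsing: wait for the event
    doc.addEventListener("DOMContentLoaded", handler);
  } else {
    // "interactive" or "complete": parsing has finished, so run now
    handler();
  }
}

// Minimal fake document, for trying this outside a browser.
function fake_document(state) {
  const listeners = [];
  return {
    readyState: state,
    addEventListener: function (type, fn) { listeners.push(fn); },
    fire: function () { listeners.forEach(function (fn) { fn(); }); }
  };
}
```

In a page you would call when_ready(document, handler). The point stands either way: code loaded asynchronously has to ask this question, while code in a plain script tag never had to.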

I certainly know that asynchronous programming cannot be avoided at times. For example when dealing with user input. You can't do var answer = ask_user("What's your name?") for obvious reasons [4]. But here we talk about loading programs. It's possible that you need to conditionally load scripts at run-time; in such a case, RequireJS is a great choice (and that's about the only case I can think of). But most of the time, I'd like to be able to rely on the result returned by require, in a sequential way.

Is speed a reason? Let's see, if your bandwidth is 100KB/s and you need to load 500KB worth of scripts, then that's sure as hell gonna take about 5 seconds, no matter if you load them sequentially, in parallel, asynchronously or not. Furthermore, AFAIK <script> will try to load things in parallel; it's just evaluation that happens in sequence, and that of course would happen with NameYourFavoritePackageManager too, because JS is single-threaded. So it can't be any good for speed.

And the boilerplate

So now I have to embed all my modules in code like this? (OK, I know that if one can rely that “require.js” is loaded then it's fine to just call “define” without checking for it first)

(typeof define == "function" && define.amd
  ? define
  : function(name, deps, code){
      // dunno what's gonna happen with deps here.
      this[name] = code();
    })
("MyModule", [ "jquery" ], function($){
  // here we get to finally write code
});

As I said, I'm no fan of CommonJS either, but the following makes more sense to me:

var $ = require("jquery");
// start writing code.

For an extreme example: Esmangle used to contain some mind-blowing boilerplate in all files to support “asynchronous module definition”; fortunately the author discarded it. Take a look at the sheer amount of useless lines that were dropped with that patch.

The requirement to write weird code to make the package system happy means that something is wrong. If you can't automate it, if you can't put it aside such that you don't see it in every file, it means something is wrong. Perhaps this isn't the right answer to the “module problem”. That's what I'm thinking. I sure don't like it.

Footnotes
1. This is a feature available in every Lisp system I know, including my own.
2. Of course, we rule out the with statement for reasons that everybody knows already.
3. I'm not even sure why it bothers returning it, in this case. It would be better if the async require would return a constant string, like "Dude, require is asynchronous. Pass a callback. And go turn your code upside-down."
4. Again, we rule out prompt, confirm; other than that, it's impossible to implement a “modal” (blocking) dialog in JavaScript

Comments

# Andrew Petersen
2012-11-15 18:07
I agree nearly completely with your pain points of AMD. I've been in similar situations with Vash (https://github.com/kirbysayshi/vash) and some form of repeated UMD boilerplate. I've also used RequireJS in Backbone-based projects, and straight jQuery based projects using r.js as a build tool. I believe jrburke and others have said that AMD isn't perfect, but that it solves the primary problem you acknowledged: loading asynchronous scripts on demand. Otherwise I'm with you, it's not worth it. jrburke is extremely receptive to discussion in my experience, and that's refreshing in a project.

Regarding your footnote about redefining global functions, as long as the function is not contained within a closure, there's nothing stopping you from running `var lib = require("blah"); lib.someexported = function() { return "redefined"; };`. In node at least, that will redefine the function globally. I'm not bringing this up to invalidate any part of your argument, I just wanted to mention it.

Thanks for outlining this with concrete details! I enjoyed reading it. That bug with source-maps was mind-boggling, as I can't think of a single popular node package that asynchronously loads its modules!

p.s. this comment form wouldn't let me click "Ok" on an iPad
# James Burke
2012-11-15 20:31
Thanks for writing down your thoughts. Here is what I take away from them:

Globals: it sounds like what you want is what some version of ES modules have called import *, although modules will still need to explicitly list their exports. import * may not make it into ES modules this time around, but even without it, destructuring makes it pretty easy to create locally bound variables. The tradeoff with the globals is that it is difficult for others to know where they were defined, and it really hurts as projects get larger. The destructuring syntax to me seems like a good enough tradeoff -- the source of a piece of functionality is known, and use of that functionality can just be local variables to functions.

async is evil: the problem you had seems to be from a change I did in requirejs 2.1, but then corrected in 2.1.1. With 2.1.1, someone using requirejs to construct a module export for node can now just do requirejs('moduleName') and get synchronous loading behavior. However, this points out a larger problem in play here -- nodejs chose sync for its module system where it did not need to. It thought it needed to do this in order to avoid callbacks for dependencies. However, what we do in AMD is to analyze dependencies in the file first, load and execute them, then run the current module, so require('stringvalue') works, it just gives the module export, and it avoids each require call from having a callback. This sort of "async under the covers" system is a requirement for a JS module system that works in browsers. As it was, node was developed from a server person's perspectives, they were still learning (we all were) and they are stuck with their system for a while. I am hopeful that ES modules will do better though.

On boilerplate: the difficulty is that there was no module system for JS, just script tags, so in this transition period it will be odd. I expect to see similar boilerplate for scripts that will want to opt in to ES modules but also work in a browser globals context. This is just one of the difficulties with upgrading the web, there are backward compat issues. Hopefully the timeframes of backcompat support can shrink if browsers are on faster upgrade paths, but there will still be transition periods.

In summary, I believe most of your feedback is based on the pains of not having a language level module support, and people trying implementations to figure out what it should be, like CommonJS and AMD. This is how it should be, using real implementations that get broader use to inform an eventual standard. However, it can be rougher to navigate in the shorter term. For what it is worth though, I expect any ES module system to look closer to CommonJS/AMD styles than browser globals and developers having to manually work out dependency trees.
# Mishoo
2012-11-18 00:33
Thanks for the follow-up, James! I'm sorry if my criticism is going too far... on the whole, RequireJS is a good thing (in browser land) but I still hope for a language feature that would allow writing sequential and predictable code.

> destructuring makes it pretty easy to create locally bound variables.

That's true, but it's just nicer without having to explicitly declare locals and import every name.

> The tradeoff with the globals is that it is difficult for others to know where they were defined

That's not true. Even a simple “grep” can tell me where some global is defined (assuming it has a long name, which globals should); in practice, though, I wouldn't rely on “grep” but on more advanced tools. A quick and dirty script using UglifyJS's parser could easily tell me where function X is defined, if it were a global; but if it were utils.X, well, that's a lot more difficult.

Async really is evil. In fact I just added an update to that section with a better example of its “viral” nature. It changes everything it touches, and you lose the benefit of returning from a function, or using exceptions. I'm happy to hear that the issue that affected source-map was just a bug; I actually thought it was intended behavior that `require` would work asynchronously in NodeJS. That would be quite bad.

This said, I realize that async loading files is still the best way to handle dependencies in the browser. That is, if we insist on being able to list dependencies in every file, rather than in a single place as we used to do. I personally still prefer <script> tags and build scripts, and I hope to continue to have a choice over this.
# Allen Rice
2013-01-23 07:49
Nice write up! It brought a lot to my attention, specifically the async part! I'm currently in the process of researching RequireJS and determining if we should switch to it. We currently manage a list of dependencies separately and concat all of our scripts (each a big, descriptively namespaced revealing module pattern) into a big minified bundle. We are definitely used to the synchronous nature of that mechanism and I think that is appropriate. I think switching to async, just to save us the hassle of tracking dependencies, isn't worth it. That said, I'm wondering, does the RequireJS Optimizer do away with the async problems? I'm not sure, but I'm hoping that since you'd be building a big bundle that is guaranteed to always be present in its entirety, you wouldn't have to use it in an async manner. This is probably a basic question but I have no experience with AMD / RequireJS yet. I'm hoping that I'm able to use RequireJS with the Optimizer to make one big, or a few common bundles for my clients. This will let me leave the dependency tracking up to the code and I can come in whenever and decide how to pick apart my bundles, or just not do anything and allow it to build one big bundle. Any feedback on how the optimizer affects the async nature of RequireJS would be very much appreciated! Until then, I'm going through every article I can find and experimenting.
# Dmitry Sheiko
2014-08-15 11:57
Agree. As Tom Dale put it "AMD is Many HTTP Requests and Too Much Ceremony". Almost all the goodies of AMD can be brought with CJS Modules/1.1 by using a compiler. I use extensively this CJC compiler: http://www.slideshare.net/dsheiko/modular-javascript-with-commonjs-compiler It completely fulfills the requirements I have. However considering a case of large scale app based on modules loaded on-demand, here going without AMD would be a headache (hello r.js and NGINX hooks).