copyTree parallel

This is the parallel version of the directory copy function. The difference is that instead of waiting for a file to finish copying before starting the next, it will process multiple files at once. Both versions presented here run about 3x faster than their sequential counterparts.

In NodeJS

It might seem more obvious how to achieve parallelism in NodeJS than in λanguage. With Node you pass the callback and know that execution of the caller continues immediately at the next line, so it opens the possibility to use a for loop. We must be careful not to invoke the callback until all the files were processed, though. The parallel version in NodeJS looks about as ugly as the sequential one:

function copyTree(srcdir, destdir, callback) {
  fs.mkdir(destdir, function(err){
    if (err) throw new Error(err);
    fs.readdir(srcdir, function(err, files){
      if (err) throw new Error(err);
      var count = files.length;
      function next(err) {
        if (err) throw new Error(err);
        if (--count == 0) callback();
      }
      if (count == 0) {
        callback();
      } else {
        files.forEach(function(f){
          var fullname = path.join(srcdir, f);
          var dest = path.join(destdir, f);
          fs.lstat(fullname, function(err, stat){
            if (err) throw new Error(err);
            if (stat.isSymbolicLink()) {
              fs.readlink(fullname, function(err, target){
                if (err) throw new Error(err);
                fs.symlink(target, dest, next);
              });
            } else if (stat.isDirectory()) {
              copyTree(fullname, dest, next);
            } else if (stat.isFile()) {
              fs.readFile(fullname, function(err, data){
                if (err) throw new Error(err);
                fs.writeFile(dest, data, next);
              });
            } else {
              next();
            }
          });
        });
      }
    });
  });
}

By using that forEach loop it starts copying all files in a directory in parallel. The callback (next) needs to check if all files were copied before invoking the original callback. Additionally, we need to check if there are zero files in a directory in which case we directly invoke callback.

In λanguage

To write the parallel version in λanguage we need a primitive function. That's because our wrappers around the asynchronous Node API will only invoke their callback after the work is done, so by default our program can only run sequentially.

I wrote a parallel primitive which takes a function and calls it with a single argument that I'm gonna name pcall. pcall receives a function argument and calls it “asynchronously” — that is, it resumes the caller program immediately while that function continues running. Finally, the parallel() call returns only after all pcall-s in its body have finished. If that sounds complicated, seeing the usage might be enlightening:

copyTree = λ(srcdir, destdir) {
  makeDir(destdir);
  parallel(λ(pcall){
    forEach(readDir(srcdir), λ(f){
      pcall(λ(){
        let (fullname = pathJoin(srcdir, f),
             dest = pathJoin(destdir, f),
             stat = lstat(fullname)) {
          if isSymlink(stat) {
            symlink(readlink(fullname), dest);
          } else if isDirectory(stat) {
            copyTree(fullname, dest);
          } else if isFile(stat) {
            writeFile(dest, readFile(fullname));
          }
        }
      });
    });
  });
};

So the change we had to do is literally trivial: wrap the forEach loop in a call to parallel, and use pcall to process each file. We need not count how many files there are and invoke a callback in random places; everything is handled by the primitive. Our code still looks sequential, but works in parallel!

A primitive that combines parallel and forEach would allow for even better looking code, and not hard to do; but parallel seems more general.

One might argue that our λanguage code is beautiful because of our parallel abstraction, and had we done something similar in NodeJS we'd end up with readable JavaScript code. This is somewhat true, as we'll see, but even with a helper function the NodeJS code won't get as clear and concise as above.

The primitive: parallel

I'll discuss a bit the implementation of parallel. Here's the code:

parallel = js:raw "function(k, f){
  var n = 0, result = [], i = 0;
  f(function(){ if (i == 0) k(false) }, pcall);
  function pcall(kpcall, chunk){
    var index = i++;
    n++;
    chunk(function(rchunk){
      n--;
      result[index] = rchunk;
      if (n == 0) Execute(k, [ result ]);
    });
    kpcall(false);
  }
}";

k is the continuation of the parallel call itself. And f is the function to invoke with the pcall argument. f is being passed a continuation which does if (i == 0) k(false) — this is necessary in case no pcall was invoked in the body of f; we just call the original continuation passing a false result.

pcall is a well behaved function: it takes a continuation (kpcall) and a function to run (chunk). It calls the chunk but immediately invokes its own pcall continuation (with false; we don't have a meaningful result at this point). So, if chunk is asynchronous, then it will run in parallel with the rest of the program (represented by kpcall).

pcall does some housekeeping to keep track of how many times it was invoked. On the last invocation (where n == 0) it executes the original continuation of parallel, passing to it an array with the result values (this is not needed for our copyTree case, but it might be useful in general).

Wanna play with it? It's defined in this page. The dostuff function below is “asynchronous” — it just prints something after a timeout, and returns the text. Check the difference of using it with and without parallel below.

Notice the time for the non-parallel calls amounts to the total of time spent in each task, that is, 1700ms; while the time for the parallel one is roughly the time spent in the longer task (1000ms). The results are collected in an array and returned after all pcall-s have finished, in the proper order.

Problems of copyTree

If you try to run it on a directory containing thousands of files you might have the surprise of getting a “EMFILE” exception. That occurs because by processing multiple files at once, we're hitting the filesystem limit of open files. Fortunately, this is very easy to fix in λanguage by just changing the primitive. Here's the new version:

PCALLS = 1000;

parallel = js:raw "function(k, f){
  var n = 0, result = [], i = 0;
  f(function(){ if (i == 0) k(false) }, pcall);
  function pcall(kpcall, chunk){
    n++;
    if (PCALLS <= 0) {
      setTimeout(function(){
        n--;
        Execute(pcall, [ kpcall, chunk ]);
      }, 5);
    } else {
      PCALLS--;
      var index = i++;
      chunk(function(rchunk){
        PCALLS++;
        n--;
        result[index] = rchunk;
        if (n == 0) Execute(k, [ result ]);
      });
      kpcall(false);
    }
  }
}";

A global PCALLS variable holds the maximum number of parallel calls allowed at once. When that maximum is reached, we schedule a retry after 5 milliseconds. I found 1000 to be a good value for PCALLS (if you make it too low it will easily starve and will run even much slower than the synchronous version).

That's it, our copyTree in λanguage remains intact — we only had to make the primitive throttle the pcalls when more than 1000 are running. Our program remains beautiful and sequential-looking, yet it's fast, runs in parallel and can handle directories of any size.

The fix for NodeJS version

I'd say the following code is the ugliest so far:

var PCALLS = 1000;

function copyTree(srcdir, destdir, callback) {
  fs.mkdir(destdir, function(err){
    if (err) throw new Error(err);
    fs.readdir(srcdir, function(err, files){
      if (err) throw new Error(err);
      var count = files.length;
      function next(err) {
        PCALLS++;
        if (err) throw new Error(err);
        if (--count == 0) callback();
      }
      if (count == 0) {
        callback();
      } else {
        (function loop(i){
          if (PCALLS <= 0) {
            setTimeout(function(){
              loop(i);
            }, 5);
          } else if (i < files.length) {
            PCALLS--;
            var f = files[i];
            var fullname = path.join(srcdir, f);
            var dest = path.join(destdir, f);
            fs.lstat(fullname, function(err, stat){
              if (err) throw new Error(err);
              if (stat.isSymbolicLink()) {
                fs.readlink(fullname, function(err, fullname){
                  if (err) throw new Error(err);
                  fs.symlink(fullname, dest, next);
                });
              } else if (stat.isDirectory()) {
                copyTree(fullname, dest, next);
              } else if (stat.isFile()) {
                fs.readFile(fullname, function(err, data){
                  if (err) throw new Error(err);
                  fs.writeFile(dest, data, next);
                });
              } else {
                next();
              }
            });
            loop(i + 1);
          }
        })(0);
      }
    });
  });
}

Awful, isnt' it? We can make it a little better by writing an abstraction similar to our parallel function — in the next section.