copyTree sequential

Here's another problem, of similar difficulty. We want to write a function that copies a directory tree. It takes two arguments: source directory (must exist, must be a directory) and destination path (must not exist, it'll be created).

We're still in “didactic mode” — this function will not be perfect. It won't care about file permissions or proper error handling. Also, it reads each source file in memory before writing it to destination (a correct approach would be to split reads in chunks, to support files bigger than the available RAM).

In NodeJS

First, here is the NodeJS version in its full horror:

function copyTreeSeq(srcdir, destdir, callback) {
  fs.mkdir(destdir, function(err){
    if (err) throw new Error(err);
    fs.readdir(srcdir, function(err, files){
      if (err) throw new Error(err);
      (function loop(i){
        function next(err) {
          if (err) throw new Error(err);
          loop(i + 1);
        }
        if (i < files.length) {
          var f = files[i];
          var fullname = path.join(srcdir, f);
          var dest = path.join(destdir, f);
          fs.lstat(fullname, function(err, stat){
            if (err) throw new Error(err);
            if (stat.isSymbolicLink()) {
              fs.readlink(fullname, function(err, target){
                if (err) throw new Error(err);
                fs.symlink(target, dest, next);
              });
            } else if (stat.isDirectory()) {
              copyTreeSeq(fullname, dest, next);
            } else if (stat.isFile()) {
              fs.readFile(fullname, function(err, data){
                if (err) throw new Error(err);
                fs.writeFile(dest, data, next);
              });
            } else {
              next();
            }
          });
        } else {
          callback();
        }
      })(0);
    });
  });
}

I named it copyTreeSeq because, even though it uses the asynchronous API, it operates in a sequential manner — that is, only one file is copied at every moment.

In λanguage

The version in λanguage is plain simple, still fully asynchronous and performs about as well as the NodeJS code:

copyTreeSeq = λ(srcdir, destdir) {
  makeDir(destdir);
  forEach(readDir(srcdir), λ(f){
    let (fullname = pathJoin(srcdir, f),
         dest = pathJoin(destdir, f),
         stat = lstat(fullname)) {
      if isSymlink(stat) {
        symlink(readlink(fullname), dest);
      } else if isDirectory(stat) {
        copyTreeSeq(fullname, dest);
      } else if isFile(stat) {
        writeFile(dest, readFile(fullname));
      }
    }
  });
};

I tested this by copying a directory containing 2676 nested subfolders, totalling 11012 files (180M), from/to a filesystem mounted in RAM. It takes 3.6s with λanguage and 3.5s with plain NodeJS, that is, Node is about 2% faster. Now look at the code. Is the 2% worth the ugliness?

But, this code is sequential — a single file is copied at a time. Since our filesystem operations are asynchronous, we can attempt to copy multiple files in parallel. I did not believe the difference would be so big, but it is: the parallel version will be about 3 times faster, both in plain NodeJS and in λanguage.