copyTree, parallel version
This is the parallel version of the directory copy function. The difference is that instead of waiting for a file to finish copying before starting the next, it will process multiple files at once. Both versions presented here run about 3x faster than their sequential counterparts.
In NodeJS
It might seem more obvious how to achieve parallelism in NodeJS than in λanguage. With Node you pass the callback and know that execution of the caller continues immediately at the next line, which opens the possibility of using a plain loop. We must be careful not to invoke the callback until all the files have been processed, though. The parallel version in NodeJS looks about as ugly as the sequential one:
function copyTree(srcdir, destdir, callback) {
    fs.mkdir(destdir, function(err){
        if (err) throw new Error(err);
        fs.readdir(srcdir, function(err, files){
            if (err) throw new Error(err);
            var count = files.length;
            function next(err) {
                if (err) throw new Error(err);
                if (--count == 0) callback();
            }
            if (count == 0) {
                callback();
            } else {
                files.forEach(function(f){
                    var fullname = path.join(srcdir, f);
                    var dest = path.join(destdir, f);
                    fs.lstat(fullname, function(err, stat){
                        if (err) throw new Error(err);
                        if (stat.isSymbolicLink()) {
                            fs.readlink(fullname, function(err, target){
                                if (err) throw new Error(err);
                                fs.symlink(target, dest, next);
                            });
                        } else if (stat.isDirectory()) {
                            copyTree(fullname, dest, next);
                        } else if (stat.isFile()) {
                            fs.readFile(fullname, function(err, data){
                                if (err) throw new Error(err);
                                fs.writeFile(dest, data, next);
                            });
                        } else {
                            next();
                        }
                    });
                });
            }
        });
    });
}
By using that forEach loop, we start copying all files in a directory in parallel. The callback (next) needs to check whether all files have been copied before invoking the original callback. Additionally, we must handle the case of a directory with zero files, in which case we invoke callback directly.
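The counting trick can be sketched on its own. Here is a minimal, hypothetical helper (my own sketch, not part of the article's code) that fires a final callback after N completions have been reported, handling the zero case up front:

```javascript
// Hypothetical sketch of the counting pattern used by copyTree:
// returns a next() function; done() fires after `count` calls to next(),
// or immediately when count is zero (the empty-directory case).
function makeLatch(count, done) {
  if (count === 0) { done(); return function () {}; }
  return function next() {
    if (--count === 0) done();
  };
}

// usage: three parallel tasks sharing one latch
var finished = false;
var next = makeLatch(3, function () { finished = true; });
next(); next();          // two tasks reported, latch still open
next();                  // third task reported, callback fires
```

This is exactly what copyTree's next does, minus the error check.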
In λanguage
To write the parallel version in λanguage we need a primitive function. That's because our wrappers around the asynchronous Node API will only invoke their callback after the work is done, so by default our program can only run sequentially.
I wrote a parallel primitive which takes a function and calls it with a single argument that I'm going to name pcall. pcall receives a function argument and calls it "asynchronously", that is, it resumes the caller program immediately while that function continues running. Finally, the parallel() call returns only after all pcall-s in its body have finished. If that sounds complicated, seeing the usage might be enlightening:
copyTree = λ(srcdir, destdir) {
  makeDir(destdir);
  parallel(λ(pcall){
    forEach(readDir(srcdir), λ(f){
      pcall(λ(){
        let (fullname = pathJoin(srcdir, f),
             dest = pathJoin(destdir, f),
             stat = lstat(fullname)) {
          if isSymlink(stat) {
            symlink(readlink(fullname), dest);
          } else if isDirectory(stat) {
            copyTree(fullname, dest);
          } else if isFile(stat) {
            writeFile(dest, readFile(fullname));
          }
        }
      });
    });
  });
};
So the change we had to make is trivial: wrap the forEach loop in a call to parallel, and use pcall to process each file. We need not count how many files there are, nor invoke a callback in random places; everything is handled by the primitive. Our code still looks sequential, but works in parallel!
A primitive that combines parallel and forEach would allow for even better-looking code, and would not be hard to write; but parallel seems more general.
One might argue that our λanguage code is beautiful because of our parallel abstraction, and that had we done something similar in NodeJS we'd end up with readable JavaScript code. This is somewhat true, as we'll see, but even with a helper function the NodeJS code won't get as clear and concise as the above.
The primitive: parallel
I'll discuss the implementation of parallel a bit. Here's the code:
parallel = js:raw "function(k, f){
  var n = 0, result = [], i = 0;
  f(function(){ if (i == 0) k(false) }, pcall);
  function pcall(kpcall, chunk){
    var index = i++;
    n++;
    chunk(function(rchunk){
      n--;
      result[index] = rchunk;
      if (n == 0) Execute(k, [ result ]);
    });
    kpcall(false);
  }
}";
k is the continuation of the parallel call itself, and f is the function to invoke with the pcall argument. f is passed a continuation which does if (i == 0) k(false); this is necessary in case no pcall was invoked in the body of f, in which case we just call the original continuation with a false result.
pcall is a well-behaved function: it takes a continuation (kpcall) and a function to run (chunk). It calls the chunk, but immediately invokes its own continuation, kpcall, with false (we don't have a meaningful result at this point). So, if chunk is asynchronous, it will run in parallel with the rest of the program (represented by kpcall).
pcall does some housekeeping to keep track of how many chunks are still running. When the last chunk finishes (n drops to 0) it executes the original continuation of parallel, passing it an array with the result values (this is not needed for our copyTree case, but it might be useful in general).
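To see that bookkeeping outside of continuation-passing style, here is a hypothetical plain-callback analogue (my own sketch, with made-up names, not the article's code): it starts all workers, stores each result at its original index, and calls done once the outstanding count drops to zero, so results come back in order even if workers finish out of order:

```javascript
// Hypothetical sketch: the same count-down-and-collect logic as the
// `parallel` primitive, written as an ordinary callback-style helper.
function parallelMap(items, worker, done) {
  var n = items.length, results = [];
  if (n === 0) return done(results);       // the "no pcall was invoked" case
  items.forEach(function (item, index) {
    worker(item, function (result) {
      results[index] = result;             // keep the original order
      if (--n === 0) done(results);        // last one to finish reports back
    });
  });
}

// usage: workers may call back in any order; results stay ordered
var out;
parallelMap([1, 2, 3], function (x, cb) { cb(x * 10); }, function (r) {
  out = r;
});
```

The index captured at start time is what keeps the result array ordered, just as `var index = i++` does in the primitive.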
Want to play with it? It's defined in this page. The dostuff function below is "asynchronous": it just prints something after a timeout, and returns the text. Check the difference between using it with and without parallel below.
Notice that the time for the non-parallel calls amounts to the total time spent in all tasks, that is, 1700ms, while the time for the parallel one is roughly the time spent in the longest task (1000ms). The results are collected in an array and returned after all pcall-s have finished, in the proper order.
Problems of copyTree
If you try to run it on a directory containing thousands of files, you might be surprised by an "EMFILE" exception. That occurs because, by processing many files at once, we hit the filesystem's limit on open files. Fortunately, this is very easy to fix in λanguage by just changing the primitive. Here's the new version:
PCALLS = 1000;
parallel = js:raw "function(k, f){
  var n = 0, result = [], i = 0;
  f(function(){ if (i == 0) k(false) }, pcall);
  function pcall(kpcall, chunk){
    n++;
    if (PCALLS <= 0) {
      setTimeout(function(){
        n--;
        Execute(pcall, [ kpcall, chunk ]);
      }, 5);
    } else {
      PCALLS--;
      var index = i++;
      chunk(function(rchunk){
        PCALLS++;
        n--;
        result[index] = rchunk;
        if (n == 0) Execute(k, [ result ]);
      });
      kpcall(false);
    }
  }
}";
A global PCALLS variable holds the maximum number of parallel calls allowed at once. When that maximum is reached, we schedule a retry after 5 milliseconds. I found 1000 to be a good value for PCALLS (if you make it too low it will easily starve and run much slower than the synchronous version).
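The throttling idea can also be sketched as a standalone helper (a hypothetical sketch with my own names, not the article's code): a counting semaphore that runs a task when a slot is free and queues it otherwise, instead of polling with setTimeout as the primitive above does:

```javascript
// Hypothetical sketch: limit how many tasks run at once. Each task
// receives a release() callback it must invoke when it is finished,
// which frees the slot and starts the next queued task, if any.
function makeLimiter(maxConcurrent) {
  var slots = maxConcurrent, queue = [];
  function release() {
    slots++;
    if (queue.length > 0) run(queue.shift());
  }
  function run(task) {
    if (slots > 0) { slots--; task(release); }
    else queue.push(task);         // no slot free: defer the task
  }
  return run;
}
```

A queue avoids the busy-wait of the 5ms retry, at the cost of holding the deferred chunks in memory; the setTimeout approach in the primitive keeps the λanguage runtime simpler.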
That's it: our copyTree in λanguage remains intact; we only had to make the primitive throttle the pcalls when more than 1000 are running. Our program remains beautiful and sequential-looking, yet it's fast, runs in parallel, and can handle directories of any size.
The fix for NodeJS version
I'd say the following code is the ugliest so far:
var PCALLS = 1000;
function copyTree(srcdir, destdir, callback) {
    fs.mkdir(destdir, function(err){
        if (err) throw new Error(err);
        fs.readdir(srcdir, function(err, files){
            if (err) throw new Error(err);
            var count = files.length;
            function next(err) {
                PCALLS++;
                if (err) throw new Error(err);
                if (--count == 0) callback();
            }
            if (count == 0) {
                callback();
            } else {
                (function loop(i){
                    if (PCALLS <= 0) {
                        setTimeout(function(){
                            loop(i);
                        }, 5);
                    } else if (i < files.length) {
                        PCALLS--;
                        var f = files[i];
                        var fullname = path.join(srcdir, f);
                        var dest = path.join(destdir, f);
                        fs.lstat(fullname, function(err, stat){
                            if (err) throw new Error(err);
                            if (stat.isSymbolicLink()) {
                                fs.readlink(fullname, function(err, target){
                                    if (err) throw new Error(err);
                                    fs.symlink(target, dest, next);
                                });
                            } else if (stat.isDirectory()) {
                                copyTree(fullname, dest, next);
                            } else if (stat.isFile()) {
                                fs.readFile(fullname, function(err, data){
                                    if (err) throw new Error(err);
                                    fs.writeFile(dest, data, next);
                                });
                            } else {
                                next();
                            }
                        });
                        loop(i + 1);
                    }
                })(0);
            }
        });
    });
}
Awful, isn't it? We can make it a little better by writing an abstraction similar to our parallel function, which we'll do in the next section.