Multiple inheritance and multiple dispatch in JavaScript

  • Published: 2014-10-04
  • Modified: 2014-10-04 14:30
  • By: Mishoo
  • Tags: javascript, multimethods, multiple dispatch
  • Comments: 0 (add)

Attached files:

Oct
4
2014

Multiple inheritance and multiple dispatch in JavaScript

Multiple inheritance and multiple dispatch are core features in CLOS. The standard JavaScript object system supports neither, but they can be implemented. I thought I'd write this article, along with working code, for the JavaScript fellows who don't know the concept. Maybe someone smarter than me will come up with an efficient implementation.

Single dispatch systems (like JS)

In most object-oriented languages, a class can have methods. Calling a method like foo.bar() involves a "runtime dispatch", meaning that the compiler cannot generally tell what method is needed by statically analyzing the code; this detail can only be determined when the code runs. Example:

function Foo() {}
Foo.prototype.speak = function(){ alert("Hi, Foo here") };

function Moo() {}
Moo.prototype.speak = function(){ alert("Moo, man!") };

function make_object() {
    return Math.random() < 0.5 ? new Foo() : new Moo();
}

var obj = make_object();
obj.speak(); // which method do we call here?

What does this code output? We don't know, it's random. So at run time, the interpreter (or virtual machine, or the compiled program—the difference doesn't matter here) will have to determine the type of obj, usually available as obj.constructor, and locate a speak method in its prototype.

This is called “single dispatch”—the type of a single variable (obj) is sufficient to decide which method to call.

Multimethods (multiple dispatch)

These dispatch based on the types of more than one argument. Thus, a method no longer belongs to a single class. Instead, it belongs to a “generic function” — an object maintaining a list of “specialized” methods. When the generic is invoked, it determines what are the matching ("applicable") methods for the given arguments, and invokes the best match.

Let's see an example. We want to write an add function which can add numbers or arrays. Adding a number to an array will return a new array with each element of the original array added to that number. Adding two arrays will return a new array with the added elements.

// create the generic function
var add = MM.defgeneric("add");

// define the implementation for adding numbers, e.g. add(3, 4)
add.defmethod([ MM.JS_Number, MM.JS_Number ], function(a, b){
    return a + b;
});

// implementation for add([ 1, 2, 3 ], 3)
// we specialize the second arg on MM.Object
add.defmethod([ MM.JS_Array, MM.Object ], function(a, b){
    return a.map(function(el){ return add(el, b) });
});

// implementation for add(3, [ 1, 2, 3 ])
// specialize the first arg on MM.Object
add.defmethod([ MM.Object, MM.JS_Array ], function(b, a){
    return a.map(function(el){ return add(b, el) });
});

// implementation for add([ 1, 2, 3 ], [ 1, 2, 3 ])
add.defmethod([ MM.JS_Array, MM.JS_Array ], function(a, b){
    return a.map(function(el, i){ return add(el, b[i]) });
});

console.log(add(3, 4));                        // 7
console.log(add(3, [ 1, 2, 3 ]));              // [ 4, 5, 6 ]
console.log(add(10, [ 2, 3, [ 4, 5, 6 ] ]));   // [ 12, 13, [ 14, 15, 16 ] ]
console.log(add([ 2, 3, [ 4, 5, 6 ] ], 10));   // [ 12, 13, [ 14, 15, 16 ] ]
console.log(add([ 1, 2, 3 ], [ 4, 5, 6 ]));    // [ 5, 7, 9 ]

console.log(add(1, "foo"));                    // FAILS
console.log(add([ 1, 2, 3 ], [ 1, 2 ]));       // FAILS

add looks like a plain JS function. However, add.defmethod allows us to define specialized implementations. defmethod takes two arguments—an array of types, and a function to execute when the generic is called with matching arguments.

So add responds differently, depending on the argument types. Of course, we could do that with a bunch of if/else-s, but the ability to have separate implementations, while allowing others to customize behavior for types we didn't consider, is the actual point of object-oriented programming.

Finally, the last two lines will fail with “No applicable methods in add”. There is no implementation to add a number and a string, and on the last line the arrays have different lengths so eventually add would be called with an undefined argument. In a way, we have a limited form of run-time type checking: rather than returning NaN-sense, it throws an error, which is arguably a better behavior.

Notice the second and third specializations there, they list a MM.Object argument, instead of JS_Number; MM.Object is the base class for all objects. This opens the possibility for someone else to extend our method, say, if needed for strings:

add.defmethod([ MM.JS_String, MM.JS_String ], function(a, b){
    return a + b;
});

// and now it'll work for strings too:
console.log(add([ "f", "b", "m" ], "oo"));     // [ 'foo', 'boo', 'moo' ]

We didn't have to define explicit methods for adding strings and arrays this time. They're handled thanks to inheritance by the specializations on MM.Object above. And the following still fails, because there is no implementation for adding numbers and strings:

console.log(add(1, "foo"));
console.log(add("foo", 1));

Classes and inheritance

This clearly departs from the standard JavaScript object system. Our objects are still JS objects, but prototypal inheritance is of no use, since we no longer store methods in an object's prototype. Here is an example where we define a few classes:

// a base class
var Person = MM.defclass("Person");

// Man and Woman inherit from Person (second arg to defclass)
var Man = MM.defclass("Man", [ Person ]);
var Woman = MM.defclass("Woman", [ Person ]);

//---- constructor
//
// MM.initialize is a generic function called by the system when an object is created.
// Its responsibility is to initialize the new object instance — like a constructor.
// Here we specialize it for the base Person class.

MM.initialize.defmethod([ Person ], function(person, args){
    person.name = args.name;
});

// notice that our constructor takes two arguments, but we specialize only on the first.
// Whatever other arguments are received, if the first one is a Person then our method
// will be called with all arguments.
//
//----

// now create two objects with MM.new
var jane = MM.new(Woman, { name: "Jane" });
var john = MM.new(Man, { name: "John" });

// or, a shortcut syntax is available:
var alice = Woman.new({ name: "Alice" });

// create a generic function
var say_hi = MM.defgeneric("say_hi");

// ... and define its implementation for Man
say_hi.defmethod([ Man ], function(man){
    console.log("Yo, " + man.name + " my man, what's up?");
});

// ... and for Woman.
say_hi.defmethod([ Woman ], function(woman){
    console.log("Good morning miss " + woman.name + "!");
});

say_hi(john);
say_hi(jane);

It should be obvious what the output is going to be.

One nice thing is that we need not call the base class constructor from the sub-classes. We didn't even define constructors for the sub-classes; since all persons have a name, that can be handled in the base class and we did that by specializing MM.initialize on Person. Although we created Man and Woman objects, the inheritance mechanism figured out that it's appropriate to use the Person constructor.

Oh, do you want to define a method for our previous add function so that you can add two Person-s? That's totally doable:

add.defmethod([ Person, Person ], function(a, b){
    return a.name + " married " + b.name;
});

console.log(add(john, [ jane, alice ]));
// [ 'John married Jane', 'John married Alice' ]

"Next" methods

So if we wanted to do custom things when a Man or Woman objects are created, we can further specialize the constructor MM.initialize:

MM.initialize.defmethod([ Man ], function(){
    this(); // call the next available method
    console.log("A boy is born");
});

MM.initialize.defmethod([ Woman ], function(){
    this();
    console.log("A girl is born");
});

var john = Man.new({ name: "John" });
var jane = Woman.new({ name: "Jane" });

This is just logging the fact that a Man or a Woman is created. Notice a call to this() in both. It will invoke the next available method, which in this case is the implementation of MM.initialize for Person, to set person.name. Conveniently, I don't have to pass any arguments—by default it'll send the original arguments.

The nice thing is that I don't have to know or care what's the base class. There could be more than one base class, by the way. The system knows what is the next most specific method, and to call it I only write this()1. Farewell, BaseClass.prototype.method.apply(this, arguments).

Multiple inheritance

Another feature of the system is that a class can inherit from multiple base classes, here is a dumb example:

var ColoredObject = MM.defclass("ColoredObject");
MM.initialize.defmethod([ ColoredObject ], function(obj, props){
    this();
    obj.color = props.color;
});

var Vehicle = MM.defclass("Vehicle");
MM.initialize.defmethod([ Vehicle ], function(vehicle, props){
    this();
    vehicle.speed = props.speed;
    vehicle.power = props.power;
});

var Car = MM.defclass("Car", [ ColoredObject, Vehicle ]);
MM.initialize.defmethod([ Car ], function(car, props){
    this();
    car.brand = props.brand;
});

var car = Car.new({
    brand: "BMW",
    color: "black",
    power: "140 HP",
    speed: "200 km/h"
});

console.log(car);
console.log(MM.is_a(car, Car),
            MM.is_a(car, Vehicle),
            MM.is_a(car, ColoredObject));

Even though in the Car constructor we only set the brand, the object will get all properties because each constructor invokes the next one by calling this(). Also, the last output will be true true true, since the object is really an instance of all those classes.

Method combination

Let's say we have a Server object for which a handle_request generic function is implemented. And it's in a library which we don't wish to change, but we'd like to add some logging. Here's one way:

// define our own class to customize the behavior
var MyServer = MM.defclass("MyServer", [ Server ]);

var server = MyServer.new(...); // create the object

handle_request.defmethod([ MyServer, Request ], function(server, request){
    // do the logging
    console.log(request.url);

    // and call the next method
    return this();
});

A shortcut for that is to define a "before" method:

handle_request.defmethod("before", [ MyServer, Request ], function(server, request){
    console.log(request.url);
});

The "before" combinator tells the system that we'd like this method to be called before any other available methods. We don't need to worry about calling this() or returning a value, it will happen automatically.

With this trick we could do the same without defining a subclass:

var server = Server.new(...); // instantiate the original class

handle_request.defmethod("before", [ Server, Request ], function(server, request){
    console.log(request.url);
});

Without subclassing, using "before" is required; otherwise we'd simply override the original method, leaving no way to call it later. The equivalent in JavaScript would be to set Server.prototype.handle_request without saving the original value.

Similar to "before" there is an "after" combinator which arranges for the new method to be executed after the next most specific one. Back to our car example, we could have specialized the constructors with "after", and no need to call this():

MM.initialize.defmethod("after", [ ColoredObject ], function(obj, props){
    obj.color = props.color;
});

MM.initialize.defmethod("after", [ Vehicle ], function(vehicle, props){
    vehicle.speed = props.speed;
    vehicle.power = props.power;
});

MM.initialize.defmethod("after", [ Car ], function(car, props){
    // car.color, car.speed and car.power are already initialized here.
    car.brand = props.brand;
});

This is the recommended way to specialize MM.initialize, because you want any base classes fully initialized first.

The "around" combinator

Supposing we'd like to ensure that a request is authenticated before handling it, we could do the following:

// define a new class to handle authentication
var Auth = MM.defclass("Auth");

// this generic function will take an Auth object and a Request
// and will return true if the request is authorized.
var authenticate = MM.defgeneric("authenticate");

authenticate.defmethod([ Auth, Request ], function(auth, request){
    // handle authentication and return true or false
});

// the following implementation for handle_request will ensure that
// a request is authentic before passing it over to the next method.
handle_request.defmethod("around", [ Auth, Request ], function(auth, request){
    if (authenticate(auth, request)) {
        return this(); // call next method
    } else {
        request.error("403 Forbidden"); // return some error page
    }
});

// now inherit MyServer from *both* Server and Auth
var MyServer = MM.defclass("MyServer", [ Server, Auth ]);

Notice the endless possibilities of customization here. The Auth and Server objects are completely unrelated; no one inherits from the other. Yet we can still define an implementation of handle_request for Auth objects. Our implementation can just validate the request, and if it's authentic it calls the next method with this(). Since MyServer inherits from both, the next method will be the one defined on the other base class (Server).

The point is that the Auth library might be in a separate module, independent from Server, and by inheriting from both we have a server + authentication with no additional work2.

It might seem that "around" is like the default ("primary") combination (where you specify no combinator) but it's subtly different. In the example above, if "around" was not specified then the Auth's implementation for handle_request would be ignored and the default one called instead. The reason is that Server appears first in the list of MyServer's super classes—so its implementation would be more specific.

To fix this we could declare MyServer inherits from [ Auth, Server ], thus making the authentication handler more specific, but "around" is really the best option because the system makes sure that it takes priority, regardless of the class precedence order. An "around" method wraps any "before", primary and "after" methods which may be defined. The ordering is:

  • Execute any "around" methods, in most-specific-first order.
  • Execute any "before" methods, in most-specific-first order.
  • Execute any primary (standard) methods, in most-specific-first order.
  • Execute any "after" methods, in least-specific-first order (so these are reversed).

The last three happen only if all "around" methods invoked this().

Finally, let's say that the server library has three request classes: base class (Request) and two sub-classes, PlainRequest and EncryptedRequest, and it instantiates the right one before passing it to handle_request. Then if we wanted to allow requests only over SSL, thanks to multiple dispatch we could just add these lines:

handle_request.defmethod("around", [ MyServer, PlainRequest ], function(server, request){
    redirect_to_encrypted_url(request);
    // by not calling this() we end the processing here.
});

So instead of adding if-s here and there, if the class hierarchy is properly granularized we can (re)define the behavior of the application by adding appropriate methods to the generic functions.


Another scenario for "around" is initialization/cleanup for resources that might be needed over the course of a method. For example Auth could provide a database connection and an user object to any next methods:

handle_request.defmethod("around", [ Auth, Request ], function(auth, request){
    var db = database_connect(db_credentials);
    var user = get_authenticated_user(db, request);
    if (user) {
        // set these properties for next methods
        request.db = db;
        request.user = user;

        // call next method
        var result = this();

        // cleanup
        database_disconnect(db);
        delete request.db;
        delete request.user;
        return result;
    } else {
        database_disconnect(db);
        request.error("403 Forbidden"); // return some error page
    }
});

handle_request.defmethod([ MyServer, Request ], function(server, request){
    // request.db and request.user will be magically available here.
});

What about asynchronous programming?

All this should integrate pretty well with continuation-passing style. The usual “niceties” apply: return and exceptions are unusable, and you have to nest callbacks. The example above would look like this (barring error handling):

handle_request.defmethod("around", [ Auth, Request ], function(auth, request, callback){
    var next = this;
    database_connect(db_credentials, function(db){
        get_authenticated_user(db, request, function(user){
            if (user) {
                request.db = db;
                request.user = user;
                next(auth, request, function(result){
                    database_disconnect(db);
                    delete request.db;
                    delete request.user;
                    callback(result);
                });
            } else {
                database_disconnect(db);
                request.error("403 Forbidden");
                callback();
            }
        });
    });
});

Implementation notes

The implementation is a stripped-down version of TinyCLOS. (it has no MOP; but if you know the term I'm surprised you've read this far, since this article was nothing new to you)

Code: mm.js.

This implementation is slow. Too many things happen when a generic function is invoked. The system looks at the classes of each argument, and their super classes, to search for applicable methods. A list of methods is built, then sorted by how specific those methods are in relation to the given arguments, and also based on their combinators. In short, this will never beat a plain JS function.

A simple improvement would be to cache the resulted “effective method” (which is just a plain function that calls the applicable methods for the given arguments). But a much better idea that should be tried is to abuse the JavaScript JIT. This could be done by assigning an unique tag to each class, and building a new Function() that represents the entire generic function each time defmethod is called. For instance:

var add = MM.defgeneric("add");

add.defmethod([ MM.JS_Number, MM.JS_Number ], function(a, b){
    return a + b;
});

After running the above code, the VM would contain something like the following two functions:

// i.e. 1 is the type tag for JS_Number
function $impl_add_1_1(a, b) {
    return a + b;
}

function add(a, b) {
    // I'm wondering if judicious tagging of types could help implementing is_a in O(1).
    // The current implementation of is_a does an array.indexOf lookup.
    if (MM.is_a(a, JS_Number) && MM.is_a(b, JS_Number)) {
        return $impl_add_1_1(a, b);
    }
    throw new Error("No applicable methods in add");
}

I'm pretty sure this would yield much better results, but I didn't try it yet. It's not exactly trivial to get it right because in general there might be multiple applicable methods (i.e. we need to send the right argument for this() to work).

API

With NodeJS:

var MM = require("./mm.js");

In browser, just load the file and it'll create a MM global.

To create a generic function, use MM.defgeneric, as in examples above. It takes an optional argument—a string name of the generic function. That's useful during development.

var add = MM.defgeneric("add");

add is now a reference to the generic's main dispatch function. This is just for convenience; the generic is represented internally by an object accessible as add.O$_generic. But you really don't need to care about that.

To add a new method to a generic, you can do one of the following (equivalent):

MM.defmethod(add, [ MM.JS_Number, MM.JS_Number ], function(a, b){
    return a + b;
});

// syntactic sugar:
add.defmethod([ MM.JS_Number, MM.JS_Number ], function(a, b){
    return a + b;
});

defmethod takes an optional method combinator. You can place it before or after the types array; the following will all work the same:

MM.defmethod(add, "before", [ MM.JS_Number, MM.JS_Number ], function(){ ... });
MM.defmethod(add, [ MM.JS_Number, MM.JS_Number ], "before", function(){ ... });
add.defmethod("before", [ MM.JS_Number, MM.JS_Number ], function(){ ... });
add.defmethod([ MM.JS_Number, MM.JS_Number ], "before", function(){ ... });

It might be interesting to note that MM.defmethod is itself a generic function, so the way it supports the above argument ordering is simply via specialization (yes, there are a couple defmethod.defmethod(...) calls in my code).

In a method function you can call this() to invoke the next more specific method, unless the combinator is "before" or "after". this will be undefined in such case; before/after methods need not worry about calling this() and they cannot alter the returned value.

To define a class, use MM.defclass. It takes zero, one or two arguments. The first argument is the class name as a string (useful for debugging) and the second is an array of direct super-classes.

var Server     = MM.defclass("Server");
var HTTP       = MM.defclass("HTTP");
var HTTPServer = MM.defclass("HTTPServer", [ Server, HTTP ]);

To customize object initialization (constructor) you need to specialize MM.initialize for that class:

MM.initialize.defmethod([ HTTPServer ], function(server, ...){
    // init your server here.
});

And to instantiate an object, one of the following:

var http_server = HTTPServer.new(...);
var http_server = MM.new(HTTPServer, ...);

Classes defined this way can be used with defmethod for specializing a generic function. A class which does not specify a list of direct super-classes will inherit by default from MM.Object. Internally, there is this hierarchy (exported under the MM namespace): Top ← Object ← Class (a class is really just another object).

instanceof-like functionality is provided by MM.is_a:

MM.is_a(http_server, HTTPServer);   // true
MM.is_a(http_server, Server);       // true
MM.is_a(http_server, HTTP);         // true
MM.is_a(http_server, MM.Object);    // true
MM.is_a(http_server, MM.Class);     // false

To get a reference to the class of an existing object instance, you can use MM.class_of:

var cls = MM.class_of(http_server); // cls === HTTPServer

Objects created by our system do not inherit prototypically from the standard JavaScript Object. Therefore do not expect them to have properties like toString or hasOwnProperty, or JS's instanceof to work. They are created with an almost empty prototype—the only property a MM Object has is O$_class which points to the class of an object. That's read-only—you cannot alter the class of an object at run time. Hopefully that name is ugly enough so you won't need to use it for your own properties.

As seen in some samples, a set of classes is provided so that we can specialize generic functions by basic JavaScript types: JS_Value, JS_Null, JS_Object, JS_Number, JS_Array, JS_String, JS_Boolean, JS_RegExp, JS_Date and JS_Function.

Except JS_Null, all the others inherit from JS_Value (which in turn inherits from MM.Object). JS_Null is special, because in JS (typeof null == "object") === true and I really don't like this, so it inherits from nothing. JS_Null matches null or undefined, but nothing else.

Sample

printer.js is a simple implementation of a JSON serializer. It demonstrates how you can use multiple dispatch to implement printing to a custom OutputStream object, and how you can specialize a couple of methods in order to get indented output via a IndentedOutputStream object.

Footnotes
1. In retrospect, I should have named it this.next() for better readability. In Common Lisp, calling the next method is spelled (call-next-method)
2. “Mixins” is a hack for emulating multiple inheritance in JavaScript, but it's prone to method name collision, you don't really get proper function of instanceof and there's still no good way of doing the above, because you get no automatic method chaining the way we do here with this().
No comments yet. Wanna add one? Please?

Add your comment