QUEEN, chess, and writing fast Lisp code

  • Published: 2016-08-29
  • Modified: 2016-12-11 19:08
  • By: Mishoo
  • Tags: lisp, chess, optimization
  • Comments: 3 (add)

QUEEN, chess, and writing fast Lisp code

Back in 2008 I wrote an online chess game. It took me about a month to get it working, and then I gradually fixed bugs and added new features for another 3 months or so. It was online for about three years and I used to play chess there with my dad or friends. Then at some point I had to change my server, and it was a pain to get the server-side dependencies working again so I just dismissed my little service.

This year I've decided to put it back online, so I digged my old hard drives for the sources. As I said, it took like one month to write it -- so productive I was! A quick look at the code explains it, though. The server-side is Perl and it's basically write-only (awful) code. While I probably could sync it with the dependencies and get it back working, the feeling that there are a thousand bugs inside won't let me sleep at night, so I started rewriting the server side in Common Lisp (the client is somewhat OK, I can maintain it).

When you write chess software, even if it's not supposed to play chess but only allow two humans to play chess together, it must still have some notion about the rules of the game, for example to detect checkmate, or to prevent someone from making illegal moves. You need to implement a proper move generator even for something “as simple as” parsing chess moves in algebraic notation -- to read a move like "Ne5", which means “kNight moves to e5”, you must know where on the board is the knight of the current side that can move to e5 (and it better be only one). There's similar pain in generating the SAN notation for a move.

For Perl I wrote Chess::Rep back in 2008, and I knew I'd need something like this in Lisp, so I wrote QUEEN. It's the module I'm writing about here. The full game is not ready yet, but QUEEN is useful in itself so I published it and submitted it for inclusion into Quicklisp.


QUEEN contains code for chess board representation (the 0x88 method), move generation, parser and generators for FEN, PGN and SAN, and a basic evaluation engine. The documentation is brief, but hopefully it's good enough to get one going.

Evaluation was not the primary goal of this library, but I couldn't keep myself from working on it, because it's so fun and challenging. The idea is simple: to find the best move in a chess position, start with the list of all legal moves, execute each one in turn, then execute all opponent's replies, and so on -- and follow the path that leads to your victory. Ideally we'd search until no moves are left, and if we could do that, we'd know how to win from move #1 (solving chess). But a program that attempts to do that will finish long after the death of the Universe. God would be impressed, I am sure, but nobody else will be around to care about the result, so to make a chess engine practically useful to us humans, we have to stop somewhere. Like most programs, QUEEN stops after it has reached a maximum depth (number of ply-s). After that it starts evaluating capturing moves only, which should run up quickly, and then it assigns a score to the position based only on how the board looks like: material advantage and piece positions. (QUEEN's static evaluation function is rather dumb). That score is propagated back to where the search started, and the move that leads to the best final score is returned.

Dumb as it is and with only about 300 lines for the evaluation code, QUEEN consistently checkmates me with a maximum depth of 5 (but I'm not a strong player).

Notes about performance

I'm so excited about evaluation not because it's good -- by today's standards it's quite weak. But before working on it, my move generation function was able to produce 120 million positions in about 10 minutes. With that kind of “performace”, it's unpractical to search to a depth bigger than 3 and the engine will play very poorly. I really wanted to make it play decently -- at least to beat me -- and this motivated me to optimize move generation, such that we can now generate 120M positions in less than 40 seconds. That's still far behind modern chess engines (which are usually written in hand-optimized C and using a faster algorithm for move generation), but it is a major boost, so I'm happy I did it.

Here are my notes about writing fast code, with a focus on Common Lisp (SBCL is what I use). Nothing that wasn't said before, I am sure.

  1. Do not optimize unless it's worth it.

    “Rules of optimization: Rule #1: Don't do it. Rule #2 (for experts only): Don't do it yet.” — Michael A. Jackson

    Most functions don't deserve to be optimized. For example the game-san function, which generates SAN for a move -- I call that every now and then; when generating a full game in PGN that'll be called perhaps 100 times. The difference between 1 and 2 milliseconds here is just not too interesting -- you will hardly notice it.

    On the other end there is game-compute-moves, which generates the list of legal moves, and which might get called a few hundred million times for evaluating a chess position in middle play. If you squeeze out a tenth of a millisecond from this function, the perceived improvement will be dramatic (and consequently, your chess engine will play faster and can analyze 1 or 2 levels deeper, which makes a fantastic difference in playing strength).

    Actually, 0.1 ms is huge. QUEEN generates over 3000 positions per millisecond, which means move generation takes on average 1/3000 ms. That's still slow compared to mainstream chess programs.

    So don't obsess over it. First make sure the program works correctly, that's much more important than speed. But when it works, you might need to make key parts of it run fast (for a decent chess engine this is a must).

  2. Get your data structures right.

    Fortunately, now I got it better than I did in Perl six years ago -- I guess I owe it to that experience.

    My Perl module creates a hash table for each move, containing from field, to field, also from and to as row/col, SAN notation, etc. That is convenient, but slow (not to mention needs tons of memory). Back then I attempted to write an evaluation engine in Perl but its level of play was so embarrassing that I've never told anyone. True, “fast” and “Perl” don't fit in the same sentence, but I also did this mistake in my implementation.

    QUEEN stores a move in a 32 bit integer. That fits into one machine register, yet it's enough to store the from/to fields (essential for every move), moved piece, captured piece, piece promoted to, one bit to tell us if it's an en-passant move, and one to tell us if it's a checking move. Given appropriate type declarations, SBCL compiles this to very fast low-level code, much better than a hash table would do.

  3. Write accessors.

    Unfortunately, it's often not easy to get your data structures right upfront. You still have to write your program, though, and you don't want to rewrite all of it when you later decide to change the data structures.

    A program consuming data should make no assumptions on how the data is stored. Don't be afraid to write accessors. For example, a move in QUEEN is just a 32-bit integer, as I said, and it happens to have the FROM field in bits 0..5. But it's not wise to rely on this in higher-level code, so we have a function called move-from which returns the index.

    Function calls can be costly, but they can be inlined. Write a function for everything and don't worry about the cost, until it becomes obvious that it must be optimized (and then inlining will usually solve the problem).

  4. Declare types.

    Seriously now. Some people think C is “faster than Lisp”, but while thinking so they conveniently forget that they need to declare all types in C, (de)allocate memory manually, and deal with (or more likely, ignore) the consequences of overflowing numbers.

    By default, Lisp does The Right Thing, rather than The Fast Blunder. If you don't declare types, a simple operation like adding two numbers will have to check at run-time what kind of numbers they are, and use the appropriate algorithm. In C if you want to calculate the factorial of ten thousand (the result has like 35K digits), you have to write code capable of multiplying big numbers. In Lisp you write it like this:

    (defun fact (n)
      (if (< n 2) 1 (* n (fact (- n 1)))))

    Lisp does the Right Thing here, I hope there's no arguing about it. Call (fact 10000) and you get back the 35K digits. I don't think there are many C programmers in the world who can do the same (without using a bignum library, that is).

    On the other hand, when you know the numbers will be small enough to fit into a machine register you should tell the compiler about it -- just like you do it in C.

  5. (declaim (inline trivial-functions...))

    Good candidates for inlining are small functions with no loops. For example, inlining all move and board accessors doubled the performance of my move generation routine.

  6. (declare (optimize speed))

    AFAIK, most Lisp setups won't optimize for performance by default.

    Identify key functions in your program, and ask SBCL to optimize them. You'll get a ton of warnings, mostly about using generic instead of fixnum arithmetic -- it's just SBCL doing the Right Thing™, not knowing how small your integers are. Add more type declarations until you clear all the warnings. Use stricter types where possible -- for example, in chess a row/col will fit in 3 bits, so the appropriate type is (unsigned-byte 3) (or, as I came to prefer, (integer 0 7)). I was tempted to think that (unsigned-byte 8) is just as good, since there's no “cell” smaller than a byte in the CPU anyway, but it's best to use the stricter type if possible because the compiler can then prove certain facts, such as that adding or multiplying two such numbers will still fit in a byte.

    You'll also occasionally catch bugs in your program. You'll be amazed of how smart SBCL can be. Pay particular attention to warnings like “deleting unreachable code” -- sometimes they flag a potential bug.

  7. No fancy argument list for critical functions.

    I used &key arguments in a few hot functions, and changing them to mandatory (or &optional) improved speed a lot. Imagine you call a function that has named foo and bar arguments like this:

    (setq args (list :foo 1 :bar 2))
    (apply some-function args)

    I'm no expert in compilers, but I think there is no way for a compiler to optimize that -- it has to produce code that digs the argument values from a list. That can't beat register arguments. So, for functions that are gonna get called like crazy, avoid &key and &rest (avoid &optional too if you can, although it doesn't seem as bad).

  8. defstruct, not defclass. defun, not defmethod.

    This is the last major boost I could get without changing algorithms. Initially game was a class, and functions operating on it were methods. I thought that somehow that'd be a good API to my module, allowing one to subclass and customize, but in retrospect it was just “premature abstraction”. Simply changing defclass to defstruct and defmethod to defun almost doubled performance. The compiler does much better in the assumption that there will not be “derived classes”.

3 comments. This is HOT!

Add your comment

# Kaushik
2016-08-29 23:02
Thank you for posting a well documented piece of CL code. I'm learning LISP and things like this help me a lot. I'm also a big fan of ASCII based documentation put in the code itself, like you have here in your readme. I was curious as to how you generated those nice bitfield diagrams - I first thought they were embedded images! I'd like to use things like that in my own docs. Thanks!
# CG
2016-08-30 03:14
Thank you for the QUEEN library and for the explanatory post. I might use the library soon. Funny that this quicklisp update came on a day when I was searching for alternatives. I had settled on python-chess and was starting to explore how I could use Python from LISP.
# Anonymous Coward
2016-08-31 02:17
If you're using FIXNUM to represent 32-bit integers and declaring this with (TYPE FIXNUM), then your code will require a 64-bit CPU. SBCL uses some of the bits as flags to distinguish between pointers and integers, and there's a bit that gets used by the GC, too. On 32-bit x86 processors, that means you only have 29 bits available in a FIXNUM. The type (unsigned-byte 32) has to be represented by a bignum.
Fork me on Github