The Optimal Queens
Timothy Rolfe
Dr. Dobb’s Journal, Vol. 30, No. 5 (May 2005), pp. 3237 — click here for Microsoft Word version
We’re going to look at one of the classic problems in Mathematics and Computer Science and try to see how much we can optimize the solution of that problem.
This longstanding problem goes way back even before Carl Gauss (17771855), and is based on the chessboard. It is the problem of finding all of the ways to position eight queens on the chessboard so that none of them are under attack by any other. Remember, the queen can move horizontally, vertically, and in the two diagonal directions — for convenience I’ll call the direction down and to the right (and its reverse) the diagonal direction, and then call the direction up and to the right the antidiagonal direction.
You could approach this problem by looking at all possible ways of placing eight queens in the 64 available cells — and there are 64!/56! ways to do that (the permutations of 64 things taken eight at a time), but you don’t have to look at all 1.78E+14 permutations. (If you could figure out a way of just generating the combinations of 64 things taken eight at a time, the number goes down to 4.43E+09.) You can use some natural intelligence, since you know that the queens can attack horizontally. That means that you can only have one queen to a row. Since there are eight positions on each of the eight rows, the total candidate configurations is reduced to 8^{8} — 1.68E+07, a nasty number but not nearly as nasty as our earlier one. You can, however, toss out massive numbers of these.
As a programming exercise, this is the classical problem solved by backtracking. You start by positioning a queen in the top row of the board. As she sits in her cell, you move to the next row and try positioning a queen there. If you find that a queen above is attacking the queen below, you don’t have to proceed any farther: all board configurations that include this start will be illegal. So you simply position that queen on the next available cell. You back up to the earlier row when you have finished positioning a queen on all available cells of the current row and try the next cell in that row for its queen.
Eight queens is a bit challenging to start with, so you can rephrase this as the N Queens problem: positioning N queens on a grid made up of N rows of N squares to the row. We know that there is no solution for the twoqueens problem: necessarily each queen is attacking the other either vertically or diagonally. If you play around with paper and pencil, you’ll see that there is no possible solution for the threequeens problem either — there is necessarily a diagonal attack. So the fourqueens problem is the first one that has a solution. For pseudo code purposes I’m going to use the C and Java convention of numbering rows from zero on up. This means that when you hit N, you’ve already positioned N queens.
To position a queen in row j
Listing 1 gives the implementation of this algorithm in C.
The big question is how to perform the check for attack from above, and this is going to be the source for one of the optimizations. Ellis Horowitz and Sartaj Sahni, in their Fundamentals of Computer Algorithms (Computer Science Press, 1978), gave a very simple test, based on the assumption that the board above the current position is valid. They chose to represent the board as a onedimensional array: each element represents a row on the board, and the number in that element represents the column position that the queen is occupying. That means that you can easily check for vertical attack: an earlier row has the same column position as your current row. They also noticed that you can easily check for diagonal attack, as shown in the following pseudocode.
To check for a valid board[] filled from row 0 to row j
Listing 2 gives the implementation of this algorithm in C.
You can easily see that the time required for the check increases with the size of the board. In computer jargon, it is an "orderN" algorithm.
Niklaus Wirth, in his Algorithms and Data Structures (1986), showed that you can perform the check in constant time, provided that you use some additional arrays to hold information. (His first proposal of this technique, however, was in the April 1971 issue of the Communications of the ACM.) For instance, you can have an array indicating which columns have already been filled. To check whether you can position a queen in a particular column, you just look at that cell of the array to see if it is in use. Similarly you can have additional arrays to hold information about the diagonals and antidiagonals. You can see that along the diagonals the difference of the row and column subscripts is a constant, while along the antidiagonals the sum of the row and column subscripts is a constant. Listing 3 gives an implementation of Wirth’s algorithm in C.
Wirth’s algorithm dramatically speeds up the processing, requiring only about half the time as compared with the code using Horowitz and Sahni’s orderN validity check.
There is, however, another optimization possible. You’re positioning only one queen on a row because of the horizontal attack. You know exactly the same thing about the columns — for each column there can be only one queen. This means that all successful solutions are just going to be permutations of the column subscripts: each successive row has one fewer candidate position than the previous row. Thus the problem space (before backtracking) has come down from 8^{8} to 8! — from 1.68E+07 down to 40320. All you have to do is to generate candidate permutations. As a bonus, the validity check becomes a little easier, since you only need to check for diagonal and antidiagonal attacks.
[Gilles Brassard and Paul Bratley, in their Fundamentals of Algorithmics (p. 309), take note of using this method, but they approach it as examining all 8! possible permutations without combining the method with backtracking during permutation generation as we will.]
You first fill the entire array with all of the column positions — a massively illegal configuration with all the queens lined up along the diagonal — but then you march the available column positions through each row cell. You can do this with swaps: after you have evaluated the initial partial permutation, you just swap the value in the front cell with the values in the remaining cells and then evaluating the resulting partial permutation. In the end you have all the values in the same order except that the value from the last cell is now at the front. So you can regenerate the original configuration by doing a circular leftward rotation in the array. This was discussed in an earlier article in Dr. Dobb’s Journal — "Backtracking Algorithms" (May 2004).
To position a queen in row j
Listing 4 gives the implementation of this algorithm in C.
This optimization even more dramatically speeds up the processing, requiring only a fifth the time as compared with the code using the rowfilling implementation. The comparison code uses Horowitz and Sahni’s orderN validity check so that speedup comes only from the permutation vector optimization.
Combining both optimizations, of course, really speeds up the processing.
The final optimizations are to remove the benchmarking superstructure, and then to use inline code for the most frequently performed operations: marking the boolean arrays for diagonal attack, and performing the validity checks based on those boolean arrays. This roughly doubles the speed.
Figure 1 shows the time required in a benchmarking run on a Dell desktop computer with an Intel^{®} Pentium^{®} 4 2.00 GHz processor for boards of sizes from 12 up to 18. The Excel workbook and code are available electronically from the Resource Center (see page x). While the figure shows the C language results, the ZIP file also includes the Java implementations of the benchmarking code and of the final optimization code.
Note that the figure is using logarithmic scaling for its Y axis, and that the X axis runs from 12 to 18.
Rejecting Equivalent Solutions
You may want to restrict the acceptable solutions to the unique solutions. Some solutions possess what is called rotational symmetry: if you rotate the board through 180^{o}, or perhaps even 90^{o}, you end up with exactly the same configuration. Figure 2 shows two solutions, one with the 180^{o} rotational symmetry, the other with the 90^{o} rotational symmetry. The numbering in the figure shows the queens that are equivalent upon the rotation.
If a solution does not possess such a rotational symmetry, then successive rotations through 90^{o} will generate other solutions that will be discovered as you process the solutions. In addition to the rotations there is another symmetry operaton: reflection in a mirror. By the very nature of the Nqueens problem a valid solution cannot have a mirror symmetry. That means that each valid solution also has mirror images that will turn up in the processing. Figure 3 shows the solution for the 5Queens problem that lacks rotational symmetry (Figure 2 shows the symmetric solution). Consequently there are eight solutions that are equivalent, being simply rotations or reflections of an initial board configuration.
You might think that rejecting equivalent solutions would require searching through previously accepted solutions, but there is an alternative approach that does not require saving the earlier solutions. You are representing the board by an array of column positions. All you need to do is to consider lexicographic ordering of the solutions, thinking of the array as an Ndigit number. If you have the rule that you will only accept the first solution in this ordering of equivalent solutions, then the rejection of all the rest is straightforward. For a candidate solution, rotate it by successive 90^{o} increments. If at any time the result compares as "smaller," reject the candidate solution. For the mirror images, you can generate one mirror image and then rotate that through three successive 90^{o} increments to check for the four mirror images.
Listing 5 shows the C implementation of these symmetry checks. The function "intncmp" mimics the standard C library function "strncmp" but with an array of ints rather than chars.
Note that the timing results shown in Figure 1 are from running programs that implement this rejection of candidate solutions equivalent to the first on rotations and reflections.
You might think that the very fastest generation of all solutions to the NQueens problem would be achieved by stripping out the symmetry checks. There are, however, several issues that need to be faced. Because of the vertical mirror plane, the optimized version of the Nqueens procedure only processes cells in the initial row in the first half of the array. If the mirror images are being counted separately then you may think you need to go all the way across, doubling the amount of work. You can, however, just go half way across and then adjust the result. If N is an even number, simply double the result, but if N is an odd number, then you would overcount the number of solutions in which the queen is placed in the center of the first row: you will count the mirror images twice. So you need to detect that case and only count those solutions once.
For all this grief, and with a significant loss of information, you get less than a 10% speedup.
Table 1 summarizes the benchmarking results reflected in Figure 1. It gives the times and the time ratios for the benchmarking results running N from 12 to 18 in all 6 cases.
With enough work you can sometimes reduce the time required for a calculation by a factor of 20. Too bad there isn’t a commercial application for the solutions to the NQueens problem!
Parallel Threads in Java
The Queens’ problem belongs to the class of problems called "Embarrassingly parallel": the solution to one subproblem (here represented by the queen’s position in the first row) is completely independent of another subproblem (a different position in the first row). Consequently, if you have a computer with multiple processors, it is easy to keep them busy solving the problem in parallel by using Java threads. (See, for instance, "BargainBasement Parallelism", Dr. Dobb's Journal, February 2003, pp. 4650.)
The only requirement is this: you must insure that each thread works on a different firstrow position, and that the threads take turns in building the complete solution (the total number of solutions and of unique solutions). You can do this by taking advantage of the Java keyword "synchronized": in the absence of wait(), only one process at a time can execute the entirety of a synchronized method, so that (in Operating System parlance) the method represents a critical section of code.
Listing 6 shows the class Board, which controls the initial board positions that the threads work from and also accumulates the sums for total solutions and unique solutions. The synchronized method nextJob() receives the partial results from a thread (receiving zeroes on the first invocation) and sends the column position from which the thread should begin the next job — returning a negative number as the endofjob message.
Coarser synchronization (waiting for thread completion) is handled by the Java method Thread.join(). To ease the waiting game, you can daisychain thread creation: each thread generates its own child thread until all required threads are active. The main program can then simply execute child.join() and be certain that all threads have completed before it resumes execution since every thread with a child will itself execute child.join() before terminating.
Listing 7 shows the constructor for the class WorkEngine that extends Thread. The child thread creation and start is part of the constructor itself. This allows the run() method to contain just the logic to dialog with the Board object to work subproblems and then terminate after receiving the endofjob message and executing child.join(), if appropriate. Listing 8 shows that method.
Table 2 shows the statistics from a number of runs on a quadprocessor Xeon computer under Linux. Since the Xeon processor is itself a dual processor, Linux sees eight available 1.5 MHz processors. From that table you can easily see some performance degradation if you split the problem into too many parallel threads. By the nature of the problem, some starting board configurations take less time to calculate than others (due to early backtracking). The main program, however, must wait for the slowest thread to complete before it can continue execution. This same waiting means that if the threads end up computing different numbers of starting board configurations the main cannot continue until the slowest thread is finished.
The Java code and accompanying Excel workbook are also available (in the "Thread" folder) in the ZIP file accessible through the Resource Center (see page x).
Acknowledgements
The benchmarking runs reported here were performed during an academic vacation period on equipment owned by the State of Washington and located within the Computer Science Department at Eastern Washington University. The parallel thread runs were performed on one of the computers acquired as part of the "Technology Initiative for the New Economy" (TINE) Congressional grant to Eastern Washington University that, among other things, provided a parallel and distributed processing resource — which these computers do admirably well!
This article contains material first presented at the Small College Computing Symposium at Augustana College (Sioux Falls, SD), 2122 Apr 1995. It was published in SCCS: Proceedings of the 28th Annual Small College Computing Symposium (1995), 20110. The article is available online through http://penguin.ewu.edu/~trolfe/SCCS95/index.html.
For even more optimizations, see http://www.jsomers.com/nqueen_demo/nqueens.html — I have felt it appropriate as I am working on this article not to work through his solution myself, but he reports a tenfold speedup compared with the code available through the SCCS95 web page referenced above.
Figure 2: Examples of rotational symmetry in solutions
. 1 . . . . . 1 . . . . . . 2 . . . . . . 1 . . . . . 3 . . 2 . . 3 . . . . . 1 . . . . . . 2 . . . . . . 1 . . . . . 1 . Symmetric on 90^{o} rotation Symmetric on 180^{o} rotation
Figure 3: Set of solutions for N=5 equivalent by symmetry operations
Original Vertical mirror 1 . . . . . . . . 1 . . 2 . . . . 2 . . . . . . 3 3 . . . . . 4 . . . . . . 4 . . . . 5 . . 5 . . . 90 degree rotation Antidiagonal mirror . . . . 1 . . 3 . . . 4 . . . 5 . . . . . . . 2 . . . . 2 . 5 . . . . . 4 . . . . . 3 . . . . . . 1 180 degree rotation Horizontal mirror . 5 . . . . . . 5 . . . . 4 . . 4 . . . 3 . . . . . . . . 3 . . 2 . . . . 2 . . . . . . 1 1 . . . . 270 degree rotation Diagonal mirror . . 3 . . 1 . . . . . . . . 5 . . . 4 . . 2 . . . . 2 . . . . . . 4 . . . . . 5 1 . . . . . . 3 . .
Listing 1: Building partial boards based on positioning one queen across the current row.
void Nqueens (int Board[], int Trial[], int Size, int Row) { if (Row == Size) Process(Board, Size); else for (int Col = 0; Col < Size; Col++) { Board[Row] = Col; if ( Valid (Board, Size, Row) ) Nqueens (Board, Trial, Size, Row+1); } }
Listing 2: Horowitz and Sahni’s validity check
int Valid (int Board[], int Size, int Row) { for (int Idx = 0; Idx < Row; Idx++) if ( Board[Idx] == Board[Row]  abs(Board[Row]Board[Idx]) == (RowIdx) ) return 0; // boolean false return 1; // boolean true }
Listing 3: Wirth's validity check
int Valid (int Board[], int Size, int Row, int Col[], int Diag[], int AntiD[] ) { int Idx; /* Index into Diag[] / AntiD[] */ int Chk; /* Occupied flag */ Chk = Col[Board[Row]]; /* Diagonal: RowCol == constant */ Idx = Row  Board[Row] + Size1; Chk = Chk  Diag[Idx]; /* AntiDiagonal: Row+Col == constant */ Idx = Row + Board[Row]; Chk = Chk  AntiD[Idx]; return !Chk; /* Valid if NOT any occupied */ }
Listing 4: Building partial boards based on processing permutation vectors
void Nqueens (int Board[],int Size, int Row) { int Idx, Lim, Vtemp; /* Check for a partial board. */ if (Row < Size1) { if (Valid (Board, Size, Row) Nqueens (Board, Trial, Size, Row+1); for (Idx = Row+1; Idx < Size; Idx++) { Vtemp = Board[Idx]; Board[Idx] = Board[Row]; Board[Row] = Vtemp; if (Valid (Board, Size, Row)) Nqueens (Board, Trial, Size, Row+1); } } /* Regenerate original vector from Row to Size1: */ Vtemp = Board[Row]; for (Idx = Row+1; Idx < Size; Idx++) Board[Idx1] = Board[Idx]; Board[Idx1] = Vtemp; } /* This is a complete board. Final validity check */ else if ( Valid (Board, Size, Row) ) Process(Board, Size); }
Listing 5: Rejection of equivalent solutions based on rotations and reflections
/* Check the symmetries. Return 0 if this is not the 1st; */ /* solution in the set of equivalent solutions; otherwise */ /* return the number of equivalent solutions. */ int SymmetryOps( int Board[], /* The fullypopulated board */ int Trial[], /* Used for symmetry checks */ /* Holds its own scratch space too! */ int Size) /* Number of cells in a row/column */ { int Idx; /* Loop variable; intncmp result */ int Nequiv; /* Number equivalent boards */ int *Scratch=&Trial[Size]; /* Scratch space */ /* Copy; Trial will be subjected to the transformations */ for (Idx = 0; Idx < Size; Idx++) Trial[Idx] = Board[Idx]; /* 90 degrees  clockwise (4th parameter of Rotate is FALSE)*/ Rotate (Trial, Scratch, Size, false); Idx = intncmp (Board, Trial, Size); if (Idx > 0) return 0; if ( Idx == 0 ) /* No change on 90 degree rotation */ Nequiv = 1; else /* 180 degrees */ { Rotate (Trial, Scratch, Size, false); Idx = intncmp (Board, Trial, Size); if (Idx > 0) return 0; if ( Idx == 0 ) /* No change on 180 degree rotation */ Nequiv = 2; else /* 270 degrees */ { Rotate (Trial, Scratch, Size, false); Idx = intncmp (Board, Trial, Size); if (Idx > 0) return 0; Nequiv = 4; } } /* Copy the board into Trial for rotational checks */ for (Idx = 0; Idx < Size; Idx++) Trial[Idx] = Board[Idx]; /* Reflect  vertical mirror */ Vmirror (Trial, Size); Idx = intncmp (Board, Trial, Size); if (Idx > 0) return 0; if ( Nequiv > 1 ) // I.e., no fourfold rotational symmetry { /* 90 degrees  equiv. to diagonal mirror */ Rotate (Trial, Scratch, Size, true); Idx = intncmp (Board, Trial, Size); if (Idx > 0) return 0; if ( Nequiv > 2 ) // I.e., no twofold rotational symmetry { /* 180 degrees  equiv. to horizontal mirror */ Rotate (Trial, Scratch, Size, true); Idx = intncmp (Board, Trial, Size); if (Idx > 0) return 0; /* 270 degrees  equiv. to antidiagonal mirror */ Rotate (Trial, Scratch, Size, true); Idx = intncmp (Board, Trial, Size); if (Idx > 0) return 0; } } /* WE HAVE A GOOD ONE! */ return Nequiv * 2; /* Double to handle the mirror images */ }
Listing 6: The Java class Board
public class Board { private int nSoln = 0, // Total solutions for this board nUniq = 0; // Unique solutions, rejecting ones // equivalent based on rotations. private int size, // Board size AND number of queens limit, // First row midpoint nextCol = 0; // Next position to be computed public Board (int size) { this.size = size; limit = (size+1) / 2; // Mirror images done automatically } // Accumulate partial results and assign the next problem. // Synchronized because this is the critical section  // only one thread allowed in at a time. public synchronized int nextJob ( int nS, int nU) { nSoln += nS; nUniq += nU; // If all columns have been assigned, return the exit flag return nextCol < limit ? nextCol++ : 1; } } // Return the saved information on total solutions public int total(){ return nSoln; } // Return the saved information on unique solutions public int unique(){ return nUniq; } }
Listing 7: Constructor for the Java class WorkEngine
public WorkEngine(int size, int nMore, Board info) { this.size = size; this.info = info; board = new int[size]; trial = new int[size]; scratch = new int[size]; diagChk = new boolean[2*size1]; antiChk = new boolean[2*size1]; if ( nMore > 0 ) try { child = new WorkEngine( size, nMore1, info ); child.start(); } catch ( Exception e ) { System.out.println(e); } else child = null; }
Listing 8: The run() method from the Java class WorkEngine
public void run() { int nextCol; while ( true ) // Will break out on 1 for column posn. { int row, col; // On the first call, nTotal and nUnique hold zeroes. nextCol = info.nextJob(nTotal, nUnique); if ( nextCol < 0 ) break; // Empty out counts from the last board processed nTotal = nUnique = 0; // Generate the initial permutation vector, given nextCol board[0] = nextCol; for ( row = 1, col = 0; row < size; row++, col++ ) board[row] = col == nextCol ? ++col : col; // Empty out the diagChk and antiChk vectors for ( row = 0; row < 2*size1; row++ ) diagChk[row] = antiChk[row] = false; // Mark as inuse the diagonal and antidiagonal diagChk[size1nextCol] = antiChk[nextCol] = true; // Now compute from row 1 on down. nQueens (1); } if ( child != null ) try { child.join(); } catch ( Exception e ) { System.out.println(e); } }
Description 
Time 
Ratio with 
No Optimization 
20:59:51 
1.000 
Wirth's Validity Check 
8:51:50 
2.369 
Permutation Vector 
3:52:36 
5.416 
Both Optimizations 
1:54:54 
10.965 
InLine Code Optimization 
1:03:28 
19.850 
Without Symmetry Checks 
0:58:18 
21.609 
No. of Queens 
No. of Threads 
UniProc Time 
Threaded Time 
SpeedUp * 
Individual Thread Elapsed Times 

14 
1 
2.887 
2.918 
0.99 
2.912 







14 
2 
2.865 
1.662 
1.73 
1.261 
1.656 






14 
3 
2.874 
1.294 
2.22 
1.283 
1.274 
1.239 





14 
4 
2.848 
1.271 
2.26 
0.678 
1.105 
1.264 
1.211 




14 
5 
2.909 
1.008 
2.85 
0.637 
0.840 
0.637 
0.632 
1.002 



14 
6 
2.883 
1.039 
2.77 
0.670 
0.650 
0.640 
0.671 
0.623 
1.033 


14 
7 
2.855 
0.690 
4.17 
0.660 
0.459 
0.654 
0.638 
0.678 
0.641 
0.584 

15 
1 
19.559 
19.332 
1.02 
19.326 







15 
2 
19.642 
9.814 
2.00 
9.807 
9.584 






15 
3 
19.627 
9.028 
2.18 
8.372 
9.021 
7.215 





15 
4 
19.573 
7.734 
2.54 
5.335 
7.690 
7.586 
4.997 




15 
5 
19.640 
6.495 
3.02 
4.005 
5.047 
3.858 
6.479 
5.926 



15 
6 
19.614 
6.302 
3.12 
3.874 
3.888 
3.912 
6.289 
3.881 
6.033 


15 
7 
19.627 
4.584 
4.28 
4.052 
3.895 
3.956 
4.095 
3.673 
3.509 
4.578 

15 
8 
19.817 
3.965 
4.95 
3.617 
3.883 
3.949 
3.774 
3.548 
3.894 
3.918 
3.146 
16 
1 
122.277 
122.852 
1.00 
122.846 







16 
2 
122.381 
62.527 
1.96 
62.520 
61.145 






16 
3 
122.623 
51.567 
2.38 
36.502 
51.560 
44.948 





16 
4 
122.578 
44.849 
2.74 
44.819 
30.552 
36.919 
37.233 




16 
5 
122.757 
32.535 
3.77 
31.549 
32.496 
32.517 
21.761 
19.873 



16 
6 
122.718 
33.373 
3.68 
32.387 
21.243 
23.716 
33.357 
21.909 
19.660 


16 
7 
122.606 
32.415 
3.78 
24.999 
23.875 
22.509 
24.968 
32.361 
21.947 
19.909 

16 
8 
123.360 
28.887 
4.25 
24.026 
24.395 
27.755 
28.855 
25.141 
22.993 
22.309 
19.891 
* Speedup is the average uniprocessor time divided by the threaded time