The Optimal Queens

Timothy Rolfe

Dr. Dobb’s Journal, Vol. 30, No. 5 (May 2005), pp. 32-37  —  click here for Microsoft Word version

We’re going to look at one of the classic problems in Mathematics and Computer Science and try to see how much we can optimize the solution of that problem.

This long-standing problem goes way back even before Carl Gauss (1777-1855), and is based on the chessboard. It is the problem of finding all of the ways to position eight queens on the chessboard so that none of them are under attack by any other. Remember, the queen can move horizontally, vertically, and in the two diagonal directions — for convenience I’ll call the direction down and to the right (and its reverse) the diagonal direction, and then call the direction up and to the right the antidiagonal direction.

You could approach this problem by looking at all possible ways of placing eight queens in the 64 available cells — and there are 64!/56! ways to do that (the permutations of 64 things taken eight at a time), but you don’t have to look at all 1.78E+14 permutations. (If you could figure out a way of just generating the combinations of 64 things taken eight at a time, the number goes down to 4.43E+09.) You can use some natural intelligence, since you know that the queens can attack horizontally. That means that you can only have one queen to a row. Since there are eight positions on each of the eight rows, the total candidate configurations is reduced to 88 — 1.68E+07, a nasty number but not nearly as nasty as our earlier one. You can, however, toss out massive numbers of these.

As a programming exercise, this is the classical problem solved by backtracking. You start by positioning a queen in the top row of the board. As she sits in her cell, you move to the next row and try positioning a queen there. If you find that a queen above is attacking the queen below, you don’t have to proceed any farther: all board configurations that include this start will be illegal. So you simply position that queen on the next available cell. You back up to the earlier row when you have finished positioning a queen on all available cells of the current row and try the next cell in that row for its queen.

Eight queens is a bit challenging to start with, so you can rephrase this as the N Queens problem: positioning N queens on a grid made up of N rows of N squares to the row. We know that there is no solution for the two-queens problem: necessarily each queen is attacking the other either vertically or diagonally. If you play around with paper and pencil, you’ll see that there is no possible solution for the three-queens problem either — there is necessarily a diagonal attack. So the four-queens problem is the first one that has a solution. For pseudo code purposes I’m going to use the C and Java convention of numbering rows from zero on up. This means that when you hit N, you’ve already positioned N queens.

To position a queen in row j

1. if j has reached N, you have a valid solution: process it as valid.
2. otherwise for each column position k in this row
1. position a queen in the (j,k) cell
2. check for attack by all the queens above row j
3. if there is no attack, position a queen in row j+1

Listing 1 gives the implementation of this algorithm in C.

The big question is how to perform the check for attack from above, and this is going to be the source for one of the optimizations. Ellis Horowitz and Sartaj Sahni, in their Fundamentals of Computer Algorithms (Computer Science Press, 1978), gave a very simple test, based on the assumption that the board above the current position is valid. They chose to represent the board as a one-dimensional array: each element represents a row on the board, and the number in that element represents the column position that the queen is occupying. That means that you can easily check for vertical attack: an earlier row has the same column position as your current row. They also noticed that you can easily check for diagonal attack, as shown in the following pseudocode.

To check for a valid board[] filled from row 0 to row j

1. for each row k from 0 to j—1
1. if board[k] = board[j] return false
2. else if abs(board[row]—board[k]) = (row—k) return false
2. If the loop terminates normally, return true

Listing 2 gives the implementation of this algorithm in C.

You can easily see that the time required for the check increases with the size of the board. In computer jargon, it is an "order-N" algorithm.

Niklaus Wirth, in his Algorithms and Data Structures (1986), showed that you can perform the check in constant time, provided that you use some additional arrays to hold information. (His first proposal of this technique, however, was in the April 1971 issue of the Communications of the ACM.) For instance, you can have an array indicating which columns have already been filled. To check whether you can position a queen in a particular column, you just look at that cell of the array to see if it is in use. Similarly you can have additional arrays to hold information about the diagonals and antidiagonals. You can see that along the diagonals the difference of the row and column subscripts is a constant, while along the antidiagonals the sum of the row and column subscripts is a constant. Listing 3 gives an implementation of Wirth’s algorithm in C.

Wirth’s algorithm dramatically speeds up the processing, requiring only about half the time as compared with the code using Horowitz and Sahni’s order-N validity check.

There is, however, another optimization possible. You’re positioning only one queen on a row because of the horizontal attack. You know exactly the same thing about the columns — for each column there can be only one queen. This means that all successful solutions are just going to be permutations of the column subscripts: each successive row has one fewer candidate position than the previous row. Thus the problem space (before backtracking) has come down from 88 to 8! — from 1.68E+07 down to 40320. All you have to do is to generate candidate permutations. As a bonus, the validity check becomes a little easier, since you only need to check for diagonal and antidiagonal attacks.

[Gilles Brassard and Paul Bratley, in their Fundamentals of Algorithmics (p. 309), take note of using this method, but they approach it as examining all 8! possible permutations without combining the method with backtracking during permutation generation as we will.]

You first fill the entire array with all of the column positions — a massively illegal configuration with all the queens lined up along the diagonal — but then you march the available column positions through each row cell. You can do this with swaps: after you have evaluated the initial partial permutation, you just swap the value in the front cell with the values in the remaining cells and then evaluating the resulting partial permutation. In the end you have all the values in the same order except that the value from the last cell is now at the front. So you can regenerate the original configuration by doing a circular leftward rotation in the array. This was discussed in an earlier article in Dr. Dobb’s Journal — "Backtracking Algorithms" (May 2004).

To position a queen in row j

1. if j has reached N, you have a valid solution: process it as valid.
2. Otherwise
1. Loop as k takes on values from j to N—1
1. Swap entries j and k
2. check for attack by all the queens above row j
3. if there is no attack, position a queen in row j+1
2. Restore the initial state of the array:
1. Save the value in position j
2. As k goes from j+1 to the end, move elements from [k] to [k—1]
3. Position the saved value at the end of the array

Listing 4 gives the implementation of this algorithm in C.

This optimization even more dramatically speeds up the processing, requiring only a fifth the time as compared with the code using the row-filling implementation. The comparison code uses Horowitz and Sahni’s order-N validity check so that speed-up comes only from the permutation vector optimization.

Combining both optimizations, of course, really speeds up the processing.

The final optimizations are to remove the benchmarking superstructure, and then to use in-line code for the most frequently performed operations: marking the boolean arrays for diagonal attack, and performing the validity checks based on those boolean arrays. This roughly doubles the speed.

Figure 1 shows the time required in a benchmarking run on a Dell desktop computer with an Intel® Pentium® 4 2.00 GHz processor for boards of sizes from 12 up to 18. The Excel workbook and code are available electronically from the Resource Center (see page x). While the figure shows the C language results, the ZIP file also includes the Java implementations of the benchmarking code and of the final optimization code.

Note that the figure is using logarithmic scaling for its Y axis, and that the X axis runs from 12 to 18.

Rejecting Equivalent Solutions

You may want to restrict the acceptable solutions to the unique solutions. Some solutions possess what is called rotational symmetry: if you rotate the board through 180o, or perhaps even 90o, you end up with exactly the same configuration. Figure 2 shows two solutions, one with the 180o rotational symmetry, the other with the 90o rotational symmetry. The numbering in the figure shows the queens that are equivalent upon the rotation.

If a solution does not possess such a rotational symmetry, then successive rotations through 90o will generate other solutions that will be discovered as you process the solutions. In addition to the rotations there is another symmetry operaton: reflection in a mirror. By the very nature of the N-queens problem a valid solution cannot have a mirror symmetry. That means that each valid solution also has mirror images that will turn up in the processing. Figure 3 shows the solution for the 5-Queens problem that lacks rotational symmetry (Figure 2 shows the symmetric solution). Consequently there are eight solutions that are equivalent, being simply rotations or reflections of an initial board configuration.

You might think that rejecting equivalent solutions would require searching through previously accepted solutions, but there is an alternative approach that does not require saving the earlier solutions. You are representing the board by an array of column positions. All you need to do is to consider lexicographic ordering of the solutions, thinking of the array as an N-digit number. If you have the rule that you will only accept the first solution in this ordering of equivalent solutions, then the rejection of all the rest is straightforward. For a candidate solution, rotate it by successive 90o increments. If at any time the result compares as "smaller," reject the candidate solution. For the mirror images, you can generate one mirror image and then rotate that through three successive 90o increments to check for the four mirror images.

Listing 5 shows the C implementation of these symmetry checks. The function "intncmp" mimics the standard C library function "strncmp" but with an array of ints rather than chars.

Note that the timing results shown in Figure 1 are from running programs that implement this rejection of candidate solutions equivalent to the first on rotations and reflections.

You might think that the very fastest generation of all solutions to the N-Queens problem would be achieved by stripping out the symmetry checks. There are, however, several issues that need to be faced. Because of the vertical mirror plane, the optimized version of the Nqueens procedure only processes cells in the initial row in the first half of the array. If the mirror images are being counted separately then you may think you need to go all the way across, doubling the amount of work. You can, however, just go half way across and then adjust the result. If N is an even number, simply double the result, but if N is an odd number, then you would over-count the number of solutions in which the queen is placed in the center of the first row: you will count the mirror images twice. So you need to detect that case and only count those solutions once.

For all this grief, and with a significant loss of information, you get less than a 10% speed-up.

Table 1 summarizes the benchmarking results reflected in Figure 1. It gives the times and the time ratios for the benchmarking results running N from 12 to 18 in all 6 cases.

With enough work you can sometimes reduce the time required for a calculation by a factor of 20. Too bad there isn’t a commercial application for the solutions to the N-Queens problem!

The Queens’ problem belongs to the class of problems called "Embarrassingly parallel": the solution to one subproblem (here represented by the queen’s position in the first row) is completely independent of another subproblem (a different position in the first row). Consequently, if you have a computer with multiple processors, it is easy to keep them busy solving the problem in parallel by using Java threads. (See, for instance, "Bargain-Basement Parallelism", Dr. Dobb's Journal, February 2003, pp. 46-50.)

The only requirement is this: you must insure that each thread works on a different first-row position, and that the threads take turns in building the complete solution (the total number of solutions and of unique solutions). You can do this by taking advantage of the Java keyword "synchronized": in the absence of wait(), only one process at a time can execute the entirety of a synchronized method, so that (in Operating System parlance) the method represents a critical section of code.

Listing 6 shows the class Board, which controls the initial board positions that the threads work from and also accumulates the sums for total solutions and unique solutions. The synchronized method nextJob() receives the partial results from a thread (receiving zeroes on the first invocation) and sends the column position from which the thread should begin the next job — returning a negative number as the end-of-job message.

Coarser synchronization (waiting for thread completion) is handled by the Java method Thread.join(). To ease the waiting game, you can daisy-chain thread creation: each thread generates its own child thread until all required threads are active. The main program can then simply execute child.join() and be certain that all threads have completed before it resumes execution since every thread with a child will itself execute child.join() before terminating.

Listing 7 shows the constructor for the class WorkEngine that extends Thread. The child thread creation and start is part of the constructor itself. This allows the run() method to contain just the logic to dialog with the Board object to work subproblems and then terminate after receiving the end-of-job message and executing child.join(), if appropriate. Listing 8 shows that method.

Table 2 shows the statistics from a number of runs on a quad-processor Xeon computer under Linux. Since the Xeon processor is itself a dual processor, Linux sees eight available 1.5 MHz processors. From that table you can easily see some performance degradation if you split the problem into too many parallel threads. By the nature of the problem, some starting board configurations take less time to calculate than others (due to early backtracking). The main program, however, must wait for the slowest thread to complete before it can continue execution. This same waiting means that if the threads end up computing different numbers of starting board configurations the main cannot continue until the slowest thread is finished.

The Java code and accompanying Excel workbook are also available (in the "Thread" folder) in the ZIP file accessible through the Resource Center (see page x).

Acknowledgements

The benchmarking runs reported here were performed during an academic vacation period on equipment owned by the State of Washington and located within the Computer Science Department at Eastern Washington University. The parallel thread runs were performed on one of the computers acquired as part of the "Technology Initiative for the New Economy" (TINE) Congressional grant to Eastern Washington University that, among other things, provided a parallel and distributed processing resource — which these computers do admirably well!

This article contains material first presented at the Small College Computing Symposium at Augustana College (Sioux Falls, SD), 21-22 Apr 1995. It was published in SCCS:  Proceedings of the 28th Annual Small College Computing Symposium (1995), 201-10. The article is available on-line through http://penguin.ewu.edu/~trolfe/SCCS-95/index.html.

For even more optimizations, see http://www.jsomers.com/nqueen_demo/nqueens.html — I have felt it appropriate as I am working on this article not to work through his solution myself, but he reports a ten-fold speed-up compared with the code available through the SCCS-95 web page referenced above.

Figure 1

Figure 2: Examples of rotational symmetry in solutions

```      .  1  .  .  .  .                .  1  .  .  .
.  .  .  2  .  .                .  .  .  .  1
.  .  .  .  .  3                .  .  2  .  .
3  .  .  .  .  .                1  .  .  .  .
.  .  2  .  .  .                .  .  .  1  .
.  .  .  .  1  .         Symmetric on 90o rotation
Symmetric on 180o rotation
```

Figure 3: Set of solutions for N=5 equivalent by symmetry operations

```Original                       Vertical mirror
1  .  .  .  .                  .  .  .  .  1
.  .  2  .  .                  .  .  2  .  .
.  .  .  .  3                  3  .  .  .  .
.  4  .  .  .                  .  .  .  4  .
.  .  .  5  .                  .  5  .  .  .

90 degree rotation             Anti-diagonal mirror
.  .  .  .  1                  .  .  3  .  .
.  4  .  .  .                  5  .  .  .  .
.  .  .  2  .                  .  .  .  2  .
5  .  .  .  .                  .  4  .  .  .
.  .  3  .  .                  .  .  .  .  1

180 degree rotation            Horizontal mirror
.  5  .  .  .                  .  .  .  5  .
.  .  .  4  .                  .  4  .  .  .
3  .  .  .  .                  .  .  .  .  3
.  .  2  .  .                  .  .  2  .  .
.  .  .  .  1                  1  .  .  .  .

270 degree rotation            Diagonal mirror
.  .  3  .  .                  1  .  .  .  .
.  .  .  .  5                  .  .  .  4  .
.  2  .  .  .                  .  2  .  .  .
.  .  .  4  .                  .  .  .  .  5
1  .  .  .  .                  .  .  3  .  .
```

Listing 1: Building partial boards based on positioning one queen across the current row.

```void Nqueens (int Board[], int Trial[], int Size, int Row)
{
if (Row == Size)
Process(Board, Size);
else
for (int Col = 0; Col < Size; Col++)
{
Board[Row] = Col;
if ( Valid (Board, Size, Row) )
Nqueens (Board, Trial, Size, Row+1);
}
}```

Listing 2: Horowitz and Sahni’s validity check

```int Valid (int Board[], int Size, int Row)
{
for (int Idx = 0; Idx < Row; Idx++)
if (  Board[Idx] == Board[Row] ||
abs(Board[Row]-Board[Idx]) == (Row-Idx) )
return 0;    // boolean false
return 1;          // boolean true
}```

Listing 3: Wirth's validity check

```int Valid (int Board[], int Size, int Row,
int Col[], int Diag[], int AntiD[] )
{
int Idx;        /* Index into Diag[] / AntiD[] */
int Chk;        /* Occupied flag               */

Chk = Col[Board[Row]];
/* Diagonal:  Row-Col == constant */
Idx = Row - Board[Row] + Size-1;
Chk = Chk || Diag[Idx];
/* AntiDiagonal:  Row+Col == constant */
Idx = Row + Board[Row];
Chk = Chk || AntiD[Idx];
return !Chk;    /* Valid if NOT any occupied   */
}```

Listing 4: Building partial boards based on processing permutation vectors

```void Nqueens (int Board[],int Size, int Row)
{
int Idx, Lim, Vtemp;

/* Check for a partial board. */
if (Row < Size-1)
{
if (Valid (Board, Size, Row)
Nqueens (Board, Trial, Size, Row+1);
for (Idx = Row+1; Idx < Size; Idx++)
{
Vtemp = Board[Idx];
Board[Idx] = Board[Row];
Board[Row] = Vtemp;
if (Valid (Board, Size, Row))
Nqueens (Board, Trial, Size, Row+1);
}
}
/*    Regenerate original vector from Row to Size-1:  */
Vtemp = Board[Row];
for (Idx = Row+1; Idx < Size; Idx++)
Board[Idx-1] = Board[Idx];
Board[Idx-1] = Vtemp;
}
/* This is a complete board.  Final validity check    */
else if ( Valid (Board, Size, Row) )
Process(Board, Size);
}```

Listing 5: Rejection of equivalent solutions based on rotations and reflections

```/* Check the symmetries.  Return 0 if this is not the 1st;    */
/* solution in the set of equivalent solutions; otherwise     */
/* return the number of equivalent solutions.                 */
int SymmetryOps(
int Board[],     /* The fully-populated board             */
int Trial[],     /* Used for symmetry checks              */
/* Holds its own scratch space too!      */
int Size)        /* Number of cells in a row/column       */
{  int  Idx;         /* Loop variable; intncmp result         */
int  Nequiv;      /* Number equivalent boards              */
int *Scratch=&Trial[Size];   /* Scratch space              */

/* Copy; Trial will be subjected to the transformations       */
for (Idx = 0; Idx < Size; Idx++)
Trial[Idx] = Board[Idx];

/* 90 degrees --- clockwise (4th parameter of Rotate is FALSE)*/
Rotate (Trial, Scratch, Size, false);
Idx = intncmp (Board, Trial, Size);
if (Idx > 0) return 0;
if ( Idx == 0 )  /* No change on 90 degree rotation        */
Nequiv = 1;
else                                       /*  180 degrees */
{  Rotate (Trial, Scratch, Size, false);
Idx = intncmp (Board, Trial, Size);
if (Idx > 0) return 0;
if ( Idx == 0 ) /* No change on 180 degree rotation     */
Nequiv = 2;
else                                    /* 270 degrees  */
{  Rotate (Trial, Scratch, Size, false);
Idx = intncmp (Board, Trial, Size);
if (Idx > 0) return 0;
Nequiv = 4;
}
}
/* Copy the board into Trial for rotational checks */
for (Idx = 0; Idx < Size; Idx++)
Trial[Idx] = Board[Idx];
/* Reflect -- vertical mirror */
Vmirror (Trial, Size);
Idx = intncmp (Board, Trial, Size);
if (Idx > 0) return 0;
if ( Nequiv > 1 )    // I.e., no four-fold rotational symmetry
{
/* -90 degrees --- equiv. to diagonal mirror */
Rotate (Trial, Scratch, Size, true);
Idx = intncmp (Board, Trial, Size);
if (Idx > 0) return 0;
if ( Nequiv > 2 ) // I.e., no two-fold rotational symmetry
{
/* -180 degrees --- equiv. to horizontal mirror */
Rotate (Trial, Scratch, Size, true);
Idx = intncmp (Board, Trial, Size);
if (Idx > 0) return 0;
/* -270 degrees --- equiv. to anti-diagonal mirror */
Rotate (Trial, Scratch, Size, true);
Idx = intncmp (Board, Trial, Size);
if (Idx > 0) return 0;
}
}
/* WE HAVE A GOOD ONE! */
return Nequiv * 2;  /* Double to handle the mirror images */
}```

Listing 6: The Java class Board

```public class Board
{  private int nSoln = 0,    // Total solutions for this board
nUniq = 0;    // Unique solutions, rejecting ones
// equivalent based on rotations.
private int size,         // Board size AND number of queens
limit,        // First row mid-point
nextCol = 0;  // Next position to be computed

public Board (int size)
{  this.size = size;
limit = (size+1) / 2;  // Mirror images done automatically
}

// Accumulate partial results and assign the next problem.
// Synchronized because this is the critical section ---
// only one thread allowed in at a time.
public synchronized int nextJob ( int nS, int nU)
{  nSoln += nS;
nUniq += nU;
// If all columns have been assigned, return the exit flag
return nextCol < limit ? nextCol++ : -1;
}
}
// Return the saved information on total solutions
public int total(){  return nSoln;  }

// Return the saved information on unique solutions
public int unique(){  return nUniq;  }
}
```

Listing 7: Constructor for the Java class WorkEngine

```public WorkEngine(int size, int nMore, Board info)
{  this.size = size;
this.info = info;
board     = new int[size];
trial     = new int[size];
scratch   = new int[size];
diagChk   = new boolean[2*size-1];
antiChk   = new boolean[2*size-1];
if ( nMore > 0 )
try
{  child = new WorkEngine( size, nMore-1, info );
child.start();
}
catch ( Exception e )
{  System.out.println(e);  }
else
child = null;
}```

Listing 8: The run() method from the Java class WorkEngine

```public void run()
{  int  nextCol;

while ( true )    // Will break out on -1 for column posn.
{  int row, col;

// On the first call, nTotal and nUnique hold zeroes.
nextCol = info.nextJob(nTotal, nUnique);
if ( nextCol < 0 )
break;
// Empty out counts from the last board processed
nTotal = nUnique = 0;
// Generate the initial permutation vector, given nextCol
board[0] = nextCol;
for ( row = 1, col = 0; row < size; row++, col++ )
board[row] = col == nextCol ? ++col : col;
// Empty out the diagChk and antiChk vectors
for ( row = 0; row < 2*size-1; row++ )
diagChk[row] = antiChk[row] = false;
// Mark as inuse the diagonal and antidiagonal
diagChk[size-1-nextCol] = antiChk[nextCol] = true;
// Now compute from row 1 on down.
nQueens (1);
}
if ( child != null )
try
{  child.join();  }
catch ( Exception e )
{  System.out.println(e);  }
}
```

 Table 1: Benchmarking Results Description Time Required Ratio with No Opt. No Optimization 20:59:51 1.000 Wirth's Validity Check 8:51:50 2.369 Permutation Vector 3:52:36 5.416 Both Optimizations 1:54:54 10.965 In-Line Code Optimization 1:03:28 19.850 Without Symmetry Checks 0:58:18 21.609