| BargainParallel.rtf | Dr. Dobb's Journal article on use of fork and threads for parallel processing; link to HTML version |
| QueenDobb.pdf | Dr. Dobb's Journal article on the N-Queens problem, with a discussion of a threaded solution; link to HTML version |
| SCCS_Queens.doc | Conference presentation on the N-Queens problem solution, discussing several parallel solution strategies; link to HTML version |
Sections: introduction using "fork" using shared memory using threads using message-passing
Teaser Segment: N Queens Do Las Vegas With Threads
The classic problem of positioning queens on a square grid supplies the first example. To find a single solution, one can use a Las Vegas algorithm: generate random permutation vectors, which are then processed until one is found that constitutes a solution to the problem. Obviously, if there are more processes checking the random permutation vectors, the time required to find one decreases.
QueenLasVegas.java
[.txt
file]
The complete code
QueenLasVegas.rtf
Hand-out: the above as two pages of two-column eight-point
text
Computations are done within threads functioning as compute engines. The thread class is an inner class of QueenLasVegas, so that QueenLasVegas' code has complete access to all private parts of the thread class.
To insure that each thread uses a different sequence of random numbers, each has its own generator, and that generator is seeded by the product of System.currentTimeMillis() and this.hashCode(): the first generates chance seeds on successive runs, the second generates chance seeds within the same run.
The ComputeEngine class has a static int[] solution which receives the reference to the first successful solution discovered by one of the threads. When this static data member becomes non-null, all threads terminate processing.
Threaded code segment from QueenLasVegas.main
start = System.currentTimeMillis();
compute = new ComputeEngine[nThreads];
// Create the thread objects
for ( thread = 0; thread < nThreads; thread++ )
compute[thread] = new ComputeEngine();
// Start all threads at NEARLY the same time
for ( thread = 0; thread < nThreads; thread++ )
compute[thread].start();
// Wait for all threads to terminate
try
{ for ( thread = 0; thread < nThreads; thread++ )
compute[thread].join();
}
catch ( Exception e )
{ System.out.println(e); System.exit(-1); }
solution = ComputeEngine.solution;
Beginning of class ComputeEngine
/**
* The class of compute engines to find a solution.
*/
private static class ComputeEngine extends Thread
{
static int[] solution = null;
static int nQueens; // This is set by the main
Random generator = new Random
( this.hashCode() * System.currentTimeMillis() );
int nTrials = 1; // I.e., first trial is outside the while loop
// The big kahuna --- all the work is done here
public void run()
{ int[] trial = new int[nQueens];
int k;
// Generate the initial permutation vector: [0 .. nQueens-1]
for ( k = 0; k < nQueens; k++ )
trial[k] = k;
while ( ! ( solution != null || build(trial, nQueens) ) )
nTrials++;
if ( solution == null )
solution = trial;
}
Side note: solutions become sparse Click here for a graph
Data collected from running the program on one of the Beowulf
cluster computers (dual-core quad-processor Xeon servers):
QueenLasVegas.gif
Summary of two specimen runs showing trial configurations
processed per msec, nQueens = 1000 and 1500
QueenLasVegas.xls
Spreadsheet that generated the figure
Background: This was actually
"Problem I" in the Pacific Northwest
regional ACM
Intercollegiate Programming Contest (click
here for the full problem set within the problem set
zip file, it is i_NQueens.pdf). There is also
an alternative deterministic algorithm that generates a single
solution in O(n) time.
QueenLasVegasUni.java
[.txt file] Uniprocessor code comparable with QueenLasVegas.java
Queen Las
Vegas paper Article on the problem (inroads, June
2006)
Option 1 (specific to the C environment under Unix variants): fork in a multiprocessor computer
pid_t Child = fork();
At the point of the fork, the child receives effectively a copy of the parent's data area, including any open files. This effectively gives a one-time communication channel from the parent to the child the current memory state plus the knowledge that the child is indeed a child process (since the parent has a non-zero value in Child while the child has a zero).
If a file has been opened for output, it effectively becomes a shared output file between the parent and the child. This provides a communication mechanism from the child back to the parent. Any child process writes its information into that shared file, and then the parent, upon termination of such child processes, can "rewind" the file and read the information back out. A small example follows. In this example, the variable Proc tracks which process this is, so that the original process can be determined. The children are generated in a daisy-chained fashion, so that the Unix wait function gives an order to process termination. The shared file is accessed through the file descriptor fd.
if ( Proc == 1 ) // That is, the first parent in the chain
wait(NULL); // wait till everything is finished.
else
{ if ( Child != 0 )
wait(NULL);// Let the immediate descendent finish
N_out = write (fd, &Nhits, sizeof(Nhits)); // write out Nhits
exit(1); // Bow out so that the next can process
}
N_out = lseek(fd, 0, SEEK_SET); // Rewind the file
for ( Proc = 1; Proc < Nproc; Proc++ )
{ N_out = read (fd, &Part, sizeof(Part)); // read into Part
Nhits += Part;
}
Fork_Queen.c [.txt file] This is an example of using the Unix fork to solve the N-Queens problem in parallel.
Note that one could equally well in the Unix environment do communication by sockets rather than through a shared output file.
Option 2 (shared memory on a multiprocessor computer): proprietary compilers
The compiler provided with a proprietary operating system may contain procedures that allow shared memory parallel processing.
The C compiler accompanying the Silicon Graphics computer contains such procedures.
m_set_procs (Pflag); // Set the number of parallel functions
m_fork (SGIlaunch, Size); // Launch THAT NUMBER of copies of the function SGIlaunch
m_kill_procs(); // Wait for the termination of all of those processes
In addition, the compiler provides memory locking in support of critical sections that must be executed by one process at a time.
m_lock(); // Begin critical section, locking access to memory
Nunique++; // Count up one unique solution
Ntotal += Idx; // Count up the number of such solutions based on symmetries
m_unlock(); // End critical section, unlocking access to memory
SGI_Shared.c [.txt file] The file from which the above lines were taken (as part of solving the N-Queens problem).
Option 3 (Java and C): Parallel processing through threads in a multiprocessor computer
Threads have been in the Java language since its inception. In the Unix C environment, threaded programming is supported by "Posix threads", which have become generally available. The problem used here to exercise the threads is matrix multiplication. For a simple system (a 2x4 matrix X multiplied by a 4x3 matrix M generates a 2x3 matrix Y), the picture is the following.

The common definition of the operations is the following (where X is an n1xn2 matrix, M is an n2xn3 matrix, and Y is an n1xn3 matrix). [Note that this is not the most efficient form since for large enough n3 the column-wise access within the matrix M might cause page-thrashing.]
for ( i = 0; i < n1; i++ ) { for ( j = 0; j < n3; j++ ) { sum = 0; for ( k = 0; k < n2; k++ ) { sum += X[i][k] * M[k][j]; } Y[i][j] = sum; } }
Rearranging to enforce local memory access. (Click here for a paper discussing this.)
for ( i = 0; i < n1; i++ ) { double Xik = X[i][0]; for ( j = 0; j < n3; j++ ) // First part of summation { Y[i][j] = Xik * M[0][j]; } for ( k = 1; k < n2; k++ ) // Rest of the summation { Xik = X[i][k]; if ( Xik != 0 ) { for ( j = 0; j < n3; j++ ) { Y[i][j] += Xik * M[k][j]; } } } }
Option 4 (available for several languages): distributed computing through message-passing libraries
PVM Parallel Virtual Machine is a message-passing system developed at Oak Ridge National Laboratories and available for free. It allows combining computers with network connections into a single "Virtual Machine" by means of a message passing system. The following two files contain parallel-processing code for solving the N-Queens problem under PVM.
queen-host.c
[txt file]
Code for the main process
which generates the slave (node) processes which actually solve
the problem
queen-node.c
[txt file]
Code for the slave processes which
communicate with the host in solving the problem
MPI Message Passing Interface is itself not a message-passing system, but rather an agreed-upon message-passing interface. The interface is then implemented in various MPI systems, some available for free. Specifically, MPI/CH and LAM/MPI are free systems.
These two message-passing systems are examined in the course
"Distributed Multiprocessing Environments", CScD-443
and CScD-543.
This link will take you to the most recent
quarter of that course.