POSIX Threads in C

MatPthread.doc    Class hand-out
MatPthread.c        [.txt file]  Code for the problem being discussed. It should open in a new browser window.
MatPthread.xls       Spreadsheet with run results

Discussion of segments

Initial declarations:

// POSIX thread stuff
#include <pthread.h>

// Used in run() to protect updates to NextRow
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;

// Shared memory area
typedef struct
{
   double *A, *B, *C;
   int     NextRow, N1, N2, N3;
} *ProbPtr;

// Each POSIX thread runs this
void *run (void* arg);

// after they are generated by this
void LaunchThreads (int Nproc, ProbPtr Job);

The main program invokes LaunchThreads to accomplish the parallel processing.  First, though, it needs to allocate the shared memory area that the threads will use for communication.  Note that the parallel code segment begins and ends with code to capture the present system clock, as wall-clock time rather than processor time.

   ProbPtr    Prob = (ProbPtr) calloc (1, sizeof *Prob);

  .  .  .

  // Set up the problem structure
   Prob->A  = A;
   Prob->B  = B;
   Prob->C  = C1;
   Prob->N1 = N1;
   Prob->N2 = N2;
   Prob->N3 = N3;

  .  .  .

   getTimes ( &Mid1, &dmy );

   srand(Seed);
   for ( run = 0; run < Nruns; run++ )
   {  RandFill (A, N1*N2);
      RandFill (B, N2*N3);
      LaunchThreads (nSlaves, Prob);
   }
#ifdef DEBUG
   puts ("Finished with parallel."); fflush(stdout);
#endif

   getTimes ( &Finish, &dmy );
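
The distinction between wall-clock time and processor time matters for threaded code: processor time is normally accumulated across all threads of a process, so it can exceed the elapsed time and hide any speed-up.  The following is a minimal sketch of the two measurements, not the hand-out's getTimes.  The helper names WallSeconds and CpuSeconds are hypothetical, and a POSIX system providing clock_gettime with CLOCK_MONOTONIC is assumed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   // Assumption: exposes clock_gettime
#endif
#include <time.h>

// Hypothetical helper: elapsed (wall-clock) seconds from a monotonic
// clock, which is not affected by adjustments to the system date.
double WallSeconds (void)
{  struct timespec ts;

   clock_gettime (CLOCK_MONOTONIC, &ts);
   return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
}

// Hypothetical helper: processor time used by the program so far.
// On typical POSIX systems this is summed over all threads.
double CpuSeconds (void)
{  return (double) clock() / CLOCKS_PER_SEC;
}
```

Calling WallSeconds() before and after the parallel segment and subtracting gives the elapsed time that a speed-up comparison should use.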

Note that the run function both receives as parameter and returns as value a generic pointer.  These can be cast into the appropriate struct pointer to access the data.  The LaunchThreads function passes the pointer to a struct that provides access to the necessary arrays as well as other required data.  If thread creation succeeds, the function then waits for all threads to terminate before returning to the main program.

void LaunchThreads (int Nproc, ProbPtr Job)
{  int        proc;

   pthread_t *thread_id = NULL;   // Will be used by pthread_join

   thread_id = (pthread_t*) calloc( Nproc, sizeof *thread_id );
   Job->NextRow = 0;  // New batch

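   for ( proc = 0; proc < Nproc; proc++ )
   {  // pthread_create returns an error number; it does not set errno,
      // so report the returned value rather than calling perror
      int rc = pthread_create ( &thread_id[proc], NULL, run, (void*) Job );
      if ( rc != 0 )
      {  fprintf (stderr, "Thread creation: error %d\n", rc); exit(-1);  }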
#ifdef DEBUG
      printf ("Creation of thread %d succeeded.\n", proc); fflush(stdout);
#endif
   }

// Wait for termination of all threads before exiting
   for ( proc = 0; proc < Nproc; proc++ )
   {  pthread_join ( thread_id[proc], NULL);
#ifdef DEBUG
      printf ("Join to thread %d succeeded.\n", proc); fflush(stdout);
#endif
   }
   free (thread_id);   // LaunchThreads is called once per run, so release the array
}
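
The generic-pointer convention described above works in both directions: the last argument of pthread_create reaches the thread function as its void* parameter, and the thread function's return value is delivered through the second argument of pthread_join.  The sketch below illustrates this with a single thread.  The Range and Result types and the SumRange and SumInThread functions are hypothetical names invented for the illustration; the hand-out's run function returns NULL and passes NULL to pthread_join instead.

```c
#include <pthread.h>
#include <stdlib.h>

// Hypothetical argument and result types, for illustration only
typedef struct { int  First, Last; } Range;
typedef struct { long Sum;         } Result;

// Thread body: cast the generic pointer back to its real type, compute,
// and hand back a heap-allocated result as a generic pointer.
void *SumRange (void *arg)
{  Range  *R   = (Range*) arg;
   Result *Out = (Result*) malloc (sizeof *Out);
   int     i;

   if ( Out == NULL )
      return NULL;
   Out->Sum = 0;
   for ( i = R->First; i <= R->Last; i++ )
      Out->Sum += i;
   return Out;
}

// Launch one thread, wait for it, and collect its result.
// Returns 0 on success, a pthread error number if create or join
// fails, or -1 if the thread could not allocate its result.
int SumInThread (int First, int Last, long *Sum)
{  Range     R;
   pthread_t Tid;
   void     *Ret = NULL;
   int       rc;

   R.First = First;  R.Last = Last;
   rc = pthread_create (&Tid, NULL, SumRange, &R);
   if ( rc != 0 )
      return rc;
   rc = pthread_join (Tid, &Ret);   // Ret receives SumRange's return value
   if ( rc != 0 )
      return rc;
   if ( Ret == NULL )
      return -1;
   *Sum = ((Result*) Ret)->Sum;
   free (Ret);
   return 0;
}
```

Passing the address of the local variable R is safe here only because SumInThread joins the thread before returning; a thread that outlives its creator would need heap or static storage for its argument.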

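All threads share the same Job structure for the run function.  It contains information on the current state of the calculation, in particular the NextRow counter that every thread both reads and updates.  Consequently, access to that counter needs to be protected by a mutex, a lock for "MUTual EXclusion": the global variable pthread_mutex_t mutex1.  The remaining fields are only read by the threads and need no locking.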

void* run ( void *arg )
{  int Row;
   ProbPtr Job = (ProbPtr) arg;   // Cast over to a Problem Pointer
   double *A  = Job->A, *B  = Job->B, *C  = Job->C;
   int     N1 = Job->N1, N2 = Job->N2, N3 = Job->N3;

   while ( 1 )           // Will break out when all rows are done
   {//ONE AT A TIME, get the next row to be processed.
      pthread_mutex_lock( &mutex1 );
      Row = Job->NextRow++;
      pthread_mutex_unlock( &mutex1 );

      if ( Row >= N1 )
         break;

#ifdef DEBUG
      printf ("Thread computing row %d of %d\n", Row, N1); fflush(stdout);
#endif
      MatMult( A + Row*N2, B, C + Row*N3, 1, N2, N3 );
   }
#ifdef DEBUG
   puts ("Thread completing"); fflush(stdout);
#endif
   return NULL;     // We must return SOME kind of pointer
}
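
The effect of the lock can be checked in isolation.  The sketch below uses the same pattern as run (lock, take the next row number, unlock) but replaces the matrix multiplication with a tally of how often each row was claimed.  The Counter type and the Claim and EveryRowOnce functions are hypothetical stand-ins written for this illustration, and the mutex is kept inside the structure rather than as a global.

```c
#include <pthread.h>
#include <stdlib.h>

// Hypothetical stand-in for the hand-out's problem structure
typedef struct
{  pthread_mutex_t Lock;
   int             NextRow, NRows;
   int            *Claimed;     // Claimed[r] = number of times row r was taken
} Counter;

void *Claim (void *arg)
{  Counter *Job = (Counter*) arg;
   int      Row;

   while ( 1 )
   {  // ONE AT A TIME, get the next row to be processed
      pthread_mutex_lock (&Job->Lock);
      Row = Job->NextRow++;
      pthread_mutex_unlock (&Job->Lock);

      if ( Row >= Job->NRows )
         break;
      Job->Claimed[Row]++;      // No lock needed: this thread alone holds Row
   }
   return NULL;
}

// Returns 1 if every row was claimed exactly once, 0 otherwise
int EveryRowOnce (int Nproc, int NRows)
{  Counter    Job;
   pthread_t *Tid = (pthread_t*) calloc (Nproc, sizeof *Tid);
   int        p, r, started = 0, ok = 1;

   Job.Claimed = (int*) calloc (NRows, sizeof *Job.Claimed);
   if ( Tid == NULL || Job.Claimed == NULL )
   {  free (Tid); free (Job.Claimed); return 0;  }
   pthread_mutex_init (&Job.Lock, NULL);
   Job.NextRow = 0;  Job.NRows = NRows;

   for ( p = 0; p < Nproc; p++ )
   {  if ( pthread_create (&Tid[p], NULL, Claim, &Job) != 0 )
         break;
      started++;
   }
   for ( p = 0; p < started; p++ )   // Join only the threads that started
      pthread_join (Tid[p], NULL);

   for ( r = 0; r < NRows; r++ )
      if ( Job.Claimed[r] != 1 )
         ok = 0;

   pthread_mutex_destroy (&Job.Lock);
   free (Job.Claimed);  free (Tid);
   return ok;
}
```

Without the lock and unlock calls, two threads could read the same value of NextRow and process the same row, or a row could be skipped; with them, each row number is handed out exactly once regardless of how the threads are scheduled.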

One can also examine page thrashing.  Although the matrix multiplication code is written to maximize localized referencing of memory, that can be frustrated simply by changing the threads so that they compute the result one column at a time rather than one row at a time.

MatPthreadCol.c    [.txt file]  Code to compute in a column-wise fashion.
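
The difference between the two orderings comes from how the matrices are laid out.  The call MatMult( A + Row*N2, ... ) in run shows that rows are stored contiguously, so stepping along a row touches adjacent memory while stepping down a column jumps a full row length each time.  The sketch below shows only the two access patterns; it is not taken from MatPthreadCol.c, and the function names SumByRows and SumByCols are hypothetical.

```c
#include <stddef.h>

// Sum an N1 x N2 row-major matrix one row at a time:
// consecutive accesses are adjacent in memory (stride 1).
double SumByRows (const double *M, int N1, int N2)
{  double Sum = 0.0;
   int    r, c;

   for ( r = 0; r < N1; r++ )
      for ( c = 0; c < N2; c++ )
         Sum += M[(size_t) r*N2 + c];
   return Sum;
}

// Same elements one column at a time:
// consecutive accesses are N2 doubles apart (stride N2).
double SumByCols (const double *M, int N1, int N2)
{  double Sum = 0.0;
   int    r, c;

   for ( c = 0; c < N2; c++ )
      for ( r = 0; r < N1; r++ )
         Sum += M[(size_t) r*N2 + c];
   return Sum;
}
```

Both functions visit the same elements; only the order differs.  For matrices much larger than the cache or physical memory, timing the two with wall-clock measurements is one way to observe the locality effect the paragraph above describes.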