UBCO COSC370, BSPlib project: Week 2: Finding the largest number (in parallel!)

In this exercise you will write a BSPlib program, for finding the maximum in an input array of numbers.

1 Finding the maximum

The goal of this week's tasks is to write a BSPlib programs that finds the largest integer in a distributed array of size \(p*M\). In practice, the input data would have to be distributed: typically one processor reads the data from disk and distributes it to the other processors using a broadcast. For the sake of this example, we will generate the distributed array of integers locally using the rand-function.

The algorithm to be implemented consists of 3 supersteps. The first superstep does some setup. In the second superstep, processes finds their the maximum of their part of the distributed array and sends it to process 0. In the last superstep, process 0 finds the global maximum element by selecting the largest number in the array of received local maxima.

In more details, the algorithm should be implemented like this:

  • (1: Setup)
    • Each process allocates and fills an array Input with \(M\) randomly chosen numbers.
    • Each process allocates an array Maxs of length \(p\) (as returned by bsp_nprocs() (man)).
      • Since we do not know the value of bsp_nprocs() before executing the program, we must dynamically allocate this array using calloc (man). Here is an example:
int *Maxs = calloc(bsp_nprocs(), sizeof(int));
  • The call to calloc will allocate bsp_nprocs() elements, each one of size sizeof(int). That is, an array of bsp_nprocs() integers. calloc then returns the adress of this allocation which is stored in the pointer Maxs.
  • The array Maxs will be used to communicate local maximas into, and must thus be registered using bsp_push_reg (man). Since Maxs is a pointer, you should write something like the following:
bsp_push_reg(Maxs, bsp_nprocs() * sizeof(int));

meaning: register the address stored in the pointer Maxs and the following bsp_nprocs() * sizeof(int) bytes.

  • The superstep is terminated using bsp_sync.
  • (2: Finding local maxima)
    • Each process iterates over the array Input and selects the largest number max.
    • Each process sends the largest number max into index bsp_pid() (man) of the array Maxs on process zero.
    • At this point, the registration of Maxs is no longer needed, and can be removed using bsp_pop_reg() (man).
    • The superstep is terminated using bsp_sync.
  • (3: Finding global maximum)
    • At this point, process 0 will have the local maxima of each processor in the array Maxs. To conclude the algorithm, process 0 iterates over this array to select the largest number, and then prints it using printf (man).
    • We then end the SPMD-section using the bsp_end()-function (man).

Here is an illustration of a run of the algorithm with 4 processors and \(M=3\), i.e. 3 elements per processors:

For each of the tasks that you complete, please make one separate source-file (i.e. keep each separate solution). Don't worry if you haven't got time to finish all the tasks.

Task 1: Implement the program as described above. For inspiration, you can study the implementation of inner product in BSPedupack, which has a very similar structure. Verify that the algorithm works by having each process print its local array.

Task 2: As described above, each process allocates the array Maxs of length \(p\). However, only processor 0 will actually store something in this array. In this case little memory is lost, but other contexts it might be interesting to try to save some space. Change the program so that the size of Maxs is zero if bsp_pid() \(\neq 0\) and verify that the program still works.

Task 3: (Ambitious). In the above description, the input data is generated locally which is not very realistic. Change the program so that the input array of \(p*M\) elements is stored in a text file, with numbers separated by spaces.

To read a file into a string, study fopen (man) and fgets (man). Once the file has been read into a string, each separate number can be read using strtok (man) and scanf (man).

Then in the first superstep, processor 0 reads this file into an array of size \(p*M\) and distributes it to the other processors. Here is the pseudo-code for performing the distribution:

// assuming the array Input has been registered in the previous superstep.
// assuming the array Read contains the p*M elements read from the input file
for i in 0 .. p*M - 1  :
   // using bsp_put, send the one integer Read[i] to processor i/M,
       into Input at offset i % M:
   bsp_put( i / bsp_nprocs() , &Read[i] , Input, i % m, sizeof(int));

Now the array Read has been distributed, and each local part is contained in the array Input. Then algorithm can then continue as before.

Back to overview.

Author: Arvid Jakobsson (arvid.jakobsson@huawei.com)

Created: 2018-02-19 Mon 15:27

Emacs 24.5.1 (Org mode 8.2.10)