Lab 5

Uploaded by

ims

Lab 5.

MPI – matrix operations and performance studies

Main goals of the assignment


• Understand the principles of collective communication.

• Learn about performance through experiments.

The problem to solve


General overview
An MPI-based parallel version of matrix-matrix multiplication is requested.

Background
See the description from Lab 2.
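
The Lab 2 description is not reproduced in this handout. As a reminder only, here is a minimal sketch of the sequential product C = A * B, assuming square n x n matrices stored row-major in flat arrays. The function name matmul_seq and the i-k-j loop order are choices made for this illustration and are not necessarily those used in Lab 2.

```c
#include <stddef.h>

/* Sequential product C = A * B for square n x n matrices stored
 * row-major in flat arrays (element (i, j) lives at index i * n + j).
 * The i-k-j loop order walks B and C along rows, which is usually
 * friendlier to the cache than the i-j-k order. */
void matmul_seq(const double *a, const double *b, double *c, size_t n)
{
    /* Clear the result first, since the loops below accumulate into it. */
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

The same kernel can later serve as a reference to check the values produced by the parallel version.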

How to parallelize
Similar to the solution from Lab 2.
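
Since the Lab 2 solution is not shown here, the following is only a sketch of one common way to cut a matrix into row bands, one band per process. It assumes row-major storage and contiguous bands; the helper name band_partition is hypothetical. The counts and displacements are expressed in matrix elements, which is the unit MPI_Scatterv and MPI_Gatherv expect when the datatype is MPI_DOUBLE.

```c
#include <assert.h>

/* Split the n rows of an n x n matrix into p contiguous bands, as
 * evenly as possible: the first (n % p) ranks get one extra row.
 * counts[r] is the number of elements in the band of rank r and
 * displs[r] is the offset of that band from the start of the matrix,
 * both in elements (rows * n). Each array must hold p entries. */
void band_partition(int n, int p, int *counts, int *displs)
{
    assert(n > 0 && p > 0);

    int base = n / p;   /* rows every rank receives */
    int extra = n % p;  /* leftover rows, one each to the first ranks */
    int offset = 0;

    for (int r = 0; r < p; r++) {
        int rows = base + (r < extra ? 1 : 0);
        counts[r] = rows * n;
        displs[r] = offset;
        offset += counts[r];
    }
}
```

With n = 1600 and p = 3, for instance, the bands would hold 534, 533, and 533 rows. When n is divisible by p, all bands are equal and the simpler MPI_Scatter and MPI_Gather calls are sufficient.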

To do
1. Recall the sequential code and OpenMP code from Lab 2 for multiplying two matrices.
2. Write the parallel code that uses MPI (hint: see the code in the textbook, bottom of page 316; note that
bands of all three matrices are split and distributed to the active processes, not only two matrices as in
Lab 2).
3. Record timestamps (hint: MPI_Wtime) at the start and the end of the code (excluding the final display
of the checked values).
4. Record the times for 1 up to the maximum number of available processors, for matrix dimensions of
1600, 2000, and 2400; compute the speedups and display them in a graph (as in Lab 2).
5. Compare the speedups obtained with MPI with those obtained with OpenMP when running on one computer.
Which one provides a faster response? Explain! (Remember that MPI is meant to be used across multiple servers, while
OpenMP targets a single multi-core system.)
6. Investigate whether another communication scheme is more efficient, e.g. allocate both matrices in one process and then
distribute the parts needed for the computation to the other processes. Does this scheme make the code
run faster?
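
For step 4, the speedup on p processes is S(p) = T(1) / T(p) and the efficiency is E(p) = S(p) / p, where T(p) is the measured wall-clock time. The sketch below computes both and prints a small table that can be fed to a plotting tool. The function names and the layout of the times array are assumptions made for this illustration; the times themselves must come from your own measurements (for example, the difference between two MPI_Wtime readings, which are in seconds).

```c
#include <stdio.h>

/* Speedup S(p) = T(1) / T(p). */
double speedup(double t1, double tp)
{
    return t1 / tp;
}

/* Efficiency E(p) = S(p) / p; 1.0 means ideal linear scaling. */
double efficiency(double t1, double tp, int p)
{
    return speedup(t1, tp) / p;
}

/* Print one row per process count. times[i] is assumed to hold the
 * wall-clock time, in seconds, measured with i + 1 processes. */
void print_speedup_table(const double *times, int max_procs)
{
    printf("%5s %12s %10s %12s\n", "procs", "time [s]", "speedup", "efficiency");
    for (int p = 1; p <= max_procs; p++) {
        double tp = times[p - 1];
        printf("%5d %12.4f %10.3f %12.3f\n",
               p, tp, speedup(times[0], tp), efficiency(times[0], tp, p));
    }
}
```

Producing one such table per matrix dimension (1600, 2000, 2400) and per technology (MPI, OpenMP) makes the comparison requested in step 5 straightforward.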
