Birla Institute of Technology and Science-Pilani, Hyderabad Campus
First Semester 2022-23
Comprehensive Examination
Course No CS F422 Course Title: Parallel Computing
Type: Closed Book Time: 3 hours Total Marks:80 (40% ) Date: 22/12/22
Attempt all the questions. Clearly mark the part of a question while answering like 1.A, 1.B,
and so on.
1. Suppose you are given n integers and p processors arranged in a hypercube connection. Also assume
that n/p is an integer greater than 1. [5 + 5 = 10]
A. Design a hypercube algorithm for computing the prefix sum of n integers.
B. Assuming that it takes time tadd to add two numbers, time ts to send a message of unit length
between two directly connected nodes and time tw to send a word to a neighbor, derive the
parallel time complexity.
W
2. If a problem of size W has a serial component WS , prove that WS is an upper bound on its speedup, no
matter how many processing elements are used. [5]
3. Suppose we are given a n × n integer matrix A and a vector B of size n. Also, we are given p processors
to work with. [5 + 5 = 10]
A. Describe a parallel formulation of matrix-vector multiplication with a schematic diagram in
which the matrix is 1-D block-partitioned along the columns and the vector is equally parti-
tioned among all the processes.
B. Show that the parallel runtime due to columnwise partitioning is asymptotically same as in
the case of rowwise 1-D block partitioning.
4. Suppose we are given n integers and p processors and our goal is to design an algorithm for adding these
n integers. [6 + 2 + 3 + 4 = 15]
n
A. Design a algorithm for computing the sum with parallel time complexity Tp = p + 2 log p.
Clearly write the steps of your algorithm.
B. What is the condition for the cost-optimality of the algorithm you have designed?
C. Compute the isoefficiency function for this algorithm.
D. Derive the number of processors such that the parallel time Tp is minimum.
5. In Bitonic sort, we perform sorting using a predefined sorting network. [3 + 5 + 2 = 10]
A. Given a random sequence 3, 6, 12, 4, 8, 10, 11, 1, convert this into a bitonic sequence of length
8. Clearly show the network.
B. Perform bitonic sort on the generated bitonic sequence to create a sorted sequence in ascending
order.
C. How many ⊕BM and how many ⊖BM are used in the entire process (creating bitonic sequence
and then sorting)?
2
6. Suppose the sequential runtime of an algorithm A is O(n2 ) and the parallel runtime is Θ( np )+Θ(n log p)
where p is the number of processors used. [6 + 2 + 2 = 10]
A. Show that minimum parallel runtime can be achieved if we use Θ( logn n ) processors.
B. To make the parallel algorithm cost-optimal, how many processors should we use?
C. Derive the isoefficiency function.
7. The sequential complexity of multiplying two n × n matrices using Strassen’s algorithm is Θ(n2.81 ).
Consider the simple matrix multiplication algorithm for multiplying two n×n matrices using p processes.
Assume that the √np × √np submatrices are multiplied using Strassen’s algorithm at each process.
[3 + 2 = 5]
A. Derive the parallel time complexity of the algorithm described using Strassen’s technique.
B. Explain if the parallel algorithm is cost optimal with respect to the Strassen’s algorithm for
matrix multiplication.
8. Consider DNS algorithm for matrix multiplication. [1 + 2 + 3 = 6]
A. What is the maximum number of processes the algorithm use?
B. Show the initial placement of 1st column (j = 1) of matrix A in the logical three dimensional
array P .
C. Show the placement of the elements of A in P after broadcasting the elements after placement
of 1st column of A to P .
9. [2 + 2 + 2 + 3 = 9]
A. Define Scaled speedup.
B. When do we consider a system as scalable in terms of scaled speedup?
C. What is memory-constrained scaled speedup?
D. Compute memory-constrained scaled speedup for matrix-vector multiplication with parallel
2
time Tp = tc np + ts log p + tw n.
Page 2