Algorithms for Parallel Machines
1) Questions that will be answered in the following sections
1 In what way is the design of parallel programs different from the
design of sequential programs?
2 Can we always write a sequential program first and then get it
to run on a parallel machine?
3 Is superlinear speedup achievable?
2) Speedup, Complexity and Cost
Speedup measures the degree of improvement in speed when a
problem is solved on a parallel machine as compared to a
sequential machine: speedup = (sequential execution time) /
(parallel execution time).
Complexity can be considered of two types:
1. Worst-case time complexity: the maximum time taken by a
program to execute over all inputs.
2. Expected time complexity: the average of execution times
over all inputs.
Analogously, we can define worst-case space complexity and
expected space complexity.
3) Histogram Computation
Given an image as an array Image[m][n] of integers in the range
[0, 255], find the histogram of the distribution of pixels over gray
scales in the image.
Sequential algorithm to solve the above problem:
For(i = 0; i < 256; i++)
    Histogram[i] = 0;
For(i = 0; i < m; i++)
    For(j = 0; j < n; j++){
        color = Image[i][j];
        Histogram[color]++;
    }
The complexity of the above program is O(mn).
Let us parallelize the above program in the following way.
Let there be p processes numbered 0 to (p-1) and m rows (m > p) in
the image. Each process is assigned (m/p) rows over which to
compute the histogram. The histogram is shared across the
processes for concurrent update. Hence the kth process executes
the following:
For(i = k*(m/p); i < (k+1)*(m/p); i++)
    For(j = 0; j < n; j++){
        color = Image[i][j];
        Mutex_begin();
        Histogram[color]++;
        Mutex_end();
    }
Still, the above program takes longer than the sequential one to
execute, because every update of the shared histogram serializes
on the mutex.
The better approach is for each process to independently compute
its own histogram, which can be done fully in parallel. After all
processes have done their work, a single process accumulates the
counts in the p separate histograms to produce the final result.
For(i = k*(m/p); i < (k+1)*(m/p); i++)
    For(j = 0; j < n; j++){
        color = Image[i][j];
        Hist[color][k]++;
    }
After all workers finish, a single process merges:
For(i = 0; i < 256; i++)
    For(j = 0; j < p; j++)
        Histogram[i] = Histogram[i] + Hist[i][j];
4) Parallel Reduction
Given a set of n values a0, a1, ..., a(n-1) and an associative
operator ⊕, reduction is the process of computing
a0 ⊕ a1 ⊕ ... ⊕ a(n-1).
Examples of associative operators: addition, multiplication, etc.
The sequential reduction algorithm (for addition) is:
a) Initialise sum = 0. b) Iterate over the n input values, adding
each to sum.
For parallel processing we view reduction as a tree-structured
operation. The input elements are placed at the leaf nodes of a
binary tree. The reduction operator is applied to the children of
each parent node and the result is propagated towards the root of
the tree. When the result at the root node is complete, the work of
the algorithm is over.
Analysis of parallel reduction:
There is a dependency across levels of the reduction: a higher
level cannot proceed until reduction at the lower level is complete.
Hence, after each step, the algorithm needs the equivalent of a
barrier operation. With enough processes, n values are reduced in
O(log n) steps.
5) Quadrature Problems
For a function y = f(x), we want to find the area under the curve.
The range of methods for numerically computing this integral is
referred to as quadrature methods.
E.g.: the trapezoidal rule.
The trapezoidal rule statically subdivides the domain into
uniformly spaced partitions. The integral is estimated by summing
the areas of trapezia approximating the area under the curve.
It is parallelised by allotting non-overlapping ranges of the domain
to different processes. Each process finds the local sum of areas in
its allotted range of the domain, and finally the global sum is
updated.
Problems:
Accuracy depends on the function: since the domain is divided
statically, regions where the function varies sharply are not
sampled any more finely than the rest, so accuracy suffers there.
Refining the partition everywhere to compensate increases the
complexity.
Solution 1: Adaptive Quadrature Algorithm
It uses the divide and conquer method.
To find the integral of a function y = f(x) over the range [a, b],
we take the following steps:
1. Let A = area of the trapezium over the end points [a, b],
B = area of the trapezium over [a, (a+b)/2],
C = area of the trapezium over [(a+b)/2, b].
2. Sub-divide the range [a, b] if |A - (B+C)| >= ε, for some chosen
tolerance ε > 0.
3. Assign the responsibility of one sub-domain to another process
and take the other part for local processing.
4. If the condition in step 2 does not hold, compute the quadrature
in the same process and update the global result.
Solution 2: Self-Scheduling Implementation
This scheme creates a group of worker processes; the workers
schedule themselves dynamically, picking up available work from a
shared global stack. When there is no more work to be done, the
workers terminate. This implementation strategy is called
self-scheduling.
The algorithm is:
The parent process:
1) Pushes the initial task on the shared stack.
2) Creates the worker processes.
Each worker process:
1) Pops a task from the stack and starts working on it.
2) If the task cannot be solved immediately, partitions it further,
pushes the subtasks on the stack, and goes back to step 1.
3) If there is no space on the stack, the process must solve the
task sequentially and update the global result.
4) All processes terminate when there is no more work to be done.
6) Matrix Multiplication
Sequential algorithm to multiply an l×m matrix A by an m×n
matrix B into an l×n matrix C:
For(i = 0; i < l; i++)
    For(j = 0; j < n; j++){
        C[i][j] = 0;
        For(k = 0; k < m; k++)
            C[i][j] = C[i][j] + A[i][k]*B[k][j];
    }
Its complexity is O(lmn).