Lecture on Parallelization

I. Basic Parallelization
II. Data dependence analysis
III. Interprocedural parallelization

Chapter 11.1-11.1.4

Carnegie Mellon
M. Lam & S. Liao CS243: Parallelization 1
Parallelization of Numerical Applications
• DoAll loop parallelism
– Find loops whose iterations are independent
– Number of iterations typically scales with the problem
– Usually much larger than the number of processors in a machine
– Divide up iterations across machines
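
As a minimal sketch of "dividing up iterations", the function below splits an inclusive iteration range into contiguous blocks, one per processor. The name `block_partition` and the block (rather than cyclic) distribution are assumptions made for illustration; the slides do not prescribe a scheduling policy.

```python
def block_partition(lo, hi, num_procs):
    """Split the inclusive iteration range lo..hi into num_procs contiguous blocks.

    Returns a list of (first, last) pairs. Blocks differ in size by at most one;
    a block with first > last is empty (more processors than iterations).
    """
    total = hi - lo + 1
    base, extra = divmod(total, num_procs)
    blocks = []
    start = lo
    for p in range(num_procs):
        # The first `extra` processors take one additional iteration each.
        size = base + (1 if p < extra else 0)
        blocks.append((start, start + size - 1))
        start += size
    return blocks


if __name__ == "__main__":
    # A 100-iteration DoAll loop on 4 processors.
    print(block_partition(1, 100, 4))
```

Because the iterations of a DoAll loop are independent, any assignment of blocks to processors gives the same result; the choice only affects load balance and locality.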

Basic Parallelism
Examples:
FOR i = 1 TO 100
    A[i] = B[i] + C[i]

FOR i = 11 TO 20
    a[i] = a[i-1] + 3

FOR i = 11 TO 20
    a[i] = a[i-10] + 3

• Does there exist a data dependence edge between two different iterations?
• A data dependence edge is loop-carried if it crosses iteration boundaries
• DoAll loops: loops without loop-carried dependences
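
The three example loops can be checked by brute force. The sketch below is illustrative only: it enumerates iterations directly, which works for small constant bounds but is not how a compiler decides the question. The helper name `has_loop_carried_dependence` and the encoding of an array element as an `(array, index)` pair are assumptions.

```python
def has_loop_carried_dependence(iterations, reads, writes):
    """Return True if two different iterations touch the same array element
    and at least one of the two accesses is a write.

    reads(i) and writes(i) return the sets of elements accessed in iteration i.
    """
    its = list(iterations)
    for i1 in its:
        for i2 in its:
            if i1 == i2:
                continue  # only dependences that cross iteration boundaries
            if writes(i1) & (reads(i2) | writes(i2)):
                return True
    return False


# FOR i = 1 TO 100: A[i] = B[i] + C[i]
loop1 = has_loop_carried_dependence(
    range(1, 101), lambda i: {("B", i), ("C", i)}, lambda i: {("A", i)})
# FOR i = 11 TO 20: a[i] = a[i-1] + 3
loop2 = has_loop_carried_dependence(
    range(11, 21), lambda i: {("a", i - 1)}, lambda i: {("a", i)})
# FOR i = 11 TO 20: a[i] = a[i-10] + 3
loop3 = has_loop_carried_dependence(
    range(11, 21), lambda i: {("a", i - 10)}, lambda i: {("a", i)})

if __name__ == "__main__":
    print(loop1, loop2, loop3)  # False True False
```

Only the second loop carries a dependence: iteration i reads the element written by iteration i-1. The third loop reads a[1..10] and writes a[11..20], so the two ranges never overlap and it is a DoAll loop.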

Recall: Data Dependences
• True dependence (write, then read):
    a =
      = a
• Anti-dependence (read, then write):
      = a
    a =
• Output dependence (write, then write):
    a =
    a =

Affine Array Accesses
• Common patterns of data accesses (i, j, k are loop indexes):
    A[i], A[j], A[i-1], A[0], A[i+j], A[2*i], A[2*i+1]
    A[i,j], A[i-1, j+1]
• Array indexes are affine expressions of surrounding loop indexes
– Loop indexes: $i_n, i_{n-1}, \dots, i_1$
– Integer constants: $c_n, c_{n-1}, \dots, c_0$
– Array index: $c_n i_n + c_{n-1} i_{n-1} + \dots + c_1 i_1 + c_0$
– Affine expression: linear expression + a constant term ($c_0$)
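
To make the definition concrete, the following sketch stores one array subscript as a coefficient tuple plus a constant and evaluates it for given loop index values. The function name `affine_index` and the tuple encoding are assumptions chosen for illustration.

```python
def affine_index(coeffs, const, loop_indexes):
    """Evaluate c_n*i_n + ... + c_1*i_1 + c_0 for concrete loop index values.

    coeffs and loop_indexes are sequences of equal length, listed in the same order.
    """
    if len(coeffs) != len(loop_indexes):
        raise ValueError("need one coefficient per loop index")
    return sum(c * i for c, i in zip(coeffs, loop_indexes)) + const


if __name__ == "__main__":
    print(affine_index((2,), 1, (3,)))      # A[2*i+1] at i = 3      -> 7
    print(affine_index((1, 1), 0, (2, 5)))  # A[i+j] at (i, j)=(2,5) -> 7
    print(affine_index((0,), 0, (9,)))      # A[0] at any i          -> 0
```

A subscript such as `i*i` has no such representation, which is why it falls outside the affine domain.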

II. Formulating Data Dependence Analysis
FOR i := 2 TO 5 DO
    A[i-2] = A[i] + 1;

• Between read access A[i] and write access A[i-2] there is a dependence if:
– there exist two iterations $i_r$ and $i_w$ within the loop bounds, s.t.
– iterations $i_r$ and $i_w$ read and write the same array element, respectively

$\exists$ integers $i_w, i_r$: $2 \le i_w, i_r \le 5$, $i_r = i_w - 2$

• Between write access A[i-2] and write access A[i-2] there is a dependence if:

$\exists$ integers $i_w, i_v$: $2 \le i_w, i_v \le 5$, $i_w - 2 = i_v - 2$

– To rule out the case when the same instance depends on itself:
• add constraint $i_w \ne i_v$
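
The two existential formulas can be checked by enumerating every pair of iterations inside the bounds. This is a sketch for the small constant bounds of this example; the helper name `dependence_witnesses` and its `distinct` flag are hypothetical.

```python
def dependence_witnesses(lo, hi, f1, f2, distinct=False):
    """List iteration pairs (i1, i2) in lo..hi with f1(i1) == f2(i2).

    f1 and f2 map an iteration to the array element it accesses.
    With distinct=True, pairs with i1 == i2 are excluded.
    """
    return [(i1, i2)
            for i1 in range(lo, hi + 1)
            for i2 in range(lo, hi + 1)
            if f1(i1) == f2(i2) and not (distinct and i1 == i2)]


# Write A[i-2] at iteration i_w versus read A[i] at iteration i_r.
read_write = dependence_witnesses(2, 5, lambda iw: iw - 2, lambda ir: ir)
# Write A[i-2] versus itself, excluding the same instance (i_w != i_v).
write_write = dependence_witnesses(2, 5, lambda iw: iw - 2,
                                   lambda iv: iv - 2, distinct=True)

if __name__ == "__main__":
    print(read_write)   # [(4, 2), (5, 3)]
    print(write_write)  # []
```

The pairs $(i_w, i_r) = (4, 2)$ and $(5, 3)$ satisfy $i_r = i_w - 2$, so the read and the write are dependent. No two different iterations write the same element, so there is no output dependence.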

Memory Disambiguation is Undecidable at Compile Time

read(n)
For i =
a[i] = a[n]

Domain of Data Dependence Analysis
• Only use loop bounds and array indexes that are affine functions of loop variables

for i = 1 to n
    for j = 2i to 100
        a[i+2j+3][4i+2j][i*i] = …
        … = a[1][2i+1][j]

• Assume a data dependence between the read & write operation if there exists:
– a read instance with indexes $i_r, j_r$ and
– a write instance with indexes $i_w, j_w$

$\exists$ integers $i_r, j_r, i_w, j_w$:
$1 \le i_w, i_r \le n$, $2 i_w \le j_w \le 100$, $2 i_r \le j_r \le 100$,
$i_w + 2 j_w + 3 = 1$, $4 i_w + 2 j_w = 2 i_r + 1$

• Equate each dimension of array access; ignore non-affine ones
– No solution → No data dependence
– Solution → there may be a dependence
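
For this particular example the system can be settled by hand. The short derivation below is an added worked step, not part of the original slide; it uses only the bounds and equalities listed above.

```latex
\begin{align*}
i_w \ge 1,\quad j_w \ge 2 i_w \ge 2
  &\;\Longrightarrow\; i_w + 2 j_w + 3 \;\ge\; 1 + 4 + 3 \;=\; 8 \;>\; 1, \\
4 i_w + 2 j_w = 2\,(2 i_w + j_w) \text{ is even},\quad 2 i_r + 1 \text{ is odd}
  &\;\Longrightarrow\; 4 i_w + 2 j_w \;\ne\; 2 i_r + 1 .
\end{align*}
```

Either line alone shows that the system has no integer solution, so the read and the write are independent. The first argument relies on the loop bounds, while the second is a parity argument that holds for any bounds.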

Complexity of Data Dependence Analysis
For every pair of accesses, not necessarily distinct, $(F_1, f_1)$ and $(F_2, f_2)$, one of which must be a write operation:
Let $B_1 i_1 + b_1 \ge 0$, $B_2 i_2 + b_2 \ge 0$ be the corresponding loop bound constraints,

$\exists$ integers $i_1, i_2$: $B_1 i_1 + b_1 \ge 0$, $B_2 i_2 + b_2 \ge 0$, $F_1 i_1 + f_1 = F_2 i_2 + f_2$

• If the accesses are not distinct, then add the constraint $i_1 \ne i_2$
• Equivalent to integer linear programming

$\exists$ integer $i$: $A_1 i \le b_1$, $A_2 i = b_2$

• Integer linear programming is NP-complete
– O(size of the coefficients) or $O(n^n)$

Carnegie Mellon
CS243: Parallelization 9 M. Lam & S. Liao
Data Dependence Analysis Algorithm

• Typically solving many tiny, repeated problems
– Integer linear programming packages optimize for large problems
– Use memoization to remember the results of simple tests

• Apply a series of relatively simple tests
– GCD: 2*i, 2*i+1; GCD for simultaneous equations
– Test if the ranges overlap

• Backed up by a more expensive algorithm
– Use Fourier-Motzkin Elimination to test if there is a real solution
• Keep eliminating variables to see if a solution remains
• If there is no real solution, then there is no integer solution
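
The GCD test mentioned above can be sketched in a few lines. It relies on the standard fact that a linear equation with integer coefficients has an integer solution exactly when the GCD of the coefficients divides the constant. The function name `gcd_test` is an assumption, and the sketch ignores loop bounds, so a `True` result only means a dependence may exist.

```python
from functools import reduce
from math import gcd


def gcd_test(coeffs, const):
    """Return True if sum(coeffs[k] * x[k]) == const has an integer solution.

    Loop bounds are ignored, so True means "dependence possible",
    while False proves independence.
    """
    g = reduce(gcd, (abs(c) for c in coeffs), 0)
    if g == 0:
        # All coefficients are zero: the equation reads 0 == const.
        return const == 0
    return const % g == 0


if __name__ == "__main__":
    # A[2*i] versus A[2*i+1]: 2*i_w = 2*i_r + 1, i.e. 2*i_w - 2*i_r = 1
    print(gcd_test([2, -2], 1))   # False: the accesses never overlap
    # a[i] versus a[i-1]: i_w = i_r - 1, i.e. i_w - i_r = -1
    print(gcd_test([1, -1], -1))  # True: a dependence is possible
```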

Fourier-Motzkin Elimination

• To eliminate a variable from a set of linear inequalities.
• To eliminate a variable $x_1$
– Rewrite all expressions in terms of lower or upper bounds of $x_1$
– Create a transitive constraint for each pair of lower and upper bounds.
• Example: Let L, U be lower bounds and upper bounds resp.
– To eliminate $x_1$:

$L_1(x_2, \dots, x_n) \le x_1 \le U_1(x_2, \dots, x_n)$
$L_2(x_2, \dots, x_n) \le x_1 \le U_2(x_2, \dots, x_n)$

becomes

$L_1(x_2, \dots, x_n) \le U_1(x_2, \dots, x_n)$
$L_1(x_2, \dots, x_n) \le U_2(x_2, \dots, x_n)$
$L_2(x_2, \dots, x_n) \le U_1(x_2, \dots, x_n)$
$L_2(x_2, \dots, x_n) \le U_2(x_2, \dots, x_n)$
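
The elimination step can be sketched as follows. This is a minimal illustration, not a production solver: constraints are assumed to be given in the form $\sum_j a_j x_j \le b$ as `(coefficients, bound)` pairs, exact rationals are used to avoid rounding, and the names `eliminate` and `has_real_solution` are hypothetical. No redundancy removal is attempted, so the constraint set can grow quickly.

```python
from fractions import Fraction


def eliminate(constraints, k):
    """Project variable k out of constraints of the form sum(a[j]*x[j]) <= b.

    Each constraint is a pair (a, b). In the result, x[k] has coefficient 0.
    """
    lowers, uppers, rest = [], [], []
    for a, b in constraints:
        a = [Fraction(c) for c in a]
        b = Fraction(b)
        if a[k] > 0:
            # Scale so x[k] has coefficient +1: an upper bound on x[k].
            uppers.append(([c / a[k] for c in a], b / a[k]))
        elif a[k] < 0:
            # Scale by the positive value -a[k] so x[k] has coefficient -1:
            # a lower bound on x[k].
            lowers.append(([c / -a[k] for c in a], b / -a[k]))
        else:
            rest.append((a, b))
    # Adding a lower bound (-1) to an upper bound (+1) cancels x[k];
    # this is the transitive constraint L <= U for that pair.
    for la, lb in lowers:
        for ua, ub in uppers:
            rest.append(([x + y for x, y in zip(la, ua)], lb + ub))
    return rest


def has_real_solution(constraints, num_vars):
    """Eliminate every variable, then check the remaining 0 <= b constraints."""
    for k in range(num_vars):
        constraints = eliminate(constraints, k)
    return all(b >= 0 for _, b in constraints)


if __name__ == "__main__":
    # 1 <= x <= 5 is feasible; x >= 3 together with x <= 2 is not.
    print(has_real_solution([([-1], -1), ([1], 5)], 1))  # True
    print(has_real_solution([([-1], -3), ([1], 2)], 1))  # False
    # 2x = 1 has the real solution x = 1/2 but no integer solution.
    print(has_real_solution([([2], 1), ([-2], -1)], 1))  # True
```

The last call shows why a positive answer is inconclusive: the test decides feasibility over the reals, so only a negative answer proves that no integer solution exists.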

Example
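FOR i = 1 TO 5
    FOR j = i+1 TO 5
        A[i,j] = f(A[i,i], A[i-1,j])

Loop bound constraints for two iteration instances $(i, j)$ and $(i', j')$:

$1 \le i$          $1 \le i'$
$i \le 5$          $i' \le 5$
$i + 1 \le j$      $i' + 1 \le j'$
$j \le 5$          $j' \le 5$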

Data Dependence Analysis Algorithm
• Typically solving many tiny, repeated problems
– Integer linear programming packages optimize for large problems
– Use memoization to remember the results of simple tests

• Apply a series of relatively simple tests
– GCD: 2*i, 2*i+1; GCD for simultaneous equations
– Test if the ranges overlap

• Backed up by a more expensive algorithm
– Use Fourier-Motzkin Elimination to test if there is a real solution
• Keep eliminating variables to see if a solution remains
• Add heuristics to encourage finding an integer solution.
– Create 2 subproblems if a real, but not integer, solution is found.
• For example, if x = .5 is a solution, create two problems by adding x ≤ 0 and x ≥ 1 respectively to the original constraints.

Relaxing Dependences
Privatization:
• Scalar
for i = 1 to n
    t = (A[i] + B[i]) / 2;
    C[i] = t * t;
• Array
for i = 1 to n
    for j = 1 to n
        t[j] = (A[i,j] + B[i,j]) / 2;
    for j = 1 to n
        C[i,j] = t[j] * t[j];
Reduction:
for i = 1 to n
    sum = sum + A[i];
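
The reduction loop carries a dependence on `sum`, yet it can still run in parallel because addition is associative: each worker accumulates a private partial sum, and the partial sums are combined at the end. The sketch below shows that structure. The name `parallel_sum`, the chunking scheme, and the use of a thread pool are assumptions; in CPython the threads illustrate the shape of the transformation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, num_workers=4):
    """Sum values using one private partial sum per chunk, then combine them."""
    n = len(values)
    chunk = max(1, -(-n // num_workers))  # ceiling division, at least 1
    chunks = [values[s:s + chunk] for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each call to sum() works on its own chunk: a privatized accumulator.
        partials = list(pool.map(sum, chunks))
    # Final combine step of the reduction.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # 5050
```

With floating-point data the result may differ slightly from the sequential loop, since reordering the additions changes the rounding.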

Interprocedural Parallelization
• Why? Amdahl’s Law
• Interprocedural symbolic analysis
– Find interprocedural array indexes which are affine expressions of outer loop indices
• Interprocedural parallelization analysis
– Data dependence based on summaries of array regions accessed
• If the regions do not intersect, there is parallelism
– Find privatizable scalar variables and arrays
– Find scalar and array reductions
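
The appeal to Amdahl's Law can be made concrete with its usual formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of execution time that is parallelized and n is the number of processors. The formula is the standard statement of the law rather than something given on the slide, and the function name `amdahl_speedup` is an assumption.

```python
def amdahl_speedup(parallel_fraction, num_procs):
    """Upper bound on speedup when parallel_fraction of the run time is
    spread perfectly over num_procs processors and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / num_procs)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(p, round(amdahl_speedup(p, 16), 2))
```

Even with 90% of the time parallelized, the speedup can never exceed 10 regardless of processor count, which is why parallelizing large outer loops that span procedure calls matters.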

Conclusions
• Basic parallelization
– DoAll loop: loops with no loop-carried data dependences
– Data dependence for affine loop indexes = integer linear programming
• Coarse-grain parallelism because of Amdahl’s Law
– Interprocedural analysis is useful for affine indices
– Ask users for help on unresolved dependences
