L19 Parallelization

The document discusses data dependence in parallel programming, defining four types: flow, anti, output, and input dependence. It explains the implications of these dependencies on the execution order of statements and introduces the concept of dependence graphs. Additionally, it presents examples and problem formulations to illustrate how dependence testing can be framed as an integer linear programming problem.


Lecture 19
Array Dependence Analysis & Parallelization
[ALSU 11.6]
Phillip B. Gibbons, 15-745 Optimizing Compilers

Data Dependence

S1: A = 1.0
S2: B = A * 2.0
S3: A = C * D
S4: A = B / C

We define four types of data dependence.

• Flow (true) dependence: a statement Si precedes a statement Sj in execution and Si computes a data value that Sj uses.
• Implies that Si must execute before Sj.

Si δt Sj (S1 δt S2 and S2 δt S4)

• Anti dependence: a statement Si precedes a statement Sj in execution and Si uses a data value that Sj computes.
• It implies that Si must be executed before Sj.

Si δa Sj (S2 δa S3)

• Output dependence: a statement Si precedes a statement Sj in execution and Si computes a data value that Sj also computes.
• It implies that Si must be executed before Sj.

Si δo Sj (S1 δo S3 and S3 δo S4)
• Input dependence: a statement Si precedes a statement Sj in execution and Si uses a data value that Sj also uses.
• Does this imply that Si must execute before Sj?

Si δI Sj (S3 δI S4)

Data Dependence (continued)

• The dependence is said to flow from Si to Sj because Si precedes Sj in execution.
• Si is said to be the source of the dependence; Sj is said to be the sink of the dependence.
• The only "true" dependence is flow dependence; it represents the flow of data in the program.
• The other types of dependence are caused by programming style; they may be eliminated by renaming:

S1: A = 1.0
S2: B = A * 2.0
S3: A1 = C * D
S4: A2 = B / C

Data Dependence (continued)

• Data dependence in a program may be represented using a dependence graph G = (V, E), where the nodes V represent statements in the program and the directed edges E represent dependence relations.

For the running example, the graph has nodes S1..S4 and edges:

S1 δt S2, S2 δt S4 (flow)
S2 δa S3 (anti)
S1 δo S3, S3 δo S4 (output)
S3 δI S4 (input)
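A minimal sketch (the read/write sets and helper names are mine, not from the lecture) that derives these edge types mechanically from each statement's reads and writes. Note that this naive pairwise pass also reports dependences masked by an intervening write (for example, it emits S1 δo S4 and S2 δa S4, which the graph above omits because S3 rewrites A in between):

    statements = [
        ("S1", {"A"}, set()),         # A = 1.0        (name, writes, reads)
        ("S2", {"B"}, {"A"}),         # B = A * 2.0
        ("S3", {"A"}, {"C", "D"}),    # A = C * D
        ("S4", {"A"}, {"B", "C"}),    # A = B / C
    ]

    edges = []
    for x, (si, wi, ri) in enumerate(statements):
        for sj, wj, rj in statements[x + 1:]:
            if wi & rj: edges.append((si, sj, "flow"))    # Si writes what Sj reads
            if ri & wj: edges.append((si, sj, "anti"))    # Si reads what Sj writes
            if wi & wj: edges.append((si, sj, "output"))  # both write a location
            if ri & rj: edges.append((si, sj, "input"))   # both read a location

    print(edges)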
Value or Location?

• There are two ways a dependence is defined: value-oriented or location-oriented.

Example 1

for i = 2 to 4 {
S1: a[i] = b[i] + c[i];
S2: d[i] = a[i]
}

[Figure: instances S1[i] and S2[i] for i = 2, 3, 4, each pair linked by a δt edge on a[i]]

• There is an instance of S1 that precedes an instance of S2 in execution and S1 produces data that S2 consumes.
• S1 is the source of the dependence; S2 is the sink of the dependence.
• The dependence flows between instances of statements in the same iteration (loop-independent dependence).
• The number of iterations between source and sink (dependence distance) is 0. The dependence direction is =.

S1 δt S2; with the distance made explicit, S1 δt(0) S2.

Example 2

do i = 2, 4
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i-1)
end do

[Figure: δt edges from S1[i] to S2[i+1] on a(i), crossing iterations]

• There is an instance of S1 that precedes an instance of S2 in execution and S1 produces data that S2 consumes.
• S1 is the source of the dependence; S2 is the sink of the dependence.
• The dependence flows between instances of statements in different iterations (loop-carried dependence).
• The dependence distance is 1. The direction is positive (<).

S1 δt S2; with the distance made explicit, S1 δt(1) S2.

Example 3

do i = 2, 4
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i+1)
end do

[Figure: δa edges from S2[i] to S1[i+1] on a(i+1), crossing iterations]

• There is an instance of S2 that precedes an instance of S1 in execution and S2 consumes data that S1 produces.
• S2 is the source of the dependence; S1 is the sink of the dependence.
• The dependence is loop-carried.
• The dependence distance is 1.

S2 δa S1; with the distance made explicit, S2 δa(1) S1.

Are you sure you know why it is S2 δa(<) S1 even though S1 appears before S2 in the code?

Example 4

do i = 2, 4
  do j = 2, 4
S:  a(i,j) = a(i-1,j+1)
  end do
end do

[Figure: 3x3 iteration space of instances S[i,j]; δt edges run from S[i,j] to S[i+1,j-1]]

• An instance of S precedes another instance of S in execution, and S produces data that S consumes.
• S is both source and sink of the dependence.
• The dependence is loop-carried.
• The dependence distance is (1,-1).

S δt S; with direction and distance made explicit, S δt(<,>) S or S δt(1,-1) S.
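The iteration-space picture for Example 4 can be checked directly by enumeration. A minimal sketch (the names write, read, and dists are mine):

    # Example 4: S: a(i,j) = a(i-1,j+1), with 2 <= i, j <= 4
    write = lambda i, j: (i, j)           # subscripts written at iteration (i,j)
    read  = lambda i, j: (i - 1, j + 1)   # subscripts read at iteration (i,j)

    space = [(i, j) for i in range(2, 5) for j in range(2, 5)]
    dists = {(snk[0] - src[0], snk[1] - src[1])
             for src in space for snk in space
             if write(*src) == read(*snk)}
    print(dists)   # {(1, -1)}: distance (1,-1), direction (<,>)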
Problem Formulation

Consider the following perfect nest of depth d:

do I1 = L1, U1
  do I2 = L2, U2
    ...
    do Id = Ld, Ud
      a(f1(I), f2(I), ..., fm(I)) = ...
      ... = a(g1(I), g2(I), ..., gm(I))
    end do
    ...
  end do
end do

where the iteration vector is I = (I1, I2, ..., Id), the bounds vectors are L = (L1, L2, ..., Ld) and U = (U1, U2, ..., Ud), and each subscript function is linear in the loop indices:

b0 + b1*I1 + b2*I2 + ... + bd*Id

In an array reference a(..., fk(I), ...), k is the subscript position and fk(I) is the subscript expression (subscript function).

Dependence will exist if there exist two iteration vectors k and j such that L ≤ k ≤ j ≤ U and:

f1(k) = g1(j)  and  f2(k) = g2(j)  and  ...  and  fm(k) = gm(j)

That is:

f1(k) - g1(j) = 0  and  f2(k) - g2(j) = 0  and  ...  and  fm(k) - gm(j) = 0

Problem Formulation - Example

do i = 2, 4
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i-1)
end do

• Does there exist two iteration vectors i1 and i2, such that 2 ≤ i1 ≤ i2 ≤ 4 and such that i1 = i2 - 1?
• Answer: yes; i1=2 & i2=3, and i1=3 & i2=4.
• Hence, there is dependence!
• The dependence distance vector is i2 - i1 = 1.
• The dependence direction vector is sign(1) = <.

Problem Formulation - Example

do i = 2, 4
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i+1)
end do

• Does there exist two iteration vectors i1 and i2, such that 2 ≤ i1 ≤ i2 ≤ 4 and such that i1 = i2 + 1?
• Answer: yes; i1=3 & i2=2, and i1=4 & i2=3. (But, but!)
• Hence, there is dependence!
• The dependence distance vector is i2 - i1 = -1.
• The dependence direction vector is sign(-1) = >.
• Is this possible? (Note the listed pairs violate i1 ≤ i2: a ">" direction for a flow dependence is impossible. As in Example 3, this is really an anti dependence from S2 to S1 with distance 1.)
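When the bounds are tiny, the "does there exist" question above can be settled by brute-force enumeration. A minimal sketch (the function name is mine; real dependence testers solve the system symbolically rather than enumerating):

    def brute_force_deps(L, U, f, g):
        """f: write subscript, g: read subscript; return all iteration
        pairs (i1, i2) that touch the same array element."""
        return [(i1, i2) for i1 in range(L, U + 1)
                         for i2 in range(L, U + 1) if f(i1) == g(i2)]

    # a(i) written, a(i-1) read: i1 = i2 - 1
    print(brute_force_deps(2, 4, lambda i: i, lambda i: i - 1))  # [(2,3),(3,4)]
    # a(i) written, a(i+1) read: i1 = i2 + 1
    print(brute_force_deps(2, 4, lambda i: i, lambda i: i + 1))  # [(3,2),(4,3)]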

Problem Formulation - Example

do i = 1, 10
S1: a(2*i) = b(i) + c(i)
S2: d(i) = a(2*i+1)
end do

• Does there exist two iteration vectors i1 and i2, such that 1 ≤ i1 ≤ i2 ≤ 10 and such that 2*i1 = 2*i2 + 1?
• Answer: no; 2*i1 is even and 2*i2 + 1 is odd.
• Hence, there is no dependence!

Problem Formulation

• Dependence testing is equivalent to an integer linear programming (ILP) problem of 2d variables and m+d constraints!
• An algorithm that determines if there exist two iteration vectors k and j that satisfy these constraints is called a dependence tester.
• The dependence distance vector is given by j - k.
• The dependence direction vector is given by sign(j - k).
• Dependence testing is NP-complete!
• A dependence test that reports dependence only when there is dependence is said to be exact. Otherwise it is inexact.
• A dependence test must be conservative; if the existence of dependence cannot be ascertained, dependence must be assumed.

Dependence Testers

• Lamport's Test
• GCD Test
• Banerjee's Inequalities
• Generalized GCD Test
• Power Test
• I-Test
• Omega Test
• Delta Test
• Stanford Test
• etc.
Lamport's Test

• Lamport's Test is used when there is a single index variable in the subscript expressions, and when the coefficients of the index variable in both expressions are the same:

A(..., b*i + c1, ...) = ...
... = A(..., b*i + c2, ...)

• The dependence problem: does there exist i1 and i2, such that Li ≤ i1 ≤ i2 ≤ Ui and b*i1 + c1 = b*i2 + c2? Equivalently, is i2 - i1 = (c1 - c2)/b?
• There is an integer solution if and only if (c1 - c2)/b is an integer.
• The dependence distance is d = (c1 - c2)/b, provided |d| fits within the iteration range (|d| ≤ Ui - Li).
• d > 0 ⇒ true dependence; d = 0 ⇒ loop-independent dependence; d < 0 ⇒ anti dependence.

Lamport's Test - Example

do i = 1, n
  do j = 1, n
S:  a(i,j) = a(i-1,j+1)
  end do
end do

For the i subscript: i1 = i2 - 1? Here b = 1, c1 = 0, c2 = -1, so (c1 - c2)/b = 1. There is dependence; distance (i) is 1.
For the j subscript: j1 = j2 + 1? Here b = 1, c1 = 0, c2 = 1, so (c1 - c2)/b = -1. There is dependence; distance (j) is -1.

S δt(1,-1) S, or S δt(<,>) S.
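A minimal sketch of the test (the function name is mine, and the distance is checked against the iteration range rather than the slide's bound form):

    def lamport_test(b, c1, c2, L, U):
        """Subscript pair b*i + c1 (write) vs. b*i + c2 (read), bounds L..U."""
        if (c1 - c2) % b != 0:
            return None                    # no integer solution: no dependence
        d = (c1 - c2) // b                 # the dependence distance
        if abs(d) > U - L:
            return None                    # sink would fall outside the loop
        if d > 0:  return ("true", d)
        if d == 0: return ("loop-independent", 0)
        return ("anti", d)

    # n taken as 100 for the demo
    print(lamport_test(1, 0, -1, 1, 100))  # ('true', 1): the i subscripts above
    print(lamport_test(1, 0, 1, 1, 100))   # ('anti', -1): the j subscript, in isolation
    print(lamport_test(2, 0, 1, 1, 100))   # None: previews the next example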

Lamport's Test - Example

do i = 1, n
  do j = 1, n
S:  a(i,2*j) = a(i-1,2*j+1)
  end do
end do

For the i subscript: i1 = i2 - 1? Here b = 1, c1 = 0, c2 = -1, so (c1 - c2)/b = 1: there is dependence in i, with distance 1.
For the j subscript: 2*j1 = 2*j2 + 1? Here b = 2, c1 = 0, c2 = 1, so (c1 - c2)/b = -1/2, which is not an integer: there is no dependence in j.

Since a dependence requires every subscript position to match simultaneously, there is no dependence!

GCD Test

• Given a linear Diophantine equation

a1*x1 + a2*x2 + ... + an*xn = c,  where the ai and c are integers,

an integer solution exists if and only if gcd(a1, a2, ..., an) divides c.

• Problems:
– ignores loop bounds
– gives no information on distance or direction of dependence
– often gcd(...) is 1, which always divides c, resulting in false dependences
GCD Test - Example

do i = 1, 10
S1: a(2*i) = b(i) + c(i)
S2: d(i) = a(2*i-1)
end do

• Does there exist two iteration vectors i1 and i2, such that 1 ≤ i1 ≤ i2 ≤ 10 and such that 2*i1 = 2*i2 - 1, or 2*i2 - 2*i1 = 1?
• There will be an integer solution if and only if gcd(2,-2) divides 1.
• This is not the case, and hence, there is no dependence!

GCD Test - Example

do i = 1, 10
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i-100)
end do

• Does there exist two iteration vectors i1 and i2, such that 1 ≤ i1 ≤ i2 ≤ 10 and such that i1 = i2 - 100, or i2 - i1 = 100?
• There will be an integer solution if and only if gcd(1,-1) divides 100.
• This is the case, and hence, there is dependence! Or is there? (The loop runs only 10 iterations, so a distance of 100 is impossible; the GCD test ignores loop bounds.)
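A minimal sketch mirroring both examples (the function name is mine); the second call shows the test's conservatism, since the loop bounds that actually rule the dependence out are never consulted:

    from math import gcd
    from functools import reduce

    def gcd_test_may_depend(coeffs, c):
        """a1*x1 + ... + an*xn = c has an integer solution iff gcd(a1..an) | c."""
        g = reduce(gcd, (abs(a) for a in coeffs))
        return c % g == 0 if g else c == 0

    print(gcd_test_may_depend([2, -2], 1))    # False: no dependence
    print(gcd_test_may_depend([1, -1], 100))  # True: dependence assumed (wrongly)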

Dependence Testing Complications

• Unknown loop bounds:

do i = 1, N
S1: a(i) = a(i+10)
end do

What is the relationship between N and 10?

• Triangular loops:

do i = 1, N
  do j = 1, i-1
S:  a(i,j) = a(j,i)
  end do
end do

Must impose j < i as an additional constraint.

More Complications

• User variables:

do i = 1, 10
S1: a(i) = a(i+k)
end do

Same problem as unknown loop bounds, but may occur due to some loop transformations (e.g., loop bounds normalization):

do i = L, H
S1: a(i) = a(i-1)
end do

becomes

do i = 1, H-L
S1: a(i+L) = a(i+L-1)
end do
More Complications

• Scalars:

do i = 1, N
S1: x = a(i)
S2: b(i) = x
end do

becomes, after scalar expansion,

do i = 1, N
S1: x(i) = a(i)
S2: b(i) = x(i)
end do

• Induction variables:

j = N-1
do i = 1, N
S1: a(i) = a(j)
S2: j = j - 1
end do

becomes

do i = 1, N
S1: a(i) = a(N-i)
end do

• Reductions:

sum = 0
do i = 1, N
S1: sum = sum + a(i)
end do

becomes

do i = 1, N
S1: sum(i) = a(i)
end do
sum += sum(i), i = 1, N

Loop Parallelization

• A dependence is said to be carried by a loop if the loop is the outermost loop whose removal eliminates the dependence. If a dependence is not carried by the loop, it is loop-independent.

do i = 2, n-1
  do j = 2, m-1
    a(i, j) = ...        δt(=,=)
    ... = a(i, j)

    b(i, j) = ...        δt(=,<)
    ... = b(i, j-1)

    c(i, j) = ...        δt(<,=)
    ... = c(i-1, j)
  end do
end do

• The a dependence, δt(=,=), is loop-independent. The b dependence, δt(=,<), is carried by loop j; the c dependence, δt(<,=), is carried by loop i. In general, the outermost loop with a non-"=" direction carries the dependence!

The iterations of a loop may be executed in parallel with one another if and only if no dependences are carried by the loop!
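A minimal sketch of this rule (the helper names are mine): the carrier of a dependence is the first non-"=" position of its direction vector, and a loop level may run in parallel iff it carries no dependence:

    def carrier(dv):
        """Loop level carrying this dependence, or None if loop-independent."""
        for level, d in enumerate(dv):
            if d != "=":
                return level
        return None

    def parallel_levels(depth, direction_vectors):
        carried = {carrier(dv) for dv in direction_vectors}
        return [k for k in range(depth) if k not in carried]

    # The nest above: a is (=,=), b is (=,<), c is (<,=)
    print(parallel_levels(2, [("=", "="), ("=", "<"), ("<", "=")]))  # []
    print(parallel_levels(2, [("=", "<")]))  # [0]: only b remains, i is parallel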
Loop Parallelization - Example

do i = 2, n-1
  do j = 2, m-1
    b(i, j) = ...        δt(=,<)
    ... = b(i, j-1)
  end do
end do

[Figure: fork/join diagram; the iterations i = 2, 3, ..., n-1 run in parallel between one fork and one join]

• The dependence is carried by loop j. Iterations of loop j must be executed sequentially, but the iterations of loop i may be executed in parallel.
• Outer loop parallelism.

Loop Parallelization - Example

do i = 2, n-1
  do j = 2, m-1
    b(i, j) = ...        δt(<,=)
    ... = b(i-1, j)
  end do
end do

[Figure: fork/join diagram; for each i, the iterations j = 2, 3, ..., m-1 run in parallel, then i = i+1 advances sequentially]

• The dependence is carried by loop i. Iterations of loop i must be executed sequentially, but the iterations of loop j may be executed in parallel.
• Inner loop parallelism.

Loop Parallelization - Example

do i = 2, n-1
  do j = 2, m-1
    b(i, j) = ...        δt(<,<)
    ... = b(i-1, j-1)
  end do
end do

[Figure: fork/join diagram; for each i, the iterations j = 2, 3, ..., m-1 run in parallel, then i = i+1 advances sequentially]

• The dependence is carried by loop i. Iterations of loop i must be executed sequentially, but the iterations of loop j may be executed in parallel. Why?
• Inner loop parallelism.

Loop Interchange

• Loop interchange changes the order of the loops to improve the spatial locality of a program.

do j = 1, n
  do i = 1, n
    ... a(i,j) ...
  end do
end do

[Figure: processor/cache/memory (P, C, M) diagram illustrating the access pattern of this loop order]
Loop Interchange

After interchange:

do i = 1, n
  do j = 1, n
    ... a(i,j) ...
  end do
end do

[Figure: P/C/M diagram; the traversal now matches the array layout]

• Loop interchange can improve the granularity of parallelism!

Before interchange:

do i = 1, n
  do j = 1, n
    a(i,j) = b(i,j)
    c(i,j) = a(i-1,j)    δt(<,=), carried by the outer i loop; only the inner j loop is parallel
  end do
end do

After interchange:

do j = 1, n
  do i = 1, n
    a(i,j) = b(i,j)
    c(i,j) = a(i-1,j)    δt(=,<), carried by the inner i loop; the outer j loop is parallel
  end do
end do
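A minimal sketch of the locality point (assuming NumPy; the slides' Fortran arrays are column-major, so a(i,j) is contiguous in i, whereas C and NumPy default to row-major, where the roles flip). Interchange aims to make the innermost loop walk the stride-1 dimension:

    import numpy as np

    a = np.zeros((1024, 1024), order="F")  # column-major layout, as in Fortran
    for j in range(a.shape[1]):            # outer loop picks a column
        for i in range(a.shape[0]):        # inner loop is stride-1 for order="F"
            a[i, j] = i + j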

Loop Interchange

do i = 1, n
  do j = 1, n
    ... a(i,j) ...
  end do
end do

do j = 1, n
  do i = 1, n
    ... a(i,j) ...
  end do
end do

[Figure: iteration-space diagrams for both loop orders, with dependence vectors δt(=,<), δt(<,=), and δt(<,>) drawn between iteration points]

• When is loop interchange legal?
• When is loop interchange legal? When the "interchanged" dependences remain lexicographically positive!
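A dependence whose interchanged direction vector starts with ">" would execute the sink before the source, which is why such an interchange is illegal. A minimal sketch of the check (the helper names are mine):

    def lexicographically_positive(dv):
        """True if the first non-'=' entry is '<' (or the vector is all '=')."""
        for d in dv:
            if d == "<": return True
            if d == ">": return False
        return True

    def interchange_legal(direction_vectors):
        # swap the two entries of each (i,j) direction vector and re-check
        return all(lexicographically_positive((dj, di))
                   for di, dj in direction_vectors)

    print(interchange_legal([("=", "<"), ("<", "=")]))  # True
    print(interchange_legal([("<", ">")]))              # False: (>,<) is negative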

Loop Blocking (Tiling)

Original:

do t = 1, T
  do i = 1, n
    do j = 1, n
      ... a(i,j) ...
    end do
  end do
end do

Blocked by B within each time step:

do t = 1, T
  do ic = 1, n, B
    do i = 1, B
      do jc = 1, n, B
        do j = 1, B
          ... a(ic+i-1, jc+j-1) ...
        end do
      end do
    end do
  end do
end do

Blocked with the t loop moved inside the tile loops:

do ic = 1, n, B
  do jc = 1, n, B
    do t = 1, T
      do i = 1, B
        do j = 1, B
          ... a(ic+i-1, jc+j-1) ...
        end do
      end do
    end do
  end do
end do

• When is loop blocking legal?
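A minimal Python sketch of the final tiled form above (assuming n is a multiple of B; the function and parameter names are mine). Whether hoisting the t loop inside the tile loops is legal is exactly the slide's question: it reorders iterations across time steps, so every dependence must remain satisfied in the new order:

    def tiled(n, B, T, body):
        for ic in range(0, n, B):                # tile origin, first dimension
            for jc in range(0, n, B):            # tile origin, second dimension
                for t in range(T):               # all T time steps on one tile
                    for i in range(ic, ic + B):
                        for j in range(jc, jc + B):
                            body(t, i, j)        # ... a(i,j) ...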


Wednesday's Class

• Global Scheduling, Software Pipelining [ALSU 10.4 – 10.5]
