L19 Parallelization

The document discusses data dependence in parallel programming, defining four types: flow, anti, output, and input dependence. It explains the implications of these dependencies on the execution order of statements and introduces the concept of dependence graphs. Additionally, it presents examples and problem formulations to illustrate how dependence testing can be framed as an integer linear programming problem.


Lecture 19
Array Dependence Analysis & Parallelization
[ALSU 11.6]
Phillip B. Gibbons, 15-745 Optimizing Compilers

Data Dependence

S1: A = 1.0
S2: B = A * 2.0
S3: A = C * D
S4: A = B / C

We define four types of data dependence.

• Flow (true) dependence: a statement Si precedes a statement Sj in execution and Si computes a data value that Sj uses.
• Implies that Si must execute before Sj.

Si δt Sj (S1 δt S2 and S2 δt S4)

• Anti dependence: a statement Si precedes a statement Sj in execution and Si uses a data value that Sj computes.
• It implies that Si must be executed before Sj.

Si δa Sj (S2 δa S3)

• Output dependence: a statement Si precedes a statement Sj in execution and Si computes a data value that Sj also computes.
• It implies that Si must be executed before Sj.

Si δo Sj (S1 δo S3 and S3 δo S4)
• Input dependence: a statement Si precedes a statement Sj in execution and Si uses a data value that Sj also uses.
• Does this imply that Si must execute before Sj?

Si δI Sj (S3 δI S4)

Data Dependence (continued)

• The dependence is said to flow from Si to Sj because Si precedes Sj in execution.
• Si is said to be the source of the dependence; Sj is said to be the sink of the dependence.
• The only "true" dependence is flow dependence; it represents the flow of data in the program.
• The other types of dependence are caused by programming style; they may be eliminated by renaming:

S1: A = 1.0
S2: B = A * 2.0
S3: A1 = C * D
S4: A2 = B / C

Data Dependence (continued)

• Data dependence in a program may be represented using a dependence graph G = (V, E), where the nodes V represent statements in the program and the directed edges E represent dependence relations.

For the running example, the graph has nodes S1..S4 and edges:

S1 δt S2, S2 δt S4 (flow)
S2 δa S3 (anti)
S1 δo S3, S3 δo S4 (output)
S3 δI S4 (input)
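A minimal sketch (the read/write sets and helper names are mine, not from the lecture) that derives these edge types mechanically from each statement's reads and writes. Note that this naive pairwise pass also reports dependences masked by an intervening write (for example, it emits S1 δo S4 and S2 δa S4, which the graph above omits because S3 rewrites A in between):

    statements = [
        ("S1", {"A"}, set()),         # A = 1.0        (name, writes, reads)
        ("S2", {"B"}, {"A"}),         # B = A * 2.0
        ("S3", {"A"}, {"C", "D"}),    # A = C * D
        ("S4", {"A"}, {"B", "C"}),    # A = B / C
    ]

    edges = []
    for x, (si, wi, ri) in enumerate(statements):
        for sj, wj, rj in statements[x + 1:]:
            if wi & rj: edges.append((si, sj, "flow"))    # Si writes what Sj reads
            if ri & wj: edges.append((si, sj, "anti"))    # Si reads what Sj writes
            if wi & wj: edges.append((si, sj, "output"))  # both write a location
            if ri & rj: edges.append((si, sj, "input"))   # both read a location

    print(edges)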
Value or Location?

• There are two ways a dependence is defined: value-oriented or location-oriented.

Example 1

for i = 2 to 4 {
S1: a[i] = b[i] + c[i];
S2: d[i] = a[i]
}

[Figure: instances S1[i] and S2[i] for i = 2, 3, 4, each pair linked by a δt edge on a[i]]

• There is an instance of S1 that precedes an instance of S2 in execution and S1 produces data that S2 consumes.
• S1 is the source of the dependence; S2 is the sink of the dependence.
• The dependence flows between instances of statements in the same iteration (loop-independent dependence).
• The number of iterations between source and sink (dependence distance) is 0. The dependence direction is =.

S1 δt S2; with the distance made explicit, S1 δt(0) S2.

Example 2

do i = 2, 4
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i-1)
end do

[Figure: δt edges from S1[i] to S2[i+1] on a(i), crossing iterations]

• There is an instance of S1 that precedes an instance of S2 in execution and S1 produces data that S2 consumes.
• S1 is the source of the dependence; S2 is the sink of the dependence.
• The dependence flows between instances of statements in different iterations (loop-carried dependence).
• The dependence distance is 1. The direction is positive (<).

S1 δt S2; with the distance made explicit, S1 δt(1) S2.

Example 3

do i = 2, 4
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i+1)
end do

[Figure: δa edges from S2[i] to S1[i+1] on a(i+1), crossing iterations]

• There is an instance of S2 that precedes an instance of S1 in execution and S2 consumes data that S1 produces.
• S2 is the source of the dependence; S1 is the sink of the dependence.
• The dependence is loop-carried.
• The dependence distance is 1.

S2 δa S1; with the distance made explicit, S2 δa(1) S1.

Are you sure you know why it is S2 δa(<) S1 even though S1 appears before S2 in the code?

Example 4

do i = 2, 4
  do j = 2, 4
S:  a(i,j) = a(i-1,j+1)
  end do
end do

[Figure: 3x3 iteration space of instances S[i,j]; δt edges run from S[i,j] to S[i+1,j-1]]

• An instance of S precedes another instance of S in execution, and S produces data that S consumes.
• S is both source and sink of the dependence.
• The dependence is loop-carried.
• The dependence distance is (1,-1).

S δt S; with direction and distance made explicit, S δt(<,>) S or S δt(1,-1) S.
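The iteration-space picture for Example 4 can be checked directly by enumeration. A minimal sketch (the names write, read, and dists are mine):

    # Example 4: S: a(i,j) = a(i-1,j+1), with 2 <= i, j <= 4
    write = lambda i, j: (i, j)           # subscripts written at iteration (i,j)
    read  = lambda i, j: (i - 1, j + 1)   # subscripts read at iteration (i,j)

    space = [(i, j) for i in range(2, 5) for j in range(2, 5)]
    dists = {(snk[0] - src[0], snk[1] - src[1])
             for src in space for snk in space
             if write(*src) == read(*snk)}
    print(dists)   # {(1, -1)}: distance (1,-1), direction (<,>)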
Problem Formulation

Consider the following perfect nest of depth d:

do I1 = L1, U1
  do I2 = L2, U2
    ...
    do Id = Ld, Ud
      a(f1(I), f2(I), ..., fm(I)) = ...
      ... = a(g1(I), g2(I), ..., gm(I))
    end do
    ...
  end do
end do

where the iteration vector is I = (I1, I2, ..., Id), the bounds vectors are L = (L1, L2, ..., Ld) and U = (U1, U2, ..., Ud), and each subscript function is linear in the loop indices:

b0 + b1*I1 + b2*I2 + ... + bd*Id

In an array reference a(..., fk(I), ...), k is the subscript position and fk(I) is the subscript expression (subscript function).

Dependence will exist if there exist two iteration vectors k and j such that L ≤ k ≤ j ≤ U and:

f1(k) = g1(j)  and  f2(k) = g2(j)  and  ...  and  fm(k) = gm(j)

That is:

f1(k) - g1(j) = 0  and  f2(k) - g2(j) = 0  and  ...  and  fm(k) - gm(j) = 0

Problem Formulation - Example

do i = 2, 4
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i-1)
end do

• Does there exist two iteration vectors i1 and i2, such that 2 ≤ i1 ≤ i2 ≤ 4 and such that i1 = i2 - 1?
• Answer: yes; i1=2 & i2=3, and i1=3 & i2=4.
• Hence, there is dependence!
• The dependence distance vector is i2 - i1 = 1.
• The dependence direction vector is sign(1) = <.

Problem Formulation - Example

do i = 2, 4
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i+1)
end do

• Does there exist two iteration vectors i1 and i2, such that 2 ≤ i1 ≤ i2 ≤ 4 and such that i1 = i2 + 1?
• Answer: yes; i1=3 & i2=2, and i1=4 & i2=3. (But, but!)
• Hence, there is dependence!
• The dependence distance vector is i2 - i1 = -1.
• The dependence direction vector is sign(-1) = >.
• Is this possible? (Note the listed pairs violate i1 ≤ i2: a ">" direction for a flow dependence is impossible. As in Example 3, this is really an anti dependence from S2 to S1 with distance 1.)
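When the bounds are tiny, the "does there exist" question above can be settled by brute-force enumeration. A minimal sketch (the function name is mine; real dependence testers solve the system symbolically rather than enumerating):

    def brute_force_deps(L, U, f, g):
        """f: write subscript, g: read subscript; return all iteration
        pairs (i1, i2) that touch the same array element."""
        return [(i1, i2) for i1 in range(L, U + 1)
                         for i2 in range(L, U + 1) if f(i1) == g(i2)]

    # a(i) written, a(i-1) read: i1 = i2 - 1
    print(brute_force_deps(2, 4, lambda i: i, lambda i: i - 1))  # [(2,3),(3,4)]
    # a(i) written, a(i+1) read: i1 = i2 + 1
    print(brute_force_deps(2, 4, lambda i: i, lambda i: i + 1))  # [(3,2),(4,3)]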

Problem Formulation - Example

do i = 1, 10
S1: a(2*i) = b(i) + c(i)
S2: d(i) = a(2*i+1)
end do

• Does there exist two iteration vectors i1 and i2, such that 1 ≤ i1 ≤ i2 ≤ 10 and such that 2*i1 = 2*i2 + 1?
• Answer: no; 2*i1 is even and 2*i2 + 1 is odd.
• Hence, there is no dependence!

Problem Formulation

• Dependence testing is equivalent to an integer linear programming (ILP) problem of 2d variables and m+d constraints!
• An algorithm that determines if there exist two iteration vectors k and j that satisfy these constraints is called a dependence tester.
• The dependence distance vector is given by j - k.
• The dependence direction vector is given by sign(j - k).
• Dependence testing is NP-complete!
• A dependence test that reports dependence only when there is dependence is said to be exact. Otherwise it is inexact.
• A dependence test must be conservative; if the existence of dependence cannot be ascertained, dependence must be assumed.

Dependence Testers

• Lamport's Test
• GCD Test
• Banerjee's Inequalities
• Generalized GCD Test
• Power Test
• I-Test
• Omega Test
• Delta Test
• Stanford Test
• etc.
Lamport's Test

• Lamport's Test is used when there is a single index variable in the subscript expressions, and when the coefficients of the index variable in both expressions are the same:

A(..., b*i + c1, ...) = ...
... = A(..., b*i + c2, ...)

• The dependence problem: does there exist i1 and i2, such that Li ≤ i1 ≤ i2 ≤ Ui and b*i1 + c1 = b*i2 + c2? Equivalently, is i2 - i1 = (c1 - c2)/b?
• There is an integer solution if and only if (c1 - c2)/b is an integer.
• The dependence distance is d = (c1 - c2)/b, provided |d| fits within the iteration range (|d| ≤ Ui - Li).
• d > 0 ⇒ true dependence; d = 0 ⇒ loop-independent dependence; d < 0 ⇒ anti dependence.

Lamport's Test - Example

do i = 1, n
  do j = 1, n
S:  a(i,j) = a(i-1,j+1)
  end do
end do

For the i subscript: i1 = i2 - 1? Here b = 1, c1 = 0, c2 = -1, so (c1 - c2)/b = 1. There is dependence; distance (i) is 1.
For the j subscript: j1 = j2 + 1? Here b = 1, c1 = 0, c2 = 1, so (c1 - c2)/b = -1. There is dependence; distance (j) is -1.

S δt(1,-1) S, or S δt(<,>) S.
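A minimal sketch of the test (the function name is mine, and the distance is checked against the iteration range rather than the slide's bound form):

    def lamport_test(b, c1, c2, L, U):
        """Subscript pair b*i + c1 (write) vs. b*i + c2 (read), bounds L..U."""
        if (c1 - c2) % b != 0:
            return None                    # no integer solution: no dependence
        d = (c1 - c2) // b                 # the dependence distance
        if abs(d) > U - L:
            return None                    # sink would fall outside the loop
        if d > 0:  return ("true", d)
        if d == 0: return ("loop-independent", 0)
        return ("anti", d)

    # n taken as 100 for the demo
    print(lamport_test(1, 0, -1, 1, 100))  # ('true', 1): the i subscripts above
    print(lamport_test(1, 0, 1, 1, 100))   # ('anti', -1): the j subscript, in isolation
    print(lamport_test(2, 0, 1, 1, 100))   # None: previews the next example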

Lamport's Test - Example

do i = 1, n
  do j = 1, n
S:  a(i,2*j) = a(i-1,2*j+1)
  end do
end do

For the i subscript: i1 = i2 - 1? Here b = 1, c1 = 0, c2 = -1, so (c1 - c2)/b = 1: there is dependence in i, with distance 1.
For the j subscript: 2*j1 = 2*j2 + 1? Here b = 2, c1 = 0, c2 = 1, so (c1 - c2)/b = -1/2, which is not an integer: there is no dependence in j.

Since a dependence requires every subscript position to match simultaneously, there is no dependence!

GCD Test

• Given a linear Diophantine equation

a1*x1 + a2*x2 + ... + an*xn = c,  where the ai and c are integers,

an integer solution exists if and only if gcd(a1, a2, ..., an) divides c.

• Problems:
– ignores loop bounds
– gives no information on distance or direction of dependence
– often gcd(...) is 1, which always divides c, resulting in false dependences
GCD Test - Example

do i = 1, 10
S1: a(2*i) = b(i) + c(i)
S2: d(i) = a(2*i-1)
end do

• Does there exist two iteration vectors i1 and i2, such that 1 ≤ i1 ≤ i2 ≤ 10 and such that 2*i1 = 2*i2 - 1, or 2*i2 - 2*i1 = 1?
• There will be an integer solution if and only if gcd(2,-2) divides 1.
• This is not the case, and hence, there is no dependence!

GCD Test - Example

do i = 1, 10
S1: a(i) = b(i) + c(i)
S2: d(i) = a(i-100)
end do

• Does there exist two iteration vectors i1 and i2, such that 1 ≤ i1 ≤ i2 ≤ 10 and such that i1 = i2 - 100, or i2 - i1 = 100?
• There will be an integer solution if and only if gcd(1,-1) divides 100.
• This is the case, and hence, there is dependence! Or is there? (The loop runs only 10 iterations, so a distance of 100 is impossible; the GCD test ignores loop bounds.)
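A minimal sketch mirroring both examples (the function name is mine); the second call shows the test's conservatism, since the loop bounds that actually rule the dependence out are never consulted:

    from math import gcd
    from functools import reduce

    def gcd_test_may_depend(coeffs, c):
        """a1*x1 + ... + an*xn = c has an integer solution iff gcd(a1..an) | c."""
        g = reduce(gcd, (abs(a) for a in coeffs))
        return c % g == 0 if g else c == 0

    print(gcd_test_may_depend([2, -2], 1))    # False: no dependence
    print(gcd_test_may_depend([1, -1], 100))  # True: dependence assumed (wrongly)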

Dependence Testing Complications

• Unknown loop bounds:

do i = 1, N
S1: a(i) = a(i+10)
end do

What is the relationship between N and 10?

• Triangular loops:

do i = 1, N
  do j = 1, i-1
S:  a(i,j) = a(j,i)
  end do
end do

Must impose j < i as an additional constraint.

More Complications

• User variables:

do i = 1, 10
S1: a(i) = a(i+k)
end do

Same problem as unknown loop bounds, but may occur due to some loop transformations (e.g., loop bounds normalization):

do i = L, H
S1: a(i) = a(i-1)
end do

becomes

do i = 1, H-L
S1: a(i+L) = a(i+L-1)
end do
More Complications

• Scalars:

do i = 1, N
S1: x = a(i)
S2: b(i) = x
end do

becomes, after scalar expansion,

do i = 1, N
S1: x(i) = a(i)
S2: b(i) = x(i)
end do

• Induction variables:

j = N-1
do i = 1, N
S1: a(i) = a(j)
S2: j = j - 1
end do

becomes

do i = 1, N
S1: a(i) = a(N-i)
end do

• Reductions:

sum = 0
do i = 1, N
S1: sum = sum + a(i)
end do

becomes

do i = 1, N
S1: sum(i) = a(i)
end do
sum += sum(i), i = 1, N

Loop Parallelization

• A dependence is said to be carried by a loop if the loop is the outermost loop whose removal eliminates the dependence. If a dependence is not carried by the loop, it is loop-independent.

do i = 2, n-1
  do j = 2, m-1
    a(i, j) = ...        δt(=,=)
    ... = a(i, j)

    b(i, j) = ...        δt(=,<)
    ... = b(i, j-1)

    c(i, j) = ...        δt(<,=)
    ... = c(i-1, j)
  end do
end do

• The a dependence, δt(=,=), is loop-independent. The b dependence, δt(=,<), is carried by loop j; the c dependence, δt(<,=), is carried by loop i. In general, the outermost loop with a non-"=" direction carries the dependence!

The iterations of a loop may be executed in parallel with one another if and only if no dependences are carried by the loop!
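A minimal sketch of this rule (the helper names are mine): the carrier of a dependence is the first non-"=" position of its direction vector, and a loop level may run in parallel iff it carries no dependence:

    def carrier(dv):
        """Loop level carrying this dependence, or None if loop-independent."""
        for level, d in enumerate(dv):
            if d != "=":
                return level
        return None

    def parallel_levels(depth, direction_vectors):
        carried = {carrier(dv) for dv in direction_vectors}
        return [k for k in range(depth) if k not in carried]

    # The nest above: a is (=,=), b is (=,<), c is (<,=)
    print(parallel_levels(2, [("=", "="), ("=", "<"), ("<", "=")]))  # []
    print(parallel_levels(2, [("=", "<")]))  # [0]: only b remains, i is parallel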
Loop Parallelization - Example

do i = 2, n-1
  do j = 2, m-1
    b(i, j) = ...        δt(=,<)
    ... = b(i, j-1)
  end do
end do

[Figure: fork/join diagram; the iterations i = 2, 3, ..., n-1 run in parallel between one fork and one join]

• The dependence is carried by loop j. Iterations of loop j must be executed sequentially, but the iterations of loop i may be executed in parallel.
• Outer loop parallelism.

Loop Parallelization - Example

do i = 2, n-1
  do j = 2, m-1
    b(i, j) = ...        δt(<,=)
    ... = b(i-1, j)
  end do
end do

[Figure: fork/join diagram; for each i, the iterations j = 2, 3, ..., m-1 run in parallel, then i = i+1 advances sequentially]

• The dependence is carried by loop i. Iterations of loop i must be executed sequentially, but the iterations of loop j may be executed in parallel.
• Inner loop parallelism.

Loop Parallelization - Example

do i = 2, n-1
  do j = 2, m-1
    b(i, j) = ...        δt(<,<)
    ... = b(i-1, j-1)
  end do
end do

[Figure: fork/join diagram; for each i, the iterations j = 2, 3, ..., m-1 run in parallel, then i = i+1 advances sequentially]

• The dependence is carried by loop i. Iterations of loop i must be executed sequentially, but the iterations of loop j may be executed in parallel. Why?
• Inner loop parallelism.

Loop Interchange

• Loop interchange changes the order of the loops to improve the spatial locality of a program.

do j = 1, n
  do i = 1, n
    ... a(i,j) ...
  end do
end do

[Figure: processor/cache/memory (P, C, M) diagram illustrating the access pattern of this loop order]
Loop Interchange

After interchange:

do i = 1, n
  do j = 1, n
    ... a(i,j) ...
  end do
end do

[Figure: P/C/M diagram; the traversal now matches the array layout]

• Loop interchange can improve the granularity of parallelism!

Before interchange:

do i = 1, n
  do j = 1, n
    a(i,j) = b(i,j)
    c(i,j) = a(i-1,j)    δt(<,=), carried by the outer i loop; only the inner j loop is parallel
  end do
end do

After interchange:

do j = 1, n
  do i = 1, n
    a(i,j) = b(i,j)
    c(i,j) = a(i-1,j)    δt(=,<), carried by the inner i loop; the outer j loop is parallel
  end do
end do
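A minimal sketch of the locality point (assuming NumPy; the slides' Fortran arrays are column-major, so a(i,j) is contiguous in i, whereas C and NumPy default to row-major, where the roles flip). Interchange aims to make the innermost loop walk the stride-1 dimension:

    import numpy as np

    a = np.zeros((1024, 1024), order="F")  # column-major layout, as in Fortran
    for j in range(a.shape[1]):            # outer loop picks a column
        for i in range(a.shape[0]):        # inner loop is stride-1 for order="F"
            a[i, j] = i + j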

Loop Interchange

do i = 1, n
  do j = 1, n
    ... a(i,j) ...
  end do
end do

do j = 1, n
  do i = 1, n
    ... a(i,j) ...
  end do
end do

[Figure: iteration-space diagrams for both loop orders, with dependence vectors δt(=,<), δt(<,=), and δt(<,>) drawn between iteration points]

• When is loop interchange legal?
• When is loop interchange legal? When the "interchanged" dependences remain lexicographically positive!
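A dependence whose interchanged direction vector starts with ">" would execute the sink before the source, which is why such an interchange is illegal. A minimal sketch of the check (the helper names are mine):

    def lexicographically_positive(dv):
        """True if the first non-'=' entry is '<' (or the vector is all '=')."""
        for d in dv:
            if d == "<": return True
            if d == ">": return False
        return True

    def interchange_legal(direction_vectors):
        # swap the two entries of each (i,j) direction vector and re-check
        return all(lexicographically_positive((dj, di))
                   for di, dj in direction_vectors)

    print(interchange_legal([("=", "<"), ("<", "=")]))  # True
    print(interchange_legal([("<", ">")]))              # False: (>,<) is negative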

Loop Blocking (Tiling)

Original:

do t = 1, T
  do i = 1, n
    do j = 1, n
      ... a(i,j) ...
    end do
  end do
end do

Blocked by B within each time step:

do t = 1, T
  do ic = 1, n, B
    do i = 1, B
      do jc = 1, n, B
        do j = 1, B
          ... a(ic+i-1, jc+j-1) ...
        end do
      end do
    end do
  end do
end do

Blocked with the t loop moved inside the tile loops:

do ic = 1, n, B
  do jc = 1, n, B
    do t = 1, T
      do i = 1, B
        do j = 1, B
          ... a(ic+i-1, jc+j-1) ...
        end do
      end do
    end do
  end do
end do

• When is loop blocking legal?
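A minimal Python sketch of the final tiled form above (assuming n is a multiple of B; the function and parameter names are mine). Whether hoisting the t loop inside the tile loops is legal is exactly the slide's question: it reorders iterations across time steps, so every dependence must remain satisfied in the new order:

    def tiled(n, B, T, body):
        for ic in range(0, n, B):                # tile origin, first dimension
            for jc in range(0, n, B):            # tile origin, second dimension
                for t in range(T):               # all T time steps on one tile
                    for i in range(ic, ic + B):
                        for j in range(jc, jc + B):
                            body(t, i, j)        # ... a(i,j) ...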


Wednesday's Class

• Global Scheduling, Software Pipelining [ALSU 10.4 – 10.5]
