Lecture 19
Array Dependence Analysis & Parallelization
Data Dependence

  S1: A = 1.0
  S2: B = A + 2.0
  S3: A = C + D
  S4: A = B/C

We define four types of data dependence.

True (flow) dependence: a statement Si precedes a statement Sj in
execution and Si computes a data value that Sj uses.
It implies that Si must be executed before Sj.

  Si δt Sj   (S1 δt S2 and S2 δt S4)

Anti dependence: a statement Si precedes a statement Sj in
execution and Si uses a data value that Sj computes.
It implies that Si must be executed before Sj.

  Si δa Sj   (S2 δa S3)

Output dependence: a statement Si precedes a statement Sj
in execution and Si computes a data value that Sj also
computes.
It implies that Si must be executed before Sj.

  Si δo Sj   (S1 δo S3 and S3 δo S4)
Data Dependence (continued)

The dependence is said to flow from Si to Sj because Si
precedes Sj in execution. Si is said to be the source of the
dependence; Sj is said to be the sink of the dependence.

Input dependence: a statement Si precedes a statement Sj in
execution and both Si and Sj read the same data value. It does
not constrain the execution order.

  S1: A = 1.0
  S2: B = A + 2.0
  S3: A = C + D
  S4: A2 = B/C

  Si δI Sj   (S3 δI S4)

Dependence graph for the running example (with S4: A = B/C):

  S1 --δt--> S2     (A)
  S1 --δo--> S3     (A)
  S2 --δa--> S3     (A)
  S2 --δt--> S4     (B)
  S3 --δo--> S4     (A)
  S3 --δI--> S4     (C)
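The four definitions above can be made concrete with a small sketch (my code, not the lecture's): classify the dependence types between two statements directly from their read and write sets.

```python
def classify(si_reads, si_writes, sj_reads, sj_writes):
    """Dependence types from Si to Sj, where Si executes first."""
    deps = set()
    if si_writes & sj_reads:
        deps.add("true")    # Si computes a value that Sj uses
    if si_reads & sj_writes:
        deps.add("anti")    # Si uses a value that Sj recomputes
    if si_writes & sj_writes:
        deps.add("output")  # both compute the same value
    if si_reads & sj_reads:
        deps.add("input")   # both read the same value
    return deps

# S1: A = 1.0    S2: B = A + 2.0    S3: A = C + D    S4: A = B/C
# Each statement is (reads, writes):
stmts = {
    "S1": (set(),      {"A"}),
    "S2": ({"A"},      {"B"}),
    "S3": ({"C", "D"}, {"A"}),
    "S4": ({"B", "C"}, {"A"}),
}

print(classify(*stmts["S2"], *stmts["S3"]))  # S2 δa S3 on A
print(classify(*stmts["S1"], *stmts["S3"]))  # S1 δo S3 on A
```

Note this pairwise view ignores intervening redefinitions; it only mirrors the per-pair definitions on the slide.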
Examples 1-4

[Iteration-space figures omitted; they showed accesses such as
a(4,2), a(4,3), a(4,4), with dependences annotated by direction,
e.g. S δt(<) S, or by distance, e.g. S δt(1) S.]

Are you sure you know why it is S2 δa S1 even though S1 appears
before S2 in the code?
Problem Formulation

Consider the following perfect nest of depth d:

  do I1 = L1, U1
    do I2 = L2, U2
      ...
        do Id = Ld, Ud
          S1: a(f1(I), f2(I), ..., fm(I)) = ...
          S2: ... = a(g1(I), g2(I), ..., gm(I))
        enddo
      ...
    enddo
  enddo

Dependence will exist if there exist two iteration vectors k
and j such that L <= k <= j <= U and:

  fi(k) = gi(j)  for all i = 1, ..., m.

Carnegie Mellon
Optimizing Compilers: Parallelization
Does there exist two iteration vectors i1 and i2, such that
2 <= i1 <= i2 <= 4 and such that i1 = i2 - 1?
Answer: yes; i1=2 & i2=3, and i1=3 & i2=4.
Hence, there is dependence!
The dependence distance vector is i2 - i1 = 1.
The dependence direction vector is sign(1) = <.

Does there exist two iteration vectors i1 and i2, such that
2 <= i1 <= i2 <= 4 and such that i1 = i2 + 1?
Answer: yes; i1=3 & i2=2 and i1=4 & i2=3. (But, but!)
Hence, there is dependence!
The dependence distance vector is i2 - i1 = -1.
The dependence direction vector is sign(-1) = >.
Is this possible?
Problem Formulation - Example

  do i = 1, 10
    S1: a(2*i) = b(i) + c(i)
    S2: d(i) = a(2*i+1)
  end do

Does there exist two iteration vectors i1 and i2, such that
1 <= i1 <= i2 <= 10 and such that 2*i1 = 2*i2 + 1?
Answer: no; 2*i1 is always even while 2*i2 + 1 is always odd.
Hence, there is no dependence!

Problem Formulation

Dependence testing is equivalent to an integer linear
programming (ILP) problem of 2d variables & m+d constraints!

An algorithm that determines whether there exist two iteration
vectors k and j that satisfy these constraints is called a
dependence tester.

The dependence distance vector is given by j - k.
The dependence direction vector is given by sign(j - k).

Dependence testers:
  - Lamport's Test
  - GCD Test
  - Generalized GCD Test
  - Power Test
  - I-Test
  - Omega Test
  - Delta Test
  - etc.
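For small, known bounds the ILP formulation can be checked by plain enumeration. The sketch below (my code, not a real dependence tester) enumerates iteration pairs for a single loop and reports the set of dependence distances; an empty set means no dependence.

```python
def distances(f, g, lo, hi):
    """All distances i2 - i1 with f(i1) == g(i2), for lo <= i1, i2 <= hi.

    f is the write subscript, g the read subscript. A negative distance
    means the solution violates i1 <= i2, i.e. the dependence actually
    runs the other way (anti direction)."""
    return sorted({i2 - i1
                   for i1 in range(lo, hi + 1)
                   for i2 in range(lo, hi + 1)
                   if f(i1) == g(i2)})

# S1: a(2*i) = ...,  S2: ... = a(2*i+1), i = 1..10: even never equals odd
print(distances(lambda i: 2 * i, lambda i: 2 * i + 1, 1, 10))  # -> []

# S1: a(i) = ...,  S2: ... = a(i-1), i = 2..4
print(distances(lambda i: i, lambda i: i - 1, 2, 4))           # -> [1]
```

Enumeration is exponential in nest depth and needs constant bounds, which is exactly why compilers use the dedicated testers listed above.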
Lamport's Test

Lamport's Test is used when there is a single index variable
in the subscript expressions, and when the coefficients of
the index variable in both expressions are the same:

  A(..., b*i + c1, ...) = ...
  ... = A(..., b*i + c2, ...)

The dependence problem: does there exist i1 and i2, such
that Li <= i1 <= i2 <= Ui and such that

  b*i1 + c1 = b*i2 + c2?   or   i2 - i1 = (c1 - c2)/b?

There is an integer solution if and only if (c1 - c2)/b is an
integer. The dependence distance is d = (c1 - c2)/b:

  d > 0: true dependence.
  d = 0: loop-independent dependence.
  d < 0: anti dependence.

Lamport's Test - Example

  do i = 1, n
    do j = 1, n
      S: a(i,j) = a(i-1,j+1)
    end do
  end do

For index i: i1 = i2 - 1?  b = 1; c1 = 0; c2 = -1; (c1 - c2)/b = 1.
There is dependence.
For index j: j1 = j2 + 1?  b = 1; c1 = 0; c2 = 1; (c1 - c2)/b = -1.
There is dependence.

Hence: S δt(1,-1) S, or S δt(<,>) S.
A second example, with coefficient b = 2 in the second subscript
(e.g., a(i,2*j) = a(i-1,2*j-1)):

For index i: (c1 - c2)/b = 1. There is dependence; the
distance in i is 1.
For index j: (c1 - c2)/b = 1/2, which is not an integer.
There is no dependence.

GCD Test

The subscript equation a1*i1 + c1 = a2*i2 + c2 has an integer
solution if and only if gcd(a1, a2) divides c2 - c1.

Problems:
  - it ignores loop bounds
  - it gives no information on distance or direction of dependence
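A minimal sketch of Lamport's Test for one subscript dimension, assuming subscripts b*i + c1 at the write and b*i + c2 at the read; bounding the distance by the trip count is my simplification of the Li..Ui check.

```python
from fractions import Fraction

def lamport(b, c1, c2, trip_count):
    """Return the dependence distance (c1 - c2)/b, or None if there is
    no dependence (non-integer distance, or too large for the loop)."""
    d = Fraction(c1 - c2, b)
    if d.denominator != 1 or abs(d) >= trip_count:
        return None
    return int(d)

# a(i,j) = a(i-1,j+1):
print(lamport(1, 0, -1, 100))  # i dimension: distance 1 (true dep)
print(lamport(1, 0, 1, 100))   # j dimension: distance -1
# b = 2 with c1 = 0, c2 = -1: (c1 - c2)/b = 1/2, not an integer
print(lamport(2, 0, -1, 100))  # None: no dependence
```

Unlike the GCD test below, Lamport's test yields the exact distance, but only applies when both coefficients match.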
GCD Test - Example 1

  do i = 1, 10
    S1: a(2*i) = b(i) + c(i)
    S2: d(i) = a(2*i-1)
  end do

Does there exist two iteration vectors i1 and i2, such that
1 <= i1 <= i2 <= 10 and such that 2*i1 = 2*i2 - 1?
There will be an integer solution if and only if gcd(2,-2)
divides 1. This is not the case, and hence, there is no dependence!

GCD Test - Example 2

  do i = 1, 10
    S1: a(i) = b(i) + c(i)
    S2: d(i) = a(i-100)
  end do

Does there exist two iteration vectors i1 and i2, such that
1 <= i1 <= i2 <= 10 and such that i1 = i2 - 100?
There will be an integer solution if and only if gcd(1,-1) divides
100. This is the case, and hence, there is dependence! Or is there?
(The bounds allow a distance of at most 9; the GCD test ignores
loop bounds.)
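Both examples can be sketched with a few lines of code (mine, not the lecture's); the second call shows the false positive that motivates the "Or is there?" above.

```python
from math import gcd

def gcd_test(a1, c1, a2, c2):
    """True iff a1*i1 + c1 = a2*i2 + c2 has an integer solution,
    i.e. gcd(a1, a2) divides c2 - c1. Ignores loop bounds, so a
    True result may still be a spurious dependence."""
    return (c2 - c1) % gcd(a1, a2) == 0

# Example 1: a(2*i) vs a(2*i-1): gcd(2,2) = 2 does not divide -1
print(gcd_test(2, 0, 2, -1))    # False -> no dependence

# Example 2: a(i) vs a(i-100): gcd(1,1) = 1 divides -100
print(gcd_test(1, 0, 1, -100))  # True -> "dependence", even though
                                # the bounds i = 1..10 preclude it
```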
Dependence Testing Complications

Unknown loop bounds: what is the relationship between N and 10?
When a bound is symbolic, the tester cannot compare it against the
subscript constants and must conservatively assume dependence.

Triangular loops:

  do i = 1, N
    do j = 1, i-1
      S: a(i,j) = a(j,i)
    end do
  end do

Must impose j < i as an additional constraint.

The same problem as unknown loop bounds can also occur due to
some loop transformations (e.g., loop bounds normalization):

  do i = L, H
    S1: a(i) = a(i-1)
  end do

becomes

  do i = 1, H-L
    S1: a(i+L) = a(i+L-1)
  end do
More Complications

Scalars:

  do i = 1, N
    S1: x = a(i)
    S2: b(i) = x
  end do

can be rewritten (scalar expansion) as:

  do i = 1, N
    S1: x(i) = a(i)
    S2: b(i) = x(i)
  end do

Induction variables:

  j = N-1
  do i = 1, N
    S1: a(i) = a(j)
    S2: j = j - 1
  end do

can be rewritten as:

  do i = 1, N
    S1: a(i) = a(N-i)
  end do

Reductions:

  sum = 0
  do i = 1, N
    S1: sum = sum + a(i)
  end do

can be rewritten as:

  do i = 1, N
    S1: sum(i) = a(i)
  end do
  sum += sum(i), i = 1, N

Loop Parallelization

A dependence is said to be carried by a loop if the loop is the
outermost loop whose removal eliminates the dependence. If a
dependence is not carried by the loop, it is loop-independent.

  do i = 2, n-1
    do j = 2, m-1
      a(i, j) = ...
      ... = a(i, j)       δt(=,=)   loop-independent
      b(i, j) = ...
      ... = b(i, j-1)     δt(=,<)   carried by the j loop
      c(i, j) = ...
      ... = c(i-1, j)     δt(<,=)   carried by the i loop
    end do
  end do

The iterations of a loop may be executed in parallel with one
another if and only if no dependences are carried by the loop!

The outermost loop with a non-"=" direction carries the dependence!
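The carried-by rule is easy to mechanize: scan the direction vector for the first non-"=" entry. A sketch (my code), using the three dependences of the nest above:

```python
def carrier(direction):
    """Index of the loop carrying this dependence (0 = outermost),
    or None if the dependence is loop-independent."""
    for level, d in enumerate(direction):
        if d != "=":
            return level
    return None

def parallel_loops(deps, depth):
    """Loops that carry no dependence and may run in parallel."""
    carried = {carrier(d) for d in deps}
    return [level for level in range(depth) if level not in carried]

deps = [("=", "="),   # a(i,j): loop-independent
        ("=", "<"),   # b(i,j-1): carried by the j loop (level 1)
        ("<", "=")]   # c(i-1,j): carried by the i loop (level 0)

for d in deps:
    print(d, "carried by loop", carrier(d))
print(parallel_loops(deps, 2))  # -> []: both loops carry a dependence
```

With all three dependences present, neither loop of the nest is parallel; drop the b or c statement pair and the corresponding loop becomes parallelizable.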
Loop Parallelization - Example

[Fork/join diagrams: one forks iterations i = 2, ..., n-1 in
parallel; the other forks iterations j = 2, ..., m-1 in parallel.]

In the first case, iterations of loop j must be executed
sequentially, but the iterations of loop i may be executed in
parallel. In the second, iterations of loop i must be executed
sequentially, but the iterations of loop j may be executed in
parallel.
Loop Interchange

Loop interchange changes the order of the loops to improve the
spatial locality of a program:

  do j = 1, n               do i = 1, n
    do i = 1, n               do j = 1, n
      ... a(i,j) ...            ... a(i,j) ...
    end do                    end do
  end do                    end do

[Figure: traversal of a(i,j) through the memory (M), cache (C),
and processor (P) hierarchy before and after interchange.]

Loop interchange can improve the granularity of parallelism!

  do i = 1, n               do j = 1, n
    do j = 1, n               do i = 1, n
      a(i,j) = b(i,j)           a(i,j) = b(i,j)
      c(i,j) = a(i-1,j)         c(i,j) = a(i-1,j)
    end do                    end do
  end do                    end do

The true dependence δt(<,=) is carried by the i loop; after
interchange it becomes δt(=,<), carried by the inner loop, so
the new outer loop (j) can run in parallel.
Loop Interchange

[Figure: iteration-space diagrams before and after interchange.]

When is loop interchange legal? When the "interchanged"
dependences remain lexicographically positive!
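The legality rule can be sketched directly (my code, assuming a depth-2 nest): swap the two entries of every direction vector and check that each result is still lexicographically positive, i.e. its first non-"=" entry is "<".

```python
def lex_positive(direction):
    """True iff the direction vector is lexicographically positive
    (first non-"=" entry is "<") or all-"=" (loop-independent)."""
    for d in direction:
        if d == "<":
            return True
        if d == ">":
            return False
    return True

def interchange_legal(deps):
    """Interchange of a depth-2 nest is legal iff every swapped
    direction vector stays lexicographically positive."""
    return all(lex_positive((dj, di)) for di, dj in deps)

print(interchange_legal([("<", "=")]))  # True:  (=,<) is still positive
print(interchange_legal([("<", ">")]))  # False: (>,<) is not
```

The second case is the classic illegal interchange: a dependence δ(<,>) would have its sink execute before its source after the swap.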