KEMBAR78
L14 - Parallelization | PDF | Parallel Computing | Linear Programming
0% found this document useful (0 votes)
16 views17 pages

L14 - Parallelization

Uploaded by

李建霖
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views17 pages

L14 - Parallelization

Uploaded by

李建霖
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

Lecture on Parallelization

I. Basic Parallelization
II. Data dependence analysis
III. Interprocedural parallelization

Chapter 11.1-11.1.4

Carnegie Mellon
M. Lam & S. Liao CS243: Parallelization 1
Parallelization of Numerical Applications
• DoAll loop parallelism
– Find loops whose iterations are independent
– Number of iterations typically scales with the problem
– Usually much larger than the number of processors in a machine
– Divide up iterations across machines

Carnegie Mellon
CS243: Parallelization 2 M. Lam & S. Liao
Basic Parallelism
Examples:
FOR i = 1 to 100
A[i] = B[i] + C[i]

FOR i = 11 TO 20
a[i] = a[i-1] + 3

FOR i = 11 TO 20
a[i] = a[i-10] + 3

• Does there exist a data dependence edge between two different


iterations?
• A data dependence edge is loop-carried if
it crosses iteration boundaries
• DoAll loops: loops without loop-carried dependences

Carnegie Mellon
CS243: Parallelization 3 M. Lam & S. Liao
Recall: Data Dependences
• True dependence:
a =
= a
• Anti-dependence:

= a
a =
• Output dependence

a =
a =

Carnegie Mellon
CS243: Parallelization 4 M. Lam & S. Liao
Affine Array Accesses
• Common patterns of data accesses: (i, j, k are loop indexes)
A[i], A[j], A[i-1], A[0], A[i+j], A[2*i], A[2*i+1]

A[i,j], A[i-1, j+1]

• Array indexes are affine expressions of surrounding loop indexes


– Loop indexes: in, in-1, ... , i1
– Integer constants: cn, cn-1, ... , c0
– Array index: cnin + cn-1in-1+ ... + c1i1+ c0
– Affine expression: linear expression + a constant term (c0)

Carnegie Mellon
CS243: Parallelization 5 M. Lam & S. Liao
II. Formulating Data Dependence Analysis
FOR i := 2 to 5 do
A[i-2] = A[i]+1;

• Between read access A[i] and write access A[i-2]there is a


dependence if:
– there exist two iterations ir and iw within the loop bounds, s.t.
– iterations ir & iw read & write the same array element, respectively

∃integers i w, i r 2 ≤ i w, i r ≤ 5 i r = iw - 2

• Between write access A[i-2] and write access A[i-2] there is a


dependence if:
∃integers i w, i v 2 ≤ i w, i v ≤ 5 i w – 2 = iv - 2

– To rule out the case when the same instance depends on itself:
• add constraint iw ≠ iv

Carnegie Mellon
CS243: Parallelization 6 M. Lam & S. Liao
Memory Disambiguation

is
Undecidable at Compile Time

read(n)
For i =
a[i] = a[n]

Carnegie Mellon
CS243: Parallelization 7 M. Lam & S. Liao
Domain of Data Dependence Analysis
• Only use loop bounds and array indexes that are affine functions of loop
variables
for i = 1 to n
for j = 2i to 100
a[i+2j+3][4i+2j][i*i] = …
… = a[1][2i+1][j]
• Assume a data dependence between the read & write operation if there
exists:
– a read instance with indexes ir, jr and
– a write instance with indexes iw, jw
∃integers ir, jr, iw, jw 1 ≤ iw, ir ≤ n 2iw ≤ jw ≤ 100
2ir ≤ jr ≤ 100
iw + 2jw + 3 = 1 4iw + 2jw = 2ir + 1
• Equate each dimension of array access; ignore non-affine ones
– No solution → No data dependence
– Solution → there may be a dependence

Carnegie Mellon
CS243: Parallelization 8 M. Lam & S. Liao
Complexity of Data Dependence Analysis
For every pair of accesses not necessarily distinct (F1 , f1 ) and (F2, f2)
one must be a write operation
Let B1i1+b1 ≥ 0, B2i2+b2 ≥ 0 be the corresponding loop bound
constraints,
∃ integers i1, i2 B1i1 + b1 ≥ 0, B2i2 + b2 ≥ 0
F1 i1+ f1 = F2 i2+f2

• If the accesses
Equivalent are not
to integer distinct,
linear then add the constraint i 1 ≠ i2
programming

∃ integer i A 1i ≤ b 1 A 2i = b 2

• Integer linear programming is NP-complete


– O(size of the coefficients) or O(nn)

Carnegie Mellon
CS243: Parallelization 9 M. Lam & S. Liao
Data Dependence Analysis Algorithm

• Typically solving many tiny, repeated problems


– Integer linear programming packages optimize for large problems
– Use memoization to remember the results of simple tests

• Apply a series of relatively simple tests


– GCD: 2*i, 2*i+1; GCD for simultaneous equations
– Test if the ranges overlap

• Backed up by a more expensive algorithm


– Use Fourier-Motzkin Elimination to test if there is a real solution
• Keep eliminating variables to see if a solution remains
• If there is no solution, then there is no integer solution

Carnegie Mellon
10 M. Lam & S. Liao
Fourier-Motzkin Elimination

• To eliminate a variable from a set of linear inequalities.


• To eliminate a variable x1
– Rewrite all expressions in terms of lower or upper bounds of x1
– Create a transitive constraint for each pair of lower and upper bounds.
• Example: Let L, U be lower bounds and upper bounds resp
– To eliminate x1:

L1(x2, …, xn) ≤ U1 (x2, …, xn)


L1(x2, …, xn) ≤ x1 ≤ U1 (x2, …, xn) L1(x2, …, xn) ≤ U2 (x2, …, xn)
L2(x2, …, xn) ≤ x1 ≤ U2 (x2, …, xn) L2(x2, …, xn) ≤ U1 (x2, …, xn)
L2(x2, …, xn) ≤ U2 (x2, …, xn)

Carnegie Mellon
11 M. Lam & S. Liao
Example
FOR i = 1 to 5
FOR j = i+1 to 5
A[i,j] = f(A[i,i], A[i-1,j])

1≤i 1 ≤ i’
i≤5 i’ ≤ 5
i+1 ≤ j i’ + 1 ≤ j’
j≤5 j’ ≤ 5

Carnegie Mellon
CS243: Parallelization 12 M. Lam & S. Liao
Data Dependence Analysis Algorithm
• Typically solving many tiny, repeated problems
– Integer linear programming packages optimize for large problems
– Use memoization to remember the results of simple tests

• Apply a series of relatively simple tests


– GCD: 2*i, 2*i+1; GCD for simultaneous equations
– Test if the ranges overlap

• Backed up by a more expensive algorithm


– Use Fourier-Motzkin Elimination to test if there is a real solution
• Keep eliminating variables to see if a solution remains
• Add heuristics to encourage finding an integer solution.
– Create 2 subproblems if a real, but not integer, solution is found.
• For example, if x = .5 is a solution,
create two problems,
by adding x ≤ 0 and x ≥ 1 respectively to original constraint.

Carnegie Mellon
13 M. Lam & S. Liao
Relaxing Dependences
Privatization:
• Scalar
for i = 1 to n
t = (A[i] + B[i]) / 2;
C[i] = t * t;
• Array
for i = 1 to n
for j = 1 to n
t[j] = (A[i,j] + B[i,j]) / 2;
for j = 1 to n
C[i,j] = t[j] * t[j];
Reduction:
for i = 1 to n
sum = sum + A[i];

Carnegie Mellon
CS243: Parallelization 14 M. Lam & S. Liao
Carnegie Mellon
CS243: Parallelization 15 M. Lam & S. Liao
Interprocedural Parallelization
• Why? Amdahl’s Law
• Interprocedural symbolic analysis

– Find interprocedural array indexes


which are affine expressions of outer loop indices
• Interprocedural parallelization analysis

– Data dependence based on summaries of array regions accessed


• If the regions do not intersect, there is parallelism
– Find privatizable scalar variables and arrays
– Find scalar and array reductions

Carnegie Mellon
CS243: Parallelization 16 M. Lam & S. Liao
Conclusions
• Basic parallelization
– Doall loop: loops with no loop-carried data dependences
– Data dependence for affine loop indexes = integer linear programming

• Coarse-grain parallelism because of Amdahl’s Law


– Interprocedural analysis is useful for affine indices
– Ask users for help on unresolved dependences

Carnegie Mellon
CS243: Parallelization 17 M. Lam & S. Liao

You might also like