Amdahl’s Law
• If a fraction 1/s of the program is sequential, then the speedup can never exceed s.
– (Normalized) sequential execution time = 1/s + (1 - 1/s) = 1
– Best parallel execution time on p processors = 1/s + (1 - 1/s)/p
– As p goes to infinity, parallel execution time approaches 1/s
– Maximum speedup = 1 / (1/s) = s
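The limiting behavior can be checked numerically; a minimal sketch in C (the choice s = 10 is illustrative, not from the slides):

```c
#include <stdio.h>

/* Amdahl's Law: with a sequential fraction 1/s, the speedup on p
 * processors is 1 / (1/s + (1 - 1/s)/p), which approaches s as p
 * grows without bound. */
double speedup(double s, double p) {
    double seq = 1.0 / s;                  /* sequential fraction */
    return 1.0 / (seq + (1.0 - seq) / p);
}

/* Print the speedup for growing processor counts. */
void table(void) {
    double s = 10.0;   /* illustrative: 1/10 of the program is sequential */
    for (double p = 1.0; p <= 1e6; p *= 10.0)
        printf("p = %8.0f  speedup = %6.3f\n", p, speedup(s, p));
    /* the printed speedups climb toward, but never reach, s = 10 */
}
```

Calling table() shows the speedup saturating near 10 even with a million processors.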
Why keep something sequential?
● Some parts of the program are not parallelizable (because of dependencies).
● Some parts may be parallelizable, but the parallelization overhead dwarfs the speedup gained.
How could two statements execute in parallel?
● On one processor:
statement1;
statement2;
● On two processors:
processor1:      processor2:
statement1;      statement2;
Fundamental Assumption
● Processors execute independently: we have no control over the order of execution among processors.
How could two statements execute in parallel?
• Possibility 1
Processor1:      Processor2:
statement1;
                 statement2;
• Possibility 2
Processor1:      Processor2:
                 statement2;
statement1;
How could two statements execute in parallel?
● Their order of execution must not matter!
● In other words,
statement1; statement2;
must be equivalent to
statement2; statement1;
Example 1
a = 1;
b = a;
● Statements cannot be executed in parallel: the second reads the value written by the first.
● Program modifications may make parallel execution possible.
Example 2
a = f(x);
b = a;
● Same dependence, but it may not be wise to change the program: recomputing f(x) for b could be expensive.
Example 3
a = 1;
a = 2;
● Statements cannot be executed in parallel: both write a, and the final value must come from the second write.
Dependencies
● What prevents us from parallelizing the previous examples is the dependency between the statements.
Types of Dependencies
● True dependency
● Anti-dependency
● Output dependency
True dependence
Statements S1, S2.
S2 has a true dependence on S1 iff
S2 reads a value written by S1.
Anti-dependence
Statements S1, S2.
S2 has an anti-dependence on S1 iff
S2 writes a value read by S1.
Output Dependence
Statements S1, S2.
S2 has an output dependence on S1 iff
S2 writes a variable written by S1.
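All three definitions can be seen in one straight-line snippet; a sketch in C (the statement labels S1–S4 are illustrative):

```c
#include <assert.h>

/* Straight-line code exhibiting all three dependence types. */
int dependence_demo(void) {
    int x = 1, a, b;

    a = x + 1;  /* S1: reads x, writes a                           */
    b = a;      /* S2: reads a written by S1  -> true dependence   */
    x = 2;      /* S3: writes x read by S1    -> anti-dependence   */
    b = 5;      /* S4: writes b written by S2 -> output dependence */

    /* Reordering any dependent pair would change these values. */
    assert(a == 2 && b == 5 && x == 2);
    return a + b + x;   /* 2 + 5 + 2 = 9 */
}
```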
How could two statements execute in parallel?
S1 and S2 can execute in parallel iff
there are no dependencies between S1 and S2:
– true dependency
– anti-dependency
– output dependency
Some dependencies can be removed.
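Anti- and output dependences are conflicts on storage names, not on values, so they can often be removed by renaming; a sketch (x2 is a hypothetical fresh variable introduced for illustration):

```c
/* Before renaming:
 *     a = x + 1;   // S1 reads x
 *     x = 2;       // S2 writes x: anti-dependence, order matters
 * After renaming, S2 writes a fresh variable, the two statements are
 * independent, and later uses of x must be renamed to x2. */
int renaming_demo(void) {
    int x = 1, a, x2;
    a  = x + 1;   /* S1 */
    x2 = 2;       /* S2, renamed: no dependence on S1 remains */
    return a + x2;   /* 2 + 2 = 4 regardless of execution order */
}
```

A true dependence cannot be removed this way: it expresses a real flow of values.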
Example 4
● Most parallelism occurs in loops.
for(i=0; i<100; i++)
a[i] = i;
● No dependencies.
● Iterations can be executed in parallel.
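Since no iteration touches another's element, the loop can be handed to multiple processors. One common way to express this in C is an OpenMP pragma (an assumption here; the slides do not prescribe a mechanism, and the pragma is ignored when OpenMP is not enabled):

```c
int a[100];

/* Each iteration writes a distinct a[i], so the iterations may run
 * in any order, or simultaneously on different processors.
 * Compile with -fopenmp (gcc/clang) to actually run in parallel. */
void fill(void) {
    #pragma omp parallel for
    for (int i = 0; i < 100; i++)
        a[i] = i;
}
```

After fill(), a[i] == i for every i, regardless of how iterations were scheduled.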
Example 5
for(i=0; i<100; i++) {
a[i] = i;
b[i] = 2*i;
}
Iterations and statements can be executed in parallel.
Example 6
for(i=0;i<100;i++) a[i] = i;
for(i=0;i<100;i++) b[i] = 2*i;
Iterations and loops can be executed in parallel.
Example 7
for(i=0; i<100; i++)
a[i] = a[i] + 100;
● There is a dependence of a[i] on itself, but it stays within a single iteration.
● Loop is still parallelizable: no iteration touches another iteration's element.
Example 8
for( i=1; i<100; i++ )
a[i] = f(a[i-1]);
● Each iteration reads a[i-1], written by the previous iteration.
● Loop iterations are not parallelizable.
Loop-carried dependence
● A loop-carried dependence is a dependence between different iterations of a loop: it is present only when the statements execute as part of the loop.
● Otherwise, we call it a loop-independent dependence.
● Loop-carried dependencies prevent loop iteration parallelization.
Example 9
for(i=0; i<100; i++ )
for(j=1; j<100; j++ )
a[i][j] = f(a[i][j-1]);
● The dependence on a[i][j-1] is not carried by the i loop.
● It is a loop-carried dependence on the j loop.
● The outer (i) loop can be parallelized; the inner (j) loop cannot.
Example 10
for( j=1; j<100; j++ )
for( i=0; i<100; i++ )
a[i][j] = f(a[i][j-1]);
● The inner (i) loop can be parallelized; the outer (j) loop cannot.
● Less desirable situation: parallel work must be started and synchronized on every outer iteration.
● Loop interchange is sometimes possible.
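For Example 10, interchanging the two loops yields the form of Example 9; a sketch with a stand-in pure function f (assumption: f has no side effects, otherwise the interchange may be illegal):

```c
static double f(double x) { return x + 1.0; }   /* stand-in for f */
static double a[100][100];                      /* zero-initialized */

/* Example 10 after loop interchange: i is now outermost and carries
 * no dependence, so the OUTER loop is parallelizable; the j loop
 * still carries the dependence and must stay sequential. */
double interchange_demo(void) {
    for (int i = 0; i < 100; i++)        /* parallelizable */
        for (int j = 1; j < 100; j++)    /* sequential chain */
            a[i][j] = f(a[i][j-1]);
    return a[0][99];   /* 99 applications of f starting from a[0][0] = 0 */
}
```

The interchange is legal here because the only dependence (along j) is preserved in each row.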
Level of loop-carried dependence
● The level of a loop-carried dependence is the nesting depth of the loop that carries it.
● It indicates which loops can be parallelized.
Be careful … Example 11
printf("a");
printf("b");
Statements have a hidden output dependence
due to the output stream.
Be careful … Example 12
a = f(x);
b = g(x);
Statements could have a hidden dependence
if f and g update the same variable.
It also depends on what f and g can do to x.
Be careful … Example 13
for(i=0; i<100; i++)
a[i+10] = f(a[i]);
● Dependence chain a[0] → a[10] → a[20] → …
● Dependence chain a[1] → a[11] → a[21] → …
● … and so on, one chain per index mod 10.
● Some parallel execution is possible.
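The dependence distance is 10, so any 10 consecutive iterations touch disjoint chains; a sketch that blocks the loop accordingly (f is a stand-in pure function):

```c
static double f(double x) { return x + 1.0; }   /* stand-in for f */
static double a[110];                           /* zero-initialized */

/* Example 13 blocked: a[i+10] depends on a[i] (distance 10), so the
 * 10 iterations inside one block are mutually independent and could
 * run on 10 processors; the blocks themselves must run in order. */
void blocked(void) {
    for (int b = 0; b < 100; b += 10)        /* sequential */
        for (int i = b; i < b + 10; i++)     /* parallelizable */
            a[i + 10] = f(a[i]);
}
```

After blocked(), a[10k] holds k for k = 1..10: each chain advanced one step per block.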
Be careful … Example 14
for( i=1; i<100;i++ ) {
a[i] = …;
... = a[i-1];
}
● Dependence between a[i] and a[i-1].
● Complete parallel execution impossible.
● Pipelined parallel execution possible.
Be careful … Example 15
for( i=0; i<100; i++ )
a[i] = f(a[indexa[i]]);
We cannot tell for sure whether the iterations are independent.
● Parallelization depends on knowledge of the values in indexa[].
● The user can tell; the compiler cannot.
Optimizations: Example 16
for (i = 0; i < 100000; i++)
a[i + 1000] = a[i] + 1;
Cannot be parallelized as it is: iteration i + 1000 reads the value written by iteration i.
May be parallelized by applying certain code transformations.
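One such transformation: the dependence a[i+1000] ← a[i] links only indices that are equal mod 1000, so the loop splits into 1000 independent chains; a sketch (each chain keeps its original update order):

```c
static int a[101000];   /* zero-initialized; indices 0 .. 100999 */

/* Example 16 restructured into 1000 independent chains.  Chain c
 * updates a[c+1000], a[c+2000], ... in order; chains never touch
 * each other's elements, so the outer loop is parallelizable. */
void chains(void) {
    for (int c = 0; c < 1000; c++)        /* parallelizable */
        for (int k = 0; k < 100; k++)     /* sequential within a chain */
            a[c + (k + 1) * 1000] = a[c + k * 1000] + 1;
}
```

The two loops together perform the original 100,000 updates, just grouped by chain instead of by i.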
An aside
● Parallelizing compilers analyze program dependencies to decide parallelization.
● In parallelization by hand, the user does the same analysis.
● The compiler is more convenient and less error-prone.
● The user is more powerful and can analyze more patterns.
To remember
● Statement order must not matter.
● Statements must not have dependencies.
● Some dependencies can be removed.
● Some dependencies may not be obvious.