
C-10 ■ Appendix C Pipelining: Basic and Intermediate Concepts

also contributes to the lower limit on the clock cycle. Once the clock cycle is as
small as the sum of the clock skew and latch overhead, no further pipelining is
useful, because there is no time left in the cycle for useful work. The interested
reader should see Kunkel and Smith (1986).

Example Consider the unpipelined processor in the previous section. Assume that it has a
4 GHz clock (or a 0.5 ns clock cycle) and that it uses four cycles for ALU operations
and branches and five cycles for memory operations. Assume that the relative
frequencies of these operations are 40%, 20%, and 40%, respectively. Suppose that
due to clock skew and setup, pipelining the processor adds 0.1 ns of overhead to
the clock. Ignoring any latency impact, how much speedup in the instruction
execution rate will we gain from a pipeline?

Answer The average instruction execution time on the unpipelined processor is

Average instruction execution time = Clock cycle × Average CPI
                                   = 0.5 ns × ((40% + 20%) × 4 + 40% × 5)
                                   = 0.5 ns × 4.4
                                   = 2.2 ns

In the pipelined implementation, the clock must run at the speed of the slowest
stage plus overhead, which will be 0.5 + 0.1 or 0.6 ns; this is the average instruction
execution time. Thus, the speedup from pipelining is
Speedup from pipelining = Average instruction time unpipelined / Average instruction time pipelined
                        = 2.2 ns / 0.6 ns
                        = 3.7 times
The 0.1 ns overhead essentially establishes a limit on the effectiveness of pipelin-
ing. If the overhead is not affected by changes in the clock cycle, Amdahl’s Law
tells us that the overhead limits the speedup.
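The arithmetic in this Example can be reproduced in a short script (the variable names are ours, chosen for illustration):

```python
# Speedup from pipelining the processor in the Example above.
# Operation mix: 40% ALU, 20% branch (4 cycles each), 40% memory (5 cycles).

unpipelined_cycle_ns = 0.5       # 4 GHz clock
pipeline_overhead_ns = 0.1       # clock skew and setup

avg_cpi = (0.40 + 0.20) * 4 + 0.40 * 5                    # 4.4 cycles/instruction
unpipelined_time_ns = unpipelined_cycle_ns * avg_cpi      # 2.2 ns
pipelined_time_ns = unpipelined_cycle_ns + pipeline_overhead_ns  # 0.6 ns

speedup = unpipelined_time_ns / pipelined_time_ns
print(f"{speedup:.1f} times")    # 3.7 times
```

Note how the 0.1 ns overhead appears only in the pipelined cycle time: shrinking the logic per stage cannot shrink that term, which is exactly the limit the text describes.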

This simple RISC pipeline would function just fine for integer instructions if
every instruction were independent of every other instruction in the pipeline. In
reality, instructions in the pipeline can depend on one another; this is the topic
of the next section.

C.2 The Major Hurdle of Pipelining—Pipeline Hazards


There are situations, called hazards, that prevent the next instruction in the instruc-
tion stream from executing during its designated clock cycle. Hazards reduce the
performance from the ideal speedup gained by pipelining. There are three classes
of hazards:

1. Structural hazards arise from resource conflicts when the hardware cannot
support all possible combinations of instructions simultaneously in overlapped
execution. In modern processors, structural hazards occur primarily in
special-purpose functional units that are less frequently used (such as
floating-point divide or other complex long-running instructions). They are not
a major performance factor, assuming programmers and compiler writers are
aware of the lower throughput of these instructions. Instead of spending more
time on this infrequent case, we focus on the two other hazards that are much
more frequent.
2. Data hazards arise when an instruction depends on the results of a previous
instruction in a way that is exposed by the overlapping of instructions in the
pipeline.
3. Control hazards arise from the pipelining of branches and other instructions
that change the PC.

Hazards in pipelines can make it necessary to stall the pipeline. Avoiding a
hazard often requires that some instructions in the pipeline be allowed to proceed
while others are delayed. For the pipelines we discuss in this appendix, when an
instruction is stalled, all instructions issued later than the stalled
instruction—and hence not as far along in the pipeline—are also stalled.
Instructions issued earlier than the stalled instruction—and hence farther along
in the pipeline—must continue, because otherwise the hazard will never clear. As
a result, no new instructions are fetched during the stall. We will see several
examples of how pipeline stalls operate in this section—don’t worry, they aren’t
as complex as they might sound!
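Those stall semantics can be sketched with a tiny timing model (the model and names here are ours, not from the text): a stall before instruction k delays k and every later instruction by the same number of bubble cycles, while earlier instructions drain normally.

```python
def completion_cycles(n_instructions, depth, stall_at, bubbles):
    """Cycle in which each instruction leaves an in-order pipeline.

    Without stalls, instruction i (0-based) finishes in cycle depth + i.
    A stall of `bubbles` cycles before instruction `stall_at` delays that
    instruction and all later ones; earlier instructions are unaffected.
    """
    return [depth + i + (bubbles if i >= stall_at else 0)
            for i in range(n_instructions)]

# Four instructions in a 5-stage pipeline, one bubble before instruction 2:
print(completion_cycles(4, depth=5, stall_at=2, bubbles=1))
# [5, 6, 8, 9] -- instructions 0 and 1 proceed; 2 and 3 each slip one cycle
```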

Performance of Pipelines With Stalls


A stall causes the pipeline performance to degrade from the ideal performance.
Let’s look at a simple equation for finding the actual speedup from pipelining,
starting with the formula from the previous section:

Speedup from pipelining = Average instruction time unpipelined / Average instruction time pipelined

                        = (CPI unpipelined × Clock cycle unpipelined) / (CPI pipelined × Clock cycle pipelined)

                        = (CPI unpipelined / CPI pipelined) × (Clock cycle unpipelined / Clock cycle pipelined)
Pipelining can be thought of as decreasing the CPI or the clock cycle time. Because
it is traditional to use the CPI to compare pipelines, let’s start with that assumption.
The ideal CPI on a pipelined processor is almost always 1. Hence, we can compute
the pipelined CPI:
CPI pipelined = Ideal CPI + Pipeline stall clock cycles per instruction
              = 1 + Pipeline stall clock cycles per instruction

If we ignore the cycle time overhead of pipelining and assume that the stages
are perfectly balanced, then the cycle time of the two processors can be equal,
leading to
Speedup = CPI unpipelined / (1 + Pipeline stall cycles per instruction)
One important simple case is where all instructions take the same number of cycles,
which must also equal the number of pipeline stages (also called the depth of the
pipeline). In this case, the unpipelined CPI is equal to the depth of the pipeline,
leading to
Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)
If there are no pipeline stalls, this leads to the intuitive result that pipelining can
improve performance by the depth of the pipeline.
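These relationships are simple enough to express directly (a sketch; the function name is ours):

```python
def pipeline_speedup(pipeline_depth, stall_cycles_per_instruction):
    """Speedup over an unpipelined processor, assuming perfectly balanced
    stages, no cycle-time overhead, and an unpipelined CPI equal to the
    pipeline depth (the simple case described above)."""
    pipelined_cpi = 1 + stall_cycles_per_instruction  # ideal CPI is 1
    return pipeline_depth / pipelined_cpi

print(pipeline_speedup(5, 0))   # 5.0 -- no stalls: speedup equals the depth
print(pipeline_speedup(5, 1))   # 2.5 -- one stall cycle per instruction halves it
```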

Data Hazards
A major effect of pipelining is to change the relative timing of instructions by
overlapping their execution. This overlap introduces data and control hazards.
Data hazards occur when the pipeline changes the order of read/write accesses to
operands so that the order differs from the order seen by sequentially executing
instructions on an unpipelined processor. Assume instruction i occurs in program
order before instruction j and that both instructions use register x; then there
are three different types of hazards that can occur between i and j:

1. Read After Write (RAW) hazard: the most common, these occur when a read of
register x by instruction j occurs before the write of register x by
instruction i. If this hazard were not prevented, instruction j would use the
wrong value of x.
2. Write After Read (WAR) hazard: this hazard occurs when a read of register x by
instruction i occurs after a write of register x by instruction j. In this case,
instruction i would use the wrong value of x. WAR hazards are impossible
in the simple five-stage, integer pipeline, but they occur when instructions
are reordered, as we will see when we discuss dynamically scheduled pipelines
beginning on page C.65.
3. Write After Write (WAW) hazard: this hazard occurs when a write of register x by
instruction i occurs after a write of register x by instruction j. When this
occurs, register x will have the wrong value going forward. WAW hazards are also
impossible in the simple five-stage, integer pipeline, but they occur when
instructions are reordered or when running times vary, as we will see later.

Chapter 3 explores the issues of data dependence and hazards in much more detail.
For now, we focus only on RAW hazards.
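The three cases above can be sketched as a small classifier over each instruction's register read and write sets (an illustrative model, not code from the text; i precedes j in program order):

```python
def classify_hazards(i_reads, i_writes, j_reads, j_writes):
    """Return the hazard types possible between instruction i (earlier in
    program order) and instruction j (later), given their register sets."""
    hazards = []
    if i_writes & j_reads:
        hazards.append("RAW")   # j reads a value i must produce first
    if i_reads & j_writes:
        hazards.append("WAR")   # j must not overwrite x before i reads it
    if i_writes & j_writes:
        hazards.append("WAW")   # the writes must complete in program order
    return hazards

# i: add x1, x2, x3   (reads x2, x3; writes x1)
# j: sub x4, x1, x5   (reads x1, x5; writes x4)
print(classify_hazards({"x2", "x3"}, {"x1"}, {"x1", "x5"}, {"x4"}))  # ['RAW']
```

Whether a possible hazard actually causes trouble depends on the pipeline: as the text notes, only RAW hazards can occur in the simple five-stage pipeline, since it issues and writes back in order.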
