PIPELINE AND VECTOR PROCESSING
Parallel processing:
• Parallel processing is a term used for a large class of techniques that
are used to provide simultaneous data-processing tasks for the purpose of increasing the
computational speed of a computer system.
It refers to techniques that are used to provide simultaneous data processing.
The system may have two or more ALUs to be able to execute two or more
instruction at the same time.
The system may have two or more processors operating concurrently.
It can be achieved by having multiple functional units that perform same or different
operation simultaneously.
• Example of parallel Processing:
– Multiple Functional Unit:
Separate the execution unit into eight functional units operating in parallel.
There are variety of ways in which the parallel processing can be classified
Internal Organization of Processor
Interconnection structure between processors
Flow of information through system
19
UNIT-V
Architectural Classification:
– Flynn's classification
» Based on the multiplicity of Instruction Streams and Data Streams
» Instruction Stream
• Sequence of Instructions read from memory
» Data Stream
• Operations performed on the data in the processor
SISD represents the organization containing single control unit, a processor unit and a
memory unit. Instruction are executed sequentially and system may or may not have
internal parallel processing capabilities.
SIMD represents an organization that includes many processing units under the
supervision of a common control unit.
MISD structure is of only theoretical interest since no practical system has been
constructed using this organization.
MIMD organization refers to a computer system capable of processing several
programs at the same time.
The main difference between multicomputer system and multiprocessor system is that the
multiprocessor system is controlled by one operating system that provides interaction
between processors and all the component of the system cooperate in the solution of a
problem.
Parallel Processing can be discussed under following topics:
Pipeline Processing
Vector Processing
Array Processors
20
UNIT-V
PIPELINING:
• A technique of decomposing a sequential process into suboperations, with
each subprocess being executed in a special dedicated segment that operates
concurrently with all other segments.
• It is a technique of decomposing a sequential process into sub operations, with
each sub process being executed in a special dedicated segments that operates
concurrently with all other segments.
• Each segment performs partial processing dictated by the way task is
partitioned.
• The result obtained from each segment is transferred to next segment.
• The final result is obtained when data have passed through all segments.
• Suppose we have to perform the following task:
• Each sub operation is to be performed in a segment within a pipeline. Each segment
has one or two registers and a combinational circuit.
21
UNIT-V
OPERATIONS IN EACH PIPELINE STAGE:
• General Structure of a 4-Segment Pipeline
• Space-Time Diagram
The following diagram shows 6 tasks T1 through T6 executed in 4segments.
PIPELINE SPEEDUP:
Consider the case where a k-segment pipeline used to execute n tasks.
n = 6 in previous example
22
UNIT-V
k = 4 in previous example
• Pipelined Machine (k stages, n tasks)
The first task t1 requires k clock cycles to complete its operation since there
are k segments
The remaining n-1 tasks require n-1 clock cycles
The n tasks clock cycles = k+(n-1) (9 in previous example)
• Conventional Machine (Non-Pipelined)
Cycles to complete each task in nonpipeline = k
For n tasks, n cycles required is
• Speedup (S)
S = Nonpipeline time /Pipeline time
For n tasks: S = nk/(k+n-1)
As n becomes much larger than k-1; Therefore, S = nk/n = k
PIPELINE AND MULTIPLE FUNCTION UNITS:
Example:
- 4-stage pipeline
- 100 tasks to be executed
- 1 task in non-pipelined system; 4 clock cycles
Pipelined System : k + n - 1 = 4 + 99 = 103 clock cycles
Non-Pipelined System : n*k = 100 * 4 = 400 clock cycles
Speedup : Sk = 400 / 103 = 3.88
Types of Pipelining:
• Arithmetic Pipeline
• Instruction Pipeline
ARITHMETIC PIPELINE:
Pipeline arithmetic units are usually found in very high speed computers.
They are used to implement floating point operations.
23
UNIT-V
We will now discuss the pipeline unit for the floating point addition and subtraction.
The inputs to floating point adder pipeline are two normalized floating point numbers.
A and B are mantissas and a and b are the exponents.
The floating point addition and subtraction can be performed in four segments.
Floating-point adder:
[1] Compare the exponents
[2] Align the mantissa
[3] Add/sub the mantissa
[4] Normalize the result
X = A x 10a = 0.9504 x 103
Y = B x 10b = 0.8200 x 102
1) Compare exponents :
3-2=1
2) Align mantissas
X = 0.9504 x 103
Y = 0.08200 x 103
3) Add mantissas
Z = 1.0324 x 103
4) Normalize result
Z = 0.10324 x 104
24
UNIT-V
Instruction Pipeline:
Pipeline processing can occur not only in the data stream but in the instruction stream
as well.
An instruction pipeline reads consecutive instruction from memory while previous
instruction are being executed in other segments.
This caused the instruction fetch and execute segments to overlap and perform
simultaneous operation.
Four Segment CPU Pipeline:
FI segment fetches the instruction.
DA segment decodes the instruction and calculate the effective address.
FO segment fetches the operand.
EX segment executes the instruction.
25
UNIT-V
INSTRUCTION CYCLE:
Pipeline processing can occur also in the instruction stream. An instruction
pipeline reads consecutive instructions from memory while previous
instructions are being executed in other segments.
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
26
UNIT-V
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
* Some instructions skip some phases
* Effective address calculation can be done in the part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase
==> 4-Stage Pipeline
[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
Pipeline Conflicts :
– Pipeline Conflicts : 3 major difficulties
–
1) Resource conflicts: memory access by two segments at the same time. Most of these
conflicts can be resolved by using separate instruction and data memories.
2) Data dependency: when an instruction depend on the result of a previous instruction,
but this result is not yet available.
27
UNIT-V
Example: an instruction with register indirect mode cannot proceed to fetch the operand
if the previous instruction is loading the address into the register.
3) Branch difficulties: branch and other instruction (interrupt, ret, ..) that change the value
of PC.
Handling Data Dependency:
This problem can be solved in the following ways:
Hardware interlocks: It is the circuit that detects the conflict situation and
delayed the instruction by sufficient cycles to resolve the conflict.
Operand Forwarding: It uses the special hardware to detect the conflict and
avoid it by routing the data through the special path between pipeline
segments.
Delayed Loads: The compiler detects the data conflict and reorder the
instruction as necessary to delay the loading of the conflicting data by
inserting no operation instruction.
Handling of Branch Instruction:
Pre fetch the target instruction.
Branch target buffer(BTB) included in the fetch segment of the pipeline
Branch Prediction
Delayed Branch
RISC Pipeline:
Simplicity of instruction set is utilized to implement an instruction pipeline using
small number of sub-operation, with each being executed in single clock cycle.
Since all operation are performed in the register, there is no need of effective address
calculation.
Three Segment Instruction Pipeline:
I: Instruction Fetch
A: ALU Operation
E: Execute Instruction
Delayed Load:
28
UNIT-V
Delayed Branch:
Let us consider the program having the following 5 instructions
29
UNIT-V