Assign

Parallel processing is an efficient form of information processing. Pipelining is a technique where the microprocessor begins executing a second instruction before the first has been completed. Throughput is the number of results it produces per unit time.

Uploaded by

rakbatth
Copyright
© Attribution Non-Commercial (BY-NC)

Q1. Define: parallel processing, pipelining, throughput and scaling.

Ans. Parallel processing:- Parallel processing is an efficient form of
information processing which emphasizes the exploitation of concurrent
events in the computing process.

Pipelining:- A technique used in advanced microprocessors where the
microprocessor begins executing a second instruction before the first has been
completed. That is, several instructions are in the pipeline simultaneously, each
at a different processing stage.
The pipeline is divided into segments and each segment can execute its
operation concurrently with the other segments. When a segment completes
an operation, it passes the result to the next segment in the pipeline and
fetches the next operation from the preceding segment. The final results of
each instruction emerge at the end of the pipeline in rapid succession.
Pipelining is also called pipeline processing.

Throughput:- The throughput of a device is the number of results it produces
per unit time.

Scaling:- An architecture is scalable if it continues to yield the same
performance per processor, even when used on a larger problem size, as the
number of processors is increased.

Q2. What is the difference between control parallelism and data
parallelism?
Ans. Difference between control parallelism and data parallelism:-
Control parallelism: - It is achieved by applying different operations to
different data elements simultaneously.
Data parallelism: - It is achieved by applying a single operation to the
elements of a data set simultaneously.
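The two definitions above can be illustrated with a minimal Python sketch. The function names (square, total, largest) and the use of a thread pool are assumptions made only for illustration; Python threads show the structure of the two styles rather than a true hardware speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def total(values):
    return sum(values)


def largest(values):
    return max(values)


def data_parallel(values):
    """Data parallelism: one operation (square) applied to every element."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, values))


def control_parallel(first, second):
    """Control parallelism: different operations run at once on different data."""
    with ThreadPoolExecutor() as pool:
        sum_future = pool.submit(total, first)
        max_future = pool.submit(largest, second)
        return sum_future.result(), max_future.result()


if __name__ == "__main__":
    print(data_parallel([1, 2, 3, 4]))
    print(control_parallel([1, 2, 3], [7, 9, 8]))
```

In data_parallel the same function is mapped over the whole data set, while in control_parallel two different functions are submitted side by side on two different inputs.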

Q3. Explain the classification of computer architecture proposed by
Michael Flynn.
Ans. Currently, the most popular nomenclature for the classification of
computer architectures is that proposed by Flynn in 1966. Flynn chose not to
examine the explicit structure of the machine, but rather how instructions and
data flow through it. Specifically, the taxonomy identifies whether there are
single or multiple 'streams' for data and for instructions. The term 'stream'
refers to a sequence of either instructions or data operated on by the computer.
The four possible categories of Flynn's taxonomy are:
• SISD (Single Instruction Single Data)
• SIMD (Single Instruction Multiple Data)
• MISD (Multiple Instruction Single Data)
• MIMD (Multiple Instruction Multiple Data)
"The multiplicity is taken as the maximum possible number of simultaneous
operations [instructions] or operands [data] being in the same phase of
execution at the most constrained component of the organization" (Flynn,
1966).
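As a small illustration of how the four categories follow from the two stream counts, the sketch below maps a number of instruction streams and a number of data streams to a category name. The function name flynn_category is hypothetical and is not part of Flynn's paper.

```python
def flynn_category(instruction_streams, data_streams):
    """Return the Flynn category for the given stream counts.

    A count of 1 means a single stream; anything larger means multiple.
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


if __name__ == "__main__":
    for counts in [(1, 1), (1, 64), (4, 1), (16, 16)]:
        print(counts, flynn_category(*counts))
```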

1. SISD
SISD (Single Instruction, Single Data) is a term referring to a computer
architecture in which a single processor, a uniprocessor, executes a single
instruction stream, to operate on data stored in a single memory.
According to Michael J. Flynn, SISD can have concurrent processing
characteristics. Instruction fetching and pipelined execution of instructions are
common examples found in most modern SISD computers.

Examples of SISD machines include:
• CDC 6600, which is unpipelined but has multiple functional units.
• CDC 7600, which has a pipelined arithmetic unit.
• Amdahl 470V/6, which has pipelined instruction processing.
• Cray-1, which supports vector processing.

2. SIMD
SIMD (Single Instruction, Multiple Data; colloquially, "vector instructions") is a
technique employed to achieve data-level parallelism.
This category corresponds to the array processors.
Examples include: ILLIAC-IV, PEPE, BSP, STARAN, MPP, DAP and the
Connection Machine (CM-1).
3. MISD
MISD (Multiple Instruction, Single Data) is a type of parallel computing
architecture where many functional units perform different operations on the
same data. Pipeline architectures belong to this type, though a purist might say
that the data is different after processing by each stage in the pipeline.

The only known example of a computer capable of MISD operation is the
C.mmp built by Carnegie-Mellon University. This computer is reconfigurable
and can operate in SIMD, MISD and MIMD modes.
4. MIMD
MIMD (Multiple Instruction stream, Multiple Data stream) is a technique
employed to achieve parallelism. Machines using MIMD have a number of
processors that function asynchronously and independently. At any time,
different processors may be executing different instructions on different pieces
of data.
MIMD machines can be of either shared memory or distributed memory
categories.

Examples include: C.mmp, Burroughs D825, Cray-2, S1, Cray X-MP, HEP,
Pluribus, IBM 370/168 MP, Univac 1100/80, Tandem/16, IBM 3081/3084,
C.m*, BBN Butterfly, Meiko Computing Surface (CS-1), FPS T/40000, iPSC.

Q4. Explain the triple proposed by Wolfgang Handler as a classification
scheme for identifying the parallelism degree and pipelining degree.
Ans. In 1977 Handler proposed an elaborate notation for expressing the
pipelining and parallelism of computers. Handler's taxonomy addresses the
computer at three distinct levels: the processor control unit (PCU), the
arithmetic logic unit (ALU), and the bit-level circuit (BLC). The PCU
corresponds to a processor or CPU, the ALU corresponds to a functional unit
or a processing element in an array processor, and the BLC corresponds to
the logic needed to perform one-bit operations in the ALU.

Handler's taxonomy uses three pairs of integers to describe a computer:

Computer = (k * k', d * d', w * w')

where
k = number of PCUs
k'= number of PCUs that can be pipelined
d = number of ALUs controlled by each PCU
d'= number of ALUs that can be pipelined
w = number of bits in ALU or processing element(PE) word
w'= number of pipeline segments on all ALUs or in a single PE
The following rules and operators are used to show the relationship between
various elements of the computer. The '*' operator is used to indicate that the
units are pipelined or macro-pipelined with a stream of data running through all
the units. The '+' operator is used to indicate that the units are not pipelined but
work on independent streams of data. The 'v' operator is used to indicate that
the computer hardware can work in one of several modes. The '~' symbol is
used to indicate a range of values for any one of the parameters. Peripheral
processors are shown before the main processor using another three pairs of
integers. If the value of the second element of any pair is 1, it may be omitted for
brevity. Handler's taxonomy is best explained by showing how the rules and
operators are used to classify several machines.
Examples:
• The CDC 6600 has a single main processor supported by 10 I/O
processors. One control unit coordinates one ALU with a 60-bit word
length. The ALU has 10 functional units which can be formed into a
pipeline. The 10 peripheral I/O processors may work in parallel with
each other and with the CPU. Each I/O processor contains one 12-bit
ALU. The description for the 10 I/O processors is:

CDC 6600 I/O = (10, 1, 12)

The description for the main processor is:

CDC 6600 main = (1, 1 * 10, 60)

The main processor and the I/O processors can be regarded as forming a
macro-pipeline so the '*' operator is used to combine the two structures:

CDC 6600 = (I/O processors) * (central processor)
         = (10, 1, 12) * (1, 1 * 10, 60)
• Texas Instruments' Advanced Scientific Computer (ASC) has one
controller coordinating four arithmetic units. Each arithmetic unit is an
eight stage pipeline with 64-bit words. Thus we have:

ASC = (1, 4, 64 * 8)
• Another sample system is Carnegie-Mellon University's C.mmp
multiprocessor. This system was designed to facilitate research into
parallel computer architectures and consequently can be extensively
reconfigured. The system consists of 16 PDP-11 'minicomputers' (which
have a 16-bit word length), interconnected by a crossbar switching
network. Normally, the C.mmp operates in MIMD mode for which the
description is (16, 1, 16). It can also operate in SIMD mode, where all
the processors are coordinated by a single master controller. The SIMD
mode description is (1, 16, 16). Finally, the system can be rearranged to
operate in MISD mode. Here the processors are arranged in a chain
with a single stream of data passing through all of them. The MISD
mode description is (1 * 16, 1, 16). The 'v' operator is used to combine
descriptions of the same piece of hardware operating in differing modes.
Thus, Handler's description for the complete C.mmp is:

C.mmp = (16, 1, 16) v (1, 16, 16) v (1 * 16, 1, 16)


The '*' and '+' operators are used to combine several separate pieces of
hardware. The 'v' operator is of a different form to the other two in that it is
used to combine the different operating modes of a single piece of hardware.
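The notation rules above can be sketched in Python. The class name HandlerTriple and its field names are hypothetical choices for this illustration; the sketch only formats a description, omitting the second element of a pair when it is 1, and joins alternative modes with 'v'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlerTriple:
    k: int = 1       # number of PCUs
    k_pipe: int = 1  # k': number of PCUs that can be pipelined
    d: int = 1       # number of ALUs controlled by each PCU
    d_pipe: int = 1  # d': number of ALUs that can be pipelined
    w: int = 1       # number of bits in an ALU or PE word
    w_pipe: int = 1  # w': number of pipeline segments

    @staticmethod
    def _pair(value, pipelined):
        # The second element of a pair is omitted when it equals 1.
        return str(value) if pipelined == 1 else f"{value} * {pipelined}"

    def __str__(self):
        parts = [
            self._pair(self.k, self.k_pipe),
            self._pair(self.d, self.d_pipe),
            self._pair(self.w, self.w_pipe),
        ]
        return "(" + ", ".join(parts) + ")"


def modes(*triples):
    """Join descriptions of one machine in several modes with the 'v' operator."""
    return " v ".join(str(t) for t in triples)


cdc_main = HandlerTriple(k=1, d=1, d_pipe=10, w=60)
asc = HandlerTriple(k=1, d=4, w=64, w_pipe=8)
cmmp = modes(
    HandlerTriple(k=16, d=1, w=16),             # MIMD mode
    HandlerTriple(k=1, d=16, w=16),             # SIMD mode
    HandlerTriple(k=1, k_pipe=16, d=1, w=16),   # MISD mode
)
print(cdc_main, asc, cmmp, sep="\n")
```

The three printed lines reproduce the descriptions given in the examples for the CDC 6600 main processor, the ASC and the C.mmp.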

Q5. What is Amdahl's law?
Ans. Let f be the fraction of operations in a computation that must be
performed sequentially, where 0 ≤ f ≤ 1.
The maximum speedup s achievable by a parallel computer with p
processors performing the computation is: -
s ≤ 1 / (f + (1 - f)/p)
The amount of sequential operations in a problem is often independent of,
or increases slowly with, the problem size, but the amount of parallel operations
usually increases in direct proportion to the size of the problem. Amdahl's law is
based upon the idea that parallel processing is used to reduce the time in
which a problem of some particular size can be solved.
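A short numerical sketch of the bound, assuming the usual form s ≤ 1 / (f + (1 - f)/p); the function name amdahl_speedup is chosen here only for illustration.

```python
def amdahl_speedup(f, p):
    """Upper bound on speedup for sequential fraction f on p processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


if __name__ == "__main__":
    # With 10% sequential work the bound approaches 1/f = 10 as p grows.
    for p in (1, 10, 100, 1000):
        print(p, round(amdahl_speedup(0.1, p), 3))
```

Even with a very large number of processors, the speedup cannot exceed 1/f, which is why a small sequential fraction dominates for large p.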

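Q6. Discuss the speed up achieved by pipelining in ideal conditions.
Ans. The maximum speed up achieved is: -
Ts/Tp = mn / (n + (m - 1))
where m: - number of instructions
n: - number of clock cycles needed per instruction (one per pipeline stage)
Here Ts = mn is the time without pipelining and Tp = n + (m - 1) is the time
with pipelining. Dividing the numerator and the denominator by m gives:
Ts/Tp = n / (n/m + (m - 1)/m)
If m >> n, n/m is approximately equal to zero and (m - 1)/m is approximately
equal to one, so:
Ts/Tp = n (in ideal conditions)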
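The limit can be checked numerically with a minimal sketch, assuming m instructions, n clock cycles per instruction (one per stage) and no stalls; the function name pipeline_speedup is hypothetical.

```python
def pipeline_speedup(m, n):
    """Ideal speedup of an n-stage pipeline executing m instructions."""
    ts = m * n        # time without pipelining, in clock cycles
    tp = n + (m - 1)  # first result after n cycles, then one per cycle
    return ts / tp


if __name__ == "__main__":
    # The speedup approaches n = 5 as m grows.
    for m in (1, 10, 100, 10**6):
        print(m, round(pipeline_speedup(m, 5), 4))
```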
Q7. Define: vector operand, vector processor, array processor,
multiprocessor and vector computer.
Ans.
Vector operand: - It contains an ordered set of n elements, where n is
called the length of the vector. Each element in a vector is a scalar quantity,
which may be a floating point number, an integer, a logical value or a character.
Vector processor: - It streams vectors from memory to the CPU, where
pipelined arithmetic units manipulate them. It is also called a pipelined
vector processor.
Array processor: - A processor array is a vector computer implemented as
a sequential computer connected to a set of identical, synchronized processing
elements capable of simultaneously performing the same operation on different
data.
Multiprocessor: - Multiple-CPU computers consist of a number of fully
programmable processors, each capable of executing its own program.
Multiprocessors are multiple-CPU computers with a shared memory.
Vector computer: - A vector computer is a computer whose instruction set
includes operations on vectors as well as scalars.

Q8. What is the significance of reservation tables?
Ans.
A reservation table lists all resources used by the corresponding instruction
and, for each resource, describes the type and schedule of the accesses made
to that resource, and any relevant renaming and scheduling constraints
associated with that resource.
A reservation table has several rows and columns. Each row of the
reservation table represents one resource of the pipeline and each column
represents one time-slice of the pipeline. All the elements of the table are either
0 or 1. If a resource is used in a time-slice, then that entry of the table will
have the value 1. If a resource is not used in a particular time-slice, then that
entry of the table will have the value 0.
Example: -
Suppose that we have 4 resources and 6 time-slices and the usage of
resources is as follows:

resource 1 is used in time-slices 1, 3, 6.
resource 2 is used in time-slice 2.
resource 3 is used in time-slices 2, 4.
resource 4 is used in time-slice 5.
The corresponding reservation table will be as follows:

index       time-1  time-2  time-3  time-4  time-5  time-6
resource 1    1       0       1       0       0       1
resource 2    0       1       0       0       0       0
resource 3    0       1       0       1       0       0
resource 4    0       0       0       0       1       0

To make the table look simpler, the 0 entries are represented by blanks and the 1
entries are represented by an 'X'.

index       time-1  time-2  time-3  time-4  time-5  time-6
resource 1    X               X                       X
resource 2            X
resource 3            X               X
resource 4                                    X
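The construction of the table from the usage list can be sketched as follows. The function names and the dictionary layout are assumptions for illustration; a dot is printed instead of a blank so that the columns stay visible.

```python
def build_reservation_table(usage, num_slices):
    """Return {resource: [0 or 1 per time-slice]} from 1-based slice sets."""
    table = {}
    for resource, slices in usage.items():
        table[resource] = [1 if t in slices else 0
                           for t in range(1, num_slices + 1)]
    return table


def render(table):
    """Render each row with 'X' for a 1 entry and '.' for a 0 entry."""
    return "\n".join(
        f"{resource}  " + " ".join("X" if bit else "." for bit in row)
        for resource, row in table.items()
    )


usage = {
    "resource 1": {1, 3, 6},
    "resource 2": {2},
    "resource 3": {2, 4},
    "resource 4": {5},
}
table = build_reservation_table(usage, 6)
print(render(table))
```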

Q9. What are data and control hazards and methods to resolve them?
Ans.
Data hazards: Data hazards occur when instructions in the pipeline read or
modify the same data. Ignoring potential data hazards can result in race
conditions (sometimes known as race hazards).
There are three situations in which a data hazard can occur:
Read after Write (RAW) or True dependency: An operand is modified and
read soon after. Because the first instruction may not have finished writing to
the operand, the second instruction may use incorrect data.

Write after Read (WAR) or Anti dependency: Read an operand and write
soon after to that same operand. Because the write may have finished before
the read, the read instruction may incorrectly get the new written value.

Write after Write (WAW) or Output dependency: Two instructions that write
to the same operand are performed. The first one issued may finish second,
and therefore leave the operand with an incorrect data value.
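The three cases can be told apart by comparing the operands read and written by two instructions in program order. The sketch below is an illustration under an assumed representation (each instruction is a dict with 'reads' and 'writes' sets); it is not how any particular processor detects hazards.

```python
def classify_hazards(first, second):
    """Return the hazard types between two instructions in program order."""
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")  # second reads what first writes
    if first["reads"] & second["writes"]:
        hazards.add("WAR")  # second writes what first reads
    if first["writes"] & second["writes"]:
        hazards.add("WAW")  # both write the same operand
    return hazards


add = {"reads": {"r2", "r3"}, "writes": {"r1"}}  # r1 = r2 + r3
sub = {"reads": {"r1", "r5"}, "writes": {"r4"}}  # r4 = r1 - r5
mul = {"reads": {"r6", "r7"}, "writes": {"r2"}}  # r2 = r6 * r7
div = {"reads": {"r8", "r9"}, "writes": {"r1"}}  # r1 = r8 / r9

print(classify_hazards(add, sub))
print(classify_hazards(add, mul))
print(classify_hazards(add, div))
```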

Methods to resolve data hazards: - Once a hazard is detected, the system
should resolve the interlock situation. A straightforward approach is to stop
the pipe and to suspend the execution of the following instructions until the
earlier instruction has passed the point of resource conflict.
Another method is the short-circuiting approach or data forwarding approach,
which makes a copy of the data available to the dependent instruction directly,
without waiting for it to be written back.

Control Hazards: - Control hazards are delays due to branch instructions.
A branch disrupts the normal flow of control. The branch becomes known only
at the decode stage, by which time the next instruction has already been
fetched into the pipeline.

Methods to resolve control hazards: - One method is to tackle the problem at
the hardware level and the other method is to have software aids. The software
aid is to use the compiler to rearrange the statements in such a way that the
statement following the branch statement, which occupies what is called the
delay slot, is always executed once it is fetched, without affecting the
correctness of the program. This may not always be possible but analysis of
many programs shows that this technique succeeds quite often.

Q10. Explain how the short-circuiting (or data forwarding) approach helps
to resolve data hazards.
Ans.
Short-circuiting approach :-
This technique enhances the performance of computers with multiple
execution pipelines. Internal forwarding refers to a "short-circuit" technique
for replacing unnecessary memory accesses by register-to-register transfers
in a sequence of fetch-arithmetic-store operations. The computer performance
can be greatly enhanced if one can eliminate unnecessary memory accesses.

Q11. Explain how internal-forwarding and register-tagging techniques
help to resolve logic hazards.
Ans. Two techniques enhance the performance of computers with multiple
execution pipelines. Internal forwarding refers to a "short-circuit" technique
for replacing unnecessary memory accesses by register-to-register transfers
in a sequence of fetch-arithmetic-store operations.
Register tagging refers to the use of tagged registers, buffers and
reservation stations for exploiting concurrent activities among multiple
arithmetic units. Memory access is much slower than register-to-register
operations. The computer performance can be greatly enhanced if one can
eliminate unnecessary memory accesses and combine some transitive or
multiple fetch-store operations with faster register operations.
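One case of internal forwarding, a fetch that immediately follows a store to the same memory location, can be sketched as a rewrite on an instruction list. The tuple format ("store", address, register), ("fetch", register, address) and ("move", destination, source) is hypothetical and chosen only for this illustration; only adjacent store and fetch pairs are handled.

```python
def forward_store_fetch(program):
    """Replace a fetch that directly follows a store to the same address
    with a register-to-register move, removing one memory access."""
    result = []
    for instr in program:
        prev = result[-1] if result else None
        if (instr[0] == "fetch" and prev is not None
                and prev[0] == "store" and prev[1] == instr[2]):
            # The value just stored is still in the source register.
            result.append(("move", instr[1], prev[2]))
        else:
            result.append(instr)
    return result


program = [
    ("store", "M", "R1"),  # M  <- R1
    ("fetch", "R2", "M"),  # R2 <- M
    ("fetch", "R3", "N"),  # R3 <- N (unrelated, left unchanged)
]
print(forward_store_fetch(program))
```

After the rewrite the second instruction becomes a move from R1 to R2, so the store still updates memory but the fetch no longer needs to read it.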

Q12. What are the various difficulties in pipelining?
Ans. Various difficulties in pipelining are: -

Stage   Problem exceptions occurring
IF      Page fault on instruction fetch; misaligned memory access;
        memory-protection violation
ID      Undefined or illegal opcode
EX      Arithmetic interrupts
MEM     Page fault on data fetch; misaligned memory access;
        memory-protection violation

Q13. List the various types of SIMD interconnection networks.
Ans. SIMD interconnection networks are of two types: -
(1) Static network: - In this, the data routing network is fixed.
(2) Dynamic network: - In this, the data routing network is dynamic.
Static networks are further divided into four types: -
a) 1-D (linear)
b) 2-D (ring, star, mesh, tree)
c) 3-D (completely connected, chordal ring, 3-cube, 3-cube-connected
cycle)
d) Hypercube (mesh – a 2-D hypercube, 3-cube – a 3-D hypercube)

Dynamic networks are further divided into two types: -
a) Single stage network: - It is a switching network with N input
selectors (IS) and N output selectors (OS). To establish a desired connecting
path, different path control signals will be applied to all IS and OS selectors.
a.1) Single stage
a.2) Re-circulating network: - In this network, data items may have to
re-circulate through the single stage several times before reaching their
final destinations.

b) Multistage network: - A multistage network involves many stages of
interconnected switches forming a multistage SIMD network. Multistage
networks are described by three characterizing features: the switch box, the
network topology, and the control structure.
These are of two types:
b.1) One-sided network: - One-sided networks, sometimes called full
switches, have input-output ports on the same side.
b.2) Two-sided network: - Two-sided multistage networks usually
have an input side and an output side.

Two-sided networks are of three types: -
b.2.1) Blocking network (data manipulator, baseline)
b.2.2) Re-arrangeable network (Benes network)
b.2.3) Non-blocking network (Clos network)

Q14. What are the various connection issues for SIMD processing?
Ans. The interconnection networks play a major role in vectorization.
Several connection issues in using SIMD interconnection networks are
addressed below: -
1. Permutation and connectivity: - In array processing, data is often
stored in parallel memory modules in skewed forms that allow a vector of data
to be fetched without conflict. A rearrangeable network and a non-blocking
network can realize every permutation function, but using these networks for
alignment requires considerable effort to calculate control settings.

2. Partitioning and reconfigurability: - A network is just a configuration in
the same topologically equivalent class. Through the reconfiguration process,
the baseline network can realize every permutation in one pass without
conflicts.

3. Reliability and bandwidth: - The reliable operation of interconnection
networks is important to overall system performance. The reliability issue can
be thought of as two problems: fault diagnosis and fault tolerance. It is
important to design a network that combines full connection capability with
graceful degradation in spite of the existence of faults.

Q15. What are the different parameters on the basis of which various
networks of array processors are compared?
Ans. Parameters to compare SIMD interconnection networks: -

(1) Total network bandwidth.
(2) Bisection bandwidth.
These two parameters indicate performance.

(3) Total number of links.
(4) Ports per switch.
These two parameters indicate cost.
