0 ratings0% found this document useful (0 votes) 196 views48 pagesComputer Organization & Architecture
Notes for Gate Computer science
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content,
claim it here.
Available Formats
Download as PDF or read online on Scribd
CHAPTER 2
www.engineeringonyourfingertips.ooo
COMPUTER ORGANIZATION AND ARCHITECTURE
‘Syllabus: Machine instructions and addressing modes, ALU and data path, CPU control design, memory Interface,
1/0 interface (interrupt and DMA mode), instruction pipelining, cache and main memory, secondary storage
2.1 INTRODUCTION
2.2 COMPUTER ARCHITECTURE
Computer architecture snd organization is the
science of interconnecting hardware components, design-
Ing and configuring the hardware/software interface (9
full functional and performance goals of a computer.
‘This chaptor outlines the bssic hardware structure of a
‘medam digital programmable computor, the basic laws
for performance evaluation, designing the control and
data path hardware for a processor, concept of pipshining
for executing machine instructions simultaneously and
designing fast memory and storage systems.
‘Computer architecture deals with the structure
and behaviour of computer system ss viewed by the
user. Tt encompasses instruction formaats, the Instruction
set architecture (ISA) and addressing modes
Computer organization deats with the operation
‘and interconnection of the various hardware components,
2.2.1 Register Set
‘Tho computer noods rogistos for processing and manip-
ulating data and for holding memory addresses that
are available to the machino-code programmer. Some
registers for a basic computer are given in Table 2.1
‘Table 2] Typos of ett nd th fants
Rogister Register Function’
Symbol__Name
DR Data register Holds memory operand
ACC ——Accurmulator Special purpose
processor regietor
AR Address Holds address for
register memory
(Continued
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
58 CAMPTER2 COMPUTER ORGAMZATON AMD ARCHITECTURE
‘Table 24 | Continua
Rogistor Register Function
Symbol__Name
1 Trstruction Holds an instruction
rogiter ‘hat is to be executed
PC Program Holi address of
countar instruction vo be
sececuted next
TR Temporary Holds temporary data
register if equine
INPR Input revister Holds input character
OUTR Output register Holds output charactor
2.2.2 Quantitative Principles to Design
High-Performance Processor
Amdahi's law focused on performance gain after enhanc.
ing the system. The performance gain is denoted by
Syyaat atid ET stands for execution time,
Performance of thesystem with enhancement
Performanceof thesystem withoutenhancement
Seen =
‘Afier enhancement, the system consists of two portions
‘unenhanced and enhanced portion.
ETTyey = ET of the unenhanced portion
+ ET of enhanced portion
‘To caleulato Tyo the following two factors are nove
1, Practiotyjanee (F): It indicates how much por
tion of the old sytem imdetgoe= enhancement
2, Spoed tance (8)T indicates how many times the
new portion i Funing faster das the eld portion,
4g Patfermanciger! _ YETyP _ ETP
PeeformanceggP ~ YETaF ~ ETargF
EtaF
So, Eye 2
(On the basis of the above factor,
ET yee F = ETya(l =F) + Eye
Substitute the value of ET in Eq. (2.1:
ETF
Eyer = Ela = F)-+ 3
Lot ETy = 1,
1
rae
Soret =
www.engineeringonyourfingertips.ooo
When the system consists of multiple frequent cases,
‘where #is the number of frequent eas
Snes =[t-Za)E8]"
Problem 2.1: Consider « hypothetical procassor weed
im mathematical model simulation, It consists of two
functional units, floating point and integer. The flost-
Ing point is enhanced then it runs two times faster,
but only 10% of the instructions are floating point.
What is the spood up?
‘Solution: Hore
2.3 MACHINE INSTRUCTIONS AND.
ADDRESSING MODES
Machine instruction is an individual machine code. The
complete set ofall machine codes recognized by a parti.
ular processor makes its Instruction Set. Instructions can
to gtonped according to the function they perform. The
nhuraber of ways by which arguments for thase machine
Instructions can be specified constitutes the addressing
modes for a processor,
2.3.1 Machine Instructions.
[An instruction is a command to the microprocessor to
perform @ given task. Most computer instructions are
chasified as follows:
1, Data transfor instructions: Those instructions
‘move data from one place to another in the com-
‘puter without changing the data content. Example:
LOAD, MOVE, IN, OUT, PUSH, STORE.
2. Data manipulation instructions: Theeoinstruc-
tions perform arithmetic, logical and shift opara-
tions on data. Example: ADD, SUB, MUL, DIV,
ING, AND, XOR, OR, SHR, SHL, ROR, ROL,
3. Program control instructions: ‘Thes» instruc
‘tons may change the address Value in program cour-
ter and cause the normal sequential flow to change
‘On the basis of the number of address folds in an inetruc-
tion, they are classified a¢ follows:
1. Three-address instruction: Computer with
thresaddress instruction format can use each
address field to specify (yo sources and a dastinae
ton, which can be either a processor register ot a
‘memory operand. It results in short program but
raquires too many bite to specify three addrassns
Example ADD R,, A,B (Ry = MA) + MB)2. Two-address instruction: Boch addres field can
sponify ether processor register of memory word,
Example: MOV Ry (Ry © MA)
MULE, R, (RoR tR)
8. Oneaddross instruction: Ie used an impliod
sccumulator (AC) register for ll data manipuls-
tion. Tho othor operand is im register ot memory.
Example: LOAD A (AC & Mal);
ADDB (AC@ AC + MB)
4. Zoro-address instruction: A stack organized
computer doo not uso an address fio for the
Inetruction ADD and MUL.
Problem 2.2:
(a) ISA of a processor consists of 64 rogisters, 125,
instructions and 8 bits for immediate moda. In @
siven program, 20% of the instructions take one
input register and have one output register, 30%
have two input registers and one output register,
20% have one immediate input, and one output
register, and remaining have Gro immediate
imput, 1 register input and one output registot
Calculate the number of bite requited for each
instruction type. Assumo that the ISA roquires
that all instructions be @ multiple of 8 bits in
length.
{b) Compare the memory space required with that of
variable longth instruction eo
Solution:
{a) Since there are 125 instructions so wo mood 7 bits
to differentiate them as G4 < 125 < 128. For 64
rogietors, we nosd 6 bits and 8 bits for immediate
mode.
Por Typo 1, 1 rog in, 1 neg out: 7 + 6 4+ 6 =
19 bits ~ 32 bits
For Type 2, 2 reg in, 1 reg out: 7+ 646+ 6=
26 bits ~ 82 bits
For Type 3, 1 imm in, 1 reg out: 7+ 6+ 8 =
21 bits ~ 82 bits
For Type 4, reg in, 2 mm in, 1 tag out: 7+ 6 +
8+ 84625 bits 48 bits
{b) As the largest instruction type requires 48. bit
instructions, the fixed-length Instruction format
uss 48 bite per instruction, Variable length instruc-
tion format uses 0 x $2 +038 x $24 02 x 82+
0.2 48 = 36 = bits on average, that is, 25% lass
space,
www.engineeringonyourfingertips.ooo
223, MACH MSTRUCTIONS AND ADDRESSING MODES 5D
1. provide programming flexibility ¢0 users through
‘ue of pointers to memory, counter for loop control,
data indexing and program relocation,
2. reduce the sizo of the addressing field of the
instruction,
‘Lot us suppose [a] moans contents at location 2 for all the
addressing modes.
2.9.2.1 Types of Addressing Modes
41. Implied modo: In this modo, tho operands ate
implicitly stated in the instruction. Por example,
rogistor reference instructions euch ae CMA (comple.
‘ment accumulator), CLA. (clesr accumulator) and
sgenovaddress instructions that we stack organization.
2. Immediate mode: In thie mode, the operand is
specified in tho instruction itself, that is, address
flold is roplaced by an actual operand. Immediate
‘mode instructions are useful for initializing rogis-
‘ets to a constant valus. For example, used for ini-
‘alizing CPU registers to some constant value such
as MOV Ry, #34.
Instruction with immediate mode
Opsode Data Oparand
3. Rogistor (direct) mode: In this modo, the opor-
ands are in CPU registers. An mbit register field
can specify any one of 2" registers, Example: ADD
R, vill add the contents of an accumulator and
contents of R,, that is, ACC = [ACC] + [Rj]
Instruction with rogister direct mods
Opoode [Register acdrese|
CPU rogistor
Operand
4. Rogistor (Indirect) mode: In this mode, the
instruction format specifies a CPU register which
contains an effective addrass of the operand resid.
ing in momory. This mode ensuras less numbor of
bits to specify a register value than to specify a
‘memory location, Example: ADD @ 2 will add the
contents of an accuraulstor with contents of the
register R,, that is, AC = [ACC] + [(Rj])
Instruction with rogister indirect mode
2.3.2 Addressing Modes
‘The addressing mode specfise how affective address of
an operand is calculated from an instruction. Computers
use various addressing mode techniques to:
Opeode [Register address
Memory.
OPU register
Pointer to |_[Oparand!
operand
www.engineeringonyourfingertips.ooo5. Auto-increment or Auto-decrement mode:
‘This is similar to register indirect mode except the
registor containing effective address is incremented
or decremented after (or before} its value is used to
access memory.
6. Direct address modo: In this modo, the offs.
tive addrass of an operand is equal to the address
part of the instruction. Example ADD A instru
fon adds content of memory cell A to accumular
tor, that i, ACO = [ACC] + MA)
Instruction with diroct address modo
Opcode | Memory address
7. Indirect address mode: In this mode, memory
address specified by address field contains the
address of (pointer to) the operand. Example:
ADD @ A will add the contents of the memory call
A, that is, ACC = [ACC] + MIM[AI).
Instruction with inditect address modo
Opeod | Memory address
Memory
{__,] Pointer to operand 4
‘Operand
‘8. Rolative address mode: In this mode, che effactive
adress of an operand is obtained by adding the cone
tent of a program counter to tho address part of tho
‘struction. ‘The addres part of the instruction can be
‘lther positive or negative reprasentad in 2s comple.
‘ment, The reult obtained after adding the content of
‘the program counter to the address fed produces an
effective address whose position in memory i relative
to the acldrees of the next instruction.
9. Index address mode: In this mode, the effective
address of an operand ie obtainad by adding the
content of an index ragister to the addrees part of
‘the instruction, The indax rogiste isa speclal CPU
register that stores an index value and the address
fiold of the instruction stores tho base address of a
data array in the memory. The distance between
the base address and the ackiness of the operand is
‘th index valuo that ie stoted in the index resistor
‘The index reglster can be incremented to facilitate
access to consecutive operands stored in arrays
using the same instruction.
www.engineeringonyourfingertips.ooo
(60 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE
10. Base register addressing mode: In this mode,
the effective address of an operand is obtained by
‘adding the content of a base rogister to tho address
part of the instruction, This is somewhat: similar to
‘the indexed addressing mode except that the tase
rogister stores hace or beginning adress instead of
an index register, It is used for program relocation.
Problem 2.3: A two-word instruction LOAD is stored
at location 800 with its addrass field in the next loca-
tion, ‘Tho addross field hss value 600 and value stored
‘st 600 le 500 and at 500 is 650. Tho words stored
at 900, 901 and 902 are 400, 401 and 402, respoo-
tively. A processor register R contains the number
800 and index register has value 100, Evaluate the
ffective address and operand if addressing mode of
the instruction ie a¢ follows:
1. Direct 4, Immediate
2. Indiroct 1. Register indirect
3, Relative 6. Index
Solution: Memory layout is as follows
00 [BaD
ao. [oon
500
600
‘700
Mode Address:
Direct 009 500
Indirect 300 650
Relative on 402
Immediate 801 600
Register indirect 800 700
Index 700 #0
Problem 2.4: A relative modo branch type instruc-
ton is stored in memory at an address equivalent to
decimal 600 and the branch is made to an address
‘equivalent to decimal 400, What is the value of the
talative address fleld of the Instruction (in decimal)?
Solution: Relative address = 400 ~ 001 = -201
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
24 ARITIMETIC LOCK MIT 61
Table 2:2| Arithmetic irc function table
Select Tnput to, ‘Output of Binary “Micro-Operation
fe Adder ¥ Adder
D=AtV4 On
° 0 0 D=A Transfer A
° ° ° D=A41 Increment A
° 1 B D=A+B Add
o 1 B D=A+B+1 ‘Add with carry
1 ° B D=AsB Subtract with borrow
1 ° 1 B D=AyBy1 Subiract
1 1 ° 1 D=A-1 Decrement
1 1 1 1 D=A ‘Transfor A
2.4 ARITHMETIC LOGIC UNIT
‘Arithmetic logle unit (ALU) is « combinational circuit
that performs all arithmetic and logic operations s0 that
the entire register transfer operation from the source reg
Isters through the ALU and inio the destination register
ccan be performed during one clock pulse period.
24.1 Arithmetic Micro-Operations
‘Th basic arithmetic micro-operations such as addition,
subtraction, increment, decrement and shift ate performed
fon numeric dats stored in ragistos, The basic component
cof arithmetic is parallel binary adder, and by controlling
the input to adder, different micro-operations can be
realized, Figure 21 depicts a 2-bit arithmetic circuit which
includes to full-addor citcuits and two multiplosees for
choosing diferent arithmetic micro-operations. There are
two 2bit input numbers A snd Band 2-bit output. D. The
two inputs from A go directly to X inputs of full aor.
‘The output of multiplaxer goss to input Y of full adder
Figure 2.1] A 2: arithimti cuit
www.engineeringonyourfingertips.ooo
By controlling the output Y of multiplacers with two
selection inputs §, and 5, snd Gi, either 0 or 1, we can
‘generate the eight arithraetic mio-opetations (Table2.2)
2.4.2 Logic Micto-Operations
Logie micro-operations such as AND, OR, Exclusive OR,
fc, consider vach bit of register separately and specify
binary oparatios for stings of bts (Table 2.3}
Table 23 | ‘Typee of micr-operatons
‘Micro-operation Name.
Feo Cleat
PAB AND
Peak
Pea ‘Trataer A
FeaaB
Per ‘Trans B
Feaes Exclusive OR
Peave or
FeavB nor
FeT@B Excusiea NOR
reR Complement B
FeavB
Fea Complement.
Fedve
FeaaR AND
Peas Sot toall 's(62 MPTER> COMPUTER ORGAMZATION AND ARCHITECTURE
Logical micro-operstions ro capable of manipulating
individual bits of « portion of word stored in CPU regis-
ters, Let us consider the data in a register A. In another
register, Bis the operand that will bo used to modify the
contents of A using Iogic micro-operations, Some of the
‘applications are as follows:
1. Selective sot operation: In this, If bit in Bis
set to 1, that same position in A sets to 1, other
wise that bit In A retains its previous value.
11004
1.0.1.0 B¢Tosetsome bits in A)
TT TO Agt4e ase)
2. Soloctive complement operation: If « bit in
Bis set to 1, that same postion in A gots comple:
‘mented from ite orginal Vale, otherwise (remains
‘unchanged,
1100 &
1.0 1 0 B(To complement some bite in A)
OTTO Aide sem)
3. Selective clear operation: If bit in Bis st to
4, that sume position in A sets to 0, otherwice it
remains unchanged.
1100 &
1.0.1 0 B(Tockar some bits in A)
O10 0 Ade A-B)
4. Mask operation: If bitin B i et to 0, that samo
position n A sets to 0, ctheraisoitramains unchanged
1100 4
10.1 0 BéTockar some bits in A)
F000 AgiiA@ AB)
5. Clear operation: Ifthe bits in the same position
in A and B aro the same, they are cleared in A, elso
they are sot in A
1100 4
1010 8B
OTTO Aide AB)
6. Insert operation: Iti usad to insert specific bit
pattern into A ragicter, lawving the other bit post
‘ions unchanged. This is accomplished by co sub-
operations: making operation to clear the desired bit
position, followed by OR operation to introduce the
now bits into the desited positions. Suppose you wanted,
{o introduce 10 into the low order two bits of 4:
M01 A (Original) and 1110 A (Desired)
{ A (Original)
Mack
A (Intermediate)
‘Added bits
A (Desired)
www.engineeringonyourfingertips.ooo
2.5 CPU CONTROL DESIGN
Central processing unit (CPU), or the brain of « com-
‘puter, performs the data processing opetations It consists
of three major parts: register set that stores intermedl-
ate data during instruction execution, ALU performs the
required micte-operations and control unit that super
vices all other elements for the tranefer of information
from one register to the other. ‘The main function of
CPU isto ftch an instruction from the memory and exe-
cute it. CPU ts divided Into three types of organizations:
1, Single accumulator organization: In this, one
‘operand ts implied in the sccumulator, a special
purpoce register, and the other operand isa register
‘or the memory. Example: ADD Ry (R, < AC +
R)), LOAD A (AC © A), STORE (Af 7} < AC).
2, General register organization: In this, the
CPU will have several general purpose tesis-
tots which lead to shorter and efficient programs
because registers ate faster. Example: ADD Ry, Ry
(Ry © Ry + Re). Figure 22 shows bus organiza-
tion for threo registers Rj, Ry and Ry. The output
of these registers and one from the external input
fs connected to two multiploxere A and B. The two
External
‘rat
1
Bend stp
Pleure 22 | Goel rege orealaton
soloct lines SELA and SELB from multiploxors A
and B select one of the input and feed to ALV.
‘OPR specifies one of the possible operation codes
that ALU will perform on the data inputs and the
‘output is transferred either to one of the registers
using 2 x 4 decoder or to the extemal output say
‘metory. The control word (Fig. 2.8) for the two
‘oparand instruction is as follows:
2bite 2bits
bite 4 bite
Figure 23] A conta wd
www.engineeringonyourfingertips. ooo8. Stack organization: Stack may consist of number
of registers ot « part of main memory in which dat
items are stored in coneacutive locations that are
ssccassed by IFO (last in, frst out) mochsnism. As
‘here sIenited nuabar of registers «part of menncry
|e implemented ae stack for storage and retrieval of
www.engineeringonyourfingertips.ooo
25 Pu CONTHOL DESK 6B
repeated continuously for s complete program and is
mown as the feteh—execute cycle (Pig. 24). The fol
lowing steps are performed for executing, an instruction
intormadiato data. Stack pointer (SP) koops track
of the top item of stack. The procos of inserting Toad PC contents
anew item onto a stack is known as push accom to MAR
ished by fet incrementing stack pointer and thon T
Inserting am iter fom the data rele haeeEoe
sPesP +1 point to next
Hae instruction
‘Tho process of removing an itom from the top of Tana
stack is known as pop performed by frst transfr- dodo MAR o Te
ting data into DB and then dacrementing SP. Tc MBAR
DR © MSP} 1
SPeSP-1 Devoe the
instruction
Problem 2.5: A cystom has CPU organized in tho
form of general reglster organization consisting of 16 Toad any data
registers, each storing 82-bi data. Assume the ALY oquited into MDR
has 85 operations
(s) How many multiplexers are therein A bus and B
bus, and what is the sizeof each multiplexer?
{b) How many selection inputs are needed for MUX A
and MUX B?
{¢) How many inputs and outputs aro therein a dasodar?
{d) How many inputs and outputs are there in. ALU
for data, including input and output cartes?
{0} Formulate a control word for the system,
‘Solution:
{a) 82 Multiplexers, each of size 16 x 1
{b)4 Inputs each, to eoloct ono of 16 rogistors.
{fe} 4 to 16 ~ Line decoder
{d)32 + 92-41 = 65 data input lines
(e) $2 +1 = 98 data output lines
Abits dbits 4 bits
SELA | SELB [ SELD
Obits
OPR
‘Sot PO to value
from Jump insot
Biscute the
instruction
Tarvice the
Interrupt
2.5.1 Instruction Execution
A CPU generally executes one instruction at time
sequentially and 9 soquonce of such instructions is
known ae a program. The CPU exocutas the instructions
that reside in the main memory. In order to exscute
fan instruction, the OPU has to fetch the instruction
first from che main memory into one of Its registers,
Ii then decodes the inctruction, that ls, it decides what
the instruction intended to do, etch operands required
and finally exeeutes the Instruction, This process is
Figure 24 Insretion le
1. Fetching the instruction: The next instruction
is fetchad from the memory address that is saved in
the program counter, and metaory content fatchedd
is stored in instruction register (IR). ‘The program
counter then points to the naxt instruction that
will be read in the next eycle
2. Decode the instruction: During this cycle, the
Serco tc: Te roa Wy She
WWW.engineer ingonyourfingertips.ooowww.engineeringonyourfingertips.ooo
(64 oarTeR> COMPUTER ORGAMZATION AND ARCHITECTURE
3. Oporand fetch: In caso of a direct or indirect
memory instruction, the exacution begins in the
next clock cycle. If the instruction has an inditect
adds, tho effective address of the operand is read
from the main memory, and the required data is
fotched from the memory into momory data regis.
tors. If the instruction has direct address, nothing
is done at this clock eyele
4. Execute the Instruction: The control unit of
the CPU paesee the instruction decoded by decoder
fs & sequence of control signals to the different
functional units of the CPU to execute the tacks
required by tho instruction such as reading, valuas
from registers or input devices, performing mathe.
matical or logic micro-operations by ALU, and writ-
ing the result back to a register ot main memory.
2.5.2 CPU Data Path
CPU contains data paths that are reeponsible for routing
data betwoen the functional units of a computer. The
following are the different data path structures available
for routing:
4, Single bus structure: In thie architacturo, al CPU
registers are connected to the same bus, Data can be
‘ansforred either betwoon CPU registers or botoen
CPU register and ALU at a given clock pulse The
‘sped of operation is slow as only one oparand ean bo
‘transferred in one clock eyele and addition operation
(Bi, © Pa + R,) occurs in three clock eycles,
2. Two bus structure: All general purpose CPU
rogistors aro connected to both buses say bus A and
bus Bs but special purpose registers are divided into
‘wo groups, say group 1 connecting bus A to pro
gram counter and one imput of ALA and group 2
connecting bus B to MDR (Memory Data Register)
‘and other input of ALU, ‘The two operands are
‘transferred (© ALU in 1 clock cycle and the addition
operation (R, @ Ry + Ry) occurs in 2 clock cycles,
3. Three bus structure: The performaance can be
farthor be improved by using throo busas such that
‘addition operation (2, = R, + R,) can occur in
one clock evele,
2.5.3 Control Unit Design
Control unit is considered as brain of a CPU that con
trols various units in the data path. The performance of
control unit is important as it determines the clock cycle
‘of the processor. Control unit can be designed either by
hardwired or by microprogram,
1. Hardwired control: Control unit is made up
of sequential and combinatorial circuits to ganer-
te the control signals and interpret instructions
(Fig. 2.5). The instruction decoder decodes the
Instruction loaded in instruction regieter. Thestep
www.engineeringonyourfingertips.ooo
decoder generates separate control line for each
step in the control sequence. The encoder gots
its input signal from the decoder, step decoder,
‘external Input and condition codes and generates
individual control signals, Tt is faster and more
ffclent but lose flauble and is dificult to add
new feature or correct mistakes in original design.
Glock —o[ Control stop [Reset_
counter
a
External
inputs
ra)
codes
Bnd
Figure 2.5| Blok dingra of hrdvied control unit
2. Micro-programmed control: Control signals
are generated by using programming known as
‘micto-programs that constitutes micro-instructions
(control word) (Fig. 26). Memory that is part
Extornal
MEET semecee EF
nt as a
an eo ea
<— Clock
Control adres
ealater
Address
Read [ Gontrat
Contral Control
signals signals to
‘within CPU system bus
Plgure 2.6] Blok diner of micro programmad
contaof OPU is known ae control memory: and stores
tmicro-instructions, ‘The micto-program sequencet
generates the address of micro-inetraction accord.
ing to instruction stored in instruction register
The address of micro-instruction to be executed
{s available in content addressable ropster. Micro
program sequencer issues read command to read
rmlero-insteuction ftom control memory into mcr
Instruction register which on execution generates
control signals for vatious parts of a processor. This
control unit design is more flexible to accommodate
now features and less errot prone but quite slower
‘han the hardwited unit.
‘The format of the comteol word ie
Contra
signal
Branch] Flas
condition
‘Conrad memiey
address
(On the basi of the type of control word supported, it ie
divided into two types:
1, Horizontal micro-programmed control unit:
In this design, the control signals are represented in
the form of 1 bit per control signal and it supports
longer control word,
2. Vertical micro-programmed control unit: In
this design, tho control signal is represented by using
encoding format,
Problem 2.6: Consbler « control unit which has 1024
control word memory; it supports 48 conteol signals
and $ flag conditions. What is the size of the control
‘word in bits and control memory in bytes?
Solution:
(a) Using horizontal programmed control unit
Obits bits 48 bits 10 bits
Branch | Flag | Control] Control
condition signal | memory
Size of control word = 61 bits
Control memory = (1024 x 61)/8 = 128 x 61 bytes
(b) Using vertical programmed control unit
Obits Shits 48 bite 10 bite
Branch | Flag | Control ] Control
condition signal | memory
Tog 48 ~ 6 hits
Size of control word = 19 bite
Conteol memory = (1024 > 19)/8 = 128 x 19 bytes
2.5.4 RISC versus CISC Processors
‘The differences between reduced and complax instruc
tion sot computers is given in Table 2.4
www.engineeringonyourfingertips.ooo
26 VOINTERFACE (INTERRUPT AMD OMA MODE) 65
‘Table 24 | RISC versus CISC
RISC (Reduced ‘CISC (Complex
Instruction Sot. osu bee
Computers) Computers)
‘Less number of registers
Supports mote number of
‘Rich register sot
Supports leas address
modes addressing modes
Supports fied length Supports variable length
instruction instruction
‘Succestful pipeline with Unsuccessful pipeline
‘ono instruction per cycle
Example: ARM, Example: Pentium
Motorola procensore
2.6 1/0 INTERFACE (INTERRUPT
AND DMA MODE)
1/0 interface bridges the diflerences between CPU and
peripheral devices and provides a method for transfer
ting Information betwaen internal storage and external
1/0 devices. Thoro are the following thrae modes af 1/0.
sransfer:
1, Programmed I/O: The 1/0 device dose not
Ihave direct access to memory. It requites execution
of several instructions by the CPU and the CPU
1has to wait for the 1/0 davice to be ready for ither
reception or tranmlssion of data.
2. Intorrupt initiated 1/O: In this, instead of
‘waiting, the control is transferred from a currently
running program to another service program a6
result of an extemal/intomal gonorated request.
+ Hardware interrupts: These interrupts are
present in the hardware pins.
+ Software interrupts: These ate the instruc.
tone wied in the program whinever the required
funetionslity is nowded.
+ Maskable interrupts: These interrupts may
bbe enabled or disabled explicitly.
‘+ Non-maskable interrupts: Those interrupts
are always thete in the enable state. We cannot
disable them by explicit concitions (lags).
+ Vectored interrupts: These interrupts are
associated with th static vector addres.
+ Non-vectored interrupts: Thove interrupts
are sseociated with dynamic vector address,
+ Extornal interrupts: ‘Those interrupts aro
generated by external devices such as 1/0.
‘+ Internal interrupts: These devices ate genot-
‘ated by the internal components of the processor
‘such as temperature sensor, power failure, error
instruction, ot:
www.engineeringonyourfingertips.ooo(66 GPTER> COMPUTER ORGAMZATION AND ARCHITECTURE
+ Synchronous interrupts: ‘Theee interrupts
fre controlled by the fixed time interval. All
the interval interrupts are called as synchronous
interrupt.
+ Asynchronous interrupts: These interrupts
‘re Initiated based on tho foodback of provi
cous instructions. Al the external interrupts are
called as asychtonous interrupt.
8. Direct memory access (DMA): It is one of
several methods for coordinating the data transfers
between an 1/0 device and the core processing unit
or memory in a computer. It refers to transfor of
data directly between a fast storage device and
memory bypassing CPU because of its limited
spood. DMA. provides a significant improvement
im terms of latency and throughput as it allows
the 1/0 device to access the memory directly,
without using the processor. There are certain
advantages of using DMA for data transfor:
+ DMA saves processor's MIPS as the cote can
operate in parallel
+ DMA saves power because it requires lees cite
cuitry than the processor to transfer dato,
+ DMA has no modulo block sizo restrictions.
Direct memory accass (DMA) controller takes over
tho control of busos to manage the transfer directly
between the 1/0 device and memory. Bus request
{BR} and Bus grant (BG) signals are used by the
DMA controller to request the CPU to relinguish
control of the buses and get the control of system
buses (Fig, 27), The DMA oontoller consists of
8 diffrent rogistore: an. addreze register, a control
register and a word counter register. To transfor
block of data between an 1/0 device and memory,
the controller storas intial values in the address
register, The DMA channel then trancfers the
block of information from oF to memory according
to the control register. The starting address of the
Write <1WR
Bus. lop
request <“—]
Bur _,f
ott BG
Interrupt <— Interrupt S|
devi
Pigure 2] Block digrmof the DMA contol
www.engineeringonyourfingertips.ooo
block in memory ie given by the address register,
‘and the length of the bytes to transfer is given by
‘the word count register. The controller decrements
‘8 word counter each time it moves-a data byte.
‘There are soversl modes of operation of DMA:
+ Burst or block transfer mode: In this mode,
the entire block of data is transferred once the
DMA controller is granted access to the system
bus ty the CPU. The bytes of data in the block are
transferred before releasing control of the system
bbuses back to the CPU. The only disadvantage of
this mode is that it renders the CPU inactive for
some long periods of ime:
+ Cyclo stealing mode: In this mode, the DMA con-
troller obtains accees to the eystam busas like burst
‘mode; but after one byto of data transfer, the control
of the system bus is releaeed back to the OPU via
BG. It is then continually requested again via BR,
transferring one byte of data per request, until the
‘entire block of data has boon transferred. This mode
Bs suitable for tho systems in which tho CPU cannot
‘be disabled for the considerable length of time as in
burst transfor mode such as for controllers moni-
toring tho data in real time, Tho advantago is that
CPU ie not idled for ae long ae in burst moda, but
the data block isnot transferred as quickly.
‘+ ‘Transparent mode: It isthe slowest yet more ef
ficient data transfer mode in terms of overall systera
performance. The DMA controllr transfers data only
‘when the CPU is busy in performing operations that
do not uso the eystam buss. So, the CPU naver stops
‘exeouting its programs but the bigswst disadvantage
is complex hardware circuitry that noxds to dater-
‘mine when the CPU is not using the system buses.
A DMA toad transfers data from the memory +9
the 1/0 deviea, while DMA write transfers data
from an 1/0 device to memory. ‘The functional
behaviour of a DMA transfor outlined in Fig. 2.8
+ The CPU transmits the following information to
DMA controller:
(9) beginning addrese in memory which i stored in
address register in DMA controller.
(b) Number of words to transfer which is stored in
‘word count register in DMA Controller.
(6) direction (memory-to-1/0 device or 1/0 device
tomemoty), port ID, DMA mode of transfer
snd ond of block tranefer either through intor-
rupt request oF no interrupt request which is
stored in control register as command word.
+ ‘The processor them relinquishes control of address,
data and control buses to DMA Controller and
returns to other processing activities while the
DMA controlar starts the data transfor botweon
1/0 device and memory.
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
27 mSTRICTON PwELAING 67
Invorrupe
Random Access
ae DE Memory (RAM)
“BR RD WR Address Dsta| RD WR Address Date
T Read contol ]
* Write conrad
Addreee
seloct
4
RD WR Address Data
Ds DAA acknowledge
RS Direct Memory
‘Access (DMA)
BR “controller
hae
Tsonrupt
Figure 2.8| DMA controtbr intercommscton with memory, CPU and 1/0 devi,
+ When the DMA controller accesses memory, it
synchronizes this memory request with an idle
period of the processor, thus disabling the pro-
cessor, of requesting « halt of the processor, and
awaits an acknowledgement,
+ After the completion of the block transfer, the DMA
Controller either raises an interrupt request if the
Interrupts are enabled ot indicates the completion
im i status register and the processor recognizes
1/0 completion (either by interrupt signal or by
roading the status rogister) and gots ite system
Tbusee back and normal procassing starts. ‘The
ddavice has to initiate a new data transfer through,
DMA request signal which is again acknowledged
by CPU through DMA acknowledge signal via
DMA controller
2.7 INSTRUCTION PIPELINING
In carly computers, cach instruction complotaly finished
bofore the execution of the naxt one began. The hard-
ware circuits needed to perform different operations
of an instruction eyele are different and most part of
these prooassor circuits are idle at a given moment of
time. Thase processor circuits wait for the other parts
of the processor to complete its part of execution first.
Instruction pipalining is a technique for overlapping the
execution of several instructions to reduce the overall
sxecution timo of a sot of instructions and there is no
‘need to walt of the mest part of the prooasior circuit for
the other parts of the processor to complete their part
cof execution. Pipeline speed is limited by the slowest
Pipeline stage.
‘Throughput of «processor it the rato at which opara-
tions get executed. Latency i the amount of time that &
single operation takes to execute, In an unpipelined com-
puter, throughput = 1/latency, as each operation exo
utes by itself and for pipelined computer, throughput
> L/latency, since execution of instruction is overlapped.
Consider a segment pipeline with a clock cycle time
,, wed to execute n tasks (Fig. 2.9). An equivalent non-
pipelined syetom takes 7, time to compote each task.
‘The spend up of a piped system aver a non pipelined
system is given by the following rlaticn:
axt,
Gen,
‘Theoretically, maximum specd up that a pipelined
systom can achiave is given by the following equation:
KT
5-2.
Pipelining Hazards: ‘Those haeards reduce the
ideal spel up gainod by pipelining by preventing the
next instruction in the sequence from being executing
luting its dosignoted clock pub. Hazards forcas the
pipeline to be stallad. There are three types of hazards:
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
(68 APTER> COMPUTER ORGAMZATION AND ARCHITECTURE
eich an Decode Fetch an Bxocute |__| write
instruction instruction operand {instruction [result ack
Ono clock cycle
‘)
Write
Foichan |! |! Ly) | Jretch an]! |) execute |! |)
instruction Laem operand instruction beg
Pipeline latch / overbead/delay
‘One clock ‘One clock One clock One clock ‘One clock
eyele eyele cycle cycle oyele
Pipeline Stages
()
Poe rjafela}s|elz|s
Segment bFatch | hh | ty | ts | Ie
Decode Alb | |
ete epatand Al ala] y
Bxecute Alh|b|%
Write back: 4|e]a| 4
©
Figure 2.0 | (a) Unpipelined process. (b) Pipelined five-stage processor.
(©) Timing digram of fvestage instruction pipsline.
1, Structural hazards: These roeult fom resource
conflicts when the hardware cannot support
Instructions that nead simultaneous execution in
pipoling
2. Data hazards: They arise when an instruction
‘depends on the reeult of « previous instruction and.
that result isnot yet calculated.
‘There sre thtoe situations in which data hazards
+ Read after write (RAW), a true dependency
+ Write after read (WAR), an anti dependency
+ Write after write (WAW), an output dependency
Consider two instructions inet1 and inst?
‘ccurring, with insti occurring before inet? in
the program order,
+ Road after write (RAW): A read after write
(RAW) data hazard is @ situation in which an
www.engineeringonyourfingertips.ooo
Instruction refers to a result which is yet not
been calculated, that is, in this inst2 ties to
read a source before insti waites to it. This
situation arises if the tead operation by instruc
ton takes place before write done by other in-
struction. For example,
insti: RI <-R1 + RZ
Anst2: R4 12845 67 8 0 10 11 12 18 1 1B 16 17
Fach Zh ib Loh
Decode 1 == =m h = if
Operannd Fetch i PhERR | Ha ee F
Beecute h- - bh-- - bh kh - 4k
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
TO GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE
Problem 2.6: Acsume a simple &-vtage pipeline (IF, ID, E, DP, W) each stage takes « single cycle, Assuring there
are no cache misses. How many cycles would the following code tako to axscute if there is no spacial hardware to
improve performance in the presence of hazards?
MOV adx,ocx+100)
MOV eb [orx}104)
ADD edx.obx
MOV [ocx+108)ecbe
MOV saxocx-109]
ADD obx.cax
‘Solution: The above code takes 14 cycles to execute, as shown below
a23 4 5 6 7 8 8 1 i a2 18 1
oD DF OR W
iD DFE W
IF oD DF sll BOW
FID stall DF stall W
1 sul DP sul stall EW
Ir 1D _stall_DF_ stall_stall_stall EW
Problem 2.9; In the tslow figure, calculate the total execution time aftr which the rvul f the fourth task eter-
ing the pipe abovo roady?
—{p D x of MEM we} >
oe t= we me om
‘Solution:
sos DS o© © © © © © ©
Intl IF ID EX EX MEM MEM WB
Inst ir 1D EX EX MEM MEM WB.
Inst rr }D EX EX MEM MEM WB.
Inst Fr oD EX EX MEM MEM WB
‘Thorofore, tho total execution time is 65 ns
Problem 2.10: What is the mesn overhead of a pipe: Problem 2.12: Calculate the time required to perforra
lino with 8 stages and an execution time per stage of 1000 operations in a Gstaged pipeline with an execu-
Qns? tion time of $ ns por stage?
Sohtion: The mean ovethoad = (Stages — 1) x aati
tine 7, =(k-14n)xT =
Execution time por stago = (8 — 1) x
TK 14 1000} x2 = 8.018 ws
Problem 2.11: How many stages has a pipeline that | | Problem 2.13 Calculste the mean overhead ofa pipaline
schiever& spond of 9 for 100 operations? with 7 stage and an execution timo pr stage of 2 nel
‘Solution: ‘Solution: Mean overhead of pipeline =
nxk _ nx (xT, -,)
Speed = 3 99 = nt Oe text = (7-1) x2 = 19 5
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
27 msTRICTON PwELAING TH
Problem 2.14; Consider a pipeline with 5 stages: IF, ID, EX, M and W. Assume that each stage requires one clock
gel, Show how tho following program sogment for adding 2 arrays procaeed and compare the clock cycle
‘needed in non-pipelined system with pipelined system when result of the branch instruction Le, content of is avail-
able stor WE stags
LOAD Ra #400
Li: LOAD Rt, 0 (RA);
LOAD Re, 400 (RA);
ADD Ri, RI, RQ
STORB RS, 0 (Ra)
SUB RG, RA, #4;
BNEZ R4, Li;
Solution: Number of cycles = [Initial instruction ++ (Number of instructions in the loop Li) x Number of loop
yeh] % Number of clock eyclas/instruction (CPT)
= [b+ (6) » 400/4) x 5 = 3005
‘Timing diagram for one loop iteration in a pipelined system is as follows:
LOAD RA #400 IF ID EX
LOAD Ri, 0 (Ra) © D
LOAD Ra, 400 (Ra) IF
ADD Rs, R1, R2
STORB RS, 0 (R4)
SUB R4R4, #4
BNEZ R4, Lt
weg
[Number of cycles in the loop = 15
\Nurnber of clock cycles for segment execution on pipelined processor
= 14 (Number of elck cycles in the loop LI) x Numer of loop cycles
= 1+ 19 x 400/4 = 1901
Number of Clock cycles for the program execution on non-pipelined prooassor
‘Nurnber of Clock eycles for the sogment execution on pipelined prooassor
Speedup =
008
= $005 = 9 times
Problem 2.15: Consider a S-stage pipeline with stages: For all following questions we aseume tha
contains stages: IF (instruction Fetch), IS (sue), FO (Fetch operand), (Execute) and W (Write).
xcept B requires one clock cyclo and system has 4 Functional Units for floating point operations, FP load/store,
FP addition/sublraction, FP multiplication and FP division, (c) Execution stage for Load Store operations raquies
1 clock cycle, for ADD ot SUB operations raquirs 1 clock evel, for MUL operation requires $ clock cycles and for
DIV operation requires 4 clock cycle. All memory references hit in cache. Pipeline has forwarding circuitry forall
Us, excopt FP-Lood/Store where operand is tay after Westago,
www.engineeringonyourfingertips.ooo72 GaPTeR> COMPUTER ORGAMZATION AND ARCHITECTURE
‘Timing diagram of is presented below:
www.engineeringonyourfingertips.ooo
22 8 4 5 6 7 8 9 1 1 2 18 4 1 16
TOAD P2R5) IF IS FOE W
LOAD F2, 28(R5) sD FOE Ww
MUL FO, F2, Fa 1S sal wll FOE B B OW
SUB FS, F6, FS wos FOB W
DIV F10, PO, Fé IF 1S tall stall stall tal FOE B EB B OW
ADD F6, F8, F2 rows POR Ww
STORE FS, 048) ros rok W
‘Kientify the basards in the following instructions from the following list (Structural, Data, Control, RAW, WAR,
WAW, None)
1, MULT FO, F2, Pa and STORE FS, 50(R5)
2. DIV F10, FO, F6 and ADD F6, FS, F2
8. MULT Fo, F2, F4 and DIV F10, FO, Fé
4. DIV F10, FO, F6 and ADD F6,
‘Solution: 1. Structural; 2, Data; 9. RAW; 4. WAR.
2.8 MEMORY HIERARCHY
“The storage media can be categorized in hierarchy accord
ing to their speed and cost (Fig. 210). As we move down
the hierarchy, access time incroasce and cost por bit
doeteasas
Increasing
ost and
speed
Decreasing
size
Figure 2.10] Memory hierarchy.
Memory
It isthe central storage unit that directly communicates
with the CPU. Tt is designed using semiconductor-
Intograted circuits and needs constant power supply to
maintain the information, Ts expensive as compsted to
uriliary storageso it has limited capacity. Examaple: R/W
(read/write) memory or RAM (random accass memory}
2.8.1 Mai
and ROM (raad only memory). Integrated RAM chips
are available in two modor:
41, Static RAM: It stores the binary information in
flip flops and information remains valid until power
‘is supplied. Tt has faster access time and is used in
‘implementing cache memory.
2. Dynamic RAME It store tho binary information
sa charge on the capacitor. It requltes refreshing
circuitry to maintain the charge on the capacitors
‘after few milliseconds. It contains more memory
calls per unit area as compared to SRAM,
2.8.1.1 Memory Interfacing
If tho required memory for the computer is larger
than the capacity of one chip, it is necessary 10
connect multiple RAM and ROM chips to a CPU
through the data and address buses (Fig. 2.11). The
low-order addrots bus lines select the word within a
chip and other lines select a particular chip through
its chip select inputs. Assume a computer system
needs 256 bytes of RAM and 512 bytes of ROM. The
configuration of RAM chip is 128 x § and ROM chip
is 512 x 8, The RAM and ROM chips required are
as follows:
Number of RAM chips = 256/128 = 2
Number of ROM chips = 612/51
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
28 mewony WERARCHY 73
‘The memory interconnection is depicted in the following diagram:
Chip select 1 hip select 2
hip select 2 | |» Data busin
Read. Bae Chip select 2088128 | oipt mode only
Write ate ROM chip.
agree ‘Address bus JAD,
@)
Figure 2.11 | (6) RAM chip, &) ROM chip
Problem 2.16: A computer employs RAM chips of 256 x 8 and ROM chips of 1024 s 16, The computer system nosds
2K bytes of RAM and 4K bytes of ROM and four interface units each with four registers. Draw a memory address
‘map for the system and give the address range in hexadecimal for RAM and ROM chips.
Solution: RAM 2018/256 ~ § chips; 2018 — 211; 256 = 28
16% 419 1 10 9
ROM 4096/1024 = 4 chips; 4096 = 212; 1024 = 210
Interface 4 4 = 16 rogers; 16 = 24
_Component Address
RAM OOF 9 9 9
ROM SO0HFFF ig yg
Inteface 5000.800F Loo
oo e
oo x
ooo
2.8.2 Secondary Memory
Secondary memory, also known as susiliary memory ot
fexternal metnory, can store « large amount of data at
Taaser cost por byte than the main meanory. Thay are
‘non-volatile in nature, that is, dats is not lost when the
device is powered off. The most common auxiliary stor
‘age devices wed in consumer eystems are lash memory,
‘optical disks and magnotic disks.
1. Flash memory: Flsch metory is an electronic
non-volatile fastest computer storage device that
can be elactrically erased and reprogrammed,
Example: lash drives and solid state drive.
2. Optical disk: Optical disks are low-cost mass
storage devices from which read and write oparae
‘tions are performed using laser technology. Optical
disks can store huge amounts of data up vo 6 GB.
6 billion bytes). Difforent types of optical disks
are CD-ROM (compact disk read-only), WORM.
(orite-once read-many), EO (erasable optical
disks) and DVD.
3. Magnetic disk: A magnetic disk is composed
of a circular platter made of metal or plastic and
www.engineeringonyourfingertips.ooo
coated with magnetized material on both sides.
‘Multiple dicks are stacked over one ancther on the
spindle with read/write heads on each surface. Bits
fare stored as spots on magnetized surface along
concentric citelss called tracks. Tracks ate further
divided into wedge-shaped sectors
4. Magnetic tapes: It consists of tape made up of
plastic covered with magnetic oxide coating. Tapes
‘are mounted on rook, Bits are recorded as magnetic
spots on tape along several tracks. R/W heads are
‘mounted in each track so that data can be recorded
‘and read as a sequence of characters. Seven or nine
bits are recorded to form a character together with
‘parity bit. Data is recorded in contiguous blocks
‘separated by inter-record gaps,
Cache Memory
It iz a special memory that compensates the spasd
mismatch between processor and main memory access
time. It temporarily stores frequently used instructions
fand data for faster procassing by the CPU. Cacho hit
ratio is calculated to measure Ke performance, If a data74 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE
item requested by the CPU is found in cache it is called
hit otherwice it is miss. Hit ratio is defined a ratio
‘of number of hits divided by total CPU references to
memory.
Number of hits
Number of hits + Number of misses
Average access time =
Hit ratio (h) =
it ratio T,
+ (L—Hit ratio, +7,,)
where 77 is-cache access time and Ty, is the main memory
cease time.
2.8.3.1 Blements of Cache Design
‘The various elements of cache dasign are as follows:
1, Cache size: It should be optimum, small enough
to keep average cost per bit close to the main
‘memory and large enough to keop overall average
access time cleee to the cache memory.
2. Mapping function: It dascrites the mapping of
main memory block to cacho block. Thero are throo
different mapping techniques: fully asoclative, direct
tapped and set associative cache organization.
8. Replacement algorithm: When new memory
block is required in cache, one of the existing blocks
must be replaced by a new block. Example: FIFO
(frst in, frst out), LRU (least recently used).
4. Write policy: Cache memory follows write.
through and writeback updating policies. In
‘writethrough policy, cache controller copies data
Immediately to main memory as data is written in
cache, The data in main memory is always valid,
Dut this approach reduces system performance. In
write back, update to memory block is delayest until
the updated cache block is replaced by a new block,
2.8.4 Cache Mapping Techniques
‘The cache memory cam store & reasonable number of
blocks, but this number is always stall 2s compared 10
blocks in tho main memory to koep average cost por bit
low. The corraspenclonco between memory blocks and
ccacho block is specified by tho following mapping toch-
‘niques. Consider a cache memery consisting of 2K words
‘with 128 blocks of 16 words each. Number of bite requited
to addross a acho block is 11 bits. Main memory has 64K
words and bits required to address is 16 (Fig. 2.12).
‘Main memory
AK x8
Figure 212| Cache maping example
www.engineeringonyourfingertips.ooo
2.8.4.1 Direct Mapping
Im this technique, each block from the main memory has
only one possible location in the cache memory. In this
example, say a block from main memory maps onto «
block (é mod 128) of the cache, I there are 2" worde in the
ache memory and 2" words in the main memory, then
nnvbit main memory address is divided into two figkds: n
bits for index flld to access the cache and (mn) bits
{or the tag fiskd, Each word in cache consist of the data
and the assoriated tag, Whenever a new block is brought
Into cache, tag is stored along with data bits, Index fekd
is further divided into Bleek and word if there are mul
tiple words (say &) in a block. The lower kits select ono
of the k words In a block known a8 word field. The block
field is used to distinguish a block from other blocks.
Tag (mn) bite Index (n bits)
‘Tag (m— n) bits | Block (n — B) bits
‘Whon CPU gonorates memory request, tho block field
points t0 a particular block location in the cache. The
high-order tag field is compared with tag bits associated
with that cacko location. If they match, then tho desired
‘word is in that block of cache. If there is no tastch, then
the block containing the required word must be loaded
10 cache first (Fig. 2.13)
Word (& bits)
Main mamory addres Main memory
s[zla Block 0
‘Tag Block Word Block 1
4 +
Coche memory |_| Bleak TF
Block (the) 11 Bloat 128:
+ it +
4} -{ Bina
Block 127 7 Paameee
i»
‘| Boake aoa
Figure 213] Diset mapped eas
ganization,
‘The demerit of ditect mapping is that hit ratio drops
considerably if two or more words having same index atid
diferent tags are accassad consocutivly one after the other:
2.8.4.2 Pully Associative Mapping
In this tochniquo, « main memory block can be placed
‘nto any cache block location. Its the most flexible cache
orgsnization. The main memory address is divided into
www.engineeringonyourfingertips.oootwo flelds: word and tag. The assoctative memory stores
both tho addrass (tag) and data of the main memory.
Figure 2.14 shows the mapping of different blocks mto
cache. High-ordar 12 bits of CPU address is placed in the
argument register of the associative memory and come
pated 10 tag bits of each block ef the cache to soe if the
ddosired block is present. Onos the desirad block i pres.
sent, 4bit word is used to extract necessary word from
the cache.
‘Min momnory address
www.engineeringonyourfingertips.ooo
28 mewomy WERARCHY 75
contain the desired block. The highorder tag fled is
then compared associatively to the tags corresponding
to the matehad set, [fa match cccurs, the corresponding
word is read from cache eke main memory is referred
land block containing that word is brought into cache for
futuro reference (Fig. 215).
Tag Set Word
6 [ela
‘Main memory address Ps
fain memory
HE
neem
seofls
Fig o| Data} tac 2| Dat
wfad
Main memory
Tae Word rae
Cache memory Block 1
Tae 9 TDatel
12 bits
(23) Block}
Block 4005)
Figure 2.14| Associstive mapped cache organisation,
Tt te necessary to compare high-order bits of main
‘memory with all tag bits corresponding to each block to
find whothor a givon block is present in cacho, s0 itis
the most expensive.
2.8.4.9 Set-Associative Mapping
As fully associative mapping is an expensive solution and
direct mapping does not allow words with same index
bat differont tag to exist im cache, set associative map-
ping if @ combination of both, [tis an improvernent over
direct mapping where contention problem is solved by
having ewveral choices for block placement. The figure
bolow shovws two-way set arsociativo cacho bocause each
block of main memory has two choices for block place
‘ment in cache. A block i in the main memory can be
im any block belonging to sot ¢ mod $ of cache, where
‘Sis the number of sets. The block 0, 64, 12
con of main memory can map into any of the
in sat 0.
‘Tho main memory address is divided into throo fields:
low-order bits for word fiekl, set fell to determine the
desired block from all possible sete and high-order bits
for the tag fild. Each word in cache consists of data and
the assoclated tag.
Tag ‘Sa
[Want
‘When the CPU generates a memory roquast, the sat
fiold points to a particular set of tho cacho which might
I
sot 63 Tas| Date|Tox i BF Ine
as
Figure 2:15] Sebamiciative mappd cache againtion
Problem 2.17: Consider a memory hierarchy
system containing a cache, a main memory and a
virtual memory. Assuming, cache access time of 5
ns, amd 80% hit ratio . The access time of the main
momory is 100 ns, and it has a 99.5% hit rate.
The access time of the virtual memory is 10 ms.
Galeulate the average access time of the memory
hierarchy.
Solution: As we know, the hit rate of virtual memory
Js 100% the average access time for requests that
reach the main memory as (10 ns x 0:09) + (10 ns
0.008) — 50,090.5 na Given this, the average access
‘timo for requests that reach the cache i (5 ns 0.80) +
(50,090.5 ns x 0.20) = 10,024 ns.
Problem 2.18: A computer uses RAM chips of 1024 x
1 capacity.
(a) How many chips are needad to provide a memory
capacity of 16K bytes?
(b}How many of these lines will be common to all
chips?
Solution:
(a) Chips are needed to provide s memory espacity of
16K bytes = 16 x8 = 128 chips
(b) Using 14 address lines (16K = 2!) we have 10
lines specifying the chip address which is common
10 all chips.
www.engineeringonyourfingertips.ooo76 APTER> COMPUTER ORGAMZATION AND ARCHITECTURE
www.engineeringonyourfingertips.ooo
Problem 2.19: Consider a 2-way sot associative cache
consisting of 256 blocks of € worde each, and assume
‘hat the main memory is addressable by 16-bi¢ address
fand it consists of 4K blocks, Calculate the number
of bite in each of the TAG, BLOCK/SET and word
fields for diflerent mapping techniques?
Solution: For direct mapping, word field is of &
Dits to identify 8 different words in a block (2* = 8).
‘As cache mamory conslete of 256 blocks co (2° ~ 286)
8 bits aro roquirad to adios a block bocauso thero is
one-to-one correspondence of block k in main memory
to block (ke mod 256) in cache memory. The remain
ing 5 (16~ 8 ~ 8) high-order addrass bits ato tag bits.
‘Thus, the main memory address for dtect mapping is
divided as follows:
Block
hie
Word
a bite
‘Ta
Shite
Por fully associative mapping, number of word
Problem 2.20: The access time of a cache memory
fe 200 ne and that of main memory te 2000 ns. It is
‘estimated that 70% of the memory requests are for
read and remaining 20% for write. ‘The hit ratio for
read accesses only Is 0.9, A writethrough procedure
is used
(a) What is the average accass time of the system
considering only memory road cycles?
(b) What is the average access time of the system for
both read and write requests?
(©) What is the hit ratio taking
write eyclasalso?
to consideration the
‘Solution:
(8) Average acoaes time = 0.9 200 + O41 x 2200 =
180 + 220 = 400 ns
(b) Average aooms time = 0.8 2000 + 0.7 » 400 =
{600 + 280 = 880 ns
(0) Hit ratio = 07 x 0.9 = 0.68
bits are same, that is, 3 bits, Cache memory stores
both tag and data ‘The highorder tag bits of an
address generated by CPU aro compared with tag
Dits of each block so number of block bits is zero. AM
remaining bits (except word bits) are identified as tag
bigs. Thus, the main memory address for fully asso
ciative mapping is divided as follows:
Tag __| Wed
Tete | sbi
For two-way set-assoclative mapping, cache
‘motnory ie mapped with two blocks par sat, The word
fiold has same number of bits, $ bite. There are 128
(296/2) sots so 7 bits are required to uniquely identify
these 128 sets. The remaining 6 (16 — 7 ~ 8) bite are
used a¢ tag bits. Thus, the main memory address for
set-aceociative mapping ie divided ae follows:
Problem 2.21: A 4-way sot-associativo cacho memory
tures blocks of four words. The cache can aocommo-
date a total of 1024 words from the main memory.
‘The main memory size is 128K x 82,
(3) Formulate all pertinent Information required to
‘construct the cache memor
(b) What is the sizo of the cache memory?
Solution:
(a) Main memory is 128K = 2"7 so 17 bits address is
generated. For a sot size of 4, the nurnber of sets
In cache is 296/4 = 64, s0 6 bits are required. Each
block consiets of € wotde so 2 bite are required for
the word field
‘Toe [Se [Weed a:_|96_| Wad
Shits [The | sone WEL ELS
(b) Since cache is 4-way set assocative, 4 blocks per st are stored in cache memety.
Tat [Daal [Tae? [Data? [Tes [Daas | Tet |Daaa
stis | sebiis [ovis | s2bis [otis [setts [otic | a2 tits
Sao
i Tag [data | toe | Data [tae | Data [Toe | Data
aes
‘Size of cache memory is 128 x 4(9 + 92) = 5248 bits
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
Problem 2.22: Suppoce physical memory is of 2GB
‘and each word is of 16 bits. There is a cache contain-
ing 2K words of dats, and each eache block contains
16 words. For each of the direct mapped and 2-way
set associative cache configurations, specify how the
address would be partitioned.
Solution:
(9) Por direct mapping, the word field i of 4 bits
Identify 16 diferent words in block (2 = 16)
‘As the cacho memory consists of 2K words which
is equivalent to 2K/16 = 128 blocks so (2? =
198) 7 bits are required to address a block. Tho
remaining 20 (81 — 7 ~ 4) highordor address
bits ore tag bits. Thus, the main memory address
for dcct mapping i divided a follows:
Dobne [Tbe [4 bie
Tea Bok] Word
IMPORTANT FORMULAS
Sowen BaMPLES 77
(b) Since cache is 2-way cot ageociative, 2 blocks per
s9t are stored in cache memory. So, the number
of sete is 128/2 = 64,
bis [Obie [4 bite
Ta [Se | Word
Problem 2.23: Consider a direct mapped cache of size
82, KB with block size 82 bytes. The CPU generates
82-bit addossos. What are number of bits needed for
‘addrossing block in cache and number of tag bits?
‘Solution:
Tos Block [Word
32 —10— Shits | 10 bits | obits
‘The number of bits needed for addressing block in
cache and number of tag bits are 10, 17, respectively.
+ Amdahi's law
Sova = [o- DaA+ x4]
+ ‘The speed up of pipelined system over non-pipelined
ayer is given by:
aes
§=Gin-Dey
+ Maximum speed up that piplined eystem can
achiovo is given by:
a
s k
SOLVED EXAMPLES
+ Wit ratio (A)
Number of hits
[Norabar of hits + Number oF misee
+ Average access time
= Hit ratiox 7, +(1~Hit rato)(7: + Th)
where 7, is cache arcas time and T,, is main
memory Sccess time,
+ Direct mapping
[as (m= oy tit bak fn tte [Word (eit)
+ Savanchve mapping
[fe [Sa [Wena
1. The principle of locality is used in
(a) Invorrupe {b) Registers
(e) DMA (4) Cache memory
Solution: It is used in cache memory to help the
program accass small amounts of addrass epace at
any instant.
Ans. (dj
2. Which memory unit has loweet acoass time?
(a) Cache (b) Rovisters
{c) Optical dik (4) Main memory
Solation: Registers are usa for processing
fand manipulating data and for bolding memory
address that are available to the machine-cole
programmer. So, they havo lowet accos time
‘Ans
by
www.engineeringonyourfingertips.oooTB APTER> COMPUTER ORGAMZATION AND ARCHITECTURE
8. During DMA transfer, the DMA controller takes
over the buses to manage the transfer
(a) Directly from CPU to memory
{b) Directly from memory to CPU
{¢) Directly between the memory snd registers
(d) Directly between the 1/0 device and memory
Solution: DMA controller manages transfor
between 1/0 dovice and memory.
Ams. (4)
4, Booth’s algorithm is wed forthe arithmetic oparae
tion of,
{3} addition {b) subtraction.
{) multiplication, (2) division
Solution: It is a mukiplication algorithm that
rmuliplies two signed binary numbers in 2 com.
plement notation,
Ans. (c}
'5, The reason for improvement in CPU performance
during pipelining is
{a) reduced memory access time.
{b) increased clock speed,
{e) introduction of parallelism.
(4) increase in cache memory.
Solution: Instructiomlovel parallelism is imple
mented within a single processor to allow faster
CPU throughput.
Ans. (9)
6.
‘Use of eacho memory enlancos
(0) 1/0 access time
{b) memory access time,
{c) effective memory sccees time,
(d) secondary storage access time.
Solution: Cache memory compansatas the sped
mismatch between processor and main memory
‘access time,
Ans. (e}
77. Am instruction eycle refers to
{9) fetching an instruction.
(b) executing an instruction.
{o) fetching, decoding and executing an instruction,
(4) reading and executing an instruction.
Solution: It involves fetching, decoding and exe.
fcuting the instruction,
Ans. (9)
8. A hardware interrupt is also called
(9) an internal interrupt.
(b) an extornal interrupt.
www.engineeringonyourfingertips.ooo
(6) @ processor interrupt.
(4) « clock interrupt,
Solution: “Hardware interrupt is prosent in hard-
‘ware pins.
Ans, (0)
). Priority is provided by for acces to memory
by various 1/0 channels and procssars
(a) a register
4b) counter
10.
nu.
12.
13.
(c) the processor scheduler
(9) acontelie
Solution: Controller sets priotity for memory
access by various I/O devices and processes.
Ans, (d)
By applying the principle of temporal locality,
processes are likely to reference pages that
(a) have bom telerencad rocently.
{b) are leated at address noar recently referenced
pages in meron:
{) have boon praloadad into memory.
(d) have to be reloaded into memory.
Solution: ‘Tamperal locality refs to rows of
rexouroisrelorecad within a sir time Kanne
Ans, (a)
Which of the following is @ corect statement
ated to L2 cache mernory?
(a) The level 1 cache is always faster than the level
2eache
(b) The level 2 cache is used to mitigate the dynamic
slowdow every time level I cach miss occurs
(c) Level 2 cache comes as on board only.
{d) In modern day computer, the level 2 cache is
conshdored an internal cacho.
Solution: 12 lavel of cache is placed botwoon the
Li and RAM. Tho LI cache is always the fastest.
Ans. (a)
‘What is the control unit's function in the CPU?
(a) To docodo program instructions
(b) To transfor data to primary storage
(c) To perform logical operations
(€) To store arithmetic operations
Solution: Control unit controls savaral units of
CPU and helps decade program instructions.
Ans. (a)
(CPU fotchos tho data and instructions from —__
{a) ROM (b) control unit
(o) RAM (4) coprocessore chip
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
ua
15.
www.engineeringonyourfingertips.ooo
Solution: ‘The OPU fetches data and instructions
from RAM and axcoutes it
Ans. (e}
‘Which of the following affects the proceeeing power
of tha CPU?
(a) Data bus, addrassing schemes
{b) Clock spoed, addressing schemes
{c) Clock spovd, data, bus
{d) Clock speed, data bus, addressing schomes
Solution: CPU processing speed ie affected by
flock eyele, data bus for routing data and address
ing modoe.
ns. (@)
‘The contents of the flag registor after execution of the
following program by 8088 microprocessor willbe
Program
sUBA
MVIB, (0g,
DORE
aur
{8) Gy) (Oy (©) (O1py (4) 5) q
Solution:
SUBA Subtract comtants of memory Inc.
tion whise contents are stored in A
MVE, (O1)y Move B with (O1)y
DORB Decrement B gives (54),
mur
Ans. (a)
An Nebi¢ cary lokeabwad adder, where Mi a raul
Liple of 4, employs TCS 74181 (40-Ie ALU) and
‘alse (Abit cary lookahead generator).
‘The minimum addition time using the best arch.
tecture for adder is
(a) Proportional to 8
(0) Proportional to be W
{¢) Aconstant
{d} None of the above
Salation: The addition time for N-bit carry locke
ahead adder is always a constant
Ans. (©)
‘The main memory of « computer has 2 cm Mocks
while the cache has 2e blocks. Ifthe cache sss the
fet-ascative mapping scheme with two becks pet
fet, then Hock Kot the rain mencry mays to the ss
{2 (kod m) of the cache
{b) (emod ©) ofthe cache
{c} (k mode 2c) of the cache
{6) (mod 2em) of the cache
18.
19.
SoLveD BAMPLES 79
Solutions
[Number of sets in cache
Number of blocks in cache _ 26
~ Number of Blocks in one set 2
‘Number of sets in cache = ¢
So to map tock k, main memory maps to the sot
(mod o of the cache.
Ans. (b)
Which of the following is/ate advantage of virtual
memory?
(a) Faster memory to memory on an average
(b) Processes can be given protected address
spaces
(6) Linker can assign addresses indopondent of whore
the program will be loaded in physical memory
(4) Programs larger than the physical memory size
can be run
Solution: Virtual memory allows. programs to
hhave @ larger epace than the physical memory elze
‘to rum. So, the programmer doas not have to worry
bout physical memory size,
Ans. (d)
‘The muraber of full and half adders required to add
16-bit numbers is
{o) 8 half adders, 8 full adders
{b)1 half adder, 16 fll ads
{c) 16 half adders, 0 full addor
(4 half adders, 12 ull adders
Solution: To add bit numbers
Number of half adders required = 1
[Number of full adders required = N= 1
‘Thorefora, to add 16-bit number only, 1 half addor
and 16 full adders aro require.
Ans. (b)
|. Which of the following requires « device driver
(a) Regicter (b) Cache
(6) Main memory (a) Disk80 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE
Solution: Disk is the 1/0 device attached exter.
nally to the processor, Therefore, disk requires
device driver.
‘Ans. (d)
21. More than one word are put in one cache block to
{@) exploit the temporal locality of roference in a
program
(b) exploit the spatial Locality of reference in a
program,
{) reduce the miss penalty
{d) none of the above,
Solution: "There ane wo types of fealty of teeters
temporal and spatial locality.
‘The concept of spatial locality, instead of fetching
{just one item from the main memory to the cache,
1s usoful to fetch several ems that reside at ada
comt sddrase as well
‘So, option (b) is correct.
ns. (b)
22, Which of the following statements is fabs?
(4) Virtual memory implements the tandation of
program's addross space into physical memory
addres space.
(b) Virtual memcey allows each program to exces
‘he sie of the primary memory”
(6) Virtual memory increases the degree of
‘multiprogramming
(4) Viruial memory raduoes the context-switching
overhead
Solution: Virtual memory increases the context.
switching overhead.
Ans. (4)
GATE PREVIOUS YEARS’ QUESTIONS
www.engineeringonyourfingertips.ooo
28. Advantage of synchronous sequential citeuits over
‘seynchronous ones ie
(a) faster operation.
(b) ease of avoiding problems due to hazards.
(6) lower harewaro requirement.
(4) better noise iramunity.
Solution: Because of lees dolay, synchronous
sequential ctoults have Faster operation than syne
chronous onee.
Ans. (a)
24. ‘The total siz of addness space im a virtual memory
system Is limited by
(9) the length of MAR
(b) the available socondary storage
(6) the available main memory.
(4) all of the above,
Solution: Virtual memory depends only on the
‘available sito of the socondary raemory.
‘Ans. (b)
25, Comparing the me 7, taken fr single nstruce
jon on a pipelined CPU with time 7, taken om a
‘not-pipelined but identical CPU, we can say that
@) 4st
(b) 12%
1 COMPUTER ORGAMZATION AND ARCHITECTURE
18. A dovice with data transfer rato 10 KB/s is con
nected to 4 CPU, Data is transferred. byto-wise,
Let the interrupt overhead be 4. The byte trans-
for time between the device interfaces register and
CPU or memory is negligible What is the mink
‘mum performance gain of operating the device
‘undar interrupt modo over operating #¢ undor pro
rammcontrolled mode?
(0) 15 (3
(6) 35 (aye
(GATE 2005; 2 Marks)
Data transfor rato = 10 KB/s
4x10
Solution:
Interrupt overhead
10 KB is sent = 18
1 B is sent = 1/10K
Minimum performance gxin
00 - 1077s
100 x 10/4 x 107
‘Ans. (b)
Common Data Questions 19 and 20. Consider the
following data path of a CPU. The ALU, the bus
‘and all the registers in the data path are of identi
cal siza All operations including ineremantation of
the PC and the GPRs are to be carried out in the
ALU. Two clock cysles are needed for memory read
operation ~ the first one for loading address in the
MAR and the next one for loading data from the
memory bus into the MDR.
MAR = MDR
‘GPRS
19. The instruction “add Ry, Ry" has the register
‘transfer interpretation Ry e= Ry + Ry. The mini
‘mum number of clsek eyelas netded for execution
ceyeloof this instruction is
(a2
wa
ibys
(ays
(GATE 2005: 2 Marks)
Solution: There will be three eyelee—(1) jag
Sq (2) Raoys Tin 04 (3) Sys Tras ALT ap Ba
“Ans. tb)
20. The instruction “call Ry sub" isa two-word instruct
tion. Assuming that PC is incremented during tho
a1.
22.
fetch cycle of the first word of the instruction, ite
rogister (ranefer interpretation is
R,& PCH;
PC & MPC);
‘The minimum number of CPU clock eyelae neoded
during the execution cycle of this instruction is
@2 3s 4 (as
(GATE 2005: 2 Marks)
Solution: The minimum number of CPU clock
cyelas neoded during tho arecution cycle = 4. This
is bocanso
1 cycle is required to tranefer already incremented
valuo of PC
‘eycles for getting data in MDR
1 to load value of MDR in PC
Ans. (6)
Consider a disk drive tho following
‘specifications:
16 surfaces, 512 tracks/surface, S12 sactors/track, 1
[KB/seetor, rotation speed S000 rpm. The dik is open-
ated In eycle stealing mode whereby whenever one 4
byte word is raady itis sent to memcry; similarly, for
‘writing, the dick interface reade a 4 byte word froma
the memory in each DMA cycle. Memory cycle time
fs 40 nsoc. The maximum percentage of tim that the
CPU gets blockeal duting DMA operation i:
with
(a) 10 (b) 25
(49 (a) 50
(GATE 2005: 2 Marks)
Solution:
Data transfer in one rotation = 512 x 1024 Bytes
1 rotation takes = 2-5
3000
©
‘19KB is transferred in = 3055 &
1 byto will be transferred
4 bytes wall be transferred
= 60x x 512% 1024 ~ 152.58 ne
40
152.28 id
©
spp 01201024
Block %
Ans. (b)
A.CPU has 24-bit instructions. A program starts at
‘address 300 (in decimal). Which one ofthe following
{sa legal program counter (all Values in decimal)?
(a) 400 (b) 800
() 600 (a) 700
(GATE 2006: 1 Mark)
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
24.
www.engineeringonyourfingertips.ooo
Solutions Size of instruction = 24 bite; Start
address ~ 200. Legal adress will be multiple of
three, that is, 300,
Ans. (c}
= A CPU has a cache with block size 64 bytes
‘The main mamory hae & banks, each bank being
fe byte wide Consseutive e byte chunks are
‘mapped on consecutive banks with wrap around
All the & banks can be accessed in parallel, but
two accesses to the same bank must be serialize.
A cache block access may involve multiple iter
tions of parallel bank accesses depending on the
amount of data obtained by accessing all the k
banks in parallel. Esch iteration requires decoding
the bank numbers to be accessad in parallel and
‘this takes A/2ns, The latency of one bank access Is
80 ns. Ie = 2 and k= 24, the latency of retriev-
ing a cache Hock starting at address zero from the
‘main memory is
(a) 928 {b) 104 ns
(0) 1726 (a) 184 ns
(GATE 2006: 2 Marks)
Solutions
‘Time for one parallel process = k/2 + latency
‘Time for one byte = 24/2 + 80 = 92
‘Total time for c bytes = 2 x 92 = 184 ns
Ans. (d)
A OPU has a fivestage pipeline and runs at 1
Gllz frequency. Instruction fetch happens in the
first stage of the pipeline. A conditional branch
‘ngtruction computes the target addrass and
evaluates the condition in the third stage of the
pipeline. The processor stops fetching new instruc.
ons following a conditional branch until the
branch outcome ie known. A program executes
10° instructions out of which 20% are conditional
branches. If each instruction takes one cycle to
complete on average, the total execution time of
the program is
(a) 105 ib) L2s
(14s id) 6s
(GATE 2000: 2 Marks)
Solution:
‘Total execution time ofthe program
= 10° +020 x2x 10? = 148
Ans. (3)
Consider « now instruction named branch-on-bit-
set (mnemonic bbs). The instruction *bbs reg, pos,
label” jumps to label if bit in position pos of reg
‘ster operand reg is one, A register is 92 bits wide
27.
‘CATE PREVIOUS VEARS QUESTIONS — 85
and the bite are numbered 0 to 81, bit in pection
O being the least significant. Consider the following
‘emulation of this instruction on processor that
does not have bls impletmented,
temp @ reg & mack
Branch to label if temp is non-zero,
‘Thovariabletempisatemporary register. For correct
emulation, the variable mask must be generated by
(a) mack 0x1 « pos
(b) mack <0 S97f74EF > pos
(o) mack € pos
(4) mask 0% 5
(GATE 2006: 2 Marks)
Solution: As thereis only onebit with pos, the other
bits need to be set to 0 i temp. The mask register
‘must have 1 in pos position, for which pos number
‘of left shifts over 1 need to be made.
Ans. (a)
Common Data Questions 26 and 27: Consider vwo
cache organizations. The fist one is $2 KB, 2-way
sot associative with 32-byte block size. The second
‘one is of the same size but direct mapped. The size
of an addres is $2 bite in both casas. A 2-to-1 mule
tipleser has 6 Istency of 0.6 ne while a Kbit com-
pparator has a latency of k/10 ns. The hit latency of
‘the sot-associative organization is Jy while that of
‘tho dinect mopped one is
= Tho waluo off is
(a) 24 ns (b) 23.ns
(9) 188 (17s
(GATE 2006: 2 Marks)
Solution:
‘Address bits = 32
Block size = 82 B = 5 bits
Size of cache = 82KB/32 = 1KB = 10 bits
For 2way sotassoclative memory — inclax bits ~ 9,
tag bits = 18
B/10 = 15/10 = 1.8 + latency = 18 +06 = 24
‘Ans. (a)
The value of hy le
(a) 24 ne (b) 23 ns
(6) 18 ns (4) 17 ns
(GATE 2006: 2 Marks)
Solution:
Indirect memory accoos: ‘Tag bite ~ 17; Index ~ 10
bits; word = 5 bits
A/10 = 17/10 = 1.7 + latency = 1.7 +06 =28
Ans. (b)www.engineeringonyourfingertips.ooo
www.engineeringonyourfingertips.ooo
(OAPTER2 COMPUTER ORGAMZATION AND ARCHITECTURE
Common Data Questions 28 ond 2% A CPU has
8 22. KB direct-mapped cache with 12&-byte block
size Suppose A is a two-dimensional array of size
512 x 512 with elmnents that occupy 8 bytes each.
Consider the following two C code segments, Py
and
Py for (ie0; ic8127 i+) (
for (=O J<5127 344)
x e=AC O17
’
)
Pat for (10; 1<5121 144) (
for (m0 3 COMPUTER ORGAMZATION AND ARCHITECTURE
97. How many dats cache misees will occur in total?
(4 (BD (Ba) 0
(GATE 2007: 2 Marks)
Solution:
Main memory ~ 2!° B
Block size = 64 B
[Number of blocks = 82
[Number of elements = 50 x 50 = 2500
‘Starting from Ioeation 1100 means from 68th block
‘Number of blocks = 2500/64 = 40 blocks
Initially cache is empty for $2 misses, then § are
remaining fom total 40 for one acoass
Array is traversed twice, 60 data cache misses =
10+ 848— 56
Ans. (c}
38. Which of the following lines of the data cache will
be replaced by now blocks in accessing the array
for the second time?
9) line 4 ¢0 lime 11
{6) line 0 to line 7
Solution:
Applying k mod ¢ to find the location: 68 mod
adn
{b) line 4 to tine 12,
{4} line 0 to line s
Ans. (3)
30, Which of the following e/ate true for the auto-
incremant adérossing mods?
tis wef in creating selteocating code,
TL I ic mcladad in an Testruction Sot Architece
ture, then an additional ALU is required for
sifectve addres caloulation
ML The amount of increment depends on the siz
of the data item accosead
(a) only
{e) only
(b) Monty
(4) ML and IIL only
(GATE 2008: 2 Marks)
Solution: Only statoment (IM) s true.
Ans. (c}
40. Which of the following rust be true for the RFE.
(Return from Exception) instruction on @ general
Purpose processor?
1. It must be a trap instruction.
IL Ik must be » priviloged instruction.
INL An exception cannot be allowed to occur during
‘exeeution of sn RFE instruction,
{a) Tonly (b) Tonly
{o) Land Tl only (4) 1, Hand TH only
(GATE 2008; 2 Marks)
Solution: RFE (Return from Exception) 1s a
privileged trap instruction that is executed when
an
42.
www.engineeringonyourfingertips.ooo
‘exception occurs, 60 an exception is not allowed t0
‘execute, Option (d) is the correct option,
Ans. ()
Por a magnetic disk with concentric circular tracks,
tho seck latency is not linearly proportional to the
seek distance due to
{9} nonuniform distribution of requests
(b) arm starting and stopping inertia
(6) hicher capacity of tracks on the petiphery of
the platter
(4) use of unfair arm scheduling polisies
(GATE 2008: 2 Marks)
Solution: ‘Tracks on magnetic disks are concentric
and seek Iniency from one sector to other which
‘may ot may not be in different tracks. ‘This sock
distance is not proportional to latency since the
‘racks at periphery have higher diameter, and
hence higher capacity to store data,
Ans. (0)
Which of the following are NOT true in pipelined
processor?
1 Bypassing con handle all RAW hazards.
TL Register renaming can eliminate all resister
carried WAR hazards
NL Control hazard penalties can be eliminated by
dynamic branch prediction,
(@) Land IL only (b) Lond MI only
(c) Hand Mf only (@)1 Wand
(GATE 2008: 2 Marks)
Solution: All the statements are true.
Ans. (d)
- Por inclusion to hold between two cache Iovele L1
and 12 in a multi-level cache hierarchy, which of
the following ate neceseary?
Lt must be a-write-through cache
TL L2 must be a writethrough cache
IIL The associativity of L2 must be greater than
that of Lt
TV. The 12 cache must be at least ae lange ae the
Li cache
(a) IV only
() 1, Mand 1V only
(b) Land IV only:
(4) 1,1 MM and 1V
(GATE 2008: 2 Marks)
Solution: LA and 12 cache are placed between
‘CPU and they can be both write through cache
‘ut not necessarily.
Associativity doos not matt.
12 cache must be at least as large as Lt cacho,
since all the words in L1 ate also in L2.
Ans. (8)
www.engineeringonyourfingertips.ooo44. The ure of multiple register windows with over
lap causes a reduction in the number of memory
accesses for
1 Function locals and parameters
IL Register soves and restores
TIL nctruction fetches
(a) only (b) Monly (6) Monly (4) 1, Wand mH
(GATE 2008: 2 Marks)
Solution: Multiple register windows with over
lp causet « roduction in the umber of memory
scoesses for register savas and restorer
Ans. (b)
45. Consider a machine with a 2-way setassocative
data cacho of size 64 KB and block size 16 bytes.
‘The cache is managed using 92 bit virtual addresses
and the page size is 4 KB. A program to be run on
this machine begins as follows:
double ARR (2024) (10281;
ine 4 37
/* Initlalize array ARR to 0.0 +/
For(la07 L<1024; 444)
for(3-0; }<1024; 3+)
BRR (1) [31 ~0.07
‘The sleof double 8 bytes. Aray ARR is located
in memory starting at tho boginning of vitual page
(OxF FOOD and stored in row major order. Tho cache
is insally empty and no prefetching is done. The
only data memory references made by the program
ste those to array ARR
‘The total sie ofthe tags in the cache direstory Is
{a) 32 Kbits: (b) 34 Kbits:
{et Kbits {d) 68 Kbit
(GATE 2008: 2 Marks)
Settions
Virtual address = 82 bits
2evay cache size = 64 KB
{sot will contain ~ 32 KB entries ~ 15 bite
Block size = 18 bytes = 4 bits
Tag tis [Sabie | Word bie
17 TH 7
‘Tag size = 172% 1024 = 84 bits
‘Ans. (b)
46. Consider the data given in the above question.
Which of the following array elements has the
same cache index ae ARR|O}0}
(a) ARRIO|s) (b) ARR)
{6} ARR; (a) ARR(S}}]
(GATE 2008; 2 Marks)
ar
www.engineeringonyourfingertips.ooo
‘CATE PREVIOUS EARS QUESTIONS — 89
Solution: Total slements can come in one clot 2048.
‘After 2048 elements, same cache index will be on
{2/0} and (4}[0}.
Ans. (b)
‘The cache hit ratio for this initialization loop i
(a) 0% —(b) %_— (e).80% (a) TH
(GATE 2008: 2 Marks)
Solution: Cache hit ratio found out se fellow:
yom
ane 3
As we cameo in te abow, hate wil be 05 its.
Ans. (c)
Linked Answer Questions 48 and 4%: Delayed
ranching cam lp tn the handling of control
hazards
= Por all delayed conditional branch instructions,
Irrexpoctive of whother the condition evaluates (0
true or fabs:
(a) Tho instruction following the conditional
branch instruction in memory is executed.
(b) The first instruction in the fall through path is
executed.
(©) The frst instruction in the taken path is
executed,
(4) The branch takes longer to execute than any
other instruction,
(GATE 2008: 2 Marks)
Solution: Tho first instruction following the
branch instruction is always executed (irrespective
cof whother the branch is taken or not)
Ans. (b)
). The following code is to run on a pipelined prosas
sor with one branch delay slot:
Jp ADD Ry © Ry + Ry
4, SUB Ry « Ry~ Ry
Ty ADD Ry © Re + Ry
Ig STORE Memory [RJ © R,
BRANCH to Label 2, ——0
Which of tho instructions Jy fy Tea legit
rnatly cccupy tha delay ts oithout any othor
togram modification?
@h bb
Oh ih
(GATE 2008: 2 Marks)
Solution: Instruction J, contains delayed lot. Ly
thas data dependency in I
Ans. (b)
www.engineeringonyourfingertips.ooo90 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE
50. How many 92K x 1 RAM chips aro needed to pro-
‘vide a memory capacity of 256 K bytes?
(a8 ib) 2
{o) 64 (a) 128
Solution:
[Number of chips required
‘Ans, (e)
51. A CPU generally handles an interrupt by execut-
ing an interrupt service routine
(@) as soon as an interrupt i raised.
{(b) by checking the interrupt register at the end of
the fotch cycle.
(©) by checking the interrupt register after finishing
the execution of the current instruction,
(a) by checking the interrupt register at fixed time
interval
Solution: Interrupts are handled by checking the
Interrupt raglter after finishing the execution of
current instruction,
Ans. (¢)
52. Consider a 4-staxe pipeline processor. The number
of cycles needed by the four instructions J, f,, 1,
Ty im stages 5}, Sy, Sy, 5, is shown below:
a
|
wo] fo] abe
www.engineeringonyourfingertips.ooo
5B. Consider © 4-way setassccistive cache (initially
empty) with total 16 cache blocks. ‘The main
‘memory consists of 256 blocks and the requast for
memory blocks isin the following order:
0, 255, 1, 4,3, 8, 188, 199, 216, 129, 68, 8, 48, 92,
73, 92, 155
‘Which one ofthe following metnory block will NOT.
be in cache if LRU replacement policy is wed?
@s 8 (rR ~26
(GATE 2000: 2 Marks)
Solution:
‘To decide the location (addres), mod 4 is applied.
Sao [0482658 DED
Sect [1 155, 129, 7
[Seca [rs
[See 3 [255-5 160, OF
216 will not be there in cache.
Ans. (d)
Common Data Questions 54 azad 85: A hard disk has
68 sectors per track, 10 platters each with 2 record
‘ing surfaces and 1000 cylinders. The address of
sector is given as triple where cis the
cylinder number, h is the surface number and s is
the sector number. Thus, the Oth sector is addressed
5 <0, 0,.0> the Ist sector as <0, 0, I>, and so on,
(GATE 2000: 2 Marks)
4. The address <100, 16, 20> corresponds to sector
muruber
{9} 506035, (b) 505086
(6) 505087, (4) 505008
Solution:
‘Total surfaces = 102 = 20
Adrants: 400 590 x 63 + 16 x 63 + 99 — 808037
‘ans. (0)
What isthe umber ofcyles needed ‘oexecutethe 5, ‘The address of 1089h acto
following loop? {a} @, 15, 31> (b) <0, 16, 30>
for (i=l to 2) (hi Tat Iai Lei} {c) <0, 16, 31> (d) <0, 17, 81>
7 Sebtione
{a} 16 (b) 23) 28 (d} 30 Address of 1039th sector = 16 x 31 + 31 = 1039
(GATE 2000: 2 Marks) rey
Sottion:
Clock 102 3 4 5 6 7 9 0 Mo Mo
1 55 SH
L SEE EEE HY
iB 5 & % § 8
4 4 % 8% % % s
For two iterations = 15 2 — 80 clock eyelas
‘Ans, (d)
www.engineeringonyourfingertips.ooo56. A main memory unit with s capacity of 4 mega
ibys bull wing IM [bit DRAM chips. Bach
DRAM chip has 1 K rows of cols with 1K ell
in wach row. The tim taken for «single teftesh
eration i 10 ns. The time required to perform
fone rafash operation onal cho callin cho memory
smi
{s) 100 nanoseconds
{0) 100 2" nanosoconds
{) 100 2 ranossconde
{d) 3200 x 2° nanoseconds
(GATE 2010: 1 Mark)
Solution:
Main memory = 4 MB
Number of DRAM chips = 4 MB/ 1M x 1 bit = 82
‘Total cll = 92 1K «1K
‘Time taken to relocate ells ~ 82 x 1K
www.engineeringonyourfingertips.ooo
‘CATE PREVIOUS VEARS QUESTIONS 91
Hero wo have to find X in terms of Y. So,
1K» 100 ne Honea, wo suma up as
Ans. (d) X = MAX(Y, a + ¥/2)
5. The weicht of a sequence ay 4/2,» Oy of real Ans. (b}
number is defined a8 ay + 4,/2-+ + 0/24
‘A eubsequance ofa sequence ls obtained By delet.
ing some elements from the sxquencs, keeping the
order of the remaining elernonts tho same. Lat X
dlencte the maximum possible weight of a subss-
quence of gy oy sd and Y tho maximum por
sible weight of a subsequence of oy. 0 «Oe
‘Then X is equal to
{) max(Y, a+ ¥)
{e) max(¥, a, + 2¥)
(0) max(¥, a + ¥/2)
(d) ag + ¥/2
(GATE 2010: 2 Marks)
Solution: Tho concepts invelve the Dynami
Programming in Algorithms.
Given that
X= max weight from the sequence (a), 0), 03,
agatha gD
aed = Ot Bt Bt ES
Y= max weight from the sequence (4), <4)
BB. A S.stage pipelined processor has instruction fetch
(UF), mstruction decode (ID), operand fetch (OF),
perform operation (PO) and write oporand (WO)
Sages. Tho IF, ID, OF and WO stages take 1 clock
cyclo each for any instruction. The PO stage takes
1 clock cycle for ADD and SUB instructions, $clek
cyels for MUL instruction and 6 clock cycles for
DIV instruction, raspoctively. Operand forwaeding
is ued in tho pipline, What is the rumbr of clock
cycles nea to execute the following sequence of
instructions?
Instruct of Instruction
Tg: MUL Ry, Ro Ry Ry Ry x Ry
fy DIV Ry, Ry Ry Re Ry/Ry
TE ADD Ry, Ry Ry Pye Py + Py
Iz SUB Rey Roy Re Re © Ry~ Re
18 O18) Oar @is
(GATE 2010: 2 Marks)
Solution:
72 3 4 5 6 TF oo mu @ i m1
h © 1 OF PO PO PO Wo
h r WD OF _ _ PO PO PO PO PO PO WO
% r wD op | | LL PO WO,
5 Ir 1D or Po wo
‘With operand forwarding, 16 clock eycles are neaded,
‘Ans. (b)
www.engineeringonyourfingertips.ooo92 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE
50. ‘The program below uses six temporary variables 0,
ede
ani
b= 10
e= 20
daatd
erecta
frcte
becte
en=btt
a=ste
return d+ f
Assuming that all operations take their operands
from registers, what isthe minimum numberof rgis.
ters newled to execute this program without spilling?
(2 ws
oa (ae
(GATE 2010: 2 Marks)
Solution: Lt us take three Registers R1, RD and R3
All the operations will be completed using thro
registers only.
‘Ans, (b)
Common Data Questions 60 and 61: A computer
system has an LI eache, an 12 cacha, and a main
‘memory unit connected as shown below. The block
siza in L1 cache is 4 words. The block siza in L2
cache is 16 words, The memory access times are 2
nanoseconds, 20 nanossconde and 200 nanoseconds
for L1 cache, L2 cache and tain memory unit,
respectively
a LDsbus | Deabee fe |]
2 eal Main
Cache Cache emery
4 words 4 words EMP)
60. When there is # miss in LA cache and a hit in L2
cache, a block Is transferred from L2 cache to
Li cache, What is the time taken for this transfer?
{b} 20 nanoseconds
(4) 88 nanoseconds
(GATE 2010: 2 Marks)
{(@) 2 nanoseconds
{c) 22 nanceoconds
on.
62.
www.engineeringonyourfingertips.ooo
Solution: Accass to Lt cache = 2 ne
‘Acoaee to 12 cache = 20 ne
Block sizeof L2 is 16 words; data bus size is 4 word.
So, time taken for this transfer = 4 x 22 = 88 ns.
Ans. (d)
‘When there is « missin both L1 cache and 12 cach,
first « block is tranaferted from main memory (0
12 cache, and then a block is transferred from L2
cache to L1 cache, What isthe total time taken for
those transfors?
(a) 222 nanoseconds
(6) 962 nanoseconds
(b) 888 nanoseconds
(4) 968 nanoseconds
(GATE 2010: 2 Marks)
Solution: Block transfer from main memory to L2
cache = 2 +4 x (20 + 200) = 2 + 880 = 882 ns
12 to Li = 882 + 20 +66 = 908 ns
Ans. (d)
A computer handlas several interrupt sources of
‘hich the following are relevant for this question,
‘+ Interrupt from CPU temperature sensor (raises
interrupt if CPU temperature is too high)
+ Interrupt from Mouse (raises interrupt if the
mouse is moved or a button is prassed)
‘+ Interrupt from Keyboard (raises interrupt when
a kay is pressad or relaasea)
+ Tnterrupt from Hard Dick (tases interrupt when
«disk raad is comploted)
‘Which one of these wll b handed at the HIGHEST
priority?
(a) Interrupe from Hard Disk
(b) Interrupt from Mouse
(c) Interrupt from Keyboard
(4) Interrupt from CPU temperature sensor
(GATE 2011: 1 Mark)
Solution: Interrupt from CPU temperature sensor
‘wll be handled at the highest prictty.
Ans, (d)
(Consider hypothetical procseor with an instruction
of type LW 2,20 (2), which during execution reads
4 22-bit word from memory and stores it in a 22-bit
‘oxiter Ry. The effective addrees ofthe memory loca
tion i obtained by the addition of «constant 20 and
the contents of register Ry. Which of the following
best reflects the addressing mode inoplemented! by
this instruction for the operand in memory?
(a) Immediate addressing
{b) Regictor addressing
(c) Ragisterinditect scaled addressing
() Base indexes addressing
(GATE 2011: 1 Mark)
Solution: Bifective address ~ contents of register
R, +20
ans. (d)
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
oA.
www.engineeringonyourfingertips.ooo
‘On s non-pipelined sequential processor, a program,
segment, which is & part of the interrupt service
routine, is given to transfer 500 bytes from an 1/0
ddovice to memory.
Initialize the addrass rezistor
Initialize the count to 500
LOOP: Load a byte from device
Store in memory at addrass given by address register
Increment the address rogistor
‘Decrement the count
If count!= 0 go 9 LOOP
Assume that each statement in this program is
equivalent (0 machine instruction which takes
ono clock eycle to execute if it is a non-load/store
Inctruction, The load:store instructions take two
clock eyeles to execute.
‘The dosigner of tho eystom also has an altemato
approach of wsing the DMA controller to implement
the same tanefer. The DMA controller requires 20
clock cycks for initialization and other overheads. Bach
DMA transfor cycle takes two clock cycles to transfor
one byte of data from the device to the memory.
What is the approximate spoed up when the DMA
comtroller based design is used in place of the inter-
rupt-driven program-based input output?
(apa iby 4a
(51 (ay 67
(GATE 2011: 2 Marks)
Solution: By using load store instructions: 14 1
+800 (24241-4141) = 2+ 800% 7 = 8502
‘By using DMA: 20 + 500 x2 = 1020
‘Spoad up = 8502/1020 = 2.4
Ans. (a)
‘An 8 KB direct-mappod writeback cacho is org
nized as multiple blocks, each of size 22 bytes. The
processor generates 2-bit addresses. The cache
controller maintains the tag information for each
cathe block comprising of the fellowing:
1 Valid bit
1 Moalified bit
‘As many bits as the minimum nooded to identify
the memory block mapped in the cache
What is the total size of the memory nesded at the
cache controller to storemetadata (tags) forthe cach’)
(a) 4864 bits b) 6144 bits
{c) 6656 bits (a) 5876 bits
(GATE 2011: 2 Marks)
Solution: Number of blocks = 8K/32 = 256 blocks
Bleck bite | Word bits
5 a
‘Tag bite
tm
‘CATE PREVIOUS VEARS QUESTIONS — 93
Memory size for tag bite = 19 + 2 = 21
‘Total size of memory for tags = 21256 ~ 8976 bits
‘Ans. (d)
66, Consider an instruction pipeline with four stages
(SI, $2, 83 and S4) each with combinational cicuit
only. The pipeline registers are given in the fizure,
seo] {2} sacs] |2] ac} |S] fs
s ‘S . What is the
cylinder number ofthe last soctor of tho fils if is
storad in a contiguous manner?
(GATE 2018: 2 Marks)
(2) 1281 (b) 1282 (<) 1288) 1284
Solution: Number of soctors required to store the
file = eee = 85504 sectors
Number of sectors in a cylinder = 16 x 64 = 1024
‘otal numberof elinders equted ~ SSM
Last sector wil be stored on 128ith cylinder
‘Ans. (0)
75. Consider an instruction pipoline with fivo stages
vwithous any branch prediction: Fetch Instruction
(FD), Decode Inetruction (DD), Fetch Operand
(FO), Exzcute instruction (ED) and Write Operand
{WO}, The stage delays fr F, DI, FO, EL and WO
are 5 ne, 7 ns, 10 ns, Sms and 6 ns, respctivly
‘There ate intcemediate storage butkes after each
stage and tho delay of each buffer is Vins. A. pro-
‘ram consisting of 12 instructions J, LJ is
‘executed im this pipelined processor, Inetructbo 4
isthe only branch instruction and ite branch target
{10 If the branch ie taken during the execution of
this program, the time (in ns) needed to complate
tho program is
(a) 182 () 165
(9.176 (a) 928
(GATE 2013: 2 Marks)
www.engineeringonyourfingertips.oooSolutions Clock pulse duration
Maximum stage delay + Buffer delay
= 1x54 4 — 1) 11 = 88 ns
np ka 1=88 4 88—11 = 165,
Ans, (b}
Common Data Questions 76 and 77 Tie fellow
ing code segment is executed on a processor which
allows only register operands in ite instructions.
Each instruction can have attmost two source oper
ands and one destination operand. Assume that all
variables are dead after this code segment,
onarb
dectay
ie >a) (
17.
www.engineeringonyourfingertips.ooo
PRACTICEEXERCKES 95
it > R)
(
Re fy = ara
)
else
(
Ree [read value ¢ from memory)
Re RtR (d= * al
RoR tB, fd=d*
Rp- Ret R, [e=c+ ail
RoR tm fe
)
‘Only ‘c’ is spilled variable here, which needs to be
stored and loaded from memory.
Ans, (b)
What ie the minimum number of registers needed
in the instruction set architecture of the proce
sor to compile this code segment without any spill
to memory? Do not apply any optimization other
than optimizing register allocation,
ete]
@s 4 5 Me
(GATE 2013: 2 Marks)
76. Suppose tho instruction sot architecture of the Solution: Let Ry = a, Ry = b. The machine code is
processor has only two registers. The only allowed = eat 45
Compiler optimization is code motion, which moves oe oe
statements ftom one place to another while pre a =
serving correctness. What is the minimum number os +h
‘of spill to memory in the compiled cede? crc Rea Ret Ry
‘a wu ri Bek
2 (as tie.
(GATE 2013: 2 Marks) Lt
Solution: Suppose Ry and R, hold a and b, respeo da=d* di RAR * RS
tivaly. Here for code optimizstion, code motion is e=ete +R
allowed. The machine code ig; )
Bee Rt R (ea +b) :
None een Four registers are requitod
Ans. ()
PRACTICE EXERCISES
Set 1 (6) The width of tho addrose bus
1. The three diffrent kinds of buses supporting the
architecture of a computer are
(a) Address bus, data bus, 1/0 bus
(b) Front bus, data bus, address bus
{¢) Control bus, data bus, addrass bus
{4) 1/0 bus, adirese bus, instruction bus
2. The term ‘WORD of a OPU equals to
{) The maximum adkressable memory sico
{b) The width ofa CPU register (integer o oa pint)
(4) The total number of general purpose CPU
registers
- Which of the following statement is false about
CISC ARCHITECTURE?
(2) CISC machine instructions may inckade com-
plex addrossing modes, which require: many
‘lock cycles to carry out.
{b) CISC control units ate typically micro-pro-
grammed, allowing tho instruction set 12 be
more flaible.
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
96
a
6.
10,
www.engineeringonyourfingertips.ooo
(OAPTER2 COMPUTER ORGAMZATION AND ARCHITECTURE
{¢) In the CISC instruction et, all arithmetic/logic
instructions must be register based for fast
processing,
(4) CISC architectures may perform better in net
‘work centric applications than RISC.
‘Tho regietor that holds the address of the loca
ton to oF from which data aro to be transforred is
called
(a) Index register
{b) Accuraulator
(c) Memory addrase registers
(d) Memory data registers
Which one of the following ts not a type of 1/0
channel?
{@) Multiplexer {) Solector
{6} Block multiplexer (4) None of the above
‘Tho porformance of a pipelined processor is
degraded if
(2) the pipeline stages have diferent dalays
{b) consocutive instructions aro to be executed serially
{) the pipeline stages share hardware resources
{d) all of the above
‘The minimum time delay betwoen the initiation of
wo independent memory operations is called
{9} Access time {b) Cycle time
{6} Rotational time (4) Latency time
‘Tho rogistor which koops track of the execution of a
program and which contains the memory address of
the instruction currently being executed is known
(8) index register
{b) memory address regieter
{) program counter
(4) inetruction registers
|. For interval arithmotic, tho best rounding toch-
gue used is
(a) rounding to plus and minus infinity
{b) rounding to zero
{) rounding to nearest zero
(4) rounding to the next number
Mardwired control unit are faster than micro.
pprograrumed control unit because
{they do not consist of slower memory elements,
{b) they do not have slower aloments such as gatas,
flip flops and resisers
(6) they consist of elements based on VVLSI design
technology
(@) they contain high-speed digital components
nn.
12,
13.
4.
15.
16.
ar.
18.
‘The most relovant addreseing mode to write post
ton-independent code is
() direct mode (b) auto mode
(6) relative mode (4) indexed mode
A.CPU uses 241i instruction. A progam starts st
address 800 (in decimal). Which one of the fllow-
ing is a legal program counter content (all values
in decimal)?
(a) 98 (by 212
(6) 600 (a) 700
‘An attempt to access a location not owned by a
program is called
(a) dats fae (b) adds fault
(c) struction fault (4) page fault
Which of the following statement. about relative
addressing mode te FALSE?
(2) It enables reduced program code
(b) It allows indexing of array olomont with samo
instruction,
() It enables easy relocation of data
(4) Tt enables fastor address calculation than abeo=
lute addrassing
Compared to CISC procescts, RISC prooasors contain
(a) mote register and emallor instruction set
{b) larger instruction sat and loss registers
(c) lees registers and smaller instruction set
(€) mote registers and larger instruction set
Micto-programmed control cannot be implemented
in RISC architecture because
(0) It tend to slow down the processor
(b) it consumes more chip areas and large instruc-
tion sot
(6) handling large mumber of registers is impos
sible in micro-programmed system.
(4) tho 1 instruction/eycle timing requirement
for RISC is dificult to achieve in micro-pro-
rammed based architecture,
Relocation of the code 1s easier in Irrspec.
tive of the program code
(a) indirect addressing
(b) indoced addressing
(c) base register addressing
{d) absolute addressing
In inverted page table organization, the size of the
page table depands on
() the number of processes
(0) the size of page
(c) the siza of main memory
(4) the number of frames in the main memory19. When using the concept of locality of reference, the
page reference being made by process
{) will always be to the page used im the previous
page reference
(b) is likely to be one of tho pages used in the past
{fos page references
{6} will always to be one of the pages existing in
the main memory
(2) will alway lead to page fault
20. If the now vrsion of processor is not made com-
patible to programs written for Its older version, it
could be able to process at « faster speed
{@) the statemant is trua,
(b) the statement is false
{¢) the epaed cannot be predicted.
{d) speed has nothing to-do with the compatibility.
21. A certain snooping cache can nop omly om acknees
line. Which of the following is true?
(8) This would adverily affect the system ifthe
swite through protocol is used
() Ie would run woll ifthe write through protocol
is wrod,
{6) Data snooring is mandatory to be implemented
om data bina
(4) Data stooping may not be aquired.
22. When the frequency of the input signal to a CMOS
‘gate Is increased, the average power dissipation
(2) decreases exponentially
(b) increases
{c) docteasas
(4) increases exponentially
28. The disadvantage of hardwired control unite with
ship Bop is
(9) design becomes complex.
{b) it requires more number of ip Hops
(c) contol circuit sposd does not match with fp
flops:
(4) flip-flops. cam hansle the data unit not the
‘contsol unit
24. In a vectored interrupt,
(a) the branch address is assigned to a fixed loca
ton in memory.
(b) the interrupting source supplies the branch
{information to the processor through an intet-
rrupt vector.
(6) the branch address is obtained from a resistor
im the processor.
(@) the branch address ie obtained from program
‘counter.
www.engineeringonyourfingertips.ooo
PRACTICEEXERCKES 97
25. A device employing INTA line for device inter
rupt puts the CALL instruction on the data bus
while
(a) INTA ts active (b) HOLD is active,
(¢) READY isactive, (4) READY is active,
226. On receiving an interrupt frm an 1/0 dion, tho CPU
(a) branches off to halt (or wait) for a prodete
‘mined time
(b) branches off to the interrupt service after come
plotion of the current instruction
(6) branches off to the interrupt. service routine
immedistely
(4) hands over control of addrass bus and data bus
to the interrupting device
27. Using largo block size in a fixed block sizo filo
system leads to
(9) better disk throughput but poorer dick space
uuilization
(b) botier disk throughput and better disk space
utilization
(6) it does not matter as the total memory size is
same
(@) poorer disk throughput but better disk space
utilization
228. Which of the following statements are true about
aging?
(a) Is divides momory into units of equal czo,
(b) Ie permits implementation of virtual memory.
(It suffers from internal fragmentation
(4) Ie ouffers from external fragmentation,
29, ‘The number of entree in an inverted page table
(9) & equal to the number of pracestes,
(b) is equal to the number of page frames in the
main memory.
(c) ie equal to the sto of tho page frame.
(4) is equal to the number of page frames in cache
memory,
80. In a virtual memory system, the addresses used by
the programmer belongs to
(a) memory space
(6) address space
() physical space
(4) main memory spaco
31. Power consumption of processors can be vastly
reduoed by making use of based transistors
to implement the ICs
(a) NMOS only
(b) TTL Schottky and PMOS
(c) PMOS only.
(d) NMOS and PMOS
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
9B iAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE
82, Address symbol table ie generated by the
(3) memory management software
{assembler
{c) table match of associative memory
(2) generated by CPU
88, How many 128 x8 RAM chips are nooded to have
total RAM of 2018 bytes?
S(O) (ope (AVRD
‘84, In 6085 microprocessor, how many 1/0 devices can
be interfaced in 1/0 mapped 1/0 technique?
(3) Either 256 input devices or 256 output devices
{b) 81/0 devices
{6} 256 input devices and 256 ousput devices
(4) 512 input-output devices
‘85. After reset, the CPU starts the execution of instruc
tom from memory address
(my
c) 0000,
36. In a microprocesve sytem, supposo TRAP, HOLD
find RESET pin got stivated st tho same time,
‘tile the pnosssea sas exeouting rome inet
tions the system wil
{a) execute the TRAP instruction
{b) execute the HOLD instruction
{o) execute the RESET instruction
(d) none of these instructions will be executed
(8) $000
(a) FFF
97. In 8085 microprocessor, the programmer cannot
‘access which flag directly?
{s) Sign flag tb) Carry fla
(6) Ausiliary carry flag (d) Parity flag
88. Which of the following is « pseudo-instruction for
808s
(a) SPHL ib) CMP
() NOP (a) END
39. The term “cycle stealing” refers to:
() Inorruptbasod data trator
(b) DMA-based data transfer
{o) Poting mode data transfer
{d) Clock cycle overriding:
40. Which of the following architacture isnot suitable
for the following SIMD architecture?
(a) Veetor processor (b) PLA-bwoal proossor
{6} Von Neumann {4) PALbasd procaor
41. In 4 multiprogramming eystem, which of the fol
lowing concopts is usod?
(a) Data parallelism (b) Paging
(6) Li cache (a) DMA
42. PAL circuit can be defined as
(a) fixed OR and programmable AND logic.
(b) programmable OR and programmable AND logic.
(6) fixed AND and programmable OR logic
(d) fixed OR and fixed AND ogi.
43. IF the clock input applied to a cascaded Mod-6 and
Mod-4 counter is 48 kHs. Thon the output of the
cascaded arrangement shall be of
() 4.8 kite (b) 12 kite
(o) SkHe (a) 8 ke
44. If there are four ROM ICs of 8K and two RAM 10s
‘of 4K words, them the address range of Ist RAM is
{accume initial addresses correspond to ROMs)
(2) (8000}y 0 (OFFF)y—(b) ($000) to (7EFP)y
(6) (8000) to (SFFF)y (4) (6000)y to (OFF)
45. ‘Tho method for updating the main memory as 60cm
a6 a word is removed froma the cache is called
(a) Writethrough, (b) Write-back
(6) Writesave (d) Cachesave
Sot 2
‘The most appropriate matching for the following
pairs
X. Indirect addressing 1 Loops
Y. Immediate ackressing 2. Pointers
%, Autordecrement addressing 3, Constants
(s)X-3Y-22-1 (t)X-1¥-37-2
()X-2¥-8Z-1 ()X-3Y-12-2
2. Which of the following is not a form of memory?
(a) Instruction cache
(b) Instruction register
(6) Instruction opoode
(4) Translation look aside buffer
9. In sorial data transmiscion, overy byte of data is
padded with a ‘0° in the beginning and one oF bwo
T's at the end of byte because
{@) Receiver is tobe synchronized for byte reception.
(b) Receiver recovers lost ‘0° snd ‘from these
padded bite
(©) Padded bits aro useful in parity computation.
(a) None of thase.
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
PRACTICE EXERCKES 99
4. in
complement addition, overflow (6) Horizontal mictoprogramming, vertical micro
programming, hardwired control.
SG ogee soho ee renee erm (4) Vertical microprogrammaing, horsontal mieno-
(b) cannot occur when a positive value is added ¢0 diinkampiarinamasenie
‘negative valu 10. A processor nesds softwate interrupt to
{o) is flagged when the carrie for sign bit and ie deeneereneenneoneberaetien
provios bit match (b) implement co-routines:
Dnemot ashore (c) obtain system gervicas which need execution of
5. Th performance of pipolinad proceson suffers en
{4} the pipeline stages have diffrent delays Y
Pe eacaestive nettions are lopaodent on each 12+ Which ie the most appropriate match forthe tas
— in the fits het with the Vlas inthe sacond Mist?
{) the pipeline stagas share hardware resourcas
{d) all of the above pees ey
6. Match the pairs in the following questions X-Indiet acidcvering, 1 Army implemen
By witing the corresponding letters only ‘aamisisating; TE Wig aes
{A) IEEE 488 (P) Specifics tho intorface for com 7
aonb A gia a 2, Baco resistor Mi Pocsing array a8
{B) IEEE 796 (Q) Specified the bus standard addressing pocameter
for connecting a computer to
‘other devlose including CPUs (9) (XM) (YD (ZW) (b) XM YM (2
“instrumentation bus 12, Consider the folowing data path of a simple nom
{D) RS282.C (8) Sponiflos the bus standaed fr the piplelined CPU. The registers A, B, Ay, Ay, MDR,
back plane” bus called muitibus tho bus and tho ALU aro sit wide. SP and MAR
are 16-bit rogistas. The MUX is of size 8 x (2:1)
{8) (A) = (P)s (B) = (BY; (6) = |S} (D) ~ (Q) and the DEMUX ls of sto 8 x (12). Each memory
{b) (A) = (P); (B) = (Q); (©) = (BY (D) — (8)
sD) — ‘operation takes 2 CPU clock cyclas and uses MAR
{6} (A} = (Q); (B) — (8); (©) — (Bs (D) — (P} (memory address register) and MDR (memory
data register), SP can be dacrementad locally
(dl) (A) — (R); (B) — (8); (C) = (Py; D) = (BY
7. When an interrupt occurs, an operating system
(0) Tenores the intereupt
{b) Always change stato of interrupted process
‘after processing the interrupt
(0) Always resumes execution of interrupted pro
ess after proceesing the interrupt
(@) May change state of interrupted procaas to
‘Blocked’ and schedule another process.
8. RAID configuration of dks are used to provide ‘The CPU instruction “push +, whete r= A or B
(a) Faulttolerance ——_(b) High epaed hs the specification
{c) High data density (a) Nono of the above Map] or
9. Arrange the following configuration for CPU in Pee
decreasing otder of operating specds: hardwired How many OPU clock cyslas are neoded to execute
control, vertical microprogramming, horizontal the “push 7 instruction?
nbroprograraaing @2 8s 4 Ws
{) Hardwired control, vertical mictoprogramming, 4, fy the abeclute addreesing modo,
horizontal mictoprogramming
(b) Hardwired conttol, horizontal ticroprogram- (a) the operand is inside tho instruction.
‘ming, vertical microprogramming. (b) the address of the operand is inside the
instruction
www.engineeringonyourfingertips.ooo100 cHAFTER2: COMPUTER ORGAMZATION AND ARCTECTURE
{c) the register containing the address of the oper:
‘and is epecifiad inside the instruction,
{@) the location of the operand fs implicit,
14, What are the states of the auxiliary earry (AC)
and carry flog (CY) after executing the following
8085 program?
MMVI, SDH
MIV L, oan
MOV A, H
ADDL.
(a) AC =0, CY =0
() AC =1, CY =0
()AC=1, CY =1
()ac=a ert
15. Horizontal microprogramming
{2} does not require uso of signal decoders
{b) results in largerstzad microinstructions than
vertical mictoprogramming
{¢) uses 1 bit for each comtrol signal
(d) all of the above
16. For the daisy chain scheme of connecting 1/0
devices, which of the following statements is trus?
(a) It gives non-uniform priority to various devices.
(b) I¢ gives uniform priority to ll devicas
(¢) Is only usaful for connecting slow devices to
{8 processor device
(a) Ie requires s separate interrupt pin on the pro
‘ceasor for each device
17. A microprogeam conteol unit is required to gener.
‘tea total of 25 control signals. Assume that during
‘any microinstruction at moet two control signals
are active. Minimum nuruber of bite required in
the control word to genotate the required control
signal is
18. The correct matching for the following paite is
19.
8
a1.
a1.
|. A decimal number has 64 di
www.engineeringonyourfingertips.ooo
(s) A-4,B~3,0-1,D-2
(b) A-2,B-1,0-3D-4
(JA-4&B-3.C-2D-1
() A-2,B-3,C-4,D-1
1/0 redirection
() implies changing the namo of fle
(b) can be employed to use an existing file as input
file for a program
(6) implies connecting two programs through a pipe
(4) none ofthe above
‘Themain differences) between a CISC and a RISC
processor is/are that a RISC processor typically
(a) has fewer instruction
(b) has fewer addresing modes and more registers
() iscsi to implement wing hardwvited contol logic
(4) all of these
How biz is a 4-way setasscciative cache made up of
16 bits word, 4 words per line and having 1024 sets?
() 82 KB (b) 64 KB
(6) 128 KB (4) 256 KB
Consider the expression to be evaluated as X =
(M+ Nx O/(P x Qs. How many threo address
Instruction are required to evaluate the above
riven expression?
@3 (4 swe
= (M+ Nx OP x Q), how many one
‘addrees instructions are required to evaluate it?
@4 (6 S10
ts Tho number of bits
nnesded for its equivalent binary representation is
(0) 200 (bas (ey 246 (a) 277
- Determine the spead up obtained from pipelining if
latencies for wach stage in single cycle processor is
A.DMA 1/0 1. High-speed RAM con
B. Cache 2. Disk
©. Interrupt 1/0 8. Printer i [p_ [aw [Mew [we
D. Condition code 4 ALU ne [2m [sm [Ane | ine
register
ANSWERS TO PRACTICE EXERCISES
Set 1
Lo 8. (0) 5a) 7. (b) 2 ()
2. (b) 4.19 «(a 8. (0) 10. (a)
www.engineeringonyourfingertips.ooo1. (3) 18. (4) 25. (a)
12. (3) 19. (b} 28. (b)
18. (b) 20. (a) 27. (a)
14. (a) a1. (a) 28. (0)
15. (a) 22. (b) 20. (b)
16. (a) 23. (b) 30. (0)
Ir) 24, (a) ai. (a)
Set 2
1, (o) Indirect addressing mode — Pointer acoess
Iamnodiato addressing moco ~> Constant access
Auto decrement addressing mode —> Loops
2. (@) An opcodo is tho portion of a machine lan-
‘uage instruction that spocifios the operation to be
performed.
8, (s) Bits im the boginning and at the end of byte are
known as start and stop bits, respectively. Start
and stop bits are used for synchronization purpose,
4. (b) When two 2's complement numbers are added,
‘overflow occurs in the following situations:
‘Both operand are postive and the result is negative,
Both operands are nogative and tho resul is postive.
‘So option (b) f true, overflow doss not occur when
positive and negative numbers are added,
5. (4) Diffarant hagarde are caused duo to various
ddopondancies, Difforont. dopendenci for pipolined
rocasor are as follows:
Structural dependency is due to diforent delays in
pipelined stages.
Control dependency is due to consecutive instruc-
dons are dependent on each other:
Data dependency is due to hardware resources
sharing.
6. (c) (A) IEEEE 488 ~ (Q)
(B) IEEE 796 ~ (S)
(C) IEBEE 696 — (R)
«D) RS282-0 ~ (P)
T. (o) When interrupt is caused, the execution of cur-
rent instruction i stopped. After handling inter-
rupt, the program resumes its execution
8. (a) RAID is random array of independent disks
that combines multiple disk drive components into
4 logical unit, RAID configuration provides fault-
tolerance and high speed
9. (b) Horizontal micro programming has high paral
lelism than vertical, So, the speed order is:
10.
a.
12,
13.
14.
15.
16.
www.engineeringonyourfingertips.ooo
[ANSWERS TO FRACTICE EXERCISES. 101
32. (b) 39. (b)
83. (b) 40. (0)
34. (0) 41.)
35. (0) 42, ()
36. (a) 48. (9)
a7. (2) a4, (0)
a8. (a) 45. (b)
Hardwired contecl > horizontal mieroprogramming
> vertical mlcroprogramming
{€) A processor noods software interrupts to obtain
system services which need execution of privileged
‘instructions.
Pointers are used for accessing indirect addrossoe,
Indirect addressing mode — Pointer access
In case of immediate addressing, the constant rep
recent the effective addres,
Immediate addressing mode —+ Constant access
‘Auto decrement/ineromente are usad for loop
variables,
Auto decrement addresing mode + Loops
(a) Push ‘? mamory operation needs 2 clocks
(b) In absolute addressing mode, the address of the
operand is inside the instruction.
«©
carry Auxilary carry
@i111@11
@)# © 101110 dy
HoD)# +0 11 0:1 0:1 Up
—_ Trot o 0 Oy
AC= land CY -0
(@) Features of horizontal microprograraming are
(0) Te dows not require use of signal decoders
(ii) Te results in largot-sized ‘microinstructions
‘han vertical microprogramming
(i) Te wses 1 bit for each contol signal
(a) The daisy chaining method of establishing
priority consists ofa serial ecnnection ofall devices
‘that request an interrupt. Tho davice with the
highest priority is placed in the first position, fol-
lowed by lower-priotity dovicos up to the dovice
‘with the lowest priority, which is placed last in
‘the chain. The farther the device is from the fist
rorition, the lower is its priority. Therefore, daisy
chain gives non-uniform priority to-vatious devices.
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
102 cHArTER2: COMPUTER ORGANZATION AND ARCHTECTURE
17. (10) To generate 2% control signals, 5 bits sre
required. To generate two control signals, the fol-
lowing scheme will be used,
5 bite to ientify first and 5 bits to Kbntiy second
including tho caso when cne of them isnot present,
So total bite required = 10.
. (b) DMA.1/0 — Disk
Cacho ~ Highspead RAM
Interrupt 1/0 ~ Printor
Condition code register — ALU
- (6) 1/0 retiection impliss conmoction eo. pro-
rams through a pipe.
20. (d) The major characteristics of « RISC procesor are
(0) Relatively fo inctruction
(il) Relatively few addressing modes
(ii) Mote resistors
(iv) Hlardwited rather than microprograromed
control
21. (a) Number of bytas per line = 16 bit» 4 byte =
Shytes
Cacho size = 8 x 4x 1024 = 62 KB
22, (b) Thros-adirce instructions ato os follow:
MUL RI, N, 0
MULR2, P,Q
ADD R3, M, RI
DIV X, R3, R2
23. (0) LOAD P(AC © P)
MPY Q(AC = AC Q)
STORE X (X © AC)
LOADN (AG © N)
MPY 0 (AC & AG x0)
ADD M(AC © AC + M)
DIV X (AC © AC/X)
STORE X (X © AG)
24, (b) The number of bits is
1-128 =1
=> 10 2 = r= loge 10 = 218
25, (44) Spoad up obtained by pipelining
_ (p20 S24 419 34g
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
UNIT Ill: PROGRAMMING AND DATA STRUCTURES
LAST SIX YEARS’ GATE ANALYSIS
= Marks 1
= Marks 2
‘Total number of questions
5
‘
3
i
z
2015 2014 213 2012 2011 2010
Concepts on which questions were asked in the previous six years
Year Goncept
2015 Binary tres, Minitaum-maximum heap, By tree, Functions, Recursion, Kuh Smallast
lement in C, Hashing, Stack, C strings, Postfix equation, Binary search tree,
Merge sort, Pointers, Switch-caso, Storage classes, Quick sort, Array, Complexity
2014 Loops, Inorder traversal, Binary search tree, Tree traversal
2013 Binary search tree, Heap sort, Control statements, Recursion, Queuss, Array and
Pointer
2012 Switch 260, Queue, Tose, Functions, Storage classoe
2011 Array, Heaps, Binary search tree, Recursion
2010 __Binary tree, Pointers, Heaps, Recursion, Link list
www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo
www.engineeringonyourfingertips.ooo