Computer Organization & Architecture

Notes for Gate Computer science

Uploaded by

Rakesh Kumar Pal

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF or read online on Scribd

0% found this document useful (0 votes)

196 views48 pages

Computer Organization & Architecture

Notes for Gate Computer science

Uploaded by

Rakesh Kumar Pal

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF or read online on Scribd

You are on page 1/ 48

CHAPTER 2 www.engineeringonyourfingertips.ooo COMPUTER ORGANIZATION AND ARCHITECTURE ‘Syllabus: Machine instructions and addressing modes, ALU and data path, CPU control design, memory Interface, 1/0 interface (interrupt and DMA mode), instruction pipelining, cache and main memory, secondary storage 2.1 INTRODUCTION 2.2 COMPUTER ARCHITECTURE Computer architecture snd organization is the science of interconnecting hardware components, design- Ing and configuring the hardware/software interface (9 full functional and performance goals of a computer. ‘This chaptor outlines the bssic hardware structure of a ‘medam digital programmable computor, the basic laws for performance evaluation, designing the control and data path hardware for a processor, concept of pipshining for executing machine instructions simultaneously and designing fast memory and storage systems. ‘Computer architecture deals with the structure and behaviour of computer system ss viewed by the user. Tt encompasses instruction formaats, the Instruction set architecture (ISA) and addressing modes Computer organization deats with the operation ‘and interconnection of the various hardware components, 2.2.1 Register Set ‘Tho computer noods rogistos for processing and manipulating data and for holding memory addresses that are available to the machino-code programmer. Some registers for a basic computer are given in Table 2.1 ‘Table 2] Typos of ett nd th fants Rogister Register Function’ Symbol__Name DR Data register Holds memory operand ACC ——Accurmulator Special purpose processor regietor AR Address Holds address for register memory (Continued www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo 58 CAMPTER2 COMPUTER ORGAMZATON AMD ARCHITECTURE ‘Table 24 | Continua Rogistor Register Function Symbol__Name 1 Trstruction Holds an instruction rogiter ‘hat is to be executed PC Program Holi address of countar instruction vo be sececuted next TR Temporary Holds temporary data register if equine INPR Input revister Holds input character OUTR Output register Holds output charactor 2.2.2 Quantitative Principles to Design High-Performance Processor Amdahi's law focused on performance gain after enhanc. ing the system. The performance gain is denoted by Syyaat atid ET stands for execution time, Performance of thesystem with enhancement Performanceof thesystem withoutenhancement Seen = ‘Afier enhancement, the system consists of two portions ‘unenhanced and enhanced portion. ETTyey = ET of the unenhanced portion + ET of enhanced portion ‘To caleulato Tyo the following two factors are nove 1, Practiotyjanee (F): It indicates how much por tion of the old sytem imdetgoe= enhancement 2, Spoed tance (8)T indicates how many times the new portion i Funing faster das the eld portion, 4g Patfermanciger! _ YETyP _ ETP PeeformanceggP ~ YETaF ~ ETargF EtaF So, Eye 2 (On the basis of the above factor, ET yee F = ETya(l =F) + Eye Substitute the value of ET in Eq. (2.1: ETF Eyer = Ela = F)-+ 3 Lot ETy = 1, 1 rae Soret = www.engineeringonyourfingertips.ooo When the system consists of multiple frequent cases, ‘where #is the number of frequent eas Snes =[t-Za)E8]" Problem 2.1: Consider « hypothetical procassor weed im mathematical model simulation, It consists of two functional units, floating point and integer. The flost- Ing point is enhanced then it runs two times faster, but only 10% of the instructions are floating point. What is the spood up? ‘Solution: Hore 2.3 MACHINE INSTRUCTIONS AND. ADDRESSING MODES Machine instruction is an individual machine code. The complete set ofall machine codes recognized by a parti. ular processor makes its Instruction Set. Instructions can to gtonped according to the function they perform. The nhuraber of ways by which arguments for thase machine Instructions can be specified constitutes the addressing modes for a processor, 2.3.1 Machine Instructions. [An instruction is a command to the microprocessor to perform @ given task. Most computer instructions are chasified as follows: 1, Data transfor instructions: Those instructions ‘move data from one place to another in the com- ‘puter without changing the data content. Example: LOAD, MOVE, IN, OUT, PUSH, STORE. 2. Data manipulation instructions: Theeoinstruc- tions perform arithmetic, logical and shift opara- tions on data. Example: ADD, SUB, MUL, DIV, ING, AND, XOR, OR, SHR, SHL, ROR, ROL, 3. Program control instructions: ‘Thes» instruc ‘tons may change the address Value in program cour- ter and cause the normal sequential flow to change ‘On the basis of the number of address folds in an inetruction, they are classified a¢ follows: 1. Three-address instruction: Computer with thresaddress instruction format can use each address field to specify (yo sources and a dastinae ton, which can be either a processor register ot a ‘memory operand. It results in short program but raquires too many bite to specify three addrassns Example ADD R,, A,B (Ry = MA) + MB)2. Two-address instruction: Boch addres field can sponify ether processor register of memory word, Example: MOV Ry (Ry © MA) MULE, R, (RoR tR) 8. Oneaddross instruction: Ie used an impliod sccumulator (AC) register for ll data manipuls- tion. Tho othor operand is im register ot memory. Example: LOAD A (AC & Mal); ADDB (AC@ AC + MB) 4. Zoro-address instruction: A stack organized computer doo not uso an address fio for the Inetruction ADD and MUL. Problem 2.2: (a) ISA of a processor consists of 64 rogisters, 125, instructions and 8 bits for immediate moda. In @ siven program, 20% of the instructions take one input register and have one output register, 30% have two input registers and one output register, 20% have one immediate input, and one output register, and remaining have Gro immediate imput, 1 register input and one output registot Calculate the number of bite requited for each instruction type. Assumo that the ISA roquires that all instructions be @ multiple of 8 bits in length. {b) Compare the memory space required with that of variable longth instruction eo Solution: {a) Since there are 125 instructions so wo mood 7 bits to differentiate them as G4 < 125 < 128. For 64 rogietors, we nosd 6 bits and 8 bits for immediate mode. Por Typo 1, 1 rog in, 1 neg out: 7 + 6 4+ 6 = 19 bits ~ 32 bits For Type 2, 2 reg in, 1 reg out: 7+ 646+ 6= 26 bits ~ 82 bits For Type 3, 1 imm in, 1 reg out: 7+ 6+ 8 = 21 bits ~ 82 bits For Type 4, reg in, 2 mm in, 1 tag out: 7+ 6 + 8+ 84625 bits 48 bits {b) As the largest instruction type requires 48. bit instructions, the fixed-length Instruction format uss 48 bite per instruction, Variable length instruction format uses 0 x $2 +038 x $24 02 x 82+ 0.2 48 = 36 = bits on average, that is, 25% lass space, www.engineeringonyourfingertips.ooo 223, MACH MSTRUCTIONS AND ADDRESSING MODES 5D 1. provide programming flexibility ¢0 users through ‘ue of pointers to memory, counter for loop control, data indexing and program relocation, 2. reduce the sizo of the addressing field of the instruction, ‘Lot us suppose [a] moans contents at location 2 for all the addressing modes. 2.9.2.1 Types of Addressing Modes 41. Implied modo: In this modo, tho operands ate implicitly stated in the instruction. Por example, rogistor reference instructions euch ae CMA (comple. ‘ment accumulator), CLA. (clesr accumulator) and sgenovaddress instructions that we stack organization. 2. Immediate mode: In thie mode, the operand is specified in tho instruction itself, that is, address flold is roplaced by an actual operand. Immediate ‘mode instructions are useful for initializing rogis- ‘ets to a constant valus. For example, used for ini- ‘alizing CPU registers to some constant value such as MOV Ry, #34. Instruction with immediate mode Opsode Data Oparand 3. Rogistor (direct) mode: In this modo, the opor- ands are in CPU registers. An mbit register field can specify any one of 2" registers, Example: ADD R, vill add the contents of an accumulator and contents of R,, that is, ACC = [ACC] + [Rj] Instruction with rogister direct mods Opoode [Register acdrese| CPU rogistor Operand 4. Rogistor (Indirect) mode: In this mode, the instruction format specifies a CPU register which contains an effective addrass of the operand resid. ing in momory. This mode ensuras less numbor of bits to specify a register value than to specify a ‘memory location, Example: ADD @ 2 will add the contents of an accuraulstor with contents of the register R,, that is, AC = [ACC] + [(Rj]) Instruction with rogister indirect mode 2.3.2 Addressing Modes ‘The addressing mode specfise how affective address of an operand is calculated from an instruction. Computers use various addressing mode techniques to: Opeode [Register address Memory. OPU register Pointer to |_[Oparand! operand www.engineeringonyourfingertips.ooo5. Auto-increment or Auto-decrement mode: ‘This is similar to register indirect mode except the registor containing effective address is incremented or decremented after (or before} its value is used to access memory. 6. Direct address modo: In this modo, the offs. tive addrass of an operand is equal to the address part of the instruction. Example ADD A instru fon adds content of memory cell A to accumular tor, that i, ACO = [ACC] + MA) Instruction with diroct address modo Opcode | Memory address 7. Indirect address mode: In this mode, memory address specified by address field contains the address of (pointer to) the operand. Example: ADD @ A will add the contents of the memory call A, that is, ACC = [ACC] + MIM[AI). Instruction with inditect address modo Opeod | Memory address Memory {__,] Pointer to operand 4 ‘Operand ‘8. Rolative address mode: In this mode, che effactive adress of an operand is obtained by adding the cone tent of a program counter to tho address part of tho ‘struction. ‘The addres part of the instruction can be ‘lther positive or negative reprasentad in 2s comple. ‘ment, The reult obtained after adding the content of ‘the program counter to the address fed produces an effective address whose position in memory i relative to the acldrees of the next instruction. 9. Index address mode: In this mode, the effective address of an operand ie obtainad by adding the content of an index ragister to the addrees part of ‘the instruction, The indax rogiste isa speclal CPU register that stores an index value and the address fiold of the instruction stores tho base address of a data array in the memory. The distance between the base address and the ackiness of the operand is ‘th index valuo that ie stoted in the index resistor ‘The index reglster can be incremented to facilitate access to consecutive operands stored in arrays using the same instruction. www.engineeringonyourfingertips.ooo (60 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE 10. Base register addressing mode: In this mode, the effective address of an operand is obtained by ‘adding the content of a base rogister to tho address part of the instruction, This is somewhat: similar to ‘the indexed addressing mode except that the tase rogister stores hace or beginning adress instead of an index register, It is used for program relocation. Problem 2.3: A two-word instruction LOAD is stored at location 800 with its addrass field in the next location, ‘Tho addross field hss value 600 and value stored ‘st 600 le 500 and at 500 is 650. Tho words stored at 900, 901 and 902 are 400, 401 and 402, respoo- tively. A processor register R contains the number 800 and index register has value 100, Evaluate the ffective address and operand if addressing mode of the instruction ie a¢ follows: 1. Direct 4, Immediate 2. Indiroct 1. Register indirect 3, Relative 6. Index Solution: Memory layout is as follows 00 [BaD ao. [oon 500 600 ‘700 Mode Address: Direct 009 500 Indirect 300 650 Relative on 402 Immediate 801 600 Register indirect 800 700 Index 700 #0 Problem 2.4: A relative modo branch type instruc- ton is stored in memory at an address equivalent to decimal 600 and the branch is made to an address ‘equivalent to decimal 400, What is the value of the talative address fleld of the Instruction (in decimal)? Solution: Relative address = 400 ~ 001 = -201 www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo 24 ARITIMETIC LOCK MIT 61 Table 2:2| Arithmetic irc function table Select Tnput to, ‘Output of Binary “Micro-Operation fe Adder ¥ Adder D=AtV4 On ° 0 0 D=A Transfer A ° ° ° D=A41 Increment A ° 1 B D=A+B Add o 1 B D=A+B+1 ‘Add with carry 1 ° B D=AsB Subtract with borrow 1 ° 1 B D=AyBy1 Subiract 1 1 ° 1 D=A-1 Decrement 1 1 1 1 D=A ‘Transfor A 2.4 ARITHMETIC LOGIC UNIT ‘Arithmetic logle unit (ALU) is « combinational circuit that performs all arithmetic and logic operations s0 that the entire register transfer operation from the source reg Isters through the ALU and inio the destination register ccan be performed during one clock pulse period. 24.1 Arithmetic Micro-Operations ‘Th basic arithmetic micro-operations such as addition, subtraction, increment, decrement and shift ate performed fon numeric dats stored in ragistos, The basic component cof arithmetic is parallel binary adder, and by controlling the input to adder, different micro-operations can be realized, Figure 21 depicts a 2-bit arithmetic circuit which includes to full-addor citcuits and two multiplosees for choosing diferent arithmetic micro-operations. There are two 2bit input numbers A snd Band 2-bit output. D. The two inputs from A go directly to X inputs of full aor. ‘The output of multiplaxer goss to input Y of full adder Figure 2.1] A 2: arithimti cuit www.engineeringonyourfingertips.ooo By controlling the output Y of multiplacers with two selection inputs §, and 5, snd Gi, either 0 or 1, we can ‘generate the eight arithraetic mio-opetations (Table2.2) 2.4.2 Logic Micto-Operations Logie micro-operations such as AND, OR, Exclusive OR, fc, consider vach bit of register separately and specify binary oparatios for stings of bts (Table 2.3} Table 23 | ‘Typee of micr-operatons ‘Micro-operation Name. Feo Cleat PAB AND Peak Pea ‘Trataer A FeaaB Per ‘Trans B Feaes Exclusive OR Peave or FeavB nor FeT@B Excusiea NOR reR Complement B FeavB Fea Complement. Fedve FeaaR AND Peas Sot toall 's(62 MPTER> COMPUTER ORGAMZATION AND ARCHITECTURE Logical micro-operstions ro capable of manipulating individual bits of « portion of word stored in CPU registers, Let us consider the data in a register A. In another register, Bis the operand that will bo used to modify the contents of A using Iogic micro-operations, Some of the ‘applications are as follows: 1. Selective sot operation: In this, If bit in Bis set to 1, that same position in A sets to 1, other wise that bit In A retains its previous value. 11004 1.0.1.0 B¢Tosetsome bits in A) TT TO Agt4e ase) 2. Soloctive complement operation: If « bit in Bis set to 1, that same postion in A gots comple: ‘mented from ite orginal Vale, otherwise (remains ‘unchanged, 1100 & 1.0 1 0 B(To complement some bite in A) OTTO Aide sem) 3. Selective clear operation: If bit in Bis st to 4, that sume position in A sets to 0, otherwice it remains unchanged. 1100 & 1.0.1 0 B(Tockar some bits in A) O10 0 Ade A-B) 4. Mask operation: If bitin B i et to 0, that samo position n A sets to 0, ctheraisoitramains unchanged 1100 4 10.1 0 BéTockar some bits in A) F000 AgiiA@ AB) 5. Clear operation: Ifthe bits in the same position in A and B aro the same, they are cleared in A, elso they are sot in A 1100 4 1010 8B OTTO Aide AB) 6. Insert operation: Iti usad to insert specific bit pattern into A ragicter, lawving the other bit post ‘ions unchanged. This is accomplished by co sub- operations: making operation to clear the desired bit position, followed by OR operation to introduce the now bits into the desited positions. Suppose you wanted, {o introduce 10 into the low order two bits of 4: M01 A (Original) and 1110 A (Desired) { A (Original) Mack A (Intermediate) ‘Added bits A (Desired) www.engineeringonyourfingertips.ooo 2.5 CPU CONTROL DESIGN Central processing unit (CPU), or the brain of « com- ‘puter, performs the data processing opetations It consists of three major parts: register set that stores intermedl- ate data during instruction execution, ALU performs the required micte-operations and control unit that super vices all other elements for the tranefer of information from one register to the other. ‘The main function of CPU isto ftch an instruction from the memory and execute it. CPU ts divided Into three types of organizations: 1, Single accumulator organization: In this, one ‘operand ts implied in the sccumulator, a special purpoce register, and the other operand isa register ‘or the memory. Example: ADD Ry (R, < AC + R)), LOAD A (AC © A), STORE (Af 7} < AC). 2, General register organization: In this, the CPU will have several general purpose tesis- tots which lead to shorter and efficient programs because registers ate faster. Example: ADD Ry, Ry (Ry © Ry + Re). Figure 22 shows bus organization for threo registers Rj, Ry and Ry. The output of these registers and one from the external input fs connected to two multiploxere A and B. The two External ‘rat 1 Bend stp Pleure 22 | Goel rege orealaton soloct lines SELA and SELB from multiploxors A and B select one of the input and feed to ALV. ‘OPR specifies one of the possible operation codes that ALU will perform on the data inputs and the ‘output is transferred either to one of the registers using 2 x 4 decoder or to the extemal output say ‘metory. The control word (Fig. 2.8) for the two ‘oparand instruction is as follows: 2bite 2bits bite 4 bite Figure 23] A conta wd www.engineeringonyourfingertips. ooo8. Stack organization: Stack may consist of number of registers ot « part of main memory in which dat items are stored in coneacutive locations that are ssccassed by IFO (last in, frst out) mochsnism. As ‘here sIenited nuabar of registers «part of menncry |e implemented ae stack for storage and retrieval of www.engineeringonyourfingertips.ooo 25 Pu CONTHOL DESK 6B repeated continuously for s complete program and is mown as the feteh—execute cycle (Pig. 24). The fol lowing steps are performed for executing, an instruction intormadiato data. Stack pointer (SP) koops track of the top item of stack. The procos of inserting Toad PC contents anew item onto a stack is known as push accom to MAR ished by fet incrementing stack pointer and thon T Inserting am iter fom the data rele haeeEoe sPesP +1 point to next Hae instruction ‘Tho process of removing an itom from the top of Tana stack is known as pop performed by frst transfr- dodo MAR o Te ting data into DB and then dacrementing SP. Tc MBAR DR © MSP} 1 SPeSP-1 Devoe the instruction Problem 2.5: A cystom has CPU organized in tho form of general reglster organization consisting of 16 Toad any data registers, each storing 82-bi data. Assume the ALY oquited into MDR has 85 operations (s) How many multiplexers are therein A bus and B bus, and what is the sizeof each multiplexer? {b) How many selection inputs are needed for MUX A and MUX B? {¢) How many inputs and outputs aro therein a dasodar? {d) How many inputs and outputs are there in. ALU for data, including input and output cartes? {0} Formulate a control word for the system, ‘Solution: {a) 82 Multiplexers, each of size 16 x 1 {b)4 Inputs each, to eoloct ono of 16 rogistors. {fe} 4 to 16 ~ Line decoder {d)32 + 92-41 = 65 data input lines (e) $2 +1 = 98 data output lines Abits dbits 4 bits SELA | SELB [ SELD Obits OPR ‘Sot PO to value from Jump insot Biscute the instruction Tarvice the Interrupt 2.5.1 Instruction Execution A CPU generally executes one instruction at time sequentially and 9 soquonce of such instructions is known ae a program. The CPU exocutas the instructions that reside in the main memory. In order to exscute fan instruction, the OPU has to fetch the instruction first from che main memory into one of Its registers, Ii then decodes the inctruction, that ls, it decides what the instruction intended to do, etch operands required and finally exeeutes the Instruction, This process is Figure 24 Insretion le 1. Fetching the instruction: The next instruction is fetchad from the memory address that is saved in the program counter, and metaory content fatchedd is stored in instruction register (IR). ‘The program counter then points to the naxt instruction that will be read in the next eycle 2. Decode the instruction: During this cycle, the Serco tc: Te roa Wy She WWW.engineer ingonyourfingertips.ooowww.engineeringonyourfingertips.ooo (64 oarTeR> COMPUTER ORGAMZATION AND ARCHITECTURE 3. Oporand fetch: In caso of a direct or indirect memory instruction, the exacution begins in the next clock cycle. If the instruction has an inditect adds, tho effective address of the operand is read from the main memory, and the required data is fotched from the memory into momory data regis. tors. If the instruction has direct address, nothing is done at this clock eyele 4. Execute the Instruction: The control unit of the CPU paesee the instruction decoded by decoder fs & sequence of control signals to the different functional units of the CPU to execute the tacks required by tho instruction such as reading, valuas from registers or input devices, performing mathe. matical or logic micro-operations by ALU, and writing the result back to a register ot main memory. 2.5.2 CPU Data Path CPU contains data paths that are reeponsible for routing data betwoen the functional units of a computer. The following are the different data path structures available for routing: 4, Single bus structure: In thie architacturo, al CPU registers are connected to the same bus, Data can be ‘ansforred either betwoon CPU registers or botoen CPU register and ALU at a given clock pulse The ‘sped of operation is slow as only one oparand ean bo ‘transferred in one clock eyele and addition operation (Bi, © Pa + R,) occurs in three clock eycles, 2. Two bus structure: All general purpose CPU rogistors aro connected to both buses say bus A and bus Bs but special purpose registers are divided into ‘wo groups, say group 1 connecting bus A to pro gram counter and one imput of ALA and group 2 connecting bus B to MDR (Memory Data Register) ‘and other input of ALU, ‘The two operands are ‘transferred (© ALU in 1 clock cycle and the addition operation (R, @ Ry + Ry) occurs in 2 clock cycles, 3. Three bus structure: The performaance can be farthor be improved by using throo busas such that ‘addition operation (2, = R, + R,) can occur in one clock evele, 2.5.3 Control Unit Design Control unit is considered as brain of a CPU that con trols various units in the data path. The performance of control unit is important as it determines the clock cycle ‘of the processor. Control unit can be designed either by hardwired or by microprogram, 1. Hardwired control: Control unit is made up of sequential and combinatorial circuits to ganer- te the control signals and interpret instructions (Fig. 2.5). The instruction decoder decodes the Instruction loaded in instruction regieter. Thestep www.engineeringonyourfingertips.ooo decoder generates separate control line for each step in the control sequence. The encoder gots its input signal from the decoder, step decoder, ‘external Input and condition codes and generates individual control signals, Tt is faster and more ffclent but lose flauble and is dificult to add new feature or correct mistakes in original design. Glock —o[ Control stop [Reset_ counter a External inputs ra) codes Bnd Figure 2.5| Blok dingra of hrdvied control unit 2. Micro-programmed control: Control signals are generated by using programming known as ‘micto-programs that constitutes micro-instructions (control word) (Fig. 26). Memory that is part Extornal MEET semecee EF nt as a an eo ea <— Clock Control adres ealater Address Read [ Gontrat Contral Control signals signals to ‘within CPU system bus Plgure 2.6] Blok diner of micro programmad contaof OPU is known ae control memory: and stores tmicro-instructions, ‘The micto-program sequencet generates the address of micro-inetraction accord. ing to instruction stored in instruction register The address of micro-instruction to be executed {s available in content addressable ropster. Micro program sequencer issues read command to read rmlero-insteuction ftom control memory into mcr Instruction register which on execution generates control signals for vatious parts of a processor. This control unit design is more flexible to accommodate now features and less errot prone but quite slower ‘han the hardwited unit. ‘The format of the comteol word ie Contra signal Branch] Flas condition ‘Conrad memiey address (On the basi of the type of control word supported, it ie divided into two types: 1, Horizontal micro-programmed control unit: In this design, the control signals are represented in the form of 1 bit per control signal and it supports longer control word, 2. Vertical micro-programmed control unit: In this design, tho control signal is represented by using encoding format, Problem 2.6: Consbler « control unit which has 1024 control word memory; it supports 48 conteol signals and $ flag conditions. What is the size of the control ‘word in bits and control memory in bytes? Solution: (a) Using horizontal programmed control unit Obits bits 48 bits 10 bits Branch | Flag | Control] Control condition signal | memory Size of control word = 61 bits Control memory = (1024 x 61)/8 = 128 x 61 bytes (b) Using vertical programmed control unit Obits Shits 48 bite 10 bite Branch | Flag | Control ] Control condition signal | memory Tog 48 ~ 6 hits Size of control word = 19 bite Conteol memory = (1024 > 19)/8 = 128 x 19 bytes 2.5.4 RISC versus CISC Processors ‘The differences between reduced and complax instruc tion sot computers is given in Table 2.4 www.engineeringonyourfingertips.ooo 26 VOINTERFACE (INTERRUPT AMD OMA MODE) 65 ‘Table 24 | RISC versus CISC RISC (Reduced ‘CISC (Complex Instruction Sot. osu bee Computers) Computers) ‘Less number of registers Supports mote number of ‘Rich register sot Supports leas address modes addressing modes Supports fied length Supports variable length instruction instruction ‘Succestful pipeline with Unsuccessful pipeline ‘ono instruction per cycle Example: ARM, Example: Pentium Motorola procensore 2.6 1/0 INTERFACE (INTERRUPT AND DMA MODE) 1/0 interface bridges the diflerences between CPU and peripheral devices and provides a method for transfer ting Information betwaen internal storage and external 1/0 devices. Thoro are the following thrae modes af 1/0. sransfer: 1, Programmed I/O: The 1/0 device dose not Ihave direct access to memory. It requites execution of several instructions by the CPU and the CPU 1has to wait for the 1/0 davice to be ready for ither reception or tranmlssion of data. 2. Intorrupt initiated 1/O: In this, instead of ‘waiting, the control is transferred from a currently running program to another service program a6 result of an extemal/intomal gonorated request. + Hardware interrupts: These interrupts are present in the hardware pins. + Software interrupts: These ate the instruc. tone wied in the program whinever the required funetionslity is nowded. + Maskable interrupts: These interrupts may bbe enabled or disabled explicitly. ‘+ Non-maskable interrupts: Those interrupts are always thete in the enable state. We cannot disable them by explicit concitions (lags). + Vectored interrupts: These interrupts are associated with th static vector addres. + Non-vectored interrupts: Thove interrupts are sseociated with dynamic vector address, + Extornal interrupts: ‘Those interrupts aro generated by external devices such as 1/0. ‘+ Internal interrupts: These devices ate genot- ‘ated by the internal components of the processor ‘such as temperature sensor, power failure, error instruction, ot: www.engineeringonyourfingertips.ooo(66 GPTER> COMPUTER ORGAMZATION AND ARCHITECTURE + Synchronous interrupts: ‘Theee interrupts fre controlled by the fixed time interval. All the interval interrupts are called as synchronous interrupt. + Asynchronous interrupts: These interrupts ‘re Initiated based on tho foodback of provi cous instructions. Al the external interrupts are called as asychtonous interrupt. 8. Direct memory access (DMA): It is one of several methods for coordinating the data transfers between an 1/0 device and the core processing unit or memory in a computer. It refers to transfor of data directly between a fast storage device and memory bypassing CPU because of its limited spood. DMA. provides a significant improvement im terms of latency and throughput as it allows the 1/0 device to access the memory directly, without using the processor. There are certain advantages of using DMA for data transfor: + DMA saves processor's MIPS as the cote can operate in parallel + DMA saves power because it requires lees cite cuitry than the processor to transfer dato, + DMA has no modulo block sizo restrictions. Direct memory accass (DMA) controller takes over tho control of busos to manage the transfer directly between the 1/0 device and memory. Bus request {BR} and Bus grant (BG) signals are used by the DMA controller to request the CPU to relinguish control of the buses and get the control of system buses (Fig, 27), The DMA oontoller consists of 8 diffrent rogistore: an. addreze register, a control register and a word counter register. To transfor block of data between an 1/0 device and memory, the controller storas intial values in the address register, The DMA channel then trancfers the block of information from oF to memory according to the control register. The starting address of the Write <1WR Bus. lop request <“—] Bur _,f ott BG Interrupt <— Interrupt S| devi Pigure 2] Block digrmof the DMA contol www.engineeringonyourfingertips.ooo block in memory ie given by the address register, ‘and the length of the bytes to transfer is given by ‘the word count register. The controller decrements ‘8 word counter each time it moves-a data byte. ‘There are soversl modes of operation of DMA: + Burst or block transfer mode: In this mode, the entire block of data is transferred once the DMA controller is granted access to the system bus ty the CPU. The bytes of data in the block are transferred before releasing control of the system bbuses back to the CPU. The only disadvantage of this mode is that it renders the CPU inactive for some long periods of ime: + Cyclo stealing mode: In this mode, the DMA controller obtains accees to the eystam busas like burst ‘mode; but after one byto of data transfer, the control of the system bus is releaeed back to the OPU via BG. It is then continually requested again via BR, transferring one byte of data per request, until the ‘entire block of data has boon transferred. This mode Bs suitable for tho systems in which tho CPU cannot ‘be disabled for the considerable length of time as in burst transfor mode such as for controllers moni- toring tho data in real time, Tho advantago is that CPU ie not idled for ae long ae in burst moda, but the data block isnot transferred as quickly. ‘+ ‘Transparent mode: It isthe slowest yet more ef ficient data transfer mode in terms of overall systera performance. The DMA controllr transfers data only ‘when the CPU is busy in performing operations that do not uso the eystam buss. So, the CPU naver stops ‘exeouting its programs but the bigswst disadvantage is complex hardware circuitry that noxds to dater- ‘mine when the CPU is not using the system buses. A DMA toad transfers data from the memory +9 the 1/0 deviea, while DMA write transfers data from an 1/0 device to memory. ‘The functional behaviour of a DMA transfor outlined in Fig. 2.8 + The CPU transmits the following information to DMA controller: (9) beginning addrese in memory which i stored in address register in DMA controller. (b) Number of words to transfer which is stored in ‘word count register in DMA Controller. (6) direction (memory-to-1/0 device or 1/0 device tomemoty), port ID, DMA mode of transfer snd ond of block tranefer either through intorrupt request oF no interrupt request which is stored in control register as command word. + ‘The processor them relinquishes control of address, data and control buses to DMA Controller and returns to other processing activities while the DMA controlar starts the data transfor botweon 1/0 device and memory. www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo 27 mSTRICTON PwELAING 67 Invorrupe Random Access ae DE Memory (RAM) “BR RD WR Address Dsta| RD WR Address Date T Read contol ] * Write conrad Addreee seloct 4 RD WR Address Data Ds DAA acknowledge RS Direct Memory ‘Access (DMA) BR “controller hae Tsonrupt Figure 2.8| DMA controtbr intercommscton with memory, CPU and 1/0 devi, + When the DMA controller accesses memory, it synchronizes this memory request with an idle period of the processor, thus disabling the processor, of requesting « halt of the processor, and awaits an acknowledgement, + After the completion of the block transfer, the DMA Controller either raises an interrupt request if the Interrupts are enabled ot indicates the completion im i status register and the processor recognizes 1/0 completion (either by interrupt signal or by roading the status rogister) and gots ite system Tbusee back and normal procassing starts. ‘The ddavice has to initiate a new data transfer through, DMA request signal which is again acknowledged by CPU through DMA acknowledge signal via DMA controller 2.7 INSTRUCTION PIPELINING In carly computers, cach instruction complotaly finished bofore the execution of the naxt one began. The hardware circuits needed to perform different operations of an instruction eyele are different and most part of these prooassor circuits are idle at a given moment of time. Thase processor circuits wait for the other parts of the processor to complete its part of execution first. Instruction pipalining is a technique for overlapping the execution of several instructions to reduce the overall sxecution timo of a sot of instructions and there is no ‘need to walt of the mest part of the prooasior circuit for the other parts of the processor to complete their part cof execution. Pipeline speed is limited by the slowest Pipeline stage. ‘Throughput of «processor it the rato at which opara- tions get executed. Latency i the amount of time that & single operation takes to execute, In an unpipelined computer, throughput = 1/latency, as each operation exo utes by itself and for pipelined computer, throughput > L/latency, since execution of instruction is overlapped. Consider a segment pipeline with a clock cycle time ,, wed to execute n tasks (Fig. 2.9). An equivalent non- pipelined syetom takes 7, time to compote each task. ‘The spend up of a piped system aver a non pipelined system is given by the following rlaticn: axt, Gen, ‘Theoretically, maximum specd up that a pipelined systom can achiave is given by the following equation: KT 5-2. Pipelining Hazards: ‘Those haeards reduce the ideal spel up gainod by pipelining by preventing the next instruction in the sequence from being executing luting its dosignoted clock pub. Hazards forcas the pipeline to be stallad. There are three types of hazards: www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo (68 APTER> COMPUTER ORGAMZATION AND ARCHITECTURE eich an Decode Fetch an Bxocute |__| write instruction instruction operand {instruction [result ack Ono clock cycle ‘) Write Foichan |! |! Ly) | Jretch an]! |) execute |! |) instruction Laem operand instruction beg Pipeline latch / overbead/delay ‘One clock ‘One clock One clock One clock ‘One clock eyele eyele cycle cycle oyele Pipeline Stages () Poe rjafela}s|elz|s Segment bFatch | hh | ty | ts | Ie Decode Alb | | ete epatand Al ala] y Bxecute Alh|b|% Write back: 4|e]a| 4 © Figure 2.0 | (a) Unpipelined process. (b) Pipelined five-stage processor. (©) Timing digram of fvestage instruction pipsline. 1, Structural hazards: These roeult fom resource conflicts when the hardware cannot support Instructions that nead simultaneous execution in pipoling 2. Data hazards: They arise when an instruction ‘depends on the reeult of « previous instruction and. that result isnot yet calculated. ‘There sre thtoe situations in which data hazards + Read after write (RAW), a true dependency + Write after read (WAR), an anti dependency + Write after write (WAW), an output dependency Consider two instructions inet1 and inst? ‘ccurring, with insti occurring before inet? in the program order, + Road after write (RAW): A read after write (RAW) data hazard is @ situation in which an www.engineeringonyourfingertips.ooo Instruction refers to a result which is yet not been calculated, that is, in this inst2 ties to read a source before insti waites to it. This situation arises if the tead operation by instruc ton takes place before write done by other instruction. For example, insti: RI <-R1 + RZ Anst2: R4 12845 67 8 0 10 11 12 18 1 1B 16 17 Fach Zh ib Loh Decode 1 == =m h = if Operannd Fetch i PhERR | Ha ee F Beecute h- - bh-- - bh kh - 4k www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo TO GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE Problem 2.6: Acsume a simple &-vtage pipeline (IF, ID, E, DP, W) each stage takes « single cycle, Assuring there are no cache misses. How many cycles would the following code tako to axscute if there is no spacial hardware to improve performance in the presence of hazards? MOV adx,ocx+100) MOV eb [orx}104) ADD edx.obx MOV [ocx+108)ecbe MOV saxocx-109] ADD obx.cax ‘Solution: The above code takes 14 cycles to execute, as shown below a23 4 5 6 7 8 8 1 i a2 18 1 oD DF OR W iD DFE W IF oD DF sll BOW FID stall DF stall W 1 sul DP sul stall EW Ir 1D _stall_DF_ stall_stall_stall EW Problem 2.9; In the tslow figure, calculate the total execution time aftr which the rvul f the fourth task eter- ing the pipe abovo roady? —{p D x of MEM we} > oe t= we me om ‘Solution: sos DS o© © © © © © © Intl IF ID EX EX MEM MEM WB Inst ir 1D EX EX MEM MEM WB. Inst rr }D EX EX MEM MEM WB. Inst Fr oD EX EX MEM MEM WB ‘Thorofore, tho total execution time is 65 ns Problem 2.10: What is the mesn overhead of a pipe: Problem 2.12: Calculate the time required to perforra lino with 8 stages and an execution time per stage of 1000 operations in a Gstaged pipeline with an execu- Qns? tion time of $ ns por stage? Sohtion: The mean ovethoad = (Stages — 1) x aati tine 7, =(k-14n)xT = Execution time por stago = (8 — 1) x TK 14 1000} x2 = 8.018 ws Problem 2.11: How many stages has a pipeline that | | Problem 2.13 Calculste the mean overhead ofa pipaline schiever& spond of 9 for 100 operations? with 7 stage and an execution timo pr stage of 2 nel ‘Solution: ‘Solution: Mean overhead of pipeline = nxk _ nx (xT, -,) Speed = 3 99 = nt Oe text = (7-1) x2 = 19 5 www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo 27 msTRICTON PwELAING TH Problem 2.14; Consider a pipeline with 5 stages: IF, ID, EX, M and W. Assume that each stage requires one clock gel, Show how tho following program sogment for adding 2 arrays procaeed and compare the clock cycle ‘needed in non-pipelined system with pipelined system when result of the branch instruction Le, content of is available stor WE stags LOAD Ra #400 Li: LOAD Rt, 0 (RA); LOAD Re, 400 (RA); ADD Ri, RI, RQ STORB RS, 0 (Ra) SUB RG, RA, #4; BNEZ R4, Li; Solution: Number of cycles = [Initial instruction ++ (Number of instructions in the loop Li) x Number of loop yeh] % Number of clock eyclas/instruction (CPT) = [b+ (6) » 400/4) x 5 = 3005 ‘Timing diagram for one loop iteration in a pipelined system is as follows: LOAD RA #400 IF ID EX LOAD Ri, 0 (Ra) © D LOAD Ra, 400 (Ra) IF ADD Rs, R1, R2 STORB RS, 0 (R4) SUB R4R4, #4 BNEZ R4, Lt weg [Number of cycles in the loop = 15 \Nurnber of clock cycles for segment execution on pipelined processor = 14 (Number of elck cycles in the loop LI) x Numer of loop cycles = 1+ 19 x 400/4 = 1901 Number of Clock cycles for the program execution on non-pipelined prooassor ‘Nurnber of Clock eycles for the sogment execution on pipelined prooassor Speedup = 008 = $005 = 9 times Problem 2.15: Consider a S-stage pipeline with stages: For all following questions we aseume tha contains stages: IF (instruction Fetch), IS (sue), FO (Fetch operand), (Execute) and W (Write). xcept B requires one clock cyclo and system has 4 Functional Units for floating point operations, FP load/store, FP addition/sublraction, FP multiplication and FP division, (c) Execution stage for Load Store operations raquies 1 clock cycle, for ADD ot SUB operations raquirs 1 clock evel, for MUL operation requires $ clock cycles and for DIV operation requires 4 clock cycle. All memory references hit in cache. Pipeline has forwarding circuitry forall Us, excopt FP-Lood/Store where operand is tay after Westago, www.engineeringonyourfingertips.ooo72 GaPTeR> COMPUTER ORGAMZATION AND ARCHITECTURE ‘Timing diagram of is presented below: www.engineeringonyourfingertips.ooo 22 8 4 5 6 7 8 9 1 1 2 18 4 1 16 TOAD P2R5) IF IS FOE W LOAD F2, 28(R5) sD FOE Ww MUL FO, F2, Fa 1S sal wll FOE B B OW SUB FS, F6, FS wos FOB W DIV F10, PO, Fé IF 1S tall stall stall tal FOE B EB B OW ADD F6, F8, F2 rows POR Ww STORE FS, 048) ros rok W ‘Kientify the basards in the following instructions from the following list (Structural, Data, Control, RAW, WAR, WAW, None) 1, MULT FO, F2, Pa and STORE FS, 50(R5) 2. DIV F10, FO, F6 and ADD F6, FS, F2 8. MULT Fo, F2, F4 and DIV F10, FO, Fé 4. DIV F10, FO, F6 and ADD F6, ‘Solution: 1. Structural; 2, Data; 9. RAW; 4. WAR. 2.8 MEMORY HIERARCHY “The storage media can be categorized in hierarchy accord ing to their speed and cost (Fig. 210). As we move down the hierarchy, access time incroasce and cost por bit doeteasas Increasing ost and speed Decreasing size Figure 2.10] Memory hierarchy. Memory It isthe central storage unit that directly communicates with the CPU. Tt is designed using semiconductor- Intograted circuits and needs constant power supply to maintain the information, Ts expensive as compsted to uriliary storageso it has limited capacity. Examaple: R/W (read/write) memory or RAM (random accass memory} 2.8.1 Mai and ROM (raad only memory). Integrated RAM chips are available in two modor: 41, Static RAM: It stores the binary information in flip flops and information remains valid until power ‘is supplied. Tt has faster access time and is used in ‘implementing cache memory. 2. Dynamic RAME It store tho binary information sa charge on the capacitor. It requltes refreshing circuitry to maintain the charge on the capacitors ‘after few milliseconds. It contains more memory calls per unit area as compared to SRAM, 2.8.1.1 Memory Interfacing If tho required memory for the computer is larger than the capacity of one chip, it is necessary 10 connect multiple RAM and ROM chips to a CPU through the data and address buses (Fig. 2.11). The low-order addrots bus lines select the word within a chip and other lines select a particular chip through its chip select inputs. Assume a computer system needs 256 bytes of RAM and 512 bytes of ROM. The configuration of RAM chip is 128 x § and ROM chip is 512 x 8, The RAM and ROM chips required are as follows: Number of RAM chips = 256/128 = 2 Number of ROM chips = 612/51 www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo 28 mewony WERARCHY 73 ‘The memory interconnection is depicted in the following diagram: Chip select 1 hip select 2 hip select 2 | |» Data busin Read. Bae Chip select 2088128 | oipt mode only Write ate ROM chip. agree ‘Address bus JAD, @) Figure 2.11 | (6) RAM chip, &) ROM chip Problem 2.16: A computer employs RAM chips of 256 x 8 and ROM chips of 1024 s 16, The computer system nosds 2K bytes of RAM and 4K bytes of ROM and four interface units each with four registers. Draw a memory address ‘map for the system and give the address range in hexadecimal for RAM and ROM chips. Solution: RAM 2018/256 ~ § chips; 2018 — 211; 256 = 28 16% 419 1 10 9 ROM 4096/1024 = 4 chips; 4096 = 212; 1024 = 210 Interface 4 4 = 16 rogers; 16 = 24 _Component Address RAM OOF 9 9 9 ROM SO0HFFF ig yg Inteface 5000.800F Loo oo e oo x ooo 2.8.2 Secondary Memory Secondary memory, also known as susiliary memory ot fexternal metnory, can store « large amount of data at Taaser cost por byte than the main meanory. Thay are ‘non-volatile in nature, that is, dats is not lost when the device is powered off. The most common auxiliary stor ‘age devices wed in consumer eystems are lash memory, ‘optical disks and magnotic disks. 1. Flash memory: Flsch metory is an electronic non-volatile fastest computer storage device that can be elactrically erased and reprogrammed, Example: lash drives and solid state drive. 2. Optical disk: Optical disks are low-cost mass storage devices from which read and write oparae ‘tions are performed using laser technology. Optical disks can store huge amounts of data up vo 6 GB. 6 billion bytes). Difforent types of optical disks are CD-ROM (compact disk read-only), WORM. (orite-once read-many), EO (erasable optical disks) and DVD. 3. Magnetic disk: A magnetic disk is composed of a circular platter made of metal or plastic and www.engineeringonyourfingertips.ooo coated with magnetized material on both sides. ‘Multiple dicks are stacked over one ancther on the spindle with read/write heads on each surface. Bits fare stored as spots on magnetized surface along concentric citelss called tracks. Tracks ate further divided into wedge-shaped sectors 4. Magnetic tapes: It consists of tape made up of plastic covered with magnetic oxide coating. Tapes ‘are mounted on rook, Bits are recorded as magnetic spots on tape along several tracks. R/W heads are ‘mounted in each track so that data can be recorded ‘and read as a sequence of characters. Seven or nine bits are recorded to form a character together with ‘parity bit. Data is recorded in contiguous blocks ‘separated by inter-record gaps, Cache Memory It iz a special memory that compensates the spasd mismatch between processor and main memory access time. It temporarily stores frequently used instructions fand data for faster procassing by the CPU. Cacho hit ratio is calculated to measure Ke performance, If a data74 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE item requested by the CPU is found in cache it is called hit otherwice it is miss. Hit ratio is defined a ratio ‘of number of hits divided by total CPU references to memory. Number of hits Number of hits + Number of misses Average access time = Hit ratio (h) = it ratio T, + (L—Hit ratio, +7,,) where 77 is-cache access time and Ty, is the main memory cease time. 2.8.3.1 Blements of Cache Design ‘The various elements of cache dasign are as follows: 1, Cache size: It should be optimum, small enough to keep average cost per bit close to the main ‘memory and large enough to keop overall average access time cleee to the cache memory. 2. Mapping function: It dascrites the mapping of main memory block to cacho block. Thero are throo different mapping techniques: fully asoclative, direct tapped and set associative cache organization. 8. Replacement algorithm: When new memory block is required in cache, one of the existing blocks must be replaced by a new block. Example: FIFO (frst in, frst out), LRU (least recently used). 4. Write policy: Cache memory follows write. through and writeback updating policies. In ‘writethrough policy, cache controller copies data Immediately to main memory as data is written in cache, The data in main memory is always valid, Dut this approach reduces system performance. In write back, update to memory block is delayest until the updated cache block is replaced by a new block, 2.8.4 Cache Mapping Techniques ‘The cache memory cam store & reasonable number of blocks, but this number is always stall 2s compared 10 blocks in tho main memory to koep average cost por bit low. The corraspenclonco between memory blocks and ccacho block is specified by tho following mapping toch- ‘niques. Consider a cache memery consisting of 2K words ‘with 128 blocks of 16 words each. Number of bite requited to addross a acho block is 11 bits. Main memory has 64K words and bits required to address is 16 (Fig. 2.12). ‘Main memory AK x8 Figure 212| Cache maping example www.engineeringonyourfingertips.ooo 2.8.4.1 Direct Mapping Im this technique, each block from the main memory has only one possible location in the cache memory. In this example, say a block from main memory maps onto « block (é mod 128) of the cache, I there are 2" worde in the ache memory and 2" words in the main memory, then nnvbit main memory address is divided into two figkds: n bits for index flld to access the cache and (mn) bits {or the tag fiskd, Each word in cache consist of the data and the assoriated tag, Whenever a new block is brought Into cache, tag is stored along with data bits, Index fekd is further divided into Bleek and word if there are mul tiple words (say &) in a block. The lower kits select ono of the k words In a block known a8 word field. The block field is used to distinguish a block from other blocks. Tag (mn) bite Index (n bits) ‘Tag (m— n) bits | Block (n — B) bits ‘Whon CPU gonorates memory request, tho block field points t0 a particular block location in the cache. The high-order tag field is compared with tag bits associated with that cacko location. If they match, then tho desired ‘word is in that block of cache. If there is no tastch, then the block containing the required word must be loaded 10 cache first (Fig. 2.13) Word (& bits) Main mamory addres Main memory s[zla Block 0 ‘Tag Block Word Block 1 4 + Coche memory |_| Bleak TF Block (the) 11 Bloat 128: + it + 4} -{ Bina Block 127 7 Paameee i» ‘| Boake aoa Figure 213] Diset mapped eas ganization, ‘The demerit of ditect mapping is that hit ratio drops considerably if two or more words having same index atid diferent tags are accassad consocutivly one after the other: 2.8.4.2 Pully Associative Mapping In this tochniquo, « main memory block can be placed ‘nto any cache block location. Its the most flexible cache orgsnization. The main memory address is divided into www.engineeringonyourfingertips.oootwo flelds: word and tag. The assoctative memory stores both tho addrass (tag) and data of the main memory. Figure 2.14 shows the mapping of different blocks mto cache. High-ordar 12 bits of CPU address is placed in the argument register of the associative memory and come pated 10 tag bits of each block ef the cache to soe if the ddosired block is present. Onos the desirad block i pres. sent, 4bit word is used to extract necessary word from the cache. ‘Min momnory address www.engineeringonyourfingertips.ooo 28 mewomy WERARCHY 75 contain the desired block. The highorder tag fled is then compared associatively to the tags corresponding to the matehad set, [fa match cccurs, the corresponding word is read from cache eke main memory is referred land block containing that word is brought into cache for futuro reference (Fig. 215). Tag Set Word 6 [ela ‘Main memory address Ps fain memory HE neem seofls Fig o| Data} tac 2| Dat wfad Main memory Tae Word rae Cache memory Block 1 Tae 9 TDatel 12 bits (23) Block} Block 4005) Figure 2.14| Associstive mapped cache organisation, Tt te necessary to compare high-order bits of main ‘memory with all tag bits corresponding to each block to find whothor a givon block is present in cacho, s0 itis the most expensive. 2.8.4.9 Set-Associative Mapping As fully associative mapping is an expensive solution and direct mapping does not allow words with same index bat differont tag to exist im cache, set associative mapping if @ combination of both, [tis an improvernent over direct mapping where contention problem is solved by having ewveral choices for block placement. The figure bolow shovws two-way set arsociativo cacho bocause each block of main memory has two choices for block place ‘ment in cache. A block i in the main memory can be im any block belonging to sot ¢ mod $ of cache, where ‘Sis the number of sets. The block 0, 64, 12 con of main memory can map into any of the in sat 0. ‘Tho main memory address is divided into throo fields: low-order bits for word fiekl, set fell to determine the desired block from all possible sete and high-order bits for the tag fild. Each word in cache consists of data and the assoclated tag. Tag ‘Sa [Want ‘When the CPU generates a memory roquast, the sat fiold points to a particular set of tho cacho which might I sot 63 Tas| Date|Tox i BF Ine as Figure 2:15] Sebamiciative mappd cache againtion Problem 2.17: Consider a memory hierarchy system containing a cache, a main memory and a virtual memory. Assuming, cache access time of 5 ns, amd 80% hit ratio . The access time of the main momory is 100 ns, and it has a 99.5% hit rate. The access time of the virtual memory is 10 ms. Galeulate the average access time of the memory hierarchy. Solution: As we know, the hit rate of virtual memory Js 100% the average access time for requests that reach the main memory as (10 ns x 0:09) + (10 ns 0.008) — 50,090.5 na Given this, the average access ‘timo for requests that reach the cache i (5 ns 0.80) + (50,090.5 ns x 0.20) = 10,024 ns. Problem 2.18: A computer uses RAM chips of 1024 x 1 capacity. (a) How many chips are needad to provide a memory capacity of 16K bytes? (b}How many of these lines will be common to all chips? Solution: (a) Chips are needed to provide s memory espacity of 16K bytes = 16 x8 = 128 chips (b) Using 14 address lines (16K = 2!) we have 10 lines specifying the chip address which is common 10 all chips. www.engineeringonyourfingertips.ooo76 APTER> COMPUTER ORGAMZATION AND ARCHITECTURE www.engineeringonyourfingertips.ooo Problem 2.19: Consider a 2-way sot associative cache consisting of 256 blocks of € worde each, and assume ‘hat the main memory is addressable by 16-bi¢ address fand it consists of 4K blocks, Calculate the number of bite in each of the TAG, BLOCK/SET and word fields for diflerent mapping techniques? Solution: For direct mapping, word field is of & Dits to identify 8 different words in a block (2* = 8). ‘As cache mamory conslete of 256 blocks co (2° ~ 286) 8 bits aro roquirad to adios a block bocauso thero is one-to-one correspondence of block k in main memory to block (ke mod 256) in cache memory. The remain ing 5 (16~ 8 ~ 8) high-order addrass bits ato tag bits. ‘Thus, the main memory address for dtect mapping is divided as follows: Block hie Word a bite ‘Ta Shite Por fully associative mapping, number of word Problem 2.20: The access time of a cache memory fe 200 ne and that of main memory te 2000 ns. It is ‘estimated that 70% of the memory requests are for read and remaining 20% for write. ‘The hit ratio for read accesses only Is 0.9, A writethrough procedure is used (a) What is the average accass time of the system considering only memory road cycles? (b) What is the average access time of the system for both read and write requests? (©) What is the hit ratio taking write eyclasalso? to consideration the ‘Solution: (8) Average acoaes time = 0.9 200 + O41 x 2200 = 180 + 220 = 400 ns (b) Average aooms time = 0.8 2000 + 0.7 » 400 = {600 + 280 = 880 ns (0) Hit ratio = 07 x 0.9 = 0.68 bits are same, that is, 3 bits, Cache memory stores both tag and data ‘The highorder tag bits of an address generated by CPU aro compared with tag Dits of each block so number of block bits is zero. AM remaining bits (except word bits) are identified as tag bigs. Thus, the main memory address for fully asso ciative mapping is divided as follows: Tag __| Wed Tete | sbi For two-way set-assoclative mapping, cache ‘motnory ie mapped with two blocks par sat, The word fiold has same number of bits, $ bite. There are 128 (296/2) sots so 7 bits are required to uniquely identify these 128 sets. The remaining 6 (16 — 7 ~ 8) bite are used a¢ tag bits. Thus, the main memory address for set-aceociative mapping ie divided ae follows: Problem 2.21: A 4-way sot-associativo cacho memory tures blocks of four words. The cache can aocommo- date a total of 1024 words from the main memory. ‘The main memory size is 128K x 82, (3) Formulate all pertinent Information required to ‘construct the cache memor (b) What is the sizo of the cache memory? Solution: (a) Main memory is 128K = 2"7 so 17 bits address is generated. For a sot size of 4, the nurnber of sets In cache is 296/4 = 64, s0 6 bits are required. Each block consiets of € wotde so 2 bite are required for the word field ‘Toe [Se [Weed a:_|96_| Wad Shits [The | sone WEL ELS (b) Since cache is 4-way set assocative, 4 blocks per st are stored in cache memety. Tat [Daal [Tae? [Data? [Tes [Daas | Tet |Daaa stis | sebiis [ovis | s2bis [otis [setts [otic | a2 tits Sao i Tag [data | toe | Data [tae | Data [Toe | Data aes ‘Size of cache memory is 128 x 4(9 + 92) = 5248 bits www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo Problem 2.22: Suppoce physical memory is of 2GB ‘and each word is of 16 bits. There is a cache containing 2K words of dats, and each eache block contains 16 words. For each of the direct mapped and 2-way set associative cache configurations, specify how the address would be partitioned. Solution: (9) Por direct mapping, the word field i of 4 bits Identify 16 diferent words in block (2 = 16) ‘As the cacho memory consists of 2K words which is equivalent to 2K/16 = 128 blocks so (2? = 198) 7 bits are required to address a block. Tho remaining 20 (81 — 7 ~ 4) highordor address bits ore tag bits. Thus, the main memory address for dcct mapping i divided a follows: Dobne [Tbe [4 bie Tea Bok] Word IMPORTANT FORMULAS Sowen BaMPLES 77 (b) Since cache is 2-way cot ageociative, 2 blocks per s9t are stored in cache memory. So, the number of sete is 128/2 = 64, bis [Obie [4 bite Ta [Se | Word Problem 2.23: Consider a direct mapped cache of size 82, KB with block size 82 bytes. The CPU generates 82-bit addossos. What are number of bits needed for ‘addrossing block in cache and number of tag bits? ‘Solution: Tos Block [Word 32 —10— Shits | 10 bits | obits ‘The number of bits needed for addressing block in cache and number of tag bits are 10, 17, respectively. + Amdahi's law Sova = [o- DaA+ x4] + ‘The speed up of pipelined system over non-pipelined ayer is given by: aes §=Gin-Dey + Maximum speed up that piplined eystem can achiovo is given by: a s k SOLVED EXAMPLES + Wit ratio (A) Number of hits [Norabar of hits + Number oF misee + Average access time = Hit ratiox 7, +(1~Hit rato)(7: + Th) where 7, is cache arcas time and T,, is main memory Sccess time, + Direct mapping [as (m= oy tit bak fn tte [Word (eit) + Savanchve mapping [fe [Sa [Wena 1. The principle of locality is used in (a) Invorrupe {b) Registers (e) DMA (4) Cache memory Solution: It is used in cache memory to help the program accass small amounts of addrass epace at any instant. Ans. (dj 2. Which memory unit has loweet acoass time? (a) Cache (b) Rovisters {c) Optical dik (4) Main memory Solation: Registers are usa for processing fand manipulating data and for bolding memory address that are available to the machine-cole programmer. So, they havo lowet accos time ‘Ans by www.engineeringonyourfingertips.oooTB APTER> COMPUTER ORGAMZATION AND ARCHITECTURE 8. During DMA transfer, the DMA controller takes over the buses to manage the transfer (a) Directly from CPU to memory {b) Directly from memory to CPU {¢) Directly between the memory snd registers (d) Directly between the 1/0 device and memory Solution: DMA controller manages transfor between 1/0 dovice and memory. Ams. (4) 4, Booth’s algorithm is wed forthe arithmetic oparae tion of, {3} addition {b) subtraction. {) multiplication, (2) division Solution: It is a mukiplication algorithm that rmuliplies two signed binary numbers in 2 com. plement notation, Ans. (c} '5, The reason for improvement in CPU performance during pipelining is {a) reduced memory access time. {b) increased clock speed, {e) introduction of parallelism. (4) increase in cache memory. Solution: Instructiomlovel parallelism is imple mented within a single processor to allow faster CPU throughput. Ans. (9) 6. ‘Use of eacho memory enlancos (0) 1/0 access time {b) memory access time, {c) effective memory sccees time, (d) secondary storage access time. Solution: Cache memory compansatas the sped mismatch between processor and main memory ‘access time, Ans. (e} 77. Am instruction eycle refers to {9) fetching an instruction. (b) executing an instruction. {o) fetching, decoding and executing an instruction, (4) reading and executing an instruction. Solution: It involves fetching, decoding and exe. fcuting the instruction, Ans. (9) 8. A hardware interrupt is also called (9) an internal interrupt. (b) an extornal interrupt. www.engineeringonyourfingertips.ooo (6) @ processor interrupt. (4) « clock interrupt, Solution: “Hardware interrupt is prosent in hard- ‘ware pins. Ans, (0) ). Priority is provided by for acces to memory by various 1/0 channels and procssars (a) a register 4b) counter 10. nu. 12. 13. (c) the processor scheduler (9) acontelie Solution: Controller sets priotity for memory access by various I/O devices and processes. Ans, (d) By applying the principle of temporal locality, processes are likely to reference pages that (a) have bom telerencad rocently. {b) are leated at address noar recently referenced pages in meron: {) have boon praloadad into memory. (d) have to be reloaded into memory. Solution: ‘Tamperal locality refs to rows of rexouroisrelorecad within a sir time Kanne Ans, (a) Which of the following is @ corect statement ated to L2 cache mernory? (a) The level 1 cache is always faster than the level 2eache (b) The level 2 cache is used to mitigate the dynamic slowdow every time level I cach miss occurs (c) Level 2 cache comes as on board only. {d) In modern day computer, the level 2 cache is conshdored an internal cacho. Solution: 12 lavel of cache is placed botwoon the Li and RAM. Tho LI cache is always the fastest. Ans. (a) ‘What is the control unit's function in the CPU? (a) To docodo program instructions (b) To transfor data to primary storage (c) To perform logical operations (€) To store arithmetic operations Solution: Control unit controls savaral units of CPU and helps decade program instructions. Ans. (a) (CPU fotchos tho data and instructions from —__ {a) ROM (b) control unit (o) RAM (4) coprocessore chip www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo ua 15. www.engineeringonyourfingertips.ooo Solution: ‘The OPU fetches data and instructions from RAM and axcoutes it Ans. (e} ‘Which of the following affects the proceeeing power of tha CPU? (a) Data bus, addrassing schemes {b) Clock spoed, addressing schemes {c) Clock spovd, data, bus {d) Clock speed, data bus, addressing schomes Solution: CPU processing speed ie affected by flock eyele, data bus for routing data and address ing modoe. ns. (@) ‘The contents of the flag registor after execution of the following program by 8088 microprocessor willbe Program sUBA MVIB, (0g, DORE aur {8) Gy) (Oy (©) (O1py (4) 5) q Solution: SUBA Subtract comtants of memory Inc. tion whise contents are stored in A MVE, (O1)y Move B with (O1)y DORB Decrement B gives (54), mur Ans. (a) An Nebi¢ cary lokeabwad adder, where Mi a raul Liple of 4, employs TCS 74181 (40-Ie ALU) and ‘alse (Abit cary lookahead generator). ‘The minimum addition time using the best arch. tecture for adder is (a) Proportional to 8 (0) Proportional to be W {¢) Aconstant {d} None of the above Salation: The addition time for N-bit carry locke ahead adder is always a constant Ans. (©) ‘The main memory of « computer has 2 cm Mocks while the cache has 2e blocks. Ifthe cache sss the fet-ascative mapping scheme with two becks pet fet, then Hock Kot the rain mencry mays to the ss {2 (kod m) of the cache {b) (emod ©) ofthe cache {c} (k mode 2c) of the cache {6) (mod 2em) of the cache 18. 19. SoLveD BAMPLES 79 Solutions [Number of sets in cache Number of blocks in cache _ 26 ~ Number of Blocks in one set 2 ‘Number of sets in cache = ¢ So to map tock k, main memory maps to the sot (mod o of the cache. Ans. (b) Which of the following is/ate advantage of virtual memory? (a) Faster memory to memory on an average (b) Processes can be given protected address spaces (6) Linker can assign addresses indopondent of whore the program will be loaded in physical memory (4) Programs larger than the physical memory size can be run Solution: Virtual memory allows. programs to hhave @ larger epace than the physical memory elze ‘to rum. So, the programmer doas not have to worry bout physical memory size, Ans. (d) ‘The muraber of full and half adders required to add 16-bit numbers is {o) 8 half adders, 8 full adders {b)1 half adder, 16 fll ads {c) 16 half adders, 0 full addor (4 half adders, 12 ull adders Solution: To add bit numbers Number of half adders required = 1 [Number of full adders required = N= 1 ‘Thorefora, to add 16-bit number only, 1 half addor and 16 full adders aro require. Ans. (b) |. Which of the following requires « device driver (a) Regicter (b) Cache (6) Main memory (a) Disk80 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE Solution: Disk is the 1/0 device attached exter. nally to the processor, Therefore, disk requires device driver. ‘Ans. (d) 21. More than one word are put in one cache block to {@) exploit the temporal locality of roference in a program (b) exploit the spatial Locality of reference in a program, {) reduce the miss penalty {d) none of the above, Solution: "There ane wo types of fealty of teeters temporal and spatial locality. ‘The concept of spatial locality, instead of fetching {just one item from the main memory to the cache, 1s usoful to fetch several ems that reside at ada comt sddrase as well ‘So, option (b) is correct. ns. (b) 22, Which of the following statements is fabs? (4) Virtual memory implements the tandation of program's addross space into physical memory addres space. (b) Virtual memcey allows each program to exces ‘he sie of the primary memory” (6) Virtual memory increases the degree of ‘multiprogramming (4) Viruial memory raduoes the context-switching overhead Solution: Virtual memory increases the context. switching overhead. Ans. (4) GATE PREVIOUS YEARS’ QUESTIONS www.engineeringonyourfingertips.ooo 28. Advantage of synchronous sequential citeuits over ‘seynchronous ones ie (a) faster operation. (b) ease of avoiding problems due to hazards. (6) lower harewaro requirement. (4) better noise iramunity. Solution: Because of lees dolay, synchronous sequential ctoults have Faster operation than syne chronous onee. Ans. (a) 24. ‘The total siz of addness space im a virtual memory system Is limited by (9) the length of MAR (b) the available socondary storage (6) the available main memory. (4) all of the above, Solution: Virtual memory depends only on the ‘available sito of the socondary raemory. ‘Ans. (b) 25, Comparing the me 7, taken fr single nstruce jon on a pipelined CPU with time 7, taken om a ‘not-pipelined but identical CPU, we can say that @) 4st (b) 12% 1 COMPUTER ORGAMZATION AND ARCHITECTURE 18. A dovice with data transfer rato 10 KB/s is con nected to 4 CPU, Data is transferred. byto-wise, Let the interrupt overhead be 4. The byte transfor time between the device interfaces register and CPU or memory is negligible What is the mink ‘mum performance gain of operating the device ‘undar interrupt modo over operating #¢ undor pro rammcontrolled mode? (0) 15 (3 (6) 35 (aye (GATE 2005; 2 Marks) Data transfor rato = 10 KB/s 4x10 Solution: Interrupt overhead 10 KB is sent = 18 1 B is sent = 1/10K Minimum performance gxin 00 - 1077s 100 x 10/4 x 107 ‘Ans. (b) Common Data Questions 19 and 20. Consider the following data path of a CPU. The ALU, the bus ‘and all the registers in the data path are of identi cal siza All operations including ineremantation of the PC and the GPRs are to be carried out in the ALU. Two clock cysles are needed for memory read operation ~ the first one for loading address in the MAR and the next one for loading data from the memory bus into the MDR. MAR = MDR ‘GPRS 19. The instruction “add Ry, Ry" has the register ‘transfer interpretation Ry e= Ry + Ry. The mini ‘mum number of clsek eyelas netded for execution ceyeloof this instruction is (a2 wa ibys (ays (GATE 2005: 2 Marks) Solution: There will be three eyelee—(1) jag Sq (2) Raoys Tin 04 (3) Sys Tras ALT ap Ba “Ans. tb) 20. The instruction “call Ry sub" isa two-word instruct tion. Assuming that PC is incremented during tho a1. 22. fetch cycle of the first word of the instruction, ite rogister (ranefer interpretation is R,& PCH; PC & MPC); ‘The minimum number of CPU clock eyelae neoded during the execution cycle of this instruction is @2 3s 4 (as (GATE 2005: 2 Marks) Solution: The minimum number of CPU clock cyelas neoded during tho arecution cycle = 4. This is bocanso 1 cycle is required to tranefer already incremented valuo of PC ‘eycles for getting data in MDR 1 to load value of MDR in PC Ans. (6) Consider a disk drive tho following ‘specifications: 16 surfaces, 512 tracks/surface, S12 sactors/track, 1 [KB/seetor, rotation speed S000 rpm. The dik is open- ated In eycle stealing mode whereby whenever one 4 byte word is raady itis sent to memcry; similarly, for ‘writing, the dick interface reade a 4 byte word froma the memory in each DMA cycle. Memory cycle time fs 40 nsoc. The maximum percentage of tim that the CPU gets blockeal duting DMA operation i: with (a) 10 (b) 25 (49 (a) 50 (GATE 2005: 2 Marks) Solution: Data transfer in one rotation = 512 x 1024 Bytes 1 rotation takes = 2-5 3000 © ‘19KB is transferred in = 3055 & 1 byto will be transferred 4 bytes wall be transferred = 60x x 512% 1024 ~ 152.58 ne 40 152.28 id © spp 01201024 Block % Ans. (b) A.CPU has 24-bit instructions. A program starts at ‘address 300 (in decimal). Which one ofthe following {sa legal program counter (all Values in decimal)? (a) 400 (b) 800 () 600 (a) 700 (GATE 2006: 1 Mark) www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo 24. www.engineeringonyourfingertips.ooo Solutions Size of instruction = 24 bite; Start address ~ 200. Legal adress will be multiple of three, that is, 300, Ans. (c} = A CPU has a cache with block size 64 bytes ‘The main mamory hae & banks, each bank being fe byte wide Consseutive e byte chunks are ‘mapped on consecutive banks with wrap around All the & banks can be accessed in parallel, but two accesses to the same bank must be serialize. A cache block access may involve multiple iter tions of parallel bank accesses depending on the amount of data obtained by accessing all the k banks in parallel. Esch iteration requires decoding the bank numbers to be accessad in parallel and ‘this takes A/2ns, The latency of one bank access Is 80 ns. Ie = 2 and k= 24, the latency of retriev- ing a cache Hock starting at address zero from the ‘main memory is (a) 928 {b) 104 ns (0) 1726 (a) 184 ns (GATE 2006: 2 Marks) Solutions ‘Time for one parallel process = k/2 + latency ‘Time for one byte = 24/2 + 80 = 92 ‘Total time for c bytes = 2 x 92 = 184 ns Ans. (d) A OPU has a fivestage pipeline and runs at 1 Gllz frequency. Instruction fetch happens in the first stage of the pipeline. A conditional branch ‘ngtruction computes the target addrass and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instruc. ons following a conditional branch until the branch outcome ie known. A program executes 10° instructions out of which 20% are conditional branches. If each instruction takes one cycle to complete on average, the total execution time of the program is (a) 105 ib) L2s (14s id) 6s (GATE 2000: 2 Marks) Solution: ‘Total execution time ofthe program = 10° +020 x2x 10? = 148 Ans. (3) Consider « now instruction named branch-on-bit- set (mnemonic bbs). The instruction *bbs reg, pos, label” jumps to label if bit in position pos of reg ‘ster operand reg is one, A register is 92 bits wide 27. ‘CATE PREVIOUS VEARS QUESTIONS — 85 and the bite are numbered 0 to 81, bit in pection O being the least significant. Consider the following ‘emulation of this instruction on processor that does not have bls impletmented, temp @ reg & mack Branch to label if temp is non-zero, ‘Thovariabletempisatemporary register. For correct emulation, the variable mask must be generated by (a) mack 0x1 « pos (b) mack <0 S97f74EF > pos (o) mack € pos (4) mask 0% 5 (GATE 2006: 2 Marks) Solution: As thereis only onebit with pos, the other bits need to be set to 0 i temp. The mask register ‘must have 1 in pos position, for which pos number ‘of left shifts over 1 need to be made. Ans. (a) Common Data Questions 26 and 27: Consider vwo cache organizations. The fist one is $2 KB, 2-way sot associative with 32-byte block size. The second ‘one is of the same size but direct mapped. The size of an addres is $2 bite in both casas. A 2-to-1 mule tipleser has 6 Istency of 0.6 ne while a Kbit com- pparator has a latency of k/10 ns. The hit latency of ‘the sot-associative organization is Jy while that of ‘tho dinect mopped one is = Tho waluo off is (a) 24 ns (b) 23.ns (9) 188 (17s (GATE 2006: 2 Marks) Solution: ‘Address bits = 32 Block size = 82 B = 5 bits Size of cache = 82KB/32 = 1KB = 10 bits For 2way sotassoclative memory — inclax bits ~ 9, tag bits = 18 B/10 = 15/10 = 1.8 + latency = 18 +06 = 24 ‘Ans. (a) The value of hy le (a) 24 ne (b) 23 ns (6) 18 ns (4) 17 ns (GATE 2006: 2 Marks) Solution: Indirect memory accoos: ‘Tag bite ~ 17; Index ~ 10 bits; word = 5 bits A/10 = 17/10 = 1.7 + latency = 1.7 +06 =28 Ans. (b)www.engineeringonyourfingertips.ooo www.engineeringonyourfingertips.ooo (OAPTER2 COMPUTER ORGAMZATION AND ARCHITECTURE Common Data Questions 28 ond 2% A CPU has 8 22. KB direct-mapped cache with 12&-byte block size Suppose A is a two-dimensional array of size 512 x 512 with elmnents that occupy 8 bytes each. Consider the following two C code segments, Py and Py for (ie0; ic8127 i+) ( for (=O J<5127 344) x e=AC O17 ’ ) Pat for (10; 1<5121 144) ( for (m0 3 COMPUTER ORGAMZATION AND ARCHITECTURE 97. How many dats cache misees will occur in total? (4 (BD (Ba) 0 (GATE 2007: 2 Marks) Solution: Main memory ~ 2!° B Block size = 64 B [Number of blocks = 82 [Number of elements = 50 x 50 = 2500 ‘Starting from Ioeation 1100 means from 68th block ‘Number of blocks = 2500/64 = 40 blocks Initially cache is empty for $2 misses, then § are remaining fom total 40 for one acoass Array is traversed twice, 60 data cache misses = 10+ 848— 56 Ans. (c} 38. Which of the following lines of the data cache will be replaced by now blocks in accessing the array for the second time? 9) line 4 ¢0 lime 11 {6) line 0 to line 7 Solution: Applying k mod ¢ to find the location: 68 mod adn {b) line 4 to tine 12, {4} line 0 to line s Ans. (3) 30, Which of the following e/ate true for the auto- incremant adérossing mods? tis wef in creating selteocating code, TL I ic mcladad in an Testruction Sot Architece ture, then an additional ALU is required for sifectve addres caloulation ML The amount of increment depends on the siz of the data item accosead (a) only {e) only (b) Monty (4) ML and IIL only (GATE 2008: 2 Marks) Solution: Only statoment (IM) s true. Ans. (c} 40. Which of the following rust be true for the RFE. (Return from Exception) instruction on @ general Purpose processor? 1. It must be a trap instruction. IL Ik must be » priviloged instruction. INL An exception cannot be allowed to occur during ‘exeeution of sn RFE instruction, {a) Tonly (b) Tonly {o) Land Tl only (4) 1, Hand TH only (GATE 2008; 2 Marks) Solution: RFE (Return from Exception) 1s a privileged trap instruction that is executed when an 42. www.engineeringonyourfingertips.ooo ‘exception occurs, 60 an exception is not allowed t0 ‘execute, Option (d) is the correct option, Ans. () Por a magnetic disk with concentric circular tracks, tho seck latency is not linearly proportional to the seek distance due to {9} nonuniform distribution of requests (b) arm starting and stopping inertia (6) hicher capacity of tracks on the petiphery of the platter (4) use of unfair arm scheduling polisies (GATE 2008: 2 Marks) Solution: ‘Tracks on magnetic disks are concentric and seek Iniency from one sector to other which ‘may ot may not be in different tracks. ‘This sock distance is not proportional to latency since the ‘racks at periphery have higher diameter, and hence higher capacity to store data, Ans. (0) Which of the following are NOT true in pipelined processor? 1 Bypassing con handle all RAW hazards. TL Register renaming can eliminate all resister carried WAR hazards NL Control hazard penalties can be eliminated by dynamic branch prediction, (@) Land IL only (b) Lond MI only (c) Hand Mf only (@)1 Wand (GATE 2008: 2 Marks) Solution: All the statements are true. Ans. (d) - Por inclusion to hold between two cache Iovele L1 and 12 in a multi-level cache hierarchy, which of the following ate neceseary? Lt must be a-write-through cache TL L2 must be a writethrough cache IIL The associativity of L2 must be greater than that of Lt TV. The 12 cache must be at least ae lange ae the Li cache (a) IV only () 1, Mand 1V only (b) Land IV only: (4) 1,1 MM and 1V (GATE 2008: 2 Marks) Solution: LA and 12 cache are placed between ‘CPU and they can be both write through cache ‘ut not necessarily. Associativity doos not matt. 12 cache must be at least as large as Lt cacho, since all the words in L1 ate also in L2. Ans. (8) www.engineeringonyourfingertips.ooo44. The ure of multiple register windows with over lap causes a reduction in the number of memory accesses for 1 Function locals and parameters IL Register soves and restores TIL nctruction fetches (a) only (b) Monly (6) Monly (4) 1, Wand mH (GATE 2008: 2 Marks) Solution: Multiple register windows with over lp causet « roduction in the umber of memory scoesses for register savas and restorer Ans. (b) 45. Consider a machine with a 2-way setassocative data cacho of size 64 KB and block size 16 bytes. ‘The cache is managed using 92 bit virtual addresses and the page size is 4 KB. A program to be run on this machine begins as follows: double ARR (2024) (10281; ine 4 37 /* Initlalize array ARR to 0.0 +/ For(la07 L<1024; 444) for(3-0; }<1024; 3+) BRR (1) [31 ~0.07 ‘The sleof double 8 bytes. Aray ARR is located in memory starting at tho boginning of vitual page (OxF FOOD and stored in row major order. Tho cache is insally empty and no prefetching is done. The only data memory references made by the program ste those to array ARR ‘The total sie ofthe tags in the cache direstory Is {a) 32 Kbits: (b) 34 Kbits: {et Kbits {d) 68 Kbit (GATE 2008: 2 Marks) Settions Virtual address = 82 bits 2evay cache size = 64 KB {sot will contain ~ 32 KB entries ~ 15 bite Block size = 18 bytes = 4 bits Tag tis [Sabie | Word bie 17 TH 7 ‘Tag size = 172% 1024 = 84 bits ‘Ans. (b) 46. Consider the data given in the above question. Which of the following array elements has the same cache index ae ARR|O}0} (a) ARRIO|s) (b) ARR) {6} ARR; (a) ARR(S}}] (GATE 2008; 2 Marks) ar www.engineeringonyourfingertips.ooo ‘CATE PREVIOUS EARS QUESTIONS — 89 Solution: Total slements can come in one clot 2048. ‘After 2048 elements, same cache index will be on {2/0} and (4}[0}. Ans. (b) ‘The cache hit ratio for this initialization loop i (a) 0% —(b) %_— (e).80% (a) TH (GATE 2008: 2 Marks) Solution: Cache hit ratio found out se fellow: yom ane 3 As we cameo in te abow, hate wil be 05 its. Ans. (c) Linked Answer Questions 48 and 4%: Delayed ranching cam lp tn the handling of control hazards = Por all delayed conditional branch instructions, Irrexpoctive of whother the condition evaluates (0 true or fabs: (a) Tho instruction following the conditional branch instruction in memory is executed. (b) The first instruction in the fall through path is executed. (©) The frst instruction in the taken path is executed, (4) The branch takes longer to execute than any other instruction, (GATE 2008: 2 Marks) Solution: Tho first instruction following the branch instruction is always executed (irrespective cof whother the branch is taken or not) Ans. (b) ). The following code is to run on a pipelined prosas sor with one branch delay slot: Jp ADD Ry © Ry + Ry 4, SUB Ry « Ry~ Ry Ty ADD Ry © Re + Ry Ig STORE Memory [RJ © R, BRANCH to Label 2, ——0 Which of tho instructions Jy fy Tea legit rnatly cccupy tha delay ts oithout any othor togram modification? @h bb Oh ih (GATE 2008: 2 Marks) Solution: Instruction J, contains delayed lot. Ly thas data dependency in I Ans. (b) www.engineeringonyourfingertips.ooo90 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE 50. How many 92K x 1 RAM chips aro needed to pro- ‘vide a memory capacity of 256 K bytes? (a8 ib) 2 {o) 64 (a) 128 Solution: [Number of chips required ‘Ans, (e) 51. A CPU generally handles an interrupt by executing an interrupt service routine (@) as soon as an interrupt i raised. {(b) by checking the interrupt register at the end of the fotch cycle. (©) by checking the interrupt register after finishing the execution of the current instruction, (a) by checking the interrupt register at fixed time interval Solution: Interrupts are handled by checking the Interrupt raglter after finishing the execution of current instruction, Ans. (¢) 52. Consider a 4-staxe pipeline processor. The number of cycles needed by the four instructions J, f,, 1, Ty im stages 5}, Sy, Sy, 5, is shown below: a | wo] fo] abe www.engineeringonyourfingertips.ooo 5B. Consider © 4-way setassccistive cache (initially empty) with total 16 cache blocks. ‘The main ‘memory consists of 256 blocks and the requast for memory blocks isin the following order: 0, 255, 1, 4,3, 8, 188, 199, 216, 129, 68, 8, 48, 92, 73, 92, 155 ‘Which one ofthe following metnory block will NOT. be in cache if LRU replacement policy is wed? @s 8 (rR ~26 (GATE 2000: 2 Marks) Solution: ‘To decide the location (addres), mod 4 is applied. Sao [0482658 DED Sect [1 155, 129, 7 [Seca [rs [See 3 [255-5 160, OF 216 will not be there in cache. Ans. (d) Common Data Questions 54 azad 85: A hard disk has 68 sectors per track, 10 platters each with 2 record ‘ing surfaces and 1000 cylinders. The address of sector is given as triple where cis the cylinder number, h is the surface number and s is the sector number. Thus, the Oth sector is addressed 5 <0, 0,.0> the Ist sector as <0, 0, I>, and so on, (GATE 2000: 2 Marks) 4. The address <100, 16, 20> corresponds to sector muruber {9} 506035, (b) 505086 (6) 505087, (4) 505008 Solution: ‘Total surfaces = 102 = 20 Adrants: 400 590 x 63 + 16 x 63 + 99 — 808037 ‘ans. (0) What isthe umber ofcyles needed ‘oexecutethe 5, ‘The address of 1089h acto following loop? {a} @, 15, 31> (b) <0, 16, 30> for (i=l to 2) (hi Tat Iai Lei} {c) <0, 16, 31> (d) <0, 17, 81> 7 Sebtione {a} 16 (b) 23) 28 (d} 30 Address of 1039th sector = 16 x 31 + 31 = 1039 (GATE 2000: 2 Marks) rey Sottion: Clock 102 3 4 5 6 7 9 0 Mo Mo 1 55 SH L SEE EEE HY iB 5 & % § 8 4 4 % 8% % % s For two iterations = 15 2 — 80 clock eyelas ‘Ans, (d) www.engineeringonyourfingertips.ooo56. A main memory unit with s capacity of 4 mega ibys bull wing IM [bit DRAM chips. Bach DRAM chip has 1 K rows of cols with 1K ell in wach row. The tim taken for «single teftesh eration i 10 ns. The time required to perform fone rafash operation onal cho callin cho memory smi {s) 100 nanoseconds {0) 100 2" nanosoconds {) 100 2 ranossconde {d) 3200 x 2° nanoseconds (GATE 2010: 1 Mark) Solution: Main memory = 4 MB Number of DRAM chips = 4 MB/ 1M x 1 bit = 82 ‘Total cll = 92 1K «1K ‘Time taken to relocate ells ~ 82 x 1K www.engineeringonyourfingertips.ooo ‘CATE PREVIOUS VEARS QUESTIONS 91 Hero wo have to find X in terms of Y. So, 1K» 100 ne Honea, wo suma up as Ans. (d) X = MAX(Y, a + ¥/2) 5. The weicht of a sequence ay 4/2,» Oy of real Ans. (b} number is defined a8 ay + 4,/2-+ + 0/24 ‘A eubsequance ofa sequence ls obtained By delet. ing some elements from the sxquencs, keeping the order of the remaining elernonts tho same. Lat X dlencte the maximum possible weight of a subss- quence of gy oy sd and Y tho maximum por sible weight of a subsequence of oy. 0 «Oe ‘Then X is equal to {) max(Y, a+ ¥) {e) max(¥, a, + 2¥) (0) max(¥, a + ¥/2) (d) ag + ¥/2 (GATE 2010: 2 Marks) Solution: Tho concepts invelve the Dynami Programming in Algorithms. Given that X= max weight from the sequence (a), 0), 03, agatha gD aed = Ot Bt Bt ES Y= max weight from the sequence (4), <4) BB. A S.stage pipelined processor has instruction fetch (UF), mstruction decode (ID), operand fetch (OF), perform operation (PO) and write oporand (WO) Sages. Tho IF, ID, OF and WO stages take 1 clock cyclo each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, $clek cyels for MUL instruction and 6 clock cycles for DIV instruction, raspoctively. Operand forwaeding is ued in tho pipline, What is the rumbr of clock cycles nea to execute the following sequence of instructions? Instruct of Instruction Tg: MUL Ry, Ro Ry Ry Ry x Ry fy DIV Ry, Ry Ry Re Ry/Ry TE ADD Ry, Ry Ry Pye Py + Py Iz SUB Rey Roy Re Re © Ry~ Re 18 O18) Oar @is (GATE 2010: 2 Marks) Solution: 72 3 4 5 6 TF oo mu @ i m1 h © 1 OF PO PO PO Wo h r WD OF _ _ PO PO PO PO PO PO WO % r wD op | | LL PO WO, 5 Ir 1D or Po wo ‘With operand forwarding, 16 clock eycles are neaded, ‘Ans. (b) www.engineeringonyourfingertips.ooo92 GAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE 50. ‘The program below uses six temporary variables 0, ede ani b= 10 e= 20 daatd erecta frcte becte en=btt a=ste return d+ f Assuming that all operations take their operands from registers, what isthe minimum numberof rgis. ters newled to execute this program without spilling? (2 ws oa (ae (GATE 2010: 2 Marks) Solution: Lt us take three Registers R1, RD and R3 All the operations will be completed using thro registers only. ‘Ans, (b) Common Data Questions 60 and 61: A computer system has an LI eache, an 12 cacha, and a main ‘memory unit connected as shown below. The block siza in L1 cache is 4 words. The block siza in L2 cache is 16 words, The memory access times are 2 nanoseconds, 20 nanossconde and 200 nanoseconds for L1 cache, L2 cache and tain memory unit, respectively a LDsbus | Deabee fe |] 2 eal Main Cache Cache emery 4 words 4 words EMP) 60. When there is # miss in LA cache and a hit in L2 cache, a block Is transferred from L2 cache to Li cache, What is the time taken for this transfer? {b} 20 nanoseconds (4) 88 nanoseconds (GATE 2010: 2 Marks) {(@) 2 nanoseconds {c) 22 nanceoconds on. 62. www.engineeringonyourfingertips.ooo Solution: Accass to Lt cache = 2 ne ‘Acoaee to 12 cache = 20 ne Block sizeof L2 is 16 words; data bus size is 4 word. So, time taken for this transfer = 4 x 22 = 88 ns. Ans. (d) ‘When there is « missin both L1 cache and 12 cach, first « block is tranaferted from main memory (0 12 cache, and then a block is transferred from L2 cache to L1 cache, What isthe total time taken for those transfors? (a) 222 nanoseconds (6) 962 nanoseconds (b) 888 nanoseconds (4) 968 nanoseconds (GATE 2010: 2 Marks) Solution: Block transfer from main memory to L2 cache = 2 +4 x (20 + 200) = 2 + 880 = 882 ns 12 to Li = 882 + 20 +66 = 908 ns Ans. (d) A computer handlas several interrupt sources of ‘hich the following are relevant for this question, ‘+ Interrupt from CPU temperature sensor (raises interrupt if CPU temperature is too high) + Interrupt from Mouse (raises interrupt if the mouse is moved or a button is prassed) ‘+ Interrupt from Keyboard (raises interrupt when a kay is pressad or relaasea) + Tnterrupt from Hard Dick (tases interrupt when «disk raad is comploted) ‘Which one of these wll b handed at the HIGHEST priority? (a) Interrupe from Hard Disk (b) Interrupt from Mouse (c) Interrupt from Keyboard (4) Interrupt from CPU temperature sensor (GATE 2011: 1 Mark) Solution: Interrupt from CPU temperature sensor ‘wll be handled at the highest prictty. Ans, (d) (Consider hypothetical procseor with an instruction of type LW 2,20 (2), which during execution reads 4 22-bit word from memory and stores it in a 22-bit ‘oxiter Ry. The effective addrees ofthe memory loca tion i obtained by the addition of «constant 20 and the contents of register Ry. Which of the following best reflects the addressing mode inoplemented! by this instruction for the operand in memory? (a) Immediate addressing {b) Regictor addressing (c) Ragisterinditect scaled addressing () Base indexes addressing (GATE 2011: 1 Mark) Solution: Bifective address ~ contents of register R, +20 ans. (d) www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo oA. www.engineeringonyourfingertips.ooo ‘On s non-pipelined sequential processor, a program, segment, which is & part of the interrupt service routine, is given to transfer 500 bytes from an 1/0 ddovice to memory. Initialize the addrass rezistor Initialize the count to 500 LOOP: Load a byte from device Store in memory at addrass given by address register Increment the address rogistor ‘Decrement the count If count!= 0 go 9 LOOP Assume that each statement in this program is equivalent (0 machine instruction which takes ono clock eycle to execute if it is a non-load/store Inctruction, The load:store instructions take two clock eyeles to execute. ‘The dosigner of tho eystom also has an altemato approach of wsing the DMA controller to implement the same tanefer. The DMA controller requires 20 clock cycks for initialization and other overheads. Bach DMA transfor cycle takes two clock cycles to transfor one byte of data from the device to the memory. What is the approximate spoed up when the DMA comtroller based design is used in place of the interrupt-driven program-based input output? (apa iby 4a (51 (ay 67 (GATE 2011: 2 Marks) Solution: By using load store instructions: 14 1 +800 (24241-4141) = 2+ 800% 7 = 8502 ‘By using DMA: 20 + 500 x2 = 1020 ‘Spoad up = 8502/1020 = 2.4 Ans. (a) ‘An 8 KB direct-mappod writeback cacho is org nized as multiple blocks, each of size 22 bytes. The processor generates 2-bit addresses. The cache controller maintains the tag information for each cathe block comprising of the fellowing: 1 Valid bit 1 Moalified bit ‘As many bits as the minimum nooded to identify the memory block mapped in the cache What is the total size of the memory nesded at the cache controller to storemetadata (tags) forthe cach’) (a) 4864 bits b) 6144 bits {c) 6656 bits (a) 5876 bits (GATE 2011: 2 Marks) Solution: Number of blocks = 8K/32 = 256 blocks Bleck bite | Word bits 5 a ‘Tag bite tm ‘CATE PREVIOUS VEARS QUESTIONS — 93 Memory size for tag bite = 19 + 2 = 21 ‘Total size of memory for tags = 21256 ~ 8976 bits ‘Ans. (d) 66, Consider an instruction pipeline with four stages (SI, $2, 83 and S4) each with combinational cicuit only. The pipeline registers are given in the fizure, seo] {2} sacs] |2] ac} |S] fs s ‘S . What is the cylinder number ofthe last soctor of tho fils if is storad in a contiguous manner? (GATE 2018: 2 Marks) (2) 1281 (b) 1282 (<) 1288) 1284 Solution: Number of soctors required to store the file = eee = 85504 sectors Number of sectors in a cylinder = 16 x 64 = 1024 ‘otal numberof elinders equted ~ SSM Last sector wil be stored on 128ith cylinder ‘Ans. (0) 75. Consider an instruction pipoline with fivo stages vwithous any branch prediction: Fetch Instruction (FD), Decode Inetruction (DD), Fetch Operand (FO), Exzcute instruction (ED) and Write Operand {WO}, The stage delays fr F, DI, FO, EL and WO are 5 ne, 7 ns, 10 ns, Sms and 6 ns, respctivly ‘There ate intcemediate storage butkes after each stage and tho delay of each buffer is Vins. A. pro- ‘ram consisting of 12 instructions J, LJ is ‘executed im this pipelined processor, Inetructbo 4 isthe only branch instruction and ite branch target {10 If the branch ie taken during the execution of this program, the time (in ns) needed to complate tho program is (a) 182 () 165 (9.176 (a) 928 (GATE 2013: 2 Marks) www.engineeringonyourfingertips.oooSolutions Clock pulse duration Maximum stage delay + Buffer delay = 1x54 4 — 1) 11 = 88 ns np ka 1=88 4 88—11 = 165, Ans, (b} Common Data Questions 76 and 77 Tie fellow ing code segment is executed on a processor which allows only register operands in ite instructions. Each instruction can have attmost two source oper ands and one destination operand. Assume that all variables are dead after this code segment, onarb dectay ie >a) ( 17. www.engineeringonyourfingertips.ooo PRACTICEEXERCKES 95 it > R) ( Re fy = ara ) else ( Ree [read value ¢ from memory) Re RtR (d= * al RoR tB, fd=d* Rp- Ret R, [e=c+ ail RoR tm fe ) ‘Only ‘c’ is spilled variable here, which needs to be stored and loaded from memory. Ans, (b) What ie the minimum number of registers needed in the instruction set architecture of the proce sor to compile this code segment without any spill to memory? Do not apply any optimization other than optimizing register allocation, ete] @s 4 5 Me (GATE 2013: 2 Marks) 76. Suppose tho instruction sot architecture of the Solution: Let Ry = a, Ry = b. The machine code is processor has only two registers. The only allowed = eat 45 Compiler optimization is code motion, which moves oe oe statements ftom one place to another while pre a = serving correctness. What is the minimum number os +h ‘of spill to memory in the compiled cede? crc Rea Ret Ry ‘a wu ri Bek 2 (as tie. (GATE 2013: 2 Marks) Lt Solution: Suppose Ry and R, hold a and b, respeo da=d* di RAR * RS tivaly. Here for code optimizstion, code motion is e=ete +R allowed. The machine code ig; ) Bee Rt R (ea +b) : None een Four registers are requitod Ans. () PRACTICE EXERCISES Set 1 (6) The width of tho addrose bus 1. The three diffrent kinds of buses supporting the architecture of a computer are (a) Address bus, data bus, 1/0 bus (b) Front bus, data bus, address bus {¢) Control bus, data bus, addrass bus {4) 1/0 bus, adirese bus, instruction bus 2. The term ‘WORD of a OPU equals to {) The maximum adkressable memory sico {b) The width ofa CPU register (integer o oa pint) (4) The total number of general purpose CPU registers - Which of the following statement is false about CISC ARCHITECTURE? (2) CISC machine instructions may inckade complex addrossing modes, which require: many ‘lock cycles to carry out. {b) CISC control units ate typically micro-programmed, allowing tho instruction set 12 be more flaible. www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo 96 a 6. 10, www.engineeringonyourfingertips.ooo (OAPTER2 COMPUTER ORGAMZATION AND ARCHITECTURE {¢) In the CISC instruction et, all arithmetic/logic instructions must be register based for fast processing, (4) CISC architectures may perform better in net ‘work centric applications than RISC. ‘Tho regietor that holds the address of the loca ton to oF from which data aro to be transforred is called (a) Index register {b) Accuraulator (c) Memory addrase registers (d) Memory data registers Which one of the following ts not a type of 1/0 channel? {@) Multiplexer {) Solector {6} Block multiplexer (4) None of the above ‘Tho porformance of a pipelined processor is degraded if (2) the pipeline stages have diferent dalays {b) consocutive instructions aro to be executed serially {) the pipeline stages share hardware resources {d) all of the above ‘The minimum time delay betwoen the initiation of wo independent memory operations is called {9} Access time {b) Cycle time {6} Rotational time (4) Latency time ‘Tho rogistor which koops track of the execution of a program and which contains the memory address of the instruction currently being executed is known (8) index register {b) memory address regieter {) program counter (4) inetruction registers |. For interval arithmotic, tho best rounding toch- gue used is (a) rounding to plus and minus infinity {b) rounding to zero {) rounding to nearest zero (4) rounding to the next number Mardwired control unit are faster than micro. pprograrumed control unit because {they do not consist of slower memory elements, {b) they do not have slower aloments such as gatas, flip flops and resisers (6) they consist of elements based on VVLSI design technology (@) they contain high-speed digital components nn. 12, 13. 4. 15. 16. ar. 18. ‘The most relovant addreseing mode to write post ton-independent code is () direct mode (b) auto mode (6) relative mode (4) indexed mode A.CPU uses 241i instruction. A progam starts st address 800 (in decimal). Which one of the fllow- ing is a legal program counter content (all values in decimal)? (a) 98 (by 212 (6) 600 (a) 700 ‘An attempt to access a location not owned by a program is called (a) dats fae (b) adds fault (c) struction fault (4) page fault Which of the following statement. about relative addressing mode te FALSE? (2) It enables reduced program code (b) It allows indexing of array olomont with samo instruction, () It enables easy relocation of data (4) Tt enables fastor address calculation than abeo= lute addrassing Compared to CISC procescts, RISC prooasors contain (a) mote register and emallor instruction set {b) larger instruction sat and loss registers (c) lees registers and smaller instruction set (€) mote registers and larger instruction set Micto-programmed control cannot be implemented in RISC architecture because (0) It tend to slow down the processor (b) it consumes more chip areas and large instruction sot (6) handling large mumber of registers is impos sible in micro-programmed system. (4) tho 1 instruction/eycle timing requirement for RISC is dificult to achieve in micro-pro- rammed based architecture, Relocation of the code 1s easier in Irrspec. tive of the program code (a) indirect addressing (b) indoced addressing (c) base register addressing {d) absolute addressing In inverted page table organization, the size of the page table depands on () the number of processes (0) the size of page (c) the siza of main memory (4) the number of frames in the main memory19. When using the concept of locality of reference, the page reference being made by process {) will always be to the page used im the previous page reference (b) is likely to be one of tho pages used in the past {fos page references {6} will always to be one of the pages existing in the main memory (2) will alway lead to page fault 20. If the now vrsion of processor is not made com- patible to programs written for Its older version, it could be able to process at « faster speed {@) the statemant is trua, (b) the statement is false {¢) the epaed cannot be predicted. {d) speed has nothing to-do with the compatibility. 21. A certain snooping cache can nop omly om acknees line. Which of the following is true? (8) This would adverily affect the system ifthe swite through protocol is used () Ie would run woll ifthe write through protocol is wrod, {6) Data snooring is mandatory to be implemented om data bina (4) Data stooping may not be aquired. 22. When the frequency of the input signal to a CMOS ‘gate Is increased, the average power dissipation (2) decreases exponentially (b) increases {c) docteasas (4) increases exponentially 28. The disadvantage of hardwired control unite with ship Bop is (9) design becomes complex. {b) it requires more number of ip Hops (c) contol circuit sposd does not match with fp flops: (4) flip-flops. cam hansle the data unit not the ‘contsol unit 24. In a vectored interrupt, (a) the branch address is assigned to a fixed loca ton in memory. (b) the interrupting source supplies the branch {information to the processor through an intet- rrupt vector. (6) the branch address is obtained from a resistor im the processor. (@) the branch address ie obtained from program ‘counter. www.engineeringonyourfingertips.ooo PRACTICEEXERCKES 97 25. A device employing INTA line for device inter rupt puts the CALL instruction on the data bus while (a) INTA ts active (b) HOLD is active, (¢) READY isactive, (4) READY is active, 226. On receiving an interrupt frm an 1/0 dion, tho CPU (a) branches off to halt (or wait) for a prodete ‘mined time (b) branches off to the interrupt service after come plotion of the current instruction (6) branches off to the interrupt. service routine immedistely (4) hands over control of addrass bus and data bus to the interrupting device 27. Using largo block size in a fixed block sizo filo system leads to (9) better disk throughput but poorer dick space uuilization (b) botier disk throughput and better disk space utilization (6) it does not matter as the total memory size is same (@) poorer disk throughput but better disk space utilization 228. Which of the following statements are true about aging? (a) Is divides momory into units of equal czo, (b) Ie permits implementation of virtual memory. (It suffers from internal fragmentation (4) Ie ouffers from external fragmentation, 29, ‘The number of entree in an inverted page table (9) & equal to the number of pracestes, (b) is equal to the number of page frames in the main memory. (c) ie equal to the sto of tho page frame. (4) is equal to the number of page frames in cache memory, 80. In a virtual memory system, the addresses used by the programmer belongs to (a) memory space (6) address space () physical space (4) main memory spaco 31. Power consumption of processors can be vastly reduoed by making use of based transistors to implement the ICs (a) NMOS only (b) TTL Schottky and PMOS (c) PMOS only. (d) NMOS and PMOS www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo 9B iAPTER> COMPUTER ORGAMZATION AND ARCHITECTURE 82, Address symbol table ie generated by the (3) memory management software {assembler {c) table match of associative memory (2) generated by CPU 88, How many 128 x8 RAM chips are nooded to have total RAM of 2018 bytes? S(O) (ope (AVRD ‘84, In 6085 microprocessor, how many 1/0 devices can be interfaced in 1/0 mapped 1/0 technique? (3) Either 256 input devices or 256 output devices {b) 81/0 devices {6} 256 input devices and 256 ousput devices (4) 512 input-output devices ‘85. After reset, the CPU starts the execution of instruc tom from memory address (my c) 0000, 36. In a microprocesve sytem, supposo TRAP, HOLD find RESET pin got stivated st tho same time, ‘tile the pnosssea sas exeouting rome inet tions the system wil {a) execute the TRAP instruction {b) execute the HOLD instruction {o) execute the RESET instruction (d) none of these instructions will be executed (8) $000 (a) FFF 97. In 8085 microprocessor, the programmer cannot ‘access which flag directly? {s) Sign flag tb) Carry fla (6) Ausiliary carry flag (d) Parity flag 88. Which of the following is « pseudo-instruction for 808s (a) SPHL ib) CMP () NOP (a) END 39. The term “cycle stealing” refers to: () Inorruptbasod data trator (b) DMA-based data transfer {o) Poting mode data transfer {d) Clock cycle overriding: 40. Which of the following architacture isnot suitable for the following SIMD architecture? (a) Veetor processor (b) PLA-bwoal proossor {6} Von Neumann {4) PALbasd procaor 41. In 4 multiprogramming eystem, which of the fol lowing concopts is usod? (a) Data parallelism (b) Paging (6) Li cache (a) DMA 42. PAL circuit can be defined as (a) fixed OR and programmable AND logic. (b) programmable OR and programmable AND logic. (6) fixed AND and programmable OR logic (d) fixed OR and fixed AND ogi. 43. IF the clock input applied to a cascaded Mod-6 and Mod-4 counter is 48 kHs. Thon the output of the cascaded arrangement shall be of () 4.8 kite (b) 12 kite (o) SkHe (a) 8 ke 44. If there are four ROM ICs of 8K and two RAM 10s ‘of 4K words, them the address range of Ist RAM is {accume initial addresses correspond to ROMs) (2) (8000}y 0 (OFFF)y—(b) ($000) to (7EFP)y (6) (8000) to (SFFF)y (4) (6000)y to (OFF) 45. ‘Tho method for updating the main memory as 60cm a6 a word is removed froma the cache is called (a) Writethrough, (b) Write-back (6) Writesave (d) Cachesave Sot 2 ‘The most appropriate matching for the following pairs X. Indirect addressing 1 Loops Y. Immediate ackressing 2. Pointers %, Autordecrement addressing 3, Constants (s)X-3Y-22-1 (t)X-1¥-37-2 ()X-2¥-8Z-1 ()X-3Y-12-2 2. Which of the following is not a form of memory? (a) Instruction cache (b) Instruction register (6) Instruction opoode (4) Translation look aside buffer 9. In sorial data transmiscion, overy byte of data is padded with a ‘0° in the beginning and one oF bwo T's at the end of byte because {@) Receiver is tobe synchronized for byte reception. (b) Receiver recovers lost ‘0° snd ‘from these padded bite (©) Padded bits aro useful in parity computation. (a) None of thase. www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo PRACTICE EXERCKES 99 4. in complement addition, overflow (6) Horizontal mictoprogramming, vertical micro programming, hardwired control. SG ogee soho ee renee erm (4) Vertical microprogrammaing, horsontal mieno- (b) cannot occur when a positive value is added ¢0 diinkampiarinamasenie ‘negative valu 10. A processor nesds softwate interrupt to {o) is flagged when the carrie for sign bit and ie deeneereneenneoneberaetien provios bit match (b) implement co-routines: Dnemot ashore (c) obtain system gervicas which need execution of 5. Th performance of pipolinad proceson suffers en {4} the pipeline stages have diffrent delays Y Pe eacaestive nettions are lopaodent on each 12+ Which ie the most appropriate match forthe tas — in the fits het with the Vlas inthe sacond Mist? {) the pipeline stagas share hardware resourcas {d) all of the above pees ey 6. Match the pairs in the following questions X-Indiet acidcvering, 1 Army implemen By witing the corresponding letters only ‘aamisisating; TE Wig aes {A) IEEE 488 (P) Specifics tho intorface for com 7 aonb A gia a 2, Baco resistor Mi Pocsing array a8 {B) IEEE 796 (Q) Specified the bus standard addressing pocameter for connecting a computer to ‘other devlose including CPUs (9) (XM) (YD (ZW) (b) XM YM (2 “instrumentation bus 12, Consider the folowing data path of a simple nom {D) RS282.C (8) Sponiflos the bus standaed fr the piplelined CPU. The registers A, B, Ay, Ay, MDR, back plane” bus called muitibus tho bus and tho ALU aro sit wide. SP and MAR are 16-bit rogistas. The MUX is of size 8 x (2:1) {8) (A) = (P)s (B) = (BY; (6) = |S} (D) ~ (Q) and the DEMUX ls of sto 8 x (12). Each memory {b) (A) = (P); (B) = (Q); (©) = (BY (D) — (8) sD) — ‘operation takes 2 CPU clock cyclas and uses MAR {6} (A} = (Q); (B) — (8); (©) — (Bs (D) — (P} (memory address register) and MDR (memory data register), SP can be dacrementad locally (dl) (A) — (R); (B) — (8); (C) = (Py; D) = (BY 7. When an interrupt occurs, an operating system (0) Tenores the intereupt {b) Always change stato of interrupted process ‘after processing the interrupt (0) Always resumes execution of interrupted pro ess after proceesing the interrupt (@) May change state of interrupted procaas to ‘Blocked’ and schedule another process. 8. RAID configuration of dks are used to provide ‘The CPU instruction “push +, whete r= A or B (a) Faulttolerance ——_(b) High epaed hs the specification {c) High data density (a) Nono of the above Map] or 9. Arrange the following configuration for CPU in Pee decreasing otder of operating specds: hardwired How many OPU clock cyslas are neoded to execute control, vertical microprogramming, horizontal the “push 7 instruction? nbroprograraaing @2 8s 4 Ws {) Hardwired control, vertical mictoprogramming, 4, fy the abeclute addreesing modo, horizontal mictoprogramming (b) Hardwired conttol, horizontal ticroprogram- (a) the operand is inside tho instruction. ‘ming, vertical microprogramming. (b) the address of the operand is inside the instruction www.engineeringonyourfingertips.ooo100 cHAFTER2: COMPUTER ORGAMZATION AND ARCTECTURE {c) the register containing the address of the oper: ‘and is epecifiad inside the instruction, {@) the location of the operand fs implicit, 14, What are the states of the auxiliary earry (AC) and carry flog (CY) after executing the following 8085 program? MMVI, SDH MIV L, oan MOV A, H ADDL. (a) AC =0, CY =0 () AC =1, CY =0 ()AC=1, CY =1 ()ac=a ert 15. Horizontal microprogramming {2} does not require uso of signal decoders {b) results in largerstzad microinstructions than vertical mictoprogramming {¢) uses 1 bit for each comtrol signal (d) all of the above 16. For the daisy chain scheme of connecting 1/0 devices, which of the following statements is trus? (a) It gives non-uniform priority to various devices. (b) I¢ gives uniform priority to ll devicas (¢) Is only usaful for connecting slow devices to {8 processor device (a) Ie requires s separate interrupt pin on the pro ‘ceasor for each device 17. A microprogeam conteol unit is required to gener. ‘tea total of 25 control signals. Assume that during ‘any microinstruction at moet two control signals are active. Minimum nuruber of bite required in the control word to genotate the required control signal is 18. The correct matching for the following paite is 19. 8 a1. a1. |. A decimal number has 64 di www.engineeringonyourfingertips.ooo (s) A-4,B~3,0-1,D-2 (b) A-2,B-1,0-3D-4 (JA-4&B-3.C-2D-1 () A-2,B-3,C-4,D-1 1/0 redirection () implies changing the namo of fle (b) can be employed to use an existing file as input file for a program (6) implies connecting two programs through a pipe (4) none ofthe above ‘Themain differences) between a CISC and a RISC processor is/are that a RISC processor typically (a) has fewer instruction (b) has fewer addresing modes and more registers () iscsi to implement wing hardwvited contol logic (4) all of these How biz is a 4-way setasscciative cache made up of 16 bits word, 4 words per line and having 1024 sets? () 82 KB (b) 64 KB (6) 128 KB (4) 256 KB Consider the expression to be evaluated as X = (M+ Nx O/(P x Qs. How many threo address Instruction are required to evaluate the above riven expression? @3 (4 swe = (M+ Nx OP x Q), how many one ‘addrees instructions are required to evaluate it? @4 (6 S10 ts Tho number of bits nnesded for its equivalent binary representation is (0) 200 (bas (ey 246 (a) 277 - Determine the spead up obtained from pipelining if latencies for wach stage in single cycle processor is A.DMA 1/0 1. High-speed RAM con B. Cache 2. Disk ©. Interrupt 1/0 8. Printer i [p_ [aw [Mew [we D. Condition code 4 ALU ne [2m [sm [Ane | ine register ANSWERS TO PRACTICE EXERCISES Set 1 Lo 8. (0) 5a) 7. (b) 2 () 2. (b) 4.19 «(a 8. (0) 10. (a) www.engineeringonyourfingertips.ooo1. (3) 18. (4) 25. (a) 12. (3) 19. (b} 28. (b) 18. (b) 20. (a) 27. (a) 14. (a) a1. (a) 28. (0) 15. (a) 22. (b) 20. (b) 16. (a) 23. (b) 30. (0) Ir) 24, (a) ai. (a) Set 2 1, (o) Indirect addressing mode — Pointer acoess Iamnodiato addressing moco ~> Constant access Auto decrement addressing mode —> Loops 2. (@) An opcodo is tho portion of a machine lan- ‘uage instruction that spocifios the operation to be performed. 8, (s) Bits im the boginning and at the end of byte are known as start and stop bits, respectively. Start and stop bits are used for synchronization purpose, 4. (b) When two 2's complement numbers are added, ‘overflow occurs in the following situations: ‘Both operand are postive and the result is negative, Both operands are nogative and tho resul is postive. ‘So option (b) f true, overflow doss not occur when positive and negative numbers are added, 5. (4) Diffarant hagarde are caused duo to various ddopondancies, Difforont. dopendenci for pipolined rocasor are as follows: Structural dependency is due to diforent delays in pipelined stages. Control dependency is due to consecutive instruc- dons are dependent on each other: Data dependency is due to hardware resources sharing. 6. (c) (A) IEEEE 488 ~ (Q) (B) IEEE 796 ~ (S) (C) IEBEE 696 — (R) «D) RS282-0 ~ (P) T. (o) When interrupt is caused, the execution of current instruction i stopped. After handling interrupt, the program resumes its execution 8. (a) RAID is random array of independent disks that combines multiple disk drive components into 4 logical unit, RAID configuration provides faulttolerance and high speed 9. (b) Horizontal micro programming has high paral lelism than vertical, So, the speed order is: 10. a. 12, 13. 14. 15. 16. www.engineeringonyourfingertips.ooo [ANSWERS TO FRACTICE EXERCISES. 101 32. (b) 39. (b) 83. (b) 40. (0) 34. (0) 41.) 35. (0) 42, () 36. (a) 48. (9) a7. (2) a4, (0) a8. (a) 45. (b) Hardwired contecl > horizontal mieroprogramming > vertical mlcroprogramming {€) A processor noods software interrupts to obtain system services which need execution of privileged ‘instructions. Pointers are used for accessing indirect addrossoe, Indirect addressing mode — Pointer access In case of immediate addressing, the constant rep recent the effective addres, Immediate addressing mode —+ Constant access ‘Auto decrement/ineromente are usad for loop variables, Auto decrement addresing mode + Loops (a) Push ‘? mamory operation needs 2 clocks (b) In absolute addressing mode, the address of the operand is inside the instruction. «© carry Auxilary carry @i111@11 @)# © 101110 dy HoD)# +0 11 0:1 0:1 Up —_ Trot o 0 Oy AC= land CY -0 (@) Features of horizontal microprograraming are (0) Te dows not require use of signal decoders (ii) Te results in largot-sized ‘microinstructions ‘han vertical microprogramming (i) Te wses 1 bit for each contol signal (a) The daisy chaining method of establishing priority consists ofa serial ecnnection ofall devices ‘that request an interrupt. Tho davice with the highest priority is placed in the first position, followed by lower-priotity dovicos up to the dovice ‘with the lowest priority, which is placed last in ‘the chain. The farther the device is from the fist rorition, the lower is its priority. Therefore, daisy chain gives non-uniform priority to-vatious devices. www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo 102 cHArTER2: COMPUTER ORGANZATION AND ARCHTECTURE 17. (10) To generate 2% control signals, 5 bits sre required. To generate two control signals, the following scheme will be used, 5 bite to ientify first and 5 bits to Kbntiy second including tho caso when cne of them isnot present, So total bite required = 10. . (b) DMA.1/0 — Disk Cacho ~ Highspead RAM Interrupt 1/0 ~ Printor Condition code register — ALU - (6) 1/0 retiection impliss conmoction eo. pro- rams through a pipe. 20. (d) The major characteristics of « RISC procesor are (0) Relatively fo inctruction (il) Relatively few addressing modes (ii) Mote resistors (iv) Hlardwited rather than microprograromed control 21. (a) Number of bytas per line = 16 bit» 4 byte = Shytes Cacho size = 8 x 4x 1024 = 62 KB 22, (b) Thros-adirce instructions ato os follow: MUL RI, N, 0 MULR2, P,Q ADD R3, M, RI DIV X, R3, R2 23. (0) LOAD P(AC © P) MPY Q(AC = AC Q) STORE X (X © AC) LOADN (AG © N) MPY 0 (AC & AG x0) ADD M(AC © AC + M) DIV X (AC © AC/X) STORE X (X © AG) 24, (b) The number of bits is 1-128 =1 => 10 2 = r= loge 10 = 218 25, (44) Spoad up obtained by pipelining _ (p20 S24 419 34g www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo UNIT Ill: PROGRAMMING AND DATA STRUCTURES LAST SIX YEARS’ GATE ANALYSIS = Marks 1 = Marks 2 ‘Total number of questions 5 ‘ 3 i z 2015 2014 213 2012 2011 2010 Concepts on which questions were asked in the previous six years Year Goncept 2015 Binary tres, Minitaum-maximum heap, By tree, Functions, Recursion, Kuh Smallast lement in C, Hashing, Stack, C strings, Postfix equation, Binary search tree, Merge sort, Pointers, Switch-caso, Storage classes, Quick sort, Array, Complexity 2014 Loops, Inorder traversal, Binary search tree, Tree traversal 2013 Binary search tree, Heap sort, Control statements, Recursion, Queuss, Array and Pointer 2012 Switch 260, Queue, Tose, Functions, Storage classoe 2011 Array, Heaps, Binary search tree, Recursion 2010 __Binary tree, Pointers, Heaps, Recursion, Link list www.engineeringonyourfingertips.ooowww.engineeringonyourfingertips.ooo www.engineeringonyourfingertips.ooo

Chapter-2 Computer Organization & Architecture PDF
No ratings yet
Chapter-2 Computer Organization & Architecture PDF
48 pages
COA Notes
No ratings yet
COA Notes
46 pages
CO Willey
No ratings yet
CO Willey
23 pages
Cao Notes
No ratings yet
Cao Notes
20 pages
Lecture Note 292311150242100
No ratings yet
Lecture Note 292311150242100
101 pages
CO Laqs Full
No ratings yet
CO Laqs Full
65 pages
Lecture 1.2.3 (Instruction Types, Addressing Modes)
No ratings yet
Lecture 1.2.3 (Instruction Types, Addressing Modes)
27 pages
COAP Chapter 5
No ratings yet
COAP Chapter 5
45 pages
5) COA - Unit5 - ProfSnehaTalreja - Notes
No ratings yet
5) COA - Unit5 - ProfSnehaTalreja - Notes
58 pages
Unit2 2
No ratings yet
Unit2 2
20 pages
Instruction Set of 8085
No ratings yet
Instruction Set of 8085
25 pages
Copy3 Chapter 6
No ratings yet
Copy3 Chapter 6
40 pages
DDCO
No ratings yet
DDCO
34 pages
8085 Microprocessor Instructions
No ratings yet
8085 Microprocessor Instructions
36 pages
CH6 - Centeral Processing Unit
No ratings yet
CH6 - Centeral Processing Unit
43 pages
Unit 2
No ratings yet
Unit 2
65 pages
MP - Module 2
No ratings yet
MP - Module 2
16 pages
CO Unit I (R20)
No ratings yet
CO Unit I (R20)
41 pages
Machine Language
No ratings yet
Machine Language
108 pages
Group 4 Microcomputer Systems Assignment
No ratings yet
Group 4 Microcomputer Systems Assignment
42 pages
Co Unit 2
No ratings yet
Co Unit 2
14 pages
Assembly Language Addressing Modes
No ratings yet
Assembly Language Addressing Modes
21 pages
Unit2 Cpu
No ratings yet
Unit2 Cpu
15 pages
Unit 2
No ratings yet
Unit 2
71 pages
CPU Organization & Instruction Sets
No ratings yet
CPU Organization & Instruction Sets
46 pages
8085 Microprocessor Programming Guide
No ratings yet
8085 Microprocessor Programming Guide
122 pages
CPU Instruction Formats Guide
No ratings yet
CPU Instruction Formats Guide
5 pages
Instruction Format
No ratings yet
Instruction Format
26 pages
Adobe Scan 01 Feb 2024
No ratings yet
Adobe Scan 01 Feb 2024
17 pages
Mic Unit III
No ratings yet
Mic Unit III
70 pages
Unit 2
No ratings yet
Unit 2
17 pages
l5 Instruction Set and Addressing Modes
No ratings yet
l5 Instruction Set and Addressing Modes
48 pages
Instruction Set Architecture Part 2
No ratings yet
Instruction Set Architecture Part 2
43 pages
Chapter Four
No ratings yet
Chapter Four
57 pages
2021 Chapter 2 CPU Lecture 2 Stud
No ratings yet
2021 Chapter 2 CPU Lecture 2 Stud
65 pages
Computer Architecture
No ratings yet
Computer Architecture
8 pages
COA Chapter 3 Final Edited
No ratings yet
COA Chapter 3 Final Edited
58 pages
Unit 2 (Upto Midsem)
No ratings yet
Unit 2 (Upto Midsem)
27 pages
Coa Unit-2
No ratings yet
Coa Unit-2
16 pages
Isa 1
No ratings yet
Isa 1
13 pages
Co Unit-4
No ratings yet
Co Unit-4
12 pages
ITEC582 Chapter13
No ratings yet
ITEC582 Chapter13
20 pages
COA Unit-3
No ratings yet
COA Unit-3
37 pages
Comp Arch-B-1
No ratings yet
Comp Arch-B-1
47 pages
7COA Slides-1
No ratings yet
7COA Slides-1
26 pages
CPU Instruction Formats Explained
No ratings yet
CPU Instruction Formats Explained
8 pages
Microprocessor Quesion Solve-2019
No ratings yet
Microprocessor Quesion Solve-2019
18 pages
Adressing Mode - 2
No ratings yet
Adressing Mode - 2
11 pages
?addressing Modes
No ratings yet
?addressing Modes
36 pages
NAME: Fuldeore Srushti Vinod Subject: Coa ID: 201071908 BRANCH: Computer
No ratings yet
NAME: Fuldeore Srushti Vinod Subject: Coa ID: 201071908 BRANCH: Computer
7 pages
Cha 1
No ratings yet
Cha 1
73 pages
Unit-2 Cpu
No ratings yet
Unit-2 Cpu
35 pages
Unit - 3
No ratings yet
Unit - 3
29 pages
Coa LMR
No ratings yet
Coa LMR
11 pages
Unit 2
No ratings yet
Unit 2
17 pages
Mal CH2
No ratings yet
Mal CH2
49 pages
Unit-3 Notes Complete
No ratings yet
Unit-3 Notes Complete
16 pages
Unit3 Coa Notes
No ratings yet
Unit3 Coa Notes
32 pages

Computer Organization & Architecture

Uploaded by

Computer Organization & Architecture

Uploaded by

You might also like