
Last Part Computer Architecture IUGET


Computer Organization and Architecture

MUL        TOS ← (C + D) * (A + B)
POP X      M[X] ← TOS

Addressing Modes
The operation field of an instruction specifies the operation to be performed. This
operation must be executed on data stored in computer registers or memory
words. The way the operands are chosen during program execution depends on
the addressing mode of the instruction. The addressing mode specifies a rule for
interpreting or modifying the address field of the instruction before the operand is
actually referenced. Computers use addressing mode techniques to accommodate
one or both of the following provisions:
(1) To give programming versatility to the user by providing such facilities as
pointers to memory, counters for loop control, indexing of data, and program relocation.
(2) To reduce the number of bits in the addressing field of the instruction.

Addressing Modes: The most common addressing techniques are


• Immediate
• Direct
• Indirect
• Register
• Register Indirect
• Displacement
• Stack

All computer architectures provide more than one of these addressing modes.
The question arises as to how the control unit can determine which addressing mode
is being used in a particular instruction. Several approaches are used. Often, different
opcodes will use different addressing modes. Also, one or more bits in the instruction
format can be used as a mode field. The value of the mode field determines which
addressing mode is to be used.

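How should the effective address (EA) be interpreted? In a system without virtual
memory, the effective address will be either a main memory address or a register. In a
virtual memory system, the effective address is a virtual address or a register. The
actual mapping to a physical address is a function of the paging mechanism and is
invisible to the programmer.
In the formulas that follow, A is the contents of the address field, R is a register
named by the instruction, and (X) denotes the contents of location or register X.
Instruction format with a mode field: Opcode | Mode | Address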
Immediate Addressing:
The simplest form of addressing is immediate addressing, in which the
operand is actually present in the instruction:
OPERAND = A
This mode can be used to define and use constants or set initial values of
variables. The advantage of immediate addressing is that no memory reference other
than the instruction fetch is required to obtain the operand. The disadvantage is that
the size of the number is restricted to the size of the address field, which, in most
instruction sets, is small compared with the word length.

Direct Addressing:
A very simple form of addressing is direct addressing, in which the address field
contains the effective address of the operand:

EA = A
It requires only one memory reference and no special calculation.

Indirect Addressing:
With direct addressing, the length of the address field is usually less than the
word length, thus limiting the address range. One solution is to have the address field
refer to the address of a word in memory, which in turn contains a full-length address
of the operand. This is known as indirect addressing:
EA = (A)

Register Addressing:
Register addressing is similar to direct addressing. The only difference is that
the address field refers to a register rather than a main memory address:
EA = R

The advantages of register addressing are that only a small address field is
needed in the instruction and no memory reference is required. The disadvantage of
register addressing is that the address space is very limited.

The exact register location of the operand in the case of register addressing
mode is shown in Figure 34.4. Here, 'R' indicates a register where the operand is
present.
Register Indirect Addressing:
Register indirect addressing is similar to indirect addressing, except that the
address field refers to a register instead of a memory location. It requires only one
memory reference and no special calculation.
EA = (R)

Register indirect addressing uses one less memory reference than indirect
addressing, because the address of the operand is already held in a register rather
than in a memory word. The operand is then fetched from that memory location. In
general, register access is much faster than memory access.

Displacement Addressing:
A very powerful mode of addressing combines the capabilities of direct
addressing and register indirect addressing. It is broadly categorized as
displacement addressing:
EA = A + (R)
Displacement addressing requires that the instruction have two address fields,
at least one of which is explicit. The value contained in one address field (value = A)
is used directly. The other address field, or an implicit reference based on opcode,
refers to a register whose contents are added to A to produce the effective address.

The general format of Displacement Addressing is shown in the Figure 4.6.


Three of the most common uses of displacement addressing are:
• Relative addressing
• Base-register addressing
• Indexing

Relative Addressing:
For relative addressing, the implicitly referenced register is the program
counter (PC). That is, the current instruction address is added to the address field to
produce the EA. Thus, the effective address is a displacement relative to the address
of the instruction.

Base-Register Addressing:

The referenced register contains a memory address, and the address field
contains a displacement from that address. The register reference may be explicit or
implicit. In some implementations, a single segment/base register is employed and is
used implicitly. In others, the programmer may choose a register to hold the base
address of a segment, and the instruction must reference it explicitly.

Indexing:
The address field references a main memory address, and the reference
register contains a positive displacement from that address. In this case also the
register reference is sometimes explicit and sometimes implicit.
Index registers are generally used for iterative tasks, so there is typically a
need to increment or decrement the index register after each reference to it. Because
this is such a common operation, some systems will automatically do this as part of the
same instruction cycle.
This is known as auto-indexing. There are two types of auto-indexing:
auto-incrementing and auto-decrementing. If certain registers are
devoted exclusively to indexing, then auto-indexing can be invoked implicitly and
automatically. If general-purpose registers are used, the auto-index operation may need
to be signaled by a bit in the instruction.

Auto-indexing using increment can be depicted as follows:
EA = A + (R)
R <- (R) + 1
Auto-indexing using decrement can be depicted as follows:
EA = A + (R)
R <- (R) - 1

In some machines, both indirect addressing and indexing are provided, and it is
possible to employ both in the same instruction. There are two possibilities: the
indexing is performed either before or after the indirection. If indexing is performed
after the indirection, it is termed post-indexing:
EA = (A) + (R)
First, the contents of the address field are used to access a memory location
containing an address. This address is then indexed by the register value.
With pre-indexing, the indexing is performed before the indirection:
EA = (A + (R))
An address is calculated as with simple indexing. In this case, however, the calculated
address contains not the operand but the address of the operand.
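
The effective-address rules described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the memory contents, the register value, and the function name effective_address are hypothetical and do not describe any particular machine.

```python
# Hypothetical machine state, chosen only to make the formulas concrete.
memory = {100: 500, 103: 700, 500: 11, 503: 22, 700: 33}
registers = {"R": 3}


def effective_address(mode, A=0, R="R"):
    """Return the effective address (EA) for some of the modes described above."""
    r = registers[R]                 # (R): contents of the referenced register
    if mode == "direct":             # EA = A
        return A
    if mode == "indirect":           # EA = (A)
        return memory[A]
    if mode == "register_indirect":  # EA = (R)
        return r
    if mode == "displacement":       # EA = A + (R)
        return A + r
    if mode == "post_index":         # EA = (A) + (R): indirection first, then indexing
        return memory[A] + r
    if mode == "pre_index":          # EA = (A + (R)): indexing first, then indirection
        return memory[A + r]
    raise ValueError(f"unknown mode: {mode}")


for mode in ("direct", "indirect", "displacement", "post_index", "pre_index"):
    print(mode, effective_address(mode, A=100))
```

With A = 100 and (R) = 3, post-indexing and pre-indexing reach different locations (503 and 700 here), which is the distinction the two formulas express.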

Stack Addressing:
A stack is a linear array or list of locations. It is sometimes referred to as a
pushdown list or last-in-first-out queue. A stack is a reserved block of locations. Items
are appended to the top of the stack so that, at any given time, the block is partially
filled. Associated with the stack is a pointer whose value is the address of the top of
the stack. The stack pointer is maintained in a register. Thus, references to stack
locations in memory are in fact register indirect addresses.
The stack mode of addressing is a form of implied addressing. The machine
instructions need not include a memory reference but implicitly operate on the top of
the stack.
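
As a rough sketch of implied stack addressing, the following Python fragment evaluates X = (A + B) * (C + D) on a tiny stack machine. The instruction names follow the PUSH, ADD, MUL and POP mnemonics used in these notes. The tuple encoding of the program, the function name run_stack_program and the sample values are assumptions made for illustration.

```python
def run_stack_program(program, memory):
    """Execute a list of zero-address style instructions on a stack."""
    stack = []
    for op, *args in program:
        if op == "PUSH":                 # TOS <- M[addr]
            stack.append(memory[args[0]])
        elif op == "POP":                # M[addr] <- TOS
            memory[args[0]] = stack.pop()
        elif op == "ADD":                # operands are implied: the top two items
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


program = [
    ("PUSH", "A"), ("PUSH", "B"), ("ADD",),   # TOS <- A + B
    ("PUSH", "C"), ("PUSH", "D"), ("ADD",),   # TOS <- C + D
    ("MUL",),                                 # TOS <- (C + D) * (A + B)
    ("POP", "X"),                             # M[X] <- TOS
]
result = run_stack_program(program, {"A": 2, "B": 3, "C": 4, "D": 5})
print(result["X"])
```

Note that ADD and MUL carry no address field at all; the operands are found implicitly at the top of the stack.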
Value addition: A Quick View
Various Addressing Modes with Examples

The most common names for addressing modes (names may differ among architectures)

Addressing mode | Example instruction | Meaning | When used
Register | Add R4, R3 | R4 <- R4 + R3 | When a value is in a register
Immediate | Add R4, #3 | R4 <- R4 + 3 | For constants
Displacement | Add R4, 100(R1) | R4 <- R4 + Mem[100+R1] | Accessing local variables
Register deferred | Add R4, (R1) | R4 <- R4 + Mem[R1] | Accessing using a pointer or a computed address
Indexed | Add R3, (R1 + R2) | R3 <- R3 + Mem[R1+R2] | Useful in array addressing: R1 - base of array, R2 - index amount
Direct | Add R1, (1001) | R1 <- R1 + Mem[1001] | Useful in accessing static data
Memory deferred | Add R1, @(R3) | R1 <- R1 + Mem[Mem[R3]] | If R3 is the address of a pointer p, then mode yields *p
Auto-increment | Add R1, (R2)+ | R1 <- R1 + Mem[R2]; R2 <- R2 + d | Useful for stepping through arrays in a loop. R2 - start of array, d - size of an element
Auto-decrement | Add R1, -(R2) | R2 <- R2 - d; R1 <- R1 + Mem[R2] | Same as auto-increment. Both can also be used to implement a stack as push and pop
Scaled | Add R1, 100(R2)[R3] | R1 <- R1 + Mem[100+R2+R3*d] | Used to index arrays. May be applied to any base addressing mode in some machines.

Notation:
<- - assignment
Mem - the name for memory:
Mem[R1] refers to contents of memory location whose address is given by the
contents of R1
Source: Self
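
To make the auto-increment row concrete, here is a small Python sketch that sums an array by repeating Add R1, (R2)+. The function name sum_with_autoincrement, the dictionary used as memory, and the sample addresses are assumptions for illustration only.

```python
def sum_with_autoincrement(mem, start, count, d=1):
    """Repeat 'Add R1, (R2)+' count times and return the final (R1, R2)."""
    r1 = 0          # accumulator
    r2 = start      # R2 holds the address of the current array element
    for _ in range(count):
        r1 = r1 + mem[r2]   # R1 <- R1 + Mem[R2]
        r2 = r2 + d         # R2 <- R2 + d (done by the same instruction)
    return r1, r2


mem = {200: 5, 201: 7, 202: 9}
total, next_address = sum_with_autoincrement(mem, start=200, count=3)
print(total, next_address)
```

After the loop, R2 points one element past the end of the array, which is why this mode is convenient for stepping through arrays.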

Data Transfer & Manipulation


Computers provide an extensive set of instructions to give the user the flexibility to
carry out various computational tasks. Most computer instructions can be classified into
three categories:
(1) Data transfer instructions
(2) Data manipulation instructions
(3) Program control instructions
Data transfer instructions transfer data from one location to another without
changing the binary information content. Data manipulation instructions are those that
perform arithmetic, logic, and shift operations. Program control instructions provide
decision-making capabilities and change the path taken by the program when
executed in the computer.

(1) Data Transfer Instruction


Data transfer instructions move data from one place in the computer to another
without changing the data content. The most common transfers are between memory
and processor registers, between processor registers and input or output, and among
the processor registers themselves.
(Typical data transfer instructions)
Name Mnemonic
Load LD
Store ST
Move MOV
Exchange XCH
Input IN
Output OUT
Push PUSH
Pop POP

(2) Data Manipulation Instruction


Data manipulation instructions perform operations on data and provide the computational
capabilities of the computer. The data manipulation instructions in a typical computer are
usually divided into three basic types:
(a) Arithmetic instructions
(b) Logical and bit manipulation instructions
(c) Shift instructions
(a) Arithmetic Instruction
Name Mnemonic
Increment INC
Decrement DEC
Add ADD
Subtract SUB
Multiply MUL
Divide DIV
Add with Carry ADDC
Subtract with Borrow SUBB
Negate (2’s Complement) NEG

(b) Logical & Bit Manipulation Instruction


Name Mnemonic
Clear CLR
Complement COM
AND AND
OR OR
Exclusive-Or XOR
Clear Carry CLRC
Set Carry SETC
Complement Carry COMC
Enable Interrupt EI
Disable Interrupt DI
(c) Shift Instruction
Instructions to shift the content of an operand are quite useful and are often provided
in several variations. Shifts are operations in which the bits of a word are moved to the
left or right. The bit shifted in at the end of the word determines the type of shift used.
Shift instructions may specify logical shifts, arithmetic shifts, or rotate-type shifts.
Name Mnemonic
Logical Shift right SHR
Logical Shift left SHL
Arithmetic shift right SHRA
Arithmetic shift left SHLA
Rotate right ROR
Rotate left ROL
Rotate right through carry RORC
Rotate left through carry ROLC
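
The difference between the shift types lies in the bit that enters the word. A minimal Python sketch on 8-bit words follows. The word width, the lowercase function names, and the sample value are assumptions; real instruction sets differ in details such as how the carry flag is affected.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1      # 0xFF for an 8-bit word
SIGN = 1 << (WIDTH - 1)      # 0x80, the sign-bit position


def shr(x):
    """Logical shift right: a 0 enters at the left end."""
    return (x & MASK) >> 1


def shl(x):
    """Logical shift left: a 0 enters at the right end."""
    return (x << 1) & MASK


def shra(x):
    """Arithmetic shift right: the sign bit is preserved."""
    x &= MASK
    return (x >> 1) | (x & SIGN)


def ror(x):
    """Rotate right: the bit leaving on the right re-enters on the left."""
    x &= MASK
    return (x >> 1) | ((x & 1) << (WIDTH - 1))


def rol(x):
    """Rotate left: the bit leaving on the left re-enters on the right."""
    x &= MASK
    return ((x << 1) & MASK) | (x >> (WIDTH - 1))


def rorc(x, carry):
    """Rotate right through carry: returns (result, new_carry)."""
    x &= MASK
    return (x >> 1) | (carry << (WIDTH - 1)), x & 1


x = 0b10010110
for name, f in (("SHR", shr), ("SHL", shl), ("SHRA", shra), ("ROR", ror), ("ROL", rol)):
    print(name, format(f(x), "08b"))
```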

Introduction About Program Control:-


The term control program is also used for a program that enhances an operating system by
creating an environment in which you can run other programs. Such control programs
generally provide a graphical interface and enable you to run several programs at once
in different windows; they are also called operating environments. That meaning is
distinct from the program control instructions discussed here.

The program control functions are used when a series of conditional or
unconditional jump and return instructions is required. These instructions allow the
program to execute certain sections of the control logic only if a fixed set of logic
conditions is met. The most common program control instructions available
in most controllers are described in this section.
Introduction About status bit register:-
A status register, flag register, or condition code register is a collection of
status flag bits for a processor. An example is the FLAGS register of the x86
architecture. The flags might be part of a larger register, such as a program status
word (PSW) register.

The status register is a hardware register which contains information about the
state of the processor. Individual bits are implicitly or explicitly read and/or written
by the machine code instructions executing on the processor. The status register in a
traditional processor design includes at least three central flags: Zero, Carry, and
Overflow, which are set or cleared automatically as effects of arithmetic and bit
manipulation operations. One or more of the flags may then be read by a subsequent
conditional jump instruction (including conditional calls, returns, etc. in some
machines) or by some arithmetic, shift/rotate or bitwise operation, typically using the
carry flag as input in addition to any explicitly given operands. There are also
processors where other classes of instructions may read or write the fundamental
zero, carry or overflow flags, such as block-, string- or dedicated input/output
instructions, for instance.

Some CPU architectures, such as MIPS and Alpha, do not use a dedicated flag
register. Others do not implicitly set and/or read flags. Such machines either do not
pass implicit status information between instructions at all, or they pass it in an
explicitly selected general-purpose register.
A status register may often have other fields as well, such as more specialized
flags, interrupt enable bits, and similar types of information. During an interrupt, the
status of the thread currently executing can be preserved (and later recalled) by
storing the current value of the status register along with the program counter and
other active registers into the machine stack or some other reserved area of memory.
Common flags:-
This is a list of the most common CPU status register flags, implemented in almost all
modern processors.

Flag | Name | Description
Z | Zero flag | Indicates that the result of an arithmetic or logical operation (or, sometimes, a load) was zero.
C | Carry flag | Enables numbers larger than a single word to be added/subtracted by carrying a binary digit from a less significant word to the least significant bit of a more significant word as needed. It is also used to extend bit shifts and rotates in a similar manner on many processors (sometimes done via a dedicated X flag).
S/N | Sign flag, Negative flag | Indicates that the result of a mathematical operation is negative. In some processors, the N and S flags are distinct with different meanings and usage: one indicates whether the last result was negative whereas the other indicates whether a subtraction or addition has taken place.
V/O/W | Overflow flag | Indicates that the signed result of an operation is too large to fit in the register width using two's complement representation.
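
As a hedged illustration of how these four flags could be derived, the Python sketch below performs an 8-bit addition and reports Z, C, S and V, then uses the carry flag to chain two 8-bit additions into a 16-bit one. The 8-bit width and the function names add8 and add16 are assumptions; actual processors define flag behaviour per instruction.

```python
def add8(a, b, carry_in=0):
    """Add two 8-bit values and return (result, flags)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b + carry_in
    result = total & 0xFF
    flags = {
        "Z": int(result == 0),          # result is zero
        "C": int(total > 0xFF),         # carry out of bit 7
        "S": result >> 7,               # sign bit of the result
        # Overflow: both operands have the same sign and the result's sign differs.
        "V": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


def add16(x, y):
    """Multi-word addition: the carry from the low byte feeds the high byte."""
    low, low_flags = add8(x & 0xFF, y & 0xFF)
    high, high_flags = add8(x >> 8, y >> 8, carry_in=low_flags["C"])
    return (high << 8) | low, high_flags["C"]


print(add8(0x7F, 0x01))      # signed overflow: 127 + 1 does not fit in 8 signed bits
print(add8(0xFF, 0x01))      # unsigned carry, zero result
print(hex(add16(0x01FF, 0x0001)[0]))
```

The second function shows the use of the carry flag described in the table: the carry out of the less significant word becomes the carry in of the more significant word.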

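Introduction About Conditional branch instruction:-

A conditional branch instruction is a program control instruction that transfers control
to the branch address only if the tested status-bit condition is satisfied; otherwise,
execution continues with the next instruction in sequence. BR, by contrast, is the
unconditional branch instruction.
The conditional branch instructions are listed below:-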

Mnemonic | Branch condition | Tested condition
BZ | Branch if Zero | Z = 1
BNZ | Branch if not Zero | Z = 0
BC | Branch if Carry | C = 1
BNC | Branch if not Carry | C = 0
BP | Branch if Plus | S = 0
BM | Branch if Minus | S = 1
BV | Branch if Overflow | V = 1
BNV | Branch if not Overflow | V = 0

Unsigned compare conditions (A - B):
Mnemonic | Branch condition | Tested condition
BHI | Branch if Higher | A > B
BHE | Branch if Higher or Equal | A >= B
BLO | Branch if Lower | A < B
BLOE | Branch if Lower or Equal | A <= B
BE | Branch if Equal | A = B
BNE | Branch if not Equal | A ≠ B

Signed compare conditions (A - B):
Mnemonic | Branch condition | Tested condition
BGT | Branch if Greater Than | A > B
BGE | Branch if Greater Than or Equal | A >= B
BLT | Branch if Less Than | A < B
BLE | Branch if Less Than or Equal | A <= B
BE | Branch if Equal | A = B
BNE | Branch if not Equal | A ≠ B
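
The reason for separate unsigned and signed branch mnemonics is that the same bit pattern represents different values under the two interpretations. The Python sketch below illustrates this for 8-bit operands. It compares the interpreted values directly rather than modelling the flag logic a real processor would use, and the function names and the 8-bit width are assumptions.

```python
def as_signed(x, width=8):
    """Interpret a bit pattern as a two's complement signed number."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def branch_taken(mnemonic, a, b):
    """Decide whether a compare-and-branch on 8-bit patterns a, b is taken."""
    ua, ub = a & 0xFF, b & 0xFF          # unsigned interpretation
    sa, sb = as_signed(a), as_signed(b)  # signed interpretation
    conditions = {
        "BHI": ua > ub, "BHE": ua >= ub, "BLO": ua < ub,   # unsigned compares
        "BGT": sa > sb, "BGE": sa >= sb, "BLT": sa < sb,   # signed compares
        "BE": ua == ub, "BNE": ua != ub,                   # same for both
    }
    return conditions[mnemonic]


# 0xFF is 255 when unsigned but -1 when signed.
print(branch_taken("BHI", 0xFF, 0x01))   # unsigned: 255 > 1
print(branch_taken("BGT", 0xFF, 0x01))   # signed: -1 > 1
```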

Introduction About program interrupt:-


While the CPU is executing a process, a request for service can arrive, for example
when a user asks for another process to run. Such a request disturbs the running
process and is called an interrupt.

Interrupts can be generated by the user, by error conditions, and by software and
hardware. The CPU must respond to every interrupt that is generated. When an
interrupt occurs, the CPU services it by running the corresponding handler through
the same fetch, decode, and execute operations it uses for any other code.

Interrupts allow the operating system to take notice of an external event, such
as a mouse click. Software interrupts, better known as exceptions, allow the OS to
handle unusual events like divide-by-zero errors coming from code execution.

The sequence of events is usually like this:
1) Hardware signals an interrupt to the processor.
2) The processor notices the interrupt and suspends the currently running software.
3) The processor jumps to the matching interrupt handler function in the OS.
4) The interrupt handler runs its course and returns from the interrupt.
5) The processor resumes where it left off in the previously running software.
The most important interrupt for the operating system is the timer tick interrupt. The
timer tick interrupt allows the OS to periodically regain control from the currently
running user process. The OS can then decide to schedule another process, return
to the same process, do housekeeping, etc. The timer tick interrupt provides the
foundation for the concept of preemptive multitasking.
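
The sequence of events above can be mimicked with a toy Python loop. This is a simplified sketch, not how any real processor or operating system is implemented: the function name run_with_interrupts, the representation of a program as a list of labels, and the handler table are all assumptions.

```python
def run_with_interrupts(program, pending, handlers):
    """Run 'program', servicing interrupts that arrive before given steps.

    program  -- list of instruction labels
    pending  -- dict mapping a step index to the name of an interrupt
                signalled just before that step
    handlers -- dict mapping an interrupt name to a handler function
    """
    pending = dict(pending)   # work on a copy so the caller's dict is untouched
    trace = []
    pc = 0
    while pc < len(program):
        if pc in pending:                     # processor notices the interrupt
            saved_pc = pc                     # suspend the running software
            name = pending.pop(pc)
            trace.append(handlers[name]())    # jump to the matching handler
            pc = saved_pc                     # return from interrupt and resume
        trace.append(program[pc])
        pc += 1
    return trace


trace = run_with_interrupts(
    ["i0", "i1", "i2"],
    {1: "timer"},
    {"timer": lambda: "timer handler"},
)
print(trace)
```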

TYPES OF INTERRUPTS
Generally, there are three types of interrupts:
1) Internal Interrupt
2) External Interrupt.
3) Software Interrupt.

1. Internal Interrupt:
• When the hardware detects that the program is doing something wrong, it will
usually generate an interrupt. Typical causes are:
– Arithmetic error
– Invalid instruction
– Addressing error
– Hardware malfunction
– Page fault
– Debugging
• A page fault interrupt is not the result of a program error, but it does require the
operating system to get control.

Internal interrupts arise from a problem during execution, for example when a
program attempts an operation that contains an error or that is not possible. Software
interrupts, by contrast, are calls made deliberately to the system, for example when
an instruction sequence asks the operating system to start another application program.

2. External Interrupt:
• I/O devices tell the CPU that an I/O request has completed by sending an interrupt
signal to the processor.
• I/O errors may also generate an interrupt.
• Most computers have a timer which interrupts the CPU every so many milliseconds.

An external interrupt occurs when an input or output device requests service, and the
CPU handles that request first. For example, when a program is executing and the
mouse is moved on the screen, the CPU handles this external interrupt first and then
resumes the interrupted program.

3. Software Interrupt:
These types of interrupts can occur only during the execution of an instruction. They
can be used by a programmer to cause interrupts if need be. The primary purpose of
such interrupts is to switch from user mode to supervisor mode.

A software interrupt occurs when the processor executes an INT instruction that is
written in the program; it is typically used to invoke a system service. A processor
interrupt is caused by an electrical signal on a processor pin and is typically used by
devices to tell a driver that they require attention. The clock tick interrupt is very
common; it wakes up the processor from a halt state and allows the scheduler to pick
other work to perform.

A processor fault, such as an access violation, is triggered by the processor itself when it
encounters a condition that prevents it from executing code, typically when it tries to
read or write unmapped memory or encounters an invalid instruction.

CISC Characteristics
A computer with a large number of instructions is called a complex instruction set
computer, or CISC. Complex instruction set computers are mostly used in scientific
computing applications requiring lots of floating-point arithmetic.
• A large number of instructions - typically from 100 to 250 instructions.
• Some instructions that perform specialized tasks and are used infrequently.
• A large variety of addressing modes - typically 5 to 20 different modes.
• Variable-length instruction formats.
• Instructions that manipulate operands in memory.

RISC Characteristics
A computer with few instructions and simple construction is called a reduced
instruction set computer, or RISC. RISC architecture is simple and efficient. The major
characteristics of RISC architecture are:
• Relatively few instructions
• Relatively few addressing modes
• Memory access limited to load and store instructions
• All operations are done within the registers of the CPU
• Fixed-length and easily decoded instruction format
• Single-cycle instruction execution
• Hardwired rather than microprogrammed control

Example of RISC & CISC


Examples of CISC instruction set architectures are PDP-11, VAX, Motorola 68k,
and Intel's x86 architecture, on which desktop PCs are based.
Examples of RISC families include DEC Alpha, AMD 29k, ARC, Atmel AVR,
Blackfin, Intel i860 and i960, MIPS, Motorola 88000, PA-RISC, Power (including PowerPC),
SuperH, SPARC and ARM.

Which one is better?

Neither RISC nor CISC can be declared better in general, because each suits its own specific
applications. What counts is how fast a chip can execute the instructions it is given and how well it
runs existing software. Today, both RISC and CISC manufacturers are doing everything to get an
edge on the competition.

http://www.laureateiit.com/projects/bacii2014/projects/coa_anil/i_o_interface.html

UNIT – III (12 Lectures)

MICRO-PROGRAMMED CONTROL: Control memory, address sequencing, micro-program
example, design of control unit.
Book: M. Morris Mano (2006), Computer System Architecture, 3rd edition, Pearson/PHI,
India: Unit-7 Pages: 213-238

COMPUTER ARITHMETIC: Addition and subtraction, multiplication and division algorithms,
floating-point arithmetic operation, decimal arithmetic unit, decimal arithmetic operations.
Book: M. Morris Mano (2006), Computer System Architecture, 3rd edition, Pearson/PHI,
India: Unit-10 Pages: 333-380

Control Memory:
Control memory is a random access memory (RAM) consisting of addressable storage
registers. It is primarily used in mini and mainframe computers, where it serves as temporary
storage for data. Access to control memory data requires less time than access to main memory; this
speeds up CPU operation by reducing the number of memory references for data storage and
retrieval. Access is performed as part of a control section sequence while the master clock
oscillator is running. The control memory addresses are divided into two groups: a task mode
and an executive (interrupt) mode.

Words stored in control memory are addressed via the address select logic for each of
the register groups. There can be up to five register groups in control memory. These groups
select a register for fetching data for programmed CPU operation, or for display or storage
of data via the maintenance console or equivalent. During
programmed CPU operations, these registers are accessed directly by the CPU logic. Data
routing circuits are used by control memory to interconnect the registers used in control
memory. The registers contained in a control memory that operate in the task and
executive modes include accumulators, index registers, monitor clock status-indicating
registers, and interrupt data registers.

• The control unit in a digital computer initiates sequences of micro operations
• The complexity of the digital system is derived from the number of sequences that are
performed
• When the control signals are generated by hardware, the control unit is hardwired
• In a bus-oriented system, the control signals that specify micro operations are groups of bits
that select the paths in multiplexers, decoders, and ALUs.
• The control unit initiates a series of sequential steps of micro operations
• The control variables can be represented by a string of 1’s and 0’s called a control word
• A micro programmed control unit is a control unit whose binary control variables are stored
in memory
• A sequence of microinstructions constitutes a micro program
• The control memory can be a read-only memory
• Dynamic microprogramming permits a micro program to be loaded and uses a writable
control memory
• A computer with a micro programmed control unit will have two separate memories: a
main memory and a control memory
• The micro program consists of microinstructions that specify various internal control
signals for execution of register micro operations
• These microinstructions generate the micro operations to:
– fetch the instruction from main memory
– evaluate the effective address
– execute the operation
– return control to the fetch phase for the next instruction

• The control memory address register specifies the address of the microinstruction
• The control data register holds the microinstruction read from memory
• The microinstruction contains a control word that specifies one or more micro operations
for the data processor
• The location for the next micro instruction may, or may not be the next in sequence
• Some bits of the present micro instruction control the generation of the address of the next
micro instruction
• The next address may also be a function of external input conditions
• While the micro operations are being executed, the next address is computed in the next
address generator circuit (sequencer) and then transferred into the CAR to read the next
micro instructions

• Typical functions of a sequencer are:
– incrementing the CAR by one
– loading into the CAR an address from control memory
– transferring an external address
– loading an initial address to start the control operations
• A clock is applied to the CAR and the control word and next-address information are taken
directly from the control memory
• The address value is the input for the ROM and the control word is the output
• Unlike a RAM, the ROM requires no read signal
• The main advantage of the micro programmed control is that once the
hardware configuration is established, there should be no need for h/w or wiring changes
• To establish a different control sequence, specify a different set of microinstructions for
control memory

Address Sequencing:
Each machine instruction is executed through the application of a sequence of
microinstructions. Clearly, we must be able to sequence these; the collection of
microinstructions which implements a particular machine instruction is called a routine.

The microprogrammed control unit (MCU) typically determines the address of the first
microinstruction which implements a machine instruction based on that instruction's opcode.
Upon machine power-up, the control address register (CAR) should contain the address of the
first microinstruction to be executed.
The MCU must be able to execute microinstructions sequentially (e.g., within routines), but
must also be able to ``branch'' to other microinstructions as required; hence, the need for a
sequencer.

The microinstructions executed in sequence can be found sequentially in the control memory
(CM), or can be found by branching to another location within the CM. Sequential retrieval of
microinstructions can be done by simply incrementing the current CAR contents; branching
requires determining the address of the desired control word (CW), and loading that into the CAR.

CAR: Control Address Register
control ROM: control memory (CM); holds control words (CWs)
opcode: opcode field from machine instruction
mapping logic: hardware which maps opcode into microinstruction address
branch logic: determines how the next CAR value will be determined from all the various possibilities
multiplexors: implement the choice made by the branch logic for the next CAR value
incrementer: generates CAR + 1 as a possible next CAR value
SBR: subroutine register; used to hold return address for subroutine-call branch operations

Conditional branches are necessary in the micro program. We must be able to perform
some sequences of micro-ops only when certain situations or conditions exist (e.g., for
conditional branching at the machine instruction level); to implement these, we need to be
able to conditionally execute or avoid certain microinstructions within routines.

Subroutine branches are helpful to have at the micro program level. Many routines
contain identical sequences of microinstructions; putting them into subroutines allows those
routines to be shorter, thus saving memory. Mapping of opcodes to microinstruction
addresses can be done very simply. When the CM is designed, a ``required'' length is
determined for the machine instruction routines (i.e., the length of the longest one). This is
rounded up to the next power of 2, yielding a value k such that 2^k microinstructions will be
sufficient to implement any routine.

The first instruction of each routine will be located in the CM at multiples of this
``required'' length. Say this is N. The first routine is at 0; the next, at N; the next, at 2*N; etc.
This can be accomplished very easily. For instance, with a four-bit opcode and routine length
of four microinstructions, k is two; generate the microinstruction address by appending two
zero bits to the opcode:
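
The mapping just described amounts to shifting the opcode left by k bits. The following is a minimal sketch, assuming the four-bit opcode and k = 2 of the example; the function name routine_start is illustrative only.

```python
def routine_start(opcode, k=2):
    """Return the CM address of the first microinstruction for an opcode.

    Appending k zero bits to the opcode is the same as multiplying it by 2**k.
    """
    return opcode << k


# Print the opcode next to the six-bit CM address it maps to.
for opcode in (0b0000, 0b0001, 0b1011):
    print(f"{opcode:04b} -> {routine_start(opcode):06b}")
```

With k = 2, consecutive opcodes land four microinstructions apart, which is the routine length N used above.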

Alternately, the n-bit opcode value can be used as the ``address'' input of a 2^n x M ROM; the
contents of the selected ``word'' in the ROM will be the desired M-bit CAR address for the
beginning of the routine implementing that instruction. (This technique allows for
variable-length routines in the CM.)

We choose between all the possible ways of generating CAR values by feeding them all into a
multiplexor bank, and implementing special branch logic which will determine how the muxes
will pass on the next address to the CAR.

As there are four possible ways of determining the next address, the multiplexor bank
is made up of N 4x1 muxes, where N is the number of bits in the address of a CW. The branch
logic is used to determine which of the four possible ``next address'' values is to be passed on
to the CAR; its two output lines are the select inputs for the muxes.
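
The selection can be pictured as a 4-to-1 choice driven by two select bits. The sketch below is illustrative only. The text does not list the four sources, so it assumes they are the incrementer output, a branch address taken from the microinstruction, the return address held in SBR, and the address produced by the mapping logic. The select encoding is likewise an assumption.

```python
def next_car(select, car, branch_addr, mapped_addr, sbr):
    """Model the multiplexor bank: pick the next CAR value from four inputs.

    select is the two-bit output of the branch logic (encoding assumed here).
    """
    inputs = [
        car + 1,      # 00: sequential fetch via the incrementer
        branch_addr,  # 01: branch (or subroutine call) target
        sbr,          # 10: return from subroutine
        mapped_addr,  # 11: start of the routine chosen by the opcode mapping
    ]
    return inputs[select]
```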

Addition and Subtraction


The four basic computer arithmetic operations are addition, subtraction, multiplication and
division. The arithmetic operations in a digital computer manipulate data to produce
results. Arithmetic procedures and circuits must be designed to carry out these operations
using algorithms. An algorithm is a solution to a problem stated as a finite number of
well-defined procedural steps. Algorithms can be developed for the following types of data.
1. Fixed point binary data in signed magnitude representation
2. Fixed point binary data in signed 2’s complement representation.
3. Floating point representation
4. Binary Coded Decimal (BCD) data

Addition and Subtraction with signed magnitude


Consider two numbers having magnitudes A and B. When the signed numbers are added or
subtracted, there can be 8 different conditions depending on the signs and the operation
performed, as shown in the table below:
Operation      Add magnitudes   Subtract magnitudes
                                When A > B   When A < B   When A = B
(+A) + (+B)    +(A + B)         --           --           --
(+A) + (-B)    --               +(A - B)     -(B - A)     +(A - B)
(-A) + (+B)    --               -(A - B)     +(B - A)     +(A - B)
(-A) + (-B)    -(A + B)         --           --           --
(+A) - (+B)    --               +(A - B)     -(B - A)     +(A - B)
(+A) - (-B)    +(A + B)         --           --           --
(-A) - (+B)    -(A + B)         --           --           --
(-A) - (-B)    --               -(A - B)     +(B - A)     +(A - B)
From the table, we can derive an algorithm for addition and subtraction as follows (the words
in parentheses apply to subtraction):
Addition (Subtraction) Algorithm:
• When the signs of A and B are identical (different), add the two magnitudes and attach the
sign of A to the result.
• When the signs of A and B are different (identical), compare the magnitudes and subtract the
smaller number from the larger. Choose the sign of the result to be the same as A if A > B, or
the complement of the sign of A if A < B. If the two magnitudes are equal, subtract B from A
and make the sign of the result positive.
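
The algorithm above can be sketched in a few lines. This is a minimal illustration, not the hardware design: signs are assumed to be encoded as 0 for plus and 1 for minus, magnitudes are unbounded Python integers (so register overflow is not modelled), and the name sm_add_sub is hypothetical.

```python
def sm_add_sub(a_sign, a_mag, b_sign, b_mag, subtract=False):
    """Add or subtract two signed-magnitude numbers.

    Signs are 0 (plus) or 1 (minus); magnitudes are non-negative integers.
    Returns (sign, magnitude) of the result.
    """
    # Subtraction is addition with the sign of B inverted.
    if subtract:
        b_sign ^= 1
    if a_sign == b_sign:
        # Identical signs: add the magnitudes and keep the sign of A.
        return a_sign, a_mag + b_mag
    if a_mag > b_mag:
        return a_sign, a_mag - b_mag
    if a_mag < b_mag:
        # The result takes the complement of the sign of A.
        return a_sign ^ 1, b_mag - a_mag
    # Equal magnitudes: the result is +0, never -0.
    return 0, 0
```

For example, sm_add_sub(0, 3, 0, 5, subtract=True) evaluates (+3) - (+5) and returns (1, 2), that is, -2.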

Hardware Implementation

fig: Hardware for signed magnitude addition and subtraction

The hardware consists of two registers A and B to store the magnitudes, and two flip-flops As
and Bs to store the corresponding signs. The result can be stored in register A and flip-flop
As, which together act as an accumulator. Subtraction is performed by adding A to the 2’s
complement of B. The output carry is transferred to the flip-flop E. An overflow that occurs
during the add operation is stored in the flip-flop AVF. When m = 0, the contents of B are
transferred to the adder without any change, along with an input carry of 0.

The output of the parallel adder is equal to A + B, which is an add operation. When m = 1,
the content of register B is complemented and transferred to the parallel adder along with an
input carry of 1. Therefore, the output of the parallel adder is equal to A + B’ + 1 = A – B,
which is a subtract operation.
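
The effect of the mode bit m can be checked with a small model of the complementer and parallel adder. This is a sketch under assumptions: the register width n = 5 and the function name add_sub are not from the text.

```python
def add_sub(a, b, m, n=5):
    """n-bit parallel adder with an XOR complementer; m = 0 adds, m = 1 subtracts.

    Returns (e, s): the output carry E and the n-bit adder output.
    """
    mask = (1 << n) - 1
    b_in = b ^ (mask if m else 0)   # XOR gates pass B (m = 0) or B' (m = 1)
    total = a + b_in + m            # m also serves as the input carry
    return total >> n, total & mask
```

Subtracting 6 from 11 gives E = 1 and the magnitude 5. Subtracting 11 from 6 gives E = 0 and 27, which is the 5-bit 2’s complement of 5.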

Hardware Algorithm

fig: flowchart for add and subtract operations

As and Bs are compared by an exclusive-OR gate. If the output is 0, the signs are identical; if
it is 1, the signs are different.
• For an add operation, identical signs dictate that the magnitudes be added; for a subtract
operation, different signs dictate that the magnitudes be added. Magnitudes are added with
the micro operation EA ← A + B.
• The two magnitudes are subtracted if the signs are different for an add operation or
identical for a subtract operation. Magnitudes are subtracted with the micro operation
EA ← A + B’ + 1. E = 1 indicates A >= B, and the number in A is the correct result (this number
is checked again for 0 to make a positive 0 [As = 0]). E = 0 indicates A < B, so we take the 2’s
complement of A and complement the sign As.

Multiplication
Hardware Implementation and Algorithm
Generally, the multiplication of two fixed point binary numbers in signed magnitude
representation is performed by a process of successive shift and add operations. The process
consists of looking at the successive bits of the multiplier (least significant bit first). If the
multiplier bit is 1, the multiplicand is copied down; otherwise, 0’s are copied. The numbers
copied down in successive lines are shifted one position to the left, and finally all the numbers
are added to get the product.
But in digital computers, an adder for the summation of only two binary numbers is used, and
the partial product is accumulated in a register. Similarly, instead of shifting the
multiplicand to the left, the partial product is shifted to the right. The hardware for the
multiplication of signed magnitude data is shown in the figure below.

Hardware for multiply operation


Initially, the multiplier is stored in the Q register and the multiplicand in the B register. The
A register is used to store the partial product, and the sequence counter (SC) is set to a number
equal to the number of bits in the multiplier. The sum of A and B forms the partial product, and
E, A and Q are then shifted to the right together using the statement “shr EAQ”, as shown in the
hardware algorithm. The flip-flops As, Bs & Qs store the signs of A, B & Q respectively. A
binary 0 is inserted into the flip-flop E during the shift right.
Hardware Algorithm

flowchart for multiply algorithm

Example: Multiply 23 by 19 using the multiply algorithm.

Multiplicand B = 10111 (23)     E   A       Q       SC
Multiplier in Q (19)            0   00000   10011   101 (5)
Iteration 1 (Qn = 1): add B         10111
  First partial product         0   10111
  shr EAQ                       0   01011   11001   100 (4)
Iteration 2 (Qn = 1): add B         10111
  Second partial product        1   00010
  shr EAQ                       0   10001   01100   011 (3)
Iteration 3 (Qn = 0): shr EAQ   0   01000   10110   010 (2)
Iteration 4 (Qn = 0): shr EAQ   0   00100   01011   001 (1)
Iteration 5 (Qn = 1): add B         10111
  Fifth partial product         0   11011
  shr EAQ                       0   01101   10101   000 (0)
Final product in AQ = 0110110101

The final product is in registers A and Q. Therefore, the product is 0110110101 (437 in decimal).
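
The register transfers of this example can be replayed in software. The following is a minimal sketch of the shift-and-add procedure for magnitudes only (the sign flip-flops As, Bs and Qs are left out); the function name and the default width n = 5 are assumptions made for illustration.

```python
def multiply_magnitudes(b, q, n=5):
    """Shift-and-add multiplication of two n-bit magnitudes.

    b is the multiplicand (register B), q the multiplier (register Q).
    Returns the 2n-bit product held in the register pair AQ.
    """
    mask = (1 << n) - 1
    e, a = 0, 0
    for _ in range(n):                  # SC counts the n iterations
        if q & 1:                       # Qn = 1: EA <- A + B
            total = a + b
            e, a = total >> n, total & mask
        # shr EAQ: shift E, A and Q right as one register; E becomes 0
        eaq = ((e << (2 * n)) | (a << n) | q) >> 1
        e, a, q = 0, (eaq >> n) & mask, eaq & mask
    return (a << n) | q


product = multiply_magnitudes(0b10111, 0b10011)
print(f"{product:010b}")  # prints 0110110101
```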

Booth Algorithm
The algorithm that is used to multiply binary integers in signed 2’s complement form is called
the Booth multiplication algorithm. It works on the principle that a string of 0’s in the
multiplier requires no addition but just shifting, and a string of 1’s in the multiplier from bit
weight 2^k to 2^m can be treated as 2^(k+1) – 2^m (for example, +14 = 001110 has a string of 1’s
from 2^3 to 2^1, so 2^(3+1) – 2^1 = 16 – 2 = 14). The product M x 14 can therefore be obtained by
shifting the binary multiplicand M four times to the left and subtracting M shifted left once.

According to the Booth algorithm, the rules for multiplication of binary integers in signed 2’s
complement form are:
• The multiplicand is subtracted from the partial product upon encountering the first least
significant 1 in a string of 1’s in the multiplier.
• The multiplicand is added to the partial product upon encountering the first 0 (provided
that there was a previous 1) in a string of 0’s in the multiplier.
• The partial product doesn’t change when the multiplier bit is identical to the previous
multiplier bit.
This algorithm is used for both positive and negative numbers in signed 2’s complement
form. The hardware implementation of this algorithm is in the figure below:

The flowchart for booth multiplication algorithm is given below:

flowchart for booth multiplication algorithm

Numerical Example: Booth algorithm


BR=10111(Multiplicand)
QR=10011(Multiplier)
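
Since the worked table for this example is not reproduced here, the run can be replayed with a short sketch of the Booth procedure. The register names AC, QR and the extra bit Q(n+1) follow the usual presentation of the algorithm and are assumptions as far as this text is concerned, as is the function name. With 5-bit registers, BR = 10111 is -9 and QR = 10011 is -13.

```python
def booth_multiply(multiplicand, multiplier, n=5):
    """Multiply two n-bit signed 2's complement integers with Booth's rules."""
    mask = (1 << n) - 1
    br = multiplicand & mask            # BR holds the multiplicand
    ac, qr, q_extra = 0, multiplier & mask, 0
    for _ in range(n):
        qn = qr & 1
        if qn == 1 and q_extra == 0:    # first 1 in a string of 1's: subtract
            ac = (ac - br) & mask
        elif qn == 0 and q_extra == 1:  # first 0 after a string of 1's: add
            ac = (ac + br) & mask
        # Arithmetic shift right of AC, QR and Q(n+1) as one register.
        q_extra = qn
        qr = (qr >> 1) | ((ac & 1) << (n - 1))
        ac = (ac >> 1) | (ac & (1 << (n - 1)))   # keep the sign bit of AC
    product = (ac << n) | qr
    # Interpret the 2n-bit result as a signed value.
    if product >> (2 * n - 1):
        product -= 1 << (2 * n)
    return product


print(booth_multiply(-9, -13))  # prints 117
```

The sketch does not guard against the most negative multiplicand (-16 for n = 5), whose negation does not fit in n bits.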
Array Multiplier
The multiplication algorithm first checks the bits of the multiplier one at a time and forms
partial products. This is a sequential process that requires a sequence of add and shift micro
operations, which is time consuming. The multiplication of 2 binary numbers can also be done
with one micro operation by using a combinational circuit that provides the product all at once.
Example.
Consider that the multiplicand bits are b1 and b0 and the multiplier bits are a1 and a0. The
product is c3c2c1c0. The multiplication of two bits such as a0 and b0 produces a binary 1 if
both bits are 1; otherwise it produces a binary 0. This is identical to the AND operation and
can be implemented with AND gates as shown in the figure.

2-bit by 2-bit array multiplier
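
The gate-level behaviour of such a circuit can be sketched as follows. The AND gates come from the description above. The two half adders that sum the partial-product bits are the usual arrangement and are assumed here, since the figure is not reproduced.

```python
def array_multiply_2x2(a1, a0, b1, b0):
    """Combinational 2-bit by 2-bit multiplier: AND gates plus two half adders.

    Returns the product bits (c3, c2, c1, c0).
    """
    c0 = a0 & b0
    # First half adder sums the two middle partial-product bits.
    p, r = a1 & b0, a0 & b1
    c1, carry = p ^ r, p & r
    # Second half adder adds that carry to the top partial-product bit.
    t = a1 & b1
    c2, c3 = t ^ carry, t & carry
    return c3, c2, c1, c0
```

For instance, array_multiply_2x2(1, 1, 1, 1) returns (1, 0, 0, 1), the binary form of 3 x 3 = 9.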

Division Algorithm
The division of two fixed point signed numbers can be done by a process of successive
compare, shift and subtract operations. When it is implemented in digital computers, instead
of shifting the divisor to the right, the dividend or the partial remainder is shifted to the
left. The subtraction can be obtained by adding the number A to the 2’s complement of number B.
The information about the relative magnitudes of the numbers can be obtained from the end carry.
Hardware Implementation
The hardware implementation for the division of signed numbers is shown in the figure.

Division Algorithm
The divisor is stored in register B and a double length dividend is stored in registers A and Q.
The dividend is shifted to the left and the divisor is subtracted by adding its 2’s complement.
If E = 1, then A >= B. In this case, a quotient bit 1 is inserted into Qn and the partial
remainder is shifted to the left to repeat the process. If E = 0, then A < B. In this case, the
quotient bit Qn remains zero and the value of B is added to restore the partial remainder in A
to its previous value. The partial remainder is shifted to the left and the process continues
until the sequence counter reaches 0. The registers E, A & Q are shifted to the left with 0
inserted into Qn, and the previous value of E is lost, as shown in the flow chart for the
division algorithm.

flowchart for division algorithm


This algorithm can be explained with the help of an example.
Consider that the divisor is 10001 and the dividend is 01110.

binary division with digital hardware
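
The restoring procedure can be replayed with a short sketch. It assumes 5-bit registers and, following the double-length convention described above, that the dividend 01110 sits in A with Q initially 00000 (so AQ = 0111000000). The function name is illustrative, and the overflow check mirrors the divide-overflow condition rather than any particular flip-flop logic.

```python
def restoring_divide(dividend, divisor, n=5):
    """Restoring division of a 2n-bit dividend (in AQ) by an n-bit divisor (in B).

    Returns (quotient, remainder) as left in Q and A.
    """
    mask = (1 << n) - 1
    a, q, b = dividend >> n, dividend & mask, divisor
    if a >= b:
        raise OverflowError("divide overflow: quotient needs more than n bits")
    for _ in range(n):
        # shl EAQ: the bit shifted out of A goes to E, 0 enters Qn.
        e = a >> (n - 1)
        a = ((a << 1) | (q >> (n - 1))) & mask
        q = (q << 1) & mask
        if e == 1:
            a = (a - b) & mask          # EA is certainly >= B
            q |= 1
        else:
            total = a + (b ^ mask) + 1  # A + B' + 1
            e, a = total >> n, total & mask
            if e == 1:
                q |= 1                  # A >= B: quotient bit is 1
            else:
                a = (a + b) & mask      # A < B: restore the partial remainder
    return q, a


print(restoring_divide(0b0111000000, 0b10001))  # prints (26, 6)
```

The result corresponds to quotient 11010 and remainder 00110, that is, 448 divided by 17.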


Restoring method
The method described above is the restoring method, in which the partial remainder is restored
by adding the divisor to the negative result. Other methods:
Comparison method: A and B are compared prior to subtraction. If A >= B, B is subtracted from
A; if A < B, nothing is done. The partial remainder is then shifted left and the numbers are
compared again. The comparison inspects the end carry out of the parallel adder before
transferring it to E.
Non-restoring method: In contrast to the restoring method, when A - B is negative, B is not
added to restore A; instead, the negative difference is shifted left and then B is added. Why
does this work?
• In the flowchart for the restoring method, when A < B, we restore A by the operation
A - B + B. Next time around the loop, this number is shifted left (multiplied by 2) and B is
subtracted again, which gives 2 (A - B + B) – B = 2 A - B.
• In the non-restoring method, we leave A - B as it is. Next time around the loop, the number is
shifted left and B is added: 2 (A - B) + B = 2 A - B (same as above).
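
The argument above can be checked with a small sketch of the non-restoring loop. It is written with ordinary signed Python integers instead of fixed-width registers, which is a simplification. The function name and the final correction step (adding B once if the last partial remainder is negative) are assumptions of this sketch.

```python
def nonrestoring_divide(dividend, divisor, n=5):
    """Non-restoring division of a 2n-bit dividend by an n-bit divisor.

    Returns (quotient, remainder).
    """
    mask = (1 << n) - 1
    a, q = dividend >> n, dividend & mask
    if a >= divisor:
        raise OverflowError("divide overflow")
    for _ in range(n):
        was_negative = a < 0
        # Shift AQ left; the signed integer a keeps track of a negative remainder.
        a = 2 * a + (q >> (n - 1))
        q = (q << 1) & mask
        # Add B after a negative result, otherwise subtract B.
        a = a + divisor if was_negative else a - divisor
        if a >= 0:
            q |= 1
    if a < 0:
        a += divisor                    # final correction of the remainder
    return q, a
```

For the operands of the earlier example, nonrestoring_divide(0b0111000000, 0b10001) gives (26, 6), the same quotient and remainder as the restoring method.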

Divide Overflow
The division algorithm may produce a quotient overflow, called divide overflow. The overflow
occurs if the number of bits in the quotient is more than the storage capacity of the register.
The overflow flip-flop DVF is set to 1 if the overflow occurs.
Divide overflow occurs if the value of the most significant half of the dividend is equal to or
greater than the value of the divisor. Similarly, the overflow occurs if the dividend is
divided by 0. The overflow may cause an error in the result, or sometimes it may stop the
operation. When the overflow stops the operation of the system, it is called a divide stop.

Arithmetic Operations on Floating-Point Numbers


The rules given here apply to the single-precision IEEE standard format. These rules
specify only the major steps needed to perform the four operations. Intermediate
results for both mantissas and exponents might require more than 24 and 8 bits,
respectively, and an overflow or an underflow may occur. These and other aspects of the
operations must be carefully considered in designing an arithmetic unit that meets the
standard. If their exponents differ, the mantissas of floating-point numbers must be
shifted with respect to each other before they are added or subtracted. Consider a
decimal example in which we wish to add 2.9400 x 10^2 to 4.3100 x 10^4. We rewrite
2.9400 x 10^2 as 0.0294 x 10^4 and then perform addition of the mantissas to get
4.3394 x 10^4. The rule for addition and subtraction can be stated as follows:

Add/Subtract Rule

The steps in addition (FA) or subtraction (FS) of floating-point numbers (s1, e1, f1) and
(s2, e2, f2) are as follows.

1. Unpack sign, exponent, and fraction fields. Handle special operands such as zero,
infinity, or NaN (not a number).
2. Shift the significand of the number with the smaller exponent right by |e1 - e2| bits.
3. Set the result exponent er to max(e1, e2).
4. If the instruction is FA and s1 = s2, or if the instruction is FS and s1 ≠ s2, then add the
significands; otherwise subtract them.
5. Count the number z of leading zeros. A carry can make z = -1. Shift the result
significand left z bits, or right 1 bit if z = -1.
6. Round the result significand, and shift right and adjust z if there is rounding overflow,
which is a carry-out of the leftmost digit upon rounding.
7. Adjust the result exponent by er = er - z, check for overflow or underflow, and pack
the result sign, biased exponent, and fraction bits into the result word.
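
Steps 2 to 5 and 7 can be illustrated on already unpacked operands. The sketch below is a simplification and not an IEEE-conformant unit: exponents are unbiased integers, the 24-bit significand is held as an integer with its leading 1 in the top bit, rounding (step 6) is replaced by truncation, and infinities, NaN, overflow and underflow are ignored. The tuple layout and all names are assumptions made for illustration.

```python
def fp_add_sub(x, y, subtract=False, bits=24):
    """Add or subtract two unpacked values (sign, exponent, significand).

    A value equals (-1)**sign * significand * 2**(exponent - (bits - 1)).
    Zero is represented as (0, 0, 0).
    """
    (s1, e1, m1), (s2, e2, m2) = x, y
    if subtract:
        s2 ^= 1                          # FS is FA with the second sign flipped
    if m2 == 0:
        return (s1, e1, m1)
    if m1 == 0:
        return (s2, e2, m2)
    # Step 2: shift the significand with the smaller exponent right.
    if e1 >= e2:
        m2 >>= e1 - e2
    else:
        m1 >>= e2 - e1
    er = max(e1, e2)                     # step 3
    # Step 4: add if the signs agree, otherwise subtract the smaller magnitude.
    if s1 == s2:
        sr, mr = s1, m1 + m2
    elif m1 >= m2:
        sr, mr = s1, m1 - m2
    else:
        sr, mr = s2, m2 - m1
    if mr == 0:
        return (0, 0, 0)
    # Step 5: z leading zeros; a carry out of the top bit gives z = -1.
    z = bits - mr.bit_length()
    mr = mr << z if z >= 0 else mr >> 1
    return (sr, er - z, mr)              # step 7: er = er - z


three = (0, 1, 0xC00000)   # 1.5 x 2^1
one = (0, 0, 0x800000)     # 1.0 x 2^0
print(fp_add_sub(three, one) == (0, 2, 0x800000))  # 4.0 = 1.0 x 2^2
```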

Multiplication and division are somewhat easier than addition and subtraction, in that
no alignment of mantissas is needed.
BCD Adder:

A BCD adder is a 4-bit binary adder that is capable of adding two 4-bit words having a BCD
(binary-coded decimal) format. The result of the addition is a BCD-format 4-bit output word,
representing the decimal sum of the addend and augend, and a carry that is generated if this
sum exceeds a decimal value of 9. Decimal addition is thus possible using these devices.
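
A one-digit BCD addition can be sketched as below. The text does not say how the output is corrected. The sketch assumes the common technique of adding 6 whenever the binary sum exceeds 9, and the function name is illustrative.

```python
def bcd_add_digit(a, b, carry_in=0):
    """Add two BCD digits (0-9); return (carry_out, sum_digit)."""
    total = a + b + carry_in        # plain binary addition
    if total > 9:                   # not a valid BCD digit
        total += 6                  # skip the six unused codes 1010-1111
        return 1, total & 0xF
    return 0, total
```

For example, bcd_add_digit(7, 5) returns (1, 2), the decimal sum 12.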

UNIT – IV (10 Lectures)


THE MEMORY SYSTEM: Basic concepts, semiconductor RAM, types of read-only memory
(ROM), cache memory, performance considerations, virtual memory, secondary storage, RAID,
direct memory access (DMA).
Book: Carl Hamacher, Zvonko Vranesic, Safwat Zaky (2002), Computer Organization, 5th
edition, McGraw Hill: Unit-5 Pages: 292-366

BASIC CONCEPTS OF MEMORY SYSTEM


The maximum size of the Main Memory (MM) that can be used in any computer is
determined by its addressing scheme. For example, a 16-bit computer that generates 16-bit
addresses is capable of addressing up to 2^16 = 64K memory locations. If a machine generates
32-bit addresses, it can access up to 2^32 = 4G memory locations. This number represents the
size of the address space of the computer.

If the smallest addressable unit of information is a memory word, the machine is called
word-addressable. If individual memory bytes are assigned distinct addresses, the computer
is called byte-addressable. Most commercial machines are byte-addressable. For
example, in a byte-addressable 32-bit computer, each memory word contains 4 bytes. A
possible word-address assignment would be:
Word Address    Byte Address
0               0    1    2    3
4               4    5    6    7
8               8    9    10   11
...             ...
With the above structure, a READ or WRITE may involve an entire memory word or only a byte.
In the case of a byte read, the other bytes of the word can also be read but are ignored by the
CPU. However, during a write cycle, the control circuitry of the MM must ensure that only the
specified byte is altered. With 32-bit addresses, the higher-order 30 bits specify the word and
the lower-order 2 bits specify the byte within the word.
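
The split of a byte address into a word part and a byte part is a pair of bit operations. The following is a minimal sketch for the 4-byte word of the example; the function name is illustrative.

```python
def split_byte_address(address):
    """Split a byte address into (word_address, byte_within_word) for 4-byte words."""
    byte_in_word = address & 0b11       # lower-order 2 bits
    word_address = address & ~0b11      # higher-order 30 bits, low bits cleared
    return word_address, byte_in_word
```

Byte address 9, for instance, lies in the word at address 8 and is byte 1 within it, which matches the table above.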

CPU-Main Memory Connection – A block schematic: -


From the system standpoint, the Main Memory (MM) unit can be viewed as a “black box”.
Data transfer between CPU and MM takes place through the use of two CPU registers, usually
called MAR (Memory Address Register) and MDR (Memory Data Register). If MAR is k bits
long and MDR is n bits long, then the MM unit may contain up to 2^k addressable locations,
each n bits wide, and the word length is equal to n bits. During a “memory cycle”, n bits of
data may be transferred between the MM and CPU.

This transfer takes place over the processor bus, which has k address lines (address
bus), n data lines (data bus) and control lines like Read, Write, Memory Function Completed
(MFC), byte specifiers etc. (control bus). For a read operation, the CPU loads the address into
MAR, sets READ to 1 and sets other control signals if required. The data from the MM is loaded
into MDR and MFC is set to 1. For a write operation, MAR and MDR are suitably loaded by the
CPU, WRITE is set to 1 and other control signals are set suitably. The MM control circuitry
loads the data into the appropriate location and sets MFC to 1. This organization is shown in
the following block schematic.

fig: CPU (MAR, MDR) connected to the Main Memory (up to 2^k addressable locations, word
length = n bits) by the address bus (k bits), data bus (n bits) and control bus (Read, Write,
MFC, Byte Specifier etc)

Some Basic Concepts


Memory Access Times: - It is a useful measure of the speed of the memory unit. It is the time
that elapses between the initiation of an operation and the completion of that operation (for
example, the time between READ and MFC).
Memory Cycle Time :- It is an important measure of the memory system. It is the minimum
time delay required between the initiations of two successive memory operations (for
example, the time between two successive READ operations). The cycle time is usually
slightly longer than the access time.

Random Access Memory (RAM):

A memory unit is called a Random Access Memory if any location can be accessed for a READ
or WRITE operation in some fixed amount of time that is independent of the location’s
address. Main memory units are of this type. This distinguishes them from serial or partly
serial access storage devices such as magnetic tapes and disks which are used as the
secondary storage device.

Cache Memory:-
The CPU of a computer can usually process instructions and data faster than they can be
fetched from a comparably priced main memory unit. Thus the memory cycle time becomes the
bottleneck in the system. One way to reduce the memory access time is to use cache memory.
This is a small and fast memory that is inserted between the larger, slower main memory and
the CPU. It holds the currently active segments of a program and its data. Because of the
locality of address references, the CPU can, most of the time, find the relevant information in
the cache memory itself (cache hit) and only infrequently needs access to the main memory
(cache miss). With a suitable size of cache memory, cache hit rates of over 90% are possible,
leading to a cost-effective increase in the performance of the system.

Memory Interleaving: -
This technique divides the memory system into a number of memory modules and arranges
addressing so that successive words in the address space are placed in different modules.
When requests for memory access involve consecutive addresses, the access will be to
different modules. Since parallel access to these modules is possible, the
average rate of fetching words from the Main Memory can be increased.
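
One simple way to place successive words in different modules is to take the address modulo the number of modules. The sketch below assumes four modules and this low-order arrangement; neither is specified in the text, and the function name is illustrative.

```python
def interleave(address, num_modules=4):
    """Map a word address to (module, address_within_module)."""
    return address % num_modules, address // num_modules


# Consecutive addresses fall into modules 0, 1, 2, 3, 0, 1, ...
print([interleave(a)[0] for a in range(6)])  # prints [0, 1, 2, 3, 0, 1]
```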

Virtual Memory: -
In a virtual memory System, the address generated by the CPU is referred to as a virtual or
logical address. The corresponding physical address can be different and the required
mapping is implemented by a special memory control unit, often called the memory
management unit. The mapping function itself may be changed during program execution
according to system requirements.

A distinction is made between the logical (virtual) address space and the physical
address space: while the former can be as large as the addressing capability of the CPU, the
actual physical memory can be much smaller. Only the active portion of the virtual address
space is mapped onto the physical memory, and the rest of the virtual address space is mapped
onto the bulk storage device used. If the addressed information is in the Main Memory (MM),
it is accessed and execution proceeds.

Otherwise, an exception is generated, in response to which the memory management
unit transfers a contiguous block of words containing the desired word from the bulk storage
unit to the MM, displacing some block that is currently inactive. If the memory is managed in
such a way that such transfers are required relatively infrequently (i.e. the CPU will
generally find the required information in the MM), the virtual memory system can provide
reasonably good performance and succeed in creating the illusion of a large memory with a
small, inexpensive MM.
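
The mapping and the exception path can be pictured with a toy model. Everything specific in it is an assumption made for illustration: the block size of 1024 words, the class and method names, and the choice to displace the block that was loaded earliest (the text only says that some inactive block is displaced). No data is actually moved; only the address mapping is tracked.

```python
BLOCK_SIZE = 1024  # words per block (assumed)


class MemoryManagementUnit:
    """Toy MMU: maps virtual block numbers to physical block frames."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.table = {}      # virtual block -> physical frame, resident blocks only
        self.faults = 0

    def translate(self, virtual_address):
        block, offset = divmod(virtual_address, BLOCK_SIZE)
        if block not in self.table:
            self.faults += 1                     # exception: block is not in the MM
            if len(self.table) < self.num_frames:
                frame = len(self.table)          # a free frame is still available
            else:
                victim = next(iter(self.table))  # earliest-loaded resident block
                frame = self.table.pop(victim)
            self.table[block] = frame            # stands in for the block transfer
        return self.table[block] * BLOCK_SIZE + offset


mmu = MemoryManagementUnit(num_frames=2)
print(mmu.translate(5), mmu.translate(3 * BLOCK_SIZE + 1), mmu.faults)
```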

Internal Organization of Semiconductor Memory Chips:-


Memory chips are usually organized in the form of an array of cells, in which each cell is
capable of storing one bit of information. A row of cells constitutes a memory word, and the
cells of a row are connected to a common line referred to as the word line, and this line is
driven by the address decoder on the chip. The cells in each column are
connected to a sense/write circuit by two lines known as bit lines. The sense/write circuits
are connected to the data input/output lines of the chip. During a READ operation, the
Sense/Write circuits sense, or read, the information stored in the cells selected by a word line
and transmit this information to the output lines. During a write operation, they receive input
information and store it in the cells of the selected word.

The following figure shows such an organization of a memory chip consisting of 16 words of 8
bits each, which is usually referred to as a 16 x 8 organization.
The data input and the data output of each Sense/Write circuit are connected to a single
bi-directional data line in order to reduce the number of pins required. One control line, the
R/W (Read/Write) input, is used to specify the required operation, and another control line,
the CS (Chip Select) input, is used to select a given chip in a multichip memory system. This
circuit requires 14 external connections (4 address, 8 data and 2 control lines) and, allowing
2 pins for power supply and ground connections, can be manufactured in the form of a 16-pin
chip. It can store 16 x 8 = 128 bits. Another type of organization, for a 1k x 1 format, is
shown below:

The 10-bit address is divided into two groups of 5 bits each to form the row and column
addresses for the cell array. A row address selects a row of 32 cells, all of which are accessed
in parallel. One of these, selected by the column address, is connected to the external data
lines by the input and output multiplexers. This structure can store 1024 bits and can be
implemented in a 16-pin chip.
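
The division of the address into row and column parts is a shift and a mask. The text does not say which five bits form the row. The sketch below assumes the high-order five, and the function name is illustrative.

```python
def split_cell_address(address):
    """Split a 10-bit cell address into 5-bit (row, column) for a 32 x 32 array."""
    row = (address >> 5) & 0x1F     # assumed: high-order 5 bits select the row
    column = address & 0x1F         # low-order 5 bits select one cell in the row
    return row, column
```

For example, address 1010100011 selects row 10101 and column 00011.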

A Typical Memory Cell


Semiconductor memories may be divided into bipolar and MOS types. They may be compared
as follows:

Two transistor inverters are connected to implement a basic flip-flop. The cell is connected to
one word line and two bit lines as shown. Normally, the bit lines are kept at about 1.6V, and
the word line is kept at a slightly higher voltage of about 2.5V. Under these conditions, the
two diodes D1 and D2 are reverse biased. Thus, because no current flows through the diodes,
the cell is isolated from the bit lines.

Read Operation:
Let us assume that Q1 on and Q2 off represents a 1. To read the contents of a given cell, the
voltage on the corresponding word line is reduced from 2.5 V to approximately 0.3 V. This
causes one of the diodes D1 or D2 to become forward-biased, depending on whether the
transistor Q1 or Q2 is conducting. As a result, current flows from bit line b when the cell is
in the 1 state and from bit line b’ when the cell is in the 0 state. The Sense/Write circuit at
the end of each pair of bit lines monitors the current on lines b and b’ and sets the output bit
line accordingly.

Write Operation:
While a given row of bits is selected, that is, while the voltage on the corresponding word line
is 0.3V, the cells can be individually forced to either the 1 state by applying a positive voltage
of about 3V to line b’ or to the 0 state by driving line b. This function is performed by the
Sense/Write circuit.

MOS Memory Cell:


MOS technology is used extensively in Main Memory Units. As in the case of bipolar
memories, many MOS cell configurations are possible. The simplest of these is a flip-flop
circuit. Two transistors T1 and T2 are connected to implement a flip-flop. Active pull-up to
VCC is provided through T3 and T4. Transistors T5 and T6 act as switches that can be opened
or closed under control of the word line. For a read operation, when the cell is selected, T5 or
T6 is closed and the corresponding flow of current through b or b’ is sensed by the
sense/write circuits to set the output bit line accordingly. For a write operation, the bit is
selected and a positive voltage is applied on the appropriate bit line, to store a 0 or 1. This
configuration is shown below:

Static Memories Vs Dynamic Memories:-


Bipolar as well as MOS memory cells using a flip-flop like structure to store information can
maintain the information as long as current flow to the cell is maintained. Such memories are
called static memories. In contrast, dynamic memories require not only the maintaining of a
power supply, but also a periodic “refresh” to maintain the information stored in them.
Dynamic memories can have very high bit densities and much lower power consumption
relative to static memories and are thus generally used to realize the main memory unit.

Dynamic Memories:-
The basic idea of dynamic memory is that information is stored in the form of a charge on a
capacitor. An example of a dynamic memory cell is shown below:
When the transistor T is turned on and an appropriate voltage is applied to the bit line,
information is stored in the cell in the form of a known amount of charge stored on the
capacitor. After the transistor is turned off, the capacitor begins to discharge. This is caused
by the capacitor’s own leakage resistance and the very small amount of current that still flows
through the transistor. Hence the data is read correctly only if it is read before the charge on
the capacitor drops below some threshold value. During a Read operation, the bit line is placed
in a high-impedance state, the transistor is turned on and a sense circuit connected to the bit
line is used to determine whether the charge on the capacitor is above or below the threshold
value. During such a Read, the charge on the capacitor is restored to its original value, and
thus the cell is refreshed with every read operation.
Typical Organization of a Dynamic Memory Chip:-

The cells are organized in the form of a square array such that the high-and lower-order 8 bits
of the 16-bit address constitute the row and column addresses of a cell, respectively. In order
to reduce the number of pins needed for external connections, the row and column address
are multiplexed on 8 pins.

To access a cell, the row address is applied first. It is loaded into the row address latch
in response to a single pulse on the Row Address Strobe (RAS) input. This selects a row of
cells. Now, the column address is applied to the address pins and is loaded into the column
address latch under the control of the Column Address Strobe (CAS) input and this address
selects the appropriate sense/write circuit. If the R/W signal indicates a Read operation, the
output of the selected circuit is transferred to the data output, DO. For a write operation, the
data on the DI line is used to overwrite the cell selected.
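
The two-step addressing can be mimicked with a toy model of a 64K x 1 chip (256 rows of 256 cells, which follows from the 16-bit address and the single DI/DO lines). The class, its method names and the helper access are hypothetical; timing, refresh and charge leakage are not modelled.

```python
class DynamicMemoryChip:
    """Toy 64K x 1 dynamic memory with multiplexed row/column addressing."""

    def __init__(self):
        self.cells = [[0] * 256 for _ in range(256)]
        self.row_latch = 0
        self.column_latch = 0

    def ras(self, address_pins):
        """Row Address Strobe: latch the 8 address pins as the row address."""
        self.row_latch = address_pins & 0xFF

    def cas(self, address_pins, write=False, data_in=0):
        """Column Address Strobe: latch the column, then read or write one bit."""
        self.column_latch = address_pins & 0xFF
        if write:
            self.cells[self.row_latch][self.column_latch] = data_in & 1
        return self.cells[self.row_latch][self.column_latch]


def access(chip, address, write=False, data_in=0):
    """Full access: high-order 8 bits as the row, low-order 8 bits as the column."""
    chip.ras(address >> 8)
    return chip.cas(address & 0xFF, write, data_in)
```

Writing a 1 with access(chip, 0xABCD, write=True, data_in=1) leaves 0xAB in the row latch and 0xCD in the column latch.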

It is important to note that the application of a row address causes all the cells on the
corresponding row to be read and refreshed during both Read and Write operations. To
ensure that the contents of a dynamic memory are maintained, each row of cells must be
addressed periodically, typically once every two milliseconds. A Refresh circuit performs this
function. Some dynamic memory chips incorporate a refresh facility within the chips
themselves, and hence they appear as static memories to the user. Such chips are often
referred to as pseudostatic.
Another feature available on many dynamic memory chips is that once the row
address is loaded, successive locations can be accessed by loading only column addresses.

Such block transfers can be carried out typically at a rate that is double that for transfers
involving random addresses. Such a feature is useful when memory accesses follow a regular
pattern, for example, in a graphics terminal. Because of their high density and low cost,
dynamic memories are widely used in the main memory units of computers. Commercially
available chips range in size from 1k to 4M bits or more, and are available in various
organizations like 64k x 1, 16k x 4, 1M x 1 etc.

RAID (Redundant Array of Independent Disks)

RAID (redundant array of independent disks; originally redundant array of inexpensive
disks) provides a way of storing the same data in different places (thus, redundantly) on
multiple hard disks (though not all RAID levels provide redundancy). By placing data on
multiple disks, input/output (I/O) operations can overlap in a balanced way, improving
performance. Using multiple disks lowers the mean time between failures (MTBF) of the array
as a whole, so storing data redundantly is what provides fault tolerance.

RAID arrays appear to the operating system (OS) as a single logical hard disk. RAID
employs the technique of disk mirroring or disk striping, which involves partitioning each
drive's storage space into units ranging from a sector (512 bytes) up to several megabytes.
The stripes of all the disks are interleaved and addressed in order.

In a single-user system where large records, such as medical or other scientific images,
are stored, the stripes are typically set up to be small (perhaps 512 bytes) so that a single
record spans all disks and can be accessed quickly by reading all disks at the same time.
In a multi-user system, better performance requires establishing a stripe wide enough to hold
the typical or maximum size record. This allows overlapped disk I/O across drives.
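
How interleaved stripes spread a record over the drives can be seen with a short address calculation. This is a sketch under assumptions: four disks, a 512-byte stripe unit, and simple round-robin placement; the function name is illustrative.

```python
def locate_byte(offset, stripe_size=512, num_disks=4):
    """Map a logical byte offset to (disk, byte_offset_on_that_disk)."""
    unit, within = divmod(offset, stripe_size)
    disk = unit % num_disks                 # stripe units go round-robin over disks
    unit_on_disk = unit // num_disks
    return disk, unit_on_disk * stripe_size + within


# A 2048-byte record with 512-byte stripes touches every one of the four disks.
print(sorted({locate_byte(o)[0] for o in range(0, 2048, 512)}))  # prints [0, 1, 2, 3]
```

With a stripe of 2048 bytes or more, the same record would sit on a single disk, leaving the other drives free to serve other requests.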

Standard RAID levels


RAID 0: This configuration has striping but no redundancy of data. It offers the best
performance but no fault-tolerance.

RAID 1: Also known as disk mirroring, this configuration consists of at least two drives that
duplicate the storage of data. There is no striping. Read performance is improved since either
disk can be read at the same time. Write performance is the same as for single disk storage.

RAID 2: This configuration uses striping across disks with some disks storing error checking
and correcting (ECC) information. It has no advantage over RAID 3 and is no longer used.

RAID 3: This technique uses striping and dedicates one drive to storing parity information.
The embedded ECC information is used to detect errors. Data recovery is accomplished by
calculating the exclusive OR (XOR) of the information recorded on the other drives. Since an
I/O operation addresses all drives at the same time, RAID 3 cannot overlap I/O. For this
reason, RAID 3 is best for single-user systems with long record applications.
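
The XOR-based recovery described above can be illustrated with a few bytes. This is a minimal sketch: the helper `xor_blocks` and the drive contents are made up for illustration, and real arrays work on whole sectors or stripes rather than two-byte blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives with made-up contents.
data = [b"\x0f\xa5", b"\xf0\x5a", b"\x33\x3c"]

# The parity drive stores the XOR of all data drives.
parity = xor_blocks(data)

# Suppose drive 1 fails: XOR the surviving data drives with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, combining the parity with the surviving drives cancels their contribution and leaves exactly the contents of the missing drive.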

RAID 4: This level uses large stripes, which means you can read records from any single
drive. This allows you to use overlapped I/O for read operations. Since all write operations
have to update the parity drive, write operations cannot be overlapped. RAID 4 offers no
advantage over RAID 5.

RAID 5: This level is based on block-level striping with parity. The parity information is
striped across each drive, allowing the array to function even if one drive were to fail. The
array’s architecture allows read and write operations to span multiple drives. This results in
performance that is usually better than that of a single drive, but not as high as that of a RAID
0 array. RAID 5 requires at least three disks, but it is often recommended to use at least five
disks for performance reasons.

RAID 5 arrays are generally considered to be a poor choice for use on write-intensive
systems because of the performance impact associated with writing parity information. When
a disk does fail, it can take a long time to rebuild a RAID 5 array. Performance is usually
degraded during the rebuild time and the array is vulnerable to an additional disk failure until
the rebuild is complete.

RAID 6: This technique is similar to RAID 5 but includes a second parity scheme that is
distributed across the drives in the array. The use of additional parity allows the array to
continue to function even if two disks fail simultaneously. However, this extra protection
comes at a cost. RAID 6 arrays have a higher cost per gigabyte (GB) and often have slower
write performance than RAID 5 arrays.
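
The difference in cost per gigabyte between the levels follows from how many drives' worth of space is given up to redundancy. The sketch below is a rough illustration under stated assumptions: identical drives, RAID 1 treated as every drive holding a full copy, and one drive's worth of parity for RAID 3, 4 and 5 versus two for RAID 6. The function name `usable_capacity` is hypothetical.

```python
def usable_capacity(level, drives, drive_gb):
    """Return the usable capacity in GB of an array of identical drives."""
    # Number of drives' worth of space consumed by parity at each level.
    parity_drives = {0: 0, 3: 1, 4: 1, 5: 1, 6: 2}
    if level == 1:
        if drives < 2:
            raise ValueError("mirroring needs at least two drives")
        return drive_gb                      # every drive holds the same data
    if level not in parity_drives:
        raise ValueError("unsupported RAID level")
    if drives <= parity_drives[level]:
        raise ValueError("not enough drives for this level")
    return (drives - parity_drives[level]) * drive_gb
```

For five 1000 GB drives this gives 5000 GB for RAID 0, 4000 GB for RAID 5 and 3000 GB for RAID 6, which is why RAID 6 costs more per usable gigabyte than RAID 5.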

Direct Memory Access (DMA)


DMA stands for "Direct Memory Access" and is a method of transferring data between
the computer's RAM and another part of the computer without passing it through the CPU.
While most data that is input to or output from your computer is processed by the CPU, some
data does not require processing, or can be processed by another device.

In these situations, DMA can save processing time and is a more efficient way to move
data from the computer's memory to other devices. In order for devices to use direct memory
access, they must be assigned to a DMA channel. Each type of port on a computer has a set of
DMA channels that can be assigned to each connected device. For example, a PCI controller
and a hard drive controller each have their own set of DMA channels.

For example, a sound card may need to access data stored in the computer's RAM, but since it
can process the data itself, it may use DMA to bypass the CPU. Video cards that support DMA
can also access the system memory and process graphics without needing the CPU. Ultra DMA
hard drives use DMA to transfer data faster than previous hard drives that required the data
to first be run through the CPU.

An alternative to DMA is the Programmed Input/Output (PIO) interface, in which all
data transmitted between devices goes through the processor. A newer protocol for the
ATA/IDE interface is Ultra DMA, which provides a burst data transfer rate of up to 33 MB/s.
Hard drives that come with Ultra DMA/33 also support PIO modes 1, 3, and 4, and multiword
DMA mode 2 at 16.6 MB/s.

DMA Transfer Types


Memory To Memory Transfer
In this mode a block of data is moved from one memory address to another. The current
address register of channel 0 points to the source address and the current address register
of channel 1 points to the destination address. In the first transfer cycle, a data byte from
the source address is loaded into the temporary register of the DMA controller; in the next
transfer cycle, the data in the temporary register is stored in the memory location pointed to
by the destination address.

After each data transfer, the current address registers are decremented or incremented
according to the current settings. The channel 1 current word count register is also decremented
by 1 after each data transfer. When the word count of channel 1 rolls over to FFFFH, a terminal
count (TC) is generated, which activates the EOP output and terminates the DMA service.
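
The two-cycle transfer and the terminal count can be sketched as a small simulation. It is an illustration only: the function `mem_to_mem` and its variable names are hypothetical, registers are modelled as 16-bit integers, and the sketch assumes the word count is loaded with one less than the number of bytes to move, which is what a rollover to FFFFH implies.

```python
def mem_to_mem(memory, src, dst, count, step=1):
    """Simulate a memory-to-memory DMA block transfer.

    `count` is the value loaded into the channel 1 word count register
    (assumed to be the number of bytes minus one). Returns the number of
    bytes actually transferred.
    """
    ch0_addr = src        # channel 0 current address: source
    ch1_addr = dst        # channel 1 current address: destination
    ch1_count = count     # channel 1 current word count
    transfers = 0
    while True:
        temp = memory[ch0_addr]               # first cycle: source byte into temporary register
        memory[ch1_addr] = temp               # second cycle: temporary register into destination
        ch0_addr = (ch0_addr + step) & 0xFFFF
        ch1_addr = (ch1_addr + step) & 0xFFFF
        ch1_count = (ch1_count - 1) & 0xFFFF  # 16-bit decrement, 0 wraps to FFFFH
        transfers += 1
        if ch1_count == 0xFFFF:               # terminal count reached: EOP ends the service
            return transfers


memory = bytearray(256)
memory[0:4] = b"DATA"
moved = mem_to_mem(memory, src=0, dst=16, count=3)
```

Passing `step=-1` models the setting in which the address registers are decremented instead of incremented.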

Auto initialize
In this mode, during initialization the microprocessor loads the base address and base word
count registers at the same time as the current address and current word count registers.
The address and the count in the base registers remain unchanged throughout the DMA
service.
After the first block transfer, i.e. after the activation of the EOP signal, the original
values of the current address and current word count registers are automatically restored
from the base address and base word count registers of that channel. After auto initialization
the channel is ready to perform another DMA service without CPU intervention.
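
The relationship between the base registers and the current registers can be sketched as follows. The class `DmaChannel` and its attribute names are hypothetical, and the sketch assumes the same rollover-to-FFFFH terminal count described for memory-to-memory transfers.

```python
class DmaChannel:
    """One DMA channel with base and current registers (illustrative model)."""

    def __init__(self, base_address, base_count, auto_init=True):
        # Base and current registers are loaded together at initialization.
        self.base_address = base_address
        self.base_count = base_count
        self.current_address = base_address
        self.current_count = base_count
        self.auto_init = auto_init

    def transfer_one(self):
        """Perform one transfer; return True when terminal count (EOP) occurs."""
        self.current_address = (self.current_address + 1) & 0xFFFF
        self.current_count = (self.current_count - 1) & 0xFFFF
        if self.current_count != 0xFFFF:
            return False
        if self.auto_init:
            # Restore the current registers from the unchanged base registers.
            self.current_address = self.base_address
            self.current_count = self.base_count
        return True


channel = DmaChannel(base_address=0x2000, base_count=3)
eop_flags = [channel.transfer_one() for _ in range(4)]
```

After the fourth transfer the channel reports EOP and its current registers again hold 2000H and 3, so a second block transfer can start without the CPU reprogramming the channel.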

DMA Controller
The controller is integrated into the processor board and manages all DMA data transfers.
Transferring data between system memory and an I/O device requires two steps: data goes
from the sending device to the DMA controller and then to the receiving device. The
microprocessor gives the DMA controller the source location, destination, and amount of data
to be transferred. The DMA controller then transfers the data, allowing the microprocessor to
continue with other processing tasks.

When a device needs to use the Micro Channel bus to send or receive data, it competes
with all the other devices that are trying to gain control of the bus. This process is known as
arbitration. The DMA controller does not arbitrate for control of the bus; instead, the I/O
device that is sending or receiving data (the DMA slave) participates in arbitration. It is the
DMA controller, however, that takes control of the bus when the central arbitration control
point grants the DMA slave's request.

DMA vs. interrupts vs. polling

Figure: the position of the DMA in relation to peripheral devices, the CPU and
internal memory

DMA
• Works in the background without CPU intervention
• This speeds up data transfer and frees the CPU for other work
• DMA is used for moving large files, since moving them through the CPU would take up too
much of its capacity

Interrupt Systems
• Interrupts take up CPU time
• They work by requesting the use of the CPU: the device sends an interrupt, to which the CPU
responds
o Note: the CPU saves time because it does not have to keep checking whether it needs to respond
• Interrupts are used when a task has to be performed immediately

Polling
• Polling requires the CPU to actively monitor the device
• The major advantage is that the polling rate can be adjusted to the needs of the device
• Polling is a low-priority process, suited to peripheral devices that do not need a quick response
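
The difference in CPU involvement between polling and interrupts can be sketched with a toy device that becomes ready after a fixed number of ticks. Everything here is hypothetical and simplified: the `Device` class, the tick model and the function names are assumptions made for illustration, not a description of real hardware.

```python
class Device:
    """Toy peripheral that becomes ready after a fixed number of ticks."""

    def __init__(self, ready_at_tick):
        self.ready_at_tick = ready_at_tick
        self.tick = 0

    def advance(self):
        self.tick += 1

    def is_ready(self):
        return self.tick >= self.ready_at_tick


def wait_by_polling(device):
    """The CPU checks the device every tick; returns the number of status checks."""
    checks = 0
    while True:
        checks += 1                  # CPU time spent reading the status
        if device.is_ready():
            return checks
        device.advance()


def wait_by_interrupt(device, handler):
    """The CPU does other work until the device signals; returns ticks of other work."""
    useful_work = 0
    # The loop condition stands in for the hardware interrupt line,
    # not for a status check made by the CPU.
    while not device.is_ready():
        useful_work += 1             # CPU is free for other tasks
        device.advance()
    handler()                        # CPU responds once, when interrupted
    return useful_work


calls = []
polling_checks = wait_by_polling(Device(5))
other_work = wait_by_interrupt(Device(5), lambda: calls.append("irq"))
```

In the polling case every tick costs the CPU a status check. In the interrupt case the same ticks are available for other work and the CPU is involved only once, when the handler runs.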

UNIT – V (10 Lectures)

MULTIPROCESSORS: Characteristics of multiprocessors, interconnection structures, interprocessor
arbitration, interprocessor communication and synchronization, cache coherence,
shared memory multiprocessors.
Book: M. Morris Mano (2006), Computer System Architecture, 3rd edition, Pearson/PHI,
India: Unit-13 Pages: 489-514

Characteristics of Multiprocessors
A multiprocessor system is an interconnection of two or more CPUs with memory and
input-output equipment. As defined earlier, multiprocessors fall under the MIMD
category. The term multiprocessor is sometimes confused with the term multicomputer.
Though both support concurrent operations, there is an important difference between a
system with multiple computers and a system with multiple processors.

In a multicomputer system, there are multiple computers, each with its own operating
system, which communicate with each other, if needed, through communication links. A
multiprocessor system, on the other hand, is controlled by a single operating system, which
coordinates the activities of the various processors, either through shared memory or
interprocessor messages.

The advantages of multiprocessor systems are:

· Increased reliability because of redundancy in processors
· Increased throughput because of execution of multiple jobs in parallel, or of portions of the
same job in parallel

A single job can be divided into independent tasks, either manually by the programmer or by
the compiler, which finds the portions of the program that are data independent and can be
executed in parallel. Multiprocessors are further classified into two groups depending on
the way their memory is organized. Multiprocessors with shared memory are called tightly
coupled or shared memory multiprocessors.

The information in these processors is shared through the common memory. Each of
the processors can also have its own local memory. The other class of multiprocessors is
loosely coupled or distributed memory multiprocessors. Here, each processor has its
own private memory, and the processors share information with each other through an
interconnection switching scheme or message passing.

The principal characteristic of a multiprocessor is its ability to share a set of main
memory and some I/O devices. This sharing is possible through some physical connections
between them called the interconnection structures.

Interprocessor Arbitration

A computer system needs buses to facilitate the transfer of information between its
various components. For example, even in a uniprocessor system, if the CPU has to access a
memory location, it sends the address of the memory location on the address bus. This
address activates a memory chip. The CPU then sends a read signal through the control bus, in
response to which the memory puts the data on the data bus. Similarly, in a
multiprocessor system, if any processor has to read a memory location from the shared areas,
it follows a similar routine.

There are buses that transfer data between the CPUs and memory. These are called
memory buses. An I/O bus is used to transfer data to and from input and output devices. A
bus that connects major components in a multiprocessor system, such as CPUs, I/O devices, and
memory, is called a system bus. A processor in a multiprocessor system requests access to
a component through the system bus.

If no other processor is accessing the bus at that time, it is given control of
the bus immediately. If a second processor is using the bus, then this processor has
to wait for the bus to be freed. If at any time more than one processor requests the services
of the bus, arbitration is performed to resolve the conflict. A bus
controller is placed between the local bus and the system bus to handle this.
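
The three cases above (bus free, bus busy, simultaneous requests) can be sketched as a single decision function. The text does not say which arbitration scheme is used, so this sketch assumes a simple fixed-priority rule in which the lowest processor number wins; the function name `arbitrate` is hypothetical.

```python
def arbitrate(requests, bus_owner=None):
    """Return the ID of the processor that holds the bus after arbitration.

    `requests` is a set of processor IDs asking for the bus.
    `bus_owner` is the processor currently using the bus, if any.
    Returns None when the bus is idle and nobody is requesting it.
    """
    if bus_owner is not None:
        return bus_owner             # bus is busy: requesters wait until it is freed
    if not requests:
        return None                  # bus stays idle
    return min(requests)             # assumed fixed priority: lowest ID is granted the bus
```

For example, `arbitrate({2})` grants the bus to processor 2 immediately, `arbitrate({1}, bus_owner=3)` leaves processor 1 waiting, and `arbitrate({3, 1, 2})` resolves the conflict in favour of processor 1 under the assumed priority rule.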

Interprocessor Communication and Synchronization

In a multiprocessor system, a proper communication protocol between the various
processors is essential. In a shared memory multiprocessor system, a
common area in the memory is provided, in which all the messages that need to be
communicated to other processors are written.

Proper synchronization is also needed whenever two or more processors race for
shared resources such as I/O resources. The operating system in this case is given
the task of allocating the resources to the various processors in such a way that at any time no
more than one processor uses the resource.

A very common problem can occur when two or more processors are trying to
access a resource that can be modified. For example, processors 1 and 2 are simultaneously
trying to access memory location 100. Say processor 1 is writing to the location while
processor 2 is reading it. The chances are that processor 2 will end up reading erroneous
data. A program sequence that accesses such a resource, and must therefore be protected from
simultaneous execution by more than one processor, is called a critical section. The following
assumptions are made regarding critical sections:
- Mutual exclusion: At most one processor can be in a critical section at a time
- Termination: The critical section is executed in a finite time
- Fair scheduling: A process attempting to enter the critical section will eventually do so in a
finite time.
A binary value called a semaphore is usually used to indicate whether a processor is currently
executing the critical section.
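
The use of a binary semaphore to guard a critical section can be sketched with threads standing in for processors. This is an illustrative sketch only: the names `worker` and `run` are hypothetical, and software threads on one machine are an assumption used to imitate several processors sharing memory.

```python
import threading

counter = 0                            # shared resource, like memory location 100
semaphore = threading.Semaphore(1)     # binary semaphore: 1 means free, 0 means in use


def worker(iterations):
    """Increment the shared counter, entering the critical section each time."""
    global counter
    for _ in range(iterations):
        with semaphore:                # acquire: wait until no one else is inside
            counter += 1               # critical section: read, modify, write
        # leaving the with-block releases the semaphore for the next waiter


def run(num_threads=4, iterations=1000):
    """Start the workers, wait for them to finish, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because only one thread at a time can hold the semaphore, no update is lost, and `run(4, 1000)` returns 4000. The critical section is short and always completes, which matches the termination assumption listed above.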

Cache Coherence
As discussed in unit 2, cache memories are high speed buffers which are inserted
between the processor and the main memory to capture those portions of the contents of
main memory which are currently in use. These memories are five to ten times faster than
main memories, and therefore reduce the overall access time. In a multiprocessor system
with shared memory, each processor has its own private cache.

A private cache is provided with each processor to reduce the access
time. Whenever a processor accesses the shared memory, it also updates its private cache.
This introduces the problem of cache coherence: several copies of the same data may exist
in different caches at any given time, which may result in data inconsistency.

For example, let us assume there are two processors, x and y, whose caches both hold a copy
of the same data. Processor x produces data 'a', which is to be consumed by processor y. Processor x
updates the value of 'a' in its own private copy of the cache. As it does not have any access to
the private copy of the cache of processor y, processor y continues to use the variable 'a' with
the old value unless it is informed of the change.

Thus, in such situations, if the system is to perform correctly, every update to
the cache must be communicated to all the processors, so that they can make the necessary changes
in their private copies of the cache.
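
The stale-copy problem and one way of informing the other caches can be sketched as follows. The text does not name a particular protocol, so this sketch assumes write-through caches that invalidate other copies on a write; the classes `Bus` and `Cache` and the `notify` flag are hypothetical and exist only to show the behaviour with and without notification.

```python
class Bus:
    """Shared main memory plus the list of caches attached to it."""

    def __init__(self):
        self.memory = {}
        self.caches = []


class Cache:
    """Private cache of one processor (illustrative, no capacity limit)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                  # miss: fetch from main memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value, notify=True):
        self.lines[addr] = value
        self.bus.memory[addr] = value               # write-through to main memory
        if notify:
            for other in self.bus.caches:
                if other is not self:
                    other.lines.pop(addr, None)     # invalidate the stale copy


bus = Bus()
bus.memory["a"] = 1
x, y = Cache(bus), Cache(bus)
x.read("a")
y.read("a")

x.write("a", 2, notify=False)
stale = y.read("a")        # y was not informed and still sees the old value

x.write("a", 3)
fresh = y.read("a")        # y's copy was invalidated, so it refetches
```

Without notification processor y keeps reading the old value of 'a'. With notification its copy is discarded and the next read fetches the updated value from main memory.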
