KEMBAR78
COA Lecture Notes | PDF | Instruction Set | Central Processing Unit
0% found this document useful (0 votes)
701 views83 pages

COA Lecture Notes

nice and clear

Uploaded by

soloamigos23
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
701 views83 pages

COA Lecture Notes

nice and clear

Uploaded by

soloamigos23
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 83

COMPUTER ORGANIZATION

CONTENT
Lecture #1
1.1
1.2
1.2.1
1.2.2
1.2.3
Lecture #2
1.3
1.4
Lecture #3
1.5
Lecture #4
1.6
1.6.1
1.6.2
1.7
1.7.1
1.7.2
Lecture #5
1.8
1.9
Lecture #6
2.1
2.2
Lecture #7
2.3
Lecture #8
2.4
2.4.1
2.4.2
2.4.3
Lecture #9
2.4.4
2.4.5
2.4.6
Lecture #10
2.4.6.1
2.4.6.2

Introduction to Digital Computer


Study of Digital Computers
Computer Organization
Computer Architecture
CO v/s CA
Evolution of computers
Basic functional units of Digital computer
Basic operational concepts(processor memory interaction)
Bus structure
Single bus structure
Multiple bus structure
Software
System software
Application software
Performance
RISC v/s CISC
Memory locations and addresses
Content of memory
Instruction Format
Addressing mode
Immediate addressing mode
Direct addressing mode
Register direct addressing mode
Indirect addressing mode
Register indirect addressing mode
Displacement addressing mode
Relative addressing mode
Base register addressing mode

Lecture #11
2.4.6.3
2.4.6.3.1
2.4.6.3.2
2.4.7
Lecture #12
2.5
2.5.1
2.5.2
2.5.3
2.6
2.7
Lecture #13
2.8
2.9
2.9.1
2.9.2
Lecture #14
3.1
Lecture #15
3.2
Lecture #16
3.3
Lecture #17
3.4
Lecture #18
3.5
3.5.1
Lecture #19
3.5.2
Lecture #20
3.6
3.7
Lecture #21
3.8
Lecture #22
3.9
3.9.1
3.9.2
Lecture #23
3.10
3.10.1
3.10.2

Indexed addressing mode


Auto increment mode
Auto decrement mode
Stack / implied addressing mode
Instruction Types
Data transfer instruction
Data manipulation instruction
Program controlled instructions
Instruction Set completeness
Machine instruction and Assembly language
Basic Input/output operations
Additional instructions
Logical Shift instructions
Arithmetic Shift instructions
Introduction
Design of Full Adder
Delay in n-bit ripple carry adder
Carry Lookahead Adder
Integer Multiplication (positive / unsigned integers )
Using combinational circuit
Using Sequential circuit
Integer Multiplication (negative and positive / unsigned and
signed integers )
Booths algorithm
Bit pairing method
Integer Division
Restoring Division method
Non restoring division method
IEEE Standard for Floating point Number
Single precision numbers
Double precision numbers

Lecture #24
4.1
4.2
4.3
4.4
Lecture #25
4.5
4.5.1
4.5.2
Lecture #26
4.5.3
4.5.4
Lecture #27
4.6
Lecture #28
4.7
Lecture #29
4.8
Lecture #30
4.9
4.10
4.11
Lecture #31
5.1
5.2
5.3
5.4
5.5
5.6
5.6.1
5.6.1.1
Lecture #32
5.6.1.2
5.6.1.3
5.6.1.3.1
Lecture #33
5.6.1.3.2
5.6.2
Lecture #34
5.7
5.8
5.8.1
Lecture #35
5.9
5.9.1
5.9.2

CPU
Fundamental concepts
Steps taken by CPU
Internal organization of CPU(using single bus)
Simple / primitive functions performed by the processor
Fetching a word from memory
Storing a word into memory
Register transfer
Arithmetic / logic operation
Execution of a complete instruction
Unconditional and conditional branch instructions
Design of Hardwired Control unit
Multiple bus CPU Organization
Micro programmed Control unit
WILKESS MICROPROGRAMMED CONTROL UNIT
What is the need of memory?
What should the characteristics of a memory?
Fundamental concepts
Parameters used to measure the performance of memory
Types of memory components
Types of memory used in main memory
Random Acces Memory(RAM)
Internal organization
RAM Design
Types of RAM
SRAM
DRAM
ROM
Memory Hierarchy
Cache Memory and it's need
Locality of reference
CACHE MAPPING TECHNIQUES
Direct Mapping
Associative Mapping

Lecture #36
5.9.3
Lecture #37
5.10
5.10.1
5.10.2
5.10.3
5.10.4
5.11
Lecture #38
5.12
5.12.1
5.12.2
5.12.3
Lecture #39
5.13
Lecture #40
5.13.1

Set associative Mapping


Cache Updation Schemes
Read hit
Write hit
Read miss
Write miss
Cache performance
Cache block replacement algorithm
FIFO(First In First Out)
OPT(Optimal )
LRU(Least Recently used)
Virtual Memory
Address translation

COMPUTER ORGANIZATION
PCCS 4301
Credit 3-0-0
SYLLABUS
Module I 12 Hrs
Basic structures of Computers: Functional units, operational concepts, Bus
structures, Software, Performance, Computer Architecture vs Computer
Organization.Machine Instruction and Programs: Memory location and addresses, Big-endian
and Little-endian representation. Memory Operations, Instructions and instruction
Sequencing, Addressing modes, Assembly Language, Basic Input/output operations,
subroutine, additional Instructions.
Module II 12 Hrs
Arithmetic : Addition and subtraction of signed Numbers, Design of Fast Adders,
Multiplication of positive Numbers, Signed-operand multiplication , Fast
multiplication, Integer Division, Floating- point Numbers, (IEEE754 s...) and
operations.
Module III 12 Hrs
Basic Processing units: Fundamental concepts, execution of complete Instructions,
Multi bus organization, Hardwired control, Micro programmed control, RISC vs CISC
architecture.Memory System: Basic Concepts, cache Memory, Cache memory mapping
policies,Cache updating schemes, performance consideration, Virtual memories, Paging and
Page replacement policies, Memory Management requirement, secondary storage.
Text Books:
1. Computer Organization:Carl Hamacher, Zvonkovranesic, Safwat Zaky,Mc Graw
Hill,5 th Ed
2. Computer Organization and Design Hardware/ Software Interface: David A.
Patterson, John L. Hennessy, Elsevier, 4 th Edition.
Reference Books :
1. Computer Architecture and Organization: William Stallings, Pearson Education.
2. Computer Architecture and Organizations, Design principles and Application: B.
Govinda Rajalu, Tata McGraw-Hill Publishing company Ltd.
3. Computer Architecture: Parhami, Oxford University Press
4. Computer system Architecture: Morris M. Mano PHI NewDelhi.
5. Computer Architecture and Organization: John P. Hayes Mc Graw Hill
6. Structured Computer Organization: A.S. Tanenbum, PHI
7. Computer Architecture And Organization: An Integrated Approach, Murdocca,
Heuring Willey India, 1 st Edition.

COMPUTER ORGANIZATION
PCCS 4301
Credit 3-0-0
COURSE OBJECTIVE

1. Students will learn the basic structure and operation of digital computer and its
relevance to classical and modern problems of computer design
2. Students will be able to identify where, when and how enhancements of computer
performance can be accomplished.
3. Students will learn in detail about the various components of ALU and I/O.
4. Students will learn the hierarchical memory system including cache memory and
virtual memory.
5. Student will see how to use concepts of computer organization in real-life settings
using various PC performance improvements.
6. Students will also be introduced to more recent applications of computer
organization in advanced digital systems.
7. Students will learn the sufficient background necessary to read more advance
texts as well as journal articles on the field.

COURSE OUTCOME
1. Student will learn the concepts of computer organization for several engineering
applications.
2. Student will develop the ability and confidence to use the fundamentals of
computer organization as a tool in the engineering of digital systems.
3. Students will able to deal with different types of computers and to identify the
problems in components of computers.
4. Students will develop independent learning skills and be able to learn more about
different computer architectures and hardware.

COMPUTER ORGANIZATION
PCCS 4301
Credit 3-0-0
LESSON PLAN
Module Lecture No
No
1

Description of Topic

Study of Digital computer, CO v/s CA

Evolution of computers, Von Neumann & Havard concept,Functional


units,

Operational concepts

Bus structure, software

Performance, RISC, CISC

Memory locations, big and little Endian

Instruction format

Addrssing mode, immediate,direct, register direct

Indirect, register indirect,displacement

10

Relative, based,

11

Indexed,stack

12

Instruction types, machine instruction

13

Basic I/O, Additional instruction

14

Intger representation, sign magnitude, 1's, 2's

15

Design of full adder and fast adder

16

Delay in n-bit ripple carry adder

17

Carry Lookahead Adder

18

Multiplication of unsigned integer, combinitional circuit

19

Multiplication of unsigned integer, sequential circuit

20

Booth's algorithm,

21

Bit pairing method

22

Integer division,restoring and non restoring method

23

IEEE Floating point representation

24

Internal organization of CPU using single bus

25

Read & write operation

26

Register transfer & ALU operation

27

Execution of an instruction

28

Conditional and unconditional instruction and flags

29

Hardwired control unit

30

Micro programmed control unit

31

Memory organization introduction, RAM

32

RAM Design,SRAM

33

DRAM and Design,ROM

34

Memory Hierarchy,Cache and its operation

35

Mapping techniques in cache, Direct, associative

36

Set associative

37

Cache updation schemes, Cache performance

38

Replacement algorithm ,

39

Virual memory organization

40

Virual Address translation

COMPUTER ORGANIZATION
PCCS 4301
Credit 3
Lecture #1

Text Books:
1. Computer Organization , Carl Hamacher, TMH
2. Computer System Architecture, Morris Mano, PHI
Reference Books:
1. Computer Architecture & Organization, William Stallings, Pearson
Syllabus:
Module 1 Fundamental Units, Instruction set Architecture
Module 2 ALU
Module 3 CPU,MEMORY
Prerequisite
1. Knowledge of digital circuit
2. Functionality of various gates
3. Number System
MODULE-1
1.1Terminology used
Digital computer : It is a fast electronic calculating machine / device that accepts digitized
data(through input device or memory), processes according to a set of instructions(cpu),
produces the result(through output unit or memory). Digitized data means discrete values
like 0 and 1.
1.2 Study of digital computer
The study of digital computer has been carried out various ways with different perspective or
analysis.
1.2.1 Computer Organization
It is concerned with the hardware components that form the digital computer.
It deals with the functionality of various components that are connected to build the digital
computer.
Organizational attributes are
(a) memory organization( different parts ,memory technologies used, types etc)
(b) CPU organization( different parts, operations)
(c) I/O organization
1.2.2 Computer Architecture
It is concerned with those attributes which are seen by a program. It has direct impact on the
execution of the program. Computer architecture shows the behavior of the system.
Architectural attributes are
(a) instruction set
(b)instruction format
(c) no of bits used to different various data

(d) different addressing mechanism to access data


These architectural attributes are also called instruction set architecture(ISA).
1.2.3 CO v/s CA
Two examples are given to understand the difference between CO and CA.
Example-1
Two different models from a same vendor like intel are brought to analyze. Both the
models(lap top and desk top) have same processor like core 2 duo. That means both models
understand the same instruction set as you know each processor understands a fixed no of
instructions. Hence forth their architecture is same.
Due to the placement of various hardware components, one model (laptop) is slim and other
is bulky. Hence their organization is different.
Example-2
Let us assume that you are a chip designer . The problem is to include a multiply instruction
in the instruction set of the processor which has already add instruction.
Then the architectural issues are
(1)Instruction format
(2)length of the instruction
(3) addressing methods of operands etc
The organizational issues are
(1) what hardware unit is added?
(2) whether a special multiplier unit or repetitive use of adder unit

Lecture #2
1.3 Evolution of computers
Evolution of computer has been divided into four generations which are based upon the
technologies used.
1st generation
1945-1955
vacuum tubes
2nd generation
1955-1965
Transistor
rd
3 generation
1965-1975
IC
4th generation
1975-onwards
VLSI
st
1 Generation computers use vacuum tubes as the major components. ENIAC(Electronic
Numerical Integrator And Calculator) was designed by John Macchulay and J. Preaper
Ecskert at the university of pensilvaniya.
Characteristics
(1) 30 tones weight
(2) 18000 vacuum tubes used
(3) 15000 sqft occupied
(4)140kw power consumption
(5) processes the data based on decimal system and had programmed manually.
The machine was programmed by setting the switches and plugging and unplugging the
cables which was a tedious job.
At that time , an expert in mathematics, John Von Neumann has given a concept known as
stored program concept or Von Neumann concept which is depicted as follows
the programming could be done if the program could be represented in a suitable
form(digitized) form , that could be stored in memory along with data. The computational
task can be carried out by reading the instructions from memory and performing the
corresponding operation.
Havard concept: Program and data have kept in two different physical memory
2nd generation computers use transistors as the major components. DEC (Digital Equipment
Corporation) produced PDP-11 series of computers which belong to this category.
3rd generation computers uses IC as the major components. IBM produced IBM-360 series
of computers which belong to this category.
4th generation computers uses VLSI as the major components. Intel, Motorolla etc
produced computers which belong to this category.
1.4 Basic functional units of Digital computer
A digital computer consists of five basic functional units. They are
(1)Input Unit
(2)Output Unit
(3)Memory
(4)ALU
(5) Control Unit
I/O
ALU+
Memory
Block diagram
Control

Input Unit: Through input unit, the computer accepts data for processing. As computer
works on discrete data like 0 or 1, so the data provided has to be encoded with suitable
format. Data can be number or character or instruction. Generally, number is represented by
BCD, character is represented by ASCII or EBCDIC , instruction is represented by various
format designed for the processor. The data is sent to memory for storing or sent to processor
for processing.
Output Unit: Its function is to send the processed result to the outside world. It can be CRT
or printer or hard disk. The input and output unit are combined to a single unit which is called
I/O unit.
Memory: The function of memory unit is to store data and program. It is of 2 types
(1)primary memory(main memory)
(2)secondary memory
Main memory: It is a fast memory where program has to be kept before its execution. Main
memory consists of a large no of cells which are capable of storing 1 bit of information either
0 or 1.
Computer handles 1 bit of information rarely, the reason is 1 bit represents a very small
amount of information.
Take the example of storing character A, it ASCII value is 65(01000001)
Main memory is organized in such a way a group of bits is retrieved at any point of time for
the operation. That group of bits is termed as word.
Each word is associated with an distinct address and so using the address the processor is
able to read or write the word from or into memory.
Main memory is very expensive, so a cheaper memory is used called secondary memory
whose characteristics are
(1) slower, cheaper
(2) stores large volume of data
ALU: All the arithmetic and logical operation are carried out in ALU. It consists of a
number of high speed registers which are used to store the operands or intermediate results.
Control Unit: The operation of all units are coordinated by the control unit. It generates
various timing signals to perform an operation required by the instruction.
Processor generally contains both ALU and control unit.

Lecture #3
1.5 Basic operational concepts(processor memory interaction)
How a program is executed? That means what are the operations carried out to execute an
instruction According Von Neumann concept, the program and data have to be kept in
memory before its execution.
Let my program consists of a single instruction e.i ADD locA,R0; [loc A]+[R0][R0].
Block Diagram
Memory

MAR

PC

MDR
R0
R1

IR

CONTROL

ALU

R n-1

The functionalities of various registers are given below


PC It is a counter.
It contains the address of the instruction which is being currently executed.
It keeps track of the execution of the program
MAR and MDR are two special registers to communicate with the memory.
MAR It holds the address of memory to or from which data has to be transferred.
MDRIt contains the data that is written into or read from memory.
IR The instruction is decoded here. So the processor comes to know what type operation
required and where from it gets the operand for the operation.
R0 to R(n-1) They are general purpose registers which are used to keep temporary results
or used for special purpose.
1. [PC]10
2.[PC][MAR]
Read control signal is sent from control unit to memory
PC is incremented to point to the next instruction or reset.

3.[10][MDR]
4.[MDR][IR]
Decoded add instruction
Operands are one in memory location locA and another is in R0.
5.[MAR]locA
Read control signal is sent from control unit to memory
6.[locA][MDR]
7.Content of [MDR] and [R0] are forwarded to ALU for addition. Addition is carried out.
The result is stored in a temporary register.
8.result[R0]
Similarly, write down the steps for
(i)ADD R0,locA;
(ii) ADD R0,R1;

Lecture #4
1.6 Bus structure
So far , we have discussed
(1) the functionalities of basic units
(2) how the processor interacts with memory to execute an instruction
To read the data from the memory
The address has to be forwarded from processor to memory and
The read control signal has to be forwarded from processor to memory
To writ the data into the memory
The address has to be forwarded from processor to memory,
The data has to be forwarded from processor to memory
The write control signal has to be forwarded from processor to memory.
It implies that these units must be connected in some organized way to make the system
operational. So the connection is must. Then the connection is done with a single wire or
multiple wires.
As you know, the computer works on a word, so using multiple wire is beneficial.
Hence, a group of wires is required to transfer the data,address and control signal, is called a
bus.
A bus is a path way that communicates two or more devices.
The bus structure is of two types.
(1) single bus structure
(2)multiple bus structure
1.6.1Single bus structure
Block diagram

Input

Output

Memory

Processor

All units are connected to a single bus . Then find out the direction of data transfer among
various units.
Unidirectional
Bi-directional
Characteristics
(1) only one transfer is carried out ay any point of time.
(2) Only two units use the bus at any point of time.
(3) its cost is low.
(4) it is very easy to connect a unit to the bus.

1.6.2 Multiple bus structure


Block diagram
Input
Memory

Processor

Output

Processor interacts with memory through memory and I/O devices interact with the processor
with I/O bus.
Characteristics
(1) multiple transfers are carried out.
(2) it is costlier
(3) it is faster than single bus structure.
1.7 Software
It is a program or a collection of programs to assist the user to enter and run the users
program. There are two types of software
1.7.1 System software: It is a collection of programs that are executed to perform the
following functions
a) receiving and interpreting user commands
b)entering and editing application programs and storing them as files in secondary storage
devices.
c)managing the storage and retrieval of files in secondary storage devices.
d)running standard application programs such as word processor, spread sheets or games
with data supplied by the user.
e)controlling I/O units
f)translation the source program into object program
g)linking and running users application program with standard library routines.
Example of system software are operating system, compiler etc.
1.7.2 Application software : Application software is a program which users need to perform
their specific tasks. It is generally written in high level languages like c,c++,java etc.
Example of application software are users program, multimedia software etc.

Lecture #5
1.8 Performance
The most important measure of the performance of a computer is how quickly it can execute
the program. That means the speed with which a computer executes a program. Hence the
performance is affected by following factors
a) design of compiler which generates the object code
b) design of instruction set of processor
c)design of hardware like processor, disk, I/O devices
So to measure the performance of the complete computer system , the total time required to
execute the program is considered which is termed as elapsed time. Elapsed time consists of
processor time, I/O time, memory access time, disk access time etc.
To measure the performance of processor, the processor time is considered. It is the time
during which the processor is active. Processor time is the part of elapsed time .
Clock and clock cycle: The processor circuit is controlled by a timing signal called clock.
The clock defines regular time intervals called clock cycles. To execute an instruction, the
processor performs a sequence of basic steps and each step is completed in one clock cycle.
That means a processor takes a number of clock cycles to execute an instruction. The length
or duration of clock cycle affects the performance of processor.
If P is the length or duration of clock cycle, then clock rate is R=1/P , that no of clock cycles
per second(hertz).
Let the processor is controlled by a 500MHz clock(frequency). That means clock rate is
1/500106 Htz=0.210-8 sec=210-9 sec=2ns.
Basic Performance Equation
The basis performance equation is derived by considering the processor time which is a part
of elapsed time.
Let T be the processor time require to execute a program.
The program has N number of machine instructions and the average basic steps needed per
instruction is S.
So a total NS basic steps are needed for the program and the basic steps may be memory
read or memory write or register transfer or ALU operation etc. Each basic step is completed
in one clock cycle.
That means NS cycles are required to execute the program.
Let the clock rate is R cycles per second.
R cycles takes 1 second.
Then NS cycles takes (NS) / R second which is the processor time T.
So the basic performance equation is
T=( NS) / R
1.9 RISC v/s CISC
The computers are classified into two types depending upon the instruction set followed by
their processors. They are
(1)Reduced Instruction set Computer(RISC)
(2)Complex Instruction set Computer(CISC)
RISC
CISC
1. They have less number of simple 1. The size of instruction set is large and the
instructions. These instructions are used to instructions are of complex type. Many
operate on data which are available in the instructions can access memory.
registers. The only operation that affects

memory are load and store instruction


2. They have very large number of registers
3. All instructions are of fixed length.
4.All instructions take a fixed no of clock
cycles to execute
5. They use few addressing mode in their
instructions
6. Compiler overhead is complex.
7.Generally, it has hardwired control unit.
8. Pipelined is easy for RISC.

2. They have less number of registers


3. They have variable length instructions.
4.All instructions take variable no of clock
cycles to execute
5. They use almost all addressing mode in
their instructions
6. Compiler overhead is simple.
7.Generally, it has
micro programmed
control unit.
8. Pipelined is difficult for CISC.

Lecture #6
2.1Memory locations and addresses
Memory consists of a large no of storage cells and each cell is capable of storing 1 bit of
information 1 or 0.But a processor handles a group of bits at a time which called a word. If
the processor handles n bits at a time, then the word length is n. Now a days, 16/32/64 bits
word processor is available in the market.To read or write a word from or into memory , an
address is required. Generally, integers are used to specify the address.
Block diagram
Processor

0
1

00
Memory
01
10
11
total addresses

Processor

Length of address
1
1
2
2
3
3
k
k

Memory

no of lines
2
0,1
4
0,1,2,3
8
0,1,2,3,4,5,6,7
2k
0 to 2k - 1

addresses

2k is called the address space of the computer. The no of addresses are produced or
recognized by the processor.
10 bits 1K
20 bits 1M
30 bits 1G
2.2 Content of memory
Instruction
Content

fixed point
number

Operands

floating point
character

So the content of memory can be


(1) number
(2) character
(3)instruction

Let us consider a 32 bit word processor.


The integer can be represented as
b31 to b0
Magnitude = b30 230 + b29 229 +..+ b0 20
b31 = 0 for positive integers
1 for negative integers
The character can be represented by ASCII
8 bit character
8 bit character

8 bit character

8 bit character

The instruction can be represented as


Operation code
Addressing information about operands
Now though word size is 32 bits, so 4 characters can be read fro memory or can be written
into memory. But sometime there is a need of working on a single character by the processor.
There are two options available to the processor
(1) 32 bits are retrieved and the unnecessary 24 bits are discarded.
(2) the processor uses extra hardware to access 8 bit data or a single character.
Now a days all the processors achieve the mechanism to access 8 bit data. Alternatively, the
smallest addressable unit accessed by the processor is byte and that processor is called
the byte addressable processor. It is a necessity to have the flexibility to access the operand
which is smaller than the word size.
The byte addressable processor uses two types mechanism to access the word
(1) Big Endian
The most significant byte is stored in the lowest byte address
(12345678)16 is the data to be stored in the memory, then the content of memory is given
below
12

34

56

78

(2) Little Endian


The least significant byte is stored in the lowest byte address
(12345678)16 is the data to be stored in the memory, then the content of memory is given
below
78

56

34

12

Problem: Consider a computer has a byte addressable memory organized in 32 bit word
memory.A program reads your name and which is stored in successive memory locations
stating at address 1000. Show the content of memory if the processor uses
(1) Big Endian Scheme
(2)Little Endian Scheme

Lecture #7
2.3 Instruction Format
The instruction format is given below
Opcode
Addressing information about operands
Opcode=m bits
Addressing info=n bits
Instruction length=(m+n) bits
Q??? How many addressing fields are specified in an instruction? (Morris Mano)
The no of addressing field specified in an instruction depends upon the internal organization
of processor registers. There are 3 types of organization
(1) Single accumulator organization
(2) general register organization
(3) stack organization
There are 4 types of instructions generally used by the processor whose classification is
based on the no of address field specified. They are as follows
(1) 3 address instruction
(2) 2 address instruction
(3) 1 address instruction
(4) 0 address instruction
Then we have to find out which organization is suitable among 4 types of instructions
Single accumulator organization
Here all operations are performed with an implied accumulator register. Implied means there
is no need to specify the register in the instruction.
For example, ADD R1; [R1]+[Acc][Acc]
So we can easily say that, 1 address instruction fits well in single accumulator organization.
General register organization
This organization has a lot of general purpose registers. All the operations are performed by
specifying the registers explicitly in the instruction.
For example,
ADD R1,R2; [R1]+[R2][R2]
ADD R1,R2,R3; [R1]+[R2][R3]
So we can easily say that, 2 address instruction and 3 address instruction fits well in general
register organization.
Stack organization
Stack organized computer has performed all operations on data which is pointed by a special
register, called stack pointer. The content of the stack pointer is updated automatically.
For example,
ADD ;
So we can easily say that, 0 address instruction fits well in stack organization
No address is specified in the instruction. For a binary operation the stack is popped out 2
times and the operation is carried out and the result is placed at top of the stack.

Two extra instructions are used for push and pop operation
PUSH X; [X][TOP]
POP X ; [TOP][X]
Q: Evaluate the expression X=(A+B) (C+D) using 3,2,1,0 address instructions where
X,A,B,C,D are memory locations .
Using 3 address instructions
ADD A,B,R1;
ADD C,D,R2;
MUL R1,R2,X;
Using 2 address instruction
MOV A,R1;
ADD B,R1;
MOV C,R2;
ADD D,R2;
MUL R1,R2;
MOV R2,X;
Using 1 address instruction
LOAD A;
ADD B;
STORE T;
LOAD C;
ADD D;
MUL T;
STORE X;
Using 0 address instruction
PUSH A
PUSH B
ADD
PUSH C
PUSH D
ADD
MUL
POP X

Lecture #8
2.4 Addressing mode
To execute an instruction , the processor fetches the instruction from memory. That
instruction is loaded into IR via MDR. In IR, the instruction is decoded. That means the
processor comes to know the operation required and for the operation the processor needs
data or operand. So the instruction must contain some information about the address of data
which is termed as the addressing mode.
Addressing mode is the way in which the address of operand is specified in an instruction.
Purpose -1) The operands are chosen by the processor during the operation
2)Programmer must have the knowledge about the operands because he or she
implements the logic accordingly.
There are variety of addressing modes used by the instructions of a processor. They are
2.4.1 Immediate addressing mode
Operand or immediate data is specified in an instruction.
Operand is part of the instruction.
Opcode

operand

Example: ADI 05; [Acc]+05[Acc]


Effective address= not required
Advantage: There is no need of memory or register reference as operand is available in the
instruction. So this mode facilitates the fastest mode of operation that means the instruction
which is based on this mode, is the fastest among other addressing mode based instruction.
Disadvantage: The magnitude of the operand is limited.
For example , if 8 bits are used to specify the operands, then only integer between 0 to 255 is
specified.
2.4.2 Direct addressing mode
The address of operand is explicitly specified in an instruction.
The address of operand is the part of the instructio
Opcode
Address(A)

operand
Example, LDA 1000; [1000][Acc]
Effective address= A
Advantage : It requires only one memory reference.
Disadvantage: It provides a limited address space to keep the operand. For example , if 8 bit
is used to specify the address, then the address space is from 0 to 28 1.

2.4.3 Register direct addressing mode


Register is explicitly specified in an instruction.
Register name is the part of the instruction.
Opcode
Register(R)
operand
Effective Address=R
Advantage: No memory reference is required. So it is faster than direct addressing mode.
Disadvantage: If memory address is included, then very less no bits are available to specify
the address.
Problem: Encoding the instruction of a processor.

Lecture #9
2.4.4 Indirect addressing mode
Address is explicitly specified in an instruction and that address contains the address of
the operand.
Address of address of operand is part of the instruction.
Opcode
Address(A)
Add Of Operand

operand
Effective address=(A)
Advantage: There is no restriction to specify the address. The operand can be kept in within
the address space of the processor.
Explain with example given in the class
Disadvantage: There is a need of two memory reference to fetch the operand . So it is slower.
2.4.5 Register indirect addressing mode
The register is explicitly specified in an instruction which contains the address of operand.
Opcode
Register(R)
Address
Register

operand
Effective Address=(R)
Advantage: it is faster than the indirect addressing mode as there
Disadvantage: If memory address is included, then very less no bits are available to specify
the address.
2.4.6 Displacement addressing mode
It is the combination of direct and indirect addressing mode. In which sense it is direct and in
which sense it is indirect.
Here, addresses are specified in an instruction , so it is same as the direct.

But that address is not the effective address of the operand, so it is same as the indirect. The
effective address is calculated as follows
opcode

Register

Address/Offset(A)

Register 0
Register 1
Register n

operand

Effective address=(R)+A
To calculate the effective address, two address are required. But the instructions based on this
addressing mode follows
(1) either register is implicit and address/offset is explicit.
(2) either register is explicit and address/offset is implicit.
(3) both are explicit

Lecture #10
This mode is known by variety of names depending upon the context of its use. They are
(a) Relative addressing mode
(b) Base register addressing mode
(c) indexed addressing mode
2.4.6.1 Relative addressing mode
Here the implicit register is PC which is not specified in the instruction and only
displacement/ offset is specified in the instruction.
For example JUMP +2; [PC]+2[PC]
Explain purpose of the above instruction in a program.
2.4.6.2 Base register addressing mode
When the processor contains a single base register, the register is implicit and only offset
is specified.
Opcode

offset

Effective address=[base register]+offset


When the processor contains more than one base register, the register is explicit and offset is
specified.
Opcode
Register offset
Effective address=[base register]+offset
Advantage: If total no of base register is n , then a single instruction has the flexibility to
access n different segments of memory.
If the offset is k, then that instruction can refer any of n segments of 2k words each.
Justify with example.

Lecture #11
2.4.6.3 Indexed addressing mode
When the processor contains a single index register, the register is implicit and only
memory address is specified. The
From indexed addressing mode, two new modes are implemented.
2.4.6.3.1 Auto increment mode
To calculate the effective address, the content of register is added with the address
specified in the instruction . After that the content of register is incremented.
2.4.6.3.2 Auto decrement mode
The content of register is decremented first. The decremented value is added with the address
specified in the instruction to find the effective address.
2.4.7 Stack / implied addressing mode
In stack addressing mode, the address is not specified in an instruction. The effective
address is available in a special register called stack pointer. The 0-address instruction is
based on stack addressing mode. While executing the instruction, the processor refers the
stack pointer implicitly to fetch the operands.
Exercises based on addressing mode

Lecture #12
2.5 Instruction Types
Computers provide an extensive set instructions to give the programmer the flexibility to
carry out various computational tasks.
Most computer instructions are classified into 3 categories. They are
1) Data transfer instruction
2)Data manipulation instruction
3)Program control instruction
2.5.1 Data transfer instruction
These instructions are used to transfer the data from one location to another. Th transfer is
in between
Processor memory
Memory processor
Processor register
Register- processor
Processor I/O
I/O Processor etc.
Example :
LDA 1000; [1000] [Acc]
STA 1000; [Acc][1000]
MOV B,C; [B][C]
2.5.2 Data manipulation instruction
These instructions are used to perform operation on data and provide the computational
capabilities of a computer
These instructions are of 3 types.
a) Arithmetic instructions
ADD B; [B]+[Acc][Acc]
SUB B;[Acc]-[B][Acc]
b) Logical & bit manipulation instructions
AND B; [Acc] & [B][Acc]
ORA B ;[Acc] || [B][Acc]
c)Shift instructions
RLC ; Rotate left with carry
RRC; Rotate right with carry.
2.5.3 Program controlled instructions
Each time an instruction is fetched from memory, the PC is incremented . So that it
contains the address of next instruction in sequence, which is called straight line sequencing.
Straight line sequencing facilitates the sequential flow of execution. Some times there is a
need of changing the control to different location. That can be achieved by program
control instruction. A program control instruction is used to change the content of PC which
may cause the flow of control to be altered.
For example,
Unconditional instruction JUMP +2; [PC]+2[PC]
Conditional instruction JNZ address,
Machine control instruction HLT.

2.6 Instruction Set completeness


The set of instructions are said to be complete if the computer includes sufficient number
of instructions in each of 3 categories, i.e data transfer instruction, data manipulation
instruction and program control instruction.
So, a program is capable to evaluate any function which is known to be computable.
2.7 Machine instruction and Assembly language
Machine instructions are represented by patterns of 0s and 1s. Within a pattern, some part
represents the operation and others represent s the addressing information about the
operands. But it is very difficult to write and analyse a program using machine instructions.
Therefore, symbols are used to represent these patterns and these symbols are called
mnemonics.
For example, MOV N,R1;
ADD B,C etc.
A complete set of such symbolic names and rules for their use constitute a programming
language which is referred as assembly language.
The assembly language program is translated into a sequence of machine instructions by a
program called assembler. The assembly language program is called a source program and
the machine language program is called object program.
Instruction execution and straight line sequencing:
L et us consider a program for [C]=[A]+[B]. For this problem, the program will be
Address
Instruction
I
MOV A,R0
I+4
ADD B,R0
I+8
MOV R0,C
The program is stored at location I and each instructions length is 4 bytes. So at i+4, second
instruction and at i+8 , third instruction is stored.
To execute the program, the processor places the address I into program counter(PC). Then
the instruction is placed in IR which is carried out the instruction phase. Then the instruction
in IR is decoded. That means the operation required is found and operands are fetched and
the operation is performed which is carried out in execution phase. Then, we say than an
instruction is executed. The processor executes an instruction at a time using the content of
PC and then the execution of next instruction is carried out with the incrementing content of
PC, which is called straight line sequencing.
Subroutine:
A subroutine is a block of code which is used to solve a particular task. It is same as the
function in high level language program.
It is required when a particular task is needed repetitively by a program. So instead of
writing, same set instructions for a particular task in several times in a program, a single copy
is placed in memory to save space. Whenever it is needed, a program calls the subroutine
using a call instruction and that program is called calling program.
After the subroutine has been executed, the calling program must resume. Hence the
subroutine must contain a return statement to return to the calling program.
The execution of subroutine is carried by the call and return instruction using a dedicated
register called link register.

For example,
Address

Instruction(calli
ng program)
..
CALL AREA
Next instruction

200
204

Address

Instruction(subroutine
)

1000(AREA)
1004
1008

Inst-1
Inst-2
RETURN

1000
PC
LINK

204

204

The call instruction is a branch instruction (AREA is the branch address where the
subroutine is stored) that performs the following operations
Stores the content of PC i.e 200 into link register
Branch to the target address specified by the call instruction(the address is 1000). PC
is loaded with branch address i.e.1000.
The return is also a branch instruction which performs the following operations
Branch to the address contained in the link register. The content of link register i.e.
204 is loaded into PC.

Lecture #13
2.8 Basic Input/output operations
One of the basic feature of a computer is its ability to exchange data with I/O devices I.e;
the transfer of data between the processor and i/o devices.
Accessing I/O devices:
Let us take a simple arrangement of I/O devices with processor and memory. Is it possible to
connect the I/O device with a processor directly through a bus??

processor

I/O device1

memory

I/O Device-2

It may not be feasible and the reasons are


1) I/O devices are electromechanical and electromagnetic devices whose operation is
different from processor which is electronic device.
2) The data transfer rate of I/O device is slower than the processor. So there is a need of
synchronization mechanism.
3) Data format in I/O device is different from the format used by the processor.
4) The operating modes of each I/O device is different from other. So each must be
controlled by the processor separately without disturbing others.
To resolve the differences, a special hardware component called I/O interface or I/O module
is required to transfer the data between the processor and the device.
Processor

Address
decoder

Control
circuitry

I/O device

Data & status


register

Address decoder: It enables the device to recognize its address when the address appears in
the address line.
Data register: It holds the data transferred to or from the processor.
Status register: It contains the information relevant the operation from the processor.
Control circuitry: It is responsible to coordinate the I/O transfer.
I/O transfer mode or techniques:
The simplest way I/O transfer is program controlled I/O.
Program controlled I/O
Let us take a simple example how keyboard transfers a character to processor and then how
processor transfers to CRT.

PROCESSOR

DATAIN

SIN
Control
circuitry
KEY
BOARD

DATAOU
T
SOUT
Control
circuitry

CRT

When a key is pressed, the pressed value is stored in DATAIN and SIN is set to 1. When the
processor reads the value from DATAIN, SIN clears to 0.
Similarly, when SOUT=1, the CRT is ready to receive a character and when SOUT=0, the
CRT is bust to display the data,Then a program is implemented to facilitate the I/O transfer.
The program is as follows
READWAIT : If SIN=0, THEN Branch to READWAIT
MOVE DATAIN,R0; (R0 is a processor register)
WRITEWAIT: If SOUT=0,then Branch to WRITEWAIT
MOVE R0,DATAOUT
So to display the character on CRT from the keyboard, the processor repetitively checks the
status of SIN or OUT which is carried out by a program. That means , processor executes a
program that gives the direct control of I/O operation. Thats why , this type of I/O transfer is
called program controlled I/O.

2.9 Additional instructions


2.9.1Logical Shift instructions
There are two logical shift instructions LShift and RShift. These instructions shift an
operand over a number of bit positions specified in a count operand contained in the
instruction.
The general format of left shift instruction is
LShift count, dst
Where count can be an immediate data or the content of register and the result will be stored
in a destination register dst.
For example, LShift #2,R0;
1

So it is left sifted two times .


After shifting two times, the content of the register R0 is given below
1
0
0
0
1
The vacated position are filled with 0.
Finally the content of the register R0 is given below

1
0
0
0
1
1
0
Similarly, for right shift
RShift #2,R0
1
1
1
0
0
0
1
So it is right sifted two times .
After shifting two times, the content of the register R0 is given below

1
1
1
The vacated position are filled with 0.
Finally the content of the register R0 is given below
0

2.9.2 Arithmetic Shift instructions


Arithmetic Shift instructions are of two types whose format are given below
AShiftL count, dst
AShiftR count, dst
Left shift is equivalent to multiplication by 2 and right shift is equivalent to division by 2.
For example, AShiftL #1,R0;
0
0
0
0
0
1
Carry bit is repeated. The content of register R0 is given below

0
0
0
For example, AShiftR #1,R0;
0
0
0

Carry bit is repeated. The content of register R0 is given below


0

2.9.3 Rotate instructions


In shift operation , the bits shifted out of the operand are lost. To preserve the bits, rotate
instructions are used.
Two versions of rotate instructions are used for left and right rotate instructions. one is
without carry and other is with carry.
For example without carry
RotateL #1,R0
1

The final content of R0 is


0
0
0

For example with carry


RotateLC #1,R0
carry
0

After
shifting with carry, the content of R0 is
1

Similarly, for RotateR #1,R0 and RotateRC #1,R0 , the same operation will be carried out
with right shifting.

Lecture #14
MODULE-2
ARITHMETIC UNIT
3.1.Introduction
The content of memory word is
Instruction

fixed point number(integer)

Word
Numeric
floating point number
operand
character
Three types of schemes or system are used to represent fixed point numbers or integers. They
are
1. sign and magnitude
2. 1s complement
3. 2s complement
A 4-bit binary no is represented as follows
b3
b2
b1
b0
Sign &
1s complement 2s complement
magnitude
0
1
1
1
+7
+7
+7
0
1
1
0
+6
+6
+6
0
1
0
1
+5
+5
+5
0
1
0
0
+4
+4
+4
0
0
1
1
+3
+3
+3
0
0
1
0
+2
+2
+2
0
0
0
1
+1
+1
+1
0
0
0
0
+0
+0
+0
1
0
0
0
-0
-7
-8
1
0
0
1
-1
-6
-7
1
0
1
0
-2
-5
-6
1
0
1
1
-3
-4
-5
1
1
0
0
-4
-3
-4
1
1
0
1
-5
-2
-3
1
1
1
0
-6
-1
-2
1
1
1
1
-7
-0
-1
How do you represent a integer using three schemes ?
Take an integer -5.
Sign & magnitude representation is= 1101 where msb bit represents the sign. For positive , it
is 0 and for negative it is 1.

1s complement representation is
1s complement of +5=1s complement of(0101)=1010
2s complement representation is
1s complement+1= 1010+1=1011
Similarity & Difference among three representations
a. For positive integers, all representations are same. But for negative integers, all
representations are different.
b. For 0, there are two representation in sign & magnitude and 1s complement form. In
2s complement, there is only one representation.
c. In sign & magnitude and 1s complement form, the range of integers is from -7 to +7 .
But in 2s complement form, the range is from -8 to -7 for 4 bit binary numbers. That
means , more numbers are represented.
Q: 2s complement is used in digital computers. Give reasons.
The reasons are
1. There is a single representation for 0.
2. More integers are represented.
3. It provides an efficient way for addition and subtraction. A single adder circuit is
used for both operations.

Lecture #15
3.2 Design of Full Adder
Let us add two n bit nos i.e; xn-1 xn-2 x1 x0 with yn-1 yn-2 y1y0 .
xn-1 xn-2 x1 x0
+ yn-1 yn-2 y1 y0 .
Now bit by bit addition is carried out and each addition is a stage. x0 with y0, x1 with y1 .
xn-1 with yn-1 .
For each stage , there are four combinations 0 0, 01, 10 and 11 with either carry 0 or 1. So
the truth table is
xi
0
0
0
0
1
1
1
1

yi
0
0
1
1
0
0
1
1

Carry_in ci
0
1
0
1
0
1
0
1

Sum si
0
1
1
0
1
0
0
1

Carry_out ci+1
0
0
0
1
0
1
1
1

The expression for sum si = xi ex-or yi ex-or ci


The expression for Carry_out ci+1 = xi yi + xi ci + yi ci
So a single stage binary addition is implemented by the digital logic given below
Refer fig 6.2 (page 370)
Design of Fast Adder
3.2.1 n-bit ripple carry adder
To add two n-bit nos, a cascaded connection n full adder is needed whose figure is given
below
Refer fig 6.2 (page 370)
Here, until c1 is generated, no further addition is carried out.
3.2.2 cascade of k n-bit adder
If the input length is k n bits, then the addition is carried out by using k no of n-bit adder
whose figure is given below
Refer fig 6.2 (page 370)
Here, until cn is generated, no further addition is carried out.
3.2.3 modified n-bit ripple carry adder
The n-bit ripple carry adder can be modified to perform both addition and subtraction
whose figure is given below
Refer fig 6.3 (page 371)
When add/sub=0; it is addition. The no y is ex-ored with 0 , so the same no is coming out
and carry_in is set to 0.

When add/sub=1, it is subtraction. The no y is ex-ored with1, so 1s complement is coming


out and the carry_in is set to 1. So the 2s complement is added with x.

Lecture #16
3.3 Delay in n-bit ripple carry adder
For Sum si = 1 gate delay
For Carry_out ci+1 = 2 gate delay
There is a need of delay calculation of carries from c0,c1, .. upto c n-1 . There are n no of
carries out of which
For c0=0 delay
c1=2 delay
upto c n-1=2(n-1)=2n-2 delays
for s n-1 = 2n-2+1=2n-1 delays
for c n= 2n-2+2=2n delays
Take an example of 4 bit ripple carry adder

s0=1
c0=0
c1=2
s1=3
c2=4
s2=5
c3=6 ; 2n-2=24-2=6
s3=7; 2n-1=24-1=6
c4=8; 2n=24=8 where n is 4 bit.
Delay in modified n-bit ripple carry adder
c n-1=2(n-1)=1+2n-2=2n-1 delays ; 1 for ex-oring.
for s n-1 = 2n-1+1=2n delays
for c n= 2n-1+2=2n +1 delays
for checking the over flow=2n+1+1(for ex-oring c n and c n-1 ) = 2n+2 delays .
Why EX-OR is used to check the overflow?
How the delay is reduced?
There are two options to reduce the delays. They are
(i) to use faster possible electronic technology( faster gates)
(ii)to modify the logic gate network which produces the sum and carry with less delay
Under option (ii), the adder used is carry lookahead adder.

Lecture #17
3.4 Carry Lookahead Adder
A carry lookahead adder is used to speed up the generation of carry signals. the expressin for
sum and carry are used to implement the design the basic cell.
The logic expression for sum si= xi ex-or yi ex-or ci.
for Carry_out ci+1 = xi yi + xi ci + yi ci
For factoring Carry_out ci+1 = xi yi + ( xi + yi ) ci
=Gi +Pi ci
Where Gi and Pi are called the generate and propagate functions for stage i. The figure given
below the cell implementation FIG 6.4 (a) of Hamacher book.
Figure 6.4(a) page 373

ci+1= Gi +Pi ci
The cell is designed to generate the carry out with less delay.The various combinations of xi
and yi are checked.
1) case-1(xi = yi =1)
Carry out is 1 when Gi is 1 ,it is possible when xi = yi =1 irrespective of carry in 1 rr 0.
2) case-2( xi =0 , yi =1 or xi =1 , yi =0)
The propagate function from the expression Pi = xi + yi , but the propagate function
derived Pi = xi ex-or yi . This derived function generates carry out 1 if carry in is 1.And also
This function generates carry out 0 if carry in is 0.
That means using the value of xi , yi and carry in , the carry out will be calculated with less
delay.
3)case-3(xi = yi =0)
The carry out is always 0 whether carry in 0 or 1.
Then the 4 bit adder is designed which figure given below FIG 6.4 (b) page 373 of
Hamacher book.
Here all Gi and Pi are calculated in 1 gate delay because Gi = xi yi (using AND gate)
Pi == xi ex-or yi (EX-OR gate.)
All carries are calculated in 3 gate delays as ci+1= Gi +Pi ci (one AND delay for Pi ci and
another OR delay for Gi +Pi ci.
After all carries are generarted in 3 gate delays, one X-OR delay is required to generate the
all sums. That means 4 delays are required to generate the sums. The carries can be
implemented
C1 =G0 + P0C0
C2 =G1+ P1C1==G1+ P1(G0 + P0C0)= G1+ P1G0 + P1P0C0
C3 =G2+ P2C2= G2+ P2(G1+ P1G0 + P1P0C0)= G2+ P2G1+ P2P1G0 + P2P1P0C0

C4 =G3+ P3C3= G3+ P3(G2+ P2G1+ P2P1G0 + P2P1P0C0)= G3+ P3G2+ P3P2G1+ P3 P2P1G0 +
P3P2P1P0C0
All P0, P1, P2 , P3 and all G0, G1, G2 , G3 are generated in 1 gate delay.
3 gate delays for C1.
AND
P0C0

OR
with G0

3 gate delays for C2

AND
P1G0

AND
P1P0C0

OR
with
G1

3 gate delays for C3

AND
P2G1

AND
P2P1G0

OR with
G2

AND
P2P1P0C0

3 gate delays for C4


AND
P3G2
OR with
G3
AND
P3P2G1

AND
P3 P2P1G0

AND
P3P2P1P0C0

All carries are generated in 3 gate cycles and all sums takes 4th gate cycle i.e one gate cycle
for ex-or.
But in 4 bit ripple carry adder requires 7 gate delays for S 3 and 8 gate delays for C4.

Lecture #18
3.5 Integer Multiplication (positive / unsigned integers )
product of two n bit integers is accommodated in 2n digits. Let us multiply
13

11
------------------?
Manually we multiply, we get the correct result.
13 and its binary 1101 is the multiplicand(M) and 11 and its binary is the multiplier(Q).
Both are 4 bit integers and the product occupies 8 bit digit.
1 1 0 1
1 0 1 1
--------------------------------PP0
1 1 0 1
PP1
1 1 0 1
PP2
0 0 0 0
PP3
1 1 0 1
----------------------------------------------------------Product
1 0 0 0 1 1 1 1 (143)
Here PP0,PP1,PP2,PP3 are called the partial products.
For each step , multiplication of multiplicand by one bit of multiplier is performed.
If multiplier bit is 1, then multiplicand is appeared in appropriate position and added to the
partial product.
This multiplication is implemented using combinational circuit and also sequential circuit.
3.5.1 Using combinational circuit
Figure 6.6 of Hamacher(page-377)

The main component in each cell is a full adder FA.The AND gate in each cell determines
whether a multiplicand bit is added or a 0 is added
From the beginning, the partial product 0 0 0 0 is supplied.
If multiplier bit =1, then multiplicand 1101 is added to the partial product using FA.
If multiplier bit =0, then 0000 is added to the partial product using FA.
Disadvantage:
It has a long propagation delay.

Lecture #19
3.5.2 Using Sequential circuit
Figure 6.7 of Hamacher(page-379)

From the beginning, Q is loaded with multiplier.


M is loaded with multiplicand.
C and A are cleared to 0.
When multiplier bit=1, multiplicand is selected for addition and it is shifted.
When multiplier bit=1, 0 is selected for addition and it is same as No Add and then it is
shifted.
After 4 cycle, the product will be available.
Let us take the same example
1 1 0 1
1 0 1 1
--------------------------------From the beginning A=0 0 0 0
Carry C=0
st
1 cycle
Q=1,
0 0 0 0
1 1 0 1
--------------------------------C=0 1 1 0 1
Shift
0 1 1 0 1
2nd cycle
Q=1
1 1 0 1
----------------------------C=1 0 0 1 1 1
Shift
1 0 0 1 1 1
3rd cycle
Q=0
0 0 0 0
----------------------------C=0 1 0 0 1 1 1
Shift
0 1 0 0 1 1 1
4th cycle
Q=1
1 1 0 1
----------------------------C=1 0 0 0 1 1 1 1
Shift
1 0 0 0 1 1 1 1
So, After 4 cycles, the result is 10001111=143.

Lecture #20
3.6 Integer Multiplication (negative and positive / unsigned and signed integers )
Whats about signed integers?
Let us try
-4
3
----+4 = 0100
-4= 1s complement of 0100=1011 +1=1100
+3=011
1 1 0 0
0 1 1
----------------------------1 1 0 0
1 1 0 0
0 0 0 0
--------------------------------------------------1 0 0 1 0 0
The result 100100 is negative. Its 2s complement is positive i.e;
1s complement of 100100=011011+1= + 28, that means by conventional
multiplication, the signed integers produce the wrong result. Hence Booths algorithm is used
for multiplying the signed integers.
3.7 Booths algorithm
Example-1
Let us try
-4
3
----+4 = 0100
-4= 1s complement of 0100=1011 +1=1100
+3=011
1 1 0 0
0 1 1
----------------------------Step-1: Convert the digits to binary of equal length with sign bit.
1 1 0 0 (Multiplicand)
0 0 1 1 (Multiplier)
----------------------------Step-2: Encode the multiplier.
If LSB is 0, then replace it with 0.
If LSB is 1, then replace it with -1.

For other bits, follow the table


Bit i
Bit i-1
0
0
1
1

0
1
0
1

Bit i
replaced by
0
+1
-1
0

So the multiplication is carried out as follows


1 1 0 0 (Multiplicand)
0 +1 0 -1 (Multiplier)
----------------------------?1
?2
?3
?4
Step:3 Next double the length of the partial product. If MSB is 0, then append with 0 and if
MSB is 1, append with 1.
?1=Then 1 1 0 0 -1= -(1 1 0 0) 1=0 1 0 0 1= 0100
?2= 1 1 0 0 0= 0 0 0 0
?3=1 1 0 0 1 = 1 1 0 10
?4=1 1 0 0 0= 0 0 0 0
Then the multiplication as follows
1 1 0 0
0 +1 0 -1
----------------------------0 0 0 0 0 1 0 0
0 0 0 0 0 0 0
1 1 1 1 0 0
0 0 0 0 0
------------------------------------------------------------------------------1 1 1 1 0 1 0 0
So the result is negative as the MSB is 1.
Its 2s complement is =0 0 0 0 1 0 1 1+1=0 0 0 0 1 1 0 0(+12).
So the result is 1 1 1 1 0 1 0 0 is -12.
Example -2
Let us try
+4
+3
----+4 = 0100
+3=011
0 1 0 0
0 1 1
----------------------------Step-1: Convert the digits to binary of equal length with sign bit.
0 1 0 0 (Multiplicand)
0 0 1 1 (Multiplier)
-----------------------------

Step-2: Encode the multiplier.


If LSB is 0, then replace it with 0.
If LSB is 1, then replace it with -1.
For other bits, follow the table
Bit i
Bit i-1 Bit i
replaced by
0
0
0
0
1
+1
1
0
-1
1
1
0
So the multiplication is carried out as follows
0 1 0 0 (Multiplicand)
0 +1 0 -1 (Multiplier)
----------------------------?1
?2
?3
?4
Step:3 Next double the length of the partial product. If MSB is 0, then append with 0 and if
MSB is 1, append with 1.
?1=Then 0 1 0 0 -1= -(0 1 0 0) 1=1 1 0 0 1= 1100
?2= 0 1 0 0 0= 0 0 0 0
?3=0 1 0 0 1 = 0 1 0 0
?4=0 1 0 0 0= 0 0 0 0
Then the multiplication as follows
0 1 0 0
0 +1 0 -1
----------------------------1 1 1 1 1 1 0 0
0 0 0 0 0 0 0
0 0 0 1 0 0
0 0 0 0 0
C=1 0 0 0 0 1 1 0 0
The carry out bit=1 is neglected and the result is 0 0 0 0 1 1 0 0 and its positive as MSB is 0.
So the result is 0 0 0 0 1 1 0 0 which is +12.
Merits:
1) The Booths algorithm handles both positive and negative integers uniformly.
2) It achieves some efficiency in the no. of additions required when the multiplier has few no
+1 and -1.
Demerits
1)If the multiplier has a large no 1s(+1 and -1), that means the form of +1 -1 +1 -1 +1 -1 .
+1 -1 , then a lot of time taken for these multiplications.

Lecture #21
3.8 Bit pairing method
The no of multiplications can be reduced using paring the bits of multiplier.
Let us try
+13
- 6
----+13 = 0 1 101
--6= 1s complement of0 0110=11001 +1=11010
0 1 1 0 1
1 0 1 0
-------------------------------------Step-1: Convert the digits to binary of equal length with sign bit.
0 1 1 0 1(Multiplicand)
1 1 0 1 0 (Multiplier)
-----------------------------------------------------------------Step-2: Encode the multiplier.
The multiplier is 1 1 0 1 0.
Pair two bits from the LSB and also take a implied 0 bit. Then according to the table
below, replace the multiplier bit.
Bit i+1
Bit i
Bit i -1
Replacement
bit
0
0
0
0
0
0
1
+1
0
1
0
+1
0
1
1
+2
1
0
0
--2
1
0
1
-1
1
1
0
-1
1
1
1
0
0 1 1 0 1 (Multiplicand)
0
-1
-2 (Multiplier)
-----------------------------------------------------------------?1
?2
?3
For ? 1,
0 1 1 0 1
-2
------------------------------------------------------------------2= 2s complement +2= 1s complement of 0 0 0 10+1
= 1 1 1 0 1 +1= 1 1 1 1 0

0 1 1 0 1
1 1 1 1 0
-----------------------------------------------------------------Encode the multiplier using Booths table

0
1
0
0
0
1

0
1
0
0
0
1

0
1
0
0
0
1

0
1
0
0
0
1

0 1 1 0 1
0 0 0 -1 0
-----------------------------------------------------------------0 0 0 0 0 0
1 0 0 1 1
0 0 0 0
0 0 0
0 0
1 0 0 1 1 0

For ? 2,
0 1 1 0 1 -1= -(0 1 1 0 1) 1=1 0 0 1 1 1= 1 0 0 1 1
For ? 2,
0 1 1 0 1 0 =0 0 0 0 0
Now substitute the bit pattern for ?1, ?2 and ? 3
0 1 1 0 1 (Multiplicand)
0
-1
-2 (Multiplier)
-----------------------------------------------------------------?1
?2
?3
Hence,
0 1 1 0 1 (Multiplicand)
0
-1
-2 (Multiplier)
-----------------------------------------------------------------1 1 1 1 1 0 0 1 1 0
1 1 1 1 0 0 1 1
0 0 0 0 0 0
1 1 1 0 1 1 0 0 1 0
The carry out bit 1 is neglected and the result is 1 1 1 0 1 1 0 0 1 0. and which is negative.
Its 2s complement is= 0001001101+1=0001001110=+ 78 .Hence the result is -78.

Lecture #22
3.9 Integer Division
3.9.1 Restoring Division method
Refer the diagram 6.21 in page 391 of Hamacher book
An n bit positive divisior is loaded into register M and an n bit positive dividend is loaded
into register Q at the start of the operation. Register A is set to 0. After the division is
complete, the n bit quotient is in register Q and the remainder is in register A. The required
subtractions are facilitated by using 2s complement arithimatic.
The following algorithmn performs restoring division
Do the folowwing n times
1. Shift A and Q left one binary position.
2. Subtract M from A and place the answer back in A.
3. If the sign of A is 1, set q0 to 0 and add M back to A, otherwise set q0 to 1.
Refer fig 6.22 page 392

3.9.2 Non restoring division method


The following algorithmn performs non restoring division
Step: 1 Do the folowwing n times
1. If the sign of A is 0, shift A and Q left one bit position and subtract M from A ;
otherwise shift A and Q left and add M to A.
2. Now, if the sign of A is 1, set q0 to 1; otherwise set q0 to 0.
Step 2: if the sign of A is 1, add M to A.
Refer fig 6.23 page 393

Lecture #23
3.10 IEEE Standard for Floating point Number
A floating point number is represented by
(1) a sign bit(0 for positive and 1 for negative)
(2) a string of significant digits called mantissa
(3) scale factor or exponent which indicates the position of decimal point with respect to
mantissa.
IEEE standard describes the representation for single or double precision numbers.
3.10.1 single precision numbers
the format is given below
Sign Exponent
Mantissa
Total size is 32 bits out of which
1 bit for sign,(0 for positive , 1 for negative)
8 bits for exponent in Excess-127 FORMAT
23 bits for mantissa.
For example, represent -118.625 in IEEE Standard format.
(1) Determine sign bit. S=1.
(2) Convert the number into fixed point binary without using sign.
118=1110110
.625=101
So 118.625= 1110110.101
(3) The mantissa must be normalized, that means the decimal point is placed to the right of
the first non zero significant bit.
118.625=1110110.101 =1.110110101 2 6
Mantissa is=110110101
Exponent is =6, it will be represented in Excess-127 i.e, 6+127=133 = 10000101
(4) mantissa has to represented in 23 bits, that means it is represented as
11011010100000000000000. Remaining bits are padded with 0.
(4) So the format is as follows
1
10000101 11011010100000000000000
Example -2, to represent the number 0.0010110 2 9.
0.0010110 2 9 =0.0010110 2 9 2 3 =1.0110 2 12
Sign bit=0
Exponent=12+127=147=10010011
Mantisa=0110=01100000000000000000
So the format is as follows
0
10010011 01100000000000000000
3.10.2 Double precision numbers
the format is given below
Sign Exponent
Mantissa
Total size is 64 bits out of which
1 bit for sign,(0 for positive , 1 for negative)
11 bits for exponent in Excess-1023 format
52 bits for mantissa

Lecture #24
MODULE-3

CPU ORGANIZATION
4.1. CPU
a) it executes the instructions.
b) it communicates and controls all operations of other sub systems of computer
Because of its central role, it is called central processing unit.
4.2. Fundamental concepts
a) According to von Neumann / havard concept, the program has to be loaded into
consecutive memory locations.
b) For simplicity, each instruction occupies a single word.
4. 3. Steps taken by CPU
To execute an instruction, CPU performs the following steps
1) Fetch the content of memory pointed by PC and then it is loaded into IR.
[IR][[PC]]
2) Increment PC
[PC][PC]+1
3) Carry out the actions specified in IR.
Processor takes the following 3 steps to execute any instruction.
Step 1 and 2 is referred as fetch phase and the time taken by this phase is called fetch cycle.
Step 3 is referred as execution phase and the time taken by this phase is called execution
cycle.
Instruction cycle= fetch cycle + execution cycle
4.4 Internal organization of CPU(using single bus)
Block diagram (next page)of CPU organization

Description
ALU and CPU registers are connected through a single bus.
This bus is internal to CPU.
The external memory is communicated through MAR and MDR.
The number and functionalities of general purpose register R0 to Rn-1 varies from
processor to processor.
Some of the general purpose registers are used as dedicated registers like accumulator,
index register, base register, stack pointer etc.
Y, Z, TEMP registers are not directly referenced by the instruction, they are called implied
registers.
IR and instruction decoder are integral part of CPU circuitry.

PC

IDR &CU

IR
MAR

R0
MDR

MUX

R n-1
ALU

TEMP

Lecture #25
4.5 . Simple / primitive functions performed by the processor
a) Fetch the word from memory and loaded into CPU register.(memory read)
b) Store a word of data from CPU register to memory location(memory write)
c)Transfer a word of data from one register to another.
d)perform the arithmetic / logical operation and the result into a CPU register.
4.5.1 Fetching a word from memory
Where from the CPU gets the address for memory read?
1. It can be the content of PC which is available during fetch phase.
OR
2. It can be content of IR(if the instruction is based on direct or indirect) which is
available in execution phase.
3. it can be content of a register which is known at the execution phase.
Then what are the steps needed to perform the memory read?
a) CPU transfers the address into memory which is connected to memory.
b)CPU sends a read signal to memory to indicate that a read operation is needed by the
processor.
c)After issuing the read request, the CPU normally waits . It receives an answer from
memory through a signal MFC(memory function completed).
d)when MFC is set to 1, the fetched instruction / data is loaded into MDR and it is within
processor.
Example:
Let the content of R1 is an address and the content of address is transferred to R2.
Steps needed
1. [R1][MAR]
2.Read
3. Wait for MFC
4.[MDR][R2]
Inferences:
1. The duration of step3 is dependent upon the memory technology used. If menmory is fast,
the time needed to perform the operation is less and if memory is slow, the time is more.
2.The overall performance of executing the instruction is reduced by reducing less no of
memory reference.
If an instruction has more memory reference it is slower
If an instruction has less memory reference it is faster.
4.5.2 Storing a word into memory
Steps needed
1. The address is loaded into MAR.
2.The data is loaded into MDR and Write signal is forwarded from processor to
memory.
3. Wait for MFC

Example,
Let [R1]= address
[R2]=Data
1. [R1][MAR]
2.[R2][MDR], Write
3. wait for MFC

Lecture #26
4.5.3 Register transfer
To enable data transfer between registers input and output gating signals are required.
To input data to Ri , Ri in signal is used.
To output data from Ri , Ri out signal is used.
Ri in =1, data is loaded to Ri from bus.
Ri out =1, data is loaded from Ri to bus.
For example [R1][R2]
1. R1 out =1
2. R2in =1
How this gating signal is used for data transfer?
Using the AND-OR circuit
Block diagram(Morris Mano)
Explanation
Using MUX
Block diagram figure 7.2 page 416
Explanation
4.5.4 Arithmetic / logic operation
To perform the arithmetic / logic operation , the processor takes the following steps
1. Two operands are made available to two inputs of ALU.
2.ALU gets the operands from Y and another from the bus.
3.The operation is carried out and the result is stored in Z temporarily and then it is
forwarded to the destination register.
For example,
[R1]+[R2][R3]
1. R1 out, Z in
2. R2 out,Add, Z in
3. Zout , R3 in
For above control signals, the block diagram is given.
To differentiate between the Y and constant 1 , a MUX is used whose block diagram is given.
1. R1 out, Z in
2. R2 out,Select Y,Add, Z in
3. Zout , R3 in

Lecture #27
4.6. Execution of a complete instruction
Let the instruction is Add R1,R3 ; [R1]+[R3][R3]
The control signals required are
1. PC out, MAR in, Read, Select 1,Add,, Z in
2. Z out, PC in ,WMFC
3. MDRout , IR in
4. R1 out, Y in
5. R3 out,Select Y,Add, Z in
6. Zout , R3 in ,End
Similarly Add (R3),R1
Control signals required to execute following instructions
1. LDA 1000
2.STA 1000
3.CMP R1
4.MUL R1,R2

Lecture #28
4.7 unconditional and conditional branch instructions
Control signals required for unconditional and conditional branch instructions
Let the instruction is JUMP +2
The control signals required are
1. PC out, MAR in, Read, Select 1,Add,, Z in
2. Z out, PC in ,WMFC
3. MDRout , IR in
4. PCout , Y in
5. Offset_field_of_IR out, Select Y,Add,Z in
6. Z out,PC in,End
The no of steps can be minimized if the control signals are slightly altered
1. PC out, MAR in, Read, Select 1,Add,, Z in
2. Z out, PC in , Y in ,WMFC
3. MDRout , IR in
4. Offset_field_of_IR out, Select Y,Add,Z in
5. Z out,PC in,End
Before writing the control signals required for a conditional instruction, you must know the
use of conditional instruction.
Lets consider the problem to add N no integers which are stored in consecutive memory
locations. Where NUM1,NUM2,NUM3..NUMn and SUM are memory addresses.
1st approach(using sequential instructions)
Memory address Instruction
100
MOV NUM1,RO
101
ADD NUM2,R0
102
ADD NUM3,R0
..
.
..

..
.
ADD NUMn,R0
MOV R0,SUM
SUM
Result
NUM1
Op1
NUM2
Op2
NUM3
Op3

NUMn

Opn

2nd approach(using conditional instruction)


Memory address Instruction
100
MOV N,R1
101
MOV NUM1,R2
102
CLEAR R0
103
ADD (R2),R0
104
INCR R2
105
DECR R1
106
JUMP -4
107
MOV R0,SUM
SUM
Result
NUM1
Op1
NUM2
Op2
NUM3
Op3

NUMn

Opn

How the offset of JUMP instruction has found?


if R1 is not equal to zero, then the control has to transfer to 103. Then how 103 has found??
When the JUMP instruction is executed, the content of PC is 107. Then if -4 is added with
the content of PC, then the result is 103. So the offset used is -4.
The purpose of JUMP instruction is it jumps to 103 if R1 is not equal to zero(means some
integers are still not added) or it follows the next instruction if R1 is equal to zero(all integers
are added).
Does JUMP instruction checks the content of R1 during execution.

Lecture #29
4.8 Design of Hardwired Control unit:
To execute an instruction, CPU must have some means of generating the control signals in
proper sequence. Computer designers uses two types techniques to build the control unit.
They are
(1)Hardwired control unit
(2)Microprogrammed control unit.
Hardwired control unit:
Here, The control signals are generated with the help of digital elements like flip flops,
logic gates, encoder, decoder, counter, multiplexor, demultiplexor etc.
Let us consider the instruction set of the processor consists of two instructions. They are
(1) Add (R3), R1
(2)JUMP +2
The control steps for above two equations are as follows
Add (R3), R1
JUMP +2
1. PC out, MAR in, Read, Select 1,Add,, Z in
1. PC out, MAR in, Read, Select 1,Add,,
2. Z out, PC in , y in ,WMFC
Z in
3. MDRout , IR in
2. Z out, PC in ,Yin,WMFC
4. R3 out, MAR in, Read
3. MDRout , IR in
5. R1 out, y in, WMFC
4.Offset_Field_of_IR out,Select Y,Add, Z
6. MDRout,Select Y,Add, Z in
in
7. Zout , R1 in ,End
5. Zout , PC in ,End
Add instruction has finished in 7 steps and Jump instruction has finished in 5 steps. To
count the steps, a control steps counter is used and each count of the counter corresponds
one control step.
For simplicity, we have taken that each step has to be finished in equal time slot i.e one
clock period.
To provide a separate signal for each step, a step decoder is used.
The step decoder produces the time slots T0,T1,T2,T3,T4,T5,T6,T7 as we are using a 3 bit
control step counter .As Add instruction needs T0 to T6 and Jump instruction needs T0 to T4 .
So T7 is not required at all and it can be grounded.
From the beginning, the step decoder produces the output and after 3 steps the content of
IR can be Add or Jump.
Hence for each control signal, a Boolean expression is formulated which is the function of
following inputs
content of step decoder(T0,T1,T2,T3,T4,T5,T6,T7)
content of IR(either Add or Jump)
content of flags
external inputs like MFC
diagram 7.11(page no 427)
Then how the Boolean expression is formulated?
For End signal, it is high in T6 for Add OR it is high in T4 for Jump. So the expression is
End= T6 .Add + T4 .Jump

The digital circuit needed to generate the End signal is as follows

Figure 7.13 page 428


For Zout signal, it is high in T1 irrespective for any instruction OR it is high in T 6 for Add OR
it is high in T4 for Jump. So the expression is Zout = T1+T6 .Add + T4 .Jump
The digital circuit needed to generate the Zout signal is as follows
Diagram
So, all the control signals are generated with the help of digital circuits.
Merits:
For each control signal, a physical circuit is implemented. It has produced the fastest mode of
operation.
Demerits:
1) To generate a large no of control signals, the digital circuit needed is very complex.
2) Modification is not possible.

Lecture #30
4.9 Multiple bus CPU Organization
The no. of control signals can be reduced by using multiple bus organization and the block
diagram is given below
Block diagram Figure 7.8 page 424
Description
Register files: It has three ports. Through two ports, two registers are placed their content
into buses A and B and another port is used to place the content of bus C into another
register. All are carried out in a single clock cycle.
Bus A and Bus B are used to transfer the operands to two inputs of ALU and the result is
transferred to bus C. All are carried out a single clock cycle.
Sometime ALU passes one of the input to bus C by using the control signals R=A(from
bus A to bus C) or R=B(from bus B to bus C).
An incrementor unit is used to increment the content of PC only.
Sometime there is a need of incrementing the content of register, in that case a
multiplexor is used to select the constant 1.
Let us consider an instruction ADD R1,R2,R3; [R1]+[R2][R3]
Control steps using single bus organization Control steps using multiple bus
organization
1. PC out, MAR in, Read, Select 1,Add,, Z in
1. PC out, MAR in, Read, IncermentPCn
2. Z out, PC in ,WMFC
2.,WMFC
3. MDRout , IR in
3. MDRout ,R=B, IR in
4. R1 out,Y in
4.R1 outA, R2 outB, Select A, Add,
5. R2 out,Select Y, Add, Z in
R3 in ,End
6. . Z out, R3 in, End
We have observed using multiple bus organization less no. of control signals are
required by comparing with single bus organization.
4.10 Micro programmed Control unit
Let us consider the instruction set of the processor consists of three instructions. They are
(1) Add (R3), R1
(2)JUMP +2
(3)JNZ -4
So the instruction set of the processor consists of 3 instructions.
The control steps for above three instructions are as follows
Add (R3), R1
JUMP +2
JNZ -4
1. PC out, MAR in,
1. PC out, MAR in, Read,
1. PC out, MAR in, Read, Select
Read, Select 1,Add,, Select 1,Add,, Z in
1,Add,, Z in
Z in
2. Z out, PC in ,Yin,WMFC
2. Z out, PC in ,WMFC
2. Z out, PC in , y in
3. MDRout , IR in
3. MDRout , IR in
,WMFC
4.Offset_Field_of_IR
4.Offset_Field_of_IR out,Select
3. MDRout , IR in
,Select
Y,Add,
Z
Y,Add, Z in , if Z=0 then End.
out
in
4. R3 out, MAR in,
5. Zout , PC in ,End
5. Zout , PC in ,End
Read
5. R1 out, y in, WMFC
6. MDRout,Select

Y,Add, Z in
7. Zout , R1 in ,End

So Add instruction requires 7 control words and Jump and JNZ instruction requires 5 control
words. Then three microroutines are kept in the control store which is as follows
Addre Microinstruction
ss
00
PC out, MAR in, Read, Select 1,Add,, Z in
01
Z out, PC in , y in ,WMFC
02
MDRout , IR in
03
Branch to starting address of
appropriate microroutine(micro branch
instruction)
04
R3 out, MAR in, Read
05
R1 out, y in, WMFC
06
MDRout,Select Y,Add, Z in
07
Zout , R1 in ,End
08
If z=0, then branch to microinstruction
at 00(micro branch instruction)
09
Offset_Field_of_IR out,Select Y,Add, Z
in

10
Zout , PC in ,End
The content of control is the collection of microinstructions, that means in address 00, the
control word or microinstruction is kept. That means the microinstruction [PC out, MAR in,
Read, Select 1,Add,, Z in] is kept. So the binary pattern [0111000111000000] is kept.
Similarly in address 01, the microinstruction [Z out, PC in , y in ,WMFC ] is kept so the binary
pattern [0100001000100010] is kept. So the content of control store is as follows.
Microinstruction
0111000111000000
0100001000100010
..
..
.

4.11 WILKESS MICROPROGRAMMED CONTROL UNIT


In 1951, M.V.Wilkes proposed a microprogrammed control unit and the microinstruction has
two parts. They are
1.A set of control signals (like PC out, MAR in, Read, Select 1,Add,, Z in ) that indicate the
control lines to be activated.
2An address field that indicates the address of next microinstruction to be fetched.
So the format of microinstruction is

Microinstruction

Address of next
microinstruiction

So the 1st microinstruction is as follows


PC out, MAR in, Read, Select 1,Add,, Z

01

in

Or
0111000111000000

0001

Here, the control memory is organized in ROM which contains two matrix like structure, left
matrix is used for control words and right matrix is used for address of next
microinstruction. The organization is as follows

ext
PC in PC out MAR inReadMDRout IR in y in Select Add Z in Zout R1 out R1 in R3 out WMFC End
CMAR

AD
D.
DEC
ODE
R

Microbranch
Microbranch
S

control signals
Each row represents the microinstruction and a register, called control memory address
register(CMAR) which keeps the address of current microinstruction. From the beginning,
CMAR is loaded by the external source. So the first address of control memory (i.e 0000) is
loaded. Next the CMAR is loaded by the microinstruction. 4th microinstruction loads the
CMAR with an address 04(0100) or 08(1000) or 09(1001) which depends on the content of
IR. So a switch is used to decide the address among 3 addresses. Similarly another switch is
used decide the addresses 08(1000) or 09(1001).

Lecture #31
Memory Organization
5.1 What is the need of memory?
According to stored program or Von Neumann concept, program and data have to be kept in
memory for the execution of the program. So memory is needed to execute the program.
5.2 What should the characteristics of a memory?
The execution speed of the program is dependent upon the speed with which
data and program can be transferred between CPU and memory.
The size of memory should be large enough to hold the program.
Cost of memory should be less.
Henceforth, the memory should be fast, large and cheap. But it is impossible to meet all
these three requirements. So much more attention is given towards memory organization.
5.3 Fundamental concepts
The no of addressable locations are dependent upon the address lines of the processor. That
means
10 address lines= 1k address space
Modern computers are byte addressable. Every byte has an address.
System view
address lines
= unidirectional
data lines = bidirectional
control lines = bidirectional
Processor

Address Lines

Memory

Data Lines
Control lines

MAR and MDR are responsible to transfer the data between the processor and memory.
5.4 Parameters used to measure the performance of memory
Memory Access time: The time elapsed between the read/ write and MFC signal
ts termed as memory access time.
Memory cycle time: The time elapsed between the consecutive read or write
operations is termed as memory cycle time.
Tranfer rate: The no bits transferred per cycle time is termed as transfer rate.
5.5 Types of memory components:
1. Registers
2. Cache
3. Main memory
4. secondary memory

reg
cache
primary memory
secondary memory

Size decreases from 1 to 4.


cost increases from 1to 4.
speed decrease fro 1to 4.

reg< Cache< Main memory< Secondary memory


reg.> Cache>Main memory> Secondary memory
reg.> Cache>Main memory> Secondary memory

5.6 Types of memory used in main memory


5.6.1Random Acces Memory(RAM)
In RAM, any location can be referenced for a read or write operation in a fixed amount of
time which is independent of location.
5.6.1.1 Internal organization
Memory cells are organized in the form of an array. Each cell is a flip flop which is capable
of storing 1 bit of information either 0 or 1.
Let us consider a 168 RAM . It has 16 locations and each location carries 8 bit of
information. Though there are 24 =16 locations, so 4:16 decoder is used to identify one of the
16 addresses for read or write operation.
Figure 5.2 page-296
All cells of the row are connected by the word line which is driven by the 4:16 decoder. So
there are word lines from w0 to w15.
All cells are of a column are connected to a sense/ write circuit by two bit lines of the cell. So
there are eight sense / write circuits.
Read operation
1. Address is placed by the processor.
2. Particular word line is selected by the decoder,
3. Depending upon the state of the cell, the sense / write circuit transfers the state to the
output lines.
Write operation
1. Address is placed by the processor and data (state 0 or 1)is placed on the lines
which are connected to sense / write circuit.
2.
Particular word line is selected and all the cells of that row are updated by the
sense / write circuit.

Lecture #32
5.6.1.2 RAM Design
Design the RAM of following specifications
(a) 168 bit
(b)1288 bit
(c)1288 bit organization using 168 bit RAM
5.6.1.3 Types of RAM
RAM is of two types depending upon the technologies used and they are
(1) Static RAM(SRAM)
(2) Dynamic Ram(DRAM)
5.6.1.3.1 SRAM:
In SRAM, the state 0 or 1 is stored due to the conduction of transistors.
Figure 5.4 of page 298
Internal Organization
(1)Two invertors are cross connected to form a latch.
(2)The latch is connected to two bit lines b0 and b0 through the transistor T1 and T2.
(3) The transistors are used as switches which are opened or closed under the control of word
line.
(4)When wordline= 1(activated), the transistors are on, the cells are read or written into.
(5)When wordline= 0(not activated), the transistors are off, the cells or latch retains its state.
Read operation
Let the cell is in 1 state(set), so X=1 and Y=0.
(1)Address is placed by the processor and word line is selected.
(2)The transistor T1 and T2 starts conducting.
(3) Though X=1 and Y=0, high voltage is available in b0 and low voltage is available in b0 .
(4)The sense / write circuit monitors the bit lines and transfers the state 1 to the output line.
Write operation
Let the state 0 is written into the cell.
(1)Address is placed by the processor and word line is selected. The data(start 0) is put into
the lines which is connected to sense / write circuit.
(2)The transistor T1 and T2 starts conducting.
(3) The sense / write circuit places a low voltage(0) in b0 and high voltage(1) in b0 that
makes the latch reset. So state 0 is written into the cell.

Lecture #33
5.6.1.3.2 DRAM:
In DRAM, the state 1 or 0 is stored due to the capacitance of the capacitor. The presence or
absence of charge is interpreted as state 1 or 0. Capacitor has a natural tendency to discharge.
Hence each cell must be periodically refreshed by restoring the charge of the capacitor to its
original value.
Figure 5.6 of page-299
Here a memory cell consists of a capacitor C and a transistor T. The transistor is used as
switch which is turned on or off when the word line is activated or not.
Write operation
(1) To write into a particular cell, address is placed by the processor and the word line is
activated.
(2) The transistor starts conducting.
(3) An appropriate voltage(high for state 1 or low for state 0) is applied to bit line.
(4) That voltage causes the capacitor to charge depending upon the voltage applied.
(5) If high voltage is applied, then more charge is stored. We can say that 1 is written into
the cell. Similarly if low voltage is applied, then less charge is stored. We can say that 0
is written into the cell.
Read operation
1) To write into a particular cell, address is placed by the processor and the word line is
activated.
2) The transistor starts conducting.
3) The voltage drop is available in the bit line and that voltage may be decreased due to
the leakage factor. The sense / write circuit checks the threshold value.
4) If it is above the threshold value, the sense / write circuit drives the bit line to full
high voltage to recharge the capacitor which represents state 1.
5) If it is below the threshold value, the sense / write circuit drives the bit line to full low
voltage to recharge the capacitor which represents state 0.
6) So a refreshing circuit writes back the value that has read.
5.6.2Read Only Memory
In SRAM / DRAM, When the power is off, the stored program or data are lost. So they
are called volatile memory.
But ROM is a non volatile memory. It keeps the information even if the power is off. This
memory is meant for reading only and the information is written permanently during the
manufacturing time.
Organization:
Figure 5.12 of page- 310
If there is a connection between transistor and ground, the voltage over the bit line drops to
zero which is treated as 0 state.
If there is no connection , the bit line remains in high voltage which is treated as 1 state.

Lecture #34
5.7Memory Hierarchy
Each type of memory has its own merits and demerits. SRAM is used for the smaller size as
the single cell implementation is costly but it is faster. DRAM is used for the larger size as
the single cell implementation is cheap but it is slower. So different types of memory are
arranged in such a way that the instruction or data is available to the processor as soon as
possible. The arrangement is called memory hierarchy which is given below
Figure 5.12 of page-314
5.8 Cache Memory and it's need
CPU is always faste the imbalance, that means CPU issues a memory request and waits for
several CPU cycles for reading or writing operation. To improve CPU performance, a small
fast memory called cache which is placed between processor and main memory to get almost
the speed of a fast memory and capacity of a large memory at moderate price. The
effectiveness of cache memory is based on the property of the program which is called
locality of reference.
5.8.1 Locality of reference:
Locality of a reference is a property of a program and that property says many instructions
in a localized areas of a program are executed repeatedly during some period of time.
Locality of reference is of two types and they are temporal Locality of reference and
temporal locality of reference.
Spatial Locality of reference: According to spatial locality of reference, once a location is
referenced, its nearby locations is referenced very soon. This behavior can be achieved by
the programming constructs like sequential instructions, linear data structure like array, etc.
Cache operation:

CPU does not know the existence of cache. It simply sends the read/ write request with
main memory address. The cache control circuitry checks that whether that address is present
in the cache or not.If it is present in the cache, then the read or write operation is carried out
from the cache which is called hit.
If it is not present in the cache then it is called miss and the main memory must be
referenced.

Lecture #35
5.9 CACHE MAPPING TECHNIQUES
When there is a miss, a block is moved from main memory to cache. There must be a way to
place the main memory block into cache. This can be achieved by different mapping
technique which are implemented by cache. The techniques are
(1) Direct mapping
(2) Associative mapping
(3) Set Associative mapping
5.9.1 Direct Mapping
Here, every main memory block has a fixed location in cache.
Let us consider the main memory has 32 locations . So the size of main memory is 32 word.
5 address lines are required to identify any of 32 addresses. That means the processor places
a 5 bit address for any memory reference.
Let each main memory block has 2 words. So there are 16 main memory blocks i.e; block 0,
block 1,., block 15.
Cache is smaller than main memory and let the cache size is 8 word. So there are 4 cache
blocks i.e; block 0,block 1, block 2 and block3.
Diagram
A mapping function is used to map main memory block into cache block and the mapping
function is the main memory block i is mapped to (i% total cache blocks) block of cache.
So the mapping function is i%4.
Main Memory Block No(i)
Cache block no=i%4
0/4/8/12
0 (main memory block 0 or 4 or 8 or 12 is mapped to cache block 0)
1/5/9/13
1 (main memory block 1 or 5 or 9 or 13 is mapped to cache block 0)
2/6/10/14
2(main memory block 2 or 6 or 10 or 14 is mapped to cache block 0)
3/7/11/15
3(main memory block 3 or 7 or11 or 15 is mapped to cache block 0)
So each main memory block has a fixed cache block.
Processor places 5 bit address(00000 to 11111) for memory reference. From that address, the
cache control circuitry determines whether there is a hit or miss. It must decode or interpret
the address by using the address layout. Then which address layout is followed by the cache
control circuitry?
To find the address layout, the addresses of different main memory blocks are observed. The
addresses of all blocks are
Block 0
00000
00001
Block1
00010
00011
Block2
00100
00101
Block3
00110
00111
Block4
01 0 0 0
01001
Block5
01010
01011

We know block 0 / 4 is mapped block 0 of cache and block 1 / 5 is mapped block 1 of cache.
From the addresses, we found the LSB bit determines the 0th or 1st word of that block. Next 2
LSB bits determines the cache block no. The MSB 2 bits determine the tag bits. So the
address lay out is as follows
Tag(2)

Cache(2)

Word(1)

Tag bits vary from block to block which are mapped to same cache block and the tag bits are
kept in the cache to determine the hit or miss.
Diagram
When there is a memory reference by the processor, the tags from the address and that of
stored in the cache are compared. If there is a match, then it is a hit. The desired word is
fetched from the cache. If there is a mismatch, then it is miss, the word is fetched from main
memory.
Advantage:
It is simple and inexpensive.
Disadvantage
If a program happens to repeatedly reference words from two different blocks that have
mapped to same cache block, then that block will be swapped into the cache from main
memory and vice versa repeatedly which decreases the cache performance.
5.9.2 Associative Mapping
It overcomes the disadvantage of direct mapping by permitting each main memory block to
be loaded into any cache location. Here the cache control circuitry uses the following address
layout
Tag(4)
word(1)
Advantage: It is flexible with high performance.
Disadvantage: Complex circuit is needed to check the tags parallely.

Lecture #36
5.9.3 Set associative Mapping
In set associative mapping technique, every main memory has a fixed set location and within
the set that block is placed anywhere.
Let us consider the main memory has 32 locations . So the size of main memory is 32 word.
5 address lines are required to identify any of 32 addresses. That means the processor places
a 5 bit address for any memory reference.
Let each main memory block has 2 words. So there are 16 main memory blocks i.e; block 0,
block 1,., block 15.
Cache is smaller than main memory and let the cache size is 8 word. So there are 4 cache
blocks i.e; block 0,block 1, block 2 and block3.
Diagram
Let the cache blocks are grouped into 2 sets i.e; set 0 and set 1. Each set contains 2 blocks.
A mapping function is used to map main memory block into cache block and the mapping
function is the main memory block i is mapped to (i% total sets) set of cache and within the
set any block is occupied.
So the mapping function is i%2.
Main Memory Block No(i)
set no=i%2
0/2/4/6/8/10/12/14 0 (main memory block 0 or2or4or6or8or10or12or14 is mapped to set 0)
1/3/5/7/9/11/13 /151 (main memory block 1 or3or5or7or9or11or13or15 is mapped to set 10)
So each main memory block has a fixed set.
Processor places 5 bit address(00000 to 11111) for memory reference. From that address, the
cache control circuitry determines whether there is a hit or miss. It must decode or interpret
the address by using the address layout. Then which address layout is followed by the cache
control circuitry?
To find the address layout, the addresses of different main memory blocks are observed. The
addresses of all blocks are
Block 0
00000
00001
Block1
00010
00011
Block2
00100
00101
Block3
00110
00111
Block4
01 0 0 0
01001
Block5
01010
01011
We know block 0 / 2 is mapped to set 0 of cache and block 1 /3 is mapped to set 1 of cache.
From the addresses, we found the LSB bit determines the 0th or 1st word of that block. Next 1
LSB bits determines the set no. The MSB 3 bits determine the tag bits. So the address lay out
is as follows
Tag(3)
Set(1) Word(1)

Tag bits vary from block to block which are mapped to same set and the tag bits are kept in
the cache to determine the hit or miss.
When there is a memory reference by the processor, the tags from the address and that of
stored in the cache are compared. If there is a match, then it is a hit. The desired word is
fetched from the cache. If there is a mismatch, then it is miss, the word is fetched from main
memory.

Class-37
5.10 Cache Updation Schemes
For a memory reference by a processor, if that reference is available in the cache, then it is hit
otherwise it is miss. Hit is of 2 types and they are read hit and write hit. Similarly miss is of
2 types and they are read miss and write miss.
5.10.1 Read hit
The processor wants to read a location and that location is available in cache, it is read hit.
The main memory is not involved and the cache is read from.
Write hit
5.10.2 Write hit
The processor wants to write a location and that location is available in cache, it is write hit.
For write hit, two types of protocols are used.
(1)Write through
The cache and main memory locations are updated simultaneously. This protocol is simpler
but it requests unnecessary write operations in the main memory when a given word is
updated repeatedly.
(2)Write back
Only the cache location is updated. When the block is removed from the cache, the main
memory location is updated. This protocol also requests unnecessary write operations when a
cache block is written into main memory .All the words of that block is written even if a
single word has been changed.
5.10.3 Read Miss
The processor wants to read a location and that location is not available in cache, it is read
miss. For read miss, two types of techniques are used.
(1)When a read miss occurs, the block containing the requested word is copied from main
memory to cache. After that the requested word is forwarded to the processor.
(2) When a read miss occurs, the block containing the requested word is forwarded to the
processor from main memory .After that the main memory block is copied into cache for
future reference.
5.10.4 Write Miss
The processor wants to write a location and that location is not available in cache, it is write
miss. For write miss, two types of techniques are used.
(1)When write miss occurs, if write through protocol is used, then the information is written
directly into main memory.
(2)In case write back protocol, the block is loaded from main memory to cache and then the
desired word is written.
5.11 Cache performance
Hit Rate: The total no hits divided by the total memory references of a program is called hit
rate.
Miss Rate: The total no misses divided by the total memory references of a program is called
miss rate.
Miss penalty: The time required to access memory due to miss is called miss penalty.
Let a program consists of 10 instructions out of which 4 instructions experience hit and 6
instructions experience miss.

Let t cache = Cache access time


t M = Memory access time
Total time= 4 tcache + 6 tM
Average access time = (4 tcache + 6 tM ) / 10
= (4/10)tcache + (6/10) tM
=h tcache +(1-h) tM

Class-38
5.12 Cache block replacement algorithm
When a miss occurs, a memory block has to be placed in the cache memory. For that, the
cache control circuitry decides which cache block should be removed to create a new space
for the incoming main memory block. The rule to remove cache block is called cache
replacement algorithm.
(a)In direct mapped cache, the position of each main memory block is fixed. So when the
main memory block is going to be mapped, if the cache block is already occupied, then that
occupied block has to be removed. So no replacement algorithm is used for the removal
decision.
(b) In associative mapped cache, if the cache is full, then replacement algorithm is used for
removing the cache block.
(c)In set associative mapped cache, if the set is full, then replacement algorithm is used for
removing the cache block.
One most used replacement algorithm is LRU(Least Recently Used). In this approach, the
block which is not referenced for a longer period of time, is removed from the cache.
Discuss the page replacement algorithm.
Ans:
The page replacement algorithm decides which memory page is removed when memory
block needs to be allocated. There are different types of page replacement algorithms
described as follows:
5.11.1 FIFO(First In First Out)
A FIFO replacement algorithm associates with each page the time when that page was
brought into memory. When a page must be replaced, the oldest page is chosen to swap out.
7 0 1 2 0 3 0
4
2 3 0 3 2 1
2 0
1 7 0 1
7 7 7 2
2 2
4
4 4 0
0
0
7 7 7
0 0 0
3 3
3
2 2 2
1
1
1 0 0
1 1
1 0
0
0 3 3
3
2
2 2 1
5.11.2 OPT(Optimal )
In this approach, the page is replaced which will not be used for the longest period of time.
7 0 1 2 0 3 0 4 2 3 0 3 2 1
2 0 1
7 0 1
7 7 7 2
2
2
2
2
7
0 0 0
0
4
0
0
0
1 1
3
3
3
1
1
5.11.3 LRU(Least Recently used)
In this approach, the page which has not used for a longest period of time, is replaced.
7
7

0
7
0

1
7
0
1

2
2
0
1

3
2
0
3

4
4
0
3

2
4
0
2

3
3
0
2

1
3
1
2

0
0
1
2

7
0
1
7

Class-39
5.13 Virtual Memory
In most modern computer system, the physical main memory is not as large enough as the
address space issued by the processor. The reasons are
(1) It is not economical to keep a large physical main memory.
(2) Secondly, at any point of time processor requires a small portion of the program
for execution.
Hence, the whole program is kept in secondary memory and a portion of the program that is
executed first, is brought to main memory for execution. After that portion is executed, a new
potion(segment) is to be moved from secondary memory to main memory that replaces the
already existing portion(segment) from the main memory.
So virtual memory is a technique that automates the movement of program or data without
the knowledge of user. This technique facilitates the development of program which is
independent of physical memory size.
Internal organization
Diagram 5.26(page-338)
Processor issues an address for instruction or data which is a virtual or logical address.
That address is translated into physical address by operating system and memory
management unit(MMU).
The physical address is checked with cache. If it is a hit, then processor gets the information
from cache. If it is miss, then main memory is involved to forward the information.
If the information is still not available in main memory, then MMU informs OS to bring the
information from secondary memory to main memory.

Class-40
5.13.1 Address translation
An address issued by the processor is called virtual or logical address and the set of all
virtual addresses is called address space.
The address in main memory is called physical address and the set of all physical addresses is
called memory space.
The address space is divided into pages and the memory space is divided into blocks.
Let us take an example. A processor has an address space 8k, so the processor issues 13 bit
virtual address for memory reference as 2 13=8k.
The memory space is 4k.
Let the pages and blocks are of equal size of 1k. So there are 8 pages(page 0, page 1,
,page 7) and 4 blocks(block 0,block 1,block 2,block 3).
Page 0
Page 1
Page 2
Page 3
Block 0
Page 4
Block 1
Page 5
Block 2
Block 3
Page 6
Page 7
Address space=8k
Memory space=4k
Processor issues a 13 bit address(any one from 00000000000000 to 1111111111111). MMU
must interpret or decode the address using a layout. Then which layout is used by MMU to
find whether that page is main memory or not. For that the addresses associated a particular
page has to be observed.
Page 0 000
0000000000

.
000
1111111111
Page 1 001
0000000000

.
001
1111111111
Page 2 010
0000000000

010
1111111111
Page 3

Page 4
Page 5
Page 6
Page 7

111
0000000000

.
111
1111111111

As the page size is 1K(2 10=1K), LSB 10 bits represent the individual word and MSB 3 bits
represent the page no. S o the address layout used by MMU is as follows
Page No(3
Word(10 bits)
bits)
As memory space is 4k, maximum 4 pages can be kept in main memory. The information
about main memory location of each page is kept in a page table and the page table contains
the page no, block no and present bit.
101

0000000001
01

000
001
010
011
100
101
110
111

11
00
01
10
10

0000000001

0
1
1
0
0
1
1
0
1

Block 0
Block 1
Block 2
Block 3

Let the processor issues a 13 bit virtual address1010000000001( 1st word of 5th page) for
memory reference. The MMU checks the page table information and it has been found that
the 5th page has mapped in block 1. Accordingly, the block no(01) is appended which is 12 bit
physical address. The main memory is referenced and the word is available in MBR(Memory
Buffer Register).
The page table is within MMU which is a part of processor chip. As the whole page table is
very large, it can not be kept in MMU. So a small portion of page table is kept in a small
cache which is part of MMU and that small cache called TLB(Translation Lookaside
Buffer).TLB holds page table entries for most recently used pages. A large part of page table

is stored in main memory and the address of page table is kept in a dedicated register called
page table base register.

So the address translation is carried out as follows


The virtual address is issued by the processor.
MMU searches the TLB to convert the virtual address into physical address.
If it is hit in TLB, the physical address is checked with cache and then is checked
with main memory if it is miss in cache.
If it is miss in TLB, the page table is searched in main memory by taking the help of
page table base register.
After getting the physical address, the block containing address is not available in
main memory, MMU informs OS about the page fault and a block is loaded to main
memory from secondary memory.

You might also like