
Addressing Modes and Instruction Set

Addressing Mode
Addressing modes are the ways in which an architecture specifies the address of an object it wants to
access. An addressing mode can specify a constant, a register or a location in memory.

The operation field of an instruction specifies the operation to be performed. This operation is
executed on data stored in computer registers or in main memory. The way an operand is selected
during program execution depends on the addressing mode of the instruction. The purposes of using
addressing modes are as follows:
 To give programming versatility to the user.
 To reduce the number of bits in the addressing field of the instruction.

Types of Addressing Modes


 Immediate Mode
In this mode, the operand is specified in the instruction itself. An immediate mode instruction has
an operand field rather than an address field. For example: ADD 7, which says add 7 to the contents
of the accumulator. Here, 7 is the operand.

 Register Mode
In this mode the operand is stored in a register, and this register is present in the CPU. The
instruction holds the address of the register where the operand is stored.

Advantages of this mode:


• Shorter instructions and faster instruction fetch.
• Faster memory access to the operand(s)

Disadvantages of this mode:


• Very limited address space
• Using multiple registers helps performance but it complicates the instructions.

 Register Indirect Mode


In this mode, the instruction specifies the register whose contents give the address of the operand
in memory. Thus, the register contains the address of the operand rather than the operand itself.
 Direct Addressing Mode
In this mode, the effective address of the operand is present in the instruction itself.
For example: ADD R1, 4000 - here 4000 is the effective address of the operand.
NOTE: The effective address is the location where the operand is present.
• Single memory reference to access data.
• No additional calculations to find the effective address of the operand.

 Indirect Addressing Mode


In this mode, the address field of the instruction gives the address where the effective address is
stored in memory. This slows down execution, as it requires multiple memory lookups to find the
operand.

 Displacement Addressing Mode


In this mode, the contents of an index register are added to the address part of the instruction to
obtain the effective address of the operand:
EA = A + (R), where the address field holds two values, A (the base value) and R (the register that
holds the displacement), or vice versa.
 Relative Addressing Mode
This is a version of displacement addressing mode. The contents of the PC (Program Counter) are
added to the address part of the instruction to obtain the effective address:
EA = A + (PC), where EA is the effective address and PC is the program counter.
The operand is A cells away from the current cell (the one pointed to by the PC).

 Base Register Addressing Mode

This is again a version of displacement addressing mode. It can be defined as EA = A + (R),
where A is the displacement and R holds a pointer to the base address.

 Stack Addressing Mode


In this mode, the operand is at the top of the stack. For example: ADD will POP the top
two items from the stack, add them, and then PUSH the result back onto the top of the stack.

 Auto Increment/Decrement Mode

In this mode, the register is incremented or decremented after or before its value is used.

Table 4: Summary of the Addressing Mode


The most common names for addressing modes (names may differ among architectures):

Addressing mode     Example instruction    Meaning                          When used
Register            Add R4, R3             R4 <- R4 + R3                    When a value is in a register
Immediate           Add R4, #3             R4 <- R4 + 3                     For constants
Displacement        Add R4, 100(R1)        R4 <- R4 + M[100 + R1]           Accessing local variables
Register deferred   Add R4, (R1)           R4 <- R4 + M[R1]                 Accessing via a pointer or a computed address
Indexed             Add R3, (R1 + R2)      R3 <- R3 + M[R1 + R2]            Useful in array addressing:
                                                                            R1 - base of array, R2 - index amount
Direct              Add R1, (1001)         R1 <- R1 + M[1001]               Useful in accessing static data
Memory deferred     Add R1, @(R3)          R1 <- R1 + M[M[R3]]              If R3 is the address of a pointer p,
                                                                            this mode yields *p
Autoincrement       Add R1, (R2)+          R1 <- R1 + M[R2];                Stepping through arrays in a loop:
                                           R2 <- R2 + d                     R2 - start of array, d - size of an element
Autodecrement       Add R1, -(R2)          R2 <- R2 - d;                    Same as autoincrement; both can also be
                                           R1 <- R1 + M[R2]                 used to implement a stack as push and pop
Scaled              Add R1, 100(R2)[R3]    R1 <- R1 + M[100 + R2 + R3*d]    Used to index arrays; may be applied to
                                                                            any base addressing mode in some machines
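As an illustration, the effective-address rules above can be sketched in a few lines of Python. The memory contents, register names and mode names here are invented for the example and do not correspond to any real ISA:

```python
# Toy machine state: a word-addressed memory and a small register file.
memory = {100: 7, 104: 42, 200: 100}   # address -> stored value
regs = {"R1": 100, "R2": 4, "R3": 200}

def operand(mode, a=None, reg=None):
    """Return the operand selected by the given addressing mode."""
    if mode == "immediate":          # operand is in the instruction itself
        return a
    if mode == "register":           # operand is in a register
        return regs[reg]
    if mode == "direct":             # instruction holds the effective address
        return memory[a]
    if mode == "register_indirect":  # register holds the effective address
        return memory[regs[reg]]
    if mode == "displacement":       # EA = A + (R)
        return memory[a + regs[reg]]
    if mode == "memory_indirect":    # EA is itself in memory: M[M[R]]
        return memory[memory[regs[reg]]]
    raise ValueError(mode)

print(operand("immediate", a=7))                 # 7
print(operand("direct", a=100))                  # 7
print(operand("register_indirect", reg="R1"))    # M[100] = 7
print(operand("displacement", a=100, reg="R2"))  # M[100 + 4] = 42
print(operand("memory_indirect", reg="R3"))      # M[M[200]] = M[100] = 7
```

Note how register indirect, displacement and memory indirect differ only in how many additions and memory lookups stand between the instruction and the operand.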

Instruction Codes
A program is a set of instructions that specify the operations, the operands, and the sequence
in which processing has to occur. An instruction code is a group of bits that tells the computer to
perform a specific operation.

 Operation Code


The operation code of an instruction is a group of bits that defines an operation such as add, subtract,
multiply, shift or complement. The number of bits required for the operation code depends upon
the total number of operations available on the computer. The operation code must consist of at
least n bits for 2^n (or fewer) operations. The operation part of an instruction code specifies the
operation to be performed.
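The 2^n relationship can be checked directly. This small helper is written purely for the example:

```python
import math

def opcode_bits(n_operations):
    # At least n bits are needed, where 2**n >= number of operations.
    return math.ceil(math.log2(n_operations))

print(opcode_bits(8))    # 3  (a 3-bit opcode covers up to 8 operations)
print(opcode_bits(235))  # 8  (e.g. 235 instructions need at least 8 bits)
```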

 Register Part
The operation must be performed on data stored in registers. An instruction code therefore
specifies not only the operation to be performed but also the registers where the operands (data) will
be found, as well as the register where the result has to be stored.

 Stored Program Organisation


The simplest way to organize a computer is to have one processor register and an instruction code with
two parts. The first part specifies the operation to be performed and the second specifies an address.
The memory address tells where the operand will be found in memory. Instructions are stored in
one section of memory and data in another.

In computers with a single processor register, that register is known as the Accumulator (AC). Each
operation is performed on the memory operand and the contents of the AC.

 Common Bus System


The basic computer has 8 registers, a memory unit and a control unit. Paths must be provided to
transfer data from one register to another. An efficient method for transferring data in a system is
to use a Common Bus System. The output of registers and memory are connected to the common
bus.
 Load (LD)
The lines from the common bus are connected to the inputs of each register and to the data inputs of
memory. The particular register whose LD input is enabled receives the data from the bus during
the next clock pulse transition.
Before studying instruction formats, let us first look at the operand address part.
When the second part of an instruction code specifies the operand itself, the instruction is said to
have an immediate operand. When the second part specifies the address of an operand, the
instruction is said to have a direct address. In indirect addressing, the second part of the
instruction code specifies the address of a memory word in which the address of the operand
is found.

COMPUTER INSTRUCTIONS
The basic computer has three instruction code formats. The Operation Code (opcode) part of the
instruction contains 3 bits; the meaning of the remaining 13 bits depends upon the operation code encountered.

Instructions can also be classified by the number of address fields they contain:

Three-Address Instructions
Computers with three-address instruction formats can use each address field to specify either a
processor register or a memory operand. The program in assembly language that evaluates X =
(A + B) ∗ (C + D) is shown below, together with comments that explain the register transfer
operation of each instruction.

ADD R1, A, B R1 ← M [A] + M [B]


ADD R2, C, D R2 ← M [C] + M [D]
MUL X, R1, R2 M [X] ← R1 ∗ R2

It is assumed that the computer has two processor registers, R1 and R2. The symbol M [A]
denotes the operand at memory address symbolized by A.
The advantage of the three-address format is that it results in short programs when evaluating
arithmetic expressions. The disadvantage is that the binary-coded instructions require too many
bits to specify three addresses. An example of a commercial computer that uses three-address
instructions is the Cyber 170. The instruction formats in the Cyber computer are restricted to
either three register address fields, or two register address fields and one memory address field.

Two-Address Instructions
Two-address instructions are the most common in commercial computers. Here again, each
address field can specify either a processor register or a memory word. The program to evaluate
X = (A + B) ∗ (C + D) is as follows:
MOV R1, A R1 ← M [A]
ADD R1, B R1 ← R1 + M [B]
MOV R2, C R2 ← M [C]
ADD R2, D R2 ← R2 + M [D]
MUL R1, R2 R1 ← R1∗R2
MOV X, R1 M [X] ← R1
The MOV instruction moves or transfers operands to and from memory and processor
registers. The first symbol listed in an instruction is assumed to be both a source operand and the
destination where the result of the operation is stored.
One-Address Instructions
One-address instructions use an implied accumulator (AC) register for all data manipulation. For
multiplication and division a second register is needed. However, here we will neglect
the second register and assume that the AC holds the result of all operations. The program to evaluate
X = (A + B) ∗ (C + D) is
LOAD A AC ← M [A]
ADD B AC ← AC + M [B]
STORE T M [T] ← AC
LOAD C AC ← M [C]
ADD D AC ← AC + M [D]
MUL T AC ← AC ∗ M [T]
STORE X M [X] ← AC
All operations are done between the AC register and a memory operand. T is the address of a
temporary memory location required for storing an intermediate result.

Zero-Address Instructions
A stack-organized computer does not use an address field for instructions such as ADD and MUL.
The PUSH and POP instructions, however, need an address field to specify the operand that
communicates with the stack. The following program shows how X = (A + B) ∗ (C + D) would be
written for a stack-organized computer (TOS stands for top of stack):
PUSH A TOS ← A
PUSH B TOS ← B
ADD TOS ← (A + B)
PUSH C TOS ← C
PUSH D TOS ← D
ADD TOS ← (C + D)
MUL TOS ← (C + D) ∗ (A + B)
POP X M [X] ← TOS
To evaluate arithmetic expressions in a stack computer, it is necessary to convert the expression
into reverse Polish notation. The name “zero-address” is given to this type of computer because
of the absence of an address field in the computational instructions.
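The stack program above can be simulated in a few lines of Python. The memory values for A, B, C and D are invented for the example:

```python
# A minimal stack-machine sketch evaluating X = (A + B) * (C + D):
# only PUSH and POP carry an address field; ADD and MUL are zero-address.
memory = {"A": 2, "B": 3, "C": 4, "D": 5, "X": None}
stack = []

program = [("PUSH", "A"), ("PUSH", "B"), ("ADD", None),
           ("PUSH", "C"), ("PUSH", "D"), ("ADD", None),
           ("MUL", None), ("POP", "X")]

for op, addr in program:
    if op == "PUSH":
        stack.append(memory[addr])       # TOS <- M[addr]
    elif op == "POP":
        memory[addr] = stack.pop()       # M[addr] <- TOS
    elif op == "ADD":
        b, a = stack.pop(), stack.pop()  # pop top two items
        stack.append(a + b)              # push their sum
    elif op == "MUL":
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)

print(memory["X"])  # (2 + 3) * (4 + 5) = 45
```

The program order is exactly the reverse Polish form of the expression, A B + C D + *, which is why stack machines require that conversion.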

For the basic computer, there are three types of instruction formats:

 Memory Reference Instruction


It uses 12 bits to specify the address and 1 bit to specify the addressing mode (I). I is equal
to 0 for a direct address and 1 for an indirect address.

 Register Reference Instruction


These instructions are recognized by the opcode 111 with a 0 in the leftmost bit of the instruction.
The other 12 bits specify the operation to be executed.

 Input-Output Instruction
These instructions are recognized by the operation code 111 with a 1 in the leftmost bit of the
instruction. The remaining 12 bits specify the input-output operation.
Format of Instruction
Basic fields of an instruction format are given below:

 An operation code field that specifies the operation to be performed.


 An address field that designates the memory address or register.
 A mode field that specifies the way the effective address of the operand is determined.
Computers may have instructions of different lengths containing varying numbers of addresses.
The number of address fields in the instruction format depends upon the internal organization of
its registers.

Instruction Cycle
An instruction cycle, also known as the fetch-decode-execute cycle, is the basic operational process of
a computer. This process is repeated continuously by the CPU from boot-up to shut-down of the
computer.

Following are the steps that occur during an instruction cycle:


 Fetch the Instruction
The instruction is fetched from the memory address stored in the Program Counter (PC) and
placed in the Instruction Register (IR). At the end of the fetch operation, the PC is incremented by 1
so that it points to the next instruction to be executed.
 Decode the Instruction
The instruction in the IR is decoded by the decoder.
 Read the Effective Address
If the instruction has an indirect address, the effective address is read from memory.
Otherwise, in the case of an immediate operand instruction, the operand is read directly.
 Execute the Instruction
The Control Unit passes the information in the form of control signals to the functional units of the
CPU. The result generated is stored in main memory or sent to an output device.

The cycle is then repeated by fetching the next instruction. Thus in this way the instruction cycle
is repeated continuously.
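The steps above can be sketched as a loop. The machine here is a hypothetical one-address accumulator design invented for the example, not a real ISA:

```python
# A highly simplified fetch-decode-execute loop.
# Addresses 0-3 hold instructions; 10-12 hold data.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12),
          3: ("HALT", None), 10: 6, 11: 7, 12: 0}
pc, ac = 0, 0

while True:
    ir = memory[pc]      # fetch: instruction at the address in PC goes to IR
    pc += 1              # PC now points to the next instruction
    op, addr = ir        # decode: split into opcode and address parts
    if op == "LOAD":     # execute
        ac = memory[addr]
    elif op == "ADD":
        ac += memory[addr]
    elif op == "STORE":
        memory[addr] = ac
    elif op == "HALT":
        break

print(memory[12])  # 6 + 7 = 13
```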
Figure 3: Instruction Cycle

CISC and RISC ARCHITECTURES

What is CISC and RISC?

A Central Processing Unit's architecture is built around the “Instruction Set Architecture” it was
designed to implement. There are 2 concepts used to implement the processor hardware
architecture. The architectural designs of CPU are:

 RISC (Reduced Instruction Set Computing)


 CISC (Complex Instruction Set Computing)
Complex Instruction Set Computing (CISC) can execute complex addressing modes or multi-step
operations within a single instruction. It is a CPU design in which one instruction performs many
low-level operations, for example a load from memory, an arithmetic operation, and a memory
store. RISC is a CPU design strategy based on the insight that a simplified instruction set gives
higher performance when combined with a microprocessor architecture able to execute each
instruction in a few clock cycles. Intel hardware is the classic example of a Complex Instruction
Set Computer (CISC), while Apple's ARM-based hardware is a Reduced Instruction Set Computer
(RISC).
Hardware designers invent numerous technologies and tools to implement the desired architecture
in order to fulfil these needs. A hardware architecture may be implemented to be either hardware-
specific or software-specific, and depending on the application both are used in the required
measure.
Figure 4: RISC and CISC Architecture

 CISC Architecture
The CISC approach attempts to minimize the number of instructions per program, sacrificing the
number of cycles per instruction. Computers based on the CISC architecture were designed to
decrease memory cost: large programs need more storage, which increases the memory cost, and
large memories were expensive. To address this, the number of instructions per program can be
reduced by embedding several operations in a single instruction, thereby making the instructions
more complex.

Figure 5: CISC Architecture

In CISC, a single MUL instruction loads two values from memory into separate registers,
multiplies them, and stores the result; CISC minimizes the instruction count by implementing
such operations in hardware.

An Instruction Set Architecture is the medium that permits communication between the programmer
and the hardware. User commands such as executing, copying, deleting or editing data are carried
out on the microprocessor through its instruction set architecture.
The main components of an Instruction Set Architecture are as follows:

 Instruction Set:

A group of instructions given to execute the program; they direct the computer by manipulating
data. Instructions take the form opcode (operational code) plus operand, where the opcode specifies
the operation to be performed (load, store, etc.) and the operand specifies the memory location or
register to which the instruction applies.

 Addressing Modes:

Addressing modes are the manner in which data is accessed. Depending upon the type of instruction,
addressing modes are of various types, such as direct mode, where the data itself is accessed,
or indirect mode, where the location of the data is accessed. Processors having identical
ISAs may be very different in organization; even processors with identical ISAs and nearly identical
organizations can still differ in implementation.
CPU performance is given by the fundamental law:

CPU Time = Instruction Count × CPI × Clock Cycle Time

Thus, CPU performance depends upon the Instruction Count, the CPI (cycles per instruction) and
the clock cycle time, and all three are affected by the instruction set architecture.
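The fundamental law CPU Time = Instruction Count × CPI × Clock Cycle Time can be explored numerically. The workload figures below are invented purely to illustrate the CISC/RISC trade-off:

```python
# CPU Time = Instruction Count * CPI * Clock Cycle Time
def cpu_time(instruction_count, cpi, clock_hz):
    return instruction_count * cpi * (1.0 / clock_hz)

# Hypothetical numbers: a CISC-like design runs fewer, more complex
# instructions at a higher CPI; a RISC-like design runs more, simpler
# instructions at a lower CPI. Same 500 MHz clock for both.
cisc = cpu_time(1_000_000, cpi=4.0, clock_hz=500e6)
risc = cpu_time(2_500_000, cpi=1.2, clock_hz=500e6)

print(f"CISC: {cisc*1e3:.1f} ms, RISC: {risc*1e3:.1f} ms")  # CISC: 8.0 ms, RISC: 6.0 ms
```

With these assumed numbers the RISC design wins despite executing 2.5 times as many instructions, which is exactly the bet the RISC philosophy makes.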

 Instruction Count of the CPU

This underlines the importance of the instruction set architecture. There are two prevalent
instruction set architecture design philosophies, CISC and RISC.

Examples of CISC PROCESSORS


 IBM 370/168 - Introduced in 1970, this CISC design is a 32-bit processor with four 64-bit
floating point registers.
 VAX 11/780 - This CISC design from Digital Equipment Corporation is a 32-bit processor
supporting a large number of addressing modes and machine instructions.
 Intel 80486 - Launched in 1989, this CISC processor has instruction lengths varying from
1 to 11 bytes and 235 instructions.

Characteristics of CISC Architecture

i. Instruction-decoding logic is complex.


ii. A single instruction may support multiple addressing modes.
iii. Less chip space is left for general-purpose registers, because many instructions operate
directly on memory.
iv. Various CISC designs provide special registers for the stack pointer, interrupt
handling, etc.
v. MUL is referred to as a “complex instruction”: it performs its own loads and stores, so
the programmer does not write them explicitly.
 RISC Architecture
RISC (Reduced Instruction Set Computer) is used in portable devices due to its power efficiency.
RISC is a type of microprocessor architecture that uses a highly optimized set of instructions. RISC
does the opposite of CISC, reducing the cycles per instruction at the cost of the number of
instructions per program. Pipelining is one of the unique features of RISC: the execution of several
instructions is overlapped in pipeline fashion.

This gives RISC a performance advantage over CISC. RISC processors execute simple instructions,
typically one per clock cycle.
Example: Apple iPod and Nintendo DS.

Figure 6: RISC Architecture


RISC Architecture Characteristics

i. Simple instructions are used in RISC architecture.


ii. RISC supports a few simple data types and synthesizes complex data types from them.
iii. RISC utilizes simple addressing modes and fixed-length instructions to ease pipelining.
iv. RISC permits any register to be used in any context.
v. One-cycle execution time.
vi. The work each instruction performs is reduced by separating “LOAD” and
“STORE” instructions from computation.
vii. RISC contains a large number of registers in order to reduce the number of
interactions with memory.
viii. In RISC, pipelining is easy, as all instructions execute in a uniform interval of
time, i.e. one clock.
ix. In RISC, more RAM is required to store the assembly-level instructions.
x. Reduced instructions need fewer transistors in RISC.
xi. RISC often uses the Harvard memory model, i.e. a Harvard architecture.
xii. A compiler is used to perform the conversion of a high-level language statement into the
code of its form.
 RISC & CISC Comparison

In RISC, the single CISC MUL instruction is divided into three instructions:


“LOAD” - moves data from the memory bank to a register
“PROD” - finds the product of two operands located in registers
“STORE” - moves data from a register to the memory bank
The main difference between RISC and CISC is the number of instructions and their complexity.

Figure 7: Architecture of RISC Vs CISC

 Semantic Gap
Both RISC and CISC architectures were developed as attempts to bridge the semantic gap.
With the objective of improving the efficiency of software development, several
powerful programming languages have come up, viz. Ada, C, C++, Java, etc. They provide a
high level of abstraction, conciseness and power. With this evolution the semantic gap grows. CISC
and RISC designs are the two options for enabling efficient compilation of high-level language
programs.

CISC designs involve very complex architectures, including a large number of instructions and
addressing modes, whereas RISC designs involve a simplified instruction set adapted to the real
requirements of user programs.

Figure 8: CISC and RISC Design

The Advantages and Disadvantages of RISC and CISC

 The Advantages of RISC architecture


i. RISC (Reduced Instruction Set Computing) architecture has a small, regular set of
instructions, so high-level language compilers can produce more efficient code.
ii. Its simplicity allows freedom in using the space on microprocessors.
iii. Many RISC processors use registers for passing arguments and holding local
variables.
iv. RISC functions use only a few parameters, and RISC processors avoid complex call
instructions, using fixed-length instructions that are easy to pipeline.
v. The speed of operation can be maximized and the execution time minimized:
very few instruction formats, a small number of instructions and few
addressing modes are needed.

 The Disadvantages of RISC architecture


i. Mostly, the performance of RISC processors depends on the programmer or compiler,
as the compiler's knowledge plays a vital role when converting CISC-style code to
RISC code.
ii. Rearranging CISC code into RISC code, termed code expansion, increases the
size of the program, and the quality of this code expansion again depends on the
compiler and on the machine's instruction set.
iii. The first-level cache of RISC processors is also a disadvantage: these processors
require large memory caches on the chip itself, and they need very fast memory
systems to feed the instructions.

 Advantages of CISC architecture


i. Microprogramming is as easy as assembly language to implement, and much less expensive
than hard-wiring a control unit.
ii. The ease of microcoding new instructions allowed designers to make CISC machines
upwardly compatible.
iii. As each instruction became more capable, fewer instructions could be used to
implement a given task.

 Disadvantages of CISC architecture


i. The performance of the machine slows down because the amount of clock time taken by
different instructions is dissimilar.
ii. Only about 20% of the existing instructions are used in a typical program, even
though there are various specialized instructions in reality which are rarely used.
iii. Condition codes are set by CISC instructions as a side effect of each instruction;
this setting takes time, and, since a subsequent instruction changes the condition
code bits, the compiler has to examine the condition code bits before that
happens.

MEMORY ORGANIZATION
A memory unit is a collection of storage units or devices. The memory unit stores
binary information in the form of bits. Generally, memory/storage is classified into 2 categories:

 Volatile Memory: This loses its data when power is switched off.
 Non-Volatile Memory: This is permanent storage and does not lose any data when power
is switched off.

Memory Hierarchy

Figure 9: Memory Hierarchy


The total memory capacity of a computer can be visualized as a hierarchy of components. The
memory hierarchy system consists of all storage devices contained in a computer system, from
slow auxiliary memory to faster main memory and to still smaller cache memory.

 Auxiliary memory access time is generally about 1000 times that of main memory, hence it
is at the bottom of the hierarchy.
 The main memory occupies the central position because it is equipped to communicate
directly with the CPU and with auxiliary memory devices through the Input/Output processor
(I/O).
When a program not residing in main memory is needed by the CPU, it is brought in from
auxiliary memory. Programs not currently needed in main memory are transferred to auxiliary
memory to provide space in main memory for other programs that are currently in use.
 Cache memory is used to store program data which is currently being executed in the
CPU. The approximate access-time ratio between cache memory and main memory is about 1
to 7~10.

Memory Access Methods


Each memory type is a collection of numerous memory locations. To access data in any
memory, it must first be located, and then the data is read from that memory location. The following
are the methods of accessing information from memory locations:

1. Random Access: Main memories are random access memories, in which each memory
location has a unique address. Using this unique address, any memory location can be reached
in the same amount of time, in any order.
2. Sequential Access: This method allows memory access in a sequence, i.e. in order.
3. Direct Access: In this mode, information is stored on tracks, with each track having a separate
read/write head.

1. Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory and cache
memory is called main memory. It is the central storage unit of the computer system: a large,
fast memory used to store data during computer operations. Main memory is made up
of RAM and ROM, with RAM integrated circuit chips holding the major share.
i. Random Access Memory (RAM):
 DRAM: Dynamic RAM is made of capacitors and transistors, and must be refreshed
every 10~100 ms. It is slower and cheaper than SRAM.
 SRAM: Static RAM has a six-transistor circuit in each cell and retains data until
powered off.
 NVRAM: Non-Volatile RAM retains its data even when turned off. Example: Flash
memory.

ii. Read Only Memory (ROM): ROM is non-volatile and is more like permanent storage for
information. It also stores the bootstrap loader program, used to load and start the operating
system when the computer is turned on. PROM (Programmable ROM), EPROM (Erasable
PROM) and EEPROM (Electrically Erasable PROM) are some commonly used ROMs.
2. Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. For example, magnetic disks
and tapes are commonly used auxiliary devices. Other devices used as auxiliary memory are
magnetic drums, magnetic bubble memory and optical disks. Auxiliary memory is not directly
accessible to the CPU and is accessed through the Input/Output channels.

3. Cache Memory
The data or contents of main memory that are used again and again by the CPU are stored in the
cache memory so that that data can be accessed in a shorter time.
Whenever the CPU needs to access memory, it first checks the cache. If the data is not
found in cache memory, the CPU goes on to main memory. It also transfers a block of
recent data into the cache, deleting old data in the cache to accommodate the new.

 Hit Ratio
The performance of cache memory is measured in terms of a quantity called the hit ratio. When the
CPU refers to memory and finds the word in cache, it is said to produce a hit. If the word is not
found in cache but is in main memory, it counts as a miss.
The ratio of the number of hits to the total number of CPU references to memory is called the hit ratio.
Hit Ratio = Hit/(Hit + Miss)
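The formula above also gives the average memory access time once cache and main-memory latencies are known. The 10 ns and 100 ns figures below are assumed purely for illustration:

```python
def hit_ratio(hits, misses):
    # Hit Ratio = Hit / (Hit + Miss)
    return hits / (hits + misses)

def avg_access_time(h, t_cache=10, t_main=100):
    # On a hit: cache time only. On a miss: cache check plus main memory.
    # Times are in nanoseconds (assumed values for the example).
    return h * t_cache + (1 - h) * (t_cache + t_main)

h = hit_ratio(hits=950, misses=50)
print(h)                             # 0.95
print(round(avg_access_time(h), 2))  # 0.95*10 + 0.05*110 = 15.0 ns
```

Note how strongly the average is dominated by the hit ratio: even a 95% hit rate leaves the average 50% above the raw cache latency.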

4. Associative Memory
It is also known as content-addressable memory (CAM). It is a memory chip in which each bit
position can be compared. The content is compared in each bit cell, which allows very fast
table lookup. Since the entire chip can be compared at once, contents can be stored without
regard to an addressing scheme. These chips have less storage capacity than regular memory
chips.

5. Mapping and Concept of Virtual Memory


The transformation of data from main memory to cache memory is called mapping. There are 3
main types of mapping:

 Associative Mapping
 Direct Mapping
 Set Associative Mapping

6. Virtual Memory
Virtual memory is the separation of logical memory from physical memory. This separation
provides a large virtual memory for programmers when only a small physical memory is available.
Virtual memory gives programmers the illusion of a very large memory even though the
computer has a small main memory. It makes the task of programming easier because the
programmer no longer needs to worry about the amount of physical memory available.
Parallel Processing and Data Transfer Modes in a Computer System
Instead of processing each instruction sequentially, a parallel processing system provides
concurrent data processing to reduce the execution time. Such a system may have two or
more ALUs and should be able to execute two or more instructions at the same time. The
purpose of parallel processing is to speed up the computer's processing capability and increase its
throughput.

NOTE: Throughput is the number of instructions that can be executed in a unit of time.
Parallel processing can be viewed from various levels of complexity. At the lowest level, we
distinguish between parallel and serial operations by the type of registers used. At the higher level
of complexity, parallel processing can be achieved by using multiple functional units that perform
many operations simultaneously.

 Pipelining

Pipelining is the process of feeding instructions to the processor through a pipeline. It
allows storing and executing instructions in an orderly process; it is also known as pipeline
processing.
Pipelining is a technique in which multiple instructions are overlapped during execution. The pipeline is
divided into stages, and these stages are connected with one another to form a pipe-like structure.
Instructions enter at one end and exit at the other.
Note : Pipelining increases the overall instruction throughput.
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system.
3. It makes the system reliable.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The instruction latency is higher.
In a pipeline system, each segment consists of an input register followed by a combinational
circuit. The register is used to hold data, and the combinational circuit performs operations on it. The
output of the combinational circuit is applied to the input register of the next segment.

Figure 10: Pipelining

A pipeline system is like a modern-day assembly line in a factory. For example, in a car
manufacturing plant, huge assembly lines are set up with robotic arms at each station to
perform a certain task, and then the car moves ahead to the next arm.
Types of Pipeline
It is divided into 2 categories:
i. Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They are used for floating point
operations, multiplication of fixed-point numbers, etc. For example, the inputs to a Floating
Point Adder pipeline are:
X = A × 2^a
Y = B × 2^b
Here A and B are mantissas (the significant digits of the floating point numbers), while a and b are
the exponents.
The floating point addition and subtraction is done in 4 parts:

 Compare the exponents.


 Align the mantissas.
 Add or subtract mantissas
 Produce the result.
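The four steps above can be sketched in Python on a toy decimal representation X = A × 10^a (decimal rather than binary purely for readability; the input numbers are invented, and the normalize step here only handles mantissa overflow):

```python
def fp_add(A, a, B, b):
    """Add A*10**a and B*10**b following the four pipeline stages."""
    # 1. Compare the exponents.
    if a < b:
        (A, a), (B, b) = (B, b), (A, a)   # make `a` the larger exponent
    # 2. Align the mantissas: shift the smaller operand right.
    B = B / (10 ** (a - b))
    # 3. Add the mantissas.
    M = A + B
    # 4. Produce the result: normalize so that 0.1 <= |M| < 1.
    while abs(M) >= 1.0:
        M, a = M / 10, a + 1
    return M, a

# 950.4 + 82.0 = 1032.4, i.e. 0.10324 * 10**4
m, e = fp_add(0.9504, 3, 0.8200, 2)
print(round(m, 5), e)  # 0.10324 4
```

In a real arithmetic pipeline each of the four stages is a separate hardware segment, so four different additions can be in flight at once, one per stage.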
ii. Instruction Pipeline
Here a stream of instructions is executed by overlapping the fetch, decode and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system.
An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline, so multiple instructions can be executed simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration.
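The overlap can be visualized with a short Python sketch. It assumes an idealized five-stage pipeline (the IF/ID/EX/MEM/WB stages used later in this section), one instruction issued per cycle, and no stalls:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(n_instructions):
    """Return one row per instruction; column k is what it does in cycle k."""
    rows = []
    for i in range(n_instructions):
        # Instruction i enters the pipeline at cycle i, then moves one
        # stage per cycle; earlier cycles are idle padding for it.
        rows.append(["  "] * i + STAGES)
    return rows

for i, row in enumerate(schedule(3)):
    print(f"I{i}: " + " ".join(f"{c:>3}" for c in row))
```

With k stages and n instructions, the last instruction finishes in cycle k + (n - 1), which is where the throughput gain comes from.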
Pipeline Conflicts

There are some factors that cause the pipeline to deviate from its normal performance. Some of these factors are given below:
i. Timing Variations
All stages cannot take the same amount of time. This problem generally occurs in instruction processing, where different instructions have different operand requirements and thus different processing times.
ii. Data Hazards

When several instructions are in partial execution, a problem arises if they reference the same data. We must ensure that the next instruction does not attempt to access the data before the current instruction has written it, because this would lead to incorrect results.
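A minimal sketch of this check in Python: detect read-after-write conflicts between nearby instructions, where each instruction is modeled as a (destination register, source registers) pair. The representation and the `distance` window are simplifying assumptions, not part of any real ISA:

```python
def raw_hazards(instructions, distance=2):
    """Flag cases where a later instruction reads a register that an
    instruction within `distance` slots earlier is still writing."""
    hazards = []
    for i, (dest, _) in enumerate(instructions):
        for j in range(i + 1, min(i + 1 + distance, len(instructions))):
            _, srcs = instructions[j]
            if dest in srcs:
                hazards.append((i, j, dest))
    return hazards

prog = [("R1", ("R2", "R3")),   # ADD R1, R2, R3
        ("R4", ("R1", "R5")),   # SUB R4, R1, R5  <- reads R1 too early
        ("R6", ("R7", "R8"))]
```

Here `raw_hazards(prog)` reports that instruction 1 depends on the R1 written by instruction 0; a real pipeline would resolve this with stalls or forwarding.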

Pipeline Hazards

There are situations, called hazards, that prevent the next instruction in the instruction stream from being executed during its designated clock cycle. Hazards reduce the performance from the ideal speedup gained by pipelining.
A hazard is created whenever there is a dependence between instructions, and they are close enough that the overlap caused by pipelining would change the order of access to an operand.

There are three classes of hazards:

1. Structural Hazards. They arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
2. Data Hazards. They arise when an instruction depends on the result of a previous
instruction in a way that is exposed by the overlapping of instructions in the pipeline.
3. Control Hazards. They arise from the pipelining of branches and other instructions that
change the PC.

i. Branching
In order to fetch and execute the next instruction, we must know what that instruction is. If the present instruction is a conditional branch, whose result determines the next instruction, then the next instruction may not be known until the current one is processed.
ii. Interrupts
Interrupts insert unwanted instructions into the instruction stream and thereby affect the execution of instructions.

iii. Data Dependency

It arises when an instruction depends upon the result of a previous instruction but this result is not yet available.

Principles of Pipelining Using the DLX Architecture

The principles of pipelining will be described using DLX (pronounced "Deluxe") and a simple version of its pipeline. These principles can be applied to instruction sets more complex than DLX, although the resulting pipelines are more complex. DLX has a simple pipeline architecture for the CPU.

The architecture of DLX was chosen based on observations about the most frequently used primitives in programs. DLX provides a good architectural model for study, not only because of the recent popularity of this type of machine, but also because it is easy to understand.
Like most recent load/store machines, DLX emphasizes:

 A simple load/store instruction set
 Design for pipelining efficiency
 An easily decoded instruction set
 Efficiency as a compiler target

Registers for DLX

 32 general-purpose registers (GPRs), each 32 bits wide
 32 floating-point registers (FPRs), which can be used as 32 single-precision (32-bit) registers or as even-odd pairs holding double-precision (64-bit) values
 A few special registers, which can be transferred to and from the integer registers

Data types for DLX

 For integer data: 8-bit bytes, 16-bit half words, 32-bit words
 For floating point: 32-bit single precision, 64-bit double precision
 The DLX operations work on 32-bit integers and 32- or 64-bit floating point. Bytes and half words are loaded into registers with either zeros or the sign bit replicated to fill the 32 bits of the registers.
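The two load behaviors can be sketched in Python. The helper names are hypothetical; the bit manipulation is the standard way to model 32-bit registers with unbounded integers:

```python
def zero_extend_byte(b):
    """Load a byte unsigned: the upper 24 bits are filled with zeros."""
    return b & 0xFF

def sign_extend_byte(b):
    """Load a byte signed: the sign bit (bit 7) is replicated upward."""
    b &= 0xFF
    if b & 0x80:                  # sign bit set: fill upper bits with ones
        return b | 0xFFFFFF00
    return b
```

For example, the byte 0x80 (-128 as a signed value) becomes 0x00000080 with zero extension but 0xFFFFFF80 with sign extension.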
Memory

 Byte addressable, with a 32-bit address
 Two addressing modes (immediate and displacement); register deferred and absolute addressing with a 16-bit field are accomplished as special cases of displacement
 Memory references are loads/stores between memory and the GPRs or FPRs, and all memory accesses must be aligned
 There are also instructions for moving between an FPR and a GPR
Instructions

 Instruction layout for DLX: all instructions are 32 bits (fixed length) and must be aligned
 DLX defines a complete list of instructions covering the operation classes below
Operations

There are four classes of instructions:

 Load/Store: Any of the GPRs or FPRs may be loaded and stored, except that loading R0 has no effect.
 ALU Operations: All ALU instructions are register-register instructions. The operations are add, subtract, AND, OR, XOR and shifts. Compare instructions compare two registers (=, !=, <, >, <=, >=); if the condition is true, these instructions place a 1 in the destination register, otherwise they place a 0.
 Branches/Jumps: All branches are conditional. The branch condition is specified by the instruction, which may test the register source for zero or nonzero.
 Floating-Point Operations: add, subtract, multiply, divide.
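The compare and branch semantics can be sketched in Python (a stand-in for the hardware, with a dictionary as the register file; the helper names are hypothetical):

```python
regs = {"R1": 5, "R2": 9, "R3": 0}

def compare_lt(rd, rs1, rs2):
    """Compare instruction: write 1 into rd if rs1 < rs2, else write 0."""
    regs[rd] = 1 if regs[rs1] < regs[rs2] else 0

def beqz(rs, taken_target, fallthrough):
    """BEQZ-style branch: taken only if the source register is zero."""
    return taken_target if regs[rs] == 0 else fallthrough

compare_lt("R3", "R1", "R2")    # R1 < R2, so R3 is set to 1
next_pc = beqz("R3", 100, 44)   # R3 is nonzero, so the branch falls through
```

This shows why DLX needs no condition-code register: a compare leaves an ordinary 0/1 value in a GPR, and the branch simply tests that register against zero.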

An Implementation of DLX

This un-pipelined implementation is not the most economical or the highest-performance implementation without pipelining. Instead, it is designed to lead naturally to a pipelined implementation. Implementing the instruction set requires the introduction of several temporary registers that are not part of the architecture. Every DLX instruction can be implemented in at most five clock cycles. The five clock cycles are:

i. Instruction fetch cycle (IF)
ii. Instruction decode/register fetch cycle (ID)
iii. Execution/Effective address cycle (EX)
iv. Memory access/branch completion cycle (MEM)
v. Write-back cycle (WB)

On each clock cycle the instruction progresses from the IF cycle towards the WB cycle. If a cycle appears to change nothing, it means that the cycle is not active for that instruction type. A detailed description of each cycle is as follows:
Instruction fetch cycle (IF):

IR ← Mem[PC]
NPC ← PC + 4

Operation:

• Send out the PC and fetch the instruction from memory into the instruction register (IR)
• Increment the PC by 4 to address the next sequential instruction
• The IR is used to hold the instruction that will be needed on subsequent clock cycles
• The NPC is used to hold the next sequential PC (program counter)

Instruction decode/register fetch cycle (ID):

A ← Regs[IR6..10]
B ← Regs[IR11..15]
Imm ← ((IR16)^16 ## IR16..31)

Operation:

i. Decode the instruction and access the register file to read the registers.
ii. The outputs of the general-purpose registers are read into two temporary registers (A and B) for use in later clock cycles.
iii. The lower 16 bits of the IR are also sign-extended (the sign bit IR16 is replicated 16 times and concatenated, ##, with bits IR16..31) and stored into the temporary register Imm, for use in the next cycle.
iv. Decoding is done in parallel with reading registers, which is possible because these fields are at a fixed location in the DLX instruction format. This technique is known as fixed-field decoding.
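Fixed-field decoding can be sketched in Python. Note the text numbers bits with bit 0 as the most significant, so IR6..10 sits at shift 21 in conventional LSB-0 arithmetic; the function name is hypothetical:

```python
def decode(ir):
    """Extract the fixed fields of a 32-bit DLX-style instruction word."""
    rs1 = (ir >> 21) & 0x1F      # IR bits 6..10 (first source register)
    rs2 = (ir >> 16) & 0x1F      # IR bits 11..15 (second source register)
    imm = ir & 0xFFFF            # IR bits 16..31 (16-bit immediate)
    if imm & 0x8000:             # replicate the sign bit into the top 16 bits
        imm |= 0xFFFF0000
    return rs1, rs2, imm

# Register fields 3 and 7 with the negative immediate 0x8001
rs1, rs2, imm = decode((3 << 21) | (7 << 16) | 0x8001)
```

Because the fields are always at the same bit positions, this extraction needs no knowledge of the opcode, which is exactly what lets the hardware do it in parallel with the register read.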

Execution/Effective address cycle (EX):

The ALU operates on the operands prepared in the prior cycle, performing one of four functions depending on the DLX instruction type.

Memory reference:

ALUOutput ← A + Imm

Operation: The ALU adds the operands to form the effective address and places the result into the register ALUOutput.

Register-Register ALU instruction:

ALUOutput ← A op B

Operation: The ALU performs the operation specified by the opcode on the value in register A and on the value in register B. The result is placed in the register ALUOutput.

Register-Immediate ALU instruction:

ALUOutput ← A op Imm

Operation: The ALU performs the operation specified by the opcode on the value in register A and on the value in Imm. The result is placed in the register ALUOutput.
Branch:

ALUOutput ← NPC + Imm
Cond ← (A op 0)

Operation:

• The ALU adds the NPC to the sign-extended immediate value in Imm to compute the address of the branch target.
• Register A, which has been read in the prior cycle, is checked to determine whether the branch is taken.
• The comparison operation op is the relational operator determined by the branch opcode (e.g. op is "==" for the instruction BEQZ).

Memory access/branch completion cycle (MEM):

The only DLX instructions active in this cycle are loads, stores, and branches.

Operation:
 Access memory if needed
 If the instruction is a load, data returns from memory and is placed in the LMD (load memory data) register
 If the instruction is a store, data from the B register is written into memory
 In either case the address used is the one computed during the prior cycle and stored in the register ALUOutput

Branch:

if (Cond) PC ← ALUOutput else PC ← NPC

Operation:
- If the instruction branches, the PC is replaced with the branch destination address in the register ALUOutput
- Otherwise, the PC is replaced with the incremented PC in the register NPC

Memory reference:

LMD ← Mem[ALUOutput] or Mem[ALUOutput] ← B

Write-back cycle (WB):

• Register-Register ALU instruction: Regs[IR16..20] ← ALUOutput
• Register-Immediate ALU instruction: Regs[IR11..15] ← ALUOutput
• Load instruction: Regs[IR11..15] ← LMD

Operation:
 Write the result into the register file, whether it comes from memory (LMD) or from the ALU (ALUOutput)
 The register destination field is in one of two positions depending on the opcode
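Putting the five cycles together, here is a minimal Python walk-through of a single load instruction. This is an illustrative stand-in, not the real datapath: the opcode field is ignored, memory is a sparse dictionary, and the function name is hypothetical. Field positions follow the text's IR6..10 / IR11..15 / IR16..31 numbering:

```python
def run_load(pc, mem, regs):
    # IF: fetch the instruction and compute NPC
    ir = mem[pc]
    npc = pc + 4
    # ID: read the base register and sign-extend the 16-bit immediate
    rs1 = (ir >> 21) & 0x1F          # IR bits 6..10
    rd = (ir >> 16) & 0x1F           # IR bits 11..15 (load destination)
    imm = ir & 0xFFFF                # IR bits 16..31
    if imm & 0x8000:
        imm -= 0x10000
    a = regs[rs1]
    # EX: compute the effective address
    alu_output = a + imm
    # MEM: read memory into the LMD register
    lmd = mem[alu_output]
    # WB: write the loaded value into the destination register
    regs[rd] = lmd
    return npc

regs = {1: 100, 2: 0}
mem = {0: (1 << 21) | (2 << 16) | 8,   # a load like LW R2, 8(R1)
       108: 42}
npc = run_load(0, mem, regs)
```

After the five cycles, R2 holds the value loaded from address 100 + 8 = 108, and NPC points at the next sequential instruction.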
Limitations on practical depth of a pipeline arise from:

 Pipeline latency. The fact that the execution time of each instruction does not decrease
puts limitations on pipeline depth;
 Imbalance among pipeline stages. Imbalance among the pipe stages reduces
performance since the clock can run no faster than the time needed for the slowest
pipeline stage;
 Pipeline overhead. Pipeline overhead arises from the combination of pipeline register
delay (setup time plus propagation delay) and clock skew.

Once the clock cycle is as small as the sum of the clock skew and latch overhead, no further
pipelining is useful, since there is no time left in the cycle for useful work.

Examples
1. Consider a non-pipelined machine with 6 execution stages of lengths 50ns, 50ns, 60ns,
60ns, 50ns, and 50 ns.
i. Find the instruction latency on this machine.
ii. How much time does it take to execute 100 instructions?

Solution:

Instruction latency = 50 + 50 + 60 + 60 + 50 + 50 = 320 ns

Time to execute 100 instructions = 100 × 320 ns = 32000 ns
2. Suppose pipelining is introduced on this machine. Assume that when introducing pipelining, the clock skew adds 5 ns of overhead to each execution stage.
i. What is the instruction latency on the pipelined machine?
ii. How much time does it take to execute 100 instructions?

Solution:
Remember that in the pipelined implementation, the lengths of the pipe stages must all be the same, i.e., the speed of the slowest stage plus overhead. With 5 ns of overhead it comes to:

Length of a pipelined stage = MAX(lengths of unpipelined stages) + overhead = 60 + 5 = 65 ns

Instruction latency = 6 stages × 65 ns = 390 ns
Time to execute 100 instructions = 65*6*1 + 65*1*99 = 390 + 6435 = 6825 ns

3. What is the speedup obtained from pipelining?

Solution:
Speedup is the ratio of the average instruction time without pipelining to the average instruction
time with pipelining.
Average instruction time not pipelined = 320 ns
Average instruction time pipelined = 65 ns
Speedup for 100 instructions = 32000 / 6825 = 4.69
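The three solutions can be cross-checked with a few lines of Python. Note that the full latency of one instruction on the pipelined machine is six stages of 65 ns, i.e. 390 ns, which is the 390 ns term in the time formula:

```python
stages = [50, 50, 60, 60, 50, 50]   # stage lengths in ns
n = 100                              # number of instructions

# Example 1: unpipelined machine
unpipelined_latency = sum(stages)                # 320 ns
unpipelined_total = n * unpipelined_latency      # 32000 ns

# Example 2: pipelined machine; cycle = slowest stage + 5 ns skew overhead
cycle = max(stages) + 5                          # 65 ns
pipelined_latency = cycle * len(stages)          # one instruction end to end
pipelined_total = pipelined_latency + cycle * (n - 1)

# Example 3: speedup from pipelining
speedup = unpipelined_total / pipelined_total
```

The speedup (about 4.69) is below the ideal factor of 6 because of the skew overhead and the unbalanced stage lengths.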