Addressing Mode Part3
Addressing Mode
Addressing modes are the ways in which architectures specify the address of an object they want to
access. In machines, an addressing mode can specify a constant, a register or a location in
memory.
The operation field of an instruction specifies the operation to be performed. This operation will
be executed on some data which is stored in computer registers or the main memory. The way
any operand is selected during the program execution is dependent on the addressing mode of the
instruction. The purpose of using addressing modes is as follows:
To give the programming versatility to the user.
To reduce the number of bits in addressing field of instruction.
Register Mode
In this mode the operand is stored in a register inside the CPU. The instruction holds the address
of the register where the operand is stored.
Instruction Codes
A program is a set of instructions that specify the operations, the operands, and the sequence
by which processing has to occur. An instruction code is a group of bits that tells the computer to
perform a specific operation.
Register Part
The operation must be performed on the data stored in registers. An instruction code therefore
specifies not only the operation to be performed but also the registers where the operands (data) will
be found, as well as the register where the result has to be stored.
Computers with a single processor register usually call it the accumulator (AC). The operation is
performed with the memory operand and the content of AC.
COMPUTER INSTRUCTIONS
The basic computer has three instruction code formats. The operation code (opcode) part of the
instruction contains 3 bits, and the meaning of the remaining 13 bits depends upon the operation code encountered.
Three-Address Instructions
Computers with three-address instruction formats can use each address field to specify either a
processor register or a memory operand. The program in assembly language that evaluates X =
(A + B) ∗ (C + D) is shown below, together with comments that explain the register transfer
operation of each instruction.
It is assumed that the computer has two processor registers, R1 and R2. The symbol M [A]
denotes the operand at memory address symbolized by A.
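The program itself is not reproduced in these notes; following the convention of the two-address example later in this section, it can be sketched as below (a standard textbook formulation, supplied here as an assumption):

ADD R1, A, B   R1 ← M [A] + M [B]
ADD R2, C, D   R2 ← M [C] + M [D]
MUL X, R1, R2  M [X] ← R1 ∗ R2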
The advantage of the three-address format is that it results in short programs when evaluating
arithmetic expressions. The disadvantage is that the binary-coded instructions require too many
bits to specify three addresses. An example of a commercial computer that uses three-address
instructions is the Cyber 170. The instruction formats in the Cyber computer are restricted to
either three register address fields or two register address fields and one memory address field.
Two-Address Instructions
Two address instructions are the most common in commercial computers. Here again each
address field can specify either a processor register or a memory word. The program to evaluate
X = (A + B) ∗ (C + D) is as follows:
MOV R1, A R1 ← M [A]
ADD R1, B R1 ← R1 + M [B]
MOV R2, C R2 ← M [C]
ADD R2, D R2 ← R2 + M [D]
MUL R1, R2 R1 ← R1∗R2
MOV X, R1 M [X] ← R1
The MOV instruction moves or transfers the operands to and from memory and processor
registers. The first symbol listed in an instruction is assumed to be both a source and the
destination where the result of the operation is transferred.
One-Address Instructions
One-address instructions use an implied accumulator (AC) register for all data manipulation. For
multiplication and division there is a need for a second register. However, here we will neglect
the second register and assume that the AC contains the result of all operations. The program to evaluate
X = (A + B) ∗ (C + D) is
LOAD A AC ← M [A]
ADD B AC ← AC + M [B]
STORE T M [T] ← AC
LOAD C AC ← M [C]
ADD D AC ← AC + M [D]
MUL T AC ← AC ∗ M [T]
STORE X M [X] ← AC
All operations are done between the AC register and a memory operand. T is the address of a
temporary memory location required for storing the intermediate result.
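As a sketch, the one-address program above can be simulated in Python; the initial memory contents for A, B, C and D below are made-up sample values:

```python
# Simulate the one-address (accumulator) program for X = (A + B) * (C + D).
# The initial memory contents are hypothetical sample values.
memory = {"A": 1, "B": 2, "C": 3, "D": 4, "T": 0, "X": 0}

ac = memory["A"]          # LOAD A    AC <- M[A]
ac = ac + memory["B"]     # ADD B     AC <- AC + M[B]
memory["T"] = ac          # STORE T   M[T] <- AC
ac = memory["C"]          # LOAD C    AC <- M[C]
ac = ac + memory["D"]     # ADD D     AC <- AC + M[D]
ac = ac * memory["T"]     # MUL T     AC <- AC * M[T]
memory["X"] = ac          # STORE X   M[X] <- AC

print(memory["X"])  # (1 + 2) * (3 + 4) = 21
```

Note how the intermediate sum A + B must pass through the temporary location T, since the single accumulator is needed for the second sum.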
Zero-Address Instructions
A stack-organized computer does not use an address field for the instructions ADD and MUL.
The PUSH and POP instructions, however, need an address field to specify the operand that
communicates with the stack. The following program shows how X = (A + B) ∗ (C + D) will be
written for a stack-organized computer. (TOS stands for top of stack.)
PUSH A TOS ← A
PUSH B TOS ← B
ADD TOS ← (A + B)
PUSH C TOS ← C
PUSH D TOS ← D
ADD TOS ← (C + D)
MUL TOS ← (C + D) ∗ (A + B)
POP X M [X] ← TOS
To evaluate arithmetic expressions in a stack computer, it is necessary to convert the expression
into reverse Polish notation. The name “zero-address” is given to this type of computer because
of the absence of an address field in the computational instructions.
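A minimal Python sketch of this stack evaluation; the operand values stored for A, B, C and D are assumptions:

```python
# Simulate the zero-address (stack) program for X = (A + B) * (C + D).
# Memory contents for A, B, C, D are hypothetical sample values.
memory = {"A": 2, "B": 3, "C": 4, "D": 5}
stack = []

def push(addr):                 # PUSH addr: TOS <- M[addr]
    stack.append(memory[addr])

def add():                      # ADD: pop two operands, push their sum
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def mul():                      # MUL: pop two operands, push their product
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

def pop(addr):                  # POP addr: M[addr] <- TOS
    memory[addr] = stack.pop()

# The program corresponds to the reverse Polish form A B + C D + *
push("A"); push("B"); add()
push("C"); push("D"); add()
mul()
pop("X")

print(memory["X"])  # (2 + 3) * (4 + 5) = 45
```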
Input-Output Instruction
These instructions are recognised by the operation code 111 with a 1 in the leftmost bit of the
instruction. The remaining 12 bits are used to specify the input-output operation.
Format of Instruction
The basic fields of an instruction format are the operation code (opcode) field, the address field, and the mode field.
Instruction Cycle
An instruction cycle, also known as the fetch-decode-execute cycle, is the basic operational process of
a computer. This process is repeated continuously by the CPU from boot-up to shutdown of the
computer: after one instruction completes, the cycle repeats by fetching the next instruction.
Figure 3: Instruction Cycle
The architecture of a central processing unit operates according to the instruction set architecture
it was designed to implement. There are two concepts used to implement the processor
hardware architecture. The architectural designs of CPU are:
CISC Architecture
The CISC approach attempts to minimize the number of instructions per program, sacrificing the
number of cycles per instruction. Computers based on the CISC architecture are designed to
decrease the memory cost: large programs need more storage, which increases the memory cost,
and large memory is more expensive. To solve this problem, the number of instructions per
program can be reduced by embedding several operations in a single instruction, thereby making
the instructions more complex.
In CISC, for example, a single MUL instruction can load two values from memory into separate
registers, multiply them, and store the result. CISC thus uses the minimum possible number of
instructions by implementing complex operations directly in hardware.
An instruction set architecture is a medium that permits communication between the programmer and
the hardware. User commands such as executing, copying, deleting or editing data are carried out
by the microprocessor, and the instruction set architecture defines how the microprocessor is operated.
The main components of an instruction set architecture are as below.
Instruction Set:
The group of instructions given to execute the program; they direct the computer by manipulating
the data. Instructions are of the form opcode (operational code) and operand, where the opcode
specifies the operation to be performed (load, store, etc.) and the operand specifies the memory
location or register to which the instruction is applied.
Addressing Modes:
Addressing modes are the manner in which the data is accessed. Depending upon the type of instruction
applied, addressing modes are of various types, such as direct mode, where the data itself is
accessed, or indirect mode, where the location of the data is accessed. Processors having identical
ISAs may be very different in organization, and even processors with identical ISAs and nearly
identical organizations are still not identical at the implementation level.
CPU performance is given by the fundamental law:
CPU time = Instruction Count × CPI × Clock Cycle Time
Thus, CPU performance is dependent upon Instruction Count, CPI (Cycles per instruction) and
Clock cycle time. And all three are affected by the instruction set architecture.
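A small worked example of this law; the numbers below are illustrative assumptions, not values from the text:

```python
# CPU time = Instruction Count x CPI x Clock Cycle Time
instruction_count = 1_000_000   # instructions executed (assumed)
cpi = 2.0                       # average clock cycles per instruction (assumed)
clock_cycle_time = 1e-9         # seconds per cycle, i.e. a 1 GHz clock (assumed)

cpu_time = instruction_count * cpi * clock_cycle_time
print(cpu_time)  # 0.002 seconds
```

Improving any one of the three factors (fewer instructions, lower CPI, or a faster clock) reduces CPU time, which is why all three are shaped by the instruction set architecture.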
This underlines the importance of the instruction set architecture. There are two prevalent
instruction set architectures: CISC and RISC.
RISC Architecture
RISC has a high performance advantage over CISC: RISC processors use simple instructions,
each of which executes within a clock cycle.
Example: Apple iPod and Nintendo DS.
Semantic Gap
Both RISC and CISC architectures have been developed as attempts to cover the semantic gap.
With the objective of improving the efficiency of software development, several
powerful programming languages have come up, viz., Ada, C, C++, Java, etc. They provide a
high level of abstraction, conciseness and power. With this evolution the semantic gap grows. To
enable efficient compilation of high-level language programs, CISC and RISC designs are the two
options.
CISC designs involve very complex architectures, including a large number of instructions and
addressing modes, whereas RISC designs involve a simplified instruction set adapted to the real
requirements of user programs.
MEMORY ORGANIZATION
A memory unit is the collection of storage units or devices together. The memory unit stores the
binary information in the form of bits. Generally, memory/storage is classified into 2 categories:
Volatile Memory: This loses its data, when power is switched off.
Non-Volatile Memory: This is a permanent storage and does not lose any data when power
is switched off.
Memory Hierarchy
Auxiliary memory access time is generally 1000 times that of the main memory, hence it
is at the bottom of the hierarchy.
The main memory occupies the central position because it is equipped to communicate
directly with the CPU and with auxiliary memory devices through the Input/Output processor
(I/O).
When a program not residing in main memory is needed by the CPU, it is brought in from
auxiliary memory. Programs not currently needed in main memory are transferred into auxiliary
memory to provide space in main memory for other programs that are currently in use.
The cache memory is used to store program data which is currently being executed in the
CPU. The approximate access time ratio between cache memory and main memory is about 1
to 7~10.
1. Random Access: Main memories are random access memories, in which each memory
location has a unique address. Using this unique address, any memory location can be reached
in the same amount of time, in any order.
2. Sequential Access: This method allows memory access in a sequence or in order.
3. Direct Access: In this mode, information is stored in tracks, with each track having a separate
read/write head.
1. Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory and cache
memory is called main memory. It is the central storage unit of the computer system: a large
and fast memory used to store data during computer operations. Main memory is made up
of RAM and ROM, with RAM integrated circuit chips holding the major share.
i. Random Access Memory (RAM):
DRAM: Dynamic RAM is made of capacitors and transistors, and must be refreshed
every 10~100 ms. It is slower and cheaper than SRAM.
SRAM: Static RAM has a six-transistor circuit in each cell and retains data until
powered off.
NVRAM: Non-Volatile RAM retains its data even when turned off. Example: Flash
memory.
ii. Read Only Memory (ROM): is non-volatile and is more like a permanent storage for
information. It also stores the bootstrap loader program, to load and start the operating
system when computer is turned on. PROM (Programmable ROM), EPROM (Erasable
PROM) and EEPROM (Electrically Erasable PROM) are some commonly used ROMs.
2. Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. For example: Magnetic disks
and tapes are commonly used auxiliary devices. Other devices used as auxiliary memory are
magnetic drums, magnetic bubble memory and optical disks. It is not directly accessible to the
CPU, and is accessed using the Input/Output channels.
3. Cache Memory
The data or contents of the main memory that are used again and again by the CPU are stored in the
cache memory so that the CPU can access that data in a shorter time.
Whenever the CPU needs to access memory, it first checks the cache memory. If the data is not
found in cache memory, then the CPU moves on to the main memory. It also transfers blocks of
recent data into the cache and keeps deleting old data in the cache to accommodate the new data.
Hit Ratio
The performance of cache memory is measured in terms of a quantity called hit ratio. When the
CPU refers to memory and finds the word in cache, it is said to produce a hit. If the word is not
found in cache and has to be fetched from main memory, it counts as a miss.
The ratio of the number of hits to the total CPU references to memory is called hit ratio.
Hit Ratio = Hit/(Hit + Miss)
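The hit ratio can be measured with a small simulation. The sketch below assumes a tiny fully associative cache with LRU replacement; the cache size and the reference stream are made up for illustration:

```python
from collections import OrderedDict

def hit_ratio(references, cache_size):
    """Return hits / (hits + misses) for an LRU-managed cache."""
    cache = OrderedDict()                  # keys ordered by recency of use
    hits = misses = 0
    for addr in references:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return hits / (hits + misses)

# 10 memory references against a 2-block cache: 4 hits, 6 misses
print(hit_ratio([1, 2, 1, 3, 1, 2, 1, 2, 4, 1], 2))  # 0.4
```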
4. Associative Memory
It is also known as content addressable memory (CAM). It is a memory chip in which each bit
position can be compared. The content is compared in each bit cell, which allows very fast
table lookup. Since the entire chip can be compared, contents are stored without regard to any
addressing scheme. These chips have less storage capacity than regular memory
chips.
5. Cache Mapping
The three common techniques for mapping main memory blocks into the cache are:
Associative Mapping
Direct Mapping
Set Associative Mapping
6. Virtual Memory
Virtual memory is the separation of logical memory from physical memory. This separation
provides large virtual memory for programmers when only small physical memory is available.
Virtual memory is used to give programmers the illusion that they have a very large memory
even though the computer has a small main memory. It makes the task of programming easier
because the programmer no longer needs to worry about the amount of physical memory
available.
Parallel Processing and Data Transfer Modes in a Computer System
Instead of processing each instruction sequentially, a parallel processing system provides
concurrent data processing to reduce the execution time. Such a system may have two or
more ALUs and should be able to execute two or more instructions at the same time. The
purpose of parallel processing is to speed up the computer's processing capability and increase its
throughput.
NOTE: Throughput is the number of instructions that can be executed in a unit of time.
Parallel processing can be viewed from various levels of complexity. At the lowest level, we
distinguish between parallel and serial operations by the type of registers used. At the higher level
of complexity, parallel processing can be achieved by using multiple functional units that perform
many operations simultaneously.
Pipelining
Pipelining is the process of feeding instructions to the processor through a pipeline. It
allows storing and executing instructions in an orderly process. It is also known as pipeline
processing.
Pipelining is a technique where multiple instructions are overlapped during execution. The pipeline is
divided into stages, and these stages are connected with one another to form a pipe-like structure.
Instructions enter from one end and exit from the other end.
Note: Pipelining increases the overall instruction throughput.
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system
3. It makes the system reliable.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The instruction latency is higher.
In a pipeline system, each segment consists of an input register followed by a combinational
circuit. The register is used to hold data, and the combinational circuit performs operations on it. The
output of the combinational circuit is applied to the input register of the next segment.
A pipeline system is like a modern-day assembly line in a factory. For example, in a car
manufacturing plant, huge assembly lines are set up with robotic arms at each point to
perform a certain task, after which the car moves ahead to the next arm.
Types of Pipeline
It is divided into two categories: the arithmetic pipeline and the instruction pipeline.
i. Arithmetic Pipeline
Arithmetic pipelines are usually found in most computers. They are used for floating point
operations, multiplication of fixed point numbers, etc. For example, the inputs to a floating
point adder pipeline are:
X = A × 2^a
Y = B × 2^b
Here A and B are mantissas (significant digit of floating point numbers), while a and b are
exponents.
The floating point addition and subtraction is done in 4 parts:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
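As a sketch, the four steps of floating-point addition (compare exponents, align mantissas, add mantissas, normalize) can be traced in Python. The normalization rule below (keep the mantissa magnitude below 1) and the sample operands are simplifying assumptions:

```python
def fp_add(A, a, B, b):
    """Add X = A * 2**a and Y = B * 2**b in four pipeline-style steps."""
    # 1. Compare the exponents (make the first operand the larger one)
    if a < b:
        A, a, B, b = B, b, A, a
    # 2. Align the mantissas by shifting the smaller operand right
    B = B / (2 ** (a - b))
    # 3. Add the mantissas
    m, e = A + B, a
    # 4. Normalize the result so that |mantissa| < 1
    while abs(m) >= 1:
        m /= 2
        e += 1
    return m, e

# 0.5 * 2**3 + 0.75 * 2**2 = (0.5 + 0.375) * 2**3 = 0.875 * 2**3
print(fp_add(0.5, 3, 0.75, 2))  # (0.875, 3)
```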
There are three classes of pipeline hazards:
1. Structural Hazards. They arise from resource conflicts when the hardware cannot support all
possible combinations of instructions in simultaneous overlapped execution.
2. Data Hazards. They arise when an instruction depends on the result of a previous
instruction in a way that is exposed by the overlapping of instructions in the pipeline.
3. Control Hazards. They arise from the pipelining of branches and other instructions that
change the PC.
i. Branching
In order to fetch and execute the next instruction, we must know what that instruction is. If the
present instruction is a conditional branch, and its result will lead us to the next instruction, then
the next instruction may not be known until the current one is processed.
ii. Interrupts
Interrupts insert unwanted instructions into the instruction stream and thereby affect the execution of
instructions.
The principles of pipelining will be described using DLX (pronounced "deluxe") and a simple version of its
pipeline. Those principles can be applied to more complex instruction sets than DLX, although
the resulting pipelines are more complex. DLX has a simple pipeline architecture for the CPU.
The architecture of DLX was chosen based on observations about the most frequently used primitives
in programs. DLX provides a good architectural model for study, not only because of the recent
popularity of this type of machine, but also because it is easy to understand.
Like most recent load/store machines, DLX emphasizes
For integer data: 8-bit bytes, 16-bit half words, 32-bit words
For floating point: 32-bit single precision, 64-bit double precision
The DLX operations work on 32-bit integers and 32- or 64-bit floating point. Bytes and
half words are loaded into registers with either zeros or the sign bit replicated to fill the
32 bits of the registers.
Memory
Byte addressable
32-bit address
Two addressing modes (immediate and displacement). Register deferred addressing is
accomplished by placing 0 in the 16-bit displacement field, and absolute addressing by
using register R0 (which always contains 0) as the base register.
Memory references are load/store between memory and GPRs or FPRs, and all memory accesses must
be aligned.
There are instructions for moving between an FPR and a GPR.
Instructions
An Implementation of DLX
This un-pipelined implementation is not the most economical or the highest performance
implementation without pipelining. Instead, it is designed to lead naturally to a pipelined
implementation. Implementing the instruction set requires the introduction of several temporary
registers that are not part of the architecture. Every DLX instruction can be implemented in at
most five clock cycles. The five clock cycles are:
1. Instruction fetch cycle (IF)
Operation:
• Send out the PC and fetch the instruction from memory into the instruction register (IR)
• Increment the PC by 4 to address the next sequential instruction
• The IR is used to hold the instruction that will be needed on subsequent clock cycles
• The NPC is used to hold the next sequential PC (program counter)
2. Instruction decode/register fetch cycle (ID)
Operation:
i. Decode the instruction and access the register file to read the registers.
ii. The outputs of the general-purpose registers are read into two temporary registers (A
and B) for use in later clock cycles.
iii. The lower 16 bits of the IR are also sign-extended and stored into the temporary
register IMM, for use in the next cycle.
iv. Decoding is done in parallel with reading registers, which is possible because these
fields are at a fixed location in the DLX instruction format. This technique is known
as fixed-field decoding.
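Fixed-field decoding can be sketched as simple bit-field extraction. The layout below assumes the DLX I-type format (6-bit opcode, two 5-bit register fields, 16-bit immediate), and the encoded sample word is made up:

```python
def decode(word):
    """Extract the fixed fields of a 32-bit DLX I-type instruction word."""
    opcode = (word >> 26) & 0x3F   # bits 31..26
    rs1    = (word >> 21) & 0x1F   # bits 25..21
    rd     = (word >> 16) & 0x1F   # bits 20..16
    imm    = word & 0xFFFF         # bits 15..0
    if imm & 0x8000:               # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs1, rd, imm

# A made-up word: opcode 8, rs1 = R1, rd = R2, immediate = -4
word = (8 << 26) | (1 << 21) | (2 << 16) | 0xFFFC
print(decode(word))  # (8, 1, 2, -4)
```

Because every field lives at a fixed bit position, the register fields can be read out before the opcode is fully interpreted, which is exactly what lets decoding proceed in parallel with the register file access.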
3. Execution/effective address cycle (EX)
Memory reference:
ALUOutput ← A + Imm
Operation: The ALU adds the operands to form the effective address and places the result into
the register ALUOutput
Register-Register ALU instruction:
ALUOutput ← A op B
Operation: The ALU performs the operation specified by the opcode on the value in register A
and on the value in register B. The result is placed in the register ALUOutput.
Register-Immediate ALU instruction:
ALUOutput ← A op Imm
Operation: The ALU performs the operation specified by the opcode on the value in register A
and on the value in register Imm. The result is placed in the register ALUOutput.
Branch:
ALUOutput ← NPC + Imm
Cond ← (A op 0)
• The ALU adds the NPC to the sign-extended immediate value in Imm to compute
the address of the branch target.
• Register A, which has been read in the prior cycle, is checked to determine
whether the branch is taken.
• The comparison operation op is the relational operator determined by the branch
opcode (e.g. op is "==" for the instruction BEQZ)
4. Memory access/branch completion cycle (MEM)
Operation:
Access memory if needed
If the instruction is a load, data returns from memory and is placed in the LMD (load
memory data) register
If the instruction is a store, data from the B register is written into memory.
In either case the address used is the one computed during the prior cycle
and stored in the register ALUOutput
Branch:
Operation:
- If the instruction branches, the PC is replaced with the branch destination address in the register
ALUOutput
- Otherwise, the PC is replaced with the incremented PC in the register NPC
5. Write-back cycle (WB)
Memory reference:
Regs[IR11..15] ← LMD
Operation:
Write the result into the register file, whether it comes from the memory (LMD) or from the ALU
(ALUOutput)
The register destination field is in one of two positions depending on the opcode
Limitations on practical depth of a pipeline arise from:
Pipeline latency. The fact that the execution time of each instruction does not decrease
puts limitations on pipeline depth;
Imbalance among pipeline stages. Imbalance among the pipe stages reduces
performance since the clock can run no faster than the time needed for the slowest
pipeline stage;
Pipeline overhead. Pipeline overhead arises from the combination of pipeline register
delay (setup time plus propagation delay) and clock skew.
Once the clock cycle is as small as the sum of the clock skew and latch overhead, no further
pipelining is useful, since there is no time left in the cycle for useful work.
Example
1. Consider a non-pipelined machine with 6 execution stages of lengths 50ns, 50ns, 60ns,
60ns, 50ns, and 50 ns.
i. Find the instruction latency on this machine.
ii. How much time does it take to execute 100 instructions?
Solution:
i. Instruction latency = 50 + 50 + 60 + 60 + 50 + 50 = 320 ns
ii. Time to execute 100 instructions = 100 × 320 ns = 32000 ns
2. Suppose we introduce pipelining on this machine, with a pipeline overhead of 5 ns per stage.
What is the cycle time of the pipelined machine?
Remember that in the pipelined implementation, the length of the pipe stages must all be the
same, i.e., the speed of the slowest stage plus overhead. With 5 ns overhead the cycle time comes
to 60 + 5 = 65 ns.
3. What is the speedup obtained from pipelining for 100 instructions?
Solution:
Speedup is the ratio of the average instruction time without pipelining to the average instruction
time with pipelining.
Average instruction time not pipelined = 320 ns
Average instruction time pipelined = 65 ns
Time for 100 instructions pipelined = (100 + 6 − 1) × 65 ns = 6825 ns
Speedup for 100 instructions = 32000 / 6825 = 4.69
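The numbers in this worked example can be checked with a short script:

```python
# Recompute the worked example: 6 stages (50, 50, 60, 60, 50, 50 ns),
# 5 ns pipeline overhead per stage, 100 instructions.
stages = [50, 50, 60, 60, 50, 50]
overhead = 5
n = 100

latency = sum(stages)                            # unpipelined latency: 320 ns
cycle = max(stages) + overhead                   # pipelined cycle time: 65 ns
time_unpipelined = n * latency                   # 32000 ns
time_pipelined = (n + len(stages) - 1) * cycle   # 105 * 65 = 6825 ns

print(round(time_unpipelined / time_pipelined, 2))  # speedup: 4.69
```

The `n + len(stages) - 1` term reflects that the first instruction takes all six stages to complete, after which one instruction finishes per cycle.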