Instruction Set Principles and Examples
UNIT - 2
Classification of Instruction Set
Architectures
Instruction Set Design
[Figure: the instruction set is the interface between software and hardware]
Multiple implementations of one ISA: 8086 through Pentium 4
ISAs evolve: MIPS-I, MIPS-II, MIPS-III, MIPS-IV, MIPS-V, MDMX, MIPS32, MIPS64
MIPS (originally an acronym for Microprocessor
without Interlocked Pipeline Stages)
MIPS is a reduced instruction set computer (RISC)
instruction set architecture (ISA) developed by MIPS
Technologies (formerly MIPS Computer Systems, Inc.).
The early MIPS architectures were 32-bit, and later
versions were 64-bit. Multiple revisions of the MIPS
instruction set exist, including MIPS I, MIPS II, MIPS III,
MIPS IV, MIPS V, MIPS32, and MIPS64. The current
revisions are MIPS32 (for 32-bit implementations) and
MIPS64 (for 64-bit implementations).[1][2] MIPS32 and
MIPS64 define a control register set as well as the
instruction set.
Typical Processor Execution Cycle
Instruction Fetch: obtain instruction from program storage
Instruction Decode: determine required actions and instruction size
Operand Fetch: locate and obtain operand data
Execute: compute result value or status
Result Store: deposit results in register or storage for later use
Next Instruction: determine successor instruction
Instruction and Data Memory: Unified or Separate
[Figure: the programmer's view of a program (symbolic instructions such as ADD, SUBTRACT, AND, OR, COMPARE) versus the computer's view (binary encodings such as 01010, 01110, 10011) held in a system of CPU, memory, and I/O]
Princeton (Von Neumann) Architecture
--- Data and instructions mixed in the same unified memory
--- Program as data
--- Storage utilization
--- Single memory interface
Harvard Architecture
--- Data and instructions in separate memories
--- Has advantages in certain high-performance implementations
--- Can optimize each memory
Classifying instruction set Architectures
There are four types of internal storage used by the processor to hold operands, explicitly or implicitly, during execution of a program:
1. Stack
2. Accumulator
3. Set of Registers (Register-Memory)
4. Set of Registers (Register-Register/load-store)
The operands in a stack architecture are implicitly on the top of the stack, and in an accumulator architecture one operand is implicitly the accumulator. The general-purpose register architectures (Register-Memory and Register-Register) have only explicit operands, either in registers or memory locations.
Basic Addressing Classes
Declining cost of registers
Operand locations for four instruction set architecture
classes.
The arrows indicate whether the operand is an input or the result of the
ALU operation, or both an input and result.
Lighter shades indicate inputs, and the dark shade indicates the result.
In (a), a Top Of Stack register (TOS) points to the top input operand,
which is combined with the operand below. The first operand is removed
from the stack, the result takes the place of the second operand, and
TOS is updated to point to the result. All operands are implicit. In (b), the
Accumulator is both an implicit input operand and a result. In (c), one
input operand is a register, one is in memory, and the result goes to a
register. All operands are registers in (d) and, like the stack architecture,
can be transferred to memory only via separate instructions: push or pop
for (a) and load or store for (d).
Code Sequence for C=A+B
Stack:
    Push A
    Push B
    Add
    Pop C
Accumulator:
    Load A
    Add B
    Store C
Register-memory:
    Load R1, A
    Add R3, R1, B
    Store R3, C
Register-register:
    Load R1, A
    Load R2, B
    Add R3, R1, R2
    Store R3, C
The code sequence for C = A + B for four classes of instruction sets.
Note that the Add instruction has implicit operands for stack and accumulator
architectures, and explicit operands for register architectures. It is assumed
that A, B, and C all belong in memory and that the values of A and B cannot be
destroyed.
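The four code sequences above can be mimicked with tiny interpreters. This is a Python sketch (not from the slides, with illustrative variable names) that executes each sequence for C = A + B over a toy memory, showing which operands are implicit in each class.

```python
# Toy memory; A, B, C live in "memory" as the caption assumes.
mem = {"A": 2, "B": 3, "C": 0}

def run_stack(m):
    m = dict(m); stack = []
    stack.append(m["A"])                       # Push A
    stack.append(m["B"])                       # Push B
    stack.append(stack.pop() + stack.pop())    # Add (all operands implicit)
    m["C"] = stack.pop()                       # Pop C
    return m["C"]

def run_accumulator(m):
    m = dict(m)
    acc = m["A"]          # Load A  (accumulator is the implicit operand)
    acc = acc + m["B"]    # Add B
    m["C"] = acc          # Store C
    return m["C"]

def run_register_memory(m):
    m = dict(m); regs = {}
    regs["R1"] = m["A"]               # Load R1, A
    regs["R3"] = regs["R1"] + m["B"]  # Add R3, R1, B (one memory operand)
    m["C"] = regs["R3"]               # Store R3, C
    return m["C"]

def run_load_store(m):
    m = dict(m); regs = {}
    regs["R1"] = m["A"]                   # Load R1, A
    regs["R2"] = m["B"]                   # Load R2, B
    regs["R3"] = regs["R1"] + regs["R2"]  # Add R3, R1, R2 (registers only)
    m["C"] = regs["R3"]                   # Store R3, C
    return m["C"]
```

All four produce the same value of C; they differ only in where operands live and how many instructions touch memory.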
Stack Architectures
Accumulator Architectures
Register-Set Architectures
Register-to-Register: Load-Store Architectures
Register-to-Memory Architectures
Memory-to-Memory Architectures
Instruction Formats
Instruction Set Architecture (ISA)
To command a computer's hardware, you must speak its
language
The words of a machine's language are called instructions, and its vocabulary is called its instruction set
Once you learn one machine language, it is easy to pick up others:
There are a few fundamental operations that all computers must provide
All designers have the same goal of finding a language that simplifies building
the hardware and the compiler while maximizing performance and
minimizing cost
Learning how instructions are represented leads to discovering
the secret of computing: the stored-program concept
The MIPS instruction set is used as a case study
Interface Design
A good interface:
Lasts through many implementations (portability, compatibility)
Is used in many different ways (generality)
Provides convenient functionality to higher levels
Permits an efficient implementation at lower levels
Design decisions must take into account:
Technology
Machine organization
Programming languages
Compiler technology
Operating systems
[Figure: one interface outlives several implementations (imp 1, imp 2, imp 3) and serves many uses over time]
Classifying Instruction Set Architectures
Accumulator Architecture
Common in early stored-program computers, when hardware was expensive
The machine has only one register (the accumulator) involved in all arithmetic & logical operations
All operations assume the accumulator as a source operand and as the destination for the
operation, with the other operand stored in memory
Extended Accumulator Architecture
Dedicated registers for specific operations, e.g., stack and array index registers, are added
The 8085 microprocessor is an example of such a special-purpose register architecture
General-Purpose Register Architecture
MIPS is an example of such an architecture, where registers are not restricted to playing a single role
This type of instruction set can be further divided into:
Register-memory: allows one operand to be in memory
Register-register (load-store): demands all operands be in registers
Machine           # general-purpose registers   Architecture style                Year
Motorola 6800     2                             Accumulator                       1974
DEC VAX           16                            Register-memory, memory-memory    1977
Intel 8086        1                             Extended accumulator              1978
Motorola 68000    16                            Register-memory                   1980
Intel 80386       8                             Register-memory                   1985
PowerPC           32                            Load-store                        1992
DEC Alpha         32                            Load-store                        1992
Compact Code and Stack Architectures
When memory was scarce, machines like the Intel 80x86 had variable-length
instructions to match varying operand specifications and minimize code size
Stack machines abandoned registers altogether, arguing that it is hard for
compilers to use them efficiently
Operands are pushed onto a stack from memory, and results have to
be popped from the stack to memory
Operations take their operands by default from the top of the stack and insert
the results back onto the stack
Stack machines simplify compilers and lend themselves to a compact
instruction encoding, but limit compiler optimization (e.g., in arithmetic expressions)
Example: A = B + C
Push AddressC    # Top=Top+4; Stack[Top]=Memory[AddressC]
Push AddressB    # Top=Top+4; Stack[Top]=Memory[AddressB]
add              # Stack[Top-4]=Stack[Top]+Stack[Top-4]; Top=Top-4
Pop AddressA     # Memory[AddressA]=Stack[Top]; Top=Top-4
Compact code is important for network computers, where programs
must be downloaded over the Internet (e.g., Java-based applications)
Other types of Architecture
High-Level-Language Architecture
In the 1960s, systems software was rarely written in high-level languages; virtually
every commercial operating system before Unix was written in assembly
Some people blamed code density on the instruction set rather than the
programming language
A machine design philosophy was advocated with the goal of making the hardware
more like high-level languages
The effectiveness of high-level languages, memory size limitations, and the lack of efficient
compilers doomed this philosophy to a historical footnote
Reduced Instruction Set Architecture
With developments in compiler technology and expanded memory sizes, fewer
programmers use assembly-level coding
An instruction set architecture came to be measured by how well compilers use it
rather than by how well assembly programmers do
RISC architecture favors simplifying hardware design over enriching the offered set of
instructions, relying on compilers to combine simple instructions to perform complex operations
Virtually all new architectures since 1982 follow the RISC philosophy of fixed
instruction lengths, load-store operation, and limited addressing modes
Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers
(Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model
from Implementation
High-level Language Based
(B5000 1963)
Concept of a Family
(IBM 360 1964)
General Purpose Register Machines
Complex Instruction Sets
(Vax, Intel 432 1977-80)
Load/Store Architecture
(CDC 6600, Cray 1 1963-76)
RISC
(MIPS,SPARC,IBM RS6000, . . .1987)
Register-Memory Architectures
Effect of the number of memory operands:

# memory addresses   Max. number of operands   Examples
0                    3                         SPARC, MIPS, PowerPC, Alpha
1                    2                         Intel 80x86, Motorola 68000
2                    2                         VAX (also has 3-operand formats)
3                    3                         VAX (also has 2-operand formats)
Interpreting Memory Addressing
The address of a word matches the byte address of one of its 4 bytes
The addresses of sequential words differ by 4 (the word size in bytes)
Word addresses are multiples of 4 (alignment restriction)
Machines that use the address of the leftmost byte as the word address are
called "Big Endian"; those that use the rightmost byte are called "Little Endian"
Misalignment complicates memory access and causes programs to run slower
(some machines do not allow misaligned memory access at all)
Byte ordering can be a problem when exchanging data among different machines
Byte addressing affects array index calculation, which must account for word addressing
and the offset within the word
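The two byte orderings can be made concrete with Python's struct module. A small sketch (not from the slides): packing the same 32-bit word both ways shows that one layout is the byte-reverse of the other, which is exactly the data-exchange problem noted above.

```python
import struct

word = 0x12345678

big    = struct.pack(">I", word)   # big endian: most significant byte first
little = struct.pack("<I", word)   # little endian: least significant byte first

# big    is b'\x12\x34\x56\x78'
# little is b'\x78\x56\x34\x12'
# The same word exchanged between machines of opposite endianness
# arrives with its bytes reversed.
```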
Object addressed   Aligned at byte offsets   Misaligned at byte offsets
Byte               0,1,2,3,4,5,6,7           Never
Half word          0,2,4,6                   1,3,5,7
Word               0,4                       1,2,3,5,6,7
Double word        0                         1,2,3,4,5,6,7
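The table above follows from one rule: an access of `size` bytes at address `addr` is aligned when `addr` is a multiple of `size`. A short Python sketch of that rule, reproducing the word row of the table:

```python
def is_aligned(addr, size):
    # An access is aligned when the address is a multiple of the access size.
    return addr % size == 0

# Byte accesses (size 1) are never misaligned; a word (size 4) is aligned
# only at byte offsets 0 and 4 within an 8-byte region.
word_aligned_offsets = [off for off in range(8) if is_aligned(off, 4)]
```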
Addressing Modes
Addressing modes refer to how to specify the location of an
operand (effective address)
Addressing modes have the ability to:
Significantly reduce instruction counts
Increase the average CPI
Increase the complexity of building a machine
The VAX machine is used for benchmark data since it supports
a wide range of memory addressing modes
Famous addressing modes can be classified based on:
the source of the data into register, immediate or
memory
the address calculation into direct and indirect
An indexed addressing mode is usually provided to allow
efficient implementation of loops and array access
Example of Addressing Modes

Register
  Example: ADD R4, R3
  Meaning: Regs[R4] = Regs[R4] + Regs[R3]
  When used: when a value is in a register

Immediate
  Example: ADD R4, #3
  Meaning: Regs[R4] = Regs[R4] + 3
  When used: for constants

Displacement
  Example: ADD R4, 100(R1)
  Meaning: Regs[R4] = Regs[R4] + Mem[100 + Regs[R1]]
  When used: accessing local variables

Register indirect
  Example: ADD R4, (R1)
  Meaning: Regs[R4] = Regs[R4] + Mem[Regs[R1]]
  When used: accessing using a pointer or a computed address

Indexed
  Example: ADD R4, (R1 + R2)
  Meaning: Regs[R4] = Regs[R4] + Mem[Regs[R1] + Regs[R2]]
  When used: sometimes useful in array addressing; R1 = base of the array, R2 = index amount

Direct or absolute
  Example: ADD R4, (1001)
  Meaning: Regs[R4] = Regs[R4] + Mem[1001]
  When used: sometimes useful for accessing static data; the address constant may need to be large

Memory indirect (memory deferred)
  Example: ADD R4, @(R3)
  Meaning: Regs[R4] = Regs[R4] + Mem[Mem[Regs[R3]]]
  When used: if R3 is the address of the pointer p, then this mode yields *p

Autoincrement
  Example: ADD R4, (R2)+
  Meaning: Regs[R4] = Regs[R4] + Mem[Regs[R2]]; Regs[R2] = Regs[R2] + d
  When used: useful for stepping through arrays within a loop; R2 points to the start of the array, and each reference increments R2 by d

Autodecrement
  Example: ADD R4, -(R2)
  Meaning: Regs[R2] = Regs[R2] - d; Regs[R4] = Regs[R4] + Mem[Regs[R2]]
  When used: same use as autoincrement; autodecrement/increment can also act as push/pop to implement a stack

Scaled
  Example: ADD R4, 100(R2)[R3]
  Meaning: Regs[R4] = Regs[R4] + Mem[100 + Regs[R2] + Regs[R3] * d]
  When used: used to index arrays
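The effective-address calculations in the table can be sketched in Python over toy register and memory state (the register names and values below are illustrative, not from any real ISA):

```python
regs = {"R1": 100, "R2": 8, "R3": 200}
mem = {100: 7, 108: 9, 200: 100, 1001: 5}

def direct(addr):                 # ADD R4, (1001)    -> Mem[1001]
    return mem[addr]

def register_indirect(r):         # ADD R4, (R1)      -> Mem[Regs[R1]]
    return mem[regs[r]]

def displacement(d, r):           # ADD R4, 100(R1)   -> Mem[100 + Regs[R1]]
    return mem[d + regs[r]]

def indexed(r1, r2):              # ADD R4, (R1 + R2) -> Mem[Regs[R1] + Regs[R2]]
    return mem[regs[r1] + regs[r2]]

def memory_indirect(r):           # ADD R4, @(R3)     -> Mem[Mem[Regs[R3]]]
    return mem[mem[regs[r]]]
```

Each function computes only the operand fetch; the surrounding ADD then consumes the returned value.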
Addressing Mode for Signal Processing
DSPs offer special addressing modes to better serve popular algorithms
Special features require either hand coding or a compiler that uses such
features (GNU would not be a good choice)
Modulo addressing:
Since DSPs deal with continuous data streams,
circular buffers are widely used
Circular or modulo addressing allows automatic
increment and decrement and resets the pointer
when it reaches the end of the buffer
Reverse addressing:
The resulting address is the bit-reverse of the
current address, which expedites access for the
Fast Fourier Transform (FFT); done otherwise, it
requires a number of logical instructions or extra
memory accesses
For 3-bit addresses:
0 (000₂) → 0 (000₂)
1 (001₂) → 4 (100₂)
2 (010₂) → 2 (010₂)
3 (011₂) → 6 (110₂)
4 (100₂) → 1 (001₂)
5 (101₂) → 5 (101₂)
6 (110₂) → 3 (011₂)
7 (111₂) → 7 (111₂)
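Both DSP address updates are easy to express directly. A Python sketch (illustrative, not a real DSP's hardware behavior) of the modulo increment and the 3-bit reverse mapping shown above:

```python
def modulo_next(i, buf_len):
    # Circular-buffer increment: wraps back to 0 at the end of the buffer.
    return (i + 1) % buf_len

def bit_reverse(addr, bits):
    # Reverse the low `bits` bits of addr (the FFT data-ordering trick).
    result = 0
    for _ in range(bits):
        result = (result << 1) | (addr & 1)
        addr >>= 1
    return result
```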
Operations of the Computer Hardware
There must certainly be instructions for performing the
fundamental arithmetic operations.
Burks, Goldstine and von Neumann, 1947
Assembly language is a symbolic representation of what the processor actually understands
The MIPS assembler allows only one instruction per line and ignores comments, which run from # to the end of the line
Example:
Translation of a segment of a C program to MIPS assembly instructions:
C:
f = (g + h) - (i + j)
MIPS:
add t0, g, h    # temp. variable t0 contains "g + h"
add t1, i, j    # temp. variable t1 contains "i + j"
sub f, t0, t1   # f = t0 - t1 = (g + h) - (i + j)
Operations in the Instruction Set
Operator type            | Examples
Arithmetic and logical   | integer arithmetic and logical operations: add, and, subtract, or
Data transfer            | loads-stores (move instructions on machines with memory addressing)
Control                  | branch, jump, procedure call and return, trap
System                   | operating system call, virtual memory management instructions
Floating point           | floating point instructions: add, multiply
Decimal                  | decimal add, decimal multiply, decimal-to-character conversion
String                   | string move, string compare, string search
Graphics                 | pixel operations, compression/decompression operations
Arithmetic, logical, data transfer and control are almost standard categories
for all machines
System instructions are required for multiprogramming environments,
although support for system functions varies
Decimal and string instructions can be primitives, e.g., on the IBM 360 and the VAX
Support for floating point, decimal, string and graphics is optional
and is sometimes provided via a co-processor
Some machines rely on the compiler to synthesize special operations, such
as string handling, from simpler instructions
Operations for Media & Signal Process.
Single instruction multiple data (SIMD) and vector instructions
are often supported in DSPs, which are commonly used in
multimedia and signal processing applications
Partitioned Add (integer)
Performs multiple 16-bit additions on a 64-bit ALU, since most data are narrow
Increases ALU throughput for multimedia applications
Paired single operations (float)
Allow the same register to act as two operands to the same operation
Handy when dealing with vertices and coordinates
Multiply and accumulate
Very handy for calculating dot products of vectors (signal processing) and
matrix multiplication
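The effect of two of these operations can be sketched in plain Python (a model of the arithmetic, not real SIMD hardware): multiply-accumulate as the inner loop of a dot product, and a partitioned add as four independent 16-bit lanes packed in one 64-bit word.

```python
def multiply_accumulate(xs, ys):
    # Dot product via repeated multiply-and-accumulate into one accumulator.
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

def partitioned_add(a, b):
    # Four independent 16-bit adds inside one 64-bit word; carries are
    # prevented from crossing lane boundaries, unlike a normal 64-bit add.
    result = 0
    for lane in range(4):
        shift = 16 * lane
        la = (a >> shift) & 0xFFFF
        lb = (b >> shift) & 0xFFFF
        result |= ((la + lb) & 0xFFFF) << shift
    return result
```

Note how `partitioned_add(0xFFFF, 1)` wraps lane 0 to zero instead of carrying into the next lane, which is the defining difference from an ordinary add.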
Frequency of Operations Usage
The most widely executed instructions are the simple operations of an
instruction set
The following is the average usage in SPECint92 on Intel 80x86
Rank   80x86 instruction        Integer average (% total executed)
1      Load                     22%
2      Conditional branch       20%
3      Compare                  16%
4      Store                    12%
5      Add                      8%
6      And                      6%
7      Sub                      5%
8      Move register-register   4%
9      Call                     1%
10     Return                   1%
       Total                    96%
Make the common case fast by focusing on these operations
Control Flow Instructions
Data is based on SPEC2000 on Alpha
jump for unconditional change in the control flow
branch for conditional change in the control flow
Procedure calls and returns
Destination Address Definition
Data is based on SPEC2000 on Alpha
Relative addressing w.r.t. the program counter proved to be the best choice
for forward and backward branching or jumps (load address independent)
To allow for dynamic loading of library routines, register indirect address
allows addresses to be loaded in special registers
(e.g. virtual functions in C++ and system calls in a case statement)
Condition Evaluation
Compare-and-branch can be efficient if the majority
of conditions are comparisons with zero
Based on SPEC92 on MIPS
Remember to focus
on the common case
Frequency of Types of Comparison
Data is based on SPEC2000 on Alpha
Different benchmarks and machines set new design
priorities
DSPs support a repeat instruction for for-loops (vectors) using 3 registers
Supporting Procedures
Execution of a procedure involves the following steps:
Store parameters in a place accessible to the procedure
Transfer control to the procedure
Acquire the storage resources needed for the procedure
Perform the desired task
Store the result value in a place accessible to the calling program
Return control to the point of origin
The hardware provides a program counter to trace instruction flow and
manage transfer of control
Parameter Passing
Registers can be used for passing a small number of parameters
A stack is used to spill registers of the current context, make room for
the called procedure to run, and allow large parameters to be passed
Storage of machine state can be performed by the caller or the callee
Handling of shared variables is important to ensure correct semantics, and
thus requires clear specification in the library interface
Global variables stored in registers need careful handling
Type and Size of Operands
The type of an operand is designated by encoding it in the instruction's
operation code
The type of an operand, e.g., single-precision float, effectively gives its size
Common operand types include character, half-word and word-size integers,
and single- and double-precision floating point
Characters are almost always in ASCII, integers in two's complement,
and floating point in IEEE 754
The 16-bit Unicode, used in Java, is gaining popularity due to its support for
international character sets
For business applications, some architectures support a decimal format in
binary coded decimal (BCD)
Depending on the size of the word, the complexity of handling different
operand types differs
DSPs offer fixed-point data types to stand in for high-precision floating-point
arithmetic and to allow a single exponent to be shared by multiple numbers
For graphics applications, vertex and pixel operands are added features
Size of Operands
Frequency of reference by size, based on SPEC2000 on Alpha
The double-word data type is used for double-precision floating-point operations
and for address storage in machines with a 64-bit address bus
Words are used for integer operations and on 32-bit address bus machines
Because of the mix in SPEC, the word and double-word data types dominate
Instruction Representation
Humans are taught to think in base 10 (decimal), but numbers may be
represented in any base (123 in base 10 = 1111011 in binary, or base 2)
Numbers are stored in computers as a series of high and low electronic
signals (binary numbers)
Binary digits are called bits and are considered the atoms of computing
Each piece of an instruction is a number, and placing these numbers
together forms the instruction
Assemblers translate assembly's symbolic instructions into machine
language instructions (machine code)
Example:
Assembly:                 add $t0, $s1, $s2
M/C language (decimal):   0      | 17     | 18     | 8      | 0      | 32
M/C language (binary):    000000 | 10001  | 10010  | 01000  | 00000  | 100000
Field widths:             6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
Note: the MIPS compiler by default maps $s0-$s7 to registers 16-23 and $t0-$t7 to registers 8-15
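Packing those six fields into one 32-bit word is just shifts and ORs. A Python sketch of the encoding for the example above (field values taken from the slide: op=0, rs=17 for $s1, rt=18 for $s2, rd=8 for $t0, shamt=0, funct=32):

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    # Pack the six MIPS R-format fields (6/5/5/5/5/6 bits) into a 32-bit word.
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $s1, $s2
word = encode_r_format(0, 17, 18, 8, 0, 32)
```

The resulting bit pattern is the concatenation of the binary row in the example.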
Encoding an Instruction Set
Instruction encoding affects the size of the compiled program and the
complexity of the CPU implementation
The operation is typically specified in one field called opcode
The addressing mode for the operand can be encoded with the operation
or specified through a separate identifier in case of large number of
supported modes
The architecture must balance between several competing factors:
1. Desire to support as many registers and addressing modes as possible
2. Effect of operand specification on the size of the instruction (program)
3. Desire to simplify instruction fetching and decoding during execution
Fixed-size instruction encoding simplifies the CPU design while limiting the
addressing modes supported
An architect who cares about code size can use variable-size encoding
A hybrid approach is to allow variability by supporting multiple-sized
instructions
Encoding Examples
MIPS Instruction format
Register-format instructions:

op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)

op: basic operation of the instruction, traditionally called the opcode
rs: the first register source operand
rt: the second register source operand
rd: the register destination operand; it gets the result of the operation
shamt: shift amount
funct: selects the specific variant of the operation in the op field

Immediate-type instructions:

op (6 bits) | rs (5 bits) | rt (5 bits) | address (16 bits)

Some instructions need longer fields than provided for large-value constants
The 16-bit address means a load word instruction can load a word within a
region of ±2^15 bytes of the address in the base register
Example:
lw $t0, 32($s3)    # Temporary register $t0 gets A[8]
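Because the 16-bit address field is signed, it must be sign-extended before being added to the base register. A Python sketch of that effective-address calculation (base value below is illustrative):

```python
def sign_extend_16(value):
    # Interpret the low 16 bits as a signed two's-complement number.
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def lw_effective_address(base_reg_value, offset16):
    # lw Rdest, offset(base): address = base + sign-extended 16-bit offset.
    return base_reg_value + sign_extend_16(offset16)

# lw $t0, 32($s3): with $s3 holding the base address of A, the effective
# address is base + 32, i.e. element A[8] for 4-byte words.
```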
Instruction   Format   op   rs    rt    rd    shamt   funct   address
add           R        0    reg   reg   reg   0       32      N/A
sub           R        0    reg   reg   reg   0       34      N/A
lw            I        35   reg   reg   N/A   N/A     N/A     address
sw            I        43   reg   reg   N/A   N/A     N/A     address
The Stored Program Concept
Learning how instructions are represented leads to discovering
the secret of computing: the stored-program concept
Today's computers are built on two key principles:
Instructions are represented as numbers
Programs can be stored in memory to be
read or written just like numbers
The power of the concept:
memory can contain:
the source code for an editor
the compiled machine code for the editor
the text that the compiled program is using
the compiler that generated the code
[Figure: a processor and a memory holding an accounting program (machine code), an editor program (machine code), a C compiler (machine code), payroll data, book text, and source code in C for the editor program]
Compiling if-then-else in MIPS
Assuming the five variables f, g, h, i,
and j correspond to the five registers
$s0 through $s4, what is the compiled
MIPS code for the following C if
statement:

if (i == j) f = g + h; else f = g - h;

MIPS:
      bne $s3, $s4, Else    # go to Else if i != j
      add $s0, $s1, $s2     # f = g + h (skipped if i != j)
      j   Exit
Else: sub $s0, $s1, $s2     # f = g - h (skipped if i == j)
Exit:

[Flowchart: test i == j; if equal, f = g + h; otherwise f = g - h; both paths join at Exit]
Typical Compilation
Major Types of Optimization

Optimization name | Explanation | Frequency
High level (at or near source level; machine-independent)
  Procedure integration | Replace procedure call by procedure body | N.M.
Local (within straight-line code)
  Common subexpression elimination | Replace two instances of the same computation by a single copy | 18%
  Constant propagation | Replace all instances of a variable that is assigned a constant with the constant | 22%
  Stack height reduction | Rearrange expression tree to minimize resources needed for expression evaluation | N.M.
Global (across a branch)
  Global common subexpression elimination | Same as local, but this version crosses branches | 13%
  Copy propagation | Replace all instances of a variable A that has been assigned X (i.e., A = X) with X | 11%
  Code motion | Remove code from a loop that computes the same value each iteration of the loop | 16%
  Induction variable elimination | Simplify/eliminate array addressing calculations within loops | 2%
Machine-dependent (depends on machine knowledge)
  Strength reduction | Many examples, such as replacing a multiply by a constant with adds and shifts | N.M.
  Pipeline scheduling | Reorder instructions to improve pipeline performance | N.M.
(N.M. = not measured)
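Strength reduction, listed under the machine-dependent optimizations, can be shown in miniature. A Python sketch (illustrative constant chosen here): a multiply by 10 rewritten as two shifts and an add, as a compiler back end might emit for a machine where shifts are cheaper than multiplies.

```python
def times_ten(x):
    # Original form: one multiply by a constant.
    return x * 10

def times_ten_reduced(x):
    # Strength-reduced form: x*10 = x*8 + x*2, via shifts and one add.
    return (x << 3) + (x << 1)
```

The two forms are equivalent for all integers, which is what licenses the compiler to substitute one for the other.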
Program and Compiler Optimization Level
Effect of Compiler Optimization
Measurements taken on MIPS:
Level 0: non-optimized code
Level 1: local optimization
Level 2: global optimization, s/w pipelining
Level 3: adds procedure integration
Compiler Support for Multimedia Instr.
Intel's MMX and PowerPC AltiVec have small vector processing capabilities
targeting multimedia applications (to speed up graphics)
Intel added a new set of instructions, called Streaming SIMD Extensions
A major advantage of vector computers is hiding the latency of memory access
by loading multiple elements and then overlapping execution with data
transfer
Vector computers typically have strided and/or gather/scatter addressing to
perform operations on distant memory locations
Strided addressing allows memory access in increments larger than one
Gather/scatter addressing is similar to register indirect mode, where the
addresses are stored instead of the data
Supporting vector operations without strided addressing, as in Intel's MMX,
limits the potential speedup
Such limited support for vector processing makes vectorizing compiler
optimizations unpopular and restricts their scope to hand-coded routines
SIMD instructions on MMX and AltiVec tend to be solutions, not primitives
Starting a Program
[Figure: C program → Compiler → Assembly language program → Assembler → Object (machine language module) and Object (library routine, machine language) → Linker → Executable (machine language program) → Loader → Memory]

Object files for Unix typically contain:
Header: size & position of components
Text segment: machine code
Data segment: static and dynamic variables
Relocation info: identifies absolute memory references
Symbol table: name & location of labels,
procedures and variables
Debugging info: mapping of source to object
code, break points, etc.

Linker:
- Places code & data modules symbolically in memory
- Determines the addresses of data & instruction labels
- Patches both internal & external references
Loading Executable Program
To load an executable, the operating system
follows these steps:
Reads the executable file header to
determine the size of the text and data segments
Creates an address space large enough for
the text and data
Copies the instructions and data from the
executable file into memory
Copies the parameters (if any) to the main
program onto the stack
Initializes the machine registers and sets the
stack pointer to the first free location
Jumps to a start-up routine that copies the
parameters into the argument registers and
calls the main routine of the program

[Figure: MIPS memory layout. The stack starts at $sp = 7fff ffff (hex) and grows down toward the dynamic data; $gp = 1000 8000 (hex) points into the static data, which begins at 1000 0000 (hex); the text segment begins at pc = 0040 0000 (hex); addresses below 0040 0000 (hex) are reserved]
Effect of ISA on Compiler Complexity
Promote regularity
Limit the number of register formats and the variability in the number of operands
Avoid fixing a class of registers to certain operations
Separate addressing modes from operations to allow consistent handling
Provide primitives, not solutions
Avoid being influenced by features of a particular high-level programming
language (the VAX provides a call instruction with automatic context saving that does not
allow per-call optimization)
Allow enhancing the handling of common features
Simplify trade-offs among alternatives
Simplify the analysis of special features such as caches and pipelines
Allow simultaneous activities to promote optimization
Favor static binding
Avoid dynamically changing, at run time, values set at compile time, to
allow for optimization (e.g., changing the mask that indicates the registers to be
saved by the caller can diminish its value in the case of separate compilation)
Instruction Set Design Issues
6 Instruction Set Design Issues
1. Number of Addresses
2. Flow of Control
3. Operand Types
4. Addressing Modes
5. Instruction Types
6. Instruction Formats
Number of Addresses
Four categories
3-address machines
- 2 for the source operands and one for the result
2-address machines
- One address doubles as source and result
1-address machines
- Accumulator machines
- The accumulator is used for one source operand and the result
0-address machines
- Stack machines
- Operands are taken from the stack
- Result goes onto the stack
Number of Addresses (cont.)
Three-address machines
Two for the source operands, one for the result
RISC processors use three addresses
Sample instructions
add  dest,src1,src2   ; M(dest) = [src1] + [src2]
sub  dest,src1,src2   ; M(dest) = [src1] - [src2]
mult dest,src1,src2   ; M(dest) = [src1] * [src2]
Three addresses:
Operand 1, Operand 2, Result
Example: a = b + c
Three-address instruction formats are not common, because they require a
relatively long instruction format to hold the three address references.
Number of Addresses (cont.)
Example
C statement:
A = B + C * D - E + F + A
Equivalent code:
mult T,C,D   ;T = C*D
add  T,T,B   ;T = B+C*D
sub  T,T,E   ;T = B+C*D-E
add  T,T,F   ;T = B+C*D-E+F
add  A,T,A   ;A = B+C*D-E+F+A
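The three-address sequence can be checked by executing it on a toy memory. A Python sketch with sample values (the values are illustrative, chosen only to make the result easy to verify):

```python
# Toy memory with sample values for the variables.
M = {"A": 10, "B": 2, "C": 3, "D": 4, "E": 5, "F": 1}

M["T"] = M["C"] * M["D"]   # mult T,C,D   ;T = C*D        -> 12
M["T"] = M["T"] + M["B"]   # add  T,T,B   ;T = B+C*D      -> 14
M["T"] = M["T"] - M["E"]   # sub  T,T,E   ;T = B+C*D-E    -> 9
M["T"] = M["T"] + M["F"]   # add  T,T,F   ;T = B+C*D-E+F  -> 10
M["A"] = M["T"] + M["A"]   # add  A,T,A   ;A = B+C*D-E+F+A -> 20
```

With these values, A = 2 + 3*4 - 5 + 1 + 10 = 20, matching the step-by-step comments.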
Number of Addresses (cont.)
Two-address machines
One address doubles (for source operand & result)
Last example makes a case for it
- Address T is used twice
Sample instructions
load dest,src   ; M(dest) = [src]
add  dest,src   ; M(dest) = [dest] + [src]
sub  dest,src   ; M(dest) = [dest] - [src]
mult dest,src   ; M(dest) = [dest] * [src]
Two addresses:
One address doubles as operand and result
Example: a = a + b
The two-address format reduces the space requirement but also
introduces some awkwardness. To avoid altering the value of an
operand, a MOVE instruction is used to move one of the values to a
result or temporary location before performing the operation.
Number of Addresses (cont.)
Example
C statement:
A = B + C * D - E + F + A
Equivalent code:
load T,C   ;T = C
mult T,D   ;T = C*D
add  T,B   ;T = B+C*D
sub  T,E   ;T = B+C*D-E
add  T,F   ;T = B+C*D-E+F
add  A,T   ;A = B+C*D-E+F+A
Number of Addresses (cont.)
One-address machines
Use special set of registers called accumulators
- Specify one source operand & receive the result
Called accumulator machines
Sample instructions
load  addr   ; accum = [addr]
store addr   ; M[addr] = accum
add   addr   ; accum = accum + [addr]
sub   addr   ; accum = accum - [addr]
mult  addr   ; accum = accum * [addr]
Number of Addresses (cont.)
Example
C statement:
A = B + C * D - E + F + A
Equivalent code:
load  C   ;load C into accum
mult  D   ;accum = C*D
add   B   ;accum = C*D+B
sub   E   ;accum = B+C*D-E
add   F   ;accum = B+C*D-E+F
add   A   ;accum = B+C*D-E+F+A
store A   ;store accum contents in A
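A one-address machine is small enough to interpret in a few lines. This Python sketch (illustrative, with sample memory values) runs the accumulator sequence above and confirms it computes the same expression:

```python
def run_accumulator_machine(program, memory):
    # Interpret (opcode, address) pairs against a single accumulator.
    memory = dict(memory)
    accum = 0
    for op, addr in program:
        if op == "load":
            accum = memory[addr]
        elif op == "store":
            memory[addr] = accum
        elif op == "add":
            accum += memory[addr]
        elif op == "sub":
            accum -= memory[addr]
        elif op == "mult":
            accum *= memory[addr]
    return memory

prog = [("load", "C"), ("mult", "D"), ("add", "B"),
        ("sub", "E"), ("add", "F"), ("add", "A"), ("store", "A")]
result = run_accumulator_machine(prog, {"A": 10, "B": 2, "C": 3, "D": 4, "E": 5, "F": 1})
```

With these sample values, A = 2 + 3*4 - 5 + 1 + 10 = 20.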
Number of Addresses (cont.)
Zero-address machines
Stack supplies operands and receives the result
- Special instructions to load and store use an address
Called stack machines (e.g., HP 3000, Burroughs B5500)
Sample instructions
push addr   ; push([addr])
pop  addr   ; pop([addr])
add         ; push(pop + pop)
sub         ; push(pop - pop)
mult        ; push(pop * pop)
Number of Addresses (cont.)
Example
C statement:
A = B + C * D - E + F + A
Equivalent code:
push E
push C
push D
mult
push B
add
sub      ; (B+C*D) - E
push F
add
push A
add
pop A
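The zero-address sequence can be traced by a small stack-machine interpreter. A Python sketch (illustrative memory values; sub follows the push(pop - pop) convention above, so the first value popped is the minuend):

```python
def run_stack_machine(program, memory):
    memory = dict(memory)
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(memory[instr[1]])
        elif op == "pop":
            memory[instr[1]] = stack.pop()
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mult":
            stack.append(stack.pop() * stack.pop())
        elif op == "sub":
            # push(pop - pop): top of stack minus the value below it
            stack.append(stack.pop() - stack.pop())
    return memory

prog = [("push", "E"), ("push", "C"), ("push", "D"), ("mult",),
        ("push", "B"), ("add",), ("sub",), ("push", "F"), ("add",),
        ("push", "A"), ("add",), ("pop", "A")]
result = run_stack_machine(prog, {"A": 10, "B": 2, "C": 3, "D": 4, "E": 5, "F": 1})
```

With these values the run leaves A = 2 + 3*4 - 5 + 1 + 10 = 20, and the stack is empty at the end, as expected for a well-formed expression.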
Load/Store Architecture
Instructions expect operands in internal processor registers
Special LOAD and STORE instructions move data between registers
and memory
RISC uses this architecture
Reduces instruction length
Load/Store Architecture (cont.)
Sample instructions
load  Rd,addr       ;Rd = [addr]
store addr,Rs       ;(addr) = Rs
add   Rd,Rs1,Rs2    ;Rd = Rs1 + Rs2
sub   Rd,Rs1,Rs2    ;Rd = Rs1 - Rs2
mult  Rd,Rs1,Rs2    ;Rd = Rs1 * Rs2
Number of Addresses (cont.)
Example
C statement:
A = B + C * D - E + F + A
Equivalent code:
load  R1,B
load  R2,C
load  R3,D
load  R4,E
load  R5,F
load  R6,A
mult  R2,R2,R3   ;R2 = C*D
add   R2,R2,R1   ;R2 = B+C*D
sub   R2,R2,R4   ;R2 = B+C*D-E
add   R2,R2,R5   ;R2 = B+C*D-E+F
add   R2,R2,R6   ;R2 = B+C*D-E+F+A
store A,R2
2: Flow of Control
Default is sequential flow
Several instructions alter this default
execution
Branches
- Unconditional
- Conditional
- Delayed branches
Procedure calls
- Delayed procedure calls
Flow of Control (cont.)
Branches
Unconditional
- Absolute address
- PC-relative
Target address is specified relative to PC contents
Relocatable code
Example: MIPS
- Absolute address
j target
- PC-relative
b target
Flow of Control (cont.)
[Figure: branch instruction examples, e.g., Pentium and SPARC]
Flow of Control (cont.)
Branches
Conditional
- Jump is taken only if the condition is met
Two types
- Set-Then-Jump
Condition testing is separated from branching
Condition code registers are used to convey the condition test
result
Condition code registers keep a record of the status of the last
ALU operation such as overflow condition
- Example: Pentium code
cmp  AX,BX    ; compare AX and BX
je   target   ; jump if equal
Flow of Control (cont.)
- Test-and-Jump
Single instruction performs condition testing and branching
- Example: MIPS instruction
beq  Rsrc1,Rsrc2,target   ; jumps to target if Rsrc1 = Rsrc2
Delayed branching
Control is transferred after executing the instruction that
follows the branch instruction
- This instruction slot is called delay slot
Improves efficiency
Highly pipelined RISC processors support delayed branching
Flow of Control (cont.)
Procedure calls
Facilitate modular programming
Require two pieces of information to return
- End of procedure
  Pentium uses the ret instruction
  MIPS uses the jr instruction
- Return address
  In a (special) register (MIPS allows any general-purpose register)
  On the stack (Pentium)
Flow of Control (cont.)
[Figure: delayed branch execution, showing the delay slot]
Parameter Passing
Two basic techniques
Register-based (e.g., PowerPC, MIPS)
- Internal registers are used
- Faster
- Limits the number of parameters
- Complicates recursive procedures
Stack-based (e.g., Pentium)
- Stack is used
More general
3: Operand Types
Instructions support basic data types
Characters
Integers
Floating-point
Instruction overload
Same instruction for different data types
Example: Pentium
mov  AL,address    ; loads an 8-bit value
mov  AX,address    ; loads a 16-bit value
mov  EAX,address   ; loads a 32-bit value
Operand Types
Separate instructions
Instructions specify the operand size
Example: MIPS
lb  Rdest,address   ; loads a byte
lh  Rdest,address   ; loads a halfword (16 bits)
lw  Rdest,address   ; loads a word (32 bits)
ld  Rdest,address   ; loads a doubleword (64 bits)
Similar instructions exist for store
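The effect of size-specific loads like lb and lh can be illustrated by masking and sign-extending a value from a small memory cell. A sketch, not MIPS-exact behavior (it ignores the unsigned variants such as lbu):

```python
# Sketch: what a size-specific signed load does to a stored bit pattern.
# Mask to the operand width, then sign-extend if the top bit is set.

def load_signed(value, bits):
    """Mask value to `bits` wide and sign-extend, as lb/lh/lw would."""
    mask = (1 << bits) - 1
    v = value & mask
    if v & (1 << (bits - 1)):       # sign bit set: extend with ones
        v -= 1 << bits
    return v

print(load_signed(0xFF, 8))        # -1: as a signed byte, 0xFF is negative
print(load_signed(0x00FF, 16))     # 255: as a signed halfword, it is positive
```

The same bit pattern yields different values at different operand sizes, which is exactly why the ISA must name the size, either in the opcode (MIPS) or via the register operand (Pentium).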
4: Addressing Modes
How the operands are specified
Operands can be in three places
- Registers
Register addressing mode
- Part of instruction
Constant
Immediate addressing mode
All processors support these two addressing modes
- Memory
Difference between RISC and CISC
CISC supports a large variety of addressing modes
RISC follows load/store architecture
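The three operand locations above can be sketched as a single operand-fetch routine. The mode names, register file, and memory contents here are all invented for illustration:

```python
# Sketch: resolving an operand under the three basic mode families named
# above (register, immediate, memory), using invented machine state.

def fetch_operand(mode, spec, regs, memory):
    if mode == "register":     # operand sits in a register
        return regs[spec]
    if mode == "immediate":    # operand is a constant inside the instruction
        return spec
    if mode == "memory":       # operand sits at a memory address
        return memory[spec]
    raise ValueError(f"unknown addressing mode: {mode}")

regs = {"R1": 7}
memory = {0x100: 42}
print(fetch_operand("register", "R1", regs, memory))    # 7
print(fetch_operand("immediate", 5, regs, memory))      # 5
print(fetch_operand("memory", 0x100, regs, memory))     # 42
```

CISC machines add many variations on the memory case (displacement, indexed, indirect, and so on), while RISC restricts the memory case to load and store instructions.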
5: Instruction Types
Several types of instructions
Data movement
- Pentium:
mov  dest,src
- Some do not provide direct data movement instructions
- Indirect data movement
add  Rdest,Rsrc,0   ; Rdest = Rsrc + 0
Arithmetic and Logical
- Arithmetic
Integer and floating-point, signed and unsigned
add, subtract, multiply, divide
- Logical
and, or, not, xor
Instruction Types (cont.)
Condition code bits
S: Sign bit (0 = +, 1= -)
Z: Zero bit (0 = nonzero, 1 = zero)
O: Overflow bit (0 = no overflow, 1 = overflow)
C: Carry bit (0 = no carry, 1 = carry)
Example: Pentium
cmp  count,25   ; compare count to 25 (subtracts 25 from count)
je   target     ; jump if equal
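The flag-setting behavior of a cmp-style subtraction can be sketched directly from the bit definitions above. This is an illustrative model for 16-bit operands, not a cycle-accurate Pentium description:

```python
# Sketch: computing the four condition-code bits (S, Z, O, C) that a
# cmp-style subtraction a - b would set, for 16-bit operand encodings.

def compare_flags(a, b, bits=16):
    """Return (S, Z, O, C) after computing a - b."""
    mask = (1 << bits) - 1
    result = (a - b) & mask
    s = result >> (bits - 1)           # sign bit of the result
    z = 1 if result == 0 else 0        # zero bit
    # signed overflow: operands have different sign bits and the
    # result's sign differs from the first operand's sign
    sa, sb = a >> (bits - 1), b >> (bits - 1)
    o = 1 if (sa != sb and s != sa) else 0
    c = 1 if a < b else 0              # borrow on the unsigned subtract
    return s, z, o, c

print(compare_flags(25, 25))   # (0, 1, 0, 0): equal operands set Z
print(compare_flags(10, 25))   # (1, 0, 0, 1): borrow sets C, result is negative
```

A following conditional branch (je, jl, and so on) then tests some combination of these bits rather than re-examining the operands.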
Instruction Types (cont.)
Flow control and I/O instructions
- Branch
- Procedure call
- Interrupts
I/O instructions
- Memory-mapped I/O
Most processors support memory-mapped I/O
No separate instructions for I/O
- Isolated I/O
Pentium supports isolated I/O
Separate I/O instructions
in   AX,io_port   ; read from an I/O port
out  io_port,AX   ; write to an I/O port
6: Instruction Formats
Two types
Fixed-length
- Used by RISC processors
- 32-bit RISC processors use 32-bit-wide instructions
Examples: SPARC, MIPS, PowerPC
Variable-length
- Used by CISC processors
- Memory operands need more bits to specify
Opcode specifies the major and exact operation
Examples of Instruction Formats
RISC
(Reduced Instruction Set Computer )
versus
CISC (Complex Instruction Set Computer)
RISC Vs CISC
The underlying philosophy of RISC machines is that a
system is better able to manage program execution
when the program consists of only a few different
instructions that are the same length and require the
same number of clock cycles to decode and execute.
RISC systems access memory only with explicit load
and store instructions.
In CISC systems, many different kinds of instructions
access memory, making instruction length variable
and fetch-decode-execute time unpredictable.
RISC Vs CISC
The difference between CISC and RISC becomes
evident through the basic computer performance
equation:

time/program = (instructions/program) × (cycles/instruction) × (seconds/cycle)

RISC systems shorten execution time by reducing
the clock cycles per instruction.
CISC systems improve performance by reducing the
number of instructions per program.
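The performance equation can be made concrete with a small calculation. The instruction counts, CPI values, and clock rate below are invented solely to illustrate the trade-off:

```python
# Worked illustration of the performance equation:
# time = (instructions/program) * (cycles/instruction) * (seconds/cycle).
# All numbers below are hypothetical.

def cpu_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds for a program."""
    return instruction_count * cpi / clock_hz

# A CISC-style workload: fewer instructions, but a higher CPI.
cisc = cpu_time(instruction_count=50_000, cpi=5.0, clock_hz=1_000_000_000)
# A RISC-style workload: more instructions, but CPI close to 1.
risc = cpu_time(instruction_count=150_000, cpi=1.2, clock_hz=1_000_000_000)

print(cisc)   # 0.00025 s
print(risc)   # 0.00018 s
```

With these made-up numbers the RISC workload wins despite executing three times as many instructions, because its cycles-per-instruction term shrinks faster than its instruction count grows.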
RISC Vs CISC
The simple instruction set of RISC machines
enables control units to be hardwired for maximum
speed.
The more complex-- and variable-- instruction set of
CISC machines requires microcode-based control
units that interpret instructions as they are fetched
from memory. This translation takes time.
With fixed-length instructions, RISC lends itself to
pipelining and speculative execution.
RISC Vs CISC
Consider the following program fragments:
CISC
mov ax, 10
mov bx, 5
mul bx, ax
RISC
mov ax, 0
mov bx, 10
mov cx, 5
Begin:
add ax, bx
loop Begin
The total clock cycles for the CISC version might be:
(2 movs × 1 cycle) + (1 mul × 30 cycles) = 32 cycles
While the clock cycles for the RISC version are:
(3 movs × 1 cycle) + (5 adds × 1 cycle) + (5 loops × 1 cycle) = 13 cycles
With the RISC clock cycle also being shorter, RISC gives us
much faster execution speeds.
RISC Vs CISC
Because of their load-store ISAs, RISC architectures
require a large number of CPU registers.
These registers provide fast access to data during
sequential program execution.
They can also be employed to reduce the overhead
typically caused by passing parameters to
subprograms.
Instead of pulling parameters off of a stack, the
subprogram is directed to use a subset of registers.
RISC Vs CISC
This is how registers can be overlapped in a RISC system.
The current window pointer (CWP) points to the active register window.
[Figure: overlapping register windows]
RISC Vs CISC
It is becoming increasingly difficult to distinguish
RISC architectures from CISC architectures.
Some RISC systems provide more extravagant
instruction sets than some CISC systems.
Some systems combine both approaches.
The following two slides summarize the
characteristics that traditionally typify the differences
between these two architectures.
RISC Vs CISC
RISC:
- Multiple register sets.
- Three operands per instruction.
- Parameter passing through register windows.
- Single-cycle instructions.
- Hardwired control.
- Highly pipelined.

CISC:
- Single register set.
- One or two register operands per instruction.
- Parameter passing through memory.
- Multiple-cycle instructions.
- Microprogrammed control.
- Less pipelined.
RISC Vs CISC
RISC:
- Simple instructions, few in number.
- Fixed-length instructions.
- Complexity in compiler.
- Only LOAD/STORE instructions access memory.
- Few addressing modes.

CISC:
- Many complex instructions.
- Variable-length instructions.
- Complexity in microcode.
- Many instructions can access memory.
- Many addressing modes.
RISC Vs CISC
Summary
Instruction Set Design Issues
Instruction set design issues include:
Where are operands stored?
- registers, memory, stack, accumulator
How many explicit operands are there?
- 0, 1, 2, or 3
How is the operand location specified?
- register, immediate, indirect, . . .
What type & size of operands are supported?
- byte, int, float, double, string, vector. . .
What operations are supported?
- add, sub, mul, move, compare . . .
More About General Purpose Registers
Why do almost all new architectures use
GPRs?
Registers are much faster than memory (even
cache)
- Register values are available immediately
- When memory isn't ready, the processor must wait (stall)
Registers are convenient for variable storage
- Compiler assigns some variables just to registers
- More compact code, since small fields specify registers
(compared to memory addresses)
[Figure: memory hierarchy - processor registers, cache, memory, disk]
What Operations are Needed
Arithmetic + Logical
Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT
Logical operation: AND, OR, XOR, NOT
Data Transfer - copy, load, store
Control - branch, jump, call, return
Floating Point (ADDF, MULF, DIVF )
Same as arithmetic but usually take bigger operands
Decimal - ADDD, CONVERT
String - move, compare, search
Graphics - pixel and vertex operations, compression/decompression
Stacks Architecture: Pros and Cons
Pros
Good code density (implicit top of stack)
Low hardware requirements
Easy to write a simpler compiler for stack architectures
Cons
Stack becomes the bottleneck
Little ability for parallelism or pipelining
Data is not always at the top of the stack when needed, so additional
instructions like TOP and SWAP are needed
Difficult to write an optimizing compiler for stack architectures
Accumulators Architecture : Pros and Cons
Pros
Very low hardware requirements
Easy to design and understand
Cons
Accumulator becomes the bottleneck
Little ability for parallelism or pipelining
High memory traffic
Memory-Memory Architecture : Pros and Cons
Pros
Requires fewer instructions (especially if 3 operands)
Easy to write compilers for (especially if 3 operands)
Cons
Very high memory traffic (especially if 3 operands)
Variable number of clocks per instruction
With two operands, more data movements are required
Memory-Register Architecture : Pros and Cons
Pros
Some data can be accessed without loading first
Instruction format easy to encode
Good code density
Cons
Operands are not equivalent (poor orthogonality)
Variable number of clocks per instruction
May limit number of registers