ICS 122 The Processor Design Version 2
ICS 122 The Processor Design Version 2
• Module 1: Explains that the performance of a computer is determined by three key factors:
instruction count, clock cycle time, and clock cycles per instruction (CPI).
• In addition, explains that the compiler and the instruction set architecture determine the instruction
count required for a given program.
• Module 2: Explains the Numerical precision, in the context of number representations with MIPS
number of instruction count.
• However, the implementation of the processor determines both the clock cycle time and the
number of clock cycles per instruction.
• In this Module, we construct the datapath and control unit for two different implementations of
the MIPS instruction set.
Contd.,
• For above stated instructions the first two steps are identical:
• Send the program counter (PC) to the memory that contains the code and fetch the
instruction from that memory.
• Read one or two registers, using fields of the instruction to select the registers to read.
For the load word instruction, we need to read only one register, but most other
instructions require reading two registers.
• After these two steps, the actions required to complete the instruction depend on the
instruction class (previous slide).
Designing a Processor: Step-by-Step
1. Analyze MIPS ISA => major datapath components to execute each class of
MIPS instructions.
1. The meaning of each instruction is given by the register transfers
2. Datapath must include storage elements for ISA registers
3. Datapath must support each register transfer
Op6 address26
• Concepts used to implement the MIPS subset are used to construct a broad spectrum of
computers
Details of the MIPS Subset
Instruction Meaning Format
add rd, rs, rt addition op6 = 0 rs5 rt5 rd5 0 0x20
sub rd, rs, rt subtraction op6 = 0 rs5 rt5 rd5 0 0x22
and rd, rs, rt bitwise and op6 = 0 rs5 rt5 rd5 0 0x24
or rd, rs, rt bitwise or op6 = 0 rs5 rt5 rd5 0 0x25
xor rd, rs, rt exclusive or op6 = 0 rs5 rt5 rd5 0 0x26
slt rd, rs, rt set on less than op6 = 0 rs5 rt5 rd5 0 0x2a
addi rt, rs, imm16 add immediate 0x08 rs5 rt5 imm16
slti rt, rs, imm16 slt immediate 0x0a rs5 rt5 imm16
andi rt, rs, imm16 and immediate 0x0c rs5 rt5 imm16
ori rt, rs, imm16 or immediate 0x0d rs5 rt5 imm16
xori rt, imm16 xor immediate 0x0e rs5 rt5 imm16
lw rt, imm16(rs) load word 0x23 rs5 rt5 imm16
sw rt, imm16(rs) store word 0x2b rs5 rt5 imm16
beq rs, rt, offset16 branch if equal 0x04 rs5 rt5 offset16
bne rs, rt, offset16 branch not equal 0x05 rs5 rt5 offset16
j address26 jump 0x02 address26
Register Transfer Level (RTL)
• RTL is a description of data flow between registers, it gives a meaning to the instructions.
• In detail, it shows a way to specify the behavior of instructions at a low level, detailing how data is
transferred between registers, memory, and the Program Counter (PC).
• All instructions are fetched from memory at address PC
• Instruction RTL Description
ADD Reg(rd) ← Reg(rs) + Reg(rt); PC ← PC + 4
SUB Reg(rd) ← Reg(rs) – Reg(rt); PC ← PC + 4
ORI Reg(rt) ← Reg(rs) | zero_ext(imm16); PC ← PC + 4
LW Reg(rt) ← MEM[Reg(rs) + sign_ext(imm16)]; PC ← PC + 4
SW MEM[Reg(rs) + sign_ext(imm16)] ← Reg(rt); PC ← PC + 4
BEQ if (Reg(rs) == Reg(rt))
PC ← PC + 4 + 4 × sign_ext(offset16)
else PC ← PC + 4
Instruction Fetch/Execute Summary of the Instruction Cycle
1. Fetch: Get the instruction from memory.
R-type Fetch instruction: Instruction ← MEM[PC]
Fetch operands: data1 ← Reg(rs), data2 ← Reg(rt) 2. Decode: Identify the operation and fetch operands
from registers.
Execute operation: ALU_result ← func(data1, data2)
3. Execute: Perform the operation using the ALU.
Write ALU result: Reg(rd) ← ALU_result
Next PC address: PC ← PC + 4 4. Writeback: Save the result to a register.
5. Update PC: Move to the next instruction.
I-type Fetch instruction: Instruction ← MEM[PC]
Fetch operands: data1 ← Reg(rs), data2 ← Extend(imm16) immediate value is typically 16 bits, it is sign-
extended to 32 bits to match the size of the
Execute operation: ALU_result ← op(data1, data2)
operands.
Write ALU result: Reg(rt) ← ALU_result
Next PC address: PC ← PC + 4
• PC target address or PC + 4
• Adder
A Basic MIPS Implementation
An abstract view of the implementation of the MIPS subset showing the major functional units and the
major connections between them.
Multiplexers
◼ Can’t just join wires together
◼ Use multiplexers
A Basic MIPS Implementation …Contd.,
The basic implementation of the MIPS subset, including the necessary multiplexors and control lines.
Building a Datapath
• Datapath
• Elements that process data and addresses in the CPU
• Registers, ALUs, mux’s, memories, …
• ALU, Adder 16 32 m
0
A 32
zero
select
• Multiplexers
ALUOp
PC
Address Address
32
32 Data_out
Instruction
• Data memory clk Memory
Data_in
• PC register clk
Mem Mem
Registers Read Write
• Register file
5 32
RA BusA
5 32
• Clocking methodology
RB BusB
5
RW
BusW
• Timing of writes clk
RegWrite 32
Register Element
• Register Data_In
RB
• RB selects register to be read on BusB
32
5 BusB
RW
RW . WE R2
.
5 . 32
.
.
32 32
. 32
BusW BusA
WE R31
RegWrite 32
32
Clock BusB
Tri-State Buffers
• Allow multiple sources to drive a single bus
• Two Inputs: Enable
• Data_in
• Enable (to enable output) Data_in Data_out
• One Output: Data_out
• If (Enable) Data_out = Data_in
else Data_out = High Impedance state (output is disconnected)
Data_0
• Tri-state buffers can be
Output
used to build multiplexors
Data_1
Select
Building a Multifunction ALU
2
Shift/Rotate
SLL = 00
Operation
SLT: ALU does a SUB
SRL = 01 Shift Amount 5
and check the sign
SRA = 10 and overflow
Shifter
ROR = 11
32
c0 0
A 32 ALU Result
sign
A ≠ 1 32
d
d 2
B 32
Arithmetic
Operation
32 e
r 3
ADD = 0
SUB = 1 2
overflow zero
0 ALU
Logic Unit
Selection
1
2 Shift = 00
Operation
AND = 00 SLT = 01
Logical
3
OR = 01 Arith = 10
NOR = 10 2 Logic = 11
XOR = 11
Details of the Shifter
• Implemented with multiplexers and wiring
• Shift Operation can be: SLL (Shift Left Logical), SRL (Shift Right Logical), SRA (Shift Right Arithmetic),
or ROR (Rotate Right)
• Input Data is extended to 63 bits according to Shift Op
• The 63 bits are shifted right according to S4S3S2S1S0
5
sa
SLL S4 S3 S2 S1 S0
16 8 4 2 1
0 0 0 0 0
Data_out
split split split split split
Extender
31 31 31 31 31
Data
63 16 47 8 39 4 35 2 33 1 32
mux
mux
mux
mux
mux
32
31 31 31 31 31
16 8 4 2 1
31 31 31 31 31
1 1 1 1 1
16 8 4 2 1
2
Shift Shift Right Shift Right Shift Right Shift Right Shift Right
op 0 or 16 bits 0 or 8 bits 0 or 4 bits 0 or 2 bits 0 or 1 bit
More on Shifters
• SRA → 1110 (binary, representing -2 in two's complement) shifted right by 1 bit → 1111 (binary, preserving
the negative value).
• ROR → 1010 (binary) rotated right by 1 bit → 0101 (binary, and the rightmost bit 0 wraps around to the
leftmost position).
Details of the Shifter – cont’d
0_31 means 31 zeros,
• Input data is extended from 32 to 63 bits as follows: data[31:0] original 32-bit data.
• If shift op = SRL then ext_data[62:0] = 0_31 || data[31:0]
• If shift op = SRA then ext_data[62:0] = data[31]31 || data[31:0]
• If shift op = ROR then ext_data[62:0] = data[30:0] || data[31:0]
• If shift op = SLL then ext_data[62:0] = data[31:0] || 0_31
Register 2
Register 1
clock
Register 2
register
Combinational logic • Tmax_comb: longest delay through
combinational logic
clock
• Ts : setup time that input to a register
writing edge
must be stable before arrival of clock
edge
Tclk-q Tmax_comb Ts Th • Th: hold time that input to a register
must hold after arrival of clock edge
• Clock skew arises because the clock signal uses different paths with slightly
different delays to reach state elements.
• Clock skew is the difference in absolute time between when two storage
elements see a clock edge.
A
d of the PC are ‘00’ since Improved
32
32
d PC is a multiple of 4 30
+1 Datapath
30
32 32
00
00
Instruction Instruction
32
Datapath does not 32
PC
Address Address
PC
Instruction
handle branch or Instruction
clk Memory jump instructions clk Memory
R-Format Instructions
RegWrite
ALU Operation
30
+1
Instruction Registers 32
Memory Rs 5
30 32 RA BusA A 32
00
Instruction Rt 5 L
32 RB
PC
Address BusB 32 U
Rd 5
RW ALU result
BusW
clk
Rs and Rt fields select two BusA & BusB provide data input to ALU.
registers to read. Rd field ALU result is connected to BusW
selects register to write
Same clock updates PC and Rd register
• Control signals
• ALU Operation is the ALU operation as defined in the funct field for R-type
• RegWrite is used to enable the writing of the ALU result
Datapath for I-type ALU Instructions
Op6 Rs5 Rt5 immediate16
Reg Write
ALU Operation
30
+1
Instruction Registers 32
Memory Rs 5
30 32 RA BusA A 32
00
Instruction 5
32 L
32 RB
Address BusB 32 U
PC
Rt 5
RW ALU result
BusW
00
Instruction Rt 5 32 L Another mux
32 RB
Address
BusB 0 U
selects 2nd ALU
PC
0
RW 1
Rd
1 BusW
32
input as either data
clk RegDst ExtOp ALUSrc on BusB or the
ALU result extended
Extender immediate
Imm16
❖ Control signals
ALUOp is derived from either the Op or the funct field
RegWr enables the writing of the ALU result
ExtOp controls the extension of the 16-bit immediate
RegDst selects the register destination as either Rt or Rd
ALUSrc selects the 2nd ALU source as BusB or extended immediate
Controlling ALU Instructions
RegWr = 1
ALUOp
30
+1 For R-type ALU
Instruction Registers 32
Memory Rs 5 instructions, RegDst is ‘1’
30 RA BusA A
00
Instruction
32 32 to select Rd on RW and
Rt 5 32 L
32 RB BusB 0 U ALUSrc is ‘0’ to select
Address
PC
0
Rd RW 1 BusB as second ALU
1 BusW
input. The active part of
ExtOp ALUSrc = 0
clk RegDst = 1 datapath is shown in
ALU result green
Extender
Imm16
RegWr = 1
+1
ALUOp For I-type ALU
30
Instruction Registers 32 instructions, RegDst is ‘0’
Memory Rs 5
30 32 RA BusA A to select Rt on RW and
32
00
32
Instruction Rt 5
RB
32 L ALUSrc is ‘1’ to select
Address
BusB 0 U Extended immediate as
PC
0
Rd RW 1
1 BusW second ALU input. The
ExtOp ALUSrc = 1 active part of datapath is
clk RegDst = 0
32 ALU result
shown in green
Extender
Imm16
Details of the Extender
• Two types of extensions
• Zero-extension for unsigned constants
• Sign-extension for signed constants
• Control signal ExtOp indicates type of extension
• Extender Implementation: wiring and one AND gate
ExtOp = 0 Upper16 = 0
.. Upper
. 16 bits
ExtOp ExtOp = 1
Upper16 = sign bit
.. Lower
Imm16 . 16 bits
Load/Store Instructions
• Read register operands
• Calculate address using 16-bit offset
• Use ALU, but sign-extend offset
• Load: Read memory and update register
• Store: Write register value to memory
Load and Store Word
• Load Word Instruction (Word = 4 bytes in MIPS)
lw Rt, imm16(Rs) # Rt MEMORY[Rs+imm16]
• Store Word Instruction
sw Rt, imm16(Rs) # Rt ➔ MEMORY[Rs+imm16]
• Base or Displacement addressing is used
• Memory Address = Rs (base) + Immediate16 (displacement)
• Immediate16 is sign-extended to have a signed displacement
30
+1
Instruction Rs 5 32
RA BusA Data
30 Memory Memory 0
32
Registers A 32 32
00
Instruction Rt 5 L Address
32 RB 32
Address
BusB 0 U Data_out 1
PC
0
Rd RW Data_in
1 BusW 1
32
RegDst Reg
Wr
clk
RegWr
ALUOp
30
+1
Instruction Registers 32
Memory Rs 5
30 32 RA BusA A 32
00
Instruction Rt 5 32 L
32 RB
Address
BusB 0 U
PC
0
RW 1
1 BusW
Rd 32
clk RegDst ExtOp ALUSrc
ALU result
Extender
Imm16
Adding Data Memory to Datapath
• A data memory is added for load and store instructions
ExtOp ALUOp MemRead MemWrite
Imm16 32 ALUSrc
E MemtoReg
ALU result
30
+1
Instruction Rs 5 32
RA BusA Data
30 Memory Memory 0
32
Registers A 32 32
00
Instruction Rt 5 L Address
32 RB 32
Address
BusB 0 U Data_out 1
PC
0
Rd RW Data_in
1 BusW 1
32
RegDst
RegWrite
clk
ALU calculates data memory address A 3rd mux selects data on BusW as
either ALU result or memory data_out
❖ Additional Control signals
BusB is connected to Data_in of Data
MemRd for load instructions Memory for store instructions
30
+1
Instruction Rs 5 32
RA BusA Data
30 Memory Memory 0
32
Registers A 32 32
00
Instruction Rt 5 L Address
32 RB 32
Address
BusB 0 U Data_out 1
PC
0
Rd RW Data_in
1 BusW 1
32
RegDst RegWrite
=0 =1
clk
MemRd = ‘1’ to read MemtoReg = ‘1’ places the data Clock edge updates PC
data memory read from memory on BusW and Register Rt
Controlling the Execution of Store
ExtOp = 1 ALUOp MemRd MemWr
= ADD =0 =1
ALUSrc
Imm16 32 MemtoReg
=1
E ALU result =X
30
+1
Instruction Rs 5 32
RA BusA Data
30 Memory Memory 0
32
Registers A 32 32
00
Instruction Rt 5 L Address
32 RB 32
PC Address
BusB 0 U Data_out 1
0
Rd RW Data_in
1 BusW 1
32
RegDst
RegWrite
=X
clk =0
MemWr = ‘1’ to write MemtoReg = ‘X’ because don’t Clock edge updates PC
data memory care what data is put on BusW and Data Memory
R-Type/I-Type/Load/Store Datapath
Branch Instructions
beq Rs,Rt,label branch to label if (Rs == Rt)
bne Rs,Rt,label branch to label if (Rs != Rt)
Just
re-routes
wires
Sign-bit wire
replicated
Implementing Jumps
Jump 2 address
31:26 25:0
0
ExtOp New adder for computing branch
Next PC Address
target address
Imm16 +
E
+1 Zero ALU result
Memory A Memory
Registers L Address
Rt 0
Address U
PC
RB 1 Data_out 1
Instruction 0 BusB 0
Rd Data_in
1 RW
BusW
clk
0
ExtOp = X PCSrc = 1 (Jump Target)
Next PC Address
Imm16 +
E
+1 Zero = X ALU result
Memory A Memory
Registers L Address
Rt 0
Address U
PC
RB 1 Data_out 1
Instruction 0 BusB 0
Rd Data_in
1 RW
BusW
clk
Reg Reg ALU ALU Mem Mem WB
Op Dst Wr Src Op Rd Wr data
=J =X =0 =X =X =0 =0 =X
Memory A Memory
Registers L Address
Rt 0
Address U
PC
RB 1 Data_out 1
Instruction 0 BusB 0
Rd Data_in
1 RW
BusW
clk
Reg Reg ALU ALU Mem Mem WB
Op Dst Wr Src Op Rd Wr data
BEQ =X =0 =0 = SUB =0 =0 =X
Zero
Instruction
0 Memory
Datapath A
PC
1 Address L
32
2 Instruction U
WBdata
MemWr
MemRd
ALUSrc
RegDst
RegWr
ExtOp
Op6
PCSrc ALUOp
funct6
Zero
PC Main ALU
Op6
Control Control Control
Imm16 +
Ext
+1 ALU result
Zero
0 Memory A Memory
Registers L Address
Rt 0
Address U
PC
1 RB 1 Data_out
Instruction 1
2 0 BusB 0
Rd Data_in
1 RW
BusW
PCSrc
clk
ALUop
func
MemRd MemtoReg
RegDst RegWr
Op ALU
PC MemWr
Ctrl
Ctrl
ExtOp ALUSrc
Main
Zero
Control
The Main Control Unit
• Control signals derived from instruction
Load/
35 or 43 rs rt address
Store
31:26 25:21 20:16 15:0
Branch 4 rs rt address
31:26 25:21 20:16 15:0
Second ALU operand is the value of register Second ALU operand is the value of the
ALUSrc
Rt that appears on BusB extended 16-bit immediate
R-type
XORI
ADDI
ANDI
BEQ
SLTI
BNE
ORI
SW
LW
J
ALUSrc = (R-type + BEQ + BNE)
Logic Equations
MemRd = LW
MemWr = SW
WBdata
MemWr
MemRd
ALUSrc
RegDst
RegWr
ExtOp
MemtoReg = LW
ALU Control Truth Table
R-type X 0 = Increment PC
BEQ 0 0 = Increment PC
BNE 1 0 = Increment PC
Decoder
Branch = (BEQ . Zero) + (BNE . Zero) BEQ BNE J
Zero
Branch = 1, Jump = 0 ➔ PCSrc = 2
Branch = 0, Jump = 1 ➔ PCSrc = 1
Branch = 0, Jump = 0 ➔ PCSrc = 0
Branch Jump
A Simple Implementation Scheme
• The three instruction classes (R-type, load and store, and branch) use two different
instruction formats
A Simple Implementation Scheme
Figure: The data path of Basic MIPS with all necessary multiplexors and all control lines identified
A Simple Implementation Scheme
• Four steps to execute the instruction; these steps are ordered by the flow of information:
The LWI Rt, Rd(Rs) instruction loads a word from memory at the address calculated by
adding the contents of Rd and Rs, and stores that word in the register Rt.