ICS 122 The Processor Design Version 2

The document outlines the design and implementation of a MIPS processor, focusing on key performance factors such as instruction count, clock cycle time, and clock cycles per instruction. It details the construction of the datapath and control unit for MIPS instruction set implementations, including a basic and a pipelined version. Additionally, it reviews MIPS instruction formats, execution cycles, and the components necessary for building a functional datapath.

ICS 122 CO & A

The Processor Design


The Intro

• Module 1: Explains that the performance of a computer is determined by three key factors:
instruction count, clock cycle time, and clock cycles per instruction (CPI).

• In addition, explains that the compiler and the instruction set architecture determine the instruction
count required for a given program.

• Module 2: Explains numerical precision in the context of MIPS number representations.

• However, the implementation of the processor determines both the clock cycle time and the
number of clock cycles per instruction.

• In this Module, we construct the datapath and control unit for two different implementations of
the MIPS instruction set.
Contd.,

• Concentrate on the principles and techniques used in implementing a processor
• Start with a highly abstract and simplified overview
• Build up a datapath and construct a simple version of a processor sufficient to implement
an instruction set like MIPS
• In addition, cover a more realistic pipelined MIPS implementation


A Basic MIPS Implementation
• Examining an implementation that includes a subset of the core MIPS instruction set:
1. The memory-reference instructions load word (lw) and store word (sw)
2. The arithmetic-logical instructions add, sub, AND, OR, and slt
3. The instructions branch equal (beq) and jump (j)
• This subset does not include all the integer instructions (for example, shift, multiply, and divide
are missing), nor does it include any floating-point instructions.
• It illustrates the key principles used in creating a datapath and designing the control.
A Basic MIPS Implementation…Contd.,

• For the instructions listed above, the first two steps are identical:
• Send the program counter (PC) to the memory that contains the code and fetch the
instruction from that memory.
• Read one or two registers, using fields of the instruction to select the registers to read.
For the load word instruction, we need to read only one register, but most other
instructions require reading two registers.

• After these two steps, the actions required to complete the instruction depend on the
instruction class (previous slide).
Designing a Processor: Step-by-Step
1. Analyze MIPS ISA => major datapath components to execute each class of
MIPS instructions.
   a. The meaning of each instruction is given by the register transfers
   b. Datapath must include storage elements for ISA registers
   c. Datapath must support each register transfer
2. Select datapath components and clocking methodology
3. Assemble datapath, meeting the requirements
4. Analyze implementation of each instruction
   a. Determine the setting of control signals for register transfer
5. Assemble the control logic


Review of MIPS Instruction Formats
• All instructions are 32-bit wide
• Three instruction formats: R-type, I-type, and J-type

R-type: Op6 Rs5 Rt5 Rd5 sa5 funct6

I-type: Op6 Rs5 Rt5 immediate16

J-type: Op6 address26

• Op6: 6-bit opcode of the instruction


• Rs5, Rt5, Rd5: 5-bit source and destination register numbers
• sa5: 5-bit shift amount used by shift instructions
• funct6: 6-bit function field for R-type instructions
• immediate16: 16-bit immediate constant or PC-relative offset
• address26: 26-bit target address of the jump instruction
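The field layout above can be sketched as a small decoder. This is only an illustration, assuming the standard bit positions (Op6 in bits 31:26 down to funct6 in bits 5:0); the function name `decode` and the dictionary keys are hypothetical, not part of the slides.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fields.

    All fields are extracted regardless of format; the caller picks the
    ones that apply (R-type, I-type, or J-type).
    """
    instr &= 0xFFFFFFFF
    return {
        "op": (instr >> 26) & 0x3F,       # Op6
        "rs": (instr >> 21) & 0x1F,       # Rs5
        "rt": (instr >> 16) & 0x1F,       # Rt5
        "rd": (instr >> 11) & 0x1F,       # Rd5 (R-type only)
        "sa": (instr >> 6) & 0x1F,        # sa5 (R-type only)
        "funct": instr & 0x3F,            # funct6 (R-type only)
        "imm16": instr & 0xFFFF,          # immediate16 (I-type only)
        "address26": instr & 0x3FFFFFF,   # address26 (J-type only)
    }

# add $t1, $t2, $t3 with $t1 = 9, $t2 = 10, $t3 = 11
fields = decode(0x014B4820)
```

For the word `0x014B4820` the decoder yields op = 0, rs = 10, rt = 11, rd = 9, sa = 0, funct = 0x20.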
MIPS Subset of Instructions

• Only a subset of the MIPS instructions is considered


• ALU instructions (R-type): add, sub, and, or, xor, slt
• Immediate instructions (I-type): addi, slti, andi, ori, xori
• Load and Store (I-type): lw, sw
• Branch (I-type): beq, bne
• Jump (J-type): j

• This subset does not include all the integer instructions

• But sufficient to illustrate design of datapath and control

• Concepts used to implement the MIPS subset are used to construct a broad spectrum of
computers
Details of the MIPS Subset
Instruction              Meaning            Format
add  rd, rs, rt          addition           op6 = 0  rs5  rt5  rd5  0  0x20
sub  rd, rs, rt          subtraction        op6 = 0  rs5  rt5  rd5  0  0x22
and  rd, rs, rt          bitwise and        op6 = 0  rs5  rt5  rd5  0  0x24
or   rd, rs, rt          bitwise or         op6 = 0  rs5  rt5  rd5  0  0x25
xor  rd, rs, rt          exclusive or       op6 = 0  rs5  rt5  rd5  0  0x26
slt  rd, rs, rt          set on less than   op6 = 0  rs5  rt5  rd5  0  0x2a
addi rt, rs, imm16       add immediate      0x08     rs5  rt5  imm16
slti rt, rs, imm16       slt immediate      0x0a     rs5  rt5  imm16
andi rt, rs, imm16       and immediate      0x0c     rs5  rt5  imm16
ori  rt, rs, imm16       or immediate       0x0d     rs5  rt5  imm16
xori rt, rs, imm16       xor immediate      0x0e     rs5  rt5  imm16
lw   rt, imm16(rs)       load word          0x23     rs5  rt5  imm16
sw   rt, imm16(rs)       store word         0x2b     rs5  rt5  imm16
beq  rs, rt, offset16    branch if equal    0x04     rs5  rt5  offset16
bne  rs, rt, offset16    branch not equal   0x05     rs5  rt5  offset16
j    address26           jump               0x02     address26
Register Transfer Level (RTL)
• RTL is a description of data flow between registers; it gives a meaning to the instructions.
• It specifies the behavior of instructions at a low level, detailing how data is
transferred between registers, memory, and the Program Counter (PC).
• All instructions are fetched from memory at address PC
Instruction   RTL Description
ADD           Reg(rd) ← Reg(rs) + Reg(rt); PC ← PC + 4
SUB           Reg(rd) ← Reg(rs) – Reg(rt); PC ← PC + 4
ORI           Reg(rt) ← Reg(rs) | zero_ext(imm16); PC ← PC + 4
LW            Reg(rt) ← MEM[Reg(rs) + sign_ext(imm16)]; PC ← PC + 4
SW            MEM[Reg(rs) + sign_ext(imm16)] ← Reg(rt); PC ← PC + 4
BEQ           if (Reg(rs) == Reg(rt))
                  PC ← PC + 4 + 4 × sign_ext(offset16)
              else PC ← PC + 4
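The register transfers above can be tried out with a minimal sketch. It assumes a list of 32 register values and a dictionary keyed by byte address as the memory; the function `step` and its keyword arguments are hypothetical names chosen for illustration, and the special handling of R0 is left out for brevity.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def zero_ext(imm16):
    """Interpret a 16-bit field as an unsigned integer."""
    return imm16 & 0xFFFF

def step(name, reg, mem, pc, rs=0, rt=0, rd=0, imm16=0):
    """Apply the register transfers of one instruction; return the new PC."""
    if name == "ADD":
        reg[rd] = (reg[rs] + reg[rt]) & MASK32
    elif name == "SUB":
        reg[rd] = (reg[rs] - reg[rt]) & MASK32
    elif name == "ORI":
        reg[rt] = reg[rs] | zero_ext(imm16)
    elif name == "LW":
        reg[rt] = mem[(reg[rs] + sign_ext(imm16)) & MASK32]
    elif name == "SW":
        mem[(reg[rs] + sign_ext(imm16)) & MASK32] = reg[rt]
    elif name == "BEQ":
        if reg[rs] == reg[rt]:
            return (pc + 4 + 4 * sign_ext(imm16)) & MASK32
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return (pc + 4) & MASK32  # default: next sequential instruction

# Usage: a short sequence on a fresh machine state
reg, mem = [0] * 32, {}
reg[1], reg[2] = 5, 7
pc = step("ADD", reg, mem, 0, rs=1, rt=2, rd=3)          # Reg(3) = 12
pc = step("SW", reg, mem, pc, rs=0, rt=3, imm16=0x10)    # MEM[16] = 12
pc = step("LW", reg, mem, pc, rs=0, rt=4, imm16=0x10)    # Reg(4) = 12
pc_after_beq = step("BEQ", reg, mem, pc, rs=3, rt=4, imm16=0xFFFF)
```

The last call uses an offset of -1, so the taken branch lands back on the BEQ itself (PC + 4 - 4).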
Instruction Fetch/Execute
Summary of the Instruction Cycle
1. Fetch: Get the instruction from memory.
2. Decode: Identify the operation and fetch operands from registers.
3. Execute: Perform the operation using the ALU.
4. Writeback: Save the result to a register.
5. Update PC: Move to the next instruction.

R-type
Fetch instruction: Instruction ← MEM[PC]
Fetch operands: data1 ← Reg(rs), data2 ← Reg(rt)
Execute operation: ALU_result ← func(data1, data2)
Write ALU result: Reg(rd) ← ALU_result
Next PC address: PC ← PC + 4

I-type
Fetch instruction: Instruction ← MEM[PC]
Fetch operands: data1 ← Reg(rs), data2 ← Extend(imm16)
Execute operation: ALU_result ← op(data1, data2)
Write ALU result: Reg(rt) ← ALU_result
Next PC address: PC ← PC + 4
• The immediate value is 16 bits; it is extended to 32 bits (sign-extended, or zero-extended
for logical immediates such as ORI) to match the size of the operands.

BEQ
Fetch instruction: Instruction ← MEM[PC]
Fetch operands: data1 ← Reg(rs), data2 ← Reg(rt)
Equality: zero ← subtract(data1, data2)
Branch: if (zero) PC ← PC + 4 + 4×sign_ext(offset16)
else PC ← PC + 4
• The subtraction determines whether the branch condition is met (whether data1 == data2).
• If the zero flag is 1 (true), the branch is taken and the PC is updated to the target address;
otherwise the PC is simply incremented to the next instruction.
Instruction Fetch/Execute – cont’d
LW
Fetch instruction: Instruction ← MEM[PC]
Fetch base register: base ← Reg(rs)
Calculate address: address ← base + sign_extend(imm16)
Read memory: data ← MEM[address]
Write register Rt: Reg(rt) ← data
Next PC address: PC ← PC + 4
• Reads the base address for calculating the memory address, computes the effective memory
address from which data will be loaded, retrieves the data from memory, and stores the
loaded data in a register for future use.

SW
Fetch instruction: Instruction ← MEM[PC]
Fetch registers: base ← Reg(rs), data ← Reg(rt)
Calculate address: address ← base + sign_extend(imm16)
Write memory: MEM[address] ← data
Next PC address: PC ← PC + 4
• Prepares the base address and the data to be stored, computes the effective memory address
where the data will be stored, and stores the data in memory.

Jump
Fetch instruction: Instruction ← MEM[PC]
Target PC address: target ← PC[31:28] || address26 || ‘00’
Jump: PC ← target
• The target is the concatenation of the upper 4 bits of the current PC, the 26-bit address
encoded in the jump instruction, and two 0 bits ('00') that align the address to a 4-byte
MIPS instruction; the processor then jumps to the target address.
Instruction Execution
• Fetch instruction
➢ PC → Instruction memory
• Decode
➢ Decoder (part of control unit)
• Execution
• Register numbers → register file, read registers
• Depending on instruction class
• Use ALU to calculate
• Arithmetic result
• Memory address for load/store
• Branch target address
• Access data memory for load/store

• PC ← target address or PC + 4
• Adder
A Basic MIPS Implementation

An abstract view of the implementation of the MIPS subset showing the major functional units and the
major connections between them.
Multiplexers
◼ Can’t just join wires together
◼ Use multiplexers
A Basic MIPS Implementation …Contd.,

The basic implementation of the MIPS subset, including the necessary multiplexors and control lines.
Building a Datapath
• Datapath
• Elements that process data and addresses in the CPU
• Registers, ALUs, mux’s, memories, …

• We will build a MIPS data-path incrementally


• Refining the overview design
Requirements of the Instruction Set
• Memory
• Instruction memory where instructions are stored
• Data memory where data is stored
• Registers
• 32 × 32-bit general purpose registers, R0 is always zero
• Read source register Rs
• Read source register Rt
• Write destination register Rt or Rd
• Program counter PC register and Adder to increment PC
• Sign and Zero extender for immediate constant
• ALU for executing instructions
Components of the Datapath
• Combinational Elements
• ALU, Adder
• Immediate extender
• Multiplexers
• Storage Elements
• Instruction memory
• Data memory
• PC register
• Register file
• Clocking methodology
• Timing of writes
Figure: block symbols of the datapath components: ALU (32-bit inputs, ALUOp, result, zero, overflow),
extender (16 to 32 bits, ExtOp), multiplexer (select), PC register (clk), Instruction Memory (Address,
Instruction), Data Memory (Address, Data_in, Data_out, MemRead, MemWrite, clk), and register file
(RA, RB, RW, BusA, BusB, BusW, RegWrite, clk)
Register Element
• Register
• Similar to the D-type Flip-Flop
• n-bit input (Data_In) and output (Data_Out)
• Write Enable (WE):
• Enable / disable writing of register
• Negated (0): Data_Out will not change
• Asserted (1): Data_Out will become Data_In after clock edge
• Edge triggered Clocking
• Register output is modified at clock edge
Figure: n-bit register with Data_In, Data_Out, Write Enable (WE), and Clock
MIPS Register File
• Register File consists of 32 × 32-bit registers (R0 is always zero)
• BusA and BusB: 32-bit output busses for reading 2 registers
• BusW: 32-bit input bus for writing a register when RegWrite is 1
• Two registers read and one written in a cycle
• Registers are selected by:
• RA selects register to be read on BusA
• RB selects register to be read on BusB
• RW selects the register to be written
• Clock input
• The clock input is used ONLY during write operation
• During read, register file behaves as a combinational logic block
• RA or RB valid => BusA or BusB valid after access time
Figure: Register File with 5-bit RA, RB, and RW inputs, 32-bit BusA and BusB outputs, a 32-bit BusW
input, and RegWrite and Clock inputs
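The read/write behaviour described above can be sketched in a few lines. This is an illustrative model only: the class `RegisterFile` and its method names are hypothetical, and the storage is modelled as 32 entries with index 0 pinned to zero (an assumption of this sketch).

```python
class RegisterFile:
    """Behavioural model: combinational read, clocked write."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: RA and RB select the values on BusA and BusB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, reg_write):
        """On the clock edge, store BusW in register RW when RegWrite is 1."""
        if reg_write and rw != 0:          # R0 is always zero, never written
            self.regs[rw] = bus_w & 0xFFFFFFFF

# Usage
rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, reg_write=1)   # written
rf.clock_edge(rw=6, bus_w=7, reg_write=0)    # ignored: RegWrite is 0
rf.clock_edge(rw=0, bus_w=99, reg_write=1)   # ignored: R0 stays zero
bus_a, bus_b = rf.read(5, 6)
```

Reads need no clock in this model, which mirrors the statement that the clock input is used only during a write.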
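Details of the Register File
Figure: internal structure of the register file. Two 5-bit decoders on RA and RB enable tri-state
buffers that drive the selected register onto the 32-bit BusA and BusB (register number 0 drives "0").
A third decoder on the 5-bit RW, gated by RegWrite, generates the write enable (WE) of registers
R1 to R31, which are all fed from the 32-bit BusW and share the Clock. R0 is not used.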
Tri-State Buffers
• Allow multiple sources to drive a single bus
• Two Inputs:
• Data_in
• Enable (to enable output)
• One Output: Data_out
• If (Enable) Data_out = Data_in
else Data_out = High Impedance state (output is disconnected)
• Tri-state buffers can be used to build multiplexors
Figure: a tri-state buffer (Data_in, Enable, Data_out) and a multiplexor built from two tri-state
buffers that drive Output from Data_0 or Data_1 under control of Select
Building a Multifunction ALU
Figure: multifunction ALU with 32-bit inputs A and B, built from a Shifter, an Adder, and a Logic Unit;
a 4-input multiplexer (ALU Selection) chooses the 32-bit ALU Result, and the ALU also produces the
overflow and zero outputs
• Shift/Rotate Operation (2 bits): SLL = 00, SRL = 01, SRA = 10, ROR = 11; Shift Amount is 5 bits
• Arithmetic Operation: ADD = 0, SUB = 1
• Logical Operation (2 bits): AND = 00, OR = 01, NOR = 10, XOR = 11
• ALU Selection (2 bits): Shift = 00, SLT = 01, Arith = 10, Logic = 11
• SLT: ALU does a SUB and checks the sign and overflow
Details of the Shifter
• Implemented with multiplexers and wiring
• Shift Operation can be: SLL (Shift Left Logical), SRL (Shift Right Logical), SRA (Shift Right Arithmetic),
or ROR (Rotate Right)
• Input Data is extended to 63 bits according to Shift Op
• The 63 bits are shifted right according to S4S3S2S1S0

Figure: shifter datapath. The 5-bit shift amount sa, together with the SLL signal, forms the select
bits S4 S3 S2 S1 S0. The Extender (controlled by the 2-bit Shift op) widens the 32-bit Data to 63 bits,
and five multiplexer stages shift right by 0 or 16 bits, 0 or 8 bits, 0 or 4 bits, 0 or 2 bits, and
0 or 1 bit (63 → 47 → 39 → 35 → 33 → 32 bits) to produce the 32-bit Data_out.
More on Shifters

• SLL → 1010 (binary) shifted left by 1 bit → 0100 (binary, in a 4-bit value; the leftmost bit is
shifted out).

• SRL → 1010 (binary) shifted right by 1 bit → 0101 (binary).

• SRA → 1110 (binary, representing -2 in two's complement) shifted right by 1 bit → 1111 (binary,
representing -1; the sign bit is preserved).

• ROR → 1010 (binary) rotated right by 1 bit → 0101 (binary; the rightmost bit 0 wraps around to the
leftmost position).
Details of the Shifter – cont’d
• Input data is extended from 32 to 63 bits as follows:
• If shift op = SRL then ext_data[62:0] = 0_31 || data[31:0]
• If shift op = SRA then ext_data[62:0] = data[31]_31 || data[31:0]
• If shift op = ROR then ext_data[62:0] = data[30:0] || data[31:0]
• If shift op = SLL then ext_data[62:0] = data[31:0] || 0_31
• Here 0_31 means 31 zeros, data[31]_31 means the sign bit replicated 31 times, and data[31:0] is
the original 32-bit data.

• For SRL, the 32-bit input data is zero-extended to 63 bits


• For SRA, the 32-bit input data is sign-extended to 63 bits
• For ROR, 31-bit extension = lower 31 bits of data
• Then, shift right according to the shift amount
• As the extended data is shifted right, the upper bits will be: 0 (SRL), sign-bit (SRA), or
lower bits of data (ROR)
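The extend-then-shift-right scheme can be checked with a short behavioural sketch. It assumes that, for SLL, the right-shift amount applied to the 63-bit value is 31 minus the shift amount (the slides only say that the shift amount is combined with the SLL signal, so this detail is an assumption); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def extend63(data, op):
    """Widen 32-bit data to 63 bits according to the shift operation."""
    data &= MASK32
    if op == "SRL":
        return data                                   # 31 zeros || data
    if op == "SRA":
        sign = (data >> 31) & 1
        return ((0x7FFFFFFF if sign else 0) << 32) | data   # 31 sign bits || data
    if op == "ROR":
        return ((data & 0x7FFFFFFF) << 32) | data     # data[30:0] || data
    if op == "SLL":
        return data << 31                             # data || 31 zeros
    raise ValueError(f"unknown shift op: {op}")

def shift(data, sa, op):
    """Shift or rotate 32-bit data by sa (0..31) using only a right shift."""
    ext = extend63(data, op)
    # Assumption: SLL by sa is realised as a right shift by 31 - sa.
    amount = (31 - sa) if op == "SLL" else sa
    return (ext >> amount) & MASK32                   # keep the low 32 bits

# Usage
sll = shift(0x0000000A, 1, "SLL")   # 0x00000014
srl = shift(0x80000000, 31, "SRL")  # 0x00000001
sra = shift(0x80000000, 1, "SRA")   # 0xC0000000
ror = shift(0x00000001, 1, "ROR")   # 0x80000000
```

Because every operation reduces to one right shift of the extended value, a single chain of five multiplexer stages can serve all four operations.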
Instruction and Data Memories
• Instruction memory needs only provide read access
• Because datapath does not write instructions
• Behaves as combinational logic for read
• Address selects Instruction after access time
• Data Memory is used for load and store
• MemRead: enables output on Data_out
• Address selects the word to put on Data_out
• MemWrite: enables writing of Data_in
• Address selects the memory word to be written
• The Clock synchronizes the write operation
• Separate instruction and data memories
• Later, we will replace them with caches
Figure: Instruction Memory (32-bit Address in, 32-bit Instruction out) and Data Memory (32-bit Address
and Data_in, 32-bit Data_out, with MemRead, MemWrite, and Clock inputs)
Clocking Methodology
• Clocks are needed in a sequential logic to decide when a state element (register) should be updated
• To ensure correctness, a clocking methodology defines when data can be written and read
❖ We assume edge-triggered clocking
❖ All state changes occur on the same clock edge
❖ Data must be valid and stable before arrival of clock edge
❖ Edge-triggered clocking allows a register to be read and written during same clock cycle
Figure: Register 1 → Combinational logic → Register 2, with a clock waveform showing the rising edge
and the falling edge
Determining the Clock Cycle
• With edge-triggered clocking, the clock cycle must be long enough to accommodate the path
from one register through the combinational logic to another register
• Tclk-q: clock to output delay through register
• Tmax_comb: longest delay through combinational logic
• Ts: setup time that input to a register must be stable before arrival of clock edge
• Th: hold time that input to a register must hold after arrival of clock edge

Tcycle ≥ Tclk-q + Tmax_comb + Ts

• Hold time (Th) is normally fulfilled since Tclk-q > Th
Figure: Register 1 → Combinational logic → Register 2, with a timing diagram marking Tclk-q,
Tmax_comb, Ts, and Th around the writing edge of the clock
Clock Skew

• Clock skew arises because the clock signal uses different paths with slightly
different delays to reach state elements.

• Clock skew is the difference in absolute time between when two storage
elements see a clock edge.

• With a clock skew, the clock cycle time is increased

Tcycle ≥ Tclk-q + Tmax_combinational + Tsetup + Tskew
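As a worked example of the two inequalities, the sketch below computes the minimum cycle time with and without skew. The delay values are hypothetical numbers in picoseconds chosen only for illustration, and the function names are not from the slides.

```python
def min_cycle_time(t_clk_q, t_max_comb, t_setup, t_skew=0):
    """Smallest Tcycle satisfying Tcycle >= Tclk-q + Tmax_comb + Ts (+ Tskew)."""
    return t_clk_q + t_max_comb + t_setup + t_skew

def hold_satisfied(t_clk_q, t_hold):
    """Hold check as stated in the text: fulfilled when Tclk-q > Th."""
    return t_clk_q > t_hold

# Hypothetical delays in picoseconds (illustrative values only)
T_CLK_Q, T_MAX_COMB, T_SETUP, T_HOLD, T_SKEW = 50, 800, 40, 20, 30

no_skew = min_cycle_time(T_CLK_Q, T_MAX_COMB, T_SETUP)             # 890 ps
with_skew = min_cycle_time(T_CLK_Q, T_MAX_COMB, T_SETUP, T_SKEW)   # 920 ps
hold_ok = hold_satisfied(T_CLK_Q, T_HOLD)                          # True
```

With these assumed numbers, the skew lengthens the minimum cycle from 890 ps to 920 ps.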


ReCap
• Designing a Processor: Step-by-Step

• Datapath Components and Clocking

• Assembling an Acceptable Datapath

• Controlling the Execution of Instructions

• Main, ALU, and PC Control


Instruction Fetch
• Datapath element: a unit used to operate on or hold data within a processor.
1. A memory unit to store the instructions of a program and supply instructions given
an address
2. A program counter (PC), a register that holds the address of the current instruction
3. An adder to increment the PC to the address of the next instruction
Instruction Fetching Datapath
• We can now assemble the datapath from its components
• For instruction fetching, we need …
• Program Counter (PC) register
• Instruction Memory
• Adder for incrementing PC
Figure: two instruction-fetch datapaths, each with a PC register clocked by clk that addresses the
Instruction Memory. The first uses a 32-bit adder to compute next PC = PC + 4; the improved datapath
keeps a 30-bit PC, uses a +1 incrementer, and appends '00' to form the 32-bit address.
• The least significant 2 bits of the PC are ‘00’ since PC is a multiple of 4
• Improved datapath increments upper 30 bits of PC by 1
• Datapath does not handle branch or jump instructions
R-Format Instructions

• Read two register operands


• Perform arithmetic/logical operation
• Write register result
Datapath for R-type Instructions
Op6 Rs5 Rt5 Rd5 sa5 funct6

Figure: R-type datapath. The PC addresses the Instruction Memory; the Rs, Rt, and Rd fields drive RA,
RB, and RW of the register file; BusA and BusB feed the ALU, and the ALU result returns on BusW.
Control signals shown: RegWrite and ALU Operation.
• Rs and Rt fields select two registers to read. Rd field selects register to write
• BusA & BusB provide data input to ALU. ALU result is connected to BusW
• Same clock updates PC and Rd register
• Control signals
• ALU Operation is the ALU operation as defined in the funct field for R-type
• RegWrite is used to enable the writing of the ALU result
Datapath for I-type ALU Instructions
Op6 Rs5 Rt5 immediate16

Figure: I-type ALU datapath. Rs drives RA and Rt drives RW; Imm16 passes through the Extender
(controlled by ExtOp) to the second ALU input, and the ALU result returns on BusW. Control signals
shown: RegWrite, ALU Operation, and ExtOp.
• Rt selects register to write, not Rd
• Second ALU input comes from the extended immediate. RB and BusB are not used
• Same clock edge updates PC and Rt
• Control signals
• ALU Operation is derived from the Op field for I-type instructions
• RegWrite is used to enable the writing of the ALU result
• ExtOp is used to control the extension of the 16-bit immediate
Combining R-type & I-type Datapaths
Figure: combined datapath with control signals RegWr, ALUOp, RegDst, ExtOp, and ALUSrc
• A mux selects RW as either Rt or Rd
• Another mux selects 2nd ALU input as either data on BusB or the extended immediate
❖ Control signals
• ALUOp is derived from either the Op or the funct field
• RegWr enables the writing of the ALU result
• ExtOp controls the extension of the 16-bit immediate
• RegDst selects the register destination as either Rt or Rd
• ALUSrc selects the 2nd ALU source as BusB or extended immediate
Controlling ALU Instructions
Figure: combined datapath for R-type ALU instructions, with RegWr = 1, RegDst = 1, and ALUSrc = 0
• For R-type ALU instructions, RegDst is ‘1’ to select Rd on RW and ALUSrc is ‘0’ to select
BusB as second ALU input. The active part of datapath is shown in green
Figure: combined datapath for I-type ALU instructions, with RegWr = 1, RegDst = 0, and ALUSrc = 1
• For I-type ALU instructions, RegDst is ‘0’ to select Rt on RW and ALUSrc is ‘1’ to select
Extended immediate as second ALU input. The active part of datapath is shown in green
Details of the Extender
• Two types of extensions
• Zero-extension for unsigned constants
• Sign-extension for signed constants
• Control signal ExtOp indicates type of extension
• Extender Implementation: wiring and one AND gate
• ExtOp = 0 ➔ Upper16 = 0
• ExtOp = 1 ➔ Upper16 = sign bit
Figure: the lower 16 bits of the output are wired from Imm16; the upper 16 bits are the AND of ExtOp
with the sign bit of Imm16, replicated 16 times
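The "wiring and one AND gate" implementation can be mirrored directly in code. This is a minimal sketch; the function name `extend` is hypothetical.

```python
def extend(imm16, ext_op):
    """Extend a 16-bit immediate to 32 bits.

    ext_op = 0 gives zero-extension, ext_op = 1 gives sign-extension.
    """
    imm16 &= 0xFFFF
    sign_bit = (imm16 >> 15) & 1
    upper_bit = ext_op & sign_bit            # the single AND gate
    upper16 = 0xFFFF if upper_bit else 0     # wiring: replicate to 16 bits
    return (upper16 << 16) | imm16           # lower 16 bits pass through

# Usage
signed = extend(0x8000, 1)     # 0xFFFF8000
unsigned = extend(0x8000, 0)   # 0x00008000
```

Only the sign bit needs gating: when ExtOp is 0 the AND forces the upper half to zero, and when ExtOp is 1 it copies the sign bit.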
Load/Store Instructions
• Read register operands
• Calculate address using 16-bit offset
• Use ALU, but sign-extend offset
• Load: Read memory and update register
• Store: Write register value to memory
Load and Store Word
• Load Word Instruction (Word = 4 bytes in MIPS)
lw Rt, imm16(Rs) # Rt ← MEMORY[Rs+imm16]
• Store Word Instruction
sw Rt, imm16(Rs) # Rt ➔ MEMORY[Rs+imm16]
• Base or Displacement addressing is used
• Memory Address = Rs (base) + Immediate16 (displacement)
• Immediate16 is sign-extended to have a signed displacement

Figure: Base or Displacement Addressing. The base address in register Rs is added to immediate16
(format Op6 Rs5 Rt5 immediate16) to select the Memory Word.
Adding Data Memory to Datapath
• A data memory is added for load and store instructions
Figure: datapath extended with a Data Memory. Control signals shown: ExtOp, ALUOp, ALUSrc, MemRead,
MemWrite, MemtoReg, RegDst, and RegWrite.
• ALU calculates data memory address
• A 3rd mux selects data on BusW as either ALU result or memory data_out
• BusB is connected to Data_in of Data Memory for store instructions
❖ Additional Control signals
• MemRd for load instructions
• MemWr for store instructions
• MemtoReg selects data on BusW as ALU result or Memory Data_out
Controlling the Execution of Load
Figure: datapath during a load, with ExtOp = 1, ALUSrc = 1, ALUOp = ADD, MemRd = 1, MemWr = 0,
MemtoReg = 1, RegDst = 0, and RegWrite = 1
• RegDst = ‘0’ selects Rt as destination register
• RegWrite = ‘1’ to enable writing of register file
• ExtOp = 1 to sign-extend Immediate16 to 32 bits
• ALUSrc = ‘1’ selects extended immediate as second ALU input
• ALUOp = ‘ADD’ to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
• MemRd = ‘1’ to read data memory
• MemtoReg = ‘1’ places the data read from memory on BusW
• Clock edge updates PC and Register Rt
Controlling the Execution of Store
Figure: datapath during a store, with ExtOp = 1, ALUSrc = 1, ALUOp = ADD, MemRd = 0, MemWr = 1,
MemtoReg = X, RegDst = X, and RegWrite = 0
• RegDst = ‘X’ because no register is written
• RegWrite = ‘0’ to disable writing of register file
• ExtOp = 1 to sign-extend Immediate16 to 32 bits
• ALUSrc = ‘1’ selects extended immediate as second ALU input
• ALUOp = ‘ADD’ to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
• MemWr = ‘1’ to write data memory
• MemtoReg = ‘X’ because don’t care what data is put on BusW
• Clock edge updates PC and Data Memory
R-Type/I-Type/Load/Store Datapath
Branch Instructions
beq Rs,Rt,label branch to label if (Rs == Rt)
bne Rs,Rt,label branch to label if (Rs != Rt)

• Read register operands


• Compare operands
• Use ALU, subtract and check Zero output
• Calculate target address
• Sign-extend displacement
• Shift left 2 places (word displacement)
• Add to PC + 4
• Already calculated by instruction fetch
Figure: branch target computation. The shift left by 2 just re-routes wires, and sign extension
replicates the sign-bit wire.
Implementing Jumps

Jump format: opcode 2 in bits 31:26, address in bits 25:0

• Jump uses word address


• Update PC with concatenation of
• Top 4 bits of old PC
• 26-bit jump address
• 00
• Need an extra control signal decoded from opcode
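The branch and jump target calculations described above can be sketched as follows. The sketch works on full 32-bit byte addresses and, following the slides, takes the top 4 bits from the PC of the jump itself; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sign_ext16(value):
    """Interpret a 16-bit field as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, offset16):
    """PC + 4 plus the sign-extended word displacement shifted left by 2."""
    return (pc + 4 + (sign_ext16(offset16) << 2)) & MASK32

def jump_target(pc, address26):
    """Top 4 bits of the PC, then the 26-bit address, then '00'."""
    return (pc & 0xF0000000) | ((address26 & 0x3FFFFFF) << 2)

# Usage
forward = branch_target(0x00400000, 3)        # 0x00400010
backward = branch_target(0x00400010, 0xFFFF)  # 0x00400010 (offset -1)
target = jump_target(0x00400010, 0x0100000)   # 0x00400000
```

Both targets are multiples of 4 by construction, because the displacement and the jump address count words rather than bytes.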
Adding Jump and Branch to Datapath
Figure: single-cycle datapath with a 3-input mux at the PC input, selected by PCSrc: 0 = Next PC Address,
1 = Jump Target = PC[31:28] ‖ Imm26, 2 = Branch Target Address. Control signals shown: Op, RegDst,
RegWr, ExtOp, ALUSrc, ALUOp, MemRd, MemWr, and WBdata.
• Adding a mux at the PC input
• New adder for computing branch target address
❖ Additional Control Signals
• PCSrc for PC control: 1 for a jump and 2 for a taken branch
• Zero flag for branch control: whether branch is taken or not
Controlling the Execution of a Jump
Figure: datapath during a jump, with PCSrc = 1 selecting Jump Target = PC[31:28] ‖ Imm26
• If (Opcode == J) then PCSrc = 1 (Jump Target)
• Control signals: Op = J, RegDst = X, RegWr = 0, ExtOp = X, ALUSrc = X, ALUOp = X, MemRd = 0,
MemWr = 0, WBdata = X, Zero = X
• MemRd = MemWr = RegWr = 0, Don't care about other control signals
• Clock edge updates PC register only

Controlling the Execution of a Branch
Figure: datapath during a taken branch, with PCSrc = 2 selecting the Branch Target Address and Zero = 1
• If (Opcode == BEQ && Zero == 1) then PCSrc = 2 (Branch Target)
else PCSrc = 0 (Next PC)
• Control signals: Op = BEQ, RegDst = X, RegWr = 0, ExtOp = 1, ALUSrc = 0, ALUOp = SUB, MemRd = 0,
MemWr = 0, WBdata = X
• ALUSrc = 0, ALUOp = SUB, ExtOp = 1, MemRd = MemWr = RegWr = 0
• Clock edge updates PC register only

Main, ALU, and PC Control
Figure: the datapath sends Op6, funct6, and Zero to three control blocks. PC Control drives PCSrc;
Main Control drives RegDst, RegWr, ExtOp, ALUSrc, MemRd, MemWr, and WBdata; ALU Control drives ALUOp.

PC Control Input
• 6-bit opcode
• ALU zero flag
PC Control Output
• PCSrc signal

Main Control Input
• 6-bit opcode field
Main Control Output
• Main control signals

ALU Control Input
• 6-bit opcode field
• 6-bit function field
ALU Control Output
• ALUOp signal for ALU
Single-Cycle Datapath + Control
Figure: complete single-cycle datapath with its three control blocks. PC Ctrl (inputs Op and Zero)
produces PCSrc; Main Control (input Op) produces RegDst, RegWr, ExtOp, ALUSrc, MemRd, MemWr, and
MemtoReg; ALU Ctrl (inputs Op and func) produces ALUop.
The Main Control Unit
• Control signals derived from instruction

Format       31:26      25:21  20:16  15:11  10:6   5:0
R-type       0          rs     rt     rd     shamt  funct
Load/Store   35 or 43   rs     rt     address (15:0)
Branch       4          rs     rt     address (15:0)

• Bits 31:26: opcode
• rs: always read
• rt: read, except for load
• rd (R-type) or rt (load): write for R-type and load
• address: sign-extend and add
Main Control Signals

Signal     Effect when ‘0’                               Effect when ‘1’
RegDst     Destination register = Rt                     Destination register = Rd
RegWr      No register is written                        Destination register (Rt or Rd) is written with the data on BusW
ExtOp      16-bit immediate is zero-extended             16-bit immediate is sign-extended
ALUSrc     Second ALU operand is the value of register   Second ALU operand is the value of the
           Rt that appears on BusB                       extended 16-bit immediate
MemRd      Data memory is NOT read                       Data memory is read: Data_out ← Memory[address]
MemWr      Data Memory is NOT written                    Data memory is written: Memory[address] ← Data_in
MemtoReg   BusW = ALU result                             BusW = Data_out from Memory


Main Control Truth Table

Op RegDst RegWr ExtOp ALUSrc MemRd MemWr MemtoReg

R-type 1 = Rd 1 X 0 = BusB 0 0 0 = ALU


ADDI 0 = Rt 1 1 = sign 1 = Imm 0 0 0 = ALU
SLTI 0 = Rt 1 1 = sign 1 = Imm 0 0 0 = ALU
ANDI 0 = Rt 1 0 = zero 1 = Imm 0 0 0 = ALU
ORI 0 = Rt 1 0 = zero 1 = Imm 0 0 0 = ALU
XORI 0 = Rt 1 0 = zero 1 = Imm 0 0 0 = ALU
LW 0 = Rt 1 1 = sign 1 = Imm 1 0 1 = Mem
SW X 0 1 = sign 1 = Imm 0 1 X
BEQ X 0 1 = sign 0 = BusB 0 0 X
BNE X 0 1 = sign 0 = BusB 0 0 X
J X 0 X X 0 0 X

X is a don’t care (can be 0 or 1), used to minimize logic


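Logic Equations for Main Control Signals
• A decoder on Op6 produces one signal per instruction: R-type, ADDI, SLTI, ANDI, ORI, XORI, LW, SW,
BEQ, BNE, J
• Logic equations (+ is OR, NOT is the complement):
RegDst = R-type
RegWrite = NOT (SW + BEQ + BNE + J)
ExtOp = NOT (ANDI + ORI + XORI)
ALUSrc = NOT (R-type + BEQ + BNE)
MemRd = LW
MemWr = SW
MemtoReg = LW
Figure: the decoder outputs feed the logic equations, which produce RegDst, RegWr, ExtOp, ALUSrc,
MemRd, MemWr, and WBdata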
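The main control logic can be sketched from the decoder outputs. In this sketch RegWr, ExtOp, and ALUSrc use the complemented form of their OR terms, chosen so that the outputs agree with the Main Control Truth Table; don't-care (X) entries come out as whatever value the equation happens to give. The names `OPS` and `main_control` are hypothetical.

```python
OPS = ["R-type", "ADDI", "SLTI", "ANDI", "ORI", "XORI",
       "LW", "SW", "BEQ", "BNE", "J"]

def main_control(op):
    """Return the main control signals for one instruction class."""
    if op not in OPS:
        raise ValueError(f"unsupported opcode: {op}")
    d = {name: int(name == op) for name in OPS}   # one-hot decoder outputs
    return {
        "RegDst":   d["R-type"],
        "RegWr":    1 - (d["SW"] | d["BEQ"] | d["BNE"] | d["J"]),
        "ExtOp":    1 - (d["ANDI"] | d["ORI"] | d["XORI"]),
        "ALUSrc":   1 - (d["R-type"] | d["BEQ"] | d["BNE"]),
        "MemRd":    d["LW"],
        "MemWr":    d["SW"],
        "MemtoReg": d["LW"],
    }

# Usage
lw_signals = main_control("LW")
```

For LW this gives RegDst = 0, RegWr = 1, ExtOp = 1, ALUSrc = 1, MemRd = 1, MemWr = 0, MemtoReg = 1, matching the LW row of the truth table.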
ALU Control Truth Table

Op       funct   ALU function   4-bit Coding
R-type   AND     AND            0001
R-type   OR      OR             0010
R-type   XOR     XOR            0011
R-type   ADD     ADD            0100
R-type   SUB     SUB            0101
R-type   SLT     SLT            0110
ADDI     X       ADD            0100
SLTI     X       SLT            0110
ANDI     X       AND            0001
ORI      X       OR             0010
XORI     X       XOR            0011
LW       X       ADD            0100
SW       X       ADD            0100
BEQ      X       SUB            0101
BNE      X       SUB            0101
J        X       X              X

• The 4-bit Coding defines the binary ALU operations.
• Logic equations are derived for the 4-bit coding.
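The truth table above can be written as a lookup. This is only a sketch: the funct values are taken from the instruction table earlier in these notes, the don't-care row for J is represented by `None`, and the names `ALU_CODE`, `FUNCT`, `BY_OP`, and `alu_control` are hypothetical.

```python
# 4-bit coding of each ALU function, as listed in the truth table
ALU_CODE = {"AND": 0b0001, "OR": 0b0010, "XOR": 0b0011,
            "ADD": 0b0100, "SUB": 0b0101, "SLT": 0b0110}

# R-type: the funct field selects the ALU function
FUNCT = {0x24: "AND", 0x25: "OR", 0x26: "XOR",
         0x20: "ADD", 0x22: "SUB", 0x2A: "SLT"}

# Other instructions: the opcode alone selects the ALU function
BY_OP = {"ADDI": "ADD", "SLTI": "SLT", "ANDI": "AND", "ORI": "OR",
         "XORI": "XOR", "LW": "ADD", "SW": "ADD", "BEQ": "SUB", "BNE": "SUB"}

def alu_control(op, funct=None):
    """Return the 4-bit ALU code, or None when the ALU is a don't care (J)."""
    if op == "J":
        return None
    name = FUNCT[funct] if op == "R-type" else BY_OP[op]
    return ALU_CODE[name]

# Usage
sub_code = alu_control("R-type", 0x22)   # 0b0101
lw_code = alu_control("LW")              # 0b0100
```

Note that the funct field matters only for R-type instructions; every other row of the table ignores it.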
PC Control Truth Table

Op Zero flag PCSrc

R-type X 0 = Increment PC

J X 1 = Jump Target Address

BEQ 0 0 = Increment PC

BEQ 1 2 = Branch Target Address

BNE 0 2 = Branch Target Address

BNE 1 0 = Increment PC

Other than Jump or Branch X 0 = Increment PC

The ALU Zero flag is used by BEQ and BNE instructions


PC Control Logic
• The PC control logic can be described as follows:
if (Op == J) PCSrc = 1;
else if ((Op == BEQ && Zero == 1) ||
(Op == BNE && Zero == 0)) PCSrc = 2;
else PCSrc = 0;
• In gate form, a decoder on Op produces the BEQ, BNE, and J signals; Jump is the J output and
Branch = (BEQ . Zero) + (BNE . NOT Zero)
Branch = 1, Jump = 0 ➔ PCSrc = 2
Branch = 0, Jump = 1 ➔ PCSrc = 1
Branch = 0, Jump = 0 ➔ PCSrc = 0
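The PC control logic can be sketched in the same gate-level style. The BNE term uses the complement of Zero, which is what the PC Control Truth Table requires (BNE branches when Zero is 0); the function name `pc_src` is hypothetical.

```python
def pc_src(op, zero):
    """Return PCSrc: 0 = increment PC, 1 = jump target, 2 = branch target."""
    beq, bne, jump = int(op == "BEQ"), int(op == "BNE"), int(op == "J")
    branch = (beq & zero) | (bne & (1 - zero))   # taken-branch condition
    if jump:
        return 1
    if branch:
        return 2
    return 0

# Usage: one entry per row of the PC Control Truth Table
rows = [("R-type", 0), ("J", 0), ("BEQ", 0), ("BEQ", 1), ("BNE", 0), ("BNE", 1)]
results = [pc_src(op, zero) for op, zero in rows]
```

Branch and Jump can never both be 1, since the decoder asserts at most one of BEQ, BNE, and J for a given opcode.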
A Simple Implementation Scheme

• The three instruction classes (R-type, load and store, and branch) use two different
instruction formats
A Simple Implementation Scheme

Figure: The data path of Basic MIPS with all necessary multiplexors and all control lines identified
A Simple Implementation Scheme

• Show the operation of the datapath for an R-type instruction, such as


add $t1,$t2,$t3

• Four steps to execute the instruction; these steps are ordered by the flow of information:

1. The instruction is fetched, and the PC is incremented.


2. Two registers, $t2 and $t3, are read from the register file; also, the main control
unit computes the setting of the control lines during this step.
3. The ALU operates on the data read from the register file, using the function code
(bits 5:0, which is the funct field, of the instruction) to generate the ALU function.
4. The result from the ALU is written into the register file using bits 15:11 of the
instruction to select the destination register ($t1).
A Simple Implementation Scheme
• Similarly, we can illustrate the execution of a load word, such as
lw $t1, offset($t2)
• We can think of a load instruction as operating in five steps (similar to how the R-
type executed in four):

1. An instruction is fetched from the instruction memory, and the PC is


incremented.
2. A register ($t2) value is read from the register file.
3. The ALU computes the sum of the value read from the register file and the
sign-extended, lower 16 bits of the instruction (offset).
4. The sum from the ALU is used as the address for the data memory.
5. The data from the memory unit is written into the register file; the register
destination is given by bits 20:16 of the instruction ($t1).
A Simple Implementation Scheme
• Finally, we can show the operation of the branch-on-equal instruction, such as
beq $t1, $t2, offset
• It operates much like an R-format instruction, but the ALU output is used to
determine whether the PC is written with PC + 4 or the branch target address.

• Four steps in execution:


1. An instruction is fetched from the instruction memory, and the PC is
incremented.
2. Two registers, $t1 and $t2, are read from the register file.
3. The ALU performs a subtract on the data values read from the register file.
The value of PC + 4 is added to the sign-extended, lower 16 bits of the
instruction (offset) shifted left by two; the result is the branch target address.
4. The Zero result from the ALU is used to decide which adder result to store
into the PC.
Task

• Show how the basic single-cycle MIPS implementation executes the following instruction:

• Instruction: LWI Rt,Rd(Rs)


• Interpretation: Reg[Rt] = Mem[Reg[Rd]+Reg[Rs]]

The LWI Rt, Rd(Rs) instruction loads a word from memory at the address calculated by
adding the contents of Rd and Rs, and stores that word in the register Rt.
