Datapath 1 Notes
Start: X:40
                           The Big Picture: Where are We Now?
                                       Processor
                                                             Input
                                       Control
                                                    Memory
                                       Datapath
                                                             Output
Before we go any further, let’s step back for a second and take a look at the big picture.
All computers consist of five components: (1) input and (2) output devices, (3) the memory system,
and the (4) control and (5) datapath of the processor.
Today’s lecture covers the datapath design.
In the next lecture, I will show you how to design the processor’s control unit.
+1 = 5 min. (X:45)
                           The Big Picture: The Performance Perspective
This slide shows how the next two lectures fit into the overall performance picture.
Recall from one of your earlier lectures that the performance of a machine is determined by 3
factors: (a) Instruction count, (b) Clock cycle time, and (c) Clock cycles per instruction.
Instruction count is determined by the Instruction Set Architecture and the compiler design, so the
computer engineer has very little control over it.
What you as a computer engineer can control, while you are designing a processor, are the Clock
Cycle Time and the Clock Cycles per Instruction.
More specifically, in the next two lectures, you will be designing a single cycle processor which by
definition takes one clock cycle to execute every instruction.
The disadvantage of this single cycle processor design is that it has a long cycle time.
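As a rough sketch of how the three factors combine (execution time = instruction count x clock cycles per instruction x clock cycle time), here is a small Python example. The function name and the numbers are made up for illustration; they are not measurements from any real machine.

```python
def execution_time_ns(instruction_count, cycles_per_instruction, cycle_time_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cycles_per_instruction * cycle_time_ns


# Hypothetical program of one million instructions.
# A single cycle processor has CPI = 1 by definition, but its cycle time
# must be long enough for the slowest instruction (assumed to be 10 ns here).
single_cycle = execution_time_ns(1_000_000, 1, 10)
print(single_cycle)  # 10000000
```

The point of the sketch: with CPI fixed at 1, the only factor left for the single cycle design to improve is the cycle time.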
+2 = 7 min. (X:47)
                            The MIPS Instruction Formats
                       °All MIPS instructions are 32 bits long. The three instruction formats:
                                              31            26            21            16             11              6               0
                               • R-type            op             rs            rt             rd            shamt         funct
                                                   6 bits        5 bits        5 bits         5 bits          5 bits       6 bits
                                              31            26            21            16                                             0
                               • I-type            op             rs            rt                          immediate
                                                   6 bits        5 bits        5 bits                        16 bits
                               • J-type       31            26                                                                         0
                                                   op                                target address
                                                   6 bits                                    26 bits
One of the most important things you need to know before you start designing a processor is what
the instructions look like.
Or in more technical terms, you need to know the instruction format. One good thing about the MIPS
instruction set is that it is very simple.
First of all, all MIPS instructions are 32 bits long and there are only three instruction formats: (a) R-
type, (b) I-type, and (c) J-type.
The different fields of the R-type instructions are:
(a) OP specifies the operation of the instruction.
(b) Rs, Rt, and Rd are the source and destination register specifiers.
(c) Shamt specifies the amount you need to shift for the shift instructions.
(d) Funct selects the variant of the operation specified in the “op” field.
For the I-type instruction, bits 0 to 15 are used as an immediate field. I will show you how this
immediate field is used differently by different instructions.
Finally for the J-type instruction, bits 0 to 25 become the target address of the jump.
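To make the bit positions concrete, here is a minimal Python sketch that slices a 32-bit instruction word into the fields shown on the slide. The function name decode is hypothetical. The example word uses the standard MIPS encoding of add (op = 0, funct = 0x20), which comes from the MIPS instruction set and not from these notes.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its named fields."""
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31..26, all formats
        "rs":     (word >> 21) & 0x1F,   # bits 25..21, R-type and I-type
        "rt":     (word >> 16) & 0x1F,   # bits 20..16, R-type and I-type
        "rd":     (word >> 11) & 0x1F,   # bits 15..11, R-type
        "shamt":  (word >> 6) & 0x1F,    # bits 10..6,  R-type
        "funct":  word & 0x3F,           # bits 5..0,   R-type
        "imm16":  word & 0xFFFF,         # bits 15..0,  I-type
        "target": word & 0x3FFFFFF,      # bits 25..0,  J-type
    }


# add $3, $1, $2 assembled by hand: op=0, rs=1, rt=2, rd=3, shamt=0, funct=0x20
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x20
fields = decode(word)
print(hex(word), fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```

The sketch extracts every field from every word; which fields are meaningful depends on the format selected by the op field.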
+3 = 10 min. (X:50)
                           The MIPS Subset
                                                    31        26        21        16             11             6               0
                      °ADD and subtract                  op        rs        rt          rd           shamt             funct
                              • add rd, rs, rt           6 bits    5 bits    5 bits     5 bits        5 bits           6 bits
                              • sub rd, rs, rt
                                                    31        26        21        16                                            0
                      °OR Immediate:                     op        rs        rt                   immediate
                              • ori rt, rs, imm16        6 bits    5 bits    5 bits                   16 bits
                      °BRANCH:
                              • beq rs, rt, imm16
                      °JUMP:                        31        26                                                                0
                              • j target                 op                       target address
                                                         6 bits                        26 bits
In today’s lecture, I will show you how to implement the following subset of MIPS instructions: add,
subtract, or immediate, load, store, branch, and the jump instruction.
The Add and Subtract instructions use the R format. The Op and Func fields together
specify all the different kinds of add and subtract instructions.
Rs and Rt specify the source registers, and the Rd field specifies the destination register.
The Or immediate instruction uses the I format. It only uses one source register, Rs. The other
operand comes from the immediate field. The Rt field is used to specify the destination register.
Both the load and store instructions use the I format and both add Rs and the immediate field
together to form the memory address.
The difference is that the load instruction will load the data from memory into Rt while the store
instruction will store the data in Rt into the memory.
The branch on equal instruction also uses the I format. Here Rs and Rt are used to specify the
registers we need to compare.
If these two registers are equal, we will branch to a location specified by the immediate field.
Finally, the jump instruction uses the J format and always causes the program to jump to a
memory location specified in the address field.
I know I went over this rather quickly and you may have missed something. But don’t worry, this is
just an overview. You will keep seeing these (point to the format) all day today.
+3 = 13 min. (X:53)
                        An Abstract View of the Implementation
                       °Two types of functional units:
                               • Operational elements that operate on data (combinational)
                               • State elements that contain data (sequential)
                       °Generic implementation:
                               • Use the PC to supply the instruction address
                               • Get the instruction from memory
                               • Read registers
                               • Use the instruction to decide exactly what to do
                       °All instructions use the ALU after reading the registers
                               • Why? memory-reference? arithmetic? control flow?
                       [Figure: abstract datapath with PC (clocked), Ideal Instruction Memory (Instruction Address in, Instruction out), instruction fields Rd, Rs, Rt (5 bits each) and Imm (16 bits), 32 32-bit Registers (Rw, Ra, Rb, Clk), ALU (32 bits), and Ideal Data Memory (Data Address, Data In, DataOut, Clk)]
                       °Next step: fill in the details: more units, more connections, and the control unit
                   ECE4680 Datapath.6                                                                              2002-4-10
One thing you may have noticed from our last slide is that almost all instructions, except Jump, require
reading some registers, doing some computation, and then doing something else.
Therefore our datapath will look something like this.
For example, if we have an add instruction (points to the output of Instruction Memory), we will
read the registers from the register file (Ra, Rb and then busA and busB).
Add the two numbers together (ALU) and then write the result back to the register file.
On the other hand, if we have a load instruction, we will first use the ALU to calculate the memory
address.
Once the address is ready, we will use it to access the Data Memory.
And once the data is available on Data Memory’s output bus, we will write the data to the register
file. Well, this is simple enough.
But if it is this simple, you probably won’t need to take this class.
So in today’s lecture, I will show you how to turn this abstract datapath into a real datapath by
making it slightly (JUST slightly) more complicated so it can do real work for you.
But before we do that, let’s do a quick review of the clocking methodology.
+3 = 16 (X:56)
                             Clocking Methodology
                       [Figure: Clk waveform with Setup and Hold windows marked around each clock edge and Don’t Care in between]
°Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew
°(Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time
Remember, we will be using a clocking methodology where all storage elements are clocked by
the same clock edge.
Consequently, our cycle time will be the sum of:
(a) The Clock-to-Q (or latch propagation) time of the input registers.
(b) The longest delay path through the combinational logic block.
(c) The setup time of the output register.
(d) And finally the clock skew.
In order to avoid hold time violation, you have to make sure this inequality is fulfilled.
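The two timing constraints on the slide can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical and the delays (in picoseconds) are invented for illustration.

```python
def min_cycle_time(clk_to_q, longest_delay, setup, clock_skew):
    """Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew."""
    return clk_to_q + longest_delay + setup + clock_skew


def hold_time_met(clk_to_q, shortest_delay, clock_skew, hold):
    """(Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time."""
    return (clk_to_q + shortest_delay - clock_skew) > hold


# Hypothetical delays in picoseconds.
print(min_cycle_time(clk_to_q=100, longest_delay=2000, setup=50, clock_skew=20))  # 2170
print(hold_time_met(clk_to_q=100, shortest_delay=80, clock_skew=20, hold=60))     # True
print(hold_time_met(clk_to_q=100, shortest_delay=0, clock_skew=60, hold=60))      # False
```

Note that clock skew hurts in both directions: it lengthens the minimum cycle time and it eats into the hold margin.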
+2 = 18 min. (X:58)
                            An Abstract View of the Critical Path
                      °Register file and ideal memory:
                              • The CLK input is a factor ONLY during write operation
                              • During read operation, behave as combinational logic:
                                    -    Address valid => Output valid after “access time.”
                      °Critical Path (Load Operation) =
                              PC’s prop time +
                              Instruction Memory’s Access Time +
                              Register File’s Access Time +
                              ALU to Perform a 32-bit Add +
                              Data Memory Access Time +
                              Setup Time for Register File Write +
                              Clock Skew
                      [Figure: the abstract datapath again: PC, Ideal Instruction Memory, 32 32-bit Registers, ALU, and Ideal Data Memory]
Now with the clocking methodology back in your mind, we can think about what the critical path of
our “abstract” datapath may look like.
One thing to keep in mind about the Register File and Ideal Memory (points to both Instruction and
Data) is that the Clock input is a factor ONLY during the write operation.
For read operation, the CLK input is not a factor. The register file and the ideal memory behave as
if they are combinational logic.
That is, you apply an address to the input; then after a certain delay, which we call the access time,
the output is valid.
We will come back to these points (point to the “behave” bullets) later in this lecture.
But for now, let’s look at this “abstract” datapath’s critical path which occurs when the datapath
tries to execute the Load instruction.
The time it takes to execute the load instruction is the sum of:
(a) The PC’s clock-to-Q time.
(b) The instruction memory access time.
(c) The time it takes to read the register file.
(d) The ALU delay in calculating the Data Memory Address.
(e) The time it takes to read the Data Memory.
(f) And finally, the setup time for the register file and clock skew.
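The sum above is easy to tabulate. The following Python sketch adds up the components of the load path; every delay value is a hypothetical number in picoseconds, picked only to show the arithmetic.

```python
# Hypothetical delays in picoseconds, one entry per term of the load critical path.
LOAD_PATH_PS = {
    "pc_clk_to_q": 100,
    "instruction_memory_access": 2000,
    "register_file_access": 1000,
    "alu_32bit_add": 1500,
    "data_memory_access": 2000,
    "register_file_write_setup": 50,
    "clock_skew": 20,
}


def critical_path(delays):
    """Total delay of a path whose stages happen one after another."""
    return sum(delays.values())


print(critical_path(LOAD_PATH_PS))  # 6670
```

In a single cycle design this total sets the cycle time for every instruction, even those that never touch the data memory.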
+3 = 21 (Y:01)
                            The Steps of Designing a Processor
+2 = 27 min. (Y:07)
                           What is RTL: The ADD Instruction
Here is an example. In terms of Register Transfer Language, this is what the Add instruction needs
to do.
First, you need to fetch the instruction from memory.
Then you perform the actual add operation.
And finally, you need to update the program counter to point to the next instruction.
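The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the slide's RTL: the function name is hypothetical, the instruction memory is modeled as a dict from byte address to word, and overflow handling is ignored (the result is simply truncated to 32 bits).

```python
MASK32 = 0xFFFFFFFF


def execute_add(pc, instruction_memory, registers):
    """One add instruction expressed as three register transfers."""
    word = instruction_memory[pc]                # 1. fetch the instruction at PC
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    # 2. the actual add: R[rd] <- R[rs] + R[rt] (overflow ignored in this sketch)
    registers[rd] = (registers[rs] + registers[rt]) & MASK32
    return pc + 4                                # 3. PC <- PC + 4


instruction_memory = {0: 0x00221820}             # add $3, $1, $2
registers = [0] * 32
registers[1], registers[2] = 5, 7
next_pc = execute_add(0, instruction_memory, registers)
print(registers[3], next_pc)  # 12 4
```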
+1 = 28 min. (Y:08)
                           What is RTL: The Load Instruction
+1 = 29 min (Y:09)
                           Combinational Logic Elements
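                      °Adder: inputs A and B (32 bits each) and CarryIn; outputs Sum (32 bits) and Carry
                      °MUX (p.B-9,B-19): inputs A and B (32 bits each) and Select; output Y (32 bits)
                      °Decoder: 3-bit input; outputs out0 through out7
                      °ALU: inputs A and B (32 bits each) and OP; outputs Result (32 bits) and Zero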
Based on the Register Transfer Language examples we have so far, we know we will need the
following combinational logic elements.
We will need an adder to update the program counter.
A MUX to select the results.
And finally, an ALU to do various arithmetic and logic operation.
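Here is a minimal behavioral sketch of these three elements in Python. The function names, the string-valued op codes, and the choice that Select = 0 picks input A are assumptions made for the example; the slide does not fix them.

```python
MASK32 = 0xFFFFFFFF


def adder(a, b, carry_in=0):
    """32-bit adder: returns (Sum, Carry)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def mux(select, a, b):
    """2-to-1 MUX: Select = 0 passes A, Select = 1 passes B (assumed polarity)."""
    return b if select else a


def alu(op, a, b):
    """ALU: returns (Result, Zero). Only the operations needed so far."""
    if op == "add":
        result = (a + b) & MASK32
    elif op == "sub":
        result = (a - b) & MASK32
    elif op == "or":
        result = a | b
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    return result, int(result == 0)


print(adder(0xFFFFFFFF, 1))   # (0, 1)
print(mux(1, 10, 20))         # 20
print(alu("sub", 5, 5))       # (0, 1)
```

All three are pure functions of their inputs, which is exactly what makes them combinational: there is no clock and no stored state.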
+1 = 30 min. (Y:10)
                           Storage Element: Register (p.B22-B25)
                          • Write Enable:
                                -       0: Data Out will not change
                                -       1: Data Out will become Data In
                          • Array of logical elements (see register file on next 2 slides)
                          • The content is updated at the clock tick ONLY if the Write Enable signal is set to 1.
As far as storage elements are concerned, we will need an N-bit register that is similar to the
D flip-flop I showed you in class.
The significant difference here is that the register will have a Write Enable input.
That is the content of the register will NOT be updated if Write Enable is zero.
The content is updated at the clock tick ONLY if the Write Enable signal is set to 1.
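The Write Enable behavior can be modeled in a few lines of Python. This is a behavioral sketch only; the class and method names are hypothetical, and each call to clock_tick stands for one clock edge.

```python
class Register:
    """N-bit register whose content changes only on a clock tick with Write Enable = 1."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1
        self.data_out = 0

    def clock_tick(self, data_in, write_enable):
        # Write Enable = 0: Data Out will not change.
        # Write Enable = 1: Data Out will become Data In.
        if write_enable:
            self.data_out = data_in & self.mask


reg = Register()
reg.clock_tick(data_in=42, write_enable=0)
print(reg.data_out)  # 0
reg.clock_tick(data_in=42, write_enable=1)
print(reg.data_out)  # 42
```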
+1 = 31 min. (Y:11)
                            Storage Element: Register File
                       °Register File consists of 32 registers:
                               • Two 32-bit output busses: busA and busB
                               • One 32-bit input bus: busW
                       [Figure: register file block (32 32-bit Registers) with 5-bit RW, RA, and RB inputs, Write Enable, Clk, 32-bit busW input, and 32-bit busA and busB outputs]
                       °Register is selected by:
We will also need a register file that consists of 32 32-bit registers with two output busses (busA
and busB) and one input bus.
The register specifiers Ra and Rb select the registers to put on busA and busB respectively.
When Write Enable is 1, the register specifier Rw selects the register to be written via busW.
In our simplified version of the register file, the write operation will occur at the clock tick.
Keep in mind that the clock input is a factor ONLY during the write operation.
During read operation, the register file behaves as a combinational logic block.
That is if you put a valid value on Ra, then bus A will become valid after the register file’s access
time.
Similarly if you put a valid value on Rb, bus B will become valid after the register file’s access time.
In both cases (Ra and Rb), the clock input is not a factor.
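The split between combinational reads and clocked writes can be sketched in Python like this. The class and method names are hypothetical, access time is not modeled, and the sketch treats all 32 registers alike.

```python
class RegisterFile:
    """32 32-bit registers: reads are combinational, writes happen at the clock tick."""

    def __init__(self):
        self.registers = [0] * 32

    def read(self, ra, rb):
        # No clock involved: Ra and Rb directly select what appears on busA and busB.
        return self.registers[ra], self.registers[rb]

    def clock_tick(self, rw, bus_w, write_enable):
        # Rw selects the register to be written via busW, but only if Write Enable is 1.
        if write_enable:
            self.registers[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_tick(rw=3, bus_w=99, write_enable=1)
rf.clock_tick(rw=4, bus_w=11, write_enable=0)   # ignored: Write Enable is 0
print(rf.read(ra=3, rb=4))  # (99, 0)
```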
+2 = 33 min. (Y:13)
                             Storage Element: Register File -- Detailed diagram
                       [Figure: register file detail. Components shown: a 32-to-1 Decoder with RW and Write Enable inputs and outputs 0 through 31; Register 0 through Register 31, each with C and D inputs; busW and Clk; one MUX selected by RA driving busA and one MUX selected by RB driving busB]
                           Storage Element: Idealized Memory
                      °Memory (idealized)
                              • One input bus: Data In
                              • One output bus: Data Out
                      °Memory word is selected by:
                              • Address selects the word to put on Data Out
                              • Write Enable = 1: address selects the memory word to be written via the Data In bus
                      [Figure: memory block with Write Enable, Address, Clk, 32-bit Data In, and 32-bit DataOut]
The last storage element you will need for the datapath is the idealized memory to store your data
and instructions.
This idealized memory block has just one input bus (DataIn) and one output bus (DataOut).
When Write Enable is 0, the address selects the memory word to put on the Data Out bus.
When Write Enable is 1, the address selects the memory word to be written via the DataIn bus at
the next clock tick.
Once again, the clock input is a factor ONLY during the write operation.
During read operation, it behaves as a combinational logic block.
That is if you put a valid value on the address lines, the output bus DataOut will become valid after
the access time of the memory.
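A behavioral Python sketch of the idealized memory follows. The class and method names are hypothetical, access time is not modeled, and the choice that never-written words read as 0 is an assumption of the example.

```python
class IdealMemory:
    """Idealized memory: combinational read, write at the clock tick when enabled."""

    def __init__(self):
        self.words = {}  # address -> 32-bit word

    def read(self, address):
        # The clock is not a factor: a valid address yields a valid DataOut.
        # Assumption: words that were never written read as 0.
        return self.words.get(address, 0)

    def clock_tick(self, address, data_in, write_enable):
        # Write Enable = 1: the addressed word is written via DataIn at the clock tick.
        if write_enable:
            self.words[address] = data_in & 0xFFFFFFFF


mem = IdealMemory()
mem.clock_tick(address=0x100, data_in=1234, write_enable=1)
print(mem.read(0x100), mem.read(0x104))  # 1234 0
```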
+2 = 35 min. (Y:15)
                           Overview of the Instruction Fetch Unit (Fig. 5.5)
                       [Figure: instruction fetch unit with PC (clocked by Clk), Next Address Logic, and Instruction Memory (Address in, 32-bit Instruction Word out)]
Now let’s take a look at the first major component of the datapath: the instruction fetch unit.
The common RTL operations for all instructions are:
(a) Fetch the instruction using the Program Counter (PC) at the beginning of an
   instruction’s execution (PC -> Instruction Memory -> Instruction Word).
(b) Then at the end of the instruction’s execution, you need to update the
   Program Counter (PC -> Next Address Logic -> PC).
More specifically, you need to increment the PC by 4 if you are executing sequential code.
For Branch and Jump instructions, you need to update the program counter to “something else”
other than plus 4.
I will show you what is inside this Next Address Logic block when we talk about the Branch and
Jump instructions.
For now, let’s focus our attention on the Add and Subtract instructions.
+2 = 37 min. (Y:17)
                            RTL: The ADD Instruction
                                         31            26            21            16            11            6            0
                                              op             rs            rt            rd           shamt        funct
                                              6 bits        5 bits        5 bits        5 bits        5 bits       6 bits
+1 = 38 min. (Y:18)
                           RTL: The Subtract Instruction
                                        31            26            21            16            11            6            0
                                             op             rs            rt            rd           shamt        funct
                                             6 bits        5 bits        5 bits        5 bits        5 bits       6 bits
+1 = 39 min. (Y:19)
                            Datapath for Register-Register Operations
                      °R[rd] <- R[rs] op R[rt]                               Example: add                     rd, rs, rt
                             • Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields
                             • ALUctr and RegWr: control logic after decoding the instruction
                                          31             26             21            16                 11              6              0
                                                op              rs            rt                rd              shamt           funct
                                                6 bits         5 bits        5 bits             5 bits          5 bits         6 bits
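                       [Figure: register file (32 32-bit Registers) with Rw, Ra, and Rb driven by the instruction’s Rd, Rs, and Rt fields (5 bits each), plus RegWr, Clk, and 32-bit busW; busA and busB (32 bits each) feed the ALU, which is controlled by ALUctr and produces the 32-bit Result]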
+3 = 42 min. (Y:22)
                            Register-Register Timing
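                  [Timing diagram, signals listed from top to bottom:]
                  Clk
                  PC:                     Old Value, then New Value after Clk-to-Q
                  Rs, Rt, Rd, Op, Func:   Old Value, then New Value after Instruction Memory Access Time
                  ALUctr:                 Old Value, then New Value after Delay through Control Logic
                  Annotation: Register Write Occurs Here
                  [Figure: the register-register datapath again: register file (Rw, Ra, Rb from Rd, Rs, Rt; RegWr; Clk; busW), busA and busB into the ALU (ALUctr), 32-bit Result]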
+3 = 45 min. (Y:25)
                            RTL: The OR Immediate Instruction
                                         31        26        21         16                          0
                                              op        rs        rt            immediate
                                              6 bits    5 bits    5 bits          16 bits
                                         31                            16 15                    0
                                          0000000000000000                     immediate
                                                16 bits                          16 bits
+2 = 47 min. (Y:27)
                           Datapath for Logical Operations with Immediate
                      °R[rt] <- R[rs] op ZeroExt[imm16]                                              Example: ori              rt, rs, imm16
                                        31          26            21                       16                                          0
                                               op           rs                       rt                       immediate
                                               6 bits       5 bits                   5 bits                        16 bits
                                                                                                                    ALU
                                        32               Registers                                                        32
                                         Clk                                  busB
                                                                                                    Mux
                                                                                      32
                                                                           ZeroExt
                                               imm16
                                                           16                             32
                                                                                                          ALUSrc
                  ECE4680 Datapath.23                                                                                                      2002-4-10
+3 = 50 min. (Y:30)
                            RTL: The Load Instruction
                                                     31        26          21         16                                   0
                                                          op          rs        rt                 immediate
                       °lw           rt, rs, imm16        6 bits     5 bits     5 bits               16 bits
                                         31                 16 15                                        0
                                           0000000000000000    0                     immediate
                                                  16 bits                                16 bits
                                         31                 16 15                                        0
                                            1111111111111111 1                       immediate
                                                     16 bits                             16 bits
Like the OR immediate instruction I just showed you, the load instruction also uses the I format
(point to the format diagram).
But unlike the OR immediate instruction, the immediate field (Imm16 of the format diagram) is sign
extended instead of zero extended.
That is, we will duplicate the most significant bit of the immediate 16 times to the left to form a 32-bit value.
This sign extended value (SignExt) is then added to the register selected by the Rs field of the
instruction to form the memory address.
The memory address is then used to load the value into the register specified by the Rt field of the
instruction (Rt of the format diagram).
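As a rough sketch of the two extension schemes and the load address calculation, here is a small Python model. The function names (zero_ext16, sign_ext16, load_address) are my own, chosen for illustration, and registers are modeled as plain 32-bit unsigned integers.

```python
MASK32 = 0xFFFFFFFF

def zero_ext16(imm16: int) -> int:
    """Zero-extend a 16-bit field to 32 bits: bits 31..16 become 0."""
    return imm16 & 0xFFFF

def sign_ext16(imm16: int) -> int:
    """Sign-extend a 16-bit field to 32 bits by copying bit 15 into bits 31..16."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16

def load_address(rs_value: int, imm16: int) -> int:
    """Address used by lw: R[rs] + SignExt[imm16], wrapped to 32 bits."""
    return (rs_value + sign_ext16(imm16)) & MASK32
```

With an immediate of 0xFFFC (that is, -4), the sign extended value is 0xFFFFFFFC, so a base register holding 0x1000 gives the address 0x0FFC. The zero extended form of the same field would be 0x0000FFFC instead.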
+2 = 57 min. (Y:37)
                           Datapath for Load Operations
                      °R[rt] <- Mem[R[rs] + SignExt[imm16]]                                       Example: lw         rt, rs, imm16
                                        31         26                       21               16                              0
                                             op                 rs                rt                immediate
                                             6 bits             5 bits            5 bits                    16 bits
                                        Rd        Rt
                        RegDst
                                         Mux
                                                                Don’t Care
                                                       Rs                                         ALUctr
                                  RegWr 5                         (Rt)
                                                  5         5                                                                    MemtoReg
                                                                             busA
                                             Rw Ra Rb
                         busW                                                     32
                                                                                                    ALU
                                             32 32-bit
                              32             Registers
                                                                                                                                  Mux
                                                                                                           32
                               Clk                                   busB                                   MemWr          32
                                                                                       Mux
                                                                       32
                                                                                                                WrEn Adr
                                                                 Extender
                                                                                                  Data In
                                     imm16                                   32                                   Data
                                                  16                                                 32
                                                                                                                 Memory
                                                                                                      Clk
                                                                                    ALUSrc
Once again, we cannot use the instruction’s Rd field for the Register File’s Rw input because load
is an I-type instruction and there is no such thing as the Rd field in the I format.
So instead of Rd, the Rt field is used to specify the destination register through this two-to-one
multiplexer.
The first operand of the ALU comes from busA of the register file which contains the value of
Register Rs (points to the Ra input of the register file).
The second operand, on the other hand, comes from the immediate field of the instruction.
Instead of using the Zero Extender I used in the datapath for the OR immediate instruction, I have to use
a more general purpose Extender that can do both Sign Extend and Zero Extend.
The ALU then adds these two operands together to form the memory address.
Consequently, the output of the ALU has to go to two places:
(a) First the address input of the data memory.
(b) And secondly, also to the input of this two-to-one multiplexer.
The other input of this multiplexer comes from the output of the data memory so we can place the
output of the data memory onto the register file’s input bus for the load instruction.
For Add, Subtract, and the Or immediate instructions, the output of the ALU will be selected to be
placed on the input bus of the register file.
In either case, the control signal RegWr should be asserted so the register file will be written at the
end of the cycle.
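To make the mux selections concrete, here is a minimal Python sketch of one pass through this datapath. It is an illustration under assumptions, not the actual hardware: the select encodings (RegDst 0 = Rt and 1 = Rd, ALUSrc 0 = busB and 1 = Extender, MemtoReg 0 = ALU and 1 = Memory, ExtOp 1 = sign extend), the string values for ALUctr, and the names step, LW, ORI and ADD are all chosen here for illustration. Register 0 is not hardwired to zero in this sketch.

```python
MASK32 = 0xFFFFFFFF

def extend(imm16, ext_op):
    """Extender: ext_op=1 sign-extends, ext_op=0 zero-extends (encoding assumed)."""
    imm16 &= 0xFFFF
    if ext_op and (imm16 & 0x8000):
        return imm16 | 0xFFFF0000
    return imm16

def alu(a, b, alu_ctr):
    """Tiny ALU with the three operations used so far (string encoding assumed)."""
    if alu_ctr == "add":
        return (a + b) & MASK32
    if alu_ctr == "sub":
        return (a - b) & MASK32
    if alu_ctr == "or":
        return (a | b) & MASK32
    raise ValueError(f"unknown ALUctr: {alu_ctr}")

def step(regs, mem, rs, rt, rd, imm16, ctrl):
    """One cycle: regs is a 32-entry list, mem a dict keyed by byte address."""
    bus_a = regs[rs]
    bus_b = regs[rt]
    # ALUSrc mux: 0 selects busB, 1 selects the extended immediate.
    operand_b = extend(imm16, ctrl["ExtOp"]) if ctrl["ALUSrc"] else bus_b
    result = alu(bus_a, operand_b, ctrl["ALUctr"])
    # MemtoReg mux: 0 selects the ALU result, 1 selects the data memory output.
    bus_w = mem.get(result, 0) if ctrl["MemtoReg"] else result
    # RegDst mux: 0 selects Rt, 1 selects Rd as the write register Rw.
    rw = rd if ctrl["RegDst"] else rt
    if ctrl["RegWr"]:
        regs[rw] = bus_w  # the write takes effect at the end of the cycle
    return result

# Control settings for the instructions discussed so far (assumed encodings).
LW = {"RegDst": 0, "ALUSrc": 1, "ExtOp": 1, "ALUctr": "add", "MemtoReg": 1, "RegWr": 1}
ORI = {"RegDst": 0, "ALUSrc": 1, "ExtOp": 0, "ALUctr": "or", "MemtoReg": 0, "RegWr": 1}
ADD = {"RegDst": 1, "ALUSrc": 0, "ExtOp": 0, "ALUctr": "add", "MemtoReg": 0, "RegWr": 1}
```

Calling step(regs, mem, rs, rt, rd, imm16, LW) writes the memory word at R[rs] + SignExt[imm16] into register Rt, while the ORI and ADD settings route the ALU result back to the register file instead.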
+3 = 60 min. (Y:40)
                            RTL: The Store Instruction
                                         31        26        21        16                           0
                                              op        rs        rt            immediate
                                              6 bits    5 bits    5 bits          16 bits
+2 = 62 min. (Y:42)
                           Datapath for Store Operations
                      °Mem[R[rs] + SignExt[imm16]] <- R[rt]                                          Example: sw          rt, rs, imm16
                                        31             26                   21              16                                     0
                                              op                 rs                 rt                immediate
                                              6 bits             5 bits          5 bits                         16 bits
                                        Rd         Rt
                       RegDst
                                        Mux
                                                        Rs       Rt                              ALUctr
                                 RegWr 5           5         5                                                     MemWr           MemtoReg
                                                                             busA
                                             Rw Ra Rb
                        busW                                                     32
                                                                                                     ALU
                                             32 32-bit
                             32              Registers
                                                                                                                                       Mux
                                                                                                           32
                              Clk                                    busB                                                     32
                                                                                      Mux
                                                                       32
                                                                                                      Data In      WrEn Adr
                                                                 Extender
                                                                                                           32
                                    imm16                                                                            Data
                                                                            32
                                                16                                                                  Memory
                                                                                            ALUSrc     Clk
                                                                  ExtOp
+2 = 64 min. (Y:44)
                           RTL: The Branch Instruction
                                        31        26        21        16                           0
                                             op        rs        rt            immediate
                                             6 bits    5 bits    5 bits          16 bits
+1 = 65 min. (Y:45)
                           Datapath for Branch Operations
                      °beq        rs, rt, imm16                                     We need to compare Rs and Rt!
                                        31         26                       21               16                                  0
                                             op                 rs                rt                   immediate
                                             6 bits             5 bits            5 bits                      16 bits
                                        Rd        Rt                                                             Branch                  Clk
                                                                                                                              PC
                        RegDst
                                         Mux
                                                       Rs       Rt                                ALUctr     imm16
                                  RegWr 5         5         5                                                           Next Address
                                                                             busA                              16          Logic
                                             Rw Ra Rb
                         busW                                                     32
                                                                                                       ALU
                                             32 32-bit
                              32             Registers
                               Clk                                   busB                                      Zero                  To Instruction
                                                                                                                                       Memory
                                                                                       Mux
                                                                       32
                                                                 Extender
                                     imm16                                   32
                                                  16
                                                                                    ALUSrc
+2 = 67 min. (Y:47)
                           Binary Arithmetic for the Next Address
                      °In theory, the PC is a 32-bit byte address into the instruction memory:
                              • Sequential operation: PC<31:0> = PC<31:0> + 4
                              • Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
In theory, the Program Counter (PC) is a 32-bit byte address into the Instruction memory.
The Program Counter is incremented by four after each sequential instruction.
When a branch is taken, we need to sign extend the 16 bit immediate field, multiply this sign
extended value by four, and add it to the sequential instruction address (PC + 4).
Why does this magic number “4” always come up? Well the reason is that the 32-bit PC is a byte
address and all MIPS instructions are four bytes, or 32 bits, long.
In other words, if we keep a 32-bit Program Counter, then the two least significant bits of the
Program Counter will always be zeros.
And if these two bits are always zeros, there is no reason to have hardware to keep them.
So in practice, we will simplify the hardware by using a 30-bit program counter.
That is, we will build a Program Counter that only keeps track of the upper 30 bits (<31:2>) of the
instruction address because we know the 2 least significant bits will always be 0s.
Then instead of always increasing the Program Counter by four for sequential operation, we only
have to increase it by 1.
And for branch operation, we don’t need to multiply the sign extended immediate field by four
before adding to the sequential PC (PC + 1).
And when we apply the program counter to the address of the instruction memory, we need to
attach two zeros to its least significant bits.
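The claim that the 30-bit counter behaves exactly like the 32-bit byte address can be checked with a short Python sketch. The function names here (next_pc32, next_pc30, instruction_address) are hypothetical, introduced only for this illustration.

```python
MASK32 = 0xFFFFFFFF
MASK30 = 0x3FFFFFFF

def sign_ext16(imm16, width_mask):
    """Sign-extend a 16-bit field, then wrap it to the width given by width_mask."""
    imm16 &= 0xFFFF
    value = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return value & width_mask

def next_pc32(pc, imm16, taken):
    """32-bit byte-address form: PC + 4, plus SignExt[imm16] * 4 if the branch is taken."""
    nxt = pc + 4
    if taken:
        nxt += sign_ext16(imm16, MASK32) * 4
    return nxt & MASK32

def next_pc30(pc30, imm16, taken):
    """30-bit form: PC<31:2> + 1, plus SignExt[imm16] if the branch is taken."""
    nxt = pc30 + 1
    if taken:
        nxt += sign_ext16(imm16, MASK30)
    return nxt & MASK30

def instruction_address(pc30):
    """Attach two zero bits to the 30-bit PC to form the 32-bit byte address."""
    return (pc30 << 2) & MASK32
```

For any word-aligned pc, instruction_address(next_pc30(pc >> 2, imm16, taken)) should equal next_pc32(pc, imm16, taken), which is the reason the two low bits and the multiply by four can both be dropped.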
+3 = 70 min. (Y:50)
                            Next Address Logic: Expensive and Fast Solution
                                                                      30
                                                                                                                Addr<31:2>
                                                                              30                                Addr<1:0>
                                   PC
“00”
                                                    Adder
                                          30                                                 0                   Instruction
                                                                                   30
                                                                                            Mux
                                                                                                                  Memory
                                                                           Adder
                                          “1”
                                                                                             1
                                                                                                                        32
                                    Clk                SignExt                      30
                                        imm16                    30
                             Instruction<15:0> 16                                                               Instruction<31:0>
Branch Zero
So let’s see how we can put all these theories (point to the equations) into practice.
The PC plus one is implemented by this first adder here.
For branch operation, we need to sign extend the immediate field of the instruction and then add it
to the output of the first adder to implement this equation (PC + 1 + SignExt(imm16)).
For sequential operation, the output of the first adder is selected by the two-to-one mux so it will be
saved into the PC register at the next clock tick.
For a taken branch, that is, when we have a branch-on-equal and the condition Zero is true, the output
of the second adder is selected.
In all cases, the 30 bit Program Counter is used as instruction address bit 31 to bit 2.
The two least significant bits of the instruction address will always be zeroes.
One question you may want to ask is: Do we really need an adder just to add “1”?
Well, maybe not.
+2 = 72 min. (Y:52)
                           Next Address Logic: Cheap and Slow Solution
                     °Why is this slow?
                            • Cannot start the address add until Zero (output of ALU) is valid
30
                                                                                                    Addr<31:2>
                                  PC
                                         30                              “1”                        Addr<1:0>
                                                                                             “00”
                                                 “0”                              Carry In            Instruction
                                                              0
                                                                          Adder
                                                                                                       Memory
                                                             Mux
                                   Clk                                                30
                                              SignExt
                                                              1                                              32
                             imm16                      30          30
                  Instruction<15:0> 16
                                                                                                    Instruction<31:0>
Branch Zero
One way to simplify the implementation is to use the CarryIn input of the adder to implement the
PC<31:2> = PC<31:2> plus 1 operation.
Then we can put a MUX in front of the adder to add the branch offset if the branch is taken.
If the branch is not taken, we simply set the second input of the adder to zeros so we only add one
through the CarryIn input.
Why is this implementation slow?
Well because we cannot start the address add until the Zero input is valid.
And when will the Zero input become valid? Not until we have performed a 32-bit subtract in the
main datapath.
But does it matter that this is slow in the overall scheme of things?
Well, probably not in this single cycle implementation.
The critical path of this single cycle implementation will be the load instruction’s memory access so
the extra time it takes to calculate the PC can be hidden behind the critical path.
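The two next-address designs can be compared side by side in a small Python sketch. This is only a behavioral model under my own naming (adder30, next_pc_two_adders, next_pc_carry_in are hypothetical); it shows that both compute the same value and says nothing about timing.

```python
MASK30 = 0x3FFFFFFF

def sign_ext16_to30(imm16):
    """Sign-extend the 16-bit immediate to 30 bits (bit 15 copied into bits 29..16)."""
    imm16 &= 0xFFFF
    return (imm16 | 0x3FFF0000) if imm16 & 0x8000 else imm16

def adder30(a, b, carry_in=0):
    """30-bit adder with an optional carry-in; the carry-out is discarded."""
    return (a + b + carry_in) & MASK30

def next_pc_two_adders(pc30, imm16, branch, zero):
    """Expensive and fast: both sums are computed, then a mux picks one."""
    pc_plus_1 = adder30(pc30, 1)
    target = adder30(pc_plus_1, sign_ext16_to30(imm16))
    return target if (branch and zero) else pc_plus_1

def next_pc_carry_in(pc30, imm16, branch, zero):
    """Cheap and slow: the mux comes first, so the add must wait for Zero."""
    offset = sign_ext16_to30(imm16) if (branch and zero) else 0
    return adder30(pc30, offset, carry_in=1)
```

In the second function the value of zero is needed before the single add can start, which mirrors why the cheap design is slower; in the first, zero is only needed at the final mux.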
+3 = 75 min (Y:55)
                            RTL: The Jump Instruction
                                         31        26                                 0
                                              op           target address
                                              6 bits           26 bits
°j target
Finally, let’s take a look at the jump instruction which uses the J format.
The effect of the jump instruction is to replace the lower 26 bits of the 30-bit Program Counter
(PC<27:2>) with the value specified in the target address field of the instruction.
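As a quick illustration of the jump arithmetic with a 30-bit Program Counter, the following Python sketch keeps PC<31:28> and replaces the remaining 26 bits with the target field. The function name jump_pc30 is hypothetical.

```python
def jump_pc30(pc30, target26):
    """Return the new 30-bit PC: PC<31:28> concatenated with target<25:0>."""
    upper4 = pc30 & 0x3C000000           # PC<31:28> sits in bits 29..26 of the 30-bit PC
    return upper4 | (target26 & 0x03FFFFFF)
```

For example, a byte address of 0x90000040 corresponds to a 30-bit PC of 0x24000010. Jumping with a target field of 0x100 gives a 30-bit PC of 0x24000100, which is byte address 0x90000400 once the two zero bits are attached.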
+1 = 76 min. (Y:56)
                           Instruction Fetch Unit
                      °j            target
                              • PC<31:2> <- PC<31:28> concat target<25:0>
                                                                  30
                                                                                                                Addr<31:2>
                                                PC<31:28>                             30
                                                                                                                Addr<1:0>
                                                                 4                                       “00”
                                                     Target                                         1             Instruction
                                          Instruction<25:0>                        30
                                                                                                   Mux
                                                                 26                                                Memory
                                  PC
30 0
                                                    Adder
                                                                                   0                                     32
                                                                             30
                                                                                  Mux
                                         “1”
                                                                     Adder
                                                                                   1              Jump          Instruction<31:0>
                                   Clk                                       30
                                                SignExt
                               imm16                        30
                                      16
                    Instruction<15:0>
Branch Zero
+2 = 78 min. (Y:58)
                           Putting it All Together: A Single Cycle Datapath
                      °We have everything except control signals (underline)
                                                                                                           Instruction<31:0>
                                                                       Branch
<16:20>
                                                                                                                                      <11:15>
                                                                                       Instruction
<21:25>
                                                                                                                                                 <0:15>
                                                                        Jump           Fetch Unit
                                         Rd   Rt
                      RegDst                                             Clk
                                        1 Mux 0
                                                   Rs       Rt                                             Rt             Rs         Rd         Imm16
                                 RegWr 5      5         5                              ALUctr
                                                         busA                                           Zero             MemWr                            MemtoReg
                                           Rw Ra Rb
                         busW                               32
                                                                                                ALU
                                           32 32-bit
                              32                                                                                                                    0
                                           Registers busB      0                                      32
                                                                                                                                                  Mux
                               Clk
                                                                                 Mux
                                                      32
                                                                                                                                      32
                                                                                                            WrEn Adr                                1
                                                            Extender
                                                                                 1       Data In 32
                                     imm16                                                                      Data
                                                                         32
                                              16                                                               Memory
                                                                                                  Clk
                                                                                ALUSrc
                                                             ExtOp
+2 = 80 min. (Z:00)
                          Where to get more information?