CS 1203
DLD & CA
Basics of Computer Architecture Design
Course Instructor: Dr. Pooran Singh
Department of Electrical and Computer Engineering,
Mahindra Ecole Centrale, Mahindra University
Email: pooran.singh@mahindrauniversity.edu.in
Introduction
• Computers and their design are introduced in this
chapter.
• The specification for a computer consists of a description
of its appearance to a programmer at the lowest level, i.e.,
its instruction set architecture (ISA).
• From the ISA, a high-level description of the hardware to
implement the computer, called the computer
architecture, is formulated.
• This architecture, for a simple computer, is typically
divided into a datapath and a control. The datapath is
defined by three basic components:
1. A set of registers,
2. The microoperations performed on data stored in the
registers, and
3. The control interface.
RISC Architecture
• RISC (Reduced Instruction Set Computer) aims to make the
hardware simpler, whereas CISC (Complex Instruction Set
Computer) lets a single instruction do several pieces of work.
• The main idea behind RISC is to simplify hardware by using
an instruction set composed of a few basic steps for
loading, evaluating, and storing: a load instruction only
loads data, and a store instruction only stores it.
Characteristics of RISC:
• Simpler instruction, hence simple instruction decoding.
• Each instruction fits within one word.
• Instructions are designed to execute in a single clock cycle.
• More general-purpose registers.
• Simple Addressing Modes.
• Fewer data types.
• Well suited to pipelining.
❑Advantages of RISC
• Simpler instructions: RISC processors use a smaller set of
simple instructions, which makes them easier to decode and
execute quickly. This results in faster processing times.
• Faster execution: Because RISC processors have a simpler
instruction set, they can execute instructions faster than CISC
processors.
• Lower power consumption: RISC processors consume less
power than CISC processors, making them ideal for portable
devices.
❑Disadvantages of RISC
• More instructions required: RISC processors require more
instructions to perform complex tasks than CISC processors.
• Increased memory usage: RISC processors require more
memory to store the additional instructions needed to
perform complex tasks.
• Higher cost: Developing and manufacturing RISC processors
can be more expensive than CISC processors.
CISC Architecture
The main idea is that a single instruction performs all of the
loading, evaluating, and storing. For example, one
multiplication instruction loads its operands, multiplies
them, and stores the result, which is why it is complex.
Characteristics of CISC
• Complex instruction, hence complex instruction decoding.
• Instructions are larger than one-word size.
• Instruction may take more than a single clock cycle to get
executed.
• Fewer general-purpose registers, since operations can be
performed directly on memory.
• Complex Addressing Modes.
• More data types.
❑Advantages of CISC
• Reduced code size: CISC processors use complex instructions that
can perform multiple operations, reducing the amount of code
needed to perform a task.
• More memory efficient: Because CISC instructions are more
complex, they require fewer instructions to perform complex
tasks, which can result in more memory-efficient code.
• Widely used: CISC processors have been in use for a longer time
than RISC processors, so they have a larger user base and more
available software.
❑Disadvantages of CISC
• Slower execution: CISC processors take longer to execute
instructions because they have more complex instructions and
need more time to decode them.
• More complex design: CISC processors have more complex
instruction sets, which makes them more difficult to design and
manufacture.
• Higher power consumption: CISC processors consume more
power than RISC processors because of their more complex
instruction sets.
RISC vs CISC
CPU Performance
• RISC: Reduces the number of cycles per instruction at the cost of more instructions
per program.
• CISC: Minimizes the number of instructions per program at the cost of more cycles
per instruction.
• When programming was done mainly in assembly language, which is tedious and
error-prone, there was pressure to make each instruction do more work, and the CISC
architecture evolved. With the rise of high-level languages, dependence on assembly
decreased and the RISC architecture came to prevail.
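The trade-off in the bullets above can be summarized with the standard CPU performance equation. It is not spelled out on the slide, and the symbol names below are chosen here only for illustration:

```latex
T_{\text{program}} = N_{\text{instr}} \times \text{CPI} \times T_{\text{clock}}
```

Here $N_{\text{instr}}$ is the number of instructions executed, CPI is the average number of cycles per instruction, and $T_{\text{clock}}$ is the clock period. RISC accepts a larger $N_{\text{instr}}$ to lower CPI, while CISC lowers $N_{\text{instr}}$ and accepts a larger CPI.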
Example:
• Suppose we have to add two 8-bit numbers.
• CISC approach: A single instruction, such as ADD, performs the whole task.
• RISC approach: The programmer first issues load instructions to bring the data into
registers, then applies the add operation, and finally stores the result in the desired
location.
• The add operation is thus divided into load, operate, and store steps. As a result, RISC
programs are longer and need more memory to store, but the processor needs fewer
transistors because its instructions are less complex.
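The example above can be sketched as a toy model in Python. This is only an illustration: the memory addresses, register names, and the helper names `cisc_add` and `risc_add` are hypothetical and do not correspond to any real instruction set.

```python
# Toy model: memory maps an address to an 8-bit value.
# All addresses, register names, and helper names are illustrative only.

def cisc_add(memory, dst, src_a, src_b):
    """One memory-to-memory ADD: loads, adds, and stores in a single instruction."""
    memory[dst] = (memory[src_a] + memory[src_b]) & 0xFF
    return 1  # number of instructions issued


def risc_add(memory, regs, dst, src_a, src_b):
    """The same task as a sequence of simple load / add / store instructions."""
    program = [
        ("LOAD", "R1", src_a),
        ("LOAD", "R2", src_b),
        ("ADD", "R3", "R1", "R2"),
        ("STORE", "R3", dst),
    ]
    for instr in program:
        op = instr[0]
        if op == "LOAD":
            regs[instr[1]] = memory[instr[2]]
        elif op == "ADD":
            regs[instr[1]] = (regs[instr[2]] + regs[instr[3]]) & 0xFF
        elif op == "STORE":
            memory[instr[2]] = regs[instr[1]]
    return len(program)  # number of instructions issued


mem_cisc = {0x10: 200, 0x11: 100, 0x12: 0}
mem_risc = dict(mem_cisc)
cisc_count = cisc_add(mem_cisc, 0x12, 0x10, 0x11)
risc_count = risc_add(mem_risc, {}, 0x12, 0x10, 0x11)
print(cisc_count, risc_count, mem_cisc[0x12], mem_risc[0x12])
```

Both versions leave the same 8-bit result in memory. The RISC version simply issues four simple instructions where the CISC version issues one complex instruction.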
RISC | CISC
RISC stands for Reduced Instruction Set Computer. | CISC stands for Complex Instruction Set Computer.
Transistors are used for more registers. | Transistors are used for storing complex instructions.
Fixed-size instructions. | Variable-size instructions.
Arithmetic operations are register-to-register only. | Operations can be REG to REG, REG to MEM, or MEM to MEM.
Requires more registers. | Requires fewer registers.
Code size is large. | Code size is small.
An instruction executes in a single clock cycle. | An instruction takes more than one clock cycle.
An instruction fits in one word. | Instructions are larger than one word.
Simple and limited addressing modes. | Complex and more numerous addressing modes.
Fewer instructions than CISC. | More instructions than RISC.
Consumes less power. | Consumes more power.
Highly pipelined. | Less pipelined.
Requires more RAM. | Requires less RAM.
DATAPATHS
• Instead of having each individual register perform its
microoperations directly, computer systems often employ a
number of storage registers in conjunction with a shared
operation unit called an ALU.
• To perform a microoperation, the contents of specified source
registers are applied to the inputs of the shared ALU. The ALU
performs an operation, and the result of this operation is
transferred to a destination register.
• The combination of a set of registers with a shared ALU and
interconnecting paths is the datapath for the system.
• In addition to the registers, the datapath contains the digital logic
that implements the various microoperations. This digital logic
consists of buses, multiplexers, decoders, and processing circuits.
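The control unit for the datapath directs the information
flow through the buses, the ALU, the shifter, and the
registers by applying signals to the select inputs.
For example, to perform the microoperation
R1 ← R2 + R3
the control unit must select R2 and R3 as the ALU sources,
select addition as the ALU operation, and select R1 as the
destination register to be loaded.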
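As a minimal sketch of this idea, the Python model below routes two source registers through one shared ALU into a destination register. The 16-bit width, the operation names, and the function name `microoperation` are assumptions made for illustration and are not taken from the slides.

```python
MASK = 0xFFFF  # assumed register width of 16 bits (illustrative)

# A shared ALU: one operation unit used by every register (operation names assumed).
ALU_OPS = {
    "ADD": lambda a, b: (a + b) & MASK,
    "SUB": lambda a, b: (a - b) & MASK,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}


def microoperation(regs, dest, src_a, src_b, op):
    """Apply two source registers to the shared ALU and load the destination."""
    operand_a = regs[src_a]                      # first source selection
    operand_b = regs[src_b]                      # second source selection
    result = ALU_OPS[op](operand_a, operand_b)   # ALU operation selection
    regs[dest] = result                          # destination selection and load
    return result


regs = {"R0": 0, "R1": 0, "R2": 7, "R3": 5}
microoperation(regs, "R1", "R2", "R3", "ADD")    # R1 <- R2 + R3
print(regs["R1"])
```

The four arguments after `regs` play the role of the select signals that the control unit applies: which registers to read, which operation to perform, and which register to load.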
Generic Datapath of a Computer
N-bit ALU
Arithmetic Circuit
Logic Circuit Diagram
Arithmetic/Logic Unit
The Shifter
Barrel Shifter
Datapath Logic Diagram
The Control Word
Datapath with Control Variables
Simulation of the Microoperation Sequence in Table 7
A Simple Computer Architecture
Storage Resource Diagram for a Simple Computer
Instruction Formats
• The operation code of an instruction, often shortened to “opcode,” is a group of bits in
the instruction that specifies an operation, such as add, subtract, shift, or
complement.
• The operation must be performed using data stored in computer registers or in
memory.
• An instruction, therefore, must specify not only the operation, but also the registers or
memory words in which the operands are to be found and the result is to be placed.
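A minimal sketch of how such fields might be extracted from an instruction word is shown below. The layout is an assumption for illustration only: a 16-bit instruction with a 7-bit opcode followed by three 3-bit register fields. The field names are hypothetical as well.

```python
def decode(instruction):
    """Split an assumed 16-bit instruction word into opcode and register fields."""
    return {
        "opcode": (instruction >> 9) & 0x7F,  # bits 15..9: operation to perform
        "dest": (instruction >> 6) & 0x7,     # bits 8..6: destination register
        "src_a": (instruction >> 3) & 0x7,    # bits 5..3: first source register
        "src_b": instruction & 0x7,           # bits 2..0: second source register
    }


# Build a word with opcode 2, destination R1, sources R2 and R3, then decode it.
word = (0b0000010 << 9) | (1 << 6) | (2 << 3) | 3
fields = decode(word)
print(fields)
```

The opcode field tells the hardware what to do, and the remaining fields tell it where the operands are found and where the result is placed.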
Block Diagram for a Single-Cycle Computer
Diagram of Instruction Decoder
*If there is to be a jump or branch, PL = 1, loading the PC. For PL = 0, the PC is
incremented. With PL = 1, JB = 1 calls for a jump, and JB = 0 calls for a conditional
branch.
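The note on PL and JB can be sketched as a small next-PC function. This is only an illustration under stated assumptions: `target` stands for an already computed jump or branch address, `condition_met` stands for the outcome of the branch condition, and a branch that is not taken is assumed to fall through to the next instruction.

```python
def next_pc(pc, pl, jb, condition_met, target):
    """Choose the next program counter value from the PL and JB control bits."""
    if pl == 0:
        return pc + 1        # no jump or branch: increment the PC
    if jb == 1:
        return target        # PL = 1, JB = 1: unconditional jump
    # PL = 1, JB = 0: conditional branch (fall-through behavior is assumed)
    return target if condition_met else pc + 1


print(next_pc(10, 0, 0, False, 40))  # ordinary instruction
print(next_pc(10, 1, 1, False, 40))  # jump
print(next_pc(10, 1, 0, True, 40))   # branch taken
print(next_pc(10, 1, 0, False, 40))  # branch not taken
```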
Single-Cycle Computer Issues
Worst-Case Delay Path in Single-Cycle Computer
Block Diagram for a Single-Cycle Computer
Performance of the Single-Cycle Design
An example combinational-logic data path to compute z := (u + v)(w − x) / y

Operation | Latency
Add/Sub | 2 ns
Multiply | 6 ns
Divide | 15 ns
Total | 23 ns

[Figure: u + v and w − x are multiplied, and the product is divided by y to give z.]

Beginning with inputs u, v, w, x, and y stored in registers, the entire computation
can be completed in 25 ns, allowing 1 ns each for register readout and write.
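The 23 ns and 25 ns figures can be checked with a short calculation. The sketch assumes that the addition and the subtraction proceed in parallel, which is what the 23 ns total implies. The constant names are chosen here for illustration.

```python
# Latencies from the slide, in nanoseconds.
ADD_SUB_NS = 2
MULTIPLY_NS = 6
DIVIDE_NS = 15
REGISTER_READ_NS = 1
REGISTER_WRITE_NS = 1

# u + v and w - x are assumed to run side by side, so only one add/sub
# delay lies on the critical path before the multiply and the divide.
logic_ns = max(ADD_SUB_NS, ADD_SUB_NS) + MULTIPLY_NS + DIVIDE_NS
total_ns = REGISTER_READ_NS + logic_ns + REGISTER_WRITE_NS

print(logic_ns, total_ns)
```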
Nov. 2014 Computer Architecture, Data Path and Control Slide 41
Performance Estimation for Single-Cycle Architecture
Step | Latency
Instruction access | 2 ns
Register read | 1 ns
ALU operation | 2 ns
Data cache access | 2 ns
Register write | 1 ns
Total | 8 ns

Single-cycle clock = 125 MHz

Instruction class | Share of mix | Time needed
R-type | 44% | 6 ns
Load | 24% | 8 ns
Store | 12% | 7 ns
Branch | 18% | 5 ns
Jump | 2% | 3 ns
Weighted mean | | 6.36 ns

[Figure: unfolded data path with one row each for ALU-type, Load, Store, Branch (and jr), and Jump (except jr & jal); blocks that a class does not need are marked "Not used".]
The MicroMIPS data path unfolded (by depicting the register write step as a
separate block) so as to better visualize the critical-path latencies.
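The weighted mean and the clock rate quoted above can be reproduced with a few lines of Python. The dictionary layout and variable names are illustrative; the percentages and latencies are those given on the slide.

```python
# Instruction mix: class -> (fraction of instructions, time needed in ns).
single_cycle_mix = {
    "R-type": (0.44, 6),
    "Load": (0.24, 8),
    "Store": (0.12, 7),
    "Branch": (0.18, 5),
    "Jump": (0.02, 3),
}

# Average time the instructions actually need.
weighted_mean_ns = sum(frac * ns for frac, ns in single_cycle_mix.values())

# A single-cycle design must allot the slowest instruction's time to every instruction.
worst_case_ns = max(ns for _, ns in single_cycle_mix.values())
clock_mhz = 1000 / worst_case_ns

print(round(weighted_mean_ns, 2), worst_case_ns, clock_mhz)
```

The gap between the 8 ns clock period and the 6.36 ns weighted mean is the time a single-cycle design wastes on instructions that finish early.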
Single-Cycle vs. Multicycle MicroMIPS
[Figure: clock timing of Instr 1 to Instr 4 under both designs, marking time needed versus time allotted. In the multicycle case the four instructions take 3, 5, 3, and 4 cycles, and the time saved is marked.]
Single-cycle versus multicycle instruction execution
Performance of the Multicycle Design
Instruction class | Share of mix | Cycles
R-type | 44% | 4
Load | 24% | 5
Store | 12% | 4
Branch | 18% | 3
Jump | 2% | 3

Contribution to CPI
R-type | 0.44 × 4 = 1.76
Load | 0.24 × 5 = 1.20
Store | 0.12 × 4 = 0.48
Branch | 0.18 × 3 = 0.54
Jump | 0.02 × 3 = 0.06
Average CPI | 4.04
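The average CPI is the mix-weighted sum of the cycle counts. A short Python sketch of that calculation follows, with variable names chosen for illustration and the numbers taken from the slide.

```python
# Instruction mix: class -> (fraction of instructions, cycles needed).
multicycle_mix = {
    "R-type": (0.44, 4),
    "Load": (0.24, 5),
    "Store": (0.12, 4),
    "Branch": (0.18, 3),
    "Jump": (0.02, 3),
}

# Each class contributes (fraction x cycles) to the average CPI.
contributions = {
    name: frac * cycles for name, (frac, cycles) in multicycle_mix.items()
}
average_cpi = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name}: {value:.2f}")
print(f"Average CPI: {average_cpi:.2f}")
```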
How Good is Our Multicycle Design?
Cycle time = 2 ns; clock rate = 500 MHz

Instruction class | Share of mix | Cycles | Contribution to CPI
R-type | 44% | 4 | 0.44 × 4 = 1.76
Load | 24% | 5 | 0.24 × 5 = 1.20
Store | 12% | 4 | 0.12 × 4 = 0.48
Branch | 18% | 3 | 0.18 × 3 = 0.54
Jump | 2% | 3 | 0.02 × 3 = 0.06
Average CPI | | | 4.04

Clock rate of 500 MHz is better than the 125 MHz of the single-cycle design, but still unimpressive.
How does the performance compare with current processors on the market?
Not bad, where latency is concerned:
A 2.5 GHz processor with 20 or so pipeline stages has a latency of about 0.4 × 20 = 8 ns.
Throughput, however, is much better for the pipelined processor:
• Up to 20 times better with single issue
• Perhaps up to 100 with multiple issue
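The latency and throughput comparison can be worked through numerically. The sketch below assumes an ideal single-issue pipeline that completes one instruction per clock cycle once it is full. The variable names are illustrative.

```python
# Multicycle design from the slide: 2 ns cycle time, average CPI of 4.04.
multicycle_cycle_ns = 2.0
multicycle_cpi = 4.04
multicycle_latency_ns = multicycle_cpi * multicycle_cycle_ns

# Pipelined comparison point: 2.5 GHz clock, about 20 pipeline stages.
pipelined_cycle_ns = 1 / 2.5                    # 0.4 ns per cycle
pipeline_stages = 20
pipelined_latency_ns = pipelined_cycle_ns * pipeline_stages

# Assumption: an ideal single-issue pipeline finishes one instruction per cycle,
# while the multicycle design finishes one per average instruction time.
throughput_ratio = multicycle_latency_ns / pipelined_cycle_ns

print(round(multicycle_latency_ns, 2), round(pipelined_latency_ns, 2))
print(round(throughput_ratio, 1))
```

The two latencies are nearly equal at about 8 ns, but the pipelined processor completes roughly 20 times as many instructions per unit time, in line with the "up to 20 times" figure for single issue.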
Summary
• In the first part of the chapter, the concept of a computer datapath
for implementing computer microoperations was introduced.
• Among the major components of datapaths are register files, buses,
arithmetic/logic units (ALUs), and shifters.
• The control word provides a means of organizing the control of the
microoperations performed by the datapath.
• These concepts were combined to serve as a basis for exploring
computers.
• In the second part of the chapter, control design for programmed
systems was introduced by examining two different implementations
of basic control units for a simple computer architecture.
• We introduced the concept of instruction set architectures and
defined instruction formats and operations for the simple computer.
• The first implementation of this computer is capable of executing any
instruction in a single clock cycle. Aside from having a program
counter and its logic, the control unit of this computer consists of a
combinational decoder circuit.