COMPUTER ARCHITECTURE
1
GENERAL INFORMATION
Course : Computer Architecture (CSE 2213)
Instructor : Nawshin Tabassum Tanny
Lecturer,
Department of CSE, AUST
Email : tanny.cse@aust.edu
Contact# : 01644681387
Room# : 7A01/J
Acknowledgements: These slides contain some materials developed and
copyrighted by Swapna S. Gokhale
1
WHAT WILL WE LEARN?
• Computer Architecture
• The science and art of designing the
hardware/software interface and designing,
selecting, and interconnecting hardware
components to create a computing system that
meets functional, performance, energy-consumption,
cost, and other specific goals.
2
TASKS OF A COMPUTER
ARCHITECT
• Determine which attributes are important for a new computer.
• Design a computer to maximize performance and energy efficiency while staying
within cost, power and availability constraints. This task has many aspects:
a) instruction set design
b) functional organization
c) logic design
d) implementation, which encompasses
i. integrated circuit design
ii. packaging
iii. power and cooling
• Optimizing the design.
3
WHAT IS “COMPUTER
ARCHITECTURE” ?
• Computer Architecture =
Instruction Set Architecture + Computer Organization
• Instruction Set Architecture (ISA)
• WHAT the computer does (logical view)
• Computer Organization
• HOW the ISA is implemented (physical view)
• We will study both in this course
4
INSTRUCTION SET
ARCHITECTURE
• Instruction set architecture is the set of attributes of a computing
system as seen by the assembly language programmer or
compiler.
• Instruction Set (what operations can be performed?)
• Instruction Format (how are instructions specified?)
• Data storage (where is data located?)
• Addressing Modes (how is data accessed?)
• Exceptional Conditions (what happens if something goes
wrong?)
5
COMPUTER ORGANIZATION
• Computer organization is the view of the computer
that is seen by the logic designer. This includes
• Capabilities & performance characteristics of
functional units (e.g., registers, ALU, shifters, etc.).
• Ways in which these components are
interconnected
• How information flows between components
• Logic and means by which such information flow is
controlled
• Coordination of functional units
6
WHAT IS A COMPUTER?
• A computer is an electronic calculating machine that:
• Accepts digitized input information,
• Processes the information according to a list of internally
stored instructions and
• Produces the resulting output information.
• The list of instructions is called a computer program,
and the internal storage is called computer memory.
• Functions performed by a computer are:
• Accepting information to be processed as input.
• Storing a list of instructions to process the information.
• Processing the information according to the list of
instructions.
• Providing the results of the processing as output.
7
BASIC FUNCTIONAL UNITS OF A
COMPUTER
Input unit accepts information from:
• Human operators,
• Electromechanical devices (keyboard),
• Other computers.
Memory unit stores information:
• Instructions,
• Data.
Arithmetic and logic unit (ALU):
• Performs the desired operations on the input information as determined
by instructions in the memory.
Output unit sends results of processing:
• To a monitor display,
• To a printer.
Control unit coordinates various actions:
• Input,
• Output,
• Processing.
[Figure: Input and Output (I/O), Memory (holding Instr1, Instr2, Instr3, Data1, Data2),
and Processor (Arithmetic & Logic, Control).]
8
INFORMATION IN A
COMPUTER -- INSTRUCTIONS
• Instructions are explicit commands that:
• Transfer information within a computer (e.g., from memory to
ALU)
• Transfer information between the computer and I/O devices
(e.g., from keyboard to computer, or computer to printer)
• Perform arithmetic and logic operations (e.g., Add two
numbers, Perform a logical AND).
• A sequence of instructions to perform a task is called a program,
which is stored in the memory.
• Processor fetches instructions that make up a program from the
memory and performs the operations stated in those
instructions.
• What do the instructions operate upon?
9
INFORMATION IN A
COMPUTER -- DATA
• Data are the “operands” upon which instructions operate.
• Data could be:
• Numbers,
• Encoded characters.
• Data, in a broad sense, means any digital information.
• Computers use data that is encoded as a string of binary
digits called bits.
10
INPUT UNIT
Binary information must be presented to a computer in a specific format. This
task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.
[Figure: real-world input devices (keyboard, audio input, ...) connect through the
Input Unit to the computer's Memory and Processor.]
11
MEMORY UNIT
• Memory unit stores instructions and data.
• Recall, data is represented as a series of bits.
• The memory contains a large number of semiconductor storage cells each
capable of storing one bit of information.
12
MEMORY UNIT
• Processor reads instructions and reads/writes data from/to the memory during
the execution of a program.
• In theory, instructions and data could be fetched one bit at a time.
• In practice, a group of bits is fetched at a time.
• A group of bits stored or retrieved at a time is termed a “word”.
• The number of bits in a word is termed the “word length” of a computer. Typical
word lengths range from 16 to 64 bits.
• To read from or write to memory, a processor must know where to look: an
“address” is associated with each word location. Addresses are numbers
that identify successive locations (memory addresses).
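The idea of words and addresses can be sketched in a few lines of Python. This is only a toy model: the class name WordMemory, the 16-bit word length, and the choice to drop bits that do not fit in a word are assumptions made for illustration, not something the slides specify.

```python
class WordMemory:
    """Toy word-addressable memory: each address identifies one word."""

    def __init__(self, num_words, word_length=16):
        self.word_length = word_length
        self.mask = (1 << word_length) - 1   # keeps a value within one word
        self.cells = [0] * num_words         # one entry per word location

    def read(self, address):
        # The whole word at this address is retrieved at once.
        return self.cells[address]

    def write(self, address, value):
        # Assumption: bits beyond the word length are simply dropped.
        self.cells[address] = value & self.mask


mem = WordMemory(num_words=8, word_length=16)
mem.write(3, 0x1234)
mem.write(4, 0x1FFFF)        # 17 bits wide, so the top bit does not fit
print(hex(mem.read(3)))      # 0x1234
print(hex(mem.read(4)))      # 0xffff
```

Successive addresses (0, 1, 2, ...) name successive word locations, and every read or write moves one full word.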
13
MEMORY UNIT (CONTD..)
• Processor reads/writes to/from memory based on the memory address:
• Access any word location in a short and fixed amount of time based on the
address.
• Random Access Memory (RAM) provides fixed access time independent of
the location of the word.
• Access time is known as “Memory Access Time”.
• Memory and processor have to “communicate” with each other in order to
read/write information.
• In order to reduce “communication time”, a small amount of RAM (known as
Cache) is tightly coupled with the processor.
• Modern computers have three to four levels of RAM units with different
speeds and sizes:
• Fastest, smallest known as Cache
• Slowest, largest known as Main memory.
14
MEMORY UNIT (CONTD..)
• There are 2 classes of storage called primary and secondary.
• Primary storage of the computer consists of RAM units.
• Fastest, smallest unit is Cache.
• Slowest, largest unit is Main Memory.
• Primary storage is insufficient to store large amounts of data and programs.
• Primary storage can be added, but it is expensive.
• Store large amounts of data on secondary storage devices:
• Magnetic disks and tapes,
• Optical disks (CD-ROMs).
• Access to data stored in secondary storage is slower, but this is acceptable
because some information is accessed infrequently.
• The cost of a memory unit depends on its access time: a shorter access time implies
a higher cost.
15
ARITHMETIC AND LOGIC
UNIT (ALU)
• Most computer operations are executed in the Arithmetic and Logic Unit (ALU).
• Arithmetic operations such as addition, subtraction.
• Logic operations such as comparison of numbers.
• In order to execute an instruction, operands need to be brought into the ALU
from the memory.
• Operands are held in general-purpose registers in the processor, where the ALU can use them directly.
• General-purpose registers can be accessed faster than the cache.
• Results of the operations are stored back in the memory or retained in the
processor for immediate use.
16
OUTPUT UNIT
•Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device and send processed results to the outside world.
[Figure: the computer's Memory and Processor connect through the Output Unit to
real-world output devices (printer, graphics display, speakers, ...).]
17
CONTROL UNIT
• Operation of a computer can be summarized as:
• Accepts information from the input units (Input unit).
• Stores the information (Memory).
• Processes the information (ALU).
• Provides processed results through the output units (Output unit).
• Operations of Input unit, Memory, ALU and Output unit are coordinated by
Control unit.
• Instructions control “what” operations take place (e.g. data transfer, processing).
• Control unit generates timing signals which determine “when” a particular
operation takes place.
18
HOW ARE THE FUNCTIONAL
UNITS CONNECTED?
•For a computer to achieve its operation, the functional units need to
communicate with each other.
•In order to communicate, they need to be connected.
[Figure: Input, Output, Memory, and Processor units all attached to a common bus.]
•Functional units may be connected by a group of parallel wires.
•The group of parallel wires is called a bus.
•Each wire in a bus can transfer one bit of information.
•The number of parallel wires in a bus is equal to the word length of
a computer
19
BUS STRUCTURES
❑ A group of lines that serves as a connecting path for several devices is
called a bus
◆ In addition to the lines that carry the data, the bus must have lines for
address and control purposes
◆ The simplest way to interconnect functional units is to use a single bus,
as shown below (Single bus structure)
20
DRAWBACKS & ADVANTAGES OF THE
SINGLE BUS STRUCTURE
• The devices connected to a bus vary widely in their speed of
operation
• Some devices are relatively slow, such as printer and keyboard
• Some devices are considerably fast, such as optical disks
• Memory and processor units are the fastest parts of a computer
• An efficient transfer mechanism is thus needed to cope with this
speed mismatch
• A common approach is to include buffer registers with the devices to
hold the information during transfers
Advantages of the Single Bus Structure:
❖Low cost
❖Flexibility for attaching peripheral devices
21
ORGANIZATION OF CACHE
AND MAIN MEMORY
[Figure: Main memory, Cache memory, and Processor, connected by a memory bus and a bus.]
Why is the access time of the cache memory shorter than the
access time of the main memory?
22
COMPUTER COMPONENTS:
TOP-LEVEL VIEW
System Bus = Data Bus + Control Bus + Address Bus + Memory Bus
23
BASIC OPERATIONAL CONCEPTS
[Figure 1.2. Connections between the processor and the memory. The processor
contains MAR, MDR, Control, PC, IR, the ALU, and n general-purpose registers
R0 to Rn-1.]
24
REVIEW
• Activity in a computer is governed by instructions.
• To perform a task, an appropriate program consisting of a list
of instructions is stored in the memory.
• A Program = A sequence of instructions : Assembly
language or Machine language instructions
• Individual instructions are brought from the memory into the
processor, which executes the specified operations.
• Data to be used as operands are also stored in the memory.
25
A TYPICAL INSTRUCTION
• MOV LOCA, R0
• General format:
Instruction = Operation source_operand destination_operand
• Copies the contents of memory location LOCA into processor register R0.
• The original contents of LOCA are preserved.
• The original contents of R0 are overwritten.
• Instruction that Moves data from Memory to Register is called LOAD
instruction (e.g., MOV LOCA, R0)
• Instruction that moves data from Register to Memory is called STORE
instruction (e.g., MOV R0, LOCA)
26
ANOTHER TYPICAL
INSTRUCTION
• ADD LOCA, R0
• General format:
Instruction = Operation Source_operand Destination_operand
• Adds the operand at memory location LOCA to the operand in register
R0 in the processor.
• Places the sum into register R0.
• The original contents of LOCA are preserved.
• The original contents of R0 are overwritten.
• The instruction is fetched from the memory into the processor, the
operand at LOCA is fetched and added to the contents of R0, and the
resulting sum is stored in register R0.
27
LOAD AND STORE INSTRUCTIONS
TO TRANSFER FROM/TO MEMORY
TO/FROM REGISTERS
Summary:
• MOV LOCA, R1 means: bring the contents of memory location LOCA into
register R1
• MOV R2, LOCB means: save the value of register R2 in memory location
LOCB
• ADD R1, R0 means: R0 ← [R0] + [R1] (add the contents of registers
R0 and R1 and store the sum in register R0)
• For ADD, whose contents will be overwritten? (R0)
• Load and Store Instructions
• LOAD LOCA, R1 equivalent to MOV LOCA, R1
• STORE R2, LOCB equivalent to MOV R2, LOCB
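The instruction semantics summarized above can be mimicked with a small Python sketch. The starting values in memory and registers, and the helper names value_of, mov, and add, are made up for illustration; only the source-then-destination operand order and the R0 ← [R0] + [R1] behaviour come from the slides.

```python
memory = {"LOCA": 7, "LOCB": 0}               # hypothetical memory contents
registers = {"R0": 10, "R1": 0, "R2": 42}     # hypothetical register contents


def value_of(name):
    """Return the contents of a register or a memory location."""
    return registers[name] if name in registers else memory[name]


def mov(source, destination):
    """MOV source, destination: copy the value; the source is preserved."""
    value = value_of(source)
    if destination in registers:
        registers[destination] = value        # memory -> register (LOAD)
    else:
        memory[destination] = value           # register -> memory (STORE)


def add(source, destination):
    """ADD source, destination: destination <- [destination] + [source]."""
    registers[destination] = registers[destination] + value_of(source)


mov("LOCA", "R1")    # LOAD:  R1 <- [LOCA]
mov("R2", "LOCB")    # STORE: LOCB <- [R2]
add("R1", "R0")      # R0 <- [R0] + [R1]
print(registers, memory)
```

In this sketch the destination of ADD is always a register, matching the examples on the slides; the source operands keep their values and only the destination is overwritten.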
28
EXAMPLES OF A FEW REGISTERS:
• Instruction register (IR): Holds the instruction that is currently being executed by
the CPU
• Program counter register (PC): Points to (i.e., holds the address of) the next
instruction that will be fetched from the memory to be executed by the CPU
• General-purpose registers (R0 – Rn-1): generally hold the operands for
executing the instructions of the current program
• Memory address register (MAR): Holds the memory address to be read or written. A
read signal from the CPU causes the memory module to read the word at the address
held in the MAR
• Memory data register (MDR): Contains the data to be written into or read
out of the addressed location, i.e., it facilitates the transfer of operands/data
between the memory and the CPU.
29
EXECUTING A PROGRAM ... BASIC
OPERATING STEPS
• Programs are placed in the main memory (RAM) through input devices
• The PC register’s value is set to the address of the first instruction
Repeat the following steps until the “END” instruction is
executed:
• Instruction fetch (PC -> MAR -> Memory -> MDR -> IR; CU = Control Unit):
- The contents of PC are transferred to MAR
- A Read signal is sent by the CU to the memory
- The memory module reads out the location addressed by the MAR
register. The contents of that location are loaded into (returned by)
MDR
- The contents of MDR are transferred to the IR register
• Decode and execute:
- At this point, the instruction is ready to be decoded
and executed. The instruction in the IR is examined (decoded) to
determine which operation is to be performed.
- Get operands for ALU: Fetch the operands from the memory or
registers.
30
EXECUTING A PROGRAM ...
BASIC OPERATING STEPS…
-The operand may already be in a general-purpose register
-Or, may be fetched from Memory (send address to MAR – send Read
signal to Memory module – Wait for MFC signal (WMFC) from Memory
– Get the operand/data from MDR)
• Perform operation in ALU
• Store the result back
➢ Store in a general-purpose register
➢ Or, store into memory (send the write address to MAR, and send result
to MDR – Write signal to Memory – WMFC)
➢ WMFC = Wait for Memory Function Complete Signal
• Meanwhile, PC is incremented to point to the next instruction
• Some Examples: Add R0, R1; Add (R0), R1; Add 50(R0), R1;
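The operating steps above can be traced with a minimal Python sketch. It is a teaching model, not a real processor: the tuple encoding of instructions, the opcode names LOAD, STORE, ADD, and END as used here, the start address 0, and the sample data values are all assumptions. The MFC handshake is not modelled; a memory access is treated as completing immediately.

```python
def run(memory, registers):
    """Fetch, decode, and execute until END; return the executed opcodes."""
    pc = 0                      # assumed: the program starts at address 0
    executed = []
    while True:
        # Instruction fetch: PC -> MAR -> Memory -> MDR -> IR
        mar = pc
        mdr = memory[mar]       # memory reads out the addressed location
        ir = mdr
        pc += 1                 # PC now points to the next instruction

        # Decode: examine the instruction held in IR
        op = ir[0]
        executed.append(op)
        if op == "END":
            return executed
        src, dst = ir[1], ir[2]

        # Execute
        if op == "LOAD":        # memory -> register
            mar = src
            mdr = memory[mar]
            registers[dst] = mdr
        elif op == "STORE":     # register -> memory
            mar = dst
            mdr = registers[src]
            memory[mar] = mdr
        elif op == "ADD":       # register <- register + memory operand
            mar = src
            mdr = memory[mar]
            registers[dst] = registers[dst] + mdr
        else:
            raise ValueError(f"unknown operation: {op}")


# Addresses 0-3 hold the program; addresses 5-7 hold data.
memory = [
    ("LOAD", 5, "R0"),
    ("ADD", 6, "R0"),
    ("STORE", "R0", 7),
    ("END",),
    0, 20, 22, 0,
]
registers = {"R0": 0}
executed = run(memory, registers)
print(executed, registers["R0"], memory[7])
```

Every memory access, whether for an instruction or an operand, goes through MAR (the address) and MDR (the data), which is the point the slides make about these two registers.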
31
INTERRUPT
• Normal execution of a program may be interrupted if some device requires
urgent servicing
• To deal with the situation immediately, the normal execution of the
current program must be interrupted
• Procedure of interrupt operation
• The device raises an interrupt signal
• The processor provides the requested service by executing an
appropriate interrupt-service routine
• The state of the processor is first saved before servicing the interrupt
• Normally, the contents of the PC, the general registers, and some control
information are stored in memory
• When the interrupt-service routine is completed, the state of the
processor is restored so that the interrupted program may continue
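The save, service, and restore sequence described above can be sketched in Python. The dictionary layout of the processor state, the list used as the save area in memory, and the routine name keyboard_isr are hypothetical; real processors do this in hardware and in low-level system software.

```python
def service_interrupt(cpu, save_area, isr):
    """Save the processor state, run the service routine, restore the state."""
    saved = {"pc": cpu["pc"], "regs": dict(cpu["regs"])}
    save_area.append(saved)          # state is stored in memory
    isr(cpu)                         # interrupt-service routine runs
    restored = save_area.pop()
    cpu["pc"] = restored["pc"]       # interrupted program can now continue
    cpu["regs"] = restored["regs"]


def keyboard_isr(cpu):
    """Hypothetical service routine; it overwrites PC and R0 while running."""
    cpu["pc"] = 900
    cpu["regs"]["R0"] = 65
    cpu["log"].append("key handled")


cpu = {"pc": 104, "regs": {"R0": 1, "R1": 2}, "log": []}
save_area = []
service_interrupt(cpu, save_area, keyboard_isr)
print(cpu["pc"], cpu["regs"], cpu["log"])
```

Although the routine changes PC and R0, the interrupted program sees its original values afterwards, which is why the state must be saved before servicing begins.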
32
CLASSES OF INTERRUPTS
• Program
• Generated by some condition that occurs as a result of an instruction
execution such as arithmetic overflow, division by zero, attempt to
execute an illegal machine instruction, or reference outside a user’s
allowed memory space
• Timer
• Generated by a timer within the processor. This allows the operating
system to perform certain functions on a regular basis
• I/O
• Generated by an I/O controller, to signal normal completion of an
operation or to signal a variety of error conditions
• Hardware failure
• Generated by a failure such as power failure
33
SOFTWARE
• In order for a user to enter and run an application program, the
computer must already contain some system software in its memory
• System software is a collection of programs that are executed as
needed to perform functions such as
• Receiving and interpreting user commands
• Running standard application programs such as word processors,
etc, or games
• Managing the storage and retrieval of files in secondary storage
devices
• Controlling I/O units to receive input information and produce
output results
34
SOFTWARE
• Translating programs from source form prepared by the user into
object form consisting of machine instructions
• Linking and running user-written application programs with existing
standard library routines, such as numerical computation packages
• System software is thus responsible for the coordination of all
activities in a computing system
35
OPERATING SYSTEM
• Operating system (OS)
• This is a large program, or actually a collection of routines,
that is used to control the sharing of and interaction among
various computer units as they execute application programs
• The OS routines perform the tasks required to assign
computer resources to individual application programs
• These tasks include assigning memory and magnetic disk
space to program and data files, moving data between
memory and disk units, and handling I/O operations
36
PERFORMANCE
• The most important measure of a computer is how quickly it
can execute programs, i.e., the runtime of programs. The speed
with which a computer executes programs is affected by the
design of its hardware and its machine language instructions.
Because programs are usually written in a high-level language,
performance is also affected by the compiler that translates
programs into machine languages.
• For best performance, the following factors must be
considered
• Compiler
• Instruction set
• Hardware design
37
PERFORMANCE
• Three factors affect performance:
➢ Hardware design (e.g., CPU clock rate)
➢ 1 GHz CPU => 1 billion Hz => 10^9 clock cycles/sec
(Hz = cycles/sec)
➢ If 1 basic operation (e.g., integer addition) is possible in 1 cycle => 1
billion basic operations (10^9 integer additions) are possible in 1
sec!
➢ 1 MHz => 1 million Hz => 10^6 clock cycles/sec
➢ Instruction set architecture (ISA) (e.g., CISC or RISC ISA?)
➢ CISC (complex instruction set computer) => instructions are complex and
more capable, but each typically takes more cycles
➢ RISC (reduced instruction set computer) => instructions are simple and
each typically takes fewer cycles, but each does less work
➢ Compiler (how efficiently does your compiler optimize your code for
pipelining, etc.?)
38
PERFORMANCE
• Processor circuits are controlled by a timing
signal called a clock
• The clock defines regular time intervals, called clock cycles
• To execute a machine instruction, the processor
divides the action to be performed into a
sequence of basic steps, such that each step can
be completed in one clock cycle
• Let P be the length of one clock cycle; its inverse is
the clock rate, R = 1/P
39
PROCESSOR CLOCK
• Clock, clock cycle, and clock rate
• Clock Rate = 1 GHz = 10^9 Hz = 10^9 cycles/second, or 10^9 clock
pulses per second. It also means a clock cycle of
1/10^9 = 10^-9 sec = 1 ns (nanosecond).
• 4 GHz CPU => 4 x 10^9 cycles/sec => 1 clock cycle = 0.25 ns
• 500 MHz => 500 x 10^6 cycles/sec => 1 clock cycle = 2 ns
• 1 MHz = 10^6 cycles/sec; 1 kHz = 10^3 cycles/sec
• 1 GHz = 1000 MHz, 1 MHz = 1000 kHz, 1 kHz = 1000 Hz
• Hz (Hertz) – cycles per second (clock cycles / second)
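The relation between clock rate and clock cycle length, P = 1/R, is easy to check numerically. The short Python sketch below uses the three clock rates from this slide; the function name clock_period_seconds is just an illustrative choice.

```python
def clock_period_seconds(rate_hz):
    """Length of one clock cycle, P = 1 / R."""
    return 1.0 / rate_hz


for label, rate_hz in [("1 GHz", 1e9), ("4 GHz", 4e9), ("500 MHz", 500e6)]:
    period_ns = clock_period_seconds(rate_hz) * 1e9   # seconds -> nanoseconds
    print(f"{label}: {period_ns:.2f} ns per cycle")
```

A higher clock rate always means a shorter clock cycle, since one is the inverse of the other.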
40
BASIC PERFORMANCE
EQUATION
• T = (N x S) / R
• T – processor time required to execute a program that may have been prepared in a
high-level language
• N – dynamic instruction count: the number of machine language instructions
actually executed to complete the program (note: a single one-line loop may execute
more than a billion times). Unit: instructions
• S – average number of basic steps (clock cycles) needed to execute one machine
instruction. Each basic step completes in one clock cycle. Unit: cycles/instruction.
If S is not given in a question, assume S = 1
• R – clock rate. Unit: cycles/sec
• Note: these parameters are not independent of each other
• How to improve T?
- Reduce N x S, increase R
41
BASIC PERFORMANCE
EQUATION
⚫ T – program execution time. Unit: seconds
⚫ N – Unit: instructions
⚫ S – Unit: cycles/instruction
⚫ R – clock rate. Unit: cycles/second
Example: A program has a dynamic instruction count (N) of
1000 instructions, each instruction takes 5 cycles on average
(S = 5 cycles/instruction), and the clock rate is 1 kHz (R =
10^3 or 1000 cycles/second). What is the program
execution time T?
Ans: T = (1000 instructions x 5 cycles/instruction) / (1000 cycles/sec)
= 5 sec
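The worked example can be reproduced with a one-line function for the basic performance equation, T = N x S / R. The function and parameter names below are illustrative; the numbers are the ones from the example.

```python
def execution_time(n_instructions, cycles_per_instruction, clock_rate_hz):
    """Basic performance equation: T = N * S / R, in seconds."""
    return n_instructions * cycles_per_instruction / clock_rate_hz


t = execution_time(n_instructions=1000, cycles_per_instruction=5, clock_rate_hz=1000)
print(t)  # 5.0
```

Checking units is a useful habit here: instructions x (cycles/instruction) / (cycles/second) leaves seconds.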
42
OVERVIEW
• The execution time T of a program that has a dynamic instruction count N
is given by:
T = (N x S) / R
Here S is the average number of clock cycles it takes to fetch and execute
one instruction, and R is the clock rate. (The dynamic instruction count N is
computed considering loops, repeated function calls, recursion, etc.)
• Instruction throughput is defined as the number of instructions executed per
second.
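Instruction throughput follows directly from the same quantities: executing N instructions in T = N x S / R seconds gives N / T = R / S instructions per second. The sketch below checks this with the illustrative values N = 1000, S = 5, and R = 1000 cycles/second; the function name throughput is an assumption.

```python
def throughput(clock_rate_hz, cycles_per_instruction):
    """Instructions executed per second: R / S."""
    return clock_rate_hz / cycles_per_instruction


n, s, r = 1000, 5, 1000          # instructions, cycles/instruction, cycles/second
t = n * s / r                    # execution time in seconds
print(throughput(r, s), n / t)   # both ways of computing it agree
```

Note that N cancels out, so throughput depends only on the clock rate and the average number of cycles per instruction.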
43
PERFORMANCE IMPROVEMENT
• Pipelining and superscalar operation
• Pipelining: by overlapping the execution of successive instructions
• Superscalar: different instructions are concurrently executed with
multiple instruction pipelines. This means that multiple functional units are
needed
• Clock rate improvement
• Improving the integrated-circuit technology
makes logic circuits faster, which reduces the
time needed to complete a basic step
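The benefit of overlapping successive instructions can be estimated with a simple cycle count. The sketch below assumes an idealized pipeline that the slides do not spell out: every instruction passes through the same number of stages, each stage takes one clock cycle, and there are no stalls. Real pipelines fall short of this.

```python
def sequential_cycles(num_instructions, num_stages):
    """No overlap: each instruction finishes before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Ideal overlap: after the first instruction fills the pipeline,
    one instruction completes every cycle."""
    return num_stages + (num_instructions - 1)


n, k = 100, 5                      # hypothetical instruction count and stage count
without = sequential_cycles(n, k)
with_pipe = pipelined_cycles(n, k)
print(without, with_pipe, round(without / with_pipe, 2))
```

For long instruction sequences the ideal speedup approaches the number of stages, which in terms of the performance equation corresponds to pushing S toward 1.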
44
PERFORMANCE IMPROVEMENT
• Reducing amount of processing done in one basic
step also makes it possible to reduce the clock
period, P.
• However, if the actions that have to be performed
by an instruction remain the same, the number of
basic steps needed may increase
• Reduce the number of basic steps to execute
• Reduced instruction set computers (RISC) and complex instruction set
computers (CISC)
45
IMPROVING PERFORMANCE: EFFECT OF INSTRUCTION
SET ARCHITECTURES (ISA), E.G., CISC AND RISC ISA
➢ Reduced Instruction Set Computers (RISC): simpler
instructions => N ↑, S ↓. Generally favored over CISC here because pipelining is
more effective for RISC.
• Complex Instruction Set Computers (CISC):
complex instructions => N ↓, S ↑. Less suitable for pipelining.
Instructions are complex and more capable, so the program gets
smaller (reduced N), but complex instructions increase S and
hamper/stall the pipeline. Example of CISC: Intel processors.
• So, a key consideration is the use of pipelining.
➢ S close to 1 means the number of cycles per instruction is nearly
ideal (e.g., RISC processors).
➢ RISC is better in this respect because it is easier to implement efficient
pipelining with simpler instruction sets. Example of RISC architecture: ARM
processors.
46
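The trade-off between N and S described above can be made concrete by plugging numbers into T = N x S / R. All values in this Python sketch are invented purely for illustration and are not measurements of any real processor: the CISC-style program is assumed to need fewer instructions at more cycles each, the RISC-style program more instructions at close to one cycle each, both at the same clock rate.

```python
def execution_time(n, s, r):
    """Basic performance equation: T = N * S / R, in seconds."""
    return n * s / r


R = 1e9                                             # assumed 1 GHz clock for both
t_cisc = execution_time(n=1_000_000, s=4.0, r=R)    # fewer, slower instructions
t_risc = execution_time(n=1_500_000, s=1.2, r=R)    # more, faster instructions
print(f"CISC-style: {t_cisc * 1e3:.2f} ms, RISC-style: {t_risc * 1e3:.2f} ms")
```

With these made-up numbers the increase in N is more than offset by the drop in S, so the product N x S is smaller for the RISC-style program. With different numbers the comparison could go the other way, which is why neither N nor S can be judged in isolation.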
PERFORMANCE MEASUREMENT
• T is difficult to compute. Also, a raw time in seconds is an inconvenient unit for
commercial comparisons.
• Measure computer performance using benchmark programs (a set of sample
programs, e.g., word processing programs, games, media (audio/video) playback,
I/O intensive programs, etc.).
• Standard Performance Evaluation Corporation (SPEC) selects and
publishes representative application programs for different application domains,
together with test results for many commercially available computers.
• Reference computer: a well-known earlier computer system, picked by SPEC,
against which the computer under test is compared.
• The overall SPEC rating is the geometric mean of the results for the
individual benchmark programs.
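A SPEC-style rating can be sketched as follows. The slide text does not spell out the formula, so this follows the usual textbook definition: the rating for one benchmark is the running time on the reference computer divided by the running time on the computer under test, and the overall rating is the geometric mean over all benchmarks. The running times below are invented for illustration.

```python
import math


def spec_rating(reference_time, test_time):
    """Rating for one benchmark: how many times faster than the reference."""
    return reference_time / test_time


def overall_rating(ratings):
    """Geometric mean of the individual benchmark ratings."""
    return math.prod(ratings) ** (1.0 / len(ratings))


reference_times = [100.0, 200.0, 400.0]   # hypothetical seconds on the reference computer
test_times = [50.0, 50.0, 50.0]           # hypothetical seconds on the computer under test
ratings = [spec_rating(ref, t) for ref, t in zip(reference_times, test_times)]
overall = overall_rating(ratings)
print(ratings, round(overall, 2))
```

A rating above 1 means the computer under test is faster than the reference on that benchmark. The geometric mean is used so that no single benchmark with a very large ratio dominates the overall figure.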
47
REFERENCE BOOK
Computer Organization : Carl Hamacher
49