Unit 1
Introduction (6 hours)
Contents
1.1 Organization and architecture
1.2 Structure and function
1.3 The evolution of computer architecture (RISC, CISC, BERKELEY RISC I,
overlapped register window)
1.4 Performance assessment
1.4.1 Clock speed and instructions per second
1.4.2 Instruction execution rate: CPI, MIPS Rate, MFLOPS rate, arithmetic mean,
harmonic mean, speed metric, geometric mean, rate metric, Amdahl’s law, speed up
1.5 Computer function
1.5.1 Instruction fetch and execute
1.5.2 Instruction cycle state diagram
1.6 Interconnection structure: bus interconnection, multilevel bus hierarchy, PCI
Overview
FIGURE: A Typical Computer Advertisement
Explanation of Ad
Short Form / Term | Full Form / Meaning
Intel i7 | Intel Core i7 – A high-performance CPU family from Intel.
Quad Core | CPU with 4 processing cores for parallel processing.
3.9GHz | 3.9 Gigahertz – Processor speed; 1 GHz = 1 billion cycles/sec.
DDR3 SDRAM | Double Data Rate 3 Synchronous Dynamic RAM – Older type of system memory.
1600MHz | Memory speed – 1600 million cycles per second.
32GB | 32 Gigabytes – Amount of RAM (system memory).
L1 cache | Level 1 Cache – Fastest memory located inside CPU, very small (128KB here).
L2 cache | Level 2 Cache – Slightly slower but larger than L1 (2MB here).
1TB SATA | 1 Terabyte Serial ATA – Hard disk with 1000 GB storage using SATA interface.
7200 RPM | 7200 Revolutions Per Minute – Speed of the hard disk spinning.
USB | Universal Serial Bus – Standard port for connecting peripherals.
PCI / PCIe | Peripheral Component Interconnect / Express – Expansion slots for cards.
x16 / x1 | PCI Express lanes – x16 is for high bandwidth (graphics), x1 is for smaller devices.
HDMI | High-Definition Multimedia Interface – For digital video/audio output.
LCD | Liquid Crystal Display – Type of monitor screen technology.
Explanation of Ad
Short Form / Term | Full Form / Meaning
24” | Monitor screen size – 24 inches diagonally.
16:10 | Aspect ratio – Ratio of width to height of the screen.
1920×1200 WUXGA | Widescreen Ultra eXtended Graphics Array – High resolution display.
300 cd/m² | Brightness – 300 candelas per square meter.
Active matrix | Type of LCD tech with better image quality and response time.
1000:1 contrast | Contrast ratio – Difference between brightest white and darkest black.
8ms | 8 milliseconds – Response time of the monitor.
24-bit color | 16.7 million colors – Display color depth.
VGA / DVI | Video Graphics Array / Digital Visual Interface – Monitor connectors.
CD/DVD ± RW | Can read/write both CD and DVD disks in + or - formats.
1GB PCIe video card | Dedicated graphics card with 1 GB memory, connected via PCI Express.
PCIe sound card | Separate sound card connected through PCI Express.
Ethernet (10/100/1000) | Wired network interface with 10 Mbps, 100 Mbps, or 1 Gbps speeds.
Computer Architecture
Definition:
Computer architecture refers to the attributes of a system visible to the programmer and those that affect
the logical execution of programs.
Includes:
Instruction Set Architecture (ISA)
Instruction formats and opcodes
Registers and memory addressing
Effects of instruction execution
Input/Output mechanisms
Data types and representation (e.g., number of bits for integers, characters)
Purpose:
Defines what a computer is supposed to do. It's a high-level design concern focused on functionality and
program behavior.
Example:
Deciding whether the system should support a "multiply" instruction.
Stability:
Architectures often remain consistent across many generations of computers to ensure software
compatibility (e.g., IBM System/370).
Computer Organization
Definition:
Computer organization refers to the physical and operational structure of a computer—how the architectural
specifications are implemented.
Includes:
Control signals and data paths
Hardware components (ALU, memory units, buses)
Interfaces with peripherals
Memory technology
Purpose:
Defines how a computer performs operations—the internal working and construction of the system.
Example:
Choosing whether to implement the "multiply" instruction using a dedicated multiply unit or repeated addition
logic.
Flexibility:
Organizational changes often occur to improve performance or reduce cost without altering the architecture. For
example, various IBM System/370 models differ in organization but share the same architecture.
Computer Organization and Architecture
1. Computer Organization
Focuses on the operational structure of the computer.
Deals with how components work and interact at the hardware
level.
Topics include:
Control Unit
ALU (Arithmetic Logic Unit)
Registers
Memory Hierarchy
I/O Mechanisms
Micro-operations
Computer Organization and Architecture
2. Computer Architecture
Focuses on the design principles and programmer’s perspective.
Deals with what a computer system does, not how.
Topics include:
Instruction Set Architecture (ISA)
Addressing Modes
Data Types
Memory Formats
Performance Metrics (CPI, MIPS, etc.)

Aspect | Architecture | Organization
What is it? | Design & Specification | Physical Implementation
Concerned With | Instruction sets, addressing modes | Data paths, control signals, memory
Who is interested? | Computer Architects, System Designers | Hardware Engineers, Microprocessor Designers
Computer Organization vs Computer
Architecture
Computer Architecture | Computer Organization
It is the description of what the computer does. | It is the description of how the computer does things.
It refers to those attributes of a system that have a direct impact on the logical execution of a program. | It refers to the operational units and their interconnections that realize the architectural specifications.
A programmer can view architecture in terms of instructions, addressing modes and registers. | Organization expresses the realization of architecture.
While designing a computer system, architecture is considered first. | An organization is done on the basis of architecture.
Computer Architecture deals with high-level design issues. | Computer Organization deals with low-level design issues.
Architecture involves Logic (Instruction sets, Addressing modes, Data types, Cache optimization). | Organization involves Physical Components (Circuit design, Adders, Signals, Peripherals).
For example, it is an architectural design issue what types of instructions are to be included and whether to use direct or indirect addressing for accessing memory. | For example, it is an organizational issue whether to implement a special-purpose unit or use a pre-existing unit; for instance, to implement the multiply instruction, a special multiply unit can be used or an add unit can be used repeatedly.
Two Basic Computer Architectures
1. Von Neumann 2. Harvard
Von Neumann Architecture
Based on Stored program concept
The concept holds that:
Data and instructions should be stored
together in same memory area of computer
Execution occurs in sequential fashion (unless
explicitly modified) from one instruction to
next
Same signal pathways and memory for data
and instructions (the CPU does one thing at a
time: it either reads/writes data or reads an instruction)
E.g.: desktop personal computer
Harvard Architecture
Physically separate memory and
pathways for instructions and data
CPU can read both instructions and data
from memory at the same time
Has double the memory bandwidth
E.g.: Digital signal processor (DSP) based
computer system
Structure and Function
A computer is a complex machine made of many small
electronic components.
To understand or design it better, we look at it in a
hierarchical way, meaning we break it down into layers or
levels. At each level, we focus on two things:
1. Structure: How the parts are connected.
2. Function: What each part does.
Function
Refers to operations each component performs.
Computers perform only four basic functions:
Data Processing: Manipulation of data through operations (e.g.,
addition, comparison).
Data Storage: Holding data temporarily (RAM, cache) or
permanently (SSD, HDD).
Data Movement: Input/output operations like keyboard entry or
display output.
Control: Coordination of all operations via the control unit.
Structure
Describes the relationship among different components: CPU, memory,
I/O, buses.
Example: A CPU is composed of the control unit, ALU, and registers.
Figure: The Computer
Simple Single-processor Computer
The hierarchical view of the internal structure of a traditional single-
processor computer is given in the figure. There are four main
structural components:
1. Central processing unit (CPU): Controls the operation of the
computer and performs its data processing functions; often simply
referred to as processor.
2. Main memory: Stores data.
3. I/O: Moves data between the computer and its external environment.
4. System interconnection: Some mechanism that provides for
communication among CPU, main memory, and I/O. A common
example of system interconnection is by means of a system bus,
consisting of a number of conducting wires to which all the other
components attach.
Simple Single-processor
Computer
The major structural components of a
simple single-processor computer are as
follows:
Control unit: Controls the operation of
the CPU and hence the computer.
Arithmetic and logic unit (ALU):
Performs the computer’s data processing
functions.
Registers: Provides storage internal to
the CPU.
CPU interconnection: Some
mechanism that provides for
communication among the control unit,
ALU, and registers.
Figure: The Computer: Top-Level Structure
Multicore Computer Structure
Contemporary computers generally have multiple processors. When these
processors all reside on a single chip, the term multicore computer is used,
and each processing unit (consisting of a control unit, ALU, registers, and
perhaps cache) is called a core.
Central processing unit (CPU): That portion of a computer that fetches and executes
instructions. It consists of an ALU, a control unit, and registers. In a system with a single
processing unit, it is often simply referred to as a processor.
Core: An individual processing unit on a processor chip. A core may be equivalent in
functionality to a CPU on a single-CPU system. Other specialized processing units, such as
one optimized for vector and matrix operations, are also referred to as cores.
Processor: A physical piece of silicon containing one or more cores. The processor is the
computer component that interprets and executes instructions. If a processor contains
multiple cores, it is referred to as a multicore processor.
Multicore Computer
Structure
In general terms, the functional
elements of a core are:
Instruction logic: This includes the
tasks involved in fetching instructions,
and decoding each instruction to
determine the instruction operation and
the memory locations of any operands.
Arithmetic and logic unit (ALU):
Performs the operation specified by an
instruction.
Load/store logic: Manages the transfer
of data to and from main memory via
cache.
The Evolution of Computer Architecture
The evolution of computer architecture reflects the progression from
complex to simpler and more efficient instruction designs.
This evolution is centered around the ideas of CISC (Complex
Instruction Set Computing) and RISC (Reduced Instruction Set
Computing), with key innovations such as the Berkeley RISC I and
Overlapped Register Windows playing vital roles.
CISC (Complex Instruction Set
Computing)
Era: 1960s–1970s
Philosophy: Provide rich, complex instructions to reduce the number of instructions per
program.
Features:
Large instruction set (hundreds of instructions)
Variable instruction lengths
Instructions that combine multiple low-level operations (e.g., memory access and arithmetic)
Microcoded control unit
Examples: Intel x86, Digital Equipment Corporation VAX computer and the IBM 370
computer.
Problems with CISC:
Complex hardware to decode and execute instructions
Slower clock speeds due to complexity
Harder to pipeline due to variable instruction lengths
CISC Characteristics
The major characteristics of CISC architecture are:
1. A large number of instructions, typically from 100 to 250
2. Some instructions that perform specialized tasks and are used infrequently
3. A large variety of addressing modes, typically from 5 to 20 different modes
4. Variable-length instruction formats
5. Instructions that manipulate operands in memory
RISC (Reduced Instruction Set
Computing)
Era: 1980s onwards
Philosophy: RISC architecture is designed to reduce execution time
by simplifying the instruction set and using efficient hardware
techniques like pipelining.
Features:
Small, optimized instruction set
Fixed instruction length (commonly 32-bit)
Load/store architecture (only load/store can access memory)
Simple addressing modes
Emphasis on register usage
Efficient pipelining and compiler optimization
RISC Characteristics
The concept of RISC architecture involves an attempt to reduce execution time by
simplifying the instruction set of the computer. The major characteristics of a
RISC processor are:
1. Relatively few instructions
2. Relatively few addressing modes
3. Memory access limited to load and store instructions
4. All operations done within the registers of the CPU
5. Fixed-length, easily decoded instruction format
6. Single-cycle instruction execution
7. Hardwired rather than microprogrammed control
Overlapped Register Windows
Introduced in: Berkeley RISC and SPARC architectures
Problem Addressed: Function calls cause overhead due to saving and
restoring registers
Solution:
Divide registers into windows
On function call, shift to a new register window
Each window overlaps with the previous one, sharing parameters
Reduces memory traffic for saving/restoring registers
Benefits:
Faster function calls and returns
Less memory usage during nested calls
Overlapped Register Windows
Purpose and Motivation
Procedure calls and returns are frequent in high-level languages and involve
saving/restoring registers and passing parameters/results.
Traditional methods like memory stacks are time-consuming due to memory access
delays.
Overlapped register windows aim to reduce overhead and improve efficiency in
procedure calls.
Concept of Overlapped Register Windows
Each procedure call activates a new register window using a pointer.
Windows overlap with adjacent procedures to share registers for parameters/results
without copying.
Only one window is active at a time.
Overlapped Register Windows
Advantages
No need to save/restore register values during procedure calls.
Parameters passed automatically via overlapping registers.
Faster execution due to reduced memory access.
General Formulas
Organization of register windows will have the following relationships:
The number of registers available for each window is calculated as follows:
Window size = L + 2C + G
The total number of registers needed in the processor is:
Total registers in file = (L + C) × W + G
Where:
G = number of global registers
L = number of local registers in each window
C = number of registers common to two windows
W = number of windows
Overlapped Register
Windows
In the example of the figure we have G = 10,
L = 10, C = 6, and W = 4.
The window size is 10 + 12 + 10 = 32
registers, and the register file consists of
(10 + 6) × 4 + 10 = 74 registers.
Figure: Overlapped register windows.
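As a quick check of these formulas in code, here is a minimal Python sketch using the figure's parameters (G = 10, L = 10, C = 6, W = 4):

def window_size(L, C, G):
    return L + 2 * C + G          # locals + both overlap halves + globals

def total_registers(L, C, G, W):
    return (L + C) * W + G        # each window adds L locals and C overlap registers

print(window_size(10, 6, 10))          # 32
print(total_registers(10, 6, 10, 4))   # 74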
Overlapped Register
Windows
For an example, suppose that procedure A calls procedure B. Registers
R26 through R31 are common to both procedures, and therefore
procedure A stores the parameters for procedure B in these registers.
Procedure B uses local registers R32 through R41 for local variable
storage.
If procedure B calls procedure C, it will pass the parameters through
registers R42 through R47. When procedure B is ready to return at the end
of its computation, the program stores results of the computation in
registers R26 through R31 and transfers back to the register window of
procedure A.
Note that registers R10 through R15 are common to procedures A and D
because the four windows have a circular organization with A being
adjacent to D.
Overlapped Register
Windows
As mentioned previously, the 10 global registers R0 through R9 are
available to all procedures. Each procedure has available a total of 32
registers while it is active.
This includes 10 global registers, 10 local registers, six low overlapping
registers, and six high overlapping registers.
Other fixed size register window schemes are possible, and each may differ
in the size of the register window and the size of the total register file.
Berkeley RISC I
General Overview
Developed at University of California, Berkeley.
Among the first RISC architectures to demonstrate the benefits of the RISC
concept.
Implemented as a 32-bit integrated circuit CPU.
Architecture Features
32-bit:
Address bus
Data (supports 8-, 16-, or 32-bit data)
Instruction format
Instruction set: Only 31 instructions (simple, fast operations).
Addressing modes:
Register addressing
Immediate operand
Relative to PC (for branch instructions)
Berkeley RISC I
Register File and Windows
138 registers total:
10 global registers
8 register windows with 32 registers each
Each window includes local and overlapping registers (like the overlapped register
window model).
Only one 32-register window active at a time.
A 5-bit field is enough to select any register (2⁵ = 32).
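Plugging the RISC I parameters into the earlier window formulas reproduces both numbers above, assuming the per-window split is L = 10 local and C = 6 overlapping registers, as in the previous example:

print((10 + 6) * 8 + 10)   # total registers: (L + C) × W + G = 138
print(10 + 2 * 6 + 10)     # window size: L + 2C + G = 32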
Berkeley RISC I
Instruction Formats
All instructions are 32 bits wide.
Three types of instruction formats:
Register-to-register
Memory access
Branch and jump (19-bit relative address)
Opcode:
7 bits for the operation.
1 bit to indicate status flag update after ALU operations.
Operand fields:
Rd: Destination register (5 bits)
Rs: First source register
S2: Second source register or a 13-bit immediate constant (based on bit 13)
Memory access:
Rs contains base address
S2 is the offset
Berkeley RISC I
Special Features
Register R0 with all 0's (used to specify zero in any field).
COND field: Used in jump instructions to specify 1 of 16 branch conditions.
All instructions use a three-operand format.
Instruction Set
31 total instructions, categorized into:
1. Data manipulation (arithmetic, logic, shift)
2. Register transfer
3. Control flow
Second operand (S2) can be register or immediate (denoted by # in assembly).
Instruction Set of
Berkeley RISC I
Figure: Berkeley RISC I instruction formats.
Berkeley RISC I
Consider, for example, the ADD instruction and how it can be used to
perform a variety of operations.
ADD R22, R21, R23 R23 ← R22 + R21
ADD R22, #150, R23 R23 ← R22 + 150
ADD R0, R21, R22 R22 ← R21 (Move)
ADD R0, #150, R22 R22 ← 150 (Load Immediate)
ADD R22, #1, R22 R22 ← R22 + 1 (Increment)
Berkeley RISC I
The following are examples of load long instructions with different
addressing modes.
LDL (R22)#150, R5 R5 ← M[R22 + 150]
LDL (R22)#0, R5 R5 ← M[R22]
LDL (R0)#500, R5 R5 ← M[500]
Performance Assessment
Performance Assessment refers to the systematic evaluation of a
computer system’s efficiency, speed, and capability to execute
programs and tasks.
It involves analyzing various quantitative metrics that indicate how
well a computer system performs under specific conditions or workloads.
Clock Speed and Instructions Per
Second
Clock Speed:
Measured in Hertz (Hz), typically GHz today.
Indicates how many clock cycles occur per second.
Each instruction may take multiple clock cycles to execute.
Instructions Per Second (IPS):
Indicates how many instructions the CPU can execute per second.
Depends on both clock speed and CPI (Cycles Per Instruction).
Not always a reliable performance metric because different ISAs have
instructions of varying complexity.
Instruction Execution Rate
a. CPI (Cycles Per Instruction):
Average number of clock cycles each instruction takes.
Formula:
CPI = Total Clock Cycles / Total Instructions
Lower CPI generally implies better performance.
b. MIPS (Million Instructions Per Second):
Formula:
MIPS = Clock Speed (in MHz) / CPI
Does not account for instruction complexity, so it's not always a good comparison
metric across architectures.
Instruction Execution Rate
c. MFLOPS (Million Floating Point Operations Per Second):
Measures performance in scientific applications where floating point operations
dominate.
Formula:
MFLOPS = Number of FP operations / (Execution Time × 10⁶)
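To make these formulas concrete, here is a minimal Python sketch; the instruction count, cycle count, clock speed, and floating-point workload below are hypothetical values invented for illustration:

instructions = 40e6    # hypothetical: 40 million instructions executed
cycles = 80e6          # hypothetical: 80 million clock cycles consumed
clock_mhz = 400        # hypothetical: 400 MHz clock

cpi = cycles / instructions           # 2.0 cycles per instruction
mips = clock_mhz / cpi                # 200.0 MIPS

fp_ops = 10e6          # hypothetical: 10 million floating-point operations
exec_time = 0.2        # hypothetical: 0.2 s execution time
mflops = fp_ops / (exec_time * 1e6)   # 50.0 MFLOPS

print(cpi, mips, mflops)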
Benchmarks
Benchmarks are standardized tests or programs used to measure
and evaluate the performance of computer systems, components (like
CPUs, memory, or storage), or software.
They provide a quantitative basis for comparing the speed,
efficiency, and capabilities of different systems under consistent and
repeatable workloads.
Purpose of Benchmarks
To evaluate performance of hardware/software components.
To compare systems (e.g., two CPUs or GPUs).
To identify bottlenecks in processing, memory, or I/O.
To assist in optimization, design, and purchasing decisions
Types of Benchmarks
Type | Description | Example
Synthetic Benchmarks | Designed specifically to test certain features or workloads | LINPACK (floating-point), Dhrystone (integer)
Application Benchmarks | Real-world applications used for testing | Microsoft Word load time, Photoshop rendering
Kernel Benchmarks | Test specific portions of programs (e.g., loops, I/O) | Matrix multiplication kernel
Component Benchmarks | Focus on specific hardware (CPU, GPU, Disk, etc.) | PassMark, 3DMark, CrystalDiskMark
Performance Averages and Metrics
When comparing systems using multiple benchmarks, we need to use
mathematical averages. Choosing the right average is critical depending on
what you're measuring.
1. Arithmetic Mean (AM)
Used to average performance across several benchmarks.
Formula:
AM = (x₁ + x₂ + ⋯ + xₙ) / n
Example:
Execution times of a program on 3 systems: 2s, 3s, 5s.
AM = (2 + 3 + 5) / 3 ≈ 3.33 s
Not suitable for averaging rates like "speed" or "performance per unit
time".
Performance Averages and Metrics
2. Harmonic Mean (HM)
More appropriate for averaging rates (like CPI or execution time).
Formula:
HM = n / (1/x₁ + 1/x₂ + ⋯ + 1/xₙ)
Example:
MIPS values of a CPU on 3 tasks: 20, 30, and 40.
HM = 3 / (1/20 + 1/30 + 1/40) ≈ 27.7 MIPS
Why HM? It gives a more conservative average for rates, especially when one of
the rates is significantly lower.
Performance Averages and Metrics
3. Geometric Mean (GM)
Best for comparing relative performance across multiple benchmarks.
Formula:
GM = (x₁ × x₂ × ⋯ × xₙ)^(1/n)
Example:
Performance ratios of CPU A vs CPU B on 3 benchmarks: 2×, 4×, 0.5×.
GM = (2 × 4 × 0.5)^(1/3) = 4^(1/3) ≈ 1.59
So, CPU A is ~1.59 times faster than CPU B overall.
Why GM? It neutralizes the effect of outliers and is commonly used in SPEC
benchmarks.
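A minimal Python sketch computing all three averages on the example values from these slides:

from math import prod

times = [2, 3, 5]                             # execution times (s), AM example
am = sum(times) / len(times)                  # 3.33 s

rates = [20, 30, 40]                          # MIPS values, HM example
hm = len(rates) / sum(1 / r for r in rates)   # 27.69 MIPS

ratios = [2, 4, 0.5]                          # performance ratios, GM example
gm = prod(ratios) ** (1 / len(ratios))        # 1.587×

print(am, hm, gm)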
Performance Metric Types
These metrics categorize how performance is measured. Each is used in different
scenarios.
1. Speed Metric
Measures how fast a single task completes.
Focus: Time per task (lower is better).
Examples:
Execution Time: Program A takes 2 seconds.
CPI (Cycles Per Instruction): Lower CPI = better performance.
Speedup = Old Time / New Time
Use case: Measuring latency – e.g., "How fast can this CPU render a
frame?"
Example:
System A takes 5 seconds to sort a list; System B takes 2.5 seconds.
Speedup of B over A = 5 / 2.5 = 2×
Performance Metric Types
2. Rate Metric
Measures how many tasks are completed per unit time.
Focus: Throughput (higher is better).
Examples:
MIPS (Million Instructions Per Second)
MFLOPS (Million Floating Point Ops/Sec)
Requests/sec in a web server
Use case: Measuring throughput – e.g., "How many images can be processed per
second?"
Example:
CPU A executes 50 million instructions in 1 second → MIPS = 50.
CPU B executes 100 million in 1 second → MIPS = 100.
CPU B has higher throughput.
Amdahl’s Law
Amdahl’s Law gives the theoretical maximum speedup you can
achieve by improving or parallelizing a portion of a system, while the rest
remains unchanged (or serial).
Formula:
Speedup(N) = 1 / ((1 − f) + f/N)
Where:
f: Fraction of the program that can be parallelized.
(1 − f): Fraction that is inherently serial (cannot be parallelized).
N: Number of processors (or speedup factor applied to the parallel part).
Amdahl’s Law
Figure: Illustration of Amdahl’s Law
Amdahl’s Law
Even with infinite processors, the serial part limits the overall
speedup.
Max Speedup = 1 / (1 − f) as N → ∞
Example 1: Simple Calculation
Suppose:
80% of your program can be parallelized (f = 0.8)
20% is serial (1 − f = 0.2)
You use 4 processors ( N = 4)
Then:
Speedup(4) = 1 / ((1 − 0.8) + 0.8/4) = 1 / 0.4 = 2.5
So, the program runs 2.5 times faster with 4 processors.
Amdahl’s Law
Example 2: Infinite Processors
Using the same f = 0.8, if we assume infinite processors:
Speedup(∞) = 1 / (1 − 0.8) = 5
No matter how many processors you use, speedup will never exceed 5x.
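Both worked examples follow from a one-line Python function:

def amdahl_speedup(f, n):
    return 1 / ((1 - f) + f / n)   # serial part + sped-up parallel part

print(amdahl_speedup(0.8, 4))      # 2.5 (Example 1: four processors)
print(1 / (1 - 0.8))               # 5.0 (Example 2: limit as N → ∞)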
Speedup
Speedup measures how much faster a system performs after an
improvement or optimization (hardware, software, or algorithm).
Formula:
Speedup = Execution Time (Before) / Execution Time (After)
Example:
A program originally takes 10 seconds.
After optimizing the code or upgrading hardware, it now takes 4 seconds.
Speedup = 10 / 4 = 2.5
So, your optimization gives you a 2.5× speedup.
Speedup
Speedup from Multiple Enhancements (Extended Amdahl’s
Law)
If multiple parts are improved separately, Amdahl’s Law can be applied
repeatedly or in parts, such as:
Speedup_total = 1 / (P₁/S₁ + P₂/S₂ + ⋯ + Pₙ/Sₙ)
Where Pᵢ are fractions of time and Sᵢ are speedup factors of each part.
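A minimal sketch, assuming the fractions Pᵢ together cover the whole execution time (an unimproved portion gets Sᵢ = 1); the fractions and speedup factors below are hypothetical:

def total_speedup(parts):
    return 1 / sum(p / s for p, s in parts)   # parts: list of (P_i, S_i) pairs

# Hypothetical: 40% of time sped up 2×, 40% sped up 4×, 20% unimproved
print(total_speedup([(0.4, 2), (0.4, 4), (0.2, 1)]))   # 2.0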
Computer Components
Virtually all contemporary computer designs are based on
concepts developed by John von Neumann at the Institute
for Advanced Studies, Princeton. Such a design is referred to as
the von Neumann architecture and is based on three key
concepts:
1. Data and instructions are stored in a single read–write memory.
2. The contents of this memory are addressable by location, without
regard to the type of data contained there.
3. Execution occurs in a sequential fashion (unless explicitly modified)
from one instruction to the next.
Computer Components
1.Processor (CPU)
Performs computations and controls other parts of the computer.
Contains control unit, ALU (arithmetic logic unit), and registers.
2.Main Memory
Temporarily stores data and instructions during processing.
Volatile (data is lost when power is off).
3.I/O Modules
Allow the CPU to communicate with external devices like keyboards, mice,
printers, and disks.
4.System Interconnection
Mechanism for components to communicate.
Usually implemented as a bus or interconnect fabric.
Computer
Components
PC: Contains the address of the next
instruction pair to be fetched from memory
IR: Contains the 8-bit opcode instruction
being executed.
MAR: Specifies the address in memory for
next read or write operation
MBR: contains data to be written into
memory or receives the data from memory
I/O AR: Specifies particular I/O devices
I/O BR: Used for the exchange of data
between an I/O module and the CPU
Figure: Computer Components: Top-Level View
Computer Function
Instruction Fetch and Execute
Interrupt Handling
I/O function
Computer Function
The basic function performed by a computer is execution of a program,
which consists of a set of instructions stored in memory. The processor
does the actual work by executing instructions specified in the program.
In its simplest form, instruction processing consists of two steps:
Fetch: reads the instruction from memory
Execute: executes the fetched instruction
Figure: Basic Instruction Cycle
Computer Function
Fetch Cycle:
At the beginning of each instruction cycle, the processor fetches an
instruction from Memory pointed by a register.
In a typical microprocessor, the register is called Program Counter (PC)
which holds the address of the instruction to be fetched next.
The PC is incremented each time an instruction is fetched (unless told
otherwise)
The fetched instruction is loaded into the Instruction Register.
Computer Function
Execute Cycle
The instruction present in the IR register contains bits that specify the
action to be taken by the processor.
The processor interprets the instruction and performs required actions
such as:
Processor-Memory: Data transfer between processor and memory module.
Processor-I/O: Data may be transferred to or from peripheral devices.
Data processing: The processor may perform some arithmetic or logical
operations on data.
Control: An instruction may specify the sequence of execution be altered.
Example: Jump, Call, etc.
Computer Function
The diagram illustrates the Basic Instruction Cycle, consisting of two primary cycles: the Fetch
Cycle and the Execute Cycle. Here’s a stepwise explanation:
1. Start:
The CPU is initialized, and the Program Counter (PC) is set to the address of the first instruction to be
executed.
2. Fetch Cycle:
The CPU reads the next instruction from memory using the address in the Program Counter.
The fetched instruction is then stored in the Instruction Register (IR).
The Program Counter is incremented to point to the address of the next instruction to be fetched.
3. Execute Cycle:
The instruction in the Instruction Register is decoded and executed.
Depending on the type of instruction, the CPU performs data transfer, arithmetic/logic operations, or
control operations.
If the instruction is a branch or jump, the Program Counter may be modified to point to a different address.
4. Check for Halt Condition:
After executing an instruction, the CPU checks if the halt (HALT) instruction has been reached.
If not, it loops back to the Fetch Cycle to fetch the next instruction.
If the halt instruction is encountered, the cycle terminates, and the CPU stops executing further
instructions.
Instruction Fetch and Execute
Example of Program Execution
Consider an example of a hypothetical machine with a single data register, the
“Accumulator” (AC). Both instructions and data are 16 bits long. The first 4
bits of the instruction represent the opcode, which specifies the operation to
be performed. There can be as many as 2⁴ = 16 different opcodes, and up
to 2¹² = 4096 (4K) words of memory can be directly addressed.
Here:
Registers:
PC: Program Counter
AC: Accumulator
IR: Instruction Register
Partial list of opcodes:
0001: Load AC from memory
0010: Store AC to memory
0101: Add to AC from memory
Instruction Fetch and Execute
Figure: Characteristics of a Hypothetical Machine
Instruction Fetch and Execute
For example, consider a computer in which each instruction occupies one
16-bit word of memory.
Assume that the program counter is set to location 300. The processor will
next fetch the instruction at location 300. On succeeding instruction cycles,
it will fetch instructions from locations 301, 302, 303, and so on.
This sequence may be altered, as explained presently.
Instruction Fetch
and Execute
Figure: Example of Program
Execution (contents of memory
and registers in hexadecimal)
Instruction Fetch and Execute
Three instructions, which can be described as three fetch and three
execute cycles, are required:
1. The PC contains 300, the address of the first instruction. This instruction (the value
1940 in hexadecimal) is loaded into the instruction register IR and the PC is
incremented. Note that this process involves the use of a memory address register
(MAR) and a memory buffer register (MBR). For simplicity, these intermediate
registers are ignored.
2. The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is to be
loaded. The remaining 12 bits (three hexadecimal digits) specify the address (940)
from which data are to be loaded.
3. The next instruction (5941) is fetched from location 301 and the PC is incremented.
4. The old contents of the AC and the contents of location 941 are added and the result
is stored in the AC.
5. The next instruction (2941) is fetched from location 302 and the PC is incremented.
6. The contents of the AC are stored in location 941.
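The whole example can be reproduced with a short Python sketch of this hypothetical machine (the operand values 0003 at location 940 and 0002 at 941 follow the figure; the MAR and MBR are again ignored):

memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
          0x940: 0x0003, 0x941: 0x0002}    # program and data, addresses in hex

pc, ac = 0x300, 0
for _ in range(3):                         # three fetch-execute cycles
    ir = memory[pc]                        # fetch instruction into IR
    pc += 1                                # increment PC
    opcode, addr = ir >> 12, ir & 0x0FFF   # 4-bit opcode, 12-bit address
    if opcode == 0x1:                      # 0001: load AC from memory
        ac = memory[addr]
    elif opcode == 0x2:                    # 0010: store AC to memory
        memory[addr] = ac
    elif opcode == 0x5:                    # 0101: add to AC from memory
        ac = (ac + memory[addr]) & 0xFFFF

print(hex(memory[0x941]))                  # 0x5: 0003 + 0002 stored at 941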
Instruction Fetch and Execute
In this example, three instruction cycles, each consisting of a fetch cycle and an execute
cycle, are needed to add the contents of location 940 to the contents of 941. With a more
complex set of instructions, fewer cycles would be needed. Some older processors, for
example, included instructions that contain more than one memory address. Thus the
execution cycle for a particular instruction on such processors could involve more than
one reference to memory. Also, instead of memory references, an instruction may specify
an I/O operation.
For example, the PDP-11 processor includes an instruction, expressed symbolically as
ADD B,A, that stores the sum of the contents of memory locations B and A into memory
location A. A single instruction cycle with the following steps occurs:
Fetch the ADD instruction.
Read the contents of memory location A into the processor.
Read the contents of memory location B into the processor. In order that the contents of A
are not lost, the processor must have at least two registers for storing memory values,
rather than a single accumulator.
Add the two values.
Write the result from the processor to memory location A.
Instruction Cycle State Diagram
Figure: Instruction Cycle State Diagram
Instruction Cycle State Diagram
Instruction address calculation (iac): Determine the address of the next instruction
to be executed. Usually, this involves adding a fixed number to the address of the previous
instruction. For example, if each instruction is 16 bits long and memory is organized into
16-bit words, then add 1 to the previous address. If, instead, memory is organized as
individually addressable 8-bit bytes, then add 2 to the previous address.
Instruction fetch (if): Read instruction from its memory location into the processor.
Instruction operation decoding (iod): Analyze instruction to determine type of
operation to be performed and operand(s) to be used.
Operand address calculation (oac): If the operation involves reference to an operand
in memory or available via I/O, then determine the address of the operand.
Operand fetch (of): Fetch the operand from memory or read it in from I/O.
Data operation (do): Perform the operation indicated in the instruction.
Operand store (os): Write the result into memory or out to I/O.
Instruction Cycle State Diagram
The diagram illustrates the Instruction Cycle State Diagram, representing the
various stages through which a CPU processes an instruction. Here's a stepwise
explanation:
1. Instruction Fetch:
The CPU fetches the instruction from memory using the Program Counter (PC) address.
The instruction is then placed in the Instruction Register (IR).
2. Instruction Address Calculation:
The address of the next instruction is calculated and updated in the Program Counter (PC).
This step ensures that the next instruction can be fetched while the current instruction is being
decoded and executed.
3. Instruction Operation Decoding:
The fetched instruction is decoded to determine the operation to be performed.
This includes identifying the opcode and the operands involved.
4. Operand Address Calculation:
The addresses of the operands are calculated, especially if they are located in memory.
This step is crucial when dealing with indirect addressing modes.
Instruction Cycle State Diagram
5. Operand Fetch:
The CPU retrieves the necessary operands from memory or registers as specified by the
decoded instruction. If multiple operands are required, this step may be repeated.
6. Data Operation:
The actual operation is performed using the fetched operands.
This may involve arithmetic, logical, data transfer, or control operations.
7. Operand Store:
The result of the operation is stored back in memory or registers.
If multiple results are produced, they are stored sequentially.
8. Return to Fetch Cycle:
After storing the result, the cycle returns to the Instruction Fetch state to process the
next instruction.
Interrupts
Virtually all computers provide a mechanism by which other modules
(I/O, memory) may interrupt the normal processing of the processor.
Table below lists the most common classes of interrupts.
Interrupts
Mechanism by which other modules (e.g. I/O) may interrupt normal
sequence of processing
Classes of Interrupt: Program, Timer, I/O, Hardware failure
A program can generate an interrupt as a result of instruction execution, e.g., arithmetic overflow or division by zero
A timer within the processor can interrupt, e.g., to allow the operating system to perform functions such as multitasking
An I/O controller can interrupt to signal completion of an I/O operation
A hardware failure can generate an interrupt, e.g., a memory parity error
Interrupts
The user program performs a series of WRITE calls interleaved with processing.
Code segments 1, 2, and 3 refer to sequences of instructions that do not involve
I/O.
The WRITE calls are to an I/O program that is a system utility and that will
perform the actual I/O operation. The I/O program consists of three sections:
A sequence of instructions, labeled 4 in the figure, to prepare for the actual I/O
operation. This may include copying the data to be output into a special buffer and
preparing the parameters for a device command.
The actual I/O command. Without the use of interrupts, once this command is
issued, the program must wait for the I/O device to perform the requested function
(or periodically poll the device).The program might wait by simply repeatedly
performing a test operation to determine if the I/O operation is done.
A sequence of instructions, labeled 5 in the figure, to complete the operation. This
may include setting a flag indicating the success or failure of the operation.
Interrupt Handler
An interrupt signal is detected
The normal sequence of execution is
suspended
The interrupt-generating device is serviced
by the processor, which branches off to a
program called the interrupt handler
The original code execution sequence is
resumed from the point of suspension
Figure: Transfer of Control via Interrupts
Interrupts
Figure: Program Flow of Control without and with Interrupts
Interrupts and the Instruction Cycle
With interrupts, the processor can be engaged in executing other instructions
while an I/O operation is in progress.
Upon a WRITE system call, control transfers to the I/O program which executes
the I/O command and returns control to the user program, allowing the I/O
operation to proceed concurrently with user program execution.
When an external device is ready, its I/O module sends an interrupt request to
the processor, which invokes the corresponding interrupt handler by suspending
the current program, services the device, and then resumes normal execution.
From the user program’s perspective, an interrupt is a transparent pause in
execution handled entirely by the processor and OS, with execution resuming
automatically at the same point after interrupt processing.
Interrupts and the Instruction Cycle
To accommodate interrupts, an interrupt cycle is added to the instruction cycle as
shown in figure.
Figure: Instruction Cycle with Interrupts
Interrupts and the Instruction Cycle
In the interrupt cycle, the processor checks to see if any interrupts have
occurred, indicated by the presence of an interrupt signal.
If no interrupts are pending, the processor proceeds to the fetch cycle and
fetches the next instruction of the current program.
If an interrupt is pending then the CPU:
Suspends execution of the current program being executed
Saves context
Sets the program counter to the starting address of an interrupt handler routine.
Process the interrupt
Restore context and continue interrupted program.
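A minimal runnable sketch of this fetch-execute-interrupt loop; the toy instruction set, the handler address 100, and the single pending flag are illustrative assumptions, not from the slides:

memory = {0: "INC", 1: "INC", 2: "HALT", 100: "RETI"}   # 100 = assumed handler
pc, acc, running = 0, 0, True
pending_interrupt = True      # pretend a device has already raised a request
saved_pc = None

while running:
    opcode = memory[pc]                         # fetch cycle
    pc += 1
    if opcode == "INC":                         # execute cycle
        acc += 1
    elif opcode == "HALT":
        running = False
    elif opcode == "RETI":                      # handler done: restore context
        pc, pending_interrupt = saved_pc, False
    if pending_interrupt and opcode != "RETI":  # interrupt cycle
        saved_pc = pc                           # save context
        pc = 100                                # branch to interrupt handler

print(acc)   # 2: both INC instructions still ran despite the interruption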
Interrupts and the Instruction Cycle
1.Start: Initialization point where the processor prepares to begin fetching
instructions.
2.Fetch cycle: The processor reads the next instruction from memory
based on the program counter.
3.Execute cycle: The processor executes the fetched instruction while
interrupts are temporarily disabled to avoid disruption.
4.Interrupt cycle: Interrupts are enabled; the processor checks for any
pending interrupt requests and, if found, saves context and transfers
control to the interrupt handler.
5.HALT: Execution is stopped when a HALT instruction is encountered,
suspending all processing until reset or external intervention.
Program Timing:
Short I/O Wait
Figure: Program Timing: Short I/O Wait
Program Timing:
Long I/O Wait
Figure: Program Timing: Long I/O Wait
Instruction Cycle State Diagram, with
Interrupts
Figure: Instruction Cycle State Diagram, with Interrupts
Instruction Cycle State Diagram, with
Interrupts
This state diagram illustrates the instruction cycle of a typical
processor. The instruction cycle is the sequence of steps that the CPU
performs to fetch, decode, and execute an instruction.
Let’s walk through the diagram step-by-step:
1. Instruction Address Calculation
The CPU begins by calculating the address of the next instruction, typically held in the Program
Counter (PC).
This address is used to locate the next instruction to be executed.
2. Instruction Fetch
The instruction located at the address calculated is fetched from memory and loaded into the
Instruction Register (IR).
Instruction Cycle State Diagram, with
Interrupts
3. Instruction Operation Decoding
The fetched instruction is then decoded to understand what operation it specifies (e.g., addition,
subtraction, move).
This step identifies:
The operation to perform
The source and destination operands
Addressing modes
4. Operand Address Calculation
If the instruction requires data (operands), the address of the operand(s) is calculated.
This applies when operands are in memory (not in registers).
5. Operand Fetch
The operand(s) are fetched from their memory locations or registers.
This step may involve multiple operands, depending on the instruction.
Instruction Cycle State Diagram, with
Interrupts
6. Data Operation
The actual computation or data manipulation is performed.
Example operations: add, subtract, AND, OR, shift, etc.
7. Operand Address Calculation (Result Storage)
If the result needs to be stored, the destination address is calculated.
This is often a separate calculation if indirect or indexed addressing is used.
8. Operand Store
The results of the computation are stored at the destination address, either in memory or in a
register.
This step may involve multiple results, especially in vector or string operations.
9. Interrupt Check
The system checks for any pending interrupts (e.g., I/O completion, timer expiration).
If an interrupt is detected, control is transferred to the interrupt handler.
Instruction Cycle State Diagram, with
Interrupts
10. Interrupt (if any)
If an interrupt is present, the system services it.
Once completed, control typically returns to the instruction cycle.
11. Return Paths
If there are no interrupts, the cycle returns to Instruction Address Calculation for the next
instruction.
For string or vector data, the processor may return to Data Operation to process the next
element in the set.
If instruction completes, it fetches the next instruction.
Multiple Interrupts
Two approaches can be taken for dealing with multiple interrupts.
1. Disable interrupt
Ignore further interrupts while processing one interrupt
Interrupts remain pending and are checked after first interrupt has been processed
Interrupts handled in sequence as they occur
2. Define priorities
Low-priority interrupts can be interrupted by higher-priority interrupts
When the higher-priority interrupt has been processed, the processor returns to the
previous interrupt
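A minimal sketch of the priority approach; the device names anticipate the printer/communications/disk example on the following slides, and the numeric priority levels are assumptions:

PRIORITY = {"printer": 2, "disk": 3, "comms": 4}   # higher number = higher priority

stack = []                    # saved contexts of interrupted routines
current, current_level = "user program", 0

def interrupt(device):
    global current, current_level
    if PRIORITY[device] > current_level:         # preempt lower-priority work
        stack.append((current, current_level))   # save context
        current, current_level = device + " ISR", PRIORITY[device]
    # else: the request stays pending until the current ISR finishes

interrupt("printer")    # user program → printer ISR
interrupt("comms")      # printer ISR → comms ISR (higher priority)
interrupt("disk")       # held: disk (3) < comms (4)
print(current, stack)   # comms ISR [('user program', 0), ('printer ISR', 2)]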
Transfer of Control with Multiple
Interrupts
Transfer of Control with Multiple
Interrupts
Transfer of Control with Multiple
Interrupts
Figure: Example Time Sequence of Multiple Interrupts
Time Sequence of Multiple Interrupts
Handling
1. t = 0: User program starts execution.
2. t = 10 – Printer Interrupt: A printer interrupt occurs. The user program state is saved on the stack, and execution switches to the Printer Interrupt Service Routine (ISR).
3. t = 15 – Communications Interrupt: A communications interrupt occurs with higher priority than the printer. The Printer ISR state is pushed onto the stack, and execution switches to the Communications ISR.
4. t = 20 – Disk Interrupt: A disk interrupt occurs, but it has lower priority than the Communications ISR, so it is held until the Communications ISR completes.
5. t = 25 – Completion of Communications ISR: The Communications ISR finishes. The processor restores the state of the Printer ISR, but before execution can resume, the pending disk interrupt is recognized.
6. t = 25 – Disk Interrupt Handling: The Disk ISR is executed, as it has higher priority than the Printer ISR.
7. t = 35 – Completion of Disk ISR: The Disk ISR finishes, and control returns to the Printer ISR.
8. t = 40 – Completion of Printer ISR: The Printer ISR finishes, and control returns to the user program.
Computer Function: I/O Function
The I/O function in a computer system allows data exchange between the
processor and external devices through I/O modules (e.g., disk
controllers). The processor can initiate read or write operations with I/O
modules similarly to memory operations, by specifying the device address.
There are two primary modes of data exchange:
1. Processor-Controlled I/O: The processor directly reads from or writes to the
I/O module, identifying specific devices for data transfer. This is similar to
memory-referencing instructions.
2. Direct Memory Access (DMA): The processor grants control to an I/O module
to perform data transfers directly between memory and I/O devices without
processor intervention. This allows data exchange to occur concurrently with other
CPU tasks, optimizing processing efficiency.
DMA significantly reduces the processor’s involvement in data transfers,
enhancing overall system performance.
Interconnection Structures
A computer consists of a set of components or modules of three basic types
(Processor, memory, I/O) that communicate with each other. There must
be a path for connecting these modules.
The collection of paths connecting the various modules is called the
interconnection structure. The design of this structure will depend on the
exchanges that must be made among modules.
Different types of connections are required for different types of modules
CPU Module
Memory Module
I/O Module
Interconnection Structures
A computer system comprises three core components: processor, memory, and
input/output (I/O) modules. These components must communicate with one
another, and the infrastructure that enables this communication is known as
the interconnection structure.
It acts as the network of pathways through which data and control
signals are exchanged between components.
Interconnection Structures
Figure: Computer Modules
Interconnection Structures
Components and Their Roles
1. Memory Module
Stores data in N words of equal length, each with a unique address.
Supports read and write operations.
Uses address and control signals to specify the operation and its target.
2. I/O Module
Manages interaction with external devices.
Like memory, it supports read and write operations.
Each external device is accessed via a port with a unique address.
I/O modules provide external data paths for device communication and may issue
interrupts to the processor.
3. Processor
Executes instructions, reads/writes data, and controls system operations using
control signals.
Receives interrupts to handle external or internal events.
Interconnection Structures
The interconnection structure must support the following types
of transfers:
1. Memory to Processor: Fetching instructions or data.
2. Processor to Memory: Writing processed data back to memory.
3. I/O to Processor: Processor reads data from an external device.
4. Processor to I/O: Processor sends data to an external device.
5. I/O to/from Memory: Data exchange directly between memory and I/O devices
using Direct Memory Access (DMA), bypassing the processor.
Bus Interconnection
Figure: Bus Interconnection Scheme
Bus Interconnection
1. Definition and Role
A bus is a shared communication pathway connecting multiple devices
in a computer system.
It allows data transmission between components like the CPU,
memory, and I/O devices.
Only one device can transmit at a time to avoid data collisions.
Traditionally dominant in system design, but now more common in
embedded systems (e.g., microcontrollers) than in high-
performance computers, which use point-to-point
interconnections.
Bus Interconnection
2. Structure of a Bus
A bus consists of multiple lines, each capable of transmitting binary data (1s
and 0s).
Buses can transfer data serially (one bit at a time) or in parallel
(multiple bits simultaneously).
An 8-bit bus has 8 lines and can send 8 bits in one operation.
3. Types of System Buses
A System Bus connects major components (CPU, memory, I/O devices).
It typically includes 50–100+ separate lines.
Lines are grouped into three functional categories:
1) Data Lines (Data Bus)
2) Address Lines (Address Bus)
3) Control Lines (Control Bus)
Bus Interconnection
4. Functional Groups
Data Bus
Carries actual data between components.
Width (e.g., 32, 64, 128 bits) determines how much data can be transferred at once.
Wider buses improve performance (e.g., transferring a 64-bit instruction in one vs.
two cycles).
Address Bus
Specifies the source or destination address of data on the data bus.
Width determines maximum memory capacity (e.g., 32-bit address bus supports
2³² = 4 GB memory).
Also used to address I/O ports.
Control Bus
Manages access and use of data and address lines.
Transmits command and timing signals to coordinate operations.
Bus Interconnection
5. Common Control Signals
Signal Function
Memory Read Reads data from a specified memory location.
Memory Write Writes data to a specified memory location.
I/O Read Reads data from an I/O port.
I/O Write Writes data to an I/O port.
Transfer ACK Confirms successful data transfer.
Bus Request Requests control of the bus.
Bus Grant Grants bus access to the requester.
Interrupt Request Signals an interrupt has occurred.
Interrupt ACK Acknowledges the interrupt request.
Clock Synchronizes all bus operations.
Reset Initializes the system.
Bus Interconnection
6. Bus Operation Procedure
To send data:
1. A module requests control of the bus.
2. Once granted, it sends data over the data lines.
To request data:
1. A module requests bus control.
2. Sends a read request over control/address lines.
3. Waits for the target module to place the requested data on the data bus.
Bus Interconnection
A bus is a communication pathway connecting two or more devices. Its key
characteristic is that it is a shared transmission medium: signals transmitted
by any one device are available for reception by all other devices attached to
the bus. If two devices transmit during the same time period, their signals
will overlap and become garbled.
A bus typically consists of multiple communication lines, each capable of
transmitting signals representing binary 1 and binary 0. Computer systems
contain a number of different buses that provide pathways between components
at various levels of the computer system hierarchy.
System bus: a bus that connects major computer components (processor, memory,
I/O). The most common computer interconnection structures are based on the
use of one or more system buses.
The interconnection structure must support the following types of transfers:
Memory to processor: The processor reads an instruction or a unit of data from memory.
Processor to memory: The processor writes a unit of data to memory.
I/O to processor: The processor reads data from an I/O device via an I/O module.
Processor to I/O: The processor sends data to the I/O device.
I/O to or from memory: An I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access.
PCI (Peripheral Component
Interconnect)
1. General Overview
PCI is a high-bandwidth, processor-independent bus used for connecting I/O subsystems
like:
Graphic display adapters
Network interface controllers
Disk controllers
Functions as both a mezzanine and peripheral bus.
Delivers better system performance than earlier bus standards.
2. Speed and Data Capacity
Supports up to 64 data lines at 66 MHz.
Maximum raw data transfer rate:
528 MB/s or 4.224 Gbps.
Speed is not the only benefit; PCI is also cost-effective and requires fewer chips for
implementation.
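The quoted figures follow directly from the bus width and clock rate; a quick arithmetic check:

bits_per_second = 64 * 66e6       # 64 lines × 66 MHz = 4.224 Gbps
print(bits_per_second / 1e9)      # 4.224 (Gbps)
print(bits_per_second / 8 / 1e6)  # 528.0 (MB/s)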
PCI (Peripheral Component
Interconnect)
3. Development and Compatibility
Developed by Intel in 1990 for Pentium systems.
Intel released the patents to the public domain to encourage
adoption.
Formation of the PCI Special Interest Group (PCI SIG) to:
Develop the standard further
Maintain compatibility
Widely used in PCs, workstations, and servers.
Open specification allows products from different vendors to be
interoperable.
PCI (Peripheral Component
Interconnect)
Figure:
Example PCI
Configurations
- Typical
desktop system
PCI (Peripheral Component
Interconnect)
Figure:
Example PCI
Configurations
- Typical server
system
PCI (Peripheral Component
Interconnect)
4. Architectural Flexibility
Supports single and multiple-processor systems.
Uses synchronous timing and centralized arbitration to manage access and coordination.
5. System Integration
In a single-processor system:
A combined DRAM controller and PCI bridge connects the processor to the PCI bus.
The bridge acts as a data buffer, decoupling PCI speed from processor I/O speed.
In a multiprocessor system:
Multiple PCI buses can be linked via bridges.
The system bus connects processors, main memory, and PCI bridges only.
Bridges allow high-speed data transfer while keeping PCI independent of processor speed.
PCI (Peripheral Component
Interconnect)
6. Bus Structure and Signal Lines
PCI supports 32- or 64-bit configurations.
Contains 49 mandatory signal lines, grouped into:
1. System Pins
Handle clock and reset functions.
2. Address and Data Pins
32 lines used for time-multiplexed addresses and data.
Additional lines validate and interpret these signals.
3. Interface Control Pins
Manage timing and coordination between devices (initiators and targets).
4. Arbitration Pins
Each PCI master has its own pair of arbitration lines.
Connects directly to the PCI arbiter, unlike shared signal lines.
5. Error Reporting Pins
Report parity errors and other faults.
PCI (Peripheral Component
Interconnect)
Bus Structure
PCI Commands
Data Transfers
PCI Arbitration
PCI Bus Arbitration is the process by which the PCI bus decides which device
gets control of the bus when multiple devices request it simultaneously.
Assignment#1
Q1) Explain Brief History of Computer Generations.
Q2) Explain the Evolution of the Intel x86 and ARM Architecture.
Q3) Explain PCI Arbitration.
Q4) Compare between PCI and PCIe.
Thank You