CSC303: Computer Organization and Architecture
TOPICS COVERED
Course schedule
Course objectives
Course resources and textbooks
Grading system
Introduction
0
General information
Course: Computer Organization and Architecture
Instructors: Nyanga Bernard Y.; Tabo Delphine F.
Lecture time: Tuesday 7:00-9:00 & Friday 13:00-15:00
Online Resources:
https://www.bu.edu.eg/staff/abdelwahabalsammak3-courses/10596/files;
http://www.eecs.harvard.edu/~dbrooks/cs146-spring2004/
1
Course Objective
Describe the general organization and architecture of
computers.
Identify computers’ major components and study their
functions.
Introduce hardware design of modern computer
architectures.
Learn assembly language programming.
2
Textbooks
Introduction to Computer Architecture and Organisation 2nd
Ed by Harold Lorin
Computer Organisation and Architecture by Morris Mano
REFERENCES (Web): www.freebookcentre.net, www.freetechbooks.com
3
Grading System
• Continuous Assessment (30%): two tests and/or projects/assignments, and
• End-of-Semester Examination (70%)
Course Objectives:
• The main objective of this course is to introduce students to the operation of a digital computer, its hardware components and their basic building blocks.
• Students will be introduced to the electronic components of the different hardware units of a computer system and to how they are programmed to function.
4
Content Coverage
What is a computer?
A computer is a sophisticated electronic calculating machine
that:
Accepts input information,
Processes the information according to a list of internally
stored instructions and
Produces the resulting output information.
Functions performed by a computer are:
Accepting information to be processed as input.
Storing a list of instructions to process the information.
Processing the information according to the list of
instructions.
Providing the results of the processing as output.
What are the functional units of a computer?
6
What is Computer Organization and
Architecture?
It is the study of the internal working, structuring and implementation of a computer system.
Architecture of a computer system refers to the externally visible attributes of the system.
Organization of a computer system is the realization of the architectural specifications of that system.
7
Overview
Computer organization
Physical aspects of computer systems.
E.g., circuit design, control signals, memory types.
How does a computer work?
Computer architecture
Logical aspects of the system as seen by the programmer.
E.g., instruction sets, instruction formats, data types,
addressing modes.
How do I design a computer?
8
Application Areas
General-Purpose Laptop/Desktop
Productivity, interactive graphics, video, audio
Optimize price-performance
Examples: Intel Pentium 4, AMD Athlon XP
Embedded Computers
PDAs, cell-phones, sensors => Price, Energy efficiency
Examples: Intel XScale, StrongARM (SA-110)
Game Machines, Network uPs => Price-Performance
Examples: Sony Emotion Engine, IBM 750FX
9
Application Areas
Commercial Servers
Database, transaction processing, search engines
Performance, Availability, Scalability
Server downtime could cost a brokerage company more
than $6M/hour
Examples: Sun Fire 15K, IBM p690, Google Cluster
Scientific Applications
Protein Folding, Weather Modeling, CompBio, Defense
Floating-point arithmetic, Huge Memories
Examples: IBM DeepBlue,
BlueGene, Cray T3E, etc.
10
von Neumann Architecture /Harvard Architecture
von Neumann Architecture
Instructions and data share a single memory
Used in modern computers
Harvard Architecture
Instructions and data are held in separate memories
Used in modern computers at the lowest cache level (separate instruction and data caches)
11
Functional units of a computer
[Block diagram: Input and Output units connected to the Memory and to the Processor; the Processor contains the Arithmetic & Logic unit and the Control unit; the Memory holds instructions (Instr1, Instr2, Instr3, …) and data (Data1, Data2, …).]
Input unit accepts information from human operators, electromechanical devices (e.g., a keyboard) or other computers.
Memory stores information: instructions and data.
Arithmetic and logic unit (ALU) performs the desired operations on the input information, as determined by the instructions in the memory.
Output unit sends the results of processing (data) to a monitor display, a printer, etc.
Control unit coordinates the various actions: input, output and processing.
12
Information in a computer -- Instructions
Instructions specify commands to:
Transfer information within a computer (e.g., from memory to
ALU)
Transfer of information between the computer and I/O devices
(e.g., from keyboard to computer, or computer to printer)
Perform arithmetic and logic operations (e.g., Add two numbers,
Perform a logical AND).
A sequence of instructions to perform a task is called a
program, which is stored in the memory.
Processor fetches instructions that make up a program from
the memory and performs the operations stated in those
instructions.
What do the instructions operate upon?
13
Information in a computer -- Data
Data are the “operands” upon which instructions operate.
Data could be:
Numbers,
Encoded characters.
Data, in a broad sense, means any digital information.
Computers use data that is encoded as a string of binary
digits called bits.
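For example, the decimal number 25 is encoded as the bit string 11001, and the character ‘A’ is encoded in 8-bit ASCII as 01000001.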
14
Input unit
Binary information must be presented to a computer in a specific format. This
task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.
[Diagram: real-world input devices (keyboard, audio input, …) feed the Input Unit, which passes binary information to the computer's Memory and Processor.]
15
Memory unit
Memory unit stores instructions and data.
Recall, data is represented as a series of bits.
To store data, memory unit thus stores bits.
Processor reads instructions and reads/writes data from/to
the memory during the execution of a program.
In theory, instructions and data could be fetched one bit at a
time.
In practice, a group of bits is fetched at a time.
A group of bits stored or retrieved at a time is termed a “word”.
The number of bits in a word is termed the “word length” of the computer.
In order to read/write to and from memory, a processor
should know where to look:
“Address” is associated with each word location.
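As a minimal sketch of these ideas (the array name, memory size and values below are illustrative assumptions, not taken from the slides), a memory unit with a 32-bit word length can be modelled in C as an array of words indexed by a word address:

#include <stdio.h>
#include <stdint.h>

#define NUM_WORDS 1024              /* memory size in words (illustrative)   */
static uint32_t memory[NUM_WORDS];  /* each element is one 32-bit word       */

int main(void) {
    uint32_t address = 5;           /* an address identifies a word location */
    memory[address] = 2025;         /* write one word                        */
    printf("word at address %u = %u\n",
           (unsigned)address, (unsigned)memory[address]);
    return 0;
}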
16
Memory unit (contd..)
Processor reads/writes to/from memory based on the
memory address:
Access any word location in a short and fixed amount of time
based on the address.
Random Access Memory (RAM) provides fixed access time
independent of the location of the word.
Access time is known as “Memory Access Time”.
Memory and processor have to “communicate” with each
other in order to read/write information.
In order to reduce “communication time”, a small amount of
RAM (known as Cache) is tightly coupled with the processor.
Modern computers have three to four levels of RAM units with
different speeds and sizes:
Fastest, smallest known as Cache
Slowest, largest known as Main memory.
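As a rough, purely illustrative comparison (typical orders of magnitude, not figures from this course): caches hold kilobytes to a few megabytes and respond in about one to a few nanoseconds, while main memory holds gigabytes with access times of tens of nanoseconds.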
17
Memory unit (contd..)
Primary storage of the computer consists of RAM units.
Fastest, smallest unit is Cache.
Slowest, largest unit is Main Memory.
Primary storage is insufficient to store large amounts of
data and programs.
Primary storage can be added, but it is expensive.
Store large amounts of data on secondary storage devices:
Magnetic disks and tapes,
Optical disks (CD-ROMS).
Access to data stored in secondary storage is slower, but secondary storage takes advantage of the fact that some information may be accessed infrequently.
The cost of a memory unit depends on its access time; a shorter access time implies a higher cost.
18
Arithmetic and logic unit (ALU)
Operations are executed in the Arithmetic and Logic Unit
(ALU).
Arithmetic operations such as addition, subtraction.
Logic operations such as comparison of numbers.
In order to execute an instruction, operands need to be
brought into the ALU from the memory.
Operands are stored in general purpose registers available in
the ALU.
Access times of general-purpose registers are shorter than that of the cache.
Results of the operations are stored back in the memory or
retained in the processor for immediate use.
19
Output unit
•Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.
[Diagram: the computer's Memory and Processor pass results to the Output Unit, which drives real-world output devices (printer, graphics display, speakers, …).]
20
Control unit
Operation of a computer can be summarized as:
Accepts information from the input units (Input unit).
Stores the information (Memory).
Processes the information (ALU).
Provides processed results through the output units (Output
unit).
Operations of Input unit, Memory, ALU and Output unit are
coordinated by Control unit.
Instructions control “what” operations take place (e.g. data
transfer, processing).
Control unit generates timing signals which determine “when” a particular operation takes place.
21
How are the functional units connected?
•For a computer to carry out its operations, the functional units need to communicate with each other.
•In order to communicate, they need to be connected.
[Diagram: Input, Output, Memory and Processor units all attached to a single bus.]
•Functional units may be connected by a group of parallel wires.
•The group of parallel wires is called a bus.
•Each wire in a bus can transfer one bit of information.
•The number of parallel wires in a bus is equal to the word length of the computer.
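•For example, in a computer with a 32-bit word length, such a bus would consist of 32 parallel wires, so one complete word can be transferred at a time.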
22
Organization of cache and main memory
[Diagram: Processor with a Cache memory, connected over the bus to the Main memory.]
Why is the access time of the cache memory less than the access time of the main memory?
23
Computer Components: Top-Level View
Basic Operational Concepts
A Partial Program Execution Example
A Partial Program Execution Example (contd..)
Interrupt
Normal execution of programs may be interrupted if some
device requires urgent servicing
To deal with the situation immediately, the normal execution of the
current program must be interrupted
Procedure of interrupt operation
The device raises an interrupt signal
The processor provides the requested service by executing an
appropriate interrupt-service routine
The state of the processor is first saved before servicing the
interrupt
• Normally, the contents of the PC, the general registers, and some
control information are stored in memory
When the interrupt-service routine is completed, the state of the
processor is restored so that the interrupted program may
continue
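The C sketch below is a minimal software illustration of this save/service/restore sequence (the structure fields and routine names are assumptions made for the example, not any real processor's interface):

#include <stdio.h>
#include <string.h>

/* Illustrative processor state saved before servicing an interrupt */
struct cpu_state {
    unsigned int pc;        /* program counter   */
    unsigned int regs[4];   /* general registers */
};

static void interrupt_service_routine(void) {
    printf("servicing the device request\n");
}

int main(void) {
    struct cpu_state current = { 100, { 1, 2, 3, 4 } };
    struct cpu_state saved;

    memcpy(&saved, &current, sizeof current);   /* save the state in memory */
    interrupt_service_routine();                /* execute the ISR          */
    memcpy(&current, &saved, sizeof saved);     /* restore the saved state  */

    printf("resuming the interrupted program at PC = %u\n", current.pc);
    return 0;
}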
Classes of Interrupts
Program
Generated by some condition that occurs as a result of an
instruction execution such as arithmetic overflow, division
by zero, attempt to execute an illegal machine instruction,
or reference outside a user’s allowed memory space
Timer
Generated by a timer within the processor. This allows the
operating system to perform certain functions on a regular
basis
I/O
Generated by an I/O controller, to signal normal completion
of an operation or to signal a variety of error conditions
Hardware failure
Generated by a failure such as power failure or memory
parity error
Bus Structures
A group of lines that serves as a connecting path for several devices is called a bus
In addition to the lines that carry the data, the bus must
have lines for address and control purposes
The simplest way to interconnect the functional units is to use a single bus, as in the single-bus structure shown earlier
Drawbacks of the Single Bus Structure
The devices connected to a bus vary widely in their speed of
operation
Some devices are relatively slow, such as printers and keyboards
Some devices are considerably faster, such as optical disks
Memory and processor units are the fastest parts of a computer
An efficient transfer mechanism is thus needed to cope with this problem
A common approach is to include buffer registers with the devices to hold the information during transfers
Another approach is to use a two-bus structure and an additional transfer mechanism
• A high-performance bus, a low-performance bus, and a bridge for transferring data between the two buses. The AMBA bus belongs to this structure
Software
In order for a user to enter and run an application
program, the computer must already contain some system
software in its memory
System software is a collection of programs that are
executed as needed to perform functions such as
Receiving and interpreting user commands
Running standard application programs, such as word processors or games
Managing the storage and retrieval of files in
secondary storage devices
Controlling I/O units to receive input information and
produce output results
Software
Translating programs from source form prepared by
the user into object form consisting of machine
instructions
Linking and running user-written application programs
with existing standard library routines, such as
numerical computation packages
System software is thus responsible for the
coordination of all activities in a computing system
Operating System
Operating system (OS)
This is a large program, or actually a collection of routines,
that is used to control the sharing of and interaction among
various computer units as they perform application programs
The OS routines perform the tasks required to assign computer resources to individual application programs
These tasks include assigning memory and magnetic disk
space to program and data files, moving data between
memory and disk units, and handling I/O operations
In the following, a system with one processor, one disk, and one
printer is given to explain the basics of OS
Assume that part of the program’s task involves reading a
data file from the disk into the memory, performing some
computation on the data, and printing the results
User Program and OS Routine Sharing
Multiprogramming or Multitasking
Performance
The speed with which a computer executes programs
is affected by the design of its hardware and its
machine language instructions
Because programs are usually written in a high-level
language, performance is also affected by the
compiler that translates programs into machine language
For best performance, the following factors must be
considered
Compiler
Instruction set
Hardware design
Performance
Processor circuits are controlled by a timing signal
called a clock
The clock defines regular time intervals, called clock cycles
To execute a machine instruction, the processor
divides the action to be performed into a sequence of
basic steps, such that each step can be completed in
one clock cycle
Let P be the length of one clock cycle; its inverse is the clock rate, R = 1/P
Basic performance equation
T = (N x S) / R, where T is the processor time required to execute a program, N is the number of instruction executions, and S is the average number of basic steps needed to execute one machine instruction
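As a worked example with illustrative numbers (not taken from the slides): for N = 500 million instruction executions, S = 4 basic steps per instruction on average, and a clock rate R = 2 GHz (i.e., P = 0.5 ns), the equation gives T = (500 x 10^6 x 4) / (2 x 10^9) = 1 second of processor time.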
Performance Improvement
Pipelining and superscalar operation
Pipelining: overlapping the execution of successive instructions (a rough illustration follows below)
Superscalar: different instructions are concurrently
executed with multiple instruction pipelines. This means that
multiple functional units are needed
Clock rate improvement
Improving the integrated-circuit technology makes
logic circuits faster, which reduces the time needed
to complete a basic step
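A rough illustration of pipelining (idealized, ignoring hazards and stalls): if each instruction takes 4 basic steps and a new instruction can be started every clock cycle, then 1000 instructions finish in about 1000 + 3 = 1003 cycles instead of 4 x 1000 = 4000 cycles, nearly a 4x improvement.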
Performance Improvement
Reducing the amount of processing done in one basic step also makes it possible to reduce the clock period, P.
However, if the actions that have to be performed
by an instruction remain the same, the number of
basic steps needed may increase
Reduce the number of basic steps to execute
Reduced instruction set computers (RISC) and complex
instruction set computers (CISC)
Instruction Set Architecture
“Instruction Set Architecture is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine.” (IBM, Introducing the IBM 360, 1964)
The ISA defines:
Operations that the processor can execute
Data Transfer mechanisms + how to access data
Control Mechanisms (branch, jump, etc)
“Contract” between programmer/compiler + HW
41
Classifying ISAs
42
Stack
Architectures with implicit “stack”
Acts as source(s) and/or destination, TOS is implicit
Push and Pop operations have 1 explicit operand
Example: C = A + B
Push A   // S[++TOS] = Mem[A]
Push B   // S[++TOS] = Mem[B]
Add      // Tem1 = S[TOS--], Tem2 = S[TOS--], S[++TOS] = Tem1 + Tem2
Pop C    // Mem[C] = S[TOS--]
x86 FP uses stack (complicates pipelining)
43
Accumulator
Architectures with one implicit register
Acts as source and/or destination
One other source explicit
Example: C = A + B
Load A // (Acc)umulator <= A
Add B // Acc <= Acc + B
Store C // C <= Acc
Accumulator implicit, bottleneck?
x86 uses accumulator concepts for integer
44
Register
Most common approach
Fast, temporary storage (small)
Explicit operands (register IDs)
Example: C = A + B
Register-memory version:
Load R1, A
Add R3, R1, B
Store R3, C
Load/store version:
Load R1, A
Load R2, B
Add R3, R1, R2
Store R3, C
All RISC ISAs are load/store
IBM 360, Intel x86, Moto 68K are register-memory
45
Common Addressing Modes
Base/Displacement   Load R4, 100(R1)     // R4 <= Mem[100 + R1]
Register Indirect   Load R4, (R1)        // R4 <= Mem[R1]
Indexed             Load R4, (R1+R2)     // R4 <= Mem[R1 + R2]
Direct              Load R4, (1001)      // R4 <= Mem[1001]
Memory Indirect     Load R4, @(R3)       // R4 <= Mem[Mem[R3]]
Autoincrement       Load R4, (R2)+       // R4 <= Mem[R2], then R2 <= R2 + d
Scaled              Load R4, 100(R2)[R3] // R4 <= Mem[100 + R2 + R3 * d], d = operand size
46
What leads to a good/bad ISA?
Ease of Implementation (Job of Architect/Designer)
Does the ISA lend itself to efficient implementations?
Ease of Programming (Job of Programmer/Compiler)
Can the compiler use the ISA effectively?
Future Compatibility
ISAs may last 30+yrs
Special Features, Address range, etc. need to be thought
out
47
Implementation Concerns
Simple Decoding (fixed length)
Compactness (variable length)
Simple Instructions (no load/update)
Things that get microcoded these days
Deterministic Latencies are key!
Instructions with multiple exceptions are difficult
More/Less registers?
Slower register files, decoding, better compilers
Condition codes/Flags (scheduling!)
48
ISA Compatibility
“In Computer Architecture, no good idea ever goes unpunished.”
Marty Hopkins, IBM Fellow
Never abandon existing code base
Extremely difficult to introduce a new ISA
Alpha failed, IA64 is struggling, best solution may not
win
x86 most popular, is the least liked!
Hard to think ahead, but…
ISA tweak may buy 5-10% today
10 years later it may buy nothing, but must be
implemented
• Register windows, delay branches
49
CISC vs. RISC
Debate raged from early 80s through 90s
Now it is fairly irrelevant
Despite this Intel (x86 => Itanium) and DEC/Compaq (VAX =>
Alpha) have tried to switch
Research in the late 70s/early 80s led to RISC
IBM 801 -- John Cocke – mid 70s
Berkeley RISC-1 (Patterson)
Stanford MIPS (Hennessy)
50
RISC vs. CISC Arguments
• RISC
Simple Implementation
• Load/store, fixed-format 32-bit instructions,
efficient pipelines
Lower CPI
Compilers do a lot of the hard work
• MIPS = Microprocessor without Interlocked Pipelined
Stages
CISC
Simple Compilers (assists hand-coding, many addressing
modes, many instructions)
Code Density
51
After the dust settled
Turns out it doesn’t matter much
Can decode CISC instructions into internal “micro-ISA”
This takes a couple of extra cycles (PLA implementation)
and a few hundred thousand transistors
In 20-stage pipelines and 55-million-transistor processors, this is minimal
Pentium 4 caches these micro-Ops
Actually may have some advantages
External ISA for compatibility, internal ISA can be
tweaked each generation (Transmeta)
52