Advanced Computer Architecture

CS-522

MS – Computer Science

Credit Hours : 3-0

Dr. Shahid Latif (Associate Professor)

Department of Computer Science & IT


Sarhad University of Science and Information Technology, Peshawar
Course Details
Course title/code: Adv. Computer Architecture/CS-522

Lecture: 02

Topic: Computer Evolution and Performance

Designing for Performance

Program: MS – Computer Science (Sem: 1st, 2nd, 3rd & 4th)

Lecture Outlines
• A Brief History of Computers
• First Generation: Vacuum Tubes
• ENIAC
• John von Neumann (IAS) machine
• Commercial computers
• Second Generation: Transistors
• Third Generation: Integrated Circuits (ICs)
• Integrated Circuits, Generations of Computer, Moore’s Law
• Later Generations
• Semiconductor memory & Microprocessors
• Designing for Performance
• Microprocessor speed
• Performance balance
• Handling I/O devices
• Improvements in Chip Organization and Architecture
• Diminishing returns
• New approach: Multiple Core
• Intel Product Evolution

Computer Evolution and Performance

First Generation: Vacuum Tubes
ENIAC - background
• Electronic Numerical Integrator And Computer
– world’s first general-purpose electronic digital computer
– John Eckert and John Mauchly (graduate student and professor, respectively)
– University of Pennsylvania
• Started in 1943 (needed during World War II)
• Needed to develop firing tables
– Range and trajectory tables for new weapons
• Finished in 1946
– Too late for the war effort
• Used until 1955
ENIAC - details
• Decimal (not binary)
• Memory = 20 accumulators of 10 digits
• Programmed manually by setting switches and by plugging and unplugging cables
• Containing → 18,000 vacuum tubes
• Weighing → 30 tons
• Occupying → 1,500 square feet
• Consuming → 140 kW of power
• Capable of → 5,000 additions per second
John von Neumann Machine
• Stored-program concept
• Proposed by John von Neumann, a mathematician
– because entering and altering programs on ENIAC was extremely tedious
• Main memory storing programs and data
• ALU operating on binary data
• Control unit that interprets instructions from memory and executes them
• I/O equipment operated by the control unit
• Referred to as the IAS computer (Princeton Institute for Advanced Study)
– Started in 1946, not completed until 1952
– Nevertheless, the prototype of all subsequent general-purpose computers
Structure of von Neumann (IAS) machine
IAS - details
• 1,000 storage locations of 40 bits each (called words)
– Binary numbers (both data and instructions are stored as binary codes)
– A word may hold one number or two 20-bit instructions
• Set of registers (storage in CPU)
– Memory Buffer Register (MBR), Memory Address Register (MAR)
– Instruction Register (IR), Instruction Buffer Register (IBR)
– Program Counter (PC)
– Accumulator (AC) & Multiplier Quotient (MQ) (operands and results)
• For example, multiplying two 40-bit numbers gives an 80-bit product: the most significant 40 bits go to AC and the least significant 40 bits to MQ
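A minimal sketch of the IAS word layout and the AC/MQ product split described above. It assumes the commonly cited instruction format in which each 20-bit instruction is an 8-bit opcode followed by a 12-bit address, and it treats values as unsigned for simplicity (the real IAS used signed numbers). The helper names are hypothetical.

```python
# Hypothetical IAS word helpers. Assumes each 20-bit instruction is an 8-bit
# opcode followed by a 12-bit address; numbers are treated as unsigned.
WORD_BITS = 40
INSTR_BITS = 20
ADDR_BITS = 12

WORD_MASK = (1 << WORD_BITS) - 1
INSTR_MASK = (1 << INSTR_BITS) - 1
ADDR_MASK = (1 << ADDR_BITS) - 1


def split_word(word: int) -> tuple[int, int]:
    """Split a 40-bit word into its (left, right) 20-bit instructions.
    The left instruction executes first; the right one waits in the IBR."""
    word &= WORD_MASK
    return word >> INSTR_BITS, word & INSTR_MASK


def decode(instr: int) -> tuple[int, int]:
    """Return (opcode, address) for one 20-bit instruction."""
    instr &= INSTR_MASK
    return instr >> ADDR_BITS, instr & ADDR_MASK


def multiply(mq: int, operand: int) -> tuple[int, int]:
    """Multiply two 40-bit values. The 80-bit product is split into its
    most significant 40 bits (AC) and least significant 40 bits (MQ)."""
    product = (mq & WORD_MASK) * (operand & WORD_MASK)
    return product >> WORD_BITS, product & WORD_MASK  # (AC, MQ)


# One word holding (opcode 0x01, address 0x123) and (opcode 0x05, address 0x456)
word = 0x0112305456
left, right = split_word(word)
print([hex(x) for x in decode(left)], [hex(x) for x in decode(right)])

# The largest 40-bit operands spill most of the product into AC
ac, mq = multiply(WORD_MASK, WORD_MASK)
print(hex(ac), hex(mq))
```

The 12-bit address field can name 4,096 locations, comfortably more than the 1,000 words of IAS memory.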
Expanded Structure of IAS
Commercial Computers
• 1947 - Eckert-Mauchly Computer Corporation
– First successful commercial machine: UNIVAC I (Universal Automatic Computer)
– Commissioned by the US Bureau of the Census for the 1950 census calculations
– Intended for both scientific and commercial applications, such as
• matrix algebraic computations
• statistical problems
• premium billings for a life insurance company
• logistical problems etc.

• Late 1950s - UNIVAC II


– Faster
– More memory
Commercial Computers cont.
• IBM
– Manufacturer of punched-card processing equipment
• 1953 - the 701
– IBM’s first stored-program computer
– Scientific applications (calculations)
• 1955 - the 702
– Suited for business applications
• Led to the 700/7000 series of computers
– The 7000 series was transistor-based (second generation)
Second Generation: Transistors

Transistors
• Transistors replaced vacuum tubes
– Smaller
– Cheaper
– Less heat dissipation
• Solid-state device
– Built from solid semiconductor crystals
– Typically made from silicon (refined from sand)
• Invented in 1947 at Bell Labs
• William Shockley et al.
Transistor Based Computers
• Second generation machines
• Front-runner companies produced small transistor machines
– NCR (National Cash Register)
– RCA (Radio Corporation of America)
– IBM followed with the 7000 series

• Digital Equipment Corporation (DEC) – founded in 1957
– Produced the PDP-1
– Started the minicomputer phenomenon
• which became so prominent in the third generation
Third Generation: Integrated Circuits

Integrated Circuits
• A single, self-contained transistor is called a discrete component
• Throughout the 1950s and early 1960s
– Electronic equipment was composed largely of discrete components: transistors, resistors, capacitors
– Early second-generation computers contained about 10,000 transistors
– This figure grew to the hundreds of thousands, making manufacture increasingly difficult
• Era of microelectronics: the invention of the integrated circuit
– Began in 1958
– A computer is made up of gates, memory cells and interconnections
– These can all be manufactured on a semiconductor, e.g. a silicon wafer
Generations of Computer
• 1st → Vacuum tube - 1946-1957
• 2nd → Transistor - 1958-1964
• 3rd → Small- and medium-scale integration - 1965-1971
– Small-scale integration (SSI): up to 100 devices on a chip
– Medium-scale integration (MSI): 100 - 1,000 devices on a chip
• 4th → Large-scale integration (LSI) - 1972-1977
– 1,000 - 10,000 devices on a chip
• 5th → Very-large-scale integration (VLSI) - 1978-1991
– 10,000 - 1,000,000 devices on a chip
• 6th → Ultra-large-scale integration (ULSI) - 1991 onward
– Over 1,000,000 devices on a chip
(Note: 12th-generation Intel Core processors were released in late 2021.)
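The device-count ranges above amount to a simple lookup table, with 1,000,000 devices as the VLSI/ULSI boundary. The sketch below classifies a chip by its device count; because the listed ranges share their endpoints (100 appears in both SSI and MSI, for instance), it assumes a boundary value belongs to the smaller scale. The function name is hypothetical.

```python
# Upper device-count bound for each integration scale (from the list above).
# Anything above the last bound is ULSI. Boundary values are assigned to the
# smaller scale, an assumption because the listed ranges overlap.
SCALE_BOUNDS = [
    (100, "SSI"),
    (1_000, "MSI"),
    (10_000, "LSI"),
    (1_000_000, "VLSI"),
]


def integration_scale(devices: int) -> str:
    """Return the integration-scale name for a chip with `devices` devices."""
    if devices < 1:
        raise ValueError("a chip must contain at least one device")
    for upper_bound, name in SCALE_BOUNDS:
        if devices <= upper_bound:
            return name
    return "ULSI"


for count in (50, 800, 5_000, 250_000, 2_000_000):
    print(f"{count:>9,} devices -> {integration_scale(count)}")
```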
Moore’s Law
• Increased density of components on a chip (LSI, VLSI, ULSI…)
• Gordon Moore – co-founder of Intel
– Predicted (1965) that the number of transistors on a chip would double every year
• Since the 1970s, development has slowed a little
– Number of transistors doubles every 18 months
• Consequences of Moore’s Law
– Cost of a chip has remained almost unchanged
– Higher packing density means shorter electrical paths, giving higher performance
– Smaller size gives increased flexibility
– Reduced power and cooling requirements
– Fewer interchip connections increase reliability
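The two doubling periods quoted above diverge quickly. This sketch projects a transistor count forward under a fixed doubling period; the starting count of 2,300 (the figure usually quoted for the Intel 4004) and the 10-year span are illustrative inputs, not data from the lecture.

```python
def projected_transistors(start: float, years: float, doubling_months: float) -> float:
    """Transistor count after `years` if the count doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return start * 2 ** doublings


start = 2_300  # illustrative starting point
for months in (12, 18):
    count = projected_transistors(start, 10, months)
    print(f"doubling every {months} months: {count:,.0f} transistors after 10 years")
```

Over the same decade, a 12-month doubling period multiplies the count by 1,024, while an 18-month period gives only about 100x.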
Growth in CPU Transistor Count
IBM 360 series
• 1964
• Replaced (and was not compatible with) the 7000 series
• First planned “family” of computers
– Models 30, 40, 50, 65, 75
• Characteristics of the family include (the “increasing” items apply going from lower to higher family members):
– Similar or identical instruction sets
– Similar or identical O/S
– Increasing speed
– Increasing number of I/O ports (i.e. more terminals)
– Increasing memory size
– Increasing cost
• Multiplexed switch structure
DEC PDP-8
• 1964
• First mini-computer
• Did not need an air-conditioned room
• Small enough to sit on a lab bench
• $16,000 (PDP-8 price)
– compared with hundreds of thousands of dollars ($100k+) for an IBM 360
• BUS STRUCTURE
– Instead of the central-switched architecture (IBM 360)
– Omnibus consisting of 96 separate signal paths
– Carrying control, data and address signals

DEC = Digital Equipment Corporation


DEC PDP-8, Bus Structure
Later Generations
(as discussed previously, i.e. 4th, 5th and 6th)

Remember: along with other technological advancements, the increasing density of components on a chip resulted in:

• Semiconductor memory
• Microprocessor

Semiconductor Memory
• 1950s & 1960s: computer memory consisted of tiny rings (cores)
– Magnetic
– Expensive, bulky
– Destructive readout

• 1970: Fairchild produced the first relatively capacious semiconductor memory
– Single chip (about the size of a single core)
– Holds 256 bits
– Non-destructive read
– Much faster than core
– Capacity approximately doubles each year

(Destructive readout: reading a core erases its contents, which must then be restored)


Microprocessor - Intel
• 1971 - 4004
– First microprocessor
– All CPU components on a single chip
– 4 bit
• 1972 - 8008
– 8 bit
– Twice as complex as the 4004
– Both designed for specific applications
• 1974 - 8080
– 8-bit
– Intel’s first general purpose microprocessor
– Larger instruction set and greater addressing capability
Evolution of Intel Microprocessors
Designing for Performance

Speeding it up
Clock cycles for n instructions on a k-stage pipeline = k + (n - 1)
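A minimal check of the cycle count k + (n - 1) given above, assuming an ideal pipeline in which every stage takes exactly one clock cycle and nothing stalls (no hazards or branch penalties). The first instruction needs k cycles to fill the pipeline; each of the remaining n - 1 instructions then completes one cycle later.

```python
def pipelined_cycles(k: int, n: int) -> int:
    """Cycles to run n instructions on an ideal k-stage pipeline."""
    return 0 if n == 0 else k + (n - 1)


def unpipelined_cycles(k: int, n: int) -> int:
    """Cycles when each instruction must finish all k stages before the next starts."""
    return k * n


def speedup(k: int, n: int) -> float:
    return unpipelined_cycles(k, n) / pipelined_cycles(k, n)


for n in (1, 10, 100, 10_000):
    print(f"k=5, n={n:>6}: {pipelined_cycles(5, n):>6} vs "
          f"{unpipelined_cycles(5, n):>6} cycles, speedup {speedup(5, n):.2f}")
```

The speedup approaches k (here 5) as n grows, which is why long, uninterrupted instruction streams benefit most from pipelining.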

• Pipelining
• On board cache
• On board L1 & L2 cache
• Branch prediction
– The processor looks ahead in the fetched instruction code, predicts which branches or groups of instructions are likely to be processed next, and prefetches and buffers them
• Data flow analysis
– Analyzes which instructions depend on each other’s results or data, to create an optimized schedule of instructions
– Instructions are scheduled to execute when ready, independent of the original program order
• Speculative execution
– Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations
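Branch prediction can be implemented in many ways. One widely described scheme (not necessarily the one used by any processor named in these slides) keeps a small saturating counter per branch. The sketch below, with hypothetical class and function names, compares 1-bit and 2-bit counters on a loop branch: the 2-bit version mispredicts only once per loop exit, while the 1-bit version also mispredicts when the loop is re-entered.

```python
class SaturatingCounterPredictor:
    """n-bit saturating counter: the upper half of the states predict 'taken'."""

    def __init__(self, bits: int = 2) -> None:
        self.max_state = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)
        self.state = self.threshold  # start at 'weakly taken' (an arbitrary choice)

    def predict(self) -> bool:
        return self.state >= self.threshold

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at both ends.
        if taken:
            self.state = min(self.state + 1, self.max_state)
        else:
            self.state = max(self.state - 1, 0)


def accuracy(outcomes: list[bool], bits: int) -> float:
    """Fraction of branch outcomes predicted correctly."""
    predictor = SaturatingCounterPredictor(bits)
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then not taken on exit; the loop runs 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(f"1-bit: {accuracy(loop_branch, 1):.2f}, 2-bit: {accuracy(loop_branch, 2):.2f}")
```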
Performance Balance
• Processor speed increased
• Memory capacity increased

• Balance: adjusting the organization and architecture to compensate for the mismatch among the capabilities of the various components of a computer system
– Processor speed has grown rapidly, but
– Data transfer between processor and memory has lagged badly
• Consequently, if memory fails to keep pace with the processor’s insistent demands, the processor waits and processing time is lost
Logic and Memory Performance Gap
Solutions
• Increase number of bits retrieved at one time
– Make DRAM “wider” rather than “deeper”
– Use wide bus data paths
• Change DRAM interface
– Include a cache or other buffering scheme on the DRAM chip
• Reduce frequency of memory access
– More complex and efficient cache structures
– On-chip and off-chip cache
• Increase interconnection bandwidth (between CPU and memory)
– High speed buses
– Hierarchy of buses
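To see why reducing the frequency of main-memory accesses matters, the sketch below uses the standard two-level access-time model: a hit is served by the cache in time T1, while a miss costs T1 + T2 because the cache is checked before main memory is read. The 1 ns and 50 ns timings are illustrative assumptions, not measurements.

```python
def average_access_time(hit_ratio: float, t_cache: float, t_memory: float) -> float:
    """Average access time of a cache + main-memory pair.
    Hits cost t_cache; misses cost t_cache + t_memory."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * (t_cache + t_memory)


T_CACHE_NS, T_MEMORY_NS = 1.0, 50.0  # illustrative timings
for h in (0.80, 0.90, 0.95, 0.99):
    t = average_access_time(h, T_CACHE_NS, T_MEMORY_NS)
    print(f"hit ratio {h:.2f}: {t:.1f} ns on average")
```

With these numbers the average falls from 11.0 ns at an 80% hit ratio to 1.5 ns at 99%, compared with 50 ns if every access went to main memory.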
Handling of I/O Devices
• More sophisticated applications are developed that use peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this data
– but the problem is moving it between the processor and the peripherals
• Solutions:
– Caching and buffering schemes
– Higher-speed interconnection buses
– More elaborate bus structures
– Multiple-processor configurations
Typical I/O Device Data Rates
Key is Balance
• To balance the throughput and processing demands of
– Processor components
– Main memory
– I/O devices
– Interconnection structures
Improvements in Chip Organization and Architecture
• Increase hardware speed of processor
– Fundamentally due to shrinking logic gate size on the chip
• More gates, packed more tightly
• Propagation time for signals reduced
• Increasing clock rate → faster execution
• Increase size and speed of caches
– Dedicating part of the processor chip to cache
• Cache access times drop significantly
• Change processor organization and architecture
– Increase effective speed of execution
– Using parallelism (in any form)
Intel Microprocessor Performance
Problems (with Clock Speed and Logic Density)
• Power
– Power density increases with density of logic and clock speed
– Dissipating the generated heat becomes increasingly difficult
• RC delay
– The speed at which electrons flow is limited by the resistance R (Ω) and capacitance C (F) of the metal wires connecting transistors on the chip
– Delay increases as RC product increases
• Wire interconnects thinner, increasing resistance
• Wires closer together, increasing capacitance

• Memory latency
– Memory speeds lag processor speeds (discussed previously)
• Solution: More emphasis on organizational and architectural approaches
Two Main Strategies
(for increasing processor performance)

1. Increased Cache Capacity

• Typically two or three levels of cache between processor and main memory
– As chip density increased, more cache memory was placed on the chip
• Enabling faster cache access
– E.g. the Pentium chip devoted about 10% of chip area to cache; the Pentium 4 devotes about 50%

2. More Complex Instruction Execution Logic

• Enable parallel execution of instructions within processor

1. Pipelining works like an assembly line in a manufacturing plant
– Different stages of execution of different instructions proceed at the same time along the pipeline

2. Superscalar organization allows multiple pipelines within a single processor
– Instructions that do not depend on one another can be executed in parallel
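Superscalar issue and data flow analysis rest on the same idea: find instructions that do not depend on one another. The sketch below groups a short, hypothetical instruction sequence into issue cycles using only true (read-after-write) dependencies. It assumes unlimited issue width, a one-cycle latency for every operation, and that false dependencies have already been removed by register renaming; real processors have a fixed number of pipelines and varying latencies.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str                  # human-readable form
    dest: str                  # register written
    sources: tuple[str, ...]   # registers read


def schedule(program: list[Instr]) -> list[list[str]]:
    """Group instructions into issue cycles based on read-after-write dependencies."""
    ready_at: dict[str, int] = {}   # register -> first cycle its new value is usable
    cycles: list[list[str]] = []
    for ins in program:
        # Issue as soon as every source operand is available; inputs not
        # produced inside the program are treated as ready at cycle 0.
        cycle = max((ready_at.get(src, 0) for src in ins.sources), default=0)
        while len(cycles) <= cycle:
            cycles.append([])
        cycles[cycle].append(ins.text)
        ready_at[ins.dest] = cycle + 1  # result usable in the next cycle
    return cycles


program = [
    Instr("r1 = a + b", "r1", ("a", "b")),
    Instr("r2 = c + d", "r2", ("c", "d")),
    Instr("r3 = r1 * r2", "r3", ("r1", "r2")),
    Instr("r4 = e - f", "r4", ("e", "f")),
    Instr("r5 = r3 + r4", "r5", ("r3", "r4")),
]
for i, group in enumerate(schedule(program)):
    print(f"cycle {i}: {group}")
```

The five instructions need only three cycles instead of five, and `r4 = e - f` issues ahead of `r3 = r1 * r2` even though it comes later in program order, which is the out-of-order scheduling described under data flow analysis.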
Diminishing Returns
(gains that become progressively smaller)

• Internal organization of processors is becoming very complex
– Difficult to squeeze more parallelism out of the instruction stream
– Further significant increases are likely to be relatively modest

• Benefits from cache are reaching a limit
– Already three levels of cache on the processor chip

• Increasing clock rate runs into the power dissipation problem
– The higher the clock rate, the greater the amount of power to be dissipated
– Some fundamental physical limits are being reached
New Approach – MultiCore
• Multiple processors on single chip = multiple cores
– Within a processor, the increase in performance is roughly proportional to the square root of the increase in complexity
– If software can use multiple processors, doubling the number of processors almost doubles performance
– So, use two simpler processors on a chip rather than one more complex processor

• With two processors, larger caches are justified
– Power consumption of memory logic is less than that of processing logic

• Example: IBM POWER4
– Two cores based on PowerPC
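The square-root rule of thumb above (often called Pollack's rule) makes the trade-off easy to quantify. This sketch compares spending a complexity budget on one larger core versus the same number of baseline cores; the function names are hypothetical, and the multicore figure assumes perfectly parallel software, which is an upper bound rather than a typical result.

```python
import math


def one_big_core(complexity: float) -> float:
    """Relative performance of a single core `complexity` times as complex
    as the baseline: roughly the square root of the complexity increase."""
    return math.sqrt(complexity)


def baseline_cores(count: int) -> float:
    """Relative performance of `count` baseline cores, assuming the software
    splits perfectly across them (an optimistic upper bound)."""
    return float(count)


for budget in (2, 4, 8):
    print(f"budget {budget}x: one complex core ~{one_big_core(budget):.2f}x, "
          f"{budget} simple cores up to {baseline_cores(budget):.0f}x")
```

Doubling the budget buys about 1.41x from a single core but up to 2x from two cores, which is the argument for placing two simpler processors on a chip.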
POWER4 Chip Organization
Intel Product Evolution
• 8080
– First general-purpose microprocessor
– 8-bit processor with 8-bit data path
– Used in the first personal computer, the Altair
• 8086
– Much more powerful
– 16-bit, wider data paths, larger registers, 1 MB memory
– Instruction cache (queue) that prefetches a few instructions
– 8088 (8-bit external bus) used in the first IBM PC
• 80286
– Addressable memory increased to 16 MB, up from 1 MB
• 80386
– Intel’s first 32-bit processor
– Supports multitasking (running multiple programs at the same time)
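The growth from 1 MB to 16 MB follows directly from the width of the physical address. The sketch assumes byte-addressable memory and uses the well-known physical address widths of these chips (20, 24 and 32 bits), which the slides do not state explicitly.

```python
MB = 1 << 20  # 1 MB = 2**20 bytes


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct bytes reachable with an address of this many bits."""
    return 1 << address_bits


# Physical address widths (well-established figures, assumed here)
ADDRESS_BITS = {"8086": 20, "80286": 24, "80386": 32}

for cpu, bits in ADDRESS_BITS.items():
    print(f"{cpu}: {bits}-bit physical address -> {addressable_bytes(bits) // MB:,} MB")
```

The 80386's 32-bit address reaches 4,096 MB, i.e. 4 GB.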
Intel Product Evolution
• 80486
– More sophisticated and powerful cache and instruction pipelining
– Built-in math coprocessor (offloads complex math operations from the main CPU)
• Pentium
– Uses superscalar techniques
– Allows multiple instructions to execute in parallel
• Pentium Pro
– Increased superscalar organization
– Aggressive register renaming
– Branch prediction
– Data flow analysis
– Speculative execution
Intel Product Evolution
• Pentium II
– MMX technology
– To process video, audio, and graphics data
• Pentium III
– Additional floating point instructions for 3D graphics
• Pentium 4 (Note: Arabic rather than Roman numerals)
– Further floating point and multimedia enhancements
• Core
– First Intel x86 microprocessor with a dual core
• Core 2
– Extends the architecture to 64 bits
– Core 2 Quad provides four processors on a single chip
Thank you
