Advanced Computer Architecture
CS-522
MS – Computer Science
Credit Hours : 3-0
Dr. Shahid Latif (Associate Professor)
Department of Computer Science & IT
Sarhad University of Science and Information Technology, Peshawar
Course Details
Course title/code: Adv. Computer Architecture/CS-522
Lecture: 02
Topic: Computer Evolution and Performance
Designing for Performance
Program: MS – Computer Science (Sem: 1st, 2nd, 3rd & 4th)
Lecture Outlines
• A Brief History of Computers
• First Generation: Vacuum Tubes
• ENIAC
• John von Neumann (IAS) machine
• Commercial computers
• Second Generation: Transistors
• Third Generation: Integrated Circuits (ICs)
• Integrated Circuits, Generations of Computer, Moore’s Law
• Later Generations
• Semiconductor memory & Microprocessors
• Designing for Performance
• Microprocessor speed
• Performance balance
• Handling I/O devices
• Improvements in Chip Organization and Architecture
• Diminishing returns
• New approach: Multiple Cores
• Intel Product Evolution
Computer Evolution and
Performance
First Generation: Vacuum Tubes
ENIAC - background
• Electronic Numerical Integrator And Computer
– world’s first general-purpose electronic digital computer
– John Eckert and John Mauchly (student and Professor)
– University of Pennsylvania
• Started 1943 (was needed during World War-II)
• Needed for development of Firing Tables
– Range & Trajectory tables for new weapons
• Finished 1946
– Too late for war effort
• Used until 1955
ENIAC - details
• Decimal (not binary)
• Memory = 20 accumulators of 10 digits
• Programmed manually by switches (and plugging and unplugging
cables)
• Containing → 18,000 vacuum tubes
• Weighing → 30 tons
• Occupying → 1,500 square feet
• Consuming → 140 kW power
• Capable → 5,000 additions per second
John von-Neumann Machine
• Stored-Program concept
• by John Von Neumann, a mathematician
– as entering and altering of programs in ENIAC was extremely tedious
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and
executing
• I/O equipment operated by control unit
• Referred to as the IAS computer (built at the Princeton Institute for Advanced Study)
– Started 1946 → not Completed until 1952
– However, prototype of all subsequent general-purpose computers
Structure of von-Neumann (IAS) machine
IAS - details
• 1,000 storage locations of 40 bits each (termed words)
– Binary number (data and instructions are binary codes)
– A word may also hold two 20-bit instructions
• Set of registers (storage in CPU)
– Memory Buffer Register (MBR), Memory Address Register (MAR)
– Instruction Register (IR), Instruction Buffer Register (IBR)
– Program Counter (PC)
– Accumulator (AC) & Multiplier Quotient (MQ) (operands and results)
• For example: multiplying two 40-bit words gives an 80-bit product (most significant 40 bits → AC, least significant 40 bits → MQ)
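The word layout and the AC/MQ split described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: values are treated as unsigned (the real IAS word carries a sign bit), and the function names are hypothetical, not part of any IAS documentation.

```python
WORD_BITS = 40                      # one IAS memory word
HALF_BITS = 20                      # one instruction occupies half a word
WORD_MASK = (1 << WORD_BITS) - 1
HALF_MASK = (1 << HALF_BITS) - 1


def split_instructions(word):
    """Return the (left, right) 20-bit instruction halves of a 40-bit word."""
    word &= WORD_MASK
    return (word >> HALF_BITS) & HALF_MASK, word & HALF_MASK


def multiply_to_ac_mq(a, b):
    """Multiply two unsigned 40-bit values and split the 80-bit product.

    The most significant 40 bits go to AC, the least significant 40 to MQ.
    """
    product = (a & WORD_MASK) * (b & WORD_MASK)
    ac = product >> WORD_BITS
    mq = product & WORD_MASK
    return ac, mq


if __name__ == "__main__":
    left, right = split_instructions(0xABCDE12345)
    print(hex(left), hex(right))
    print(multiply_to_ac_mq(WORD_MASK, WORD_MASK))
```

Multiplying the two largest 40-bit values is the case that fills both registers, which is why a single 40-bit accumulator is not enough to hold the product.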
Expanded Structure of IAS
Commercial Computers
• 1947 - Eckert-Mauchly Computer Corporation
– 1st successful commercial machine = UNIVAC I (Universal Automatic
Computer)
– Commissioned by US Bureau of Census for 1950 calculations
– Intended for both scientific and commercial applications, such as
• matrix algebraic computations
• statistical problems
• premium billings for a life insurance company
• logistical problems etc.
• Late 1950s - UNIVAC II
– Faster
– More memory
Commercial Computers cont.
• IBM
• Manufacturer of Punched-card processing equipment
• 1953 - the 701
– IBM’s first stored program computer
– Scientific applications (calculations)
• 1955 - the 702
– Suited for Business applications
• Led to the 700/7000 series of computers
– 7000 were transistor based (2nd generation)
Second Generation: Transistors
Transistors
• Replaced vacuum tubes by Transistors
– Smaller
– Cheaper
– Less heat dissipation
• Solid State device
– Solid semiconductor crystals (materials)
– Made from Silicon (Sand)
• Invented 1947 at Bell Labs
• William Shockley et al.
Transistor Based Computers
• Second generation machines
• Front-runner companies → produce small transistor machines
– NCR (National Cash Register)
– RCA (Radio Corporation of America)
– IBM i.e. 7000 series
• Digital Equipment Corporation (DEC) – founded in 1957
– Produced PDP-1
– Started the minicomputer phenomenon
• which became so prominent in the third generation
Third Generation: Integrated Circuits
Integrated Circuits
• A single, self-contained transistor is called a discrete
component
• Throughout 1950s → early 1960s
– Devices were composed of transistors, resistors, capacitors
– Early second-generation computers contained about 10,000 transistors
– This figure grew to the hundreds of thousands, making the manufacture
increasingly difficult
• Era of microelectronics: the invention of the integrated circuit
– Begins in 1958
– A computer is made up of gates, memory cells and interconnections
– These can be manufactured on a semiconductor e.g. silicon wafer
Generations of Computer
• 1st → Vacuum tube - 1946-1957
• 2nd → Transistor - 1958-1964
• 3rd → Small and medium scale integration - 1965-1971
– Small scale integration: up to 100 devices on a chip
– Medium scale integration: 100 - 1,000 devices on a chip
• 4th → Large scale integration - 1972-1977
– 1,000 - 10,000 devices on a chip
• 5th → Very large scale integration - 1978-1991
– 10,000 - 1,000,000 devices on a chip
• 6th → Ultra large scale integration - 1991 onwards
– Over 1,000,000 devices on a chip
12th generation Intel Core processors released in late 2021
Moore’s Law
• Increased density of components on chip (LSI, VLSI, ULSI…)
• Gordon Moore – co-founder of Intel
– Number of transistors on a chip will double every year
• Since the 1970s, development has slowed a little
– Number of transistors doubles every 18 months
• Consequences of Moore’s Law
– Cost of a chip has remained almost unchanged
– Higher packing density means shorter electrical paths, giving higher
performance
– Smaller size gives increased flexibility
– Reduced power and cooling requirements
– Fewer interconnections increases reliability
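The doubling rule above is simple exponential growth, so it can be projected with one line of arithmetic. The sketch below is illustrative only: the function name and the starting count of 1,000 transistors are hypothetical, and the 18-month period is the figure quoted on this slide.

```python
def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Project a transistor count assuming one doubling per period.

    doubling_period_years=1.5 matches the 18-month figure on the slide;
    pass 1.0 for the original one-year formulation.
    """
    return initial_count * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical chip starting at 1,000 transistors.
    for years in (3, 6, 9):
        print(years, projected_transistors(1_000, years))
```

With an 18-month period, three years means two doublings (a factor of 4), while the original one-year formulation gives three doublings (a factor of 8) over the same span.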
Growth in CPU Transistor Count
IBM 360 series
• 1964
• Replaced (& not compatible with) 7000 series
• First planned “family” of computers
– Model 30, 40, 50, 65, 75
• Characteristics of family include:
– Similar or identical instruction sets
– Similar or identical O/S
– Increasing speed
– Increasing number of I/O ports (i.e. more terminals)
– Increasing memory size
– Increased cost
(the last four going from lower to higher family members)
• Multiplexed switch structure
DEC PDP-8
• 1964
• First mini-computer
• Did not need air conditioned room
• Small enough to sit on a lab bench
• $16,000 (PDP-8 price)
– Compared with $100k or more (several hundred thousand dollars) for an IBM 360
• BUS STRUCTURE
– Instead of central-switched architecture (IBM 360)
– Omnibus consisting of 96 signal paths
– Carrying control, data and address signals
DEC = Digital Equipment Corporation
DEC PDP-8, Bus Structure
Later Generations
(as discussed previously i.e. 4th, 5th and 6th)
Remember: Along with other technological advancements, the
increasing density of components on chip resulted in:
• Semiconductor memory
• Microprocessor
Semiconductor Memory
• In the 1950s and 1960s, most computer memory was built from tiny rings of ferromagnetic material (cores)
– Magnetic
– Expensive, bulky
– Destructive readout
• 1970, Fairchild → first capacious semiconductor memory
– Single Chip (size of single core)
– Holds 256 bits
– Non-destructive read
– Much faster than core
– Capacity approximately doubles each year
(Destructive read= erase to read, then restore)
Microprocessor- Intel
• 1971 - 4004
– First microprocessor
– All CPU components on a single chip
– 4 bit
• 1972 - 8008
– 8 bit
– About twice as complex as the 4004
– Both designed for specific applications
• 1974 - 8080
– 8-bit
– Intel’s first general purpose microprocessor
– Larger instruction set and addressing capabilities
Evolution of Intel Microprocessors
Designing for Performance
Speeding it up
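• Pipelining
– n instructions on a k-stage pipeline take k + (n − 1) clock cycles (instead of k × n without pipelining, assuming one cycle per stage)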
• On board cache
• On board L1 & L2 cache
• Branch prediction
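– The processor looks ahead in the instruction stream and predicts which branches (or groups of instructions) are likely to be executed next
– It prefetches the predicted instructions, possibly along multiple branches, and buffers them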
• Data flow analysis
– analyzes which instructions are dependent on each other’s results/data, to
create an optimized schedule of instructions
– instructions are scheduled to be executed when ready, independent of
the original program order
• Speculative execution
– Using branch prediction & data flow analysis,
– Some processors speculatively execute instructions ahead of their actual
appearance in the program execution, holding the results in temporary
locations
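The k + (n − 1) relation noted on this slide for pipelining can be checked numerically. The sketch below assumes an ideal pipeline (one clock cycle per stage, no stalls or branch penalties); the function names are hypothetical.

```python
def pipelined_cycles(k, n):
    """Clock cycles for n instructions on an ideal k-stage pipeline.

    The first instruction needs k cycles to pass through all stages;
    each of the remaining n - 1 instructions completes one cycle later.
    """
    if k < 1 or n < 1:
        raise ValueError("k and n must both be at least 1")
    return k + (n - 1)


def unpipelined_cycles(k, n):
    """Clock cycles when each instruction must finish before the next starts."""
    if k < 1 or n < 1:
        raise ValueError("k and n must both be at least 1")
    return k * n


def speedup(k, n):
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(k, n) / pipelined_cycles(k, n)


if __name__ == "__main__":
    print(pipelined_cycles(5, 100), unpipelined_cycles(5, 100))
    print(round(speedup(5, 100), 2))
```

For a 5-stage pipeline and 100 instructions this gives 104 cycles instead of 500. As n grows, the speedup approaches k but never reaches it, because the first instruction still has to fill the pipeline.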
Performance Balance
• Processor speed increased
• Memory capacity increased
• Balance = compensating for the mismatch among the capabilities
of the various components of a computer system
– Processor speed has grown rapidly, but
– Data transfer between processor and memory lagged badly
• Consequently, if memory fails to keep pace with the processor’s
insistent demands, the processor waits and processing time is lost
Logic and Memory Performance Gap
Solutions
• Increase number of bits retrieved at one time
– Make DRAM “wider” rather than “deeper”
– Using wide bus data paths
• Change DRAM interface
– Include Cache on DRAM chip
– or any other buffering scheme
• Reduce frequency of memory access
– More complex and efficient cache structure
– On-chip and off-chip cache
• Increase interconnection bandwidth (between CPU & Mem)
– High speed buses
– Hierarchy of buses
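To see why reducing the frequency of main-memory accesses helps, one common textbook model (not stated on this slide) is: average access time = cache hit time + miss rate × miss penalty. The sketch below uses that model with hypothetical timing values chosen only for illustration.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time under a single-level cache model.

    Assumption: every access pays hit_time_ns, and a fraction miss_rate
    of accesses additionally pays miss_penalty_ns to reach main memory.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    # Hypothetical numbers: 1 ns cache hit, 100 ns main-memory penalty.
    for miss_rate in (0.5, 0.1, 0.02):
        print(miss_rate, average_access_time(1.0, miss_rate, 100.0))
```

Under these assumed numbers, lowering the miss rate shrinks the average access time towards the cache hit time, which is the effect the "more complex and efficient cache structure" bullet is after.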
Handling of I/O Devices
• Applications are developed that support the use of peripherals with
intensive I/O demands
• Large data throughput demands
• Processors can handle this
– but the problem is data movement (between CPU and peripherals)
• Solutions:
– Caching or any Buffering schemes
– Higher-speed interconnection buses
– More elaborate bus structures
– Multiple-processor configurations
Typical I/O Device Data Rates
Key is Balance
• To balance the throughput and processing demands of
– Processor components
– Main memory
– I/O devices
– Interconnection structures
Improvements in Chip
Organization and Architecture
• Increase hardware speed of processor
– Fundamentally due to shrinking logic gate size on chip
• More gates, packed more tightly
• Propagation time for signals reduced
• Increasing clock rate → faster execution
• Increase size and speed of caches
– Dedicating part of processor chip
• Cache access times drop significantly
• Change processor organization and architecture
– Increase effective speed of execution
– Using Parallelism (in any form)
Intel Microprocessor Performance
Problems (with Clock Speed and Logic Density)
• Power
– Power density increases with density of logic and clock speed
– Dissipating the heat generated
• RC delay
– Speed of electron flow is limited by the resistance R (Ω) and capacitance C (F) of the metal wires
connecting transistors on chip
– Delay increases as RC product increases
• Wire interconnects thinner, increasing resistance
• Wires closer together, increasing capacitance
• Memory latency
– Memory speeds lag processor speeds (discussed previously)
• Solution: More emphasis on organizational and architectural approaches
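A rough numerical sketch of the RC delay bullets on this slide. It assumes the standard conductor formula R = ρL/A (not given on the slide) and uses made-up values, so only the ratios are meaningful; the function names are hypothetical.

```python
def wire_resistance(resistivity_ohm_m, length_m, cross_section_m2):
    """Resistance of a uniform wire, R = rho * L / A (assumed model)."""
    return resistivity_ohm_m * length_m / cross_section_m2


def rc_delay(resistance_ohm, capacitance_f):
    """RC time constant in seconds; delay grows with the RC product."""
    return resistance_ohm * capacitance_f


if __name__ == "__main__":
    # Made-up values: halving the cross-section doubles R and hence RC.
    capacitance = 0.5
    thick = wire_resistance(2.0, 3.0, 1.5)
    thin = wire_resistance(2.0, 3.0, 0.75)
    print(rc_delay(thick, capacitance), rc_delay(thin, capacitance))
```

A thinner wire (smaller cross-section) raises R, and wires packed closer together raise C, so both trends on the slide push the RC product and the delay upward.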
Two Main Strategies
(increasing performance of processor)
1. Increased Cache Capacity
• Typically two or three levels of cache between processor and
main memory
– As chip density increased
– Hence, more cache memory on chip
• Enabling faster cache access
– E.g.
– Pentium chip devoted about 10% of chip area to cache
– Pentium 4 devotes about 50%
2. More Complex Instruction Execution Logic
• Enable parallel execution of instructions within processor
1. Pipeline works like assembly line in manufacturing plant
– Different stages of execution of different instructions at same time
along pipeline
2. Superscalar allows multiple pipelines within single processor
– Instructions that do not depend on one another can be executed in
parallel
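The idea that independent instructions can execute in parallel can be sketched as a tiny scheduler. This is a simplified illustration, not how any particular processor does it: it assumes unlimited execution units, one-cycle latency, and only true (read-after-write) dependencies, as if registers were renamed. The tuple format and function name are hypothetical.

```python
def earliest_issue_cycles(instructions):
    """Return the earliest cycle (starting at 1) each instruction can execute.

    instructions: list of (dest, sources) tuples in program order.
    An instruction can run one cycle after the latest producer of any
    of its source registers; sources never written run in cycle 1.
    """
    produced_in = {}          # register name -> cycle that produces its value
    cycles = []
    for dest, sources in instructions:
        latest = max((produced_in.get(src, 0) for src in sources), default=0)
        cycle = latest + 1
        cycles.append(cycle)
        produced_in[dest] = cycle
    return cycles


if __name__ == "__main__":
    program = [
        ("r1", ["a", "b"]),    # r1 = a + b
        ("r2", ["c", "d"]),    # r2 = c + d   (independent of r1)
        ("r3", ["r1", "r2"]),  # r3 = r1 * r2 (needs both results)
        ("r4", ["e"]),         # r4 = -e      (independent of everything)
    ]
    print(earliest_issue_cycles(program))
```

In this example three of the four instructions can run in the first cycle, and only the one that consumes r1 and r2 has to wait for the second.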
Diminishing Returns
(becoming smaller (less/decreasing) or appearing to do so…)
• Internal organization of processors becoming complex
– Difficult to squeeze parallelism out of the instruction stream
– Further significant increases likely to be relatively modest/unsure
• Benefits from cache are reaching a limit
– three levels of cache on the processor chip
• Increasing clock rate runs into power dissipation problem
– The higher the clock rate, the greater the amount of power to be dissipated
– Some fundamental physical limits are being reached
New Approach – MultiCore
• Multiple processors on single chip = multiple cores
– Within a processor, increase in performance proportional to square root
of increase in complexity
– If software can use multiple processors, doubling number of processors
almost doubles performance
– So, use two simpler processors on a chip rather than one more complex
processor
• With two processors, larger caches are justified
– Power consumption of memory logic is less than that of processing logic
• Example: IBM POWER4
– Two cores based on PowerPC
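The trade-off on this slide can be put into numbers. The sketch below takes the square-root rule from the slide for a single core, and models "almost doubles" with a hypothetical parallel_efficiency parameter (an assumption, since the slide gives no figure).

```python
import math


def single_core_speedup(complexity_factor):
    """Speedup from making one core complexity_factor times more complex.

    Uses the rule from the slide: performance grows with the square
    root of the increase in complexity.
    """
    return math.sqrt(complexity_factor)


def multicore_speedup(cores, parallel_efficiency=1.0):
    """Speedup from using several simple cores.

    Assumption: each additional core contributes parallel_efficiency
    (between 0 and 1) of a full core's performance.
    """
    return 1 + (cores - 1) * parallel_efficiency


if __name__ == "__main__":
    # Spend a doubled transistor budget in two different ways.
    print(round(single_core_speedup(2), 2))   # one core, twice as complex
    print(multicore_speedup(2, 0.9))          # two simple cores, 90% efficiency
```

With the same doubled budget, the more complex single core gains roughly 1.4×, while two simpler cores approach 2× provided the software can keep both busy.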
POWER4 Chip Organization
Intel Product Evolution
• 8080
– first general purpose microprocessor
– 8-bit processor with 8-bit data path
– Used in first personal computer – Altair
• 8086
– much more powerful
– 16 bit, wider data paths, larger registers, 1 MB memory
– instruction cache/queue, prefetch few instructions
– 8088 (8 bit external bus) used in first IBM PC
• 80286
– 16 MB of memory addressable, up from 1 MB
• 80386
– Intel’s First 32 bit
– Support multitasking (run multiple programs at the same time)
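The memory limits quoted on this slide follow directly from the number of address bits: with b address lines, a byte-addressed processor can reach 2^b bytes. The address widths used below (20 bits for the 8086, 24 for the 80286, 32 for the 80386) are not on the slide; they are the commonly documented values.

```python
MB = 2 ** 20  # bytes in one megabyte (binary)


def addressable_bytes(address_bits):
    """Bytes reachable with the given number of address bits."""
    if address_bits < 0:
        raise ValueError("address_bits must be non-negative")
    return 2 ** address_bits


if __name__ == "__main__":
    for name, bits in (("8086", 20), ("80286", 24), ("80386", 32)):
        print(name, bits, addressable_bytes(bits) // MB, "MB")
```

Four extra address bits multiply the reachable memory by 16, which is exactly the step from 1 MB to 16 MB between the 8086 and the 80286.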
Intel Product Evolution cont.
• 80486
– Sophisticated powerful cache and instruction pipelining
– built in Co-processor (offloading complex math operations from CPU)
• Pentium
– Use superscalar technique
– Allowing multiple instructions executed in parallel
• Pentium Pro
– Increased superscalar organization
– Aggressive register renaming
– Branch prediction
– Data flow analysis
– Speculative execution
Intel Product Evolution cont.
• Pentium II
– MMX technology
– To process video, audio, and graphics data
• Pentium III
– Additional floating point instructions for 3D graphics
• Pentium 4 (Note: Arabic rather than Roman numerals)
– Further floating point and multimedia enhancements
• Core
– First Intel x86 microprocessor with a dual core
• Core 2
– extends the architecture to 64 bits
– Core 2 Quad provides four processors on a single chip
Thank you