The state of computing
• Powerful hardware facilities driven by extensive software packages
Computer Development Milestones
Elements of Modern Computers
Evolution of Computer Architecture
System Attributes to Performance
Computer Development Milestones
500 BC: Abacus (China) – the earliest mechanical computing/calculating device. It performed decimal arithmetic with digit-by-digit carry propagation.
1642: Mechanical Adder/Subtractor (Blaise
Pascal)
1827: Difference Engine (Charles Babbage)
Computer Development Milestones
1941: First binary mechanical computer (Konrad Zuse; Germany)
1944: Harvard Mark I (IBM) – the first electromechanical decimal computer, as proposed by Howard Aiken
Computer Development Milestones
Computing and communication with moving mechanical parts limited the speed and reliability of mechanical computers.
Modern computers use electronic components:
moving parts were replaced by high-mobility electrons in electronic computers
information transmission by mechanical gears or levers was replaced by electric signals traveling at nearly the speed of light
Computer Generation
Generation: First (1945–54)
Technology and Architecture: Vacuum tubes and relay memories; CPU driven by PC and accumulator; fixed-point arithmetic.
Software and Applications: Machine/assembly languages; single user; no subroutine linkage; programmed I/O using CPU.
Representative systems: ENIAC, Princeton IAS, IBM 701.
Computer Generation
Generation: Second (1955–64)
Technology and Architecture: Discrete transistors and core memories; floating-point arithmetic; I/O processors; multiplexed memory access.
Software and Applications: HLL used with compilers; subroutine libraries; batch processing monitor.
Representative systems: IBM 7090, CDC 1604, Univac LARC.
Computer Generation
Generation: Third (1965–74)
Technology and Architecture: Integrated circuits (SSI/MSI); microprogramming; pipelining; cache; lookahead processors.
Software and Applications: Multiprogramming and timesharing OS; multiuser applications.
Representative systems: IBM 360/370, CDC 6600, TI-ASC, PDP-8.
Computer Generation
Generation: Fourth (1975–90)
Technology and Architecture: LSI/VLSI and semiconductor memory; multiprocessors; vector supercomputers; multicomputers.
Software and Applications: Multiprocessor OS; languages, compilers, and environments for parallel processing.
Representative systems: VAX 9000, Cray X-MP, IBM 3090, BBN TC2000.
Computer Generation
Generation: Fifth (1991–present)
Technology and Architecture: Advanced VLSI processors, memory, and switches; high-density packaging; scalable architectures.
Software and Applications: Superscalar processors; systems on a chip; massively parallel processing; grand challenge applications; heterogeneous processing.
Representative systems: Currently used systems.
Elements of Modern Computers
Elements of Modern Computers
Computing Problem
Algorithms and Data Structures
Hardware Resources
Operating System
System Software Support
Elements of Modern Computers
Computing Problem
A modern computer is an integrated system consisting of:
machine hardware
an instruction set
system software
application programs
user interfaces
Depending on the nature of the problems, the
solutions may require different computing resources.
Computing Problem
For numerical problems in science and technology, the solutions demand complex mathematical formulations.
For alphanumerical problems in business and government, the solutions demand efficient transaction processing, large database management, and information retrieval operations.
For artificial intelligence (AI) problems, the solutions demand logic inference and symbolic manipulation.
Algorithms and Data Structures
Special algorithms and data structures are needed
to specify the computations and communications
involved in computing problems.
Numerical algorithms are deterministic and use regularly structured data.
Hardware Resource
A computer system comprises hardware resources, an operating system, and application software.
The hardware core of a computer system: processors, memory, and peripheral devices.
Interfaces built into I/O devices: display terminals, workstations, optical page scanners, magnetic-ink character recognizers, modems, network adapters, voice data entry, printers, and plotters.
Hardware Resource
Software interfaces include:
file transfer systems,
editors,
word processors,
device drivers,
interrupt handlers,
network communication programs
Operating System
manages the allocation and deallocation of resources during the execution of user programs
Mapping is a bidirectional process matching algorithmic structure with hardware architecture.
An efficient mapping produces better source code.
Mapping includes processor scheduling, memory maps, and interprocessor communications.
Operating System
The implementation of these mappings relies on efficient compiler and operating system support.
Parallelism can be exploited at algorithm design time, at programming time, at compile time, and at run time.
System Software Support
Compiler: source code written in an HLL must first be translated into object code.
The compiler assigns variables to registers or to memory words to produce machine code.
A loader is used to initiate the program execution through the OS kernel.
System Software Support
The effectiveness of this process determines the efficiency of hardware utilization and the programmability of the computer.
The goal is to develop a parallel programming environment with architecture-independent languages, compilers, and software tools.
Compiler Support
Three compiler upgrade approaches: preprocessor, precompiler, and parallelizing compiler.
Preprocessor: uses a sequential compiler and a low-level library of the target computer to implement high-level parallel constructs.
Precompiler: requires some program flow analysis, dependence checking, and limited optimizations toward parallelism detection.
Compiler Support
Parallelizing compiler: automatically detects parallelism in source code and transforms sequential code into parallel constructs.
Evolution of Computer Architecture
Evolution of Computer Architecture
Von Neumann architecture: a sequential machine executing scalar data.
Slow, due to the sequential execution of instructions in programs.
Evolution of Computer Architecture
Lookahead, Parallelism, and Pipelining
Lookahead techniques prefetch instructions to overlap instruction fetch/execute (I/E) operations and to enable functional parallelism.
Functional parallelism is supported by:
multiple functional units working simultaneously
pipelining at various processing levels, including pipelined instruction execution, pipelined arithmetic computations, and memory-access operations
Flynn's Classification
SISD – single instruction stream over a single data stream (e.g., a simple PC)
SIMD – single instruction stream over multiple data streams
MISD – multiple instruction streams over a single data stream
MIMD – multiple instruction streams over multiple data streams
SISD
A serial (non-parallel) computer
Single Instruction: Only one instruction stream is
being acted on by the CPU during any one clock
cycle
Single Data: Only one data stream is being used
as input during any one clock cycle
SISD
Deterministic execution
This is the oldest type of computer and, even today, the most common.
Examples: older-generation mainframes, minicomputers, and workstations; most modern-day PCs.
SIMD
A type of parallel computer
Single Instruction: All processing units execute
the same instruction at any given clock cycle
Multiple Data: Each processing unit can operate
on a different data element
Best suited for specialized problems
characterized by a high degree of regularity, such
as graphics/image processing.
SIMD
Synchronous (lockstep) and deterministic
execution
Two varieties: Processor Arrays and Vector
Pipelines
Processor Arrays: Connection Machine CM-2,
MasPar MP-1 & MP-2, ILLIAC IV
SIMD
Vector Pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10
Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
Individual processing elements can be switched off with mask arrays.
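As a conceptual sketch only (plain Python with made-up numbers; real SIMD hardware applies the instruction in parallel lanes, not a loop), lockstep execution with a mask array can be illustrated as:

```python
# Conceptual sketch of SIMD lockstep execution (illustrative only).
# One instruction ("add 5") is applied to every data element; a mask
# array disables selected processing elements, which keep their values.
data = [10, 20, 30, 40]            # one element per processing unit
mask = [True, False, True, True]   # False = processing element switched off
result = [x + 5 if m else x for x, m in zip(data, mask)]
print(result)  # [15, 20, 35, 45]
```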
MISD
A type of parallel computer
Multiple Instruction: Each processing unit operates
on the data independently via separate instruction
streams.
Single Data: A single data stream is fed into
multiple processing units.
Few actual examples of this class of parallel
computer have ever existed. One is the
experimental Carnegie-Mellon computer (1971).
MISD
Some conceivable uses might be:
multiple frequency filters operating on a
single signal stream
multiple cryptography algorithms attempting
to crack a single coded message.
MIMD
A type of parallel computer
Multiple Instruction: Every processor may be
executing a different instruction stream
Multiple Data: Every processor may be working
with a different data stream
Execution can be synchronous or asynchronous,
deterministic or nondeterministic
MIMD
Currently, the most common type of parallel
computer - most modern supercomputers fall into
this category.
Examples: most current supercomputers,
networked parallel computer clusters and "grids",
multi-processor SMP computers, multi-core PCs.
Note: many MIMD architectures also include
SIMD execution sub-components
MIMD
Comparing SIMD with MIMD
SIMD machines have fewer hardware units than MIMD machines (a single instruction unit).
Nevertheless, because SIMD computers are specially designed, they tend to be expensive and time-consuming to develop.
Not all applications are suitable for SIMD.
Platforms supporting MIMD can be built from cheaper, commodity components.
Flynn-Johnson classification
Parallel / Vector Computer
Classes of parallel computers
Shared-memory multiprocessors
Message-passing multicomputers
The distinction between multiprocessor and multicomputer lies in:
memory sharing
the mechanisms used for interprocessor communication
Parallel / Vector Computer
Processors in a multiprocessor system communicate with each other through shared variables in a common memory.
Each computer node in a multicomputer system has a local memory that is unshared with other nodes.
Interprocessor communication is done through message passing among the nodes.
Parallel / Vector Computer
A vector processor is equipped with multiple vector pipelines that can be used concurrently under hardware or firmware control.
Pipelined vector processors come in two architectures:
memory-to-memory architecture
register-to-register architecture
Parallel / Vector Computer
Pipelined vector processor
Memory-to-memory architecture: vector operands flow directly from memory through the pipelines, and results flow back to memory.
Register-to-register architecture: uses vector registers to interface between the memory and the functional pipelines.
System Attributes to Performance
System Attributes to Performance
The performance of a computer system demands a perfect match between machine capability and program behavior.
Machine capability can be enhanced by:
hardware technology
architectural features
efficient resource management
System Attributes to Performance
Program behavior is difficult to predict because it depends on the application and on run-time conditions. It is also influenced by:
algorithm design
data structures
language efficiency
programmer skill
compiler technology
Clock Rate and CPI
The CPU is driven by a clock with a constant cycle time τ; the clock rate is f = 1/τ.
The size of a program is measured by its instruction count Ic.
Different machine instructions require different numbers of clock cycles.
Cycles per instruction (CPI): the average number of cycles needed to execute an instruction.
Performance factors
CPU time:
T = Ic × CPI × τ
Ic = instruction count
CPI = cycles per instruction
τ = cycle time
The execution of an instruction requires:
instruction fetch, decode, operand fetch, execution, and storing of results
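The CPU-time formula above can be evaluated directly; the numbers below are illustrative assumptions, not values from the text:

```python
# CPU time model: T = Ic * CPI * tau, with tau = 1/f.
Ic = 200_000_000   # instruction count (assumed)
CPI = 1.5          # average cycles per instruction (assumed)
f = 2.0e9          # 2 GHz clock (assumed)
tau = 1 / f        # cycle time in seconds
T = Ic * CPI * tau
print(f"CPU time: {T:.3f} s")  # CPU time: 0.150 s
```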
Performance factors
The instruction decode and execution phases are carried out in the CPU; the other operations require access to memory.
Memory cycle: the time needed to complete one memory reference.
Memory cycle = k × processor cycle, where k depends on the speed of the memory technology and the processor–memory interconnection.
Performance factors
The CPI of an instruction type is divided into:
the processor cycles, and
the memory cycles needed to complete the execution of the instruction.
For example, an instruction cycle may require four memory references (one instruction fetch, two operand fetches, and one store of results).
Performance factors
T = Ic × (p + m × k) × τ
p = number of processor cycles per instruction
m = number of memory references per instruction
k = ratio between memory cycle and processor cycle
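A minimal sketch of the refined model, using assumed values for p, m, and k:

```python
# Refined CPU time: T = Ic * (p + m*k) * tau
Ic = 200_000_000   # instruction count (assumed)
p = 4              # processor cycles per instruction (assumed)
m = 2              # memory references per instruction (assumed)
k = 10             # memory-cycle / processor-cycle ratio (assumed)
tau = 0.5e-9       # cycle time for a 2 GHz clock
T = Ic * (p + m * k) * tau
print(f"CPU time: {T:.2f} s")  # CPU time: 2.40 s
```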
System Attributes
MIPS Rate
C: total number of clock cycles needed to execute a program
CPU time T = C × τ = C / f, and CPI = C / Ic
so T = Ic × CPI × τ = Ic × CPI / f
Processor speed is measured in million instructions per second (MIPS)
MIPS rate = Ic / (T × 10⁶) = f / (CPI × 10⁶)
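The MIPS formula can be checked numerically; the clock rate and CPI below are assumed:

```python
# MIPS rate = f / (CPI * 1e6); equivalently Ic / (T * 1e6).
f = 2.0e9   # 2 GHz clock (assumed)
CPI = 1.5   # average cycles per instruction (assumed)
mips = f / (CPI * 1e6)
print(f"{mips:.1f} MIPS")  # 1333.3 MIPS
```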
Throughput Rate
System throughput Ws: how many programs a system can execute per unit time.
CPU throughput Wp: how many programs the CPU can execute per second, Wp = f / (Ic × CPI).
Ws is lower than Wp because of additional overhead from I/O, the compiler, and the OS.
Ws = Wp only if the CPU is kept busy in a perfect program-interleaving fashion.
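A small sketch of the two throughput measures, with assumed numbers; the gap between Wp and a measured Ws reflects I/O, compiler, and OS overhead:

```python
# CPU throughput Wp = f / (Ic * CPI), in programs per second.
f = 2.0e9          # clock rate (assumed)
Ic = 200_000_000   # instructions per program (assumed)
CPI = 1.5          # cycles per instruction (assumed)
Wp = f / (Ic * CPI)
Ws = 5.0           # measured system throughput (assumed); Ws < Wp
print(f"Wp = {Wp:.2f} programs/s, Ws = {Ws:.2f} programs/s")
```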
Programming Environments
Conventional uniprocessor computers are programmed in a sequential environment:
the original OS kernel responds to one system call at a time
existing compilers generate sequential object code to run in a sequential manner
Conventional computers therefore provide a sequential programming environment.
Programming Environments
Parallel computers require a parallel environment in which parallelism is exploited:
language extensions to support parallelism
an OS that supports parallelism
Implicit Parallelism
Uses C, Fortran, Lisp, or Pascal to write the source program.
The sequentially coded source program is translated into parallel object code by a parallelizing compiler.
The compiler detects parallelism and assigns target machine resources.
Success relies on the intelligence of the compiler.
Explicit Parallelism
Uses C, Fortran, Lisp, or Pascal to write the source program.
Parallelism is specified explicitly in user programs.
This reduces the burden on the compiler to detect parallelism.
The compiler needs only to preserve parallelism and assign target machine resources.
Implicit & Explicit Parallelism