03 Why Parallel

The document outlines the evolution of computer architecture from the first electronic computers in the 1940s to modern multicore processors, highlighting key milestones and technological advancements. It discusses the shift from single-core to multicore systems, the challenges of parallel programming, and the importance of understanding hardware for software efficiency. The document concludes that multicore processors are essential for future computing, emphasizing the need for effective parallel programming strategies.

Computer History

Eckert and Mauchly
• 1st working electronic computer (ENIAC, 1946)
• 18,000 vacuum tubes
• 1,800 instructions/sec
• 3,000 ft³
Computer History
• Maurice Wilkes
– 1st stored-program computer: EDSAC 1 (1949)
– 650 instructions/sec
– 1,400 ft³
http://www.cl.cam.ac.uk/UoCCL/misc/EDSAC99/
Intel 4004 Die Photo
• Introduced in 1971
– First microprocessor
• 2,250 transistors
• 12 mm²
• 108 kHz
Intel 8086 Die Scan
• 29,000 transistors
• 33 mm²
• 5 MHz
• Introduced in 1978
– Basic architecture of the IA32 PC
Intel 80486 Die Scan
• 1,200,000 transistors
• 81 mm²
• 25 MHz
• Introduced in 1989
– 1st pipelined implementation of IA32
Pentium Die Photo
• 3,100,000 transistors
• 296 mm²
• 60 MHz
• Introduced in 1993
– 1st superscalar implementation of IA32
Pentium III
• 9,500,000 transistors
• 125 mm²
• 450 MHz
• Introduced in 1999
http://www.intel.com/intel/museum/25anniv/hof/hof_main.htm
Pentium 4
• 55,000,000 transistors
• 146 mm²
• 3 GHz
• Introduced in 2000
http://www.chip-architect.com
[Die photos: Pentium 4, Core 2 Duo (Merom), Intel Core i7 (Nehalem), Montecito (Itanium 2), Cell Processor, IBM Power 7, SUN UltraSparc T3]


First Generation (1970s)

Single Cycle Implementation


Second Generation (1980s)

[Pipeline diagram: F D I E C]

• Pipelining: temporal parallelism
• Number of stages increases with each generation
• Best achievable CPI = 1
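Why 1 is the floor for CPI can be seen by counting cycles. The sketch below assumes an idealized k-stage pipeline in which every stage takes one cycle and there are no stalls, hazards, or mispredictions (simplifications, not a model of any real chip); time is measured in units of one pipeline stage. For example, 100 instructions on a 5-stage pipeline take 104 cycles instead of 500, a CPI of 1.04. The function names are hypothetical.

```c
/* Cycle counts for n instructions on an idealized k-stage pipeline.
   Assumptions: every stage takes one cycle, and there are no stalls,
   hazards, or branch mispredictions. */

/* Without overlap, each instruction passes through all k stages
   before the next one starts. */
unsigned long unpipelined_cycles(unsigned long n, unsigned long k) {
    return n * k;
}

/* With overlap, the first instruction needs k cycles to fill the
   pipeline; after that, one instruction completes every cycle. */
unsigned long pipelined_cycles(unsigned long n, unsigned long k) {
    return (n == 0) ? 0 : k + n - 1;
}

/* Cycles per instruction (n must be > 0). It approaches 1 from above
   as n grows, which is the best a scalar pipeline can achieve. */
double pipelined_cpi(unsigned long n, unsigned long k) {
    return (double)pipelined_cycles(n, k) / (double)n;
}
```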
Third Generation (1990s)

[Pipeline diagram: F D I E C, with several E units side by side]

• ILP
• Dynamic: superscalar
• Out-of-order execution (scheduling)
• Static: VLIW/EPIC
• Spatial parallelism
• IPC, not CPI
• Instruction window
• Speculative execution (prediction)
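The "IPC, not CPI" point follows from the same cycle counting once several instructions can issue per cycle. This sketch assumes an idealized w-wide superscalar with a k-stage pipeline and fully independent instructions (no dependences, no mispredictions); these assumptions and the function names are hypothetical. With w = 4 and k = 5, 100 instructions take 29 cycles, an IPC of about 3.45, so CPI falls below 1 and IPC becomes the natural metric.

```c
/* Ideal w-wide superscalar with a k-stage pipeline: up to w independent
   instructions enter the pipeline every cycle (simplifying assumption:
   no dependences, no branch mispredictions). */
unsigned long superscalar_cycles(unsigned long n, unsigned long k,
                                 unsigned long w) {
    if (n == 0) return 0;
    unsigned long groups = (n + w - 1) / w;  /* ceil(n / w) issue groups */
    return k + groups - 1;                   /* fill time, then one group per cycle */
}

/* Instructions per cycle (n must be > 0): can exceed 1 and approaches w
   for long runs of independent instructions. */
double superscalar_ipc(unsigned long n, unsigned long k, unsigned long w) {
    return (double)n / (double)superscalar_cycles(n, k, w);
}
```

With w = 1 the formula reduces to the scalar pipeline's k + n - 1 cycles.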
Fourth Generation (2000s)

[Pipeline diagram: two F D I E C pipelines, each with several E units]

Simultaneous Multithreading (SMT)
(aka Hyper-Threading Technology)
The Famous Moore’s Law
Positive Cycle

[Cycle diagram around the Computer Industry: Hardware Improvement → Better Software → People get used to the software → People ask for more improvements → Hardware Improvement]
How Did These Advances Happen?

[Diagram: Software Community ↔ Computer Architecture ↔ Process Technology, with the links labeled Wishes, Restrictions, Capabilities, Performance, Design, and Restrictions]
Performance in the past achieved by:
• clock speed
• execution optimization
• cache

Performance now achieved by:
• hyperthreading
• multicore
• cache
The Status-Quo
• We moved from single core to multicore to manycore:
– for technological reasons
• The free lunch is over for software folks
– Software will no longer become faster with every new generation of processors
• Not enough experience in parallel programming
– Parallel programs used to be restricted to a few elite applications, so very few programmers wrote them
– Now we need parallel programs for many different applications
Old School vs. New School
• Old: Increasing clock frequency is the primary method of performance improvement.
New: Processor parallelism is the primary method of performance improvement.
• Old: Don't bother parallelizing an application; just wait and run it on a much faster sequential computer.
New: Nobody is building one processor per chip. This marks the end of the La-Z-Boy programming era.
• Old: Less than linear scaling for a multiprocessor is a failure.
New: Given the switch to parallel hardware, even sub-linear speedups are beneficial as long as you beat the sequential version.

Slide source: The Landscape of Parallel Computing Research: A View from Berkeley
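The "new school" criterion can be made concrete with the usual definitions: speedup is sequential time divided by parallel time, and efficiency is speedup divided by the number of cores p (1.0 means linear scaling). The sketch below uses hypothetical function names; the timings in the usage note are illustrative, not measurements.

```c
/* Speedup and parallel efficiency from wall-clock times.
   t_seq: time of the sequential version; t_par: time on p cores. */
double speedup(double t_seq, double t_par) {
    return t_seq / t_par;
}

/* 1.0 means linear scaling; below 1.0 is sub-linear scaling. */
double efficiency(double t_seq, double t_par, unsigned p) {
    return speedup(t_seq, t_par) / (double)p;
}

/* New-school test: any speedup above 1 beats the sequential run,
   even when efficiency is well below 1. */
int beats_sequential(double t_seq, double t_par) {
    return speedup(t_seq, t_par) > 1.0;
}
```

For a hypothetical program that takes 100 s sequentially and 20 s on 8 cores, the speedup is 5 and the efficiency 0.625: a failure by the old-school standard, but still a clear win over the sequential run.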


• Memory Wall
• ILP Wall
• Power Wall
Memory Speed: Widening of the Processor-DRAM Performance Gap

[Figure: processor vs. DRAM performance over time. Courtesy of Elsevier, Computer Architecture: A Quantitative Approach, Hennessy and Patterson, fourth edition]
Power Density
Moore's law is giving us more transistors than we can afford!

Scaling clock speed (business as usual) will not work.

[Figure: power density (W/cm², log scale from 1 to 10,000) of Intel processors from 1970 to 2010 (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P6), compared with a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface. Source: Patrick Gelsinger, Intel]
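Multicore Processors Save Power

$P = C V^2 F$    Performance $= \text{Cores} \cdot F$

Let's have two cores:

$P = 2C V^2 F$    Performance $= 2 \cdot \text{Cores} \cdot F$

But decrease the frequency by 50%, which (ideally) also allows the supply voltage to be halved:

$P = 2C \cdot \frac{V^2}{4} \cdot \frac{F}{2} = \frac{C V^2 F}{4}$    Performance $= 2 \cdot \text{Cores} \cdot \frac{F}{2} = \text{Cores} \cdot F$

Same performance at one quarter of the power.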
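The same arithmetic can be checked numerically. This sketch encodes the slide's model with all inputs relative to a one-core baseline (C = V = F = 1); it assumes only dynamic power counts, the workload is perfectly parallel, and voltage can be halved along with frequency. The `Estimate` type and `estimate` function are hypothetical names.

```c
/* Relative estimates from the slide's model:
     Power       = cores * C * V^2 * F
     Performance = cores * F
   Assumptions: dynamic power only, perfectly parallel workload, inputs
   relative to a one-core baseline (C = V = F = 1). */
typedef struct {
    double power;
    double performance;
} Estimate;

Estimate estimate(double cores, double cap, double volt, double freq) {
    Estimate e;
    e.power = cores * cap * volt * volt * freq;
    e.performance = cores * freq;
    return e;
}
```

`estimate(2, 1, 1, 1)` doubles both power and performance, while `estimate(2, 1, 0.5, 0.5)` keeps performance at 1.0 and cuts power to 0.25, matching the last line of the derivation.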


A Case for Multicore Processors
• Can exploit different types of parallelism
• Reduces power
• An effective way to hide memory latency
• Simpler cores = easier to design and test = higher yield = lower cost
Cost and Challenges of Parallel Execution
• Communication cost
• Synchronization cost
• Not all problems are amenable to parallelization
• Hard to think in parallel
• Hard to debug
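Synchronization cost shows up even in tiny programs. In this POSIX Threads sketch, several threads increment one shared counter, and every increment is protected by a mutex: the result is correct, but the threads serialize on the lock. Without the mutex, the increments would be a data race and updates could be lost. Thread and iteration counts and all function names are hypothetical choices; compile with `-pthread`. A common way to cut this cost is to give each thread a private partial count and combine the counts once at the end.

```c
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each increment takes and releases the lock. The result is correct,
   but threads serialize on the lock: this is synchronization cost. */
static void *locked_increment(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the worker threads, waits for all of them, and returns the final
   count (NUM_THREADS * INCREMENTS_PER_THREAD if every thread started). */
long run_locked_counter(void) {
    pthread_t threads[NUM_THREADS];
    int started = 0;
    shared_counter = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        if (pthread_create(&threads[t], NULL, locked_increment, NULL) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);
    return shared_counter;
}
```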
Attempts to Make Multicore Programming Easy
• 1st idea: The right computer language would make parallel programming straightforward.
– Result so far: Some languages made parallel programming easier, but none has made it as fast, efficient, and flexible as traditional sequential programming.
Attempts to Make Multicore Programming Easy
• 2nd idea: If you just design the hardware properly, parallel programming would become easy.
– Result so far: no one has yet succeeded!
Attempts to Make Multicore Programming Easy
• 3rd idea: Write software that will automatically parallelize existing sequential programs.
– Result so far: Success here is inversely proportional to the number of cores!
Qualcomm Snapdragon SoC
• Hardware:
– Krait processor (4-core ARM)
– Adreno (128-core GPU)
• Applications:
Nvidia Tegra SoC
• Hardware:
– 4-core ARM A57
– 4-core ARM A53
– 256-core GPU
• Applications:
– Robotics, CV, Imaging
– E.g., BMW Driver Assistance
Tilera Tile-Gx SoC
• Hardware:
– 100-core ARM
• Applications:
– Network, Cloud, Security
– E.g., 10 Gbps Layer-7 network application classification
– E.g., 40 Gbps lossless network packet capture
Google Data Center, USA
Parallel Languages/Libraries
• Pthreads
• MPI
• CUDA
• OpenCL
• OpenMP
• OpenACC
• and many more!
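To give a feel for one of the libraries listed, here is a minimal OpenMP sketch of an array sum. The `parallel for` directive with `reduction(+:sum)` asks the compiler to split the loop iterations across threads and combine per-thread partial sums at the end. The function name is hypothetical. With GCC, OpenMP is enabled by `-fopenmp`; without that flag the pragma is ignored and the loop runs sequentially with the same result.

```c
#include <stddef.h>

/* Sums an array. Under OpenMP, iterations are divided among threads and
   each thread keeps a private partial sum; the reduction clause adds the
   partial sums together when the loop finishes. */
long parallel_sum(const int *data, size_t n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```

Because integer addition is associative, the parallel and sequential versions return identical results; the reduction clause avoids the per-update locking that a shared accumulator would otherwise need.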
Conclusions
• The free lunch is over.
• Multicore/manycore processors are here to stay, so we have to deal with them.
• Knowing about the hardware will make you way more efficient in software!
