Computer History
Eckert and Mauchly
• 1st working electronic computer, the ENIAC (1946)
• 18,000 vacuum tubes
• 1,800 instructions/sec
• 3,000 ft³
Computer History
Maurice Wilkes
• 1st stored-program computer: EDSAC 1 (1949)
• 650 instructions/sec
• 1,400 ft³
http://www.cl.cam.ac.uk/UoCCL/misc/EDSAC99/
Intel 4004 Die Photo
• Introduced in 1971
  – First microprocessor
• 2,250 transistors
• 12 mm²
• 108 kHz
Intel 8086 Die Scan
• 29,000 transistors
• 33 mm²
• 5 MHz
• Introduced in 1978
  – Basic architecture of the IA32 PC
Intel 80486 Die Scan
• 1,200,000 transistors
• 81 mm²
• 25 MHz
• Introduced in 1989
  – 1st pipelined implementation of IA32
Pentium Die Photo
• 3,100,000 transistors
• 296 mm²
• 60 MHz
• Introduced in 1993
  – 1st superscalar implementation of IA32
Pentium III
• 9,500,000 transistors
• 125 mm²
• 450 MHz
• Introduced in 1999
http://www.intel.com/intel/museum/25anniv/hof/hof_main.htm
Pentium 4
• 55,000,000 transistors
• 146 mm²
• 3 GHz
• Introduced in 2000
http://www.chip-architect.com
[Die photos: Pentium 4, Core 2 Duo (Merom), Intel Core i7 (Nehalem), Montecito (Itanium 2), Cell Processor, IBM Power 7, Sun UltraSPARC T3]
First Generation (1970s)
Single Cycle Implementation
Second Generation (1980s)
[Pipeline diagram: F → D → I → E → C]
• Pipelining: temporal parallelism
• Number of stages increases with each generation
• Minimum CPI = 1 (worked example below)
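A quick worked example of why 1 is the floor for CPI on a scalar pipeline (a sketch, assuming an ideal k-stage pipeline with no stalls):

```latex
% N instructions on an ideal k-stage pipeline finish in k + (N - 1) cycles,
% since one instruction completes per cycle once the pipeline fills:
\mathrm{CPI} = \frac{k + (N - 1)}{N} \;\longrightarrow\; 1 \quad (N \to \infty)
% Example: k = 5, N = 1000 gives CPI = 1004/1000 = 1.004.
% Hazards and stalls only add cycles, so CPI >= 1 for a scalar pipeline.
```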
Third Generation (1990s)
[Superscalar pipeline diagram: F → D → I → multiple parallel E units → C]
• ILP: spatial parallelism (see the C sketch below)
• Dynamic: superscalar
  – Out-of-order execution (scheduling)
  – Instruction window
  – Speculative execution (prediction)
• Static: VLIW/EPIC
• IPC, not CPI
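A small C sketch (my illustration, not from the slides) of what ILP looks like at the source level: operations with no dependence on each other, which a superscalar out-of-order core can keep in flight simultaneously.

```c
/* Instruction-level parallelism: the two accumulators below carry no
   dependence on each other, so a superscalar out-of-order core can
   issue their multiply-adds in the same cycle; a single accumulator
   would form one serial dependence chain. */
#include <stdio.h>

static double dot_two_accumulators(const double *a, const double *b, int n) {
    double s0 = 0.0, s1 = 0.0;
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i]     * b[i];      /* independent of the next line...     */
        s1 += a[i + 1] * b[i + 1];  /* ...so both can be in flight at once */
    }
    if (i < n)                      /* odd n: fold in the leftover element */
        s0 += a[i] * b[i];
    return s0 + s1;
}

int main(void) {
    double a[] = {1, 2, 3, 4, 5};
    double b[] = {5, 4, 3, 2, 1};
    printf("dot = %f\n", dot_two_accumulators(a, b, 5));  /* 35.000000 */
    return 0;
}
```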
Fourth Generation (2000s)
[Diagram: two F → D → I → E → C pipelines sharing multiple execution units]
Simultaneous Multithreading (SMT)
(aka Hyper-Threading Technology)
The Famous Moore’s Law
[Diagram: the positive cycle. Hardware improvements from the computer industry enable better software; people get used to the software and ask for more improvements, which the industry then delivers.]
How Did These Advances Happen?
[Diagram: the software community, computer architecture, and process technology interact. The software community contributes wishes and performance demands; process technology contributes capabilities and restrictions; computer architecture turns both into a design.]
Performance in the past was achieved by:
• clock speed
• execution optimization
• cache

Performance now is achieved by:
• hyperthreading
• multicore
• cache
The Status Quo
• We moved from single core to multicore to manycore
  – for technological reasons
• The free lunch is over for software folks
  – Software will no longer get faster with every new generation of processors
• Not enough experience in parallel programming
  – Parallel programs used to be restricted to a few elite applications → very few programmers
  – Now we need parallel programs for many different applications
Old School vs. New School
• Old school: increasing clock frequency is the primary method of performance improvement.
  New school: processor parallelism is the primary method of performance improvement.
• Old school: don't bother parallelizing an application; just wait and run it on a much faster sequential computer.
  New school: nobody is building one processor per chip. This marks the end of the La-Z-Boy programming era.
• Old school: less than linear scaling for a multiprocessor is failure.
  New school: given the switch to parallel hardware, even sub-linear speedups are beneficial as long as you beat the sequential version (worked example below).
Slide source: The Landscape of Parallel Computing Research: A View from Berkeley
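To make the last "new school" point concrete, a worked example with made-up numbers:

```latex
% Speedup on p cores is measured against the best sequential time:
S(p) = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}(p)}
% Example: on p = 8 cores, suppose T_par = T_seq / 3, so S(8) = 3.
% That is well below linear scaling (8x), yet the parallel version
% still beats the sequential one by 3x: a win, not a failure.
```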
• Memory Wall
• ILP Wall
• Power Wall
Memory Speed: Widening of the Processor-DRAM Performance Gap
[Figure: processor vs. DRAM performance over time; the gap widens steadily]
Courtesy of Elsevier: Computer Architecture, Hennessy and Patterson, 4th edition
Power Density
Moore’s law is giving us more transistors than we can afford!
Scaling clock speed (business as usual) will not work
[Figure: power density (W/cm²) vs. year, 1970–2010, log scale from 1 to 10,000. Points for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, and P6 climb from around 1 W/cm² past the "hot plate" level, heading toward "nuclear reactor", "rocket nozzle", and "Sun's surface" levels. Source: Patrick Gelsinger, Intel]
Multicore Processors Save Power
Power = C × V² × F          Performance = Cores × F
Let's have two cores:
Power = 2C × V² × F         Performance = 2 × Cores × F
Now decrease frequency by 50% (voltage can scale down with frequency, so V → V/2):
Power = 2C × (V/2)² × (F/2) = C × V² × F / 4
Performance = 2 × Cores × (F/2) = Cores × F
Same performance at a quarter of the power (see the sketch below).
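A minimal C sketch checking the arithmetic above, in normalized units. The assumption that voltage tracks frequency is the slide's implicit premise, made explicit here:

```c
/* Checks the slide's arithmetic. Assumptions: dynamic power
   P = C * V^2 * F, performance ~ cores * F, and voltage can scale
   down proportionally with frequency. */
#include <stdio.h>

int main(void) {
    double C = 1.0, V = 1.0, F = 1.0;  /* normalized capacitance, voltage, frequency */

    /* Baseline: one core at full voltage and frequency. */
    double power1 = C * V * V * F;
    double perf1  = 1 * F;

    /* Two cores at half frequency; voltage tracks frequency (V/2). */
    double v2 = V / 2.0, f2 = F / 2.0;
    double power2 = 2.0 * C * v2 * v2 * f2;  /* = C * V^2 * F / 4   */
    double perf2  = 2.0 * f2;                /* = F, same as baseline */

    printf("1 core : power = %.2f, performance = %.2f\n", power1, perf1);
    printf("2 cores: power = %.2f, performance = %.2f\n", power2, perf2);
    return 0;
}
```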
A Case for Multicore Processors
• Can exploit different types of parallelism
• Reduces power
• An effective way to hide memory latency
• Simpler cores = easier to design and test = higher yield = lower cost
Cost and Challenges of Parallel Execution
• Communication cost
• Synchronization cost (see the pthreads sketch below)
• Not all problems are amenable to parallelization
• Hard to think in parallel
• Hard to debug
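A minimal pthreads sketch (Pthreads is one of the libraries listed later in the deck) showing where synchronization cost comes from: every update to shared state must take a lock, which serializes the threads at that point.

```c
/* Four threads increment one shared counter. Each increment pays the
   cost of acquiring and releasing a mutex, so the threads serialize
   at the critical section. Compile with: cc demo.c -pthread */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITERS   1000000

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&lock);    /* synchronization cost paid here */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);   /* NTHREADS * NITERS */
    return 0;
}
```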
Attempts to Make Multicore Programming Easy
• 1st idea: the right computer language would make parallel programming straightforward
  – Result so far: some languages have made parallel programming easier, but none has made it as fast, efficient, and flexible as traditional sequential programming
Attempts to Make Multicore Programming Easy
• 2nd idea: if you just design the hardware properly, parallel programming would become easy
  – Result so far: no one has yet succeeded!
Attempts to Make Multicore Programming Easy
• 3rd idea: write software that automatically parallelizes existing sequential programs
  – Result so far: success here is inversely proportional to the number of cores!
Qualcomm Snapdragon SoC
Hardware:
Krait processor (4-core ARM)
Adreno (128-core GPU)
Applications:
Nvidia Tegra SoC
Hardware
4-core ARM A57
4-core ARM A53
256-core GPU
Applications
Robotics, CV, Imaging
E.g., BMW Driver Assistance
Tilera Tile-Gx SoC
Hardware
100-core processor
Applications
Network, Cloud, Security
E.g., 10 Gbps Layer-7 network application classification
E.g., 40 Gbps lossless network packet capture
Google Data Center, USA
Parallel Languages/Libraries
Pthread
MPI
CUDA
OpenCL
OpenMP
OpenACC
and many more!
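As a taste of one of the libraries above, a minimal OpenMP sketch: a loop parallelized with a single pragma. The harmonic-sum workload is just a placeholder of mine.

```c
/* One pragma distributes the loop iterations across cores and combines
   the per-thread partial sums. Compile with: cc demo.c -fopenmp */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);       /* harmonic series, demo workload */

    printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```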
Conclusions
• The free lunch is over.
• Multicore/manycore processors are here to stay, so we have to deal with them.
• Knowing about the hardware will make you way more efficient in software!