CS 133 Parallel & Distributed Computing
Course Instructor: Adam Kaplan (kaplan@cs.ucla.edu)
Lecture #1: 4/2/2012
Vanishing from your Desktops: The Uniprocessor
Uniprocessor
Single processor plus associated cache(s) on a chip
Traditional computer until 2003
Supports the traditional sequential programming model
Where can you still find them?
[Figure: a uniprocessor (single CPU with its cache). From Herlihy & Shavit, Art of Multiprocessor Programming]
Traditional Server: Shared Memory Multiprocessor (SMP)
Multi-chip computer systems
High-performance computing
Servers
Supercomputers
[Figure: SMP organization (CPU + cache chips connected by a bus to shared memory)]
Each processor chip had a CPU and cache
Multiple chips connected by a bus to a shared main memory
From Herlihy & Shavit, Art of Multiprocessor Programming
Your New Server or Desktop: The Chip Multiprocessor (CMP)
All on the same chip
[Figure: CMP organization (multiple CPUs and caches sharing an on-chip bus and memory)]
Sun T2000 Niagara
From Herlihy & Shavit, Art of Multiprocessor Programming
How did this happen?
Moore's Law: every 18 months, the number of transistors on a chip doubles
Until the early 2000s, single-processor performance got better all the time
The same sequential code would automatically get faster on new hardware
Computer marketing was all about the MHz/GHz
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)
Application performance was increasing by 52% per year, as measured by the widely used SPECint benchmark suite
Gains came partly from transistor density and partly from architecture changes, e.g., Instruction-Level Parallelism (ILP)
VAX: 25%/year, 1978 to 1986
RISC + x86: 52%/year, 1986 to 2002
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006
1990s: How to make a faster processor
Increase the clock speed (frequency scaling)
Deeper pipelines: more, shorter stages
BUT eventually chips get too hot
Speculative Superscalar (SS)
Multiple instructions can execute at a time (instruction-level parallelism, ILP)
Hardware finds independent instructions in a sequential program that can execute simultaneously
Hardware predicts which branches will be taken
Executes instructions from a likely execution path before it is known whether the path will be taken
BUT eventually diminishing returns
Nice feature: programmers did not need to know or care about any of this
Chip density grows by 2x every 18 mos
Clock speed does not
2000s: How to make a faster processor
Diminishing returns seen by speculative superscalar
Only so much ILP to exploit
Use additional transistors to create more/simpler processors on chip
BUT can application software keep them busy?
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)
How can simpler processors help?
Potential performance the same
Source: Intel
What is Parallel Computing?
Parallel computing
Using multiple processors in parallel to solve problems more quickly than with a single processor
How to realize speedup
Divide a single task into subtasks
Execute these subtasks simultaneously on multiple processors (see the sketch below)
What can go wrong for the programmer?
The task does not divide easily or evenly into subtasks
Subtasks need to take turns accessing shared resources
Heisenbugs: different runs produce different results
And so on
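To make the divide-into-subtasks idea concrete, here is a minimal POSIX-threads sketch (my illustration, not course code) that sums an array by giving each thread one contiguous chunk; the array size, thread count, and contents are arbitrary choices.

/* Minimal pthreads sketch: sum an array by dividing it into
 * one contiguous chunk per thread. Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static double data[N];

struct chunk { int lo, hi; double partial; };

static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    c->partial = 0.0;
    for (int i = c->lo; i < c->hi; i++)
        c->partial += data[i];                 /* each thread touches only its own chunk */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    struct chunk chunks[NTHREADS];

    for (int i = 0; i < N; i++) data[i] = 1.0; /* expected sum: N */

    for (int t = 0; t < NTHREADS; t++) {       /* divide the task into subtasks */
        chunks[t].lo = t * (N / NTHREADS);
        chunks[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, sum_chunk, &chunks[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {       /* wait, then combine partial results */
        pthread_join(tid[t], NULL);
        total += chunks[t].partial;
    }
    printf("sum = %f\n", total);
    return 0;
}

Because each thread writes only its own partial sum, and the partials are combined only after pthread_join, the sketch avoids the kind of shared-update race that produces the Heisenbugs listed above.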
Parallel Computing: Not Always Popular?
[Figure: percentage of multiprocessor papers in ISCA, 1973 to 2007 (y-axis 0 to 100%)]
Source: Mark Hill, 2/2007
Why Parallel Computing Waned?
We have been using parallel computing for decades
Mostly used in computational science and engineering
Problems too large to solve on one computer? Use 100s or 1000s
Many companies in the 80s/90s gambled on parallel computing and lost
Computers got faster too quickly
Parallel platforms quickly became obsolete as they were outperformed by better uniprocessors
Why bother with parallel programming?
Just wait a year or two for a faster uniprocessor
Why Parallel Computing Thrives Again
"We are dedicating all of our future product development to multicore designs. This is a sea change in computing."
Paul Otellini, President, Intel (2005)
The entire computing industry has bet on parallelism
There is now a desperate need for parallel programmers
Parallelism must be exposed to and managed by software
Unfortunately, most programmers have been trained to think sequentially about software
Multicore Products
All microprocessor companies switch to MP (2X CPUs / 2 yrs)
Manufacturer/Year   Processors/chip   Threads/Processor   Threads/chip
AMD/05              2                 1                   2
Intel/06            2                 2                   4
IBM/04              2                 2                   4
Sun/07              8                 16                  128
And at the same time:
The STI Cell processor (PS3) has 1 main core + 8 helper cores
The latest NVIDIA Graphics Processing Unit (GPU), the GTX 680, has 1,536 small cores
Intel has demonstrated an 80-core research chip
Looking Ahead
All major players are producing multicore chips
Every machine will soon be a parallel machine
Will all programmers be parallel programmers?!
New software model: hide the cost of new features; first speed up the code
Will all programmers be performance programmers?!
Some overhead may eventually be hidden in libraries, compilers, and higher-level languages
But a lot of work is needed to get there
Big open questions:
What will be the killer apps for multicore machines?
How should the chips be designed and programmed?
Why writing (fast) parallel programs is hard
Finding enough parallelism (Amdahl's Law)
Granularity
Locality
Load balance
Coordination and synchronization
Performance modeling
All of these make parallel programming harder than sequential programming.
Finding Enough Parallelism
Suppose only part of an application can be parallelized
Amdahl's Law:
Let s be the fraction of work done sequentially, so (1 - s) is the fraction that is parallelizable
P = number of processors
Speedup(P) = Time(1)/Time(P) = 1 / (s + (1 - s)/P) ~= 1/s as P approaches infinity
Even if the parallel part speeds up perfectly, performance is limited by the sequential part
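As a quick sanity check on the formula, here is a tiny sketch (the sequential fraction s = 0.1 is just an example value) that evaluates Speedup(P) = 1 / (s + (1 - s)/P) for a few processor counts.

/* Amdahl's Law: Speedup(P) = 1 / (s + (1 - s) / P).
 * s is the sequential fraction; 0.1 is an arbitrary example. */
#include <stdio.h>

int main(void) {
    const double s = 0.1;
    const int procs[] = {1, 2, 4, 16, 64, 1024};
    const int nprocs = sizeof(procs) / sizeof(procs[0]);

    for (int i = 0; i < nprocs; i++) {
        double speedup = 1.0 / (s + (1.0 - s) / procs[i]);
        printf("P = %4d   speedup = %5.2f\n", procs[i], speedup);
    }
    printf("P -> infinity: speedup -> %.2f (= 1/s)\n", 1.0 / s);
    return 0;
}

With only 10% of the work sequential, even 1024 processors give a speedup just under 10, the 1/s ceiling.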
Overhead of Parallelism
Given enough parallel work, this is the biggest barrier to getting the desired speedup
Parallelism overheads include:
cost of starting a thread or process
cost of communicating shared data
cost of synchronizing
extra (redundant) computation
Each can be in the range of milliseconds on some systems
Tradeoff: Algorithm needs sufficiently large units of work to run fast in parallel (i.e. large granularity), but not so large that there is not enough parallel work
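To put a rough number on the first overhead in that list, here is a small sketch (my own measurement harness, not from the lecture) that times creating and joining threads that do no work; the trial count is arbitrary and the result varies widely across systems.

/* Rough estimate of thread start/join overhead: create and join
 * NTRIALS do-nothing threads and report the average cost per thread.
 * Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTRIALS 1000

static void *noop(void *arg) { return arg; }

int main(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < NTRIALS; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, noop, NULL);
        pthread_join(tid, NULL);
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("average create+join cost: %.1f microseconds per thread\n", us / NTRIALS);
    return 0;
}

Any subtask that does less work than this per-thread cost is too fine-grained to be worth spawning a thread for.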
Locality and Parallelism
[Figure: conventional storage hierarchy (per-processor caches, L2 and L3 caches, memories, and potential interconnects)]
Large memories are slow; fast memories are small
Storage hierarchies are large and fast on average
Parallel processors, collectively, have large, fast caches
The slow accesses to remote data are what we call communication
Algorithm should do most work on local data
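As a generic illustration of working on local data (not tied to the figure above), the sketch below traverses the same matrix twice: the row-order loop walks memory sequentially and stays in cache, while the column-order loop strides across whole rows and misses far more often. The matrix size is an arbitrary choice.

/* Locality sketch: C stores 2-D arrays row-major, so the row-order loop
 * has good spatial locality while the column-order loop does not. */
#include <stdio.h>

#define N 2048
static double a[N][N];

int main(void) {
    double sum_row = 0.0, sum_col = 0.0;

    for (int i = 0; i < N; i++)          /* row-major order: cache-friendly */
        for (int j = 0; j < N; j++)
            sum_row += a[i][j];

    for (int j = 0; j < N; j++)          /* column order: strided, cache-hostile */
        for (int i = 0; i < N; i++)
            sum_col += a[i][j];

    printf("%f %f\n", sum_row, sum_col); /* same answer, very different speed */
    return 0;
}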
Load Imbalance
Load imbalance is the time that some processors in the system are idle due to:
insufficient parallelism (during that phase)
unequal-size tasks, e.g.:
adapting to interesting parts of a domain
tree-structured computations
fundamentally unstructured problems
Parallel algorithm/platform needs to balance load
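One common remedy for unequal-size tasks is to hand work out dynamically rather than in fixed blocks. The OpenMP sketch below (my example, with a deliberately lopsided per-iteration cost) shows the idea; compile with an OpenMP-capable compiler, e.g., gcc -fopenmp.

/* Load-balancing sketch: iterations do very different amounts of work,
 * so schedule(dynamic) gives iterations to threads as they become free
 * instead of assigning each thread a fixed, possibly unlucky, block. */
#include <stdio.h>

#define NTASKS 64

static double work(int i) {              /* artificial unequal-size task */
    double x = 0.0;
    for (long k = 0; k < (long)i * 1000000L; k++)
        x += 1e-9;
    return x;
}

int main(void) {
    double total = 0.0;

    #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
    for (int i = 0; i < NTASKS; i++)
        total += work(i);

    printf("total = %f\n", total);
    return 0;
}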
What makes parallel programming easier?
Standardized parallel programming platforms
OpenMP
Message Passing Interface (MPI)
POSIX Threads (pthreads)
Java thread model
Compute Unified Device Architecture (CUDA)
Why do they help?
Longer life-cycle for parallel programs
Code works across platforms
Automatic scaling?
(A minimal OpenMP example follows below.)
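For a taste of why these platforms help, here is a minimal OpenMP sketch (my example, not course code): a single pragma parallelizes the loop, and the same source still compiles and runs sequentially on a compiler without OpenMP support, since the pragma is then ignored.

/* Portability sketch: one OpenMP pragma parallelizes the loop.
 * Compile with gcc -fopenmp for the parallel version; without the flag
 * the pragma is ignored and the program runs sequentially. */
#include <stdio.h>

#define N 1000000

static double x[N], y[N], z[N];

int main(void) {
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 2.0 * i; }

    #pragma omp parallel for
    for (int i = 0; i < N; i++)          /* iterations are independent: safe to split */
        z[i] = x[i] + y[i];

    printf("z[N-1] = %f\n", z[N - 1]);
    return 0;
}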
New Adventures In Parallel Computing
Internet can be seen as a large parallel/distributed computing environment
The cloud
A set of computers on the internet available on demand, like a public utility
Google's MapReduce
A software framework enabling the computing of large data sets on clusters of computers
Can map a parallel algorithm to worker nodes in the cloud
Reduce results from worker nodes to a single output/answer
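The map/reduce pattern itself is small enough to sketch in plain C. The toy below is purely illustrative (it is not Google's framework or API) and runs its "workers" one after another: the input is split into chunks, a function is mapped over each chunk, and the per-chunk partial results are reduced to a single answer. In a real deployment each chunk would go to a different worker node.

/* Toy illustration of the map/reduce pattern (not Google's MapReduce):
 * split the input into chunks, map a function over each chunk to get a
 * partial result, then reduce the partials into one final answer. */
#include <stdio.h>

#define N       1000
#define NCHUNKS 4

static long map_chunk(const int *in, int len) {   /* map + local combine */
    long partial = 0;
    for (int i = 0; i < len; i++)
        partial += (long)in[i] * in[i];           /* map: square each element */
    return partial;
}

int main(void) {
    int data[N];
    long partials[NCHUNKS];

    for (int i = 0; i < N; i++) data[i] = 1;      /* toy input */

    int chunk = N / NCHUNKS;
    for (int c = 0; c < NCHUNKS; c++)             /* one "worker" per chunk */
        partials[c] = map_chunk(&data[c * chunk],
                                (c == NCHUNKS - 1) ? N - c * chunk : chunk);

    long result = 0;
    for (int c = 0; c < NCHUNKS; c++)             /* reduce: combine partial results */
        result += partials[c];

    printf("result = %ld\n", result);             /* 1000 for this toy input */
    return 0;
}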