Stefan Marr, Daniele Bonetta
2016
Seminar on
Parallel and Concurrent Programming
Agenda
1. Modus Operandi
2. Introduction to
Concurrent Programming Models
3. Seminar Paper Overview
MODUS OPERANDI
Tasks and Deadlines
• Talk on selected paper (student 1)
– 30min with slides (+ 15min discussion)
• to be discussed with us 1 week before
– Summary (max. 500 words)
• 2 days before seminar, 11:59am
• Questions on assigned paper (student 2)
– Min. 5 questions
– 2 days before seminar, 11:59am
Report
Category 1: Theoretical treatment
• Focus on paper, related work, state of the art
of the field
• Detailed discussion
Category 2: Practical treatment of the topic, for
instance:
• Reproduce experiments/results
• Extend experiments
• Experiment with variations
Report
• paper summary (500 words)
• outline, content, and experiments to be
discussed with us
• Cat. 1: ca. 4000 words (excl. references)
– state of the art, context in field, and specific
technique from paper
• Cat. 2: ca. 2000 words (excl. references)
– Discuss experiments, insights gained,
limitations found, etc.
Deadline: Feb. 6th
Consultations
• For alternative paper proposals
• To prepare the presentation!
• To agree on focus of report/experiments
– Mandatory for experiments
Grading
• Required attendance: 80% of all meetings
• 50% slides, presentation, and discussion
• 50% write-up/experiments
Timeline
Oct. 5th Introduction to Concurrent
Programming Models
Oct. 10th Deadline: List of ranked papers
Oct. 12th Runtime Techniques for Big Data
and Parallelism
Week 3-5 Preparations and Consultations
Week 6-12 Presentations
Feb. 6th Deadline for Report
Got Background in
Concurrency/Parallelism?
Show of Hands!
Multicore is the Norm
• 8 cores: 200-euro phones
• 24 cores: workstations
• ≥72 cores: embedded systems
Problem: Power Wall at ca. 5 GHz
CPUs don’t get Faster But Multiply
[Chart: Intel CPU clock frequency, 1990–2015: ~0.2 GHz (1990), 1.5 GHz (~1995), 3.8 GHz (~2005), then flat at 3.33–3.8 GHz while single cores give way to 4, 6, 12, … cores]
Based on the Clock Frequency of Intel Processors
Power ≈ Voltage² × Frequency
Voltage −15%, Frequency −15% ⇒ Power ≈ 1, Performance ≈ 1.8
(Two slightly slower cores consume roughly the power of one fast core, but together deliver almost twice the performance.)
Problem: Memory Wall
[Chart, log scale 1–10,000, 1980–2005: CPU frequency improves far faster than DRAM speeds, opening an ever-widening relative performance gap]
Source: Sun World Wide Analyst Conference Feb. 25, 2003
Multicore Transition
Work around physical limitations
Power Wall and Memory Wall
[Diagram: multiple cores, each with local memory, attached to shared main memory]
For a brief bit of history:
ENIAC’s recessive gene
Mitch Marcus and Atsushi Akera. Penn Printout (March 1996)
http://www.upenn.edu/computing/printout/archive/v12/4/pdf/gene.pdf
ENIAC's main control panel, U. S. Army Photo
Decades of Research
and Solutions for Everything
…
But no Silver Bullet:
CSP, Locks & Monitors, Fork/Join, Transactional Memory, Data Flow, Actors
A Rough Categorization
• Threads and Locks
• Coordinating Threads
• Communicating Isolates
• Data Parallelism
Marr, S. (2013), 'Supporting Concurrency Abstractions in High-level Language Virtual Machines', PhD thesis, Software Languages Lab, Vrije Universiteit Brussel.
THREADS AND LOCKS
Powerful but hard
Uniform Shared Memory
A model for the machines we used to have (C/C++)
Threads
• Sequences of instructions
• Unit of scheduling
– Preemptive and concurrent
– Or parallel
[Diagram: threads interleaved over time]
A Snake Game
• Multiple players
• Compete for ‘apples’
• Shared board
Race Conditions and Data Races
Race Condition
• Result depends on the timing of operations
Data Race
• A race condition on memory
• Synchronization absent or incomplete
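To make this concrete, here is a minimal Java sketch (ours, not from the slides): two threads increment a shared counter without synchronization. The increment is a read-modify-write, so updates interleave and get lost, and the final result depends on timing.

public class DataRaceDemo {
    static int count = 0;  // shared, unsynchronized state

    public static void main(String[] args) throws InterruptedException {
        Runnable increment = () -> {
            for (int i = 0; i < 100_000; i++) {
                count++;  // read-modify-write: three steps that can interleave
            }
        };
        Thread t1 = new Thread(increment);
        Thread t2 = new Thread(increment);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Almost always prints less than 200000: updates were lost.
        System.out.println("count = " + count);
    }
}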
Locks
synchronized (board) {
  board.moveLeft(snake);
}
Optimized Locking for more Parallelism
synchronized (board[3][3]) {
  synchronized (board[3][2]) {
    board.moveLeft(snake);
  }
}
Strategy: Lock only cells you need to update
What could go wrong?
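One answer: deadlock. If one player locks cell (3,3) and then (3,2) while another locks (3,2) and then (3,3), both can block forever. A hedged sketch of the classic remedy, acquiring locks in one global order (the helper and its names are illustrative, not from the slides):

// Hypothetical helper: locks two board cells in a single global order
// (by coordinates), so two threads that need the same pair of cells
// cannot deadlock by locking them in opposite orders.
static void withCellsLocked(Object[][] board, int x1, int y1,
                            int x2, int y2, Runnable move) {
    boolean firstIsLower = x1 < x2 || (x1 == x2 && y1 <= y2);
    Object first  = firstIsLower ? board[x1][y1] : board[x2][y2];
    Object second = firstIsLower ? board[x2][y2] : board[x1][y1];
    synchronized (first) {
        synchronized (second) {
            move.run();  // e.g. board.moveLeft(snake), protected by both cells
        }
    }
}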
Common Issues
• Lack of Progress
– Deadlock
– Livelock
• Race Condition
– Data race
– Atomicity violation
• Performance
– Sequential bottlenecks
– False sharing
Basic Concepts
Shared Memory with Threads and Locks
• Threads
• Synchronization
• No safety guarantees
– Data Races
– Deadlocks
P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al.
P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis
P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. Bond et al.
P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al.
Questions?
COORDINATING THREADS
Making Coordination Explicit
Communicating Threads
Shared Memory with Explicit Coordination
• Raising the abstraction level
• Libraries available for most languages
Two Main Variants
• Temporal isolation: Transactional Memory
• Explicit communication: channel- or message-based
Transactional Memory
atomic {
  board.moveLeft(snake)
}
Coordinated by
Runtime System
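The atomic block is pseudocode; plain Java has no built-in STM. As a rough sketch of the optimistic mechanism such a runtime implements, the following keeps the board in an immutable snapshot and commits updates with compare-and-set, retrying on conflict. Board, Snake, and withMoveLeft are illustrative stand-ins, not a real API.

import java.util.concurrent.atomic.AtomicReference;

// Illustrative immutable types; withMoveLeft returns an updated snapshot.
record Snake(int x, int y) {}
record Board(java.util.List<Snake> snakes) {
    Board withMoveLeft(Snake s) {
        return new Board(snakes);  // placeholder for the real pure update
    }
}

class TxBoard {
    private final AtomicReference<Board> state =
        new AtomicReference<>(new Board(java.util.List.of()));

    // "Transaction": read a snapshot, compute a new one, commit with
    // compare-and-set; on conflict (someone committed first), retry.
    void moveLeft(Snake snake) {
        while (true) {
            Board current = state.get();
            Board updated = current.withMoveLeft(snake);
            if (state.compareAndSet(current, updated)) return;  // committed
        }
    }
}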
Transactional Memory
Simple Programming Model
• No Data Races (within transactions)
• No Deadlocks
Issues
• Performance overhead
• Still experimental
• Livelocks
• Inter-transactional race conditions
• I/O semantics
Some Issues
atomic {
  dataArray = getData();
  fork { compute(dataArray[0]); }
  compute(dataArray[1]);
}
P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al.
P1.1 Transactional Data Structure Libraries, A. Spiegelman et al.
P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al.
What happens to the forked thread when the transaction aborts?
Channel-based Communication
Player Thread:
coordChannel ! (#moveLeft, snake)

Coordinator Thread:
for i in players():
  msg ? coordChannels[i]
  match msg:
    (#moveLeft, snake):
      board[…,…] = …

[Diagram: player threads send over channels; the coordinator thread receives and updates the board]
High-level communication
but no safety guarantees
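The same exchange in plain Java, as a hedged sketch that uses a BlockingQueue as the channel; the Move type is made up, and put/take stand in for the slides' ! and ? operators.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ChannelDemo {
    // Illustrative message type standing in for the (#moveLeft, snake) tuple.
    record Move(String kind, int snakeId) {}

    public static void main(String[] args) {
        BlockingQueue<Move> coordChannel = new LinkedBlockingQueue<>();

        // Player thread: "coordChannel ! (#moveLeft, snake)"
        Thread player = new Thread(() -> {
            try {
                coordChannel.put(new Move("moveLeft", 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Coordinator thread: "msg ? coordChannel" plus the match.
        Thread coordinator = new Thread(() -> {
            try {
                Move msg = coordChannel.take();
                if (msg.kind().equals("moveLeft")) {
                    System.out.println("moving snake " + msg.snakeId() + " left");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        player.start();
        coordinator.start();
    }
}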
Coordinating Threads
Transactional Memory
• Transactions
• Simple Programming Model
• Practical Issues
Channel/Message Communication
• Explicit coordination
– Channels or message sending
– Higher abstraction level
• No safety guarantees
P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al. (conc-model, ECOOP'13)
P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)
Questions?
COMMUNICATING ISOLATES
Communication is Everything
Explicit Communication Only
Absence of Low-level Data Races
All Interactions Explicit
[Diagram: Actor A and Actor B interact only via explicit messages (the actor principle)]
Many, Many Variations
• Channel based
– Communicating Sequential Processes
• Message based
– Actor models
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al. (conc-model, Agere'16)
Communicating Event Loops
[Diagrams: Actor A and Actor B as communicating event loops]
• One message at a time
• Actors contain objects
• Interacting via messages
Message-based Communication
[Diagram: player actors asynchronously send messages to the board actor]
board <- moveLeft(snake)
class Board {
  private array;
  public moveLeft(snake) {
    array[snake.x][snake.y] = ...
  }
}
Main Program:
actors.create(Board)
actors.create(Snake)
actors.create(Snake)
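A minimal sketch of this style in plain Java (not a real actor library): each actor owns its state plus a single-threaded executor as its event loop, so it processes one message at a time and its state is never shared. BoardActor and sendMoveLeft are illustrative names.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Each actor owns its state and a single-threaded executor as its event
// loop: messages are tasks on the queue, processed one at a time, so the
// actor's state is never touched by two threads at once.
class BoardActor {
    private final int[][] cells = new int[10][10];  // actor-private state
    private final ExecutorService eventLoop =
        Executors.newSingleThreadExecutor();

    // Asynchronous send, like "board <- moveLeft(snake)": returns at once.
    void sendMoveLeft(int x, int y) {
        eventLoop.execute(() -> cells[x][y] = 1);  // runs on the actor's thread
    }

    void shutdown() { eventLoop.shutdown(); }
}

public class ActorDemo {
    public static void main(String[] args) {
        BoardActor board = new BoardActor();
        board.sendMoveLeft(3, 2);  // async message; no shared-memory access
        board.shutdown();
    }
}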
Communicating Isolates
Message or Channel Based
• Explicit communication
• No shared memory
• Still potential for
– Behavioral deadlocks
– Livelocks
– Bad message interleavings
– Message protocol violations
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al. (conc-model, Agere'16)
P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)
Questions?
DATA PARALLELISM
Parallelism for Structured Problems
DATA PARALLELISM WITH FORK/JOIN
Just one Example
Fork/Join with Work-Stealing
• Recursive
divide-and-conquer
• Automatic and efficient
parallel scheduling
• Widely available for C++,
Java, and .NET
Blumofe, R. D.; Joerg, C. F.; Kuszmaul, B. C.; Leiserson, C. E.; Randall, K. H. & Zhou, Y. (1995),
'Cilk: An Efficient Multithreaded Runtime System', SIGPLAN Not. 30 (8), 207-216.
Typical Applications
• Recursive Algorithms¹
– Mergesort
– List and tree traversals
• Parallel prefix, pack, and
sorting problems²
• Irregular and unbalanced
computation
– On directed acyclic graphs
(DAGs)
– Ideally tree-shaped
1) More material can be found at: http://homes.cs.washington.edu/~djg/teachingMaterials/spac/
2) Prefix Sums and Their Applications: http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf
Tiny Example: Summing a large Array
• Simple array with numbers
• Recursively divide
– Every split is a parallel fork
• Then do the additions
– Every merge is a join
Note: This example is academic, and could be better expressed with a parallel map/reduce
library, such as Scala’s Parallel Collections, Java 8 Streams, or Microsoft’s PLINQ.
[Diagram: the array is split recursively in half; forked tasks sum their halves, and each join combines two partial sums until the total remains]
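The summing example as a runnable Java sketch on the JDK's fork/join framework (cf. P1.5, Lea's A Java Fork/Join Framework): below a threshold a task sums its range sequentially; above it, it forks the left half and joins the partial sums.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final int[] array;
    private final int lo, hi;

    SumTask(int[] array, int lo, int hi) {
        this.array = array; this.lo = lo; this.hi = hi;
    }

    @Override protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // small enough: sum directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += array[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(array, lo, mid);
        left.fork();                          // parallel fork: left half
        long right = new SumTask(array, mid, hi).compute();
        return left.join() + right;           // join: combine partial sums
    }
}

public class ForkJoinSum {
    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 1);
        long total = ForkJoinPool.commonPool()
                                 .invoke(new SumTask(data, 0, data.length));
        System.out.println("sum = " + total);  // 1000000
    }
}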
Data Parallelism with Fork/Join
• Parallel programming
technique
• Recursive divide-and-conquer
• Automatic and efficient
load-balancing
P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)
CONCLUSION: CONCURRENCY MODELS
Four Rough Categories
• Threads and Locks
• Coordinating Threads
• Communicating Isolates
• Data Parallelism
SEMINAR PAPERS
These are Suggestions
Please feel free to propose papers that interest you.
(Papers need to be approved by us.)
Topics of Interest
• High-level language
concurrency models
– Actors, Communicating
Sequential Processes,
STM, Stream Processing,
...
• Tooling
– Debugging
– Profiling
• Implementation and
runtime systems
– Communication
mechanisms
– Data/object
representation
– System-level aspects
• Big Data Frameworks
– Programming models
– Runtime level problems
Papers without Artifacts
P1.1 Transactional Data Structure Libraries, A. Spiegelman et al. (conc-model, PLDI'16)
P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al. (conc-model, runtime, EuroSys'16)
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al. (conc-model, Agere'16)
P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al. (conc-model, ECOOP'13)
P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)
P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)
P1.7 Pydron: Semi-Automatic Parallelization for Multi-Core and the Cloud, S. C. Müller et al. (conc-model, runtime, OSDI'14)
P1.8 Fast Splittable Pseudorandom Number Generators, G. L. Steele et al. (runtime, OOPSLA'14)
P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al. (runtime, EuroSys'16)
P1.10 Application-Assisted Live Migration of Virtual Machines with Java Applications, K.-Y. Hou et al. (runtime, EuroSys'15)
P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)
Papers with Artifacts
P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis (conc-model, PPoPP'16)
P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al. (conc-model, ECOOP'16)
P2.3 StreamJIT: a commensal compiler for high-performance stream programming, J. Bosboom et al. (conc-model, runtime, OOPSLA'14)
P2.4 An Efficient Synchronization Mechanism for Multi-core Systems, M. Aldinucci et al. (conc-model, runtime, EuroPar'12)
P2.5 Parallel parsing made practical, A. Barenghi et al. (runtime, SCP'15)
P2.6 SparkR: Scaling R Programs with Spark, S. Venkataraman et al. (conc-model, bigdata, SIGMOD'16)
P2.7 Spark SQL: Relational Data Processing in Spark, M. Armbrust et al. (bigdata, runtime, SIGMOD'15)
P2.8 Twitter Heron: Stream Processing at Scale, S. Kulkarni et al. (bigdata, SIGMOD'15)
P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. D. Bond et al. (tooling, OOPSLA'13)
P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al. (runtime, OOPSLA'16)


Editor's Notes

  • #2 Talk: 18min + 5min questions
  • #12 Multicore is everywhere. Only single-processor systems are shown here; workstations usually have two processors, servers even more. Embedded systems already use manycore processors. Whatever notebook or computer you buy today, it is multicore.
  • #13 GHz == consumed power == produced heat. Cooling becomes too complex; there is no way to put such chips into portable devices.
  • #14 So why manycore? Unfortunately, CPUs are not becoming faster anymore; clock rates peaked around 2005, and some CPUs are actually slower now (simplifying). Notes: show graph 1990, 2000, 2005, 2010 (GHz, core counts, red line = power wall). Data points: 1989, Intel486 DX: 50/33/25 MHz; November 1, 1995, Pentium Pro: 200/180/166/150 MHz; November 20, 2000, Pentium 4: 1.50/1.40 GHz; February 2005, Pentium 4 Extreme Edition with HT Technology: 3.80 GHz (570); Core i7-980X Extreme Edition: 3.33 GHz (boost to 3.6 GHz).
  • #15 Decreasing the clock a bit and putting another core on the chip keeps power consumption stable. The theoretical speedup is 1.8x, but each core has lower sequential performance.
  • #19 ENIAC
  • #27 AI players can consume as much CPU as they like; the presentation can be done on a different core.
  • #52 - Efficient load balancing
  • #59 Good fit for tree recursion and irregular computational complexity.