Programming shared memory systems:
OpenMP
Various methods to program shared memory systems
Introduction to OpenMP
Programming shared memory systems
Create multiple threads to use multiple cores
o Use a specialized programming language designed for shared memory multi-
thread programming, e.g., Cilk Plus. These are not mainstream yet.
o Use library support for shared memory multi-threading. E.g. Python
multiprocessing shared memory module
o Use a parallelizing compiler
o Use compiler directives that supplement a sequential programming language,
e.g., OpenMP (with C/C++ or Fortran)
o Use processes
o Use threads
Programming shared memory systems
Although there are different ways to program shared memory systems, the
issues involved are similar. Any shared memory program (or parallel
program) needs to address two issues.
o How to create multiple execution threads, mostly the fork-join model.
o How to share data and coordinate – e.g., avoiding race conditions
We will review processes and threads, but focus on OpenMP
o OpenMP is widely supported: GCC supports OpenMP
o OpenMP is relatively easy to understand and use
Use processes
A topic in the UG OS class.
Heavyweight
o Process creation, termination, and coordination are expensive.
UNIX system calls: fork(), exit(), wait().
o Inter-process communication is somewhat awkward because processes are designed
for isolation, not sharing.
Explicit shared memory operations: shmget(), shmat(), and shmctl()
Although processes can be used for shared memory programming, their overhead
means they are rarely used in practice.
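A minimal sketch of this style, assuming a UNIX/Linux system (variable names are illustrative): two processes share one integer through a System V shared memory segment.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

int main() {
    /* create a shared memory segment large enough for one int, and attach it */
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    int *counter = (int *) shmat(shmid, NULL, 0);
    *counter = 0;

    if (fork() == 0) {            /* child process */
        *counter += 1;            /* both processes see the same memory */
        exit(0);
    }
    *counter += 1;                /* parent process */
    wait(NULL);                   /* wait for the child to terminate */

    /* the two unsynchronized increments can race: this may print 1 or 2 */
    printf("counter = %d\n", *counter);

    shmdt(counter);
    shmctl(shmid, IPC_RMID, NULL);   /* remove the segment */
    return 0;
}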
Use threads
A topic in the UG OS class.
A thread is a lightweight process: creation, termination, and communication
are all much cheaper than for a process.
Pthreads: POSIX-compliant threads
o Thread creation, termination, and coordination: pthread_create(), pthread_exit(),
and pthread_join().
o Communication is through shared memory. Pthread routines for coordinating memory
access: pthread_mutex_lock(), pthread_cond_wait(), pthread_cond_signal().
It is very easy to write incorrect programs. Some legacy system routines are still not “thread-safe.”
o Programmer needs to explicitly manage the threads.
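A minimal Pthreads sketch using the routines above (the thread function and variable names are illustrative); compile with gcc -pthread.

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

long sum = 0;                                   /* shared by all threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *work(void *arg) {
    long myid = (long) arg;
    pthread_mutex_lock(&lock);                  /* protect the shared update */
    sum += myid;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main() {
    pthread_t tid[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, work, (void *) i);   /* create (fork) */
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);                        /* join */
    printf("sum = %ld\n", sum);
    return 0;
}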
OpenMP
What does OpenMP stand for?
o Open specifications for Multi Processing via collaborative work between
interested parties from the hardware and software industry, government
and academia.
OpenMP is an Application Program Interface (API) that may be used
to explicitly direct multi-threaded, shared memory parallelism.
o API components: Compiler Directives, Runtime Library Routines, and
Environment Variables
OpenMP is a directive-based method to invoke parallel computations on shared-
memory multiprocessors
OpenMP
OpenMP API is specified for C/C++ and Fortran.
OpenMP is not intrusive to the original serial code: directives
appear as comment statements in Fortran and as pragmas in C/C++.
OpenMP website: http://www.openmp.org
o Materials in this lecture are taken from various OpenMP tutorials on the
website and elsewhere.
Why OpenMP
OpenMP is portable: It is jointly defined and endorsed by a group of
major hardware and software vendors.
o Hoping for an ANSI standard
OpenMP can be implemented incrementally, one function or even
one loop at a time.
o A nice way to get a parallel program from a sequential program.
How to compile and run OpenMP programs?
To compile: use gcc with the flag -fopenmp
o Example: gcc -fopenmp a.c
o Try lect11/helloworld.cpp, lect11/helloworld1.cpp – see how easy it is to write wrong
shared memory program.
To run: ‘a.out’
o To change the number of threads (the default is the number of cores on the machine):
setenv OMP_NUM_THREADS 4 (tcsh) or export OMP_NUM_THREADS=4(bash)
Putting it together, on linprog:
gcc -fopenmp helloworld1.cpp
export OMP_NUM_THREADS=4
./a.out -- run the program with 4 threads
OpenMP basic syntax
Most of the constructs in OpenMP are compiler directives
#pragma omp construct [clause [clause] …]
o Example
#pragma omp parallel num_threads(8)
OpenMP constructs usually apply to a structured block (one
statement).
Function prototypes are in
#include <omp.h>
OpenMP “Hello world” program
#include <omp.h>        // OpenMP include file
#include <iostream>
#include <sstream>
using namespace std;

int main()
{
  stringstream ss[100];
  int nprocs;

  // OpenMP directive: parallel region with the default number of threads
  #pragma omp parallel
  {
    // OpenMP library routines that return the number of threads
    // and the thread ID, respectively
    int total_threads = omp_get_num_threads();
    int myid = omp_get_thread_num();
    if (myid == 0) nprocs = total_threads;
    ss[myid] << "Hello world, I am No." << myid << " out of "
             << total_threads << " Members\n";
  }
  for (int i = 0; i < nprocs; i++) {
    string a; while (ss[i] >> a) cout << a << " "; cout << "\n";
  }
  return 0;
}
OpenMP execution model
OpenMP uses the fork-join model of parallel execution.
o All OpenMP programs begin with a single master thread.
o The master thread executes sequentially until a parallel region is encountered, when it
creates a team of parallel threads (FORK).
o When the team threads complete the parallel region, they synchronize and terminate,
leaving only the master thread that executes sequentially (JOIN).
OpenMP general code structure
#include <omp.h>
main () {
int var1, var2, var3;
Serial code
...
/* Beginning of parallel section. Fork a team of threads. Specify variable scoping*/
#pragma omp parallel private(var1, var2) shared(var3)
{
/* Parallel section executed by all threads */
...
/* All threads join master thread and disband*/
}
Resume serial code
...
}
An example lect11/simple_multithread.cpp
If variables are not specified as shared or private, all variables declared outside the
parallel region are shared by all threads by default (see the sketch at the end of this slide).
What about variables declared inside the parallel block?
What if the ‘nprocs = total_threads;’ statement in the program is moved before the if
statement?
Easy for OpenMP to create threads
o Deterministic multi-threaded program: no race conditions – no two threads access the
same memory location with at least one of them writing to it!
o Non-deterministic multi-threaded programs are mostly incorrect programs.
o Need to be mindful when making system/library routine calls.
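As a small illustration of the scoping rules above (a minimal sketch, not the code in lect11/simple_multithread.cpp): a variable declared outside the parallel region is shared, while a variable declared inside the region is private to each thread.

#include <omp.h>
#include <stdio.h>

int main() {
    int outside = 0;                 /* declared outside: shared by all threads */
    #pragma omp parallel
    {
        int inside = omp_get_thread_num();   /* declared inside: private to each thread */
        #pragma omp critical
        outside += inside;           /* without the critical, updates to the
                                        shared variable would race */
    }
    printf("outside = %d\n", outside);
    return 0;
}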
Data model
• Private and shared variables
• Variables in the global data space are accessed
by all parallel threads (shared variables).
• Variables in a thread’s private space can only be
accessed by the thread (private variables)
• several variations, depending on the initial values and
whether the results are copied outside the region.
• OpenMP allows for specification of private and
shared variable before entering each parallel region.
for (i = 0; i < arraysize; i++) {
  for (privindx = 0; privindx < 16; privindx++) {
    privdbl = ((double) privindx) / 16;
    y[i] = sin(exp(cos(-exp(sin(x[i]))))) + cos(privdbl);
  }
}

Can this loop run in parallel? On what condition?
If one has a choice, should one parallelize the inner or the outer loop?

#pragma omp parallel for private(privindx, privdbl)
for (i = 0; i < arraysize; i++) {
  for (privindx = 0; privindx < 16; privindx++) {
    privdbl = ((double) privindx) / 16;
    y[i] = sin(exp(cos(-exp(sin(x[i]))))) + cos(privdbl);
  }
}

The parallel for loop index variable is private by default.
OpenMP directives
Format:
#pragma omp directive-name [clause, ...] newline
(use ‘\’ for multiple lines)
Example:
#pragma omp parallel default(shared) private(beta,pi)
Scope of a directive is one block of statements { …}
More detail in the specification at http://www.openmp.org/ (OpenMP
5.0 at https://www.openmp.org/spec-html/5.0/openmp.html)
Parallel region construct
A block of code that will be executed by multiple threads.
#pragma omp parallel [clause …]
{
……
} (implied barrier)
Clauses: if([parallel :] expression), num_threads(integer-expression), private(list),
shared(list), default(shared | none), reduction([reduction-modifier,] reduction-
identifier : list), firstprivate(list), copyin(list), proc_bind(master | close | spread),
allocate([allocator :] list)
o private(list): everything private and local (no relation with variables outside the
block).
o shared(list): data accessed by all threads
o default (none| shared)
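A small sketch of a parallel region with explicit scoping clauses (variable names are illustrative; default(none) forces every variable used in the region to be listed):

#include <omp.h>
#include <stdio.h>

int main() {
    int n = 100, tid;
    double total = 0.0;

    #pragma omp parallel default(none) shared(n, total) private(tid) num_threads(4)
    {
        tid = omp_get_thread_num();   /* each thread works on its own copy of tid */
        #pragma omp critical
        total += n * 0.01;            /* all threads update the same shared total */
    }
    printf("total = %g\n", total);    /* 4 threads -> 4.0 */
    return 0;
}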
The reduction clause (see
lect11/reduction.cpp)
sum = 0.0;
#pragma omp parallel for default(none) shared(n, x) private(i) reduction(+ : sum)
for (i = 0; i < n; i++) sum = sum + x[i];

Updating sum must avoid a race condition.
With the reduction clause, OpenMP generates code such that the
race condition is avoided.
o See also lect11/reduction1.cpp and lect11/reduction2.cpp.
Clauses
firstprivate(list): variables are private, but each copy is initialized with the value
the variable had before entering the block (see the sketch below).
copyin(list): The copyin clause provides a mechanism to copy the value of
a threadprivate variable of the master thread to the threadprivate variable
of each other member of the team that is executing the parallel region.
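A minimal sketch of firstprivate (variable names are illustrative):

#include <omp.h>
#include <stdio.h>

int main() {
    int offset = 10;   /* value set before the parallel region */

    /* private(offset) would give each thread an uninitialized copy;
       firstprivate(offset) gives each thread a copy initialized to 10 */
    #pragma omp parallel firstprivate(offset) num_threads(2)
    {
        offset += omp_get_thread_num();
        printf("thread %d: offset = %d\n", omp_get_thread_num(), offset);
    }
    printf("after the region: offset = %d\n", offset);   /* still 10 */
    return 0;
}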
proc_bind(master | close | spread): thread affinity related,
to talk more in the next class (OpenMP for NUMA
architectures)
allocate([allocator :] list): memory allocation related, to talk
more in the next class. (OpenMP for NUMA architectures)
Work-sharing constructs
#pragma omp for [clause …]
#pragma omp sections [clause …]
#pragma omp single [clause …]
These must be enclosed in parallel region
No implied barrier on entry, implied barrier on exit (unless specified otherwise)
“omp for”: loop iterations are distributed over the number of threads.
“omp sections”: each section is assigned to one thread
“omp single”: only one thread runs the block
The omp for directive: example
omp for shares loop iterations among different threads. It can have a schedule clause
schedule (static | dynamic | guided [, chunk])
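A minimal sketch of omp for with a schedule clause (the arrays and sizes are illustrative):

#include <omp.h>
#include <stdio.h>
#define N 16

int main() {
    double x[N], y[N];
    for (int i = 0; i < N; i++) x[i] = i;

    #pragma omp parallel
    {
        /* iterations 0..N-1 are divided among the threads of the team;
           schedule(static, 4) gives each thread fixed chunks of 4 iterations,
           schedule(dynamic, 4) would hand out chunks of 4 on demand */
        #pragma omp for schedule(static, 4)
        for (int i = 0; i < N; i++)
            y[i] = 2.0 * x[i];
    }   /* implied barrier at the end of the for construct */

    printf("y[N-1] = %g\n", y[N-1]);
    return 0;
}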
The omp sections construct – example (see also
lect11/simple_multithread1.cpp)
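The example file is not reproduced here; a minimal sections sketch of the same flavor (the printed text is illustrative):

#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp sections
        {
            #pragma omp section
            printf("section 1 run by thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section 2 run by thread %d\n", omp_get_thread_num());
        }   /* implied barrier: both sections are done here */
    }
    return 0;
}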
Omp single construct
The single construct specifies that the associated structured block is
executed by only one of the threads in the team (not necessarily the
master thread), in the context of its implicit task. The other threads
in the team, which do not execute the block, wait at an implicit
barrier at the end of the single construct unless a nowait clause is
specified.
See lect11/omp_single.cpp
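The example file is not reproduced here; a minimal single sketch:

#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp single
        printf("only thread %d prints this\n", omp_get_thread_num());
        /* the other threads wait at the implicit barrier (no nowait given) */

        printf("thread %d continues after the single\n", omp_get_thread_num());
    }
    return 0;
}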
Some combined (merged) constructs
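For example, parallel and for can be merged into a single parallel for directive (and similarly parallel sections). A minimal sketch, with illustrative arrays:

#include <omp.h>
#define N 100

int main() {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2 * i; }

    /* "parallel for" combines a parallel region and an omp for in one directive */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    return 0;
}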
Synchronization: barrier
Both loops are in a parallel region with no synchronization in between.
What is the problem? The second loop should not start before all of the
first loop finishes.

for (i = 0; i < N; i++)
  a[i] = b[i] + c[i];
for (i = 0; i < N; i++)
  d[i] = a[i] + b[i];

With an explicit barrier between the loops:

for (i = 0; i < N; i++)
  a[i] = b[i] + c[i];
#pragma omp barrier
for (i = 0; i < N; i++)
  d[i] = a[i] + b[i];
Critical section
The loop cannot be parallelized as is if sum is shared:

for (i = 0; i < N; i++) {
  ……
  sum += A[i];
  ……
}

Protecting the update with a critical section (see lect11/reduction1.cpp;
what if we comment out ‘#pragma omp critical’ in the program?):

for (i = 0; i < N; i++) {
  ……
  #pragma omp critical
  {
    sum += A[i];
  }
  ……
}
OpenMP environment variables
OMP_NUM_THREADS
OMP_SCHEDULE
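Typical usage from a bash shell (OMP_SCHEDULE only affects loops declared with schedule(runtime)):

export OMP_NUM_THREADS=8          # run with 8 threads
export OMP_SCHEDULE="dynamic,4"   # schedule used by loops with schedule(runtime)
./a.out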
OpenMP runtime environment
omp_get_num_threads()
omp_get_thread_num()
omp_in_parallel()
Routines related to locks
……
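A minimal sketch using some of these routines (the explicit lock here plays the same role as a critical section):

#include <omp.h>
#include <stdio.h>

int main() {
    omp_lock_t lock;
    omp_init_lock(&lock);
    int count = 0;

    printf("in parallel? %d\n", omp_in_parallel());   /* 0: still sequential */

    #pragma omp parallel
    {
        omp_set_lock(&lock);        /* lock routines give explicit mutual exclusion */
        count++;
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);
    printf("count = %d\n", count);  /* equals the number of threads */
    return 0;
}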
OpenMP examples
lect11/piomp.c
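The file itself is not reproduced here; a typical pi program of this kind integrates 4/(1+x^2) over [0,1] with a parallel for and a reduction, roughly along these lines (the number of intervals is illustrative):

#include <omp.h>
#include <stdio.h>

int main() {
    const long n = 100000000;          /* number of rectangles (illustrative) */
    const double h = 1.0 / n;
    double sum = 0.0;

    /* midpoint rule; the reduction avoids a race on sum */
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; i++) {
        double x = (i + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }

    printf("pi is approximately %.15f\n", sum * h);
    return 0;
}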
Sequential Matrix Multiply
for (i = 0; i < n; i++)
  for (j = 0; j < n; j++) {
    c[i][j] = 0;
    for (k = 0; k < n; k++)
      c[i][j] = c[i][j] + a[i][k] * b[k][j];
  }
OpenMP Matrix Multiply (lect11/mm_omp.c)
#pragma omp parallel for private(j, k)
for (i = 0; i < n; i++)
  for (j = 0; j < n; j++) {
    c[i][j] = 0;
    for (k = 0; k < n; k++)
      c[i][j] = c[i][j] + a[i][k] * b[k][j];
  }
OpenMP Summary
o OpenMP provides a compact, yet powerful programming model for shared memory
programming
It is very easy to use OpenMP to create parallel programs.
o OpenMP preserves the sequential version of the program
o Developing an OpenMP program:
Start from a sequential program
Identify the code segment that takes most of the time.
Determine whether the important loops can be parallelized
The loops may have critical sections, reduction variables, etc
Determine the shared and private variables.
Add directives