PDC Lecture 7

OpenMP in a nutshell

OpenMP is a library for parallel programming in the SMP (symmetric multi-processors, or shared-
memory processors) model. When programming with OpenMP, all threads share memory and data.
OpenMP supports C, C++ and Fortran. The OpenMP functions are included in a header file called omp.h.

OpenMP program structure: An OpenMP program has sections that are sequential and sections that are
parallel. In general an OpenMP program starts with a sequential section in which it sets up the
environment, initializes the variables, and so on.

When run, an OpenMP program will use one thread (in the sequential sections), and several threads (in
the parallel sections).

There is one thread that runs from the beginning to the end, and it's called the master thread. The parallel
sections of the program will cause additional threads to fork. These are called the slave threads.

A section of code that is to be executed in parallel is marked by a special directive (omp pragma). When
the execution reaches a parallel section (marked by an omp pragma), this directive causes slave threads to
fork. Each thread executes the parallel section of the code independently. When a thread finishes, it joins
the master. When all threads finish, the master continues with the code following the parallel section.

Each thread has an ID attached to it that can be obtained using a runtime library function
(called omp_get_thread_num()). The ID of the master thread is 0.

Why OpenMP? Writing more efficient, lower-level parallel code by hand is possible; however, OpenMP hides the
low-level details and allows the programmer to describe the parallel code with high-level constructs,
which is about as simple as it gets.

OpenMP has directives that allow the programmer to:

• specify the parallel region


• specify whether the variables in the parallel section are private or shared
• specify how/if the threads are synchronized
• specify how to parallelize loops
• specify how the work is divided between threads (scheduling)

Compiling and running OpenMP code

The OpenMP functions are included in a header file called omp.h. The public Linux machines dover and
foxcroft have gcc/g++ installed with OpenMP support. All you need to do is use the -fopenmp flag on the
command line:

gcc -fopenmp hellosmp.c -o hellosmp

It’s also pretty easy to get OpenMP to work on a Mac. A quick search with Google reveals that the
native Apple compiler, clang, is installed without OpenMP support. If you installed gcc, it probably also got
installed without OpenMP support. To test, go to the terminal and try to compile something:

gcc -fopenmp hellosmp.c -o hellosmp


If you get an error message saying that “omp.h” is unknown, that means your compiler does not
have OpenMP support:

hellosmp.c:12:10: fatal error: 'omp.h' file not found

#include <omp.h>

1 error generated.

make: *** [hellosmp.o] Error 1

Here’s what I did:

1. I installed Homebrew, the missing package manager for MacOS, http://brew.sh/index.html

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

2. Then I asked brew to install gcc:

brew install gcc

3. Then type ‘gcc’ and press tab; it will complete with all the versions of gcc installed:

$gcc

gcc gcc-6 gcc-ar-6 gcc-nm-6 gcc-ranlib-6 gccmakedep

4. The obvious guess here is that gcc-6 is the latest version, so I use it to compile:

gcc-6 -fopenmp hellosmp.c

Works!

Specifying the parallel region (creating threads)

The basic directive is:

#pragma omp parallel
{
   //block of code to be executed in parallel
}

When the master thread reaches this line, it forks additional threads to carry out the work enclosed in the
block following the #pragma construct. The block is executed by all threads in parallel. The original
thread will be denoted as master thread with thread-id 0.
Example (C program): Display "Hello, world." using multiple threads.

#include <stdio.h>

int main(void)
{
   #pragma omp parallel
   printf("Hello, world.\n");

   return 0;
}

Use flag -fopenmp to compile using gcc:

$ gcc -fopenmp hello.c -o hello

Output on a computer with two cores, and thus two threads:

Hello, world.

Hello, world.

On dover, I got 24 hellos, for 24 threads. On my desktop I get (only) 8. How many do you get?

Note that the threads are all writing to the standard output, and there is a race to share it. The way the
threads are interleaved is completely arbitrary, and you can get garbled output:

Hello, wHello, woorld.

rld.

Private and shared variables

In a parallel section variables can be private or shared:

• private: the variable is private to each thread, which means each thread will have its own local
copy. A private variable is not initialized and the value is not maintained for use outside the
parallel region. By default, the loop iteration counters in the OpenMP loop constructs are private.
• shared: the variable is shared, which means it is visible to and accessible by all threads
simultaneously. By default, all variables in the work sharing region are shared except the loop
iteration counter. Shared variables must be used with care because they cause race
conditions.

The type of the variable, private or shared, is specified following the #pragma omp:

Example:

#include <stdio.h>
#include <omp.h>

int main (int argc, char *argv[]) {

   int th_id, nthreads;

   #pragma omp parallel private(th_id)
   {
      //th_id is declared above. It is specified as private, so each
      //thread will have its own copy of th_id
      th_id = omp_get_thread_num();
      printf("Hello World from thread %d\n", th_id);
   }
   return 0;
}

Private or shared? Sometimes your algorithm will require sharing variables, other times it will require
private variables. The caveat with sharing is race conditions. The task of thinking through the details
of a parallel algorithm and specifying the type of each variable falls, of course, on the programmer.
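As a minimal sketch of why this matters (the file and variable names are just for illustration), the program below lets every thread increment a shared counter with no synchronization; the read-modify-write updates race with each other and the final value is usually wrong:

//race.c: a data race on a shared variable (illustration only)
#include <stdio.h>
#include <omp.h>

int main(void) {
   int counter = 0;                 //shared by all threads by default

   #pragma omp parallel
   {
      for (int i = 0; i < 100000; i++)
         counter++;                 //unsynchronized read-modify-write: updates get lost
   }

   //with T threads the correct result would be T*100000,
   //but the printed value is typically smaller
   printf("counter=%d\n", counter);
   return 0;
}

The critical and atomic directives described in the next section are the standard way to protect such updates.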

Synchronization

OpenMP lets you specify how to synchronize the threads. Here’s what’s available:

• critical: the enclosed code block will be executed by only one thread at a time, and not
simultaneously executed by multiple threads. It is often used to protect shared data from race
conditions.
• atomic: the memory update (write, or read-modify-write) in the next instruction will be performed
atomically. It does not make the entire statement atomic; only the memory update is atomic. A
compiler might use special hardware instructions for better performance than when using critical.
• ordered: the structured block is executed in the order in which iterations would be executed in a
sequential loop
• barrier: each thread waits until all of the other threads of a team have reached this point. A work-
sharing construct has an implicit barrier synchronization at the end.
• nowait: specifies that threads completing assigned work can proceed without waiting for all
threads in the team to finish. In the absence of this clause, threads encounter a barrier
synchronization at the end of the work-sharing construct (a small sketch of nowait follows this list).
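As a small sketch of nowait (the two loops and array names are made up for illustration): because the loops are independent, a thread that finishes its share of the first loop can start on the second one without waiting at the implicit barrier.

//nowait.c: skip the implicit barrier between two independent loops
#include <stdio.h>
#include <omp.h>
#define N 1000

int main(void) {
   int a[N], b[N];

   #pragma omp parallel
   {
      //no barrier at the end of this loop
      #pragma omp for nowait
      for (int i = 0; i < N; i++)
         a[i] = 2 * i;

      //threads arrive here as soon as their share of the first loop is done
      #pragma omp for
      for (int i = 0; i < N; i++)
         b[i] = 3 * i;
   }

   printf("a[10]=%d b[10]=%d\n", a[10], b[10]);
   return 0;
}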

More on barriers: If we wanted all threads to be at a specific point in their execution before proceeding,
we would use a barrier. A barrier basically tells each thread, "wait here until all other threads have
reached this point...".

Barrier example:

#include <stdio.h>
#include <omp.h>

int main (int argc, char *argv[]) {

   int th_id, nthreads;

   #pragma omp parallel private(th_id)
   {
      th_id = omp_get_thread_num();
      printf("Hello World from thread %d\n", th_id);

      #pragma omp barrier // <----------- all threads wait here, so the master prints only after everyone arrives

      if ( th_id == 0 ) {
         nthreads = omp_get_num_threads();
         printf("There are %d threads\n", nthreads);
      }
   }
   return 0;
}//main

Note above the function omp_get_num_threads(). Can you guess what it’s doing? Some other runtime
functions are listed below, with a short sketch after the list:

• omp_get_num_threads
• omp_get_num_procs
• omp_set_num_threads
• omp_get_max_threads
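Here is a minimal sketch using these functions (requesting 4 threads is an arbitrary choice for illustration):

//runtime.c: query and set thread counts with the runtime library
#include <stdio.h>
#include <omp.h>

int main(void) {
   printf("processors available: %d\n", omp_get_num_procs());
   printf("max threads:          %d\n", omp_get_max_threads());

   omp_set_num_threads(4);   //request 4 threads for the next parallel region

   #pragma omp parallel
   {
      if (omp_get_thread_num() == 0)   //only the master reports the team size
         printf("threads in team:      %d\n", omp_get_num_threads());
   }
   return 0;
}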
Parallelizing loops

Parallelizing loops with OpenMP is straightforward. One simply denotes the loop to be parallelized and a
few parameters, and OpenMP takes care of the rest. Can't be easier!

The directive is called a work-sharing construct, and must be placed inside a parallel section:

#pragma omp for

//specify a for loop to be parallelized; no curly braces

The “#pragma omp for” directive distributes the loop iterations among the threads. It must be used inside a parallel
block:

#pragma omp parallel
{
   …
   #pragma omp for
   //for loop to parallelize
   …
}//end of parallel block

Example:

//compute the sum of two arrays in parallel

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {

   //static so the three large arrays do not overflow the stack
   static float a[N], b[N], c[N];
   int i;

   /* Initialize arrays a and b */
   for (i = 0; i < N; i++) {
      a[i] = i * 2.0;
      b[i] = i * 3.0;
   }

   /* Compute values of array c = a+b in parallel. */
   #pragma omp parallel shared(a, b, c) private(i)
   {
      #pragma omp for
      for (i = 0; i < N; i++)
         c[i] = a[i] + b[i];
   }

   printf("%f\n", c[10]);
   return 0;
}

Another example: adding all the elements in an array.

//example4.c: add all elements in an array in parallel

#include <stdio.h>

int main() {

   const int N = 100;
   int a[N];

   //initialize
   for (int i = 0; i < N; i++)
      a[i] = i;

   //compute sum
   int local_sum, sum = 0;

   #pragma omp parallel private(local_sum) shared(sum)
   {
      local_sum = 0;

      //the array is distributed statically between threads
      #pragma omp for schedule(static,1)
      for (int i = 0; i < N; i++) {
         local_sum += a[i];
      }

      //each thread calculated its local_sum. All threads have to add to
      //the global sum. It is critical that this operation is atomic.
      #pragma omp critical
      sum += local_sum;
   }

   printf("sum=%d should be %d\n", sum, N*(N-1)/2);
}
There is also a “parallel for” directive, which combines a parallel and a for (no need to nest a for
inside a parallel):

#include <stdio.h>

int main(int argc, char **argv)
{
   int a[100000];

   #pragma omp parallel for
   for (int i = 0; i < 100000; i++) {
      a[i] = 2 * i;
      printf("assigning i=%d\n", i);
   }

   return 0;
}

Exactly how the iterations are assigned to each thread is specified by the schedule (see below).
Note: since the variable i is declared inside the parallel for, each thread will have its own private copy of i.

Loop scheduling

OpenMP lets you control how the loop iterations are scheduled among the threads. The types of schedule available are:

• static: Each thread is assigned a chunk of iterations in a fixed fashion (round robin). The iterations
are divided among the threads equally. Specifying an integer for the parameter chunk allocates
chunk contiguous iterations to a particular thread. (This is typically the default schedule.)
• dynamic: Each thread starts with a chunk of iterations; as each thread completes its
iterations, it gets assigned the next chunk. The parameter chunk defines the number of
contiguous iterations that are allocated to a thread at a time.
• guided: Iterations are divided into pieces that successively decrease exponentially, with chunk
being the smallest size.

This is specified by appending schedule(type, chunk) to the omp for directive:

#pragma omp for schedule(static, 5)
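As a sketch of when the schedule matters (the loop body is artificial, invented only to make later iterations more expensive than earlier ones): with schedule(dynamic, 10) each thread grabs 10 iterations at a time and asks for more when it finishes, so no thread sits idle while others are still working.

//schedule.c: iterations have uneven cost, so hand them out dynamically
#include <stdio.h>
#include <omp.h>
#define N 1000

int main(void) {
   double c[N];

   #pragma omp parallel
   {
      #pragma omp for schedule(dynamic, 10)
      for (int i = 0; i < N; i++) {
         c[i] = 0.0;
         for (int j = 0; j < i; j++)   //later iterations do more work
            c[i] += j * 0.001;
      }
   }

   printf("c[N-1]=%f\n", c[N - 1]);
   return 0;
}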

More complex directives

...which you probably won't need.

• can define “sections” inside a parallel block (see the sketch after this list)


• can request that iterations of a loop are executed in order
• specify a block to be executed only by the master thread
• specify a block to be executed only by the first thread that reaches it
• define a section to be “critical”: will be executed by each thread, but can be executed only
by a single thread at a time. This forces threads to take turns, not interrupt each other.
• define a section to be “atomic”: this forces threads to write to a shared memory location in
a serial manner to avoid race conditions
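A sketch of the “sections” directive mentioned above (task_a and task_b are made-up placeholders): each section is executed exactly once, by some thread of the team, so the two tasks can run concurrently.

//sections.c: two independent tasks run in parallel as sections
#include <stdio.h>
#include <omp.h>

void task_a(void) { printf("task A on thread %d\n", omp_get_thread_num()); }
void task_b(void) { printf("task B on thread %d\n", omp_get_thread_num()); }

int main(void) {
   #pragma omp parallel
   {
      #pragma omp sections
      {
         #pragma omp section
         task_a();

         #pragma omp section
         task_b();
      }
   }
   return 0;
}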

And here is the atomic directive used to update a shared counter:

#include <stdio.h>
#include <omp.h>

int main(void) {

   int count = 0;

   #pragma omp parallel shared(count)
   {
      #pragma omp atomic
      count++; // count is updated by only a single thread at a time
   }

   printf("Number of threads: %d\n", count);
   return 0;
}

Performance considerations

Critical sections and atomic sections serialize the execution and eliminate the concurrent execution of
threads. If used unwisely, OpenMP code can be worse than serial code because of all the thread overhead.
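One way to check whether the parallel version actually pays off is to time it; here is a minimal sketch using omp_get_wtime() (the loop body is a stand-in for real work; compare the time against the same loop with the pragma removed):

//timing.c: measure the parallel loop with omp_get_wtime()
#include <stdio.h>
#include <omp.h>
#define N 10000000

int main(void) {
   static double a[N];        //static so the large array is not on the stack

   double start = omp_get_wtime();

   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      a[i] = i * 0.5;

   double elapsed = omp_get_wtime() - start;
   printf("parallel loop took %f seconds (a[10]=%f)\n", elapsed, a[10]);
   return 0;
}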

Some comments

OpenMP is not magic. A loop must be obviously parallelizable in order for OpenMP to divide its iterations
among the threads. If there are any data dependencies from one iteration
to the next, then OpenMP can't parallelize it.
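For instance, this loop has a loop-carried dependency (assuming an int array a declared earlier): iteration i needs the result of iteration i-1, so the iterations cannot be handed out to different threads:

// BAD - loop-carried dependency, can't parallelize with OpenMP
for (int i = 1; i < 100; i++)
   a[i] = a[i-1] + 1;   // <----- a[i] cannot be computed before a[i-1] is ready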

The for loop cannot exit early, for example:

// BAD - can't parallelize with OpenMP

for (int i = 0; i < 100; i++) {

   if (i > 50) break;   // <----- the loop exits early when i is greater than 50
   ...
}

Values of the loop control expressions must be the same for all iterations of the loop. For example:

// BAD - can't parallelize with OpenMP

for (int i = 0; i < 100; i++) {

   if (i == 50)
      i = 0;   // <----- modifies the loop variable, so the iteration count is not fixed
   ...
}
