PARALLEL
PROCESSING/PROGRAMMING
CATALIN BOJA
CATALIN.BOJA@IE.ASE.RO
BUCHAREST UNIVERSITY OF ECONOMIC STUDIES
DEPARTMENT OF ECONOMIC INFORMATICS & CYBERNETICS
MULTI-PROCESS VS MULTI-THREAD
Sources:
https://computing.llnl.gov/tutorials/pthreads/
http://www.javamex.com/tutorials/threads/how_threads_work.shtml
MULTI-PROCESS VS MULTI-THREAD
Process:
• A process is an instance of a computer program that is being executed. It contains the program code and its current activity
• Different processes do not share resources
• Each process has its own address space
• A process contains all the information needed to execute the program (ID, program code, data, global data, heap data)

Thread:
• A thread of execution is the smallest unit of processing that can be scheduled by an operating system
• A thread is contained inside a process. Multiple threads can exist within the same process and share resources such as memory. The threads of a process share the latter's instructions (code) and its context (values that its variables reference at any given moment)

http://en.wikipedia.org/wiki/Process_(computing)
MULTI-PROCESS VS MULTI-THREAD
• The thread model is an extension of the process model.
• Each process consists of multiple independent instruction streams (or threads) that are assigned computer resources by some scheduling procedure.
• Threads of a process share the address space of this process. Global variables and all dynamically allocated data objects are accessible by all threads of a process.
• Each thread has its own run-time stack, registers and program counter.
• Threads can communicate by reading/writing variables in the common address space.
• Thread operations include thread creation, termination, synchronization (joins, blocking), scheduling, data management and process interaction.
• A thread does not maintain a list of created threads, nor does it know the thread that created it.
• Threads in the same process share: process instructions, most data, open files (descriptors), signals and signal handlers, current working directory, user and group id
• Each thread has a unique: thread ID, set of registers, stack pointer, stack for local variables, return addresses, signal mask, priority, return value (errno)
• pthread functions return 0 if OK.
THREADS CONCEPTS
• Thread synchronization – establish running conditions between threads
• A thread will wait until another one has reached a predefined point or it has released a
required resource
• A thread will transmit data to another one when the latter is able to receive it
• The main thread will continue the execution when all other threads finish their tasks
• Thread communication – sending and receiving data between threads
THREADS CONCEPTS
• Atomic operations – operations which are executed at once by a thread, and not partially; the thread cannot execute only parts of them; it's all or nothing
• Race condition – 2 or more threads have simultaneous read/write access to the same resource (object, variable, file) and the operation is not atomic
• Critical section – a code sequence, executed by multiple threads at the same time, which accesses a shared resource
• Deadlock – 2 or more threads blocking each other, as each of them holds a lock on a resource needed by the others
THREADS CONCEPTS
• Mutual exclusion – preventing 2 or more threads from accessing a critical section at the same time (mutex); problem identified by Edsger W. Dijkstra in 1965
• Lock – a mechanism that allows a thread to block other threads' access while it is executing the critical section
• Mutex – a Mutual Exclusion Object that implements a locking mechanism
• Semaphore – a signaling mechanism that allows threads to communicate (a mutex is a binary semaphore)
MULTI-THREADS IN C++
• C++11 introduced multi-threading support based on the <thread> library and the thread class
• Different from the POSIX pthread library (which is a C library)
• Provides a simplified syntax that integrates support for mutexes, condition variables and locks
MULTI-THREADS IN C++
#include <iostream>
#include <thread>

void thread_function() {
    std::cout << "Hello World!";
}

int main() {
    //start a thread
    std::thread t1(thread_function);
    //join the thread with main
    t1.join();
}

• Threads are created based on existing functions
• Constructing a std::thread object automatically starts the new thread
• The new thread is joined with the main one by calling join()
MULTI-THREADS IN C++ - SHARING DATA
#include <thread>

static const int ITERATIONS = 1000;
static int SUM = 0;

//both threads will share SUM
static void increment(int iterations, int& s) {
    for (int j = 0; j < iterations; j++)
        for (int i = 0; i < iterations; i++)
            s += 1;
}

int main() {
    std::thread t1(increment, ITERATIONS, std::ref(SUM));
    std::thread t2(increment, ITERATIONS, std::ref(SUM));
    t1.join();
    t2.join();
    //SUM is usually NOT 2 * ITERATIONS * ITERATIONS – a race condition
}
MULTI-THREADS IN C++ - PROTECTING SHARED
DATA
• Semaphores – mutex is a binary semaphore
• Atomic references
• Monitors – implemented in Java using the synchronization mechanism
• Condition variables
• Compare-and-swap – checks if a memory variable has the same value as a given one and, if true, modifies it
MULTI-THREADS IN C++ - MUTEX
class Counter {
private:
    int counter = 0;
    std::mutex mutex;
public:
    void increment() {
        mutex.lock();
        counter += 1;
        mutex.unlock();
    }
};

• A mutex is implemented by the std::mutex class defined in the <mutex> library
• A lock is acquired by calling the lock() method
• The lock is released by calling unlock()
MULTI-THREADS IN C++ - MUTEX AND RAII
class RAIICounter {
private:
    Counter counter;
    std::mutex mutex;
public:
    void increment() {
        mutex.lock();
        this->counter.increment();
        mutex.unlock();
    }
};

• Use the RAII - "resource acquisition is initialization" technique
• http://www.stroustrup.com/bs_faq2.html#finally
• Lock the call to the function – you avoid locking and unlocking the mutex on all execution paths (even exceptions)
MULTI-THREADS IN C++ - MUTEX AND RAII
class Counter {
private:
    int counter = 0;
    std::mutex mutex;
public:
    void safe_increment()
    {
        std::lock_guard<std::mutex> lock(mutex);
        counter += 1;
        // mutex is automatically released when lock
        // goes out of scope
    }
};

• lock_guard is a mutex wrapper that provides a convenient RAII-style mechanism for owning a mutex for the duration of a scoped block
• The mutex is automatically released when the lock goes out of scope
• You avoid locking and unlocking the mutex on all execution paths (even exceptions)
MULTI-THREADS IN C++ - ADVANCED MUTEXES
• Recursive Locking: std::recursive_mutex
• enables the same thread to lock the same mutex twice without deadlocking
• Timed Locking: std::timed_mutex, std::recursive_timed_mutex
• enable a thread to do something else while waiting for a mutex to become available
• Call once: std::call_once(std::once_flag flag, function);
• the function will be called only one time, no matter how many threads are launched. Each std::call_once is matched to a std::once_flag variable.
MULTI-THREADS IN C++ - ATOMIC
class AtomicCounter {
private:
    std::atomic<int> counter{0};
public:
    void increment() {
        counter += 1;
    }
    int getCounter() {
        return counter.load();
    }
};

• C++11 introduced atomic types in the <atomic> library
• std::atomic<type> is a template class
• operations on that variable are atomic and therefore thread-safe
• Different locking techniques are applied depending on the data type and size
• lock-free technique: integral types like int, long, float. It is much faster than the mutex technique
• mutex technique: for big types (such as 2 MB of storage). There is no performance advantage of atomic types over mutexes
MULTI-THREADS IN C++ - PROBLEMS
• Race conditions
• Shared variables
• Load balancing
• Cache coherence
• Cache false sharing
• Deadlocks
• Synchronization overhead
OPENMP – PROGRAMMING MODEL
Shared memory, thread-based parallelism
OpenMP is based on the existence of multiple threads in the shared memory programming
paradigm.
A shared memory process consists of multiple threads.
Explicit Parallelism
Programmer has full control over parallelization. OpenMP is not an automatic parallel
programming model.
Compiler directive based
Most OpenMP parallelism is specified through the use of compiler directives which are
embedded in the source code.
OPENMP ARCHITECTURE
OPENMP – WHAT IS NOT
OpenMP is not:
• Necessarily implemented identically by all vendors
• Meant for distributed-memory parallel systems (it is designed for shared-address-space machines)
• Guaranteed to make the most efficient use of shared memory
• Required to check for data dependencies, data conflicts, race conditions, or deadlocks
• Required to check for code sequences
• Meant to cover compiler-generated automatic parallelization and directives to the compiler to assist such parallelization
• Designed to guarantee that input or output to the same file is synchronous when executed in parallel
OPENMP
An OpenMP program begins as a single process: the master thread (shown in the figures in red/grey). The master thread executes sequentially until the first parallel region construct is encountered.
When a parallel region is encountered, the master thread:
• Creates a group of threads (FORK)
• Becomes the master of this group of threads and is assigned thread id 0 within the group
The statements in the program that are enclosed by the parallel region construct are then executed in parallel among these threads.
JOIN: when the threads complete executing the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread.
OPENMP
I/O
OpenMP does not specify parallel I/O.
It is up to the programmer to ensure that I/O is conducted correctly within the context of a multi-threaded
program.
Memory Model
Threads can “cache” their data and are not required to maintain exact consistency with real memory all of
the time.
When it is critical that all threads view a shared variable identically, the programmer is responsible for ensuring that the variable is updated by all threads as needed.
OPENMP – HELLO WORLD
#include <stdlib.h>
#include <stdio.h>
#include "omp.h"

int main()
{
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        printf("Hello (%d)\n", ID);
        printf(" world (%d)\n", ID);
    }
}

Set # of threads for OpenMP:
- In csh: setenv OMP_NUM_THREADS 8
- In bash: export OMP_NUM_THREADS=8

Compile: g++ -fopenmp hello.c
Run: ./a.out
OPENMP
#include "omp.h"

int main()
{
    int var1, var2, var3;
    // 1. Serial code
    ...
    // 2. Beginning of parallel section.
    // Fork a team of threads. Specify variable scoping
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        // 3. Parallel section executed by all threads
        ...
        // 4. All threads join master thread and disband
    }
    // 5. Resume serial code
}
OPENMP - C/C++ DIRECTIVE FORMAT
• C/C++ use compiler directives
• Prefix: #pragma omp …
• A directive consists of a directive name followed by clauses
#pragma omp parallel [clause list]
• General Rules:
• Case sensitive
• Only one directive-name may be specified per directive
• Each directive applies to at most one succeeding statement, which must be a structured block.
• Long directive lines can be “continued” on succeeding lines by escaping the newline character with a
backslash “\” at the end of a directive line.
OPENMP - C/C++ DIRECTIVE FORMAT
#pragma omp parallel [clause list]
Typical clauses in [clause list]
• Conditional parallelization
• if (scalar expression): Determine whether the parallel construct creates threads
• Degree of concurrency
• num_threads (integer expression): number of threads to create
• Data Scoping
• private (variable list): Specifies variables local to each thread
• firstprivate (variable list): Similar to private, but each private copy is initialized with the variable's value before the parallel directive
• shared (variable list): Specifies variables that are shared among all the threads
• default (data scoping specifier): the default data scoping specifier, which may be shared or none
OPENMP - C/C++ DIRECTIVE FORMAT
int is_parallel = 1;
int var_c = 10;
int var_b = 0;
int var_a;
//….
#pragma omp parallel if (is_parallel == 1) num_threads(8) \
    shared (var_b) private (var_a) firstprivate (var_c) \
    default (none)
{
    var_a = 0; //must be initialized
    /* structured block */
}

if (is_parallel == 1) num_threads(8)
• If the value of the variable is_parallel is one, then 8 threads are created

shared (var_b)
• All threads share a single copy of variable var_b

private (var_a) firstprivate (var_c)
• Each thread gets private copies of variables var_a and var_c
• Each private copy of var_c is initialized with the value of var_c in the main thread when the parallel directive is encountered
• var_a must be initialized inside the parallel block

default (none)
• The default state of a variable is specified as none (rather than shared)
• Signals an error if not all variables are specified as shared or private
OPENMP – NUMBER OF THREADS
The number of threads in a parallel region is determined by the following factors, in order
of precedence:
1. Evaluation of the if clause
2. Setting of the num_threads() clause
3. Use of the omp_set_num_threads() library function
4. Setting of the OMP_NUM_THREADS environment variable
5. Implementation default – usually the number of cores on a node
Threads are numbered from 0 (master thread) to N-1
OPENMP – RUNTIME LIBRARY METHODS
• omp_get_max_threads() – returns the maximum number of threads that can be used if no num_threads clause is encountered; calling omp_set_num_threads() will change this value
• omp_get_num_threads() – returns number of threads currently executing the
parallel region from which it is called
• omp_get_thread_num() – returns the ID of the current thread executing the
parallel region from which it is called (master is always 0)
• omp_get_wtime() – returns a value in seconds of the time elapsed from some
arbitrary, but consistent point
OPENMP – EXAMPLE
• Requests 4 threads to run the parallel region
• Each thread runs a copy of the { } block
• tId – a private variable that gets the ID of each running thread
• nThreads – a variable shared by all threads that is initialized by the master thread (id = 0) to the number of available threads
OPENMP – PERFORMANCE PROBLEMS
• Race conditions – generated by shared variables
• Load balancing – generated by a non-equal distribution of the processing
effort among threads
• Cache coherence – generated by shared variables
• Cache false sharing – generated by variables that share the same cache line
• Synchronization overhead – generated by locking operations
OPENMP – SYNCHRONIZATION
(figure: Barrier – threads 1-4 wait at the barrier until the last one reaches it, then all continue)
(figure: Locking critical sections – threads take turns executing the critical section while the others wait)
OPENMP – SYNCHRONIZATION
• #pragma omp barrier
• all threads need to reach it in order to continue
• #pragma omp critical
• only one of the threads at a time can execute the critical section
• implements a mutual exclusion lock - mutex
• #pragma omp atomic
• only one of the threads at a time can execute the atomic operation
• uses special hardware constructs that speed up the memory update for simple operations: +=, -=, *=, /=, ++ (postfix and prefix), -- (postfix and prefix)
• dependent on hardware; if not available, it will behave like critical
OPENMP - WORK SHARING CONSTRUCTS
A parallel construct by itself creates a "Single Program Multiple Data" (SPMD) program, i.e., each thread executes the same code.

Work-sharing splits up pathways through the code between threads within a team:
• Loop construct (for/do): concurrent loop iterations
• Sections/section constructs: concurrent tasks
• Single construct
• Tasks
OPENMP - WORK SHARING CONSTRUCTS
• Work-sharing directives allow concurrency between iterations or tasks
• A work-sharing construct must be enclosed dynamically within a parallel
region in order for the directive to execute in parallel
• Work-sharing constructs do not create new threads
• Work-sharing constructs must be encountered by all members of a team or
none at all
OPENMP - WORK SHARING CONSTRUCTS
Work-Sharing do/for Directives:
• Share iterations of a loop across the group
• Represent "data parallelism"
• The for directive partitions parallel iterations across threads in C/C++
• do is the analogous directive in Fortran
• Implicit barrier at the end of the for loop

#pragma omp for [clause list]
/* for loop */

int main() {
    int nthreads, tid;
    omp_set_num_threads(3);

    #pragma omp parallel private(tid)
    {
        int i;
        tid = omp_get_thread_num();
        printf("Hello world from (%d)\n", tid);
        #pragma omp for
        for(i = 0; i <= 4; i++)
        {
            printf("Iteration %d by %d\n", i, tid);
        }
    } // all threads join master thread and terminate
}
OPENMP - WORK SHARING CONSTRUCTS
Sequential code to add two vectors:

for(int i = 0; i < N; i++) {
    c[i] = b[i] + a[i];
}

Parallel logic to add two vectors: see the work-sharing versions on the next slide.
OPENMP - WORK SHARING CONSTRUCTS
//OpenMP implementation 1 (not desired):
#pragma omp parallel
{
    int id, i, Nthrds, istart, iend;
    id = omp_get_thread_num();
    Nthrds = omp_get_num_threads();
    istart = id*N/Nthrds;
    iend = (id+1)*N/Nthrds;
    if(id == Nthrds-1) iend = N;
    for(i = istart; i < iend; i++) {
        c[i] = b[i]+a[i];
    }
}

//A worksharing for construct to add vectors:
#pragma omp parallel
{
    #pragma omp for
    for(i=0; i<N; i++) { c[i]=b[i]+a[i]; }
}

//A combined parallel for construct to add vectors:
#pragma omp parallel for
for(i=0; i<N; i++) { c[i]=b[i]+a[i]; }
OPENMP - WORK SHARING CONSTRUCTS
for directive syntax:

#pragma omp for [clause list]
    schedule (type [,chunk])
    ordered
    private (variable list)
    firstprivate (variable list)
    shared (variable list)
    reduction (operator: variable list)
    collapse (n)
    nowait
/* for_loop */

Restrictions for the for loop that follows the omp for directive:
• It must NOT have a break statement
• The loop control variable must be an integer
• The initialization expression of the for loop must be an integer assignment
• The logical expression must be one of <, ≤, >, ≥
• The increment expression must have integer increments or decrements only
OPENMP – DATA ENVIRONMENT
Changing Storage Attributes
• shared – the variable is shared by all threads (watch out for race conditions)
• private – the variable has a private copy in each thread
• firstprivate – the variable has a private copy in each thread, initialized with the value the variable had before the parallel region
• lastprivate – the variable has a private copy in each thread; after the parallel region, the variable keeps the value from the sequentially last iteration or section
• default(private|shared|none) – makes private or shared the default scope for all variables; none forces the programmer to declare each used variable as private or shared
OPENMP – DATA ENVIRONMENT
void something(){
    int temp = 0;
    #pragma omp parallel for private(temp)
    for(int i=0;i<100;i++)
        temp+=i;
    cout << endl << "temp =" << temp;
}

• private does NOT initialize the value
• reading a private variable before assigning it yields an undefined value
• to retain the global value, it must be declared firstprivate
• outside of the parallel loop, private variables lose their value
OPENMP - WORK SHARING CONSTRUCTS
How to combine values into a single accumulation variable (avg)?

//Sequential code to compute the average value of an array-vector:
{
    double avg = 0.0, A[MAX];
    int i;
    …
    for(i = 0; i < MAX; i++) {
        avg += A[i];
    }
    avg /= MAX;
}
OPENMP - WORK SHARING CONSTRUCTS
reduction clause
• reduction (operator: variable list)
• combines local copies of a variable in different threads into a single copy in the master when threads exit
• variables in variable list are implicitly private to threads
• operators used in the reduction clause: +, *, -, &, |, ^, &&, and ||

int VB = 0;

#pragma omp parallel reduction(+: VB) num_threads(4)
{
    /* compute local VBs in each thread */
}

/* VB here contains the sum of all local instances of VB */
OPENMP - WORK SHARING CONSTRUCTS
reduction clause
• Used inside a parallel or a work-sharing construct:
• A local copy of each list variable is made and initialized depending on the operator (e.g. 0 for "+")
• The compiler finds standard reduction expressions containing the operator and uses them to update the local copy
• Local copies are reduced into a single value and combined with the original global value when control returns to the master thread

Reduction Operators/Initial Values in C/C++ OpenMP:
Operator   Initial Value
+          0
*          1
-          0
&          ~0
|          0
^          0
&&         1
||         0
OPENMP - WORK SHARING CONSTRUCTS
//A work-sharing for construct to average the values of a vector:
{
    double avg = 0.0, A[MAX];
    int i;
    …
    #pragma omp parallel for reduction (+: avg)
    for(i = 0; i < MAX; i++) { avg += A[i]; }
    avg /= MAX;
}

• avg – is a local variable in each thread of the parallel region
• after the for loop, the external avg variable becomes the sum of the local avg variables
OPENMP - MATRIX-VECTOR MULTIPLICATION
#pragma omp parallel for default (none) \
    shared (a, b, c, m, n) private (i, j, sum) \
    num_threads(4)
for(i = 0; i < m; i++)
{
sum = 0.0;
for(j=0; j < n; j++)
sum += b[i][j]*c[j];
a[i] =sum;
}
OPENMP - WORK SHARING CONSTRUCTS
for schedule clause
• Describes how iterations of the loop are divided among the threads in the group. The default schedule is implementation dependent.

schedule (scheduling_class[, parameter])

Scheduling classes: static, dynamic, guided, runtime, auto
OPENMP - WORK SHARING CONSTRUCTS
• schedule (static [, chunk]) - Loop iterations are divided into pieces of size chunk and then statically assigned to threads. If chunk is not specified, the iterations are evenly (if possible) divided contiguously among the threads.
• schedule (dynamic [, chunk]) - Loop iterations are divided into pieces of size chunk and then dynamically assigned to threads. When a thread finishes one chunk, it is dynamically assigned another. The default chunk size is 1.
• schedule (guided [, chunk]) - For a chunk size of 1, the size of each chunk is proportional to the number of unassigned iterations divided by the number of threads, decreasing to 1. For a chunk size of k (k > 1), the size of each chunk is determined in the same way, with the restriction that the chunks do not contain fewer than k iterations (except for the last chunk to be assigned, which may have fewer than k iterations). The default chunk size is 1.
OPENMP - WORK SHARING CONSTRUCTS
• schedule (runtime) - The scheduling decision is deferred until runtime by the
environment variable OMP_SCHEDULE. It is illegal to specify a chunk size for
this clause
• schedule (auto) - The scheduling decision is made by the compiler and/or
runtime system (not supported in OpenMP 2.0 – VS)
OPENMP - WORK SHARING CONSTRUCTS
Static Dynamic
• Predictable • Unpredictable
• Pre-determined at compile time by • Determined at run-time
the programmer • Complex logic at run-time which
• Reduce overhead at run-time leads to an overhead
OPENMP - WORK SHARING CONSTRUCTS
Steps to parallelize loops:
1. Find loop-intensive routines
2. Remove loop-carried dependencies
3. Implement a work-sharing construct
OPENMP - MATRIX-VECTOR MULTIPLICATION
// Static schedule maps iterations to threads at compile time
// static scheduling of matrix multiplication loops
#pragma omp parallel default (none) \
    shared (a, b, c, dim) private (i, j, k) \
    num_threads(4)
#pragma omp for schedule(static)
for(i=0; i < dim; i++)
{
    for(j=0; j < dim; j++)
    {
        c[i][j] = 0.0;
        for(k=0; k < dim; k++)
            c[i][j] += a[i][k]*b[k][j];
    }
}

(figure: Static scheduling - 16 iterations, 4 threads)
OPENMP - WORK SHARING CONSTRUCTS
All work-sharing constructs have an implicit barrier at their end, inside the parallel construct
OPENMP - WORK SHARING CONSTRUCTS
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 10; i++) {
        //...work sharing
    }//implicit barrier

    #pragma omp for nowait
    for (int i = 0; i < 10; i++) {
        //...work sharing
    }//no implicit barrier
}

nowait for clause
• A for construct has a default barrier at the end
• The nowait clause instructs the threads to continue the execution and not wait for the others
OPENMP - WORK SHARING CONSTRUCTS
#pragma omp parallel
{
    //parallel section for all threads
    #pragma omp master
    {
        //section only for master thread
    }
}

master construct
• defines a sequence that will be executed only by the master thread (id = 0)
OPENMP - WORK SHARING CONSTRUCTS
#pragma omp parallel
{
    //parallel section for all threads
    #pragma omp single
    {
        //only one thread will execute it
    }
}

single construct
• identifies a section of code that is executed by a single thread
• which thread executes the single section depends on the environment
• using copyprivate(variable), the value is transmitted to the other threads
OPENMP – SECTIONS CONSTRUCT
#pragma omp parallel sections
{
    #pragma omp section
    {
        //only one thread
    }

    #pragma omp section
    {
        //only one thread
    }
}

sections construct
• the sections construct contains one or more #pragma omp section areas
• each section is executed by only one thread
• which thread gets which section depends on the environment
• if there are more threads than available sections, the others wait
OPENMP – LOCKS
• omp_init_lock() – inits/creates a mutex lock using an omp_lock_t type variable
• omp_set_lock() – sets the lock
• omp_unset_lock() – removes the lock
• omp_destroy_lock() – destroys the lock
• omp_test_lock() – tests if the lock is already set; if it is, the thread can do other processing instead of blocking
OPENMP – RUNTIME LIBRARY METHODS
• omp_in_parallel() – returns true if called from within a parallel region
• omp_set_dynamic() – enables/disables dynamic adjustment, at run time, of the number of threads available in the next parallel region
• omp_get_dynamic() – returns non-zero if the number of threads available in the next parallel region can be adjusted by the runtime
• omp_get_num_procs() – returns the number of processors available
OPENMP – RUNTIME ROUTINES
int nThreads;
omp_set_dynamic(0);
omp_set_num_threads(omp_get_num_procs());
#pragma omp parallel
{
    int id = omp_get_thread_num();
    #pragma omp single
    nThreads = omp_get_num_threads();

    //parallel section
}

• disable dynamic thread allocation
• request a number of threads equal to the available processors
• get each thread's id
• get the total number of threads only once
OPENMP – ENVIRONMENT VARIABLES
• OMP_NUM_THREADS – a default number of threads to use
• OMP_STACKSIZE – defines the stack size for each thread
• OMP_WAIT_POLICY – defines the default wait policy (ACTIVE | PASSIVE)
• OMP_PROC_BIND – binds threads to cores (TRUE | FALSE); for FALSE (the default), depending on load, the OS can move threads to a different core with a lower load
OPENMP - TASKS
• introduced in OMP 3.0 (not supported by VS)
• allows parallelization of recursive/dependent routines
PARALLEL PROGRAMMING PROBLEMS
• Min, Max and other statistical data
• Sorting
• Vector and matrix processing
• Searching (Map – Reduce)