OpenMP Application Program Interface
Contents

1.   Introduction
     1.1   Scope
     1.2   Glossary
           1.2.1   Threading Concepts
           1.2.2   OpenMP language terminology
           1.2.3   Tasking Terminology
           1.2.4   Data Terminology
           1.2.5   Implementation Terminology
     1.3   Execution Model
     1.4   Memory Model
           1.4.1   Structure of the OpenMP Memory Model
           1.4.2   The Flush Operation
           1.4.3   OpenMP Memory Consistency
     1.5   OpenMP Compliance
     1.6   Normative References
     1.7   Organization of this document
2.   Directives
     2.1   Directive Format
           2.1.1   Fixed Source Form Directives
           2.1.2   Free Source Form Directives
     2.2   Conditional Compilation
           2.2.1   Fixed Source Form Conditional Compilation Sentinels
           2.2.2   Free Source Form Conditional Compilation Sentinel
     2.3   Internal Control Variables
           2.3.1   ICV Descriptions
           2.3.2   Modifying and Retrieving ICV Values
           2.3.3   How the Per-task ICVs Work
           2.3.4   ICV Override Relationships
     2.4   parallel Construct
           2.4.1   Determining the Number of Threads for a parallel Region
     2.5   Worksharing Constructs
           2.5.1   Loop Construct
                   2.5.1.1   Determining the Schedule of a Worksharing Loop
           2.5.2   sections Construct
           2.5.3   single Construct
           2.5.4   workshare Construct
     2.6   Combined Parallel Worksharing Constructs
           2.6.1   Parallel Loop construct
           2.6.2   parallel sections Construct
           2.6.3   parallel workshare Construct
     2.7   task Construct
           2.7.1   Task Scheduling
     2.8   Master and Synchronization Constructs
           2.8.1   master Construct
           2.8.2   critical Construct
           2.8.3   barrier Construct
           2.8.4   taskwait Construct
           2.8.5   atomic Construct
           2.8.6   flush Construct
           2.8.7   ordered Construct
     2.9   Data Environment
           2.9.1   Data-sharing Attribute Rules
                   2.9.1.1   Data-sharing Attribute Rules for Variables
                             Referenced in a Construct
                   2.9.1.2   Data-sharing Attribute Rules for Variables
                             Referenced in a Region but not in a Construct
           2.9.2   threadprivate Directive

           3.2.16  omp_get_level
           3.2.17  omp_get_ancestor_thread_num
           3.2.18  omp_get_team_size
           3.2.19  omp_get_active_level
     3.3   Lock Routines
           3.3.1   omp_init_lock and omp_init_nest_lock
           3.3.2   omp_destroy_lock and omp_destroy_nest_lock
           3.3.3   omp_set_lock and omp_set_nest_lock
           3.3.4   omp_unset_lock and omp_unset_nest_lock
           3.3.5   omp_test_lock and omp_test_nest_lock
     3.4   Timing Routines
           3.4.1   omp_get_wtime
           3.4.2   omp_get_wtick

A.   Examples
     A.1   A Simple Parallel Loop
     A.2   The OpenMP Memory Model
     A.3   Conditional Compilation
     A.4   Internal Control Variables
     A.5   The parallel Construct
     A.6   The num_threads Clause

     A.34  The lastprivate Clause
     A.35  The reduction Clause
     A.36  The copyin Clause
     A.37  The copyprivate Clause
     A.38  Nested Loop Constructs
     A.39  Restrictions on Nesting of Regions
     A.40  The omp_set_dynamic and omp_set_num_threads Routines
     A.41  The omp_get_num_threads Routine
     A.42  The omp_init_lock Routine
     A.43  Ownership of Locks
     A.44  Simple Lock Routines
     A.45  Nestable Lock Routines

1. Introduction

1.1 Scope

The OpenMP API covers only user-directed parallelization, wherein the user explicitly
specifies the actions to be taken by the compiler and runtime system in order to execute
the program in parallel. OpenMP-compliant implementations are not required to check
for data dependencies, data conflicts, race conditions, or deadlocks, any of which may
occur in conforming programs. In addition, compliant implementations are not required
to check for code sequences that cause a program to be classified as non-conforming.
The user is responsible for using OpenMP in his application to produce a conforming
program. OpenMP does not cover compiler-generated automatic parallelization and
directives to the compiler to assist such parallelization.

1.2 Glossary

thread                     An execution entity with a stack and associated static memory, called
                           threadprivate memory.
thread-safe routine        A routine that performs the intended function even when executed
                           concurrently (by more than one thread).
base language              A programming language that serves as the foundation of the OpenMP
                           specification.
structured block           For C/C++, an executable statement, possibly compound, with a single entry
                           at the top and a single exit at the bottom, or an OpenMP construct.
                           For Fortran, a block of executable statements with a single entry at the top and
                           a single exit at the bottom, or an OpenMP construct.
OpenMP program             A program that consists of a base program, annotated with OpenMP directives
                           and runtime library routines.
conforming program         An OpenMP program that follows all the rules and restrictions of the
                           OpenMP specification.
declarative directive      An OpenMP directive that may only be placed in a declarative context. A
                           declarative directive has no associated executable user code, but instead has
                           one or more associated user declarations.
executable directive       An OpenMP directive that is not declarative; i.e., it may be placed in an
                           executable context.
stand-alone directive      An OpenMP executable directive that has no associated executable user
                           code.
simple directive           An OpenMP executable directive whose associated user code must be a
                           simple (single, non-compound) executable statement.
loop directive             An OpenMP executable directive whose associated user code must be a loop
                           nest that is a structured block.
                           COMMENTS:
                               For Fortran, only the do directive and the optional end do directive
                               are loop directives.
construct                  An OpenMP executable directive (and for Fortran, the paired end directive, if
                           any) and the associated statement, loop or structured block, if any, not
                           including the code in any called routines; i.e., the lexical extent of an
                           executable directive.
region                     All code encountered during a specific instance of the execution of a given
                           construct or of an OpenMP library routine. A region includes any code in
                           called routines as well as any implicit code introduced by the OpenMP
                           implementation. The generation of a task at the point where a task directive
                           is encountered is a part of the region of the encountering thread, but the
                           explicit task region associated with the task directive is not.
active parallel region     A parallel region that is executed by a team consisting of more than one
                           thread.
inactive parallel region   A parallel region that is executed by a team of only one thread.
master thread              The thread that encounters a parallel construct, creates a team, generates
                           a set of tasks, then executes one of those tasks as thread number 0.
parent thread              The thread that encountered the parallel construct and generated a
                           parallel region is the parent thread of each of the threads in the team of
                           that parallel region. The master thread of a parallel region is the
                           same thread as its parent thread with respect to any resources associated with
                           an OpenMP thread.
ancestor thread            For a given thread, its parent thread or one of its parent thread's ancestor
                           threads.
                           COMMENTS:
                               For an active parallel region, the team comprises the master thread
                               and at least one additional thread.
                               For an inactive parallel region, the team comprises only the master
                               thread.
implicit parallel region   The inactive parallel region that encloses the sequential part of an OpenMP
                           program.
nested region              A region (dynamically) enclosed by another region; i.e., a region encountered
                           during the execution of another region.
                           COMMENT: Some nestings are conforming and some are not. See
                           Section 2.10 on page 104 for the restrictions on nesting.
closely nested region      A region nested inside another region with no parallel region nested
                           between them.
current team               All threads in the team executing the innermost enclosing parallel region.
encountering thread        For a given region, the thread that encounters the corresponding construct.
current team tasks         All tasks encountered during the execution of the innermost enclosing
                           parallel region by the threads of the corresponding team. Note that the
                           implicit tasks constituting the parallel region and any descendant tasks
                           encountered during the execution of these implicit tasks are included in this
                           binding task set.
generating task            For a given region the task whose execution by a thread generated the region.
binding thread set         The set of threads that are affected by, or provide the context for, the
                           execution of a region.
                           The binding thread set for a given region can be all threads, the current team,
                           or the encountering thread.
                           COMMENT: The binding thread set for a particular region is described in its
                           corresponding subsection of this specification.
binding task set           The set of tasks that are affected by, or provide the context for, the execution
                           of a region.
                           The binding task set for a given region can be all tasks, the current team
                           tasks, or the generating task.
                           COMMENT: The binding task set for a particular region (if applicable) is
                           described in its corresponding subsection of this specification.
                           Binding region is not defined for regions whose binding thread set is all
                           threads or the encountering thread, nor is it defined for regions whose binding
                           task set is all tasks.
                           COMMENTS:
                               The binding region for an ordered region is the innermost enclosing
                               loop region.
                               The binding region for a taskwait region is the innermost enclosing
                               task region.
                               For all other regions for which the binding thread set is the current
                               team or the binding task set is the current team tasks, the binding
                               region is the innermost enclosing parallel region.
                               For regions for which the binding task set is the generating task, the
                               binding region is the region of the generating task.
                               A parallel region need not be active nor explicit to be a binding
                               region.
orphaned construct         A construct that gives rise to a region whose binding thread set is the current
                           team, but that is not nested within another construct giving rise to the binding
                           region.
worksharing construct      A construct that defines units of work, each of which is executed exactly once
                           by one of the threads in the team executing the construct.
sequential loop            A loop that is not associated with any OpenMP loop directive.
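
The distinction between a construct (a lexical extent) and a region (all code executed, including called routines) may be easier to see in code. The following C sketch is illustrative only: the function names `scale` and `scale_in_parallel` are hypothetical and are not part of this specification. The `for` directive inside `scale` is an orphaned construct, because it is not lexically nested within the parallel construct that gives rise to its binding region.

```c
#include <assert.h>

/* Hypothetical helper containing an orphaned worksharing loop. The loop
   region binds to the innermost enclosing parallel region that is active
   when scale() is called. */
static void scale(double *a, int n, double factor)
{
    int i;

    assert(a != 0 && n >= 0);
    #pragma omp for
    for (i = 0; i < n; i++)
        a[i] *= factor;   /* each iteration is executed exactly once */
}

/* The parallel construct is only the block written below. The parallel
   region also includes the code executed inside scale(). */
static void scale_in_parallel(double *a, int n, double factor)
{
    #pragma omp parallel
    {
        scale(a, n, factor);
    }
}
```

Calling `scale_in_parallel(a, 5, 2.0)` doubles each element once, whatever the team size. Calling `scale` directly from sequential code also works, since the loop region then binds to the implicit parallel region, whose team has a single thread.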

1.2.3 Tasking Terminology

task                       A specific instance of executable code and its data environment, generated
                           when a thread encounters a task construct or a parallel construct.
task region                A region consisting of all code encountered during the execution of a task.
explicit task              A task generated when a task construct is encountered during execution.
implicit task              A task generated by the implicit parallel region or generated when a
                           parallel construct is encountered during execution.
initial task               The implicit task associated with the implicit parallel region.
current task               For a given thread, the task corresponding to the task region in which it is
                           executing.
child task                 A task is a child task of the region of its generating task. A child task region
                           is not part of its generating task region.
descendant task            A task that is the child task of a task region or of one of its descendant task
                           regions.
task completion            Task completion occurs when the end of the structured block associated with
                           the construct that generated the task is reached.
task scheduling point      A point during the execution of the current task region at which it can be
                           suspended to be resumed later; or the point of task completion, after which the
                           executing thread may switch to a different task region.
                           COMMENT:
                               Within tied task regions, task scheduling points only appear in the
                               following:
task switching             The act of a thread switching from the execution of one task to another task.
untied task                A task that, when its task region is suspended, can be resumed by any thread
                           in the team; that is, the task is not tied to any thread.
task synchronization construct
                           A taskwait or a barrier construct.
private variable           With respect to a given set of task regions that bind to the same parallel
                           region, a variable whose name provides access to a different block of storage
                           for each task region.
shared variable            With respect to a given set of task regions that bind to the same parallel
                           region, a variable whose name provides access to the same block of storage
                           for each task region.
threadprivate variable     A variable that is replicated, one instance per thread, by the OpenMP
                           implementation, so that its name provides access to a different block of
                           storage for each thread.
threadprivate memory       The set of threadprivate variables associated with each thread.
data environment           All the variables associated with the execution of a given task. The data
                           environment for a given task is constructed from the data environment of the
                           generating task at the time the task is generated.
                           For C:
                           For C++:
                           For the contents of variables of POD (plain old data) type, the property of
                           having a valid value.
                           For variables of non-POD class type, the property of having been constructed
                           but not subsequently destructed.
                           For Fortran:
                           For the contents of variables, the property of having a valid value. For the
                           allocation or association status of variables, the property of having a valid
                           status.
                           COMMENT: Programs that rely upon variables that are not defined are
                           non-conforming programs.
class type                 For C++: Variables declared with one of the class, struct, or union keywords.
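
The three storage behaviors defined above (private, shared, and threadprivate variables) can be contrasted in a short C sketch. The names `calls`, `scratch`, `total`, and `count_iterations` are hypothetical and chosen for illustration only; the code is one possible arrangement, not a prescribed usage.

```c
#include <assert.h>

/* Threadprivate: replicated by the implementation, one instance per thread. */
static int calls = 0;
#pragma omp threadprivate(calls)

/* Returns the number of loop iterations executed, summed over all threads. */
static int count_iterations(int n)
{
    int total = 0;    /* shared: the same block of storage for every task region */
    int scratch = 0;  /* private below: a different block for each task region */
    int i;

    assert(n >= 0);
    #pragma omp parallel private(scratch) shared(total)
    {
        calls = 0;                /* each thread resets its own instance */
        #pragma omp for
        for (i = 0; i < n; i++) {
            scratch = 1;          /* writes only this task's copy */
            calls += scratch;
        }
        #pragma omp atomic
        total += calls;           /* synchronized update of the shared variable */
    }
    return total;
}
```

Because every iteration is executed by exactly one thread, `count_iterations(n)` returns `n` regardless of how many threads the team contains.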
supporting nested parallelism
                           Supporting more than one level of parallelism.
internal control variable  A conceptual variable that specifies run-time behavior of a set of threads or
                           tasks in an OpenMP program.
                           COMMENT: The acronym ICV is used interchangeably with the term internal
                           control variable in the remainder of this specification.
unspecified behavior       A behavior or result that is not specified by the OpenMP specification or not
                           known prior to the compilation or execution of an OpenMP program.
implementation defined     Behavior that must be documented by the implementation, and which is
                           allowed to vary among different compliant implementations. An
                           implementation is allowed to define this behavior as unspecified.

An OpenMP program begins as a single thread of execution, called the initial thread.
The initial thread executes sequentially, as if enclosed in an implicit task region, called
the initial task region, that is defined by an implicit inactive parallel region
surrounding the whole program.

When any thread encounters a parallel construct, the thread creates a team of itself
and zero or more additional threads and becomes the master of the new team. A set of
implicit tasks, one per thread, is generated. The code for each task is defined by the code
inside the parallel construct. Each task is assigned to a different thread in the team
and becomes tied; that is, it is always executed by the thread to which it is initially
assigned. The task region of the task being executed by the encountering thread is
suspended, and each member of the new team executes its implicit task. There is an
implicit barrier at the end of the parallel construct. Beyond the end of the
parallel construct, only the master thread resumes execution, by resuming the task
region that was suspended upon encountering the parallel construct. Any number of
parallel constructs can be specified in a single program.
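
The fork-join behavior described in the paragraph above can be sketched in C as follows. The function name `team_size_by_counting` is hypothetical. The sketch assumes nothing about the number of threads the implementation chooses; it only relies on each team member executing the block once and on the implicit barrier at the end of the construct.

```c
#include <assert.h>

/* Returns the number of threads in the team that executed the region. */
static int team_size_by_counting(void)
{
    int members = 0;   /* shared by all implicit tasks of the region */

    #pragma omp parallel
    {
        /* Each thread in the team executes this block once, as its
           implicit task. */
        #pragma omp atomic
        members++;
    }   /* implicit barrier: every increment has completed here */

    /* Only the master thread resumes execution past the construct. */
    assert(members >= 1);
    return members;
}
```

If the code is compiled without OpenMP support, the directives are ignored and the function simply returns 1, which matches a team consisting of the encountering thread alone.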

parallel regions may be arbitrarily nested inside each other. If nested parallelism is
disabled, or is not supported by the OpenMP implementation, then the new team that is
created by a thread encountering a parallel construct inside a parallel region
will consist only of the encountering thread. However, if nested parallelism is supported
and enabled, then the new team can consist of more than one thread.
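
A minimal C sketch of the nesting rule above follows. The function name `inner_team_size` is hypothetical. The sketch uses the runtime routines `omp_set_nested` and `omp_get_num_threads`, and guards them with the `_OPENMP` macro so that the code also builds when OpenMP support is not enabled. Whether the inner team actually receives a second thread is up to the implementation, so the sketch only reports what it observes.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the largest team size observed in the inner parallel regions. */
static int inner_team_size(int allow_nesting)
{
    int largest = 1;

    assert(allow_nesting == 0 || allow_nesting == 1);
#ifdef _OPENMP
    omp_set_nested(allow_nesting);   /* enable or disable nested parallelism */
#endif
    #pragma omp parallel num_threads(2)
    {
        /* Each outer thread encounters this construct and creates a new team. */
        #pragma omp parallel num_threads(2)
        {
            int size = 1;
#ifdef _OPENMP
            size = omp_get_num_threads();
#endif
            #pragma omp critical
            if (size > largest)
                largest = size;
        }
    }
    return largest;
}
```

With nesting disabled, each inner team consists only of the encountering thread, so the function returns 1. With nesting enabled, the result may be 1 or 2, depending on what the implementation supports and provides.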

When any team encounters a worksharing construct, the work inside the construct is
divided among the members of the team, and executed cooperatively instead of being
executed by every thread. There is an implied barrier at the end of each worksharing
construct unless a nowait clause is specified. Redundant execution of code by every
thread in the team resumes after the end of the worksharing construct.
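
The contrast between redundant execution and cooperative execution can be sketched in C as follows. The function name `fill_squares` and its parameters are hypothetical. The statement before the loop is executed by every thread in the team, while the iterations of the worksharing loop are divided among the threads.

```c
#include <assert.h>

/* Adds i * i to out[i] for 0 <= i < n and returns how many threads
   executed the redundant part of the parallel region. */
static int fill_squares(int *out, int n)
{
    int visits = 0;
    int i;

    assert(out != 0 && n >= 0);
    #pragma omp parallel
    {
        /* Redundant execution: every thread in the team runs this. */
        #pragma omp atomic
        visits++;

        /* Worksharing: each iteration is executed by exactly one thread. */
        #pragma omp for
        for (i = 0; i < n; i++)
            out[i] += i * i;
    }
    return visits;
}
```

If `out` is zero-initialized by the caller, each `out[i]` ends up equal to `i * i` because no iteration is repeated, while the return value reflects the team size.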

When any thread encounters a task construct, a new explicit task is generated.
Execution of explicitly generated tasks is assigned to one of the threads in the current
team, subject to the thread's availability to execute work. Thus, execution of the new
task could be immediate, or deferred until later. Threads are allowed to suspend the
current task region at a task scheduling point in order to execute a different task. If the
suspended task region is for a tied task, the initially assigned thread later resumes
execution of the suspended task region. If the suspended task region is for an untied
task, then any thread may resume its execution. In untied task regions, task scheduling
points may occur at implementation defined points anywhere in the region. In tied task
regions, task scheduling points may occur only in task, taskwait, explicit or
implicit barrier constructs, and at the completion point of the task. Completion of all
explicit tasks bound to a given parallel region is guaranteed before the master thread
leaves the implicit barrier at the end of the region. Completion of a subset of all explicit
tasks bound to a given parallel region may be specified through the use of task
synchronization constructs. Completion of all explicit tasks bound to the implicit
parallel region is guaranteed by the time the program exits.
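
The task generation and completion rules above can be sketched in C as follows. The function names `task_sum` and `parallel_sum` are hypothetical, and the cutoff of four elements is an arbitrary choice for illustration. Each call generates two explicit child tasks and uses `taskwait` to wait for that subset of tasks before combining their results.

```c
#include <assert.h>

/* Sums a[0..n-1] by splitting the range into two explicit child tasks. */
static long task_sum(const int *a, int n)
{
    long left = 0, right = 0;
    int i;

    if (n < 4) {                      /* small range: sum directly */
        for (i = 0; i < n; i++)
            left += a[i];
        return left;
    }
    #pragma omp task shared(left)     /* may run immediately or be deferred */
    left = task_sum(a, n / 2);
    #pragma omp task shared(right)
    right = task_sum(a + n / 2, n - n / 2);
    #pragma omp taskwait              /* both child tasks are complete here */
    return left + right;
}

/* One thread generates the root of the task tree; any thread in the
   team may execute the generated tasks. */
static long parallel_sum(const int *a, int n)
{
    long total = 0;

    assert(a != 0 && n >= 0);
    #pragma omp parallel
    {
        #pragma omp single
        total = task_sum(a, n);
    }
    return total;
}
```

The `taskwait` is what keeps `left` and `right` alive until the child tasks that share them have finished. Without it, the function could return while a deferred task still refers to those variables.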
1               relationship between the value of the original variable and the initial or final value of the
2               private version depends on the exact clause that specifies it. Details of this issue, as well
3               as other issues with privatization, are provided in Section 2.9 on page 77.
4               The minimum size at which memory accesses by multiple threads without
5               synchronization, either to the same variable or to different variables that are part of the
6               same variable (as array or structure elements), are atomic with respect to each other, is
7               implementation defined. Any additional atomicity restrictions, such as alignment, are
8               implementation defined.
9               A single access to a variable may be implemented with multiple load or store
10              instructions, and hence is not guaranteed to be atomic with respect to other accesses to
11              the same variable. Accesses to variables smaller than the implementation-defined
12              minimum size or to C or C++ bit-fields may be implemented by reading, modifying, and
13              rewriting a larger unit of memory, and may thus interfere with updates of variables or
14              fields in the same unit of memory.
15              If multiple threads write without synchronization to the same memory unit, including
16              cases due to atomicity considerations as described above, then a data race occurs.
17              Similarly, if at least one thread reads from a memory unit and at least one thread writes
18              without synchronization to that same memory unit, including cases due to atomicity
19              considerations as described above, then a data race occurs. If a data race occurs then the
20              result of the program is unspecified.
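A minimal C sketch of how such a data race can be avoided follows. The function count_even is a hypothetical example, not part of the OpenMP API. All threads update the shared variable count, so the update is protected with the atomic directive. Without that directive, the concurrent unsynchronized writes would constitute a data race and the result would be unspecified.

```c
/* Hypothetical helper: counts the even values in data[0..n-1]. */
long count_even(const int *data, long n)
{
    long count = 0;
    long i;

    #pragma omp parallel for shared(count)
    for (i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            /* Synchronized update of the shared counter. */
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

A reduction clause would be another way to obtain the same result; atomic is used here only because it makes the synchronized access explicit.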
21              A private variable in a task region that eventually generates an inner nested parallel
22              region is permitted to be made shared by implicit tasks in the inner parallel region.
23              A private variable in a task region can be shared by an explicit task region generated
24              during its execution. However, it is the programmer’s responsibility to ensure through
25              synchronization that the lifetime of the variable does not end before completion of the
26              explicit task region sharing it. Any other access by one task to the private variables of
27              another task results in unspecified behavior.
1    1.4.3         OpenMP Memory Consistency
2                  The type of relaxed memory consistency provided by OpenMP is similar to weak
3                  ordering as described in S. V. Adve and K. Gharachorloo, “Shared Memory Consistency
4                  Models: A Tutorial”, IEEE Computer, 29(12), pp.66-76, December 1996. Weak ordering
5                  requires that some memory operations be defined as synchronization operations and that
6                  these be ordered with respect to each other. In the context of OpenMP, two flushes of the
7                  same variable are synchronization operations. OpenMP does not apply any other
8                  restriction to the reordering of memory operations executed by a single thread. The
9                  OpenMP memory model is slightly weaker than weak ordering since flushes are not
10                 ordered with respect to each other if their flush-sets have an empty intersection.
11                 The restrictions in Section 1.4.2 on page 14 on reordering with respect to flush
12                 operations guarantee the following:
13                 • If the intersection of the flush-sets of two flushes performed by two different threads
14                    is non-empty, then the two flushes must be completed as if in some sequential order,
15                    seen by all threads.
16                 • If the intersection of the flush-sets of two flushes performed by one thread is non-
17                    empty, then the two flushes must appear to be completed in that thread’s program
18                    order.
19                 • If the intersection of the flush-sets of two flushes is empty, the threads can observe
20                    these flushes in any order.
21                 The flush operation can be specified using the flush directive, and is also implied at
22                 various locations in an OpenMP program: see Section 2.8.6 on page 72 for details. For
23                 an example illustrating the memory model, see Section A.2 on page 154.
1                  This OpenMP API specification refers to ISO/IEC 1539-1:1997 as Fortran 95.
2                  Where this OpenMP API specification refers to C, C++ or Fortran, reference is made to
3                  the base language supported by the implementation.
16                 Some sections of this document only apply to programs written in a certain base
17                 language. Text that applies only to programs whose base language is C or C++ is shown
18                 as follows:
19
                                                           C/C++
20                 C/C++ specific text....
                                                          C/C++
21                 Text that applies only to programs whose base language is Fortran is shown as follows:
22
                                                          Fortran
23                 Fortran specific text......
24
                                                          Fortran
25                 Where an entire page consists of, for example, Fortran specific text, a marker is shown
26                 at the top of the page like this:
                                                      Fortran (cont.)
27                 Some text is for information only, and is not part of the normative specification. Such
28                 text is designated as a note, like this:
1   20   OpenMP API • Version 3.0 May 2008
1
2    CHAPTER   2
3                  Directives
4
5                  This chapter describes the syntax and behavior of OpenMP directives, and is divided
6                  into the following sections:
7                  • The language-specific directive format (Section 2.1 on page 22)
8                  • Mechanisms to control conditional compilation (Section 2.2 on page 26)
9                  • Control of OpenMP API ICVs (Section 2.3 on page 28)
10                 • Details of each OpenMP directive (Section 2.4 on page 32 to Section 2.10 on page
11                   104)
12
                                                           C/C++
13                 In C/C++, OpenMP directives are specified by using the #pragma mechanism provided
14                 by the C and C++ standards.
                                                           C/C++
15
                                                          Fortran
16                 In Fortran, OpenMP directives are specified by using special comments that are
17                 identified by unique sentinels. Also, a special comment form is available for conditional
18                 compilation.
19
                                                          Fortran
20                 Compilers can therefore ignore OpenMP directives and conditionally compiled code if
21                 support of OpenMP is not provided or enabled. A compliant implementation must
22                 provide an option or interface that ensures that underlying support of all OpenMP
23                 directives and OpenMP conditional compilation mechanisms is enabled. In the
24                 remainder of this document, the phrase OpenMP compilation is used to mean a
25                 compilation with these OpenMP features enabled.
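The following C sketch illustrates why a program can be written so that it is valid both with and without an OpenMP compilation. The function names are hypothetical. The _OPENMP macro is tested with ordinary preprocessor conditionals, and the directive in scale is simply ignored by a compiler that does not support or enable OpenMP.

```c
/* Hypothetical helper: returns 1 in an OpenMP compilation, 0 otherwise. */
int compiled_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: scales x[0..n-1] in place. The result is the
   same whether or not the directive below is honored. */
void scale(double *x, int n, double factor)
{
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        x[i] *= factor;
}
```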
1
                                                                Fortran
2                  Restrictions
3                  The following restriction applies to all OpenMP directives:
4                  • OpenMP directives may not appear in PURE or ELEMENTAL procedures.
5
                                                                Fortran
14                 Each directive starts with #pragma omp. The remainder of the directive follows the
15                 conventions of the C and C++ standards for compiler directives. In particular, white
16                 space can be used before and after the #, and sometimes white space must be used to
17                 separate the words in a directive. Preprocessing tokens following the #pragma omp
18                 are subject to macro replacement.
19                 Directives are case-sensitive.
20                 An OpenMP executable directive applies to at most one succeeding statement, which
21                 must be a structured block.
                                                                 C/C++
22
                                                                Fortran
23                 OpenMP directives for Fortran are specified as follows:
24
25                   sentinel directive-name [clause[[,] clause]...]
26                 All OpenMP compiler directives must begin with a directive sentinel. The format of a
27                 sentinel differs between fixed and free-form source files, as described in Section 2.1.1
28                 on page 23 and Section 2.1.2 on page 24.
20
                                                      Fortran
21   2.1.1   Fixed Source Form Directives
22           The following sentinels are recognized in fixed form source files:
23
24             !$omp | c$omp | *$omp
25           Sentinels must start in column 1 and appear as a single word with no intervening
26           characters. Fortran fixed form line length, white space, continuation, and column rules
27           apply to the directive line. Initial directive lines must have a space or zero in column 6,
28           and continuation directive lines must have a character other than a space or a zero in
29           column 6.
30           Comments may appear on the same line as a directive. The exclamation point initiates a
31           comment when it appears after column 6. The comment extends to the end of the source
32           line and is ignored. If the first non-blank character after the directive sentinel of an
33           initial or continuation directive line is an exclamation point, the line is ignored.
1
                                                   Fortran (cont.)
3               Note – in the following example, the three formats for specifying the directive are
4               equivalent (the first line represents the position of the first 9 columns):
5               c23456789
6               !$omp parallel do shared(a,b,c)
7               c$omp parallel do
8               c$omp+shared(a,b,c)
9               c$omp paralleldoshared(a,b,c)
10
15              The sentinel can appear in any column as long as it is preceded only by white space
16              (spaces and tab characters). It must appear as a single word with no intervening
17              character. Fortran free form line length, white space, and continuation rules apply to the
18              directive line. Initial directive lines must have a space after the sentinel. Continued
19              directive lines must have an ampersand as the last nonblank character on the line, prior
20              to any comment placed inside the directive. Continuation directive lines can have an
21              ampersand after the directive sentinel with optional white space before and after the
22              ampersand.
23              Comments may appear on the same line as a directive. The exclamation point initiates a
24              comment. The comment extends to the end of the source line and is ignored. If the first
25              nonblank character after the directive sentinel is an exclamation point, the line is
26              ignored.
27              One or more blanks or horizontal tabs must be used to separate adjacent keywords in
28              directives in free source form, except in the following cases, where white space is
29              optional between the given pair of keywords:
                end do
                end master
                end ordered
                end parallel
                end sections
                end single
                end task
                end workshare
                parallel do
                parallel sections
                parallel workshare
13
14   Note – in the following example the three formats for specifying the directive are
15   equivalent (the first line represents the position of the first 9 columns):
16   !23456789
17             !$omp parallel do &
18                           !$omp shared(a,b,c)
23
24
                                           Fortran
1
27                 If these criteria are met, the sentinel is replaced by two spaces. If these criteria are not
28                 met, the line is left unchanged.
3            Note – in the following example, the two forms for specifying conditional compilation
4            in fixed source form are equivalent (the first line represents the position of the first 9
5            columns):
6            c23456789
7            !$ 10 iam = omp_get_thread_num() +
8            !$       &               index
9            #ifdef _OPENMP
10                 10 iam = omp_get_thread_num() +
11                    &               index
12           #endif
13
1
2                  Note – in the following example, the two forms for specifying conditional compilation
3                  in free source form are equivalent (the first line represents the position of the first 9
4                  columns):
5                  c23456789
6                   !$ iam = omp_get_thread_num() +                    &
7                   !$&       index
8                  #ifdef _OPENMP
9                         iam = omp_get_thread_num() +                 &
10                            index
11                 #endif
12
13
14
                                                          Fortran
15
1
2                ICV               Scope Ways to modify value       Way to retrieve value    Initial value
6               Comments:
7               • The initial value of dyn-var is implementation defined if the implementation supports
8                   dynamic adjustment of the number of threads; otherwise, the initial value is false.
9               • The initial value of max-active-levels-var is the number of levels of parallelism that
10                  the implementation supports. See the definition of supporting n levels of parallelism
11                  in Section 1.2.5 on page 10 for further details.
12              After the initial values are assigned, but before any OpenMP construct or OpenMP API
13              routine executes, the values of any OpenMP environment variables that were set by the
14              user are read and the associated ICVs are modified accordingly. After this point, no
15              changes to any OpenMP environment variables will affect the ICVs.
16              Clauses on OpenMP constructs do not modify the values of any of the ICVs.
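The last statement can be checked with a small C sketch. The function name is hypothetical. It assumes that omp_get_max_threads reports the value that would be used for a subsequent parallel region without a num_threads clause, and it compares that value before and after a region that carries such a clause. The OpenMP calls are guarded by _OPENMP so that the code also compiles without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical check: returns 1 if a num_threads clause left the value
   reported by omp_get_max_threads() unchanged. */
int clause_leaves_icv_unchanged(void)
{
#ifdef _OPENMP
    int before = omp_get_max_threads();

    #pragma omp parallel num_threads(2)
    {
        /* The request for two threads applies to this region only. */
    }
    return before == omp_get_max_threads();
#else
    return 1; /* No OpenMP compilation: there are no ICVs to inspect. */
#endif
}
```

By contrast, a call to omp_set_num_threads is one of the documented ways to modify the corresponding ICV.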
14   Cross References:
15   • parallel construct, see Section 2.4 on page 32.
16   • num_threads clause, see Section 2.4.1 on page 35.
17   • schedule clause, see Section 2.5.1.1 on page 45.
18   • Loop construct, see Section 2.5.1 on page 38.
19   • omp_set_num_threads routine, see Section 3.2.1 on page 110.
20   • omp_get_max_threads routine, see Section 3.2.3 on page 112.
21   • omp_set_dynamic routine, see Section 3.2.7 on page 117.
22   • omp_get_dynamic routine, see Section 3.2.8 on page 118.
23   • omp_set_nested routine, see Section 3.2.9 on page 119.
24   • omp_get_nested routine, see Section 3.2.10 on page 120.
25   • omp_set_schedule routine, see Section 3.2.11 on page 121.
26   • omp_get_schedule routine, see Section 3.2.12 on page 123.
27   • omp_get_thread_limit routine, see Section 3.2.13 on page 125.
28   • omp_set_max_active_levels routine, see Section 3.2.14 on page 126.
29   • omp_get_max_active_levels routine, see Section 3.2.15 on page 127.
30   • OMP_SCHEDULE environment variable, see Section 4.1 on page 146.
31   • OMP_NUM_THREADS environment variable, see Section 4.2 on page 147.
32   • OMP_DYNAMIC environment variable, see Section 4.3 on page 148.
1                  • OMP_NESTED environment variable, see Section 4.4 on page 148.
2                  • OMP_STACKSIZE environment variable, see Section 4.5 on page 149.
3                  • OMP_WAIT_POLICY environment variable, see Section 4.6 on page 150.
4                  • OMP_MAX_ACTIVE_LEVELS environment variable, see Section 4.7 on page 150.
5                  • OMP_THREAD_LIMIT environment variable, see Section 4.8 on page 151.
8                  Summary
9                  This fundamental construct starts parallel execution. See Section 1.3 on page 11 for a
10                 general description of the OpenMP execution model.
11                 Syntax
12
                                                           C/C++
13                 The syntax of the parallel construct is as follows:
14
15                   #pragma omp parallel [clause[ [, ]clause] ...] new-line
16                      structured-block
18                            if(scalar-expression)
19                            num_threads(integer-expression)
20                            default(shared | none)
21                            private(list)
22                            firstprivate(list)
23                            shared(list)
24                            copyin(list)
25                            reduction(operator: list)
26
                                                           C/C++
8               if(scalar-logical-expression)
9               num_threads(scalar-integer-expression)
10              default(private | firstprivate | shared | none)
11              private(list)
12              firstprivate(list)
13              shared(list)
14              copyin(list)
15              reduction({operator|intrinsic_procedure_name}:list)
16   The end parallel directive denotes the end of the parallel construct.
17
                                             Fortran
18   Binding
19   The binding thread set for a parallel region is the encountering thread. The
20   encountering thread becomes the master thread of the new team.
21   Description
22   When a thread encounters a parallel construct, a team of threads is created to
23   execute the parallel region (see Section 2.4.1 on page 35 for more information about
24   how the number of threads in the team is determined, including the evaluation of the if
25   and num_threads clauses). The thread that encountered the parallel construct
26   becomes the master thread of the new team, with a thread number of zero for the
27   duration of the new parallel region. All threads in the new team, including the
28   master thread, execute the region. Once the team is created, the number of threads in the
29   team remains constant for the duration of that parallel region.
1               Within a parallel region, thread numbers uniquely identify each thread. Thread
2               numbers are consecutive whole numbers ranging from zero for the master thread up to
3               one less than the number of threads in the team. A thread may obtain its own thread
4               number by a call to the omp_get_thread_num library routine.
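The numbering rule above can be observed with the following C sketch. The function name is hypothetical, and the sketch also uses omp_get_num_threads to obtain the team size. Each thread records its own thread number in a tally array, and after the region the tally is checked to confirm that every number from zero to one less than the team size occurred exactly once. Without an OpenMP compilation the code behaves as a team of one thread.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical check: returns 1 if every thread number in
   0..team_size-1 was recorded exactly once. */
int thread_numbers_are_unique(void)
{
    int team_size = 1;
    int *seen = NULL;
    int ok = 1;
    int t;

    #pragma omp parallel shared(team_size, seen)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        /* One thread allocates the tally; the implied barrier at the
           end of single makes it visible to the whole team. */
        #pragma omp single
        {
#ifdef _OPENMP
            team_size = omp_get_num_threads();
#endif
            seen = (int *)calloc((size_t)team_size, sizeof *seen);
        }

        /* Each thread updates only the element indexed by its number. */
        if (seen != NULL && id >= 0 && id < team_size)
            seen[id]++;
    }

    if (seen == NULL)
        return 0;
    for (t = 0; t < team_size; t++)
        if (seen[t] != 1)
            ok = 0;
    free(seen);
    return ok;
}
```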
5               A set of implicit tasks, equal in number to the number of threads in the team, is
6               generated by the encountering thread. The structured block of the parallel construct
7               determines the code that will be executed in each implicit task. Each task is assigned to
8               a different thread in the team and becomes tied. The task region of the task being
9               executed by the encountering thread is suspended and each thread in the team executes
10              its implicit task. Each thread can execute a path of statements that is different from that
11              of the other threads.
12              The implementation may cause any thread to suspend execution of its implicit task at a
13              task scheduling point, and switch to execute any explicit task generated by any of the
14              threads in the team, before eventually resuming execution of the implicit task (for more
15              details see Section 2.7 on page 59).
16              There is an implied barrier at the end of a parallel region. After the end of a
17              parallel region, only the master thread of the team resumes execution of the
18              enclosing task region.
19              If a thread in a team executing a parallel region encounters another parallel
20              directive, it creates a new team, according to the rules in Section 2.4.1 on page 35, and
21              it becomes the master of that new team.
22              If execution of a thread terminates while inside a parallel region, execution of all
23              threads in all teams terminates. The order of termination of threads is unspecified. All
24              work done by a team prior to any barrier that the team has passed in the program is
25              guaranteed to be complete. The amount of work done by each thread after the last
26              barrier that it passed and before it terminates is unspecified.
27              For an example of the parallel construct, see Section A.5 on page 164. For an
28              example of the num_threads clause, see Section A.6 on page 166.
29              Restrictions
30              Restrictions to the parallel construct are as follows:
31              • A program that branches into or out of a parallel region is non-conforming.
32              • A program must not depend on any ordering of the evaluations of the clauses of the
33                 parallel directive, or on any side effects of the evaluations of the clauses.
34              • At most one if clause can appear on the directive.
35              • At most one num_threads clause can appear on the directive. The num_threads
36                 expression must evaluate to a positive integer value.
9            Cross References
10           • default, shared, private, firstprivate, and reduction clauses, see
11             Section 2.9.3 on page 85.
12           • copyin clause, see Section 2.9.4 on page 100.
13           • omp_get_thread_num routine, see Section 3.2.4 on page 113.
1
                                       Algorithm 2.1
                      let ThreadsBusy be the number of OpenMP threads currently executing;
                      let ActiveParRegions be the number of enclosing active parallel regions;
                      if an if clause exists
                      then let IfClauseValue be the value of the if clause expression;
                      else let IfClauseValue = true;
                      if a num_threads clause exists
                      then let ThreadsRequested be the value of the num_threads clause
                      expression;
                      else let ThreadsRequested = nthreads-var;
                      let ThreadsAvailable = (thread-limit-var - ThreadsBusy + 1);
                      if (IfClauseValue = false)
                      then number of threads = 1;
                      else if (ActiveParRegions >= 1) and (nest-var = false)
                      then number of threads = 1;
                      else if (ActiveParRegions = max-active-levels-var)
                      then number of threads = 1;
                      else if (dyn-var = true) and (ThreadsRequested <= ThreadsAvailable)
                      then number of threads = [ 1 : ThreadsRequested ];
                      else if (dyn-var = true) and (ThreadsRequested > ThreadsAvailable)
                      then number of threads = [ 1 : ThreadsAvailable ];
                      else if (dyn-var = false) and (ThreadsRequested <= ThreadsAvailable)
                      then number of threads = ThreadsRequested;
                      else if (dyn-var = false) and (ThreadsRequested > ThreadsAvailable)
                      then behavior is implementation defined;
24
25              Note – Since the initial value of the dyn-var ICV is implementation defined, programs
26              that depend on a specific number of threads for correct execution should explicitly
27              disable dynamic adjustment of the number of threads.
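Algorithm 2.1 can be sketched as a plain C function. This is an illustrative model only, not how any implementation is required to work. The struct and function names are hypothetical. The ICV values and the counts of busy threads and active parallel regions are passed in as ordinary integers, and the result is a permitted range of team sizes, or a flag for the implementation defined case.

```c
#include <stdbool.h>

/* Hypothetical result type: the team size lies in
   [min_threads, max_threads] unless implementation_defined is set. */
struct team_size {
    int min_threads;
    int max_threads;
    bool implementation_defined;
};

/* Hypothetical container for the inputs of Algorithm 2.1. */
struct team_inputs {
    bool has_if_clause;
    bool if_value;
    bool has_num_threads;
    int num_threads_value;
    int nthreads_var;
    int thread_limit_var;
    int max_active_levels_var;
    bool dyn_var;
    bool nest_var;
    int threads_busy;
    int active_par_regions;
};

struct team_size determine_team_size(const struct team_inputs *in)
{
    struct team_size one = {1, 1, false};
    struct team_size out = {1, 1, false};
    bool if_value = in->has_if_clause ? in->if_value : true;
    int requested = in->has_num_threads ? in->num_threads_value
                                        : in->nthreads_var;
    int available = in->thread_limit_var - in->threads_busy + 1;

    if (!if_value)
        return one;
    if (in->active_par_regions >= 1 && !in->nest_var)
        return one;
    if (in->active_par_regions == in->max_active_levels_var)
        return one;
    if (in->dyn_var) {
        /* Any value in [1 : min(requested, available)] is permitted. */
        out.max_threads = requested <= available ? requested : available;
        return out;
    }
    if (requested <= available) {
        out.min_threads = requested;
        out.max_threads = requested;
        return out;
    }
    /* dyn-var is false and the request exceeds what is available. */
    out.implementation_defined = true;
    return out;
}
```

With dynamic adjustment disabled, a request for 4 threads and 16 threads available, the function reports exactly 4. With dynamic adjustment enabled and only 3 threads available, it reports the range 1 to 3.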
28
23         Restrictions
24         The following restrictions apply to worksharing constructs:
25         • Each worksharing region must be encountered by all threads in a team or by none at
26           all.
27         • The sequence of worksharing regions and barrier regions encountered must be the
28           same for every thread in a team.
1    2.5.1      Loop Construct
2               Summary
3               The loop construct specifies that the iterations of one or more associated loops will be
4               executed in parallel by threads in the team in the context of their implicit tasks. The
5               iterations are distributed across threads that already exist in the team executing the
6               parallel region to which the loop region binds.
7               Syntax
8
                                                        C/C++
9               The syntax of the loop construct is as follows:
10
11                #pragma omp for [clause[[,] clause] ... ] new-line
12                   for-loops
                          private(list)
                          firstprivate(list)
                          lastprivate(list)
                          reduction(operator: list)
                          schedule(kind[, chunk_size])
                          collapse(n)
                          ordered
                          nowait
1               The canonical form allows the iteration count of all associated loops to be computed
2               before executing the outermost loop. The computation is performed for each loop in an
3               integer type. This type is derived from the type of var as follows:
4               • If var is of an integer type, then the type is the type of var.
5               • For C++, if var is of a random access iterator type, then the type is the type that
6                  would be used by std::distance applied to variables of the type of var.
7               • For C, if var is of a pointer type, then the type is ptrdiff_t.
8               The behavior is unspecified if any intermediate result required to compute the iteration
9               count cannot be represented in the type determined above.
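As a worked illustration of computing an iteration count before the loop executes, the following C sketch handles one simple case only: a loop of the form for (var = lb; var < b; var += incr) with a positive incr and var of type long. The function name is hypothetical, and the sketch is not a general implementation of the canonical form. The intermediate results are assumed to be representable in long; otherwise, as stated above, the behavior is unspecified.

```c
#include <assert.h>

/* Hypothetical helper: iteration count of
   for (var = lb; var < b; var += incr), with incr > 0. */
long iteration_count(long lb, long b, long incr)
{
    assert(incr > 0); /* only the '<' form with a positive step is covered */

    if (b <= lb)
        return 0;

    /* Ceiling of (b - lb) / incr, computed in the type of var. */
    return (b - lb + incr - 1) / incr;
}
```

For example, lb = 0, b = 10 and incr = 3 give 4 iterations, with var taking the values 0, 3, 6 and 9.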
10              There is no implied synchronization during the evaluation of the lb, b, or incr
11              expressions. It is unspecified whether, in what order, or how many times any side effects
12              within the lb, b, or incr expressions occur.
13
14              Note – Random access iterators are required to support random access to elements in
15              constant time. Other iterators are precluded by the restrictions since they can take linear
16              time or offer limited functionality. It is therefore advisable to use tasks to parallelize
17              those cases.
18
19
                                                         C/C++
20
                                                         Fortran
21              The syntax of the loop construct is as follows:
22
23                 !$omp do [clause[[,] clause] ... ]
24                    do-loops
25                [!$omp end do [nowait] ]
                          private(list)
                          firstprivate(list)
                          lastprivate(list)
                          reduction({operator|intrinsic_procedure_name}:list)
                          collapse(n)
                          ordered
4    If an end do directive is not specified, an end do directive is assumed at the end of the
5    do-loop.
15   Binding
16   The binding thread set for a loop region is the current team. A loop region binds to the
17   innermost enclosing parallel region. Only the threads of the team executing the
18   binding parallel region participate in the execution of the loop iterations and
19   (optional) implicit barrier of the loop region.
20   Description
21   The loop construct is associated with a loop nest consisting of one or more loops that
22   follow the directive.
23   There is an implicit barrier at the end of a loop construct unless a nowait clause is
24   specified.
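A minimal C sketch of a loop region binding to an enclosing parallel region follows. The function name is hypothetical. The iterations of the for loop are divided among the threads of the team that already exists, and because no nowait clause is given, all threads wait at the implicit barrier at the end of the loop construct. The reduction clause combines the per-thread partial sums.

```c
/* Hypothetical helper: sums the squares 0*0 + 1*1 + ... + (n-1)*(n-1). */
long sum_of_squares(long n)
{
    long total = 0;
    long i;

    #pragma omp parallel
    {
        /* The loop region binds to the enclosing parallel region. */
        #pragma omp for reduction(+:total)
        for (i = 0; i < n; i++)
            total += i * i;
        /* Implicit barrier here, since nowait was not specified. */
    }
    return total;
}
```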
25   The collapse clause may be used to specify how many loops are associated with the
26   loop construct. The parameter of the collapse clause must be a constant positive
27   integer expression. If no collapse clause is present, the only loop that is associated
28   with the loop construct is the one that immediately follows the construct.
29   If more than one loop is associated with the loop construct, then the iterations of all
30   associated loops are collapsed into one larger iteration space which is then divided
31   according to the schedule clause. The sequential execution of the iterations in all
32   associated loops determines the order of the iterations in the collapsed iteration space.
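The collapsed iteration space can be illustrated with the following C sketch. The function name is hypothetical. Two perfectly nested loops are associated with the loop construct through collapse(2), and each matrix element is set to the position its iteration would have in sequential execution, which for a rows by cols nest is i * cols + j.

```c
/* Hypothetical helper: fills a rows x cols row-major matrix so that
   each element holds its position in the sequential iteration order. */
void fill_logical_numbers(long *m, int rows, int cols)
{
    int i, j;

    /* Both loops are associated with the construct; their iterations
       form one iteration space that is divided among the threads. */
    #pragma omp parallel for collapse(2)
    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            m[(long)i * cols + j] = (long)i * cols + j;
}
```

The stored values do not depend on which thread executes which iteration, so the result is the same for any schedule.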
1               The iteration count for each associated loop is computed before entry to the outermost
2               loop. If execution of any associated loop changes any of the values used to compute any
3               of the iteration counts then the behavior is unspecified.
4               The integer type (or kind, for Fortran) used to compute the iteration count for the
5               collapsed loop is implementation defined.
6               A worksharing loop has logical iterations numbered 0,1,...,N-1 where N is the number of
7               loop iterations, and the logical numbering denotes the sequence in which the iterations
8               would be executed if the associated loop(s) were executed by a single thread. The
9               schedule clause specifies how iterations of the associated loops are divided into
10              contiguous non-empty subsets, called chunks, and how these chunks are distributed
11              among threads of the team. Each thread executes its assigned chunk(s) in the context of
12              its implicit task. The chunk_size expression is evaluated using the original list items of
13              any variables that are made private in the loop construct. It is unspecified whether, in
14              what order, or how many times, any side-effects of the evaluation of this expression
15              occur. The use of a variable in a schedule clause expression of a loop construct
16              causes an implicit reference to the variable in all enclosing constructs.
17              Different loop regions with the same schedule and iteration count, even if they occur in
18              the same parallel region, can distribute iterations among threads differently. The only
19              exception is for the static schedule as specified in Table 2-1. Programs that depend
20              on which thread executes a particular iteration under any other circumstances are non-
21              conforming.
22              See Section 2.5.1.1 on page 45 for details of how the schedule for a worksharing loop is
23              determined.
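The following is a minimal sketch, not part of the specification text, of a worksharing loop that uses a schedule clause. The function name sum_static_chunks and its parameters are hypothetical names introduced for this illustration. The result does not depend on which thread executes a given iteration, so the code does not rely on any particular distribution of chunks.

```c
/* Hypothetical helper: sums a[0..n-1] with a worksharing loop.
   chunk is assumed to be a positive, loop invariant value that is
   the same for all threads in the team. */
long sum_static_chunks(const int *a, int n, int chunk)
{
    long total = 0;
    int i;

    /* Iterations are divided into chunks of `chunk` iterations and the
       chunks are distributed among the threads of the team. */
    #pragma omp parallel for schedule(static, chunk) reduction(+:total)
    for (i = 0; i < n; i++)
        total += a[i];

    return total;
}
```

If the code is compiled without OpenMP support, the pragma is ignored and the loop runs sequentially with the same result.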
6                  When no chunk_size is specified, the iteration space is divided into chunks that
7                  are approximately equal in size, and at most one chunk is distributed to each
8                  thread. Note that the size of the chunks is unspecified in this case.
22                 Each chunk contains chunk_size iterations, except for the last chunk to be
23                 distributed, which may have fewer iterations.
1               runtime       When schedule(runtime) is specified, the decision regarding scheduling
2                             is deferred until run time, and the schedule and chunk size are taken from the
3                             run-sched-var ICV. If the ICV is set to auto, the schedule is implementation
4                             defined.
5
6
                Note – For a team of p threads and a loop of n iterations, let ⌈n/p⌉ be the integer q
                that satisfies n = p*q - r, with 0 <= r < p. One compliant implementation of the
                static schedule (with no specified chunk_size) would behave as though chunk_size
                had been specified with value q. Another compliant implementation would assign q
                iterations to the first p-r threads, and q-1 iterations to the remaining r threads. This
                illustrates why a conforming program must not rely on the details of a particular
                implementation.
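The arithmetic in the note can be checked with a short sketch. The function names below are hypothetical and exist only for this illustration; they compute how many iterations thread t would receive under each of the two compliant distributions described in the note, assuming n >= 0 and p >= 1.

```c
/* q is n/p rounded up, so that n = p*q - r with 0 <= r < p. */
int static_q(int n, int p)
{
    return (n + p - 1) / p;
}

/* First variant: behave as though chunk_size == q.
   Thread t gets iterations [t*q, (t+1)*q), clipped to [0, n). */
int count_chunk_q(int n, int p, int t)
{
    int q = static_q(n, p);
    int lo = t * q;
    int hi = lo + q;
    if (lo > n) lo = n;
    if (hi > n) hi = n;
    return hi - lo;
}

/* Second variant: the first p-r threads get q iterations,
   the remaining r threads get q-1 iterations. */
int count_balanced(int n, int p, int t)
{
    int q = static_q(n, p);
    int r = p * q - n;
    return (t < p - r) ? q : q - 1;
}
```

For n = 10 and p = 4, q is 3 and r is 2. The first variant assigns 3, 3, 3, 1 iterations and the second assigns 3, 3, 2, 2. Both cover all 10 iterations, yet individual threads receive different iterations.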
18              Restrictions
19              Restrictions to the loop construct are as follows:
20              • All loops associated with the loop construct must be perfectly nested; that is, there
21                 must be no intervening code nor any OpenMP directive between any two loops.
22              • The values of the loop control expressions of the loops associated with the loop
23                 directive must be the same for all the threads in the team.
24              • Only one schedule clause can appear on a loop directive.
25              • Only one collapse clause can appear on a loop directive.
26              • chunk_size must be a loop invariant integer expression with a positive value.
27              • The value of the chunk_size expression must be the same for all threads in the team.
28              • The value of the run-sched-var ICV must be the same for all threads in the team.
29              • When schedule(runtime) or schedule(auto) is specified, chunk_size must
30                 not be specified.
31              • Only a single ordered clause can appear on a loop directive.
32              • The ordered clause must be present on the loop construct if any ordered region
33                 ever binds to a loop region arising from the loop construct.
23             Cross References
24             • private, firstprivate, lastprivate, and reduction clauses, see
25               Section 2.9.3 on page 85.
26             • OMP_SCHEDULE environment variable, see Section 4.1 on page 146.
27             • ordered construct, see Section 2.8.7 on page 75.
                If the loop directive does not have a schedule clause then the current value of the def-sched-var ICV determines the schedule. If the
2               loop directive has a schedule clause that specifies the runtime schedule kind then
3               the current value of the run-sched-var ICV determines the schedule. Otherwise, the
4               value of the schedule clause determines the schedule. Figure 2-1 describes how the
5               schedule for a worksharing loop is determined.
6               Cross References
7               • ICVs, see Section 2.3 on page 28.
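
                [Figure 2-1: Determining the schedule for a worksharing loop. Flowchart: from START, if no schedule clause is present, use the def-sched-var schedule kind; if a schedule clause is present and specifies the runtime schedule kind, use the run-sched-var schedule kind; otherwise use the schedule kind specified in the schedule clause.]
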
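The decision procedure for determining the schedule can be sketched as a small function. This is an illustrative model only: the enum sched_kind, the function effective_schedule, and its parameters are hypothetical stand-ins for the schedule clause and the two ICVs, and the chunk size carried by the ICVs is left out for brevity.

```c
/* Hypothetical model of the schedule kinds. */
typedef enum {
    SCHED_STATIC,
    SCHED_DYNAMIC,
    SCHED_GUIDED,
    SCHED_AUTO,
    SCHED_RUNTIME
} sched_kind;

/* has_clause:    nonzero if the loop directive has a schedule clause
   clause_kind:   kind named in that clause (ignored if has_clause == 0)
   def_sched_var: stand-in for the def-sched-var ICV
   run_sched_var: stand-in for the run-sched-var ICV */
sched_kind effective_schedule(int has_clause, sched_kind clause_kind,
                              sched_kind def_sched_var,
                              sched_kind run_sched_var)
{
    if (!has_clause)
        return def_sched_var;      /* no schedule clause */
    if (clause_kind == SCHED_RUNTIME)
        return run_sched_var;      /* schedule(runtime) */
    return clause_kind;            /* kind given in the clause */
}
```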
2            Summary
3            The sections construct is a noniterative worksharing construct that contains a set of
4            structured blocks that are to be distributed among and executed by the threads in a team.
5            Each structured block is executed once by one of the threads in the team in the context
6            of its implicit task.
             Syntax

                                                     C/C++
             The syntax of the sections construct is as follows:

               #pragma omp sections [clause[[,] clause] ...] new-line
                  {
                  [#pragma omp section new-line]
                       structured-block
                  [#pragma omp section new-line
                       structured-block ]
                   ...
                  }
             where clause is one of the following:
                        private(list)
                        firstprivate(list)
                        lastprivate(list)
                        reduction(operator: list)
                        nowait

                                                     C/C++

                                                        Fortran
                The syntax of the sections construct is as follows:

                  !$omp sections [clause[[,] clause] ...]
                    [!$omp section]
                         structured-block
                    [!$omp section
                         structured-block ]
                     ...
                  !$omp end sections [nowait]
                where clause is one of the following:
                           private(list)
                           firstprivate(list)
                           lastprivate(list)
                           reduction({operator|intrinsic_procedure_name}:list)

                                                        Fortran
18              Binding
19              The binding thread set for a sections region is the current team. A sections
20              region binds to the innermost enclosing parallel region. Only the threads of the team
21              executing the binding parallel region participate in the execution of the structured
22              blocks and (optional) implicit barrier of the sections region.
23              Description
24              Each structured block in the sections construct is preceded by a section directive
25              except possibly the first block, for which a preceding section directive is optional.
26              The method of scheduling the structured blocks among the threads in the team is
27              implementation defined.
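A minimal sketch of a sections construct in C follows. The function name fill_two_sections is hypothetical. Each structured block is executed once by some thread of the team; which thread executes which block is not assumed.

```c
/* Hypothetical example: two independent structured blocks, each
   executed once by one of the threads in the team. */
void fill_two_sections(int *out)
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                out[0] = 10;
            }
            #pragma omp section
            {
                out[1] = 20;
            }
        }
    }
}
```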
12           Cross References
13           • private, firstprivate, lastprivate, and reduction clauses, see
14             Section 2.9.3 on page 85.
16           Summary
17           The single construct specifies that the associated structured block is executed by only
18           one of the threads in the team (not necessarily the master thread), in the context of its
19           implicit task. The other threads in the team, which do not execute the block, wait at an
20           implicit barrier at the end of the single construct unless a nowait clause is specified.
             Syntax

                                                     C/C++
             The syntax of the single construct is as follows:

               #pragma omp single [clause[[,] clause] ...] new-line
                  structured-block
             where clause is one of the following:
                        private(list)
                        firstprivate(list)
                        copyprivate(list)
                        nowait

                                                     C/C++

                                                        Fortran
                The syntax of the single construct is as follows:

                  !$omp single [clause[[,] clause] ...]
                     structured-block
                  !$omp end single [end_clause[[,] end_clause] ...]
                where clause is one of the following:
                           private(list)
                           firstprivate(list)
                and end_clause is one of the following:
                           copyprivate(list)
                           nowait

                                                        Fortran
21              Binding
22              The binding thread set for a single region is the current team. A single region
23              binds to the innermost enclosing parallel region. Only the threads of the team
24              executing the binding parallel region participate in the execution of the structured
25              block and the (optional) implicit barrier of the single region.
6            Restrictions
7            Restrictions to the single construct are as follows:
8            • The copyprivate clause must not be used with the nowait clause.
9            • At most one nowait clause can appear on a single construct.
10
                                                    C/C++
11           • A throw executed inside a single region must cause execution to resume within the
12             same single region, and the same thread that threw the exception must catch it.
                                                    C/C++
13           Cross References
14           • private and firstprivate clauses, see Section 2.9.3 on page 85.
15           • copyprivate clause, see Section 2.9.4.2 on page 102.
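As a hedged illustration of the single construct with a copyprivate clause, the sketch below has one thread assign a value that is then copied to the private copies of all other threads in the team. The function name all_threads_see_value and the value 42 are arbitrary choices for this example.

```c
/* Hypothetical example: returns 1 if every thread in the team observes
   the value assigned inside the single region. */
int all_threads_see_value(void)
{
    int ok = 1;

    #pragma omp parallel reduction(&&:ok)
    {
        int v = 0;   /* declared inside the region, so private per thread */

        /* One thread executes the block; copyprivate copies its v to
           the v of every other thread before they leave the barrier.
           nowait is not used because it cannot be combined with copyprivate. */
        #pragma omp single copyprivate(v)
        {
            v = 42;
        }

        ok = ok && (v == 42);
    }
    return ok;
}
```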
16
                                                   Fortran
17   2.5.4   workshare Construct
18           Summary
19           The workshare construct divides the execution of the enclosed structured block into
20           separate units of work, and causes the threads of the team to share the work such that
21           each unit is executed only once by one thread, in the context of its implicit task.
                Syntax
                The syntax of the workshare construct is as follows:

                  !$omp workshare
                     structured-block
                  !$omp end workshare [nowait]
20              Binding
21              The binding thread set for a workshare region is the current team. A workshare
22              region binds to the innermost enclosing parallel region. Only the threads of the team
23              executing the binding parallel region participate in the execution of the units of
24              work and the (optional) implicit barrier of the workshare region.
25              Description
26              There is an implicit barrier at the end of a workshare construct unless a nowait
27              clause is specified.
26   It is unspecified how the units of work are assigned to the threads executing a
27   workshare region.
28   If an array expression in the block references the value, association status, or allocation
29   status of private variables, the value of the expression is undefined, unless the same
30   value would be computed by every thread.
33   The workshare directive causes the sharing of work to occur only in the workshare
34   construct, and not in the remainder of the workshare region.
35 For examples of the workshare construct, see Section A.14 on page 191.
1                  Restrictions
2                  The following restrictions apply to the workshare directive:
3                  • The construct must not contain any user defined function calls unless the function is
4                     ELEMENTAL.
5
                                                          Fortran
22                 Summary
23                 The parallel loop construct is a shortcut for specifying a parallel construct
24                 containing one loop construct and no other statements.
7    where clause can be any of the clauses accepted by the parallel or for directives,
8    except the nowait clause, with identical meanings and restrictions.
                                             C/C++

                                            Fortran
     The syntax of the parallel loop construct is as follows:

        !$omp parallel do [clause[[,] clause] ...]
           do-loop
       [!$omp end parallel do]
     where clause can be any of the clauses accepted by the parallel or do directives,
     with identical meanings and restrictions.
21   Description
22
                                             C/C++
23   The semantics are identical to explicitly specifying a parallel directive immediately
24   followed by a for directive.
                                             C/C++
25
                                            Fortran
26   The semantics are identical to explicitly specifying a parallel directive immediately
27   followed by a do directive, and an end do directive immediately followed by an end
28   parallel directive.
29
                                             Fortran
1               Restrictions
2               The restrictions for the parallel construct and the loop construct apply.
3               Cross References
4               • parallel construct, see Section 2.4 on page 32.
5               • loop construct, see Section 2.5.1 on page 38.
6               • Data attribute clauses, see Section 2.9.3 on page 85.
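The equivalence between the combined construct and its expanded form can be sketched in C as follows. Both function names are hypothetical; the two functions are intended to have the same effect on the array x.

```c
/* Combined form: a parallel construct containing one loop construct. */
void scale_combined(double *x, int n, double s)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        x[i] *= s;
}

/* Expanded form: a parallel directive immediately followed by a for
   directive. The loop iteration variable i is private in the loop
   construct. */
void scale_explicit(double *x, int n, double s)
{
    int i;
    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < n; i++)
            x[i] *= s;
    }
}
```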
8               Summary
9               The parallel sections construct is a shortcut for specifying a parallel
10              construct containing one sections construct and no other statements.
                Syntax

                                                       C/C++
                The syntax of the parallel sections construct is as follows:

                  #pragma omp parallel sections [clause[[,] clause] ...] new-line
                     {
                     [#pragma omp section new-line]
                         structured-block
                     [#pragma omp section new-line
                         structured-block ]
                     ...
                     }
                where clause can be any of the clauses accepted by the parallel or sections
                directives, except the nowait clause, with identical meanings and restrictions.
                                                       C/C++
11   where clause can be any of the clauses accepted by the parallel or sections
12   directives, with identical meanings and restrictions.
13   The last section ends at the end parallel sections directive. nowait cannot be
14   specified on an end parallel sections directive.
15
                                           Fortran
16   Description
17
                                            C/C++
18   The semantics are identical to explicitly specifying a parallel directive immediately
19   followed by a sections directive.
                                            C/C++
20
                                           Fortran
21   The semantics are identical to explicitly specifying a parallel directive immediately
22   followed by a sections directive, and an end sections directive immediately
23   followed by an end parallel directive.
24
                                           Fortran
25   For an example of the parallel sections construct, see Section A.11 on page 174.
26   Restrictions
27   The restrictions for the parallel construct and the sections construct apply.
     Cross References
29   • parallel construct, see Section 2.4 on page 32.
1               • sections construct, see Section 2.5.2 on page 47.
2               • Data attribute clauses, see Section 2.9.3 on page 85.
3
                                                       Fortran
4    2.6.3      parallel workshare Construct
5               Summary
6               The parallel workshare construct is a shortcut for specifying a parallel
7               construct containing one workshare construct and no other statements.
                Syntax
                The syntax of the parallel workshare construct is as follows:

                  !$omp parallel workshare [clause[[,] clause] ...]
                     structured-block
                  !$omp end parallel workshare
                where clause can be any of the clauses accepted by the parallel directive, with
                identical meanings and restrictions. nowait may not be specified on an end
                parallel workshare directive.
17              Description
18              The semantics are identical to explicitly specifying a parallel directive immediately
19              followed by a workshare directive, and an end workshare directive immediately
20              followed by an end parallel directive.
21              Restrictions
22              The restrictions for the parallel construct and the workshare construct apply.
23              Cross References
24              • parallel construct, see Section 2.4 on page 32.
25              • workshare construct, see Section 2.5.4 on page 51.
6          Summary
7          The task construct defines an explicit task.
           Syntax

                                                   C/C++
           The syntax of the task construct is as follows:

             #pragma omp task [clause[[,] clause] ...] new-line
                 structured-block
           where clause is one of the following:
                      if(scalar-expression)
                      untied
                      default(shared | none)
                      private(list)
                      firstprivate(list)
                      shared(list)

                                                   C/C++

                                                           Fortran
                The syntax of the task construct is as follows:

                  !$omp task [clause[[,] clause] ...]
                      structured-block
                  !$omp end task
                where clause is one of the following:
                           if(scalar-logical-expression)
                           untied
                           default(private | firstprivate | shared | none)
                           private(list)
                           firstprivate(list)
                           shared(list)

                                                           Fortran
16              Binding
17              The binding thread set of the task region is the current parallel team. A task region
18              binds to the innermost enclosing parallel region.
19              Description
20              When a thread encounters a task construct, a task is generated from the code for the
21              associated structured block. The data environment of the task is created according to the
22              data-sharing attribute clauses on the task construct and any defaults that apply.
23              The encountering thread may immediately execute the task, or defer its execution. In the
24              latter case, any thread in the team may be assigned the task. Completion of the task can
25              be guaranteed using task synchronization constructs. A task construct may be nested
26              inside an outer task, but the task region of the inner task is not a part of the task
27              region of the outer task.
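A common way to illustrate explicit tasks is a recursive computation in which each call generates child tasks. The sketch below is one such illustration, not text from the specification; the function names fib_task and fib are hypothetical. A taskwait directive is used so that both child tasks are complete before their results are read.

```c
/* Hypothetical example: each call generates two explicit tasks. */
static long fib_task(int n)
{
    long a, b;

    if (n < 2)
        return n;

    /* a and b are locals of the generating task, so they must be
       shared for the child tasks to write results back. */
    #pragma omp task shared(a) firstprivate(n)
    a = fib_task(n - 1);

    #pragma omp task shared(b) firstprivate(n)
    b = fib_task(n - 2);

    /* Wait for the two child tasks generated above. */
    #pragma omp taskwait
    return a + b;
}

long fib(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        /* One thread generates the root of the task tree; any thread
           in the team may execute the deferred tasks. */
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

This sketch favors clarity over speed; generating a task for every call is expensive for small n.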
28              When an if clause is present on a task construct and the if clause expression
29              evaluates to false, the encountering thread must suspend the current task region and
30              begin execution of the generated task immediately, and the suspended task region may
19   Restrictions
20   Restrictions to the task construct are as follows:
21   • A program that branches into or out of a task region is non-conforming.
22   • A program must not depend on any ordering of the evaluations of the clauses of the
23     task directive, or on any side effects of the evaluations of the clauses.
24   • At most one if clause can appear on the directive.
25
                                             C/C++
26   • A throw executed inside a task region must cause execution to resume within the
27     same task region, and the same thread that threw the exception must catch it.
                                             C/C++
28
                                             Fortran
29   • Unsynchronized use of Fortran I/O statements by multiple tasks on the same unit has
30     unspecified behavior.
31
                                             Fortran
1    2.7.1      Task Scheduling
2               Whenever a thread reaches a task scheduling point, the implementation may cause it to
3               perform a task switch, beginning or resuming execution of a different task bound to the
4               current team. Task scheduling points are implied at the following locations:
5               • the point immediately following the generation of an explicit task
6               • after the last instruction of a task region
7               • in taskwait regions
8               • in implicit and explicit barrier regions.
9               In addition, implementations may insert task scheduling points in untied tasks anywhere
10              that they are not specifically prohibited in this specification.
11              When a thread encounters a task scheduling point it may do one of the following,
12              subject to the Task Scheduling Constraints (below):
13              • begin execution of a tied task bound to the current team.
14              • resume any suspended task region, bound to the current team, to which it is tied.
15              • begin execution of an untied task bound to the current team.
16              • resume any suspended untied task region bound to the current team.
                If more than one of the above choices is available, it is unspecified which will be
                chosen.
19              Task Scheduling Constraints
20              1. An explicit task whose construct contained an if clause whose if clause expression
21                 evaluated to false is executed immediately after generation of the task.
22              2. Other scheduling of new tied tasks is constrained by the set of task regions that are
23                 currently tied to the thread, and that are not suspended in a barrier region. If this set
24                 is empty, any new tied task may be scheduled. Otherwise, a new tied task may be
25                 scheduled only if it is a descendant of every task in the set.
26              A program relying on any other assumption about task scheduling is non-conforming.
27
28              Note – Task scheduling points dynamically divide task regions into parts. Each part is
29              executed uninterruptedly from start to end. Different parts of the same task region are
30              executed in the order in which they are encountered. In the absence of task
31              synchronization constructs, the order in which a thread executes parts of different
32              schedulable tasks is unspecified.
33              A correct program must behave correctly and consistently with all conceivable
34              scheduling sequences that are compatible with the rules above.
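Constraint 1 above is often used to limit task creation overhead. The sketch below is an illustration under stated assumptions: the function names and the threshold 1024 are hypothetical. When the if clause expression evaluates to false, the generated task is executed immediately rather than deferred. The result is the same for every scheduling sequence, because the parent reads left only after the taskwait.

```c
/* Hypothetical example: sums a[lo..hi-1] recursively. */
static long range_sum(const int *a, int lo, int hi)
{
    long left, right;
    int mid;

    if (hi - lo <= 1)
        return (hi > lo) ? a[lo] : 0;

    mid = lo + (hi - lo) / 2;

    /* For ranges of at most 1024 elements the if clause is false, so
       the task is executed immediately after it is generated. */
    #pragma omp task shared(left) if(hi - lo > 1024)
    left = range_sum(a, lo, mid);

    right = range_sum(a, mid, hi);

    /* The child task must complete before left is read. */
    #pragma omp taskwait
    return left + right;
}

long sum_with_cutoff(const int *a, int n)
{
    long total = 0;

    #pragma omp parallel
    {
        #pragma omp single
        total = range_sum(a, 0, n);
    }
    return total;
}
```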
28           Summary
29           The master construct specifies a structured block that is executed by the master thread
30           of the team.
                Syntax

                                                       C/C++
                The syntax of the master construct is as follows:

                  #pragma omp master new-line
                     structured-block

                                                       C/C++

                                                       Fortran
                The syntax of the master construct is as follows:

                  !$omp master
                     structured-block
                  !$omp end master

                                                       Fortran
16              Binding
17              The binding thread set for a master region is the current team. A master region
18              binds to the innermost enclosing parallel region. Only the master thread of the team
19              executing the binding parallel region participates in the execution of the structured
20              block of the master region.
21              Description
22              Other threads in the team do not execute the associated structured block. There is no
23              implied barrier either on entry to, or exit from, the master construct.
24 For an example of the master construct, see Section A.15 on page 195.
25              Restrictions
26
                                                       C/C++
27              • A throw executed inside a master region must cause execution to resume within the
28                 same master region, and the same thread that threw the exception must catch it.
                                                       C/C++
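Because the master construct has no implied barrier, other threads that need a value written by the master thread must synchronize explicitly. The following sketch shows one way to do that; the function name master_then_barrier and the value 7 are hypothetical.

```c
/* Hypothetical example: returns 1 if every thread observes the value
   written by the master thread. */
int master_then_barrier(void)
{
    int shared_value = 0;
    int ok = 1;

    #pragma omp parallel shared(shared_value) reduction(&&:ok)
    {
        /* Only the master thread executes this block. */
        #pragma omp master
        {
            shared_value = 7;
        }

        /* No barrier is implied on exit from master, so an explicit
           barrier is placed before the other threads read the value. */
        #pragma omp barrier

        ok = ok && (shared_value == 7);
    }
    return ok;
}
```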
2            Summary
3            The critical construct restricts execution of the associated structured block to a
4            single thread at a time.
             Syntax

                                                     C/C++
             The syntax of the critical construct is as follows:

               #pragma omp critical [(name)] new-line
                  structured-block

                                                     C/C++

                                                     Fortran
             The syntax of the critical construct is as follows:

               !$omp critical [(name)]
                  structured-block
               !$omp end critical [(name)]

                                                     Fortran
20           Binding
21           The binding thread set for a critical region is all threads. Region execution is
22           restricted to a single thread at a time among all the threads in the program, without
23           regard to the team(s) to which the threads belong.
24           Description
25           An optional name may be used to identify the critical construct. All critical
26           constructs without a name are considered to have the same unspecified name. A thread
27           waits at the beginning of a critical region until no thread is executing a critical
1               region with the same name. The critical construct enforces exclusive access with
2               respect to all critical constructs with the same name in all threads, not just those
3               threads in the current team.
4
                                                       C/C++
5               Identifiers used to identify a critical construct have external linkage and are in a
6               name space that is separate from the name spaces used by labels, tags, members, and
7               ordinary identifiers.
                                                       C/C++
8
                                                      Fortran
9               The names of critical constructs are global entities of the program. If a name
10              conflicts with any other entity, the behavior of the program is unspecified.
11
                                                      Fortran
12              For an example of the critical construct, see Section A.16 on page 197.
13              Restrictions
14
                                                       C/C++
15              • A throw executed inside a critical region must cause execution to resume within
16                 the same critical region, and the same thread that threw the exception must catch
17                 it.
                                                       C/C++
18
                                                      Fortran
19              The following restrictions apply to the critical construct:
20              • If a name is specified on a critical directive, the same name must also be
21                 specified on the end critical directive.
22              • If no name appears on the critical directive, no name can appear on the end
23                 critical directive.
24
                                                      Fortran
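A short C sketch of named critical constructs follows. The function name count_even_odd and the names even_count and odd_count are hypothetical. Because the two constructs have different names, a thread updating one counter does not exclude a thread updating the other.

```c
/* Hypothetical example: counts even and odd values in a[0..n-1]. */
void count_even_odd(const int *a, int n, int *evens, int *odds)
{
    int i;

    *evens = 0;
    *odds = 0;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (a[i] % 2 == 0) {
            /* Mutual exclusion only against other even_count regions. */
            #pragma omp critical (even_count)
            {
                (*evens)++;
            }
        } else {
            /* Mutual exclusion only against other odd_count regions. */
            #pragma omp critical (odd_count)
            {
                (*odds)++;
            }
        }
    }
}
```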
26              Summary
27              The barrier construct specifies an explicit barrier at the point at which the construct
28              appears.
6    Because the barrier construct does not have a C language statement as part of its
7    syntax, there are some restrictions on its placement within a program. The barrier
8    directive may be placed only at a point where a base language statement is allowed. The
9    barrier directive may not be used in place of the statement following an if, while, do,
10   switch, or label. See Appendix C for the formal grammar. The examples in Section A.23
11   on page 214 illustrate these restrictions.
                                            C/C++

                                            Fortran
     The syntax of the barrier construct is as follows:

       !$omp barrier

                                            Fortran
18   Binding
19   The binding thread set for a barrier region is the current team. A barrier region
20   binds to the innermost enclosing parallel region. See Section A.18 on page 200 for
21   examples.
22   Description
23   All threads of the team executing the binding parallel region must execute the
24   barrier region and complete execution of all explicit tasks generated in the binding
25   parallel region up to this point before any are allowed to continue execution beyond
26   the barrier.
27   The barrier region includes an implicit task scheduling point in the current task
28   region.
1               Restrictions
2               The following restrictions apply to the barrier construct:
3               • Each barrier region must be encountered by all threads in a team or by none at all.
4               • The sequence of worksharing regions and barrier regions encountered must be the
5                  same for every thread in a team.
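The sketch below illustrates an explicit barrier between two phases of a computation. The function name two_phase and its parameters are hypothetical, and n is assumed to be positive. The barrier is encountered by every thread of the team, in line with the restrictions above.

```c
/* Hypothetical example: phase 2 reads values written by other threads
   in phase 1, so all of phase 1 must finish first. */
void two_phase(const int *in, int *tmp, int *out, int n)
{
    int i;

    #pragma omp parallel private(i)
    {
        /* nowait removes the implicit barrier at the end of this loop. */
        #pragma omp for nowait
        for (i = 0; i < n; i++)
            tmp[i] = in[i] * 2;

        /* All threads wait here until every tmp[] entry is written. */
        #pragma omp barrier

        #pragma omp for
        for (i = 0; i < n; i++)
            out[i] = tmp[i] + tmp[(i + 1) % n];
    }
}
```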
7               Summary
8               The taskwait construct specifies a wait on the completion of child tasks generated
9               since the beginning of the current task.
                Syntax

                                                        C/C++
                The syntax of the taskwait construct is as follows:

                  #pragma omp taskwait new-line
                Because the taskwait construct does not have a C language statement as part of its
                syntax, there are some restrictions on its placement within a program. The taskwait
                directive may be placed only at a point where a base language statement is allowed. The
                taskwait directive may not be used in place of the statement following an if, while,
                do, switch, or label. See Appendix C for the formal grammar. The examples in
                Section A.23 on page 214 illustrate these restrictions.
                                                        C/C++

                                                       Fortran
                The syntax of the taskwait construct is as follows:

                  !$omp taskwait

                                                       Fortran
4            Description
            The taskwait region includes an implicit task scheduling point in the current task
            region. The current task region is suspended at the task scheduling point until
            execution of all its child tasks generated before the taskwait region is complete.
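A common use of taskwait is a recursive computation that must combine the results of its child tasks. The following C sketch assumes hypothetical function names (fib_task, fib_parallel) and is meant only to show where the wait is needed, not to be an efficient way to compute Fibonacci numbers.

```c
/* Computes the n-th Fibonacci number with one explicit task per recursive call. */
long fib_task(int n)
{
    long x, y;

    if (n < 2)
        return n;

    /* x and y must be shared so the child tasks write the parent's variables. */
    #pragma omp task shared(x) firstprivate(n)
    x = fib_task(n - 1);

    #pragma omp task shared(y) firstprivate(n)
    y = fib_task(n - 2);

    /* Suspend the current task until both child tasks above have completed. */
    #pragma omp taskwait

    return x + y;
}

/* Starts the recursion from a single thread inside a parallel region. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel shared(result)
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Without the taskwait directive, the return statement could read x and y before the child tasks have written them.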
2.8.5       atomic Construct
            Summary
10           The atomic construct ensures that a specific storage location is updated atomically,
11           rather than exposing it to the possibility of multiple, simultaneous writing threads.
12           Syntax
13
                                                     C/C++
14           The syntax of the atomic construct is as follows:
15
16             #pragma omp atomic new-line
17                expression-stmt
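            where expression-stmt is an expression statement with one of the following forms:
               x binop= expr
               x++
               ++x
               x--
               --x
            In the preceding expressions:
               • x is an lvalue expression with scalar type.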
1               • expr is an expression with scalar type, and it does not reference the variable
2                  designated by x.
3               • binop is one of +, *, -, /, &, ^, |, <<, or >>.
4               • binop, binop=, ++, and -- are not overloaded operators.
                                                         C/C++
5
                                                         Fortran
6               The syntax of the atomic construct is as follows:
7
8                 !$omp atomic
9                    statement
               where statement has one of the following forms:
                  x = x operator expr
                  x = expr operator x
                  x = intrinsic_procedure_name (x, expr_list)
                  x = intrinsic_procedure_name (expr_list, x)
6    Description
7    Only the load and store of the variable designated by x are atomic; the evaluation of
8    expr is not atomic. No task scheduling points are allowed between the load and the store
9    of the variable designated by x. To avoid race conditions, all updates of the location that
10   could potentially occur in parallel must be protected with an atomic directive.
11   atomic regions do not enforce exclusive access with respect to any critical or
12   ordered regions that access the same storage location x. However, other OpenMP
13   synchronization can ensure the desired exclusive access. For example, a barrier
14   following a series of atomic updates to x guarantees that subsequent accesses do not
15   form a race with the atomic accesses.
16   A compliant implementation may enforce exclusive access between atomic regions
17   that update different storage locations. The circumstances under which this occurs are
18   implementation defined.
19   For an example of the atomic construct, see Section A.19 on page 202.
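As a minimal C sketch of the description above, the following function protects every update of a shared counter with an atomic directive. The function name count_multiples_of_three is hypothetical. A reduction clause would usually be the better tool for a plain counter; atomic is used here only to show the construct.

```c
/* Counts the multiples of 3 in [0, n) using a shared counter. */
int count_multiples_of_three(int n)
{
    int count = 0;
    int i;

    #pragma omp parallel for shared(count)
    for (i = 0; i < n; i++) {
        if (i % 3 == 0) {
            /* Only the load and store of count are atomic; the test
               on i above is ordinary, unsynchronized code. */
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

Every update of count that can occur in parallel goes through the atomic directive, which is what the description requires to avoid a race.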
20   Restrictions
21
                                              C/C++
22   The following restriction applies to the atomic construct:
23   • All atomic references to the storage location x throughout the program are required to
24      have a compatible type. See Section A.20 on page 205 for examples.
                                              C/C++
25
                                             Fortran
26   The following restriction applies to the atomic construct:
27   • All atomic references to the storage location of variable x throughout the program are
28      required to have the same type and type parameters. See Section A.20 on page 205
29      for examples.
30
                                             Fortran
1               Cross References
2               • critical construct, see Section 2.8.2 on page 65.
2.8.6       flush Construct
               Summary
5               The flush construct executes the OpenMP flush operation. This operation makes a
6               thread’s temporary view of memory consistent with memory, and enforces an order on
7               the memory operations of the variables explicitly specified or implied. See the memory
8               model description in Section 1.4 on page 13 for more details.
9               Syntax
10
                                                       C/C++
11              The syntax of the flush construct is as follows:
12
13                #pragma omp flush [(list)] new-line
              Note that because the flush construct does not have a C language statement as part of
              its syntax, there are some restrictions on its placement within a program. The
              flush directive may be placed only at a point where a base language statement is
              allowed. The flush directive may not be used in place of the statement following
              an if, while, do, switch, or label. See Appendix C for the formal grammar. See
              Section A.23 on page 214 for an example that illustrates these placement restrictions.
                                                      C/C++
20
                                                      Fortran
21              The syntax of the flush construct is as follows:
22
23                !$omp flush [(list)]
24
25
                                                      Fortran
7    Description
8    A flush construct with a list applies the flush operation to the items in the list, and
9    does not return until the operation is complete for all specified list items. A flush
10   construct without a list, executed on a given thread, operates as if the whole thread-
11   visible data state of the program, as defined by the base language, is flushed.
12
                                               C/C++
13   If a pointer is present in the list, the pointer itself is flushed, not the memory block to
14   which the pointer refers.
                                               C/C++
15
                                              Fortran
16   If the list item or a subobject of the list item has the POINTER attribute, the allocation
17   or association status of the POINTER item is flushed, but the pointer target is not. If the
18   list item is a Cray pointer, the pointer is flushed, but the object to which it points is not.
19   If the list item has the ALLOCATABLE attribute and the list item is allocated, the
20   allocated array is flushed; otherwise the allocation status is flushed.
21
                                              Fortran
22   For examples of the flush construct, see Section A.21 on page 208 and Section A.22
23   on page 211.
1
              Note – The following examples illustrate the ordering properties of the flush operation.
3               In the following incorrect pseudocode example, the programmer intends to prevent
4               simultaneous execution of the critical section by the two threads, but the program does
5               not work properly because it does not enforce the proper ordering of the operations on
6               variables a and b.
7
                Incorrect example:

                                      a = b = 0
                    thread 1                        thread 2
                    b = 1                           a = 1
                    flush(b)                        flush(a)
                    flush(a)                        flush(b)
                    if (a == 0) then                if (b == 0) then
                       critical section                critical section
                    end if                          end if
18
              The problem with this example is that operations on variables a and b are not ordered
              with respect to each other. For instance, nothing prevents the compiler from moving the
              flush of b on thread 1 or the flush of a on thread 2 to a position completely after the
              critical section (assuming that the critical section on thread 1 does not reference b and the
              critical section on thread 2 does not reference a). If either re-ordering happens, both
              threads can simultaneously execute the critical section.
              The following pseudocode example correctly ensures that the critical section is
              executed by not more than one of the two threads at any one time. Notice that execution of
              the critical section by neither thread is considered correct in this example. This occurs if
              both flushes complete prior to either thread executing its if statement.
29
                Correct example:

                                      a = b = 0
                    thread 1                        thread 2
                    b = 1                           a = 1
                    flush(a,b)                      flush(a,b)
                    if (a == 0) then                if (b == 0) then
                       critical section                critical section
                    end if                          end if
39
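The correct pseudocode example can be transcribed into C roughly as follows. This is a sketch under stated assumptions, not text from the specification: the function name guarded_entries and the counter entered are hypothetical, and the two "threads" are expressed as the two sections of a parallel sections construct. The plain accesses to a and b follow the style of the pseudocode; later versions of the specification add atomic read and write forms, which would be the safer choice for those accesses.

```c
/* Returns how many of the two sections entered the guarded region.
   With the combined flush(a, b) this is never more than one. */
int guarded_entries(void)
{
    int a = 0, b = 0;    /* the two flags from the pseudocode */
    int entered = 0;     /* hypothetical counter standing in for the critical section */

    #pragma omp parallel sections num_threads(2) shared(a, b, entered)
    {
        #pragma omp section
        {
            b = 1;
            /* One flush naming both variables keeps the write to b
               ordered before the read of a. */
            #pragma omp flush(a, b)
            if (a == 0) {
                #pragma omp atomic
                entered++;
            }
        }

        #pragma omp section
        {
            a = 1;
            #pragma omp flush(a, b)
            if (b == 0) {
                #pragma omp atomic
                entered++;
            }
        }
    }
    return entered;
}
```

A result of zero is also acceptable: as noted above, neither section enters the guarded region if both flushes complete before either if statement is evaluated.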
2.8.7       ordered Construct
            Summary
25           The ordered construct specifies a structured block in a loop region that will be
26           executed in the order of the loop iterations. This sequentializes and orders the code
27           within an ordered region while allowing code outside the region to run in parallel.
1               Syntax
2
                                                        C/C++
3               The syntax of the ordered construct is as follows:
4
5                 #pragma omp ordered new-line
6                    structured-block
7
                                                        C/C++
                                                       Fortran
8               The syntax of the ordered construct is as follows:
9
10                !$omp ordered
11                   structured-block
12                !$omp end ordered
13
14
                                                       Fortran
15              Binding
16              The binding thread set for an ordered region is the current team. An ordered region
17              binds to the innermost enclosing loop region. ordered regions that bind to different
18              loop regions execute independently of each other.
19              Description
20              The threads in the team executing the loop region execute ordered regions
21              sequentially in the order of the loop iterations. When the thread executing the first
22              iteration of the loop encounters an ordered construct, it can enter the ordered
               region without waiting. When a thread executing any subsequent iteration encounters an
               ordered region, it waits at the beginning of that ordered region until execution of
               all the ordered regions belonging to all previous iterations has completed.
26              For examples of the ordered construct, see Section A.24 on page 215.
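The behavior described above can also be sketched in C. The function name ordered_log_in_sequence and the size ORDER_N are hypothetical. The loop body outside the ordered region may run concurrently and in any order, while the ordered region appends to the log strictly in iteration order; note the ordered clause on the loop directive, which the ordered region requires.

```c
#define ORDER_N 32   /* hypothetical iteration count */

/* Returns 1 if the ordered regions ran in the order of the loop iterations. */
int ordered_log_in_sequence(void)
{
    int order[ORDER_N];
    int next = 0;
    int i, ok = 1;

    #pragma omp parallel for ordered schedule(dynamic) shared(order, next)
    for (i = 0; i < ORDER_N; i++) {
        int v = i * 2;   /* this part may run concurrently, in any order */

        /* Iteration i waits here until iterations 0..i-1 have left
           their ordered regions. */
        #pragma omp ordered
        {
            order[next] = v;
            next++;
        }
    }

    for (i = 0; i < ORDER_N; i++)
        if (order[i] != i * 2)
            ok = 0;
    return ok && next == ORDER_N;
}
```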
27              Restrictions
28              Restrictions to the ordered construct are as follows:
29              • The loop region to which an ordered region binds must have an ordered clause
30                 specified on the corresponding loop (or parallel loop) construct.
8            Cross References
9            • loop construct, see Section 2.5.1 on page 38.
10           • parallel loop construct, see Section 2.6.1 on page 54.
11
1                • Section 2.9.1.2 on page 80 describes the data-sharing attribute rules for variables
2                   referenced in a region, but outside any construct.
1                Additional restrictions on the variables whose data-sharing attributes cannot be
2                implicitly determined in a task construct are described in the Restrictions section of
3                the firstprivate clause (Section 2.9.3.4 on page 92).
2.9.2       threadprivate Directive
            Summary
3            The threadprivate directive specifies that variables are replicated, with each thread
4            having its own copy.
5            Syntax
6
                                                     C/C++
7            The syntax of the threadprivate directive is as follows:
8
9              #pragma omp threadprivate(list) new-line
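            where list is a comma-separated list of file-scope, namespace-scope, or static
            block-scope variables that do not have incomplete types.
                                                     C/C++

                                                    Fortran
            The syntax of the threadprivate directive is as follows:

              !$omp threadprivate(list)
            where list is a comma-separated list of named variables and named common blocks.
            Common block names must appear between slashes.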
18
                                                    Fortran
19           Description
20           Each copy of a threadprivate variable is initialized once, in the manner specified by the
21           program, but at an unspecified point in the program prior to the first reference to that
22           copy. The storage of all copies of a threadprivate variable is freed according to how
23           static variables are handled in the base language, but at an unspecified point in the
24           program.
25           A program in which a thread references another thread’s copy of a threadprivate variable
26           is non-conforming.
1               The content of a threadprivate variable can change across a task scheduling point if the
2               executing thread switches to another schedulable task that modifies the variable. For
3               more details on task scheduling, see Section 1.3 on page 11 and Section 2.7 on page 59.
               In parallel regions, references by the master thread will be to the copy of the
               variable in the thread that encountered the parallel region.
               During the sequential part, references will be to the initial thread’s copy of the variable.
7               The values of data in the initial thread’s copy of a threadprivate variable are guaranteed
8               to persist between any two consecutive references to the variable in the program.
9               The values of data in the threadprivate variables of non-initial threads are guaranteed to
10              persist between two consecutive active parallel regions only if all the following
11              conditions hold:
12              • Neither parallel region is nested inside another explicit parallel region.
13              • The number of threads used to execute both parallel regions is the same.
14              • The value of the dyn-var internal control variable in the enclosing task region is false
15                 at entry to both parallel regions.
16              If these conditions all hold, and if a threadprivate variable is referenced in both regions,
17              then threads with the same thread number in their respective regions will reference the
18              same copy of that variable.
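To make the per-thread copies concrete, the following C sketch gives each thread its own copy of a file-scope counter. The names tp_counter and threadprivate_counts_ok are hypothetical. The function avoids relying on values persisting from an earlier parallel region by resetting each copy at the start of the region.

```c
static int tp_counter = 0;
/* The directive follows the declaration and precedes every reference. */
#pragma omp threadprivate(tp_counter)

/* Returns 1 if every thread counted exactly n increments in its own copy. */
int threadprivate_counts_ok(int n)
{
    int threads = 0;   /* number of threads that took part */
    int total = 0;     /* sum of all per-thread counters */

    #pragma omp parallel shared(threads, total)
    {
        int i;

        tp_counter = 0;        /* resets this thread's copy only */
        for (i = 0; i < n; i++)
            tp_counter++;      /* no synchronization: the copy is not shared */

        #pragma omp atomic
        threads++;

        #pragma omp atomic
        total += tp_counter;
    }
    return total == threads * n;
}
```

If tp_counter were an ordinary shared variable, the unsynchronized increments would race and the total would be unpredictable.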
19
                                                         C/C++
               If the above conditions hold, the storage duration, lifetime, and value of a thread’s copy
               of a threadprivate variable that does not appear in any copyin clause on the second
               region will be retained. Otherwise, the storage duration, lifetime, and value of a thread’s
               copy of the variable in the second region are unspecified.
24              If the value of a variable referenced in an explicit initializer of a threadprivate variable
25              is modified prior to the first reference to any instance of the threadprivate variable, then
26              the behavior is unspecified.
27
28              Note – The order in which any constructors for different threadprivate variables of class
29              type are called is unspecified. The order in which any destructors for different
30              threadprivate variables of class type are called is unspecified.
31
32
                                                         C/C++
33
                                                         Fortran
34              A variable is affected by a copyin clause if the variable appears in the copyin clause
35              or it is in a common block that appears in the copyin clause.
26   Restrictions
27   The restrictions to the threadprivate directive are as follows:
28   • A threadprivate variable must not appear in any clause except the copyin,
29      copyprivate, schedule, num_threads, and if clauses.
30   • A program in which an untied task accesses threadprivate storage is non-conforming.
31
                                               C/C++
   • A variable which is part of another variable (as an array or structure element) cannot
      appear in a threadprivate directive unless it is a static data member of a C++
      class.
1               • A threadprivate directive for file-scope variables must appear outside any
2                  definition or declaration, and must lexically precede all references to any of the
3                  variables in its list.
4               • A threadprivate directive for static class member variables must appear in the
5                  class definition, in the same scope in which the member variables are declared, and
6                  must lexically precede all references to any of the variables in its list.
7               • A threadprivate directive for namespace-scope variables must appear outside
8                  any definition or declaration other than the namespace definition itself, and must
9                  lexically precede all references to any of the variables in its list.
10              • Each variable in the list of a threadprivate directive at file, namespace, or class
11                 scope must refer to a variable declaration at file, namespace, or class scope that
12                 lexically precedes the directive.
13              • A threadprivate directive for static block-scope variables must appear in the
14                 scope of the variable and not in a nested scope. The directive must lexically precede
15                 all references to any of the variables in its list.
16              • Each variable in the list of a threadprivate directive in block scope must refer to
17                 a variable declaration in the same scope that lexically precedes the directive. The
18                 variable declaration must use the static storage-class specifier.
19              • If a variable is specified in a threadprivate directive in one translation unit, it
20                 must be specified in a threadprivate directive in every translation unit in which
21                 it is declared.
22              • The address of a threadprivate variable is not an address constant.
23              • A threadprivate variable must not have an incomplete type or a reference type.
24              • A threadprivate variable with class type must have:
25                 • an accessible, unambiguous default constructor in case of default initialization
26                   without a given initializer;
27                 • an accessible, unambiguous constructor accepting the given argument in case of
28                   direct initialization;
29                 • an accessible, unambiguous copy constructor in case of copy initialization with an
30                   explicit initializer.
                                                        C/C++
31
                                                        Fortran
               • A variable which is part of another variable (as an array or structure element) cannot
                  appear in a threadprivate directive.
34              • The threadprivate directive must appear in the declaration section of a scoping
35                 unit in which the common block or variable is declared. Although variables in
36                 common blocks can be accessed by use association or host association, common
15           Cross References:
16           • dyn-var ICV, see Section 2.3 on page 28.
17           • number of threads used to execute a parallel region, see Section 2.4.1 on page 35.
18           • copyin clause, see Section 2.9.4.1 on page 101.
1
                                                          C/C++
2                If a variable referenced in a data-sharing attribute clause has a type derived from a
3                template, and there are no other references to that variable in the program, then any
4                behavior related to that variable is unspecified.
                                                          C/C++
5
                                                         Fortran
6                A named common block may be specified in a list by enclosing the name in slashes.
7                When a named common block appears in a list, it has the same meaning as if every
8                explicit member of the common block appeared in the list. An explicit member of a
9                common block is a variable that is named in a COMMON statement that specifies the
10               common block name and is declared in the same scoping unit in which the clause
11               appears.
12               Although variables in common blocks can be accessed by use association or host
13               association, common block names cannot. This means that a common block name
14               specified in a data-sharing attribute clause must be declared to be a common block in the
15               same scoping unit in which the data-sharing attribute clause appears.
16               When a named common block appears in a private, firstprivate,
17               lastprivate, or shared clause of a directive, none of its members may be declared
18               in another data-sharing attribute clause in that directive (see Section A.27 on page 227
19               for examples). When individual members of a common block appear in a private,
20               firstprivate, lastprivate, or reduction clause of a directive, the storage of
21               the specified variables is no longer associated with the storage of the common block
22               itself (see Section A.32 on page 237 for examples).
23
                                                         Fortran
2.9.3.1     default clause
                 Summary
26               The default clause allows the user to control the data-sharing attributes of variables that
27               are referenced in a parallel or task construct, and whose data-sharing attributes are
28               implicitly determined (see Section 2.9.1.1 on page 78).
    Syntax

                                             C/C++
    The syntax of the default clause is as follows:

      default(shared | none)

                                             C/C++
7
                                            Fortran
8    The syntax of the default clause is as follows:
9
10     default(private | firstprivate | shared | none)
11
12
                                            Fortran
13   Description
14   The default(shared) clause causes all variables referenced in the construct that
15   have implicitly determined data-sharing attributes to be shared.
16
                                            Fortran
17   The default(firstprivate) clause causes all variables in the construct that have
18   implicitly determined data-sharing attributes to be firstprivate.
19   The default(private) clause causes all variables referenced in the construct that
20   have implicitly determined data-sharing attributes to be private.
21
                                            Fortran
22   The default(none) clause requires that each variable that is referenced in the
23   construct, and that does not have a predetermined data-sharing attribute, must have its
24   data-sharing attribute explicitly determined by being listed in a data-sharing attribute
25   clause. See Section A.28 on page 229 for examples.
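A short C sketch of default(none) follows; the function name sum_to is hypothetical. Every variable referenced in the construct that lacks a predetermined attribute has to appear in a clause, which makes the data-sharing decisions visible at the directive.

```c
/* Returns 1 + 2 + ... + n. */
long sum_to(int n)
{
    long sum = 0;
    int i;

    /* default(none): n must be listed explicitly and sum is listed through
       the reduction clause. The loop variable i is predetermined private,
       so it needs no clause. */
    #pragma omp parallel for default(none) shared(n) reduction(+:sum)
    for (i = 1; i <= n; i++)
        sum += i;

    return sum;
}
```

Removing shared(n) from the directive would make the program non-conforming under default(none), and compilers typically report it as an error.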
26   Restrictions
27   The restrictions to the default clause are as follows:
28   • Only a single default clause may be specified on a parallel or task directive.
1    2.9.3.2     shared clause
2                Summary
3                The shared clause declares one or more list items to be shared by tasks generated by
4                a parallel or task construct.
5                Syntax
6                The syntax of the shared clause is as follows:
7
8                  shared(list)
9                Description
10               All references to a list item within a task refer to the storage area of the original variable
11               at the point the directive was encountered.
12               It is the programmer's responsibility to ensure, by adding proper synchronization, that
13               storage shared by an explicit task region does not reach the end of its lifetime before
14               the explicit task region completes its execution.
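The lifetime requirement above can be sketched in C as follows. The function name square_in_task and the variable names are hypothetical. The variable local lives in an inner block, so the task that writes it through a shared reference must be known to have finished before that block exits.

```c
/* Computes x * x in an explicit task that writes a shared block-scope variable. */
int square_in_task(int x)
{
    int result = 0;

    #pragma omp parallel shared(result)
    {
        #pragma omp single
        {
            int local = 0;   /* lifetime ends at the closing brace of this block */

            /* shared(local): the task refers to the storage of local itself. */
            #pragma omp task shared(local) firstprivate(x)
            local = x * x;

            /* Synchronization added by the programmer: without this wait,
               local could reach the end of its lifetime before the task runs. */
            #pragma omp taskwait

            result = local;
        }
    }
    return result;
}
```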
15
                                                          Fortran
16               The association status of a shared pointer becomes undefined upon entry to and on exit
17               from the parallel or task construct if it is associated with a target or a subobject of
18               a target that is in a private, firstprivate, lastprivate, or reduction
19               clause inside the construct.
20               Under certain conditions, passing a shared variable to a non-intrinsic procedure may
21               result in the value of the shared variable being copied into temporary storage before the
22               procedure reference, and back out of the temporary storage into the actual argument
23               storage after the procedure reference. It is implementation defined when this situation
24               occurs. See Section A.29 on page 231 for an example of this behavior.
25
26               Note – This situation may occur when the following three conditions hold regarding an
27               actual argument in a reference to a non-intrinsic procedure:
8                 c. The associated dummy argument for this actual argument is an explicit-shape array
9                    or an assumed-size array.
10             This effectively results in references to, and definitions of, the temporary storage during
11             the procedure reference. Any references to (or definitions of) the shared storage that is
12             associated with the dummy argument by any other task must be synchronized with the
13             procedure reference to avoid possible race conditions.
14
15
16
                                                         Fortran
2.9.3.3     private clause
               Summary
19             The private clause declares one or more list items to be private to a task.
20             Syntax
21             The syntax of the private clause is as follows:
22
23               private(list)
24             Description
25             Each task that references a list item that appears in a private clause in any statement
26             in the construct receives a new list item whose language-specific attributes are derived
27             from the original list item. Inside the construct, all references to the original list item are
28             replaced by references to the new list item. In the rest of the region, it is unspecified
29             whether references are to the new list item or the original list item. Therefore, if an
1               attempt is made to reference the original item, its value after the region is also
2               unspecified. If a task does not reference a list item that appears in a private clause, it
3               is unspecified whether that task receives a new list item.
4               The value and/or allocation status of the original list item will change only:
5               • if accessed and modified via pointer,
6               • if (possibly) accessed in the region but outside of the construct, or
7               • as a side effect of directives or clauses.
               List items that appear in a private, firstprivate, or reduction clause in a
               parallel construct may also appear in a private clause in an enclosed parallel,
               task, or worksharing construct. List items that appear in a private or
               firstprivate clause in a task construct may also appear in a private clause in
               an enclosed parallel or task construct. List items that appear in a private,
               firstprivate, lastprivate, or reduction clause in a worksharing construct
               may also appear in a private clause in an enclosed parallel or task construct.
15              See Section A.31 on page 235 for an example.
16
                                                          C/C++
17              A new list item of the same type, with automatic storage duration, is allocated for the
18              construct. The storage and thus lifetime of these list items lasts until the block in which
19              they are created exits. The size and alignment of the new list item are determined by the
20              type of the variable. This allocation occurs once for each task generated by the
21              construct, if the task references the list item in any statement.
22              The new list item is initialized, or has an undefined initial value, as if it had been locally
23              declared without an initializer. The order in which any default constructors for different
24              private variables of class type are called is unspecified. The order in which any
25              destructors for different private variables of class type are called is unspecified.
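As a minimal C sketch of these semantics, the following function uses a private scratch variable inside a parallel loop. The names private_scratch_ok, tmp and PRIV_N are hypothetical. Because the new list item has an undefined initial value, the loop body assigns tmp before reading it.

```c
#define PRIV_N 16   /* hypothetical array size */

/* Returns 1 if every element was computed correctly through the private scratch variable. */
int private_scratch_ok(void)
{
    int out[PRIV_N];
    int tmp = -1;    /* original list item; the copies inside the loop do not inherit -1 */
    int i, ok = 1;

    #pragma omp parallel for private(tmp) shared(out)
    for (i = 0; i < PRIV_N; i++) {
        tmp = i + 1;          /* assign first: the private copy starts undefined */
        out[i] = tmp * tmp;
    }

    for (i = 0; i < PRIV_N; i++)
        if (out[i] != (i + 1) * (i + 1))
            ok = 0;
    return ok;
}
```

If tmp were shared instead, the threads would overwrite each other's scratch value between the assignment and the use.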
                                                          C/C++
26
                                                         Fortran
27              A new list item of the same type is allocated once for each implicit task in the
28              parallel region, or for each task generated by a task construct, if the construct
29              references the list item in any statement. The initial value of the new list item is
30              undefined. Within a parallel, worksharing, or task region, the initial status of a
31              private pointer is undefined.
32              For a list item with the ALLOCATABLE attribute:
33              • if the list item is "not currently allocated", the new list item will have an initial state
34                 of "not currently allocated";
35              • if the list item is allocated, the new list item will have an initial state of allocated
36                 with the same array bounds.
14   Restrictions
15   The restrictions to the private clause are as follows:
16   • A variable which is part of another variable (as an array or structure element) cannot
17     appear in a private clause.
18
                                             C/C++
19   • A variable of class type (or array thereof) that appears in a private clause requires
20     an accessible, unambiguous default constructor for the class type.
21   • A variable that appears in a private clause must not have a const-qualified type
22     unless it is of class type with a mutable member.
23   • A variable that appears in a private clause must not have an incomplete type or a
24     reference type.
                                             C/C++
25
                                             Fortran
26   • A variable that appears in a private clause must either be definable, or an
27     allocatable array.
28   • Assumed-size arrays may not appear in a private clause.
29   • Variables that appear in namelist statements, in variable format expressions, and in
30     expressions for statement function definitions, may not appear in a private clause.
31
                                             Fortran
1    2.9.3.4     firstprivate clause
2                Summary
3                The firstprivate clause declares one or more list items to be private to a task, and
4                initializes each of them with the value that the corresponding original item has when the
5                construct is encountered.
6                Syntax
7                The syntax of the firstprivate clause is as follows:
8
9                  firstprivate(list)
10               Description
11               The firstprivate clause provides a superset of the functionality provided by the
12               private clause.
13               A list item that appears in a firstprivate clause is subject to the private clause
14               semantics described in Section 2.9.3.3 on page 89. In addition, the new list item is
15               initialized from the original list item existing before the construct. The initialization of
16               the new list item is done once for each task that references the list item in any statement
17               in the construct. The initialization is done prior to the execution of the construct.
18               For a firstprivate clause on a parallel or task construct, the initial value of
19               the new list item is the value of the original list item that exists immediately prior to the
20               construct in the task region where the construct is encountered. For a firstprivate
21               clause on a worksharing construct, the initial value of the new list item for each implicit
22               task of the threads that execute the worksharing construct is the value of the original list
23               item that exists in the implicit task immediately prior to the point in time that the
24               worksharing construct is encountered.
25               To avoid race conditions, concurrent updates of the original list item must be
26               synchronized with the read of the original list item that occurs as a result of the
27               firstprivate clause.
28               If a list item appears in both firstprivate and lastprivate clauses, the update
29               required for lastprivate occurs after all the initializations for firstprivate.
11   Restrictions
12   The restrictions to the firstprivate clause are as follows:
13   • A variable which is part of another variable (as an array or structure element) cannot
14      appear in a firstprivate clause.
15   • A list item that is private within a parallel region must not appear in a
16      firstprivate clause on a worksharing construct if any of the worksharing
17      regions arising from the worksharing construct ever bind to any of the parallel
18      regions arising from the parallel construct.
19   • A list item that appears in a reduction clause of a parallel construct must not
20      appear in a firstprivate clause on a worksharing or task construct if any of
21      the worksharing or task regions arising from the worksharing or task construct
22      ever bind to any of the parallel regions arising from the parallel construct.
23   • A list item that appears in a reduction clause in a worksharing construct must not
24      appear in a firstprivate clause in a task construct encountered during execution
25      of any of the worksharing regions arising from the worksharing construct.
26
                                              C/C++
27   • A variable of class type (or array thereof) that appears in a firstprivate clause
28      requires an accessible, unambiguous copy constructor for the class type.
29   • A variable that appears in a firstprivate clause must not have a const-
30      qualified type unless it is of class type with a mutable member.
31   • A variable that appears in a firstprivate clause must not have an incomplete
32      type or a reference type.
                                              C/C++
33
                                             Fortran
34   • A variable that appears in a firstprivate clause must be definable.
1                • Fortran pointers, Cray pointers, and assumed-size arrays may not appear in a
2                   firstprivate clause.
3                • Variables that appear in namelist statements, in variable format expressions, and in
4                   expressions for statement function definitions, may not appear in a firstprivate
5                   clause.
6
                                                          Fortran
     2.9.3.5     lastprivate clause
8                Summary
9                The lastprivate clause declares one or more list items to be private to an implicit
10               task, and causes the corresponding original list item to be updated after the end of the
11               region.
12               Syntax
13               The syntax of the lastprivate clause is as follows:
14
15                 lastprivate(list)
16               Description
17               The lastprivate clause provides a superset of the functionality provided by the
18               private clause.
19               A list item that appears in a lastprivate clause is subject to the private clause
20               semantics described in Section 2.9.3.3 on page 89. In addition, when a lastprivate
21               clause appears on the directive that identifies a worksharing construct, the value of each
22               new list item from the sequentially last iteration of the associated loops, or the lexically
23               last section construct, is assigned to the original list item.
24
                                                          C/C++
25               For a (possibly multi-dimensional) array of elements of non-array type, each element is
26               assigned to the corresponding element of the original array.
                                                          C/C++
27               List items that are not assigned a value by the sequentially last iteration of the loops, or
28               by the lexically last section construct, have unspecified values after the construct.
29               Unassigned subcomponents also have unspecified values after the construct.
12   Restrictions
13   The restrictions to the lastprivate clause are as follows:
14   • A variable which is part of another variable (as an array or structure element) cannot
15      appear in a lastprivate clause.
16   • A list item that is private within a parallel region, or that appears in the
17      reduction clause of a parallel construct, must not appear in a lastprivate
18      clause on a worksharing construct if any of the corresponding worksharing regions
19      ever binds to any of the corresponding parallel regions.
20
                                              C/C++
21   • A variable of class type (or array thereof) that appears in a lastprivate clause
22      requires an accessible, unambiguous default constructor for the class type, unless the
23      list item is also specified in a firstprivate clause.
24   • A variable of class type (or array thereof) that appears in a lastprivate clause
25      requires an accessible, unambiguous copy assignment operator for the class type. The
26      order in which copy assignment operators for different variables of class type are
27      called is unspecified.
28   • A variable that appears in a lastprivate clause must not have a const-qualified
29      type unless it is of class type with a mutable member.
30   • A variable that appears in a lastprivate clause must not have an incomplete type
31      or a reference type.
                                              C/C++
32
                                              Fortran
33   • A variable that appears in a lastprivate clause must be definable.
1                • An original list item with the ALLOCATABLE attribute must be in the allocated state
2                    at entry to the construct containing the lastprivate clause. The list item in the
3                    sequentially last iteration or lexically last section must be in the allocated state upon
4                    exit from that iteration or section with the same bounds as the corresponding original
5                    list item.
6                • Fortran pointers, Cray pointers, and assumed-size arrays may not appear in a
7                    lastprivate clause.
8                • Variables that appear in namelist statements, in variable format expressions, and in
9                    expressions for statement function definitions, may not appear in a lastprivate
10                   clause.
11
                                                           Fortran
     2.9.3.6     reduction clause
13               Summary
14               The reduction clause specifies an operator and one or more list items. For each list
15               item, a private copy is created in each implicit task, and is initialized appropriately for
16               the operator. After the end of the region, the original list item is updated with the values
17               of the private copies using the specified operator.
18               Syntax
19
                                                           C/C++
20               The syntax of the reduction clause is as follows:
21
22                   reduction(operator:list)
23               The following table lists the operators that are valid and their initialization values. The
24               actual initialization value depends on the data type of the reduction list item.
25
               Operator    Initialization value
               +           0
               *           1
               -           0
               &           ~0
               |           0
               ^           0
               &&          1
               ||          0
7
                                                C/C++
8
                                                Fortran
9    The syntax of the reduction clause is as follows:
10
11       reduction({operator | intrinsic_procedure_name}:list)
12   The following table lists the operators and intrinsic_procedure_names that are valid and
13   their initialization values. The actual initialization value depends on the data type of the
14   reduction list item.
15
     Operator/
     Intrinsic   Initialization value
     +           0
     *           1
     -           0
     .and.       .true.
     .or.        .false.
     .eqv.       .true.
     .neqv.      .false.
     max         Most negative representable number in the reduction list item type
     min         Largest representable number in the reduction list item type
     iand        All bits on
     ior         0
     ieor        0
32
33
                                                Fortran
1               Description
2               The reduction clause can be used to perform some forms of recurrence calculations
3               (involving mathematically associative and commutative operators) in parallel.
4               A private copy of each list item is created, one for each implicit task, as if the private
5               clause had been used. The private copy is then initialized to the initialization value for
6               the operator, as specified above. At the end of the region for which the reduction
7               clause was specified, the original list item is updated by combining its original value
8               with the final value of each of the private copies, using the operator specified. (The
9               partial results of a subtraction reduction are added to form the final value.)
10              If nowait is not used, the reduction computation will be complete at the end of the
11              construct; however, if the reduction clause is used on a construct to which nowait is
12              also applied, accesses to the original list item will create a race and, thus, have
13              unspecified effect unless synchronization ensures that they occur after all threads have
14              executed all of their iterations or section constructs, and the reduction computation
15              has completed and stored the computed value of that list item. This can most simply be
16              ensured through a barrier synchronization.
17              The order in which the values are combined is unspecified. Therefore, comparing
18              sequential and parallel runs, or comparing one parallel run to another (even if the
19              number of threads used is the same), there is no guarantee that bit-identical results will
20              be obtained or that side effects (such as floating point exceptions) will be identical.
21              To avoid race conditions, concurrent reads or updates of the original list item must be
22              synchronized with the update of the original list item that occurs as a result of the
23              reduction computation.
24
25              Note – List items specified in a reduction clause are typically used in the enclosed
26              region in certain forms.
27
                                                         C/C++
28              A reduction is typically specified for statements of the form:
29
30                x = x op expr
31                x binop= expr
32                x = expr op x              (except for subtraction)
33                x++
34                ++x
35                x--
36                --x
9    where op is +, *, -, .and., .or., .eqv., or .neqv., the expression does not involve
10   x, and the reduction op is the last operation performed on the right hand side.
11   A reduction using an intrinsic is typically specified for statements of the form:
12
13     x = intr(x,expr_list)
14     x = intr(expr_list, x)
15   where intr is max, min, iand, ior, or ieor and expr_list is a comma separated list of
16   expressions not involving x.
17
                                             Fortran
18   For examples, see Section A.35 on page 242.
19
20   Restrictions
21   The restrictions to the reduction clause are as follows:
22   • A list item that appears in a reduction clause of a worksharing construct must be
23     shared in the parallel regions to which any of the worksharing regions arising
24     from the worksharing construct bind.
25   • A list item that appears in a reduction clause of the innermost enclosing
26     worksharing or parallel construct may not be accessed in an explicit task.
27   • Any number of reduction clauses can be specified on the directive, but a list item
28     can appear only once in the reduction clause(s) for that directive.
29
                                                C/C++
30   • The type of a list item that appears in a reduction clause must be valid for the
31     reduction operator.
32   • Aggregate types (including arrays), pointer types and reference types may not appear
33     in a reduction clause.
1               • A list item that appears in a reduction clause must not be const-qualified.
2               • The operator specified in a reduction clause cannot be overloaded with respect to
3                  the list items that appear in that clause.
                                                          C/C++
4
                                                         Fortran
5               • The type of a list item that appears in a reduction clause must be valid for the
6                  reduction operator or intrinsic.
7               • A list item that appears in a reduction clause must be definable.
8               • A list item that appears in a reduction clause must be a named variable of
9                  intrinsic type.
10              • An original list item with the ALLOCATABLE attribute must be in the allocated state
11                 at entry to the construct containing the reduction clause. Additionally, the list item
12                 must not be deallocated and/or allocated within the region.
13              • Fortran pointers, Cray pointers and assumed-size arrays may not appear in a
14                 reduction clause.
15              • Operators specified must be intrinsic operators and any intrinsic_procedure_name
16                 must refer to one of the allowed intrinsic procedures. Assignment to the reduction list
17                 items must be via intrinsic assignment. See Section A.35 on page 242 for examples.
18
                                                         Fortran
     2.9.4.1     copyin clause
2              Summary
3              The copyin clause provides a mechanism to copy the value of the master thread’s
4              threadprivate variable to the threadprivate variable of each other member of the team
5              executing the parallel region.
6              Syntax
7              The syntax of the copyin clause is as follows:
8
9                copyin(list)
10             Description
11
                                                       C/C++
12             The copy is done after the team is formed and prior to the start of execution of the
13             associated structured block. For variables of non-array type, the copy occurs by copy
14             assignment. For a (possibly multi-dimensional) array of elements of non-array type,
15             each element is copied as if by assignment from an element of the master thread’s array
16             to the corresponding element of the other thread’s array. For class types, the copy
17             assignment operator is invoked. The order in which copy assignment operators for
18             different variables of class type are called is unspecified.
                                                       C/C++
19
                                                      Fortran
20             The copy is done, as if by assignment, after the team is formed and prior to the start of
21             execution of the associated structured block.
22             On entry to any parallel region, each thread’s copy of a variable that is affected by
23             a copyin clause for the parallel region will acquire the allocation, association, and
24             definition status of the master thread’s copy, according to the following rules:
25             • If it has the POINTER attribute:
26               • if the master thread’s copy is associated with a target that each copy can become
27                 associated with, each copy will become associated with the same target;
28               • if the master thread’s copy is disassociated, each copy will become disassociated;
29               • otherwise, each copy will have an undefined association status.
5                Restrictions
6                The restrictions to the copyin clause are as follows:
7
                                                         C/C++
8                • A list item that appears in a copyin clause must be threadprivate.
9                • A variable of class type (or array thereof) that appears in a copyin clause requires
10                  an accessible, unambiguous copy assignment operator for the class type.
                                                         C/C++
11
                                                        Fortran
12               • A list item that appears in a copyin clause must be threadprivate. Named variables
13                  appearing in a threadprivate common block may be specified: it is not necessary to
14                  specify the whole common block.
15               • A common block name that appears in a copyin clause must be declared to be a
16                  common block in the same scoping unit in which the copyin clause appears.
17               • An array with the ALLOCATABLE attribute must be in the allocated state. Each
18                  thread's copy of that array must be allocated with the same bounds.
19
                                                        Fortran
     2.9.4.2     copyprivate clause
21               Summary
22               The copyprivate clause provides a mechanism to use a private variable to broadcast
23               a value from the data environment of one implicit task to the data environments of the
24               other implicit tasks belonging to the parallel region.
25               To avoid race conditions, concurrent reads or updates of the list item must be
26               synchronized with the update of the list item that occurs as a result of the
27               copyprivate clause.
5    Description
6    The effect of the copyprivate clause on the specified list items occurs after the
7    execution of the structured block associated with the single construct (see
8    Section 2.5.3 on page 49), and before any of the threads in the team have left the barrier
9    at the end of the construct.
10
                                              C/C++
11   In all other implicit tasks belonging to the parallel region, each specified list item
12   becomes defined with the value of the corresponding list item in the implicit task whose
13   thread executed the structured block. For variables of non-array type, the definition
14   occurs by copy assignment. For a (possibly multi-dimensional) array of elements of non-
15   array type, each element is copied by copy assignment from an element of the array in
16   the data environment of the implicit task whose thread executed the structured block to
17   the corresponding element of the array in the data environment of the other implicit
18   tasks. For class types, a copy assignment operator is invoked. The order in which copy
19   assignment operators for different variables of class type are called is unspecified.
                                              C/C++
20
                                             Fortran
21   If a list item is not a pointer, then in all other implicit tasks belonging to the parallel
22   region, the list item becomes defined (as if by assignment) with the value of the
23   corresponding list item in the implicit task whose thread executed the structured block.
24   If the list item is a pointer, then in all other implicit tasks belonging to the parallel
25   region, the list item becomes pointer associated (as if by pointer assignment) with the
26   corresponding list item in the implicit task whose thread executed the structured block.
27
                                             Fortran
28   For examples of the copyprivate clause, see Section A.37 on page 250.
29
30   Note – The copyprivate clause is an alternative to using a shared variable for the
31   value when providing such a shared variable would be difficult (for example, in a
32   recursion requiring a different variable at each level).
33
3.   Runtime Library Routines
5                  This chapter describes the OpenMP API runtime library routines and is divided into the
6                  following sections:
7                  • Runtime library definitions (Section 3.1 on page 108).
8                  • Execution environment routines that can be used to control and query the parallel
9                    execution environment (Section 3.2 on page 109).
10                 • Lock routines that can be used to synchronize access to data (Section 3.3 on page
11                   134).
12                 • Portable timer routines (Section 3.4 on page 142).
13                 Throughout this chapter, true and false are used as generic terms to simplify the
14                 description of the routines.
15
                                                          C/C++
16                 true means a nonzero integer value and false means an integer value of zero.
                                                          C/C++
17
                                                          Fortran
18                 true means a logical value of .TRUE. and false means a logical value of .FALSE..
19
                                                          Fortran
20
                                                          Fortran
21                 Restrictions
22                 The following restriction applies to all OpenMP runtime library routines:
23                 • OpenMP runtime library routines may not be called from PURE or ELEMENTAL
24                   procedures.
25
                                                          Fortran
3.1 Runtime Library Definitions
20                Interface declarations for the OpenMP Fortran runtime library routines described in this
21                chapter shall be provided in the form of a Fortran include file named omp_lib.h or
22                a Fortran 90 module named omp_lib. It is implementation defined whether the
23                include file or the module file (or both) is provided.
2          It is implementation defined whether any of the OpenMP runtime library routines that
3          take an argument are extended with a generic interface so arguments of different KIND
4          type can be accommodated. See Appendix D.4 for an example of such an extension.
5
                                                  Fortran
3.2.1 omp_set_num_threads
2               Summary
3               The omp_set_num_threads routine affects the number of threads to be used for
4               subsequent parallel regions that do not specify a num_threads clause, by setting
5               the value of the nthreads-var ICV.
6               Format
                                                        C/C++
7
8                 void omp_set_num_threads(int num_threads);
9
                                                        C/C++
                                                        Fortran
10
11                subroutine omp_set_num_threads(num_threads)
12                integer num_threads
13
14
                                                        Fortran
15              Constraints on Arguments
16              The value of the argument passed to this routine must evaluate to a positive integer, or
17              else the behavior of this routine is implementation defined.
18              Binding
19              The binding task set for an omp_set_num_threads region is the generating task.
20              Effect
21              The effect of this routine is to set the value of the nthreads-var ICV to the value
22              specified in the argument.
23              See Section 2.4.1 on page 35 for the rules governing the number of threads used to
24              execute a parallel region.
3            Cross References
4            • nthreads-var ICV, see Section 2.3 on page 28.
5            • OMP_NUM_THREADS environment variable, see Section 4.2 on page 147.
6            • omp_get_max_threads routine, see Section 3.2.3 on page 112.
7            • parallel construct, see Section 2.4 on page 32.
8            • num_threads clause, see Section 2.4 on page 32.
9 3.2.2 omp_get_num_threads
10           Summary
11           The omp_get_num_threads routine returns the number of threads in the current
12           team.
13           Format
                                                   C/C++
14
15             int omp_get_num_threads(void);
16
                                                  C/C++
                                                  Fortran
17
18             integer function omp_get_num_threads()
19
20
                                                  Fortran
21           Binding
22           The binding region for an omp_get_num_threads region is the innermost enclosing
23           parallel region.
6               See Section 2.4.1 on page 35 for the rules governing the number of threads used to
7               execute a parallel region.
8               Cross References
9               • parallel construct, see Section 2.4 on page 32.
10              • omp_set_num_threads routine, see Section 3.2.1 on page 110.
11              • OMP_NUM_THREADS environment variable, see Section 4.2 on page 147.
12 3.2.3 omp_get_max_threads
13              Summary
14              The omp_get_max_threads routine returns an upper bound on the number of
15              threads that could be used to form a new team if a parallel region without a
16              num_threads clause were encountered after execution returns from this routine.
17              Format
                                                       C/C++
18
19                int omp_get_max_threads(void);
20
                                                       C/C++
                                                       Fortran
21
22                integer function omp_get_max_threads()
23
24
                                                       Fortran
3            Effect
4            The value returned by omp_get_max_threads is the value of the nthreads-var ICV.
5            This value is also an upper bound on the number of threads that could be used to form a
6            new team if a parallel region without a num_threads clause were encountered after
7            execution returns from this routine.
8            See Section 2.4.1 on page 35 for the rules governing the number of threads used to
9            execute a parallel region.
10
15           Cross References
16           • nthreads-var ICV, see Section 2.3 on page 28.
17           • parallel construct, see Section 2.4 on page 32.
18           • num_threads clause, see Section 2.4 on page 32.
19           • omp_set_num_threads routine, see Section 3.2.1 on page 110.
20           • OMP_NUM_THREADS environment variable, see Section 4.2 on page 147.
21 3.2.4 omp_get_thread_num
22           Summary
23           The omp_get_thread_num routine returns the thread number, within the current
24           team, of the thread executing the implicit or explicit task region from which
25           omp_get_thread_num is called.
9               Binding
10              The binding thread set for an omp_get_thread_num region is the current team. The
11              binding region for an omp_get_thread_num region is the innermost enclosing
12              parallel region.
13              Effect
14              The omp_get_thread_num routine returns the thread number of the current thread,
15              within the team executing the parallel region to which the routine region binds. The
16              thread number is an integer between 0 and one less than the value returned by
17              omp_get_num_threads, inclusive. The thread number of the master thread of the
18              team is 0. The routine returns 0 if it is called from the sequential part of a program.
19
20              Note – The thread number may change at any time during the execution of an untied
21              task. The value returned by omp_get_thread_num is not generally useful during the
22              execution of such a task region.
23
24              Cross References
25              • omp_get_num_threads routine, see Section 3.2.2 on page 111.
3.2.5 omp_get_num_procs
2            Summary
3            The omp_get_num_procs routine returns the number of processors available to the
4            program.
5            Format
                                                     C/C++
6
7              int omp_get_num_procs(void);
8
                                                     C/C++
                                                    Fortran
9
10             integer function omp_get_num_procs()
11
12
                                                    Fortran
13           Binding
14           The binding thread set for an omp_get_num_procs region is all threads. The effect
15           of executing this routine is not related to any specific region corresponding to any
16           construct or API routine.
17           Effect
18           The omp_get_num_procs routine returns the number of processors that are available
19           to the program at the time the routine is called. Note that this value may change between
20           the time that it is determined by the omp_get_num_procs routine and the time that it
21           is read in the calling context due to system actions outside the control of the OpenMP
22           implementation.
3.2.6 omp_in_parallel
               Summary
3               The omp_in_parallel routine returns true if the call to the routine is enclosed by an
4               active parallel region; otherwise, it returns false.
5               Format
                                                      C/C++
6
7                 int omp_in_parallel(void);
8
                                                      C/C++
                                                     Fortran
9
10                logical function omp_in_parallel()
11
12
                                                     Fortran
13              Binding
14              The binding thread set for an omp_in_parallel region is all threads. The effect of
15              executing this routine is not related to any specific parallel region but instead
16              depends on the state of all enclosing parallel regions.
17              Effect
18              omp_in_parallel returns true if any enclosing parallel region is active. If the
19              routine call is enclosed by only inactive parallel regions (including the implicit
20              parallel region), then it returns false.
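The distinction between active and inactive regions can be sketched in C as follows. The function in_parallel_inside_region and the fallbacks under #else are hypothetical names introduced for this example; the sketch assumes that a region executed by a single thread is inactive, as described above.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed single-thread fallbacks so the sketch also builds without OpenMP. */
static int omp_in_parallel(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Opens a parallel region that requests `requested` threads (must be > 0)
   and returns what omp_in_parallel reports inside it. *team receives the
   team size that was actually used. */
int in_parallel_inside_region(int requested, int *team)
{
    int result = 0;
    (void)requested;   /* unused when the pragma is ignored */

    #pragma omp parallel num_threads(requested)
    {
        #pragma omp master
        {
            result = omp_in_parallel();
            *team = omp_get_num_threads();
        }
    }
    return result;
}
```

With a request for one thread the region is inactive and the routine is expected to report false; with a larger request the result should be true exactly when the team has more than one thread.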
3.2.7 omp_set_dynamic
            Summary
3            The omp_set_dynamic routine enables or disables dynamic adjustment of the
4            number of threads available for the execution of subsequent parallel regions by
5            setting the value of the dyn-var ICV.
6            Format
                                                    C/C++
7
8              void omp_set_dynamic(int dynamic_threads);
9
                                                   C/C++
                                                   Fortran
10
11             subroutine omp_set_dynamic (dynamic_threads)
12             logical dynamic_threads
13
14
                                                   Fortran
15           Binding
16           The binding task set for an omp_set_dynamic region is the generating task.
17           Effect
18           For implementations that support dynamic adjustment of the number of threads, if the
19           argument to omp_set_dynamic evaluates to true, dynamic adjustment is enabled;
20           otherwise, dynamic adjustment is disabled. For implementations that do not support
21           dynamic adjustment of the number of threads this routine has no effect: the value of
22           dyn-var remains false.
23           For an example of the omp_set_dynamic routine, see Section A.40 on page 265.
24           See Section 2.4.1 on page 35 for the rules governing the number of threads used to
25           execute a parallel region.
6 3.2.8 omp_get_dynamic
7               Summary
8               The omp_get_dynamic routine returns the value of the dyn-var ICV, which
9               determines whether dynamic adjustment of the number of threads is enabled or disabled.
10              Format
                                                       C/C++
11
12                int omp_get_dynamic(void);
13
                                                       C/C++
                                                       Fortran
14
15                logical function omp_get_dynamic()
16
17
                                                       Fortran
18              Binding
19              The binding task set for an omp_get_dynamic region is the generating task.
20              Effect
21              This routine returns true if dynamic adjustment of the number of threads is enabled; it
22              returns false, otherwise. If an implementation does not support dynamic adjustment of
23              the number of threads, then this routine always returns false.
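One possible use of the two routines together is to disable dynamic adjustment around a piece of code and restore the previous setting afterwards. The C sketch below assumes hypothetical helper names (with_dynamic_disabled, query_dynamic) and a fallback under #else that models an implementation without dynamic adjustment.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallbacks: no dynamic adjustment, so dyn-var stays false. */
static void omp_set_dynamic(int dynamic_threads) { (void)dynamic_threads; }
static int omp_get_dynamic(void) { return 0; }
#endif

/* Sample callback: reports the current value of dyn-var. */
int query_dynamic(void)
{
    return omp_get_dynamic();
}

/* Calls fn with dynamic adjustment disabled, then restores the setting
   that was in effect before the call. Returns whatever fn returns. */
int with_dynamic_disabled(int (*fn)(void))
{
    int saved = omp_get_dynamic();
    int result;

    omp_set_dynamic(0);
    result = fn();
    omp_set_dynamic(saved);
    return result;
}
```

Calling with_dynamic_disabled(query_dynamic) should yield 0 on any implementation, since disabling is always honored even where enabling is not supported.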
3            Cross References
4            • dyn-var ICV, see Section 2.3 on page 28.
5            • omp_set_dynamic routine, see Section 3.2.7 on page 117.
6            • OMP_DYNAMIC environment variable, see Section 4.3 on page 148.
7 3.2.9 omp_set_nested
8            Summary
9            The omp_set_nested routine enables or disables nested parallelism, by setting the
10           nest-var ICV.
11           Format
                                                    C/C++
12
13             void omp_set_nested(int nested);
14
                                                   C/C++
                                                   Fortran
15
16             subroutine omp_set_nested (nested)
17             logical nested
18
19
                                                   Fortran
20           Binding
21           The binding task set for an omp_set_nested region is the generating task.
6               See Section 2.4.1 on page 35 for the rules governing the number of threads used to
7               execute a parallel region.
8               Cross References
9               • nest-var ICV, see Section 2.3 on page 28.
10              • omp_set_max_active_levels routine, see Section 3.2.14 on page 126.
11              • omp_get_max_active_levels routine, see Section 3.2.15 on page 127.
12              • omp_get_nested routine, see Section 3.2.10 on page 120.
13              • OMP_NESTED environment variable, see Section 4.4 on page 148.
14 3.2.10 omp_get_nested
15              Summary
16              The omp_get_nested routine returns the value of the nest-var ICV, which
17              determines if nested parallelism is enabled or disabled.
18              Format
                                                        C/C++
19
20                int omp_get_nested(void);
21
                                                        C/C++
                                                       Fortran
22
23                logical function omp_get_nested()
24
25
                                                       Fortran
3             Effect
4             This routine returns true if nested parallelism is enabled; it returns false, otherwise. If an
5             implementation does not support nested parallelism, this routine always returns false.
6             See Section 2.4.1 on page 35 for the rules governing the number of threads used to
7             execute a parallel region.
8             Cross References
9             • nest-var ICV, see Section 2.3 on page 28.
10            • omp_set_nested routine, see Section 3.2.9 on page 119.
11            • OMP_NESTED environment variable, see Section 4.4 on page 148.
12 3.2.11 omp_set_schedule
13            Summary
14            The omp_set_schedule routine affects the schedule that is applied when runtime
15            is used as schedule kind, by setting the value of the run-sched-var ICV.
16            Format
17
                                                        C/C++
18
19              void omp_set_schedule(omp_sched_t kind, int modifier);
20
                                                        C/C++
6
7
                                                      Fortran
8               Constraints on Arguments
9               The first argument passed to this routine can be one of the valid OpenMP schedule kinds
10              (except for runtime) or any implementation specific schedule. The C/C++ header file
11              (omp.h) and the Fortran include file (omp_lib.h) and/or Fortran 90 module file
12              (omp_lib) define the valid constants. The valid constants must include the following,
13              which can be extended with implementation specific values:
14
                                                       C/C++
15
                typedef enum omp_sched_t {
                    omp_sched_static = 1,
                    omp_sched_dynamic = 2,
                    omp_sched_guided = 3,
                    omp_sched_auto = 4
                } omp_sched_t;
22
23
                                                       C/C++
24
                                                      Fortran
25
                integer(kind=omp_sched_kind), parameter :: omp_sched_static = 1
                integer(kind=omp_sched_kind), parameter :: omp_sched_dynamic = 2
                integer(kind=omp_sched_kind), parameter :: omp_sched_guided = 3
                integer(kind=omp_sched_kind), parameter :: omp_sched_auto = 4
30
31
32
                                                      Fortran
3             Effect
4             The effect of this routine is to set the value of the run-sched-var ICV to the values
5             specified in the two arguments. The schedule is set to the schedule type specified by the
6             first argument kind. It can be any of the standard schedule types or any other
7             implementation specific one. For the schedule types static, dynamic, and guided
8             the chunk_size is set to the value of the second argument, or to the default chunk_size if
9             the value of the second argument is less than 1; for the schedule type auto the second
10            argument has no meaning; for implementation specific schedule types, the values and
11            associated meanings of the second argument are implementation defined.
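A minimal C sketch of the effect follows: the schedule is selected through omp_set_schedule and then picked up by a loop that uses schedule(runtime). The function name sum_with_runtime_schedule and the fallback definitions under #else (including the stub_ variables) are assumptions for illustration only.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallbacks that merely store the values; the loop runs serially. */
typedef enum omp_sched_t {
    omp_sched_static = 1,
    omp_sched_dynamic = 2,
    omp_sched_guided = 3,
    omp_sched_auto = 4
} omp_sched_t;
static omp_sched_t stub_kind = omp_sched_static;
static int stub_modifier = 0;
static void omp_set_schedule(omp_sched_t kind, int modifier)
{
    stub_kind = kind;
    stub_modifier = modifier;
}
static void omp_get_schedule(omp_sched_t *kind, int *modifier)
{
    *kind = stub_kind;
    *modifier = stub_modifier;
}
#endif

/* Sets run-sched-var to dynamic with the given chunk size, then sums
   0 .. n-1 in a worksharing loop whose schedule kind is runtime. */
long sum_with_runtime_schedule(int n, int chunk)
{
    long total = 0;

    omp_set_schedule(omp_sched_dynamic, chunk);

    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}
```

After the call, omp_get_schedule can be used to read back the kind and chunk size that were stored.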
12            Cross References
13            • run-sched-var ICV, see Section 2.3 on page 28.
14            • omp_get_schedule routine, see Section 3.2.12 on page 123.
15            • OMP_SCHEDULE environment variable, see Section 4.1 on page 146.
16            • Determining the schedule of a worksharing loop, see Section 2.5.1.1 on page 45.
17 3.2.12 omp_get_schedule
18            Summary
19            The omp_get_schedule routine returns the schedule that is applied when the
20            runtime schedule is used.
5
                                                       C/C++
6
                                                      Fortran
7
8                 subroutine omp_get_schedule(kind, modifier)
9                 integer (kind=omp_sched_kind) kind
10                integer modifier
11
12
                                                      Fortran
13              Binding
14              The binding task set for an omp_get_schedule region is the generating task.
15              Effect
16              This routine returns the run-sched-var ICV in the team executing the parallel region
17              to which the routine binds. The first argument kind returns the schedule to be used. It
18              can be any of the standard schedule types as defined in Section 3.2.11 on page 121, or
19              any implementation specific schedule type. The second argument is interpreted as in the
20              omp_set_schedule call, defined in Section 3.2.11 on page 121.
21              Cross References
22              • run-sched-var ICV, see Section 2.3 on page 28.
23              • omp_set_schedule routine, see Section 3.2.11 on page 121.
24              • OMP_SCHEDULE environment variable, see Section 4.1 on page 146.
25              • Determining the schedule of a worksharing loop, see Section 2.5.1.1 on page 45.
3.2.13 omp_get_thread_limit
              Summary
3             The omp_get_thread_limit routine returns the maximum number of OpenMP
4             threads available to the program.
5             Format
6
                                                       C/C++
7
                int omp_get_thread_limit(void);
9
                                                       C/C++
10
                                                      Fortran
11
12              integer function omp_get_thread_limit()
13
14
                                                      Fortran
15            Binding
16            The binding thread set for an omp_get_thread_limit region is all threads. The
17            effect of executing this routine is not related to any specific region corresponding to any
18            construct or API routine.
19            Effect
20            The omp_get_thread_limit routine returns the maximum number of OpenMP
21            threads available to the program as stored in the ICV thread-limit-var.
4 3.2.14 omp_set_max_active_levels
5               Summary
6               The omp_set_max_active_levels routine limits the number of nested active
7               parallel regions, by setting the max-active-levels-var ICV.
8               Format
9
                                                        C/C++
10
                  void omp_set_max_active_levels(int max_levels);
12
                                                        C/C++
13
                                                       Fortran
14
15                subroutine omp_set_max_active_levels (max_levels)
16                integer max_levels
17
18
                                                       Fortran
19              Constraints on Arguments
20              The value of the argument passed to this routine must evaluate to a non-negative integer,
21              or else the behavior of this routine is implementation defined.
6             Effect
7             The effect of this routine is to set the value of the max-active-levels-var ICV to the value
8             specified in the argument.
              If the number of parallel levels requested exceeds the number of levels of parallelism
              supported by the implementation, the value of the max-active-levels-var ICV will be set
              to the number of parallel levels supported by the implementation.
12            This routine has the described effect only when called from the sequential part of the
13            program. When called from within an explicit parallel region, the effect of this
14            routine is implementation defined.
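Because the stored value may be clamped, a caller that cares about the limit can read it back after setting it. The C sketch below assumes a hypothetical wrapper limit_active_levels that is only called from the sequential part of the program; the fallback under #else is likewise an assumption.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallbacks that store the requested limit unchanged. */
static int stub_max_levels = 1;
static void omp_set_max_active_levels(int max_levels)
{
    if (max_levels >= 0)
        stub_max_levels = max_levels;
}
static int omp_get_max_active_levels(void) { return stub_max_levels; }
#endif

/* Requests a limit on nested active parallel regions and returns the
   value actually in effect, which may be lower than requested.
   Negative requests are rejected here with -1 rather than passed on,
   because their behavior is implementation defined. */
int limit_active_levels(int requested)
{
    if (requested < 0)
        return -1;
    omp_set_max_active_levels(requested);
    return omp_get_max_active_levels();
}
```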
15            Cross References
              • max-active-levels-var ICV, see Section 2.3 on page 28.
17            • omp_get_max_active_levels routine, see Section 3.2.15 on page 127.
18            • OMP_MAX_ACTIVE_LEVELS environment variable, see Section 4.7 on page 150.
19 3.2.15 omp_get_max_active_levels
20            Summary
21            The omp_get_max_active_levels routine returns the value of the max-active-
22            levels-var ICV, which determines the maximum number of nested active parallel regions.
5
                                                       C/C++
6
                                                       Fortran
7
8                 integer function omp_get_max_active_levels()
9
10
                                                       Fortran
11              Binding
12              When called from the sequential part of the program, the binding thread set for an
13              omp_get_max_active_levels region is the encountering thread. When called
14              from within any explicit parallel region, the binding thread set (and binding region, if
15              required) for the omp_get_max_active_levels region is implementation defined.
16              Effect
17              The omp_get_max_active_levels routine returns the value of the max-active-
18              levels-var ICV, which determines the maximum number of nested active parallel regions.
19              Cross References
                • max-active-levels-var ICV, see Section 2.3 on page 28.
21              • omp_set_max_active_levels routine, see Section 3.2.14 on page 126.
22              • OMP_MAX_ACTIVE_LEVELS environment variable, see Section 4.7 on page 150.
3.2.16 omp_get_level
              Summary
3             The omp_get_level routine returns the number of nested parallel regions
4             enclosing the task that contains the call.
5             Format
6
                                                      C/C++
7
                int omp_get_level(void);
9
                                                      C/C++
10
                                                      Fortran
11
12              integer function omp_get_level()
13
14
                                                      Fortran
15            Binding
16            The binding task set for an omp_get_level region is the generating task. The
17            binding region for an omp_get_level region is the innermost enclosing parallel
18            region.
19            Effect
20            The omp_get_level routine returns the number of nested parallel regions
21            (whether active or inactive) enclosing the task that contains the call, not including the
22            implicit parallel region. The routine always returns a non-negative integer, and returns 0
23            if it is called from the sequential part of the program.
4 3.2.17 omp_get_ancestor_thread_num
5               Summary
6               The omp_get_ancestor_thread_num routine returns, for a given nested level of
7               the current thread, the thread number of the ancestor or the current thread.
8 Format
9
                                                  C/C++
10
                  int omp_get_ancestor_thread_num(int level);
12
                                                  C/C++
13
                                                 Fortran
14
15                integer function omp_get_ancestor_thread_num(level)
16                integer level
17
18
                                                  Fortran
19              Binding
20              The binding thread set for an omp_get_ancestor_thread_num region is the
21              encountering thread. The binding region for an omp_get_ancestor_thread_num
22              region is the innermost enclosing parallel region.
11            Cross References
12            • omp_get_level routine, see Section 3.2.16 on page 129.
13            • omp_get_thread_num routine, see Section 3.2.4 on page 113.
14            • omp_get_team_size routine, see Section 3.2.18 on page 131.
15 3.2.18 omp_get_team_size
16            Summary
17            The omp_get_team_size routine returns, for a given nested level of the current
18            thread, the size of the thread team to which the ancestor or the current thread belongs.
2
                                                         C/C++
3
                  int omp_get_team_size(int level);
5
                                                         C/C++
6
                                                         Fortran
7
8                 integer function omp_get_team_size(level)
9                 integer level
10
11
                                                         Fortran
12              Binding
13              The binding thread set for an omp_get_team_size region is the encountering
14              thread. The binding region for an omp_get_team_size region is the innermost
15              enclosing parallel region.
16              Effect
17              The omp_get_team_size routine returns the size of the thread team to which the
18              ancestor or the current thread belongs. If the requested nested level is outside the range
19              of 0 and the nested level of the current thread, as returned by the omp_get_level
20              routine, the routine returns -1. Inactive parallel regions are regarded like active parallel
21              regions executed with one thread.
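The valid range of the level argument can be illustrated by walking every level from 0 up to the value returned by omp_get_level. In the C sketch below, collect_team_sizes is a hypothetical helper, and the fallbacks under #else model a program that only ever runs its sequential part.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallbacks for a purely sequential program. */
static int omp_get_level(void) { return 0; }
static int omp_get_team_size(int level) { return level == 0 ? 1 : -1; }
#endif

/* Writes the team size of every enclosing level, from level 0 up to the
   caller's own level, into sizes[] (at most max entries). Returns the
   number of entries written. */
int collect_team_sizes(int *sizes, int max)
{
    int depth = omp_get_level();
    int count = 0;

    for (int level = 0; level <= depth && count < max; level++)
        sizes[count++] = omp_get_team_size(level);
    return count;
}
```

Called from the sequential part, the helper should write a single entry of 1; any level below 0 or above omp_get_level() yields -1 from omp_get_team_size.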
22
5 3.2.19 omp_get_active_level
6             Summary
7             The omp_get_active_level routine returns the number of nested, active
8             parallel regions enclosing the task that contains the call.
9 Format
10
                                                 C/C++
11
                int omp_get_active_level(void);
13
                                                 C/C++
14
                                                 Fortran
15
16              integer function omp_get_active_level()
17
18
                                                 Fortran
19            Binding
              The binding task set for an omp_get_active_level region is the generating
              task. The binding region for an omp_get_active_level region is the innermost
              enclosing parallel region.
6                 Cross References
7                 • omp_get_level routine, see Section 3.2.16 on page 129.
20                Two types of locks are supported: simple locks and nestable locks. A nestable lock may
21                be set multiple times by the same task before being unset; a simple lock may not be set
22                if it is already owned by the task trying to set it. Simple lock variables are associated
23                with simple locks and may only be passed to simple lock routines. Nestable lock
24                variables are associated with nestable locks and may only be passed to nestable lock
25                routines.
26                Constraints on the state and ownership of the lock accessed by each of the lock routines
27                are described with the routine. If these constraints are not met, the behavior of the
28                routine is unspecified.
29                The OpenMP lock routines access a lock variable in such a way that they always read
30                and update the most current value of the lock variable. The lock routines include a flush
31                with no list; the read and update to the lock variable must be implemented as if they are
32                atomic with the flush. Therefore, it is not necessary for an OpenMP program to include
33                explicit flush directives to ensure that the lock variable’s value is consistent among
34                different tasks.
3 Binding
4    The binding thread set for all lock routine regions is all threads. As a consequence, for
5    each OpenMP lock, the lock routine effects relate to all tasks that call the routines,
6    without regard to which team(s) the threads executing the tasks belong.
20              Summary
21              These routines provide the only means of initializing an OpenMP lock.
9              subroutine omp_init_nest_lock(nvar)
10             integer (kind=omp_nest_lock_kind) nvar
11
12
                                                      Fortran
13           Constraints on Arguments
14           A program that accesses a lock that is not in the uninitialized state through either routine
15           is non-conforming.
16           Effect
17           The effect of these routines is to initialize the lock to the unlocked state (that is, no task
18           owns the lock). In addition, the nesting count for a nestable lock is set to zero.
19           For an example of the omp_init_lock routine, see Section A.42 on page 269.
22           Summary
23           These routines ensure that the OpenMP lock is uninitialized.
9                 subroutine omp_destroy_nest_lock(nvar)
10                integer (kind=omp_nest_lock_kind) nvar
11
12
                                                        Fortran
13              Constraints on Arguments
14              A program that accesses a lock that is not in the unlocked state through either routine is
15              non-conforming.
16              Effect
17              The effect of these routines is to change the state of the lock to uninitialized.
19              Summary
20              These routines provide a means of setting an OpenMP lock. The calling task region is
21              suspended until the lock is set.
9      subroutine omp_set_nest_lock(nvar)
10     integer (kind=omp_nest_lock_kind) nvar
11
12
                                             Fortran
13   Constraints on Arguments
14   A program that accesses a lock that is in the uninitialized state through either routine is
15   non-conforming. A simple lock accessed by omp_set_lock that is in the locked state
16   must not be owned by the task that contains the call or deadlock will result.
17   Effect
18   Each of these routines causes suspension of the task executing the routine until the
19   specified lock is available and then sets the lock.
20   A simple lock is available if it is unlocked. Ownership of the lock is granted to the task
21   executing the routine.
22   A nestable lock is available if it is unlocked or if it is already owned by the task
23   executing the routine. The task executing the routine is granted, or retains, ownership of
24   the lock, and the nesting count for the lock is incremented.
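A simple lock is typically used to serialize updates to shared data. The following C sketch assumes a hypothetical function locked_count and single-thread fallbacks under #else; it pairs each omp_set_lock with an omp_unset_lock by the same task.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed single-thread fallbacks so the sketch also builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l) { *l = 0; }
static void omp_set_lock(omp_lock_t *l) { *l = 1; }
static void omp_unset_lock(omp_lock_t *l) { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Increments a shared counter `iterations` times, with every update
   guarded by a simple lock. Returns the final counter value. */
long locked_count(int iterations)
{
    omp_lock_t lock;
    long counter = 0;

    omp_init_lock(&lock);              /* lock starts out unlocked */

    #pragma omp parallel for shared(counter, lock)
    for (int i = 0; i < iterations; i++) {
        omp_set_lock(&lock);           /* waits until the lock is available */
        counter++;
        omp_unset_lock(&lock);         /* released by the owning task */
    }

    omp_destroy_lock(&lock);           /* lock is unlocked at this point */
    return counter;
}
```

No explicit flush is needed around the counter update, because the lock routines include a flush with no list.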
2               Summary
3               These routines provide the means of unsetting an OpenMP lock.
4               Format
                                                        C/C++
5
6                 void omp_unset_lock(omp_lock_t *lock);
7                 void omp_unset_nest_lock(omp_nest_lock_t *lock);
8
                                                        C/C++
                                                        Fortran
9
10                subroutine omp_unset_lock(svar)
11                integer (kind=omp_lock_kind) svar
12                subroutine omp_unset_nest_lock(nvar)
13                integer (kind=omp_nest_lock_kind) nvar
14
15
                                                        Fortran
16              Constraints on Arguments
17              A program that accesses a lock that is not in the locked state or that is not owned by the
18              task that contains the call through either routine is non-conforming.
19              Effect
20              For a simple lock, the omp_unset_lock routine causes the lock to become unlocked.
21              For a nestable lock, the omp_unset_nest_lock routine decrements the nesting
22              count, and causes the lock to become unlocked if the resulting nesting count is zero.
                For either routine, if the lock becomes unlocked, and if one or more task regions were
                suspended because the lock was unavailable, the effect is that one task is chosen and
                given ownership of the lock.
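The nesting count of a nestable lock can be illustrated with a recursive function that sets the lock at every level of recursion and unsets it on the way back. The names sum_to and nested_lock_sum, and the fallbacks under #else, are assumptions made for this C sketch.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallbacks: the lock variable holds only the nesting count. */
typedef int omp_nest_lock_t;
static void omp_init_nest_lock(omp_nest_lock_t *l) { *l = 0; }
static void omp_set_nest_lock(omp_nest_lock_t *l) { ++*l; }
static void omp_unset_nest_lock(omp_nest_lock_t *l) { --*l; }
static void omp_destroy_nest_lock(omp_nest_lock_t *l) { (void)l; }
#endif

/* Recursive helper: the owning task sets the lock again at each level. */
static long sum_to(omp_nest_lock_t *lock, int n)
{
    long result;

    omp_set_nest_lock(lock);       /* nesting count is incremented */
    result = (n <= 0) ? 0 : n + sum_to(lock, n - 1);
    omp_unset_nest_lock(lock);     /* decremented; unlocked when it hits zero */
    return result;
}

/* Computes 1 + 2 + ... + n while holding a nestable lock throughout. */
long nested_lock_sum(int n)
{
    omp_nest_lock_t lock;
    long result;

    omp_init_nest_lock(&lock);
    result = sum_to(&lock, n);
    omp_destroy_nest_lock(&lock);  /* all sets were matched, so it is unlocked */
    return result;
}
```

The same pattern with a simple lock would deadlock at the second omp_set_lock call, since a simple lock may not be set by the task that already owns it.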
2            Summary
3            These routines attempt to set an OpenMP lock but do not suspend execution of the task
4            executing the routine.
5            Format
                                                     C/C++
6
7              int omp_test_lock(omp_lock_t *lock);
8              int omp_test_nest_lock(omp_nest_lock_t *lock);
9
                                                     C/C++
                                                     Fortran
10
11             logical function omp_test_lock(svar)
12             integer (kind=omp_lock_kind) svar
17           Constraints on Arguments
18           A program that accesses a lock that is in the uninitialized state through either routine is
19           non-conforming. The behavior is unspecified if a simple lock accessed by
20           omp_test_lock that is in the locked state is owned by the task that contains the call.
21           Effect
22           These routines attempt to set a lock in the same manner as omp_set_lock and
23           omp_set_nest_lock, except that they do not suspend execution of the task
24           executing the routine.
25           For a simple lock, the omp_test_lock routine returns true if the lock is successfully
26           set; otherwise, it returns false.
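Since omp_test_lock does not suspend the task, a caller can do other work whenever the lock is unavailable. In the C sketch below, count_with_test_lock is a hypothetical function, the counter of failed attempts merely stands in for such other work, and the fallbacks under #else are assumptions.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed single-thread fallbacks so the sketch also builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l) { *l = 0; }
static int omp_test_lock(omp_lock_t *l)
{
    if (*l)
        return 0;
    *l = 1;
    return 1;
}
static void omp_unset_lock(omp_lock_t *l) { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Increments a shared counter `iterations` times. Each iteration polls
   the lock with omp_test_lock instead of blocking in omp_set_lock.
   *failed_attempts receives the number of unsuccessful polls. */
long count_with_test_lock(int iterations, long *failed_attempts)
{
    omp_lock_t lock;
    long counter = 0;
    long failures = 0;

    omp_init_lock(&lock);

    #pragma omp parallel for reduction(+:failures)
    for (int i = 0; i < iterations; i++) {
        while (!omp_test_lock(&lock))
            failures++;            /* stand-in for useful work while waiting */
        counter++;                 /* lock is held here */
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);
    *failed_attempts = failures;
    return counter;
}
```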
8 3.4.1 omp_get_wtime
9                 Summary
10                The omp_get_wtime routine returns elapsed wall clock time in seconds.
11                Format
                                                          C/C++
12
13                  double omp_get_wtime(void);
14
                                                          C/C++
                                                         Fortran
15
16                  double precision function omp_get_wtime()
17
18
                                                         Fortran
19                Binding
20                The binding thread set for an omp_get_wtime region is the encountering thread. The
21                routine’s return value is not guaranteed to be consistent across any set of threads.
8    Note – It is anticipated that the routine will be used to measure elapsed times as shown
9    in the following example:
                                              C/C++
10
       double start;
       double end;
       start = omp_get_wtime();
       /* ... work to be timed ... */
       end = omp_get_wtime();
       printf("Work took %f seconds\n", end - start);
17
                                              C/C++
                                              Fortran
18
       DOUBLE PRECISION START, END
       START = omp_get_wtime()
       ! ... work to be timed ...
       END = omp_get_wtime()
       PRINT *, "Work took", END - START, "seconds"
24
25
                                              Fortran
26
27
3.4.2 omp_get_wtick
               Summary
3               The omp_get_wtick routine returns the precision of the timer used by
4               omp_get_wtime.
5               Format
                                                      C/C++
6
7                 double omp_get_wtick(void);
8
                                                     C/C++
                                                     Fortran
9
10                double precision function omp_get_wtick()
11
12
                                                     Fortran
13              Binding
14              The binding thread set for an omp_get_wtick region is the encountering thread. The
15              routine’s return value is not guaranteed to be consistent across any set of threads.
16              Effect
17              The omp_get_wtick routine returns a value equal to the number of seconds between
18              successive clock ticks of the timer used by omp_get_wtime.
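One way to use the tick length is to express a measured interval in timer ticks, which shows how close a measurement is to the timer's resolution. The C sketch below uses hypothetical helpers (time_in_ticks, sample_work); the fallback under #else is an assumption based on clock(), which measures processor time rather than wall clock time.

```c
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
/* Assumed fallbacks based on clock(): processor time, not wall clock time. */
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
static double omp_get_wtick(void) { return 1.0 / CLOCKS_PER_SEC; }
#endif

/* Times one call of work() and returns the elapsed seconds.
   *ticks receives the same interval expressed in timer ticks. */
double time_in_ticks(void (*work)(void), double *ticks)
{
    double start = omp_get_wtime();
    double elapsed;

    work();
    elapsed = omp_get_wtime() - start;
    *ticks = elapsed / omp_get_wtick();
    return elapsed;
}

/* Small busy loop used as a sample workload. */
void sample_work(void)
{
    volatile double x = 0.0;

    for (int i = 0; i < 100000; i++)
        x += i * 0.5;
}
```

An interval of only a few ticks is dominated by timer granularity, so the work should be repeated or enlarged before the measurement is trusted.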
3                  Environment Variables
4
5                  This chapter describes the OpenMP environment variables that specify the settings of
6                  the ICVs that affect the execution of OpenMP programs (see Section 2.3 on page 28).
7                  The names of the environment variables must be upper case. The values assigned to the
8                  environment variables are case insensitive and may have leading and trailing white
9                  space. Modifications to the environment variables after the program has started, even if
10                 modified by the program itself, are ignored by the OpenMP implementation. However,
11                 the settings of some of the ICVs can be modified during the execution of the OpenMP
12                 program by the use of the appropriate directive clauses or OpenMP API routines.
13                 The environment variables are as follows:
• OMP_SCHEDULE sets the run-sched-var ICV for the runtime schedule type and
  chunk size. It can be set to any of the valid OpenMP schedule types (i.e., static,
  dynamic, guided, and auto).
• OMP_NUM_THREADS sets the nthreads-var ICV for the number of threads to use for
  parallel regions.
• OMP_DYNAMIC sets the dyn-var ICV for the dynamic adjustment of threads to use
  for parallel regions.
• OMP_NESTED sets the nest-var ICV to enable or to disable nested parallelism.
• OMP_STACKSIZE sets the stacksize-var ICV that specifies the size of the stack for
  threads created by the OpenMP implementation.
• OMP_WAIT_POLICY sets the wait-policy-var ICV that controls the desired behavior
  of waiting threads.
• OMP_MAX_ACTIVE_LEVELS sets the max-active-levels-var ICV that controls the
  maximum number of nested active parallel regions.
• OMP_THREAD_LIMIT sets the thread-limit-var ICV that controls the maximum
  number of threads participating in the OpenMP program.

The examples in this chapter only demonstrate how these variables might be set in Unix
C shell (csh) environments. In Korn shell (ksh) and DOS environments the actions are
similar, as follows:

• csh:

  setenv OMP_SCHEDULE "dynamic"

• ksh:

  export OMP_SCHEDULE="dynamic"

• DOS:

  set OMP_SCHEDULE=dynamic

4.1   OMP_SCHEDULE

The OMP_SCHEDULE environment variable controls the schedule type and chunk size
of all loop directives that have the schedule type runtime, by setting the value of the
run-sched-var ICV.

The value of this environment variable takes the form:

  type[,chunk]

where

• type is one of static, dynamic, guided, or auto
• chunk is an optional positive integer that specifies the chunk size

If chunk is present, there may be white space on either side of the “,”. See Section 2.5.1
on page 38 for a detailed description of the schedule types.

The behavior of the program is implementation defined if the value of OMP_SCHEDULE
does not conform to the above format.

Implementation specific schedules cannot be specified in OMP_SCHEDULE. They can
only be specified by calling omp_set_schedule, described in Section 3.2.11 on page
121.

Example:

  setenv OMP_SCHEDULE "guided,4"
  setenv OMP_SCHEDULE "dynamic"
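
The type[,chunk] format described above can be checked with a small parser. The following is a minimal sketch in C, not the parser of any particular implementation. The names sched_kind, keyword_eq and parse_omp_schedule are hypothetical and are not part of the OpenMP API. The sketch assumes that a nonconforming value is simply reported as an error, whereas the specification leaves that behavior implementation defined.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; not part of the OpenMP API. */
typedef enum { SCHED_STATIC, SCHED_DYNAMIC, SCHED_GUIDED, SCHED_AUTO } sched_kind;

/* Case-insensitive comparison of the n characters at s with the
   lower-case keyword kw. */
static int keyword_eq(const char *s, size_t n, const char *kw)
{
    if (strlen(kw) != n) return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != kw[i]) return 0;
    return 1;
}

/* Parse "type[,chunk]". Returns 1 on success; *chunk is set to 0 when
   no chunk size is given. Returns 0 if the value does not conform. */
int parse_omp_schedule(const char *value, sched_kind *kind, long *chunk)
{
    static const char *names[] = { "static", "dynamic", "guided", "auto" };
    const char *p = value;
    while (isspace((unsigned char)*p)) p++;        /* leading white space */
    const char *start = p;
    while (isalpha((unsigned char)*p)) p++;        /* schedule type */
    size_t len = (size_t)(p - start);
    int found = 0;
    for (int i = 0; i < 4; i++)
        if (keyword_eq(start, len, names[i])) { *kind = (sched_kind)i; found = 1; }
    if (!found) return 0;
    while (isspace((unsigned char)*p)) p++;        /* white space before "," */
    *chunk = 0;
    if (*p == ',') {
        p++;
        while (isspace((unsigned char)*p)) p++;    /* white space after "," */
        if (!isdigit((unsigned char)*p)) return 0;
        char *end;
        *chunk = strtol(p, &end, 10);
        if (*chunk <= 0) return 0;                 /* chunk must be positive */
        p = end;
        while (isspace((unsigned char)*p)) p++;    /* trailing white space */
    }
    return *p == '\0';
}
```

With this sketch, the value "guided,4" yields SCHED_GUIDED with a chunk size of 4, and "dynamic" yields SCHED_DYNAMIC with no chunk size.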

4.2   OMP_NUM_THREADS

The OMP_NUM_THREADS environment variable sets the number of threads to use for
parallel regions by setting the initial value of the nthreads-var ICV. See Section 2.3
for a comprehensive set of rules about the interaction between the OMP_NUM_THREADS
environment variable, the num_threads clause, the omp_set_num_threads
library routine and dynamic adjustment of threads.

The value of this environment variable must be a positive integer. The behavior of the
program is implementation defined if the requested value of OMP_NUM_THREADS is
greater than the number of threads an implementation can support, or if the value is not
a positive integer.

Example:

  setenv OMP_NUM_THREADS 16
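
The requirement that the value be a positive integer can be illustrated with a short validation routine. This is a sketch only. The names parse_positive_int and initial_nthreads are hypothetical, and falling back to a caller-supplied default on an invalid value is an assumption, since the specification leaves that case implementation defined.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns the parsed value, or 0 if the text is not a positive integer
   that fits in an int. Leading and trailing white space is accepted. */
int parse_positive_int(const char *value)
{
    while (isspace((unsigned char)*value)) value++;
    if (!isdigit((unsigned char)*value)) return 0;   /* rejects signs and text */
    char *end;
    errno = 0;
    long n = strtol(value, &end, 10);
    if (errno == ERANGE || n > INT_MAX || n <= 0) return 0;
    while (isspace((unsigned char)*end)) end++;
    return *end == '\0' ? (int)n : 0;
}

/* Hypothetical start-up helper: use OMP_NUM_THREADS when it holds a valid
   value, otherwise the caller's fallback (an assumed policy). */
int initial_nthreads(int fallback)
{
    const char *s = getenv("OMP_NUM_THREADS");
    int n = s ? parse_positive_int(s) : 0;
    return n > 0 ? n : fallback;
}
```

The same check applies to any other environment variable whose value must be a positive integer.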

Cross References:
• nthreads-var ICV, see Section 2.3 on page 28.
• num_threads clause, see Section 2.4 on page 32.
• omp_set_num_threads routine, see Section 3.2.1 on page 110.
• omp_get_num_threads routine, see Section 3.2.2 on page 111.
• omp_get_max_threads routine, see Section 3.2.3 on page 112.
• omp_get_team_size routine, see Section 3.2.18 on page 131.

4.3   OMP_DYNAMIC

The OMP_DYNAMIC environment variable controls dynamic adjustment of the number
of threads to use for executing parallel regions by setting the initial value of the
dyn-var ICV. The value of this environment variable must be true or false. If the
environment variable is set to true, the OpenMP implementation may adjust the
number of threads to use for executing parallel regions in order to optimize the use
of system resources. If the environment variable is set to false, the dynamic
adjustment of the number of threads is disabled. The behavior of the program is
implementation defined if the value of OMP_DYNAMIC is neither true nor false.

Example:

  setenv OMP_DYNAMIC true
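
Variables such as OMP_DYNAMIC accept only true or false, compared without regard to case and with optional surrounding white space. A minimal sketch of such a check follows. The name parse_omp_bool is hypothetical, and returning -1 for any other value is an assumption, because the behavior for other values is implementation defined.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 for "true", 0 for "false" (in any letter case, with optional
   surrounding white space) and -1 for any other value. */
int parse_omp_bool(const char *value)
{
    char buf[8];
    size_t n = 0;
    while (isspace((unsigned char)*value)) value++;      /* leading white space */
    while (*value && !isspace((unsigned char)*value)) {
        if (n + 1 >= sizeof buf) return -1;              /* too long to match */
        buf[n++] = (char)tolower((unsigned char)*value++);
    }
    buf[n] = '\0';
    while (isspace((unsigned char)*value)) value++;      /* trailing white space */
    if (*value != '\0') return -1;                       /* more than one word */
    if (strcmp(buf, "true") == 0) return 1;
    if (strcmp(buf, "false") == 0) return 0;
    return -1;
}
```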

Cross References:
• dyn-var ICV, see Section 2.3 on page 28.
• omp_set_dynamic routine, see Section 3.2.7 on page 117.
• omp_get_dynamic routine, see Section 3.2.8 on page 118.

4.4   OMP_NESTED

The OMP_NESTED environment variable controls nested parallelism by setting the
initial value of the nest-var ICV. The value of this environment variable must be true
or false. If the environment variable is set to true, nested parallelism is enabled; if
set to false, nested parallelism is disabled. The behavior of the program is
implementation defined if the value of OMP_NESTED is neither true nor false.

Example:

  setenv OMP_NESTED false

Cross References:
• nest-var ICV, see Section 2.3 on page 28.
• omp_set_nested routine, see Section 3.2.9 on page 119.

4.5   OMP_STACKSIZE

The OMP_STACKSIZE environment variable controls the size of the stack for threads
created by the OpenMP implementation, by setting the value of the stacksize-var ICV.
The environment variable does not control the size of the stack for the initial thread.

The value of this environment variable takes the form:

  size | sizeB | sizeK | sizeM | sizeG

where:

• size is a positive integer that specifies the size of the stack for threads that are created
  by the OpenMP implementation.
• B, K, M, and G are letters that specify whether the given size is in Bytes, Kilobytes,
  Megabytes, or Gigabytes, respectively. If one of these letters is present, there may be
  white space between size and the letter.

If only size is specified and none of B, K, M, or G is specified, then size is assumed to be
in Kilobytes.

The behavior of the program is implementation defined if OMP_STACKSIZE does not
conform to the above format, or if the implementation cannot provide a stack with the
requested size.

Examples:

  setenv OMP_STACKSIZE 2000500B
  setenv OMP_STACKSIZE "3000 k "
  setenv OMP_STACKSIZE 10M
  setenv OMP_STACKSIZE " 10 M "
  setenv OMP_STACKSIZE "20 m "
  setenv OMP_STACKSIZE " 1G"
  setenv OMP_STACKSIZE 20000
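
The size format above, including the default unit of Kilobytes, can be sketched as a conversion to bytes. The name parse_omp_stacksize is hypothetical. The sketch assumes that a Kilobyte is 1024 bytes, which this section does not state, and it returns 0 for nonconforming input where the specification leaves the behavior implementation defined.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Parses size | sizeB | sizeK | sizeM | sizeG and returns the size in
   bytes, or 0 if the value does not conform or would overflow.
   Assumption: 1 Kilobyte = 1024 bytes. */
unsigned long long parse_omp_stacksize(const char *value)
{
    while (isspace((unsigned char)*value)) value++;      /* leading white space */
    if (!isdigit((unsigned char)*value)) return 0;
    char *end;
    unsigned long long size = strtoull(value, &end, 10);
    if (size == 0) return 0;                             /* size must be positive */
    while (isspace((unsigned char)*end)) end++;          /* space before the letter */
    unsigned long long unit = 1024ULL;                   /* default unit: Kilobytes */
    switch (toupper((unsigned char)*end)) {
        case 'B': unit = 1ULL;                      end++; break;
        case 'K': unit = 1024ULL;                   end++; break;
        case 'M': unit = 1024ULL * 1024;            end++; break;
        case 'G': unit = 1024ULL * 1024 * 1024;     end++; break;
        default:  break;                                 /* no unit letter */
    }
    while (isspace((unsigned char)*end)) end++;          /* trailing white space */
    if (*end != '\0') return 0;
    if (size > ULLONG_MAX / unit) return 0;              /* would overflow */
    return size * unit;
}
```

Under these assumptions, the value 20000 from the examples above is read as 20000 Kilobytes, and " 10 M " and 10M denote the same size.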

Cross References:
• stacksize-var ICV, see Section 2.3 on page 28.

4.6   OMP_WAIT_POLICY

The OMP_WAIT_POLICY environment variable provides a hint to an OpenMP
implementation about the desired behavior of waiting threads by setting the
wait-policy-var ICV. A compliant OpenMP implementation may or may not abide by
the setting of the environment variable.

The value of this environment variable takes the form:

  ACTIVE | PASSIVE

The ACTIVE value specifies that waiting threads should mostly be active, i.e., consume
processor cycles, while waiting. An OpenMP implementation may, for example, make
waiting threads spin.

The PASSIVE value specifies that waiting threads should mostly be passive, i.e., not
consume processor cycles, while waiting. An OpenMP implementation may, for
example, make waiting threads yield the processor to other threads or go to sleep.

The details of the ACTIVE and PASSIVE behaviors are implementation defined.

Examples:

  setenv OMP_WAIT_POLICY ACTIVE
  setenv OMP_WAIT_POLICY active
  setenv OMP_WAIT_POLICY PASSIVE
  setenv OMP_WAIT_POLICY passive

Cross References:
• wait-policy-var ICV, see Section 2.3 on page 28.

4.7   OMP_MAX_ACTIVE_LEVELS

The OMP_MAX_ACTIVE_LEVELS environment variable controls the maximum number
of nested active parallel regions by setting the initial value of the max-active-levels-var
ICV.

Cross References:
• max-active-levels-var ICV, see Section 2.3 on page 28.
• omp_set_max_active_levels routine, see Section 3.2.14 on page 126.
• omp_get_max_active_levels routine, see Section 3.2.15 on page 127.

4.8   OMP_THREAD_LIMIT

The OMP_THREAD_LIMIT environment variable sets the maximum number of OpenMP
threads to use for the whole OpenMP program by setting the thread-limit-var ICV.

The value of this environment variable must be a positive integer. The behavior of the
program is implementation defined if the requested value of OMP_THREAD_LIMIT is
greater than the number of threads an implementation can support, or if the value is not
a positive integer.

Cross References:
• thread-limit-var ICV, see Section 2.3 on page 28.
• omp_get_thread_limit routine

Examples

The following are examples of the constructs and routines defined in this document.

                                   C/C++
A statement following a directive is compound only when necessary, and a
non-compound statement is indented with respect to a directive preceding it.
                                   C/C++

                                                          Fortran
2                 Example A.1.1f
3 SUBROUTINE A1(N, A, B)
4                        INTEGER I, N
5                        REAL B(N), A(N)
11                    END SUBROUTINE A1
12
                                                          Fortran
13
#include <stdio.h>
#include <omp.h>
int main(){
  int x;

  x = 2;
  #pragma omp parallel num_threads(2) shared(x)
  {
    if (omp_get_thread_num() == 0) {
       x = 5;
    } else {
      /* Print 1: the following read of x has a race */
      printf("1: Thread# %d: x = %d\n", omp_get_thread_num(),x );
    }

    #pragma omp barrier

    if (omp_get_thread_num() == 0) {
      /* Print 2 */
      printf("2: Thread# %d: x = %d\n", omp_get_thread_num(),x );
    } else {
      /* Print 3 */
      printf("3: Thread# %d: x = %d\n", omp_get_thread_num(),x );
    }
  }
  return 0;
}
                                             C/C++
      PROGRAM A2
        INCLUDE "omp_lib.h"      ! or USE OMP_LIB
        INTEGER X

        X = 2
!$OMP PARALLEL NUM_THREADS(2) SHARED(X)

          IF (OMP_GET_THREAD_NUM() .EQ. 0) THEN
             X = 5
          ELSE
          ! PRINT 1: The following read of x has a race
            PRINT *,"1: THREAD# ", OMP_GET_THREAD_NUM(), "X = ", X
          ENDIF

!$OMP BARRIER

          IF (OMP_GET_THREAD_NUM() .EQ. 0) THEN
          ! PRINT 2
            PRINT *,"2: THREAD# ", OMP_GET_THREAD_NUM(), "X = ", X
          ELSE
          ! PRINT 3
            PRINT *,"3: THREAD# ", OMP_GET_THREAD_NUM(), "X = ", X
          ENDIF

!$OMP END PARALLEL
      END PROGRAM A2

                                                        Fortran
The following example demonstrates why synchronization is difficult to perform
correctly through variables. The value of flag is undefined in both prints on thread 1 and
the value of data is only well-defined in the second print.
#include <omp.h>
#include <stdio.h>
int main()
{
    int data;
    int flag=0;
    #pragma omp parallel
    {
        if (omp_get_thread_num()==0)
        {
            /* Write to the data buffer that will be
               read by thread 1 */
            data = 42;
            /* Flush data to thread 1 and strictly order
               the write to data
               relative to the write to the flag */
            #pragma omp flush(flag, data)
            /* Set flag to release thread 1 */
            flag = 1;
            /* Flush flag to ensure that thread 1 sees
               the change */
            #pragma omp flush(flag)
        }
        else if(omp_get_thread_num()==1)
        {
            /* Loop until we see the update to the flag */
            #pragma omp flush(flag, data)
            while (flag < 1)
            {
                #pragma omp flush(flag, data)
            }
            /* Values of flag and data are undefined */
            printf("flag=%d data=%d\n", flag, data);
            #pragma omp flush(flag, data)
            /* Value of data will be 42, value of flag
               still undefined */
            printf("flag=%d data=%d\n", flag, data);
        }
    }
}
                                       C/C++
PROGRAM EXAMPLE
INCLUDE "omp_lib.h" ! or USE OMP_LIB
INTEGER DATA
INTEGER FLAG
!$OMP PARALLEL
  IF(OMP_GET_THREAD_NUM() .EQ. 0) THEN
    ! Write to the data buffer that will be read by thread 1
    DATA = 42
    ! Flush DATA to thread 1 and strictly order the write to DATA
    ! relative to the write to the FLAG
    !$OMP FLUSH(FLAG, DATA)
    ! Set FLAG to release thread 1
    FLAG = 1
    ! Flush FLAG to ensure that thread 1 sees the change
    !$OMP FLUSH(FLAG)
  ELSE IF(OMP_GET_THREAD_NUM() .EQ. 1) THEN
    ! Loop until we see the update to the FLAG
    !$OMP FLUSH(FLAG, DATA)
    DO WHILE(FLAG .LT. 1)
      !$OMP FLUSH(FLAG, DATA)
    ENDDO
3    #include <omp.h>
4    #include <stdio.h>
5    int main()
6    {
7             int flag=0;
3                      PROGRAM EXAMPLE
4                      INCLUDE "omp_lib.h" ! or USE OMP_LIB
5                      INTEGER FLAG
6                      !$OMP PARALLEL
7                        IF(OMP_GET_THREAD_NUM() .EQ. 0) THEN
8                                ! Set flag to release thread 1
9                                !$OMP ATOMIC
10                                       FLAG = FLAG + 1
11                               !Flush of FLAG is implied by the atomic directive
12                       ELSE IF(OMP_GET_THREAD_NUM() .EQ. 1) THEN
13                                       ! Loop until we see that FLAG reaches 1
14                                       !$OMP FLUSH(FLAG, DATA)
15                                       DO WHILE(FLAG .LT. 1)
16                                               !$OMP FLUSH(FLAG, DATA)
17                                       ENDDO
#include <stdio.h>
int main()
{
# ifdef _OPENMP
    printf("Compiled by an OpenMP-compliant implementation.\n");
# endif
    return 0;
}
                                                  C/C++
17
                                                  Fortran
The following example illustrates the use of the conditional compilation sentinel (see
Section 2.2 on page 26). With OpenMP compilation, the conditional compilation
sentinel !$ is recognized and treated as two spaces. In fixed form source, statements
guarded by the sentinel must start after column 6.

                                                  Fortran
Example A.3.1f
      PROGRAM A3
C234567890
!$    PRINT *, "Compiled by an OpenMP-compliant implementation."
      END PROGRAM A3

                                                  Fortran
3    #include <stdio.h>
4    #include <omp.h>
      program icv
      use omp_lib
      call omp_set_nested(.true.)
      call omp_set_max_active_levels(8)
      call omp_set_dynamic(.false.)
      call omp_set_num_threads(2)
!$omp parallel
      call omp_set_num_threads(3)
!$omp parallel
      call omp_set_num_threads(4)
!$omp single
!     The following should print:
!     Inner: max_act_lev= 8 , num_thds= 3 , max_thds= 4
!     Inner: max_act_lev= 8 , num_thds= 3 , max_thds= 4
      print *, ("Inner: max_act_lev=", omp_get_max_active_levels(),
     &          ", num_thds=", omp_get_num_threads(),
     &          ", max_thds=", omp_get_max_threads())
!$omp end single
!$omp end parallel
!$omp barrier
!$omp single
!     The following should print:
!     Outer: max_act_lev= 8 , num_thds= 2 , max_thds= 3
      print *, ("Outer: max_act_lev=", omp_get_max_active_levels(),
     &          ", num_thds=", omp_get_num_threads(),
     &          ", max_thds=", omp_get_max_threads())
!$omp end single
!$omp end parallel
      end

                                                       Fortran
33
3 #include <omp.h>
24   int main()
25   {
26       float array[10000];
27 sub(array, 10000);
28         return 0;
29   }
                                         C/C++
6                            INTEGER I
7
8                            DO 100 I=1,IPOINTS
9                               X(ISTART+I) = 123.456
10                 100       CONTINUE
30                       PROGRAM A5
31                           REAL ARRAY(10000)
32                           CALL SUB(ARRAY, 10000)
33                       END PROGRAM A5
34
                                                      Fortran
35
3          #include <omp.h>
4          int main()
5          {
6            omp_set_dynamic(1);
15                 PROGRAM A6
16                   INCLUDE "omp_lib.h"      ! or USE OMP_LIB
17                   CALL OMP_SET_DYNAMIC(.TRUE.)
2                       SUBROUTINE WORK(I, J)
3                       INTEGER I,J
4                       END SUBROUTINE WORK
5                       SUBROUTINE A7_GOOD()
6                         INTEGER I, J
7                         REAL A(1000)
8                         DO 100 I = 1,10
9               !$OMP       DO
10                          DO 100 J = 1,10
11                             CALL WORK(I,J)
12              100       CONTINUE       ! !$OMP ENDDO implied here
13              !$OMP     DO
14                        DO 200 J = 1,10
15              200          A(I) = I + 1
16              !$OMP     ENDDO
17              !$OMP   DO
18                      DO 300 I = 1,10
19                         DO 300 J = 1,10
20                           CALL WORK(I,J)
21              300     CONTINUE
22              !$OMP   ENDDO
23                    END SUBROUTINE A7_GOOD
24              The following example is non-conforming because the matching do directive for the
25              end do does not precede the outermost loop:
26 Example A.7.2f
27                      SUBROUTINE WORK(I, J)
28                      INTEGER I,J
29                      END SUBROUTINE WORK
30                      SUBROUTINE A7_WRONG
31                        INTEGER I, J
32                      DO 100 I = 1,10
33              !$OMP     DO
34                        DO 100 J = 1,10
35                           CALL WORK(I,J)
36              100     CONTINUE
37              !$OMP   ENDDO
38                    END SUBROUTINE A7_WRONG
39
                                                     Fortran
8 Example A.8.1f
9          SUBROUTINE A8_1(A,N)
10         INCLUDE "omp_lib.h"          ! or USE OMP_LIB
11         REAL A(*)
12         INTEGER I, MYOFFSET, N
3 Example A.8.2f
4                 SUBROUTINE A8_2(A,B,N,I1,I2)
5                 REAL A(*), B(*)
6                 INTEGER I1, I2, N
Note however that the use of shared loop iteration variables can easily lead to race
conditions.

                                                         Fortran
27
#include <math.h>
void a9(int n, int m, float *a, float *b, float *y, float *z)
{
  int i;
  #pragma omp parallel
  {
    #pragma omp for nowait
      for (i=1; i<n; i++)
        b[i] = (a[i] + a[i-1]) / 2.0;

    #pragma omp for nowait
      for (i=0; i<m; i++)
        y[i] = sqrt(z[i]);
  }
}
                                           C/C++
18
                                          Fortran
Example A.9.1f
      SUBROUTINE A9(N, M, A, B, Y, Z)
        INTEGER N, M
        REAL A(*), B(*), Y(*), Z(*)
        INTEGER I
!$OMP PARALLEL
!$OMP DO
        DO I=2,N
          B(I) = (A(I) + A(I-1)) / 2.0
        ENDDO
!$OMP END DO NOWAIT
!$OMP DO
        DO I=1,M
          Y(I) = SQRT(Z(I))
        ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
      END SUBROUTINE A9

                                          Fortran
7                  subroutine sub()
8           !$omp do collapse(2) private(i,j,k)
9                  do k = kl, ku, ks
10                   do j = jl, ju, js
11                      do i = il, iu, is
12                        call bar(a,i,j,k)
13                    enddo
14                  enddo
15                enddo
16          !$omp end do
17                end subroutine
18
                                                   Fortran
In the next example, the loops over k and j are collapsed and their iteration space is
executed by all threads of the current team. The example prints: 2 3.

                                                   Fortran
Example A.10.2f
      program test
!$omp parallel
!$omp do private(j,k) collapse(2) lastprivate(jlast, klast)
      do k = 1,2
        do j = 1,3
          jlast=j
          klast=k
        enddo
      enddo
!$omp end do
!$omp single
      print *, klast, jlast
!$omp end single
!$omp end parallel
      end program test

                                                   Fortran
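
A C counterpart of the preceding Fortran example might look like the sketch below. It is not one of the numbered examples of this document. The function name collapse_lastprivate and its output parameters are hypothetical. The loop bounds and the clauses follow the Fortran version, so the last iteration in sequential order (k = 2, j = 3) determines the values that lastprivate copies out.

```c
/* Hypothetical helper: runs the collapsed loop nest and reports the values
   that lastprivate copies out of the sequentially last iteration. */
void collapse_lastprivate(int *klast_out, int *jlast_out)
{
    int j, k;
    int jlast = 0, klast = 0;

    #pragma omp parallel
    {
        /* k and j are the iteration variables of the two collapsed loops
           and are therefore private in the loop construct. */
        #pragma omp for collapse(2) lastprivate(jlast, klast)
        for (k = 1; k <= 2; k++)
            for (j = 1; j <= 3; j++) {
                jlast = j;
                klast = k;
            }
    }
    *klast_out = klast;
    *jlast_out = jlast;
}
```

When compiled without OpenMP support the directives are ignored and the loop nest runs sequentially, which produces the same two values.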
14                      program test
15                      include 'omp_lib.h'
16              !$omp   parallel num_threads(2)
17              !$omp   do collapse(2) ordered private(j,k) schedule(static,3)
18                      do k = 1,3
19                        do j = 1,2
20              !$omp   ordered
21                          print *, omp_get_thread_num(), k, j
22              !$omp   end ordered
23                          call work(a,j,k)
24                        enddo
25                      enddo
26              !$omp   end do
27              !$omp   end parallel
28                      end program test
29
                                                       Fortran
30
3    void XAXIS();
4    void YAXIS();
5    void ZAXIS();
6    void a11()
7    {
8      #pragma omp parallel sections
9      {
10       #pragma omp section
11         XAXIS();
21 SUBROUTINE A11()
25   !$OMP SECTION
26           CALL YAXIS()
27   !$OMP SECTION
28           CALL ZAXIS()
12 #include <stdio.h>
13              void work1() {}
14              void work2() {}
15              void a12()
16              {
17                #pragma omp parallel
18                {
19                  #pragma omp single
20                    printf("Beginning work1.\n");
21 work1();
26                      work2();
27                  }
28              }
                                                       C/C++
3                 SUBROUTINE WORK1()
4                 END SUBROUTINE WORK1
5
6                 SUBROUTINE WORK2()
7                 END SUBROUTINE WORK2
8
9                 PROGRAM A12
10          !$OMP PARALLEL
11          !$OMP SINGLE
12                  print *, "Beginning work1."
13          !$OMP END SINGLE
14 CALL WORK1()
15          !$OMP SINGLE
16                  print *, "Finishing work1."
17          !$OMP END SINGLE
18          !$OMP SINGLE
19                  print *, "Finished work1 and beginning work2."
20          !$OMP END SINGLE NOWAIT
21 CALL WORK2()
struct node {
   struct node *left;
   struct node *right;
};
extern void process(struct node *);
void traverse( struct node *p ) {
   if (p->left)
#pragma omp task    // p is firstprivate by default
      traverse(p->left);
   if (p->right)
#pragma omp task    // p is firstprivate by default
      traverse(p->right);
   process(p);
}
                                                   C/C++
17
                                                   Fortran
18              Example A.13.1f
struct node {
   struct node *left;
   struct node *right;
};
extern void process(struct node *);
void postorder_traverse( struct node *p ) {
   if (p->left)
      #pragma omp task    // p is firstprivate by default
         postorder_traverse(p->left);
   if (p->right)
      #pragma omp task    // p is firstprivate by default
         postorder_traverse(p->right);
   #pragma omp taskwait
   process(p);
}
                                           C/C++
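
The reason for the taskwait in the postorder traversal can be made concrete with a sketch in which the work done at a node depends on results stored by its children. This is an illustration under stated assumptions, not one of the numbered examples. The subtree_size field and the functions compute_sizes and tree_size are hypothetical, and the node type is redeclared with the extra field so that the sketch is self-contained.

```c
#include <stddef.h>

struct node {
    struct node *left;
    struct node *right;
    int subtree_size;   /* hypothetical field filled in by the traversal */
};

/* Postorder: the size of a node is computed only after the tasks for both
   children have completed, which the taskwait guarantees. */
void compute_sizes(struct node *p)
{
    if (p->left) {
        #pragma omp task   /* p is firstprivate by default */
        compute_sizes(p->left);
    }
    if (p->right) {
        #pragma omp task   /* p is firstprivate by default */
        compute_sizes(p->right);
    }
    #pragma omp taskwait
    p->subtree_size = 1
        + (p->left  ? p->left->subtree_size  : 0)
        + (p->right ? p->right->subtree_size : 0);
}

/* Hypothetical driver: one thread starts the traversal and the other
   threads of the team execute the generated tasks. */
int tree_size(struct node *root)
{
    if (root == NULL) return 0;
    #pragma omp parallel
    #pragma omp single
    compute_sizes(root);
    return root->subtree_size;
}
```

Without the taskwait, a node could read the subtree_size of a child before the child task had written it.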
21
                                          Fortran
22   Example A.13.2f
3         MODULE LIST
4            TYPE NODE
5                INTEGER :: PAYLOAD
6                TYPE (NODE), POINTER :: NEXT
7            END TYPE NODE
8         CONTAINS
9             SUBROUTINE PROCESS(p)
10               TYPE (NODE), POINTER :: P
11                    ! do work here
12            END SUBROUTINE
13            SUBROUTINE INCREMENT_LIST_ITEMS (HEAD)
14                 TYPE (NODE), POINTER :: HEAD
15                 TYPE (NODE), POINTER :: P
16                 !$OMP PARALLEL PRIVATE(P)
17                    !$OMP SINGLE
18                         P => HEAD
19                         DO
20                            !$OMP TASK
21                                 ! P is firstprivate by default
22                                 CALL PROCESS(P)
23                            !$OMP END TASK
24                            P => P%NEXT
25                            IF ( .NOT. ASSOCIATED (P) ) EXIT
26                         END DO
27                   !$OMP END SINGLE
28               !$OMP END PARALLEL
29            END SUBROUTINE
30         END MODULE
31
                                      Fortran
int fib(int n) {
   int i, j;
   if (n<2)
     return n;
   else {
      #pragma omp task shared(i)
         i=fib(n-1);
      #pragma omp task shared(j)
         j=fib(n-2);
      #pragma omp taskwait
         return i+j;
   }
}
                                                         C/C++
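
The tasks generated by fib can be executed by several threads only when the routine is called from within a parallel region. The following sketch shows one possible way to call it. The name fib_parallel is hypothetical, and fib is repeated here only so that the sketch is self-contained.

```c
/* Same recursion as the fib example above. */
int fib(int n)
{
    int i, j;
    if (n < 2)
        return n;
    #pragma omp task shared(i)
    i = fib(n - 1);
    #pragma omp task shared(j)
    j = fib(n - 2);
    #pragma omp taskwait
    return i + j;
}

/* Hypothetical driver: a single thread makes the initial call and the
   remaining threads of the team execute the tasks that it generates. */
int fib_parallel(int n)
{
    int result = 0;   /* shared in the parallel region */
    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

The implicit barrier at the end of the single construct ensures that the result has been stored before any thread leaves the parallel region.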
19
                                                        Fortran
20              Example A.13.4f
28           real*8 item(10000000)
29           integer i
30
31   !$omp parallel
32   !$omp single ! loop iteration variable i is private
33          do i=1,10000000
34   !$omp task
35            ! i is firstprivate, item is shared
36              call process(item(i))
37   !$omp end task
38          end do
39   !$omp end single
40   !$omp end parallel
41          end
42
                                            Fortran
3           real*8 item(10000000)
4    !$omp parallel
5    !$omp single
6    !$omp task untied
7           ! loop iteration variable i is private
8           do i=1,10000000
9    !$omp task ! i is firstprivate, item is shared
10              call process(item(i))
11   !$omp end task
12          end do
13   !$omp end task
14   !$omp end single
15   !$omp end parallel
16          end
17
                                            Fortran
The following two examples demonstrate how the scheduling rules illustrated in
Section 2.7.1 on page 62 affect the usage of threadprivate variables in tasks. The value
of a threadprivate variable will change across task scheduling points if the executing
thread executes a part of another schedulable task that modifies the variable. In tied
tasks, the user can control where task scheduling points appear in the code.

A single thread may execute both of the task regions that modify tp. The parts of these
task regions in which tp is modified may be executed in any order so the resulting value
of var can be either 1 or 2.
10              void work()
11              {
12                 #pragma omp task
13                 { //Task 1
14                     #pragma omp task
15                     { //Task 2
16                           #pragma omp critical //Critical region 1
17                           {/*do work here */ }
18                     }
19                     #pragma omp critical //Critical Region 2
20                     {
21                          //Capture data for the following task
22                          #pragma omp task
23                          { /* do work here */ } //Task 3
24                     }
25                 }
26              }
                                                         C/C++
3             module example
4             contains
5             subroutine work
6    !$omp   task
7             ! Task 1
8    !$omp   task
9             ! Task 2
10   !$omp   critical
11            ! Critical region 1
12            ! do work here
13   !$omp   end critical
14   !$omp   end task
15   !$omp   critical
16            ! Critical region 2
17            ! Capture data for the following task
18   !$omp   task
19            !Task 3
20            ! do work here
21   !$omp   end task
22   !$omp   end critical
23   !$omp   end task
24           end subroutine
25           end module
26
                                        Fortran
#include <omp.h>
void work() {
    omp_lock_t lock;
    omp_init_lock(&lock);   // a lock must be initialized before it is set
#pragma omp parallel
    {
        int i;
#pragma omp for
        for (i = 0; i < 100; i++) {
#pragma omp task
            {
                // lock is shared by default in the task
                omp_set_lock(&lock);
                // Capture data for the following task
#pragma omp task
                // Task Scheduling Point 1
                { /* do work here */ }
                omp_unset_lock(&lock);
            }
        }
    }
    omp_destroy_lock(&lock);
}
                                                        C/C++
      module example
      include 'omp_lib.h'
      integer (kind=omp_lock_kind) lock
      integer i
      contains
      subroutine work
      call omp_init_lock(lock)   ! a lock must be initialized before it is set
      !$omp parallel
        !$omp do
        do i=1,100
          !$omp task
            ! Outer task
            call omp_set_lock(lock)    ! lock is shared by
                                       ! default in the task
            ! Capture data for the following task
            !$omp task     ! Task Scheduling Point 1
              ! do work here
            !$omp end task
            call omp_unset_lock(lock)
          !$omp end task
        end do
      !$omp end parallel
      call omp_destroy_lock(lock)
      end subroutine
      end module
                                                Fortran
                                                Fortran
Example A.14.1f
!$OMP PARALLEL
!$OMP   WORKSHARE
          AA = BB
          CC = DD
          EE = FF
!$OMP   END WORKSHARE
!$OMP END PARALLEL
In the following example, the barrier at the end of the first workshare region is
eliminated with a nowait clause. Threads doing CC = DD immediately begin work on
EE = FF when they are done with CC = DD.
Example A.14.2f
!$OMP PARALLEL
!$OMP   WORKSHARE
          AA = BB
          CC = DD
!$OMP   END WORKSHARE NOWAIT
!$OMP   WORKSHARE
          EE = FF
!$OMP   END WORKSHARE
!$OMP END PARALLEL
      END SUBROUTINE A14_2
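The workshare construct exists only in Fortran, but the effect of a nowait clause can be sketched in C with loop constructs. The following is a minimal sketch and not one of the original examples: the function name copy_three and its parameters are hypothetical. Removing the barrier after the first loop is safe here only because the second loop reads and writes different arrays.

```c
#include <assert.h>

/* Hypothetical helper: copies bb into aa and dd into cc in a first loop,
   then ff into ee in a second loop. All arrays hold n elements and are
   assumed not to overlap. */
void copy_three(int n, float *aa, const float *bb,
                float *cc, const float *dd,
                float *ee, const float *ff)
{
    int i;
    assert(n >= 0);
    #pragma omp parallel private(i)
    {
        /* nowait: a thread that finishes its share does not wait here */
        #pragma omp for nowait
        for (i = 0; i < n; i++) {
            aa[i] = bb[i];
            cc[i] = dd[i];
        }
        /* independent of the first loop, so no barrier is needed before it */
        #pragma omp for
        for (i = 0; i < n; i++)
            ee[i] = ff[i];
    }
}
```

If the second loop read aa or cc, the nowait clause would have to be removed, because a thread could otherwise read elements that another thread has not yet written.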
Example A.14.3f
      R=0
!$OMP PARALLEL
!$OMP   WORKSHARE
          AA = BB
!$OMP     ATOMIC
            R = R + SUM(AA)
          CC = DD
!$OMP   END WORKSHARE
!$OMP END PARALLEL
Fortran WHERE and FORALL statements are compound statements, made up of a control
part and a statement part. When workshare is applied to one of these compound
statements, both the control and the statement parts are workshared. The following
example shows the use of a WHERE statement in a workshare construct.
Each task gets worked on in order by the threads:
AA = BB then
CC = DD then
EE .ne. 0 then
FF = 1 / EE then
GG = HH
!$OMP PARALLEL
!$OMP   WORKSHARE
          AA = BB
          CC = DD
          WHERE (EE .ne. 0) FF = 1 / EE
          GG = HH
!$OMP   END WORKSHARE
!$OMP END PARALLEL

      END SUBROUTINE A14_4
Example A.14.5f
      INTEGER SHR
      INTEGER PRI
Example A.14.7f
!$OMP PARALLEL
!$OMP   WORKSHARE
          AA(1:50) = BB(11:60)
          CC(11:20) = AA(1:10)
!$OMP   END WORKSHARE
!$OMP END PARALLEL
#include <stdio.h>
}
                                                       C/C++
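Named critical constructs guard independent resources with independent locks. The following is a minimal sketch under stated assumptions, not the original example: queue_t, QLEN, dequeue and drain are hypothetical names chosen for illustration. Each queue is protected by its own named critical region, so a thread taking an item from x does not block a thread taking an item from y.

```c
#include <assert.h>

#define QLEN 8   /* hypothetical queue capacity */

/* Hypothetical queue: next is the index of the next item to hand out,
   processed[k] counts how often item k was worked on. */
typedef struct { int next; int processed[QLEN]; } queue_t;

/* Returns the next item index, or -1 when the queue is empty.
   Must be called inside the critical region that guards q. */
static int dequeue(queue_t *q)
{
    return (q->next < QLEN) ? q->next++ : -1;
}

void drain(queue_t *x, queue_t *y)
{
    assert(x != y);
    #pragma omp parallel shared(x, y)
    {
        int ix, iy;
        do {
            #pragma omp critical (xaxis)
            ix = dequeue(x);
            if (ix >= 0) x->processed[ix] += 1;   /* work done outside the region */

            #pragma omp critical (yaxis)
            iy = dequeue(y);
            if (iy >= 0) y->processed[iy] += 1;
        } while (ix >= 0 || iy >= 0);
    }
}
```

Because each index is handed out exactly once, only one thread ever updates a given processed element, and the update needs no further protection.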
                                                       Fortran
Example A.16.1f
      SUBROUTINE A16(X, Y)
!$OMP CRITICAL(XAXIS)
        CALL DEQUEUE(IX_NEXT, X)
!$OMP END CRITICAL(XAXIS)
        CALL WORK(IX_NEXT, X)
!$OMP CRITICAL(YAXIS)
        CALL DEQUEUE(IY_NEXT,Y)
!$OMP END CRITICAL(YAXIS)
        CALL WORK(IY_NEXT, Y)
void a17()
{
  int i = 1;
  #pragma omp parallel sections
  {
    #pragma omp section
    {
      #pragma omp critical (name)
      {
        #pragma omp parallel
        {
          #pragma omp single
          {
            i++;
          }
        }
      }
    }
  }
}
                                                 C/C++
      SUBROUTINE A17()
        INTEGER I
        I = 1
void work(int n) {}
void sub3(int n)
{
  work(n);
  #pragma omp barrier
  work(n);
}
void sub2(int k)
{
  #pragma omp parallel shared(k)
    sub3(k);
}
void sub1(int n)
{
  int i;
  #pragma omp parallel private(i) shared(n)
  {
    #pragma omp for
    for (i=0; i<n; i++)
      sub2(i);
  }
}
int main()
{
  sub1(2);
  sub2(2);
  sub3(2);
  return 0;
}
                                        C/C++
      SUBROUTINE WORK(N)
        INTEGER N
      END SUBROUTINE WORK
      SUBROUTINE SUB3(N)
      INTEGER N
        CALL WORK(N)
!$OMP   BARRIER
        CALL WORK(N)
      END SUBROUTINE SUB3
      SUBROUTINE SUB2(K)
      INTEGER K
!$OMP   PARALLEL SHARED(K)
          CALL SUB3(K)
!$OMP   END PARALLEL
      END SUBROUTINE SUB2
      SUBROUTINE SUB1(N)
      INTEGER N
        INTEGER I
!$OMP   PARALLEL PRIVATE(I) SHARED(N)
!$OMP     DO
          DO I = 1, N
            CALL SUB2(I)
          END DO
!$OMP   END PARALLEL
      END SUBROUTINE SUB1
      PROGRAM A18
        CALL SUB1(2)
        CALL SUB2(2)
        CALL SUB3(2)
      END PROGRAM A18
                                                     Fortran
float work1(int i)
{
  return 1.0 * i;
}
float work2(int i)
{
  return 2.0 * i;
}
int main()
{
  float x[1000];
  float y[10000];
  int index[10000];
  int i;
      INTEGER I
      PROGRAM A19
        REAL X(1000), Y(10000)
        INTEGER INDEX(10000)
        INTEGER I

        DO I=1,10000
          INDEX(I) = MOD(I, 1000) + 1
          Y(I) = 0.0
        ENDDO

        DO I = 1,1000
          X(I) = 0.0
        ENDDO
void a20_1_wrong ()
{
  union {int n; float x;} u;
      SUBROUTINE A20_1_WRONG()
        INTEGER:: I
        REAL:: R
        EQUIVALENCE(I,R)
!$OMP   PARALLEL
!$OMP     ATOMIC
            I = I + 1
!$OMP     ATOMIC
            R = R + 1.0
! incorrect because I and R reference the same location
! but have different types
!$OMP   END PARALLEL
      END SUBROUTINE A20_1_WRONG
                                                    Fortran
                                                    C/C++
Example A.20.2c
void a20_2_wrong ()
{
  int x;
  int *i;
  float *r;
  i = &x;
  r = (float *)&x;
  }
}
                                                    C/C++
Example A.20.2f
      SUBROUTINE SUB()
        COMMON /BLK/ R
        REAL R
!$OMP   ATOMIC
          R = R + 1.0
      END SUBROUTINE SUB
      SUBROUTINE A20_2_WRONG()
        COMMON /BLK/ I
        INTEGER I
!$OMP   PARALLEL
!$OMP     ATOMIC
            I = I + 1
          CALL SUB()
!$OMP   END PARALLEL
      END SUBROUTINE A20_2_WRONG
Example A.20.3f
      SUBROUTINE A20_3_WRONG
        INTEGER:: I
        REAL:: R
        EQUIVALENCE(I,R)
!$OMP   PARALLEL
!$OMP     ATOMIC
            I = I + 1
! incorrect because I and R reference the same location
! but have different types
!$OMP   END PARALLEL
!$OMP   PARALLEL
!$OMP     ATOMIC
            R = R + 1.0
! incorrect because I and R reference the same location
! but have different types
!$OMP   END PARALLEL
#include <omp.h>
#define NUMBER_OF_THREADS 256
int   synch[NUMBER_OF_THREADS];
float work[NUMBER_OF_THREADS];
float result[NUMBER_OF_THREADS];
float fn1(int i)
{
  return i*2.0;
int main()
{
  int iam, neighbor;
    /* Wait for neighbor. The first flush ensures that synch is read
     * from memory, rather than from the temporary view of memory.
     * The second flush ensures that work is read from memory, and
     * is done so after the while loop exits.
     */
  return 0;
}
                                              C/C++
int x, *p = &x;
int g(int n)
{
  int i = 1, j, sum = 0;
  *p = 1;
  #pragma omp parallel reduction(+: sum) num_threads(10)
  {
    f1(&j);
    f2(&j);
int main()
{
  int result = g(7);
  return result;
}
                                                     C/C++
                                                    Fortran
Example A.22.1f
      SUBROUTINE F1(Q)
        COMMON /DATA/ X, P
        INTEGER, TARGET :: X
        INTEGER, POINTER :: P
        INTEGER Q
        Q = 1
!$OMP   FLUSH
        ! X, P and Q are flushed
        ! because they are shared and accessible
      END SUBROUTINE F1
      SUBROUTINE F2(Q)
        COMMON /DATA/ X, P
        INTEGER, TARGET :: X
        INTEGER, POINTER :: P
        INTEGER Q
!$OMP   BARRIER
        Q = 2
!$OMP   BARRIER
        ! a barrier implies a flush
        ! X, P and Q are flushed
        ! because they are shared and accessible
      END SUBROUTINE F2
        I = 1
        SUM = 0
        P = 1
        CALL F2(J)
          ! I, N, and SUM were not flushed
          !   because they were not accessible in f2
          ! J was flushed because it was accessible
        SUM = SUM + I + J + P + N
!$OMP   END PARALLEL
        G = SUM
      END FUNCTION G
      PROGRAM A22
        COMMON /DATA/ X, P
        INTEGER, TARGET :: X
        INTEGER, POINTER :: P
        INTEGER RESULT, G
        P => X
        RESULT = G(7)
        PRINT *, RESULT
      END PROGRAM A22
                                        Fortran
Example A.23.1c
void a23_wrong()
{
  int a = 1;
    if (a != 0)
      #pragma omp barrier
/* incorrect as barrier cannot be immediate substatement
   of if statement */
    if (a != 0)
      #pragma omp taskwait
/* incorrect as taskwait cannot be immediate substatement
   of if statement */
  }
}
The following version of the above example is conforming because the flush,
barrier, and taskwait directives are enclosed in a compound statement.
void a23()
{
  int a = 1;
#include <stdio.h>
void work(int k)
{
  #pragma omp ordered
    printf(" %d\n", k);
}
int main()
{
  a24(0, 100, 5);
  return 0;
}
                                                    C/C++
      SUBROUTINE WORK(K)
        INTEGER k
!$OMP ORDERED
        WRITE(*,*) K
!$OMP END ORDERED
      PROGRAM A24
        CALL SUBA24(1,100,5)
      END PROGRAM A24
                                          Fortran
It is possible to have multiple ordered constructs within a loop region with the
ordered clause specified. The first example is non-conforming because all iterations
execute two ordered regions. An iteration of a loop must not execute more than one
ordered region:
void work(int i) {}
void a24_wrong(int n)
{
  int i;
  #pragma omp for ordered
  for (i=0; i<n; i++) {
/* incorrect because an iteration may not execute more than one
   ordered region */
    #pragma omp ordered
      work(i);
    #pragma omp ordered
      work(i+1);
  }
}
                                                  C/C++
                                                  Fortran
Example A.24.2f
      SUBROUTINE WORK(I)
      INTEGER I
      END SUBROUTINE WORK
      SUBROUTINE A24_WRONG(N)
      INTEGER N
      INTEGER I
!$OMP DO ORDERED
      DO I = 1, N
! incorrect because an iteration may not execute more than one
! ordered region
!$OMP   ORDERED
          CALL WORK(I)
!$OMP   END ORDERED
!$OMP   ORDERED
          CALL WORK(I+1)
!$OMP   END ORDERED
      END DO
      END SUBROUTINE A24_WRONG
                                                  Fortran
void work(int i) {}
void a24_good(int n)
{
  int i;
    if (i > 10) {
      #pragma omp ordered
        work(i+1);
    }
  }
}
                                         C/C++
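The rule that an iteration executes at most one ordered region can be checked with a small self-contained sketch. It is an illustration only and not one of the original examples: ordered_log, record, log_buf and MAXLOG are hypothetical names. The loop contains two ordered constructs, but the if/else guarantees that any single iteration enters only one of them, so the entries are recorded in iteration order.

```c
#include <assert.h>

#define MAXLOG 64   /* hypothetical capacity of the log */

static int log_buf[MAXLOG];
static int log_len = 0;

/* Appends k to the log; only ever called from inside an ordered region. */
static void record(int k)
{
    assert(log_len < MAXLOG);
    log_buf[log_len++] = k;
}

/* Each iteration executes exactly one ordered region: one of the two
   branches, never both. Returns the number of entries written. */
int ordered_log(int n)
{
    int i;
    log_len = 0;
    #pragma omp parallel for ordered schedule(dynamic)
    for (i = 0; i < n; i++) {
        if (i % 2 == 0) {
            #pragma omp ordered
            record(i);
        } else {
            #pragma omp ordered
            record(-i);
        }
    }
    return log_len;
}
```

Calling ordered_log(6) is expected to leave 0, -1, 2, -3, 4, -5 in log_buf regardless of the number of threads, because ordered regions are executed in the order of the loop iterations.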
                                        Fortran
Example A.24.3f
      SUBROUTINE A24_GOOD(N)
      INTEGER N
!$OMP DO ORDERED
        DO I = 1,N
          IF (I <= 10) THEN
!$OMP       ORDERED
              CALL WORK(I)
!$OMP       END ORDERED
          ENDIF
int counter = 0;
#pragma omp threadprivate(counter)
int increment_counter()
{
  counter++;
  return(counter);
}
                                                     C/C++
                                                   Fortran
Example A.25.1f
        COUNTER = COUNTER +1
        INCREMENT_COUNTER = COUNTER
        RETURN
      END FUNCTION INCREMENT_COUNTER
                                                   Fortran
                                                     C/C++
The following example uses threadprivate on a static variable:
Example A.25.2c
int increment_counter_2()
{
  static int counter = 0;
  #pragma omp threadprivate(counter)
  counter++;
  return(counter);
}
Example A.25.3c
class T {
  public:
    int val;
    T (int);
    T (const T&);
};
T :: T (int v){
   val = v;
}
T :: T (const T& t) {
   val = t.val;
}
int x = 1;
T a(x);
const T b_aux(x); /* Capture value of x = 1 */
T b(b_aux);
#pragma omp threadprivate(a, b)
void f(int n) {
   x++;
   #pragma omp parallel for
   /* In each thread:
    * a is constructed from x (with value 1 or 2?)
    * b is copy-constructed from b_aux
    */
Example A.25.2f
      MODULE A25_MODULE
        COMMON /T/ A
      END MODULE A25_MODULE

      SUBROUTINE A25_2_WRONG()
        USE A25_MODULE
!$OMP   THREADPRIVATE(/T/)
      !non-conforming because /T/ not declared in A25_2_WRONG
      END SUBROUTINE A25_2_WRONG
The following example is also non-conforming because the common block is not
declared local to the subroutine that refers to it:
Example A.25.3f
      SUBROUTINE A25_3_WRONG()
        COMMON /T/ A
!$OMP   THREADPRIVATE(/T/)
        CONTAINS
          SUBROUTINE A25_3S_WRONG()
!$OMP       PARALLEL COPYIN(/T/)
      !non-conforming because /T/ not declared in A25_3S_WRONG
!$OMP       END PARALLEL
          END SUBROUTINE A25_3S_WRONG
      END SUBROUTINE A25_3_WRONG
Example A.25.4f
      SUBROUTINE A25_4_GOOD()
        COMMON /T/ A
!$OMP   THREADPRIVATE(/T/)
        CONTAINS
          SUBROUTINE A25_4S_GOOD()
            COMMON /T/ A
!$OMP       THREADPRIVATE(/T/)
Example A.25.5f
      PROGRAM A25_5_GOOD
        INTEGER, ALLOCATABLE, SAVE :: A(:)
        INTEGER, POINTER, SAVE :: PTR
        INTEGER, SAVE :: I
        INTEGER, TARGET :: TARG
        LOGICAL :: FIRSTIN = .TRUE.
!$OMP   THREADPRIVATE(A, I, PTR)

        ALLOCATE (A(3))
        A = (/1,2,3/)
        PTR => TARG
        I = 5

!$OMP   PARALLEL COPYIN(I, PTR)
!$OMP     CRITICAL
            IF (FIRSTIN) THEN
              TARG = 4           ! Update target of ptr
              I = I + 10
              IF (ALLOCATED(A)) A = A + 10
              FIRSTIN = .FALSE.
            END IF

            IF (ALLOCATED(A)) THEN
              PRINT *, 'a = ', A
            ELSE
              PRINT *, 'A is not allocated'
            END IF
The above program, if executed by two threads, will print one of the following two sets
of output:
a = 11 12 13
ptr = 4
i = 15
A is not allocated
ptr = 4
i = 5
or
A is not allocated
ptr = 4
i = 15
a = 1 2 3
ptr = 4
i = 5
Example A.25.6f
      MODULE A25_MODULE6
        REAL, POINTER :: WORK(:)
        SAVE WORK
!$OMP   THREADPRIVATE(WORK)
      END MODULE A25_MODULE6

      SUBROUTINE SUB1(N)
      USE A25_MODULE6
!$OMP   PARALLEL PRIVATE(THE_SUM)
        ALLOCATE(WORK(N))
        CALL SUB2(THE_SUM)
        WRITE(*,*)THE_SUM
!$OMP   END PARALLEL
      END SUBROUTINE SUB1

      SUBROUTINE SUB2(THE_SUM)
        USE A25_MODULE6
        WORK(:) = 10
        THE_SUM=SUM(WORK)
      END SUBROUTINE SUB2

      PROGRAM A25_6_GOOD
        N = 10
        CALL SUB1(N)
      END PROGRAM A25_6_GOOD
                                             Fortran
                                             C/C++
The following example illustrates initialization of threadprivate variables of class type
T. t1 is default constructed, t2 is constructed using a constructor that accepts one
argument of integer type, and t3 is copy constructed with argument f():
static T t1;
#pragma omp threadprivate(t1)
static T t2( 23 );
#pragma omp threadprivate(t2)
static T t3 = f();
#pragma omp threadprivate(t3)
The following example illustrates the use of threadprivate for static class members. The
threadprivate directive for a static class member must be placed inside the class
definition.
Example A.25.5c
class T {
 public:
   static int i;
#pragma omp threadprivate(i)
};
                                                       C/C++
                                                       C/C++
Example A.26.1c
#include <vector>
void iterator_example()
{
  std::vector<int> vec(23);
  std::vector<int>::iterator it;
#pragma omp parallel for default(none) shared(vec)
  for (it = vec.begin(); it < vec.end(); it++)
  {
    // do work with *it //
  }
}
                                                       C/C++
Example A.27.1f
      SUBROUTINE A27_1_GOOD()
        COMMON /C/ X,Y
        REAL X, Y
Example A.27.2f
      SUBROUTINE A27_2_GOOD()
        COMMON /C/ X,Y
        REAL X, Y
        INTEGER I
!$OMP   PARALLEL
!$OMP     DO PRIVATE(/C/)
          DO I=1,1000
            ! do work here
          ENDDO
!$OMP     END DO
!
!$OMP     DO PRIVATE(X)
          DO I=1,1000
            ! do work here
          ENDDO
!$OMP     END DO
!$OMP   END PARALLEL
      END SUBROUTINE A27_2_GOOD
Example A.27.3f
      SUBROUTINE A27_3_GOOD()
        COMMON /C/ X,Y
Example A.27.4f
      SUBROUTINE A27_4_WRONG()
        COMMON /C/ X,Y
! Incorrect because X is a constituent element of C
!$OMP   PARALLEL PRIVATE(/C/), SHARED(X)
          ! do work here
!$OMP   END PARALLEL
      END SUBROUTINE A27_4_WRONG
Example A.27.5f
      SUBROUTINE A27_5_WRONG()
        COMMON /C/ X,Y
! Incorrect: common block C cannot be declared both
! shared and private
!$OMP   PARALLEL PRIVATE (/C/), SHARED(/C/)
          ! do work here
!$OMP   END PARALLEL
#include <omp.h>
int x, y, z[1000];
#pragma omp threadprivate(x)
void a28(int a) {
  const int c = 1;
  int i = 0;
      SUBROUTINE A28(A)
      INCLUDE "omp_lib.h"       ! or USE OMP_LIB

      INTEGER A
      INTEGER X, Y, Z(1000)
      COMMON/BLOCKX/X
      COMMON/BLOCKY/Y
      COMMON/BLOCKZ/Z
!$OMP THREADPRIVATE(/BLOCKX/)
      INTEGER I, J
      i = 1
!$OMP   DO firstprivate(y)
        DO I = 1,10
          Z(I) = Y ! O.K. - I is the loop iteration variable
                   ! Y is listed in FIRSTPRIVATE clause
        END DO
                                                  Fortran
Example A.29.1f
      SUBROUTINE A29

        INCLUDE "omp_lib.h"          ! or USE OMP_LIB

        REAL A(20)
        INTEGER MYTHREAD
        MYTHREAD = OMP_GET_THREAD_NUM()
        IF (MYTHREAD .EQ. 0) THEN
           CALL SUB(A(1:10)) ! compiler may introduce writes to A(6:10)
        ELSE
           A(6:10) = 12
        ENDIF
      SUBROUTINE SUB(X)
        REAL X(*)
        X(1:5) = 4
      END SUBROUTINE SUB
                                                       Fortran
#include <stdio.h>
#include <assert.h>
int main()
{
  int i, j;
  int *ptr_i, *ptr_j;
  i = 1;
  j = 2;
  ptr_i = &i;
  ptr_j = &j;
  return 0;
}
                                            C/C++
Example A.30.1f
      PROGRAM A30
        INTEGER I, J
        I = 1
        J = 2
int a;
void g(int k) {
  a = k; /* Accessed in the region but outside of the construct;
          * therefore unspecified whether original or private list
          * item is modified. */
}
void f(int n) {
  int a = 0;

  #pragma omp parallel for private(a)
  for (int i=1; i<n; i++) {
      a = i;
      g(a*2);     /* Private copy of "a" */
  }
}
                                                         C/C++
      MODULE A30_2
        REAL A
      CONTAINS
        SUBROUTINE G(K)
          REAL K
          A = K ! Accessed in the region but outside of the
                ! construct; therefore unspecified whether
                ! original or private list item is modified.
        END SUBROUTINE G
        SUBROUTINE F(N)
        INTEGER N
        REAL A

          INTEGER I
!$OMP     PARALLEL DO PRIVATE(A)
            DO I = 1,N
              A = I
              CALL G(A*2)
            ENDDO
!$OMP     END PARALLEL DO
        END SUBROUTINE F
A.31   Reprivatization
The following example demonstrates the reprivatization of variables (see Section 2.9.3.3
on page 89). Private variables can be marked private again in a nested construct.
They do not have to be shared in the enclosing parallel region.
#include <assert.h>
void a31()
{
  int i, a;
      SUBROUTINE A31()
        INTEGER I, A
Example A.32.1f
      SUBROUTINE SUB()
      COMMON /BLOCK/ X
      PRINT *,X               ! X is undefined
      END SUBROUTINE SUB
      PROGRAM A32_1
        COMMON /BLOCK/ X
        X = 1.0
!$OMP   PARALLEL PRIVATE (X)
        X = 2.0
        CALL SUB()
!$OMP   END PARALLEL
      END PROGRAM A32_1
Example A.32.2f
      PROGRAM A32_2
        COMMON /BLOCK2/ X
        X = 1.0
      CONTAINS
        SUBROUTINE SUB()
        COMMON /BLOCK2/ Y
                                                  Fortran (cont.)
Example A.32.3f
      PROGRAM A32_3
      EQUIVALENCE (X,Y)
      X = 1.0
Example A.32.4f
      PROGRAM A32_4
        INTEGER I, J
        INTEGER A(100), B(100)
        EQUIVALENCE (A(51), B(1))
            DO J=1,100
              A(J) = J     ! B becomes undefined at this point
            ENDDO
            DO J=1,50
              B(J) = B(J) + 1 ! B is undefined
                        ! A becomes undefined at this point
            ENDDO
          ENDDO
!$OMP END PARALLEL DO       ! The LASTPRIVATE write for A has
                            ! undefined results
      SUBROUTINE SUB1(X)
        DIMENSION X(10)

        !   This use of X does not conform to the
        !   specification. It would be legal Fortran 90,
        !   but the OpenMP private directive allows the
        !   compiler to break the sequence association that
        !   A had with the rest of the common block.

        FORALL (I = 1:10) X(I) = I
      END SUBROUTINE SUB1
      PROGRAM A32_5
        COMMON /BLOCK5/ A
        DIMENSION B(10)
        EQUIVALENCE (A,B(1))
            CALL SUB1(A)
!$OMP       MASTER
              PRINT *, A
!$OMP       END MASTER
#include <assert.h>
int main() {
  f(2, A, A[0]);
  return 0;
}
                                                    C/C++
      SUBROUTINE A34(N, A, B)
        INTEGER N
        REAL A(*), B(*)
        INTEGER I
!$OMP PARALLEL
!$OMP DO LASTPRIVATE(I)
        DO I=1,N-1
          A(I) = B(I) + B(I+1)
        ENDDO
  a = 0.0;
  b = 0;
    a += x[i];
    b ^= y[i];
  }
}
                                            C/C++
16
                                           Fortran
17   Example A.35.1f
18 SUBROUTINE A35_1(A, B, X, Y, N)
19                INTEGER N
20                REAL X(*), Y(*), A, B
23 DO I=1,N
24 A = A + X(I)
25 B = MIN(B, Y(I))
29 END DO
9                   a = 0.0;
10                  b = 0;
18                          a_p += x[i];
19                          b_p ^= y[i];
20 }
26                  }
27              }
                                                      C/C++
8              A_P = 0.0
9              B_P = HUGE(B_P)
10   !$OMP     DO PRIVATE(I)
11             DO I=1,N
12               A_P = A_P + X(I)
13               B_P = MIN(B_P, Y(I))
14             ENDDO
15   !$OMP     END DO
16   !$OMP     CRITICAL
17               A = A + A_P
18               B = MIN(B, B_P)
19   !$OMP     END CRITICAL
Example A.35.3f
      PROGRAM A35_3_WRONG
      MAX = HUGE(0)
      M = 0

!$OMP PARALLEL DO REDUCTION(MAX: M)
! MAX is no longer the intrinsic so this is non-conforming
      DO I = 1, 100
         CALL SUB(M,I)
      END DO

      END PROGRAM A35_3_WRONG

      SUBROUTINE SUB(M,I)
         M = MAX(M,I)
      END SUBROUTINE SUB
Example A.35.4f
      MODULE M
         INTRINSIC MAX
      END MODULE M
      PROGRAM A35_4
         USE M, REN => MAX
         N = 0
!$OMP PARALLEL DO REDUCTION(REN: N)          ! still does MAX
         DO I = 1, 100
            N = MAX(N,I)
         END DO
      END PROGRAM A35_4
The following conforming program performs the reduction using the intrinsic procedure
name MAX even though the intrinsic MAX has been renamed to MIN.
17 Example A.35.5f
18              MODULE MOD
19                 INTRINSIC MAX, MIN
20              END MODULE MOD
21              PROGRAM A35_5
22                 USE MOD, MIN=>MAX, MAX=>MIN
23                 REAL :: R
24                 R = -HUGE(0.0)
8 #include <stdio.h>
3 INTEGER A, I
5               !$OMP MASTER
6                     A = 0
7               !$OMP END MASTER
9               !$OMP DO REDUCTION(+:A)
10                    DO I= 0, 9
11                       A = A + I
12                    END DO
13              !$OMP SINGLE
14                    PRINT *, "Sum is ", A
15              !$OMP END SINGLE
19
3 #include <stdlib.h>
4    float* work;
5    int size;
6    float tol;
8    void build()
9    {
10     int i;
11     work = (float*)malloc( sizeof(float)*size );
12     for( i = 0; i < size; ++i ) work[i] = tol;
13   }
23
                                       C/C++
      MODULE M
        REAL, POINTER, SAVE :: WORK(:)
        INTEGER :: SIZE
        REAL :: TOL
!$OMP   THREADPRIVATE(WORK,SIZE,TOL)
      END MODULE M
      SUBROUTINE A36( T, N )
        USE M
        REAL :: T
        INTEGER :: N
        TOL = T
        SIZE = N
!$OMP   PARALLEL COPYIN(TOL,SIZE)
        CALL BUILD
!$OMP   END PARALLEL
      END SUBROUTINE A36
      SUBROUTINE BUILD
        USE M
        ALLOCATE(WORK(SIZE))
        WORK = TOL
      END SUBROUTINE BUILD
24
                                                         Fortran
25
3    #include <stdio.h>
4    float x, y;
5    #pragma omp threadprivate(x, y)
14         SUBROUTINE INIT(A,B)
15         REAL A, B
16           COMMON /XY/ X,Y
17   !$OMP   THREADPRIVATE (/XY/)
18   !$OMP     SINGLE
19               READ (11) A,B,X,Y
20   !$OMP     END SINGLE COPYPRIVATE (A,B,/XY/)
7               #include <stdio.h>
8               #include <stdlib.h>
9               float read_next( ) {
10                float * tmp;
11                float return_val;
27                  return return_val;
28              }
                                                       C/C++
5    !$OMP     SINGLE
6                ALLOCATE (TMP)
7    !$OMP     END SINGLE COPYPRIVATE (TMP)     ! copies the pointer only
8    !$OMP     MASTER
9                READ (11) TMP
10   !$OMP     END MASTER
11   !$OMP     BARRIER
12               READ_NEXT = TMP
13   !$OMP     BARRIER
14   !$OMP     SINGLE
15               DEALLOCATE (TMP)
16   !$OMP     END SINGLE NOWAIT
17             END FUNCTION READ_NEXT
18
                                           Fortran
Suppose that the number of lock variables required within a parallel region cannot
easily be determined prior to entering it. The copyprivate clause can be used to
provide access to shared lock variables that are allocated within that parallel region.
22
                                            C/C++
23   Example A.37.3c
24   #include <stdio.h>
25   #include <stdlib.h>
26   #include <omp.h>
27   omp_lock_t *new_lock()
28   {
29     omp_lock_t *lock_ptr;
35       return lock_ptr;
36   }
                                            C/C++
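As a rough illustration of the idea described above (and not a reproduction of Example A.37.3c), the following sketch allocates a lock inside a single region and broadcasts the pointer to the rest of the team with copyprivate. The names make_shared_lock and count_with_shared_lock, and the serial fallback definitions used when _OPENMP is not defined, are assumptions made for this sketch so that it also compiles without OpenMP support.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption for illustration only): a trivial lock type. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Hypothetical helper: one thread allocates and initializes the lock;
 * copyprivate then gives every thread in the team the same pointer. */
omp_lock_t *make_shared_lock(void)
{
    omp_lock_t *lock_ptr = NULL;

    #pragma omp single copyprivate(lock_ptr)
    {
        lock_ptr = (omp_lock_t *)malloc(sizeof(omp_lock_t));
        if (lock_ptr != NULL)
            omp_init_lock(lock_ptr);
    }
    return lock_ptr;
}

/* Hypothetical driver: each thread increments a shared counter once
 * under the lock; returns the number of increments performed. */
int count_with_shared_lock(void)
{
    int counter = 0;

    #pragma omp parallel shared(counter)
    {
        omp_lock_t *lock = make_shared_lock();

        if (lock != NULL) {
            omp_set_lock(lock);
            counter++;
            omp_unset_lock(lock);
        }

        /* Wait until no thread still uses the lock before one thread
         * destroys and frees it. */
        #pragma omp barrier
        #pragma omp single
        {
            if (lock != NULL) {
                omp_destroy_lock(lock);
                free(lock);
            }
        }
    }
    return counter;
}
```

In this sketch, count_with_shared_lock() is expected to return the team size when compiled with OpenMP enabled, and 1 when the directives are ignored.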
      FUNCTION NEW_LOCK()
      USE OMP_LIB       ! or INCLUDE "omp_lib.h"
        INTEGER(OMP_LOCK_KIND), POINTER :: NEW_LOCK
!$OMP   SINGLE
          ALLOCATE(NEW_LOCK)
          CALL OMP_INIT_LOCK(NEW_LOCK)
!$OMP   END SINGLE COPYPRIVATE(NEW_LOCK)
      END FUNCTION NEW_LOCK
Note that the effect of the copyprivate clause on a variable with the allocatable
attribute differs from its effect on a variable with the pointer attribute.
13 Example A.37.4f
14                      SUBROUTINE S(N)
15                      INTEGER N
27              !$OMP   BARRIER
28              !$OMP   SINGLE
29                        DEALLOCATE (B)
30              !$OMP   END SINGLE NOWAIT
31                    END SUBROUTINE S
32
                                                     Fortran
void good_nesting(int n)
{
  int i, j;
  #pragma omp parallel default(shared)
  {
    #pragma omp for
    for (i=0; i<n; i++) {
      #pragma omp parallel shared(i, n)
      {
        #pragma omp for
        for (j=0; j < n; j++)
          work(i, j);
      }
    }
  }
}
                                                  C/C++
      SUBROUTINE WORK(I, J)
      INTEGER I, J
      END SUBROUTINE WORK
      SUBROUTINE GOOD_NESTING(N)
      INTEGER N
        INTEGER I
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP     DO
          DO I = 1, N
!$OMP       PARALLEL SHARED(I,N)
!$OMP         DO
              DO J = 1, N
                CALL WORK(I,J)
              END DO
!$OMP       END PARALLEL
          END DO
!$OMP   END PARALLEL
      END SUBROUTINE GOOD_NESTING
                                                      Fortran
void good_nesting2(int n)
{
  int i;
  #pragma omp parallel default(shared)
  {
    #pragma omp for
    for (i=0; i<n; i++)
      work1(i, n);
  }
}
                                           C/C++
      SUBROUTINE WORK(I, J)
      INTEGER I, J
      END SUBROUTINE WORK
      SUBROUTINE WORK1(I, N)
      INTEGER J
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
        DO J = 1, N
          CALL WORK(I,J)
        END DO
!$OMP END PARALLEL
      END SUBROUTINE WORK1
      SUBROUTINE GOOD_NESTING2(N)
      INTEGER N
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
      DO I = 1, N
         CALL WORK1(I, N)
      END DO
!$OMP END PARALLEL
      END SUBROUTINE GOOD_NESTING2
                                                       Fortran
25
void wrong1(int n)
{
  #pragma omp parallel default(shared)
  {
    int i, j;
    #pragma omp for
    for (i=0; i<n; i++) {
      /* incorrect nesting of loop regions */
      #pragma omp for
      for (j=0; j<n; j++)
        work(i, j);
    }
  }
}
                                          C/C++
20
                                          Fortran
Example A.39.1f
      SUBROUTINE WORK(I, J)
      INTEGER I, J
      END SUBROUTINE WORK
      SUBROUTINE WRONG1(N)
      INTEGER N
      INTEGER I,J
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
!$OMP     DO              ! incorrect nesting of loop regions
          DO J = 1, N
            CALL WORK(I,J)
          END DO
        END DO
!$OMP END PARALLEL
      END SUBROUTINE WRONG1
                                          Fortran
void wrong2(int n)
{
  #pragma omp parallel default(shared)
  {
    int i;
    #pragma omp for
    for (i=0; i<n; i++)
      work1(i, n);
  }
}
                                                     C/C++
22
                                                     Fortran
Example A.39.2f
      SUBROUTINE WORK1(I,N)
      INTEGER I, N
      INTEGER J
!$OMP DO        ! incorrect nesting of loop regions
      DO J = 1, N
         CALL WORK(I,J)
      END DO
      END SUBROUTINE WORK1
      SUBROUTINE WRONG2(N)
      INTEGER N
      INTEGER I
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
           CALL WORK1(I,N)
        END DO
!$OMP END PARALLEL
      END SUBROUTINE WRONG2
                                                     Fortran
      SUBROUTINE WRONG3(N)
      INTEGER N
      INTEGER I
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
!$OMP     SINGLE            ! incorrect nesting of regions
            CALL WORK(I, 1)
!$OMP     END SINGLE
        END DO
!$OMP END PARALLEL
      END SUBROUTINE WRONG3
                                         Fortran
      SUBROUTINE WRONG4(N)
      INTEGER N
      INTEGER I
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
           CALL WORK(I, 1)
! incorrect nesting of barrier region in a loop region
!$OMP      BARRIER
           CALL WORK(I, 2)
        END DO
!$OMP END PARALLEL
      END SUBROUTINE WRONG4
                                                    Fortran
22           SUBROUTINE WRONG5(N)
23           INTEGER N
22                      SUBROUTINE WRONG6(N)
23                       INTEGER N
18          #include <omp.h>
19          #include <stdlib.h>
24              omp_set_dynamic(0);
25              omp_set_num_threads(16);
30                  iam = omp_get_thread_num();
31                  ipoints = npoints/16;
32                  do_by_16(x, iam, ipoints);
33              }
34          }
                                                   C/C++
9                         INTEGER NPOINTS
10                        REAL X(NPOINTS)
12                        CALL OMP_SET_DYNAMIC(.FALSE.)
13                        CALL OMP_SET_NUM_THREADS(16)
18                          IAM = OMP_GET_THREAD_NUM()
19                          IPOINTS = NPOINTS/16
20                          CALL DO_BY_16(X,IAM,IPOINTS)
24
3    #include <omp.h>
4    void work(int i);
5    void incorrect()
6    {
7      int np, i;
8 np = omp_get_num_threads(); /* misplaced */
15           SUBROUTINE WORK(I)
16           INTEGER I
17             I = I + 1
18           END SUBROUTINE WORK
19           SUBROUTINE INCORRECT()
20             INCLUDE "omp_lib.h"       ! or USE OMP_LIB
21             INTEGER I, NP
5               #include <omp.h>
6               void work(int i);
7               void correct()
8               {
9                 int i;
18                      SUBROUTINE WORK(I)
19                        INTEGER I
20
21                        I = I + 1
7 #include <omp.h>
8           omp_lock_t *new_locks()
9           {
10            int i;
11            omp_lock_t *lock = new omp_lock_t[1000];
21                  FUNCTION NEW_LOCKS()
22                    USE OMP_LIB        ! or INCLUDE "omp_lib.h"
23                    INTEGER(OMP_LOCK_KIND), DIMENSION(1000) :: NEW_LOCKS
24 INTEGER I
This change in ownership requires extra care when using locks. The following program
is conforming in OpenMP 2.5 because the thread that releases the lock lck in the parallel
region is the same thread that acquired the lock in the sequential part of the program
(the master thread of the parallel region and the initial thread are the same). However,
it is not conforming in OpenMP 3.0, because the task region that releases the lock lck is
different from the task region that acquires the lock.

                                                        C/C++
15              Example A.43.1c
16              #include <stdlib.h>
17              #include <stdio.h>
18              #include <omp.h>
19              int main()
20              {
21                int x;
22                omp_lock_t lck;
23
24                  omp_init_lock (&lck);
25                  omp_set_lock (&lck);
26                  x = 0;
3                    program lock
4                    use omp_lib
5                    integer :: x
6                    integer (kind=omp_lock_kind) :: lck
18                   call omp_destroy_lock(lck)
19                   end
20
                                                     Fortran
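For contrast with Example A.43.1, here is a hedged sketch of one way to respect the OpenMP 3.0 ownership rule: the lock is set and unset inside the same task region. The function name update_under_lock and the serial fallback definitions used when _OPENMP is not defined are assumptions of this sketch, not part of the specification's example.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption for illustration only): a trivial lock type. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Hypothetical function: returns the final value of x. */
int update_under_lock(void)
{
    int x = 0;
    omp_lock_t lck;

    omp_init_lock(&lck);

    #pragma omp parallel shared(x, lck)
    {
        #pragma omp master
        {
            /* The lock is acquired and released by the same implicit
             * task region, so ownership never crosses task regions. */
            omp_set_lock(&lck);
            x = x + 1;
            omp_unset_lock(&lck);
        }
    }

    omp_destroy_lock(&lck);
    return x;
}
```

Only the master thread updates x in this sketch, so the function returns 1 whether or not the directives are honored.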
21
4 Example A.44.1c
5               #include <stdio.h>
6               #include <omp.h>
7               void skip(int i) {}
8               void work(int i) {}
9               int main()
10              {
11                omp_lock_t lck;
12                int id;
13 omp_init_lock(&lck);
17                      omp_set_lock(&lck);
18                      /* only one thread at a time can execute this printf */
19                      printf("My thread id is %d.\n", id);
20                      omp_unset_lock(&lck);
21                      while (! omp_test_lock(&lck)) {
22                        skip(id);   /* we do not yet have the lock,
23                                       so we must do something else */
24                      }
27                      omp_unset_lock(&lck);
28                  }
29                  omp_destroy_lock(&lck);
30
31                  return 0;
32              }
                                                      C/C++
3 Example A.44.1f
4            SUBROUTINE SKIP(ID)
5            END SUBROUTINE SKIP
6            SUBROUTINE WORK(ID)
7            END SUBROUTINE WORK
8 PROGRAM A44
10             INTEGER(OMP_LOCK_KIND) LCK
11             INTEGER ID
12 CALL OMP_INIT_LOCK(LCK)
#include <omp.h>
typedef struct {
      int a,b;
      omp_nest_lock_t lck; } pair;
int work1();
int work2();
int work3();
void incr_a(pair *p, int a)
{
  /* Called only from incr_pair, no need to lock. */
  p->a += a;
}
void incr_b(pair *p, int b)
{
  /* Called both from incr_pair and elsewhere, */
  /* so need a nestable lock. */
  omp_set_nest_lock(&p->lck);
  p->b += b;
  omp_unset_nest_lock(&p->lck);
}
void incr_pair(pair *p, int a, int b)
{
  omp_set_nest_lock(&p->lck);
  incr_a(p, a);
  incr_b(p, b);
  omp_unset_nest_lock(&p->lck);
}
void a45(pair *p)
{
  #pragma omp parallel sections
  {
    #pragma omp section
      incr_pair(p, work1(), work2());
    #pragma omp section
      incr_b(p, work3());
  }
}
                                                       C/C++
      MODULE DATA
        USE OMP_LIB, ONLY: OMP_NEST_LOCK_KIND
        TYPE LOCKED_PAIR
          INTEGER A
          INTEGER B
          INTEGER (OMP_NEST_LOCK_KIND) LCK
        END TYPE
      END MODULE DATA
      SUBROUTINE INCR_A(P, A)
        ! called only from INCR_PAIR, no need to lock
        USE DATA
        TYPE(LOCKED_PAIR) :: P
        INTEGER A
        P%A = P%A + A
      END SUBROUTINE INCR_A
      SUBROUTINE INCR_B(P, B)
        ! called from both INCR_PAIR and elsewhere,
        ! so we need a nestable lock
        USE OMP_LIB       ! or INCLUDE "omp_lib.h"
        USE DATA
        TYPE(LOCKED_PAIR) :: P
        INTEGER B
        CALL OMP_SET_NEST_LOCK(P%LCK)
        P%B = P%B + B
        CALL OMP_UNSET_NEST_LOCK(P%LCK)
      END SUBROUTINE INCR_B
      SUBROUTINE INCR_PAIR(P, A, B)
        USE OMP_LIB        ! or INCLUDE "omp_lib.h"
        USE DATA
        TYPE(LOCKED_PAIR) :: P
        INTEGER A
        INTEGER B
        CALL OMP_SET_NEST_LOCK(P%LCK)
        CALL INCR_A(P, A)
        CALL INCR_B(P, B)
        CALL OMP_UNSET_NEST_LOCK(P%LCK)
      END SUBROUTINE INCR_PAIR
40        SUBROUTINE A45(P)
41          USE OMP_LIB        ! or INCLUDE "omp_lib.h"
42          USE DATA
43          TYPE(LOCKED_PAIR) :: P
44          INTEGER WORK1, WORK2, WORK3
45          EXTERNAL WORK1, WORK2, WORK3
2                !$OMP     SECTION
3                            CALL INCR_PAIR(P, WORK1(), WORK2())
4                !$OMP     SECTION
5                            CALL INCR_B(P, WORK3())
6                !$OMP     END PARALLEL SECTIONS
This section provides stubs for the runtime library routines defined in the OpenMP API.
The stubs are provided to enable portability to platforms that do not support the
OpenMP API. On these platforms, OpenMP programs must be linked with a library
containing these stub routines. The stub routines assume that the directives in the
OpenMP program are ignored. As such, they emulate serial semantics.

Note that the lock variable that appears in the lock routines must be accessed
exclusively through these routines. It should not be initialized or otherwise modified in
the user program.

In an actual implementation the lock variable might be used to hold the address of an
allocated memory block, but here it is used to hold an integer value. Users should not
make assumptions about mechanisms used by OpenMP implementations to implement
locks based on the scheme used by the stub procedures.
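
To make the integer-valued scheme concrete, the following is a minimal sketch in C of how a serial stub might track the state of a simple lock. The names stub_lock_t, stub_init_lock, and so on are hypothetical and deliberately differ from the omp_* routines. The state encoding (0 for not initialized, -1 for initialized but not set, 1 for set) follows the comments in the Fortran stubs of this appendix; reporting misuse through return codes is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical lock states, mirroring the integer scheme of the stubs. */
enum { STUB_UNINIT = 0, STUB_UNLOCKED = -1, STUB_LOCKED = 1 };

/* Hypothetical lock type: the lock variable simply holds an integer. */
typedef struct { int state; } stub_lock_t;

void stub_init_lock(stub_lock_t *l)    { l->state = STUB_UNLOCKED; }
void stub_destroy_lock(stub_lock_t *l) { l->state = STUB_UNINIT; }

/* Returns 0 on success and -1 on misuse. With serial semantics, setting
 * a lock that is already set could never succeed, so it is reported. */
int stub_set_lock(stub_lock_t *l)
{
    if (l->state != STUB_UNLOCKED)
        return -1;              /* not initialized, or already set */
    l->state = STUB_LOCKED;
    return 0;
}

/* Returns 0 on success and -1 if the lock was not set. */
int stub_unset_lock(stub_lock_t *l)
{
    if (l->state != STUB_LOCKED)
        return -1;
    l->state = STUB_UNLOCKED;
    return 0;
}

/* Returns 1 if the lock was acquired and 0 otherwise. */
int stub_test_lock(stub_lock_t *l)
{
    if (l->state != STUB_UNLOCKED)
        return 0;
    l->state = STUB_LOCKED;
    return 1;
}
```

A caller would treat stub_lock_t as opaque and go through these routines only, which is the same discipline the text asks of users of the real lock routines.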
18
                                                           Fortran
Note: To compile the Fortran stubs file, the include file omp_lib.h was split into
two files, omp_lib_kinds.h and omp_lib.h, and the omp_lib_kinds.h file is included
where needed. There is no requirement for the implementation to provide separate files.
23
                                                           Fortran
int omp_get_num_threads(void)
{
    return 1;
}
int omp_get_max_threads(void)
{
    return 1;
}
int omp_get_thread_num(void)
{
    return 0;
}
int omp_get_num_procs(void)
{
    return 1;
}
int omp_in_parallel(void)
{
    return 0;
}
int omp_get_dynamic(void)
{
    return 0;
}
int omp_get_thread_limit(void)
{
    return 1;
}
int omp_get_max_active_levels(void)
{
    return 0;
}
int omp_get_level(void)
{
    return 0;
}
int omp_get_active_level(void)
{
    return 0;
}
struct __omp_lock
{
    int lock;
};
struct __omp_nest_lock
{
    short owner;
    short count;
};
double omp_get_wtime(void)
{
    /* This function does not provide a working
     * wallclock timer. Replace it with a version
     * customized for the target machine.
     */
    return 0.0;
}
double omp_get_wtick(void)
{
    /* This function does not provide a working
     * clock tick function. Replace it with
     * a version customized for the target machine.
     */
    return 365. * 86400.;
}
      subroutine omp_set_dynamic(dynamic_threads)
        logical dynamic_threads
        return
      end subroutine
      subroutine omp_set_nested(nested)
        logical nested
        return
      end subroutine
15     kind = omp_sched_static
16     modifier = 0
17     return
18   end subroutine
      subroutine omp_init_lock(lock)
        ! lock is 0 if the simple lock is not initialized
        !        -1 if the simple lock is initialized but not set
        !         1 if the simple lock is set
        include 'omp_lib_kinds.h'
        integer(kind=omp_lock_kind) lock
        lock = -1
        return
      end subroutine
      subroutine omp_destroy_lock(lock)
        include 'omp_lib_kinds.h'
        integer(kind=omp_lock_kind) lock
        lock = 0
        return
      end subroutine
29                     subroutine omp_set_lock(lock)
30                       include 'omp_lib_kinds.h'
31                       integer(kind=omp_lock_kind) lock
41                       return
42                     end subroutine
13     return
14   end subroutine
27     return
28   end function
29   subroutine omp_init_nest_lock(nlock)
30     ! nlock is
31     ! 0 if the nestable lock is not initialized
32     ! -1 if the nestable lock is initialized but not set
33     ! 1 if the nestable lock is set
34     ! no use count is maintained
35     include 'omp_lib_kinds.h'
36     integer(kind=omp_nest_lock_kind) nlock
37 nlock = -1
38     return
39   end subroutine
4 nlock = 0
5                        return
6                      end subroutine
7                      subroutine omp_set_nest_lock(nlock)
8                        include 'omp_lib_kinds.h'
9                        integer(kind=omp_nest_lock_kind) nlock
19                       return
20                     end subroutine
21                     subroutine omp_unset_nest_lock(nlock)
22                       include 'omp_lib_kinds.h'
23                       integer(kind=omp_nest_lock_kind) nlock
33                       return
34                     end subroutine
13     return
14   end function
19 omp_get_wtime = 0.0d0
20     return
21   end function
28 omp_get_wtick = one_year
29     return
30   end function
C.1   Notation
The grammar rules consist of the name for a non-terminal, followed by a colon,
followed by replacement alternatives on separate lines.

The syntactic expression term_opt indicates that the term is optional within the
replacement.

The syntactic expression term_optseq is equivalent to term-seq_opt with the following
additional rules:

term-seq :
    term
    term-seq term
    term-seq , term
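
As an informal illustration of the term-seq rule, assuming it is applied to the clauses of a directive, the two functions below carry the same clauses, once separated by white space and once separated by commas. The function names and the summation task are made up for this sketch; if the directives are ignored, both functions still compute the same serial result.

```c
#include <assert.h>

/* Clauses separated by white space (alternative "term-seq term"). */
int sum_space_separated(const int *v, int n)
{
    int i, total = 0;
    #pragma omp parallel for shared(v, n) private(i) reduction(+:total)
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The same clauses separated by commas (alternative "term-seq , term"). */
int sum_comma_separated(const int *v, int n)
{
    int i, total = 0;
    #pragma omp parallel for shared(v, n), private(i), reduction(+:total)
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Both spellings are intended to denote the same clause sequence, so the two functions should be interchangeable.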
C.2   Rules
The notation is described in Section 6.1 of the C standard. This grammar appendix
shows the extensions to the base language grammar for the OpenMP C and C++
directives.
8 statement
9 openmp-directive
10 statement-seq statement
11 statement-seq openmp-directive
14                   statement
15                   openmp-directive
16 statement-list statement
17 statement-list openmp-directive
20 declaration
21 statement
22 openmp-directive
    /* standard statements */
    openmp-construct
openmp-construct:
    parallel-construct
    for-construct
    sections-construct
    single-construct
    parallel-for-construct
    parallel-sections-construct
    task-construct
    master-construct
    critical-construct
    atomic-construct
    ordered-construct
openmp-directive:
    barrier-directive
    taskwait-directive
    flush-directive
structured-block:
    statement
parallel-construct:
    parallel-directive structured-block
parallel-directive:
    unique-parallel-clause
    data-default-clause
    data-privatization-clause
    data-privatization-in-clause
    data-sharing-clause
    data-reduction-clause
unique-parallel-clause:
    if ( expression )
    num_threads ( expression )
    copyin ( variable-list )
for-construct:
    for-directive iteration-statement
for-directive:
17                 unique-for-clause
18                 data-privatization-clause
19 data-privatization-in-clause
20 data-privatization-out-clause
21 data-reduction-clause
22                 nowait
23              unique-for-clause:
24 ordered
25 schedule ( schedule-kind )
27 collapse ( expression )
2 static
3       dynamic
4       guided
5 auto
6       runtime
7    sections-construct:
8       sections-directive section-scope
9    sections-directive:
12 data-privatization-clause
13 data-privatization-in-clause
14 data-privatization-out-clause
15 data-reduction-clause
16      nowait
17   section-scope:
18      { section-sequence }
19   section-sequence:
20 section-directiveopt structured-block
25      single-directive structured-block
26   single-directive:
2 unique-single-clause
3                  data-privatization-clause
4                  data-privatization-in-clause
5                  nowait
6               unique-single-clause:
7                  copyprivate ( variable-list )
8               task-construct:
9                  task-directive structured-block
10              task-directive:
13 unique-task-clause
14                 data-default-clause
15                 data-privatization-clause
16 data-privatization-in-clause
17                 data-sharing-clause
18              unique-task-clause:
19 if ( scalar-expression )
20                 untied
21              parallel-for-construct:
22                 parallel-for-directive iteration-statement
23              parallel-for-directive:
26 unique-parallel-clause
27 unique-for-clause
28 data-default-clause
2 data-privatization-in-clause
3 data-privatization-out-clause
4 data-sharing-clause
5       data-reduction-clause
6    parallel-sections-construct:
7        parallel-sections-directive section-scope
8    parallel-sections-directive:
11 unique-parallel-clause
12 data-default-clause
13 data-privatization-clause
14 data-privatization-in-clause
15 data-privatization-out-clause
16 data-sharing-clause
17      data-reduction-clause
18   master-construct:
19      master-directive structured-block
20   master-directive:
23      critical-directive structured-block
24   critical-directive:
27 ( identifier )
6                  atomic-directive expression-statement
7               atomic-directive:
12                 ( variable-list )
13              ordered-construct:
14                 ordered-directive structured-block
15              ordered-directive:
18 /* standard declarations */
19                 threadprivate-directive
20              threadprivate-directive:
23 default ( shared )
24                 default ( none )
25              data-privatization-clause:
26                 private ( variable-list )
27              data-privatization-in-clause:
28 firstprivate ( variable-list )
2       lastprivate ( variable-list )
3    data-sharing-clause:
4       shared ( variable-list )
5    data-reduction-clause:
11 identifier
12      variable-list , identifier
13   /* in C++ */
14   variable-list:
15 id-expression
16 variable-list , id-expression
Interface Declarations

This appendix gives examples of the C/C++ header file, the Fortran include file and
Fortran 90 module that shall be provided by implementations as specified in Chapter 3.
It also includes an example of a Fortran 90 generic interface for a library routine.
/*
 * define the lock data types
 */
typedef void *omp_lock_t;
/*
 * define the schedule kinds
 */
typedef enum omp_sched_t
{
    omp_sched_static = 1,
    omp_sched_dynamic = 2,
    omp_sched_guided = 3,
    omp_sched_auto = 4
/* , Add vendor specific schedule constants here */
} omp_sched_t;
/*
 * exported OpenMP functions
 */
#ifdef __cplusplus
extern "C"
{
#endif
17   #ifdef __cplusplus
18   }
19   #endif
20 #endif
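
The schedule kinds declared above are the values exchanged with the omp_set_schedule and omp_get_schedule routines. The following is a minimal sketch, not part of the specification, of how a program might report the current setting. The helper names sched_kind_name and describe_schedule are hypothetical, and the fallback enum is an assumption added only so that the sketch also compiles when OpenMP support is not enabled.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption: without OpenMP support, mirror the enum from the example header. */
typedef enum omp_sched_t
{
    omp_sched_static = 1,
    omp_sched_dynamic = 2,
    omp_sched_guided = 3,
    omp_sched_auto = 4
} omp_sched_t;
#endif

/* Hypothetical helper: map a schedule kind to a printable name. */
static const char *sched_kind_name(omp_sched_t kind)
{
    switch (kind) {
    case omp_sched_static:  return "static";
    case omp_sched_dynamic: return "dynamic";
    case omp_sched_guided:  return "guided";
    case omp_sched_auto:    return "auto";
    default:                return "implementation defined";
    }
}

/* Hypothetical helper: write "kind,modifier" into buf and return snprintf's result. */
static int describe_schedule(char *buf, size_t len)
{
    omp_sched_t kind = omp_sched_static; /* placeholder when OpenMP is disabled */
    int modifier = 0;
#ifdef _OPENMP
    omp_get_schedule(&kind, &modifier);  /* reads the run-sched-var ICV */
#endif
    return snprintf(buf, len, "%s,%d", sched_kind_name(kind), modifier);
}
```

A caller could pass a small character buffer to describe_schedule and print the result. The default branch is there because an implementation may add vendor specific schedule constants, as the comment in the header notes.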
      integer     omp_sched_kind
      parameter ( omp_sched_kind = 4)

omp_lib.h:
C                                 default integer type assumed below
C                                 default logical type assumed below
C                                 OpenMP Fortran API v3.0
      include 'omp_lib_kinds.h'
      integer     openmp_version
      parameter ( openmp_version = 200805 )
      external    omp_set_num_threads
      external    omp_get_num_threads
      integer     omp_get_num_threads
      external    omp_get_max_threads
      integer     omp_get_max_threads
      external    omp_get_thread_num
      integer     omp_get_thread_num
      external    omp_get_num_procs
      integer     omp_get_num_procs
      external    omp_in_parallel
      logical     omp_in_parallel
      external    omp_set_dynamic
      external    omp_get_dynamic
      logical     omp_get_dynamic
      external    omp_set_nested
      external    omp_get_nested
      logical     omp_get_nested
      external    omp_set_schedule
      external    omp_get_schedule
      external    omp_get_thread_limit
      external    omp_init_lock
      external    omp_destroy_lock
      external    omp_set_lock
      external    omp_unset_lock
      external    omp_test_lock
      logical     omp_test_lock
      external    omp_init_nest_lock
      external    omp_destroy_nest_lock
      external    omp_set_nest_lock
      external    omp_unset_nest_lock
      external    omp_test_nest_lock
      integer     omp_test_nest_lock
      external    omp_get_wtick
      double precision omp_get_wtick
      external    omp_get_wtime
      double precision omp_get_wtime
      module omp_lib_kinds
      integer, parameter :: omp_integer_kind = 4
      integer, parameter :: omp_logical_kind = 4
      integer, parameter :: omp_lock_kind = 8
      integer, parameter :: omp_nest_lock_kind = 8
      integer, parameter :: omp_sched_kind = 4
      integer(kind=omp_sched_kind), parameter ::
     &  omp_sched_static = 1
      integer(kind=omp_sched_kind), parameter ::
     &  omp_sched_dynamic = 2
      integer(kind=omp_sched_kind), parameter ::
     &  omp_sched_guided = 3
      integer(kind=omp_sched_kind), parameter ::
     &  omp_sched_auto = 4
      end module omp_lib_kinds
      module omp_lib
       use omp_lib_kinds
!                               OpenMP Fortran API v3.0
       integer, parameter :: openmp_version = 200805
       interface
        function omp_get_num_threads ()
         use omp_lib_kinds
         integer (kind=omp_integer_kind) :: omp_get_num_threads
        end function omp_get_num_threads
        function omp_get_max_threads ()
         use omp_lib_kinds
         integer (kind=omp_integer_kind) :: omp_get_max_threads
        end function omp_get_max_threads
        function omp_get_thread_num ()
         use omp_lib_kinds
         integer (kind=omp_integer_kind) :: omp_get_thread_num
        end function omp_get_thread_num
        function omp_in_parallel ()
         use omp_lib_kinds
         logical (kind=omp_logical_kind) :: omp_in_parallel
        end function omp_in_parallel
        function omp_get_dynamic ()
         use omp_lib_kinds
         logical (kind=omp_logical_kind) :: omp_get_dynamic
        end function omp_get_dynamic
        function omp_get_nested ()
         use omp_lib_kinds
         logical (kind=omp_logical_kind) :: omp_get_nested
        end function omp_get_nested
        function omp_get_thread_limit()
         use omp_lib_kinds
         integer (kind=omp_integer_kind) :: omp_get_thread_limit
        end function omp_get_thread_limit
        subroutine omp_set_max_active_levels(var)
         use omp_lib_kinds
         integer (kind=omp_integer_kind), intent(in) :: var
        end subroutine omp_set_max_active_levels
        function omp_get_level()
         use omp_lib_kinds
         integer (kind=omp_integer_kind) :: omp_get_level
        end function omp_get_level
        function omp_get_ancestor_thread_num(level)
         use omp_lib_kinds
         integer (kind=omp_integer_kind), intent(in) ::
     &     level
         integer (kind=omp_integer_kind) ::
     &     omp_get_ancestor_thread_num
        end function omp_get_ancestor_thread_num
        function omp_get_team_size(level)
         use omp_lib_kinds
         integer (kind=omp_integer_kind), intent(in) ::
     &     level
         integer (kind=omp_integer_kind) :: omp_get_team_size
        end function omp_get_team_size
        function omp_get_active_level()
         use omp_lib_kinds
         integer (kind=omp_integer_kind) ::
     &     omp_get_active_level
        end function omp_get_active_level
        function omp_get_wtick ()
         double precision :: omp_get_wtick
        end function omp_get_wtick
        function omp_get_wtime ()
         double precision :: omp_get_wtime
        end function omp_get_wtime
       end interface
Implementation Defined Behaviors in OpenMP

This appendix summarizes the behaviors that are described as implementation defined in
this API. Each behavior is cross-referenced back to its description in the main
specification. An implementation is required to define and document its behavior in
these cases.
• Task scheduling points: it is implementation defined where task scheduling points
  occur in untied task regions (see Section 1.3 on page 11).
• Memory model: it is implementation defined as to whether, and in what sizes,
  memory accesses by multiple threads to the same variable without synchronization
  are atomic with respect to each other (see Section 1.4.1 on page 13).
• Internal control variables: the initial values of nthreads-var, dyn-var, run-sched-var,
  def-sched-var, stacksize-var, wait-policy-var, thread-limit-var, and
  max-active-levels-var are implementation defined (see Section 2.3.2 on page 29).
• Dynamic adjustment of threads: it is implementation defined whether the ability to
  dynamically adjust the number of threads is provided. Implementations are allowed
  to deliver fewer threads (but at least one) than indicated in Algorithm 2-1 even if
  dynamic adjustment is disabled (see Section 2.4.1 on page 35).
• Loop directive: the integer type or kind used to compute the iteration count of a
  collapsed loop is implementation defined. The effect of the schedule(runtime)
  clause when the run-sched-var ICV is set to auto is implementation defined (see
  Section 2.5.1 on page 38).
• sections construct: the method of scheduling the structured blocks among threads
  in the team is implementation defined (see Section 2.5.2 on page 47).
• single construct: the method of choosing a thread to execute the structured block
  is implementation defined (see Section 2.5.3 on page 49).
• atomic construct: a compliant implementation may enforce exclusive access
  between atomic regions which update different storage locations. The
  circumstances under which this occurs are implementation defined (see Section 2.8.5
  on page 69).
• omp_set_num_threads routine: if the argument is not a positive integer the
  behavior is implementation defined (see Section 3.2.1 on page 110).
• omp_set_schedule routine: the behavior for implementation defined schedule
  types is implementation defined (see Section 3.2.11 on page 121).
• omp_set_max_active_levels routine: when called from within any explicit
  parallel region the binding thread set (and binding region, if required) for the
  omp_set_max_active_levels region is implementation defined and the
  behavior is implementation defined. If the argument is not a non-negative integer
  then the behavior is implementation defined (see Section 3.2.14 on page 126).
• omp_get_max_active_levels routine: when called from within any explicit
  parallel region the binding thread set (and binding region, if required) for the
  omp_get_max_active_levels region is implementation defined (see
  Section 3.2.15 on page 127).
• OMP_SCHEDULE environment variable: if the value of the variable does not
  conform to the specified format then the result is implementation defined (see
  Section 4.1 on page 146).
• OMP_NUM_THREADS environment variable: if the value of the variable is greater
  than the number of threads the implementation can support or is not a positive integer
  then the result is implementation defined (see Section 4.2 on page 147).
• OMP_DYNAMIC environment variable: if the value is neither true nor false the
  behavior is implementation defined (see Section 4.3 on page 148).
• OMP_NESTED environment variable: if the value is neither true nor false the
  behavior is implementation defined (see Section 4.4 on page 148).
• OMP_STACKSIZE environment variable: if the value does not conform to the
  specified format or the implementation cannot provide a stack of the specified size
  then the behavior is implementation defined (see Section 4.5 on page 149).
• OMP_WAIT_POLICY environment variable: the details of the ACTIVE and
  PASSIVE behaviors are implementation defined (see Section 4.6 on page 150).
• OMP_MAX_ACTIVE_LEVELS environment variable: if the value is not a
  non-negative integer or is greater than the number of parallel levels an implementation
  can support then the behavior is implementation defined (see Section 4.7 on page 150).
• OMP_THREAD_LIMIT environment variable: if the requested value is greater than
  the number of threads an implementation can support, or if the value is not a positive
  integer, the behavior of the program is implementation defined (see Section 4.8 on
  page 151).
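
Several of the cases listed above arise only when a program supplies an out-of-range value, so a program that wants portable behavior can check values before they reach the runtime. The following is a minimal sketch under that assumption. The helpers parse_positive_int and set_num_threads_checked are hypothetical and are not OpenMP routines; falling back to a default or refusing the request is an application policy, not something this specification requires.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: return the positive integer held in text, or
 * fallback when text is missing, malformed, or not a positive int. */
static int parse_positive_int(const char *text, int fallback)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return fallback;              /* overflow or no digits at all */
    while (*end == ' ' || *end == '\t')
        end++;                        /* tolerate trailing white space */
    if (*end != '\0')
        return fallback;              /* trailing junk such as "4x" */
    if (value < 1 || value > INT_MAX)
        return fallback;
    return (int)value;
}

/* Hypothetical wrapper: forward only positive requests, because a
 * non-positive argument to omp_set_num_threads is implementation defined.
 * Returns 1 if the request was accepted and 0 if it was refused. */
static int set_num_threads_checked(int requested)
{
    if (requested < 1)
        return 0;
#ifdef _OPENMP
    omp_set_num_threads(requested);
#endif
    return 1;
}
```

A program might, for example, read its own configuration string, pass it through parse_positive_int with a fallback of 1, and hand the result to set_num_threads_checked. The wrapper compiles to a plain range check when OpenMP support is not enabled.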
This appendix summarizes the major changes between the OpenMP API Version 2.5
specification and the OpenMP API Version 3.0 specification.
• The concept of tasks has been added to the OpenMP execution model (see
  Section 1.2.3 on page 8 and Section 1.3 on page 11).
• The task construct (see Section 2.7 on page 59) has been added, which provides a
  mechanism for creating tasks explicitly.
• The taskwait construct (see Section 2.8.4 on page 68) has been added, which
  causes a task to wait for all its child tasks to complete.
• The OpenMP memory model now covers atomicity of memory accesses (see
  Section 1.4.1 on page 13). The description of the behavior of volatile in terms of
  flush was removed.
• In Version 2.5, there was a single copy of the nest-var, dyn-var, nthreads-var and
  run-sched-var internal control variables (ICVs) for the whole program. In Version
  3.0, there is one copy of these ICVs per task (see Section 2.3 on page 28). As a result,
  the omp_set_num_threads, omp_set_nested and omp_set_dynamic
  runtime library routines now have specified effects when called from inside a
  parallel region (see Section 3.2.1 on page 110, Section 3.2.7 on page 117 and
  Section 3.2.9 on page 119).
• The definition of active parallel region has been changed: in Version 3.0 a
  parallel region is active if it is executed by a team consisting of more than one
  thread (see Section 1.2.2 on page 2).
• The rules for determining the number of threads used in a parallel region have
  been modified (see Section 2.4.1 on page 35).
• In Version 3.0, the assignment of iterations to threads in a loop construct with a
  static schedule kind is deterministic (see Section 2.5.1 on page 38).
• In Version 3.0, a loop construct may be associated with more than one perfectly
  nested loop. The number of associated loops may be controlled by the collapse
  clause (see Section 2.5.1 on page 38).
• Random access iterators, and variables of unsigned integer type, may now be used as
  loop iterators in loops associated with a loop construct (see Section 2.5.1 on page 38).
• The schedule kind auto has been added, which gives the implementation the
  freedom to choose any possible mapping of iterations in a loop construct to threads in
  the team (see Section 2.5.1 on page 38).
• Fortran assumed-size arrays now have predetermined data-sharing attributes (see
  Section 2.9.1.1 on page 78).
• In Fortran, firstprivate is now permitted as an argument to the default
  clause (see Section 2.9.3.1 on page 86).
• For list items in the private clause, implementations are no longer permitted to use
  the storage of the original list item to hold the new list item on the master thread. If
  no attempt is made to reference the original list item inside the parallel region, its
  value is well defined on exit from the parallel region (see Section 2.9.3.3 on page
  89).
• In Version 3.0, Fortran allocatable arrays may appear in private,
  firstprivate, lastprivate, reduction, copyin and copyprivate
  clauses (see Section 2.9.2 on page 81, Section 2.9.3.3 on page 89, Section 2.9.3.4 on
  page 92, Section 2.9.3.5 on page 94, Section 2.9.3.6 on page 96, Section 2.9.4.1 on
  page 101 and Section 2.9.4.2 on page 102).
• In Version 3.0, static class member variables may appear in a threadprivate
  directive (see Section 2.9.2 on page 81).
• Version 3.0 makes clear where, and with which arguments, constructors and
  destructors of private and threadprivate class type variables are called (see
  Section 2.9.2 on page 81, Section 2.9.3.3 on page 89, Section 2.9.3.4 on page 92,
  Section 2.9.4.1 on page 101 and Section 2.9.4.2 on page 102).
• The runtime library routines omp_set_schedule and omp_get_schedule
  have been added; these routines respectively set and retrieve the value of the
  run-sched-var ICV (see Section 3.2.11 on page 121 and Section 3.2.12 on page 123).
• The thread-limit-var ICV has been added, which controls the maximum number of
  threads participating in the OpenMP program. The value of this ICV can be set with
  the OMP_THREAD_LIMIT environment variable and retrieved with the
  omp_get_thread_limit runtime library routine (see Section 2.3.1 on page 28,
  Section 3.2.13 on page 125 and Section 4.8 on page 151).
• The max-active-levels-var ICV has been added, which controls the number of nested
  active parallel regions. The value of this ICV can be set with the
  OMP_MAX_ACTIVE_LEVELS environment variable and the
  omp_set_max_active_levels runtime library routine, and it can be retrieved