
OpenMP Tasking

Tasks were first introduced to OpenMP in version 3.0. For an overview of the basic functionality of tasks, see the Introduction to OpenMP Tasks. Since then, additional mechanisms have been added to create and synchronize tasks more effectively. OpenMP 4.0 introduced the depend clause and the taskgroup construct, and OpenMP 4.5 introduced the taskloop construct. Additional material on tasking can be found in the tutorial materials Programming Irregular Applications with OpenMP: A Hands-on Introduction.

 

The Depend Clause

The depend clause takes a type followed by a variable or list of variables.

#pragma omp task depend(in: x) depend(out: y) depend(inout: z)

The addresses of the variables passed to the depend clause are used to order the tasks. The ordering constraints are determined by the type of dependence specified:

  • IN dependencies make a task dependent on the last preceding task that used the same variable as an out (or inout) dependency.
  • OUT dependencies make a task dependent on any preceding task that used the same variable as an in, out, or inout dependency.
  • INOUT dependencies behave the same as out dependencies; the distinction is only for readability.

These constraints establish an ordering only between sibling tasks. There is no data movement or synchronization with respect to external accesses of the data; only memory accesses between dependent tasks are synchronized (there is no need for a flush).
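For example, in the hedged sketch below (compute and read_result are placeholder functions), the two in tasks may run concurrently with each other, while the later out task must wait for both of them, since an out dependence also orders against earlier in dependencies on the same variable:

#pragma omp task depend(out: x)       // writes x
x = compute();
#pragma omp task depend(in: x)        // reads x; waits for the writer above
read_result(x);
#pragma omp task depend(in: x)        // reads x; may run concurrently with the other reader
read_result(x);
#pragma omp task depend(out: x)       // rewrites x; waits for both readers (and the writer)
x = compute();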

A Simple Depend Clause Example

The following is a simple example to show how to build a series of task dependencies with the depend clause.

int x, y, z;   // declared before the parallel region so that the tasks share them
#pragma omp parallel
{
#pragma omp single
{
    #pragma omp task depend(out: x)
    x = init();
    #pragma omp task depend(in: x) depend(out: y)
    y = f(x);
    #pragma omp task depend(in: x) depend(out: z)
    z = g(x);
    #pragma omp task depend(in: y, z)
    finalize(y, z);
}
}

Example: Array Sections As Dependencies

The following example is taken from the OpenMP examples document.

void matmul_depend(int N, int BS,
                   float A[N][N], float B[N][N], float C[N][N])
{
    for (int i = 0; i < N; i += BS) {
        for (int j = 0; j < N; j += BS) {
            for (int k = 0; k < N; k += BS) {
#pragma omp task depend(in: A[i:BS][k:BS], B[k:BS][j:BS]) depend(inout: C[i:BS][j:BS])
                for (int ii = i; ii < i + BS; ii++)
                    for (int jj = j; jj < j + BS; jj++)
                        for (int kk = k; kk < k + BS; kk++)
                            C[ii][jj] += A[ii][kk] * B[kk][jj];
            }
        }
    }
}

The example also notes that the function assumes BS divides N evenly, and that i, j, k, A, B, and C are firstprivate by default; however, since A, B, and C are just pointers, all of the tasks still refer to the same data.

Additionally, this example uses array sections in the dependencies. OpenMP does not allow overlapping regions of memory to be used as dependencies. The runtime will not check array sections for overlap; it only uses the address of the beginning of each array section.
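As a hedged sketch of that pitfall (fill_block and read_block are placeholder functions), the two array sections below overlap but start at different addresses, so the runtime creates no dependence between the tasks and the program is not conforming:

float a[100];
#pragma omp task depend(out: a[0:50])    // section starts at &a[0]
fill_block(a, 0, 50);
#pragma omp task depend(in: a[25:50])    // section starts at &a[25]: a different base address,
read_block(a, 25, 50);                   // so no dependence is created despite the overlap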

For another example, see the Jacobi Stencil Example; a more in-depth example is given in the LU Decomposition Example.

The taskgroup Construct

A taskgroup is similar in purpose to taskwait, but in addition to waiting on the child tasks created in the code block that follows it (the tasks the current task directly spawned), it also waits on all of their descendant tasks. The code below demonstrates the basic syntax of a taskgroup.

#pragma omp taskgroup
{
    #pragma omp task
    task_spawning_function();
}

Taskgroups can also be used to selectively synchronize tasks:


#pragma omp parallel
{
#pragma omp single
{
    #pragma omp task
    background_work();
    #pragma omp taskgroup
    {
        #pragma omp task
        task_spawning_function();
    }
}
}

In this example, the task or tasks in background_work() can continue executing until the end of the parallel region, while the task_spawning_function() task and all of its descendants will be waited on at the end of the taskgroup.
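To make the contrast with taskwait concrete, the hedged sketch below (work is a placeholder function) spawns a nested task; a taskwait at the same point would wait only for the outer task, whereas the taskgroup also waits for the nested descendant:

#pragma omp taskgroup
{
    #pragma omp task                 // child task of the taskgroup region
    {
        #pragma omp task             // descendant task: a plain taskwait would not wait for this one
        work();

        work();                      // work done by the child task itself
    }
}                                    // both the child and its descendant have completed here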

The Taskloop Construct

Similar to the for construct, the taskloop directive precedes a for loop and parallelizes it by creating tasks, each of which executes one or more iterations of the loop. By default, the taskloop construct executes as if it were enclosed in a taskgroup construct. The basic usage resembles a for construct:

#pragma omp taskloop
for (int i = 0; i < N; i++) {
    // Work
}

It accepts clauses from the for construct as well as clauses from the task construct, which can be used to fine-tune performance. It also has three unique clauses: grainsize, num_tasks, and nogroup. The grainsize and num_tasks clauses control how the iterations are divided among tasks, while nogroup is similar to nowait, removing the taskgroup-like behavior.
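A hedged sketch of these clauses (N, a, b, f, and g are placeholders): the first loop asks for roughly 64 iterations per task, while the second divides the loop among 8 tasks and drops the implicit taskgroup, so an explicit taskwait is added:

#pragma omp taskloop grainsize(64)
for (int i = 0; i < N; i++)
    a[i] = f(a[i]);                  // roughly 64 iterations per task

#pragma omp taskloop num_tasks(8) nogroup
for (int i = 0; i < N; i++)
    b[i] = g(b[i]);                  // divided among (at most) 8 tasks; no implicit wait
#pragma omp taskwait                 // explicit synchronization replaces the taskgroup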

The Taskloop Simd Construct

The taskloop simd construct is a combined construct, so it shares the behavior of both the taskloop construct and the simd construct. It splits a loop into tasks, and in addition, the iterations within each task are executed as if the simd construct had been applied to them.
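A minimal sketch (N, a, and b are placeholders), assuming the loop body is vectorizable:

#pragma omp taskloop simd grainsize(1024)
for (int i = 0; i < N; i++)
    a[i] = a[i] + b[i];              // each task's chunk of iterations is executed as a SIMD loop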