Compiling OpenMP code
To compile a program containing OpenMP directives, use one of the following Intel compiler flags:
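|Compiler flag||Description|
|-qopenmp||Enables the parallelizer to generate multi-threaded code from the OpenMP directives.|
|-qopenmp-stubs||Compiles OpenMP programs in sequential mode: the OpenMP directives are ignored and a stub OpenMP library is linked.|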
[netID@cluster ~]$ icc -qopenmp -o myprog.x myprog.c
[netID@cluster ~]$ ifort -qopenmp -o myprog.x myprog.f90
[netID@cluster ~]$ ifort -qopenmp-stubs -o myprog.x myprog.f90
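For a quick test of the commands above, a minimal C source could look like the sketch below. The functions `say_hello` and `count_threads` are hypothetical names, and a real `myprog.c` would call them from its own `main`. The `_OPENMP` guard lets the same file also build with a compiler invoked without any OpenMP flag.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* only needed for the omp_* runtime calls */
#endif

/* Every thread of the parallel region prints its ID and the team size. */
void say_hello(void)
{
#pragma omp parallel
    {
        int id = 0, n = 1;   /* values used in a sequential build */
#ifdef _OPENMP
        id = omp_get_thread_num();
        n = omp_get_num_threads();
#endif
        printf("Hello from thread %d of %d\n", id, n);
    }
}

/* Returns the size of the team that executed a parallel region. */
int count_threads(void)
{
    int nthreads = 1;   /* result when OpenMP is disabled */
#pragma omp parallel
    {
#ifdef _OPENMP
#pragma omp single
        nthreads = omp_get_num_threads();   /* one thread records the team size */
#endif
    }
    return nthreads;
}
```

Built with `-qopenmp`, `say_hello()` should print one line per thread; the same file built with `-qopenmp-stubs` should report a single thread.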
Running OpenMP code
The table below lists some of the more common environment variables that affect OpenMP behavior at run time.
|Environment Variable||Example||Effect of the example||Default value|
|OMP_NUM_THREADS=n[,m]*||OMP_NUM_THREADS=8||Sets the maximum number of threads used in parallel regions to 8. A comma-separated list gives one value per nesting level.||1|
|OMP_STACKSIZE=size[B|K|M|G]||OMP_STACKSIZE=8M||Sets the size of the private stack of each worker thread to 8 MB. Possible unit suffixes are B (bytes), K (KB), M (MB), and G (GB); a size without a suffix is interpreted as KB.||4M|
|OMP_SCHEDULE=type[,chunk]||OMP_SCHEDULE=DYNAMIC||Sets the schedule type for loops declared with schedule(runtime) to DYNAMIC. Possible values for type are STATIC, DYNAMIC, GUIDED, and AUTO.||STATIC|
|OMP_DYNAMIC=val||OMP_DYNAMIC=TRUE||Enables dynamic adjustment of the number of threads used in parallel regions.||FALSE|
|OMP_NESTED=val||OMP_NESTED=TRUE||Enables nested parallel regions.||FALSE|
|OMP_DISPLAY_ENV=val||OMP_DISPLAY_ENV=VERBOSE||Instructs the OpenMP runtime to print the OpenMP version and the values of the OpenMP environment variables at program start; VERBOSE also lists vendor-specific variables. Possible values are TRUE, FALSE, and VERBOSE.||FALSE|
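To check that these variables actually reached the program, the values the runtime derived from them can be queried with the standard routines `omp_get_max_threads`, `omp_get_dynamic`, `omp_get_nested` and `omp_get_schedule`. The sketch below is a hypothetical helper, `print_omp_settings`, to be called from a program's own `main`. Note that `omp_get_nested`, like OMP_NESTED itself, is deprecated as of OpenMP 5.0 but still widely provided; OMP_STACKSIZE has no query routine, so it is not shown.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>

/* Human-readable name for a schedule kind returned by omp_get_schedule. */
static const char *sched_name(omp_sched_t kind)
{
    switch (kind) {
    case omp_sched_static:  return "static";
    case omp_sched_dynamic: return "dynamic";
    case omp_sched_guided:  return "guided";
    case omp_sched_auto:    return "auto";
    default:                return "other";
    }
}
#endif

/* Prints the settings derived from OMP_NUM_THREADS, OMP_DYNAMIC,
   OMP_NESTED and OMP_SCHEDULE; returns the current thread upper bound. */
int print_omp_settings(void)
{
#ifdef _OPENMP
    omp_sched_t kind;
    int chunk;
    omp_get_schedule(&kind, &chunk);
    printf("max threads : %d\n", omp_get_max_threads());
    printf("dynamic     : %s\n", omp_get_dynamic() ? "true" : "false");
    printf("nested      : %s\n", omp_get_nested() ? "true" : "false");
    printf("schedule    : %s, chunk %d\n", sched_name(kind), chunk);
    return omp_get_max_threads();
#else
    printf("built without OpenMP support\n");
    return 1;
#endif
}
```

For a code-free check, OMP_DISPLAY_ENV=TRUE prints similar information when the program starts.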
Example 1: set the number of threads to 8 and the stack size of each worker thread to 16 MB. Note: insufficient stack size is a common cause of run-time crashes in OpenMP programs.
-bash-4.1$ export OMP_NUM_THREADS=8
-bash-4.1$ export OMP_STACKSIZE=16M
-bash-4.1$ ./myprog.x
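The stack-size note matters when threads keep large private arrays: variables declared inside a parallel region live on each thread's own stack, and for worker threads that stack is sized by OMP_STACKSIZE (the master thread uses the shell limit shown by `ulimit -s`). The hypothetical `scratch_sum` below keeps a 1 MiB array per thread, which fits in the 4M default; raising `SCRATCH_LEN` beyond the configured stack size is the kind of change that typically ends in a segmentation fault unless OMP_STACKSIZE is increased accordingly.

```c
/* Hypothetical per-thread scratch size: 128 Ki doubles = 1 MiB per thread. */
#define SCRATCH_LEN (128 * 1024)

/* Each thread fills and sums a private array; the result is
   SCRATCH_LEN times the number of threads in the team. */
double scratch_sum(void)
{
    double total = 0.0;
#pragma omp parallel reduction(+:total)
    {
        double scratch[SCRATCH_LEN];   /* private: allocated on this thread's stack */
        for (int i = 0; i < SCRATCH_LEN; i++)
            scratch[i] = 1.0;
        double s = 0.0;
        for (int i = 0; i < SCRATCH_LEN; i++)
            s += scratch[i];
        total += s;
    }
    return total;
}
```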
Example 2: enable nested parallel regions and use 4 threads at the first nesting level and 2 threads at the second nesting level.
-bash-4.1$ export OMP_NESTED=true
-bash-4.1$ export OMP_NUM_THREADS=4,2
-bash-4.1$ ./myprog.x
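A program can confirm how nesting was applied by printing, inside the inner region, its own thread number and that of its outer-level ancestor. The hypothetical `nested_demo` below does this with `omp_get_ancestor_thread_num`. With the settings above one would expect 8 inner threads (4 outer threads, each starting a team of 2); with nesting disabled, each inner region runs with a single thread, so only 4 lines appear.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Two nested parallel regions; returns how many inner-region threads ran. */
int nested_demo(void)
{
    int inner_total = 0;   /* shared by all threads at both levels */
#pragma omp parallel
    {
#pragma omp parallel
        {
            int outer = 0, inner = 0;
#ifdef _OPENMP
            outer = omp_get_ancestor_thread_num(1);   /* thread number at level 1 */
            inner = omp_get_thread_num();             /* thread number at level 2 */
#endif
#pragma omp critical
            {
                printf("outer thread %d, inner thread %d\n", outer, inner);
                inner_total++;
            }
        }
    }
    return inner_total;
}
```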
Example 3: set the maximum number of threads to 16, but let the runtime decide how many threads are actually used in order to optimize the use of system resources.
-bash-4.1$ export OMP_DYNAMIC=true
-bash-4.1$ export OMP_NUM_THREADS=16
-bash-4.1$ ./myprog.x
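With OMP_DYNAMIC=true, OMP_NUM_THREADS=16 is only an upper bound, so code should not assume a team of exactly 16 threads. The hypothetical `partial_sums` sketch sizes a per-thread buffer with `omp_get_max_threads()` and records, inside the region, the team size the runtime actually chose. In practice a `reduction` clause does the same job more simply; the explicit buffer is there only to make the team size visible.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums 0..n-1 with one partial sum per thread; returns -1 on allocation failure. */
long long partial_sums(int n)
{
    int max_threads = 1, team = 1;
#ifdef _OPENMP
    max_threads = omp_get_max_threads();   /* upper bound, not a guarantee */
#endif
    long long *part = calloc((size_t)max_threads, sizeof *part);
    if (part == NULL)
        return -1;

#pragma omp parallel
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
#pragma omp single
        team = omp_get_num_threads();      /* team size actually granted */
#endif
#pragma omp for
        for (int i = 0; i < n; i++)
            part[id] += i;
    }

    long long total = 0;
    for (int t = 0; t < team; t++)
        total += part[t];
    printf("requested up to %d threads, runtime used %d\n", max_threads, team);
    free(part);
    return total;
}
```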
Example 4: change the run-time schedule (used by loops declared with schedule(runtime)) to dynamic with a chunk size of 100.
-bash-4.1$ export OMP_SCHEDULE="dynamic,100"
-bash-4.1$ export OMP_NUM_THREADS=16
-bash-4.1$ ./myprog.x
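OMP_SCHEDULE only takes effect for loops that request `schedule(runtime)`; a loop with an explicit schedule clause, or with none, ignores it. The hypothetical `triangular_work` loop below has iterations of increasing cost, which is the situation where a dynamic schedule can balance the load better than the static default, so it is a convenient test case for comparing OMP_SCHEDULE settings.

```c
/* Loop whose schedule is taken from OMP_SCHEDULE at run time.
   Iteration i performs i units of work, so a plain static split
   leaves the threads with the high iterations doing most of it. */
long long triangular_work(int n)
{
    long long total = 0;
#pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i; j++)
            total += 1;
    }
    return total;   /* n*(n-1)/2 regardless of the schedule chosen */
}
```

Timing this function under `OMP_SCHEDULE="static"` and `OMP_SCHEDULE="dynamic,100"` is one way to see whether the schedule change pays off for a given workload.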
The following table shows some more advanced environment variables that control where OpenMP threads are placed on the hardware (thread affinity).
|Environment Variable||Description||Default value|
|KMP_AFFINITY||Binds OpenMP threads to physical processing units (Intel-specific).|
|OMP_PLACES||Defines an ordered list of places where threads can execute. Every place is a set of hardware (HW) threads. Places can be given as an explicit list of nonnegative HW thread numbers or as an abstract name: 'threads' (every place consists of exactly one HW thread), 'cores' (every place contains all the HW threads of a core), or 'sockets' (every place contains all the HW threads of a socket).||'threads'|
|OMP_PROC_BIND||Sets the thread affinity policy used for parallel regions at the corresponding nesting level. Acceptable values are true, false, or a comma-separated list, each element of which is one of the following: master (all threads are bound to the same place as the master thread), close (threads are bound to successive places close to the place of the master thread), or spread (threads are distributed evenly among the places). NOTE: if both OMP_PROC_BIND and KMP_AFFINITY are set, KMP_AFFINITY takes precedence.||'false'|
Example 1: suppose a node has two sockets with 8 cores each and the program uses two nesting levels. To put the outer-level threads on different sockets and the inner-level threads on the same socket as their master thread:
-bash-4.1$ export OMP_NESTED=true
-bash-4.1$ export OMP_NUM_THREADS=2,8
-bash-4.1$ export OMP_PLACES="sockets"
-bash-4.1$ export OMP_PROC_BIND="spread,master"
-bash-4.1$ ./myprog.x
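To verify that the binding took effect, each thread can report the OpenMP place it is bound to. The hypothetical `report_places` helper below uses `omp_get_num_places`, `omp_get_place_num` and `omp_get_place_num_procs`, which assume an OpenMP 4.5 or newer runtime; with OMP_PLACES="sockets" each place should correspond to one socket. For a nested setup like the one above, the same queries can be moved into the inner parallel region.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Each thread reports the OpenMP place it is bound to; returns the number
   of threads that reported. Place routines need OpenMP 4.5+ (_OPENMP >= 201511). */
int report_places(void)
{
    int nreports = 0;
#if defined(_OPENMP) && _OPENMP >= 201511
    printf("number of places: %d\n", omp_get_num_places());
#endif
#pragma omp parallel
    {
        int tid = 0, place = -1, nprocs = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#if _OPENMP >= 201511
        place = omp_get_place_num();              /* -1 if this thread is not bound */
        if (place >= 0)
            nprocs = omp_get_place_num_procs(place);
#endif
#endif
#pragma omp critical
        {
            printf("thread %d: place %d (%d processors)\n", tid, place, nprocs);
            nreports++;
        }
    }
    return nreports;
}
```

Setting OMP_DISPLAY_ENV=true alongside these variables is a code-free way to see the OMP_PLACES and OMP_PROC_BIND values the runtime actually picked up.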