Difference between revisions of "Ada:Compile:OpenMP"

From TAMU HPRC

Revision as of 16:49, 31 January 2017

OpenMP Programs

Compiling OpenMP code

To compile a program that contains OpenMP parallel directives, use one of the following flags. The first creates a multi-threaded version; the second builds the same source in sequential mode:

Flag | Description
-qopenmp | Enables the parallelizer to generate multi-threaded code.
-qopenmp-stubs | Enables compilation of OpenMP programs in sequential mode.

Examples:

[ netID@cluster ~]$ icc -qopenmp -o myprog.x myprog.c
[ netID@cluster ~]$ ifort -qopenmp -o myprog.x myprog.f90
[ netID@cluster ~]$ ifort -qopenmp-stubs -o myprog.x myprog.f90
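
The pattern in the examples above can be sketched as a small shell helper. This is only an illustration: `omp_compile_cmd` is a hypothetical function, not part of any HPRC tooling, and it prints the compile command instead of running it, so it can be tried on a machine without the Intel compilers. It assumes that `.c` sources go to icc and `.f90` or `.f` sources go to ifort.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) an OpenMP compile command.
# Usage: omp_compile_cmd SOURCE [FLAG]   (FLAG defaults to -qopenmp)
omp_compile_cmd() {
    local src="$1"
    local flag="${2:--qopenmp}"
    local compiler
    case "$src" in
        *.c)       compiler=icc ;;
        *.f90|*.f) compiler=ifort ;;
        *) echo "unsupported source file: $src" >&2; return 1 ;;
    esac
    # Executable name: source name with its extension replaced by .x.
    # Note the -o flag; without it the compiler treats myprog.x as an input.
    echo "$compiler $flag -o ${src%.*}.x $src"
}

omp_compile_cmd myprog.c
omp_compile_cmd myprog.f90 -qopenmp-stubs
```

The second call shows how the same source could be built in sequential mode by swapping the flag.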

Running OpenMP code

The table below shows some of the more common environment variables that control OpenMP behavior at run time.

Environment Variable | Example | Example Purpose | Default Value
OMP_NUM_THREADS=n[,m]* | OMP_NUM_THREADS=8 | Sets the maximum number of threads per nesting level to 8. | 1
OMP_STACKSIZE=size[B|K|M|G] | OMP_STACKSIZE=8M | Sets the size of the private stack of each worker thread to 8 MB. Possible units are B (bytes), K (KB), M (MB), and G (GB). | 4M
OMP_SCHEDULE=type[,chunk] | OMP_SCHEDULE=DYNAMIC | Sets the default run-time schedule type to DYNAMIC. Possible values for type are STATIC, DYNAMIC, GUIDED, and AUTO. | STATIC
OMP_DYNAMIC | OMP_DYNAMIC=TRUE | Enables dynamic adjustment of the number of threads. | FALSE
OMP_NESTED | OMP_NESTED=TRUE | Enables nested OpenMP regions. | FALSE
OMP_DISPLAY_ENV=val | OMP_DISPLAY_ENV=VERBOSE | Instructs the OpenMP runtime to display the OpenMP version and environment variables in verbose form. Possible values are TRUE, FALSE, and VERBOSE. | FALSE

Example 1: Set the number of threads to 8 and the stack size for worker threads to 16 MB. Note: insufficient stack size is a common cause of run-time crashes in OpenMP programs.

-bash-4.1$ export OMP_NUM_THREADS=8
-bash-4.1$ export OMP_STACKSIZE=16M
-bash-4.1$ ./myprog.x
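
To make the unit suffixes concrete, here is a minimal sketch that converts an OMP_STACKSIZE value to bytes. The function name `stacksize_bytes` is hypothetical. The sketch assumes the OpenMP convention that a bare number is interpreted as kilobytes, and it does not handle whitespace between the number and the unit.

```shell
#!/bin/bash
# Hypothetical helper: convert an OMP_STACKSIZE-style value to bytes.
stacksize_bytes() {
    local value="$1" num unit
    num="${value%[BbKkMmGg]}"   # numeric part (strip one trailing unit letter)
    unit="${value#"$num"}"      # the unit letter, or empty if none was given
    case "$num" in
        ''|*[!0-9]*) echo "invalid size: $value" >&2; return 1 ;;
    esac
    case "$unit" in
        B|b)    echo "$num" ;;
        ''|K|k) echo $((num * 1024)) ;;             # no unit: assume KB
        M|m)    echo $((num * 1024 * 1024)) ;;
        G|g)    echo $((num * 1024 * 1024 * 1024)) ;;
    esac
}

stacksize_bytes 16M
```

With one private stack per worker thread, the total stack memory grows with the thread count, which is worth keeping in mind when raising the value.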

Example 2: Enable nested parallel regions and set the number of threads to 4 for the first nesting level and 2 for the second.

-bash-4.1$ export OMP_NESTED=true
-bash-4.1$ export OMP_NUM_THREADS=4,2
-bash-4.1$ ./myprog.x
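
A nested setting multiplies: each thread at one level can start a team at the next level. The sketch below uses a hypothetical helper, `total_threads`, to compute the upper bound on concurrently running threads for a comma-separated OMP_NUM_THREADS list. It assumes every entry is a positive integer and that nesting is actually enabled.

```shell
#!/bin/bash
# Hypothetical helper: upper bound on concurrent threads for a nested
# OMP_NUM_THREADS list such as "4,2" (product of the per-level counts).
total_threads() {
    local list="$1" total=1 n
    local old_ifs="$IFS"
    IFS=','
    for n in $list; do          # split the list on commas
        total=$((total * n))
    done
    IFS="$old_ifs"
    echo "$total"
}

total_threads "4,2"   # prints 8
```

Comparing this product with the number of cores on the node is a quick way to check whether a nested configuration oversubscribes it.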

Example 3: Set the maximum number of threads to 16, but let the runtime decide how many threads are actually used in order to optimize the use of system resources.

-bash-4.1$ export OMP_DYNAMIC=true
-bash-4.1$ export OMP_NUM_THREADS=16
-bash-4.1$ ./myprog.x

Example 4: Change the default scheduling type to dynamic with a chunk size of 100. OMP_SCHEDULE affects only loops that are declared with schedule(runtime).

-bash-4.1$ export OMP_SCHEDULE="dynamic,100"
-bash-4.1$ export OMP_NUM_THREADS=16
-bash-4.1$ ./myprog.x
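
As a rough illustration of what the chunk size means, the sketch below splits an OMP_SCHEDULE value into its type and chunk parts and computes how many chunks a loop would be divided into. The helper names and the iteration count of 1000 are hypothetical; the chunk count is simply the iteration count divided by the chunk size, rounded up, which is how fixed-size chunks behave under a dynamic schedule.

```shell
#!/bin/bash
# Hypothetical helpers for reading an OMP_SCHEDULE value of the form type[,chunk].
schedule_type() {
    echo "${1%%,*}"             # everything before the first comma
}

schedule_chunk() {
    case "$1" in
        *,*) echo "${1#*,}" ;;  # everything after the first comma
        *)   echo "" ;;         # no chunk size given
    esac
}

# Number of chunks for a loop: ceil(iterations / chunk).
chunk_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

sched="dynamic,100"
echo "type:   $(schedule_type "$sched")"
echo "chunk:  $(schedule_chunk "$sched")"
echo "chunks: $(chunk_count 1000 "$(schedule_chunk "$sched")")"
```

With 16 threads and only 10 chunks, some threads would receive no work, so the chunk size is best chosen relative to both the loop length and the thread count.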

The following table shows some more advanced environment variables that control where OpenMP threads are actually placed.


Environment Variable | Description | Default Value
KMP_AFFINITY | Binds OpenMP threads to physical threads. |
OMP_PLACES | Defines an ordered list of places where threads can execute. Every place is a set of hardware (HW) threads. It can be defined as an explicit list of places described by nonnegative numbers, or as an abstract name. The abstract name can be 'threads' (every place consists of exactly one HW thread), 'cores' (every place contains all the HW threads of a core), or 'sockets' (every place contains all the HW threads of a socket). | 'threads'
OMP_PROC_BIND | Sets the thread affinity policy to be used for parallel regions at the corresponding nesting level. Acceptable values are true, false, or a comma-separated list, each element of which is one of the following: master (all threads are bound to the same place as the master thread), close (threads are bound to successive places close to the place of the master thread), or spread (threads are distributed evenly among the places). NOTE: if both OMP_PROC_BIND and KMP_AFFINITY are set, the latter takes precedence. | 'false'

Example 1: Suppose a node has two sockets, each with 8 cores. For a program with two levels of nested parallelism, place the outer-level threads on different sockets and the inner-level threads on the same socket as their master thread.

-bash-4.1$ export OMP_NESTED=true
-bash-4.1$ export OMP_NUM_THREADS=2,8
-bash-4.1$ export OMP_PLACES="sockets"
-bash-4.1$ export OMP_PROC_BIND="spread,master"
-bash-4.1$ ./myprog.x
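
A typo in OMP_PROC_BIND is easy to miss, so a job script might check the value before launching. The sketch below is one hedged way to do that with a hypothetical helper, `valid_proc_bind`. It accepts only the lowercase spellings used on this page (true, false, or a comma-separated list of master, close, and spread).

```shell
#!/bin/bash
# Hypothetical helper: return 0 if the argument looks like a valid
# OMP_PROC_BIND value (lowercase spellings only), 1 otherwise.
valid_proc_bind() {
    local value="$1" item
    local old_ifs="$IFS"
    case "$value" in
        true|false) return 0 ;;
        '')         return 1 ;;
    esac
    IFS=','
    for item in $value; do      # one policy per nesting level
        case "$item" in
            master|close|spread) ;;
            *) IFS="$old_ifs"; return 1 ;;
        esac
    done
    IFS="$old_ifs"
    return 0
}

if valid_proc_bind "spread,master"; then
    echo "OMP_PROC_BIND value looks valid"
fi
```

Setting OMP_DISPLAY_ENV=VERBOSE before running the program is a complementary check, since the runtime then reports the settings it actually picked up.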