More Ada Examples

The following five job scripts, each with its corresponding program source, illustrate a common variety of computation: serial, OpenMP threads, MPI, MPI-OpenMP hybrid, and MPMD (Multiple-Program-Multiple-Data). Observe the relationship among the different resource (-R) options and settings, and especially note the effect of the ptile setting. We use the old standby helloWorld program/codelet, each time in the guise of the appropriate programming model, because its simplicity lets us focus on the interaction between the batch parameters and those of the programming models.
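
All of the examples below are submitted and monitored the same way. As a quick reminder (not part of the examples themselves), a typical LSF session might look like the following sketch; jobfile.sh stands for whichever of the scripts you save, and JOBID is the number bsub reports back.

bsub < jobfile.sh       # submit the job script to LSF (note the input redirection)
bjobs                   # list your pending and running jobs
bpeek JOBID             # peek at the output of a running job
bkill JOBID             # cancel the job if necessary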

Example Job 1 (Serial)

The following job will run on a single core of any available node, barring those that have 1TB or 2TB of memory. The codelet illustrates one way of capturing, inside a program, the value of an environment variable (here, LSB_HOSTS).

#BSUB -J serial_helloWorld 
#BSUB -L /bin/bash 
#BSUB -W 20
#BSUB -n 1 
#BSUB -R 'rusage[mem=150] span[ptile=1]' 
#BSUB -M 150
#BSUB -o serial_helloWorld.%J 

# Set up the environment
ml purge           # ml is short for module; purge clears all loaded modules
ml intel/2015B     # module load intel/2015B; 2015B is the richest version on Ada
ml                 # with no arguments, ml lists the currently loaded modules

# Compile and run serial_helloWorld.exe
ifort -o serial_helloWorld.exe serial_helloWorld.f90
./serial_helloWorld.exe
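
As an aside, and not part of the original example: LSF also exports LSB_HOSTS to the job's shell environment, so a quick sanity check can be done directly in the job script before (or instead of) querying it from Fortran.

echo "LSB_HOSTS = $LSB_HOSTS"      # host(s) assigned to this job, one entry per job slot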

Source code serial_helloWorld.f90

Program Serial_Hello_World
! By SC TAMU staff:  "upgraded" from 5 years ago for Ada
! ifort -o serial_helloWorld.exe serial_helloWorld.f90
! ./serial_helloWorld.exe
!------------------------------------------------------------------
character (len=20)  ::  host_name='LSB_HOSTS', host_name_val
integer   (KIND=4)  :: sz, status
!
call get_environment_variable (host_name, host_name_val, sz, status, .true.)
!
print *,'- Helloo World: node ', trim(adjustl(host_name_val)),' - '
!
end program Serial_Hello_World

Example Job 2 (OpenMP)

This job will run 20 OpenMP threads (OMP_NUM_THREADS=20) on 20 cores (-n 20), all on the same node (ptile=20).

#BSUB -J omp_helloWorld 
#BSUB -L /bin/bash 
#BSUB -W 20
#BSUB -n 20 
#BSUB -R 'rusage[mem=300] span[ptile=20]' 
#BSUB -M 300
#BSUB -o omp_helloWorld.%J 

# Set up environment
ml purge
ml intel/2015B
ml 

# Compile and run omp_helloWorld.exe
ifort -openmp -o omp_helloWorld.exe omp_helloWorld.f90
export OMP_NUM_THREADS=20          # Set number of OpenMP threads to 20
./omp_helloWorld.exe               # Run the program
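
Hard-coding 20 works, but it has to be kept in sync with -n by hand. As an alternative sketch (not part of the original script), one can derive the thread count from LSF's LSB_DJOB_NUMPROC variable, which holds the number of job slots allocated; this assumes all slots are on one node, i.e. ptile equals -n.

export OMP_NUM_THREADS=$LSB_DJOB_NUMPROC   # 20 here, since -n 20 and ptile=20 keep all slots on one node
./omp_helloWorld.exe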

Source code omp_helloWorld.f90

Program Hello_World_omp
! By SC TAMU staff:  "upgraded" from 5 years ago for Ada
! ifort -openmp -o omp_helloWorld.exe omp_helloWorld.f90
! ./omp_helloWorld.exe 
!---------------------------------------------------------------------
USE OMP_LIB
character (len=20)  :: host_name='LSB_HOSTS', host_name_val
integer   (KIND=4)  :: sz, status
!
character (len=4)   :: omp_id_str, omp_np_str
integer   (KIND=4)  :: omp_id, omp_np
!
call get_environment_variable (host_name, host_name_val, sz, status, .true.)
!
!$OMP PARALLEL PRIVATE(omp_id, omp_np, omp_id_str, omp_np_str)
!
omp_id = OMP_GET_THREAD_NUM(); omp_np = OMP_GET_NUM_THREADS()
!
! Internal writes convert binary integers to numeric strings so that output
! from print is more tidy.
write (omp_id_str, '(I4)') omp_id
write (omp_np_str, '(I4)') omp_np
!
print *,'- Helloo World: node ', trim(adjustl(host_name_val)),' THREAD_ID ', &
trim(adjustl(omp_id_str)), ' out of ',trim(adjustl(omp_np_str)),' OMP threads -'
!
!$OMP END PARALLEL
!
end program Hello_World_omp

Example Job 3 (MPI)

Here the job runs an MPI program on 12 cores/job slots (-n 12), across three different nodes (ptile=4). Note that in this case the -np 12 setting on the MPI launcher command, mpiexec.hydra, must match the number of job slots. mpiexec.hydra accepts -n and -np interchangeably; we opted for the -np alias to avoid confusion with the -n of the BSUB directive.

#BSUB -J mpi_helloWorld 
#BSUB -L /bin/bash 
#BSUB -W 20
#BSUB -n 12 
#BSUB -R 'rusage[mem=150] span[ptile=4]' 
#BSUB -M 150
#BSUB -o mpi_helloWorld.%J 

# Set up environment
ml purge
ml intel/2015B
ml 

# Compile and run mpi_helloWorld.exe
mpiifort -o mpi_helloWorld.exe mpi_helloWorld.f90
mpiexec.hydra -np 12 ./mpi_helloWorld.exe  # Run the program; the -np setting must match the number of job slots
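
To keep -np from drifting out of sync with -n when the script is edited, one could, as a sketch not found in the original example, let LSF supply the count through LSB_DJOB_NUMPROC:

mpiexec.hydra -np $LSB_DJOB_NUMPROC ./mpi_helloWorld.exe   # LSB_DJOB_NUMPROC equals the -n value (12 here)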

Source code mpi_helloWorld.f90

Program Hello_World_mpi
! By SC TAMU staff:  "upgraded" from 5 years ago for Ada
! mpiifort -o mpi_helloWorld.exe mpi_helloWorld.f90
! mpiexec.hydra -n 2 ./mpi_helloWorld.exe 
!----------------------------------------------------------------
USE MPI
character (len=MPI_MAX_PROCESSOR_NAME) host_name
character (len=4)   :: myid_str
integer   (KIND=4)  :: np, myid, host_name_len, ierr
!
call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) STOP '-- MPI_INIT ERROR --'
!
call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, np, ierr)
!
call MPI_GET_PROCESSOR_NAME(host_name, host_name_len, ierr) ! Returns node/host name
!
! Internal write to convert binary integer (myid) to numeric string so that print line is tidy.
write (myid_str, '(I4)') myid
!
print *,'- Helloo World: node ', trim(adjustl(host_name)),' MPI process # ', myid_str, ' -'
!
call MPI_FINALIZE(ierr)
!
end program Hello_World_mpi

Example Job 4 (MPI-OpenMP Hybrid)

This job runs an MPI-OpenMP program on 8 job slots, with 4 of them allocated per node. That is, the job will run on 2 nodes, one MPI process per node. The latter is accomplished via the -np 2 -perhost 1 settings on the mpiexec.hydra command. It is a quirk of the Intel MPI launcher, mpiexec.hydra, that in order to enforce the -perhost 1 request one must also set I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0. Finally, because of the export OMP_NUM_THREADS=4, each MPI process spawns 4 OpenMP threads. Note that 8 job slots = 4 OMP threads on the 1st node + 4 OMP threads on the 2nd node.

#BSUB -J mpi_omp_helloWorld 
#BSUB -L /bin/bash 
#BSUB -W 20
#BSUB -n 8 
#BSUB -R 'rusage[mem=150] span[ptile=4]' 
#BSUB -M 150
#BSUB -o mpi_omp_helloWorld.%J 

# Set up environment
ml purge
ml intel/2015B
ml 

# Compile and run mpi_omp_helloWorld.exe
mpiifort -openmp -o mpi_omp_helloWorld.exe mpi_omp_helloWorld.f90
export OMP_NUM_THREADS=4                                   # Set number of OpenMP threads to 4
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0               # Needed to respect perhost request
mpiexec.hydra -np 2 -perhost 1 ./mpi_omp_helloWorld.exe    # Run the program
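
The bookkeeping "MPI ranks * OpenMP threads per rank = job slots" can also be computed in the script rather than hard-coded. A minimal sketch, not part of the original example, with NP as a local helper variable and assuming the thread count divides the slot count evenly:

export OMP_NUM_THREADS=4
NP=$(( LSB_DJOB_NUMPROC / OMP_NUM_THREADS ))               # 8 slots / 4 threads = 2 MPI ranks
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0               # Needed to respect perhost request
mpiexec.hydra -np $NP -perhost 1 ./mpi_omp_helloWorld.exe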

Source code mpi_omp_helloWorld.f90

Program Hello_World_mpi_omp
! By SC TAMU staff:  "upgraded" from 5 years ago for Ada
! mpiifort -openmp -o mpi_omp_helloWorld.exe mpi_omp_helloWorld.f90
! mpiexec.hydra -n 2 ./mpi_omp_helloWorld.exe 
!-----------------------------------------------------------------
USE MPI 
USE OMP_LIB
character (len=MPI_MAX_PROCESSOR_NAME) host_name
character (len=4)   :: omp_id_str, omp_np_str, myid_str
integer   (KIND=4)  :: np, myid, host_name_len, ierr, omp_id, omp_np
!
call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) STOP '-- MPI_INIT ERROR --'
!
call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, np, ierr)
!
call MPI_GET_PROCESSOR_NAME(host_name, host_name_len, ierr) ! Returns node/host name
!
!$OMP PARALLEL PRIVATE(omp_id, omp_np, myid_str, omp_id_str, omp_np_str)
!
omp_id = OMP_GET_THREAD_NUM(); omp_np = OMP_GET_NUM_THREADS()
!
! Internal writes convert binary integers to numeric strings so that the "print" line is more tidy.
write (myid_str, '(I4)') myid; write(omp_id_str, '(I4)') omp_id
write (omp_np_str, '(I4)') omp_np
!
print *,'- Helloo World: node ', trim(adjustl(host_name)),' MPI process # ', &
trim(adjustl(myid_str)),' THREAD_ID ', trim(adjustl(omp_id_str)), &
' of ',trim(adjustl(omp_np_str)),' OMP threads -'
!
!$OMP END PARALLEL
!
call MPI_FINALIZE(ierr)
!
end program Hello_World_mpi_omp 

Example Job 5 (MPMD)

In MPMD and hybrid MPI-OpenMP jobs you should exercise care that the job slot and ptile BSUB settings are consistent with the relevant parameters specified on the mpiexec.hydra command line, or on whatever other launcher you happen to use.

We carry out two MPMD runs here. In both, note how one passes (locally) different environment variables to different executables. Observe also that in both mpiexec.hydra runs the total number of execution threads is 60 (= the number of job slots):

  • 1st run: (2 MPI processes * 20 OpenMP threads per MPI process) + (2 MPI processes * 10 OpenMP threads per MPI process) = 60
  • 2nd run: (40 MPI processes) + (2 MPI processes * 10 OpenMP threads per MPI process) = 60

Note, however, that contrary to our expectation, in the 2nd run the 2 MPI processes (10 threads each) do not launch on separate nodes but on the same one, thus leaving a whole node idle. This run therefore uses only 3 nodes. Nonetheless, the example is useful because it illustrates that process placement in a multi-node run can be tricky.

The staff is currently exploring the use of the LSB_PJL_TASK_GEOMETRY LSF environment variable to place different MPI processes on different nodes more flexibly.
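
For reference only, and purely as an illustrative sketch of the mechanism still being explored (not a tested recipe for Ada): LSB_PJL_TASK_GEOMETRY groups MPI task IDs by node, with each parenthesized group placed on one host.

export LSB_PJL_TASK_GEOMETRY="{(0,1,2,3)(4,5)}"   # hypothetical 6-task layout: tasks 0-3 on one node, 4-5 on another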

#BSUB -J mpmd_helloWorld
#BSUB -L /bin/bash 
#BSUB -W 20
#BSUB -n 60
#BSUB -R "40*{ select[nxt] rusage[mem=150] span[ptile=20]} + 20*{ select[gpu] rusage[mem=150] span[ptile=10] }"
#BSUB -M 150
#BSUB -x
#BSUB -o mpmd_helloWorld.%J

#
# 1st Case: Runs, in the MPMD model, two instances of the same hybrid executable, mpi_omp_helloWorld.exe.
# The first instance runs 2 MPI processes, 1 per node, at 20 threads per MPI process. This accounts
# for 40 job slots placed on 2 nodes. The second instance also runs 2 MPI processes, 1 per
# node, but with 10 threads per MPI process. The role of the perhost option is critical here.
# So this case makes use of all 4 nodes that the BSUB directives ask for.
#
# 2nd Case: Runs, in the MPMD model, 1 pure MPI and 1 hybrid executable: mpi_helloWorld.exe & mpi_omp_helloWorld.exe.
# The first executable runs 40 MPI processes; the second runs 2 MPI processes, each one spawning 10 threads.
# All in all this does account for 60 job slots. Unfortunately, 20 job slots are now mapped onto 1 node only,
# not 2. This is mostly because we have not been able to place MPI processes on a node by using a
# "local" option (-perhost ### is a global option).
#

# Set up environment
ml purge
ml intel/2015B
ml  

# Start first case
echo -e "\n\n ***** 1st MPMD Run ****** 1st MPMD Run ****** 1st MPMD Run ******\n\n"
export MP_LABELIO="YES"
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0    # Needed to respect perhost request

# Run the program
mpiexec.hydra -perhost 1 -np 2 -env OMP_NUM_THREADS 20 ./mpi_omp_helloWorld.exe : \
                         -np 2 -env OMP_NUM_THREADS 10 ./mpi_omp_helloWorld.exe

# Start second case
sleep 10; echo -e "\n\n ***** 2nd MPMD Run ****** 2nd MPMD Run ****** 2nd MPMD Run ******\n\n"
export OMP_NUM_THREADS=10

# Run the program
mpiexec.hydra -np 40 ./mpi_helloWorld.exe : -np 2 ./mpi_omp_helloWorld.exe