Ada:More Examples
The following five job scripts, with corresponding program source, illustrate the common varieties of computation: serial, OpenMP threads, MPI, MPI-OpenMP hybrid, and MPMD (Multiple-Program-Multiple-Data). Observe the relationship among the different resource (-R) options and settings, and especially note the effect of the ptile setting. We use the old standby helloWorld program/codelet each time, in the guise of the appropriate programming model, because its simplicity lets us focus on the interaction between the batch parameters and those of the programming models.
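As a quick reminder of how -n and ptile interact before the examples: -n requests the total number of job slots, while ptile caps how many of those slots land on any one node, which in turn fixes the node count. The numbers in this minimal sketch are illustrative only and are not tied to any one example below.

#BSUB -n 40                     # 40 job slots (cores) in total
#BSUB -R 'span[ptile=20]'       # at most 20 of those slots per node
# => LSF allocates 40/20 = 2 nodes for such a job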
Example Job 1 (Serial)
The following job will run on a single core of any available node, excluding the nodes with 1 TB or 2 TB of memory. The code(let) illustrates one way of capturing, inside a program, the value of an environment variable (here, LSB_HOSTS).
#BSUB -J serial_helloWorld
#BSUB -L /bin/bash
#BSUB -W 20
#BSUB -n 1
#BSUB -R 'rusage[mem=150] span[ptile=1]'
#BSUB -M 150
#BSUB -o serial_helloWorld.%J
#
# Set up the environment
ml purge            # module abbreviated as ml
ml intel/2015B      # 2015B is the richest version on Ada; module load abbreviated as ml
ml                  # module list abbreviated as ml
#
# Compile and run serial_helloWorld.exe
ifort -o serial_helloWorld.exe serial_helloWorld.f90
./serial_helloWorld.exe
Source code serial_helloWorld.f90
Program Serial_Hello_World
! By SC TAMU staff: "upgraded" from 5 years ago for Ada
! ifort -o serial_helloWorld.exe serial_helloWorld.f90
! ./serial_helloWorld.exe
!------------------------------------------------------------------
  character (len=20) :: host_name='LSB_HOSTS', host_name_val
  integer (KIND=4)   :: sz, status
!
  call get_environment_variable (host_name, host_name_val, sz, status, .true.)
!
  print *,'- Hello World: node ', trim(adjustl(host_name_val)),' - '
!
end program Serial_Hello_World
Example Job 2 (OpenMP)
This job will run 20 OpenMP threads (OMP_NUM_THREADS=20) on 20 cores (-n 20), all on the same node (ptile=20).
#BSUB -n 20 -R 'rusage[mem=300] span[ptile=20]' -M 300
#BSUB -J omp_helloWorld -o omp_helloWorld.%J -L /bin/bash -W 20
#
module load intel
#
ifort -openmp -o omp_helloWorld.exe omp_helloWorld.f90
#
export OMP_NUM_THREADS=20; ./omp_helloWorld.exe
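If one prefers not to hard-code the thread count in two places (-n and OMP_NUM_THREADS), the slot count assigned by LSF can be reused. This is only a sketch: it assumes LSF exports the standard LSB_DJOB_NUMPROC variable (it normally does) and that ptile equals -n, so that all slots sit on one node.

export OMP_NUM_THREADS=$LSB_DJOB_NUMPROC   # 20 for this job, i.e. the same value as -n
./omp_helloWorld.exe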
Source code omp_helloWorld.f90
Program Hello_World_omp ! By SC TAMU staff: "upgraded" from 5 years ago for Ada ! ifort -openmp -o omp_helloWorld.exe omp_helloWorld.f90 ! ./omp_helloWorld.exe !--------------------------------------------------------------------- USE OMP_LIB character (len=20) :: host_name='LSB_HOSTS', host_name_val integer (KIND=4) :: sz, status ! character (len=4) :: omp_id_str, omp_np_str integer (KIND=4) :: omp_id, omp_np ! call get_environment_variable (host_name, host_name_val, sz, status, .true.) ! !$OMP PARALLEL PRIVATE(omp_id, omp_np, myid_str, omp_id_str, omp_np_str) ! omp_id = OMP_GET_THREAD_NUM(); omp_np = OMP_GET_NUM_THREADS() ! ! Internal writes convert binary integers to numeric strings so that output ! from print is more tidy. write (myid_str, '(I4)') myid; write(omp_id_str, '(I4)') omp_id write (omp_np_str, '(I4)') omp_np ! print *,'- Helloo World: node ', trim(adjustl(host_name_val)),' THREAD_ID ', & trim(adjustl(omp_id_str)), ' out of ',trim(adjustl(omp_np_str)),' OMP threads -' ! !$OMP END PARALLEL ! end program Hello_World_omp
Example Job 3 (MPI)
Here the job runs an MPI program on 12 cores/job slots (-n 12) spread across three different nodes (ptile=4). Note that in this case the -np 12 setting on the MPI launcher command, mpiexec.hydra, must match the number of job slots. mpiexec.hydra treats -n and -np as synonyms; we opted for the -np alias to avoid confusion with the -n of the BSUB directive.
#BSUB -n 12 -R 'rusage[mem=150] span[ptile=4]' -M 150
#BSUB -J mpi_helloWorld -o mpi_helloWorld.%J -L /bin/bash -W 20
#
module load intel
#
mpiifort -o mpi_helloWorld.exe mpi_helloWorld.f90
#
mpiexec.hydra -np 12 ./mpi_helloWorld.exe
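To check how LSF actually distributed the 12 slots over nodes, the allocation can be echoed from inside the job script. LSB_MCPU_HOSTS is a standard LSF variable listing each allocated host followed by its slot count; the host names in the comment below are purely illustrative.

echo "Allocation (host slots ...): $LSB_MCPU_HOSTS"
# e.g.  nxt1401 4 nxt1402 4 nxt1403 4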
Source code mpi_helloWorld.f90
Program Hello_World_mpi
! By SC TAMU staff: "upgraded" from 5 years ago for Ada
! mpiifort -o mpi_helloWorld.exe mpi_helloWorld.f90
! mpiexec.hydra -n 2 ./mpi_helloWorld.exe
!----------------------------------------------------------------
  USE MPI
  character (len=MPI_MAX_PROCESSOR_NAME) host_name
  character (len=4) :: myid_str
  integer (KIND=4)  :: np, myid, host_name_len, ierr
!
  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) STOP '-- MPI_INIT ERROR --'
!
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, np, ierr)
!
  call MPI_GET_PROCESSOR_NAME(host_name, host_name_len, ierr)   ! Returns node/host name
!
! Internal write converts the binary integer (myid) to a numeric string so that the print line is tidy.
  write (myid_str, '(I4)') myid
!
  print *,'- Hello World: node ', trim(adjustl(host_name)),' MPI process # ', myid_str, ' -'
!
  call MPI_FINALIZE(ierr)
!
end program Hello_World_mpi
Example Job 4 (MPI-OpenMP Hybrid)
This job runs an MPI-OpenMP program on 8 job slots, with 4 of them allocated per node. That is, the job will run on 2 nodes, one MPI process per node. The latter is accomplished via the -np 2 -perhost 1 settings on the mpiexec.hydra command. It is a quirk of the Intel MPI launcher, mpiexec.hydra, that in order to enforce the -perhost 1 requirement one must also set the unwieldy I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0 variable. Finally, because of export OMP_NUM_THREADS=4, each MPI process spawns 4 OpenMP threads. Note that 8 job slots = 4 OMP threads on the 1st node + 4 OMP threads on the 2nd node.
#BSUB -n 8 -R 'rusage[mem=150] span[ptile=4]' -M 150
#BSUB -J mpi_omp_helloWorld -o mpi_omp_helloWorld.%J -L /bin/bash -W 20
#
module load intel
#
mpiifort -openmp -o mpi_omp_helloWorld.exe mpi_omp_helloWorld.f90
#
export OMP_NUM_THREADS=4
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0   # Needed to respect the -perhost request
mpiexec.hydra -np 2 -perhost 1 ./mpi_omp_helloWorld.exe
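As an aside, Intel MPI also documents an I_MPI_PERHOST environment variable that sets the default for the -perhost option. Assuming the Intel MPI version loaded here honors it (and with I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0 still set, as above), the launch line could equivalently be written as in this sketch:

export I_MPI_PERHOST=1                         # assumed equivalent to -perhost 1
mpiexec.hydra -np 2 ./mpi_omp_helloWorld.exe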
Source code mpi_omp_helloWorld.f90
Program Hello_World_mpi_omp
! By SC TAMU staff: "upgraded" from 5 years ago for Ada
! mpiifort -openmp -o mpi_omp_helloWorld.exe mpi_omp_helloWorld.f90
! mpiexec.hydra -n 2 ./mpi_omp_helloWorld.exe
!-----------------------------------------------------------------
  USE MPI
  USE OMP_LIB
  character (len=MPI_MAX_PROCESSOR_NAME) host_name
  character (len=4) :: omp_id_str, omp_np_str, myid_str
  integer (KIND=4)  :: np, myid, host_name_len, ierr, omp_id, omp_np
!
  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) STOP '-- MPI_INIT ERROR --'
!
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, np, ierr)
!
  call MPI_GET_PROCESSOR_NAME(host_name, host_name_len, ierr)   ! Returns node/host name
!
!$OMP PARALLEL PRIVATE(omp_id, omp_np, myid_str, omp_id_str, omp_np_str)
!
  omp_id = OMP_GET_THREAD_NUM(); omp_np = OMP_GET_NUM_THREADS()
!
! Internal writes convert binary integers to numeric strings so that the print line is more tidy.
  write (myid_str, '(I4)') myid; write (omp_id_str, '(I4)') omp_id
  write (omp_np_str, '(I4)') omp_np
!
  print *,'- Hello World: node ', trim(adjustl(host_name)),' MPI process # ', &
          trim(adjustl(myid_str)),' THREAD_ID ', trim(adjustl(omp_id_str)), &
          ' of ',trim(adjustl(omp_np_str)),' OMP threads -'
!
!$OMP END PARALLEL
!
  call MPI_FINALIZE(ierr)
!
end program Hello_World_mpi_omp
Example Job 5 (MPMD)
In MPMD and hybrid MPI-OpenMP jobs you should exercise care that the job slot and ptile BSUB settings are consistent with the relevant parameters specified on the mpiexec.hydra command, or on whatever other launcher your application uses.
We carry out two MPMD runs here. In both, note how one passes (locally) different environment variables to different executables. Observe also that in both mpiexec.hydra runs the total number of execution threads is 60 (= the number of job slots):
- 1st run: (2 MPI processes * 20 OpenMP threads per MPI process) + (2 MPI processes * 10 OpenMP threads per MPI process) = 60
- 2nd run: (40 MPI processes) + (2 MPI processes * 10 OpenMP threads per MPI process) = 60
Note, however, that contrary to our expectation, in the 2nd run the 2 MPI processes (10 threads each) do not launch on separate nodes but on the same one, leaving a whole node idle; the run therefore uses only 3 nodes. Nonetheless, the example is useful because it illustrates that process placement in a multi-node run can be tricky.
The staff is currently exploring the use of the LSB_PJL_TASK_GEOMETRY LSF environment variable for placing different MPI processes on different nodes more flexibly (see the sketch after the job script below).
#BSUB -n 60 -M 150 -x
#BSUB -R "40*{ select[nxt] rusage[mem=150] span[ptile=20] } + 20*{ select[gpu] rusage[mem=150] span[ptile=10] }"
#BSUB -q staff -J mpmd_helloWorld -o mpmd_helloWorld.%J -L /bin/bash -W 20
#
# 1st Case: Runs in the MPMD model two instances of the same hybrid executable, mpi_omp_helloWorld.exe.
#           The first instance runs 2 MPI processes, 1 per node, at 20 threads per MPI process. This accounts
#           for 40 job slots placed on 2 nodes. The second instance also runs 2 MPI processes, 1 per
#           node, but with 10 threads per MPI process. The role of the -perhost option is critical here.
#           So here we make use of all 4 nodes that the BSUB directives request.
#
# 2nd Case: Runs in the MPMD model 1 pure MPI and 1 hybrid executable: mpi_helloWorld.exe & mpi_omp_helloWorld.exe.
#           The first executable runs 40 MPI processes; the second runs 2 MPI processes, each spawning 10 threads.
#           All in all this accounts for 60 job slots. Unfortunately, 20 of the job slots are now mapped onto 1 node only,
#           not 2. This is mostly because we have not been able to place MPI processes on a node by using a
#           "local" option (-perhost ### is a global option).
#
module load intel

echo -e "\n\n ***** 1st MPMD Run ****** 1st MPMD Run ****** 1st MPMD Run ******\n\n"

export MP_LABELIO="YES"
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0   # Needed to respect the -perhost request
#
mpiexec.hydra -perhost 1 -np 2 -env OMP_NUM_THREADS 20 ./mpi_omp_helloWorld.exe : \
              -np 2 -env OMP_NUM_THREADS 10 ./mpi_omp_helloWorld.exe
#
sleep 10; echo -e "\n\n ***** 2nd MPMD Run ****** 2nd MPMD Run ****** 2nd MPMD Run ******\n\n"
#
export OMP_NUM_THREADS=10
#
mpiexec.hydra -np 40 ./mpi_helloWorld.exe : -np 2 ./mpi_omp_helloWorld.exe
#
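For reference, LSB_PJL_TASK_GEOMETRY groups MPI task IDs so that each parenthesized group is placed on its own node. Whether the Intel MPI launcher honors it depends on the LSF integration in use, so the following is only a sketch of the syntax, with purely illustrative task IDs.

# Place tasks 0 and 1 together on one node, task 2 alone on a second node, tasks 3-5 on a third
export LSB_PJL_TASK_GEOMETRY="{(0,1)(2)(3,4,5)}"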