== Job Submission ==

Once you have your job file ready, it is time to submit your job. You can submit your job to LSF with the following command:

<pre>
[NetID@ada ~]$ bsub < MyJob.LSF
Verifying job submission parameters...
Verifying project account...
     Account to charge:   123456789123
         Balance (SUs):      5000.0000
         SUs to charge:         5.0000
Job <12345> is submitted to default queue <sn_regular>.
</pre>

== tamubatch ==

'''tamubatch''' is an automatic batch job script that submits jobs for the user on the Ada and Terra clusters without the need to write a batch script. The user only needs to provide the executable commands in a text file, and tamubatch will automatically submit the job to the cluster. There are flags that the user may specify to control the parameters of the submitted job.

''tamubatch is still in beta and has not been fully developed. Although there are still bugs and testing issues that are currently being worked on, tamubatch can already submit jobs to both the Ada and Terra clusters if given a file of executable commands.''

For more information, visit [https://hprc.tamu.edu/wiki/SW:tamubatch this page.]
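
For illustration only, a hypothetical commands file and submission might look as follows (the file name is a placeholder, and the available flags and defaults are described on the tamubatch page linked above):

<pre>
# commands.txt - one executable command per line
./my_program input1.dat > output1.log
./my_program input2.dat > output2.log

# hand the commands file to tamubatch, which builds and submits the batch job
[NetID@ada ~]$ tamubatch commands.txt
</pre>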

== tamulauncher ==

'''tamulauncher''' provides a convenient way to run a large number of serial or multithreaded commands without the need to submit individual jobs or a job array. The user provides a text file containing all the commands that need to be executed, and tamulauncher will execute the commands concurrently. The number of concurrently executed commands depends on the batch requirements. When tamulauncher is run interactively, the number of concurrently executed commands is limited to at most 8. tamulauncher is available on terra, ada, and curie. There is no need to load any module before using tamulauncher. tamulauncher has been successfully tested to execute over 100K commands.

''tamulauncher is preferred over job arrays for submitting a large number of individual jobs, especially when the run times of the commands are relatively short. It allows for better utilization of the nodes, puts less burden on the batch scheduler, and lessens interference with jobs of other users on the same node.''

For more information, visit [https://hprc.tamu.edu/wiki/SW:tamulauncher#tamulauncher this page.]
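
As an illustration, a minimal sketch of a batch job that runs a commands file through tamulauncher (the file name and resource requests are placeholders; see the linked page for the supported options):

<pre>
#BSUB -J tamulauncher_demo
#BSUB -L /bin/bash
#BSUB -o tamulauncher_out.%J
#BSUB -n 20
#BSUB -R "span[ptile=20]"
#BSUB -W 2:00

# commands.in contains one command per line; tamulauncher
# executes them concurrently on the allocated cores
tamulauncher commands.in
</pre>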
  
== Job Submission: the bsub command ==

The batch facility on Ada is LSF (Load Sharing Facility) from IBM. To submit a job via LSF, a user submits a job file that specifies the submission options, the commands to execute, and, if needed, the batch queue to submit to.

Use the '''bsub''' command to submit a job script as shown below:

<pre>$ bsub < sample.job
Job <138733> is submitted to default queue <sn_regular>.</pre>

When submitted, the job is assigned a unique id. You may refer to the job using only the numerical portion of its id (e.g. 138733) with the various batch system commands.

'''A reminder.''' For the purpose of computation in batch mode, the Ada cluster has 837 nodes powered by the Ivy Bridge-EP processor and 15 powered by the Westmere-EX. The Ivy Bridge-EP-based nodes have 20 cpus/cores each, while the Westmere-EX nodes have 40. Compute nodes, whether Ivy Bridge-EP or Westmere-EX, have different memory capacities. Note that a small portion of the memory on each compute node is used by the operating system and is not available to user processes. The above is useful to bear in mind when constructing batch requests.

== Submission Options ==

Resources needed for program execution are specified by submission options in the job file to be submitted.

==== Common Submission Options ====
 
 
Below are the common submission options. These options can be specified as #BSUB directives in your job script (recommended), or they can be given on the command line of the '''bsub''' command; an example command line follows the table. The '''bsub''' man page describes the available submission options in more detail.
 
 
 
{| class="wikitable" style="text-align: left;"
 
!Option
 
!Description
 
|-
 
| -J jobname
 
| Name of the job. When used with the -j oe option, the job's output will be directed to a file named jobname.oXXXX where XXXX is the job id.
 
|-
 
| -L login_shell
 
| Shell for interpreting the job script. Recommended shell is /bin/bash.
 
|-
 
| -n X
 
| Number of cores (X) to be assigned to the job.
 
|-
 
| -o output_file_name
 
| Specifies the output file name.
 
|-
 
| -P acctno
 
| Specifies the billing account to use for this job. Please consult the [[TAMUSC:AMS | AMS documentation]] for more information.
 
|-
 
| -q queue_name
 
| Directs the submitted job to the queue_name queue. On Ada this option should be used only for the xlarge queue. Here are more details on [[Ada:Batch_Queues | queues on Ada]].
 
|-
 
| -R "select[r]"
 
| Selects nodes with resource r.
 
|-
 
| -R "span[ptile=n]"
 
| Requests that n cores (processors) be allocated on each host (node) assigned to the job.
 
|-
 
| -u email_address(es)
 
| The email address(es) to send mail about the job. Using an external email address (e.g. @tamu.edu, @gmail.com, etc.) is recommended.
 
|-
 
| -x
 
| Requests exclusive use of the allocated nodes: LSF will not schedule any other jobs on the nodes assigned to this job. This is a useful option when, for performance reasons for example, sharing nodes with other jobs is undesirable. Usage per node will be assessed as 20 (or 40) cores * wall_clock_time.
 
|-
 
| -W HH:MM
 
| The wall-clock time limit (hours and minutes) on how long the job can run.
 
|}
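
The same options can also be given directly on the '''bsub''' command line instead of as #BSUB directives. A minimal sketch (the job name, resource values, and file names are placeholders):

<pre>
# equivalent to placing the corresponding #BSUB directives inside sample.job
$ bsub -J my_job -L /bin/bash -o my_job.%J -n 20 -W 2:30 < sample.job
</pre>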
 
 
 
==== Examples of Submission Options ====
 
To stress the importance of specifying resources correctly in BSUB directives and because this specification is a frequent source of error, we first present a number of examples.
 
 
 
'''#BSUB -n 40 -W 2:30'''<br /> or, equivalently:<br /> #BSUB -n 40<br /> #BSUB -W 2:30<br /> This directive will allocate 40 cpus. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes.
 
 
 
'''#BSUB -n 40 -M 20 -W 2:30'''<br /> This directive will allocate 40 cpus and 20 '''MB''' of memory per process (cpu). The duration of execution is specified to be a maximum of 2 hours and 30 minutes.
 
 
 
'''#BSUB -n 40 -M 20 -W 2:30 -P 012345678'''<br /> This directive will allocate 40 cpus and 20 '''MB''' of memory per process. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The number of billing units (BUs) used will be charged against the 012345678 project.
 
 
 
'''#BSUB -n 2 -W 2:30 -P 012345678 -R &quot;select[gpu]&quot;'''<br /> This directive will allocate 2 cores on nodes with GPUs. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The number of billing units (BUs) used will be charged against the 012345678 project.
 
 
 
'''#BSUB -n 2 -W 2:30 -P 012345678 -R &quot;select[phi]&quot;'''<br /> This directive will allocate 2 cores on nodes with PHIs. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The number of billing units (BUs) used will be charged against the 012345678 project.
 
 
 
'''#BSUB -n 40 -W 2:30 -P 012345678 -R &quot;span[ptile=20]&quot;'''<br /> This directive will allocate 2 nodes with 20 cores on each node. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The number of billing units (BUs) used will be charged against the 012345678 project.
 
 
 
'''#BSUB -n 50 -W 2:30 -P 012345678 -R &quot;span[ptile=20]&quot;'''<br /> This directive will allocate 3 nodes: two of the nodes provide 20 cores each, while the third provides 10 cores. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The number of billing units (BUs) used will be charged against the 012345678 project.
 
 
 
'''#BSUB -n 5 -W 2:30 -P 012345678 -R &quot;span[ptile=5]&quot; -R &quot;select[mem1tb]&quot; -q xlarge'''<br /> This directive will allocate 5 cpus on a node with 1 TB of memory. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The job is submitted to the xlarge queue (for using the extra-large memory nodes: the 1 TB or 2 TB nodes). The number of billing units (BUs) used will be charged against the 012345678 project.
 
 
 
==== Requesting Specific Node Type ====
 
 
 
The BSUB "select" option specifies the type of node on which to run a program. The table below lists the resource types that can be used with the BSUB "select" option; an example job header follows the table. This does not apply to remote visualization jobs.
 
 
 
{| class="wikitable" style="text-align: left;"
 
! Node Type Needed
 
! Job Parameter to Use
 
|-
 
| General 64GB
 
| N/A
 
|-
 
| 256GB
 
| -R &quot;select[mem256gb]&quot;
 
|-
 
| 1 TB
 
| -R &quot;select[mem1tb]&quot; -q xlarge
 
|-
 
| 2 TB
 
| -R &quot;select[mem2tb]&quot; -q xlarge
 
|-
 
| PHI
 
| -R &quot;select[phi]&quot;
 
|-
 
| Any GPU
 
| -R &quot;select[gpu]&quot;
 
|-
 
| 64GB GPU
 
| -R &quot;select[gpu64gb]&quot;
 
|-
 
| 256GB GPU
 
| -R &quot;select[gpu256gb]&quot;
 
|}
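
For example, a job header that requests 20 cores on one of the 256GB nodes might look like the following sketch (the core count and wall-clock limit are placeholders):

<pre>
#BSUB -n 20
#BSUB -R "span[ptile=20]"
#BSUB -R "select[mem256gb]"
#BSUB -W 4:00
</pre>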
 
 
 
==== Controlling Locality ====
 
 
 
On Ada's NextScale nodes, you can improve performance by ensuring that the nodes selected for your job are "close" to each other.  This helps to minimize latencies between nodes during communication.  For the best explanation of this, see the section [http://sc.tamu.edu/softwareDocs/lsf/9.1.2/lsf_admin/job_locality_compute_units.html Control job locality using compute units] in the [http://sc.tamu.edu/softwareDocs/lsf/9.1.2/lsf_admin/index.htm Administering IBM Platform LSF] manual available in [http://sc.tamu.edu/softwareDocs/lsf/9.1.2/ our local copy of the LSF documentation].
 
 
 
For Ada's NextScale nodes, we define a "Compute Unit" as all the nodes connected to a single Infiniband switch.  There are 24 nodes, each with 20 cores, in each compute unit, which means that you can run jobs of up to 480 cores with only one "hop" (switch) between any two nodes.  Jobs using more cores than that must use at least three hops between nodes on different switches (first to the source node's switch, then to the core switch, and finally to the destination switch).  Even nodes in the same rack (each rack has three Infiniband switches) have to traverse this distance.
 
 
 
If you are running multinode jobs and are either concerned about
 
* consistency (e.g. for benchmarking), or,
 
* maximum efficiency
 
you should consider making use of the settings for locality.
 
 
 
Be aware, however, that it may take longer before your job can be scheduled.  If you ask for 24 nodes all on one switch, the scheduler will delay your job until that constraint can be met.  If you ask for any 24 nodes, the scheduler may pick one node from each of 24 switches.  Although the latter may run sooner, it will be much more inefficient since every node involved must pass through the core switch to talk to any other node.
 
 
 
For details on syntax, see the link above.  In general, the following two settings may be the most useful:
 
 
 
{| class="wikitable" style="text-align: left;"
 
!Setting
 
!Result
 
|-
 
| -R "cu[pref=maxavail]"
 
| This will select nodes that are on the least-utilized switches.  This helps to group nodes together and minimize interswitch communication.  It won't be as efficient as the next setting, but it should cut down the amount of time your job has to wait before starting.
 
|-
 
| -R "cu[maxcus='''number''']"
 
| This will guarantee that your job will utilize no more than '''number''' compute units. So, if number=1, you can use up to 480 cores and be sure of the most efficient communication pattern.  With 2, you can go up to 960 cores, in which case any one node can communicate with 23 of the nodes in only one hop and with the other 24 nodes in three hops.
 
|}
 
 
 
Again, see the link above for details. 
 
 
 
Note, that you can also combine settings.  For example,
 
<pre>
-R "cu[pref=maxavail:maxcus=3]"
</pre>
 
would assign your job to the three emptiest switches.  The myriad of option combinations is too large to document here.  Just keep in mind that using compute units to minimize communication costs can have a significant impact.
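
As an illustration, a job header that keeps a 480-core job on a single compute unit (one switch) might look like the following sketch:

<pre>
#BSUB -n 480
#BSUB -R "span[ptile=20]"
#BSUB -R "cu[maxcus=1]"
#BSUB -W 2:00
</pre>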
 
 
 
== Batch Job files/Scripts ==
 
 
 
A batch request is expressed through a batch file: a text file, commonly called a '''job script''', that contains the appropriate directives and other specifications or commands. A batch file, say '''sample.job''', consists of the LSF directives section (the top part) and the (UNIX) commands section. In the latter you specify all the commands that need to be executed. All LSF directives start with the '''#BSUB''' string.
 
 
 
==== Structure of Job Files/Scripts ====
 
Here is the general layout of a common BSUB job file.
 
 
 
<pre>#BSUB directive(s)1
#BSUB directive(s)2
#BSUB ...

#UNIX commands section. From here on down "#" on col 1 starts a comment
#<-- at this point $HOME is the current working directory
cmd1
cmd2
...
</pre>
 
The UNIX command section is executed on a single node or on multiple nodes. Serial and OpenMP programs execute on only one node, while MPI programs can use 1-848 nodes. The default current working directory is $HOME. If that is not a practical choice for you, you should explicitly change (cd) to the directory of your choice. A convenient choice is often the directory you submit jobs from; LSF stores that directory's location in the '''LSB_SUBCWD''' environment variable. Also, by default, the executing UNIX shell is the bash shell. Further below we lay out an example of a complete job file.
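
For example, to run from the directory the job was submitted from, the command section can begin as follows:

<pre>
# change to the directory from which the job was submitted
cd $LSB_SUBCWD
</pre>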
 
You submit the job script for execution using the '''bsub &lt; jobfile''' command (as shown above). LSF then assigns it a job id, which you can use to track your job's progress through the system. The job's priority relative to other jobs will be determined based on several factors. This priority is used to order the jobs for consideration by the batch system for execution on the compute nodes.
 
 
 
Below is a sample job script for a serial job which requests only one node.
 
<pre>
## job name
#BSUB -J matrix_serial_job

## send stderr and stdout to the same file
#BSUB -o matrix_out.%J

## login shell to avoid copying env from login session
## also helps the module function work in batch jobs
#BSUB -L /bin/bash

## 30 minutes of walltime ([HH:]MM)
#BSUB -W 30

## numprocs
#BSUB -n 1

## load intel toolchain
module load ictce

time ./matrix.exe
</pre>
 
 
 
Here are [[Ada:Batch_Examples |more example job scripts]].
 
 
 
==== Environment Variables ====
 
 
 
LSF provides some environment variables that can be accessed from a job script. See the '''bsub''' man page for the complete list of LSF environment variables. Below is a list of common environment variables; a short usage sketch follows the table.
 
 
 
 
 
{| class="wikitable" style="text-align: left;"
 
! Variable
 
! Description
 
|-
 
| $LSB_JOBID
 
| Contains the job id.
 
|-
 
| $LSB_JOB_CWD
 
| The current working directory for job execution.
 
|-
 
| $LSB_SUBCWD
 
| The directory from which the job was submitted.
 
|-
 
| $LSB_HOSTS
 
| Gives a list of compute nodes assigned to the job, one entry per MPI task.  A useful command to capture the contents of the list is echo $LSB_HOSTS. '''This environment variable is not available when the list of hostnames is more than 4096 bytes.'''
 
|-
 
| $LSB_MCPU_HOSTS
 
| Gives a string of compute nodes assigned to the job in a compact format. For example, &quot;hostA 20 hostB 10&quot;. This should be used for large groups of nodes.
 
|-
 
| $LSB_DJOB_HOSTFILE
 
| Points to a file containing the hostnames in a format usable by MPI.
 
|}
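
For illustration, these variables can be used directly in the command section of a job script. A minimal sketch:

<pre>
# record the job id and the submission directory
echo "Job $LSB_JOBID was submitted from $LSB_SUBCWD"

# list the hosts allocated to this job (one hostname per line)
cat $LSB_DJOB_HOSTFILE
</pre>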
 
 
 
==Notes==
 
* If you get an error like "DAT: library load failure: libdaplomcm.so.2: cannot open shared object file: No such file or directory" then try adding this to your job file:
 
<pre>
export I_MPI_FABRICS='shm:ofa'
</pre>
 
 
 
==See Also==
 
*[[Ada:Batch_System_Configuration | Batch System Configuration (from Adadocs)]]
 
*[[Ada:Torque_(Eos)_vs._LSF_(Ada)_Quick_Reference | Torque (Eos) vs. LSF (Ada) Quick Reference (from Adadocs)]]
 
* [http://slurm.schedmd.com/rosetta.pdf Another quick reference] - Torque/Slurm/LSF/SGE/LoadLeveler
 
 
 
[[Category:Ada]]
 
