Ada Batch Processing: LSF
- 1 Ada Batch Processing: LSF
- 1.1 Introduction
- 1.2 Building Job files
- 1.3 Job Submission
- 1.4 tamubatch
- 1.5 tamulauncher
- 1.6 Job Tracking
- 1.7 Job File Examples
- 1.8 Queues
- 1.9 Usable Memory for Batch Jobs
- 1.10 Recommended Settings for Large Jobs
- 1.11 Advanced Documentation
Introduction
The batch system is a load distribution implementation that ensures convenient and fair use of a shared resource. Submitting jobs to a batch system allows a user to reserve specific resources with minimal interference to other users. All users are required to submit resource-intensive processing to the compute nodes through the batch system. Attempting to circumvent the batch system is not allowed.
On Ada, LSF is the batch system that provides job management. Jobs written in other batch system formats must be translated to LSF in order to be used on Ada. The Batch Translation Guide offers some assistance for translating between batch systems that TAMU HPRC has previously used.
Building Job files
While not the only method of submitting programs for execution, job files fulfill the needs of most users.
Job files consist of two main parts:
- Resource Specifications
- Executable commands
In a job file, resource specification options are preceded by a script directive. For each batch system, this directive is different. On Ada (LSF) this directive is #BSUB.
For every line of resource specifications, this directive must be the first text of the line, and all specifications must come before any executable lines. An example of a resource specification is given below:
#BSUB -J MyExample #Set the job name to "MyExample"
Note: Comments in a job file also begin with a # but LSF recognizes #BSUB as a directive.
A list of the most commonly used and important options for these job files is given in the following section of this wiki. Full job file examples are given below.
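The ordering rule above (every #BSUB line must come before any executable line) is easy to break when editing a job file. The following is a minimal sketch of a local sanity check. The function name check_bsub_order is hypothetical and is not part of LSF.

```shell
# Hypothetical helper (not an LSF tool): report any #BSUB directive that
# appears after the first executable line of a job file.
check_bsub_order() {
    awk '
        /^#BSUB/ {                          # directive at the start of a line
            if (seen_exec) {
                printf "line %d: #BSUB after executable line\n", NR
                bad = 1
            }
            next
        }
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # ordinary comment or blank line
        { seen_exec = 1 }                   # anything else counts as executable
        END { exit bad }
    ' "$1"
}

# Demo on a deliberately mis-ordered file.
demo=$(mktemp)
printf '%s\n' '#BSUB -J MyExample' 'cd $SCRATCH' '#BSUB -n 1' > "$demo"
check_bsub_order "$demo" || echo "fix the job file before submitting"
rm -f "$demo"
```

The check only looks at line order. It does not validate the options themselves.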
Basic Job Specifications
Several of the most important options are described below. These basic options are typically all that is needed to run a job on Ada.
|Specification||Option||Example||Example Purpose|
|Job Name||-J [SomeText]||-J MyJob1||Set the job name to "MyJob1"|
|Shell||-L [Shell]||-L /bin/bash||Uses the bash shell to initialize the job's execution environment.|
|Wall Clock Limit||-W [hh:mm]||-W 1:15||Set wall clock limit to 1 hour 15 min|
|Core count||-n ##||-n 20||Assigns 20 job slots/cores.|
|Cores per node||-R "span[ptile=##]"||-R "span[ptile=5]"||Request 5 cores per node.|
|Memory Per Core||-M [MB]||-M 2560||Sets the per process memory limit to 2560 MB.|
|Memory Per Core||-R "rusage[mem=[MB]]"||-R "rusage[mem=2560]"||Schedules job on nodes that have at least 2560 MB available per core.|
|Combined stdout and stderr||-o [OutputName].%J||-o stdout1.%J||Collect stdout/err in stdout1.[JobID]|
Optional Job Specifications
A variety of optional specifications are available to customize your job. The table below lists the specifications which are most useful for users of Ada.
|Specification||Option||Example||Example Purpose|
|Set Allocation||-P ######||-P 274839||Set allocation to charge to 274839|
|Email Notification I||-u [email-address]||-u email@example.com||Send emails to email@example.com.|
|Email Notification II||-[B|N]||-B -N||Send email on beginning (-B) and end (-N) of job.|
|Specify Queue||-q [queue]||-q xlarge||Request only nodes in xlarge subset.|
|Exclusive Node Usage||-x||-x||Assigns a whole node exclusively for the job.|
|Specific node type||-R "select[gpu|phi]"||-R "select[gpu]"||Requests a node with a GPU to be used for the job.|
All the nodes enlisted for the execution of a job carry most of the environment variables the login process created: HOME, SCRATCH, PWD, PATH, USER, etc. In addition, LSF defines new ones in the environment of an executing job. Below is a list of the most commonly used environment variables.
|Variable||Usage||Description|
|Job ID||$LSB_JOBID||Batch job ID assigned by LSF.|
|Job Name||$LSB_JOBNAME||The name of the Job.|
|Queue||$LSB_QUEUE||The name of the queue the job is dispatched from.|
|Error File||$LSB_ERRORFILE||Name of the error file specified with a bsub -e.|
|Submit Directory||$LSB_SUBCWD||The directory the job was submitted from.|
|Hosts I||$LSB_HOSTS||The list of nodes that are used to run the batch job, repeated according to ptile value. *The character limit of LSB_HOSTS variable is 4096.|
|Hosts II||$LSB_MCPU_HOSTS||The list of nodes and the specified or default ptile value per node to run the batch job.|
|Host file||$LSB_DJOB_HOSTFILE||The hostfile containing the list of nodes that are used to run the batch job.|
Note: To see all relevant LSF environment variables for a job, add the following line to the executable section of a job file and submit that job. All the variables will be printed in the output file.
env | grep LSB
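As an illustration of how these variables might be used inside the executable section, the sketch below summarizes the node list from $LSB_MCPU_HOSTS, which LSF sets as a flat list of "hostname slots" pairs. The function names and the fallback values (nodeA, nodeB) are placeholders chosen here so the snippet also runs outside a job. Inside a job, LSF supplies the real values.

```shell
# Placeholder defaults so the sketch also runs outside LSF (assumption:
# these names and counts are made up; LSF sets the real values in a job).
: "${LSB_JOBID:=0}"
: "${LSB_MCPU_HOSTS:=nodeA 20 nodeB 20}"

# Print one "hostname slots" pair per line.
list_hosts() {
    set -- $LSB_MCPU_HOSTS          # split the flat list into words
    while [ "$#" -ge 2 ]; do
        echo "$1 $2"
        shift 2
    done
}

# Sum the slot counts over all hosts.
total_slots() {
    list_hosts | awk '{ sum += $2 } END { print sum + 0 }'
}

echo "Job $LSB_JOBID: $(list_hosts | wc -l | tr -d ' ') node(s), $(total_slots) slot(s)"
```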
Clarification on Memory, Core, and Node Specifications
Memory Specifications are IMPORTANT.
For examples on calculating memory, core, and/or node specifications on Ada: Specification Clarification.
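As a rough sketch of the arithmetic involved, the node count follows from the core count (-n) and the cores per node (ptile), and the memory reserved on each node follows from ptile and the per-core memory request. The function name job_footprint is hypothetical. The sample values are those of Example Job 3 (40 cores, 20 per node, 2560 MB per core).

```shell
# Hypothetical helper: derive node count and memory totals from the
# values given to -n, span[ptile=...], and rusage[mem=...].
job_footprint() {
    local cores=$1 ptile=$2 mem_per_core_mb=$3
    local nodes=$(( (cores + ptile - 1) / ptile ))           # round up
    local cores_on_node=$(( cores < ptile ? cores : ptile )) # at most ptile
    local mem_per_node_mb=$(( cores_on_node * mem_per_core_mb ))
    local total_mem_mb=$(( cores * mem_per_core_mb ))
    echo "nodes=$nodes mem_per_node_mb=$mem_per_node_mb total_mem_mb=$total_mem_mb"
}

job_footprint 40 20 2560
```

With these inputs the job spans 2 nodes and reserves 51200 MB on each.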
After the resource specification section of a job file comes the executable section. This executable section contains all the necessary UNIX, Linux, and program commands that will be run in the job.
Some commands that may go in this section include, but are not limited to:
- Changing directories
- Loading, unloading, and listing modules
- Launching software
An example of a possible executable section is below:
cd $SCRATCH        # Change current directory to /scratch/user/[netID]/
ml purge           # Purge all modules
ml intel/2016b     # Load the intel/2016b module
ml                 # List all currently loaded modules
./myProgram.o      # Run "myProgram.o"
Job Submission
Once you have your job file ready, it is time to submit your job. You can submit your job to LSF with the following command:
[NetID@ada ~]$ bsub < MyJob.LSF
Verifying job submission parameters...
Verifying project account...
     Account to charge:   123456789123
         Balance (SUs):      5000.0000
         SUs to charge:         5.0000
Job <12345> is submitted to default queue <sn_regular>.
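When a script needs the job ID for later use, it can be extracted from the confirmation line. The sketch below assumes the message format shown in the sample output above. The function name parse_job_id is hypothetical.

```shell
# Hypothetical helper: extract the numeric job ID from bsub output read
# on standard input. Lines that are not the confirmation line are ignored.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted to .*/\1/p'
}

# Sample confirmation line taken from the output above. In practice the
# input would be the output of: bsub < MyJob.LSF
echo 'Job <12345> is submitted to default queue <sn_regular>.' | parse_job_id
```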
tamubatch
tamubatch is an automatic batch job script that submits jobs for the user without the need to write a batch script on the Ada and Terra clusters. The user just needs to provide the executable commands in a text file and tamubatch will automatically submit the job to the cluster. The user may specify flags that allow control over the parameters of the submitted job.
tamubatch is still in beta and has not been fully developed. Although there are still bugs and testing issues that are currently being worked on, tamubatch can already submit jobs to both the Ada and Terra clusters if given a file of executable commands.
For more information, visit this page.
tamulauncher
tamulauncher provides a convenient way to run a large number of serial or multithreaded commands without the need to submit individual jobs or a Job array. The user provides a text file containing all commands that need to be executed, and tamulauncher will execute the commands concurrently. The number of concurrently executed commands depends on the batch requirements. When tamulauncher is run interactively, the number of concurrently executed commands is limited to at most 8. tamulauncher is available on Terra, Ada, and Curie. There is no need to load any module before using tamulauncher. tamulauncher has been successfully tested to execute over 100K commands.
tamulauncher is preferred over Job Arrays to submit a large number of individual jobs, especially when the run times of the commands are relatively short. It allows for better utilization of the nodes, puts less burden on the batch scheduler, and lessens interference with jobs of other users on the same node.
For more information, visit this page.
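A commands file for tamulauncher is plain text with one command per line, so it can be generated with a short loop. The sketch below is only an illustration: the function name make_commands_file, the program name, and the input and output file names are placeholders, and the tamulauncher invocation in the final comment is an assumed basic form that should be checked against the tamulauncher page.

```shell
# Hypothetical helper: write COUNT commands, one per line, to file OUT.
make_commands_file() {
    local out=$1 count=$2 i=1
    : > "$out"                                   # start with an empty file
    while [ "$i" -le "$count" ]; do
        echo "./myProgram.o input_${i}.dat > output_${i}.log" >> "$out"
        i=$(( i + 1 ))
    done
}

# A temporary file is used here; a fixed name such as commands.txt
# would work equally well.
cmds=$(mktemp)
make_commands_file "$cmds" 100
wc -l < "$cmds"                                  # number of commands written
rm -f "$cmds"

# Inside a job file the commands file would then be handed to tamulauncher
# (assumed invocation):
# tamulauncher commands.txt
```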
Job Tracking
In addition to the LSF Job Monitoring and Control commands in the previous section, there are several more advanced techniques to monitor jobs and strategize to reduce your queue times.
For details, see the Ada Batch Job Tracking page.
Job File Examples
Several examples of LSF job files for Ada are listed below. For translating other job files, the Batch Job Translation Guide provides some reference.
NOTE: Job examples are NOT lists of commands, but are a template of the contents of a job file. These examples should be pasted into a text editor and submitted as a job to be tested, not entered as commands line by line.
Example Job 1: A serial job
##NECESSARY JOB SPECIFICATIONS
#BSUB -J ExampleJob1         #Set the job name to "ExampleJob1"
#BSUB -L /bin/bash           #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 2:00                #Set the wall clock limit to 2hr
#BSUB -n 1                   #Request 1 core
#BSUB -R "span[ptile=1]"     #Request 1 core per node.
#BSUB -R "rusage[mem=5000]"  #Request 5000MB per process (CPU) for the job
#BSUB -M 5000                #Set the per process enforceable memory limit to 5000MB.
#BSUB -o Example1Out.%J      #Send stdout and stderr to "Example1Out.[jobID]"

##OPTIONAL JOB SPECIFICATIONS
#BSUB -P 123456              #Set billing account to 123456
#BSUB -u email_address       #Send all emails to email_address
#BSUB -B -N                  #Send email on job begin (-B) and end (-N)

#First Executable Line
Example Job 2: A multi core, single node job
##NECESSARY JOB SPECIFICATIONS
#BSUB -J ExampleJob2         #Set the job name to "ExampleJob2"
#BSUB -L /bin/bash           #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 6:30                #Set the wall clock limit to 6hr and 30min
#BSUB -n 10                  #Request 10 cores
#BSUB -R "span[ptile=10]"    #Request 10 cores per node.
#BSUB -R "rusage[mem=2560]"  #Request 2560MB per process (CPU) for the job
#BSUB -M 2560                #Set the per process enforceable memory limit to 2560MB.
#BSUB -o Example2Out.%J      #Send stdout and stderr to "Example2Out.[jobID]"

##OPTIONAL JOB SPECIFICATIONS
#BSUB -P 123456              #Set billing account to 123456
#BSUB -u email_address       #Send all emails to email_address
#BSUB -B -N                  #Send email on job begin (-B) and end (-N)

#First Executable Line
Example Job 3: A multi core, multi node job
##NECESSARY JOB SPECIFICATIONS
#BSUB -J ExampleJob3         #Set the job name to "ExampleJob3"
#BSUB -L /bin/bash           #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 24:00               #Set the wall clock limit to 24hr
#BSUB -n 40                  #Request 40 cores
#BSUB -R "span[ptile=20]"    #Request 20 cores per node.
#BSUB -R "rusage[mem=2560]"  #Request 2560MB per process (CPU) for the job
#BSUB -M 2560                #Set the per process enforceable memory limit to 2560MB.
#BSUB -o Example3Out.%J      #Send stdout and stderr to "Example3Out.[jobID]"

##OPTIONAL JOB SPECIFICATIONS
#BSUB -P 123456              #Set billing account to 123456
#BSUB -u email_address       #Send all emails to email_address
#BSUB -B -N                  #Send email on job begin (-B) and end (-N)

#First Executable Line
Example Job 4: A serial GPU job
##NECESSARY JOB SPECIFICATIONS
#BSUB -J ExampleJob4         #Set the job name to "ExampleJob4"
#BSUB -L /bin/bash           #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 2:00                #Set the wall clock limit to 2hr
#BSUB -n 1                   #Request 1 core
#BSUB -R "span[ptile=1]"     #Request 1 core per node.
#BSUB -R "rusage[mem=2560]"  #Request 2560MB per process (CPU) for the job
#BSUB -M 2560                #Set the per process enforceable memory limit to 2560MB.
#BSUB -o Example4Out.%J      #Send stdout and stderr to "Example4Out.[jobID]"
#BSUB -R "select[gpu]"       #Request a node with a GPU

##OPTIONAL JOB SPECIFICATIONS
#BSUB -P 123456              #Set billing account to 123456
#BSUB -u email_address       #Send all emails to email_address
#BSUB -B -N                  #Send email on job begin (-B) and end (-N)

#First Executable Line
It is possible to request a whole GPU node by adding the following lines to optional job specifications:
#BSUB -x
#BSUB -R "select[gpu]"
Please note that exclusive node use will also require more SUs.
Example Job 5: A serial job with queue specification
##NECESSARY JOB SPECIFICATIONS
#BSUB -J ExampleJob5         #Set the job name to "ExampleJob5"
#BSUB -L /bin/bash           #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 2:00                #Set the wall clock limit to 2hr
#BSUB -n 1                   #Request 1 core
#BSUB -R "span[ptile=1]"     #Request 1 core per node.
#BSUB -R "rusage[mem=5000]"  #Request 5000MB per process (CPU) for the job
#BSUB -M 5000                #Set the per process enforceable memory limit to 5000MB.
#BSUB -o Example5Out.%J      #Send stdout and stderr to "Example5Out.[jobID]"

##OPTIONAL JOB SPECIFICATIONS
#BSUB -P 123456              #Set billing account to 123456
#BSUB -u email_address       #Send all emails to email_address
#BSUB -B -N                  #Send email on job begin (-B) and end (-N)
#BSUB -q sn_short            #Send job to the sn_short queue, which has one hour max walltime and is limited to single node jobs

#First Executable Line
Queues
LSF, upon job submission, sends your jobs to appropriate batch queues. These are (software) service stations configured to control the scheduling and dispatch of jobs that have arrived in them. Batch queues are characterized by all sorts of parameters. Some of the most important are:
- the total number of jobs that can be concurrently running (number of run slots)
- the wall-clock time limit per job
- the type and number of nodes it can dispatch jobs to
- which users or user groups can use that queue; etc.
These settings control whether a job will remain idle in the queue or be dispatched quickly for execution.
The current queue structure (updated on September 27, 2019).
NOTE: Each user is now limited to 8000 cores total for his/her pending jobs across all the queues.
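|Queue||Job Min/Default/Max Cores||Job Default/Max Walltime||Compute Node Types||Per-Queue Limits||Aggregate Limits Across Queues||Per-User Limits Across Queues||Notes|
|sn_short||1 / 1 / 20||10 min / 1 hr||64 GB nodes (811), 256 GB nodes (26)||-||Maximum of 7000 cores for all running jobs in the single-node (sn_*) queues.||Maximum of 1000 cores and 100 jobs per user for all running jobs in the single-node (sn_*) queues.||For jobs needing only one compute node.|
|sn_regular||1 / 1 / 20||1 hr / 1 day||64 GB nodes (811), 256 GB nodes (26)||-||Maximum of 7000 cores for all running jobs in the single-node (sn_*) queues.||Maximum of 1000 cores and 100 jobs per user for all running jobs in the single-node (sn_*) queues.||For jobs needing only one compute node.|
|sn_long||1 / 1 / 20||24 hr / 4 days||64 GB nodes (811), 256 GB nodes (26)||-||Maximum of 7000 cores for all running jobs in the single-node (sn_*) queues.||Maximum of 1000 cores and 100 jobs per user for all running jobs in the single-node (sn_*) queues.||For jobs needing only one compute node.|
|sn_xlong||1 / 1 / 20||4 days / 30 days||64 GB nodes (811), 256 GB nodes (26)||-||Maximum of 7000 cores for all running jobs in the single-node (sn_*) queues.||Maximum of 1000 cores and 100 jobs per user for all running jobs in the single-node (sn_*) queues.||For jobs needing only one compute node.|
|mn_short||2 / 2 / 200||10 min / 1 hr||64 GB nodes (811), 256 GB nodes (26)||Maximum of 2000 cores for all running jobs in this queue.||Maximum of 12000 cores for all running jobs in the multi-node (mn_*) queues.||Maximum of 3000 cores and 150 jobs per user for all running jobs in the multi-node (mn_*) queues.||For jobs needing more than one compute node.|
|mn_small||2 / 2 / 120||1 hr / 10 days||64 GB nodes (811), 256 GB nodes (26)||Maximum of 7000 cores for all running jobs in this queue.||Maximum of 12000 cores for all running jobs in the multi-node (mn_*) queues.||Maximum of 3000 cores and 150 jobs per user for all running jobs in the multi-node (mn_*) queues.||For jobs needing more than one compute node.|
|mn_medium||121 / 121 / 600||1 hr / 7 days||64 GB nodes (811), 256 GB nodes (26)||Maximum of 6000 cores for all running jobs in this queue.||Maximum of 12000 cores for all running jobs in the multi-node (mn_*) queues.||Maximum of 3000 cores and 150 jobs per user for all running jobs in the multi-node (mn_*) queues.||For jobs needing more than one compute node.|
|mn_large||601 / 601 / 2000||1 hr / 5 days||64 GB nodes (811), 256 GB nodes (26)||Maximum of 8000 cores for all running jobs in this queue.||Maximum of 12000 cores for all running jobs in the multi-node (mn_*) queues.||Maximum of 3000 cores and 150 jobs per user for all running jobs in the multi-node (mn_*) queues.||For jobs needing more than one compute node.|
|xlarge||1 / 1 / 280||1 hr / 10 days||1 TB nodes (11), 2 TB nodes (4)||-||-||-||For jobs needing more than 256GB of memory per compute node.|
|vnc||1 / 1 / 20||1 hr / 6 hr||GPU nodes (30)||-||-||-||For remote visualization jobs.|
|special||None||1 hr / 7 days||64 GB nodes (811), 256 GB nodes (26)||-||-||-||Requires permission to access this queue.|
|v100 (*)||1 / 1 / 72||1 hr / 2 days||192 GB nodes, dual 32GB V100 GPUs (2)||-||-||-||-|
(*) V100 nodes were moved to Terra in preparation for the decommissioning of Ada.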
LSF determines which queue will receive a job for processing. The selection is determined mainly by the resources (e.g., number of CPUs, wall-clock limit) specified, explicitly or by default. There are two exceptions:
- The xlarge queue, which is associated with nodes that have 1TB or 2TB of main memory. To use it, submit jobs with the -q xlarge option along with -R "select[mem1tb]" or -R "select[mem2tb]".
- The special queue, which gives access to all of the compute nodes. You MUST request permission to get access to this queue.
To access any of the above queues, you must use the -q queue_name option in your job script.
Output from the bjobs command contains the name of the queue associated with a given job.
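To get a feel for where a single-node job is likely to land, the maximum walltimes of the sn_* queues in the table above can be written as a simple lookup. This is only a planning sketch with a hypothetical function name. It mirrors the table's walltime limits and is not a description of how LSF itself selects a queue.

```shell
# Hypothetical helper: map a walltime in minutes to the first sn_* queue
# whose maximum walltime (from the queue table) accommodates it.
sn_queue_for_minutes() {
    local minutes=$1
    if   [ "$minutes" -le 60 ];    then echo sn_short     # up to 1 hr
    elif [ "$minutes" -le 1440 ];  then echo sn_regular   # up to 1 day
    elif [ "$minutes" -le 5760 ];  then echo sn_long      # up to 4 days
    elif [ "$minutes" -le 43200 ]; then echo sn_xlong     # up to 30 days
    else
        echo "none"                                       # exceeds every sn_* limit
        return 1
    fi
}

sn_queue_for_minutes 120    # corresponds to -W 2:00
```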
Checkpointing is the practice of creating a save state of a job so that, if interrupted, it can begin again without starting completely over. This technique is especially important for long jobs on the batch systems, because each batch queue has a maximum walltime limit.
A checkpointed job file is particularly useful for the gpu queue, which is limited to 4 days walltime due to its demand. Many jobs that require GPUs, such as training a machine learning algorithm, must run longer than that limit.
Users can change their code to implement save states so that it can restart automatically when cut off by the walltime limit. There are many different ways to checkpoint a job depending on the software used, but it is almost always done at the application level. How frequently save states are made is up to the user and depends on the fault tolerance the job needs. In the batch system, however, the exact time of the 'fault' is known: it is the walltime limit of the queue. In this case, only one checkpoint needs to be created, right before the limit is reached. Many different resources are available for checkpointing techniques. Some examples for common software are listed below.
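As a generic, application-independent illustration of the idea, the sketch below keeps the number of the last completed step in a checkpoint file and resumes from it on the next run. The function name and the step budget (standing in for the walltime limit) are assumptions made for the example. Real applications usually save their own state rather than a step counter.

```shell
# Hypothetical sketch of checkpoint and restart around a loop of work steps.
# Usage: run_with_checkpoint TOTAL_STEPS CHECKPOINT_FILE [MAX_STEPS_THIS_RUN]
run_with_checkpoint() {
    local total=$1 ckpt=$2 budget=${3:-$1} step=0 done_now=0
    # Resume from the last completed step if a non-empty checkpoint exists.
    if [ -s "$ckpt" ]; then
        step=$(cat "$ckpt")
    fi
    while [ "$step" -lt "$total" ] && [ "$done_now" -lt "$budget" ]; do
        step=$(( step + 1 ))
        # ... one unit of real work would run here ...
        echo "$step" > "$ckpt"            # record progress after each step
        done_now=$(( done_now + 1 ))
    done
    echo "completed $step of $total steps"
}

ckpt=$(mktemp)
run_with_checkpoint 5 "$ckpt" 3    # first run stops after 3 steps
run_with_checkpoint 5 "$ckpt"      # second run resumes at step 4
rm -f "$ckpt"
```

Writing the checkpoint after every step is the simplest choice. When the walltime limit is known, a single save shortly before the limit serves the same purpose with less overhead.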
Usable Memory for Batch Jobs
While nodes on Ada have 64GB, 192GB or 256GB of RAM, some of this memory is used to maintain the software and operating system of the node. In most cases, LSF will not schedule jobs if it cannot find a node to satisfy an excessive memory request.
The table below contains information regarding the approximate limits of Ada memory hardware and our suggestions on its use.
| ||64GB Nodes||256GB Nodes||192GB Nodes in v100 queue (*)|
|Number of Cores||20 Cores (2 sockets x 10 cores)||20 Cores (2 sockets x 10 cores)||24 Cores (2 sockets x 12 cores)|
|Memory Limit Per Core (if using all cores per node)|| || ||7600 MB|
|Memory Limit Per Node||56000 MB|| ||184000 MB|
(*) V100 nodes were moved to Terra in preparation for the decommissioning of Ada.
LSF may queue your job for an excessive time (or indefinitely) if waiting for some particular nodes with sufficient memory to become free.
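A request can be checked against the usable memory before submission. The sketch below compares cores per node times memory per core with the 56000 MB per-node figure listed in the table above, which is taken here to be the limit for the 64GB nodes. The function name is hypothetical.

```shell
# Hypothetical helper: succeed when ptile * mem-per-core fits within the
# 56000 MB per-node limit from the table (assumed to apply to 64GB nodes).
fits_64gb_node() {
    local ptile=$1 mem_per_core_mb=$2 limit_mb=56000
    [ $(( ptile * mem_per_core_mb )) -le "$limit_mb" ]
}

fits_64gb_node 20 2560 && echo "20 cores x 2560 MB fits on a 64GB node"
fits_64gb_node 20 5000 || echo "20 cores x 5000 MB does not fit on a 64GB node"
```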
Recommended Settings for Large Jobs
For jobs larger than 1000 cores (50+ nodes), the following MPI settings are recommended to reduce the MPI startup time:
export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_BRANCH_COUNT=XXX
export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=1
The XXX number should match the number of nodes (#Nodes = #Cores / #CoresPerNode) that your job will request.
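Rather than typing the node count by hand, it can be computed from the same two numbers used in the resource specifications. The following is a minimal sketch. The values 1200 and 20 are example inputs, not recommendations.

```shell
# Example inputs (assumptions): keep these in sync with the job's
# -n and span[ptile=...] values.
cores=1200
cores_per_node=20

# #Nodes = #Cores / #CoresPerNode, rounded up.
nodes=$(( (cores + cores_per_node - 1) / cores_per_node ))

export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_BRANCH_COUNT=$nodes
export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=1

echo "I_MPI_HYDRA_BRANCH_COUNT=$I_MPI_HYDRA_BRANCH_COUNT"
```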
Advanced Documentation
This guide only covers the most commonly used options and useful commands.
For more information, check the man pages for individual commands or the LSF Manual.