
Ada:Batch Job Submission

== Job Submission: the bsub command ==

The batch facility on Ada is LSF (Load Sharing Facility) from IBM. To submit a job via LSF, a user submits a job file that specifies the submission options, the commands to execute, and, if needed, the batch queue to submit to.

Use the '''bsub''' command to submit a job script as shown below:

<pre>
bsub < jobfile                  # Submits specified job for processing by LSF
</pre>

Here is an illustration:

<pre>
[userx@login4]$ bsub < sample1.job
Verifying job submission parameters...
Job <224139> is submitted to default queue <devel>.
[userx@login4]$
</pre>

The first thing LSF does upon submission is to tag your job with a numeric identifier, a job id. Above, that identifier is '''224139'''. You will need it in order to track or manage (kill or modify) your jobs; the various batch system commands accept just the numerical portion of the id. Next, note that the default current working directory for the job is the directory you submitted the job from. If that is not what you need, you must explicitly change (cd) to the desired directory inside the job script. On job completion, LSF will place in the submission directory the file stdout1.224139, which contains a log of job events and other data directed to standard output. Always inspect this file for useful information.
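You can use the job id right away with the job tracking commands described later on this page; for example (illustrative; the id will differ for your job):

<pre>
[userx@login4]$ bjobs 224139
</pre>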
'''A reminder:''' for the purpose of computation in batch mode, the Ada cluster has 837 nodes powered by the Ivy Bridge-EP processor and 15 powered by the Westmere-EX. The Ivy Bridge-EP-based nodes have 20 cpus/cores each, while the Westmere-EX nodes have 40. Compute nodes, whether Ivy Bridge-EP or Westmere-EX, come in different memory capacities. Note that a small portion of the memory on each compute node is used by the operating system and is not available to user processes. This is useful to bear in mind when constructing batch requests.

'''Three important job parameters:'''
<pre>
#BSUB -n NNN                    # NNN: total number of cpus to allocate for the job
#BSUB -R "span[ptile=NN]"       # NN:  number of cores/cpus per node to use
#BSUB -R "select[node-type]"    # node-type: nxt, mem256gb, gpu, phi, mem1t, mem2t ...
</pre>

We list these together because in many jobs they are closely related and, therefore, must be set consistently. We recommend their use in all jobs: serial, single-node, and multi-node. The following examples, with some commentary, illustrate their use.
  
<pre>
#BSUB -n 900                    # 900: total number of cpus to allocate for the job
#BSUB -R "span[ptile=20]"       # 20:  number of cores/cpus per node to use
#BSUB -R "select[nxt]"          # Allocates NeXtScale nodes
</pre>

The above specifications will allocate 45 (= 900/20) whole nodes. In many parallel jobs the selection of NeXtScale nodes at 20 cores per node is the best choice.

<pre>
#BSUB -n 900                    # 900: total number of cpus to allocate for the job
#BSUB -R "span[ptile=16]"       # 16:  number of cores/cpus per node to use
#BSUB -R "select[nxt]" -x       # Allocates whole NeXtScale nodes exclusively
</pre>

The above specifications will allocate 57 (= ceiling(900/16)) nodes. The exclusive ('''-x''') node allocation requested here may be important for multi-node parallel jobs that need it: it prevents the scheduling of other jobs, possibly using 4 cores or fewer, on those nodes. Without -x, one or more of the 57 nodes may end up hosting more than one job, which can drastically reduce the performance of the 900-core job. "Wasting" 4 cores per node can be justified, depending on specific program behavior such as memory use or communication traffic; in any case, the decision to use 16 cores per node or fewer should be made only after careful experimentation. Note that applying the -x option will cost you, in terms of SUs, the same as using all 20 cores per node, not 16 (for example, a 10-hour run on these 57 exclusive nodes is assessed 57 * 20 * 10 SUs, not 57 * 16 * 10), so use it sensibly.

== Submission Options ==

Resources needed for program execution are specified by submission options in the job file.

==== Common Submission Options ====

Below are the common submission options. These options can be specified as #BSUB directives in your job script (recommended); they can also be specified on the command line of the bsub command. The '''bsub''' man page describes the available submission options in more detail.
  
{| class="wikitable" style="text-align: left;"
 
!Option
 
!Description
 
|-
 
| -J jobname
 
| Assigns a name to the job; the name is displayed by job tracking commands such as bjobs.
 
|-
 
| -L login_shell
 
| Shell for interpreting the job script. Recommended shell is /bin/bash.
 
|-
 
| -n X
 
| Number of cores (X) to be assigned to the job
 
|-
 
| -o output_file_name
 
| Directs the job's standard output (and, in the absence of -e, standard error) to the specified file. The string %J in the file name is replaced by the job id.
 
|-
 
| -P acctno
 
| Specifies the billing account to use for this job. Please consult the [[TAMUSC:AMS | AMS documentation]] for more information.
 
|-
 
| -q queue_name
 
| Directs the submitted job to the queue_name queue. On Ada this option should be used only for the xlarge queue. Here are more details on [[Ada:Batch_Queues | queues on Ada]].
 
|-
 
| -R "select[r]"
 
| Selects nodes with resource r.
 
|-
 
| -R "span[ptile=n]"
 
| Requests that n cores be allocated on each node assigned to the job.
 
|-
 
| -u email_address(es)
 
| The email addresses to send mail about the job. Using an external email address (eg. @tamu.edu, @gmail.com, etc.) is recommended.
 
|-
 
| -x
 
| Specifies exclusive node use: LSF will not schedule any job other than the present one on the engaged nodes. This is useful when, for performance reasons for example, sharing nodes with other jobs is undesirable. Usage per node will be assessed as 20 (or 40) * wall_clock_time.
 
|-
 
| -W HH:MM
 
| The wall-clock time limit on how long the job can run ([hours:]minutes).
 
|}
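Put together, the top of a job script might contain a set of directives like the following (a sketch only; the job name, output file name, and limits are placeholders):

<pre>
#BSUB -J myjob                  # job name (placeholder)
#BSUB -L /bin/bash              # use the bash login shell to initialize the job's environment
#BSUB -W 2:00                   # wall-clock limit of 2 hours
#BSUB -n 20                     # total number of cores
#BSUB -R "span[ptile=20]"       # all 20 cores on one node
#BSUB -o myjob.%J               # standard output file; %J expands to the job id
</pre>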
 
  
==== Examples of Submission Options ====

To stress the importance of specifying resources correctly in BSUB directives, and because this specification is a frequent source of error, we first present a number of examples.

<pre>
#BSUB -n 1                    # Allocate a total of 1 cpu/core for the job, appropriate for serial processing.
#BSUB -R "span[ptile=1]"      # Allocate 1 cpu per node.
#BSUB -R "select[gpu]"        # Make the allocated node have gpus, of 64GB or 256GB memory. A "select[phi]"
                              # specification would allocate a node with phi coprocessors.
</pre>

Omitting the last two options in the above will cause LSF to place the job on any conveniently available core on any node, idle or busy, of any type, except on those with 1TB or 2TB memory.

'''#BSUB -n 40 -W 2:30'''<br /> or, equivalently:<br /> #BSUB -n 40<br /> #BSUB -W 2:30<br /> This directive will allocate 40 cpus. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes.

'''#BSUB -n 40 -M 20 -W 2:30'''<br /> This directive will allocate 40 cpus and 20 '''MB''' of memory per process (cpu). The duration of execution is specified to be a maximum of 2 hours and 30 minutes.

'''#BSUB -n 40 -M 20 -W 2:30 -P 012345678'''<br /> This directive will allocate 40 cpus and 20 '''MB''' of memory per process. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The billing units (BUs) used will be charged against the 012345678 project.

'''#BSUB -n 2 -W 2:30 -P 012345678 -R "select[gpu]"'''<br /> This directive will allocate 2 cores on nodes with GPUs. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The billing units (BUs) used will be charged against the 012345678 project.

'''#BSUB -n 2 -W 2:30 -P 012345678 -R "select[phi]"'''<br /> This directive will allocate 2 cores on nodes with PHIs. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The billing units (BUs) used will be charged against the 012345678 project.

'''#BSUB -n 40 -W 2:30 -P 012345678 -R "span[ptile=20]"'''<br /> This directive will allocate 2 nodes with 20 cores each. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The billing units (BUs) used will be charged against the 012345678 project.

'''#BSUB -n 50 -W 2:30 -P 012345678 -R "span[ptile=20]"'''<br /> This directive will allocate 3 nodes: two with 20 cores each and a third with 10 cores. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The billing units (BUs) used will be charged against the 012345678 project.

'''#BSUB -n 5 -W 2:30 -P 012345678 -R "span[ptile=5]" -R "select[mem1t]" -q xlarge'''<br /> This directive will allocate 5 cpus on a node with 1TB of memory. The duration (wall-clock time) of execution is specified to be a maximum of 2 hours and 30 minutes. The job is submitted to the xlarge queue (used for the extra-large memory nodes: 1 TB or 2 TB). The billing units (BUs) used will be charged against the 012345678 project.

==== More Examples ====

'''Example 2''': a single-node OpenMP job. (A minimal sketch; the executable name is illustrative.)

<pre>
#BSUB -J OpenMP1 -o OpenMP1.%J -L /bin/bash -W 2:00 -n 20 -R 'span[ptile=20]'

export OMP_NUM_THREADS=20      # one thread per allocated core
./hw.omp.exe
</pre>

'''Example 3''': a multi-node MPI job.

<pre>
## A multi-node MPI job
#BSUB -J mpitest -o mpitest.%J -L /bin/bash -W 30 -n 200 -R 'span[ptile=20]'

module load ictce        # load intel toolchain

## ONLY SET THESE VARIABLES FOR RUNNING INTEL MPI JOBS (WITH MORE THAN 100 CORES)
# tells Intel MPI to launch MPI processes using LSF's blaunch
export I_MPI_HYDRA_BOOTSTRAP=lsf
# tell Intel MPI to launch only one blaunch instance (for scalability and stability)
export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=1
# set this variable to the number of hosts, i.e., the -n value divided by the ptile value
export I_MPI_HYDRA_BRANCH_COUNT=10    # 200/20 = 10 hosts for this job

# launch MPI program using the hydra launcher
mpiexec.hydra ./hw.mpi.C.exe
....
....
</pre>

==== Requesting Specific Node Type ====

The BSUB "select" option specifies the resource type of the nodes on which to run a program. The table below lists the resource types for the BSUB "select" option. This does not apply to remote visualization jobs.

{| class="wikitable" style="text-align: left;"
! Node Type Needed
 
! Job Parameter to Use
 
|-
 
| General 64GB
 
| N/A
 
|-
 
| 256GB
 
| -R "select[mem256gb]"
 
|-
 
| 1 TB
 
| -R "select[mem1t]" -q xlarge
 
|-
 
| 2 TB
 
| -R "select[mem2t]" -q xlarge
 
|-
 
| PHI
 
| -R "select[phi]"
 
|-
 
| Any GPU
 
| -R "select[gpu]"
 
|-
 
| 64GB GPU
 
| -R "select[gpu64gb]"
 
|-
 
| 256GB GPU
 
| -R "select[gpu256gb]"
 
|}
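For instance (an illustrative fragment; the core count and time limit are arbitrary), a job that must run on a 256GB node could request:

<pre>
#BSUB -n 20 -R "span[ptile=20]" -R "select[mem256gb]" -W 4:00
</pre>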
 
 
 
==== Controlling Locality ====
 
 
 
On Ada's NextScale nodes, you can improve performance by ensuring that the nodes selected for your job are "close" to each other. This helps to minimize latencies between nodes during communication. For the best explanation of this, see the section [http://sc.tamu.edu/softwareDocs/lsf/9.1.2/lsf_admin/job_locality_compute_units.html Control job locality using compute units] in the [http://sc.tamu.edu/softwareDocs/lsf/9.1.2/lsf_admin/index.htm Administering IBM Platform LSF] manual available in [http://sc.tamu.edu/softwareDocs/lsf/9.1.2/ our local copy of the LSF documentation]
 
  
For Ada's NextScale nodes, we define a "Compute Unit" as all the nodes connected to a single Infiniband switch. There are 24 nodes, each with 20 cores, in each compute unit, which means that you can run jobs of up to 480 cores with only one "hop" (switch) between any two nodes. Jobs using more than that number must use at least three hops between nodes on different switches (first to the source node's switch, then to the core switch, and finally to the destination switch). Even nodes in the same rack (each rack has three Infiniband switches) have to travel this distance.
If you are running multinode jobs and are concerned about either
* consistency (e.g., for benchmarking), or
* maximum efficiency,
you should consider making use of the settings for locality.

Be aware, however, that it may take longer before your job can be scheduled. If you ask for 24 nodes all on one switch, the scheduler will delay your job until that constraint can be met. If you ask for any 24 nodes, the scheduler may pick one node from each of 24 switches. Although the latter may run sooner, it will be much less efficient, since every node involved must pass through the core switch to talk to any other node.
  
For details on syntax, see the link above.  In general, the following two settings may be the most useful:
 
 
{| class="wikitable" style="text-align: left;"
 
!Setting
 
!Result
 
|-
 
| -R "cu[pref=maxavail]"
 
| This will select nodes that are on the least-utilized switches. It helps to group nodes together and so minimize inter-switch communication. It won't be as efficient as the next setting, but it should cut down on the amount of time your job has to wait before starting.
 
|-
 
| -R "cu[maxcus='''number''']"
 
| This will guarantee that your job uses no more than '''number''' compute units. So, if number=1, you can use up to 480 cores and be sure of the most efficient communication pattern. With 2, you can go up to 960 cores, in which case any one node can communicate with 23 nodes in only one hop and with the other 24 nodes in three hops.
 
|}
 
 
Again, see the link above for details. 
 
 
Note that you can also combine settings. For example,
 
 
<pre>
-R "cu[pref=maxavail:maxcus=3]"
</pre>

would assign your job to the three emptiest switches. The myriad of possible option combinations is too large to document here; just keep in mind that using compute units to minimize communication costs can have a significant impact.
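For example (a sketch; the job size is chosen to match one compute unit), a 480-core job that should stay on a single switch could be requested as:

<pre>
#BSUB -n 480
#BSUB -R "span[ptile=20]"
#BSUB -R "cu[maxcus=1]"        # keep all 24 nodes within one compute unit (one switch)
</pre>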
 
  
== Batch Job files/Scripts ==

A batch request is expressed through a batch file: a text file, a so-called '''job script''', with appropriate directives and other specifications or commands. A job file, say '''sample.job''', consists of the LSF directives section (top part) and the (UNIX) commands section, in which you specify all the commands that need to be executed. All LSF directives start with the '''#BSUB''' string.

==== Structure of Job Files/Scripts ====

Here is the general layout of a common BSUB job file.

<pre>
#BSUB directive(s)1
#BSUB directive(s)2
#BSUB ...

#UNIX commands section. From here on down "#" on col 1 starts a comment
#<-- at this point the current working directory is the directory you submitted the job from
cmd1
cmd2
...
</pre>
The UNIX command section is executed on a single node or on multiple nodes. Serial and OpenMP programs execute on only one node; MPI programs can use 1-848 nodes. The default current working directory is the directory you submit the job from; BSUB stores that directory's location in the '''LSB_SUBCWD''' environment variable. If that is not a practical choice for you, you should explicitly change (cd) to the directory of your choice. Also, by default, the executing UNIX shell is the bash shell.

You submit the job script for execution using the '''bsub < jobfile''' command (see above). LSF then assigns it a job id, which you can use to track your job's progress through the system. The job's priority relative to other jobs is determined by several factors; this priority is used to order the jobs for consideration by the batch system for execution on the compute nodes.

Below is a sample job script for a serial job which requests only one node.
<pre>
## job name
#BSUB -J matrix_serial_job

## send stderr and stdout to the same file
#BSUB -o matrix_out.%J

## login shell to avoid copying env from login session
## also helps the module function work in batch jobs
#BSUB -L /bin/bash

## 30 minutes of walltime ([HH:]MM)
#BSUB -W 30

## numprocs
#BSUB -n 1

## load intel toolchain
module load ictce

time ./matrix.exe
</pre>
 
Here are [[Ada:Batch_Examples |more example job scripts]].
 
 
==== Environment Variables ====

When LSF selects and activates a node for the running of your job, it executes a login to that node. The environment of that login process is mostly a duplicate of the process you launched (bsub) your job from. In general, it is recommended that you specify the creation of a new shell without any added features that the launching process may have acquired, say, by loading one or more application modules; these may conflict with, or be irrelevant to, the modules you need to load within the job for its execution. Hence the recommendation to specify the '''#BSUB -L /bin/bash''' option in a job file.

All the nodes enlisted for the execution of a job carry most of the environment variables of the login process: HOME, PWD, PATH, USER, etc. In addition, LSF defines new ones that can be accessed from a job script. See the '''bsub''' man page for the complete list of LSF environment variables. Below is a list of common ones.
 
 
 
{| class="wikitable" style="text-align: left;"
 
! Variable
 
! Description
 
|-

| $LSB_JOBID

| Contains the batch job id assigned by LSF.

|-

| $LSB_JOBNAME

| Name of the job (as set with -J).

|-

| $LSB_QUEUE

| The name of the queue the job is dispatched from.

|-

| $LSB_ERRORFILE

| Name of the error file specified with bsub -e.
 
|-
 
| $LSB_JOB_CWD
 
| The current working directory for job execution.
 
|-
 
| $LSB_SUBCWD
 
| The directory from which the job was submitted.
 
|-
 
| $LSB_HOSTS
 
| Gives a list of compute nodes assigned to the job, one entry per MPI task. A useful command to display the contents of the list is echo $LSB_HOSTS. '''This environment variable is not available when the list of hostnames is more than 4096 bytes.'''
 
|-
 
| $LSB_MCPU_HOSTS
 
| Gives a string of compute nodes assigned to the job in a compact format. For example, "hostA 20 hostB 10". This should be used for large groups of nodes.
 
|-
 
| $LSB_DJOB_HOSTFILE
 
| Points to a file containing the hostnames in a format usable by MPI.
 
|}
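For example (a sketch), a job script could record its allocation at the start of the run:

<pre>
echo "Job $LSB_JOBID allocated the following nodes:"
sort $LSB_DJOB_HOSTFILE | uniq -c      # list each allocated node and how many times it appears
</pre>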
 
 
==Job tracking and control commands==

<pre>
bjobs [-u all or user_name] [[-l] job_id]    # displays job information per user(s) or job_id, in summary or detail (-l) form
bpeek [-f] job_id                            # displays the stdout and stderr output of an unfinished job
bkill job_id                                 # kills, suspends, or resumes unfinished jobs. See man bkill for details
bmod  job_id [bsub_options]                  # modifies job submission options of a job. See man bmod for details
</pre>

'''Example'''

<pre>
[userx@login4]$ bjobs -u all
JOBID      STAT  USER             QUEUE      JOB_NAME             NEXEC_HOST SLOTS RUN_TIME        TIME_LEFT
223537     RUN   adinar           long       NOR_Q                1          20    400404 second(s) 8:46 L
223547     RUN   adinar           long       NOR_Q                1          20    399830 second(s) 8:56 L
223182     RUN   tengxj1025       long       pro_at16_lowc        10         280   325922 second(s) 5:27 L
229307     RUN   natalieg         long       LES_MORE             3          900   225972 second(s) 25:13 L
229309     RUN   tengxj1025       long       pro_atat_lowc        7          280   223276 second(s) 33:58 L
229310     RUN   tengxj1025       long       cg16_lowc            5          280   223228 second(s) 33:59 L
. . .

[userx@login4]$ bjobs -l 229309

Job <229309>, Job Name <pro_atat_lowc>, User <tengxj1025>, Project <default>, M
                          ail <czjnbb@gmail.com>, Status <RUN>, Queue <long>, J
                          ob Priority <250000>, Command <## job name;#BSUB -J p
                          ro_atat_lowc; ## send stderr and stdout to the same f
                          ile ;#BSUB -o info.%J; ## login shell to avoid copyin
                          g env from login session;## also helps the module fun
                          ction work in batch jobs;#BSUB -L /bin/bash; ## 30 mi
                          nutes of walltime ([HH:]MM);#BSUB -W 96:00; ## numpro
                          cs;#BSUB -n 280; . . .
                          . . .

 RUNLIMIT
 5760.0 min of nxt1449
Tue Nov  4 21:34:43 2014: Started on 280 Hosts/Processors <nxt1449> <nxt1449> <
                          nxt1449> <nxt1449> <nxt1449> <nxt1449>  ...
                          . . .

 Execution
                          CWD </scratch/user/tengxj1025/EXTD/pro_atat/lowc/md>;
Fri Nov  7 12:05:55 2014: Resource usage collected.
                          The CPU time used is 67536997 seconds.
                          MEM: 44.4 Gbytes;  SWAP: 0 Mbytes;  NTHREAD: 862

                          HOST: nxt1449
                          MEM: 3.2 Gbytes;  SWAP: 0 Mbytes; CPU_TIME: 9004415 s
                          econds . . .
                          . . .
</pre>

==Notes==
 
* If you get an error like "DAT: library load failure: libdaplomcm.so.2: cannot open shared object file: No such file or directory" then try adding this to your job file:
 
<pre>
 
export I_MPI_FABRICS='shm:ofa'
 
</pre>
 
 
==See Also==
 
*[[Ada:Batch_System_Configuration | Batch System Configuration (from Adadocs)]]
 
*[[Ada:Torque_(Eos)_vs._LSF_(Ada)_Quick_Reference | Torque (Eos) vs. LSF (Ada) Quick Reference (from Adadocs)]]
 
* [http://slurm.schedmd.com/rosetta.pdf Another quick reference] - Torque/Slurm/LSF/SGE/LoadLeveler
 
 
[[Category:Ada]]
 
