
Difference between revisions of "Ada:Batch Job Files"

From TAMU HPRC
m (Memory Specifications are IMPORTANT)
(Job files)
Line 1: Line 1:
 
__TOC__
 
__TOC__
==Job files==
+
== Job files ==
A user's request to do processing via the batch system is commonly, though not exclusively, expressed
 
in a text file, called from here on, ''job file'', ''job script'', or just ''job''. This file contains LSF directives and
 
executable lines: user-specified UNIX commands/scripts and other program executables. The directives, one per line, are all prefaced by the '''#BSUB''' string. The rest
 
of the directive can include the specification of any number of job parameters/options,
 
many of which have values associated with them. The user-specified commands can be any combination of
 
user-supplied executables and UNIX Shell commands. <br>
 
  
Unless otherwise specified, a job inherits the environment of the submitting process. First and foremost in that regard are: the shell (captured by the '''$SHELL''' environment variable) that will execute your batch script, the current working directory ('''$CWD'''), and the command search path ('''$PATH''').  If desired, you can change these and other environment settings. We recommend that you not change the default bash shell, unless for some good reason you must. All this concerns the inherited settings. LSF does add, however, several new environment variables. For more information, please see the [[Ada:Batch_Job_Submission#Environment_Variables | Environment Variables]] subsection.
+
While not the only method of submitting programs for execution, job files fulfill the needs of most users.
  
===Recommended Job File Format===
+
The general idea behind job files follows:
<pre>
+
* Make resource requests
#BSUB -option1 value1 -option2 value2 ... 
+
* Add your commands and/or scripting
#BSUB  ... more options ...
+
* Submit the job to the batch system
#               
 
# ...  end of BSUB directives: a blank or comment line                     
 
#
 
## 1st non-BSUB executable line: You can change here the default environment settings: SHELL, CWD, PATH, etc.
 
#                                         
 
Executable line 1                                       
 
Executable line 2                                     
 
...                                       
 
</pre>
 
  
===Example Job 1===
+
=== Basic Job Specifications ===
<pre>
 
#BSUB -J myjob1          # sets the job name to myjob1.
 
#BSUB -L /bin/bash        # uses the bash login shell to initialize the job's execution environment.
 
#BSUB -W 12:30            # sets to 12.5 hours the job's runtime wall-clock limit.
 
#BSUB -n 1                # assigns 1 core for execution.
 
#BSUB -o stdout1.%J      # directs the job's standard output to stdout1.jobid
 
##
 
# <--- at this point the current working directory is the one you submitted the job from.
 
#
 
module load intel        # loads the INTEL software tool chain to provide, among other things,
 
#                          needed runtime libraries for the execution of prog.exe below.
 
#                          (assumes prog.exe was compiled using INTEL compilers.)
 
#
 
prog.exe < input1 >& data_out1 # both input1 and data_out1 reside in the job submission dir
 
##
 
</pre>
 
  
In the above job, memory specification is missing. In this case, LSF will schedule the job on any node with one idle core. The '''per-process memory limit will be 2.5GB, the default.''' If it is exceeded, the job will fail. Even if this process limit is not exceeded, since the job does not specify how much memory to reserve, it risks running slowly because of memory contention from other jobs scheduled on the same node. Job memory specification is an important issue. For more information, please see the [[Ada:Batch_Job_Files#Memory_Specifications_are_IMPORTANT | Memory Specifications]] subsection.
+
Several of the most important options are described below. These basic options are typically all that is needed to run a job on Ada.
 
  
===Example Job 2===
+
{| class="wikitable" style="text-align: center;"
<pre>
+
|+ Basic Ada (LSF) Job Specifications
#BSUB -J myjob2          # sets the job name to myjob2.
+
|-
#BSUB -L /bin/bash        # uses the bash login shell to initialize the job's execution environment.
+
! style="width: 130pt;" | Specification
#BSUB -W 12:30            # sets to 12.5 hours the job's runtime wall-clock limit.
+
! style="width: 130pt;" | Option
#BSUB -n 3                # assigns 3 cores for execution.
+
! style="width: 115pt;" | Example
#BSUB -R "span[ptile=3]" # assigns 3 cores per node.
+
! style="width: 200pt;" | Example-Purpose
#BSUB -R "rusage[mem=5000]" # reserves 5000MB per process/CPU for the job (i.e., 15,000 MB for job/node)
+
|-
#BSUB -M 5000            # sets to 5,000MB (~5GB) the per process enforceable memory limit.
+
| Job Name
#BSUB -o stdout1.%J      # directs the job's standard output to stdout1.jobid
+
| -J [SomeText]
#BSUB -P project_ID      # This is the project number against which the used service units (SUs) are charged.
+
| -J MyJob1
#BSUB -u e-mail_address  # sends email to the specified address (e.g., netid@tamu.edu,
+
| Set the job name to "MyJob1"
                          # myname@gmail.com) with information about main job events (next line).
+
|-
#BSUB -B -N              # send emails on job begin (-B) and end (-N)
+
| Shell
##
+
| -L [Shell]
cd $SCRATCH/myjob1        # makes $SCRATCH/myjob1 the job's current working directory where all
+
| -L /bin/bash
#                          the needed files (e.g., prog.exe, input1, data_out1) are placed.
+
| Uses specified Unix Shell to initialize<br>the job's execution environment.
module load intel        # loads the INTEL software tool chain to provide, among other things,
+
|-
#                          needed runtime libraries for the execution of prog.exe below.
+
|  Wall Clock Limit
#                          (assumes prog.exe was compiled using INTEL compilers.)
+
| -W [hh:mm]
#
+
| -W 1:15
# The next 3 lines concurrently execute 3 instances of the same program, prog.exe, with
+
| Set wall clock limit to 1 hour 15 min
# standard input and output data streams assigned to different files in each case. This style
+
|-
# of concurrent execution can be extended up to 20-way or 40-way on nodes with 20 cores
+
| Core count
# and 40 cores, respectively.  
+
| -n ##
#
+
| -n 20
(prog.exe < input1 >& data_out1 ) &
+
| Assigns 20 job slots/cores.
(prog.exe < input2 >& data_out2 ) &
+
|-
(prog.exe < input3 >& data_out3 )
+
| Cores per node
wait
+
| -R "span[ptile=##]"
##
+
| -R "span[ptile=5]"
 +
| Request 5 cores per node.
 +
|-
 +
| Memory Per Core
 +
| -M [MB]
 +
| -M 2560
 +
| Sets the per process memory limit to 2560 mega-bytes (MBs).
 +
|-
 +
| Memory Per Core
 +
| -R "rusage[mem=[MB]]"
 +
| -R "rusage[mem=2560]"
 +
| Schedules job on nodes that have at<br>least 2560 MBs available per core.
 +
|-
 +
| Combined stdout and stderr
 +
| -o [OutputName].%J
 +
| -o stdout1.%J
 +
| Collect stdout/stderr in stdout1.[JobID]
 +
|}
  
</pre>
+
=== Optional Job Specifications ===
  
The above manner of squeezing into a node as many execution instances as possible is a good way
+
A variety of optional specifications are available to customize your job. The table below lists the specifications which are most useful for users of Ada.
to gain efficiencies all the way around and should be adopted as common practice, provided:
 
* The duration of each execution is about the same;
 
*  Each execution either requires about the same amount of memory, or the cumulative amount of memory does not exceed '''core_count * memory_per_process'''.
 
  
You can also specify multiple '''#BSUB''' options per line if desired:
+
{| class="wikitable" style="text-align: center;"
 +
|+ Optional Ada (LSF) Job Specifications
 +
|-
 +
! style="width: 130pt;" | Specification
 +
! style="width: 130pt;" | Option
 +
! style="width: 170pt;" | Example
 +
! style="width: 200pt;" | Example-Purpose
 +
|-
 +
| Set Allocation
 +
| -P ######
 +
| -P 274839
 +
| Set allocation to charge to 274839
 +
|-
 +
| Email Notification I
 +
| -u [email-address]
 +
| -u howdy@tamu.edu
 +
| Send emails to howdy@tamu.edu.
 +
|-
 +
| Email Notification II
 +
| -[B<nowiki>|</nowiki>N]
 +
| -B -N
 +
| Send email on beginning (-B) and end (-N) of job.
 +
|-
 +
| Specify Queue
 +
| -q [queue]
 +
| -q xlarge
 +
| Request only nodes in xlarge subset.
 +
|}
  
<pre>
+
=== Clarification on Memory, Core, and Node Specifications ===
#BSUB -J myjob1 -W 12:30 -n 3 -L /bin/bash -o stdout1.%J -P project1
 
</pre>
 
  
===Memory Specifications are IMPORTANT===
+
Memory Specifications are <font color=teal>IMPORTANT</font>. <br>
<pre>
+
For examples on calculating memory, core, and/or node specifications on Ada: [[:Ada:Batch_Memory_Specs | Specification Clarification]].
#BSUB -R "rusage[mem=process_alloc_size]" -M process_size_limit
 
</pre>
 
  
Both of these should be set in a job and, typically, to the same amount.
+
[[Category: Ada]]
* The '''process_alloc_size''' specifies the amount of memory to allocate/reserve per process on a node. Omitting this specification, <br>
 
will cause LSF to select nodes on the basis of available cores only, regardless of whether such nodes have sufficient memory to<br>
 
run the job efficiently. Hence, this omission can cause the job to be '''swapped out''' onto the local drive (big delay) and/or experience memory contention<br>
 
from other jobs running on the same node, thereby bringing a general and dramatic slowdown.
 
 
 
*If, in addition to the '''process_alloc_size'''  option, one specifies a value (=core_count_per_node) for the '''ptile''' parameter,<br>
 
LSF will allocate/reserve '''core_count_per_node * process_alloc_size''' MB of memory per node.
 
 
 
* The '''-M process_size_limit''' setting specifies the memory size limit (in MBs) per process, which when exceeded will cause the process and<br>
 
the job to fail. The default value for '''process_size_limit''' is 2.5 GB. Both of these settings should reflect the run-time needs of your job.
 
 
 
* The total available memory per node for jobs is about 10 GB less than the maximum: 54 GB for '''nxt''' type nodes; 246 GB for the '''mem256gb''' type; etc.
 
 
 
One should not rely on default memory limit settings. The latter may be too large or too small.  A realistic picture of a job's memory use can be obtained<br>
 
from the information given by one (e.g., '''bjobs''') or more job tracking commands. For more information, please see the [[Ada:Batch_Job_Submission#Job_tracking_and_control_commands | Job tracking and control commands]] subsection.
 

Revision as of 17:40, 6 January 2017

Job files

While not the only method of submitting programs for execution, job files fulfill the needs of most users.

The general idea behind job files follows:

  • Make resource requests
  • Add your commands and/or scripting
  • Submit the job to the batch system

Basic Job Specifications

Several of the most important options are described below. These basic options are typically all that is needed to run a job on Ada.

Basic Ada (LSF) Job Specifications

Specification               Option                 Example                Example-Purpose
Job Name                    -J [SomeText]          -J MyJob1              Set the job name to "MyJob1".
Shell                       -L [Shell]             -L /bin/bash           Use the specified Unix shell to initialize the job's execution environment.
Wall Clock Limit            -W [hh:mm]             -W 1:15                Set the wall clock limit to 1 hour 15 minutes.
Core Count                  -n ##                  -n 20                  Assign 20 job slots/cores.
Cores Per Node              -R "span[ptile=##]"    -R "span[ptile=5]"     Request 5 cores per node.
Memory Per Core             -M [MB]                -M 2560                Set the per-process memory limit to 2560 MB.
Memory Per Core             -R "rusage[mem=[MB]]"  -R "rusage[mem=2560]"  Schedule the job on nodes that have at least 2560 MB available per core.
Combined stdout and stderr  -o [OutputName].%J     -o stdout1.%J          Collect stdout/stderr in stdout1.[JobID].
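
As a rough illustration, the basic options listed above might be combined into a job file along the following lines. This is a sketch under stated assumptions, not an official template: the echo line stands in for your own commands, and the fallback value "not-in-lsf" is a hypothetical placeholder so the script also runs outside the batch system. LSB_JOBID is the environment variable LSF sets for a running job.

```shell
#BSUB -J MyJob1                  # job name
#BSUB -L /bin/bash               # login shell used to initialize the environment
#BSUB -W 1:15                    # wall clock limit: 1 hour 15 minutes
#BSUB -n 20                      # 20 job slots/cores in total
#BSUB -R "span[ptile=5]"         # 5 cores per node
#BSUB -R "rusage[mem=2560]"      # reserve 2560 MB per core
#BSUB -M 2560                    # per-process memory limit of 2560 MB
#BSUB -o stdout1.%J              # stdout/stderr go to stdout1.[JobID]

# First executable line: everything below runs on the allocated node(s).
# LSB_JOBID is set by LSF inside a running job; the fallback is a
# placeholder (an assumption) for running this sketch outside LSF.
job_id="${LSB_JOBID:-not-in-lsf}"
echo "Job ${job_id} started in $(pwd)"
```

Assuming the file is saved under a name such as MyJob1.job (a hypothetical name), it would be submitted with `bsub < MyJob1.job`; the redirection is what lets LSF read the #BSUB lines.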

Optional Job Specifications

A variety of optional specifications are available to customize your job. The table below lists the specifications which are most useful for users of Ada.

Optional Ada (LSF) Job Specifications

Specification          Option              Example            Example-Purpose
Set Allocation         -P ######           -P 274839          Charge the job to allocation 274839.
Email Notification I   -u [email-address]  -u howdy@tamu.edu  Send emails to howdy@tamu.edu.
Email Notification II  -[B|N]              -B -N              Send email at the beginning (-B) and end (-N) of the job.
Specify Queue          -q [queue]          -q xlarge          Request only nodes in the xlarge subset.
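
The optional specifications can simply be appended to the #BSUB block of a job file. The sketch below reuses the example values from the table (allocation 274839, howdy@tamu.edu, queue xlarge), which are illustrative rather than values you should submit with. The basic options shown and the fallback value "unknown" are assumptions added so the fragment is complete and runs outside LSF. LSB_QUEUE is the environment variable LSF sets to the queue a job was dispatched from.

```shell
#BSUB -J MyJob1                  # basic options (see the previous table)
#BSUB -W 1:15
#BSUB -n 20
#BSUB -o stdout1.%J
#BSUB -P 274839                  # allocation to charge (example value)
#BSUB -u howdy@tamu.edu          # address that receives job emails (example value)
#BSUB -B -N                      # email at job begin (-B) and job end (-N)
#BSUB -q xlarge                  # restrict the job to the xlarge queue

# LSB_QUEUE is set by LSF inside a running job; "unknown" is a
# placeholder (an assumption) used when the script runs outside LSF.
queue="${LSB_QUEUE:-unknown}"
echo "Running in queue: ${queue}"
```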

Clarification on Memory, Core, and Node Specifications

Memory Specifications are IMPORTANT.
For examples on calculating memory, core, and/or node specifications on Ada: Specification Clarification.
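
As a small worked example of how these specifications interact, the sketch below computes the node count and memory reservation implied by the example values in the basic table (-n 20, span[ptile=5], rusage[mem=2560]). It assumes the reservation is made per core, so each node reserves cores-per-node times the per-core amount. The variable names are hypothetical and chosen only for this illustration.

```shell
cores=20               # -n 20
cores_per_node=5       # -R "span[ptile=5]"
mem_per_core_mb=2560   # -R "rusage[mem=2560]"

# Nodes needed, rounding up when cores is not a multiple of cores_per_node.
nodes=$(( (cores + cores_per_node - 1) / cores_per_node ))

# Memory reserved on each node and across the whole job, in MB.
mem_per_node_mb=$(( cores_per_node * mem_per_core_mb ))
total_mem_mb=$(( cores * mem_per_core_mb ))

echo "nodes=${nodes} mem_per_node_mb=${mem_per_node_mb} total_mem_mb=${total_mem_mb}"
```

With these values the job spans 4 nodes and reserves 12800 MB on each, 51200 MB in total. The per-node figure is the one to compare against the memory actually available on the node type you request.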