Terra:Batch Job Files
Revision as of 15:06, 9 February 2017
Building Job Files
While not the only way to submit programs for execution, job files fulfill the needs of most users.
The general idea behind job files follows:
- Make resource requests
- Add your commands and/or scripting
- Submit the job to the batch system
In a job file, resource specification options are preceded by a script directive. For each batch system, this directive is different. On Terra (Slurm) this directive is #SBATCH.
For every line of resource specifications, this directive must be the first text of the line, and all specifications must come before any executable lines.
An example of a resource specification is given below:
#SBATCH -J MyExample #Set the job name to "MyExample"
Note: Comments in a job file also begin with a # but Slurm recognizes #SBATCH as a directive.
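The layout described above can be sketched as a minimal job file. This is an illustration only: the time limit and task count are placeholder values, and the `msg` variable is a hypothetical name introduced for the example.

```shell
#!/bin/bash
# Resource requests: every #SBATCH line comes before the first executable line.
#SBATCH --job-name=MyExample      # Set the job name to "MyExample"
#SBATCH --time=00:10:00           # Placeholder wall clock limit of 10 minutes
#SBATCH --ntasks=1                # Placeholder task count

# Commands and scripting start here.
# SLURM_JOB_ID is set by Slurm inside a job; "unknown" is used when the
# script is run outside the batch system.
msg="Running job ${SLURM_JOB_ID:-unknown}"
echo "$msg"
```

Assuming the file is saved as MyExample.slurm (a hypothetical name), it would be submitted with `sbatch MyExample.slurm`.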
A list of the most commonly used and important options for these job files is given in the following section of this wiki. Full job file examples are given below.
Basic Job Specifications
Several of the most important options are described below. These basic options are typically all that is needed to run a job on Terra.
Specification | Option | Example | Example-Purpose |
---|---|---|---|
Reset Env I | --export=NONE | | Do not propagate environment to job |
Reset Env II | --get-user-env=L | | Replicate the login environment |
Wall Clock Limit | --time=[hh:mm:ss] | --time=01:15:00 | Set wall clock limit to 1 hour 15 min |
Job Name | --job-name=[SomeText] | --job-name=mpiJob | Set the job name to "mpiJob" |
Total Task/Core Count | --ntasks=[#] | --ntasks=16 | Request 16 tasks/cores total |
Tasks per Node I | --ntasks-per-node=# | --ntasks-per-node=5 | Request exactly (or at most) 5 tasks per node |
Memory Per Node | --mem=value[K|M|G|T] | --mem=32G | Request 32 GB per node |
Combined stdout/stderr | --output=[OutputName].%j | --output=mpiOut.%j | Collect stdout/err in mpiOut.[JobID] |
Note that Slurm divides processing resources hierarchically: Nodes -> Cores/CPUs -> Tasks
A user may change the number of tasks per core. For the purposes of this guide, each core is associated with exactly one task.
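As a sketch of how the basic options in the table combine, the job file below uses the table's example values, except for `--ntasks-per-node=4`, which is an illustrative value chosen so that 16 tasks divide evenly across nodes. The variable names `job_name`, `ntasks`, and `summary` are hypothetical.

```shell
#!/bin/bash
#SBATCH --export=NONE             # Do not propagate environment to job
#SBATCH --get-user-env=L          # Replicate the login environment
#SBATCH --job-name=mpiJob         # Set the job name to "mpiJob"
#SBATCH --time=01:15:00           # Wall clock limit of 1 hour 15 min
#SBATCH --ntasks=16               # Request 16 tasks/cores total
#SBATCH --ntasks-per-node=4       # Illustrative value: 16 tasks / 4 per node = 4 nodes
#SBATCH --mem=32G                 # Request 32 GB per node
#SBATCH --output=mpiOut.%j        # Collect stdout/err in mpiOut.[JobID]

# Executable lines begin here. The fallbacks after ":-" apply only when the
# script is run outside Slurm, for example while checking it locally.
job_name=${SLURM_JOB_NAME:-mpiJob}
ntasks=${SLURM_NTASKS:-16}
summary="Job ${job_name} was allocated ${ntasks} tasks"
echo "$summary"
```

Inside a running job, Slurm sets `SLURM_JOB_NAME` and `SLURM_NTASKS`, so the printed line reflects what was actually granted rather than the fallback values.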
Optional Job Specifications
A variety of optional specifications are available to customize your job. The table below lists the specifications which are most useful for users of Terra.
Specification | Option | Example | Example-Purpose |
---|---|---|---|
Set Allocation | --account=###### | --account=274839 | Set allocation to charge to 274839 |
Email Notification I | --mail-type=[type] | --mail-type=ALL | Send email on all events |
Email Notification II | --mail-user=[address] | --mail-user=howdy@tamu.edu | Send emails to howdy@tamu.edu |
Specify Queue | --partition=[queue] | --partition=gpu | Request only nodes in gpu subset |
Specify General Resource | --gres=[resource]:[count] | --gres=gpu:1 | Request one GPU |
Submit Test Job | --test-only | | Submit test job for Slurm validation |
Request Temp Disk | --tmp=M | --tmp=10240 | Request at least 10 GB in temp disk space |
Request License | --licenses=[LicenseLoc] | --licenses=nastran@slurmdb:12 | |
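A job file that adds several of these optional specifications might look like the sketch below. The account number, email address, partition, and GPU count are the example values from the table; the job name, time limit, and task count are assumptions made for illustration.

```shell
#!/bin/bash
#SBATCH --job-name=gpuJob         # Hypothetical job name
#SBATCH --time=01:15:00           # Illustrative wall clock limit
#SBATCH --ntasks=1                # Illustrative task count
#SBATCH --account=274839          # Set allocation to charge to 274839
#SBATCH --mail-type=ALL           # Send email on all events
#SBATCH --mail-user=howdy@tamu.edu
#SBATCH --partition=gpu           # Request only nodes in gpu subset
#SBATCH --gres=gpu:1              # Request one GPU
#SBATCH --tmp=10240               # Request at least 10 GB in temp disk space

# SLURM_JOB_PARTITION is set by Slurm inside a job; "gpu" is a fallback
# for runs outside the batch system.
partition=${SLURM_JOB_PARTITION:-gpu}
echo "Running in partition ${partition}"
```

To validate such a file without running it, `--test-only` can be given on the command line, for example `sbatch --test-only gpuJob.slurm` (hypothetical file name).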
Alternative Specifications
The job options within the above sections specify resources with the following method:
- Cores and CPUs are equivalent
- 1 Task per 1 CPU
- You specify: desired number of Tasks (equals number of CPUs)
- You specify: desired number of Nodes (equal to or less than the number of Tasks)
- You get: CPUs per Node equal to #ofCPUs/#ofNodes
- You specify: desired Memory per node
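The CPUs-per-node arithmetic in the list above can be checked with a small shell function. `cpus_per_node` is a hypothetical helper written for this example, not a Slurm command, and the warning for uneven splits is an assumption about what a user would want to know rather than a description of how Slurm places tasks.

```shell
#!/bin/bash
# cpus_per_node TASKS NODES
# Hypothetical helper: prints #ofCPUs/#ofNodes, assuming 1 task per CPU.
cpus_per_node() {
    if [ "$2" -lt 1 ] || [ "$2" -gt "$1" ]; then
        echo "node count must be between 1 and the number of tasks" >&2
        return 1
    fi
    if [ $(( $1 % $2 )) -ne 0 ]; then
        # Integer division drops the remainder; flag it on stderr.
        echo "warning: $1 tasks do not divide evenly across $2 nodes" >&2
    fi
    echo $(( $1 / $2 ))
}

# --ntasks=16 with --nodes=4
cpus_per_node 16 4
```

With the table's example values (16 tasks, 4 nodes) each node carries 4 CPUs.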
Slurm allows users to specify resources in units of Tasks, CPUs, Sockets, and Nodes.
There are many overlapping settings, and some settings may (quietly) override the defaults of other settings. A good understanding of Slurm options is needed to use these methods correctly.
Specification | Option | Example | Example-Purpose |
---|---|---|---|
Node Count | --nodes=[min[-max]] | --nodes=4 | Spread all tasks/cores across 4 nodes |
CPUs per Task | --cpus-per-task=# | --cpus-per-task=4 | Require 4 CPUs per task (default: 1) |
Memory per CPU | --mem-per-cpu=MB | --mem-per-cpu=2000 | Request 2000 MB per CPU |
Memory per Node (All, Single) | --mem=0 | | Request all available memory on a node |
Memory per Node (All, Multi) | --mem=0 | | Request the least-max available memory for any node across all nodes |
Tasks per Core | --ntasks-per-core=# | --ntasks-per-core=4 | Request max of 4 tasks per core |
Tasks per Node II | --tasks-per-node=# | --tasks-per-node=5 | Equivalent to Tasks per Node I |
Tasks per Socket | --ntasks-per-socket=# | --ntasks-per-socket=6 | Request max of 6 tasks per socket |
Sockets per Node | --sockets-per-node=# | --sockets-per-node=2 | Restrict to nodes with at least 2 sockets |
If you want to make resource requests in an alternative format, you are free to do so. Our ability to support alternative resource request formats may be limited.
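As one sketch of an alternative request, the job file below asks for memory per CPU and several CPUs per task instead of memory per node. The option values for `--nodes`, `--cpus-per-task`, and `--mem-per-cpu` come from the table; the job name, time limit, and `--ntasks=8` are assumptions for illustration, as are the variable names `cpus`, `mem_per_cpu`, and `mem_per_task`.

```shell
#!/bin/bash
#SBATCH --job-name=hybridJob      # Hypothetical job name
#SBATCH --time=01:15:00           # Illustrative wall clock limit
#SBATCH --nodes=4                 # Spread all tasks/cores across 4 nodes
#SBATCH --ntasks=8                # Illustrative value: 8 tasks total
#SBATCH --cpus-per-task=4         # Require 4 CPUs per task (default: 1)
#SBATCH --mem-per-cpu=2000        # Request 2000 MB per CPU

# Memory available to each task = CPUs per task x memory per CPU.
# The fallbacks after ":-" apply only when run outside Slurm.
cpus=${SLURM_CPUS_PER_TASK:-4}
mem_per_cpu=${SLURM_MEM_PER_CPU:-2000}
mem_per_task=$(( cpus * mem_per_cpu ))
echo "Each task has ${cpus} CPUs and ${mem_per_task} MB of memory"
```

Note that this request no longer follows the 1 Task per 1 CPU convention used elsewhere in this guide, which is the kind of interaction between settings the warning above refers to.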
Using Other Job Options
Slurm has facilities to make advanced resource requests and change settings that most Terra users do not need. These options are beyond the scope of this guide.
If you wish to explore the advanced job options, see the Advanced Documentation.
Clarification on Memory, Core, and Node Specifications
Memory Specifications are IMPORTANT.
For examples of calculating memory, core, and/or node specifications on Terra, see Specification Clarification.