Terra:Batch Job Files

Building Job Files

While not the only method of submitting programs to be executed, job files fulfill the needs of most users.

The general idea behind job files is as follows (a minimal sketch appears after this list):

  • Make resource requests
  • Add your commands and/or scripting
  • Submit the job to the batch system
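
A job file is an ordinary shell script: the resource requests appear as #SBATCH comment lines at the top, followed by the commands to run, and the whole file is handed to the batch system. The minimal sketch below illustrates the three steps; the file name (myJob.slurm) and the program it runs (./myProgram) are placeholders, not Terra-specific names.

  #!/bin/bash
  # 1. Make resource requests
  #SBATCH -J exampleJob        # job name
  #SBATCH -t 00:10:00          # wall clock limit of 10 minutes
  #SBATCH -N 1                 # one node
  #SBATCH -n 1                 # one task/core
  #SBATCH --mem=2048M          # 2048 MB of memory for the node

  # 2. Add your commands and/or scripting
  ./myProgram

The third step, submitting the job to the batch system, is done from the command line with sbatch:

  sbatch myJob.slurm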

Basic Job Specifications

Several of the most important options are described below. These basic options are typically all that is needed to run a job on Terra.

{| class="wikitable" style="text-align: center;"
|+ Basic Terra/Slurm Job Specifications
|-
! style="width: 130pt;" | Specification
! style="width: 130pt;" | Option
! style="width: 170pt;" | Example
! style="width: 200pt;" | Example-Purpose
|-
| Reset Env I
| --export=NONE
|
| Do not propagate environment to job
|-
| Reset Env II
| --get-user-env=L
|
| Replicate the login environment
|-
| Wall Clock Limit
| -t [hh:mm:ss]
| -t 01:15:00
| Set wall clock limit to 1 hour 15 min
|-
| Job Name
| -J [SomeText]
| -J mpiJob
| Set the job name to "mpiJob"
|-
| Node Count
| -N [min[-max]]
| -N 4
| Spread all tasks/cores across 4 nodes
|-
| Total Task/Core Count
| -n [#]
| -n 16
| Request 16 tasks/cores total
|-
| Memory Per Node
| <nowiki>--mem=[K|M|G|T]</nowiki>
| --mem=32768M
| Request 32768 MB (32 GB) per node
|-
| Combined stdout/stderr
| -o [OutputName].%j
| -o mpiOut.%j
| Collect stdout/err in mpiOut.[JobID]
|}
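
The basic specifications above translate directly into #SBATCH lines. The sketch below simply reuses the example values from the table (16 tasks spread across 4 nodes, 32 GB per node, a 1 hour 15 minute limit, combined output in mpiOut.[JobID]); the mpirun launcher and the ./myMpiProgram executable are assumed placeholders and not part of the table.

  #!/bin/bash
  #SBATCH --export=NONE        # do not propagate environment to job
  #SBATCH --get-user-env=L     # replicate the login environment
  #SBATCH -J mpiJob            # set the job name to "mpiJob"
  #SBATCH -t 01:15:00          # wall clock limit of 1 hour 15 min
  #SBATCH -N 4                 # spread tasks/cores across 4 nodes
  #SBATCH -n 16                # request 16 tasks/cores total
  #SBATCH --mem=32768M         # request 32768 MB (32 GB) per node
  #SBATCH -o mpiOut.%j         # collect stdout/stderr in mpiOut.[JobID]

  # placeholder command; replace with your own program
  mpirun ./myMpiProgram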

Note that Slurm divides processing resources as follows: Nodes -> Cores/CPUs -> Tasks.

A user may change the number of tasks per core; for the purposes of this guide, each core is associated with exactly one task.

Optional Job Specifications

A variety of optional specifications are available to customize your job. The table below lists the specifications which are most useful for users of Terra.

{| class="wikitable" style="text-align: center;"
|+ Optional Terra/Slurm Job Specifications
|-
! style="width: 130pt;" | Specification
! style="width: 130pt;" | Option
! style="width: 170pt;" | Example
! style="width: 200pt;" | Example-Purpose
|-
| Set Allocation
| -A ######
| -A 274839
| Set allocation to charge to 274839
|-
| Email Notification I
| --mail-type=[type]
| --mail-type=ALL
| Send email on all events
|-
| Email Notification II
| --mail-user=[address]
| --mail-user=howdy@tamu.edu
| Send emails to howdy@tamu.edu
|-
| Specify Queue
| -q [queue]
| -q gpu
| Request only nodes in gpu subset
|-
| Submit Test Job
| --test-only
|
| Submit test job for Slurm validation
|-
| Request Temp Disk
| --tmp=M
| --tmp=10240
| Request at least 10 GB in temp disk space
|}
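
These optional specifications are added as extra #SBATCH lines next to the basic ones. The hedged sketch below uses the example values from the table; the allocation number and e-mail address are illustrations only and should be replaced with your own.

  #SBATCH -A 274839                    # charge the job to allocation 274839
  #SBATCH --mail-type=ALL              # send email on all job events
  #SBATCH --mail-user=howdy@tamu.edu   # address that receives the notifications
  #SBATCH --tmp=10240                  # request at least 10 GB of temp disk space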

Alternative Memory/Core/Node Specifications

The job options within the above sections specify resources with the following method:

  • Cores and CPUs are equivalent
  • 1 Task per 1 CPU
  • You specify: desired number of Tasks (equals number of CPUs)
  • You specify: desired number of Nodes (equal to or less than the number of Tasks)
  • You get: CPUs per Node equal to #ofCPUs/#ofNodes
  • You specify: desired Memory per node

The behavior above is similar to the implementation previously used on Eos, with the exception of memory.
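
As an illustration of this method (the numbers are arbitrary examples, not recommended values): requesting 8 tasks spread across 2 nodes yields 8/2 = 4 CPUs per node, and the memory request applies to each node.

  #SBATCH -n 8             # 8 tasks = 8 CPUs in total
  #SBATCH -N 2             # spread across 2 nodes, i.e. 4 CPUs per node
  #SBATCH --mem=8192M      # 8192 MB of memory on each node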

Slurm allows users to specify resources in units of Tasks, CPUs, Sockets, and Nodes. There are many overlapping settings and some settings may (quietly) overwrite the defaults of other settings. A good understanding of Slurm options is needed to correctly utilize these methods.

If you want to make resource requests in an alternative format, you are free to do so. Our ability to support alternative resource request formats may be limited.

{| class="wikitable" style="text-align: center;"
|+ Alternative Memory/Core/Node Specifications
|-
! style="width: 130pt;" | Specification
! style="width: 130pt;" | Option
! style="width: 170pt;" | Example
! style="width: 200pt;" | Example-Purpose
|-
| Tasks per Node
| --ntasks-per-node=#
| --ntasks-per-node=4
| Request 4 tasks on each node
|-
| CPUs per Task
| --cpus-per-task=#
| --cpus-per-task=2
| Request 2 CPUs for each task
|-
| Memory Per Core
| --mem-per-cpu=MB
| --mem-per-cpu=2048
| Request 2048 MB (2 GB) per CPU/core
|}
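
As a hedged sketch of an alternative-format request (values are illustrative only), the directives below fix the number of tasks per node and request memory per CPU instead of specifying a total task count and per-node memory:

  #SBATCH -N 2                      # 2 nodes
  #SBATCH --ntasks-per-node=4       # 4 tasks on each node (8 tasks total)
  #SBATCH --cpus-per-task=1         # 1 CPU per task (the default)
  #SBATCH --mem-per-cpu=2048        # 2048 MB of memory per CPU rather than per node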

Using Other Job Options

Slurm has facilities for making advanced resource requests and changing settings that most Terra users do not need. These options are beyond the scope of this guide.

If you wish to explore the advanced job options, see the Advanced Documentation.