Terra:Batch Job Files

From TAMU HPRC

Revision as of 14:50, 25 October 2016

Job Files

While not the only method of submitting a job, job files fulfill the needs of most users.

The general idea behind job files is as follows:

  • Make resource requests
  • Add your commands and/or scripting
  • Submit the job to the batch system
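The last step, submission, is done from the login-node command line with Slurm's sbatch utility. A minimal sketch (the job file name here is a placeholder, not from this page):

```shell
# Submit the job file to the Slurm batch system (hypothetical filename)
sbatch MyJob.slurm

# Check the status of your pending and running jobs
squeue -u $USER
```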

Several of the most important options are described below. These options are typically all that is needed to run a job on Terra.

Specification          | Option             | Example          | Example-Purpose
---------------------- | ------------------ | ---------------- | --------------------------------------
Reset Env I            | --export=NONE      | --export=NONE    | Do not propagate environment to job
Reset Env II           | --get-user-env=L   | --get-user-env=L | Replicate the login environment
Wall Clock Limit       | -t [hh:mm:ss]      | -t 01:15:00      | Set wall clock limit to 1 hour 15 min
Job Name               | -J [SomeText]      | -J mpiJob        | Set the job name to "mpiJob"
Node Count             | -N [min[-max]]     | -N 4             | Spread all tasks/cores across 4 nodes
Total Task/Core Count  | -n [#]             | -n 16            | Request 16 tasks/cores total
Memory Per Node        | --mem=[K|M|G|T]    | --mem=32768M     | Request 32768 MB (32 GB) per node
Combined stdout/stderr | -o [OutputName].%j | -o mpiOut.%j     | Collect stdout/err in mpiOut.[JobID]
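Combining the options above, a minimal Terra job file might look like the sketch below (the job name, memory figure, and final command are placeholders, not prescribed by this page):

```shell
#!/bin/bash
##NECESSARY JOB SPECIFICATIONS
#SBATCH --export=NONE        # do not propagate environment to job
#SBATCH --get-user-env=L     # replicate the login environment
#SBATCH -J mpiJob            # set the job name to "mpiJob"
#SBATCH -t 01:15:00          # wall clock limit: 1 hour 15 min
#SBATCH -N 4                 # spread tasks across 4 nodes
#SBATCH -n 16                # request 16 tasks/cores total
#SBATCH --mem=32768M         # request 32768 MB (32 GB) per node
#SBATCH -o mpiOut.%j         # collect stdout/stderr in mpiOut.[JobID]

# Your commands and/or scripting go below the #SBATCH directives
# (placeholder command; replace with your actual workload):
echo "Running on $SLURM_NNODES nodes with $SLURM_NTASKS tasks"
```

Note that all #SBATCH directives must appear before the first executable line of the script; Slurm ignores any that come after.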
Peak Performance: ~X TFLOPs (TBD)
Global Disk:      1.5PB (raw) via IBM's GSS26 appliance for general use;
                  1.5PB (raw) via IBM's GSS26 appliance purchased by and dedicated for GeoPhysics
File System:      General Parallel File System (GPFS)
Batch Facility:   Slurm by SchedMD
Location:         Teague Data Center
Production Date:  Fall 2016 (tentative)

Note that Slurm divides processing resources as follows: Nodes -> Cores/CPUs -> Tasks.

A user may change the number of tasks per core. For the purposes of this guide, every core is associated with exactly one task; for example, requesting -N 4 with -n 16 places four tasks (one per core) on each of the four nodes.

Several examples of Slurm job files for Terra are listed below. For translating Ada/LSF job files, the Batch Job Translation Guide provides some reference.
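To give a flavor of what that translation involves, below is a partial sketch of a few common LSF directives and their Slurm counterparts (illustrative only; consult the Translation Guide for the full mapping):

```shell
# LSF (Ada)                    # Slurm (Terra)
#BSUB -J myJob           <->   #SBATCH -J myJob          # job name
#BSUB -W 1:15            <->   #SBATCH -t 01:15:00       # wall clock limit
#BSUB -n 16              <->   #SBATCH -n 16             # total task/core count
#BSUB -o myOut.%J        <->   #SBATCH -o myOut.%j       # stdout file with job ID
```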

Documentation for advanced options can be found under Advanced Documentation.