Terra:Batch Job Files
While not the only method of submitting a job, job files fulfill the needs of most users.
The general idea behind job files follows:
- Make resource requests
- Add your commands and/or scripting
- Submit the job to the batch system
Several of the most important options are described below. These options are typically all that is needed to run a job on Terra.
|Specification||Option||Example||Example Purpose|
|Reset Env I||--export=NONE|| ||Do not propagate environment to job|
|Reset Env II||--get-user-env=L|| ||Replicate the login environment|
|Wall Clock Limit||-t [hh:mm:ss]||-t 01:15:00||Set wall clock limit to 1 hour 15 min|
|Job Name||-J [SomeText]||-J mpiJob||Set the job name to "mpiJob"|
|Node Count||-N [min[-max]]||-N 4||Spread all tasks/cores across 4 nodes|
|Total Task/Core Count||-n [#]||-n 16||Request 16 tasks/cores total|
|Memory Per Node||--mem=value[K|M|G|T]||--mem=32768M||Request 32768 MB (32 GB) per node|
|Combined stdout/stderr||-o [OutputName].%j||-o mpiOut.%j||Collect stdout/err in mpiOut.[JobID]|
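As a rough illustration of how these options fit together, the sketch below combines the example values from the table into a single job file. It is a minimal sketch, not an official Terra template: the file layout, the `job_id` and `summary` variable names, and the placeholder `echo` command are assumptions made for illustration. It uses `-o` for the output file because Slurm writes stderr to the same file as stdout when only `-o` is given.

```shell
#!/bin/bash
## Resource requests (example values taken from the options table)
# Do not propagate the submission environment; replicate the login environment
#SBATCH --export=NONE
#SBATCH --get-user-env=L
# Job name
#SBATCH -J mpiJob
# Wall clock limit of 1 hour 15 minutes
#SBATCH -t 01:15:00
# Spread 16 tasks/cores across 4 nodes
#SBATCH -N 4
#SBATCH -n 16
# 32768 MB (32 GB) of memory per node
#SBATCH --mem=32768M
# Collect stdout and stderr in mpiOut.[JobID]
#SBATCH -o mpiOut.%j

## Commands and scripting (placeholder; replace with your own workload)
# SLURM_JOB_ID is set by Slurm inside a running job; the fallback text is
# only there so the script can be tried outside the batch system.
job_id="${SLURM_JOB_ID:-not-in-a-batch-job}"
summary="Job ${job_id} started on ${HOSTNAME:-unknown-host}"
echo "$summary"
```

Assuming the file is saved as `MyJob.slurm` (a hypothetical name), it would be submitted to the batch system with `sbatch MyJob.slurm`.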
|Peak Performance:||~X TFLOPs (TBD)|
|Global Disk:||1.5 PB (raw) via IBM's GSS26 appliance for general use; 1.5 PB (raw) via IBM's GSS256, purchased by and dedicated to GeoPhysics|
|File System:||General Parallel File System (GPFS)|
|Batch Facility:||Slurm by SchedMD|
|Location:||Teague Data Center|
|Production Date:||Fall 2016 (tentative)|
Slurm divides processing resources hierarchically: Nodes -> Cores/CPUs -> Tasks
A user may change the number of tasks per core. For the purposes of this guide, each core is associated with exactly one task.
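The one-task-per-core convention can be made explicit in a job file. The sketch below is an illustration under stated assumptions: it requests 16 tasks on 4 nodes with one core per task, then reads the values back from environment variables that Slurm sets inside a job. The fallback defaults and the `total_cores` variable are hypothetical additions so the arithmetic can be followed outside the batch system.

```shell
#!/bin/bash
# Request 16 tasks on 4 nodes, with exactly one core per task
#SBATCH -N 4
#SBATCH -n 16
#SBATCH --cpus-per-task=1

# Slurm sets these variables inside a job; the defaults mirror the
# requests above so the script also runs outside the batch system.
nodes="${SLURM_JOB_NUM_NODES:-4}"
tasks="${SLURM_NTASKS:-16}"
cpus_per_task="${SLURM_CPUS_PER_TASK:-1}"

# With one core per task, the total core count equals the task count.
total_cores=$(( tasks * cpus_per_task ))
echo "nodes=${nodes} tasks=${tasks} cores=${total_cores}"
```

Raising `--cpus-per-task` would give each task more than one core, which departs from the convention used in this guide.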
Several examples of Slurm job files for Terra are listed below. For translating Ada/LSF job files, the Batch Job Translation Guide provides some reference.
Documentation for advanced options can be found under Advanced Documentation.