Terra:Batch Job Files
Revision as of 13:58, 25 October 2016
Job Files
While not the only method of submitting a job, job files fulfill the needs of most users.
The general idea behind job files follows:
- Make resource requests
- Add your commands and/or scripting
- Submit the job to the batch system
Several of the most important options are described below. These options are typically all that is needed to run a job on Terra.
Specification | Option | Example | Example-Purpose |
---|---|---|---|
Node Count | -N [min[-max]] | -N 4 | Spread all cores across 4 nodes |
Cores Per Node | --ntasks-per-node=# | --ntasks-per-node=8 | Request exactly 8 tasks/cores per node |
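In a job file, each of these options appears on its own #SBATCH line. A minimal sketch of the 4-node example above (the 8-tasks-per-node value is illustrative, not a Terra default):

```shell
#!/bin/bash
#SBATCH -N 4                  # spread the job's cores across 4 nodes
#SBATCH --ntasks-per-node=8   # place 8 tasks (one per core) on each node

# 4 nodes x 8 tasks per node = 32 tasks in total
total=$((4 * 8))
echo "$total"
```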
Compute Nodes: | 256 compute nodes, each with 64GB RAM; 48 GPU nodes, each with a single Tesla K80 GPU and 128GB of RAM |
Interconnect: | Intel Omni-Path 100 Series switches. |
Peak Performance: | ~X TFLOPs (TBD) | ||
Global Disk: | 1.5PB (raw) via IBM's GSS26 appliance for general use; 1.5PB (raw) via IBM's GSS256 purchased by and dedicated to GeoPhysics |
File System: | General Parallel File System (GPFS) | ||
Batch Facility: | Slurm by SchedMD | ||
Location: | Teague Data Center | ||
Production Date: | Fall 2016 (tentative) |
Note that Slurm divides processing resources hierarchically: Nodes -> Cores/CPUs -> Tasks.
A user may change the number of tasks per core. For the purposes of this guide, each core will be associated with exactly one task.
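Putting these pieces together, a complete minimal job file might look like the sketch below. The job name, walltime, and task counts are illustrative assumptions rather than site defaults, and the final lines stand in for a user's own commands; Slurm runs the script on the first allocated node once the requested resources are granted.

```shell
#!/bin/bash
##NECESSARY JOB SPECIFICATIONS (illustrative values, not Terra defaults)
#SBATCH --job-name=ExampleJob   # hypothetical job name
#SBATCH --time=01:00:00         # request 1 hour of walltime
#SBATCH -N 2                    # request 2 nodes
#SBATCH --ntasks-per-node=4     # 4 tasks (one per core) on each node

# Your commands and/or scripting go here
msg="hello from $(hostname)"
echo "$msg"
```

The file would then be submitted to the batch system with sbatch, e.g. `sbatch examplejob.slurm` (file name hypothetical).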
Several examples of Slurm job files for Terra are listed below. For translating Ada/LSF job files, the Batch Job Translation Guide provides some reference.
Documentation for advanced options can be found under Advanced Documentation.