
Batch Job Translation Guide

Overview

Over the years, TAMU HPRC has used several different batch job managers. Our recent clusters are listed below along with their job managers.

  • Ada: LSF
  • Terra: Slurm
  • Grace: Slurm
  • Curie (RETIRED): LSF
  • Eos (RETIRED): PBS/Torque
  • Neumann (RETIRED): Load Leveler

This page contains translation tables to help move jobs from one cluster to another. The job managers covered in the tables below are LSF, Slurm, and PBS/Torque: LSF and Slurm are used on our three main clusters (Ada, Terra, and Grace), while PBS/Torque was used on the retired Eos cluster.

For more information on our clusters, please see their respective pages.

Translation Tables

User Commands

The table below lists the commands run from a cluster's login nodes to submit, monitor, and control jobs.

User Commands        | LSF                  | Slurm                 | PBS/Torque
Job submission       | bsub [script_file]   | sbatch [script_file]  | qsub [script_file]
Job deletion         | bkill [job_id]       | scancel [job_id]      | qdel [job_id]
Job status (by job)  | bjobs [job_id]       | squeue --job [job_id] | qstat [job_id]
Job status (by user) | bjobs -u [user_name] | squeue -u [user_name] | qstat -u [user_name]
Queue list           | bqueues              | squeue                | qstat -Q
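For example, the same submit / monitor / cancel cycle looks like the following on each scheduler. The script name my_job.sh and the job ID 123456 are placeholders for illustration only, not files or jobs that exist on the clusters.

  # Slurm (Terra, Grace)
  sbatch my_job.sh         # submit; Slurm prints the assigned job ID
  squeue -u $USER          # list your pending and running jobs
  scancel 123456           # delete the job with ID 123456

  # LSF (Ada)
  bsub < my_job.sh         # submit; reading from stdin lets LSF parse #BSUB directives
  bjobs -u $USER
  bkill 123456

  # PBS/Torque
  qsub my_job.sh
  qstat -u $USER
  qdel 123456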

Environment Variables

The table below lists environment variables that can be useful inside job files.

Environment Variable | LSF          | Slurm              | PBS/Torque
Job ID               | $LSB_JOBID   | $SLURM_JOBID       | $PBS_JOBID
Submit Directory     | $LSB_SUBCWD  | $SLURM_SUBMIT_DIR  | $PBS_O_WORKDIR
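As a brief sketch, a job script might use these variables to return to the submission directory and tag its work directory with the job ID; the directory name run.<jobid> is just an example.

  # Slurm version (Terra, Grace)
  cd $SLURM_SUBMIT_DIR      # go to the directory the job was submitted from
  mkdir run.$SLURM_JOBID    # per-job work directory named after the job ID
  echo "Job $SLURM_JOBID running from $SLURM_SUBMIT_DIR"

  # LSF:        use $LSB_SUBCWD and $LSB_JOBID the same way
  # PBS/Torque: use $PBS_O_WORKDIR and $PBS_JOBID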

Job Specifications

The table below lists various directives that set characteristics and resource requirements for jobs.

Job Specification    | LSF                                         | Slurm                                                  | PBS/Torque
Script directive     | #BSUB                                       | #SBATCH                                                | #PBS
Node Count           | N/A (calculated from CPUs / CPUs_per_node)  | -N [min[-max]]                                         | -l nodes=[count]
CPUs Per Node        | -R "span[ptile=count]"                      | --ntasks-per-node=[count]                              | -l ppn=[count]
CPU Count            | -n [count]                                  | -n [count]                                             | N/A (calculated from CPUs_per_node * Nodes)
Wall Clock Limit     | -W [hh:mm]                                  | -t [min] or -t [days-hh:mm:ss]                         | -l walltime=[hh:mm:ss]
Memory Per Core      | -M [mm] AND -R "rusage[mem=mm]" (mm in MB)  | --mem-per-cpu=[mem][M/G/T]                             | -l mem=[mm] (mm in MB)
Standard Output File | -o [file_name]                              | -o [file_name]                                         | -o [file_name]
Standard Error File  | -e [file_name]                              | -e [file_name]                                         | -e [file_name]
Combine stdout/err   | (use -o without -e)                         | (use -o without -e)                                    | -j oe (both to stdout) or -j eo (both to stderr)
Event Notification   | -B and/or -N (for Begin and/or End)         | --mail-type=[ALL/END] (see 'man sbatch' for all types) | -m [a/b/e] (for Abort, Begin, and/or End)
Email Address        | -u [address]                                | --mail-user=[address]                                  | -M [address]
Job Name             | -J [name]                                   | --job-name=[name]                                      | -N [name]
Account to charge    | -P [account]                                | --account=[account]                                    | -l billto=[account]
Queue                | -q [queue]                                  | -p [queue]                                             | -q [queue]
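To illustrate how the directives line up, the two header blocks below request the same resources under LSF and Slurm. The core counts, memory values, and file names are arbitrary example values, not recommendations for any particular cluster.

  # LSF (Ada): 40 cores across 2 nodes, 1 hour, 2500 MB per core
  #BSUB -J example_job
  #BSUB -n 40
  #BSUB -R "span[ptile=20]"
  #BSUB -W 1:00
  #BSUB -M 2500
  #BSUB -R "rusage[mem=2500]"
  #BSUB -o example.%J

  # Slurm (Terra, Grace): the same request, translated line by line
  #SBATCH --job-name=example_job
  #SBATCH --ntasks=40
  #SBATCH --ntasks-per-node=20
  #SBATCH --time=1:00:00
  #SBATCH --mem-per-cpu=2500M
  #SBATCH --output=example.%j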

Advanced Documentation

This guide only covers the most commonly used options and useful commands.

For more information on LSF batch options: Official LSF V9.1.2 Documentation

For an easier-to-read alternative, see the LSF V9.1.2 Documentation

For more information on Slurm batch options: Official Slurm Documentation