
Job Scripts

Configuring

Should I use one core or multiple cores?

* Check the documentation for the software you want to use to see whether it supports multiple cores. If it does, and there is no recommended maximum number of cores, request all cores on a compute node and pass that core count to the software's thread/core option (see the example after this list).
* GeneMark has been reported to fail with >48 cores
* Trinity inchworm best with 6 cores: --inchworm_cpu 6
* If you request all the cores on a compute node, it is good practice to also request all the available memory, since your job will be the only job running on that node and may end up needing all the memory.
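
For example, here is a minimal sketch (my_tool and its --threads option are illustrative placeholders, not a specific package) showing how to keep the software's thread count in sync with the #SBATCH request by using the Slurm-provided $SLURM_CPUS_PER_TASK variable:

#SBATCH --cpus-per-task=48          # CPUs (threads) requested for the command

# pass the same core count to the software; replace my_tool and --threads with your tool's actual option
my_tool --threads $SLURM_CPUS_PER_TASK input.fa > output.txt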

How do I request all the cores and all the memory on a single compute node?

The parameters are different for nodes with different amounts of total available memory.

Use the maxconfig command to see the maximum configuration for partitions on the cluster and to show SU calculations.

maxconfig -h

ACES

For an ACES 512 GB memory compute node

Note: since ACES is a composable cluster, the following maxconfig outputs may differ over time as resources are recomposed

maxconfig

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --time=7-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=96
#SBATCH --mem=488G
#SBATCH --output=stdout.%x.%j
#SBATCH --error=stderr.%x.%j
For an ACES H100 compute node

maxconfig -g h100

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --time=2-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=96
#SBATCH --mem=488G
#SBATCH --partition=gpu
#SBATCH --gres=gpu:h100:2
#SBATCH --output=stdout.%x.%j
#SBATCH --error=stderr.%x.%j

FASTER

For a FASTER 256 GB memory compute node

Note: since FASTER is a composable cluster, the following maxconfig outputs may differ over time as resources are recomposed

maxconfig

#!/bin/bash
#SBATCH --job-name=my_job           # job name
#SBATCH --time=7-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
#SBATCH --cpus-per-task=64          # CPUs (threads) per command
#SBATCH --mem=240G                  # total memory per node
#SBATCH --output=stdout.%x.%j       # save stdout to file
#SBATCH --error=stderr.%x.%j        # save stderr to file
For a FASTER 256 GB memory A100 GPU compute node

maxconfig -g a100

#!/bin/bash
#SBATCH --job-name=my_job           # job name
#SBATCH --time=7-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
#SBATCH --cpus-per-task=64          # CPUs (threads) per command
#SBATCH --mem=250G                  # total memory per node
#SBATCH --partition=gpu             # request gpu partition (queue)
#SBATCH --gres=gpu:a100:4           # request 4 GPUs
#SBATCH --output=stdout.%x.%j       # save stdout to file
#SBATCH --error=stderr.%x.%j        # save stderr to file

Grace

For a 384 GB memory Grace compute node

Up to max 7 days runtime

maxconfig

#!/bin/bash
#SBATCH --job-name=my_job           # job name
#SBATCH --time=7-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
#SBATCH --cpus-per-task=48          # CPUs (threads) per command
#SBATCH --mem=360G                  # total memory per node
#SBATCH --output=stdout.%x.%j       # save stdout to file
#SBATCH --error=stderr.%x.%j        # save stderr to file

Up to max 21 days runtime

maxconfig -p xlong

#!/bin/bash
#SBATCH --job-name=my_job           # job name
#SBATCH --time=21-00:00:00          # max job run time dd-hh:mm:ss
#SBATCH --partition=xlong           # partition used for jobs 7 to 21 days
#SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
#SBATCH --cpus-per-task=48          # CPUs (threads) per command
#SBATCH --mem=360G                  # total memory per node
#SBATCH --output=stdout.%x.%j       # save stdout to file
#SBATCH --error=stderr.%x.%j        # save stderr to file
For a 3 TB memory Grace compute node

maxconfig -p bigmem

#!/bin/bash
#SBATCH --job-name=my_job           # job name
#SBATCH --time=2-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --partition=bigmem          # use large (3TB) memory node
#SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
#SBATCH --cpus-per-task=80          # CPUs (threads) per command
#SBATCH --mem=2929G                 # total memory per node
#SBATCH --output=stdout.%x.%j       # save stdout to file
#SBATCH --error=stderr.%x.%j        # save stderr to file
For an A100 GPU 384 GB memory Grace compute node

Request 1 GPU

maxconfig -g a100 -G 1

#!/bin/bash
#SBATCH --job-name=my_job           # job name
#SBATCH --time=4-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
#SBATCH --cpus-per-task=24          # CPUs (threads) per command
#SBATCH --mem=180G                  # total memory per node
#SBATCH --gres=gpu:a100:1           # request 1 GPU
#SBATCH --output=stdout.%x.%j       # save stdout to file
#SBATCH --error=stderr.%x.%j        # save stderr to file

Request 2 GPUs

maxconfig -g a100

#!/bin/bash
#SBATCH --job-name=my_job           # job name
#SBATCH --time=4-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
#SBATCH --cpus-per-task=48          # CPUs (threads) per command
#SBATCH --mem=360G                  # total memory per node
#SBATCH --gres=gpu:a100:2           # request 2 GPUs
#SBATCH --output=stdout.%x.%j       # save stdout to file
#SBATCH --error=stderr.%x.%j        # save stderr to file

Terra

For a 64 GB memory Terra compute node

maxconfig

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --time=7-00:00:00
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=28
#SBATCH --mem=56G
#SBATCH --output=stdout.%x.%j
#SBATCH --error=stderr.%x.%j
For a 96 GB memory KNL Terra compute node

Note that although the KNL nodes have more cores, they are slower than the 64GB memory nodes.

maxconfig -p knl

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --time=7-00:00:00
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=72
#SBATCH --mem=85G
#SBATCH --partition=knl
#SBATCH --output=stdout.%x.%j
#SBATCH --error=stderr.%x.%j
For a 128 GB memory Terra compute node

maxconfig -p vnc

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --time=12:00:00
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=28
#SBATCH --mem=112G
#SBATCH --partition=vnc
#SBATCH --output=stdout.%x.%j
#SBATCH --error=stderr.%x.%j

How do I find compatible software modules?

Grace Example 1

1a. start with a new terminal session or run module purge

module purge

1b. search for a software module for stacks

module spider stacks

1c. you will see that there are multiple versions

Versions:
Stacks/1.48
Stacks/2.3e
Stacks/2.41
Stacks/2.53
Stacks/2.55

1d. run module spider on the latest version

module spider Stacks/2.55

1e. you will see that you have to load other modules prior to loading Stacks/2.55:

You will need to load all module(s) on any one of the lines below before the "Stacks/2.55" module is available to load.

GCC/9.3.0  OpenMPI/4.0.3

1f. load the two prerequisite modules then the Stacks/2.55 module

module load GCC/9.3.0 OpenMPI/4.0.3 
module load Stacks/2.55

1g. now search for available samtools modules that are compatible with the Stacks/2.55 module you just loaded

module avail samtools

1h. you will notice that there is one version available since you already loaded the prerequisite modules

----- /sw/eb/mods/all/Compiler/GCC/9.3.0 -----
SAMtools/1.10

1i. load the compatible samtools module

module load SAMtools/1.10

Grace Example 2

2a. first start with a new terminal session or run module purge

module purge

2b. search for a software module for minimap2

module spider minimap2

2c. you will see that there are multiple versions

Versions:
minimap2/2.13
minimap2/2.17
minimap2/2.18
minimap2/2.23

2d. select the version you want to use and run module spider

module spider minimap2/2.23

2e. you will see that you have to load another module prior to loading minimap2/2.23

You will need to load all module(s) on any one of the lines below before the "minimap2/2.23" module is available to load.
GCCcore/10.3.0

2f. load the GCCcore/10.3.0 module then the minimap2/2.23 module

module load GCCcore/10.3.0 minimap2/2.23

2g. now search for available samtools modules that are compatible with the minimap2/2.23 module you just loaded

module avail samtools

2h. you will notice that there are not any compatible samtools modules

No module(s) or extension(s) found!

2i. load the parent module of GCCcore/10.3.0 which is GCC/10.3.0

module load GCC/10.3.0

2j. now search again for available compatible samtools modules

module avail samtools

2k. now there is a compatible samtools module available

----- /sw/eb/mods/all/Compiler/GCC/10.3.0 -----
SAMtools/1.12

2l. load the compatible samtools module

module load SAMtools/1.12

Grace Example 3

3a. first start with a new terminal session or run module purge

module purge

3b. search for a software module for samtools

module spider samtools

3c. you will see that there are multiple versions

Versions:
SAMtools/0.1.16
SAMtools/0.1.20
SAMtools/1.9
SAMtools/1.10
SAMtools/1.11
SAMtools/1.12

3d. select the version you want to use and run module spider

module spider SAMtools/1.12

3e. you will see that you have to load another module prior to loading SAMtools/1.12

You will need to load all module(s) on any one of the lines below before the "SAMtools/1.12" module is available to load.

GCC/10.3.0

3f. load the GCC/10.3.0 module then the SAMtools/1.12 module

module load GCC/10.3.0 SAMtools/1.12

3g. now search for available stacks modules that are compatible with the SAMtools/1.12 module you just loaded

module avail stacks

3h. you will notice that there are no compatible Stacks modules

No module(s) or extension(s) found!

3i. run module spider to see what Stacks modules are on the cluster

module spider stacks

3j. you will see that there are multiple versions

Versions:
Stacks/1.48
Stacks/2.3e
Stacks/2.41
Stacks/2.53
Stacks/2.55

3k. see what needs to be loaded to use the latest version of Stacks

module spider Stacks/2.55

3l. you will see that you have to load a different module (GCC/9.3.0 instead of GCC/10.3.0) prior to loading Stacks/2.55:

You will need to load all module(s) on any one of the lines below before the "Stacks/2.55" module is available to load.

GCC/9.3.0  OpenMPI/4.0.3

3m. see what modules have already been loaded

module list

3n. you will see what dependency modules have been automatically loaded in addition to the two you previously loaded

Currently Loaded Modules:
1) GCCcore/10.3.0   3) binutils/2.36.1   5) ncurses/6.2   7) XZ/5.2.5      9) cURL/7.76.0
2) zlib/1.2.11      4) GCC/10.3.0        6) bzip2/1.0.8   8) OpenSSL/1.1  10) SAMtools/1.12

3o. notice that we have GCCcore/10.3.0 loaded but not GCC/9.3.0. If Stacks/2.55 required GCC/10.3.0, we could load GCC/10.3.0 and OpenMPI/4.1.1 and run module avail again, but since it requires GCC/9.3.0 you have a few options:

1. look to see if a newer version of Stacks is available and request that the latest version be installed to match the SAMtools GCCcore/10.3.0 toolchain
2. start by loading the Stacks module, then repeat the steps for finding compatible SAMtools modules:

module purge
module load  GCC/9.3.0  OpenMPI/4.0.3  Stacks/2.55
module avail samtools

3p. now there is a compatible samtools module available which can be loaded with Stacks/2.55

----- /sw/eb/mods/all/Compiler/GCC/9.3.0 -----
SAMtools/1.10

module load SAMtools/1.10

Should I use one compute node or multiple compute nodes?

This is highly dependent on the software. Most bioinformatics software will only require one node.
Examples of when you could use multiple compute nodes:
* The software has MPI support for running a command across multiple nodes (ABySS)
* You have hundreds or thousands of individual commands to run (TAMULauncher).
* You have multiple samples to run and can run one sample per compute node in a job array, using all cores and memory on each compute node.

Are the Terra KNL nodes as fast as the 64GB nodes?

* Even though there are 72 cores on the KNL nodes, the speed of each core is 1.5GHz while the 64GB nodes are at 2.4GHz.
* A test alignment of human sequences to GRCh38 took 5 days on a 64GB node using all 28 cores. The same job script did not complete before the KNL queue limit of 7 days using all 72 cores.

Do I use a Job Array or TAMULauncher?

TAMULauncher is good for commands that each use one or a few cores. It is also helpful when a job ends by reaching the walltime limit: TAMULauncher can restart after the last completed command, whereas a job array would have to be reconfigured so it does not redo commands that already completed. In most cases, TAMULauncher is recommended over a job array.

TAMULauncher can run one command per CPU core while a job array can only run one command per compute node.

If each command uses an entire node, you can configure your job as a job array.

If your commands use any redirection operators such as < > | << >>, then use TAMULauncher instead of a job array.

The commands.txt file you generate is the same for a TAMULauncher job and a job array.

How do I configure and run a TAMULauncher Job?

Here is an example script for TAMULauncher and a required commands file. You need a file that has one command per line. Each line may have multiple commands joined with a semicolon.

blastn -query chunk0000.fa -db /scratch/datasets/blast/nt.bacteria -task megablast -out chunk0000.fa.out -outfmt 6
blastn -query chunk0001.fa -db /scratch/datasets/blast/nt.bacteria -task megablast -out chunk0001.fa.out -outfmt 6
blastn -query chunk0002.fa -db /scratch/datasets/blast/nt.bacteria -task megablast -out chunk0002.fa.out -outfmt 6
blastn -query chunk0003.fa -db /scratch/datasets/blast/nt.bacteria -task megablast -out chunk0003.fa.out -outfmt 6

Here is an example job script for Grace. By default, tamulauncher runs one command per CPU core, but you can give each command more cores. In the following example, 5 nodes are used, each running 12 commands (tasks) at a time with 4 cores per command.

#!/bin/bash
#SBATCH --job-name=blast           # job name
#SBATCH --time=7-00:00:00          # max job run time
#SBATCH --nodes=5                  # use 5 nodes max
#SBATCH --ntasks-per-node=12       # tasks (commands) per compute node
#SBATCH --cpus-per-task=4          # CPUs (threads) per command
#SBATCH --mem=360G                 # total memory
#SBATCH --output=stdout.%x.%j      # save stdout to file
#SBATCH --error=stderr.%x.%j       # save stderr to file

module load GCC/11.3.0  OpenMPI/4.1.4 BLAST+/2.13.0

tamulauncher commands.txt

Here is an example of using multiple commands on a single line.

  • In this example, each line is run on one entire compute node (see --ntasks-per-node=1 and --cpus-per-task in the script below).
  • You can give each command more cores by limiting the number of commands per node with the --commands-pernode option.
  • Only one of many lines is shown here for readability; you would generally have many such lines in the commands.txt file.
  • Each line in the example runs multiple commands per sample which are joined with a semicolon.
  • Notice with bwa and tamulauncher you need to use \\t instead of \t for the read group value.
  • Also notice the use of the $TMPDIR for temporary files and in this example, the bam_files directory needs to be created first.
bwa mem -R "@RG\\tID:ERR551981\\tSM:282set_CML220\\tLB:CML220_HADTMADXX\\tPL:ILLUMINA" -t 40 m_tuberculosis_uid185758.fna ERR551981_pe_1_trimmo.fastq.gz ERR551981_pe_2_trimmo.fastq.gz > $TMPDIR/ERR551981.sam; samtools sort -n -O bam -T $TMPDIR/nsorted $TMPDIR/ERR551981.sam -o $TMPDIR/nsorted_ERR551981.bam ; samtools fixmate -O bam $TMPDIR/nsorted_ERR551981.bam $TMPDIR/ncleaned_ERR551981.bam; samtools sort -O bam -T $TMPDIR/csorted $TMPDIR/ncleaned_ERR551981.bam -o bam_files/coord_sortedERR551981_1.bam

#!/bin/bash
#SBATCH --job-name=bwa             # job name
#SBATCH --time=7-00:00:00          # max job run time
#SBATCH --nodes=10                 # use 10 nodes max
#SBATCH --ntasks-per-node=1        # tasks (commands) per compute node
#SBATCH --cpus-per-task=48         # CPUs (threads) per command
#SBATCH --mem=360G                 # total memory
#SBATCH --output=stdout.%x.%j      # save stdout to file
#SBATCH --error=stderr.%x.%j       # save stderr to file

# load modules for the tools used in commands.txt (BWA and SAMtools in this example); use module spider to find compatible versions
module load GCC/11.3.0  OpenMPI/4.1.4

tamulauncher --commands-pernode 1 commands.txt

How do I configure and run a Job Array?

Terra job array

Here is an example Slurm job script that runs a job array on a maximum of 10 compute nodes. It assumes a commands.txt file containing 10 or more lines, where each line is a command (or several commands joined with semicolons) and each command uses all the CPU cores on a compute node.

The sed command uses $SLURM_ARRAY_TASK_ID to select the command on the matching line number of the commands.txt file.

#!/bin/bash
#SBATCH --job-name=spades_job_array # job name
#SBATCH --time=1:00:00              # set the wall clock limit to 1 hour
#SBATCH --ntasks-per-node=1         # request 1 task (command) per node
#SBATCH --cpus-per-task=28          # request 28 cpus (cores) per task
#SBATCH --mem=56G                   # request 56GB of memory per node
#SBATCH --array=1-10                # request a job array with a max of 10 compute nodes
#SBATCH --output=stdout.%A_%a       # create file for stdout
#SBATCH --error=stderr.%A_%a        # create file for stderr

module load SPAdes/3.13.0-Linux

# run one of the 10 commands on each job array node
command=$(sed -n ${SLURM_ARRAY_TASK_ID}p commands.txt)
$command

Create a file of commands for tamulauncher or parallel

If you have hundreds of commands to run and each requires paired end files then you can use bash commands to create a file of commands to use with tamulauncher or parallel.

The main advantages of tamulauncher over parallel are that, if your job runs out of walltime, you can restart it after the last completed command, and you can check how many commands have already completed.

Outside the reads directory

If you have many files with the following names in a directory other than your working directory:

reads/SAMPLE_1_R1.fastq.gz
reads/SAMPLE_1_R2.fastq.gz
reads/SAMPLE_2_R1.fastq.gz
reads/SAMPLE_2_R2.fastq.gz
reads/SAMPLE_3_R1.fastq.gz
reads/SAMPLE_3_R2.fastq.gz
reads/SAMPLE_4_R1.fastq.gz
reads/SAMPLE_4_R2.fastq.gz

Start by selecting only R1 files in order to get the sample names.

Type in the following lines and study the results in order to see how to use the basename command and trim off parts of files in order to get sample names from the file names:

file='reads/SAMPLE_1_R1.fastq.gz'
echo ${file/_R1.fastq.gz}                 # removes _R1.fastq.gz from the end of the value in the $file variable
echo ${file/R1/R2}                        # replaces R1 with R2 in the $file variable
echo $(basename ${file/_R1.fastq.gz})     # removes directory path and removes _R1.fastq.gz from the end of the $file variable

You can capture each sample name in the directory from the first paired file and use it to create your commands for spades.py, trimmomatic, or another tool.

for sample in reads/*R1.fastq.gz; do echo $(basename ${sample/_R1.fastq.gz}); done

Then use the sample names to create a commands file

for sample in reads/*R1.fastq.gz; do echo spades.py --threads 20 --tmp-dir \$TMPDIR --careful --memory 54 -o $(basename ${sample/_R1.fastq.gz}) --pe1-1 $sample --pe1-2 ${sample/R1/R2}; done > spades_commands.txt

(Notice how using \$TMPDIR in the for loop will print $TMPDIR to the output file)

The resulting spades_commands.txt file will look like this:

spades.py --threads 20 --tmp-dir $TMPDIR --careful --memory 54 -o SAMPLE_1 --pe1-1 reads/SAMPLE_1_R1.fastq.gz --pe1-2 reads/SAMPLE_1_R2.fastq.gz
spades.py --threads 20 --tmp-dir $TMPDIR --careful --memory 54 -o SAMPLE_2 --pe1-1 reads/SAMPLE_2_R1.fastq.gz --pe1-2 reads/SAMPLE_2_R2.fastq.gz
spades.py --threads 20 --tmp-dir $TMPDIR --careful --memory 54 -o SAMPLE_3 --pe1-1 reads/SAMPLE_3_R1.fastq.gz --pe1-2 reads/SAMPLE_3_R2.fastq.gz
spades.py --threads 20 --tmp-dir $TMPDIR --careful --memory 54 -o SAMPLE_4 --pe1-1 reads/SAMPLE_4_R1.fastq.gz --pe1-2 reads/SAMPLE_4_R2.fastq.gz

Since each spades command can use 20 cores, you can run these commands as an array job instead of using TAMULauncher.
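
Following the pattern of the Terra job array example shown earlier, a minimal sketch for running the four lines of spades_commands.txt as a job array might look like this (adjust cores, memory, walltime, and the SPAdes module version for your cluster and data):

#!/bin/bash
#SBATCH --job-name=spades_array     # job name
#SBATCH --time=1-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
#SBATCH --cpus-per-task=28          # CPUs (cores) per task; the spades commands above use 20 threads
#SBATCH --mem=56G                   # total memory per node; the spades commands above use --memory 54
#SBATCH --array=1-4                 # one array task per line in spades_commands.txt
#SBATCH --output=stdout.%A_%a       # save stdout to file
#SBATCH --error=stderr.%A_%a        # save stderr to file

module load SPAdes/3.13.0-Linux

# run the command found on the line of spades_commands.txt matching this array task index
command=$(sed -n ${SLURM_ARRAY_TASK_ID}p spades_commands.txt)
$command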

Within the reads directory

This example uses a different approach for parsing file names to capture a sample name, where the part of the file name to remove differs for each sample. Suppose you have files with the following names in your current working directory and you only want to use the paired-end reads (those with 1P and 2P):

A45CEB2_S1_1P.fastq.gz
A45CEB2_S1_1U.fastq.gz
A45CEB2_S1_2P.fastq.gz
A45CEB2_S1_2U.fastq.gz
A49HEB1_S5_1P.fastq.gz
A49HEB1_S5_1U.fastq.gz
A49HEB1_S5_2P.fastq.gz
A49HEB1_S5_2U.fastq.gz

Then you can capture the sample names from the first paired file and create your commands for spades.py, trimmomatic, or another tool.

Here is a summary on using #, ##, %, %%, * to get a substring of a bash variable

${var#*SubStr}     # will drop beginning of string up to first occurrence of 'SubStr'
${var##*SubStr}    # will drop beginning of string up to last occurrence of 'SubStr'
${var%SubStr*}     # will drop part of string from last occurrence of 'SubStr' to the end
${var%%SubStr*}    # will drop part of string from first occurrence of 'SubStr' to the end

Run the following UNIX commands in the same directory as the fastq.gz files for this example.

Steps 1 and 2 are trial and error until you get the part of the filename that is the sample name.

1. get the names of pair one reads in order to obtain the sample name

for sample in *1P.fastq.gz; do echo $sample; done

OUTPUT:
A45CEB2_S1_1P.fastq.gz
A49HEB1_S5_1P.fastq.gz

2. chop off the parts of the file name in order to get the sample name

for sample in *1P.fastq.gz; do echo ${sample%_*}; done

OUTPUT:
A45CEB2_S1
A49HEB1_S5

3. test printing out the two sample names

for sample in *1P.fastq.gz; do echo "${sample%_*}_1P.fastq.gz ${sample%_*}_2P.fastq.gz"; done

OUTPUT:
A45CEB2_S1_1P.fastq.gz A45CEB2_S1_2P.fastq.gz
A49HEB1_S5_1P.fastq.gz A49HEB1_S5_2P.fastq.gz

4. create the entire spades command for each sample using what you did in step 3

for sample in *1P.fastq.gz; do echo "spades.py --careful --memory 24 --threads 1 -1 \$SCRATCH/data/${sample%_*}_1P.fastq.gz -2 \$SCRATCH/data/${sample%_*}_2P.fastq.gz -o spades_${sample%_*}"; done > ../spades_commands.txt

The resulting spades_commands.txt file will look like this:

spades.py --careful --memory 24 --threads 1 -1 $SCRATCH/data/A45CEB2_S1_1P.fastq.gz -2 $SCRATCH/data/A45CEB2_S1_2P.fastq.gz -o spades_A45CEB2_S1
spades.py --careful --memory 24 --threads 1 -1 $SCRATCH/data/A49HEB1_S5_1P.fastq.gz -2 $SCRATCH/data/A49HEB1_S5_2P.fastq.gz -o spades_A49HEB1_S5

Are there any example job scripts for bioinformatics tools?

We have many example job scripts available for various bioinformatics tasks in the GCATemplates module.
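
For example, on a cluster where this module is installed, you can locate and load it with the usual module commands (a sketch; check module spider for the exact module name and versions on your cluster):

module spider GCATemplates
module load GCATemplates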

How much walltime (#SBATCH --time) should I use?

If you are unsure how much walltime to specify, request the maximum allowed by the batch queue. When the job is complete, you can see how long it took to run with your data using the 'seff jobid' command and adjust similar job scripts accordingly. For example, if a job you reserved for 7 days (#SBATCH --time=7-00:00:00) finishes in 2 days, you can schedule similar jobs for 3 days to allow some extra time if needed.

How much memory did my job use and how long did it run?

You can use the jobstats command to create graphs of job resource usage: CPU % load, memory, GPU % load, and GPU memory used.

Example usage in a job script:

jobstats &

my_job_command
more_job_commands

jobstats

You can also run the 'seff jobid' command and check memory efficiency and memory utilized

seff 3402830

Job ID: 3402830
Cluster: terra
User/Group: netid/netid
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 28
CPU Utilized: 05:15:41
CPU Efficiency: 10.91% of 2-00:12:52 core-walltime
Job Wall-clock time: 01:43:19
Memory Utilized: 6.77 GB
Memory Efficiency: 12.53% of 54.00 GB

You can also see the total runtime and SUs charged per job with the myproject command:

myproject -j all

Running

Monitor resource usage for a running job

Use the jobstats job monitoring tool to see CPU usage, GPU usage and $TMPDIR I/O stats.

R code

Use the Rscript command to run R code (mycommands.R in the following example) in a job script. The job script and your R commands file should be in the same directory.

Example usage:

module load GCC/11.2.0  OpenMPI/4.1.1 R_tamu/4.2.0

Rscript mycommands.R
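
A minimal sketch of a complete job script wrapping these commands, reusing the module versions shown above (adjust walltime, cores, and memory for your analysis):

#!/bin/bash
#SBATCH --job-name=my_r_job         # job name
#SBATCH --time=1-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
#SBATCH --cpus-per-task=1           # CPUs (threads) per command
#SBATCH --mem=7G                    # total memory per node
#SBATCH --output=stdout.%x.%j       # save stdout to file
#SBATCH --error=stderr.%x.%j        # save stderr to file

module load GCC/11.2.0  OpenMPI/4.1.1 R_tamu/4.2.0

Rscript mycommands.R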

Debugging

How do I know if my job completed successfully?

Successfully completed

Look for the line in the 'seff jobid' command output that says 'State: COMPLETED (exit code 0)'. However, this only means that the job script finished without errors; you should still check the format of any output files.

Insufficient walltime

In the following example, the job ran out of walltime and used 32% of the requested 180GB of memory:

Job ID: 1234567
Cluster: grace
User/Group: netid/netid
State: TIMEOUT (exit code 0)
Nodes: 1
Cores per node: 24
CPU Utilized: 05:56:02
CPU Efficiency: 2.06% of 12-00:09:12 core-walltime
Job Wall-clock time: 12:00:23
Memory Utilized: 57.85 GB
Memory Efficiency: 32.14% of 180.00 GB

Insufficient memory

If your job failed and you see seff output like the following, then increase the amount of requested memory in the #SBATCH parameters and run the job again.

Memory Utilized: 179.85 GB
Memory Efficiency: 99.91% of 180.00 GB

system terminated job

If you see the following line, then you may have reached your disk or file quota.

The cluster DRM system terminated this job

Run the showquota command to see your quotas and send a request to the HPRC helpdesk if you need to increase your quotas.

showquota

Remember to delete any unwanted or temporary files, or tar.gz a project when it is complete, in order to free up disk space and file counts.
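
For example, to compress a completed project directory into a single archive file (my_project is a hypothetical directory name):

tar -czf my_project.tar.gz my_project     # create a compressed archive of the project directory
tar -tzf my_project.tar.gz | head         # verify the archive contents before removing the original directory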

Interactive Jobs

srun

You can run an interactive job on a compute node, started from the command line on a login node. The job will end once you exit, close the terminal, or reach the walltime limit.

Use srun to start an interactive job. Once the job starts, the prompt will change to the compute node name like c697

srun --time=1-00:00:00 --mem=7G --ntasks=1 --cpus-per-task=1 --pty bash

Increase the mem and cpus-per-task values as needed.

VNC

The VNC portal app will run an interactive job that you can log out of and come back to later to continue where you left off.

See the HPRC video on how to launch a VNC job.

Singularity sandbox

You can build a sandbox directory from a .sif file. The sandbox directory will allow you to install/rebuild software that doesn't require root privileges. Create the sandbox somewhere in your $SCRATCH directory where you have at least 60,000 files available in your scratch file quota.

1). start a Slurm interactive session which will create a job and launch a terminal on a compute node

srun --time=1-00:00:00 --mem=60G --ntasks=1 --cpus-per-task=8 --pty bash

2). during the interactive session on the compute node, create a sandbox directory from a .sif file.

cd $SCRATCH
singularity build --sandbox my_gem5_sandbox_dir /sw/hprc/sw/containers/gem5-v21.2.1.0-04292022.sif

3). start a singularity interactive shell in order to edit files and rebuild software

singularity shell --writable my_gem5_sandbox_dir

Optional: you can mount a directory in your $SCRATCH to a directory inside the image using the -B option

singularity shell -B $SCRATCH/my_gem5_build:/mnt --writable my_gem5_sandbox_dir

4). once at the Singularity> prompt, go to the directory where the software is installed

Singularity> cd /opt/gem5

5). edit files using the vi editor

Singularity> vi src/SConscript

6). rebuild the software from the software directory (/opt/gem5) using the number of cores you used when launching the job

Singularity> scons build/X86/gem5.opt -j 8

7). exit the singularity prompt to get back to the compute node command line

Singularity> exit

8). build your new singularity image

singularity build my_gem5.sif my_gem5_sandbox_dir

You can add the date to the singularity .sif file name

singularity build my_gem5-$(date +%m%d%Y).sif my_gem5_sandbox_dir

9). exit the Slurm interactive job to return to the login node

exit

Bioinformatics Software

Is this a good bio software to use?

reliability

Software is not 100% reliable just because it is featured in a publication and has a github.com repository.

Review the software's GitHub repository or webpage to see when it was last updated. If it hasn't been updated in 5 years, it probably has not kept up with advances in genome sequencing and you should explore other options.

Always review the output files for correct format, including making sure headers match the correct columns. Even after software has undergone numerous bug fixes and years of support, you should still review output files for correctness, since new features are continually added and can introduce bugs in existing features.

Review publications that have used the software in their analysis.

licensing

Check the license to see if it is a commercial software or open source.

See a list of academic license only tools on HPRC.

support

Check google groups for software support groups to see how active the bioinformatics community is with the software.

Check the github page issues tab to see how responsive the developers are to user requests.

If you find or suspect a bug, review it with your colleagues if possible to help rule out an oversight on your part, and submit a bug report if it persists.

Developing bio software

In addition to the GNU coding standards, here are some useful guidelines to follow when developing bio software, for the sake of those who will install and use it.

Some of these are headaches for users and others are headaches for installers:

  1. Provide an option to show help info (-h and --help) and version info (--version); see the sketch after this list.

  2. Allow user to specify a temp directory if many temporary or intermediate files are generated

  3. Allow the user to specify a config file as an option if a config file is required. The user can then copy the main config file and point to their copy with a command option.

  4. Provide a small sample input data set with expected output files to test complex installations

  5. Provide an option to specify an output file prefix instead of only writing to stdout.

  6. Use - for single-character options and -- for whole-word options, such as -h and --help. Headache example: blasr started with -option and changed to --option in newer versions, causing software tools that depended on blasr to fail.

  7. Spend a little extra time developing your code to allow compressed files as input

  8. Provide released versions of your code instead of relying on the latest main git commit or a generic name without a version number. This helps when installing and updating software with automated install tools.

  9. Set file permissions as executable by everyone for dependencies, precompiled binaries, and scripts, or let the user know that this needs to be done to complete the installation.
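
As a brief sketch of items 1, 2, and 6 (a hypothetical tool; the option names are examples, not a required interface), a command-line wrapper might handle its options like this:

#!/bin/bash
# my_tool.sh - hypothetical example showing -h/--help, --version, and a user-selectable temp directory
VERSION="1.0.0"
TMP_DIR="${TMPDIR:-/tmp}"                 # default temp directory, overridable with --tmp-dir

usage() {
    echo "Usage: my_tool.sh [-h|--help] [--version] [--tmp-dir DIR] input.fastq.gz"
}

while [[ $# -gt 0 ]]; do
    case "$1" in
        -h|--help)  usage; exit 0 ;;
        --version)  echo "my_tool.sh $VERSION"; exit 0 ;;
        --tmp-dir)  TMP_DIR="$2"; shift 2 ;;
        *)          INPUT="$1"; shift ;;
    esac
done

echo "Processing ${INPUT:?no input file given} using temp directory $TMP_DIR"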
