Clara Parabricks

GCATemplates available: no

Clara Parabricks homepage

module spider parabricks

Clara Parabricks is a software suite for secondary analysis of next-generation sequencing (NGS) DNA and RNA data. The software incorporated into Clara Parabricks has been optimized to run on GPUs, drastically increasing the speed at which common NGS analyses and tasks can be completed.

Example 1: Complete germline analysis

#!/bin/bash
#SBATCH --job-name=parabricks-germline      # set the job name to "parabricks-germline"
#SBATCH --time=02:00:00                     # set the wall clock limit to 2 hours
#SBATCH --ntasks-per-node=1                 # request one task per node
#SBATCH --cpus-per-task=48                  # request 48 cpus per task
#SBATCH --mem=360G                          # request 360G of memory
#SBATCH --output=%x.%j.stdout               # set standard output to write to <jobname>.<jobID>.stdout
#SBATCH --error=%x.%j.stderr                # set standard error to write to <jobname>.<jobID>.stderr
#SBATCH --partition=gpu                     # request the gpu partition
#SBATCH --gres=gpu:a100:2                   # request two A100 GPUs

# environment setup
module purge                                # ensure the working environment is clean
module load parabricks/4.0.0                # load the parabricks module

# run the parabricks germline command
pbrun germline --ref /path/to/indexed/genome.fasta \
   --in-fq /path/to/forward_reads.fq /path/to/reverse_reads.fq \
   --knownSites /path/to/known/variants.vcf.gz \
   --out-bam output.bam \
   --out-variants output.vcf \
   --out-recal-file output.txt \
   --num-gpus 2
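
The germline pipeline expects the reference FASTA to already be indexed and the known-sites VCF to be compressed and indexed. Below is a minimal preparation sketch, assuming bwa, samtools, gatk, and htslib are available as modules (the module names and file paths are illustrative; confirm them with module spider):

module load bwa samtools gatk htslib            # hypothetical module names
bwa index genome.fasta                          # BWA index (.amb/.ann/.bwt/.pac/.sa)
samtools faidx genome.fasta                     # FASTA index (.fai)
gatk CreateSequenceDictionary -R genome.fasta   # sequence dictionary (.dict)
tabix -p vcf variants.vcf.gz                    # index for the known-sites VCF (.tbi)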

Example 2: Aligning paired-end RNA-seq data to a genome

Before running the script, index the genome with STAR 2.7.2b:

module spider STAR/2.7.2b
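
A minimal indexing sketch, assuming the reference FASTA and a matching GTF annotation are on hand (the paths, overhang, and thread count are illustrative):

module load STAR/2.7.2b                         # confirm the exact module name with module spider
STAR --runMode genomeGenerate \
     --genomeDir /path/to/indexed/genome/directory \
     --genomeFastaFiles /path/to/indexed/genome.fasta \
     --sjdbGTFfile /path/to/annotation.gtf \
     --sjdbOverhang 100 \
     --runThreadN 8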

Then run the job script:

#!/bin/bash
#SBATCH --job-name=parabricks-rna           # set the job name to "parabricks-rna"
#SBATCH --time=01:00:00                     # set the wall clock limit to 1 hour
#SBATCH --ntasks-per-node=1                 # request one task per node
#SBATCH --cpus-per-task=48                  # request 48 cpus per task
#SBATCH --mem=360G                          # request 360G of memory
#SBATCH --output=%x.%j.stdout               # set standard output to write to <jobname>.<jobID>.stdout
#SBATCH --error=%x.%j.stderr                # set standard error to write to <jobname>.<jobID>.stderr
#SBATCH --partition=gpu                     # request the gpu partition
#SBATCH --gres=gpu:t4:2                     # request two T4 GPUs

# environment setup
module purge                                # ensure the working environment is clean
module load parabricks/4.0.0                # load the parabricks module

pbrun rna_fq2bam \
   --ref /path/to/indexed/genome.fasta \
   --genome-lib-dir /path/to/indexed/genome/directory \
   --output-dir ./ \
   --in-fq /path/to/forward_reads.fq /path/to/reverse_reads.fq \
   --out-bam output.bam \
   --out-prefix output \
   --num-gpus 2
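
Save the script to a file and submit it with sbatch (the filename here is illustrative):

sbatch parabricks-rna.slurm                     # submit the job
squeue -u $USER                                 # check its status in the queue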

Example 3: Running fq2bam with Charliecloud and jobstats

Running with a Charliecloud container allows you to control which version of Parabricks you run by specifying the image tag you want to pull from NVIDIA.

Jobstats lets you monitor the CPU and GPU usage of a job. For more information, run jobstats in a terminal while logged in to Grace or FASTER.

#!/bin/bash
#SBATCH --job-name=parabricks-fq2bam-charliecloud       # set the job name to "parabricks-fq2bam-charliecloud"
#SBATCH --time=02:00:00                             # set the wall clock limit to 2 hours
#SBATCH --ntasks-per-node=1                         # request one task per node
#SBATCH --cpus-per-task=48                          # request 48 cpus per task
#SBATCH --mem=240G                                  # request 240G of memory
#SBATCH --output=%x.%j.stdout                       # set standard output to write to <jobname>.<jobID>.stdout
#SBATCH --error=%x.%j.stderr                        # set standard error to write to <jobname>.<jobID>.stderr
#SBATCH --partition=gpu                             # request the gpu partition
#SBATCH --gres=gpu:t4:4                             # request four T4 GPUs

# environment setup
module purge                                        # ensure the working environment is clean
module load charliecloud/0.31               # load the charliecloud module
module load nvidia-container-cli/1.11.0-hprc        # load the nvidia-container-cli module to access necessary libraries
module load WebProxy                    # load the WebProxy module for internet access to download parabricks image


##### Building the Charliecloud Clara Parabricks Image #####################################
####
### These steps only need to be done once, to create the image.
### Once the parabricks SquashFS file is generated, it can be run directly with Charliecloud.

# Grab the image from NVIDIA using Charliecloud
ch-image pull nvcr.io/nvidia/clara/clara-parabricks:4.0.1-1 parabricks-4.0.1-1

# Convert the image to a directory stored in $TMPDIR
ch-convert parabricks-4.0.1-1 $TMPDIR/parabricks-4.0.1-1

# Inject the necessary NVIDIA libraries (needed to run charliecloud on GPUs)
ch-fromhost --nvidia $TMPDIR/parabricks-4.0.1-1

# Convert the image to a SquashFS file
ch-convert $TMPDIR/parabricks-4.0.1-1 parabricks-4.0.1-1.sqfs

############################################################################################# 
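
# Optional sanity check (a sketch; "pbrun version" is assumed to print the
# Parabricks version): confirm the image runs under Charliecloud before
# starting the real analysis.
ch-run parabricks-4.0.1-1.sqfs -- pbrun version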


# Start jobstats to monitor GPU usage
jobstats &

# Run Parabricks using Charliecloud
# The reference genome should be in the working directory or a subdirectory of it
echo $PWD
ch-run -b "$PWD:/mnt/1" -c /mnt/1 parabricks-4.0.1-1.sqfs -- pbrun fq2bam \
    --ref Homo_sapiens_assembly38.fasta \
    --in-fq sample_1.fastq.gz sample_2.fastq.gz \
    --out-bam test.bam \
    --num-gpus $SLURM_GPUS_ON_NODE

# End jobstats
jobstats
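
After the job completes, the output BAM can be sanity-checked, assuming samtools is available as a module (the module name is illustrative; check module spider samtools):

module load samtools                            # hypothetical module name
samtools quickcheck test.bam && echo "BAM OK"   # non-zero exit if the BAM is truncated or malformed
samtools flagstat test.bam                      # quick alignment summary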