Sequence Alignments

BLAST & BLAST+

module spider BLAST

or

module spider BLAST+

BLAT

GCATemplates available: no

module spider BLAT

BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.
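A minimal sketch of a DNA-to-DNA BLAT run, assuming the BLAT module is already loaded. The file names are hypothetical placeholders, and the -minIdentity value is only an illustrative choice that mirrors the 95% figure above; check the blat usage message for the installed version before relying on it.

```shell
# Hypothetical file names; replace with your own.
DATABASE=reference.fa   # sequences to search against
QUERY=query.fa          # sequences to align
OUTPUT=output.psl       # BLAT writes PSL format by default

if command -v blat >/dev/null 2>&1 && [ -f "$DATABASE" ] && [ -f "$QUERY" ]; then
    # Argument order is: database, query, output.
    blat -minIdentity=95 "$DATABASE" "$QUERY" "$OUTPUT"
else
    echo "blat or input files not found; load the module and check the paths" >&2
fi
```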

pblat-cluster

GCATemplates available: no

module spider pblat

This program is useful when you align a large query file against a very large reference, such as the whole human genome sequence.

The program is based on the original blat program which was written by Jim Kent.

For small genomes, alignments will most likely be faster with regular BLAT, even for ~50,000 query sequences.

Exonerate

GCATemplates available: no

Exonerate homepage

module spider Exonerate

Exonerate is a generic tool for pairwise sequence comparison.

mrfast

GCATemplates available: no

mrfast homepage

module spider mrfast

mrFAST is a read mapper designed to map short reads to a reference genome, with a special emphasis on the discovery of structural variation and segmental duplications.

mrFAST maps short reads with respect to a user-defined error threshold, including indels up to 4+4 bp.

NOTE: mrFAST was developed for Illumina data and therefore requires all reads to be the same length.

For paired-end reads, lengths of mates may be different from each other, but each "side" should have a uniform length.
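Because of the uniform-length requirement, it can help to check read lengths before mapping. The helper below is a sketch that is not part of mrFAST; the function name and file names are hypothetical, and it assumes an uncompressed FASTQ with exactly four lines per record.

```shell
# Print each distinct read length in a FASTQ file with its read count.
# Assumes 4 lines per record (no wrapped sequence lines).
fastq_read_lengths() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

# Usage (hypothetical file names), one call per side of a paired-end run:
#   fastq_read_lengths reads_R1.fastq
#   fastq_read_lengths reads_R2.fastq
```

A single output line for a file means every read in it has the same length; more than one line means the reads would need trimming or filtering first.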

DNA-seq

bwa

GCATemplates

Grace (pe)

bwa homepage

module spider BWA

Genome indexes for bwa can be found in the ucsc directory for each organism:

/scratch/data/bio/genome_indexes/ucsc/hg38/bwa/hg38.fa

Indexes created with bwa version 0.7.12 will also work with later bwa versions (0.7.12+).
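As a sketch, the shared hg38 index can be passed directly to bwa mem as the reference prefix. The read file names, thread count, and output name are hypothetical, and the example assumes the BWA module is already loaded.

```shell
# Shared hg38 index prefix from the path listed above.
BWA_INDEX=/scratch/data/bio/genome_indexes/ucsc/hg38/bwa/hg38.fa
# Hypothetical paired-end read files and thread count.
R1=sample_R1.fastq.gz
R2=sample_R2.fastq.gz
THREADS=4

if command -v bwa >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ]; then
    # bwa mem writes SAM to standard output.
    bwa mem -t "$THREADS" "$BWA_INDEX" "$R1" "$R2" > sample.sam
else
    echo "bwa or read files not found; load the module and check the paths" >&2
fi
```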

Please contact the HPRC helpdesk (help@sc.tamu.edu) if you need additional genome indexes.

GSNAP

GCATemplates available: no

GSNAP homepage

module spider GMAP-GSNAP

GSNAP: Genomic Short-read Nucleotide Alignment Program

SNAP

GCATemplates available: no

SNAP homepage

module spider SNAP

SNAP is a new sequence aligner that is 3-20x faster and just as accurate as existing tools like BWA-mem, Bowtie2 and Novoalign.

SpeedSeq

GCATemplates available: no

SpeedSeq homepage

module spider SpeedSeq

A flexible framework for rapid genome analysis and interpretation.

Use the following commands to download your own copy of a VEP build (canis_familiaris_vep is used as an example):

mkdir $SCRATCH/vep_cache
$EBROOTVEP/INSTALL.pl --AUTO c --CACHEDIR $SCRATCH/vep_cache --SPECIES canis_familiaris_vep

If you do not want to use your file and disk quota to download genome builds, there are some builds available at the following directory:

/scratch/data/bio/vep/vep_cache

Send a message to the HPRC helpdesk if you would like additional genome builds added to this shared location.

speedseq align uses a default of 20GB of memory for sambamba sorting. Other processes run concurrently with the sambamba sort and also need memory, so request more than 20GB for the job. It also helps to use the $TMPDIR variable in your job script as the temporary files directory: speedseq align -T $TMPDIR
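The sketch below shows one way to pass a temporary directory to speedseq align. The input names, read group string, thread count, and output prefix are hypothetical. The -M option (sort memory in GB) is included from memory of the speedseq align usage message and should be treated as an assumption to confirm with the installed version. A job-specific subdirectory is used so that speedseq temporary files stay separate from anything else in $TMPDIR.

```shell
# Hypothetical inputs; the reference needs a bwa index alongside it.
REF=reference.fa
R1=sample_R1.fastq.gz
R2=sample_R2.fastq.gz
# Job-specific temp directory; falls back to /tmp if TMPDIR is unset.
TMP_WORK=${TMPDIR:-/tmp}/speedseq_$$
SORT_MEM_GB=20   # sambamba sort memory in GB (20 is the default noted above)

if command -v speedseq >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ]; then
    mkdir -p "$TMP_WORK"
    speedseq align \
        -R "@RG\tID:sample\tSM:sample\tLB:lib1" \
        -t 4 \
        -M "$SORT_MEM_GB" \
        -T "$TMP_WORK" \
        -o sample \
        "$REF" "$R1" "$R2"
else
    echo "speedseq or read files not found; load the module and check the paths" >&2
fi
```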

BFAST

GCATemplates available: no

BFAST homepage

module spider BFAST

BFAST facilitates the fast and accurate mapping of short reads to reference sequences.

Stampy

GCATemplates available: no

Stampy homepage

module spider Stampy

Stampy is a package for the mapping of short reads from Illumina sequencing machines onto a reference genome.

It's recommended for most workflows, including those for genomic resequencing, RNA-Seq and ChIP-seq.

LAST

GCATemplates available: no

LAST homepage

module spider LAST

LAST finds similar regions between sequences. LAST copes more efficiently with repeat-rich sequences (e.g. genomes).

For example, it can align reads to genomes without repeat-masking and without becoming overwhelmed by repetitive hits.

RNA-seq

Bowtie & Bowtie2

GCATemplates

Bowtie homepage

module spider Bowtie

Genome indexes for bowtie can be found in the ucsc directory for each organism:

/scratch/data/bio/genome_indexes/ucsc/hg38/bowtie/hg38.fa

Bowtie2 homepage

module spider Bowtie2

Genome indexes for bowtie2 can be found in the ucsc directory for each organism:

/scratch/data/bio/genome_indexes/ucsc/hg38/bowtie2/hg38.fa

Bowtie indexes will not work with the Bowtie2 aligner.

Bowtie2 indexes will not work with the Bowtie aligner.
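Since the two index formats are not interchangeable, a quick check for .bt2 files before aligning can catch a wrong path early. This is a sketch that assumes the Bowtie2 index basename matches the path listed above (list the directory to confirm) and uses hypothetical read and output file names.

```shell
# Bowtie2 index basename; assumed to match the path listed above.
BT2_INDEX=/scratch/data/bio/genome_indexes/ucsc/hg38/bowtie2/hg38.fa
# Hypothetical paired-end read files and thread count.
R1=sample_R1.fastq.gz
R2=sample_R2.fastq.gz
THREADS=4

# Bowtie2 index files end in .bt2 (or .bt2l); Bowtie index files end in .ebwt.
if ! ls "$BT2_INDEX".*.bt2* >/dev/null 2>&1; then
    echo "no .bt2 index files found for $BT2_INDEX; check the basename" >&2
elif command -v bowtie2 >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ]; then
    bowtie2 -p "$THREADS" -x "$BT2_INDEX" -1 "$R1" -2 "$R2" -S sample.sam
else
    echo "bowtie2 or read files not found; load the module and check the paths" >&2
fi
```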

Please contact the HPRC helpdesk (help@sc.tamu.edu) if you need additional genome indexes.

TopHat & TopHat2

GCATemplates available: no

TopHat homepage

module spider TopHat  
module spider TopHat2

TopHat is a fast splice junction mapper for RNA-Seq reads.

It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

Cufflinks

GCATemplates available: no

Cufflinks homepage

module spider Cufflinks

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression (cuffdiff) and regulation in RNA-Seq samples.

Cufflinks includes a program, “cuffdiff”, that you can use to find significant changes in transcript expression, splicing, and promoter use.

STAR

GCATemplates available: no

STAR homepage

module spider STAR

STAR is an ultrafast universal RNA-seq aligner.

Optimizing STAR for PacBio long reads tutorial

GMAP

GCATemplates available: no

GMAP homepage

module spider GMAP-GSNAP

GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences.

HISAT2

GCATemplates available: no

HISAT2 homepage

module spider HISAT2

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).

If you will be using Cufflinks downstream, run hisat2 with the --dta-cufflinks option.

The hisat2 index for hg38 is found here:

/scratch/data/bio/genome_indexes/ucsc/hg38/hisat2/
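A minimal sketch of a paired-end hisat2 run against the shared index with Cufflinks-compatible output. Only the index directory is given above, so the example derives the index basename from the first *.1.ht2 file it finds there; that lookup, the read file names, and the thread count are assumptions for illustration (large indexes use a .ht2l suffix instead and would need the pattern adjusted).

```shell
# Shared hisat2 index directory from the path listed above.
HISAT2_DIR=/scratch/data/bio/genome_indexes/ucsc/hg38/hisat2
# Hypothetical paired-end read files.
R1=sample_R1.fastq.gz
R2=sample_R2.fastq.gz

# Derive the index basename by stripping the .1.ht2 suffix from the first match.
HISAT2_INDEX=
for f in "$HISAT2_DIR"/*.1.ht2; do
    if [ -e "$f" ]; then
        HISAT2_INDEX=${f%.1.ht2}
    fi
    break
done

if [ -n "$HISAT2_INDEX" ] && command -v hisat2 >/dev/null 2>&1 \
        && [ -f "$R1" ] && [ -f "$R2" ]; then
    # --dta-cufflinks tailors the alignments for Cufflinks downstream.
    hisat2 -p 4 --dta-cufflinks -x "$HISAT2_INDEX" -1 "$R1" -2 "$R2" -S sample.sam
else
    echo "hisat2, index, or read files not found; check the module and paths" >&2
fi
```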

Subread

GCATemplates available: no

Subread homepage

module spider Subread

Subread: an accurate and efficient aligner for mapping both genomic DNA-seq reads and RNA-seq reads (for the purpose of expression analysis).