Sequence Alignments
BLAST & BLAST+
GCATemplates available: no
BLAT
GCATemplates available: no
BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.
pblat-cluster
GCATemplates available: no
This program is useful when you BLAT a large query file against a very large reference, such as the human whole-genome sequence.
The program is based on the original blat program which was written by Jim Kent.
For small genomes, alignments will most likely be faster with regular BLAT, even for ~50,000 sequences.
Exonerate
GCATemplates available: no
Exonerate homepage
Exonerate is a generic tool for pairwise sequence comparison.
mrfast
GCATemplates available: no
mrfast homepage
mrFAST is a read mapper designed to map short reads to a reference genome, with a special emphasis on the discovery of structural variation and segmental duplications.
mrFAST maps short reads with respect to a user-defined error threshold, including indels of up to 4+4 bp.
NOTE: mrFAST was developed for Illumina reads and therefore requires all reads to be the same length.
For paired-end reads, lengths of mates may be different from each other, but each "side" should have a uniform length.
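Because of this uniform-length requirement, it can help to check a FASTQ file before mapping. The following is a minimal sketch, not part of mrFAST itself: the function names `read_lengths` and `uniform_length` are hypothetical, and the code assumes an uncompressed FASTQ file with exactly four lines per record (no wrapped sequence lines).

```shell
#!/usr/bin/env bash

# Print each distinct read length in a FASTQ file with its read count,
# as "length count" pairs sorted by length.
# Assumes exactly four lines per record, so line 2 of each record is the sequence.
read_lengths() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# Return success (exit status 0) only if all reads share a single length.
uniform_length() {
    awk 'NR % 4 == 2 { seen[length($0)] = 1 }
         END { n = 0; for (l in seen) n++; exit (n == 1 ? 0 : 1) }' "$1"
}
```

For paired-end data, the check would be run on each mate file separately, for example `uniform_length reads_1.fastq && uniform_length reads_2.fastq` (file names are placeholders). If a file fails, `read_lengths` shows which lengths are present.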
DNA-seq
bwa
bwa homepage
Genome indexes for bwa can be found in the ucsc directory for each organism:
Indexes created with bwa version 0.7.12 will work with bwa version 0.7.12 and later.
Please contact the HPRC helpdesk (help@sc.tamu.edu) if you need additional genome indexes.
GSNAP
GCATemplates available: no
GSNAP homepage
GSNAP: Genomic Short-read Nucleotide Alignment Program
SNAP
GCATemplates available: no
SNAP homepage
SNAP is a new sequence aligner that is 3-20x faster and just as accurate as existing tools like BWA-mem, Bowtie2 and Novoalign.
SpeedSeq
GCATemplates available: no
SpeedSeq homepage
A flexible framework for rapid genome analysis and interpretation.
Use the following commands to download your own copy of a VEP build (canis_familiaris_vep is used as an example):
mkdir $SCRATCH/vep_cache
$EBROOTVEP/INSTALL.pl --AUTO c --CACHEDIR $SCRATCH/vep_cache --SPECIES canis_familiaris_vep
If you do not want to use your file and disk quota to download genome builds, there are some builds available at the following directory:
Send a message to the HPRC helpdesk if you would like additional genome builds added to this shared location.
speedseq align uses a default of 20GB of memory for sambamba sorting. Other processes run concurrently with the sambamba sort and also need memory, so request more than 20GB for the job. It also helps to use the $TMPDIR variable in your job script as the temporary files directory: speedseq align -T $TMPDIR
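The memory advice above can be sketched as a small helper that derives the sort memory from the total memory requested for the job. This is an illustration under stated assumptions: the function name `sort_mem_gb` is hypothetical, and the default 8GB headroom for the other concurrent processes is an arbitrary placeholder, not a measured value. The commented invocation assumes the `-M` (sort memory in GB), `-T` (temporary directory), `-t` (threads), `-o` (output prefix) and `-R` (read group) options of `speedseq align`; check `speedseq align` usage output on your system before relying on them.

```shell
#!/usr/bin/env bash

# Given the total memory requested for the job (GB), print how much to give
# the sort step, leaving headroom for the other concurrent processes.
# The 8GB default headroom is an assumption chosen for illustration only.
sort_mem_gb() {
    local total_gb=$1 headroom_gb=${2:-8}
    local m=$(( total_gb - headroom_gb ))
    [ "$m" -ge 1 ] || m=1   # never go below 1GB
    echo "$m"
}

# Hypothetical use inside a job script that requested 56GB in total
# (sample, reference and read file names are placeholders):
#
# speedseq align -t 8 -M "$(sort_mem_gb 56)" -T "$TMPDIR" -o sample \
#     -R "@RG\tID:sample\tSM:sample\tLB:lib1" \
#     reference.fa reads_1.fastq reads_2.fastq
```

The speedseq command is left commented out so the helper can be tried on its own; the point is only that the sort memory should be smaller than the job's total request.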
BFAST
GCATemplates available: no
BFAST homepage
BFAST facilitates the fast and accurate mapping of short reads to reference sequences.
Stampy
GCATemplates available: no
Stampy homepage
Stampy is a package for mapping short reads from Illumina sequencing machines onto a reference genome.
It is recommended for most workflows, including those for genomic resequencing, RNA-Seq and ChIP-seq.
LAST
GCATemplates available: no
LAST homepage
LAST finds similar regions between sequences. LAST copes more efficiently with repeat-rich sequences (e.g. genomes).
For example, it can align reads to genomes without repeat-masking and without becoming overwhelmed by repetitive hits.
RNA-seq
Bowtie & Bowtie2
Bowtie homepage
Genome indexes for bowtie can be found in the ucsc directory for each organism:
Bowtie2 homepage
Genome indexes for bowtie2 can be found in the ucsc directory for each organism:
Bowtie indexes will not work with the Bowtie2 aligner.
Bowtie2 indexes will not work with the Bowtie aligner.
Please contact the HPRC helpdesk (help@sc.tamu.edu) if you need additional genome indexes.
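Since Bowtie and Bowtie2 indexes are not interchangeable, it can be useful to check which kind an index prefix refers to before launching a job. The sketch below relies on the file extensions the two tools write: `.ebwt` (or `.ebwtl` for large indexes) for Bowtie, and `.bt2` (or `.bt2l` for large indexes) for Bowtie2. The function name `index_type` is hypothetical.

```shell
#!/usr/bin/env bash

# Report which aligner an index prefix belongs to, based on file extension.
# Prints "bowtie2", "bowtie", or "unknown" (and returns 1 when unknown).
index_type() {
    local prefix=$1
    if [ -e "$prefix.1.bt2" ] || [ -e "$prefix.1.bt2l" ]; then
        echo bowtie2
    elif [ -e "$prefix.1.ebwt" ] || [ -e "$prefix.1.ebwtl" ]; then
        echo bowtie
    else
        echo unknown
        return 1
    fi
}
```

The argument is the index prefix (the path without the `.1.bt2` or `.1.ebwt` suffix), which is also what the aligners themselves expect.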
Tophat & Tophat2
GCATemplates available: no
TopHat homepage
TopHat is a fast splice junction mapper for RNA-Seq reads.
It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
Cufflinks
GCATemplates available: no
Cufflinks homepage
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression (cuffdiff) and regulation in RNA-Seq samples.
Cufflinks includes a program, “cuffdiff”, that you can use to find significant changes in transcript expression, splicing, and promoter use.
STAR
GCATemplates available: no
STAR homepage
STAR is an ultrafast universal RNA-seq aligner.
Optimizing STAR for PacBio long reads tutorial
GMAP
GCATemplates available: no
GMAP homepage
GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences.
HISAT2
GCATemplates available: no
HISAT2 homepage
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).
If you will be using Cufflinks downstream, run hisat2 with the --dta-cufflinks option.
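A paired-end run with this option might look like the wrapper below. It is a sketch only: the function name `hisat2_for_cufflinks` and the `THREADS` variable are hypothetical, and the index prefix and file names in the usage note are placeholders. The hisat2 options used are `--dta-cufflinks`, `-p` (threads), `-x` (index prefix), `-1`/`-2` (mate files) and `-S` (SAM output).

```shell
#!/usr/bin/env bash

# Align paired-end reads with HISAT2, reporting alignments tailored for Cufflinks.
# Arguments: index prefix, mate 1 file, mate 2 file, output SAM file.
# THREADS is an optional environment variable (defaults to 4 here).
hisat2_for_cufflinks() {
    local index=$1 r1=$2 r2=$3 out=$4
    hisat2 --dta-cufflinks -p "${THREADS:-4}" \
        -x "$index" -1 "$r1" -2 "$r2" -S "$out"
}
```

A call might look like `hisat2_for_cufflinks genome_index reads_1.fastq reads_2.fastq sample.sam`. Cufflinks expects coordinate-sorted alignments, so the SAM output would typically be sorted (for example with samtools sort) before being passed on.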
The hisat2 index for hg38 is found here:
Subread
GCATemplates available: no
Subread homepage
Subread: an accurate and efficient aligner for mapping both genomic DNA-seq reads and RNA-seq reads (for the purpose of expression analysis).