Difference between revisions of "Bioinformatics:Data Normalization, Clustering & Collapsing"

From TAMU HPRC
Revision as of 11:38, 24 April 2017

NGS: Data Normalization, Clustering & Collapsing

BBNorm

GCATemplates available: no

BBNorm homepage: https://sourceforge.net/projects/bbmap/

 module spider BBMap

bbnorm.sh is the data normalization script that is part of the BBMap package.

BBNorm is a k-mer-based error-correction and normalization tool.
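
A minimal sketch of a normalization run is shown here, assuming the BBMap module is already loaded. The file names reads.fq and normalized.fq are hypothetical placeholders, and target=100 min=5 are the example values from the BBNorm usage guide (normalize to roughly 100x depth and discard reads with an apparent depth below 5x); suitable values depend on your data.

```shell
# Load the module first; find the exact version string with `module spider BBMap`.
# module load BBMap

# Hypothetical file names; replace them with your own reads.
in_reads="reads.fq"
out_reads="normalized.fq"

# Run only if the script is on PATH and the input exists.
if command -v bbnorm.sh >/dev/null 2>&1 && [ -f "$in_reads" ]; then
    # target: desired average depth; min: reads below this depth are discarded
    bbnorm.sh in="$in_reads" out="$out_reads" target=100 min=5
else
    echo "bbnorm.sh or $in_reads not found; load the BBMap module and check the input path" >&2
fi
```

Running bbnorm.sh with no arguments prints its full parameter list, which is the authoritative source for the options above.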


CD-HIT

GCATemplates available: no

CD-HIT homepage: http://weizhongli-lab.org/cd-hit/

 module spider CD-HIT

CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences.
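
As a rough illustration, a protein clustering run might look like the sketch below, assuming the CD-HIT module is loaded. The file names are hypothetical, and the 0.9 identity threshold with word length 5 is only one common choice for proteins. Nucleotide data is normally clustered with the companion program cd-hit-est instead.

```shell
# Load the module first; find the exact version string with `module spider CD-HIT`.
# module load CD-HIT

# Hypothetical file names; replace them with your own data.
in_fasta="proteins.fasta"
out_prefix="proteins_c90"

# Run only if the program is on PATH and the input exists.
if command -v cd-hit >/dev/null 2>&1 && [ -f "$in_fasta" ]; then
    # -c: sequence identity threshold; -n: word length
    cd-hit -i "$in_fasta" -o "$out_prefix" -c 0.9 -n 5
else
    echo "cd-hit or $in_fasta not found; load the CD-HIT module and check the input path" >&2
fi
```

On success, the representative sequences are written to the output file and the cluster membership to a companion file with the .clstr suffix.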


FASTX-Toolkit

GCATemplates available: no

FASTX-Toolkit homepage: http://hannonlab.cshl.edu/fastx_toolkit/

 module spider FASTX-Toolkit

The fastx_collapser tool is included in the FASTX-Toolkit.

It collapses identical sequences in a FASTQ/FASTA file into a single sequence while retaining the read counts.
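
To make the idea of collapsing concrete, the sketch below reproduces it with standard Unix tools on a tiny made-up FASTA stream. It is not the fastx_collapser implementation: it assumes one sequence line per record, and the ">rank-count" header style is modeled on the tool's output as an assumption.

```shell
# Four made-up reads; three of them share the sequence ACGT.
collapsed=$(printf '>r1\nACGT\n>r2\nTTGA\n>r3\nACGT\n>r4\nACGT\n' \
    | grep -v '^>' \
    | sort \
    | uniq -c \
    | sort -k1,1nr -k2,2 \
    | awk '{ printf ">%d-%d\n%s\n", NR, $1, $2 }')
# grep drops the header lines, sort | uniq -c counts each distinct sequence,
# the second sort puts the most abundant sequence first, and awk writes
# one record per distinct sequence with a ">rank-count" header.

echo "$collapsed"
# prints:
# >1-3
# ACGT
# >2-1
# TTGA
```

With the module loaded, the tool itself is typically run as fastx_collapser -i input.fa -o collapsed.fa, where both file names are placeholders.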