Difference between revisions of "Bioinformatics:Data Normalization, Clustering & Collapsing"

From TAMU HPRC

Revision as of 16:05, 15 December 2016

Data Normalization, Clustering & Collapsing

BBNorm

GCATemplates available: no

BBNorm homepage

 module load BBMap

bbnorm.sh is the data normalization script included in the BBMap package.

BBNorm is a k-mer-based error-correction and normalization tool.
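
A minimal sketch of a normalization run, assuming the BBMap module is already loaded. The file names reads.fq and normalized.fq are hypothetical placeholders, and the target and min values are illustrative rather than recommended settings. The script only prints the command when bbnorm.sh or the input file is not available.

```shell
# On the cluster, load the module first:
#   module load BBMap

# Hypothetical file names; replace with your own data.
reads_in="reads.fq"
reads_out="normalized.fq"

# target=100 : normalize to an average depth of about 100x
# min=5      : discard reads with an apparent depth below 5x (presumed errors)
bbnorm_cmd="bbnorm.sh in=${reads_in} out=${reads_out} target=100 min=5"

if command -v bbnorm.sh >/dev/null 2>&1 && [ -f "$reads_in" ]; then
    # Tool and input are present: run the normalization.
    $bbnorm_cmd
else
    # Dry run: show what would be executed.
    echo "Would run: $bbnorm_cmd"
fi
```

Lower target values shrink the output further; the right depth depends on the downstream assembler or analysis.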


CD-HIT

GCATemplates available: no

CD-HIT homepage

 module load CD-HIT

CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
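
A minimal sketch of clustering a protein FASTA file, assuming the CD-HIT module is already loaded. The file name proteins.fasta and the output prefix proteins_nr90 are hypothetical, and the 0.9 identity threshold is only an example value. The script prints the command instead of running it when cd-hit or the input file is missing.

```shell
# On the cluster, load the module first:
#   module load CD-HIT

# Hypothetical file names; replace with your own data.
seqs_in="proteins.fasta"
out_prefix="proteins_nr90"

# -i : input FASTA file
# -o : output file of representative sequences (a .clstr cluster listing is written alongside it)
# -c : sequence identity threshold (0.9 = 90% identity)
# -n : word length (5 is suited to protein thresholds of 0.7 and above)
cdhit_cmd="cd-hit -i ${seqs_in} -o ${out_prefix} -c 0.9 -n 5"

if command -v cd-hit >/dev/null 2>&1 && [ -f "$seqs_in" ]; then
    # Tool and input are present: run the clustering.
    $cdhit_cmd
else
    # Dry run: show what would be executed.
    echo "Would run: $cdhit_cmd"
fi
```

For nucleotide sequences, the CD-HIT package provides cd-hit-est, which takes the same -i, -o, and -c options but needs a word length chosen for nucleotide data.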


FASTX-Toolkit

GCATemplates available: no

FASTX-Toolkit homepage

 module load FASTX-Toolkit

The fastx_collapser tool is included in the FASTX-Toolkit.

It collapses identical sequences in a FASTA or FASTQ file into a single sequence while preserving the read counts.
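
To illustrate what collapsing means, the sketch below reproduces the idea with standard Unix tools on a tiny made-up FASTA file. It is not how fastx_collapser is implemented, and it assumes each sequence sits on a single line. The rank-count header style is modeled on the tool's output, and the real invocation is shown as a comment.

```shell
# Tiny hypothetical FASTA file with single-line sequences.
example_fa=$(mktemp)
cat > "$example_fa" <<'EOF'
>read1
ACGT
>read2
TTGA
>read3
ACGT
EOF

# With the module loaded, the real tool would be run as:
#   fastx_collapser -i example.fa -o collapsed.fa

# Same idea with standard tools: drop the headers, count identical
# sequences, list the most abundant first, and write ">rank-count" headers.
collapsed=$(grep -v '^>' "$example_fa" | sort | uniq -c | sort -k1,1nr -k2,2 |
    awk '{ printf ">%d-%d\n%s\n", NR, $1, $2 }')

echo "$collapsed"
rm -f "$example_fa"
```

Here the two ACGT reads become one record whose header carries the count 2, so the abundance information survives the collapse.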