Difference between revisions of "Bioinformatics:Data Normalization, Clustering & Collapsing"
Revision as of 17:06, 15 December 2016
NGS: Data Normalization, Clustering & Collapsing
BBNorm
GCATemplates available: no
BBNorm homepage
module load BBMap
bbnorm.sh is the data normalization script that is part of the BBMap package.
BBNorm is a k-mer-based error-correction and normalization tool.
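To illustrate the general idea behind k-mer-based normalization, here is a minimal Python sketch. It is not BBNorm's actual implementation: the function names `kmers` and `normalize`, the median-count rule, and the tiny `k` and `target` values are assumptions chosen only to show how high-coverage reads can be thinned while low-coverage reads are kept.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=4, target=2):
    """Keep a read only while the median count of its k-mers,
    among the reads kept so far, is below target (hypothetical rule)."""
    counts = Counter()
    kept = []
    for read in reads:
        km = kmers(read, k)
        if not km:
            continue  # read is shorter than k, nothing to count
        if median(counts[x] for x in km) < target:
            kept.append(read)
            counts.update(km)
    return kept


# Five copies of one read plus a single copy of another.
reads = ["ACGTTGCA"] * 5 + ["TTGGCCAA"]
kept = normalize(reads, k=4, target=2)
print(kept)
```

With these toy inputs the over-represented read is cut down to the target depth and the rare read is retained.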
CD-HIT
GCATemplates available: no
CD-HIT homepage
module load CD-HIT
CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
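Clustering of this kind is commonly done greedily: sequences are sorted longest first, the first one becomes a cluster representative, and each later sequence joins the first representative it is similar enough to, or starts a new cluster. The Python sketch below shows that scheme under stated assumptions. The `identity` function is a deliberately crude stand-in (ungapped, position-by-position comparison), and `greedy_cluster` and the 0.8 threshold are hypothetical names and values, not CD-HIT's own code or defaults.

```python
def identity(a, b):
    """Fraction of matching positions over the shorter sequence.
    Ungapped and left-aligned: a stand-in for a real alignment score."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs, threshold=0.8):
    """Greedy incremental clustering: longest sequences are processed first."""
    clusters = []  # list of (representative, members)
    for seq in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            if identity(rep, seq) >= threshold:
                members.append(seq)  # join the first matching cluster
                break
        else:
            clusters.append((seq, [seq]))  # seq becomes a new representative
    return clusters


seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGT"]
clusters = greedy_cluster(seqs, threshold=0.8)
for rep, members in clusters:
    print(rep, len(members))
```

Processing the longest sequences first means each representative is at least as long as every member of its cluster.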
FASTX-Toolkit
GCATemplates available: no
FASTX-Toolkit homepage
module load FASTX-Toolkit
The fastx_collapser tool is included in the FASTX-Toolkit.
It collapses identical sequences in a FASTA/FASTQ file into a single sequence while preserving the read counts.
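The collapsing step can be sketched in a few lines of Python. This is an illustration of the concept rather than the tool's source: the `collapse` function is hypothetical, and the `rank-count` header layout is an assumption about how the read count is carried along with each unique sequence.

```python
from collections import Counter


def collapse(seqs):
    """Merge identical sequences, keeping how many times each occurred.
    Returns (header, sequence) pairs ordered from most to least abundant."""
    counts = Counter(seqs)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [
        (f"{rank}-{n}", seq)
        for rank, (seq, n) in enumerate(counts.most_common(), start=1)
    ]


reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
collapsed = collapse(reads)
for header, seq in collapsed:
    print(f">{header}")
    print(seq)
```

Six input reads become three FASTA records, and the count in each header lets downstream steps recover the original abundance.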