Bioinformatics

From TAMU HPRC

Bioinformatics Tool Categories

This is a summary of most of the NGS Bioinformatics tools on Ada, Terra and Curie.

Not all NGS Bioinformatics tools installed on the HPRC clusters are summarized on these pages.

A complete software module listing for each cluster can be found here:

Check the website of the software package you want to use to see whether the version on Ada is the latest available. Let us know if a newer version needs to be installed.

Genome Fasta, Index Files and Databases on Ada

The following genomes are available on Ada and Curie and can be accessed via the command line.

Link to these files from your job script rather than copying them into your own directories.
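As a rough illustration of linking rather than copying, the sketch below creates a symbolic link in a working directory that points at a shared reference file. The helper name `link_reference` and the demo file `genome.fa` are hypothetical; on the cluster the source would be a file under one of the shared dataset directories listed on this page.

```python
import os
import tempfile
from pathlib import Path


def link_reference(source: Path, workdir: Path) -> Path:
    """Create a symlink in workdir that points at a shared reference file.

    Hypothetical helper for illustration; nothing is copied.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    link = workdir / source.name
    if link.is_symlink() or link.exists():
        return link  # something with that name is already there; leave it
    os.symlink(source, link)
    return link


if __name__ == "__main__":
    # Stand-in file so the sketch runs anywhere; the file name is an assumption.
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "shared" / "genome.fa"
        shared.parent.mkdir()
        shared.write_text(">chr1\nACGT\n")
        link = link_reference(shared, Path(tmp) / "job")
        print(link, "->", os.readlink(link))
```

A link takes almost no space in your own directory, and the job always reads the copy maintained in the shared location.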

Genomic Reference Sequences, Indexes and Databases
All of NCBI's BLAST databases (nt, nr, rna, 16SMicrobial,...) /scratch/datasets/BLAST/
The latest UCSC Genomes /scratch/datasets/ucsc/gbdb/
hg19 annotations (dbSNP, genes.gtf, ...) /scratch/datasets/genome_annotations/Homo_sapiens/UCSC/hg19/
BUSCO lineage files /scratch/datasets/busco/
Silva database files /scratch/datasets/silva/
CESM /scratch/datasets/CESM/
Gemini /scratch/datasets/Gemini/
Augustus species /software/easybuild/software/AUGUSTUS/3.1-intel-2015B/config/species/
Bowtie, Bowtie2 and BWA indexed genomes* /scratch/datasets/genome_indexes/

* bowtie indexes will work with any version of the bowtie tool but will not work with the bowtie2 tool
* bowtie2 indexes will work with any version of the bowtie2 tool but will not work with the bowtie tool
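One way to check which tool an index was built for before submitting a job is to look at the index file suffixes. The sketch below assumes the usual naming convention, in which Bowtie writes `.ebwt` files (`.ebwtl` for large indexes) and Bowtie2 writes `.bt2` files (`.bt2l` for large indexes). The function name `index_tools` and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed suffix convention: Bowtie -> .ebwt/.ebwtl, Bowtie2 -> .bt2/.bt2l
SUFFIX_TO_TOOL = {
    ".ebwt": "bowtie",
    ".ebwtl": "bowtie",
    ".bt2": "bowtie2",
    ".bt2l": "bowtie2",
}


def index_tools(index_dir: Path, prefix: str) -> set:
    """Return the tools that have index files for `prefix` in index_dir."""
    tools = set()
    for path in index_dir.glob(prefix + ".*"):
        tool = SUFFIX_TO_TOOL.get(path.suffix)
        if tool:  # ignore unrelated files such as prefix.fa
            tools.add(tool)
    return tools


if __name__ == "__main__":
    # Empty stand-in files so the sketch runs without a real index.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp)
        for name in ("genome.1.bt2", "genome.rev.1.bt2", "genome.fa"):
            (demo / name).touch()
        print(sorted(index_tools(demo, "genome")))
```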

Bioinformatics Web Resources

Genomic Databases
NCBI National Center for Biotechnology Information
Ensembl Provides a bioinformatics framework to organise biology around the sequences of large genomes.
EBI The European Bioinformatics Institute
JGI Genome sequences of plants, fungi, microbes, and metagenomes
DDBJ DNA Data Bank of Japan
ENCODE ENCyclopedia Of DNA Elements
HapMap Identify and catalog genetic similarities and differences in human beings
GOLD Genomes Online Database: information regarding genome and metagenome sequencing projects
DAVID The Database for Annotation, Visualization and Integrated Discovery
RepBase Genetic Information Research Institute. RepBase: repetitive sequences database
Gene Nomenclature and Info
HGNC A curated online repository of HGNC-approved gene nomenclature.
GeneCards Information on all annotated and predicted human genes
Sequencing Projects
HMP Human Microbiome Project: characterizes microbial communities found at multiple human body sites
1000 Genomes A Deep Catalog of Human Genetic Variation
UK10K Rare Genetic Variants in Health and Disease
Protein Sequence Databases
InterPro Protein sequence analysis & classification
PDB Protein Data Bank
Pfam A large collection of protein families
UniProt Protein sequences and functional information
Download Sequence Data
BioMart Specific gene regions, Gene Ontology and Alternate IDs
Ensembl cDNA, GTF, VEP, CDS, Protein
Ensembl Bacteria cDNA, GTF, VEP, Protein
Ensembl Plants cDNA, GFF3, VEP, GTF, Protein
Ensembl Fungi cDNA, CDS, Protein, GTF, GFF3, VEP
iGenomes Ensembl, NCBI, UCSC reference fasta; Bowtie, BWA and Bowtie2 index files
RNA Databases
miRBase A searchable database of published miRNA sequences and annotation
RDP Aligned and annotated Bacterial and Archaeal 16S rRNA sequences, and Fungal 28S rRNA sequences
SILVA A comprehensive on-line resource for quality checked and aligned ribosomal RNA sequence data
Gene Expression Databases
Expression Atlas EBI Expression Atlas
BioXpress gene expression in cancer
EMAGE Mouse embryo in situ gene expression data
GPXdb Macrophage Expression Atlas
Human Protein Atlas Expression of human protein-coding genes based on both RNA and protein data
PLEXdb Gene expression in diverse organs, tissues, or developmental stages
Metabolic Pathway Databases
KEGG A database resource for understanding high-level functions and utilities of the biological system
Reactome A curated and peer reviewed pathway database
GeneNetwork Tools used to study complex networks of genes, molecules, gene function and phenotypes
Model Organism Databases
TAIR The Arabidopsis Information Resource
CGD Candida Genome Database
FlyBase A Database of Drosophila Genes & Genomes
Gramene Comparative functional genomics in crops and model plant species
MaizeGDB Zea mays database
MGI International database resource for the laboratory mouse
RGD Rat Genome Database
Saccharomyces Comprehensive integrated biological information for the budding yeast Saccharomyces cerevisiae
VectorBase Bioinformatics Resource for Invertebrate Vectors of Human Pathogens
WormBase Nematodes
Xenbase Xenopus Database
ZFIN The Zebrafish Model Organism Database
Genome Browsers
UCSC
Genome Data Viewer NCBI genome browser for 600+ RefSeq genome assemblies
Ensembl
Forums and Techniques
RNA-Seq Blog Blog on the latest RNA-seq experimental design and analysis techniques
SeqAnswers Forums on everything Bioinformatics
BioStar Forums on everything Bioinformatics
Genohub Designing your Next Generation Sequencing Run
Genohub Coverage and Read Depth Recommendations by Sequencing Application
Software and Database Lists
NCBI Databases available at NCBI
NCBI Tools available at NCBI
OMICtools Database of tools for omic data analysis (NGS, microarray, PCR, MS, NMR)
ExPASy Bioinformatics Resource Portal
NAR Nucleic Acids Research, published papers on databases
Bioinformatics table of elements
Bioinformatics Web Tutorials
NCBI NIH Online Bioinformatics Tutorials
EMBL-EBI Train online with EMBL-EBI (Home)
EMBL-EBI Train online with EMBL-EBI (Next Generation Sequencing Practical Course)
Ensembl Ensembl Tutorials and Worked Examples
GGD Getting Things Done in Genetics & Bioinformatics Research
Melbourne Bioinformatics A few good tutorials including some for Galaxy
Other Useful Tools
SAM Flags Explain SAM/BAM bitwise flags
SRA Explorer Explore SRA by SRA/GEO id, organism, seq type, tissue type, ...
SNPnexus Web-based annotation of human SNPs
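To show what a tool like the SAM Flags explainer listed above does, here is a minimal sketch that decodes the FLAG field of a SAM/BAM record into the bits that are set. The bit values follow the SAM format specification. The function name `explain_flag` and the wording of the descriptions are illustrative choices, not taken from any particular tool.

```python
# Bit values from the SAM format specification; descriptions are paraphrased.
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag: int) -> list:
    """Return the descriptions of all bits set in a SAM FLAG value."""
    if not 0 <= flag < 0x1000:
        raise ValueError("SAM flags use 12 bits (0 to 4095)")
    return [description for bit, description in SAM_FLAGS if flag & bit]


if __name__ == "__main__":
    for value in (99, 147, 4):
        print(value, explain_flag(value))
```

For example, a flag of 99 is 0x1 + 0x2 + 0x20 + 0x40: a paired read, mapped in a proper pair, whose mate is on the reverse strand, and which is the first read of the pair.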