Phylogenetics
RAxML
RAxML homepage
To see the available versions, run the module spider command with the software name on the command line.
RAxML (Randomized Axelerated Maximum Likelihood) is a search algorithm for maximum-likelihood-based inference of phylogenetic trees.
AAF
AAF homepage
Use the module spider command to see the available versions of AAF.
AAF (alignment and assembly-free) is a free software package that reconstructs phylogeny from next-generation sequencing data without assembly and alignment.
It takes the raw sequencing reads from all samples together, generates a distance matrix based on the proportion of k-mers shared between each pair of samples, and reconstructs a phylogeny from that distance matrix.
If aaf_distance.py fails when run with more than 10 cores, with a message like IndexError: list index out of range, use fewer cores, for example -t 5.
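One way to apply this workaround in a job script is a small helper that lowers the value passed to -t. This is a hypothetical sketch, not part of AAF: the function name aaf_threads and the exact rule (any request above 10 threads becomes 5) are assumptions based on the note above.

```shell
# Hypothetical helper (not part of AAF): choose a thread count for aaf_distance.py.
# Requests above 10 threads are reduced to 5, following the workaround for
# "IndexError: list index out of range".
aaf_threads() {
    local requested="$1"
    if [ "$requested" -gt 10 ]; then
        echo 5
    else
        echo "$requested"
    fi
}

# Example use in a job script (the other aaf_distance.py arguments are omitted here):
# aaf_distance.py -t "$(aaf_threads 20)" ...
```

Requests of 10 threads or fewer pass through unchanged, so the same job script can be reused on smaller allocations.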
IQ-TREE
GCATemplates available: no
IQ-TREE homepage
IQ-TREE is efficient phylogenomic software for maximum-likelihood inference.
When running on all 20 cores (#BSUB -n 20) of a 64 GB or 256 GB node, use the following in your job script:
module load IQ-TREE/1.5.5-GCCcore-6.3.0
export OMP_NUM_THREADS=20
If you need to run on the xlarge nodes (1 TB or 2 TB) with #BSUB -n 40, use the following in your job script:
module load Westmere
module load IQ-TREE/1.5.5-GCCcore-6.3.0
export OMP_NUM_THREADS=40
Use the following command to run IQ-TREE:
iqtree-omp
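Putting the 20-core settings together, a complete job script might look like the sketch below. It writes the script with a here-document so it can then be submitted with bsub < iqtree_job.lsf. The job name, walltime, output file name, and alignment file alignment.phy are placeholders, site-specific options such as memory requests are omitted, and the -s (alignment file) and -nt (number of threads) options are those of IQ-TREE 1.x; check iqtree-omp -h for the installed version.

```shell
# Write a hypothetical LSF job script for a 20-core IQ-TREE run.
# Job name, walltime, output file and alignment file are placeholders.
cat > iqtree_job.lsf <<'EOF'
#BSUB -J iqtree
#BSUB -n 20
#BSUB -R "span[ptile=20]"
#BSUB -W 24:00
#BSUB -o iqtree.%J.out

module load IQ-TREE/1.5.5-GCCcore-6.3.0
export OMP_NUM_THREADS=20

# -s names the alignment file; -nt sets the number of threads
iqtree-omp -s alignment.phy -nt 20
EOF

# Submit it from a login node:
# bsub < iqtree_job.lsf
```

The span[ptile=20] request keeps all 20 slots on one node, which an OpenMP program like iqtree-omp needs. For the xlarge nodes, use the values given above instead: #BSUB -n 40, module load Westmere before the IQ-TREE module, OMP_NUM_THREADS=40, and a matching thread count.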
BPGA
GCATemplates available: no
BPGA homepage
BPGA is an ultra-fast software package that provides comprehensive pan genome analysis of microorganisms. In addition to all types of routine pan genomic analyses (Pan genome Profiles, Pan/Core Phylogeny, etc.), BPGA includes a number of novel downstream analysis features such as Exclusive Gene Family Analysis, Atypical GC Content Analysis, Subset Analysis, MLST based on housekeeping genes, and KEGG Distribution.
BPGA writes files into its installation directory, so you must extract the .tar.gz package into a separate working directory for each project.
You can use the following commands to run BPGA from a working directory in your $SCRATCH space.
The working directory can have any name; this example uses $SCRATCH/bpga_project.
module purge
module load USEARCH/10.0.240-i86linux32
module load LibTIFF/4.0.7-GCCcore-6.3.0
module load glibc/2.14
module load gnuplot/4.6.6-GCCcore-6.3.0-Python-2.7.12-bare
module load Ghostscript/9.21-GCCcore-6.3.0-Python-2.7.12-bare
module load Python/2.7.12-intel-2017A
mkdir $SCRATCH/bpga_project
cd $SCRATCH/bpga_project
cp /sw/eb/sources/b/BPGA/BPGA-1.3-linux-x86_64-0-0-0.tar.gz ./
tar xzf BPGA-1.3-linux-x86_64-0-0-0.tar.gz
cd BPGA-1.3-linux-x86_64-0-0-0/BPGA-Version-1.3/bin
ln -s $EBROOTUSEARCH/bin/usearch
chmod +x BPGA-Version-1.3
./BPGA-Version-1.3
When it runs, BPGA prompts you for the full path to the directory that contains your input files.
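Because each project needs its own extracted copy, the extraction and linking steps above can be wrapped in a shell function. This is a sketch, not part of BPGA: the function name setup_bpga_project is hypothetical, it assumes the modules listed above are already loaded (so $EBROOTUSEARCH is set), and it assumes the tarball layout used in the commands above.

```shell
# Hypothetical helper (not part of BPGA): set up a fresh BPGA copy for one project.
# Usage: setup_bpga_project <project_dir> [tarball]
# Prints the BPGA bin directory on success.
setup_bpga_project() {
    local project_dir="$1"
    local tarball="${2:-/sw/eb/sources/b/BPGA/BPGA-1.3-linux-x86_64-0-0-0.tar.gz}"
    local bindir="$project_dir/BPGA-1.3-linux-x86_64-0-0-0/BPGA-Version-1.3/bin"

    # Refuse to reuse a directory, since BPGA writes into its own install tree
    if [ -e "$project_dir" ]; then
        echo "error: $project_dir already exists" >&2
        return 1
    fi
    mkdir -p "$project_dir" || return 1
    tar xzf "$tarball" -C "$project_dir" || return 1

    # Same as the manual steps: link usearch into bin and make BPGA executable
    ln -s "$EBROOTUSEARCH/bin/usearch" "$bindir/usearch" || return 1
    chmod +x "$bindir/BPGA-Version-1.3" || return 1
    echo "$bindir"
}

# Example:
# bindir=$(setup_bpga_project "$SCRATCH/bpga_project2") && cd "$bindir" && ./BPGA-Version-1.3
```

Refusing an existing directory is a deliberate choice in this sketch: it prevents two projects from sharing one BPGA installation tree.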
PhyloNetworks
GCATemplates available: no
PhyloNetworks homepage
PhyloNetworks is a Julia package for the manipulation, visualization, and inference of phylogenetic networks, and for their use in studying trait evolution.
PICRUSt
GCATemplates available: no
PICRUSt homepage
PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
If you need to use R with the ape package, you can load the following R_tamu version
PICRUSt2
GCATemplates available: no
PICRUSt2 homepage
PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is software for predicting functional abundances based only on marker gene sequences. PICRUSt2 is available through Anaconda.
PhyloSNP
GCATemplates available: no
PhyloSNP homepage
PhyloSNP is designed to take SNP data files (.csv and .vcf) and generate phylogenetic trees from the provided data.
PhyloSNP can also generate either a shortened concatenated genome from the SNPs alone, or a concatenated genome built from contigs that cover each SNP plus a specified number of flanking base pairs.
PhyloSNP works best with bacterial genomes. For larger genomes or projects with many samples, use the shrunk-genomes.pl script in the PhyloSNP module.
SNPhylo
GCATemplates available: no
SNPhylo homepage
SNPhylo is a pipeline for constructing phylogenetic trees from SNP data. It can build a tree from a file containing a very large number of SNPs.
SNPhylo works best with large genomes such as the human genome. Smaller genomes, such as bacterial and fungal genomes, will probably not have enough variants (a minimum of 5000) left after the filtering step.
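Because the 5000-variant minimum applies after SNPhylo's filtering, counting the records in the input file only gives an upper bound, but it can quickly flag inputs that cannot pass. This is a hypothetical pre-check, not part of SNPhylo; it assumes the SNP data are in an uncompressed VCF file, where every line not starting with # is one variant record.

```shell
# Hypothetical pre-check (not part of SNPhylo): count variant records in a
# plain-text VCF and warn if there are fewer than the minimum (default 5000).
# Prints the count; returns 1 if it is below the minimum, 2 if the file is unreadable.
check_snphylo_variants() {
    local vcf="$1"
    local min="${2:-5000}"
    local n
    if [ ! -r "$vcf" ]; then
        echo "error: cannot read $vcf" >&2
        return 2
    fi
    # Header lines start with '#'; every other line is one variant record
    n=$(grep -vc '^#' "$vcf")
    echo "$n"
    if [ "$n" -lt "$min" ]; then
        echo "warning: only $n variants before filtering; SNPhylo needs at least $min after filtering" >&2
        return 1
    fi
}
```

A count above the minimum does not guarantee success, since the filtering step may still remove enough variants to fall below it.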