SW:OrthoMCL

OrthoMCL

version 1.4

GCATemplates available: ada

OrthoMCL homepage

Identification of Ortholog Groups for Eukaryotic Genomes.

 module load OrthoMCL/1.4-intel-2015B-Perl-5.20.0

OrthoMCL version 1.4 is required if generating an input file for POTION.

After loading the OrthoMCL 1.4 module, you can find sample input files for OrthoMCL in the directory $EBROOTORTHOMCL/sample_data/

You can increase the number of threads used by BLAST by setting the BLAST_CPUS environment variable in your job script. Request a matching number of cores with #SBATCH --ntasks-per-node=1 and #SBATCH --cpus-per-task=10 (using 10 CPUs is more efficient than 20), then set the variable:

  export BLAST_CPUS=10
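As a minimal sketch of how these settings might sit together at the top of a job script, the thread count can be taken from Slurm rather than typed twice. SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested; the fallback to 10 when the variable is unset is an assumption based on the recommendation above.

```shell
#!/bin/bash
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10

# Use the core count Slurm allocated; fall back to 10 outside a Slurm job
# (the fallback value is an assumption based on the recommendation above).
export BLAST_CPUS="${SLURM_CPUS_PER_TASK:-10}"
echo "BLAST_CPUS=${BLAST_CPUS}"
```

This keeps BLAST_CPUS in step with the #SBATCH request if the core count is changed later.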

From the OrthoMCL Documentation:

"There are five modes to run OrthoMCL, with each mode having a different process. We strongly suggest you to use MODE 4 for very big set, since BLAST was not programmed to run parallelly. You can simply prepare two files for mode 4, BPO file and GG file. And it's very fast, for our test set of 200,000 sequences on a Mac G5 computer, it took 8 hours to finish."

For OrthoMCL-1.4, you must create a directory called sample_data in your working directory and put all of your input files in it.
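One possible way to prepare that directory is sketched below. It assumes the input FASTA files sit in the current directory and end in .fa, as in the mode 1 example; adjust the pattern to match your own files.

```shell
# Create the directory OrthoMCL 1.4 expects in the current working directory.
mkdir -p sample_data

# Copy any FASTA files found here into it; the *.fa pattern is an assumption.
for fasta in ./*.fa; do
    [ -e "$fasta" ] || continue   # the glob matched nothing
    cp "$fasta" sample_data/
done

# List what OrthoMCL will see.
ls sample_data
```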

The five modes of OrthoMCL are:

  Mode 1: OrthoMCL analysis from FASTA files. OrthoMCL starts from
          the beginning BLAST to final MCL.

     Example:  % orthomcl.pl --mode 1 --fa_files Ath.fa,Hsa.fa,Sce.fa

  Mode 2: OrthoMCL analysis based on former OrthoMCL run (former run
          directory needs to be given), if you want to change the
          inflation parameter, p-value cutoff (can only be lower than
          your former run BLAST p-value cutoff), percent identity cutoff
          or percent match cutoff. No BLAST or BLAST parsing performed.

     Example:  % orthomcl.pl --mode 2 --former_run_dir Sep_8 --inflation 1.4

  Mode 3: OrthoMCL analysis from user-provided BLAST result BLAST out file
          and genome gene relation file telling which genome has which gene
          (Please refer to 5. File Formats). No BLAST performed.

     Example:  % orthomcl.pl --mode 3 --blast_file AtCeHs_blast.out --gg_file AtCeHs.gg


  Mode 4: OrthoMCL analysis from user-provided BPO (BLAST PARSING OUT) file
          and GG (genome gene relation) file telling which genome has which gene
          (Please refer to 5. File Formats). No BLAST or BLAST parsing performed.

     Example:  % orthomcl.pl --mode 4 --bpo_file AtCeHs.bpo --gg_file AtCeHs.gg

  Mode 5: OrthoMCL analysis based on previous run, but with less taxa included
          or with only inflation value changed (FASTER than mode 2, no selection
          on reciprocal best/better hits performed).

     Example:  % orthomcl.pl --mode 5 --former_run_dir Sep_8 --taxa_file AtCeHs.gg --inflation=1.1
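Because mode 2 reuses an earlier run, it lends itself to trying more than one inflation value. The sketch below only prints the commands so they can be reviewed before being placed in a job script. The run directory name and the inflation values are taken from the examples above and are placeholders for your own.

```shell
# Directory of the earlier run; "Sep_8" is the name used in the examples above.
FORMER_RUN_DIR="Sep_8"

# Build one mode 2 command per inflation value (values taken from the examples).
CMDS=""
for inflation in 1.1 1.4; do
    CMDS="${CMDS}orthomcl.pl --mode 2 --former_run_dir ${FORMER_RUN_DIR} --inflation ${inflation}
"
done

# Print the commands for review; nothing is executed here.
printf '%s' "$CMDS"
```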

version 2.0.9

GCATemplates available: ada

OrthoMCL homepage

Identification of Ortholog Groups for Eukaryotic Genomes.

 module load OrthoMCL/2.0.9-intel-2015B-Perl-5.20.0

You can get taxon codes from the KEGG website.

In the "Overview of steps" section of the OrthoMCL UserGuide, you only need to do steps 5 - 13.

Refer to the OrthoMCL UserGuide for specific details on each of the following steps.


(5) run orthomclAdjustFasta (or your own simple script) to generate protein fasta files in the required format.

(6) run orthomclFilterFasta to filter away poor quality proteins, and optionally remove alternative proteins. Creates a single large goodProteins.fasta file (and a poorProteins.fasta file)

(7) run all-v-all NCBI BLAST on goodProteins.fasta (output format is tab delimited text [use BLAST option -m 8 ]).

(8) run orthomclBlastParser on the NCBI BLAST tab output to create a file of similarities in the required format
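Before passing the BLAST output from step 7 to orthomclBlastParser, it can help to confirm that it really is in tabular format. BLAST -m 8 output has 12 tab-separated columns per line, so a small check along these lines may catch a wrong output option early. The function name and the file name in the usage comment are hypothetical.

```shell
# Return 0 only if the file is non-empty and every line has 12 tab-separated fields.
check_m8() {
    awk -F'\t' 'NF != 12 { bad++ } END { exit (NR == 0 || bad > 0) }' "$1"
}

# Example with a hypothetical file name:
# check_m8 blast_results.tsv || echo "unexpected BLAST output format" >&2
```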

Once you have your BLAST output and your compliantfasta directory/files created, you can initialize the MySQL database using the TAMU HPRC provided setup script.
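A quick pre-flight check, sketched here under the assumption that you know the paths of your BLAST output file and FASTA directory, can confirm both are in place before the database is initialized. The function name is hypothetical.

```shell
# Return 0 only if the BLAST output is non-empty and the FASTA directory
# exists and contains at least one entry.
check_inputs() {
    blast_out=$1
    fasta_dir=$2
    if [ ! -s "$blast_out" ]; then
        echo "missing or empty BLAST output: $blast_out" >&2
        return 1
    fi
    if [ ! -d "$fasta_dir" ] || [ -z "$(ls -A "$fasta_dir")" ]; then
        echo "missing or empty FASTA directory: $fasta_dir" >&2
        return 1
    fi
    return 0
}
```

It would be called with your own paths, for example check_inputs followed by the BLAST output file and the FASTA directory.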

Go to the directory where you will run your analysis and run the following on the command line, not in your job script.

Do not load any modules yet.

module purge
/software/hprc/Bio/OrthoMCL/setup_mysql.sh

While running the setup_mysql.sh script, press any key when prompted so that you can enter a new location. The following location is recommended:

/scratch/user/<your_NetID>/orthomcl_mysql

The OrthoMCL database will be initialized for use in steps 9 - 13.

You only need to initialize the database once, and this is done on the command line, not in your job script.

You will need to start the MySQL service with the OrthoMCL db in order to run steps 9 - 13.

You will need to stop the MySQL service at the end of your job script. Do not leave the MySQL server process running on the compute node.

Use the following in your job script to start the MySQL service and OrthoMCL db, run steps 9 - 13, then stop the MySQL service:

# start the mysqld to use the OrthoMCL db
module purge
# check the exit status of mysqld itself, before sleep overwrites $?
if ! ./mysqld start; then
    echo "mysqld process failed to start. exiting"
    exit 1
fi
# give the server time to finish starting
sleep 10
module load OrthoMCL/2.0.9-intel-2015B-Perl-5.20.0

<enter your commands for steps 9 - 13 or do one step at a time>

# stop the mysqld that is running on the compute node
./mysqld stop
sleep 10
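If one of steps 9 - 13 fails and the script exits early, the stop command at the end is never reached. One way to guard against that, offered as a sketch rather than part of the HPRC template, is to register the stop command with a shell trap near the top of the job script. The function name stop_mysqld is hypothetical; ./mysqld is the script used above.

```shell
# Stop the MySQL server started with ./mysqld start (no-op if the script is absent).
stop_mysqld() {
    if [ -x ./mysqld ]; then
        ./mysqld stop
        sleep 10
    fi
}

# Run stop_mysqld whenever the script exits, whether it finishes normally
# or calls exit after a failed step.
trap stop_mysqld EXIT
```

With the trap in place, the explicit ./mysqld stop at the end of the script should be removed so the server is not stopped twice.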

(9) run orthomclLoadBlast to load the output of orthomclBlastParser into the database.

(10) run the orthomclPairs program to compute pairwise relationships.

(11) run the orthomclDumpPairsFiles program to dump the pairs/ directory from the database

(12) run the mcl program on the mcl_input.txt file created in Step 11.

(13) run orthomclMclToGroups to convert mcl output to groups.txt

We recommend you save the output of each step so that you can easily redo it if things go wrong.
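Alongside the output files, keeping the messages from each step makes it easier to see where a run went wrong. The helper below is a minimal sketch: it writes each step's stderr to its own log file and leaves stdout alone, so steps that write results to stdout can still be redirected to a file. The names run_step and logs are hypothetical.

```shell
LOG_DIR="logs"   # hypothetical directory name
mkdir -p "$LOG_DIR"

# Run one step, saving its stderr to logs/<name>.log and leaving stdout untouched.
# Returns the exit status of the command that was run.
run_step() {
    name=$1
    shift
    echo "[$(date)] starting ${name}: $*" >&2
    "$@" 2> "${LOG_DIR}/${name}.log"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "step ${name} failed with status ${status}; see ${LOG_DIR}/${name}.log" >&2
    fi
    return "$status"
}

# Hypothetical usage for step 12; the inflation value and output file name
# are assumptions, so check them against the OrthoMCL UserGuide:
# run_step step12_mcl mcl mcl_input.txt --abc -I 1.5 -o mclOutput || exit 1
```

Stopping at the first failed step (the || exit 1 in the usage line) means only that step needs to be rerun once the problem is fixed.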