
SW:MAKER

From TAMU HPRC
Revision as of 09:56, 7 October 2021 by Cmdickens (talk | contribs) (MAKER)

MAKER

GCATemplates available: grace (using $TMPDIR)

MAKER homepage

 module spider MAKER

MAKER is a portable and easily configurable genome annotation pipeline.

Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases.

Here is a good paper to help you get started.


Complete the following four steps before submitting a MAKER job script on an HPRC cluster.

1. Download the GeneMark license key
To use MAKER, you need to download the GeneMark-ES license key file, since GeneMark-ES is part of the MAKER pipeline.
Download here: http://topaz.gatech.edu/GeneMark/license_download.cgi
Select the following: GeneMark-ES/ET v4.38 and LINUX 64.
You do not need to download the program, just the 64-bit key file.
Save gm_key_64.gz to your $HOME directory.
Then gunzip the key file and rename it from gm_key_64 to .gm_key.
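
The key installation described above can be scripted. The following is a minimal sketch, assuming the key was saved as gm_key_64.gz in $HOME; `install_gm_key` is a hypothetical helper name, not part of GeneMark or MAKER. Unlike a plain gunzip followed by a rename, it leaves the downloaded .gz file in place.

```shell
# Hypothetical helper: decompress a downloaded GeneMark key to a hidden file.
# Usage: install_gm_key [KEY_GZ] [DESTINATION]
install_gm_key() {
    key_gz=${1:-"$HOME/gm_key_64.gz"}   # downloaded, still gzipped key
    key_dest=${2:-"$HOME/.gm_key"}      # name the instructions above ask for
    if [ ! -f "$key_gz" ]; then
        echo "key file not found: $key_gz" >&2
        return 1
    fi
    # Write the decompressed key to the destination; the download is kept.
    gunzip -c "$key_gz" > "$key_dest"
}

# Uncomment after downloading the key to $HOME:
# install_gm_key
```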


2. Create or copy the three required control files; you must edit maker_opts.ctl
For MAKER 2.31.8, use the following commands to create the three MAKER control files in your current working directory:
 module load MAKER/2.31.8-intel-2015B-Perl-5.20.0
 cp $EBROOTMAKER/sample_ctl_files/* ./
For MAKER 2.31.10, use the following commands to copy the three MAKER control files to your current working directory:
 module load MAKER/2.31.10-intel-2017A-Python-2.7.12
 cp /scratch/datasets/maker/2.31.10/*ctl ./
2a. maker_opts.ctl
You need to edit the maker_opts.ctl file based on your project.
Edit the maker_opts.ctl file to set cpus=20 when using #SBATCH --cpus-per-task=28, or specify cpus as a command-line option:
 maker -cpus 20
Recommended: set TMP to $TMPDIR by using the following MAKER option:
 maker -TMP $TMPDIR
2b. maker_bopts.ctl
You do not need to edit the maker_bopts.ctl file unless you want to adjust BLAST parameters.
2c. maker_exe.ctl
You do not need to edit the maker_exe.ctl file since it is pre-configured with executable paths.
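
The `cpus` edit from step 2a can also be made from the command line instead of in a text editor. This is a minimal sketch, assuming each option in maker_opts.ctl sits on its own `key=value` line, optionally followed by a `#` comment; `set_ctl_option` is a hypothetical helper, not a MAKER command.

```shell
# Hypothetical helper: rewrite the value of one "KEY=..." line in a control file.
# Usage: set_ctl_option KEY VALUE [FILE]
# Assumption: VALUE contains no "|" or "&" characters (they are special to sed here).
set_ctl_option() {
    ctl_key=$1
    ctl_value=$2
    ctl_file=${3:-maker_opts.ctl}
    # Replace everything between "KEY=" and an optional trailing "#comment",
    # keeping the comment itself intact.
    sed "s|^${ctl_key}=[^#]*|${ctl_key}=${ctl_value} |" "$ctl_file" > "$ctl_file.tmp" &&
        mv "$ctl_file.tmp" "$ctl_file"
}

# Example (run in the directory that holds maker_opts.ctl):
# set_ctl_option cpus 20
```

Writing to a temporary file and then renaming it avoids relying on `sed -i`, whose behavior differs between sed implementations.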
3. Create a GeneMark HMM file
A GeneMark HMM file is needed if you want FASTA sequences of predicted genes.
Both GeneMark-ES and GeneMarkS are installed, and either can generate the GeneMark HMM file. For example, with GeneMarkS:
 module load GeneMarkS/4.32
 gmsn.pl -euk your_genome.fasta
 gm -m GeneMark.mat -R -lo -op your_genome.fasta
Once your GeneMark_hmm.mod file is generated, set the gmhmm value in the maker_opts.ctl file to its path.
4. Add an AUGUSTUS species to your maker_opts.ctl file at the line: augustus_species=
You can find a list of AUGUSTUS species by loading the MAKER module and then listing the directory:
 ls $EBROOTAUGUSTUS/config/species
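
A misspelled species name is easy to miss, so it can help to check the name against that directory before editing the control file. This is a minimal sketch, assuming each species is a subdirectory of $EBROOTAUGUSTUS/config/species; `check_augustus_species` is a hypothetical helper.

```shell
# Hypothetical helper: confirm a species name exists before using it.
# Usage: check_augustus_species NAME [SPECIES_DIR]
check_augustus_species() {
    sp_name=$1
    sp_dir=${2:-"$EBROOTAUGUSTUS/config/species"}
    if [ -n "$sp_name" ] && [ -d "$sp_dir/$sp_name" ]; then
        # Print the line to place in maker_opts.ctl.
        echo "augustus_species=$sp_name"
    else
        echo "unknown AUGUSTUS species: $sp_name" >&2
        return 1
    fi
}

# Example (after loading the MAKER module; "fly" is only an illustration):
# check_augustus_species fly
```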

MAKER 2.31.10

MAKER version 2.31.10 requires that you run two scripts after the maker command completes. For example, for a run whose output directory is dpp_contig.maker.output:

 cd dpp_contig.maker.output
 fasta_merge -d dpp_contig_master_datastore_index.log
 gff3_merge -d dpp_contig_master_datastore_index.log
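
Before merging, it can be worth confirming that every contig finished. The sketch below assumes the master datastore index log is tab-separated with three columns (contig name, datastore path, status) and that a completed contig ends with the status FINISHED; verify this against your own log before relying on it. `unfinished_contigs` is a hypothetical helper.

```shell
# Hypothetical helper: list contigs whose last recorded status is not FINISHED.
# Usage: unfinished_contigs INDEX_LOG
# Assumption: tab-separated columns are contig name, datastore path, status.
unfinished_contigs() {
    awk -F '\t' '
        { last[$1] = $3 }                      # remember the latest status per contig
        END {
            for (c in last)
                if (last[c] != "FINISHED") print c
        }' "$1" | sort
}

# Example: merge only when nothing is left unfinished.
# if [ -z "$(unfinished_contigs dpp_contig_master_datastore_index.log)" ]; then
#     fasta_merge -d dpp_contig_master_datastore_index.log
#     gff3_merge -d dpp_contig_master_datastore_index.log
# fi
```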

MAKER version 2.31.10 -help information:

MAKER version 2.31.10

Usage:

     maker [options] <maker_opts> <maker_bopts> <maker_exe>


Description:

     MAKER is a program that produces gene annotations in GFF3 format using
     evidence such as EST alignments and protein homology. MAKER can be used to
     produce gene annotations for new genomes as well as update annotations
     from existing genome databases.

     The three input arguments are control files that specify how MAKER should
     behave. All options for MAKER should be set in the control files, but a
     few can also be set on the command line. Command line options provide a
     convenient machanism to override commonly altered control file values.
     MAKER will automatically search for the control files in the current
     working directory if they are not specified on the command line.

     Input files listed in the control options files must be in fasta format
     unless otherwise specified. Please see MAKER documentation to learn more
     about control file  configuration.  MAKER will automatically try and
     locate the user control files in the current working directory if these
     arguments are not supplied when initializing MAKER.

     It is important to note that MAKER does not try and recalculated data that
     it has already calculated.  For example, if you run an analysis twice on
     the same dataset you will notice that MAKER does not rerun any of the
     BLAST analyses, but instead uses the blast analyses stored from the
     previous run. To force MAKER to rerun all analyses, use the -f flag.

     MAKER also supports parallelization via MPI on computer clusters. Just
     launch MAKER via mpiexec (i.e. mpiexec -n 40 maker). MPI support must be
     configured during the MAKER installation process for this to work though
     

Options:
     -genome|g <file>    Overrides the genome file path in the control files

     -RM_off|R           Turns all repeat masking options off.

     -datastore/         Forcably turn on/off MAKER's two deep directory
      nodatastore        structure for output.  Always on by default.

     -old_struct         Use the old directory styles (MAKER 2.26 and lower)

     -base    <string>   Set the base name MAKER uses to save output files.
                         MAKER uses the input genome file name by default.

     -tries|t <integer>  Run contigs up to the specified number of tries.

     -cpus|c  <integer>  Tells how many cpus to use for BLAST analysis.
                         Note: this is for BLAST and not for MPI!

     -force|f            Forces MAKER to delete old files before running again.
                         This will require all blast analyses to be rerun.

     -again|a            recaculate all annotations and output files even if no
                         settings have changed. Does not delete old analyses.

     -quiet|q            Regular quiet. Only a handlful of status messages.

     -qq                 Even more quiet. There are no status messages.

     -dsindex            Quickly generate datastore index file. Note that this
                         will not check if run settings have changed on contigs

     -nolock             Turn off file locks. May be usful on some file systems,
                         but can cause race conditions if running in parallel.

     -TMP                Specify temporary directory to use.

     -CTL                Generate empty control files in the current directory.

     -OPTS               Generates just the maker_opts.ctl file.

     -BOPTS              Generates just the maker_bopts.ctl file.

     -EXE                Generates just the maker_exe.ctl file.

     -MWAS    <option>   Easy way to control mwas_server for web-based GUI

                              options:  STOP
                                        START
                                        RESTART

     -version            Prints the MAKER version.

     -help|?             Prints this usage statement.
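
As an illustration of how the -cpus and -TMP options fit into a batch job, the following sketch writes a small Slurm job script that combines the settings from step 2a. The file name run_maker.slurm, the job name, and the --time and --ntasks values are assumptions for illustration only; the GCATemplates mentioned at the top of this page are the maintained starting point.

```shell
# Sketch: write an example job script; nothing is submitted.
job_script=${JOB_SCRIPT:-run_maker.slurm}   # hypothetical file name

cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=maker          # assumption: any name works
#SBATCH --time=24:00:00           # assumption: adjust for your genome
#SBATCH --ntasks=1                # assumption: single non-MPI task
#SBATCH --cpus-per-task=28        # value used in step 2a

module load MAKER/2.31.10-intel-2017A-Python-2.7.12

# With no control files named on the command line, MAKER looks for
# them in the current working directory.
maker -cpus 20 -TMP "$TMPDIR"
EOF

echo "wrote $job_script"
```

Review the generated file, then submit it with `sbatch run_maker.slurm` from the directory that holds the three control files.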