
I-TASSER

GCATemplates available: no

I-TASSER is only for non-commercial use.

I-TASSER website

I-TASSER (Iterative Threading ASSEmbly Refinement) is a hierarchical approach to protein structure prediction and structure-based function annotation.

Available on Grace only.

module load GCC/9.3.0  I-TASSER/5.1-Perl-5.30.2

The I-TASSER data libraries are in the following directory:

/scratch/data/bio/i-tasser/5.1

Although the I-TASSER libraries (except nr) are updated weekly on the I-TASSER website, the libraries on Grace will be updated at each cluster maintenance.
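
A quick way to confirm the module and inspect the installed libraries before building a job script (module spider is the standard Lmod lookup command on HPRC clusters):

module spider I-TASSER              # list available I-TASSER modules and required dependencies
ls /scratch/data/bio/i-tasser/5.1   # inspect the installed data libraries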

runstyle

parallel

  • example command:
    • runI-TASSER.pl -java_home $EBROOTJAVA -runstyle parallel -datadir my_datadir -libdir /scratch/data/bio/i-tasser/5.1 -seqname my_seq_name
  • All jobs will be run in parallel across multiple nodes. The reduction in runtime may not be large, since some I-TASSER scripts spawn fewer processes than a single node has cores, but CPU cores spend less time idle than with gnuparallel because each job is submitted as its own single-core Slurm job.
  • When using the parallel runstyle in your runI-TASSER.pl job script, submit your job using 3 tasks and 21GB of memory.
    • Other scripts such as runCOFACTOR.pl may need more initial tasks, but generally each process uses a single core.
  • Each of the automatically generated parallel jobs is hard-coded to use 1 core and 7GB of memory for a 3-day walltime.
  • If your job fails due to insufficient resources, send a message to the HPRC helpdesk and we will expand the resources for the automatically generated jobs.
  • Your job could also fail if you do not have enough SUs to schedule at least 15 single-core jobs for 3 days each (1080 SUs; see the arithmetic check below).
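
As a quick check on that minimum, the SU cost is simply jobs × cores × walltime in hours; the numbers below come from the hard-coded per-job parameters above:

# 15 jobs x 1 core x 72 hours = 1080 SUs
echo $(( 15 * 1 * 72 ))             # prints 1080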
example job script
#!/bin/bash
#SBATCH --export=NONE               # do not export current env to the job
#SBATCH --job-name=itasser          # job name
#SBATCH --time=1-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --ntasks-per-node=3         # tasks (commands) per compute node
#SBATCH --cpus-per-task=1           # CPUs (threads) per command
#SBATCH --mem=21G                   # total memory per node
#SBATCH --output=stdout.%j          # save stdout to file
#SBATCH --error=stderr.%j           # save stderr to file

module load GCC/9.3.0  I-TASSER/5.1-Perl-5.30.2

# your sequence.fasta file containing a single protein sequence is in a directory named my_datadir
runI-TASSER.pl -java_home $EBROOTJAVA -runstyle parallel -datadir my_datadir -libdir /scratch/data/bio/i-tasser/5.1 -seqname my_seq_name
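
A minimal setup-and-submit sketch for the script above (the job script file name itasser.slurm is illustrative, and the exact sequence file name I-TASSER expects in the data directory should be confirmed against the I-TASSER documentation):

mkdir my_datadir
cp sequence.fasta my_datadir/       # single protein sequence in FASTA format
sbatch itasser.slurm                # itasser.slurm contains the job script above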

gnuparallel

  • example command:
    • runI-TASSER.pl -java_home $EBROOTJAVA -runstyle gnuparallel -datadir my_datadir -libdir /scratch/data/bio/i-tasser/5.1 -seqname my_seq_name
  • All jobs will be run in parallel on a single node.
  • When using the gnuparallel runstyle in your job script, submit your job using 3 tasks and 21GB of memory (see the sketch after this list).
    • Each of the automatically generated gnuparallel processes is hard-coded to use 1 core and 7GB of memory for a 3-day walltime.
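
The parallel job script above carries over with only the runstyle changed; a minimal sketch, assuming the same module and resource request (the --nodes=1 line is added here since all processes stay on one node):

#!/bin/bash
#SBATCH --export=NONE               # do not export current env to the job
#SBATCH --job-name=itasser          # job name
#SBATCH --time=1-00:00:00           # max job run time dd-hh:mm:ss
#SBATCH --nodes=1                   # gnuparallel runs all processes on one node
#SBATCH --ntasks-per-node=3         # tasks (commands) per compute node
#SBATCH --cpus-per-task=1           # CPUs (threads) per command
#SBATCH --mem=21G                   # total memory per node
#SBATCH --output=stdout.%j          # save stdout to file
#SBATCH --error=stderr.%j           # save stderr to file

module load GCC/9.3.0  I-TASSER/5.1-Perl-5.30.2

# your sequence.fasta file containing a single protein sequence is in a directory named my_datadir
runI-TASSER.pl -java_home $EBROOTJAVA -runstyle gnuparallel -datadir my_datadir -libdir /scratch/data/bio/i-tasser/5.1 -seqname my_seq_name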

serial

  • This is the default if you do not specify -runstyle.
  • Avoid serial mode: as the benchmarks below show, it can take roughly 6x longer than either parallel mode.

benchmarks

runI-TASSER.pl

serial:                         1 day 5 hr 22 min
gnuparallel (single node):             5 hr  3 min
parallel (Slurm; multi-node):          4 hr 57 min