InterProScan

GCATemplates available: no

InterProScan homepage: https://www.ebi.ac.uk/interpro/about.html

 module spider InterProScan
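For a specific version, module spider also shows the toolchain modules that must be loaded first. For example (the version numbers here match the Grace job script below; check the spider output for what is currently installed):

 module spider InterProScan/5.52-86.0                         # show how to load this specific version
 module load GCC/9.3.0 OpenMPI/4.0.3 InterProScan/5.52-86.0   # load the toolchain and InterProScan modules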

Additional notes on how to run InterProScan: https://github.com/ebi-pf-team/interproscan/wiki/HowToRun

InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium.

You can see the available InterProScan options by running the following command with no arguments. Options can be added to the interproscan.sh command line (see the example below).

interproscan.sh
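As an illustration only (the comma-separated -f format list and the -b output-basename option are standard InterProScan 5 options, but confirm them against the usage output printed by the command above; the file names are placeholders):

 interproscan.sh --disable-precalc -f tsv,gff3 -i proteins.fasta -b results --tempdir $TMPDIR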

The InterProScan Match Lookup Service is not installed on HPRC clusters, and since the compute nodes do not have internet access, you will need to disable the match lookup service with the --disable-precalc option. This also means that the --goterms option will not work.

InterProScan is configured to use 4 cores, so be sure to request 4 cores in your job script.

Sample Grace job script using 4 cores and the --disable-precalc option:

 #!/bin/bash
 #SBATCH --export=NONE              # do not export current env to the job
 #SBATCH --job-name=my_job          # job name
 #SBATCH --time=1-00:00:00          # max job run time dd-hh:mm:ss
 #SBATCH --ntasks-per-node=1        # tasks (commands) per compute node
 #SBATCH --cpus-per-task=4          # CPUs (threads) per command
 #SBATCH --mem=28G                  # total memory per node
 #SBATCH --output=stdout.%j         # save stdout to file
 #SBATCH --error=stderr.%j          # save stderr to file

 module load GCC/9.3.0 OpenMPI/4.0.3 InterProScan/5.52-86.0

 interproscan.sh --disable-precalc -f tsv -i <protein_fasta_file> -o <out.tsv> --tempdir $TMPDIR
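To run the job, save the script to a file (the name interproscan.slurm below is only an example) and submit it with sbatch:

 sbatch interproscan.slurm          # submit the job to the Slurm scheduler
 squeue -u $USER                    # check the status of your jobs
 head out.tsv                       # preview the tab-separated results after the job completes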