SW:InterProScan

From TAMU HPRC
Revision as of 10:04, 7 October 2021 by Cmdickens (talk | contribs) (InterProScan)
InterProScan

GCATemplates available: no

InterProScan homepage

 module spider InterProScan

Additional notes on how to run InterProScan

InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium.

To list the available InterProScan options, run the following command with no arguments. Options are passed on the interproscan.sh command line (see example below).

 interproscan.sh

The InterProScan match lookup service is not installed on HPRC clusters, and the compute nodes do not have internet access, so you must disable the lookup with the --disable-precalc option. This also means that the --goterms option will not work.

InterProScan is configured to use 4 cores, so be sure to request 4 cores in your job script.
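One way to catch a mismatch between the job request and the 4-core configuration is a small guard near the top of the job script. The sketch below assumes that Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested; the function name check_cores is hypothetical and is not part of InterProScan or any HPRC tooling.

```shell
#!/bin/bash
# Hypothetical helper (not part of InterProScan): compare the cores granted
# by Slurm against the 4 cores this InterProScan installation is set to use.
check_cores() {
    local expected=4
    # SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
    # default to 0 so the comparison still works outside a job.
    local granted="${SLURM_CPUS_PER_TASK:-0}"
    if [ "$granted" -ne "$expected" ]; then
        echo "warning: job has $granted CPUs per task, expected $expected" >&2
        return 1
    fi
    return 0
}
```

Calling `check_cores || exit 1` before the interproscan.sh line would stop the job early instead of letting it run with the wrong core count.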

Sample Grace job script using 4 cores and the option --disable-precalc

 #!/bin/bash
 #SBATCH --export=NONE               # do not export current env to the job
 #SBATCH --job-name=my_job           # job name
 #SBATCH --time=1-00:00:00           # max job run time dd-hh:mm:ss
 #SBATCH --ntasks-per-node=1         # tasks (commands) per compute node
 #SBATCH --cpus-per-task=4           # CPUs (threads) per command
 #SBATCH --mem=28G                   # total memory per node
 #SBATCH --output=stdout.%j          # save stdout to file
 #SBATCH --error=stderr.%j           # save stderr to file
 
 module load GCC/9.3.0 OpenMPI/4.0.3 InterProScan/5.52-86.0
 
 interproscan.sh --disable-precalc -f tsv -i <protein_fasta_file> -o <out.tsv> --tempdir $TMPDIR
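After the job finishes, a quick tally of matches per member database can help confirm that the run produced sensible output. The sketch below assumes the standard InterProScan TSV layout, in which the fourth tab-separated column names the analysis (for example Pfam); the function name count_by_analysis is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count matches per member database in an InterProScan
# TSV file. Assumes the standard TSV layout, where column 4 names the analysis.
count_by_analysis() {
    local tsv="$1"
    # Tally column 4, then print "analysis<TAB>count" sorted by analysis name.
    awk -F'\t' '{ n[$4]++ } END { for (a in n) printf "%s\t%d\n", a, n[a] }' "$tsv" | sort
}
```

For example, `count_by_analysis out.tsv` prints one line per analysis found in the file named with -o in the job script.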