Overview

InterProScan (IPRSCAN) is a tool that combines different protein signature recognition methods native to the InterPro member databases into one resource with look up of corresponding InterPro and GO annotation.

About Version 5

IPRSCAN version 5 is a complete rewrite, in Java, of version 4.  The new version does not appear to suffer from the I/O problems over distributed file systems such as NFS that affected previous versions.  The HPCC was able to accommodate version 4 over distributed file systems before its early 2013 system upgrade, but since then version 4 runs fail intermittently during the results merge step.  The only known remedy is to store the IPRSCAN data sets locally on each individual node, which is not practical given the configuration of our system, so we are unable to support version 4.  We therefore do not recommend running IPRSCAN version 4 on the HPCC, and we encourage users to consider version 5 after thoroughly evaluating whether the new version is appropriate for their research.

Job Setup

Because multi-threading is hardcoded for 4 CPUs and these threads run in addition to the main process thread, you MUST request "ppn=5" when submitting InterProScan jobs to prevent node over-utilization.
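
As a minimal sketch of the requirement above, the following writes a job script that requests one node with five processors. It assumes a Torque/PBS-style scheduler where "ppn=5" appears in a "-l nodes=1:ppn=5" resource request and jobs are submitted with qsub. The job name, walltime, memory, and file names are placeholders, and the iprscan5 arguments (-i, -f, -b) are InterProScan 5 options that are assumed to be passed through by the HPCC shortcut. Nothing is submitted by the sketch itself.

```shell
#!/bin/sh
# Write a Torque/PBS job script for one InterProScan run (not submitted here).
# ASSUMPTIONS: walltime, mem, job name, and file names are placeholders;
# iprscan5 is assumed to forward -i/-f/-b to InterProScan 5.
cat > iprscan_job.qsub <<'EOF'
#!/bin/sh
#PBS -N iprscan5_example
# 4 hardcoded worker threads + 1 main process thread = 5 cores on one node
#PBS -l nodes=1:ppn=5
#PBS -l walltime=04:00:00
#PBS -l mem=8gb

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

module load iprscan
iprscan5 -i proteins.fasta -f TSV -b proteins_iprscan
EOF

echo "Review iprscan_job.qsub, then submit with: qsub iprscan_job.qsub"
```

Adjust the resource requests to your data before submitting; only the "ppn=5" value comes from this page.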

Configuration

Like version 4, version 5 uses a single properties file to determine the run parameters for IPRSCAN.  Unfortunately, IPRSCAN still cannot use multiple or alternate properties files, which a centralized installation in a multi-user environment would need.  Thus, if you want customized IPRSCAN runs, we recommend that you download the IPRSCAN version 5 binaries into your home, research, or scratch directory space, modify the interproscan.properties file to your liking, and then perform your runs.  To assist users in running IPRSCAN on our system, we provide a centralized data repository for the very large IPRSCAN analysis data sets.  These are provided in:

 

/mnt/research/common-data/Bio/iprscan/data

 

If you decide to run your own customized version of IPRSCAN, simply update your copy of the interproscan.properties file to reflect the full path to the data sets (you can use the sets available on the HPCC if you wish), and run as you usually would, using the Java binaries in your home or research directory space.

A copy of the interproscan.properties file is provided for your use as a starting template.  It is available as:

 

/mnt/research/common-data/Bio/iprscan/interproscan.properties

 

If you use this file for your own IPRSCAN home directory installation, please modify it as needed and make sure your copy is named "interproscan.properties".
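
The customization described above could be sketched as follows. The helper name repoint_data_paths is hypothetical, and the sketch assumes that data set locations in your copy of interproscan.properties are written as values beginning with a relative "data/" prefix (as in the "data/panther/7.2/model" path reported for Panther further down this page). Check your own file before relying on it.

```shell
# Shared data repository on the HPCC (path taken from this page).
HPCC_IPRSCAN_DATA=/mnt/research/common-data/Bio/iprscan/data

# repoint_data_paths FILE [DATA_ROOT]
# Rewrite every "key=data/..." value in FILE to "key=DATA_ROOT/...".
# Comment lines (starting with #) are left alone, and the original
# file is kept as FILE.bak.
# ASSUMPTION: data set paths are values that start with "data/".
repoint_data_paths() {
    file="$1"
    root="${2:-$HPCC_IPRSCAN_DATA}"
    sed -i.bak "/^[[:space:]]*#/!s|=data/|=${root}/|" "$file"
}

# Example usage (the installation directory is a placeholder):
#   repoint_data_paths "$HOME/interproscan-5/interproscan.properties"
```

Comparing the edited file against the .bak copy with diff is a quick way to confirm that only the intended lines changed.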

You may still run the centralized IPRSCAN 5 installation on the HPCC if you wish, provided your work is compatible with the default interproscan.properties setup, which is available for your review in:

 

/mnt/research/common-data/Bio/iprscan/interproscan.properties

 

You will not need any modifications or additional steps to run IPRSCAN version 5 if you choose the HPCC centralized installation.

Default Configuration

The current configuration of customizable InterProScan limits is as follows:

 

#max amino acids for input sequence
#
maxinputseqs.aa=1000
#max nucleic acids for input sequence
maxinputseqs.nt=100
#max length for the nucleotide input sequence
maxseqlen.nt=10000
#min length for the protein input sequence
minseqlen.aa=5
#default minimum orf size for translation
minorfsize=50
#default codon translation table (0 for Standard Code)
codon.table=0
#
#the number of sequences split into one chunk
chunk=100
#the number of chunks to be displayed on a line in the result page
chunk.display=50
#If you want CRC64, InterPro look up and goterms checkboxes checked by default, set tags to 1, otherwise leave it empty.
checkbox.crc64=
checkbox.iprlookup=1
checkbox.goterms=1
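
To check an input file against these limits before submitting a job, something like the following sketch may help. The function names are hypothetical, and the sketch assumes that maxinputseqs.aa is a cap on the number of protein sequences in one input file, which the comments above do not state unambiguously.

```shell
# get_prop KEY FILE
# Print the value of KEY from a Java-style properties FILE.
# Commented lines are ignored; if KEY occurs more than once,
# the last value wins. Prints an empty line if KEY is absent.
get_prop() {
    awk -F= -v k="$1" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$2"
}

# count_fasta_seqs FASTA
# Count sequences by counting header lines (lines starting with ">").
count_fasta_seqs() {
    grep -c '^>' "$1"
}

# within_seq_limit FASTA PROPERTIES
# Succeed if FASTA holds no more sequences than maxinputseqs.aa.
# ASSUMPTION: maxinputseqs.aa limits the number of protein sequences.
within_seq_limit() {
    n=$(count_fasta_seqs "$1")
    limit=$(get_prop maxinputseqs.aa "$2")
    [ -n "$limit" ] && [ "$n" -le "$limit" ]
}

# Example usage (proteins.fasta is a placeholder):
#   within_seq_limit proteins.fasta interproscan.properties \
#       || echo "proteins.fasta exceeds maxinputseqs.aa; consider splitting it"
```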

 

IPRSCAN Analyses

The following IPRSCAN analyses are available on the HPCC build:

TIGRFAM-13.0 : TIGRFAMs are protein families based on Hidden Markov Models or HMMs
ProDom-2006.1 : ProDom is a comprehensive set of protein domain families automatically generated from the UniProt Knowledge Database.
SMART-6.2 : SMART allows the identification and analysis of domain architectures based on Hidden Markov Models or HMMs
PrositeProfiles-20.89 : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
PrositePatterns-20.89 : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
SuperFamily-1.75 : SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.
PRINTS-42.0 : A fingerprint is a group of conserved motifs used to characterise a protein family
Gene3d-3.5.0 : Structural assignment for whole genes and genomes using the CATH domain structure database
PIRSF-2.83 : The PIRSF concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships.
PfamA-27.0 : A large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
TMHMM-2.0c : Prediction of transmembrane helices in proteins
HAMAP-201302.26 : High-quality Automated and Manual Annotation of Microbial Proteomes
Coils-2.2 : Prediction of Coiled Coil Regions in Proteins

The following IPRSCAN analyses are currently deactivated on the HPCC build:

Phobius-1.01 : Analysis Phobius-1.01 is deactivated, because the following parameters are not set in the interproscan.properties file: binary.phobius.pl.path.1.01
SignalP-GRAM_NEGATIVE-4.0 : Analysis SignalP-GRAM_NEGATIVE-4.0 is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.4.0.path
SignalP-EUK-4.0 : Analysis SignalP-EUK-4.0 is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.4.0.path
SignalP-GRAM_POSITIVE-4.0 : Analysis SignalP-GRAM_POSITIVE-4.0 is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.4.0.path
Panther-7.2 : Analysis Panther-7.2 is deactivated, because the resources expected at the following paths do not exist: data/panther/7.2/model
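
Most of the messages above name a property that is not set. In your own installation (not the centralized one), setting that property in your copy of interproscan.properties is presumably what activates the analysis. The sketch below shows one way to set or replace a property. The helper name set_prop and the example binary location are hypothetical; the property name binary.signalp.4.0.path is taken from the messages above, and the corresponding program has to be installed by you.

```shell
# set_prop KEY VALUE FILE
# Set KEY=VALUE in a Java-style properties FILE: replace the value if
# an uncommented KEY= line exists, otherwise append a new line.
# The file is rewritten through a temporary file next to it.
set_prop() {
    key="$1"
    value="$2"
    file="$3"
    tmp="${file}.tmp.$$"
    awk -F= -v k="$key" -v v="$value" '
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example usage (the SignalP location is a placeholder):
#   set_prop binary.signalp.4.0.path "$HOME/software/signalp-4.0/signalp" \
#       "$HOME/interproscan-5/interproscan.properties"
```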

Run Notes

A shortcut has been created for the HPCC version of IPRSCAN.  To run:

 

module load iprscan
iprscan5 <args>