InterProScan (IPRSCAN) is a tool that combines the different protein signature recognition methods native to the InterPro member databases into one resource, with lookup of the corresponding InterPro and GO annotation.
About Version 5
IPRSCAN version 5 is a complete rewrite, in Java, of the previous version 4. The new version does not appear to be susceptible to the problems that earlier versions had with I/O over distributed file systems such as NFS. Before its early 2013 system upgrade, the HPCC was able to accommodate version 4 over distributed file systems, but since that time version 4 runs intermittently fail during the results merge step. The only known remedy is to store the IPRSCAN data sets locally on each individual node, which is not practical given the configuration of our system, so we are unable to support version 4. We therefore do not recommend running IPRSCAN version 4 on the HPCC, and instead encourage users to consider version 5 after thoroughly evaluating whether the new version is appropriate for their research.
Like version 4, version 5 uses a single properties file to determine the run parameters for IPRSCAN. Unfortunately, IPRSCAN still does not offer the ability to use multiple or alternate properties files, which is what a centralized installation in a multi-user environment would need. Thus, if you want customized IPRSCAN runs, we recommend that you download the IPRSCAN version 5 binaries into your home, research, or scratch directory space, modify the interproscan.properties file to your liking, and then perform your runs. To assist users in running IPRSCAN on our system, we provide a centralized data repository for the very large IPRSCAN analysis sets. These are provided in:
If you decide to run your own customized version of IPRSCAN, simply update your copy of the interproscan.properties file to reflect the full path of the data sets (you can use the sets available on the HPCC if you wish), and run as you usually would, using the Java binaries in your home or research directory space.
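The path update described above can be scripted. The following is a minimal sketch in Python, not an official tool: it assumes that data set locations in interproscan.properties are written as relative values beginning with `data/` (as in the `data/panther/7.2/model` path reported further down this page) and rewrites them to sit under an absolute data directory. The function name `repoint_data_paths`, the property key `example.model.dir`, and the directory `/hypothetical/shared/data` are placeholders invented for illustration; substitute the real key names from your copy of the file and the actual location of the data sets.

```python
def repoint_data_paths(properties_text: str, data_root: str) -> str:
    """Rewrite values that start with 'data/' so they live under data_root.

    data_root is the absolute path of the data directory itself, i.e. it
    replaces the leading 'data' component of each relative value.
    """
    root = data_root.rstrip("/")
    out = []
    for line in properties_text.splitlines():
        stripped = line.lstrip()
        # Leave blank lines, comments, and lines without '=' untouched.
        if not stripped or stripped.startswith("#") or "=" not in line:
            out.append(line)
            continue
        key, value = line.split("=", 1)
        if value.strip().startswith("data/"):
            # Swap the relative 'data/' prefix for the absolute root.
            value = root + "/" + value.strip()[len("data/"):]
        out.append(f"{key}={value}")
    return "\n".join(out)


# Hypothetical example input; the key name is made up for illustration.
sample = "# comment\nexample.model.dir=data/panther/7.2/model\nother.key=42\n"
print(repoint_data_paths(sample, "/hypothetical/shared/data"))
```

Review the rewritten file before using it, since this sketch only handles simple `key=value` lines and does not understand line continuations or other features of the Java properties format.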
A copy of the interproscan.properties file is provided for your use as a starting template. It is available as:
You may also run the HPCC installation of IPRSCAN 5 as-is, as long as your work is compatible with the default interproscan.properties setup, which is provided for your review in:
The current configuration of customizable InterProScan limits is as follows:
The following IPRSCAN analyses are available on the HPCC build:
TIGRFAM-13.0 : TIGRFAMs are protein families based on Hidden Markov Models or HMMs
ProDom-2006.1 : ProDom is a comprehensive set of protein domain families automatically generated from the UniProt Knowledge Database.
SMART-6.2 : SMART allows the identification and analysis of domain architectures based on Hidden Markov Models or HMMs
PrositeProfiles-20.89 : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
PrositePatterns-20.89 : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
SuperFamily-1.75 : SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.
PRINTS-42.0 : A fingerprint is a group of conserved motifs used to characterise a protein family
Gene3d-3.5.0 : Structural assignment for whole genes and genomes using the CATH domain structure database
PIRSF-2.83 : The PIRSF concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships.
PfamA-27.0 : A large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
TMHMM-2.0c : Prediction of transmembrane helices in proteins
HAMAP-201302.26 : High-quality Automated and Manual Annotation of Microbial Proteomes
Coils-2.2 : Prediction of Coiled Coil Regions in Proteins
The following IPRSCAN analyses are currently deactivated on the HPCC build:
Phobius-1.01 : Analysis Phobius-1.01 is deactivated, because the following parameters are not set in the interproscan.properties file: binary.phobius.pl.path.1.01
SignalP-GRAM_NEGATIVE-4.0 : Analysis SignalP-GRAM_NEGATIVE-4.0 is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.4.0.path
SignalP-EUK-4.0 : Analysis SignalP-EUK-4.0 is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.4.0.path
SignalP-GRAM_POSITIVE-4.0 : Analysis SignalP-GRAM_POSITIVE-4.0 is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.4.0.path
Panther-7.2 : Analysis Panther-7.2 is deactivated, because the resources expected at the following paths do not exist: data/panther/7.2/model
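If you maintain your own copy of interproscan.properties, you can check ahead of a run whether the binary paths named in the messages above are set. The sketch below is an illustrative Python helper, not part of IPRSCAN: the property keys are taken from the deactivation messages listed above, while the function names, the simplified `key=value` parser, and the `/hypothetical/bin/signalp` path are assumptions made for the example. It does not cover the Panther case, which is caused by a missing data directory, not an unset property.

```python
# Property keys as reported in the deactivation messages above.
REQUIRED_BINARIES = {
    "Phobius-1.01": "binary.phobius.pl.path.1.01",
    "SignalP-GRAM_NEGATIVE-4.0": "binary.signalp.4.0.path",
    "SignalP-EUK-4.0": "binary.signalp.4.0.path",
    "SignalP-GRAM_POSITIVE-4.0": "binary.signalp.4.0.path",
}


def parse_properties(text):
    """Parse simple 'key=value' lines; comments and blank lines are skipped.

    This is a simplified reader and ignores line continuations and
    ':' separators that the full Java properties format allows.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def deactivated_analyses(properties_text):
    """Return the analyses whose binary path is missing or empty."""
    props = parse_properties(properties_text)
    return sorted(
        name for name, key in REQUIRED_BINARIES.items() if not props.get(key)
    )


# Hypothetical file contents: SignalP path set, Phobius path left empty.
text = "binary.signalp.4.0.path=/hypothetical/bin/signalp\nbinary.phobius.pl.path.1.01=\n"
print(deactivated_analyses(text))
```

The check only confirms that a value is present; it does not verify that the path points to a working binary.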
A shortcut has been created for the HPCC version of IPRSCAN. To run: