Blast2GO is an application that annotates NCBI BLAST-aligned sequences with Gene Ontology (GO) information stored in a MySQL database. The HPCC provides both the front-end Blast2GO application and the back-end MySQL database, which can be queried from any HPCC dev-node or compute-node.
Running the Blast2GO Pipeline
A brief outline of the B2G4PIPE (command-line Blast2GO pipeline) workflow follows:
Load the Module File
Copy the template Blast2GO properties file into your working directory:
This properties file already contains the connection information for the Blast2GO MySQL database, so do NOT alter the section labeled "GO and B2G Data Access Basic"; changing it will break the application.
Run the Pipeline
Run the Blast2GO pipeline (i.e., non-GUI) version of the program with the appropriate options and input files.
The general format of the Blast2GO pipeline command is:
Please see the attached README file for details on program options. Other external sources of useful information are provided below:
BLASTing Through Blast2GO
BLAST alignment searches can be performed directly through Blast2GO. However, BLAST searches performed this way are sent to the NCBI public web server, which can be slow and may limit the number of sequences you can submit. To ensure optimum performance, consider running your BLAST searches independently with BLAST/BLAST+ on the HPCC, and then performing the annotation step separately with Blast2GO.
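Running BLAST+ yourself mainly means asking it for XML output, the format Blast2GO reads. The following is a minimal sketch under several assumptions: `blastp` from BLAST+ is on your PATH, and a protein database named `nr` is available. The query filename, the database name, and the helper names `build_blastp_command`/`run_blastp` are hypothetical placeholders. `-outfmt 5` selects BLAST XML. Smaller `-evalue` and `-max_target_seqs` values return fewer hits per query and therefore a smaller XML file.

```python
import shlex
import subprocess
from typing import List


def build_blastp_command(
    query: str,
    db: str,
    out: str,
    evalue: float = 1e-3,
    max_target_seqs: int = 20,
    num_threads: int = 1,
) -> List[str]:
    """Build a BLAST+ blastp argument list that writes BLAST XML (-outfmt 5).

    Hypothetical helper: the defaults are illustrative, not HPCC recommendations.
    """
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "5",                            # 5 = BLAST XML output
        "-evalue", str(evalue),                    # smaller cutoff -> fewer hits
        "-max_target_seqs", str(max_target_seqs),  # cap hits kept per query
        "-num_threads", str(num_threads),
    ]


def run_blastp(cmd: List[str]) -> None:
    # check=True raises subprocess.CalledProcessError if blastp exits non-zero
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder inputs; substitute your own FASTA file and database.
    cmd = build_blastp_command("my_proteins.fasta", "nr", "blastp_results.xml")
    print(shlex.join(cmd))  # inspect the command before passing it to run_blastp
```

Building the argument list separately from running it lets you print and check the exact command, or paste it into a batch job script, before submitting anything.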
XML BLAST Files
An XML file passed to Blast2GO can be so large that it breaks the program's internal array allocation and referencing. In one case, a BLAST XML file of nearly 4.3 GB caused this failure. To work around the problem, you can re-run BLAST with tighter filters to reduce the output file size, or split the large XML file into a series of smaller ones and run each one independently. Piecing together the resulting GO annotations should be fairly straightforward, although working with the graphs, each of which represents only a subset of the Blast2GO output, can be somewhat tedious. A tool for splitting large XML files is included in the HPCC installation of Blast2GO. To use:
In the example above, we split the XML file "blastp_results.xml" into separate files of 5000 sequences each. You can adjust the number of sequences per file to suit your needs.
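The splitting idea itself is simple: copy the file's header elements into every chunk, then distribute the per-query `<Iteration>` elements among the chunks. The following is a hypothetical Python sketch of that idea, not the HPCC tool. It assumes the classic BLAST+ XML layout (`-outfmt 5`): a `<BlastOutput>` root whose `BlastOutput_*` header elements precede a `<BlastOutput_iterations>` block containing one `<Iteration>` per query sequence. It also assumes that everything before the `<BlastOutput>` tag (the XML declaration and DOCTYPE) can be copied verbatim into each chunk. The function name `split_blast_xml` and the `prefix_000.xml` naming scheme are illustrative. The file is streamed with `iterparse` so a multi-gigabyte input does not have to fit in memory.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def _read_prolog(path: Path) -> str:
    """Return the text before the <BlastOutput> root tag (XML declaration,
    DOCTYPE) so each chunk starts the same way as the original file."""
    with open(path, encoding="utf-8") as fh:
        head = fh.read(4096)
    idx = head.find("<BlastOutput>")
    return head[:idx] if idx != -1 else '<?xml version="1.0"?>\n'


def split_blast_xml(path: str, per_file: int, prefix: str) -> List[str]:
    """Write chunks of at most `per_file` <Iteration> elements (one per query)
    to prefix_000.xml, prefix_001.xml, ... and return the written paths."""
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    src = Path(path)
    prolog = _read_prolog(src)
    header: List[str] = []   # serialized BlastOutput_* elements before the iterations
    batch: List[str] = []    # serialized <Iteration> elements for the current chunk
    written: List[str] = []
    depth = 0
    iterations = None        # the <BlastOutput_iterations> element, once seen

    def flush() -> None:
        out = f"{prefix}_{len(written):03d}.xml"
        with open(out, "w", encoding="utf-8") as fh:
            fh.write(prolog)
            fh.write("<BlastOutput>\n")
            fh.writelines(header)
            fh.write("<BlastOutput_iterations>\n")
            fh.writelines(batch)
            fh.write("</BlastOutput_iterations>\n</BlastOutput>\n")
        written.append(out)
        batch.clear()

    for event, elem in ET.iterparse(str(src), events=("start", "end")):
        if event == "start":
            depth += 1
            if elem.tag == "BlastOutput_iterations":
                iterations = elem
            continue
        # "end" event: depth still counts this element
        if depth == 2 and iterations is None:
            # a direct header child of <BlastOutput>, e.g. BlastOutput_program
            elem.tail = None
            header.append(ET.tostring(elem, encoding="unicode") + "\n")
        elif elem.tag == "Iteration":
            elem.tail = None
            batch.append(ET.tostring(elem, encoding="unicode") + "\n")
            iterations.clear()  # free memory: this query is already serialized
            if len(batch) == per_file:
                flush()
        depth -= 1

    if batch:
        flush()
    return written
```

For example, `split_blast_xml("blastp_results.xml", 5000, "blastp_results_part")` would write files of up to 5000 query sequences each. Each chunk keeps the original `Iteration_iter-num` values, and the sketch assumes Blast2GO does not require them to restart at 1.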
Another utility included with the HPCC installation of Blast2GO converts large text-format BLAST output files into XML. To use:
When prompted by the program, enter the number of sequences desired in each XML file.