Overview

Blast2GO is an application that annotates NCBI BLAST-aligned sequences with Gene Ontology (GO) information held in a MySQL database.  The HPCC provides both the front-end Blast2GO application and a back-end MySQL database, which can be queried from any HPCC dev or compute node.

Blast2GO comes in two flavors: a web-based GUI and a command line tool called B2G4PIPE.  The HPCC provides access to the command line tool, referred to here as the "Blast2GO Pipeline."

Running the Blast2GO Pipeline

A thumbnail sketch of the B2G4PIPE execution workflow is as follows:

Load the Module File

 

module load blast2go

 

Properties File

Grab a copy of the templated Blast2GO properties file and place it in your working directory:

 

cp /mnt/research/common-data/Bio/blast2go/b2gPipe.properties .

 

This properties file already contains the connection information for the Blast2GO MySQL database, so do NOT alter the section labeled "GO and B2G Data Access Basic"; doing so will break the application's functionality.
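One way to confirm that your edits have not touched the protected section is to compare your working copy against the template.  The following is a minimal sketch; the helper name props_changes is hypothetical and is not part of the Blast2GO installation.

```shell
#!/usr/bin/env bash
# Sketch: show the lines of a local properties file that differ from the template.
# props_changes is a hypothetical helper name, not an HPCC-provided tool.
props_changes() {
    local template=$1 local_copy=$2
    # diff exits with 1 when the files differ; treat that as a normal outcome
    # and only report failure for real errors (exit status 2, e.g. missing file).
    diff "$template" "$local_copy" || [ $? -eq 1 ]
}
```

For example, props_changes /mnt/research/common-data/Bio/blast2go/b2gPipe.properties b2gPipe.properties prints nothing if the copy is unmodified.  Any lines it does print should fall outside the "GO and B2G Data Access Basic" section.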

Run the Pipeline

Run the Blast2GO pipeline (i.e., non-GUI) version of the program with the appropriate options and input files.

The general format of the Blast2GO pipeline command is:

 

java es.blast2go.prog.B2GAnnotPipe -prop b2gPipe.properties <args>
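As an illustration of what the arguments might look like, the sketch below annotates one or more BLAST XML files in turn.  The -in, -out and -annot options and the file naming are assumptions based on typical B2G4PIPE usage, not confirmed by this page, so check them against the README before relying on them.  The function name annotate_xml and the JAVA override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: annotate one or more BLAST XML files with the Blast2GO pipeline.
# ASSUMPTIONS: the -in, -out and -annot options are taken from typical
# B2G4PIPE usage; confirm them against the README on the HPCC.
# Set JAVA=echo to preview the commands without running them.
annotate_xml() {
    local java_bin=${JAVA:-java}
    local xml
    for xml in "$@"; do
        # Output prefix: the input file name without its .xml extension
        "$java_bin" es.blast2go.prog.B2GAnnotPipe \
            -prop b2gPipe.properties \
            -in "$xml" -out "${xml%.xml}" -annot
    done
}
```

Running JAVA=echo annotate_xml blastp_results.xml prints the command that would be executed, which is a cheap way to check the arguments before submitting a job.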

 

Please see the attached README file for details on program options.  Other external sources of useful information are provided below:

Blast2GO accepts BLAST output in XML format only.  Plan your BLAST runs accordingly!
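Before starting a long annotation run, it can be worth checking that an input file really is BLAST XML.  The sketch below looks for the <BlastOutput> root element near the top of the file; the helper name is_blast_xml and the 20-line window are illustrative choices, and the check does not validate the file.

```shell
#!/usr/bin/env bash
# Sketch: quick sanity check that a file looks like BLAST XML output.
# is_blast_xml is a hypothetical helper; it only inspects the first 20 lines.
is_blast_xml() {
    # BLAST XML files open with an XML declaration, a DOCTYPE line,
    # and then the <BlastOutput> root element.
    head -n 20 "$1" | grep -q '<BlastOutput>'
}
```

For example: is_blast_xml blastp_results.xml && echo "looks like BLAST XML".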

BLASTing Through Blast2GO

BLAST alignment searches can be performed directly through Blast2GO.  However, please be aware that BLAST searches performed in this manner are sent to the NCBI public web server, which may be slow or may limit the number of sequences that can be submitted.  To ensure optimum performance, users may wish to run their BLAST searches independently using BLAST/BLAST+ on the HPCC, and then perform the annotation step separately using Blast2GO.  
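A stand-alone BLAST+ search can write XML directly by using output format 5.  The sketch below wraps a protein search; the function name blast_to_xml and the BLASTP override are hypothetical, and it assumes blastp is already on your PATH (how BLAST+ is loaded on the HPCC is not covered here).

```shell
#!/usr/bin/env bash
# Sketch: protein BLAST+ search that writes XML output suitable for Blast2GO.
# ASSUMPTION: blastp is on the PATH. Set BLASTP=echo to preview the arguments.
blast_to_xml() {
    local query=$1 db=$2 out=$3
    # -outfmt 5 selects BLAST XML output in BLAST+
    "${BLASTP:-blastp}" -query "$query" -db "$db" -outfmt 5 -out "$out"
}
```

For example, blast_to_xml proteins.fasta nr blastp_results.xml, where the query file and database names are placeholders for your own.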

XML BLAST Files

It is possible to pass Blast2GO an XML file so large that it breaks internal array allocation and referencing.  One such case involved a BLAST XML file of nearly 4.3GB.  To get around this problem, you can re-run BLAST with tighter filters to reduce the output file size, or use a tool that splits a large XML file into a series of smaller ones, each of which can then be run independently.  Piecing together the resulting GO annotations should be fairly straightforward, but working with graphs that each represent only a subset of the Blast2GO output can be somewhat awkward.  A tool for splitting XML files is available as part of the Blast2GO installation on the HPCC.  To use it:

 

module load blast2go
split_xml_blast 5000 blastp_results.xml

 

In the example above, the XML file "blastp_results.xml" is split into separate files containing 5000 sequences each.  You can adjust the number of sequences per file to suit your needs.
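To choose a sensible chunk size, it helps to know how many query sequences the XML file holds.  In BLAST+ XML output each query normally appears as one <Iteration> element, so counting those tags gives an estimate.  The sketch below assumes that layout, with each opening tag on its own line; the helper names count_queries and chunks_needed are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: estimate how many files a split will produce.
# ASSUMPTION: one <Iteration> element per query, each opening tag on its own line.

# Count the query sequences in a BLAST XML file.
count_queries() {
    # Matches only the opening tag, not </Iteration> or <Iteration_iter-num>
    grep -c '<Iteration>' "$1"
}

# Number of chunks needed for a given number of sequences per file.
chunks_needed() {
    local n
    n=$(count_queries "$1")
    # Integer ceiling of n / chunk size
    echo $(( (n + $2 - 1) / $2 ))
}
```

For example, chunks_needed blastp_results.xml 5000 reports how many files a split at 5000 sequences per file should produce.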

Another utility available with the HPCC installation of Blast2GO converts large text-formatted BLAST files into XML.  To use it:

 

module load blast2go
blast2xml blastp_results.txt

 

When prompted by the program, enter the number of sequences desired in each XML file.