Running mpiBLAST on the HPCC

Let's assume that we want to run our BLAST query against "nt" using the HPCC BLAST database repository. The first step will be to break the "nt" sequences up into 16 fragments. Remember, we need to write our fragments into our home, scratch, or research space, since we won't have permission to write in the HPCC BLAST data directory.
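
You can verify this before starting: listing the repository directory (the same path we pass to mpiformatdb below) should confirm that it is not writable by ordinary users:

mydir@dev-amd09:~/> ls -ld /mnt/research/common-data/Bio/blastdb/FASTA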

First, let's create a subdirectory in our home space for our work and load the mpiBLAST module:

mydir@dev-amd09:~/> mkdir myBLAST
mydir@dev-amd09:~/> module load mpiBLAST
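
To confirm that the module loaded, module list will show mpiBLAST among your currently loaded modules:

mydir@dev-amd09:~/> module list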

We should also create a space for temporary worker fragments:

mydir@dev-amd09:~/> mkdir myBLAST/tmp

Next, let's create our .ncbirc file inside our working directory (myBLAST/.ncbirc):

[NCBI]
Data=/opt/software/mpiBLAST/mpiblast-1.6.0/ncbi/data
[BLAST]
BLASTDB=/mnt/home/mydir/myBLAST
[mpiBLAST]
Shared=/mnt/home/mydir/myBLAST
Local=/mnt/home/mydir/myBLAST/tmp
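
Here, Data points at the NCBI data directory bundled with the mpiBLAST 1.6.0 install, BLASTDB and Shared point at our working directory, and Local points at our temporary fragment space. If you prefer to create the file from the command line, a heredoc is one way to do it (a sketch; substitute your own home directory for /mnt/home/mydir):

mydir@dev-amd09:~/> cat > myBLAST/.ncbirc <<'EOF'
[NCBI]
Data=/opt/software/mpiBLAST/mpiblast-1.6.0/ncbi/data
[BLAST]
BLASTDB=/mnt/home/mydir/myBLAST
[mpiBLAST]
Shared=/mnt/home/mydir/myBLAST
Local=/mnt/home/mydir/myBLAST/tmp
EOF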

Now let's break the "nt" FASTA sequences into fragments, using the HPCC database repository as the input:

mydir@dev-amd09:~/> cd myBLAST
mydir@dev-amd09:~/myBLAST> mpiformatdb -N 16 -i /mnt/research/common-data/Bio/blastdb/FASTA/nt.fa -o T -n /mnt/home/mydir/myBLAST/my_nt.fa

The command above will break the "nt" sequences (stored in /mnt/research/common-data/Bio/blastdb/FASTA/nt.fa) into 16 fragments and save the results to the "/mnt/home/mydir/myBLAST" directory with the file prefix "my_nt.fa".
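
You can list the working directory to confirm that the fragments were written. The exact file names and extensions vary with the database type, so treat this as a quick spot check:

mydir@dev-amd09:~/myBLAST> ls my_nt.fa.*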

To actually run an MPI job on the HPCC, you will need to compose your run into a job submission script. The following is a viable example:

#PBS -l nodes=18:ppn=1,walltime=36:00:00,mem=18gb
#PBS -q main
#PBS -N mpiBLAST-test
# Run from the directory the job was submitted from, so mpiBLAST can find .ncbirc and my_query.fa
cd ${PBS_O_WORKDIR}
module load mpiBLAST
mpiexec -n 18 mpiblast -p blastn -d my_nt.fa -i my_query.fa -o results.txt --use-parallel-write --time-profile=time.txt --removedb

In this example, we are using 18 nodes with 1 core per node and an allocation of 1 GB of memory per process (18 GB total). This is convenient because splitting our worker processes across different nodes lets us draw from a much wider pool of compute nodes - it is far easier to find a node with at least 1 free core and 1 GB of memory than one with 3-6 free cores and 3-6 GB of RAM.
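
Once the script is saved (as, say, myblast.qsub - the file name is our own choice here), submit it from the myBLAST directory and monitor it with the standard PBS commands:

mydir@dev-amd09:~/myBLAST> qsub myblast.qsub
mydir@dev-amd09:~/myBLAST> qstat -u $USER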

Now let's pay closer attention to the command line itself.  Here are a few interesting features:

  1. We are, of course, launching 18 MPI processes with "-n 18". Note that mpiBLAST dedicates two processes to scheduling and output writing, which leaves 16 workers - one per database fragment.
  2. The BLAST type we're performing is "blastn", using our "my_query.fa" file to query against the "nt" database we previously broke up into fragments. Results are written to results.txt.
  3. The "--use-parallel-write" flag enables parallel writing of results and is a directive to the mpiBLAST program.
  4. Similarly, the "--time-profile" flag records time-profile information to the file "time.txt".
  5. The "--removedb" flag can be very useful in conserving disk space, as it deletes the temporary fragments from our "myBLAST/tmp" directory when the job is complete.

There are several other flags available that can be passed to the mpiBLAST program.  See the mpiBLAST user documentation for more information.