Overview

It is possible to run BLAST or BLAST+ on the HPCC in multi-threaded mode.  This allows users to leverage multiple processors to complete their BLAST searches, thereby decreasing compute time.

Multi-Threading vs. MPI

Multi-threaded BLAST runs launch multiple worker threads on a single node.  Because standard BLAST and BLAST+ do not use distributed memory (as an MPI program would), their threads cannot span multiple nodes.  Users executing multi-threaded BLAST or BLAST+ runs should therefore reserve exactly one node (nodes=1); any additional nodes would be reserved but could not be used.

Job Submission Guidelines

First, we need to differentiate between traditional NCBI BLAST and BLAST+.  Traditional NCBI BLAST utilizes the "-a #" flag to specify the number of processors to use for the job (default is 1).  BLAST+ uses the "-num_threads #" flag to specify the number of worker threads to use.  Depending upon which type of BLAST you use, you will need to adjust your job submission script parameters accordingly.

Traditional BLAST

Using the "-a" flag in BLAST will specify the number of processors to use.  To reserve the appropriate resources in your job submission script, request a number of cores equal to the value specified by the "-a" flag.  For example, if you used a command like:

 

blastall -p blastp -d swissprot -i prot.fasta -o test1.blast -e 0.001 -a 4

 

You should specify something like the following in your job submission script:

 

#PBS -l nodes=1:ppn=4,walltime=4:00:00,mem=4gb
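Putting the two pieces together, a complete job script might look like the sketch below. It assumes the scheduler exports PBS_NUM_PPN inside a running job (TORQUE does this; other setups may not), and falls back to 4 to match the ppn value in the resource request. The check for blastall is only there so the script prints the command instead of failing when run outside the cluster environment.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=4,walltime=4:00:00,mem=4gb

# Cores reserved on the node. PBS_NUM_PPN is assumed to be exported by the
# scheduler inside a job; the fallback of 4 mirrors the ppn value above.
CORES="${PBS_NUM_PPN:-4}"

# Pass the same number to -a so BLAST uses exactly what was reserved.
BLAST_CMD="blastall -p blastp -d swissprot -i prot.fasta -o test1.blast -e 0.001 -a ${CORES}"

if command -v blastall >/dev/null 2>&1; then
    # Run from the directory the job was submitted from.
    cd "${PBS_O_WORKDIR:-.}" || exit 1
    ${BLAST_CMD}
else
    # Outside the cluster environment, just show what would be run.
    echo "blastall not found; would run: ${BLAST_CMD}"
fi
```

Deriving the "-a" value from the reservation means the two numbers cannot drift apart if you later change ppn.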

 

BLAST+

In contrast, BLAST+ uses the "-num_threads" flag to specify the number of worker threads to create.  In order to specify the correct number of cores for the job, you will need to ADD ONE to the number of threads specified.  This is to account for the number of worker threads, PLUS the main process thread.  So if you used an equivalent BLAST+ command like:

 

blastp -db swissprot -query prot.fasta -out test1.blast -evalue 0.001 -num_threads 4

 

You should use something like:

 

#PBS -l nodes=1:ppn=5,walltime=4:00:00,mem=4gb
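The "add one" rule can also be applied in reverse inside the job script: start from the cores reserved and subtract one to get the thread count. The sketch below assumes the scheduler exports PBS_NUM_PPN (falling back to 5 to match the request above) and uses blastp, since swissprot is a protein database. It only prints the command; replace the final echo with the command itself to run the search.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=5,walltime=4:00:00,mem=4gb

# Cores reserved on the node. PBS_NUM_PPN is assumed to be exported by the
# scheduler inside a job; the fallback of 5 mirrors the ppn value above.
PPN="${PBS_NUM_PPN:-5}"

# One core is left for the main process thread; the rest become workers.
NUM_THREADS=$(( PPN - 1 ))

# Never ask BLAST+ for fewer than one worker thread.
if [ "${NUM_THREADS}" -lt 1 ]; then
    NUM_THREADS=1
fi

BLAST_CMD="blastp -db swissprot -query prot.fasta -out test1.blast -evalue 0.001 -num_threads ${NUM_THREADS}"

echo "Reserved cores: ${PPN}; worker threads: ${NUM_THREADS}"
echo "Command: ${BLAST_CMD}"
```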

 

BLASTDB

The BLASTDB environment variable tells BLAST or BLAST+ where to find searchable databases.  On the HPCC, we offer select BLAST-ready data sets for this purpose in a common read-only area.  BLAST data sets can be accessed at:

 

/mnt/research/common-data/Bio/blastdb

 

If you need the FASTA sequence files rather than the BLAST-ready data sets, append FASTA to the path above:

 

/mnt/research/common-data/Bio/blastdb/FASTA

 

For cluster jobs, you will need to set the value of BLASTDB in your job submission script, for example:

 

export BLASTDB=/mnt/research/common-data/Bio/blastdb:/mnt/research/common-data/Bio/blastdb/FASTA:$BLASTDB
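If BLASTDB has not been set before the job starts, the export above leaves a dangling ":" at the end of the value. A slightly more defensive variant, sketched here, appends the previous value only when it is non-empty. The variable name HPCC_BLASTDB is introduced purely for illustration.

```shell
#!/bin/bash
# Shared read-only location given on this page.
# HPCC_BLASTDB is a helper name used only in this example.
HPCC_BLASTDB=/mnt/research/common-data/Bio/blastdb

# ${BLASTDB:+:${BLASTDB}} expands to ":<old value>" only when BLASTDB is
# already set and non-empty, so no trailing colon is left behind otherwise.
export BLASTDB="${HPCC_BLASTDB}:${HPCC_BLASTDB}/FASTA${BLASTDB:+:${BLASTDB}}"

echo "BLASTDB=${BLASTDB}"
```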

 

A Word About Memory

In either case (BLAST or BLAST+), your requested memory (4gb in the examples above) is the total for the job and will be divided among all of your threads.  Plan accordingly.
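As a rough planning aid, the arithmetic for the examples above can be sketched as follows. It treats 4gb as 4096 MB and assumes an even split across 4 threads; real usage per thread will vary, so take the result as an average rather than a guarantee.

```shell
#!/bin/bash
# Total memory requested for the job (mem=4gb), expressed in MB.
MEM_MB=4096

# Number of threads sharing that memory (the -a or -num_threads value).
THREADS=4

# Average share per thread, using integer division.
PER_THREAD_MB=$(( MEM_MB / THREADS ))

echo "${PER_THREAD_MB} MB per thread on average"
```

If that figure looks too small for your database and query size, raise the mem request or lower the thread count.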