It is possible to run BLAST or BLAST+ on the HPCC in multi-threaded mode. This is advantageous in that it allows users to leverage multiple processors to complete their BLAST searches, thereby decreasing compute time.
Multi-Threading vs. MPI
Multi-threaded BLAST runs enable the user to launch multiple worker threads on a single node. However, because standard BLAST and BLAST+ do not use distributed memory, you cannot accomplish multi-threaded runs across multiple nodes. Therefore, users executing multi-threaded BLAST or BLAST+ runs should not reserve more than one node (nodes=1), as this will reserve hardware resources that cannot be used.
Job Submission Guidelines
First, we need to differentiate between traditional NCBI BLAST and BLAST+. Traditional NCBI BLAST utilizes the "-a #" flag to specify the number of processors to use for the job (default is 1). BLAST+ uses the "-num_threads #" flag to specify the number of worker threads to use. Depending upon which type of BLAST you use, you will need to adjust your job submission script parameters accordingly.
Using the "-a" flag in BLAST will specify the number of processors to use. To reserve the appropriate resources in your job submission script, you will need to request a number of cores equal to the value given to the "-a" flag. For example, if you used a command like:
You should specify something like the following in your job submission script:
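As a minimal sketch of such a pairing, assuming a Torque/PBS-style scheduler (suggested by the nodes=1 syntax above) and hypothetical file and database names (query.fasta, nt, results.txt), a legacy BLAST job using four processors might look like this:

```shell
#!/bin/bash
# Assumed Torque/PBS directive: one node, four cores (equal to -a 4), 4gb memory.
#PBS -l nodes=1:ppn=4,mem=4gb

# Hypothetical names, for illustration only.
THREADS=4            # keep this equal to ppn in the directive above
QUERY=query.fasta
DB=nt
OUT=results.txt

# Run legacy BLAST only if blastall is on the PATH.
if command -v blastall >/dev/null 2>&1; then
    blastall -p blastn -d "$DB" -i "$QUERY" -o "$OUT" -a "$THREADS"
else
    echo "blastall not found on PATH" >&2
fi
```

Keeping the thread count in one variable makes it easier to see that the "-a" value and the requested core count agree.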
In contrast, BLAST+ uses the "-num_threads" flag to specify the number of worker threads to create. In order to specify the correct number of cores for the job, you will need to ADD ONE to the number of threads specified. This is to account for the number of worker threads, PLUS the main process thread. So if you used an equivalent BLAST+ command like:
You should use something like:
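A minimal sketch of the BLAST+ case, again assuming a Torque/PBS-style scheduler and hypothetical file and database names: four worker threads plus the main process thread means requesting five cores.

```shell
#!/bin/bash
# Assumed Torque/PBS directive: one node, five cores (4 worker threads + 1), 4gb memory.
#PBS -l nodes=1:ppn=5,mem=4gb

# Hypothetical names, for illustration only.
THREADS=4                  # value passed to -num_threads
CORES=$((THREADS + 1))     # worker threads plus the main process thread
QUERY=query.fasta
DB=nt
OUT=results.txt

echo "Using ${THREADS} worker threads; ppn should be ${CORES}"

# Run BLAST+ only if blastn is on the PATH.
if command -v blastn >/dev/null 2>&1; then
    blastn -db "$DB" -query "$QUERY" -out "$OUT" -num_threads "$THREADS"
else
    echo "blastn not found on PATH" >&2
fi
```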
The BLASTDB environment variable tells BLAST or BLAST+ where to find the databases that can be searched. On the HPCC, we offer select BLAST-ready data sets for this purpose in a common read-only area. BLAST data sets can be accessed at:
If you are using the FASTA sequences instead of nucleotide data sets, you need to augment the path above as follows:
For cluster jobs, you will need to set the value of BLASTDB in your job submission script, for example:
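A minimal sketch of that step is shown here. The path is a placeholder, not the actual HPCC location; substitute the data set directory given above.

```shell
# Placeholder path (hypothetical): replace with the real data set directory.
export BLASTDB=/path/to/blast/databases

# Confirm the value the BLAST programs will see.
echo "BLASTDB is set to: $BLASTDB"
```

Place the export line before the BLAST command in the job script so that the variable is defined when the search starts.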
A Word About Memory
In either case (BLAST or BLAST+) your requested memory (in the examples above, 4gb) will be divided amongst all of your task threads. Plan accordingly.
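As a rough worked example, assuming the 4gb request is counted as 4096 MB and a hypothetical four-thread run, the average share per thread can be estimated with shell arithmetic. Threads draw on one shared allocation, so this is only a planning figure, not a per-thread limit.

```shell
# Illustrative values: 4gb requested, four threads.
MEM_MB=4096
THREADS=4

# Integer division gives the average memory available per thread.
PER_THREAD_MB=$((MEM_MB / THREADS))
echo "Approximate memory per thread: ${PER_THREAD_MB} MB"
```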