Velvet is a popular sequence assembler for very short reads and is frequently used together with Oases for de novo transcriptome assembly. The HPCC offers both applications as part of its "customer" repository, and both can be loaded using the modules system.
When used together, Velvet and Oases must be matched by version/release. They must also be built with the same values of MAXKMERLENGTH (the maximum hash length) and CATEGORIES (the number of short read categories). Loading the current Oases module (module load oases) ensures that compatible versions of both Velvet and Oases are loaded. Presently, the following were used in all HPCC builds of Velvet and Oases:
The following tool may be helpful in estimating the memory requirements of your Velvet run:
Make sure you substitute in the correct value for "k" based on the MAXKMERLENGTH used in your Velvet build (the HPCC default build value is provided in the previous section).
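As a rough illustration of this kind of estimate, the sketch below implements an empirical formula that has circulated on the velvet-users mailing list. Whether it matches the tool referenced above is an assumption, and the function name and sample inputs are hypothetical. Treat the result as an order-of-magnitude guide only.

```shell
#!/bin/bash
# Rough velvetg memory estimate in kB (an empirical rule of thumb, not an HPCC tool).
# Arguments (whole numbers; all values used below are illustrative):
#   $1 read length in bases
#   $2 genome size in megabases
#   $3 number of reads in millions
#   $4 k, chosen as described above
estimate_velvetg_kb() {
    local read_size=$1 genome_mb=$2 reads_m=$3 k=$4
    echo $(( -109635 + 18977 * read_size + 86326 * genome_mb + 233353 * reads_m - 51092 * k ))
}

# Hypothetical inputs: 36 bp reads, 5 Mb genome, 10 million reads, k=31.
kb=$(estimate_velvetg_kb 36 5 10 31)
echo "Estimated velvetg memory: ${kb} kB (roughly $(( kb / 1024 / 1024 + 1 )) GB when rounded up)"
```

The function uses integer arithmetic only, so fractional genome sizes or read counts would need to be rounded before being passed in.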
Velvet versions later than 1.1.x provide multithreading support via the OpenMP (OMP) library to facilitate parallel computation. Many users confuse OMP with OpenMPI, but the two are not the same thing. Failure to understand the difference can result (and has resulted) in all sorts of problems for Velvet users, not the least of which are node over-utilization and the reservation of resources that are never actually used.
Simply stated, OMP applications like Velvet run their threads in a shared-memory environment, whereas OpenMPI applications can use distributed (or shared) memory. In practice, this means that a single Velvet instance can only run effectively on a single node: all threads execute on the same node and cannot be spread across several machines.
The threading of velvetg can be controlled by setting the environment variable OMP_NUM_THREADS in your job submission script:
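A minimal sketch of such a job script fragment is shown here. The thread count and the assembly directory name are placeholders, not HPCC defaults, and the guard around velvetg is only there so the fragment fails gracefully when the module has not been loaded.

```shell
#!/bin/bash
# Hypothetical job script fragment; adjust the values for your own run.
export OMP_NUM_THREADS=7        # illustrative thread count, read by velvetg at startup

ASSEMBLY_DIR="velvet_out"       # hypothetical directory previously created by velveth

# Run velvetg only if it is on the PATH (for example after "module load oases").
if command -v velvetg >/dev/null 2>&1; then
    velvetg "$ASSEMBLY_DIR"
else
    echo "velvetg not found; load the module first" >&2
fi
```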
When threading Velvet, you must request enough cores to cover the number of threads assigned, PLUS 1. For example, if you specify OMP_NUM_THREADS=7, you would request 8 cores on a single node.
Requesting too few cores to accommodate the number of process threads will cause Velvet to send the additional threads to other processors on the compute node. If those cores are being used by other users' processes, the performance of those jobs will be slowed significantly.
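The threads-plus-one rule can be sketched as follows. The directive assumes Torque/PBS-style syntax (suggested by the "ppn" terminology used on this page), and the helper function name is hypothetical.

```shell
#!/bin/bash
# Assumed Torque/PBS directive: 7 velvetg threads + 1 = 8 cores, all on one node.
#PBS -l nodes=1:ppn=8

# Return the number of cores to request for a given number of velvetg threads.
cores_for_threads() {
    echo $(( $1 + 1 ))
}

export OMP_NUM_THREADS=7
echo "Request ppn=$(cores_for_threads "$OMP_NUM_THREADS") for OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Keeping the thread count and the core request in the same script makes it easier to notice when one is changed without the other.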
Contrary to discussions on some of the bioinformatics boards, OASES IS NOT CURRENTLY MULTI-THREADED!
Notes on velveth
The environment variable "OMP_NUM_THREADS" does not influence the behavior of "velveth", only "velvetg". The "velveth" stage can still generate some multithreading activity depending upon data size, and the amount of threading that results can be difficult to predict (generally 2 to 3 ppn). Even though the velveth stage is short compared to velvetg, be aware that it can trigger some utilization concerns.
Oases and Memory
OASES REQUIRES MASSIVE AMOUNTS OF MEMORY!
To date, we have not observed successful Oases runs on the HPCC on nodes other than the amd09 (fat node) cluster. Although we cannot say precisely how much memory your Oases run will take, anecdotal evidence from the HPCC and the Oases discussion LISTSERV indicates that typical requirements fall in the 100GB to 250GB range. Please plan accordingly.
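A memory-aware resource request might look like the following sketch. It assumes Torque/PBS-style syntax, the 250 GB figure is simply the upper end of the range quoted above, and the helper and script names are hypothetical. Any node property needed to target the amd09 nodes is omitted because it is not given here.

```shell
#!/bin/bash
# Build a Torque/PBS-style resource string for a single-threaded, high-memory job.
# Assumption: the scheduler accepts "nodes=1:ppn=1,mem=<N>gb"; check the local HPCC docs.
oases_resources() {
    local mem_gb=$1
    echo "nodes=1:ppn=1,mem=${mem_gb}gb"
}

# Oases is not multi-threaded, so one core is requested; memory is the main constraint.
resources=$(oases_resources 250)
echo "qsub -l ${resources} oases_job.sh"   # oases_job.sh is a hypothetical script name
```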