

Overview

Velvet is a popular sequence assembler for very short reads, frequently used in conjunction with Oases for de novo transcriptome assembly.  The HPCC offers both applications as part of its "customer" repository.  Both can be loaded using the modules system.

When used in conjunction with one another, Velvet and Oases must be appropriately matched by version/release, and they must also be built with the same values of MAXKMERLENGTH (the maximum hash length) and CATEGORIES (the number of short-read categories).  By loading the current Oases module (module load oases), users can be assured of loading compatible versions of both Velvet and Oases.  The following values were used in all HPCC builds of Velvet and Oases:


MAXKMERLENGTH=70
CATEGORIES=3
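
In practice, these build values mean that any hash length (k) passed to velveth must be at most 70, and reads can be spread across at most three short-read categories (-short, -short2, -short3).  A hypothetical velveth invocation within those limits (the directory and file names are illustrative, not prescribed):

# k = 63 stays under the MAXKMERLENGTH=70 build limit;
# -short and -short2 use two of the three available CATEGORIES
velveth assembly_dir 63 -fastq -short lane1.fastq -short2 lane2.fastq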


Estimating Memory

The following tool may be helpful in estimating memory requirements for your Velvet run:

Make sure you substitute in the correct value for "k" based on the MAXKMERLENGTH used in your Velvet build (the HPCC default build value is provided in the previous section).
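
If the tool is unavailable, a widely circulated rule of thumb from the velvet-users mailing list (Simon Gladman's estimate) can provide a first-pass figure.  This is an approximation only, not the tool referenced above, and the input values below are illustrative assumptions:

# Rough velvetg RAM estimate (Gladman's rule of thumb; result in KB, printed in GB)
# rs = read length (bases), gs = genome size (Mb),
# nr = number of reads (millions), k = hash length
awk -v rs=100 -v gs=50 -v nr=40 -v k=31 \
    'BEGIN { kb = -109635 + 18977*rs + 86326*gs + 233353*nr - 51092*k;
             printf "Estimated RAM: %.1f GB\n", kb / 1024 / 1024 }'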

Multithreading

Versions of Velvet > 1.1.x are designed to provide multithreading support via the OpenMP (OMP) library to facilitate parallel computation.  Many users confuse OMP with OpenMPI, which are not the same thing.  Failure to understand the difference can (and does) result in all sorts of problems for Velvet users, not least node over-utilization and the reservation of resources that are never actually used.

Simply stated, OMP applications like Velvet thread processes in a shared-memory environment, whereas OpenMPI applications can utilize distributed (or shared) memory.  In practice, this means that a single Velvet instance can only run effectively on a single node: all threads are executed on the same node and cannot be spread across several machines.


Thus, reserving any more than one (1) node for a Velvet run is futile and unnecessarily removes valuable resources from the shared pool.

The threading of velvetg can be controlled by setting the environment variable OMP_NUM_THREADS in your job submission script:


export OMP_NUM_THREADS=7


When threading Velvet, request enough cores to cover the number of threads assigned, PLUS one.  For example, if you specify OMP_NUM_THREADS=7, your core request would be:


nodes=1:ppn=8
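
Putting these pieces together, a minimal PBS submission sketch for a threaded velvetg run follows.  The memory, walltime, and directory names are illustrative assumptions; adjust them for your data:

#!/bin/bash
#PBS -N velvetg_run
#PBS -l nodes=1:ppn=8,mem=24gb,walltime=04:00:00

cd $PBS_O_WORKDIR

module load oases            # loads matched Velvet/Oases builds

export OMP_NUM_THREADS=7     # 7 threads + 1 = the 8 cores requested above

# assumes velveth has already populated assembly_dir
velvetg assembly_dir -read_trkg yes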


Requesting too few cores to accommodate the number of process threads will cause Velvet to send additional threads to other processors on the compute node.  If those cores are being used by other users' processes, the performance of those jobs will be slowed significantly.


Reserving an insufficient number of cores to accommodate your Velvet threads will result in node over-utilization, and you run a significant risk of having your job killed.

Contrary to discussions on some of the bioinformatics boards, OASES IS NOT CURRENTLY MULTI-THREADED!

Notes on velveth

The environment variable "OMP_NUM_THREADS" does not influence the behavior of "velveth" - only "velvetg".  The "velveth" stage can generate some multithreading activity depending upon data size, and the amount of threading that results can be difficult to predict (generally between 2-3 ppn).  Even though the velveth stage is short compared to velvetg, be aware that it can trigger utilization concerns.


Even if you intend to run a single-threaded "velvetg" stage, it is recommended that you request at least 3 ppn to cover the velveth stage of the run and mitigate any "over-utilization" issues.
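
For example, a resource request for a run where velvetg itself is single-threaded might look like this (the hash length and file names are illustrative):

#PBS -l nodes=1:ppn=3        # covers velveth's unpredictable 2-3 threads

export OMP_NUM_THREADS=1     # velvetg runs single-threaded

velveth assembly_dir 31 -fastq -short reads.fastq
velvetg assembly_dir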

Oases and Memory

Plainly stated:  

OASES REQUIRES MASSIVE AMOUNTS OF MEMORY!


To date, we have not observed successful Oases runs on the HPCC on nodes other than the amd09 (fat node) cluster.  Although we cannot say precisely how much memory your Oases run will require, anecdotal evidence from the HPCC and the Oases discussion LISTSERV indicates that typical requirements fall in the 100GB to 250GB range.  Please plan accordingly.
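
A resource request in line with those anecdotal figures might look like the following sketch.  The memory value and insert length are assumptions, and you should direct the job to the amd09 nodes per your site's queue conventions:

#PBS -l nodes=1:ppn=4,mem=250gb   # toward the high end of reported Oases usage

module load oases
oases assembly_dir -ins_length 200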