Overview

Mothur is an open-source bioinformatics toolkit aimed primarily at the needs of the microbial ecology community.  The Mothur application can be built either to provide single-node multi-threaded functionality for certain tools, or to provide MPI-capable functionality via OpenMPI, BUT NOT BOTH.  This tutorial describes the differences between the two builds, how to use each one, and the potential advantages and costs of each approach.

Mothur must be built explicitly to link against the OpenMPI libraries. If Mothur is not built in this way, it will not be MPI-capable, but WILL be able to use multiple processors on a single node.
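
One way to check which kind of build a given module provides is to look at the shared libraries the executable links against. The following is a minimal sketch, assuming mothur is dynamically linked and that the OpenMPI library shows up as "libmpi" in ldd output; the function names are illustrative and are not part of Mothur.

```shell
#!/bin/sh
# Sketch: classify a mothur build from `ldd` output.
# Assumption: an MPI build is dynamically linked against a library whose
# name contains "libmpi" (the usual OpenMPI library name).

# Reads ldd-style text on stdin; prints "mpi" or "non-mpi".
classify_build() {
    if grep -q 'libmpi'; then
        echo "mpi"
    else
        echo "non-mpi"
    fi
}

# Inspect whichever mothur is currently on PATH (e.g. after `module load`).
check_loaded_mothur() {
    exe=$(command -v mothur) || { echo "mothur not found on PATH" >&2; return 1; }
    ldd "$exe" | classify_build
}
```

After loading a module, calling check_loaded_mothur would report which launch style that build expects. A statically linked build would be reported as "non-mpi" regardless, so treat the result as a hint rather than proof.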

MPI Run Characteristics

As discussed in other Mothur tutorials, MPI runs of Mothur are started with a command like the following, where 8 is the number of MPI processes:

 

mpirun -np 8 mothur <mybatch.file>

 

If you omit the "mpirun -np #" prefix on an MPI build of Mothur, setting the "processors=#" option on an eligible command WILL NOT PROVIDE MULTI-PROCESSOR PERFORMANCE.  That is to say, an MPI build of Mothur does not fall back to single-node multiprocessor mode when it is launched without the "mpirun -np #" prefix.

For example, suppose you load the default MPI build of Mothur and attempt to run the "cluster.split" function using multiple processors (4 in the following example):

 

module load mothur
mothur
 
mothur v.1.31.2
Last updated: 6/13/2013
by
Patrick D. Schloss
Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org
When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.
Distributed under the GNU General Public License
Type 'help()' for information on the commands that are available
Type 'quit()' to exit program
 
mothur >cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta,count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table,taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=4)

 

In this case, Mothur will use only 1 CPU, not 4, because the MPI build does not provide multi-threaded single-node functionality outside of MPI.

To run the above example with 4 CPUs, you must launch mothur under MPI by changing the initial launch line as follows:

 

module load mothur
mpirun -np 4 mothur

 

Although the examples above illustrate interactive job runs, this also applies when running scheduled jobs with job scripts on the HPCC cluster.
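
For a scheduled run, the same rule means the job script must call mothur through mpirun, with a process count that matches the cores requested. The sketch below generates such a script. It assumes a PBS/Torque-style scheduler: the "#PBS" directives, the PBS_O_WORKDIR variable, and the one-hour walltime are assumptions, so substitute whatever your scheduler expects. The helper name make_mothur_job is hypothetical.

```shell
#!/bin/sh
# Sketch: emit a job script that runs an MPI build of mothur on a batch file.
# Assumptions: PBS/Torque-style directives; all cores on one node for simplicity.
# $1 = number of MPI processes, $2 = mothur batch file
make_mothur_job() {
    np=$1
    batch=$2
    cat <<EOF
#!/bin/sh
#PBS -l nodes=1:ppn=${np}
#PBS -l walltime=01:00:00
cd "\$PBS_O_WORKDIR"
module load mothur
# Without the mpirun prefix, the MPI build would run on a single CPU.
mpirun -np ${np} mothur ${batch}
EOF
}

# Print a 4-process job script for the batch file mybatch.file.
make_mothur_job 4 mybatch.file
```

Redirect the output to a file and submit it with your scheduler's submit command. As in the interactive example, the commands inside the batch file keep their "processors=4" option.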

The following Mothur installs on the HPCC have been designed to accommodate MPI-capable runs:

  • 1.31.2 (default)
  • 1.30.1
  • 1.29.0
  • 1.27.0
  • 1.26.0
  • 1.25.0
  • 1.24.0, 1.23.1, 1.21.1, 1.20.3, 1.18.1

Multi-Processor, Non-MPI Runs

A separate build of the most recent version of Mothur (as of this writing) has been installed for non-MPI, multi-threaded runs only:

 

module load mothur/1.31.2b

 

To run Mothur on a single node with multiple processors, you must load the version above; the MPI builds will not work correctly for this purpose.
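
A single-node multi-processor run with this build might look like the sketch below. It writes a one-command batch file and passes it to mothur without any mpirun prefix. The input file names (example.fasta, example.count_table, example.taxonomy) and the batch file name are placeholders, not files from this tutorial.

```shell
#!/bin/sh
# Sketch: single-node, multi-processor run with the non-MPI build.
# File names below are placeholders; substitute your own inputs.
NPROCS=4

# Write a batch file whose processors= value comes from NPROCS.
cat > cluster_split.batch <<EOF
cluster.split(fasta=example.fasta, count=example.count_table, taxonomy=example.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=${NPROCS})
EOF

# Load the non-MPI build where the module system is available.
if command -v module >/dev/null 2>&1; then
    module load mothur/1.31.2b
fi

# Run only if mothur is actually on PATH.
if command -v mothur >/dev/null 2>&1; then
    mothur cluster_split.batch    # no mpirun prefix for the non-MPI build
else
    echo "mothur not on PATH; batch file written to cluster_split.batch"
fi
```

Keeping the processor count in one variable makes it easy to match the value to the number of cores requested from the scheduler.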

MPI versus Non-MPI Multiple CPU Runs

The primary advantages of MPI-capable runs over multi-processor single-node runs may be summarized as follows:

  • More processors can be dedicated to the task than are available on a single node
  • Jobs may be easier to schedule when a few processors are spread across several nodes than when they must occupy most or all of a single node (potentially shorter queue wait time)

The primary disadvantages of MPI-capable Mothur runs may be summarized as:

  • Slower on single node runs
  • Higher overhead and less memory efficient
  • The additional processor advantages offered by MPI may be cancelled out by I/O waits to disk

Preliminary run testing is recommended to determine which approach is best for your data set and selected analyses.  Remember, not all of the functions in the Mothur tool set are multiprocessor/MPI capable.  Make sure to evaluate the functions you are using to determine if multiprocessor capabilities even apply to any portion of your run.  See the next section for more information.
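
A simple way to do this preliminary testing is to time the same batch file under each build. The sketch below is one possible harness: time_cmd is a hypothetical helper, the batch file name test.batch is a placeholder, and whole-second wall-clock timing is assumed to be precise enough for runs that take minutes or longer.

```shell
#!/bin/sh
# Sketch: wall-clock timing of a command, in whole seconds.
# $1 = label, remaining arguments = the command to run
time_cmd() {
    label=$1
    shift
    start=$(date +%s)
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    end=$(date +%s)
    echo "$label: $((end - start)) s (exit status $status)"
}

# Intended use, one build per session (placeholder batch file name):
#   module load mothur
#   time_cmd "MPI, 4 procs" mpirun -np 4 mothur test.batch
#
#   module load mothur/1.31.2b
#   time_cmd "non-MPI, 4 procs" mothur test.batch
```

Running each case on a representative subset of your data, on the same type of node, gives a fairer comparison than a single timing of the full data set.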

Multi-Processor Capable Mothur Functions

The following table lists the multi-processor capable mothur tools and indicates which of them are also MPI-capable (as of this writing).

Tool                  Multi-Processor Capable   MPI-capable
align.seqs            yes                       yes
chimera.bellerophon   yes                       yes
chimera.ccode         yes                       yes
chimera.check         yes                       yes
chimera.pintail       yes                       yes
chimera.slayer        yes                       yes
chimera.uchime        yes                       yes
classify.seqs         yes                       yes
cluster.split         yes                       yes
dist.seqs             yes                       yes
filter.seqs           yes                       yes
indicator             yes                       no
dist.shared           yes                       no
metastats             yes                       no
pairwise.seqs         yes                       yes
parsimony             yes                       no
phylo.diversity       yes                       no
rarefaction.single    yes                       no
screen.seqs           yes                       yes
summary.seqs          yes                       yes
summary.shared        yes                       no
trim.seqs             yes                       no
unifrac.unweighted    yes                       no
unifrac.weighted      yes                       no
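
To check a planned run against the table above, the entries can be encoded as a small lookup. This is only a sketch: parallel_support is a hypothetical helper, it reflects the table as of this writing, and "none" means only that the command is not listed in the table.

```shell
#!/bin/sh
# Sketch: look up the parallel support listed in the table for a mothur command.
# Prints "mpi"   (multi-processor capable and MPI-capable),
#        "multi" (multi-processor capable only), or
#        "none"  (not listed in the table).
parallel_support() {
    case "$1" in
        align.seqs|chimera.bellerophon|chimera.ccode|chimera.check|\
        chimera.pintail|chimera.slayer|chimera.uchime|classify.seqs|\
        cluster.split|dist.seqs|filter.seqs|pairwise.seqs|screen.seqs|\
        summary.seqs)
            echo "mpi" ;;
        indicator|dist.shared|metastats|parsimony|phylo.diversity|\
        rarefaction.single|summary.shared|trim.seqs|unifrac.unweighted|\
        unifrac.weighted)
            echo "multi" ;;
        *)
            echo "none" ;;
    esac
}

# Example: report support for a few commands used in a workflow.
for cmd in align.seqs trim.seqs unique.seqs; do
    echo "$cmd: $(parallel_support "$cmd")"
done
```

If most of the commands in a workflow come back as "multi" or "none", the MPI build is unlikely to help, and the non-MPI build on a single node is the more natural choice.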