Overview

MAKER is a gene annotation pipeline for both eukaryotic and prokaryotic genome projects.  The HPCC build of MAKER leverages the following external applications: NCBI BLAST, Augustus, TRF, RepeatMasker, Exonerate, tRNAscan, SNAP and GeneMark-ES.  RepeatMasker in the HPCC build uses RMBlast as its default search application, and the full version of the RepeatMasker libraries has been installed from GIRI (Repbase).

The HPCC build of MAKER has both MPI (distributed memory) and non-MPI multi-threaded (shared memory, single node) capabilities.

Several of the external applications required for the MAKER 2.31 pipeline are self-contained within the MAKER build.  A few others must be downloaded, built and configured separately.  The table below presents the versions and source of each external application used by the current MAKER build:

Application    Version      Installation Type    Comments
augustus       3.0.2        internal
Bioperl        1.6.901      external
BLAST+         2.2.27       internal             blastn, blastx, tblastx, makeblastdb
BLAST          2.2.25       external             formatdb, blastall
Exonerate      2.4.7        internal
GeneMark-ES    2.3e         external
MPICH2         1.4.1p1      external
RepeatMasker   4.0.5        internal             includes full RepBase libraries for RepeatMasker
RMBlast        2.2.28       external             default search for RepeatMasker
snap           2006-07-28   internal
trf            4.04         external
tRNAscan-SE    1.3.1        external

Running MAKER ( >= 2.31)

MAKER versions >= 2.31 were built with MPICH2 to provide full MPI-capable processing on one or more nodes.  MAKER versions 2.28 and 2.10 are also available and have supported MPI since November 2013.  To use the latest MPI-capable MAKER:

 

module swap OpenMPI MPICH2
module load MAKER

 

Loading the module above will load the necessary prerequisites and set all of the required environment variables and paths.  MAKER runs take place inside the data directory and are guided by a set of control files, which must be generated before the main maker command is executed.

Generating Control Files

Before you can begin running MAKER, you will need to generate run-specific control (CTL) files.  In this tutorial, we are going to load the MPI-capable build of MAKER, generate the control files, and then run MAKER on the example data provided in the MAKER source package.  To facilitate this, we are going to copy the example data into a subdirectory of our scratch space:

 

module swap OpenMPI MPICH2
module load MAKER
cd /mnt/scratch/myUid
cp -R /opt/software/MAKER/2.31--GCC-4.4.5/data .

 

You should now have a directory called:

 

/mnt/scratch/myUid/data

 

Containing the following files:

 

dpp_contig.fasta
dpp_est.fasta
dpp_protein.fasta
hsap_contig.fasta
hsap_est.fasta
hsap_protein.fasta
te_proteins.fasta

 

Now let's generate the CTL files:

 

maker -CTL

 

We should now have the following three (3) control files:

  • maker_exe.ctl - contains the path information for the underlying executables.
  • maker_bopts.ctl - contains the filtering thresholds for BLAST and Exonerate.
  • maker_opts.ctl - contains all other information for MAKER, including the location of the input genome file.

It should NOT be necessary to change the contents of the first file (maker_exe.ctl) - all entries should be pre-populated with the correct executable paths.  Most users would likely only need to concentrate on making changes to the BLAST/Exonerate and MAKER run options.
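Before editing anything, it can be worth confirming that all three control files were actually written. The following is a minimal sketch; the function name check_ctl_files is hypothetical (it is not part of MAKER), and it assumes the file names MAKER 2.31 writes, namely maker_exe.ctl, maker_bopts.ctl and maker_opts.ctl.

```shell
# Hypothetical helper (not part of MAKER): verify that the three control
# files exist in a run directory. Defaults to the current directory.
check_ctl_files() {
    dir=${1:-.}
    missing=0
    for f in maker_exe.ctl maker_bopts.ctl maker_opts.ctl; do
        if [ ! -f "$dir/$f" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when all files are present, 1 otherwise.
    return "$missing"
}
```

Running check_ctl_files /mnt/scratch/myUid/data after "maker -CTL" should return silently with status 0; any file it lists as missing means the generation step needs to be repeated in that directory.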

For our example, we are going to change the "maker_opts.ctl" file to tell MAKER where to find our input files.  Open the file in your favorite text editor (in this example, we'll use nano) and change the following lines as shown:

 

genome=dpp_contig.fasta
est=dpp_est.fasta
protein=dpp_protein.fasta
est2genome=1

 

Save the file.
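If you prefer to script these changes rather than use an editor, something along the following lines may work. This is only a sketch: set_ctl_option is a hypothetical helper, and it assumes each option sits on its own line as "key=value #comment", which is how the generated control files are laid out.

```shell
# Hypothetical helper (not part of MAKER): set KEY=VALUE in a control file
# while keeping any trailing "#comment" on that line.
# Assumption: VALUE contains no "|" or "&" characters, since both are
# special in the sed expression below.
set_ctl_option() {
    ctl_file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Replace everything between "key=" and the first "#" (or end of line),
    # then write the result back without changing the file's permissions.
    sed "s|^${key}=[^#]*|${key}=${value} |" "$ctl_file" > "$tmp" &&
        cat "$tmp" > "$ctl_file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For instance, calling set_ctl_option maker_opts.ctl genome dpp_contig.fasta, and likewise for est, protein and est2genome, would apply the four changes listed above. The pattern is anchored on "key=", so setting est does not touch the est2genome line.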

Running as a Cluster Job

Now we're going to launch the job on the HPCC cluster using the application's MPI capabilities.  For this example, let's use the following in a job script called "makerTest.sh":

 

#!/bin/bash --login
#PBS -N MAKER_test
#PBS -m abe
#PBS -q main
#PBS -l nodes=2:ppn=2,mem=20gb,walltime=01:00:00
module swap OpenMPI MPICH2
module load MAKER
cd /mnt/scratch/myUid/data
 
mpiexec -n 4 maker
 
qstat -f ${PBS_JOBID}

 

In this example, we use 2 nodes with 2 cores each and a total memory allocation of 20GB for the whole job (about 5GB per process on average).  Your data set may need different resources; these values are for illustration only.  The maker command is launched with "mpiexec", and the flag "-n 4" tells MPICH2 to start 4 MAKER processes, one for each requested core.
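The relationship between the resource request and the mpiexec flag can be worked out as below. This is a sketch of the arithmetic only; the variable names are illustrative, and the values are the ones from the example job script.

```shell
# Values taken from the example "#PBS -l" line: nodes=2:ppn=2,mem=20gb
nodes=2
ppn=2
mem_gb=20

# One MPI process per requested core.
ranks=$((nodes * ppn))

# Average memory available to each process (integer division).
mem_per_rank=$((mem_gb / ranks))

echo "mpiexec -n ${ranks} maker   # about ${mem_per_rank}GB per process"
```

If you change nodes or ppn in the job script, the value passed to "-n" should change with it, otherwise some requested cores sit idle or processes end up sharing cores.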

Now launch the job:

 

qsub makerTest.sh

 

You MUST launch MAKER from the directory containing the CTL and data input files.

This run should finish quickly.  When it completes, your run directory should contain a new subdirectory called "dpp_contig.maker.output".  This will contain your results.

Older Versions of MAKER (2.10)

If using MAKER 2.10 on the HPCC, there is a separate executable file for MPI runs.  The process for running MAKER is nearly identical to the instructions provided above, except that the job script launches the mpi_maker executable instead of maker:

 

mpiexec -n 4 mpi_maker

 

Output Summary

Inside the "dpp_contig.maker.output" subdirectory, you should see the following:

  • The maker_opts.log, maker_exe.log, and maker_bopts.log files are logs of the control files used for this run of MAKER.
  • The mpi_blastdb directory contains FASTA indexes and BLAST database files created from the input EST, protein, and repeat databases.
  • The dpp_contig_master_datastore_index.log contains information on both the run status of individual contigs and information on where individual contig data is stored.
  • The dpp_contig_datastore directory contains a set of subfolders, each containing the final MAKER output for individual contigs from the genomic fasta file.
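To get a quick view of how many contigs finished, the master datastore index can be summarized with a short script. The sketch below rests on an assumption about the file layout: tab-separated lines of contig name, output path and status, with a contig typically logged more than once as its status changes (for example STARTED, then FINISHED). The function name summarize_index is hypothetical; check a few lines of your own index file before relying on it.

```shell
# Hypothetical helper (not part of MAKER): count contigs by their most
# recent status in a master datastore index.
# Assumed line format: <contig name> TAB <output path> TAB <status>
summarize_index() {
    awk -F '\t' '
        NF >= 3 { last[$1] = $3 }            # keep the latest status per contig
        END {
            for (c in last) count[last[c]]++  # tally contigs per status
            for (s in count) print s, count[s]
        }' "$1" | sort
}
```

Calling summarize_index dpp_contig_master_datastore_index.log from inside the output directory would print one line per status with the number of contigs currently in it.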

You should also see two (2) other files containing standard program output.  Let's assume our job number was: "234567". These two output files would be named:

  • makerTest.sh.e234567 - browse this to see details of the MAKER run which would normally stream to the screen.
  • makerTest.sh.o234567 - browse to see the output of the "qstat -f" command at the end of the job script above, or to view errors (if any).