


SnpEff is a toolkit for annotating genetic variants and predicting their effects.  GATK (variant calling) supports SnpEff, and the two are often used in combination (see the GATK tutorial for more information).

There are a few important points to keep in mind when using SnpEff on the HPCC.  This tutorial covers the basics of getting started.

SnpEff Configuration File

The SnpEff configuration file specifies key run parameters, most importantly the location of the databases to be used for your analysis.  A configuration file template is provided in the following common directory:




To use SnpEff, first copy this file to your working directory and make any changes your analysis requires.  Note that the configuration parameter "data_dir" (the database location) defaults to the following path in your home directory space:


data_dir = ~/snpEff/data/


You may leave this as-is, provided you actually place your databases in this path.  Otherwise, update it to match the correct location.
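
The edit to "data_dir" described above could be scripted roughly as follows. This is a minimal sketch rather than an official HPCC tool: the function name `set_data_dir` is hypothetical, and it assumes the configuration file contains a line beginning with `data_dir` and that the new path contains no `|` or `&` characters.

```shell
# Hypothetical helper: point data_dir in a SnpEff config file at a new location.
# Usage: set_data_dir <config_file> <new_data_dir>
set_data_dir() {
    local config="$1"
    local new_dir="$2"
    local tmp="${config}.tmp"
    # Rewrite the line that starts with "data_dir"; '|' is the sed delimiter
    # because directory paths contain '/'. All other lines pass through as-is.
    sed "s|^data_dir[[:space:]]*=.*|data_dir = ${new_dir}|" "$config" > "$tmp" \
        && mv "$tmp" "$config"
}
```

After copying the template to your working directory, a call such as `set_data_dir ./snpEff.config /path/to/your/databases/` (placeholder path) would update the copy in place.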

For more information on the nuances of other configuration parameters, please refer to the SnpEff documentation (refer to Section 3, "Configuration").

SnpEff Databases

As of this writing, the developers state that more than 2,500 pre-built databases are available for use with SnpEff.  For most people, this means it should be unnecessary to build your own database.  If your genome is not supported, however, please refer to the SnpEff documentation (Section 17, "Building a Database").

The current list of pre-built databases available for SnpEff can be obtained by using the following:


module load SnpEff
java -jar $SNPEFF/snpEff.jar databases


For your convenience, a list has been created and is available for your inspection in the common directory path:




Running grep on that file is probably the easiest way to find out whether your genome is supported.
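
If you know the organism name but not the exact database version, a case-insensitive search that prints only the database names may be convenient. The sketch below is one way to do this; the function name `find_genome` is hypothetical, and it assumes the list is whitespace-separated with the database name in the first column.

```shell
# Hypothetical helper: list database names whose entry mentions a search term.
# Usage: find_genome <supported_dbs_file> <search_term>
find_genome() {
    local list="$1"
    local term="$2"
    # Case-insensitive literal match anywhere on the line; print column 1,
    # which is assumed to hold the database (genome version) name.
    awk -v term="$term" 'index(tolower($0), tolower(term)) { print $1 }' "$list"
}
```

For example, `find_genome /path/to/supported_dbs homo_sapiens` (placeholder path) would print one database name per matching line.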

The "supported_dbs" file also contains download URLs for the pre-built databases, or you can browse for them on the SnpEff SourceForge website.

However, the recommended method of obtaining the most recent pre-built databases is to use the SnpEff command itself.  For example:


cd ~/snpEff/data
module load SnpEff
java -jar $SNPEFF/snpEff.jar download GRCh37.69


The above places a copy of the human genome database (GRCh37.69) in the directory ~/snpEff/data.

Note that grep'ing on the "supported_dbs" file yields the following:


grep GRCh37.69 /mnt/research/common-data/Bio/SnpEff/supported_dbs
GRCh37.69                                                       Homo_sapiens                                                      



Pre-built SnpEff databases are segregated by version. For example, if you are using SnpEff v3.3, you will want the database from the v3.3 subdirectory of the SnpEff website. Using the "download" command ensures that you get the correct database for your version.

Finally, update the configuration file to reflect where you actually placed your database files (for example, on scratch to maximize I/O performance).
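
Before running, it can help to confirm that the database really is where the configuration file says it is. The sketch below is a hedged example: the function name `db_installed` is hypothetical, and it assumes the usual SnpEff layout in which each pre-built database lives in its own subdirectory of data_dir and contains a file named snpEffectPredictor.bin. Check this against your own installation.

```shell
# Hypothetical helper: check that a database is present under a data directory.
# Usage: db_installed <data_dir> <genome_version>
db_installed() {
    local data_dir="$1"
    local genome="$2"
    # Strip a trailing slash so the joined path has no doubled separator.
    data_dir="${data_dir%/}"
    # Assumed layout: <data_dir>/<genome_version>/snpEffectPredictor.bin
    if [ -f "${data_dir}/${genome}/snpEffectPredictor.bin" ]; then
        echo "found: ${data_dir}/${genome}"
    else
        echo "missing: ${data_dir}/${genome}" >&2
        return 1
    fi
}
```

For the default location, this could be called as `db_installed "$HOME/snpEff/data" GRCh37.69`. Note the use of `$HOME` rather than a quoted `~`, which the shell would not expand.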

Running SnpEff

Once you have modified your configuration file and downloaded the databases you need, you are ready to run SnpEff.  When you execute the SnpEff command, be sure to specify the location of the configuration file you wish to use for the run.

For example:


module load SnpEff
java -jar $SNPEFF/snpEff.jar -c /path/to/configuration/file/snpEff.config <args>
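
As an illustration of what the arguments might look like, the sketch below wraps a typical annotation run: a database name followed by an input VCF file, with the annotated output redirected to a new file. The function name `annotate_vcf`, the file names, and the 4 GB Java heap setting are assumptions for illustration; adjust them for your data and consult the SnpEff documentation for the options your version supports.

```shell
# Hypothetical wrapper: annotate a VCF file with a given SnpEff database.
# Assumes "module load SnpEff" has already been run so that $SNPEFF is set.
# Usage: annotate_vcf <config_file> <genome_version> <input.vcf> <output.vcf>
annotate_vcf() {
    local config="$1" genome="$2" input="$3" output="$4"
    # Fail early with a clear message if either input file is missing.
    [ -f "$config" ] || { echo "config not found: $config" >&2; return 1; }
    [ -f "$input" ]  || { echo "input not found: $input" >&2; return 1; }
    # -Xmx4g is an assumed heap size; SnpEff writes annotated output to stdout.
    java -Xmx4g -jar "$SNPEFF/snpEff.jar" -c "$config" "$genome" "$input" > "$output"
}
```

A call might look like `annotate_vcf ./snpEff.config GRCh37.69 input.vcf annotated.vcf`, where the VCF file names are placeholders.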


More information

For more information on using SnpEff, please refer to the following documentation: