Overview

Data downloaded from the NCBI website, or prepared by users, can in most cases be easily converted for use with BLAST.  This brief tutorial illustrates a basic scenario: downloading a set of FASTA sequences from the NCBI website and preparing them for BLAST searches.

Download

The simplest way to do this is to copy the link to the FASTA file and fetch it with either the "wget" or "curl" command.  For example:

wget ftp://ftp.ncbi.nih.gov/repository/UniGene/Triticum_aestivum/Ta.seq.all.gz

or:

curl -O ftp://ftp.ncbi.nih.gov/repository/UniGene/Triticum_aestivum/Ta.seq.all.gz

Either command downloads the file "Ta.seq.all.gz" into the current directory.  Now decompress the file:

gunzip Ta.seq.all.gz

This replaces the compressed file with a file called "Ta.seq.all" in your directory.
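
Before building indices, it can be worth confirming that the decompressed file really contains FASTA records. The following is a minimal sketch, not part of BLAST: the function name "count_fasta_records" is hypothetical, and it relies only on the fact that every FASTA record starts with a header line beginning with ">".

```shell
# Hypothetical helper (not part of BLAST): count the sequences in a FASTA file.
# Each FASTA record starts with a header line beginning with '>'.
count_fasta_records() {
    # grep -c prints the number of matching lines (0 if there are none).
    # '|| true' keeps the exit status zero when no header lines match.
    grep -c '^>' "$1" || true
}
```

For example, "count_fasta_records Ta.seq.all" prints the number of sequences in the file; a result of 0 suggests that the download or decompression did not produce a valid FASTA file.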

Preparing the Indices

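To prepare the BLAST indices for a nucleotide FASTA file, load the BLAST module and run "formatdb" with "-p F", which marks the input as nucleotide rather than protein:

module load BLAST
formatdb -i Ta.seq.all -p F

The command above will produce several files named after the input file, such as:

Ta.seq.all.nhr
Ta.seq.all.nin
Ta.seq.all.nsq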

 

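The "-p" option describes the type of sequence in the input file; it does not translate sequences.  If your FASTA file contains protein sequences instead of nucleotides, run the same command with "-p T":

formatdb -i Ta.seq.all -p T

In this case, the command will produce the files:

Ta.seq.all.phr
Ta.seq.all.pin
Ta.seq.all.psq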
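
A quick way to confirm that the index files were written is to test for each of the three expected extensions. The sketch below is illustrative only: "check_blast_index" is a hypothetical helper, not a BLAST command, and it assumes a single-volume database whose index files are named after the file passed to "formatdb -i". Large inputs may be split into several volumes with different names, which this check does not handle.

```shell
# Hypothetical helper (not part of BLAST): check that the three index files
# for a database exist and are non-empty.
#   $1  the FASTA file name that was passed to "formatdb -i"
#   $2  "n" for nucleotide indices or "p" for protein indices
check_blast_index() {
    base=$1
    kind=$2
    for ext in hr in sq; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "${base}.${kind}${ext}" ]; then
            echo "missing: ${base}.${kind}${ext}"
            return 1
        fi
    done
    echo "ok: ${base} (${kind})"
}
```

For example, "check_blast_index Ta.seq.all n" checks the nucleotide indices and "check_blast_index Ta.seq.all p" checks the protein indices. The function stops at the first missing file and returns a non-zero status, so it can be used in scripts.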

 

You can verify whether formatting succeeded by inspecting the "formatdb.log" file, which formatdb writes to the current directory.
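
The log check can also be scripted. The following sketch rests on an assumption: that problems appear in the log on lines containing the word "error" (in any letter case). The helper name "log_is_clean" is hypothetical; adjust the search pattern if your log uses different wording.

```shell
# Hypothetical helper (not part of BLAST): report whether a formatdb log
# contains error lines.
# Assumption: problems are logged on lines containing the word "error"
# (case-insensitive); change the pattern if your log looks different.
log_is_clean() {
    # Default to formatdb.log in the current directory if no file is given.
    logfile=${1:-formatdb.log}
    if [ ! -f "$logfile" ]; then
        echo "no log file: $logfile"
        return 1
    fi
    if grep -qi 'error' "$logfile"; then
        # Show the offending lines so they can be inspected.
        grep -i 'error' "$logfile"
        return 1
    fi
    echo "no errors found in $logfile"
}
```

Running "log_is_clean" with no arguments checks "formatdb.log" in the current directory. Reading the log by eye remains the more reliable check, since this sketch only searches for one keyword.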