Difference between revisions of "SW:BLAST"

From TAMU HPRC
Revision as of 13:21, 3 March 2020

BLAST & BLAST+

GCATemplates available: ada (blastx)

To list the available versions, run:

 module spider BLAST

or

 module spider BLAST+

The maximum recommended number of cores for a BLAST job is 8. In the benchmark below, 28 cores ran slower than 8.

BLAST+ v2.7.1 blastp benchmarks for 1 protein sequence vs nr on Terra

 Cores  Memory  Run time
     1    2 GB  36.1 min
     7   12 GB  14.6 min
     8   14 GB  14.2 min
    28   54 GB  27.5 min
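The benchmark can be read as speedup (1-core time divided by N-core time) and parallel efficiency (speedup divided by N), which shows why 8 cores is the practical ceiling. A minimal sketch using the timings from the table; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Compute speedup and parallel efficiency from the Terra blastp benchmark.
# Input lines: <cores> <minutes>; the first (1-core) line is the baseline.
blast_scaling() {
  LC_ALL=C awk 'NR == 1 { base = $2 }
    { s = base / $2
      printf "cores=%d speedup=%.2f efficiency=%.1f%%\n", $1, s, 100 * s / $1 }' <<'EOF'
1 36.1
7 14.6
8 14.2
28 27.5
EOF
}

blast_scaling
# cores=1 speedup=1.00 efficiency=100.0%
# cores=7 speedup=2.47 efficiency=35.3%
# cores=8 speedup=2.54 efficiency=31.8%
# cores=28 speedup=1.31 efficiency=4.7%
```

In this run, going from 8 to 28 cores almost doubled the run time, so cores beyond 8 add cost without benefit.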

Version 4 BLAST databases such as nr and nt (the format used before BLAST+ 2.8.1; legacy BLAST and BLAST+ use the same databases) are located here:

 /scratch/data/bio/blast  (Terra)
 /scratch/datasets/blast  (Ada)

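Sample BLAST+ command (Ada database path). blastx translates nucleotide queries and searches them against a protein database such as nr; use blastp for protein queries.

 blastx -query my_nucleotide_sequences.fasta -db /scratch/datasets/blast/nr -outfmt 10 -out my_nucleotide_sequences_nr_blastout.csv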
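With -outfmt 10, BLAST+ writes one comma-separated line per hit using the default 12 columns (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). A minimal sketch that keeps only the highest-scoring hit for each query, assuming those default columns; the function name best_hits is hypothetical:

```shell
#!/usr/bin/env bash
# Keep the best (highest bit score) hit per query from BLAST+ -outfmt 10 output.
# Assumes the default 12 columns: column 1 = query ID, column 12 = bit score.
best_hits() {
  local csv="$1"
  # Sort by query ID, then by bit score (general numeric, descending);
  # awk then prints only the first line seen for each query ID.
  LC_ALL=C sort -t, -k1,1 -k12,12gr "$csv" | awk -F, '!seen[$1]++'
}

# Usage (file names are placeholders):
# best_hits blastout.csv > best_hits.csv
```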

To see the current version of the nr or nt database, load the latest version of BLAST+ and use the following command:

Terra:
 blastdbcmd -info -db /scratch/data/bio/blast/nr

Ada:
 blastdbcmd -info -db /scratch/datasets/blast/nr
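Because the database paths differ between Terra and Ada, a small helper can keep job scripts portable. A sketch only: the function name and its cluster/format arguments are hypothetical; the directories are the ones listed on this page.

```shell
#!/usr/bin/env bash
# Map a cluster name ("terra" or "ada", a hypothetical convention) and a
# database format ("v4" or "v5") to the directories listed on this page.
blast_db_dir() {
  case "$1:$2" in
    terra:v4) echo /scratch/data/bio/blast ;;
    ada:v4)   echo /scratch/datasets/blast ;;
    terra:v5) echo /scratch/data/bio/blastdb_v5 ;;
    ada:v5)   echo /scratch/datasets/blastdbv5 ;;
    *) echo "unknown cluster/format: $1 $2" >&2; return 1 ;;
  esac
}

# Usage, with BLAST+ loaded: report the version of nr and nt on Terra.
# for db in nr nt; do blastdbcmd -info -db "$(blast_db_dir terra v4)/$db"; done
```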

BLAST+ 2.8.1 and newer support version 5 of the BLAST database format, which allows a search to be limited by taxonomy at the species level.

The version 5 BLAST nr and nt databases are in the following directories. Let us know if you need others.

Terra:
 /scratch/data/bio/blastdb_v5/nr_v5
 /scratch/data/bio/blastdb_v5/nt_v5

Ada:
 /scratch/datasets/blastdbv5/nr_v5
 /scratch/datasets/blastdbv5/nt_v5
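With BLAST+ 2.8.1 or newer, the -taxids option restricts a search of a version 5 database to a comma-separated list of species-level taxids (-taxidlist takes a file instead). A sketch, assuming the database inside the nr_v5 directory is named nr (check with ls before use); the function name and the DRY_RUN switch are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: blastp against the Terra version 5 nr database, limited to species taxids.
# ASSUMPTION: the database basename inside nr_v5 is "nr"; verify with ls first.
# Set DRY_RUN=1 to print the command instead of running it.
v5_taxid_blastp() {
  local query="$1" taxids="$2" out="$3"
  local db=/scratch/data/bio/blastdb_v5/nr_v5/nr   # Ada: /scratch/datasets/blastdbv5/nr_v5/nr
  local cmd=(blastp -query "$query" -db "$db" -taxids "$taxids")
  cmd+=(-num_threads 8 -outfmt 10 -out "$out")     # 8 cores, CSV output
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage: 9606 is the taxid for Homo sapiens.
# DRY_RUN=1 v5_taxid_blastp my_proteins.fasta 9606 my_proteins_human.csv
```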

If your taxid is not recognized, it is most likely at too high a taxonomic level (above species). Get the list of species taxids in its subtree from the NCBI Taxonomy page below (33208 is used as an example): https://www.ncbi.nlm.nih.gov/taxonomy/?term=txid33208%5Bsubtree%5D
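Once the species taxids of a subtree are saved to a text file, they can be passed to -taxidlist (BLAST+ 2.8.1 or newer). A minimal sketch that cleans such a file, assuming one taxid per line; the function and file names are hypothetical:

```shell
#!/usr/bin/env bash
# Clean a saved taxid list into a -taxidlist file:
# drop Windows line endings, keep purely numeric lines, remove duplicates.
make_taxidlist() {
  local raw="$1" out="$2"
  tr -d '\r' < "$raw" | grep -E '^[0-9]+$' | sort -un > "$out"
}

# Usage (file names are placeholders; the database name "nr" inside nr_v5 is an assumption):
# make_taxidlist txid33208_species.txt taxids_33208.txt
# blastp -query my_proteins.fasta -db /scratch/data/bio/blastdb_v5/nr_v5/nr \
#        -taxidlist taxids_33208.txt -num_threads 8 -outfmt 10 -out hits.csv
```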