
Difference between revisions of "SW:SRA-toolkit"

From TAMU HPRC
Latest revision as of 11:23, 25 June 2020

SRA-toolkit

Used to download Sequence Read Archive (SRA) files and extract them into fastq file(s).

 module load SRA-Toolkit/2.9.6-centos_linux64

SRA-toolkit downloads files to your home directory by default. Because your home directory is limited to 10GB, you can redirect the downloads to your scratch space: create a directory in scratch, then make a symbolic link to that directory from your home directory.

 cd
 mkdir -p $SCRATCH/ncbi/public/sra
 ln -s $SCRATCH/ncbi
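
The same setup can be wrapped in a small function so that it is safe to run more than once. This is only a sketch: link_ncbi_dir is a hypothetical helper name, not part of SRA-toolkit, and it assumes the ncbi/public/sra layout shown above.

```shell
# link_ncbi_dir HOME_DIR SCRATCH_DIR
# Hypothetical helper (not part of SRA-toolkit): creates SCRATCH_DIR/ncbi/public/sra
# and links HOME_DIR/ncbi to SCRATCH_DIR/ncbi unless something is already there.
link_ncbi_dir() {
    local home_dir scratch_dir
    home_dir="$1"
    scratch_dir="$2"

    mkdir -p "$scratch_dir/ncbi/public/sra" || return 1

    if [ -L "$home_dir/ncbi" ]; then
        # A symbolic link is already in place; leave it alone.
        echo "link already exists: $home_dir/ncbi"
    elif [ -e "$home_dir/ncbi" ]; then
        # A real directory or file exists; do not overwrite user data.
        echo "refusing to replace existing $home_dir/ncbi" >&2
        return 1
    else
        ln -s "$scratch_dir/ncbi" "$home_dir/ncbi"
    fi
}
```

On the cluster you would call it as link_ncbi_dir "$HOME" "$SCRATCH". If a real ncbi directory already exists in your home directory, the function stops instead of replacing it, so you can move any existing files to scratch first.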

The compute nodes are not connected to the internet so they can't be used to download SRA files.

You can download SRA files using one of two approaches:

1) Log in to ada-ftn1.tamu.edu. The fastq-dump and prefetch commands are available on the fast transfer nodes.

 ssh NetID@ada-ftn1.tamu.edu

Sample command for downloading a paired-end SRA run from ada-ftn1.tamu.edu and extracting it into gzipped fastq files. The --defline-seq format is useful if you will be using Trinity 2.4.0, which likes /1 and /2 at the end of fastq headers.

 fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files SRR958661 --gzip &
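
Once the command finishes, you can confirm that the headers carry the /1 and /2 suffixes. The following is a minimal sketch, assuming that --split-files with --gzip wrote SRR958661_1.fastq.gz and SRR958661_2.fastq.gz to the current directory; first_header_has_suffix is a hypothetical helper, not an SRA-toolkit command.

```shell
# first_header_has_suffix FILE SUFFIX
# Hypothetical helper: succeeds if the first line of the gzipped fastq FILE
# starts with @ and ends with SUFFIX (for example /1 or /2).
first_header_has_suffix() {
    local header
    header=$(gzip -cd "$1" | head -n 1)
    case "$header" in
        @*"$2") return 0 ;;
        *)
            echo "unexpected header in $1: $header" >&2
            return 1
            ;;
    esac
}

# Example use (file names are assumptions, see above):
#   first_header_has_suffix SRR958661_1.fastq.gz /1
#   first_header_has_suffix SRR958661_2.fastq.gz /2
```

Only the first record is inspected, so this is a quick sanity check rather than a full validation of the file.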

2a) Prefetch the file using the ada-ftn1.tamu.edu login node. This example will download a file named SRR958661.sra.

 prefetch --output-directory ./ SRR958661

2b) Extract the fastq files from the .sra file with fastq-dump. If you think it will take less than one hour, you can run fastq-dump on the command line of the login node. If the .sra file is large and extraction may take longer than one hour, run fastq-dump in a job script.

Sample commands to use in your job script where the .sra file has already been downloaded. Notice that this time the SRR name ends with .sra, which is the file downloaded in the example in 2a.

 module load SRA-Toolkit/2.9.6-centos_linux64
 fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files SRR958661.sra
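
The page does not show the rest of the job script, so the following sketch generates one around the two commands above. It assumes the cluster uses the LSF scheduler (#BSUB directives, submitted with bsub); the job name, the 4 hour wall time, the single core and the output file name are placeholder assumptions, and write_fastq_dump_job is a hypothetical helper.

```shell
# write_fastq_dump_job ACCESSION
# Hypothetical helper: prints a job script that runs fastq-dump on ACCESSION.sra.
# The #BSUB values below are assumptions; adjust them for your job and scheduler.
write_fastq_dump_job() {
    local accession
    accession="$1"
    # \$ keeps the literal $sn, $rn and $ri placeholders in the generated script.
    cat <<EOF
#BSUB -J fastq_dump_${accession}
#BSUB -L /bin/bash
#BSUB -W 4:00
#BSUB -n 1
#BSUB -o fastq_dump_${accession}.%J.out

module load SRA-Toolkit/2.9.6-centos_linux64
fastq-dump --defline-seq '@\$sn[_\$rn]/\$ri' --split-files ${accession}.sra
EOF
}

# Example use, run from the directory that holds SRR958661.sra:
#   write_fastq_dump_job SRR958661 > fastq_dump_SRR958661.job
#   bsub < fastq_dump_SRR958661.job
```

Generating the script from the accession keeps the job name, log file and input file consistent when you process several runs.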

SRA-toolkit (fastq-dump) is also available in Maroon Galaxy.

Browse SRA using SRA Explorer (https://ewels.github.io/sra-explorer/), where the 'saved datasets' feature gives you URLs for downloading fastq files directly with wget instead of having to use SRA-toolkit.

Install Aspera

SRA-Toolkit will check whether you have Aspera installed. The Aspera ascp command downloads SRA files faster than wget. Run the installation script from any directory. This will install configuration files in your ~/.aspera directory.

 /scratch/helpdesk/ngs/ibm-aspera-connect-3.9.8.176272-linux-g2.12-64.sh


Example command:

 ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/ERR315/009/ERR3155119/ERR3155119.fastq.gz ./
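
The remote path in the example follows a pattern based on the accession. The sketch below rebuilds that path for other accessions. It is an assumption based on the example above and on the usual EBI layout (first six characters, then a zero-padded subdirectory taken from the digits after the ninth character); ena_fastq_path is a hypothetical helper, and paired-end runs are assumed to use _1 and _2 file names instead, so check the path before relying on it.

```shell
# ena_fastq_path ACCESSION
# Hypothetical helper: prints the assumed remote path of a single-end fastq.gz
# file on fasp.sra.ebi.ac.uk for a 9, 10 or 11 character run accession.
ena_fastq_path() {
    local acc prefix sub
    acc="$1"
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    case ${#acc} in
        9)  sub="" ;;                                            # no extra subdirectory
        10) sub="00$(printf '%s' "$acc" | cut -c10)/" ;;         # e.g. 009/
        11) sub="0$(printf '%s' "$acc" | cut -c10-11)/" ;;       # e.g. 019/
        *)
            echo "unsupported accession length: $acc" >&2
            return 1
            ;;
    esac
    printf 'vol1/fastq/%s/%s%s/%s.fastq.gz\n' "$prefix" "$sub" "$acc" "$acc"
}

# Example use with the ascp options shown above:
#   ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh \
#       era-fasp@fasp.sra.ebi.ac.uk:"$(ena_fastq_path ERR3155119)" ./
```

For ERR3155119 the function prints the same path that appears in the example command.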