Difference between revisions of "HPRC:File Transfers"

From TAMU HPRC
m (FTP)
(16 intermediate revisions by one other user not shown)
Line 8: Line 8:
 
==== Globus Connect ====
 
==== Globus Connect ====
  
[[SW:GlobusConnect|Globus Connect]] is a reliable, high-performance file transfer platform allowing users to transfer large amounts of data seamlessly between systems or endpoints. Users can schedule transfer via a web interface on globus.org and receive notification after transfer is completed. The endpoint can be systems with Globus installed (like ada-ftn1) or user's personal desktop.
+
[[SW:GlobusConnect|Globus Connect]] is a reliable, high-performance file transfer platform allowing users to transfer large amounts of data seamlessly between systems or endpoints. Users can schedule transfer via a web interface on globus.org and receive notification after transfer is completed. The endpoint can be systems with Globus installed (like terra-ftn) or user's personal desktop.
  
 
* ''What Globus Connect is good at''
 
* ''What Globus Connect is good at''
 
** transfer large amount of data (say 100+ GB)
 
** transfer large amount of data (say 100+ GB)
 
** it's fast (utilizing up to 4 data streams); as fast as the slowest link from your server/desktop/laptop to HPRC fast transfer nodes
 
** it's fast (utilizing up to 4 data streams); as fast as the slowest link from your server/desktop/laptop to HPRC fast transfer nodes
** transfer files between two endpoints (for example, between Ada and Terra, or between Ada and endpoint on your laptop)
+
** transfer files between two endpoints (for example, between Grace and Terra, or between Grace and endpoint on your laptop)
 
** personal endpoint works behind NAT (Network Address Translation; like your desktop behind a wifi router at home)
 
** personal endpoint works behind NAT (Network Address Translation; like your desktop behind a wifi router at home)
 
** resume for failed transfers
 
** resume for failed transfers
** receive notification after a scheduled transfer is completed
+
** can [https://docs.globus.org/how-to/get-started/#request_a_file_transfer sync] directories (similar to rsync)
 +
** receive an email notification after a scheduled transfer is completed
  
 
* ''What Globus Connect is not good at''
 
* ''What Globus Connect is not good at''
 
** your server or desktop/laptop must have Globus Connect software installed and setup as an endpoint
 
** your server or desktop/laptop must have Globus Connect software installed and setup as an endpoint
** by default, data stream is not encrypted
+
** <del>by default, data stream is not encrypted</del> encryption has been turned on by default as of Jan 14, 2021
  
 
* ''How do I use Globus Connect''
 
* ''How do I use Globus Connect''
 
** visit [[SW:GlobusConnect|Globus Connect]] wiki page for more information
 
** visit [[SW:GlobusConnect|Globus Connect]] wiki page for more information
 
** Globus Connect software is supported on Windows, Mac, and Linux.
 
** Globus Connect software is supported on Windows, Mac, and Linux.
** use endpoints: "TAMU ada-ftn1" or "TAMU ada-ftn2" for Ada/Curie cluster and "TAMU terra-ftn" for Terra cluster
+
** use endpoints: "TAMU terra-ftn" for Terra cluster, and "TAMU grace-dtn" for Grace cluster
  
  
Line 41: Line 42:
 
* ''How do I use SCP/SFTP''
 
* ''How do I use SCP/SFTP''
 
** you can use command line on Linux, Mac or MobaXterm terminal to issue scp/sftp command
 
** you can use command line on Linux, Mac or MobaXterm terminal to issue scp/sftp command
** use [https://winscp.net/eng/index.php WinSCP] on Windows, [https://filezilla-project.org/ FileZilla] (use SFTP protocol) on Windows or Mac, or use File Transfer panel on MobaXterm.  See instruction on [[HPRC:Access:Windows#File Transfer from Windows | file transfer from Windows]].
+
** use [https://winscp.net/eng/index.php WinSCP] on Windows, [https://filezilla-project.org/ FileZilla] (use SFTP protocol) on Windows or Mac, or use File Transfer panel on MobaXterm.  See instruction on [[HPRC:Access:Windows#File Transfer from Windows | file transfer from Windows]].  Also, check [[Two_Factor#Duo_With_FTP.2C_SSH_Clients|instructions on using SCP/SFTP software with Duo]].
** use "ada-ftn1.tamu.edu" or "ada-ftn2.tamu.edu" for Ada/Curie and "terra-ftn.hprc.tamu.edu" for Terra if your data transfer to Ada or Terra login nodes (ada.tamu.edu or terra.tamu.edu) is terminated after one hour; Ada/Terra login nodes have one hour CPU limit for all user processes.
+
** HPRC cluster login nodes have a 60-minute CPU time limit.  Your data transfer might get interrupted when moving a large amount of data.  To bypass this problem, use [[HPRC:File_Transfers#rsync|rsync]] or use data transfer nodes ("terra-ftn.hprc.tamu.edu" for Terra, "grace-dtn1.hprc.tamu.edu", "grace-dtn2.hprc.tamu.edu" for Grace), which do not have a CPU time limit.
 
** check [http://man7.org/linux/man-pages/man1/scp.1.html scp man page] and [http://man7.org/linux/man-pages/man1/sftp.1.html sftp man page] for options and examples
 
** check [http://man7.org/linux/man-pages/man1/scp.1.html scp man page] and [http://man7.org/linux/man-pages/man1/sftp.1.html sftp man page] for options and examples
 
**  Note two factor authentication has been enabled on Nov 4, 2019.  Please see [[Two Factor | two factor authentication wiki page]] on how to use two factor for some software listed above.
 
**  Note two factor authentication has been enabled on Nov 4, 2019.  Please see [[Two Factor | two factor authentication wiki page]] on how to use two factor for some software listed above.
Line 55: Line 56:
 
** data stream is not encrypted, thus data transfer rate is faster than SCP/SFTP
 
** data stream is not encrypted, thus data transfer rate is faster than SCP/SFTP
  
* ''What SCP/SFTP is not good at''
+
* ''What FTP is not good at''
 
** FTP protocol doesn't encrypt password and data transfer, thus less secure
 
** FTP protocol doesn't encrypt password and data transfer, thus less secure
 
** require FTP server at remote server
 
** require FTP server at remote server
  
 
* ''How do I use FTP''
 
* ''How do I use FTP''
** [https://lftp.yar.ru/lftp-man.html lftp] is available on Ada/Terra as a module. Run "module load lftp" to load the module.
+
** [https://lftp.yar.ru/lftp-man.html lftp] is available on Grace/Terra as a module. Run "module load lftp" to load the module.
** On FTN nodes, you can lunch lftp via absolute path (Ada: /sw/eb/software/lftp/4.8.4-GCCcore-6.4.0/bin/lftp ; Terra: /sw/eb/sw/lftp/4.8.4-GCCcore-6.4.0/bin/lftp).  To find the up-to-date path of lftp, first run "module load lftp" on Ada/Terra login nodes, then run "which lftp" to find the path of lftp.
+
** On FTN nodes, you can launch lftp via absolute path (Terra: /sw/eb/sw/lftp/4.8.4-GCCcore-6.4.0/bin/lftp).  To find the up-to-date path of lftp, first run "module load lftp" on Terra login nodes, then run "which lftp" to find the path of lftp.
  
  
Line 80: Line 81:
 
** from command line on Linux, Mac, or MobaXterm terminal to issue rsync command
 
** from command line on Linux, Mac, or MobaXterm terminal to issue rsync command
 
** use [http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp DeltaCopy] or [https://sourceforge.net/projects/grsync/ Grsync] on Windows
 
** use [http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp DeltaCopy] or [https://sourceforge.net/projects/grsync/ Grsync] on Windows
** use [https://www.rsync.net/resources/howto/windows_rsync.html cwRsync 5.4.1] for command line on Windows
+
** use [https://www.itefix.net/cwrsync cwRsync client] for command line on Windows.  [https://www.rsync.net/resources/howto/windows_rsync.html Sample commands of using cwRsync].
 
** check [http://man7.org/linux/man-pages/man1/rsync.1.html rsync man page] for options and examples
 
** check [http://man7.org/linux/man-pages/man1/rsync.1.html rsync man page] for options and examples
 
** Note two factor authentication has been enabled on Nov 4, 2019. Please see [[ Two Factor | two factor authentication wiki]] page for details.
 
** Note two factor authentication has been enabled on Nov 4, 2019. Please see [[ Two Factor | two factor authentication wiki]] page for details.
 +
** For large files (100G+), you can use additional parameters with rsync to shorten the time to resume transfer after rsync transfer was interrupted. This example uses Terra FTN as the target host.
 +
  rsync -av --partial --inplace --append --progress my_file NetID@terra-ftn.hprc.tamu.edu:/scratch/user/NetID/
  
  
Line 96: Line 99:
  
 
* ''How do I use rclone''
 
* ''How do I use rclone''
** rclone is available on ada, terra and HPRC Lab workstations. No module is required for any of them
+
** rclone is available on grace, terra and HPRC Lab workstations. No module is required for any of them
 
** reference [[SW:rclone|rclone]] wiki page for instructions and examples
 
** reference [[SW:rclone|rclone]] wiki page for instructions and examples
  
Line 113: Line 116:
  
 
* ''How do I use portal''
 
* ''How do I use portal''
** visit https://portal.hprc.tamu.edu, or visit https://portal-ada.hprc.tamu.edu for Ada portal and https://protal-terra.hprc.tamu.edu for Terra portal
+
** visit https://portal.hprc.tamu.edu, or visit https://portal-terra.hprc.tamu.edu for Terra portal, and https://portal-grace.hprc.tamu.edu for Grace portal.
 
** check [[SW:Portal|portal]] wiki page for additional info
 
** check [[SW:Portal|portal]] wiki page for additional info
 
** portal is CAS authenticated
 
** portal is CAS authenticated
 +
 +
 +
=== Tutorial Videos ===
 +
* [https://www.youtube.com/watch?v=oCSzuJf6p7g File transfer using WinSCP, MobaXterm, Filezilla, Cyberduck, and Open OnDemand portal] (March 12, 2021)
 +
* [https://www.youtube.com/watch?v=_ROApd7MtRQ File Transfers using Globus Connect] (May 13, 2021)
 +
 +
 +
=== Host selection ===
 +
 +
For transferring large files, selecting a better target host could greatly shorten the data transfer time.  TAMU firewall scanning adds a significant overhead to the data transfer flow through campus firewall.  The table below gives a general guidance of selecting target hosts for your transfer.  The transfer speed is rated "Very Good", "Good+" (a bit better than "Good"), "Good" and "Fair".
 +
 +
{| class="wikitable" style="text-align: center;"
 +
! src/dest
 +
! Terra login
 +
! Terra FTN
 +
! Grace login
 +
! Grace DTN
 +
|-
 +
! on TAMU campus network
 +
| Very Good
 +
| Fair
 +
| Very Good
 +
| Very Good
 +
|-
 +
! at home via TAMU VPN
 +
| Good+
 +
| Fair
 +
| Good+
 +
| Good+
 +
|-
 +
! HPRC cluster login
 +
| Very Good
 +
| Fair
 +
| Very Good
 +
| Very Good
 +
|-
 +
! HPRC cluster FTN/DTN
 +
| Fair
 +
| Very Good
 +
| Fair
 +
| Very Good
 +
|-
 +
! TACC cluster login
 +
| -
 +
| Very Good
 +
| -
 +
| Very Good
 +
|-
 +
! other HPC sites
 +
| -
 +
| Very Good <br>only Globus Connect
 +
| -
 +
| Very Good <br>only Globus Connect
 +
|}
 +
 +
Comments:
 +
* Unless it's noted, ssh, scp/sftp, and [[HPRC:File_Transfers#rsync|rsync]] can be used for data transfer.
 +
* [[SW:GlobusConnect|Globus Connect]] can be used to move files between HPRC clusters and other HPC sites.  Globus Connect command line can be used on FTN/DTN.
 +
* Terra FTN is outside campus firewall and all HPRC cluster login nodes are behind campus firewall.  Thus, file transfer from login nodes to Terra FTN will be impacted by campus firewall scanning.  Grace DTNs are behind firewall.
 +
  
 
=== Other Considerations ===
 
=== Other Considerations ===
 +
 +
* ''What is the best way to move my files between HPRC clusters?''
 +
[[HPRC:File_Transfers#Globus_Connect|Globus Connect]] is the best option.  Use [[HPRC:File_Transfers#rsync|rsync]] if you want to sync a directory between two clusters and only a few files were changed or added, or some files were only partially transferred.
 +
  
 
* ''I am off campus or travel abroad.  Can I transfer files from/to HPRC cluster?''
 
* ''I am off campus or travel abroad.  Can I transfer files from/to HPRC cluster?''
All Ada/Terra login nodes are behind TAMU campus firewall, so TAMU VPN access is required if you are off campus.  If you have personal [[SW:GlobusConnect|Globus Connect]] endpoint setup on your laptop, you can transfer files to/from your laptop via globus.org using [[SW:GlobusConnect|Globus Connect]], without using TAMU VPN.
+
All Grace/Terra login nodes are behind TAMU campus firewall, so TAMU VPN access is required if you are off campus.  If you have personal [[SW:GlobusConnect|Globus Connect]] endpoint setup on your laptop, you can transfer files to/from your laptop via globus.org using [[SW:GlobusConnect|Globus Connect]], without using TAMU VPN.
 +
 
  
 +
* ''Should I use FTN/DTN or regular Grace/Terra login nodes to transfer files?''
 +
Please see table in [[HPRC:File_Transfers#Host_selection|Host selection]] for a quick summary.
  
* ''Should I use FTN or regular Ada/Terra login nodes to transfer files?''
+
All Terra login nodes (terra.tamu.edu) and Grace login nodes (grace.hprc.tamu.edu) have 10 Gbps links to HPRC edge switches, which have dual 10 Gbps link to campus network.
All Terra login nodes (terra.tamu.edu) and two Ada login nodes (ada1.tamu.edu and ada2.tamu.edu) are now with 10 Gbps links to HPRC edge switches, which have dual 10 Gbps link to campus network (as Jan 6, 2020).
 
  
For short file transfer (less than one hour), either FTN or login nodes would work.  For transfer time over 1 hour, please use FTN node (ada-ftn1.tamu.edu and ada-ftn2.tamu.edu for Ada/Curie cluster; terra-ftn.hprc.tamu.edu for Terra cluster), which does not have one hour CPU process time limit.  Use "rsync" to counter CPU time limit, as "rsync" supports resume transfer, or use Globus Connect.
+
For short file transfer (less than one hour), either data transfer or login nodes would work.  For transfer time over 1 hour, please use data transfer nodes (terra-ftn.hprc.tamu.edu for Terra cluster; grace-dtn1.hprc.tamu.edu or grace-dtn2.hprc.tamu.edu for Grace cluster), which do not have one hour CPU process time limit.  Use [[HPRC:File_Transfers#rsync|rsync]] when possible, as rsync supports resume transfer, or use [[HPRC:File_Transfers#Globus_Connect|Globus Connect]] if moving files between two clusters.
  
If you are on campus or transfer small files over VPN, please use Terra login nodes (terra.tamu.edu) or ada1/ada2.tamu.edu to transfer your data.  If you are off campus, please consider using Globus Connect to transfer your data.
+
If you are on campus or transfer small files over VPN, please use Grace/Terra login nodes (terra.tamu.edu and grace.hprc.tamu.edu) to transfer your data.  If you are off campus, please consider using [[HPRC:File_Transfers#Globus_Connect|Globus Connect]] to transfer your data.
  
 
If you are using clusters at TACC, you can scp/sftp/rsync to FTN or use Globus Connect.
 
If you are using clusters at TACC, you can scp/sftp/rsync to FTN or use Globus Connect.
Line 134: Line 203:
  
 
* ''Can I download files from internet (say NIH) inside a job script?''
 
* ''Can I download files from internet (say NIH) inside a job script?''
Ada/Terra compute nodes do not have access to internet, so you cannot download files from internet on the compute node.  Please download necessary files on Ada/Terra login nodes or FTN nodes.
+
Grace/Terra compute nodes do not have access to internet, so you cannot download files from internet on the compute node.  Please set up [[SW:WebProxy|web proxy]] in your job script or download necessary files on Grace/Terra login nodes or DTN/FTN nodes.
  
  
 
* ''I want to back up files to my desktop/laptop.''
 
* ''I want to back up files to my desktop/laptop.''
"rsync" or "rclone" (to cloud storage) probably the best choice.
+
[[HPRC:File_Transfers#rsync|rsync]] or [[HPRC:File_Transfers#rclone|rclone]] (to cloud storage) is probably the best choice.
  
  
* ''Can I mount my Ada/Terra home/scratch directory on my desktop/laptop to transfer files?''
+
* ''Can I mount my Grace/Terra home/scratch directory on my desktop/laptop to transfer files?''
 
[https://en.wikipedia.org/wiki/SSHFS '''SSHFS'''] is the best option.
 
[https://en.wikipedia.org/wiki/SSHFS '''SSHFS'''] is the best option.
  
Line 155: Line 224:
  
 
Number of data transfer streams would make a difference as well.  [[SW:GlobusConnect|Globus Connect]] utilizes up to 4 data streams to shorten the transfer time (more noticeable for large files; typically seeing 2.5x speedup).
 
Number of data transfer streams would make a difference as well.  [[SW:GlobusConnect|Globus Connect]] utilizes up to 4 data streams to shorten the transfer time (more noticeable for large files; typically seeing 2.5x speedup).
 +
 +
 +
* ''I don't see the answer for my file transfer issues.''
 +
Please contact us at help@hprc.tamu.edu with details of your questions.
 +
 +
 +
[[ Category:HPRC ]] [[ Category:Terra ]] [[ Category:Grace ]]

Revision as of 11:37, 6 November 2021

File Transfer on HPRC Clusters

File Transfer Software

There are several software options for transferring files to and from HPRC clusters. The right choice depends on factors such as data size, location, and transfer frequency. If the data size is small (transfer time is less than an hour), just pick whichever software is convenient or familiar to you.


Globus Connect

Globus Connect is a reliable, high-performance file transfer platform allowing users to transfer large amounts of data seamlessly between systems or endpoints. Users can schedule a transfer via the web interface on globus.org and receive a notification after the transfer is completed. An endpoint can be a system with Globus installed (like terra-ftn) or a user's personal desktop.

  • What Globus Connect is good at
    • transfer large amount of data (say 100+ GB)
    • it's fast (utilizing up to 4 data streams); as fast as the slowest link from your server/desktop/laptop to HPRC fast transfer nodes
    • transfer files between two endpoints (for example, between Grace and Terra, or between Grace and endpoint on your laptop)
    • personal endpoint works behind NAT (Network Address Translation; like your desktop behind a wifi router at home)
    • resume for failed transfers
    • can sync directories (similar to rsync)
    • receive an email notification after a scheduled transfer is completed
  • What Globus Connect is not good at
    • your server or desktop/laptop must have Globus Connect software installed and set up as an endpoint
    • the data stream was not encrypted by default in the past; encryption has been turned on by default as of Jan 14, 2021
  • How do I use Globus Connect
    • visit Globus Connect wiki page for more information
    • Globus Connect software is supported on Windows, Mac, and Linux.
    • use endpoints: "TAMU terra-ftn" for Terra cluster, and "TAMU grace-dtn" for Grace cluster
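
As a rough sketch, a directory sync between two endpoints could also be scripted with the Globus command-line interface. This assumes the `globus` CLI is installed and you have already run `globus login`. The endpoint IDs below are placeholders (look up the real ones with `globus endpoint search`), and the paths, the label, and the `run` helper are hypothetical.

```shell
# Sketch only: SRC_EP and DST_EP are placeholders, not real endpoint IDs.
# Find the real IDs with:  globus endpoint search "TAMU terra-ftn"
SRC_EP="${SRC_EP:-SOURCE-ENDPOINT-ID}"
DST_EP="${DST_EP:-DEST-ENDPOINT-ID}"

# Dry run by default (prints the command); set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Submit a recursive transfer that only copies files whose checksums differ.
globus_sync() {
    run globus transfer --recursive --sync-level checksum --label hprc-sync \
        "${SRC_EP}:$1" "${DST_EP}:$2"
}

globus_sync /scratch/user/NetID/project/ /scratch/user/NetID/project/
```

The dry-run default lets you inspect the command before anything is submitted.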


SCP/SFTP

SCP and SFTP protocols are a means of securely transferring computer files between a local host and a remote host.

  • What SCP/SFTP is good at
    • ubiquitous; simple to use
    • sftp offers an interactive interface (command line) to download/upload files
    • data stream is encrypted
  • What SCP/SFTP is not good at
    • not very fast (file transfer only uses one data stream over SSH protocol)
  • How do I use SCP/SFTP
    • you can use command line on Linux, Mac or MobaXterm terminal to issue scp/sftp command
    • use WinSCP on Windows, FileZilla (use SFTP protocol) on Windows or Mac, or use File Transfer panel on MobaXterm. See instruction on file transfer from Windows. Also, check instructions on using SCP/SFTP software with Duo.
    • HPRC cluster login nodes have a 60-minute CPU time limit. Your data transfer might get interrupted when moving a large amount of data. To bypass this problem, use rsync or use data transfer nodes ("terra-ftn.hprc.tamu.edu" for Terra, "grace-dtn1.hprc.tamu.edu", "grace-dtn2.hprc.tamu.edu" for Grace), which do not have a CPU time limit.
    • check scp man page and sftp man page for options and examples
    • Note two factor authentication has been enabled on Nov 4, 2019. Please see two factor authentication wiki page on how to use two factor for some software listed above.
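
A minimal sketch of typical scp commands against a Grace data transfer node follows. NetID, the file and directory names, and the `run` helper are placeholders for illustration; the hostname and scratch path are the ones used elsewhere on this page.

```shell
# Sketch only: NetID and the file names are placeholders.
# Dry run by default (prints the command); set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

DTN="grace-dtn1.hprc.tamu.edu"   # data transfer node, no CPU time limit

# Upload one file to your scratch directory on Grace.
run scp my_file "NetID@${DTN}:/scratch/user/NetID/"

# Download a directory recursively (-r) into the current directory.
run scp -r "NetID@${DTN}:/scratch/user/NetID/results" .

# For an interactive session (put/get/ls), start sftp instead:
#   sftp NetID@grace-dtn1.hprc.tamu.edu
```

Expect a two factor authentication prompt when the commands are actually executed.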


FTP

File Transfer Protocol (FTP) is used to transfer files between a client and a server. FTP is not as popular as it once was, although NCBI still uses it to transfer data. FTP transfers the user name, password, and data unencrypted. Users should choose FTPS, which sends the password over an encrypted channel, whenever possible.

  • What FTP is good at
    • FTP client offers an interactive interface (command line) to download/upload files
    • data stream is not encrypted, thus data transfer rate is faster than SCP/SFTP
  • What FTP is not good at
    • FTP protocol doesn't encrypt password and data transfer, thus less secure
    • require FTP server at remote server
  • How do I use FTP
    • lftp is available on Grace/Terra as a module. Run "module load lftp" to load the module.
    • On FTN nodes, you can launch lftp via absolute path (Terra: /sw/eb/sw/lftp/4.8.4-GCCcore-6.4.0/bin/lftp). To find the up-to-date path of lftp, first run "module load lftp" on Terra login nodes, then run "which lftp" to find the path of lftp.


rsync

rsync is a fast, versatile, remote (and local) file-copying tool and recommended when relatively few differences exist between target and source versions, because rsync copies only the differences of files that have actually changed. By default, rsync uses the SSH remote shell.

  • What rsync is good at
    • resume transfer of a partially transferred file
    • synchronize files/dirs of two directories (local-local, local-remote, remote-local)
    • by default, transfer is over SSH protocol, so data stream is encrypted
  • What rsync is not good at
    • by default, files are transferred over SSH, which uses only one data stream and is not very fast
    • compression option, "-z", might not shorten the transfer time
  • How do I use rsync
    • from command line on Linux, Mac, or MobaXterm terminal to issue rsync command
    • use DeltaCopy or Grsync on Windows
    • use cwRsync client for command line on Windows. Sample commands of using cwRsync.
    • check rsync man page for options and examples
    • Note two factor authentication has been enabled on Nov 4, 2019. Please see two factor authentication wiki page for details.
    • For large files (100G+), you can use additional parameters with rsync to shorten the time to resume transfer after rsync transfer was interrupted. This example uses Terra FTN as the target host.
 rsync -av --partial --inplace --append --progress my_file NetID@terra-ftn.hprc.tamu.edu:/scratch/user/NetID/
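
Because rsync can resume, an interrupted transfer can simply be started again. The following is a sketch of a small retry loop around any command. The `retry` function and the attempt and wait values are hypothetical, and each new attempt may prompt for authentication again.

```shell
# Sketch only: retry a command until it succeeds or attempts run out.
# Usage: retry MAX_ATTEMPTS WAIT_SECONDS command [args...]
retry() {
    max="$1"; wait_s="$2"; shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$wait_s"
    done
}

# Example (NetID and my_file are placeholders):
#   retry 5 60 rsync -av --partial --inplace --append --progress \
#       my_file NetID@terra-ftn.hprc.tamu.edu:/scratch/user/NetID/
```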


rclone

rclone is a tool for syncing files from HPRC systems to remote storage sites like Google Drive, Dropbox, Amazon's AWS and many more.

  • What rclone is good at
    • copy data to or from cloud (Google Drive, Dropbox, AWS, etc)
  • What rclone is not good at
    • transfer can be slow
  • How do I use rclone
    • rclone is available on grace, terra and HPRC Lab workstations. No module is required for any of them
    • reference rclone wiki page for instructions and examples


portal

HPRC Portal (OnDemand) is a web platform through which users can access HPRC clusters and services with a web browser. You can download/upload file via menu "Files".

  • What portal is good at
    • web interface and simple to use
    • you can view content (text, image, movie) via web browser
  • What portal is not good at
    • transfer large files >2GB
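    • transfer can be slow (file is transferred via a single data stream)
  • How do I use portal
    • visit https://portal.hprc.tamu.edu, or visit https://portal-terra.hprc.tamu.edu for Terra portal, and https://portal-grace.hprc.tamu.edu for Grace portal
    • check portal wiki page for additional info
    • portal is CAS authenticated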


Tutorial Videos

  • File transfer using WinSCP, MobaXterm, Filezilla, Cyberduck, and Open OnDemand portal (March 12, 2021): https://www.youtube.com/watch?v=oCSzuJf6p7g
  • File Transfers using Globus Connect (May 13, 2021): https://www.youtube.com/watch?v=_ROApd7MtRQ


Host selection

For transferring large files, selecting a better target host could greatly shorten the data transfer time. TAMU firewall scanning adds a significant overhead to the data transfer flow through campus firewall. The table below gives a general guidance of selecting target hosts for your transfer. The transfer speed is rated "Very Good", "Good+" (a bit better than "Good"), "Good" and "Fair".

src/dest                 Terra login   Terra FTN                         Grace login   Grace DTN
on TAMU campus network   Very Good     Fair                              Very Good     Very Good
at home via TAMU VPN     Good+         Fair                              Good+         Good+
HPRC cluster login       Very Good     Fair                              Very Good     Very Good
HPRC cluster FTN/DTN     Fair          Very Good                         Fair          Very Good
TACC cluster login       -             Very Good                         -             Very Good
other HPC sites          -             Very Good (only Globus Connect)   -             Very Good (only Globus Connect)

Comments:

  • Unless it's noted, ssh, scp/sftp, and rsync can be used for data transfer.
  • Globus Connect can be used to move files between HPRC clusters and other HPC sites. Globus Connect command line can be used on FTN/DTN.
  • Terra FTN is outside campus firewall and all HPRC cluster login nodes are behind campus firewall. Thus, file transfer from login nodes to Terra FTN will be impacted by campus firewall scanning. Grace DTNs are behind firewall.
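
One way to read the table is as a lookup from where the data comes from to which host to target. The helper below is only a sketch of that reading. The function name and the source labels (campus, vpn, hprc-login, hprc-dtn, tacc, other) are hypothetical, and for Grace it prefers a DTN because the DTN is rated at least as well as the login node in every row.

```shell
# Sketch only: suggest a target host from the ratings in the table above.
# Usage: pick_host SOURCE CLUSTER
#   SOURCE:  campus | vpn | hprc-login | hprc-dtn | tacc | other
#   CLUSTER: terra | grace
pick_host() {
    src="$1"; cluster="$2"
    case "$cluster:$src" in
        # Terra login nodes rate better than Terra FTN from behind the
        # campus firewall, but they have the one hour CPU time limit.
        terra:campus|terra:vpn|terra:hprc-login) echo "terra.tamu.edu" ;;
        # From other HPC sites ("other"), only Globus Connect is listed.
        terra:hprc-dtn|terra:tacc|terra:other)   echo "terra-ftn.hprc.tamu.edu" ;;
        # grace-dtn2.hprc.tamu.edu is the other Grace data transfer node.
        grace:*)                                 echo "grace-dtn1.hprc.tamu.edu" ;;
        *) echo "unknown cluster or source: $cluster $src" >&2; return 1 ;;
    esac
}

# Example (NetID and my_dir are placeholders):
#   rsync -av my_dir "NetID@$(pick_host campus terra):/scratch/user/NetID/"
```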


Other Considerations

  • What is the best way to move my files between HPRC clusters?

Globus Connect is the best option. Use rsync if you want to sync a directory between two clusters and only a few files were changed or added, or some files were only partially transferred.


  • I am off campus or travel abroad. Can I transfer files from/to HPRC cluster?

All Grace/Terra login nodes are behind the TAMU campus firewall, so TAMU VPN access is required if you are off campus. If you have a personal Globus Connect endpoint set up on your laptop, you can transfer files to/from your laptop via globus.org using Globus Connect, without using TAMU VPN.


  • Should I use FTN/DTN or regular Grace/Terra login nodes to transfer files?

Please see table in Host selection for a quick summary.

All Terra login nodes (terra.tamu.edu) and Grace login nodes (grace.hprc.tamu.edu) have 10 Gbps links to HPRC edge switches, which have dual 10 Gbps link to campus network.

For short file transfer (less than one hour), either data transfer or login nodes would work. For transfer time over 1 hour, please use data transfer nodes (terra-ftn.hprc.tamu.edu for Terra cluster; grace-dtn1.hprc.tamu.edu or grace-dtn2.hprc.tamu.edu for Grace cluster), which do not have one hour CPU process time limit. Use rsync when possible, as rsync supports resume transfer, or use Globus Connect if moving files between two clusters.

If you are on campus or transfer small files over VPN, please use Grace/Terra login nodes (terra.tamu.edu and grace.hprc.tamu.edu) to transfer your data. If you are off campus, please consider using Globus Connect to transfer your data.

If you are using clusters at TACC, you can scp/sftp/rsync to FTN or use Globus Connect.


  • Can I download files from internet (say NIH) inside a job script?

Grace/Terra compute nodes do not have access to internet, so you cannot download files from internet on the compute node. Please set up web proxy in your job script or download necessary files on Grace/Terra login nodes or DTN/FTN nodes.


  • I want to back up files to my desktop/laptop.

rsync or rclone (to cloud storage) is probably the best choice.


  • Can I mount my Grace/Terra home/scratch directory on my desktop/laptop to transfer files?

SSHFS is the best option.
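
A minimal sketch of mounting a Grace scratch directory on a Linux desktop follows, assuming the `sshfs` client is installed locally and you are on campus or connected via TAMU VPN. NetID, the local mount point, and the `run` helper are placeholders.

```shell
# Sketch only: NetID and the mount point are placeholders.
# Dry run by default (prints the command); set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

MOUNT_POINT="$HOME/grace_scratch"   # hypothetical local directory

run mkdir -p "$MOUNT_POINT"
run sshfs "NetID@grace.hprc.tamu.edu:/scratch/user/NetID" "$MOUNT_POINT"

# ... copy files to or from $MOUNT_POINT like any local directory ...

# Unmount when finished (fusermount -u on Linux; use umount on macOS).
run fusermount -u "$MOUNT_POINT"
```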


  • I have 100+ TB data to transfer.

Please contact us at help@hprc.tamu.edu. We would like to get more information first and see how we can support this.


  • Why does the file transfer take so long?

This is a complicated question. The file transfer time depends largely on the data size, the bandwidth of the slowest link between the HPRC cluster and your desktop (the bottleneck link), how congested the network link is, and how fast the file systems can read/write. For 100 GB of data transferred over a 100 Mbps link, it will take about 170 min in the best-case scenario (80% efficiency, no cross traffic and no I/O bottleneck). Often, the network link to your desktop/laptop (the last mile) and the storage on your desktop/laptop are the slowest parts of the entire file transfer.

The transfer rate of the storage medium (spinning disk, SSD, external disk via USB) could be a limiting factor. A spinning hard drive typically has a limit of 30~50 MB/sec transfer rate. Most USB storage (like USB sticks and USB external drives) cannot transfer at the maximum speed of the USB standard (60 MB/sec for USB 2.0 and 640 MB/sec for USB 3.0). When assessing the overall data transfer rate, be aware of the limits imposed by the combination of storage speed, storage-to-computer interface (like USB), network link speed (Wi-Fi or Ethernet), CPU speed (a CPU can process only a limited number of file system access requests), and the internet connection speed of your laptop/desktop. Making a few test transfers with a small data set can help you estimate a (more) reasonable transfer time for a larger data set.

Number of data transfer streams would make a difference as well. Globus Connect utilizes up to 4 data streams to shorten the transfer time (more noticeable for large files; typically seeing 2.5x speedup).
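
The best-case estimate above can be reproduced with a few lines of integer arithmetic. This is a sketch only: the function name is hypothetical, sizes use decimal units (1 GB = 8000 megabits), and it ignores cross traffic and storage bottlenecks.

```shell
# Sketch only: estimate best-case transfer time in whole minutes.
# Usage: transfer_minutes SIZE_GB LINK_MBPS EFFICIENCY_PERCENT
transfer_minutes() {
    size_gb="$1"; link_mbps="$2"; eff="$3"
    # megabits to move, divided by the effective rate (link * eff / 100)
    seconds=$(( size_gb * 8000 * 100 / (link_mbps * eff) ))
    echo $(( (seconds + 59) / 60 ))   # round up to whole minutes
}

# 100 GB over a 100 Mbps link at 80% efficiency
transfer_minutes 100 100 80    # prints 167
```

The result of 167 minutes is in line with the figure of about 170 min quoted above.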


  • I don't see the answer for my file transfer issues.

Please contact us at help@hprc.tamu.edu with details of your questions.