
Difference between revisions of "HPRC:File Transfers"

From TAMU HPRC
(initial version)
 
Line 3: Line 3:
 
=== File Transfer Software ===
 
=== File Transfer Software ===
  
There are several options to transfer files to and from HPRC clusters.
+
There are several options for choosing a software to transfer files to and from HPRC clusters.  The choice is largely depending on many factors, such as size, location, transfer frequency, etc.  If it's data size is small (transfer time is less than an hour), just pick a software convenient/familiar to you.
  
 
==== Globus Connect ====
 
==== Globus Connect ====
Line 9: Line 9:
 
[[SW:GlobusConnect|Globus Connect]] is a reliable, high-performance file transfer platform allowing users to transfer large amounts of data seamlessly between systems or endpoints. Users can schedule transfer via a web interface on globus.org and receive notification after transfer is completed. The endpoint can be systems with Globus installed (like ada-ftn1) or user's personal desktop.
 
[[SW:GlobusConnect|Globus Connect]] is a reliable, high-performance file transfer platform allowing users to transfer large amounts of data seamlessly between systems or endpoints. Users can schedule transfer via a web interface on globus.org and receive notification after transfer is completed. The endpoint can be systems with Globus installed (like ada-ftn1) or user's personal desktop.
  
What Globus Connect is good for
+
'''What Globus Connect is good at'''
* transfer large amount of data
+
* transfer large amount of data (say 100+ GB)
 
* it's fast (utilizing up to 4 data streams); as fast as the slowest link from your server/desktop/laptop to HPRC fast transfer nodes
 
* it's fast (utilizing up to 4 data streams); as fast as the slowest link from your server/desktop/laptop to HPRC fast transfer nodes
 
* resume for failed transfers
 
* resume for failed transfers
 
* receive notification after a scheduled transfer is completed
 
* receive notification after a scheduled transfer is completed
  
What Globus Connect is not good for
+
'''What Globus Connect is not good at'''
 
* your server or desktop/laptop must have Globus Connect software installed and setup as an endpoint
 
* your server or desktop/laptop must have Globus Connect software installed and setup as an endpoint
 
* it will not work if your files are on a server behind a firewall (not reachable from internet)
 
* it will not work if your files are on a server behind a firewall (not reachable from internet)
 +
* by default, data stream is not encrypted
  
How do I use Globus Connect
+
'''How do I use Globus Connect'''
* visit [[SW:GlobusConnect|Globus Connect]] for more information
+
* visit [[SW:GlobusConnect|Globus Connect]] wiki page for more information
 
* use endpoints: "TAMU ada-ftn1" or "TAMU ada-ftn2" for Ada/Curie cluster and "TAMU terra-ftn" for Terra cluster
 
* use endpoints: "TAMU ada-ftn1" or "TAMU ada-ftn2" for Ada/Curie cluster and "TAMU terra-ftn" for Terra cluster
  
Line 28: Line 29:
 
SCP and SFTP protocols are a means of securely transferring computer files between a local host and a remote host.
 
SCP and SFTP protocols are a means of securely transferring computer files between a local host and a remote host.
  
What SCP/SFTP is good for
+
'''What SCP/SFTP is good at'''
* transfer files
+
* ubiquitous; simple to use
 +
* ''sftp'' offers an interactive interface (command line) to download/upload files
 +
* data stream is encrypted
  
What SCP/SFTP is not good for
+
'''What SCP/SFTP is not good at'''
* not very fast (file transfer only uses one data stream)
+
* not very fast (file transfer only uses one data stream over SSH protocol)
  
How do I use SCP/SFTP
+
'''How do I use SCP/SFTP'''
 
* you can use command line on Linux, Mac or MobaXterm terminal to issue scp/sftp command
 
* you can use command line on Linux, Mac or MobaXterm terminal to issue scp/sftp command
 
* use [https://winscp.net/eng/index.php WinSCP] on Windows, [https://filezilla-project.org/ FileZilla] on Windows or Mac, or use File Transfer panel on MobaXterm
 
* use [https://winscp.net/eng/index.php WinSCP] on Windows, [https://filezilla-project.org/ FileZilla] on Windows or Mac, or use File Transfer panel on MobaXterm
 
* use "ada-ftn1.tamu.edu" or "ada-ftn2.tamu.edu" for Ada/Curie and "terra-ftn.hprc.tamu.edu" for Terra if your data transfer to Ada or Terra login nodes (ada.tamu.edu or terra.tamu.edu) is terminated after one hour; Ada/Terra login nodes have one hour CPU limit for all user processes.
 
* use "ada-ftn1.tamu.edu" or "ada-ftn2.tamu.edu" for Ada/Curie and "terra-ftn.hprc.tamu.edu" for Terra if your data transfer to Ada or Terra login nodes (ada.tamu.edu or terra.tamu.edu) is terminated after one hour; Ada/Terra login nodes have one hour CPU limit for all user processes.
 +
* check [http://man7.org/linux/man-pages/man1/scp.1.html scp man page] and [http://man7.org/linux/man-pages/man1/sftp.1.html sftp man page] for options and examples
  
  
Line 44: Line 48:
 
[https://rsync.samba.org/ rsync] is a fast, versatile, remote (and local) file-copying tool and recommended when relatively few differences exist between target and source versions, because rsync copies only the differences of files that have actually changed. By default, rsync uses the SSH remote shell.  
 
[https://rsync.samba.org/ rsync] is a fast, versatile, remote (and local) file-copying tool and recommended when relatively few differences exist between target and source versions, because rsync copies only the differences of files that have actually changed. By default, rsync uses the SSH remote shell.  
  
What rsync is good for
+
'''What rsync is good at'''
 
* resume file transfer for partial transferred file
 
* resume file transfer for partial transferred file
 
* synchronize files/dirs of two directories (local-local, local-remote, remote-local)
 
* synchronize files/dirs of two directories (local-local, local-remote, remote-local)
 +
* by default, transfer is over SSH protocol, so data stream is encrypted
  
What rsync is not good for
+
'''What rsync is not good at'''
 
* by default, files transferred over SSH which uses only one data stream and not very fast
 
* by default, files transferred over SSH which uses only one data stream and not very fast
  
How do I use rsync
+
'''How do I use rsync'''
 
* from command line on Linux, Mac, or MobaXterm terminal to issue rsync command
 
* from command line on Linux, Mac, or MobaXterm terminal to issue rsync command
 
* use [http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp DeltaCopy] or [https://sourceforge.net/projects/grsync/ Grsync] on Windows
 
* use [http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp DeltaCopy] or [https://sourceforge.net/projects/grsync/ Grsync] on Windows
 
* use [https://www.rsync.net/resources/howto/windows_rsync.html cwRsync 5.4.1] for command line on Windows
 
* use [https://www.rsync.net/resources/howto/windows_rsync.html cwRsync 5.4.1] for command line on Windows
 +
* check [http://man7.org/linux/man-pages/man1/rsync.1.html rsync man page] for options and examples
  
  
Line 61: Line 67:
 
[[SW:rclone|rclone]] is a tool for syncing files from HPRC systems to remote storage sites like Google Drive, Dropbox, Amazon's AWS and many more.
 
[[SW:rclone|rclone]] is a tool for syncing files from HPRC systems to remote storage sites like Google Drive, Dropbox, Amazon's AWS and many more.
  
What rclone is good for
+
'''What rclone is good at'''
 
* copy data to or from cloud (Google Drive, Dropbox, AWS, etc)
 
* copy data to or from cloud (Google Drive, Dropbox, AWS, etc)
  
What rclone is not good for
+
'''What rclone is not good at'''
 
* transfer can be be slow
 
* transfer can be be slow
  
How do I use rclone
+
'''How do I use rclone'''
 
* rclone is available on ada, terra and HPRC Lab workstations. No module is required for any of them
 
* rclone is available on ada, terra and HPRC Lab workstations. No module is required for any of them
* use Ada/Terra fast transfer nodes (ada-ftn1.tamu.edu, ada-ftn2.tamu.edu or terra-ftn.hprc.tamu.edu) for long data transfer (1+ hours)
+
* reference [[SW:rclone|rclone]] wiki page for instructions and examples
  
  
 
=== Other Considerations ===
 
=== Other Considerations ===
  
* Should I use FTN or login nodes?
+
* Should I use FTN or login nodes?
 +
For short file transfer (less than one hour), either one would work.  For transfer time over 1 hour, please use FTN node (ada-ftn1.tamu.edu and ada-ftn2.tamu.edu for Ada/Curie cluster; terra-ftn.hprc.tamu.edu for Terra cluster), which does not have one hour CPU process time limit.
  
* Why the transfer takes so long?
+
* Why the file transfer takes so long?
  
* I have 100+ TB to transfer.
+
* I want to back up files to my desktop/laptop.
 +
 
 +
* I have 100+ TB data to transfer.

Revision as of 10:57, 17 January 2019

Transfer Files

File Transfer Software

There are several software options for transferring files to and from HPRC clusters. The choice depends on factors such as data size, location, and transfer frequency. If the data size is small (transfer time is less than an hour), just pick whichever software is most convenient or familiar to you.
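As a rough aid for the "less than an hour" rule of thumb, the sketch below estimates transfer time from data size and link speed. The function name, the decimal-GB convention, and the optional `efficiency` factor are assumptions made for illustration; real throughput depends on the slowest link and on protocol overhead.

```python
def estimate_transfer_hours(size_gb: float, link_mbps: float,
                            efficiency: float = 1.0) -> float:
    """Estimate hours needed to move size_gb over a link of link_mbps.

    size_gb    -- data size in decimal gigabytes (1 GB = 10**9 bytes)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of the nominal speed actually achieved
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    bits = size_gb * 8e9                       # bytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 100 GB over an ideal 1 Gbit/s link takes 800 seconds.
    print(f"{estimate_transfer_hours(100, 1000):.2f} h")
```

If the estimate comes out well under an hour, any of the tools described here should be adequate.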

Globus Connect

Globus Connect is a reliable, high-performance file transfer platform that lets users transfer large amounts of data seamlessly between systems or endpoints. Users can schedule a transfer via the web interface on globus.org and receive a notification after the transfer completes. An endpoint can be a system with Globus installed (like ada-ftn1) or a user's personal desktop.

What Globus Connect is good at

  • transfers large amounts of data (say 100+ GB)
  • it's fast (utilizing up to 4 data streams); as fast as the slowest link from your server/desktop/laptop to the HPRC fast transfer nodes
  • resumes failed transfers
  • sends a notification after a scheduled transfer is completed

What Globus Connect is not good at

  • your server or desktop/laptop must have the Globus Connect software installed and set up as an endpoint
  • it will not work if your files are on a server behind a firewall (not reachable from the internet)
  • by default, the data stream is not encrypted

How do I use Globus Connect

  • visit the Globus Connect wiki page for more information
  • use endpoints: "TAMU ada-ftn1" or "TAMU ada-ftn2" for the Ada/Curie cluster and "TAMU terra-ftn" for the Terra cluster


SCP/SFTP

SCP and SFTP are protocols for securely transferring files between a local host and a remote host.

What SCP/SFTP is good at

  • ubiquitous; simple to use
  • sftp offers an interactive interface (command line) to download/upload files
  • data stream is encrypted

What SCP/SFTP is not good at

  • not very fast (file transfer only uses one data stream over SSH protocol)

How do I use SCP/SFTP

  • use the command line on Linux, Mac or a MobaXterm terminal to issue the scp/sftp command
  • use WinSCP on Windows, FileZilla on Windows or Mac, or the File Transfer panel in MobaXterm
  • if a transfer through the Ada or Terra login nodes (ada.tamu.edu or terra.tamu.edu) is terminated after one hour, use "ada-ftn1.tamu.edu" or "ada-ftn2.tamu.edu" for Ada/Curie and "terra-ftn.hprc.tamu.edu" for Terra instead; the Ada/Terra login nodes have a one-hour CPU limit for all user processes
  • check the scp man page and sftp man page for options and examples
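
To illustrate the command-line route, here is a minimal sketch that assembles an scp upload command aimed at a fast transfer node. The helper name, the username "netid", the file name, and the remote path are hypothetical placeholders; only the host names come from this page. The code builds and prints the command and does not run it.

```python
import shlex

# Fast transfer node host names as listed on this page.
FTN_HOSTS = {
    "ada": "ada-ftn1.tamu.edu",
    "terra": "terra-ftn.hprc.tamu.edu",
}


def scp_upload_command(user, cluster, local_path, remote_path, recursive=False):
    """Return the argv list for copying local_path to a cluster's FTN."""
    host = FTN_HOSTS[cluster]          # KeyError for an unknown cluster
    argv = ["scp"]
    if recursive:
        argv.append("-r")              # -r copies directories recursively
    argv += [local_path, f"{user}@{host}:{remote_path}"]
    return argv


if __name__ == "__main__":
    # "netid" and the remote path are placeholders, not real values.
    cmd = scp_upload_command("netid", "terra", "results.tar.gz",
                             "/scratch/user/netid/")
    print(shlex.join(cmd))
```

The printed line can be pasted into a Linux, Mac or MobaXterm terminal; swapping the two path arguments gives the corresponding download.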


rsync

rsync is a fast, versatile, remote (and local) file-copying tool. It is recommended when relatively few differences exist between the target and source versions, because rsync copies only the differences in files that have actually changed. By default, rsync uses the SSH remote shell.

What rsync is good at

  • resumes the transfer of a partially transferred file
  • synchronizes the files/dirs of two directories (local-local, local-remote, remote-local)
  • by default, the transfer is over the SSH protocol, so the data stream is encrypted

What rsync is not good at

  • by default, files are transferred over SSH, which uses only one data stream and is not very fast

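How do I use rsync

  • use the command line on Linux, Mac, or a MobaXterm terminal to issue the rsync command
  • use DeltaCopy or Grsync on Windows
  • use cwRsync 5.4.1 for the command line on Windows
  • check the rsync man page for options and examples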
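
As a hedged illustration of an rsync invocation, the sketch below builds a command that synchronizes a local directory to a fast transfer node and keeps partially transferred files so an interrupted run can pick up where it stopped. The helper name, the username "netid", and both paths are hypothetical; the flags used (-a, -v, --partial, --dry-run) are standard rsync options. The command is only printed, not executed.

```python
import shlex


def rsync_command(source, dest, resume=True, dry_run=False):
    """Return the argv list for an rsync copy from source to dest."""
    argv = ["rsync", "-a", "-v"]       # -a: archive mode, -v: verbose
    if resume:
        argv.append("--partial")       # keep partially transferred files
    if dry_run:
        argv.append("--dry-run")       # show what would be copied, copy nothing
    argv += [source, dest]
    return argv


if __name__ == "__main__":
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself. User and paths are placeholders.
    cmd = rsync_command("project/",
                        "netid@terra-ftn.hprc.tamu.edu:/scratch/user/netid/project/")
    print(shlex.join(cmd))
```

Running with `dry_run=True` first is a cheap way to check which files rsync considers changed before any data moves.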


rclone

rclone is a tool for syncing files from HPRC systems to remote storage sites like Google Drive, Dropbox, Amazon's AWS and many more.

What rclone is good at

  • copy data to or from cloud (Google Drive, Dropbox, AWS, etc)

What rclone is not good at

  • transfers can be slow

How do I use rclone

  • rclone is available on ada, terra and HPRC Lab workstations; no module is required on any of them
  • see the rclone wiki page for instructions and examples
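
For orientation only, the following sketch assembles an `rclone copy` command that pushes a local directory to a cloud remote. The remote name "gdrive" is an assumption: it stands for a remote that has already been set up with `rclone config`, and the paths are placeholders. The helper only builds the command; the rclone wiki page remains the reference for actual setup.

```python
import shlex


def rclone_copy_command(local_path, remote_name, remote_path, progress=True):
    """Return the argv list for copying local_path to remote_name:remote_path."""
    argv = ["rclone", "copy"]
    if progress:
        argv.append("--progress")      # show live transfer statistics
    argv += [local_path, f"{remote_name}:{remote_path}"]
    return argv


if __name__ == "__main__":
    # "gdrive" is a hypothetical remote name chosen during `rclone config`.
    cmd = rclone_copy_command("/scratch/user/netid/results", "gdrive",
                              "backups/results")
    print(shlex.join(cmd))
```

Reversing the two path arguments copies from the cloud remote back to the cluster.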


Other Considerations

  • Should I use FTN or login nodes?

For short file transfers (less than one hour), either one works. For transfers longer than one hour, please use an FTN node (ada-ftn1.tamu.edu or ada-ftn2.tamu.edu for the Ada/Curie cluster; terra-ftn.hprc.tamu.edu for the Terra cluster), which does not have the one-hour CPU time limit on processes.

  • Why does the file transfer take so long?
  • I want to back up files to my desktop/laptop.
  • I have 100+ TB of data to transfer.
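
The FTN-versus-login-node guidance above can be sketched as a small lookup. The function name is hypothetical, and treating the estimated wall-clock duration as a stand-in for the one-hour CPU limit is a simplifying, conservative assumption; the host names are the ones listed on this page.

```python
# Host names as listed on this page.
LOGIN_HOSTS = {
    "ada": ["ada.tamu.edu"],
    "terra": ["terra.tamu.edu"],
}
FTN_HOSTS = {
    "ada": ["ada-ftn1.tamu.edu", "ada-ftn2.tamu.edu"],
    "terra": ["terra-ftn.hprc.tamu.edu"],
}
LOGIN_LIMIT_HOURS = 1.0   # login nodes limit user processes to one CPU hour


def candidate_hosts(cluster, estimated_hours):
    """Return hosts suitable for a transfer of the estimated duration."""
    if estimated_hours >= LOGIN_LIMIT_HOURS:
        return list(FTN_HOSTS[cluster])            # long transfer: FTN only
    return LOGIN_HOSTS[cluster] + FTN_HOSTS[cluster]  # short: either works


if __name__ == "__main__":
    print(candidate_hosts("terra", 0.2))
    print(candidate_hosts("ada", 3.0))
```

When in doubt about the duration, choosing an FTN host is the safer option, since it works for both short and long transfers.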