
Revision as of 13:43, 28 November 2016

Terra Quick Start Guide

Terra Usage Policies

Access to Terra is granted with the condition that you will understand and adhere to all TAMU HPRC and Terra-specific policies.

General policies can be found on the HPRC Policies page.

Terra-specific policies, which are similar to Ada, can be found on the Terra Policies page.

Accessing Terra

Most access to Terra is done via a secure shell session.

Users on Windows computers use either PuTTY or MobaXterm. If MobaXterm works on your computer, it is usually easier to use.

Users on Mac and Linux/Unix should use whatever SSH-capable terminal is available on their system.

The command to connect to Terra is as follows. Be sure to replace [NetID] with your TAMU NetID.

ssh [NetID]@terra.tamu.edu

Your login password is the same one used on Howdy. You will not see your password as you type it into the login prompt.

Navigating Terra & Storage Quotas

When you first access Terra, you will be placed in your home directory. This directory has smaller quotas and should not be used for general-purpose storage.

You can navigate to your home directory with the following command:

cd /home/[NetID]

Your scratch directory has more space than your home directory and is recommended for general purpose use. You can navigate to your scratch directory with the following command:

cd /scratch/user/[NetID]

You can navigate to scratch or home easily by using their respective environment variables.

Navigate to scratch with the following command:

cd $SCRATCH

Navigate to home with the following command:

cd $HOME
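These environment variables can also be used in scripts to avoid hard-coding paths. A minimal sketch ($SCRATCH is defined on Terra; $HOME is standard on any Linux system):

```shell
# Change to the home directory using its environment variable
cd "$HOME"
pwd    # prints the path of your home directory

# On Terra, the same pattern works for scratch:
# cd "$SCRATCH"
```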

Your scratch directory is restricted to 1TB/50,000 files of storage. This storage quota is extendable upon request.

Your home directory is restricted to 10GB/10,000 files of storage. This storage quota is not extendable.

You can see the current status of your storage quotas with:

showquota

If you need a storage quota increase, please contact us with justification and the expected length of time that you will need the extended quota.

The Batch System

The batch system is a load distribution implementation that ensures convenient and fair use of a shared resource. Submitting jobs to a batch system allows a user to reserve specific resources with minimal interference to other users. All users are required to submit resource-intensive processing to the compute nodes through the batch system - attempting to circumvent the batch system is not allowed.

On Terra, Slurm is the batch system that provides job management. More information on Slurm can be found on the Terra Batch page.

Running Your Program / Preparing a Job File

In order to properly run a program on Terra, you will need to create a job file and submit a job.

The simple example job file below requests 1 core on 1 node with 2.5GB of RAM for 1.5 hours. Note that typical nodes on Terra have 28 cores with 120GB of usable memory; make sure your job requirements fit within these limits. Any modules that need to be loaded or executable commands will replace the "#First Executable Line" in this example.

#!/bin/bash
##ENVIRONMENT SETTINGS; CHANGE WITH CAUTION
#SBATCH --export=NONE        #Do not propagate environment
#SBATCH --get-user-env=L     #Replicate login environment
  
##NECESSARY JOB SPECIFICATIONS
#SBATCH -J JobExample1       #Set the job name to "JobExample1"
#SBATCH -t 01:30:00          #Set the wall clock limit to 1hr and 30min
#SBATCH -N 1                 #Request 1 node
#SBATCH --ntasks-per-node=1   #Request 1 task/core per node
#SBATCH --mem=2560M          #Request 2560MB (2.5GB) per node
#SBATCH -o Example1Out.%j    #Send stdout/err to "Example1Out.[jobID]"

#First Executable Line

Note: If your job file has been written on an older Mac or DOS workstation, you will need to use "dos2unix" to remove certain characters that interfere with parsing the script.

dos2unix MyJob.slurm
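dos2unix works by stripping the carriage-return characters that DOS/Windows editors append to each line. If dos2unix is unavailable, a rough equivalent can be done with sed (GNU sed syntax shown; on macOS use sed -i '' instead):

```shell
# A file saved by a Windows editor ends each line with \r\n:
printf 'echo hello\r\n' > MyJob.slurm

# Strip the trailing carriage returns in place (GNU sed):
sed -i 's/\r$//' MyJob.slurm
```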

More information on job options can be found in the Building Job Files section of the Terra Batch page.

More information on dos2unix can be found on the dos2unix section of the HPRC Available Software page.

Submitting and Monitoring Jobs

Once you have your job file ready, it is time to submit your job. You can submit your job to Slurm with the following command:

[user1@terra1 ~]$ sbatch MyJob.slurm 
Submitted batch job 3606

After the job has been submitted, you are able to monitor it with several methods. To see all of your jobs, use the following command:

[user1@terra1 ~]$ squeue -u user1
JOBID       NAME                USER                    PARTITION   NODES CPUS STATE       TIME        TIME_LEFT   START_TIME           REASON      NODELIST            
3606        myjob2              user1                   short       1     3    RUNNING     0:30        00:10:30    2016-11-27T23:44:12  None        tnxt-[0340]  

To cancel a job while it is pending or running, use the following command:

[user1@terra1 ~]$ scancel 3606

Job Monitoring and Control Commands

Function                              Command                   Example
Submit a job                          sbatch [script_file]      sbatch FileName.job
Cancel/Kill a job                     scancel [job_id]          scancel 101204
Check status of a single job          squeue --job [job_id]     squeue --job 101204
Check status of all jobs for a user   squeue -u [user_name]     squeue -u terraUser1

More information on submitting and monitoring Slurm jobs can be found in the Job Submission section of the Terra Batch System page.

Additional Topics

Translating Ada/LSF <--> Terra/Slurm

The HPRC Batch Translation page contains information on converting between LSF, PBS, and Slurm.

Our staff has also written some example jobs for specific software. These software-specific examples can be seen on the Individual Software Pages where available.

Finding Software

Software on Terra is loaded using modules.

You can see the most popular software on the HPRC Available Software page.

You can find most available software on Terra with the following command:

module avail

You can search for particular software by keyword using:

module spider keyword
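A typical search-then-load sequence looks like the following. The module name and version here are hypothetical; use the exact names that module spider reports on Terra:

```shell
module spider Python       # search for modules matching "Python"
module load Python/3.5.2   # load a specific version (hypothetical name)
module list                # confirm which modules are currently loaded
```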

If you need new software or an update, please contact us with your request.

There are restrictions on what software we can install. There is also regularly a queue of requested software installations.

Please account for possible delays when planning around a software installation request.

Transferring Files

Files can be transferred to Terra using the scp command or a file transfer program.

Our users most commonly utilize FileZilla.

See our [Terra-Filezilla example video] for a demonstration of this process.

Advice: while GUIs are acceptable for file transfers, the cp and scp commands are much quicker and may significantly benefit your workflow.
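For example, cp copies files between directories on Terra's filesystems, and scp uses the same source/target syntax with a remote side. Filenames here are hypothetical; replace [NetID] with your TAMU NetID:

```shell
# Create a small example file, then copy it with cp:
echo "example data" > mydata.csv
mkdir -p backup
cp mydata.csv backup/

# From your local machine, scp uses the same syntax, with the remote
# side written as [NetID]@host:/path (shown here, not executed):
#   scp mydata.csv [NetID]@terra.tamu.edu:/scratch/user/[NetID]/
```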

[Insert info on glob, rsync, ftn]

Graphic User Interfaces (Visualization)

The use of GUIs on Terra is a more complicated process than running non-interactive jobs or doing resource-light interactive processing.

You have two options for using GUIs on Terra.

The first option is to run on the login node. When doing this, you must observe the fair-use policy of login node usage. Users commonly violate these policies by accident, resulting in terminated processes, confusion, and warnings from our admins.
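For GUI programs on the login node, the SSH connection generally needs X11 forwarding enabled; with OpenSSH this is the -X flag (an X server must also be running on your local machine, which MobaXterm provides on Windows):

```shell
# Connect with X11 forwarding so GUI windows display locally
ssh -X [NetID]@terra.tamu.edu
```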

The second option is to use a VNC job. This method is outside the scope of this guide. See the Terra Remote Visualization page for more information.