Difference between revisions of "Terra:Batch Job Submissions"

From TAMU HPRC
 
  [NetID@terra1 ~]$ '''sbatch ''MyJob.slurm'''''  
 
  Submitted batch job 3606
 
== Job Monitoring and Control Commands ==
 
 
After a job has been submitted, you may want to check on its progress or cancel it. The table below lists the most commonly used job monitoring and control commands on Terra.
 
 
{| class="wikitable" style="text-align: center;"
 
|+ Job Monitoring and Control Commands
 
!Function
 
!Command
 
!Example
 
|-
 
|Submit a job
 
|sbatch [script_file]
 
|sbatch FileName.job
 
|-
 
|Cancel/Kill a job
 
|scancel [job_id]
 
|scancel 101204
 
|-
 
|Check status of a single job
 
|squeue --job [job_id]
 
|squeue --job 101204
 
|-
 
|Check status of all <br> jobs for a user
 
|squeue -u [user_name]
 
|squeue -u terraUser1
 
|-
 
|Check CPU and memory efficiency for a job<br>(Use only on finished jobs)
 
|seff [job_id]
 
|seff 101204
 
|}
 
 
Here is an example of the output the seff command provides for a finished job:
 
 
<pre>% seff 12345678
 
Job ID: 12345678
 
Cluster: terra
 
User/Group: username/groupname
 
State: COMPLETED (exit code 0)
 
Nodes: 16
 
Cores per node: 28
 
CPU Utilized: 1-17:05:54
 
CPU Efficiency: 94.63% of 1-19:25:52 core-walltime
 
Job Wall-clock time: 00:05:49
 
Memory Utilized: 310.96 GB (estimated maximum)
 
Memory Efficiency: 34.70% of 896.00 GB (56.00 GB/node)
 
</pre>
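
As a rough sketch, the efficiency figures can be pulled out of seff-style text with standard tools. The helper name <code>seff_field</code> is hypothetical, and the sketch assumes the <code>Label: value</code> layout shown in the example output above; it works on a copied sample rather than on live seff output.

```shell
#!/bin/sh
# Hypothetical helper: print the value part of one "Label: value" line
# from seff-style text read on standard input.
seff_field() {
    label=$1
    # Strip the leading "Label: " and print only the lines where it matched.
    sed -n "s/^${label}: *//p"
}

# Sample text copied from the example output above (not live seff output).
sample='Job ID: 12345678
State: COMPLETED (exit code 0)
CPU Efficiency: 94.63% of 1-19:25:52 core-walltime
Memory Efficiency: 34.70% of 896.00 GB (56.00 GB/node)'

# Keep only the leading percentage of each efficiency line.
cpu_eff=$(printf '%s\n' "$sample" | seff_field "CPU Efficiency" | cut -d' ' -f1)
mem_eff=$(printf '%s\n' "$sample" | seff_field "Memory Efficiency" | cut -d' ' -f1)

echo "CPU efficiency: $cpu_eff"
echo "Memory efficiency: $mem_eff"
```

On the cluster, the sample text would be replaced by the output of <code>seff [job_id]</code> piped into the helper. A low memory efficiency, as in this example, suggests that a later run could request less memory.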
 
  
 
== tamulauncher ==
 

Revision as of 17:55, 13 December 2018

Job Submission

Once your job file is ready, you can submit it to Slurm with the following command:

[NetID@terra1 ~]$ sbatch MyJob.slurm 
Submitted batch job 3606
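
In scripts it is often handy to keep the job ID for later monitoring. The following is a minimal sketch, assuming the confirmation line always has the "Submitted batch job <id>" form shown above; the helper name job_id_from_output is hypothetical, and the sample line stands in for real sbatch output.

```shell
#!/bin/sh
# Hypothetical helper: extract the numeric job ID from the confirmation
# line printed after submission ("Submitted batch job <id>").
job_id_from_output() {
    # The job ID is the fourth whitespace-separated field of that line.
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Sample confirmation line copied from the example above. On the cluster
# it would be captured with: output=$(sbatch MyJob.slurm)
output='Submitted batch job 3606'
job_id=$(job_id_from_output "$output")

echo "Job ID: $job_id"

# The ID can then be passed to the monitoring and control commands:
#   squeue --job "$job_id"
#   scancel "$job_id"
```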

tamulauncher

tamulauncher provides a convenient way to run a large number of serial or multithreaded commands without submitting individual jobs or a job array. The user provides a text file containing all the commands to be executed, and tamulauncher runs them concurrently. The number of commands executed concurrently depends on the resources requested for the batch job. When tamulauncher is run interactively, at most 8 commands are executed concurrently. tamulauncher is available on Terra, Ada, and Curie, and no module needs to be loaded before using it. It has been successfully tested with more than 100K commands.

tamulauncher is preferred over job arrays for running a large number of individual commands, especially when their run times are relatively short. It allows for better utilization of the nodes, puts less burden on the batch scheduler, and lessens interference with other users' jobs on the same node.
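
The commands file can be generated with a short loop. This is only a sketch: the file name commands.in, the program my_program, and the input and output file names are hypothetical placeholders, and the one-command-per-line layout is an assumption about the expected file format.

```shell
#!/bin/sh
# Build a commands file for tamulauncher, assuming one command per line.
# All names below (commands.in, my_program, input_N.dat) are hypothetical.
commands_file=commands.in
: > "$commands_file"    # start from an empty file

i=1
while [ "$i" -le 100 ]; do
    # Each line is an independent serial command with its own input and log.
    echo "./my_program input_${i}.dat > output_${i}.log" >> "$commands_file"
    i=$((i + 1))
done

count=$(wc -l < "$commands_file" | tr -d ' ')
echo "Wrote $count commands to $commands_file"

# Inside a batch job the file would then be passed to tamulauncher.
# The exact invocation is an assumption; check the tamulauncher page:
#   tamulauncher commands.in
```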

For more information, visit this page.