==Batch Queues==

Upon job submission, '''Slurm''' sends your jobs to the appropriate batch queues. These are (software) service stations configured to control the scheduling and dispatch of jobs that have arrived in them. Batch queues are characterized by a number of parameters. Some of the most important are:

# The total number of jobs that can be concurrently running (number of run slots)
# The wall-clock time limit per job
# The type and number of nodes it can dispatch jobs to

These settings control whether a job will remain idle in the queue or be dispatched quickly for execution.
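
For example, you can check where your jobs currently sit and why a pending job has not yet been dispatched with Slurm's squeue command (NetID below is a placeholder). The ST column shows the job state (PD = pending, R = running) and the NODELIST(REASON) column gives the reason a pending job is still waiting.

 [NetID@grace1 ~]$ squeue -u NetID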

The current queue structure is: (updated on January 11, 2021).

{| class="wikitable" style="text-align: center;"
! Queue
! Job Max Cores / Nodes
! Job Max Walltime
! Compute Node Types
! Per-User Limits Across Queues
! Notes
|-
| short
| 1536 cores / 32 nodes
| 2 hr
| rowspan="4" | 384 GB nodes (800)
| rowspan="8" | 6144 Cores per User
|
|-
| medium
| 6144 cores / 128 nodes
| 1 day
|
|-
| long
| 3072 cores / 64 nodes
| 7 days
|
|-
| xlong
| 1536 cores / 32 nodes
| 21 days
| style="text-align: left;" | For jobs needing to run longer than 7 days. '''Submit jobs to this partition with the --partition xlong option.'''
|-
| rowspan="3" | gpu
| rowspan="3" | 1536 cores / 32 nodes
| rowspan="3" | 4 days
| A100 GPU nodes (100)
| style="text-align: left;" | '''Preferred GPU node type if only --gres=gpu:N is used.''' <br> Also can be requested with --gres=gpu:a100:N (N is either 1 or 2)
|-
| RTX 6000 GPU nodes (9)
| style="text-align: left;" | Can request with --gres=gpu:rtx:N (N is either 1 or 2)
|-
| T4 GPU nodes (8)
| style="text-align: left;" | Can request with --gres=gpu:t4:N (N is 1, 2, 3, or 4)
|-
| bigmem
| 192 cores / 4 nodes
| 2 days
| Large Memory 3TB nodes (8)
|
|}

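As an illustration of how the partition and GPU options above are specified in practice, here is a minimal job script sketch. The job name, resource amounts, module, and program are placeholders; adjust them for your own work and keep the requested walltime within the queue limit.

 #!/bin/bash
 #SBATCH --job-name=example            # placeholder job name
 #SBATCH --partition=gpu               # target queue from the table above (e.g., gpu, xlong, bigmem)
 #SBATCH --gres=gpu:a100:1             # request one A100 GPU (omit for non-GPU queues)
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=1
 #SBATCH --cpus-per-task=4
 #SBATCH --mem=32G
 #SBATCH --time=1-00:00:00             # walltime request; must not exceed the queue's limit
 module load CUDA                      # placeholder module; load whatever your application needs
 ./my_gpu_program                      # placeholder executable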

===Checking queue usage===

The following command can be used to get information on queues and their nodes.

 [NetID@grace1 ~]$ sinfo

'''Example output:'''

 PARTITION        AVAIL  TIMELIMIT    JOB_SIZE    NODES(A/I/O/T)   CPUS(A/I/O/T)
 short*           up     2:00:00      1-32        32/763/5/800     1496/36664/240/38400


Note: A/I/O/T stands for Active, Idle, Offline, and Total
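
If you are only interested in one queue, sinfo can be restricted to a single partition and given a custom output format. The format string below is an illustration (not necessarily the exact format used for the display above); it prints the partition, availability, time limit, node counts (A/I/O/T), and CPU counts (A/I/O/T).

 [NetID@grace1 ~]$ sinfo --partition=gpu --format="%P %a %l %F %C"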

===Checking node usage===

The following command can be used to generate a list of nodes and their corresponding information, including their CPU usage.

 [NetID@grace1 ~]$ pestat

'''Example output:'''

 Hostname      Partition     Node      Num_CPU    CPUload    Memsize    Freemem    Joblist
                             State     Use/Tot               (MB)       (MB)       JobId User ...
 c001          short*        idle      0   48     0.01       368640     365067
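
pestat is a site-installed Slurm tool rather than part of Slurm itself, so the available options may vary. Assuming the commonly distributed version is installed on Grace, the listing can typically be limited to a single partition, for example:

 [NetID@grace1 ~]$ pestat -p gpu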


==Checkpointing==

Checkpointing is the practice of creating a save state of a job so that, if interrupted, it can begin again without starting completely over. This technique is especially important for long jobs on the batch systems, because each batch queue has a maximum walltime limit.


A checkpointed job is particularly useful for the gpu queue, which is limited to 4 days of walltime (see the table above) due to its high demand. There are many jobs that require GPUs and must run longer than that limit, such as training a machine learning model.


Users can modify their code to implement save states so that their jobs can restart automatically when cut off by the walltime limit. There are many ways to checkpoint a job depending on the software used, but it is almost always done at the application level. How frequently save states are written is up to the user and depends on the fault tolerance the job needs; in the case of the batch system, however, the exact time of the 'fault' is known in advance: it is simply the walltime limit of the queue. In that case only one checkpoint needs to be created, right before the limit is reached. Many resources are available on checkpointing techniques; some examples for common software are listed below.
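
As a generic sketch of this pattern (not tied to any particular application), the job script below uses Slurm's --signal option to have a signal delivered to the batch shell shortly before the walltime limit, then checkpoints and resubmits itself. The program name, the STOP_AND_CHECKPOINT file, and the timing are placeholders and assume the application knows how to write and resume from its own save state.

 #!/bin/bash
 #SBATCH --job-name=ckpt_example       # placeholder job name
 #SBATCH --partition=gpu
 #SBATCH --gres=gpu:a100:1
 #SBATCH --time=4-00:00:00             # request the full queue walltime
 #SBATCH --signal=B:USR1@600           # send SIGUSR1 to the batch shell 10 minutes before the limit
 # When the signal arrives, ask the application to save its state and queue a follow-up job.
 checkpoint_and_resubmit() {
     touch STOP_AND_CHECKPOINT         # placeholder: however your application is told to write a save state
     sbatch "$0"                       # the resubmitted job resumes from the checkpoint it finds
 }
 trap checkpoint_and_resubmit USR1
 # Placeholder application; it should resume from checkpoint.dat if that file exists.
 ./my_long_running_program --checkpoint-file checkpoint.dat &
 wait                                  # returns early when SIGUSR1 interrupts it
 wait                                  # wait again so the application can finish writing its checkpoint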