Ada:Batch Queues

From TAMU HPRC
Revision as of 21:59, 2 May 2016

Queues

Upon job submission, LSF sends your job to an appropriate batch queue. These are (software) service stations configured to control the scheduling and dispatch of the jobs that arrive in them. Batch queues are characterized by all sorts of parameters; some of the most important are:

  1. the total number of jobs that can be concurrently running (number of run slots)
  2. the wall-clock time limit per job
  3. the type and number of nodes it can dispatch jobs to
  4. the users or user groups allowed to use the queue.

These settings control whether a job will remain idle in the queue or be dispatched quickly for execution.

The current queue structure (updated on Apr. 29, 2016).

NOTE: Each user is now limited to a total of 8000 cores for pending jobs across all queues.

Single-node (sn_*) queues — for jobs needing only one compute node.
Compute node types: 64 GB nodes (811), 256 GB nodes (26)
Limits across queues: maximum of 6000 cores for all running jobs in the sn_* queues
Per-user limits: maximum of 1000 cores and 50 jobs per user for all running jobs in the sn_* queues

  Queue        Min/Default/Max Cores    Default/Max Walltime
  sn_short     1 / 1 / 20               10 min / 1 hr
  sn_regular   1 / 1 / 20               1 hr / 1 day
  sn_long      1 / 1 / 20               24 hr / 4 days
  sn_xlong     1 / 1 / 20               4 days / 7 days

Multi-node (mn_*) queues — for jobs needing more than one compute node.
Compute node types: 64 GB nodes (811), 256 GB nodes (26)
Limits across queues: maximum of 12000 cores for all running jobs in the mn_* queues
Per-user limits: maximum of 2000 cores and 100 jobs per user for all running jobs in the mn_* queues

  Queue        Min/Default/Max Cores    Default/Max Walltime    Per-Queue Limit (running cores)
  mn_short     2 / 2 / 200              10 min / 1 hr           2000
  mn_small     2 / 2 / 120              1 hr / 7 days           6000
  mn_medium    121 / 121 / 600          1 hr / 7 days           6000
  mn_large     601 / 601 / 2000         1 hr / 5 days           3000

Other queues

  Queue     Min/Default/Max Cores   Default/Max Walltime   Compute Node Types                     Notes
  xlarge    1 / 1 / 280             1 hr / 10 days         1 TB nodes (11), 2 TB nodes (4)        For jobs needing more than 256 GB of memory per compute node
  vnc       1 / 1 / 20              1 hr / 6 hr            GPU nodes (30)                         For remote visualization jobs
  special   None                    1 hr / 7 days          64 GB nodes (811), 256 GB nodes (26)   Requires permission to access this queue
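As an illustration of how a job script's resource requests map onto these queues, a minimal single-node job script might look like the following. All names and values here are placeholders, not site defaults; with no -q option, LSF routes the job based on the cores and walltime requested:

```shell
#BSUB -J my_job          # job name (placeholder)
#BSUB -L /bin/bash       # use bash as the job's login shell
#BSUB -n 8               # 8 cores: fits the single-node (sn_*) queues
#BSUB -W 0:30            # 30 minutes (hh:mm): within the sn_short 1 hr maximum
#BSUB -o stdout.%J       # stdout file; %J expands to the job ID

./my_app                 # placeholder executable
```

With these requests (8 cores, 30 minutes) and no explicit -q, LSF would dispatch the job to the sn_short queue.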

LSF determines which queue will receive a job for processing. The selection is based mainly on the resources (e.g., number of CPUs, wall-clock limit) requested, whether explicitly or by default. There are two exceptions:

  1. The xlarge queue, which is associated with nodes that have 1 TB or 2 TB of main memory. To use it, submit jobs with the -q xlarge option along with -R "select[mem1tb]" or -R "select[mem2tb]".
  2. The special queue, which gives access to all of the compute nodes. You MUST request permission to get access to this queue.

To access either of the above queues, you must use the -q queue_name option in your job script.
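For example, a job script targeting the xlarge queue could include directives like the following. The job name, core count, walltime, and executable are placeholders; only the -q and -R lines are the options described above:

```shell
#BSUB -J bigmem_job           # job name (placeholder)
#BSUB -n 20                   # number of cores (placeholder)
#BSUB -W 24:00                # wall-clock limit, hh:mm (placeholder)
#BSUB -q xlarge               # send the job to the xlarge queue
#BSUB -R "select[mem1tb]"     # request a 1 TB node (use mem2tb for a 2 TB node)
#BSUB -o output.%J            # stdout file; %J expands to the job ID

./my_big_memory_app           # placeholder executable
```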

Output from the bjobs command contains the name of the queue associated with a given job.
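LSF's actual routing logic is internal to the scheduler, but the resource-based selection described above can be sketched in a few lines. The following Python function (the name pick_queue is hypothetical) is a purely illustrative reading of the queue table, mapping a job's requested cores and walltime to the queue it would likely land in:

```python
def pick_queue(cores, walltime_hours):
    """Illustrative sketch of LSF's queue routing on Ada, per the table above."""
    if cores <= 20:  # fits the single-node (sn_*) queues
        if walltime_hours <= 1:
            return "sn_short"
        if walltime_hours <= 24:
            return "sn_regular"
        if walltime_hours <= 4 * 24:
            return "sn_long"
        if walltime_hours <= 7 * 24:
            return "sn_xlong"
    else:  # multi-node (mn_*) queues, chosen by core count and walltime
        if cores <= 200 and walltime_hours <= 1:
            return "mn_short"
        if cores <= 120 and walltime_hours <= 7 * 24:
            return "mn_small"
        if cores <= 600 and walltime_hours <= 7 * 24:
            return "mn_medium"
        if cores <= 2000 and walltime_hours <= 5 * 24:
            return "mn_large"
    raise ValueError("no matching queue; adjust the request or ask for special access")

print(pick_queue(1, 0.5))    # sn_short
print(pick_queue(400, 48))   # mn_medium
```

This ignores the xlarge, vnc, and special queues, which are reached explicitly with -q rather than by automatic routing.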