Ada:Batch Queues

From TAMU HPRC
Revision as of 17:31, 21 September 2015

Queues

LSF, upon job submission, sends your jobs to appropriate batch queues. These are (software) service stations configured to control the scheduling and dispatch of jobs that have arrived in them. Batch queues are characterized by all sorts of parameters. Some of the most important are:

  1. the total number of jobs that can be concurrently running (number of run slots)
  2. the wall-clock time limit per job
  3. the type and number of nodes it can dispatch jobs to
  4. which users or user groups can use that queue; etc.

These settings control whether a job will lie idle in the queue or be dispatched quickly for execution.
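You can inspect these queue parameters yourself with LSF's bqueues command. As a sketch (the queue name below is one from the table that follows; output depends on the cluster's current configuration):

```shell
# List all queues visible to you, with their job limits and current load
bqueues

# Show the full configuration of a single queue: run-slot limits,
# wall-clock limits, eligible hosts, and which users may submit to it
bqueues -l sn_long
```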

The current (Sep 2015) queue structure is shown below. Note that it is still in flux.

Queue       Min/Default/Max Cpus   Default/Max Walltime   Compute Node Types                Notes
sn_short    1 / 1 / 20             10 min / 1 hr          64 GB and 256 GB nodes            Maximum of 8000 cores for all running jobs in the single-node (sn_*) queues.
sn_regular  1 / 1 / 20             1 hr / 1 day           64 GB and 256 GB nodes
sn_long     1 / 1 / 20             24 hr / 4 days         64 GB and 256 GB nodes
sn_xlong    1 / 1 / 20             4 days / 7 days        64 GB and 256 GB nodes
mn_small    21 / 21 / 120          1 hr / 7 days          64 GB and 256 GB nodes            Maximum of 4000 cores for all running jobs in this queue.
mn_medium   121 / 121 / 600        1 hr / 7 days          64 GB and 256 GB nodes            Maximum of 3000 cores for all running jobs in this queue.
mn_large    600 / 601 / 2000       1 hr / 2 days          64 GB and 256 GB nodes            Maximum of 3000 cores for all running jobs in this queue.
xlarge      1 / 1 / 280            1 hr / 10 days         1 TB nodes (11), 2 TB nodes (4)
vnc         1 / 1 / 20             1 hr / 6 hr            All 30 nodes with GPUs
special     None                   1 hr / 7 days          64 GB and 256 GB nodes            Requires permission to access this queue.

LSF determines which queue will receive a job for processing. The selection is determined mainly by the resources (e.g., number of CPUs, wall-clock limit) specified, explicitly or by default. There are two exceptions:

  1. The xlarge queue, which is associated with nodes that have 1 TB or 2 TB of main memory. To use it, submit jobs with the -q xlarge option.
  2. The special queue, which gives one access to all of the compute nodes. You MUST request permission to get access to this queue.

To access either of the above queues, you must use the -q queue_name option in your job script.
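For illustration, a minimal job script that explicitly requests the xlarge queue might look like the following (the job name, core count, wall-clock limit, and program name are placeholders, not site defaults):

```shell
#BSUB -J bigmem_job        # job name (placeholder)
#BSUB -q xlarge            # request the xlarge queue explicitly
#BSUB -n 20                # number of cores (placeholder)
#BSUB -W 4:00              # wall-clock limit, hh:mm (placeholder)
#BSUB -o bigmem_job.%J     # stdout file; %J expands to the job ID

# commands to run (placeholder)
./my_program
```

Such a script would be submitted with bsub taking the script on standard input, e.g. bsub < jobfile.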

Output from the bjobs command contains the name of the queue associated with a given job.
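For example, assuming you have jobs pending or running:

```shell
# Default bjobs output includes a QUEUE column for each of your jobs
bjobs

# Wide format avoids truncating long queue and job names
bjobs -w
```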

Fair-Share Policy

... pending ...

Public and Private/Group Queues

... pending ...

The Interactive Queue

... pending ...