The gpuavail command is available on all clusters and displays the current GPU configuration and availability.


The gpuavail output shows the current GPU configuration and the compute nodes with resources readily available for new jobs. If only one of several GPUs on a compute node is in use, gpuavail shows the number of GPUs and CPUs and the amount of memory still available to other jobs on that node.

In the following example output, to schedule a job with 6 A100 GPUs on a single compute node, you could request the 16 available CPUs and 687 GB of memory, and your job would run on the compute node named fc024 without your having to specify the node name.


  NODE            NODE 
  TYPE            COUNT
  gpu:t4:4         29
  gpu:t4:8         10
  gpu:a100:8        2
  gpu:a40:2         2
  gpu:a100:4        2
  gpu:a30:2         2
  gpu:a100:16       1
  gpu:a10:4         1
  gpu:a40:4         1
  gpu:t4:2          1
  gpu:a10:2         1

  NODE    GPU     GPU    GPU    CPU    MEM (GB)
          TYPE    TOTAL  AVAIL  AVAIL  AVAIL
  fc004   a100    16   11     24     797
  fc009   t4      4    4      64     250
  fc010   t4      4    4      64     250
  fc011   t4      4    4      64     250
  fc012   t4      8    8      64     250
  fc013   t4      8    8      64     250
  fc023   t4      2    2      64     250
  fc024   a100    8    6      16     687
  fc026   a100    4    3      52     131
  fc031   a100    8    3      28     783
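
The fc024 example above can be written as a batch request. This is a minimal sketch assuming a Slurm scheduler (the gpu:type:count node listings above match Slurm's GRES notation); the partition name and application are hypothetical placeholders.

```shell
#!/bin/bash
# Sketch of a job matching the fc024 example above, assuming Slurm.
#SBATCH --job-name=a100-job
#SBATCH --nodes=1                 # keep all 6 GPUs on one compute node
#SBATCH --gres=gpu:a100:6         # the 6 A100 GPUs available on fc024
#SBATCH --cpus-per-task=16        # the 16 CPUs still available on that node
#SBATCH --mem=687G                # the 687 GB of memory still available
#SBATCH --partition=gpu           # hypothetical partition name

# The scheduler can place this on fc024 automatically;
# --nodelist=fc024 is not required.
srun ./my_gpu_program             # hypothetical application
```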
  • It is good practice not to request all of the CPUs and memory on a GPU compute node when you are not scheduling all of its GPUs, unless your job requires those resources.

  • If you schedule one of several available GPUs on a compute node but also request all of its CPUs and memory, the remaining GPUs on that node will be unavailable to other jobs and will sit idle for the duration of your job.
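
To illustrate the guidance above, assuming Slurm and one of the gpu:t4:4 nodes listed earlier (4 GPUs, 64 CPUs, 250 GB): requesting 1 of the 4 GPUs suggests asking for roughly a quarter of the CPUs and memory, leaving the rest for other jobs.

```shell
#!/bin/bash
# Sketch of a single-GPU request sized proportionally, assuming a Slurm
# scheduler and a gpu:t4:4 node (4 GPUs, 64 CPUs, 250 GB) as shown above.
#SBATCH --gres=gpu:t4:1       # 1 of the node's 4 GPUs
#SBATCH --cpus-per-task=16    # ~1/4 of the 64 CPUs
#SBATCH --mem=62G             # ~1/4 of the 250 GB of memory

srun ./my_gpu_program         # hypothetical application
```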
