
Difference between revisions of "Ada:Batch:JobTracking"

From TAMU HPRC

Revision as of 15:53, 13 February 2018

Ada/LSF Job Tracking Techniques

Introduction

The information on this page will help you understand the LSF scheduling behavior on Ada, but will not tell you when your job will run. Scheduling is a complex task, and many factors determine which job leaves the queue next. The next few sections cover common bottlenecks users encounter, but should not be considered a comprehensive guide.

After reviewing the following sections, you should be able to estimate whether your job will start running quickly or if you should expect to wait.

Hardware Limitations

Ada is composed mostly of 20-core nodes with 64GB of memory. There is a small set of 20-core nodes with 256GB of memory. Mixed between these two sets are some GPU and PHI nodes.

The compute node hardware details can be seen at: Ada Hardware Summary.

The compute node batch job memory limitations can be seen at: Ada Memory Specification Clarification.

Advice

The 256GB, GPU, and 1TB or 2TB hardware is fully occupied much more often than the 64GB hardware. If your program works on a 64GB general compute node (<54GB of RAM), then make sure the resources requested in your job file fit on 64GB nodes.
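As a rough illustration of the check described above, the following Python sketch tests whether a per-node request fits on a 64GB node. The function name and the 54000MB limit are assumptions made for this example: the limit is the "<54GB" figure read as 20 cores x 2700MB, and the authoritative values are on the memory specification page.

```python
# Assumption: usable batch memory on a 64GB node, in MB. This reads the
# "<54GB" figure as 20 cores x 2700MB; confirm against the memory spec page.
USABLE_MB_64GB_NODE = 54000
CORES_PER_NODE = 20  # Ada general compute nodes have 20 cores


def fits_on_64gb_node(cores_on_node, mb_per_core):
    """Return True if a per-node request fits the assumed 64GB-node limits."""
    if not 1 <= cores_on_node <= CORES_PER_NODE:
        return False
    return cores_on_node * mb_per_core <= USABLE_MB_64GB_NODE


print(fits_on_64gb_node(20, 2700))   # full node at 2700MB per core
print(fits_on_64gb_node(20, 12700))  # full node at 12700MB per core
print(fits_on_64gb_node(4, 12700))   # 4 cores at 12700MB per core
```

A request that fails this check can only be placed on the larger-memory hardware, which is the hardware that is most often fully occupied.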

If you need GPU nodes, request as few of them as possible. Requesting many GPU nodes almost guarantees that you will be waiting in the queue for a while. The same applies to PHI and TB nodes.

Typical Job Requests

It is most common for users to request either 2^n cores or 20*n cores. This means that there are many single-node jobs that request 1, 2, 4, 8, 16, or 20 cores.
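To see why the power-of-two requests up to 16 cores are single-node jobs, here is a small Python sketch. The helper `nodes_needed` is hypothetical and gives only the minimum node count for 20-core nodes; the scheduler may spread a job over more nodes depending on how the job file asks for cores per node.

```python
import math

CORES_PER_NODE = 20  # Ada general compute nodes have 20 cores


def nodes_needed(cores):
    """Minimum number of 20-core nodes that can hold the requested cores."""
    return math.ceil(cores / CORES_PER_NODE)


# Power-of-two requests from 1 to 64 cores.
power_of_two_requests = [2 ** n for n in range(7)]

# Requests that can be satisfied by a single node.
single_node = [c for c in power_of_two_requests if nodes_needed(c) == 1]
print(single_node)  # [1, 2, 4, 8, 16]

# A 32-core request no longer fits on one node; a 40-core request fills two.
print(nodes_needed(32), nodes_needed(40))  # 2 2
```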

Memory requests per core are typically 2700MB on 64GB nodes and 12700MB on 256GB nodes. This is in part due to memory limitations and the SU surcharge for memory equivalent cores.
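The per-core figures above translate into per-node totals when a job uses all 20 cores of a node. The following Python sketch does only that arithmetic; the dictionary and function names are made up for this example.

```python
CORES_PER_NODE = 20  # Ada general compute nodes have 20 cores

# Typical per-core memory requests (MB) quoted above, keyed by node type.
TYPICAL_MB_PER_CORE = {"64GB": 2700, "256GB": 12700}


def full_node_request_mb(node_type):
    """Total memory (MB) requested when all 20 cores use the typical value."""
    return CORES_PER_NODE * TYPICAL_MB_PER_CORE[node_type]


for node_type in TYPICAL_MB_PER_CORE:
    print(node_type, full_node_request_mb(node_type), "MB")
# 64GB 54000 MB
# 256GB 254000 MB
```

Both totals sit just below the nominal memory of the node type, which is consistent with the typical values being chosen to fill a node without exceeding its batch memory limit.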

Advice

Overall Impact: Minor

Batch Queue Structure

Review the batch queue

Job Priority

View priority