Dedicated Use & Batch Policies
Last updated: July 19, 2017
Batch system policies may change from time to time to reflect changing needs and load conditions. We appreciate your adherence to the policies below. A little care on your part in doing certain things right goes a long way toward keeping our compute servers running efficiently and fairly. To maintain fairness and efficiency, we will on occasion, and very reluctantly, terminate jobs prematurely. The Job Termination Policies subsection lists common reasons for which the staff terminates a job.
All requests for dedicated cluster use require the approval of the Director. To initiate the process, please send e-mail to the HPRC help desk at email@example.com. If the request is approved, arrangements must then be made in consultation with the staff. The strongly preferred day is every other Tuesday, when system maintenance is also scheduled. Otherwise, system load conditions will be a significant factor in selecting the day for such an event. Please always give at least two weeks' notice. The maximum processing time per request is also the Director's decision.
The HPRC staff reserves the right to terminate batch jobs when one or a combination of the following occurs:
- Use by your program of a larger number of CPUs than its parallel efficiency warrants.
- Use by your program of a smaller number of CPUs than that specified through the batch system. This is a particularly unacceptable practice since it wastes resources that might otherwise be used by others. The batch system sets aside resources, but it knows nothing about the actual number of CPUs that your program will use.
- Submitting jobs with an artificially large wall-clock or CPU-time request.
- Submitting jobs with an artificially large memory request, especially when utilizing only a small percentage of that memory.
- Use/abuse of a special access queue to run a job that could very well run in one of the common queues.
- Use/abuse of a special access queue to circumvent wait time or job limits within the common queues.
- Excessive I/O with large files, which in turn overwhelms memory due to excessive file caching.
- Any use of large amounts of disk and/or memory that causes a significant disruption to the smooth operation of the system.
- Delayed file transfers to or from remote source or destination hosts.
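Several of the items above come down to comparing what a job requested with what it actually used. The following is a minimal sketch of such a self-check in Python, not a tool provided by HPRC. The function names and the sample numbers are hypothetical, and the policy above does not state any numeric thresholds, so none are assumed here. The timings and usage figures would come from your own measurements or from the accounting data of your batch system.

```python
def parallel_efficiency(serial_seconds, parallel_seconds, cpus):
    """Speedup divided by CPU count: T(1) / (N * T(N)).

    A value near 1.0 means the extra CPUs are well used; a low value
    suggests the job requests more CPUs than its scaling warrants.
    """
    if cpus < 1 or serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive and cpus at least 1")
    return serial_seconds / (cpus * parallel_seconds)


def cpu_utilization(cpu_seconds, wall_seconds, requested_cpus):
    """Fraction of the reserved CPU time the job actually consumed."""
    if wall_seconds <= 0 or requested_cpus < 1:
        raise ValueError("wall time must be positive and cpus at least 1")
    return cpu_seconds / (wall_seconds * requested_cpus)


def memory_utilization(peak_mb, requested_mb):
    """Fraction of the requested memory the job actually touched."""
    if requested_mb <= 0:
        raise ValueError("requested memory must be positive")
    return peak_mb / requested_mb


if __name__ == "__main__":
    # Hypothetical job: 1000 s on 1 CPU, 80 s on 16 CPUs.
    print(f"parallel efficiency: {parallel_efficiency(1000.0, 80.0, 16):.2f}")
    # Hypothetical job: 16 CPUs reserved for 900 s, 3600 CPU-seconds consumed.
    print(f"CPU utilization:     {cpu_utilization(3600.0, 900.0, 16):.2f}")
    # Hypothetical job: 64000 MB requested, 2000 MB peak usage.
    print(f"memory utilization:  {memory_utilization(2000.0, 64000.0):.2f}")
```

In the second sample, a CPU utilization of 0.25 means the program kept only about 4 of the 16 reserved CPUs busy, which is the kind of mismatch the batch system cannot detect on its own.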
Batch jobs are subject to periodic monitoring. Jobs inappropriately using extreme resources are subject to termination without prior notice.
Please see the Fair Resource Usage page for details on a cluster-by-cluster basis.