
Ada:Batch Memory Specs


Clarification on Memory, Core, and Node Specifications

#BSUB -R "rusage[mem=process_alloc_size]" -M process_size_limit

Both of these should be set in a job and, typically, to the same amount.

  • The process_alloc_size specifies the amount of memory (in MB) to allocate/reserve per process on a node. Omitting this specification will cause LSF to select nodes on the basis of available cores only, regardless of whether such nodes have sufficient memory to run the job efficiently. This omission can therefore cause the job to be swapped out onto the local drive (a big delay) and/or to experience memory contention from other jobs running on the same node, causing a general and dramatic slowdown.

  • If, in addition to the process_alloc_size option, one specifies a value (=core_count_per_node) for the ptile parameter, LSF will allocate/reserve core_count_per_node * process_alloc_size MB of memory per node (see the sketch after this list).

  • The -M process_size_limit setting specifies the memory size limit (in MB) per process, which when exceeded will cause the process and the job to fail. The default value for process_size_limit is 2.5 GB (2500 MB). Both of these settings should reflect the run-time needs of your job.

  • The total available memory per node for jobs is about 10 GB less than the maximum: 54 GB for nxt type nodes; 246 GB for the mem256gb type; etc.
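
To make the per-node arithmetic concrete, here is a minimal sketch with illustrative values: with ptile=20 and a per-process reservation of 2700 MB, LSF reserves 20 * 2700 = 54000 MB per node, which just fits within the roughly 54 GB usable on an nxt node.

#BSUB -R "span[ptile=20]"       # 20 processes per node
#BSUB -R "rusage[mem=2700]"     # reserve 2700 MB per process: 20 * 2700 = 54000 MB per node
#BSUB -M 2700                   # enforce a matching 2700 MB limit per process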

One should not rely on default memory limit settings; they may be too large or too small. A realistic picture of a job's memory use can be obtained from one or more of the job tracking commands (e.g., bjobs). For more information, please see the Job tracking and control commands subsection.
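
For example (the job ID below is a placeholder), the long listing of a running job reports its memory limit and current memory use:

bjobs -l 123456                 # long listing for the (placeholder) job ID 123456, including its memory limit and usage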

Five important job parameters:

#BSUB -n NNN                    # NNN: total number of cores/jobslots to allocate for the job
#BSUB -R "span[ptile=XX]"       # XX:  number of cores/jobslots per node to use. Also, a node selection criterion
#BSUB -R "select[node-type]"    # node-type: nxt, mem256gb, gpu, phi, mem1t, mem2t ...
#BSUB -R "rusage[mem=nnn]"      # reserves nnn MBs per process/CPU for the job
#BSUB -M mmm                    # sets the per process enforceable memory limit to mmm MB

We list these together because in many jobs they can be closely related and, therefore, must be set consistently. We recommend their adoption in all jobs: serial, single-node, and multi-node. The rusage[mem=nnn] setting causes LSF to select nodes that can each allocate XX * nnn MB for the execution of the job. The -M mmm setting enforces the per-process memory size limit; when this limit is violated the job will abort. Omitting it causes LSF to assume the default memory limit, which by configuration is set to 2.5 gigabytes (2500 MB) per process. The following examples, with some commentary, illustrate the use of these options.

Important: if the process memory limit, default (2500 MB) or specified, is exceeded during execution, the job will fail with a memory violation error.

#BSUB -n 900                    # 900: number of cores/jobslots to allocate for the job
#BSUB -R "span[ptile=20]"       # 20:  number of cores per node to use
#BSUB -R "select[nxt]"          # Allocates NeXtScale type nodes

The above specifications will allocate 45 (= 900/20) whole nodes. In many parallel jobs, selecting NeXtScale nodes at 20 cores per node is the best choice. Here, we are just illustrating what happens when the memory-related options are omitted: the enforceable memory limit per process falls back to the default of 2500 MB (2.5 GB). We definitely urge that you specify these options explicitly.

#BSUB -n 900                    # 900: total number of cores/jobslots to allocate for the job
#BSUB -R "span[ptile=16]"       # 16:  number of cores/jobslots per node to use
#BSUB -R "select[nxt]"          # allocates NeXtScale type nodes
#BSUB -R "rusage[mem=3600]"     # schedules on nodes that have at least 3600 MB per process/CPU avail
#BSUB -M 3600                   # enforces 3600 MB memory use per process 

The above specifications will allocate 57 (= ceiling(900/16)) nodes. The decision to use only XX cores per node (here 16), rather than the maximum of 20, requires some judgement; the execution profile of the job is important. Typically, some experimentation is required to find the optimal ptile value for a given code.
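
For comparison, here is a variant sketch with illustrative values that keeps the full 20 cores per node while lowering the per-process reservation, so that the per-node total of 20 * 2700 = 54000 MB still fits on an nxt node:

#BSUB -n 900                    # 900: total number of cores/jobslots to allocate for the job
#BSUB -R "span[ptile=20]"       # 20:  use all 20 cores on each of 45 nodes
#BSUB -R "select[nxt]"          # allocates NeXtScale type nodes
#BSUB -R "rusage[mem=2700]"     # reserves 2700 MB per process: 20 * 2700 = 54000 MB per node
#BSUB -M 2700                   # enforces 2700 MB memory use per process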

#BSUB -n 1                    # Allocate a total of 1 cpu/core for the job, appropriate for serial processing.
#BSUB -R "span[ptile=1]"      # Allocate 1 core per node.
#BSUB -R "select[gpu]"        # Allocate a node that has gpus (of 64GB or 256GB memory). A "select[phi]"
                              # specification would allocate a node with phi coprocessors.

Omitting the last two options in the above will cause LSF to place the job on any conveniently available core on any node, idle or (partially) busy, of any type, except on those with 1TB or 2TB memory.
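
Putting these pieces together, here is a minimal serial job file sketch; the job name, output file, wall-clock limit, memory values, and executable are illustrative placeholders rather than site defaults:

#BSUB -J serial_test            # illustrative job name
#BSUB -o serial_test.%J.out     # illustrative output file; %J expands to the job ID
#BSUB -W 2:00                   # illustrative wall-clock limit of 2 hours
#BSUB -n 1                      # allocate a total of 1 core for the job
#BSUB -R "span[ptile=1]"        # place that core on a single node
#BSUB -R "rusage[mem=4000]"     # reserve 4000 MB for the process (illustrative value)
#BSUB -M 4000                   # enforce a matching 4000 MB per-process limit

./your_program                  # placeholder for the actual executable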

It is worth emphasizing that, under the current LSF setup, only the -x (exclusive use) option and a ptile value equal to the node's core count will prevent LSF from scheduling other jobs on the remaining unreserved cores of the allocated nodes.
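
As an illustration (core counts are reused from the earlier example), adding the -x option requests exclusive use of the allocated nodes, so the four unreserved cores on each node stay idle rather than being given to other jobs:

#BSUB -x                        # exclusive use: no other jobs may run on the allocated nodes
#BSUB -n 900                    # 900: total number of cores/jobslots to allocate for the job
#BSUB -R "span[ptile=16]"       # 16:  cores per node; the remaining 4 cores per node stay idle
#BSUB -R "select[nxt]"          # allocates NeXtScale type nodes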

Inhomogeneous Node Selection

#BSUB -n 900
#BSUB -R "600*{ select[nxt] rusage[mem=3000] span[ptile=20]} + 300*{ select[gpu] rusage[mem=3000] span[ptile=20] }"
#BSUB -M 3000

The above specification will allocate 30 (= 600/20) NeXtScale and 15 (= 300/20) iDataPlex nodes, the latter with GPUs, at 20 cores per node. Note that the enforceable memory limit here is 3000 MB per process. In the Examples section, we provide an illustration of the usefulness of inhomogeneous node selection when the MPMD parallelization model is to be used.
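
As a further sketch (node counts and memory values are illustrative), the same composite resource string can mix NeXtScale and mem256gb nodes; the -n total must equal the sum of the two components (400 + 40 = 440), which here yields 20 nxt and 2 mem256gb nodes:

#BSUB -n 440
#BSUB -R "400*{ select[nxt] rusage[mem=2700] span[ptile=20] } + 40*{ select[mem256gb] rusage[mem=2700] span[ptile=20] }"
#BSUB -M 2700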