
Grace:Intro


Revision as of 15:13, 7 January 2021

Grace: A Dell x86 HPC Cluster

Deployment Status

Cluster deployed, currently in testing and early user access mode.

Hardware Overview

System Name: Grace
Host Name: grace.hprc.tamu.edu
Operating System: Linux (CentOS 7)
Total Compute Cores/Nodes: 44,656 cores
925 nodes
Compute Nodes: 800 48-core compute nodes, each with 384GB RAM
100 48-core GPU nodes, each with two A100 40GB GPUs and 384GB RAM
9 48-core GPU nodes, each with two RTX 6000 24GB GPUs and 384GB RAM
8 48-core GPU nodes, each with four T4 16GB GPUs and 384GB RAM
8 80-core large memory nodes, each with 3TB RAM
Interconnect: Mellanox HDR 100 InfiniBand
Peak Performance: 6.2 PFLOPS
Global Disk: 5PB (usable) via DDN Lustre appliances for general use
1.4PB (usable) via Lenovo's DSS GPFS appliance, purchased by and dedicated to Dr. Junjie Zhang's CryoEM Lab
File System: Lustre and GPFS
Batch Facility: Slurm by SchedMD
Location: West Campus Data Center
Production Date: Spring 2021

Grace is an Intel x86-64 Linux cluster with 925 compute nodes (44,656 total cores) and 5 login nodes. There are 800 compute nodes with 384 GB of memory and 117 GPU nodes with 384 GB of memory. Of the 117 GPU nodes, 100 have two A100 40GB GPU cards, 9 have two RTX 6000 24GB GPU cards, and 8 have four T4 16GB GPU cards. The 800 compute nodes and 117 GPU nodes are dual-socket servers, each with two Intel Xeon 6248R 3.0GHz 24-core processors. The remaining 8 compute nodes each have 3 TB of memory and four Intel Xeon 6248 2.5GHz 20-core processors.

The interconnecting fabric is a two-level fat tree based on HDR 100 InfiniBand. High-performance mass storage with 5 petabytes of usable capacity is provided to all nodes by DDN Lustre storage.

For details on using this system, see the User Guide for Grace.

Compute Nodes

The five types of compute nodes are described below:

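Table 1: Details of Compute Nodes
| | General 384GB | GPU A100 | GPU RTX 6000 | GPU T4 | Large Memory 3TB |
|---|---|---|---|---|---|
| Total Nodes | 800 | 100 | 9 | 8 | 8 |
| Processor Type | Intel Xeon 6248R 3.0GHz 24-core | Intel Xeon 6248R 3.0GHz 24-core | Intel Xeon 6248R 3.0GHz 24-core | Intel Xeon 6248R 3.0GHz 24-core | Intel Xeon 6248 2.5GHz 20-core |
| Sockets/Node | 2 | 2 | 2 | 2 | 4 |
| Cores/Node | 48 | 48 | 48 | 48 | 80 |
| Memory/Node | 384 GB DDR4, 3200 MHz | 384 GB DDR4, 3200 MHz | 384 GB DDR4, 3200 MHz | 384 GB DDR4, 3200 MHz | 3 TB DDR4, 3200 MHz |
| Accelerator(s) | N/A | 2 NVIDIA A100 40GB GPU | 2 NVIDIA RTX 6000 24GB GPU | 4 NVIDIA T4 16GB GPU | N/A |
Interconnect (all node types): Mellanox HDR 100 InfiniBand
Local Disk Space (all node types): 480GB SSD, 1.6TB NVMe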

Usable Memory for Batch Jobs

While nodes on Grace have either 384GB or 3TB of RAM, part of that memory is reserved for the node's operating system and system software, so a job cannot use all of it. In most cases, memory requests that exceed the usable amount are automatically rejected by Slurm.

The table below lists the approximate memory limits of Grace nodes, along with our suggestions for using them.


Memory Limits of Nodes
| | 384GB Nodes (Regular and GPU) | 3TB Nodes |
|---|---|---|
| Node Count | 917 | 8 |
| Number of Cores | 48 Cores | 80 Cores |
| Memory Limit Per Core | 7680 MB (7.5 GB) | |
| Memory Limit Per Node | 368640 MB (360 GB) | |
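The per-node limit is simply the per-core limit times the core count (48 x 7680 MB = 368640 MB = 360 GB). The Python sketch below, assuming you are targeting one of the 384GB nodes, checks a per-node request against those approximate limits and emits matching `#SBATCH` lines using the standard sbatch options `--ntasks-per-node` and `--mem`. The helper names `request_problems` and `sbatch_memory_lines` are hypothetical, the 3TB nodes are not covered because their limits are not listed here, and Slurm's own enforcement remains the final word.

```python
# Approximate usable-memory limits for Grace's 384GB nodes (regular and GPU).
CORES_PER_NODE = 48
MEM_PER_CORE_MB = 7680                               # 7.5 GB
MEM_PER_NODE_MB = CORES_PER_NODE * MEM_PER_CORE_MB   # 368640 MB = 360 GB


def request_problems(ntasks_per_node, mem_per_node_mb):
    """Hypothetical check: list the reasons a request exceeds the limits (empty if it fits)."""
    problems = []
    if not 1 <= ntasks_per_node <= CORES_PER_NODE:
        problems.append(f"ntasks-per-node must be between 1 and {CORES_PER_NODE}")
    if not 0 < mem_per_node_mb <= MEM_PER_NODE_MB:
        problems.append(f"memory must be between 1 and {MEM_PER_NODE_MB} MB per node")
    return problems


def sbatch_memory_lines(ntasks_per_node, mem_per_node_mb):
    """Hypothetical helper: return #SBATCH lines, or raise ValueError if over the limits."""
    problems = request_problems(ntasks_per_node, mem_per_node_mb)
    if problems:
        raise ValueError("; ".join(problems))
    return [
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --mem={mem_per_node_mb}M",  # M suffix: megabytes
    ]


if __name__ == "__main__":
    # Request 4 cores and the matching share of memory (4 x 7680 MB).
    for line in sbatch_memory_lines(4, 4 * MEM_PER_CORE_MB):
        print(line)
```

Sizing memory as cores x 7680 MB keeps a request proportional to its share of the node, so a full-node job (48 cores) lands exactly on the 360 GB per-node limit.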

Login Nodes

The grace.hprc.tamu.edu hostname can be used to access the Grace cluster. This name resolves to one of the five login nodes, grace[1-5].hprc.tamu.edu. To access a specific login node, use its corresponding host name (e.g., grace2.hprc.tamu.edu). All login nodes have 10 GbE connections to the TAMU campus network and direct access to all global parallel (Lustre-based) file systems. The table below provides more details about the hardware configuration of the login nodes.
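The compact grace[1-5] form is a bracketed numeric range, similar to the host-list notation Slurm uses. As an illustration only, the Python sketch below expands that simple single-range form into the individual login-node host names. `expand_hostlist` is a hypothetical helper, not Slurm's parser: it handles exactly one `start-end` range and ignores commas, multiple ranges, and zero padding.

```python
import re


def expand_hostlist(expr):
    """Expand a single bracketed numeric range, e.g. 'grace[1-5].hprc.tamu.edu'.

    Minimal sketch: supports exactly one '[start-end]' range; anything else is
    returned unchanged as a single host name.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", expr)
    if match is None:
        return [expr]  # no range present: the expression is already one host name
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{i}{suffix}" for i in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    for host in expand_hostlist("grace[1-5].hprc.tamu.edu"):
        print(host)
```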

Table 2: Details of Login Nodes
HostNames grace1.hprc.tamu.edu grace2.hprc.tamu.edu grace3.hprc.tamu.edu grace4.hprc.tamu.edu
Processor Type Intel Xeon 6248R 3.0GHz 24-core
Memory 384 GB DDR4 3200 MHz
Total Nodes 1 2
Cores/Node 48
Interconnect Mellanox HDR 100 InfiniBand
Local Disk Space per node: two 480 GB SSD drives, 1.6 TB NVMe

Mass Storage

5PB (usable) with Lustre provided by DDN



"Grace" is named for Grace Hopper.