
Grace-Hopper

Grace-Hopper Node on ACES

Node Architecture

The NVIDIA Grace-Hopper Superchip combines an NVIDIA Hopper GPU and an Arm-based NVIDIA Grace CPU in a single module. The two are connected by the high-bandwidth, cache-coherent NVLink Chip-2-Chip (C2C) interconnect.

Node Name: gh01

Node Type: NVIDIA GH200 Superchip (single)

Compute Components:
  • 1 NVIDIA H100 GPU
  • 72 Arm Neoverse V2 cores

Interconnect:
  • NVIDIA NVLink Chip-2-Chip (CPU-GPU communication)
  • NVIDIA Mellanox NDR200 InfiniBand (MPI and storage)

Access

The ACES cluster has one Grace-Hopper node, gh01. Access it with ssh from another ACES node:

ssh gh01

(Note: ssh is enabled between ACES cluster nodes, but direct ssh to gh01 from outside the cluster is not. Use the portal to connect to a login node, then ssh to gh01 from there.)

Software for Grace-Hopper on ACES

Modules

The Grace-Hopper node uses the Arm (aarch64) CPU architecture, so most software packages normally available through the module system, which are built for the cluster's x86_64 nodes, will not run on it. As a result:

  • The contents of the module system on gh01 are independent of the rest of the cluster, but the module commands are the same.

  • The /sw filesystem mounted on gh01 contains very few pre-built software packages.
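Because gh01 has its own module tree, module loads written for the other nodes may fail there. The fragment below is a minimal sketch, assuming your home directory (and therefore ~/.bashrc) is shared between gh01 and the other ACES nodes; it branches on the output of `uname -m` so that each architecture loads only its own modules. The helper name `node_family` and the module names in the comments are hypothetical placeholders.

```shell
# Sketch of a ~/.bashrc fragment (assumption: ~/.bashrc is shared across node types).

# node_family (hypothetical helper): prints "arm" on aarch64 nodes such as gh01,
# "x86" otherwise. An architecture string may be passed in for testing.
node_family() {
  case "${1:-$(uname -m)}" in
    aarch64|arm64) echo arm ;;
    *)             echo x86 ;;
  esac
}

case "$(node_family)" in
  arm)
    # gh01: only modules from its own, separate module tree are available.
    # module load <module-available-on-gh01>
    ;;
  x86)
    # Other ACES nodes: the usual x86_64 modules.
    # module load <module-available-on-x86-nodes>
    ;;
esac
```

On gh01, `uname -m` reports `aarch64`, so the first branch is taken.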

Compilers

To build your own software, we recommend:

  • GCC 13.2.0 (the free and open-source GNU Compiler Collection), located at /sw/eb/sw/GCC/13.2.0
  • NVIDIA Clang, an alternative C/C++ compiler, located at /sw/clang-gh/clang
  • CUDA 12.6, a Hopper-compatible release of the CUDA toolkit, located at /usr/local/cuda-12.6/
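A minimal sketch of putting these toolchains on your PATH (source it or paste it into your shell). It assumes each prefix above has the usual bin/ and lib64/ subdirectories; if `$GCC_ROOT/bin` does not exist on gh01 (for example, if the compiler is installed as an EasyBuild bundle whose binaries live elsewhere), load the matching GCC module instead. The source files saxpy.c and saxpy.cu are hypothetical. `-mcpu=neoverse-v2` tunes GCC output for the Grace CPU cores, and `-arch=sm_90` targets the Hopper GPU (compute capability 9.0).

```shell
# Sketch: toolchain environment for gh01.
# Assumption: bin/ and lib64/ subdirectories exist under each prefix.
GCC_ROOT=/sw/eb/sw/GCC/13.2.0
export CUDA_HOME=/usr/local/cuda-12.6

# Prepend so these compilers take precedence over any system gcc/nvcc.
export PATH="$GCC_ROOT/bin:$CUDA_HOME/bin:$PATH"
# Append the old LD_LIBRARY_PATH only if it was already set and non-empty.
export LD_LIBRARY_PATH="$GCC_ROOT/lib64:$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Example builds on gh01 (saxpy.c and saxpy.cu are hypothetical files):
#   gcc  -O3 -mcpu=neoverse-v2 -o saxpy     saxpy.c   # Grace CPU (Neoverse V2)
#   nvcc -O3 -arch=sm_90       -o saxpy_gpu saxpy.cu  # Hopper GPU (sm_90)
```

Prepending rather than appending keeps an older system compiler from being picked up first.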

Containers

Singularity containers can provide ARM-compatible software for the Grace-Hopper architecture.

  • The NVIDIA Container Registry (NGC) provides containers with Hopper GPU support. Look for images built with CUDA 12 or later.
  • Not all container images are compatible! Look for image names, tags, or descriptions that specify an Arm architecture (arm64 or aarch64). A generic, unlabeled image is most likely built for x86_64.
  • You may need to run the singularity pull command on gh01 itself to fetch the Arm-compatible image: when several images are published under the same name for different architectures, Singularity may pick the one that matches the architecture of the node it runs on.
  • Read more about using the Singularity runtime.
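The two container checks above can be scripted. This is a minimal sketch: `has_arm_label` implements only the text heuristic (it looks for arm64 or aarch64 in the image URI, so a multi-architecture image without such a label will not match), and `pull_on_arm` refuses to run `singularity pull` anywhere but on an aarch64 node. Both function names are hypothetical, and the URIs used with them are placeholders.

```shell
# Sketch of two helpers for fetching Arm container images (hypothetical names).

# has_arm_label URI: succeeds if the URI mentions an Arm architecture.
# Heuristic only; a multi-architecture image may carry no such label.
has_arm_label() {
  local uri="${1,,}"              # lowercase copy (bash 4+)
  local re='(aarch64|arm64)'
  [[ "$uri" =~ $re ]]
}

# pull_on_arm OUT.sif URI [ARCH]: pull only on an aarch64 node such as gh01,
# so Singularity can select the Arm variant of a multi-architecture image.
# ARCH defaults to the output of `uname -m` and can be overridden for testing.
pull_on_arm() {
  local out="$1" uri="$2" arch="${3:-$(uname -m)}"
  if [[ "$arch" != aarch64 ]]; then
    echo "not pulling $uri on $arch; run this on gh01" >&2
    return 1
  fi
  singularity pull "$out" "$uri"
}
```

On gh01, a call would look like `pull_on_arm app.sif docker://<registry>/<image>:<tag>`; on any other node it prints a warning and returns a non-zero status without contacting the registry.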

Grace-Hopper Performance on ACES

Future Benchmark Result

Coming Soon!