System Notices

FASTER and Grace Cluster Maintenance, March 10-13 — UPDATED

UPDATE: (03/15/2025 4:36p):

The FASTER cluster is currently available with about 80% of its compute nodes. We are still investigating issues with FASTER's OOD portals.

The Grace cluster redeployment is still in progress.

UPDATE: (03/14/2025 11:55p):

The FASTER cluster may be available tomorrow morning at 75% capacity after overnight testing. Some GPU nodes will remain offline due to composability fabric issues that will be remediated next week.

The Grace cluster remains unavailable because its redeployment with a new OS is taking much longer than anticipated. We will continue working through the weekend to complete the remaining maintenance and make the Grace cluster available as soon as possible.

UPDATE: (03/13/2025 10:06p):

Maintenance on the shared storage and the Liqid composability fabrics was completed successfully but took longer than anticipated. A failed disk that needed replacement added to the delay in the shared storage maintenance. We will provide more updates as we continue work on the FASTER and Grace cluster maintenance.

Posted at 03/04/2025 10:28a

The FASTER and Grace clusters will be unavailable from 9am March 10 to 8pm March 13. Software maintenance will be done for FASTER's nodes and the Liqid fabrics. The Grace cluster will be redeployed to the same OS (RHEL 8.10) as FASTER. The software on the shared storage will be updated as well.