System Notices

Partial Outage of HPRC Clusters, September 20 and 28

Posted at 09/18/2025 1:19p

TAMU Technology Services has scheduled electrical maintenance for the West Campus Data Center (WCDC) from 6a to 8p on Saturday, September 20. Some ACES, Launch, FASTER, and Grace compute nodes will be powered down during this maintenance.

Additional electrical maintenance for WCDC is scheduled from 6a to 8p on Sunday, September 28. HPRC will power down some compute nodes in each cluster during this second maintenance window.

Jobs submitted during either maintenance window will run on the compute nodes that remain online.

Storage Status for Grace and FASTER Clusters, September 15 — UPDATED

UPDATE (09/15/2025 11:56p):

The shared storage was restored around 10:22p. One issue was possibly fixed around 10:50p, which slightly lowered the high load on the storage servers. We have still observed high loads over the past hour, possibly for other reasons.

Posted at 09/15/2025 8:35p

A new issue developed this evening on a storage server for the Grace and FASTER shared storage. We are investigating and attempting to recover it tonight.

Storage Status for Grace and FASTER Clusters, September 12

Posted at 09/12/2025 2:02p

We've observed continued performance issues with the shared storage for Grace and FASTER over the past few weeks, and we believe the following factors are contributing:

  • High Metadata Load: We've identified elevated metadata activity from certain workloads, which is impacting performance.
  • Low Available Space: Free space has remained critically low at around 4%. Significant performance improvements may not be possible until we reclaim a larger portion of storage.

Our initial goal is to increase free space from the current 4% (approximately 200 TB) to at least 8-10% (400-500 TB) over the next few weeks and to maintain that level moving forward. Following voluntary deletion efforts, we are continuing to identify legacy data from past users for potential deletion. In addition, we will begin reviewing inactive data (defined as data not accessed in the past six months) for possible removal. Storage on HPRC clusters is not intended for long-term data retention.
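Users who want to review their own directories for data not accessed in the past six months could start from something like the Python sketch below. It is only an illustration, not the tool HPRC uses: the function name `inactive_files` and the 182-day approximation of "six months" are assumptions, and access times reported by a shared file system may not always be updated reliably. The script only lists candidates; it does not delete anything.

```python
import os
import stat
import time
from pathlib import Path
from typing import Iterator, Optional, Tuple

# Assumption: "six months" is approximated as 182 days.
SIX_MONTHS_SECONDS = 182 * 24 * 60 * 60


def inactive_files(root: Path, now: Optional[float] = None,
                   max_idle: float = SIX_MONTHS_SECONDS) -> Iterator[Tuple[Path, int]]:
    """Yield (path, size_in_bytes) for regular files whose last access
    time is more than max_idle seconds before `now`."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.lstat()  # lstat: do not follow symbolic links
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            if now - st.st_atime > max_idle:
                yield path, st.st_size
```

Calling `inactive_files(Path("/path/to/your/directory"))` (a placeholder path) and summing the sizes gives a rough estimate of how much space a cleanup could reclaim.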

We encourage users to consider modifying their jobs to use the local disk on each compute node, via the $TMPDIR environment variable, for temporary files that do not need to persist after job completion. We realize that users may need help making this change, and HPRC will assist when possible.
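As a rough illustration of the $TMPDIR suggestion, the Python sketch below writes its intermediate files under the directory named by $TMPDIR and copies only the final result to a destination directory. The function name, the file names, and the toy "analysis" step are hypothetical, and the sketch assumes $TMPDIR is set inside the job; if it is not, it falls back to the system default temporary directory.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_with_local_scratch(final_dir: Path) -> Path:
    """Write intermediate files to local scratch and keep only the result."""
    # Assumption: TMPDIR points at node-local disk inside a job.
    scratch_root = os.environ.get("TMPDIR") or tempfile.gettempdir()

    # The scratch directory and its contents are removed when the block exits.
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        intermediate = Path(scratch) / "intermediate.dat"  # hypothetical name
        intermediate.write_text("\n".join(str(i * i) for i in range(1000)))

        # Hypothetical analysis step: reduce the intermediate data to a summary.
        total = sum(int(line) for line in intermediate.read_text().splitlines())
        summary = Path(scratch) / "summary.txt"
        summary.write_text(f"sum_of_squares={total}\n")

        # Copy only the small final result to persistent storage.
        final_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy2(summary, final_dir / summary.name))
```

With this pattern, the large intermediate file never touches the shared storage; only the summary is written there.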

We would also like to remind all HPRC users to review and delete any unnecessary data when they leave TAMU (e.g., graduation, job change).

We appreciate your cooperation in helping maintain system performance and reliability.

TAMU HPRC