ACES: Using the Slurm Scheduler on Composable Resources

Overview

Instructor: Michael Dickens

Time: Tuesday, September 10, 2024 — 1:30PM-4:00PM CT

Location: Online using Zoom

Prerequisites: Current ACCESS ID, basic Linux/Unix skills

Introduction to using the Slurm scheduler on the ACES cluster, a composable accelerator testbed at Texas A&M University. Topics covered include multiple job scheduling approaches and job management tools.

Course Materials

Presentation slides for this and earlier offerings of the course are available as downloadable PDF files.

  • Slurm Job Scheduling (Fall 2024) PDF
  • Slurm Job Scheduling (Spring 2024) PDF
  • Slurm Job Scheduling (Fall 2023) PDF
  • Slurm Job Scheduling (Spring 2023) PDF

Learning Objectives and Agenda

In this course, participants will:

  • Learn the basics of HPC architecture
  • Learn the basic components of a job script
  • Learn how to submit a job script
  • Learn how to review a job's HPC resource usage
  • Learn how to debug failed jobs
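The basic components of a job script and its submission can be sketched as follows. This is a hedged, generic Slurm illustration rather than material from the course slides. sbatch runs any script that starts with an interpreter shebang and reads the #SBATCH directives placed before the first executable line, so the sketch uses Python. The job name, resource values, and output file name are hypothetical placeholders, not ACES recommendations.

```python
#!/usr/bin/env python3
#SBATCH --job-name=hello_slurm
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --output=hello_slurm.%j.log

# Hypothetical single-core job: the directives above request one task with
# one core, a 5-minute wall-clock limit, and 1 GB of memory per node.
# "%j" in the output file name is replaced by the Slurm job ID.
# sbatch only reads #SBATCH lines that appear before the first executable
# line, so keep them at the top (ahead of imports and docstrings).
import os
import socket


def job_summary():
    """Collect a few facts about where and how this job is running."""
    env = os.environ
    return {
        # Slurm exports these variables inside a job; the defaults let the
        # script also run on a workstation for testing.
        "job_id": env.get("SLURM_JOB_ID", "not-in-slurm"),
        "cpus_per_task": int(env.get("SLURM_CPUS_PER_TASK", "1")),
        "host": socket.gethostname(),
    }


if __name__ == "__main__":
    for key, value in job_summary().items():
        print(f"{key}: {value}")
```

Saved as, for example, hello_slurm.py, the script would be submitted with "sbatch hello_slurm.py", and "squeue -u $USER" lists it while it is pending or running. Its printed summary ends up in the log file named by --output.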

This short course will cover various job scheduling approaches using the Slurm Workload Manager on ACES:

  • HPC Architecture
  • SBATCH Parameters
  • Single node jobs
    • single-core
    • multi-core
  • Multi-node jobs
    • MPI jobs
    • TAMULauncher
    • array jobs
  • Monitoring job resource usage
    • at runtime
    • after job completion
    • job debugging
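The array-job approach listed above can be sketched as a generic Slurm feature, not an ACES-specific recipe: one script is submitted once, and Slurm runs it as several independent tasks, each distinguished by the SLURM_ARRAY_TASK_ID environment variable. As in any sbatch script, the #SBATCH directives must precede the first executable line; Python is used via the shebang. The input file names and resource values are hypothetical placeholders.

```python
#!/usr/bin/env python3
#SBATCH --job-name=array_demo
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --array=0-3
#SBATCH --output=array_demo.%A_%a.log

# Hypothetical array job: --array=0-3 submits four independent tasks from
# this one script. In the output name, %A is the array's master job ID and
# %a is the task index. Each task receives its own SLURM_ARRAY_TASK_ID.
import os

# Placeholder inputs: one entry per array index 0..3.
INPUTS = ["sample_a.txt", "sample_b.txt", "sample_c.txt", "sample_d.txt"]


def select_input(inputs, task_id):
    """Return the input assigned to this array task, validating the index."""
    if not 0 <= task_id < len(inputs):
        raise IndexError(f"array index {task_id} has no matching input")
    return inputs[task_id]


def current_task_id():
    # Default to 0 so the script can be tried outside Slurm.
    return int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))


if __name__ == "__main__":
    task_id = current_task_id()
    print(f"task {task_id} would process {select_input(INPUTS, task_id)}")
```

Validating the index is a deliberate choice: if the --array range and the input list drift apart, the affected task fails with a clear error in its own log instead of silently skipping or repeating work.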

See: https://hprc.tamu.edu/aces

Note: During the class session, much of the material will be demonstrated live while logged in to ACES. Attendees will log in to ACES and complete the exercises. You are encouraged to contact the HPRC helpdesk with any questions about ACES.