ACES: Introduction to HPC and AI for Faculty and Researchers
Overview
Instructor(s): Dr. Zhenhua He and Dr. Dinesh S. Devarajan
Time: Tuesday, February 17, 2026 1:30PM-4:00PM CT
Location: Online using Zoom
Prerequisite(s): Active ACCESS ID, basic Python programming skills; familiarity with PyTorch is preferred but not required
This short course will cover distributed training strategies with a focus on PyTorch Distributed Data Parallel (DDP). Through hands-on exercises, we will progress step by step: starting from CPU-based training, moving to a single GPU, scaling up to multiple GPUs on a single node, and finally extending to multi-node distributed training.
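To give a rough preview of the hands-on portion, below is a minimal single-node, multi-GPU DDP sketch, assuming a torchrun launcher and a toy nn.Linear model. All names, shapes, and hyperparameters are illustrative and are not taken from the course materials.

```python
# Minimal DDP sketch (hypothetical example, not the course's exercise code).
# Launch on one node with, e.g.:  torchrun --nproc_per_node=2 ddp_demo.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")     # one process per GPU
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 1).to(local_rank)     # toy model on this process's GPU
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for step in range(5):                       # toy loop on random data
        x = torch.randn(32, 10, device=local_rank)
        y = torch.randn(32, 1, device=local_rank)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()                         # gradients are all-reduced across ranks
        optimizer.step()
        if rank == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Under torchrun, each spawned process drives one GPU, and DDP synchronizes gradients automatically during the backward pass, so the training loop itself looks almost identical to the single-GPU version.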
Course Materials
- ACES: Introduction to HPC and AI for Faculty and Researchers (Fall 2025): PDF
Participation
During the training, attendees are expected to use their own computers to complete the instructor-led examples and exercises.
Learning Objectives and Agenda
After this short course, participants will be able to:
- Describe and compare different distributed training strategies for deep learning.
- Transition deep learning workloads from CPU to GPU training and scale from a single GPU to multiple GPUs within a single node (see the sketch after this list).
- Extend distributed training to multi-node HPC environments.
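As a concrete illustration of the second and third objectives, the following hypothetical sketch shows the common device-agnostic pattern for moving a workload from CPU to GPU; the launcher commands in the comments (hostnames and script names are placeholders, not from the course materials) indicate how the same script then scales out to multiple GPUs and nodes.

```python
# Hypothetical sketch of the CPU-to-GPU transition (names and shapes are illustrative).
import torch
import torch.nn as nn

# The same script runs on a laptop CPU or an HPC GPU node:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(10, 1).to(device)       # move parameters to the selected device
x = torch.randn(32, 10, device=device)    # allocate inputs on the same device
loss = model(x).sum()
loss.backward()                           # backward pass runs wherever the tensors live

# Scaling out is then largely a launcher change. With torchrun:
#   single node, 4 GPUs:
#     torchrun --nproc_per_node=4 train.py
#   two nodes, 4 GPUs each (run on every node; <head-node> is a placeholder):
#     torchrun --nnodes=2 --nproc_per_node=4 \
#              --rdzv_backend=c10d --rdzv_endpoint=<head-node>:29500 train.py
```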
