ACES: Introduction to HPC and AI for Faculty and Researchers

Overview

Instructor(s): Dr. Zhenhua He and Dr. Dinesh S. Devarajan

Time: Tuesday, February 17, 2026 1:30PM-4:00PM CT

Location: Online using Zoom

Prerequisite(s): Active ACCESS ID, basic Python programming skills; familiarity with PyTorch is preferred but not required

This short course will cover distributed training strategies with a focus on PyTorch Distributed Data Parallel (DDP). Through hands-on exercises, we will progress step by step: starting from CPU-based training, moving to a single GPU, scaling up to multiple GPUs on a single node, and finally extending to multi-node distributed training.

Course Materials

  • ACES: Introduction to HPC and AI for Faculty and Researchers (Fall 2025): PDF

Participation

During the training, attendees are expected to use their own computers to complete the instructor-led examples and exercises.

Learning Objectives and Agenda

After this short course, participants will be able to:

  • Describe and compare different distributed training strategies for deep learning.
  • Transition deep learning workloads from CPU to GPU training and scale from a single GPU to multiple GPUs within a single node.
  • Extend distributed training to multi-node HPC environments.