ACES: Graphcore IPU Tutorial
Overview
Instructor: Zhenhua He
Time: Tuesday, October 1, 2024 10:00AM-12:30PM CT
Location: Online using Zoom
Prerequisites: Current ACCESS ID; basic Linux/Unix skills; basic understanding of machine learning concepts, neural networks, and deep learning; familiarity with the TensorFlow and/or PyTorch deep learning frameworks
This short course introduces researchers to Graphcore IPUs on the ACES cluster, a composable accelerator testbed at Texas A&M University. Though brief, it is an in-depth training program that gives users a comprehensive understanding of Graphcore's Intelligence Processing Unit (IPU) and of how the IPU can accelerate machine learning and artificial intelligence workloads. It is designed to offer practical advice and hands-on experience to engineers and researchers who want to improve the performance of their AI and ML workloads.
The course begins by covering the architecture of the IPU and how it differs from traditional CPUs and GPUs. Participants will learn about the unique features of the IPU, such as its large memory bandwidth and high-performance interconnects, which make it well-suited for deep learning and other AI workloads.
The course then covers running popular deep learning frameworks such as TensorFlow and PyTorch on the IPU. Participants will learn how to optimize their models for the IPU and how to use Graphcore-specific libraries to take full advantage of its capabilities. Hands-on exercises give participants direct experience running workloads on the IPU. The course also includes a section on Graphcore's software development kit (SDK) and its tools for profiling, debugging, and monitoring the IPU.
Registration will open on this webpage the week before the class.
Course Materials
The presentation slides are available as downloadable PDF files.
Learning Objectives and Agenda
In this course, participants will:
- Access the IPU systems on the ACES cluster: Colossus and Bow Pod16.
- Learn how to run PyTorch and TensorFlow models on the IPU systems.
- Learn to migrate a Keras MNIST classification model to the IPU.
- Learn to migrate a PyTorch Fashion-MNIST classification model to the IPU.
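As a rough preview of the Keras MNIST migration, the sketch below shows the usual pattern in Graphcore's TensorFlow 2 port: configure the IPU system, create and compile the model inside an `IPUStrategy` scope, and feed fixed-size batches. It is a minimal sketch, not the course's lab code. The helper names (`build_model`, `make_dataset`, `train_on_ipu`), the layer sizes, the batch size, and the choice of `steps_per_execution` are illustrative assumptions, and the `ipu` module is available only in Graphcore's TensorFlow build from the Poplar SDK, so it is imported inside `train_on_ipu()`.

```python
import numpy as np
import tensorflow as tf


def build_model():
    """Hypothetical small fully connected classifier for 28x28 MNIST digits (10 classes)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])


def make_dataset(x, y, batch_size=32):
    # IPU programs are compiled for fixed tensor shapes, so every batch must be
    # the same size: drop_remainder=True discards any final partial batch.
    x = np.asarray(x, dtype=np.float32) / 255.0
    y = np.asarray(y, dtype=np.int32)
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    return ds.shuffle(10000).batch(batch_size, drop_remainder=True).repeat()


def train_on_ipu(epochs=1, batch_size=32):
    # The `ipu` module exists only in Graphcore's TensorFlow 2 build (Poplar SDK),
    # so it is imported here rather than at the top of the file.
    from tensorflow.python import ipu

    # Attach to one IPU before building any IPU program.
    cfg = ipu.config.IPUConfig()
    cfg.auto_select_ipus = 1
    cfg.configure_ipu_system()

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    steps_per_epoch = len(x_train) // batch_size

    strategy = ipu.ipu_strategy.IPUStrategy()
    with strategy.scope():
        # Creating and compiling the model inside the scope targets the IPU.
        model = build_model()
        model.compile(
            optimizer="sgd",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
            # Assumed choice: run a whole epoch per host-to-IPU round trip;
            # this value should evenly divide steps_per_epoch.
            steps_per_execution=steps_per_epoch,
        )
        model.fit(make_dataset(x_train, y_train, batch_size),
                  epochs=epochs, steps_per_epoch=steps_per_epoch)
```

On an ACES IPU node with the Poplar SDK environment loaded, calling `train_on_ipu()` would run the sketch; on a machine with stock TensorFlow, only `build_model` and `make_dataset` can be exercised.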
The course consists of four sessions:
- Intro to IPU (30 minutes)
  We will introduce Graphcore, the IPU architecture, and the IPU system on the TAMU ACES platform.
- Demo on ACES (30 minutes)
  We will demonstrate how to run models from different frameworks on the ACES IPU system.
- TensorFlow on IPU (30 minutes)
  We will learn to convert a Keras MNIST classification model to run on the IPU.
- PyTorch on IPU (30 minutes)
  We will learn to convert a PyTorch Fashion-MNIST classification model to run on the IPU.
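For the PyTorch session, the central change with Graphcore's PopTorch library is to wrap an ordinary `torch.nn.Module` with `poptorch.trainingModel` and to compute the loss inside `forward()`, so the whole training step can be compiled for the IPU. The sketch below illustrates that pattern under stated assumptions: the model architecture, the names `FashionMNISTClassifier` and `train_on_ipu`, the learning rate, the batch size, and the `data` download directory are hypothetical and not taken from the course materials. `poptorch` ships with the Poplar SDK, so it is imported inside `train_on_ipu()`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FashionMNISTClassifier(nn.Module):
    """Hypothetical small CNN for 28x28 grayscale Fashion-MNIST images (10 classes)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * 14 * 14, 10)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        x = F.max_pool2d(F.relu(self.conv(x)), 2)  # -> (N, 16, 14, 14)
        logits = self.fc(torch.flatten(x, 1))
        if labels is None:
            return logits
        # For PopTorch training, the loss is computed inside forward() and
        # returned with the outputs so the full step compiles onto the IPU.
        return logits, self.loss_fn(logits, labels)


def train_on_ipu(epochs=1, batch_size=32, lr=0.01):
    # Imported here so the model above can be used without the Poplar SDK.
    import poptorch
    import torchvision
    import torchvision.transforms as T

    opts = poptorch.Options()
    dataset = torchvision.datasets.FashionMNIST(
        "data", train=True, download=True, transform=T.ToTensor())
    loader = poptorch.DataLoader(opts, dataset, batch_size=batch_size, shuffle=True)

    model = FashionMNISTClassifier()
    model.train()
    optimizer = poptorch.optim.SGD(model.parameters(), lr=lr)
    ipu_model = poptorch.trainingModel(model, options=opts, optimizer=optimizer)

    for epoch in range(epochs):
        for images, labels in loader:
            _, loss = ipu_model(images, labels)  # one compiled training step
        print(f"epoch {epoch}: last batch loss {float(loss.mean()):.4f}")
    ipu_model.detachFromDevice()
```

Returning only the logits when no labels are passed lets the same module also be wrapped with `poptorch.inferenceModel` for evaluation.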
See: https://hprc.tamu.edu/aces/
Note: This is a training session that will take place on the ACES cluster. Participants should log in and follow along with the instructor to complete the hands-on exercises.