Intel PVC GPUs
Introduction
The ACES cluster has a total of 120 Intel Data Center GPU Max 1100 GPUs. Throughout this documentation, these GPUs are referred to as the Intel PVC GPUs.
Accessing Intel PVC GPUs
Interactive
Access a compute node interactively using srun with the resource options --partition=pvc and --gres=gpu:pvc:<num_gpus>.
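For example, the following requests an interactive session with one PVC GPU for one hour (the time, memory, and task counts are illustrative; adjust them to your needs):
srun --partition=pvc --gres=gpu:pvc:1 --nodes=1 --ntasks=1 --mem=100GB --time=01:00:00 --pty bash -i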
Load all the necessary modules
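For example, using the same module versions as the job script later in this guide (run module avail to see the versions currently installed):
module purge
module load intel/2023.07
module load intel/AIKit/2023.2.0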
The intel/AIKit module comes with a conda binary and a few default conda environments in a shared space.
To see all the default environments, use the following command:
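conda env list
The shared environments, such as aikit-pt-gpu and aikit-tf-gpu, should appear in this list.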
For PyTorch, create a new environment by cloning the shared aikit-pt-gpu environment:
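conda create -n aikit-pt-gpu-clone --clone aikit-pt-gpu
(The clone name aikit-pt-gpu-clone simply mirrors the naming used in the Job Submission section below; any environment name works.)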
For TensorFlow, create a new environment by cloning the shared aikit-tf-gpu environment:
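conda create -n aikit-tf-gpu-clone --clone aikit-tf-gpu
(This is the same clone command used in the job script below.)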
Activate the conda environment:
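source activate aikit-tf-gpu-clone
(Use aikit-pt-gpu-clone instead for the PyTorch environment; source activate is the form used in the job script below.)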
Install any additional packages if required:
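pip install <package-name>
(The package name is a placeholder; install whatever your workflow needs into the cloned environment.)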
Run the Python script:
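python <name-of-script>.py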
Job Submission
Intel PVCs can be accessed via Slurm with the resource options --partition=pvc and --gres=gpu:pvc:<num_gpus>.
The following is an example of a job script for the TensorFlow environment. For PyTorch, replace aikit-tf-gpu with aikit-pt-gpu and aikit-tf-gpu-clone with aikit-pt-gpu-clone.
#!/bin/bash
##NECESSARY JOB SPECIFICATIONS
#SBATCH --job-name=tf_demo
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --output=tf_demo.%j
#SBATCH --mem=100GB
#SBATCH --gres=gpu:pvc:1
#SBATCH --partition=pvc
# load all the necessary modules
module purge
module load intel/2023.07
module load intel/AIKit/2023.2.0
ENV_NAME=aikit-tf-gpu-clone
# If it doesn't exist, create the environment
if ! conda env list | grep -q "$ENV_NAME"; then
    conda create -n "$ENV_NAME" --clone aikit-tf-gpu -y   # -y skips the confirmation prompt (needed in non-interactive batch jobs)
fi
# activate the conda environment
source activate $ENV_NAME
# change directory to your script
cd $SCRATCH/<path-to-your-script>
# executable command
python <name-of-script>.py
Monitor utilization
Launch a VNC interactive job through the portal. Make sure to select Intel GPU Max (PVC) as the node type.
Take note of the compute node that is assigned to your job.
Follow the Slurm guide to create a job script. Use the same compute node that was assigned earlier via the --nodelist option. For example, if the compute node is ac026, then the job script would list the following resources:
#SBATCH --job-name=tf_demo
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --output=tf_demo.%j
#SBATCH --mem=100GB
#SBATCH --gres=gpu:pvc:1
#SBATCH --partition=pvc
#SBATCH --nodelist=ac026 # use the same compute node here
Submit the job using the sbatch command.
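For example (the job script filename is a placeholder):
sbatch <name-of-jobscript>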
There are two commands to monitor GPU utilization, which are detailed in the following sections.
sysmon
$ sysmon -h
Usage: ./sysmon [options]
Options:
--processes [-p] Print short device information and running processes (default)
--list [-l] Print list of devices and subdevices
--details [-d] Print detailed information for all of the devices and subdevices
--help [-h] Print help message
--version Print version
This utility provides basic information about the GPUs on a node, including the list of processes currently attached to each GPU.
The process mode (the default) prints short information about all the available GPUs and their running processes. Example output of process mode:
$ sysmon
=====================================================================================
GPU 0: Intel(R) Data Center GPU Max 1100 PCI Bus: 0000:1b:00.0
Vendor: Intel(R) Corporation Driver Version: 1.3.26516 Subdevices: 0
EU Count: 448 Threads Per EU: 8 EU SIMD Width: 16 Total Memory(MB): 46679.2
Core Frequency(MHz): 1400.0 of 1550.0 Core Temperature(C): unknown
=====================================================================================
Running Processes: 4
PID, Device Memory Used(MB), Shared Memory Used(MB), GPU Engines, Executable
4809, 2.2, 0.0, COMPUTE, /usr/bin/xpumd
2651639, 5.2, 0.0, COMPUTE;DMA, python
2651661, 46213.8, 0.0, COMPUTE;DMA, python
2652076, 2.2, 0.0, UNKNOWN, sysmon
=====================================================================================
GPU 1: Intel(R) Data Center GPU Max 1100 PCI Bus: 0000:21:00.0
Vendor: Intel(R) Corporation Driver Version: 1.3.26516 Subdevices: 0
EU Count: 448 Threads Per EU: 8 EU SIMD Width: 16 Total Memory(MB): 46679.2
Core Frequency(MHz): 200.0 of 1550.0 Core Temperature(C): unknown
=====================================================================================
Running Processes: 4
PID, Device Memory Used(MB), Shared Memory Used(MB), GPU Engines, Executable
4809, 2.2, 0.0, COMPUTE, /usr/bin/xpumd
2651639, 0.4, 0.0, COMPUTE, python
2651661, 6.9, 0.0, COMPUTE, python
2652076, 2.2, 0.0, UNKNOWN, sysmon
=====================================================================================
GPU 2: Intel(R) Data Center GPU Max 1100 PCI Bus: 0000:29:00.0
Vendor: Intel(R) Corporation Driver Version: 1.3.26516 Subdevices: 0
EU Count: 448 Threads Per EU: 8 EU SIMD Width: 16 Total Memory(MB): 46679.2
Core Frequency(MHz): 200.0 of 1550.0 Core Temperature(C): unknown
=====================================================================================
Running Processes: 4
PID, Device Memory Used(MB), Shared Memory Used(MB), GPU Engines, Executable
4809, 2.2, 0.0, COMPUTE, /usr/bin/xpumd
2651639, 0.4, 0.0, COMPUTE, python
2651661, 6.9, 0.0, COMPUTE, python
2652076, 2.2, 0.0, UNKNOWN, sysmon
=====================================================================================
GPU 3: Intel(R) Data Center GPU Max 1100 PCI Bus: 0000:2d:00.0
Vendor: Intel(R) Corporation Driver Version: 1.3.26516 Subdevices: 0
EU Count: 448 Threads Per EU: 8 EU SIMD Width: 16 Total Memory(MB): 46679.2
Core Frequency(MHz): 200.0 of 1550.0 Core Temperature(C): unknown
=====================================================================================
Running Processes: 4
PID, Device Memory Used(MB), Shared Memory Used(MB), GPU Engines, Executable
4809, 2.2, 0.0, COMPUTE, /usr/bin/xpumd
2651639, 0.4, 0.0, COMPUTE, python
2651661, 4.4, 0.0, COMPUTE, python
2652076, 2.2, 0.0, UNKNOWN, sysmon
To monitor the usage periodically, pair it with Linux's watch utility:
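watch -n 5 sysmon
The 5-second refresh interval is just an example; adjust the -n value as desired.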
xpumcli
See the XPU Manager CLI help info:
$ xpumcli -h
Intel XPU Manager Command Line Interface -- v1.2
Intel XPU Manager Command Line Interface provides the Intel data center GPU model and monitoring capabilities. It can also be used to change the Intel data center GPU settings and update the firmware.
Intel XPU Manager is based on Intel oneAPI Level Zero. Before using Intel XPU Manager, the GPU driver and Intel oneAPI Level Zero should be installed rightly.
Supported devices:
- Intel Data Center GPU
Usage: xpumcli [Options]
xpumcli -v
xpumcli -h
xpumcli discovery
Options:
-h,--help Print this help message and exit
-v,--version Display version information and exit.
Subcommands:
discovery Discover the GPU devices installed on this machine and provide the device info.
topology Get the system topology.
group Group the managed GPU devices.
diag Run some test suites to diagnose GPU.
health Get the GPU device component health status.
policy Get and set the GPU policies.
updatefw Update GPU firmware
config Get and change the GPU settings.
topdown Expected feature.
ps List status of processes.
vgpu Create and remove virtual GPUs in SRIOV configuration.
stats List the GPU aggregated statistics since last execution of this command or XPU Manager daemon is started.
dump Dump device statistics data.
log Collect GPU debug logs.
agentset Get or change some XPU Manager settings.
amcsensor List the AMC real-time sensor readings.
A full list of available metrics can be found with the dump command.
$ xpumcli dump
Dump device statistics data.
Usage: xpumcli dump [Options]
xpumcli dump -d [deviceIds] -t [deviceTileIds] -m [metricsIds] -i [timeInterval] -n [dumpTimes]
xpumcli dump --rawdata --start -d [deviceId] -t [deviceTileId] -m [metricsIds]
xpumcli dump --rawdata --list
xpumcli dump --rawdata --stop [taskId]
Options:
-h,--help Print this help message and exit
-j,--json Print result in JSON format
-d,--device The device IDs or PCI BDF addresses to query. The value of "-1" means all devices.
-t,--tile The device tile IDs to query. If the device has only one tile, this parameter should not be specified.
-m,--metrics Metrics type to collect raw data, options. Separated by the comma.
0. GPU Utilization (%), GPU active time of the elapsed time, per tile
1. GPU Power (W), per tile
2. GPU Frequency (MHz), per tile
3. GPU Core Temperature (Celsius Degree), per tile
4. GPU Memory Temperature (Celsius Degree), per tile
5. GPU Memory Utilization (%), per tile
6. GPU Memory Read (kB/s), per tile
7. GPU Memory Write (kB/s), per tile
8. GPU Energy Consumed (J), per tile
9. GPU EU Array Active (%), the normalized sum of all cycles on all EUs that were spent actively executing instructions. Per tile.
10. GPU EU Array Stall (%), the normalized sum of all cycles on all EUs during which the EUs were stalled. Per tile.
At least one thread is loaded, but the EU is stalled. Per tile.
11. GPU EU Array Idle (%), the normalized sum of all cycles on all cores when no threads were scheduled on a core. Per tile.
12. Reset Counter, per tile.
13. Programming Errors, per tile.
14. Driver Errors, per tile.
15. Cache Errors Correctable, per tile.
16. Cache Errors Uncorrectable, per tile.
17. GPU Memory Bandwidth Utilization (%)
18. GPU Memory Used (MiB)
19. PCIe Read (kB/s), per GPU
20. PCIe Write (kB/s), per GPU
21. Xe Link Throughput (kB/s), a list of tile-to-tile Xe Link throughput.
22. Compute engine utilizations (%), per tile.
23. Render engine utilizations (%), per tile.
24. Media decoder engine utilizations (%), per tile.
25. Media encoder engine utilizations (%), per tile.
26. Copy engine utilizations (%), per tile.
27. Media enhancement engine utilizations (%), per tile.
28. 3D engine utilizations (%), per tile.
29. GPU Memory Errors Correctable, per tile. Other non-compute correctable errors are also included.
30. GPU Memory Errors Uncorrectable, per tile. Other non-compute uncorrectable errors are also included.
31. Compute engine group utilization (%), per tile.
32. Render engine group utilization (%), per tile.
33. Media engine group utilization (%), per tile.
34. Copy engine group utilization (%), per tile.
35. Throttle reason, per tile.
36. Media Engine Frequency (MHz), per tile
-i The interval (in seconds) to dump the device statistics to screen. Default value: 1 second.
-n Number of the device statistics dump to screen. The dump will never be ended if this parameter is not specified.
--rawdata Dump the required raw statistics to a file in background.
--start Start a new background task to dump the raw statistics to a file. The task ID and the generated file path are returned.
--stop Stop one active dump task.
--list List all the active dump tasks.
Usage example for dump command:
$ xpumcli dump -d 0 -m 0,1,2,3,4,5
Timestamp, DeviceId, GPU Utilization (%), GPU Power (W), GPU Frequency (MHz), GPU Core Temperature (Celsius Degree), GPU Memory Temperature (Celsius Degree), GPU Memory Utilization (%)
11:01:41.000, 0, 0.00, 27.90, 0, 25.00, 21.00, 0.05
11:01:42.000, 0, 0.00, 28.87, 0, 24.50, 21.00, 0.05
11:01:43.000, 0, 0.00, 28.76, 0, 25.00, 21.00, 0.05
11:01:44.000, 0, 0.00, 28.77, 0, 24.00, 21.00, 0.05
11:01:45.000, 0, 0.00, 28.78, 0, 23.50, 21.00, 0.05
11:01:46.000, 0, 0.00, 28.84, 0, 24.00, 21.00, 0.05
11:01:47.000, 0, 0.00, 28.73, 0, 24.50, 21.00, 0.05
show_pvc_features
This script shows the current arrangement of nodes with PVCs composed over Liqid PCIe fabrics, with and without Xe Link bridges. Note that there are now PVCs in both PCIe Gen4 and Gen5 fabrics. All nodes with Xe Link bridges are in Gen5 fabrics.
$ show_pvc_features
HOSTNAME AVAIL_FEATURES GRES STATE
ac010 gen4_fabric gpu:pvc:4 mixed
ac011 gen4_fabric gpu:pvc:4 mixed
ac012 gen4_fabric gpu:pvc:4 mixed
ac013 gen4_fabric gpu:pvc:4 mixed
ac023 gen4_fabric gpu:pvc:4 idle
ac024 gen4_fabric gpu:pvc:8 idle
ac025 gen4_fabric gpu:pvc:4 mixed
ac026 gen5_fabric gpu:pvc:6 reserved
ac030 gen5_fabric gpu:pvc:8 reserved
ac034 gen5_fabric gpu:pvc:4 reserved
ac039 gen5_fabric gpu:pvc:4 reserved
ac050 gen5_nonfabric gpu:pvc:2 mixed
ac051 gen5_nonfabric gpu:pvc:2 drained*
ac062 gen4_fabric gpu:pvc:4 mixed
ac068 gen4_fabric gpu:pvc:8 idle
ac078 gen4_fabric gpu:pvc:4 mixed
ac079 gen4_fabric gpu:pvc:4 mixed
ac081 gen5_fabric,xelink4 gpu:pvc:4 reserved
ac082 gen5_fabric,xelink2 gpu:pvc:2 reserved
ac083 gen5_fabric gpu:pvc:2 reserved
ac085 gen5_fabric,xelink4 gpu:pvc:4 reserved
ac086 gen5_fabric,xelink2 gpu:pvc:2 reserved
ac087 gen5_fabric,xelink2 gpu:pvc:2 reserved
ac089 gen5_fabric,xelink4 gpu:pvc:4 reserved
ac094 gen5_fabric,xelink2 gpu:pvc:2 reserved
ac095 gen5_fabric,xelink2 gpu:pvc:2 reserved
ac097 gen5_fabric,xelink2 gpu:pvc:2 reserved
ac099 gen5_fabric,xelink4 gpu:pvc:4 reserved
ac100 gen5_fabric,xelink2 gpu:pvc:2 allocated
ac101 gen5_fabric gpu:pvc:4 reserved
ac102 gen5_fabric gpu:pvc:4 reserved
ac103 gen5_fabric,xelink2 gpu:pvc:2 idle
Use the --constraint=xelink2 or --constraint=xelink4 sbatch option to request a node with a 2-way or 4-way Xe Link bridge.
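For example, adding the following directives to a job script requests four PVCs on a node with a 4-way Xe Link bridge (per the listing above, xelink4 nodes expose gpu:pvc:4):
#SBATCH --partition=pvc
#SBATCH --gres=gpu:pvc:4
#SBATCH --constraint=xelink4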