CUDA Programming


In order to compile, run, and debug CUDA programs, a CUDA module must be loaded:

[ netID@terra3 ~]$ module load CUDA/8.0.44

For more information on the modules system, please see our Modules System page.

Compiling CUDA C/C++ with NVIDIA nvcc

The compiler nvcc is the NVIDIA CUDA C/C++ compiler. The command line for invoking it is:

[ netID@terra3 ~]$ nvcc [options] -o cuda_prog.exe file1 file2 ...

where file1, file2, ... are any appropriate source, assembly, object, object library, or other (linkable) files that are linked to generate the executable file cuda_prog.exe.

The GPUs on Terra are dual-GPU Tesla K80s, which are compute capability 3.7 devices. When compiling your code, specify the matching architecture:

[ netID@terra3 ~]$ nvcc -arch=compute_37 -code=sm_37 ...

By default, nvcc uses gcc as the host compiler. However, we recommend using the Intel compiler instead, by adding the flag -ccbin=icc to your compile command.

For more information on nvcc, please refer to the online manual.
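
As a rough end-to-end illustration of the compile step, the following sketch writes a tiny CUDA source file and builds it for the K80. The file name cuda_prog.cu, the kernel name add_one, and the array size are hypothetical choices made only for this example. The nvcc flags are the ones described in this section. If nvcc is not on the PATH, the compile is skipped and a reminder is printed.

```shell
# Write a minimal CUDA source file (illustrative example; all names are hypothetical).
cat > cuda_prog.cu <<'EOF'
#include <cstdio>

// Each thread increments one array element.
__global__ void add_one(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main()
{
    const int n = 256;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    // Copy the array to the device, run the kernel, and copy the result back.
    int *dev = NULL;
    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    add_one<<<(n + 127) / 128, 128>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %d, host[%d] = %d\n", host[0], n - 1, host[n - 1]);
    return 0;
}
EOF

# Compile for compute capability 3.7 (K80) if nvcc is available.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -arch=compute_37 -code=sm_37 -o cuda_prog.exe cuda_prog.cu
else
    echo "nvcc not found: load a CUDA module first, e.g. module load CUDA/8.0.44"
fi
```

For brevity the example omits CUDA error checking; production code should check the return value of every CUDA runtime call.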

Running CUDA Programs

Only one login node on Terra (terra3) has a GPU installed: a single dual-GPU K80. To check the current load on the device, run the NVIDIA System Management Interface program, nvidia-smi. Its output shows which GPU device your code is running on, how much device memory is in use, and the GPU utilization.

[ netID@terra3 ~]$ nvidia-smi
Fri Feb 10 11:44:30 2017       
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           On   | 0000:83:00.0     Off |                  Off |
| N/A   27C    P8    26W / 149W |      0MiB / 12205MiB |      0%      Default |
|   1  Tesla K80           On   | 0000:84:00.0     Off |                  Off |
| N/A   32C    P8    29W / 149W |      0MiB / 12205MiB |      0%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|  No running processes found                                                 |

You can test your CUDA program on the login node as long as you abide by the rules stated in Computing Environment. For production runs, submit a batch job to run your code on the compute nodes. Terra has 48 compute nodes, each with one dual-GPU K80 and 128 GB of host memory. To be placed on a GPU node with available GPUs, a job must request them with the following two lines in its job file:

#SBATCH --gres=gpu:1                 #Request 1 GPU
#SBATCH --partition=gpu              #Request the GPU partition/queue
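
To show where these two lines fit, here is a minimal sketch that writes a complete job file. Only the --gres and --partition lines and the module name come from this page. The file name cuda_job.slurm, the job name, the time and memory limits, the output file pattern, and the executable name cuda_prog.exe are assumptions chosen for illustration and should be adjusted to your own job.

```shell
# Create a small Slurm job file (resource values below are illustrative assumptions).
cat > cuda_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=cuda_test         #Hypothetical job name
#SBATCH --time=00:10:00              #Assumed wall-clock limit (10 minutes)
#SBATCH --ntasks=1                   #Assumed: a single task
#SBATCH --mem=2G                     #Assumed host memory request
#SBATCH --output=cuda_test.%j        #Output file; %j expands to the job ID
#SBATCH --gres=gpu:1                 #Request 1 GPU
#SBATCH --partition=gpu              #Request the GPU partition/queue

# Load the same CUDA module that was used to compile the program.
module load CUDA/8.0.44

# Record which GPU the job was given, then run the program.
nvidia-smi
./cuda_prog.exe
EOF
```

The job file can then be submitted from a login node with sbatch cuda_job.slurm.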

Debugging CUDA Programs

CUDA programs must be compiled with "-g -G" to generate debugging information for both host and device code; -G also disables optimization of device code (O0). To generate debugging code for the K80, compile and link with the following:

[ netID@terra3 ~]$ nvcc -g -G -arch=compute_37 -code=sm_37 cuda_prog.cu -o cuda_prog.out

The resulting executable can then be run under cuda-gdb, the NVIDIA CUDA debugger. For more information on cuda-gdb, please refer to its online manual.
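
As a hedged sketch of a first debugging session, the snippet below writes a short cuda-gdb command file and runs it in batch mode against the debug build. The command file name debug_cmds.gdb and the kernel name add_one are hypothetical; replace add_one with a kernel from your own program. The run is skipped with a message if cuda-gdb or the executable is not available.

```shell
# Write a cuda-gdb command file (kernel name add_one is a placeholder).
cat > debug_cmds.gdb <<'EOF'
# Stop when the (hypothetical) kernel add_one starts executing on the GPU.
break add_one
run
# List the kernels currently active on the device.
info cuda kernels
# Show the CUDA thread that currently has debugger focus.
cuda thread
# Show the call stack at the breakpoint, then let the program finish.
backtrace
continue
EOF

# Run the session non-interactively if the debugger and the debug build exist.
if command -v cuda-gdb >/dev/null 2>&1 && [ -x ./cuda_prog.out ]; then
    cuda-gdb --batch -x debug_cmds.gdb ./cuda_prog.out
else
    echo "cuda-gdb or ./cuda_prog.out not found: load the CUDA module and compile with -g -G first"
fi
```

For interactive use, start cuda-gdb ./cuda_prog.out and type the same commands at the prompt.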