

Revision as of 13:37, 12 February 2015 by Pingluo (talk | contribs) (Debugging CUDA Programs)

CUDA Programs

Compiling CUDA C/C++ with NVIDIA nvcc

NVIDIA's CUDA compiler and libraries are accessible by loading the CUDA module:

 login1]$ module load CUDA

nvcc is the NVIDIA CUDA C/C++ compiler. The command line for invoking nvcc is:

 login1]$ nvcc [options] -o cuda_prog.exe file1 file2 ...

where file1, file2, ... are any appropriate source, assembly, object, object library, or other (linkable) files that are linked to generate the executable file cuda_prog.exe.

The CUDA devices on Ada are K20s, which are compute capability 3.5 devices. When compiling your code, you need to specify the target architecture:

 login1]$ nvcc -arch=compute_35 -code=sm_35 ...

By default, nvcc will use gcc to compile your host source code. It is better to use the Intel compiler, which you can select by adding '-ccbin=icc'.

For more information on nvcc, please refer to its online manual [1].
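As a quick check that the toolchain works, the following minimal vector-addition program can be compiled with the options above (the file name cuda_prog.cu and variable names are placeholders, not part of the site documentation):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch one thread per element, 256 threads per block.
    const int threads = 256;
    vecAdd<<<(n + threads - 1) / threads, threads>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("hc[0] = %f\n", hc[0]);  // expect 3.000000

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Compile it with, e.g., nvcc -arch=compute_35 -code=sm_35 -o cuda_prog.exe cuda_prog.cu and run the executable on a node with a K20.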

Running CUDA Programs

The CUDA module must be loaded before running any CUDA programs.

 login1]$ module load CUDA

Each of the 5 login nodes (login1, login2, ..., login5) on Ada is equipped with either one or two K20s. To find out how many K20s a node has and how busy they are, run the NVIDIA System Management Interface program nvidia-smi. This command reports which GPU device your code is running on, how much memory is in use on the device, and the GPU utilization.

 login5]$ nvidia-smi
 Wed Jan  7 11:16:05 2015       
 | NVIDIA-SMI 340.29     Driver Version: 340.29         |                       
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |   0  Tesla K20m          On   | 0000:20:00.0     Off |                    0 |
 | N/A   18C    P8    15W / 225W |     22MiB /  4799MiB |      0%      Default |
 |   1  Tesla K20m          On   | 0000:8B:00.0     Off |                    0 |
 | N/A   16C    P8    16W / 225W |     13MiB /  4799MiB |      0%      Default |
 | Compute processes:                                               GPU Memory |
 |  GPU       PID  Process name                                     Usage      |
 |   0       18950  ./a.out                                               7MiB |
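The device inventory can also be queried from within a CUDA program. The following sketch, a rough programmatic counterpart to the nvidia-smi listing, prints the name, compute capability, and memory of each visible GPU:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// List every GPU visible to the CUDA runtime on this node.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("GPU %d: %s (compute capability %d.%d, %zu MiB)\n",
               d, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem >> 20);
    }
    return 0;
}
```

On a login node with two K20s this should list two compute capability 3.5 devices.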

You can test your CUDA program on one or more of the login nodes as long as you abide by the rules stated in Computing Environment. For production runs, you should submit a batch job to run your code on the compute nodes. Ada has 20 compute nodes with dual K20s and 256GB (host) memory and 10 compute nodes with a single K20 and 64GB (host) memory. Your job needs to specify one of the following, in conjunction with other parameters, to secure one or more GPU nodes.

 Node Type Needed    Job Parameter to Use
 ----------------    --------------------
 Any GPU             -R "select[gpu]"
 64GB GPU            -R "select[gpu64gb]"
 256GB GPU           -R "select[gpu256gb]"

For example, the following job options will select one node with 256GB memory and dual K20s:

 #BSUB -n 20 -R "span[ptile=20]" -R "select[gpu256gb]"
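A complete job file built around these options might look like the following sketch (the job name, wall-clock limit, output file, and executable name are placeholders, not site requirements):

```shell
#BSUB -J cuda_test           # job name (placeholder)
#BSUB -n 20                  # 20 cores
#BSUB -R "span[ptile=20]"    # all 20 cores on one node
#BSUB -R "select[gpu256gb]"  # a 256GB node with dual K20s
#BSUB -W 0:30                # wall-clock limit (placeholder)
#BSUB -o cuda_test.%J.out    # output file (placeholder)

module load CUDA
./cuda_prog.exe
```

Submit it with bsub < jobfile as usual for LSF.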

Debugging CUDA Programs

The NVIDIA CUDA program debugging tool is cuda-gdb. The CUDA module must be loaded before using it.

 login1]$ module load CUDA

CUDA programs must be compiled with "-g -G" to force O0 optimization and to generate debugging information for both host and device code. To generate debugging code for K20, compile and link the code with

 login1]$ nvcc -g -G -arch=compute_35 -code=sm_35 cuda_prog.cu -o cuda_prog.out
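A debugging session then looks something like the following sketch (the kernel name vecAdd is a placeholder for whichever kernel you want to stop in):

```shell
 login1]$ cuda-gdb ./cuda_prog.out
 (cuda-gdb) break vecAdd       # set a breakpoint at a kernel
 (cuda-gdb) run                # run until the kernel launches
 (cuda-gdb) info cuda threads  # inspect active device threads
 (cuda-gdb) continue
```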

For more information on cuda-gdb, please refer to its manual [2].