

Revision as of 12:03, 17 February 2017 by Cryssb818 (talk | contribs) (Compiling CUDA C/C++ with NVIDIA nvcc)

CUDA Programs


In order to compile, run, and debug CUDA programs, a CUDA module must be loaded:

[ netID@ada ~]$ module load CUDA

For more information on the modules system, please see our Modules System page.

Compiling CUDA C/C++ with NVIDIA nvcc

nvcc is the NVIDIA CUDA C/C++ compiler. It is invoked as follows:

[ netID@ada ~]$ nvcc [options] -o cuda_prog.exe file1 file2 ...

where file1, file2, ... are any appropriate source, assembly, object, object library, or other (linkable) files that are linked to generate the executable file cuda_prog.exe.

The CUDA devices on Ada are K20s, which are compute capability 3.5 devices. When compiling your code, specify the matching architecture flags:

[ netID@ada ~]$ nvcc -arch=compute_35 -code=sm_35 ...

By default, nvcc uses gcc as the host compiler. We recommend using the Intel compiler instead, which you can select by adding the flag -ccbin=icc to your compile command.
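As a quick way to try the flags above, the following sketch writes a small test program and compiles it for the K20s. The file name vec_add.cu, the kernel vec_add, and the output name vec_add.exe are hypothetical and chosen only for illustration. The compile step runs only if nvcc is on your path (that is, after the CUDA module has been loaded).

```shell
# Write a minimal CUDA test program (hypothetical example; all names are illustrative).
cat > vec_add.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2.0f * i; }

    // Allocate device buffers and copy the inputs to the GPU.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %f\n", hc[10]);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
EOF

# Architecture flags for the K20 (compute capability 3.5), as given above.
# Append -ccbin=icc here to use the Intel compiler as the host compiler.
NVCC_FLAGS="-arch=compute_35 -code=sm_35"

# Compile only where nvcc is available.
if command -v nvcc >/dev/null 2>&1; then
    if ! nvcc $NVCC_FLAGS -o vec_add.exe vec_add.cu; then
        echo "compilation failed" >&2
    fi
else
    echo "nvcc not found; run 'module load CUDA' first" >&2
fi
```

If the build succeeds, ./vec_add.exe can be run on a node that has a GPU.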

For more information on nvcc, please refer to the online manual.

Running CUDA Programs

Only five login nodes on Ada (login1, login2, ..., login5) have K20s installed, either one or two per node. To find out how many K20s a node has and how heavily they are loaded, run the NVIDIA System Management Interface program, nvidia-smi. Its output shows which GPU device your code is running on, how much device memory is in use, and the GPU utilization.

[ netID@ada ~]$ nvidia-smi
 Wed Jan  7 11:16:05 2015       
 | NVIDIA-SMI 340.29     Driver Version: 340.29         |                       
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |   0  Tesla K20m          On   | 0000:20:00.0     Off |                    0 |
 | N/A   18C    P8    15W / 225W |     22MiB /  4799MiB |      0%      Default |
 |   1  Tesla K20m          On   | 0000:8B:00.0     Off |                    0 |
 | N/A   16C    P8    16W / 225W |     13MiB /  4799MiB |      0%      Default |
 | Compute processes:                                               GPU Memory |
 |  GPU       PID  Process name                                     Usage      |
 |   0       18950  ./a.out                                               7MiB |

You can test your CUDA program on one or more of the login nodes as long as you abide by the rules stated in Computing Environment. For production runs, you should submit a batch job to run your code on the compute nodes. Ada has 20 compute nodes with dual K20s and 256GB (host) memory and 10 compute nodes with a single K20 and 64GB (host) memory. Your job needs to specify one of the following, in conjunction with other parameters, to secure one or more GPU nodes.

Node Type Needed    Job Parameter to Use
Any GPU             -R "select[gpu]"
64GB GPU            -R "select[gpu64gb]"
256GB GPU           -R "select[gpu256gb]"

For example, the following job options will select one node with 256GB memory and dual K20s:

 #BSUB -n 20 -R "span[ptile=20]" -R "select[gpu256gb]"
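To show where these options fit, here is a minimal sketch that writes a complete job file. Only the resource request line comes from this page. The job name, wall-clock limit, output file name, and the file name cuda_job.lsf are assumptions made for illustration, and your jobs may need further options required on Ada.

```shell
# Write a sample LSF job file (hypothetical values except for the GPU resource request).
cat > cuda_job.lsf <<'EOF'
# Job name (assumed)
#BSUB -J cuda_test
# Use bash to initialize the job environment (assumed)
#BSUB -L /bin/bash
# Wall-clock limit of 30 minutes (assumed)
#BSUB -W 0:30
# Send stdout and stderr to cuda_test.<jobID> (assumed file name)
#BSUB -o cuda_test.%J
# One node with 256GB memory and dual K20s, as in the example above
#BSUB -n 20 -R "span[ptile=20]" -R "select[gpu256gb]"

module load CUDA

# Record which GPUs the job received, then run the program.
nvidia-smi
./cuda_prog.exe
EOF
```

The job file would then be submitted from a login node with bsub < cuda_job.lsf.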

Debugging CUDA Programs

To debug a CUDA program, compile it with "-g -G". These flags disable optimization (-O0) and generate debugging information for both the host and the device code. To generate debugging code for K20, compile and link the code with

[ netID@ada ~]$ nvcc -g -G -arch=compute_35 -code=sm_35 cuda_prog.cu -o cuda_prog.out

The resulting executable can then be run under the CUDA debugger, cuda-gdb. For more information on cuda-gdb, please refer to its online manual.
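As a rough sketch of a debugging session, the commands below could be placed in a command file and passed to cuda-gdb with the standard gdb options --batch and -x. The kernel name my_kernel and the file name debug_cmds.gdb are hypothetical; replace my_kernel with the name of a kernel in your own program. The debugger is started only if cuda-gdb and ./cuda_prog.out are both present.

```shell
# Debugger commands: stop at a kernel, list active kernels, print a backtrace.
# "my_kernel" is a placeholder for a kernel name in your program.
cat > debug_cmds.gdb <<'EOF'
break my_kernel
run
info cuda kernels
backtrace
EOF

# Run the session non-interactively where the debugger and the debug build exist.
if command -v cuda-gdb >/dev/null 2>&1 && [ -x ./cuda_prog.out ]; then
    cuda-gdb --batch -x debug_cmds.gdb ./cuda_prog.out
else
    echo "cuda-gdb or ./cuda_prog.out not available; skipping" >&2
fi
```

The same commands can also be typed at the (cuda-gdb) prompt after starting the debugger interactively with cuda-gdb ./cuda_prog.out.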