Ada:Compile:CUDA
CUDA Programs
Access
In order to compile, run, and debug CUDA programs, a CUDA module must be loaded:
[ netID@ada ~]$ module load CUDA
For more information on the modules system, please see our Modules System page.
Compiling CUDA C/C++ with NVIDIA nvcc
nvcc is the NVIDIA CUDA C/C++ compiler. It is invoked as follows:
[ netID@ada ~]$ nvcc [options] -o cuda_prog.exe file1 file2 ...
where file1, file2, ... are any appropriate source, assembly, object, object library, or other (linkable) files that are linked to generate the executable file cuda_prog.exe.
The CUDA devices on Ada are K20 GPUs, which are compute capability 3.5 devices. When compiling your code, specify the matching architecture flags:
[ netID@ada ~]$ nvcc -arch=compute_35 -code=sm_35 ...
By default, nvcc uses gcc to compile the host portion of your source code. We recommend using the Intel compiler instead, by adding the flag -ccbin=icc to your compile command.
For more information on nvcc, please refer to the online manual.
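The flags described above can be combined into a single command line. The following is a minimal shell sketch, not an official site script: the function name nvcc_cmd_for_k20 and the file names are hypothetical, and the function only prints the command so that it can be inspected before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print an nvcc command line for the K20
# (compute capability 3.5). It does not run the compiler.
# Usage: nvcc_cmd_for_k20 OUTPUT SOURCE...
nvcc_cmd_for_k20() {
    output="$1"
    shift
    # -arch selects the virtual architecture and -code the real GPU target;
    # -ccbin=icc asks nvcc to use the Intel compiler for host code.
    echo "nvcc -arch=compute_35 -code=sm_35 -ccbin=icc -o ${output} $*"
}

# Print the command for a single (hypothetical) source file.
nvcc_cmd_for_k20 cuda_prog.exe cuda_prog.cu
```

Once the printed line looks right, it can be pasted at the prompt after the CUDA module has been loaded.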
Running CUDA Programs
Only 5 login nodes (login1, login2, ..., login5) on Ada have GPUs installed, each with either one or two K20s. To find out how many K20s a node has and how heavily they are loaded, run the NVIDIA System Management Interface program, nvidia-smi. Its output shows which GPU device your code is running on, how much memory is in use on each device, and the GPU utilization.
[ netID@ada ~]$ nvidia-smi
Wed Jan  7 11:16:05 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.29     Driver Version: 340.29         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          On   | 0000:20:00.0     Off |                    0 |
| N/A   18C    P8    15W / 225W |     22MiB /  4799MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          On   | 0000:8B:00.0     Off |                    0 |
| N/A   16C    P8    16W / 225W |     13MiB /  4799MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     18950  ./a.out                                                7MiB |
+-----------------------------------------------------------------------------+
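On a node with two K20s, the memory usage reported by nvidia-smi can be used to pick the less busy device before a test run. The following is a minimal sketch, assuming the installed nvidia-smi supports the --query-gpu option (nvidia-smi --help-query-gpu lists the available fields). The function name least_used_gpu is hypothetical. CUDA_VISIBLE_DEVICES is the standard CUDA environment variable that restricts which devices a program can see.

```shell
#!/bin/sh
# Hypothetical helper: read lines of the form "index, used_MiB" on stdin
# and print the index of the GPU with the least memory in use.
least_used_gpu() {
    awk -F', *' '
        NR == 1 || $2 + 0 < min { min = $2 + 0; idx = $1 }
        END { if (NR > 0) print idx }
    '
}

# Query the GPUs only where nvidia-smi is available (for example on a GPU login node).
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu=$(nvidia-smi --query-gpu=index,memory.used \
                     --format=csv,noheader,nounits | least_used_gpu)
    if [ -n "$gpu" ]; then
        # Make only the selected device visible to CUDA programs started from this shell.
        export CUDA_VISIBLE_DEVICES="$gpu"
        echo "Using GPU $gpu"
    fi
fi
```

Memory in use is only a rough proxy for load; the GPU-Util column in the full nvidia-smi output is worth checking as well.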
You can test your CUDA program on one or more of the login nodes as long as you abide by the rules stated in Computing Environment. For production runs, you should submit a batch job to run your code on the compute nodes. Ada has 20 compute nodes with dual K20s and 256GB (host) memory and 10 compute nodes with a single K20 and 64GB (host) memory. Your job needs to specify one of the following, in conjunction with other parameters, to secure one or more GPU nodes.
Node Type Needed | Job Parameter to Use |
---|---|
Any GPU | -R "select[gpu]" |
64GB GPU | -R "select[gpu64gb]" |
256GB GPU | -R "select[gpu256gb]" |
For example, the following job options will select one node with 256GB memory and dual K20s:
#BSUB -n 20 -R "span[ptile=20]" -R "select[gpu256gb]"
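To show where these options fit, the following minimal sketch writes a complete job file and submits it. It is an illustration under stated assumptions, not a site-provided template: the job name, wall-clock limit, output file name, job file name cuda_job.lsf, and program name cuda_prog.exe are all placeholders, and your job may need additional parameters.

```shell
#!/bin/sh
# Write a sketch of an LSF job file for one dual-K20, 256GB node.
# Job name, time limit, output file, and program name are placeholders.
cat > cuda_job.lsf <<'EOF'
#BSUB -J cuda_test
#BSUB -n 20 -R "span[ptile=20]" -R "select[gpu256gb]"
#BSUB -W 0:30
#BSUB -o cuda_test.%J

module load CUDA
./cuda_prog.exe
EOF

# Submit only where LSF is available (for example on an Ada login node).
if command -v bsub >/dev/null 2>&1; then
    bsub < cuda_job.lsf
fi
```

Here -J names the job, -W sets the wall-clock limit in hours and minutes, and -o names the output file, with %J replaced by the job ID.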
Debugging CUDA Programs
CUDA programs must be compiled with "-g -G" to generate debugging information for both host and device code; the -G flag also forces O0 optimization of the device code. To generate debugging code for the K20, compile and link the code with
[ netID@ada ~]$ nvcc -g -G -arch=compute_35 -code=sm_35 cuda_prog.cu -o cuda_prog.out
The resulting executable can then be debugged with cuda-gdb, the NVIDIA CUDA debugger. For more information on cuda-gdb, please refer to its online manual.