Revision as of 12:03, 17 February 2017
CUDA Programming
Access
In order to compile, run, and debug CUDA programs, a CUDA module must be loaded:
[ netID@terra3 ~]$ module load CUDA
For more information on the modules system, please see our Modules System page.
Compiling CUDA C/C++ with NVIDIA nvcc
The compiler nvcc is the NVIDIA CUDA C/C++ compiler. The command line for invoking it is:
[ netID@terra3 ~]$ nvcc [options] -o cuda_prog.exe file1 file2 ...
where file1, file2, ... are any appropriate source, assembly, object, object library, or other (linkable) files that are linked to generate the executable file cuda_prog.exe.
The CUDA devices on Terra are dual-GPU K80s. The K80 is a compute capability 3.7 device, so when compiling your code you need to specify the target architecture:
[ netID@terra3 ~]$ nvcc -arch=compute_37 -code=sm_37 ...
By default, nvcc will use gcc to compile your source code. However, it is better to use the Intel compiler by adding the flag -ccbin=icc to your compile command.
For more information on nvcc, please refer to the online manual.
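To illustrate what the commands above compile, here is a minimal, hypothetical CUDA source file (the file name cuda_prog.cu and the kernel name vecAdd are illustrative, not part of the Terra documentation) that performs an element-wise vector addition on the GPU:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical example kernel: element-wise vector addition c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;                 // 1M elements
    const size_t bytes = n * sizeof(float);

    // Allocate and initialize host arrays.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device arrays and copy inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and spot-check one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

A file like this could be compiled for Terra's K80s with the flags shown above, e.g. nvcc -arch=compute_37 -code=sm_37 -o cuda_prog.exe cuda_prog.cu.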
Running CUDA Programs
Only one login node on Terra (terra3) has a dual-GPU K80 installed. To find out the load on the device, run the NVIDIA system management interface program nvidia-smi. This command reports which GPU device your code is running on, how much memory is used on each device, and the GPU utilization.
[ netID@terra3 ~]$ nvidia-smi
Fri Feb 10 11:44:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:83:00.0     Off |                  Off |
| N/A   27C    P8    26W / 149W |      0MiB / 12205MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:84:00.0     Off |                  Off |
| N/A   32C    P8    29W / 149W |      0MiB / 12205MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
You can test your CUDA program on the login node as long as you abide by the rules stated in Computing Environment. For production runs, you should submit a batch job to run your code on the compute nodes. Terra has 48 GPU compute nodes, each with one dual-GPU K80 and 128 GB of (host) memory. To be placed on a GPU node with an available GPU, a job must request one with the following two lines in its job file.
#SBATCH --gres=gpu:1        #Request 1 GPU
#SBATCH --partition=gpu     #Request the GPU partition/queue
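Putting these directives together, a complete job file for a single-GPU run might look like the following sketch; the job name, time limit, and executable name are illustrative assumptions, not values prescribed by this page:

```shell
#!/bin/bash
#SBATCH --job-name=cuda_test     # illustrative job name
#SBATCH --time=00:10:00          # illustrative 10-minute walltime
#SBATCH --ntasks=1               # one task on one node
#SBATCH --gres=gpu:1             # Request 1 GPU
#SBATCH --partition=gpu          # Request the GPU partition/queue

# Load the CUDA runtime environment, then run the executable
# built earlier with nvcc (name assumed for illustration).
module load CUDA
./cuda_prog.exe
```

The script would be submitted with sbatch from the command line.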
Debugging CUDA Programs
CUDA programs must be compiled with the "-g -G" flags, which disable optimization (O0) and generate code with debugging information. To generate debugging code for the K80, compile and link the code with the following:
[ netID@terra3 ~]$ nvcc -g -G -arch=compute_37 -code=sm_37 cuda_prog.cu -o cuda_prog.out
The resulting executable can then be debugged with cuda-gdb, the NVIDIA CUDA debugger. For more information on cuda-gdb, please refer to its online manual.