Run MEX-Functions Containing CUDA Code
Write a MEX-File Containing CUDA Code
As with all MEX-files, those containing CUDA® code have a single entry point, known as mexFunction. The MEX-function contains the host-side code that interacts with gpuArray objects from MATLAB® and launches the CUDA code. The CUDA code in the MEX-file must conform to the CUDA runtime API.
You should call the function mxInitGPU at the entry to your MEX-file. This ensures that the GPU device is properly initialized and known to MATLAB.
The interface you use to write a MEX-file for gpuArray objects is different from the MEX interface for standard MATLAB arrays.
You can see an example of a MEX-file containing CUDA code at:
This file contains the following CUDA device function:
void __global__ TimesTwo(double const * const A,
                         double * const B,
                         int const N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        B[i] = 2.0 * A[i];
}
It contains the following lines to determine the array size and launch a grid of the proper size:
N = (int)(mxGPUGetNumberOfElements(A));
blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
TimesTwo<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, N);
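The launch arithmetic above is a ceiling division: it rounds N / threadsPerBlock up so that every element gets a thread, and the `if (i < N)` guard in the kernel makes the surplus threads in the last block do nothing. The following host-only C++ sketch reproduces that arithmetic and emulates the kernel's index mapping on the CPU, so the logic can be checked without a GPU. The names `blocks_per_grid` and `times_two_emulated` are hypothetical helpers for illustration; they are not part of the example file or of the MEX API.

```cpp
#include <cassert>
#include <vector>

// Ceiling division, as in the launch code above: the smallest number of
// blocks whose threads cover all n elements. (Hypothetical helper.)
int blocks_per_grid(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// CPU emulation of the TimesTwo grid: visit every (block, thread) pair,
// compute the same global index the kernel computes, and apply the same
// bounds guard. (Hypothetical helper; runs serially on the host.)
std::vector<double> times_two_emulated(const std::vector<double>& a,
                                       int threads_per_block) {
    const int n = static_cast<int>(a.size());
    std::vector<double> b(a.size(), 0.0);
    const int blocks = blocks_per_grid(n, threads_per_block);
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            // Same formula as blockDim.x * blockIdx.x + threadIdx.x.
            const int i = threads_per_block * block + thread;
            if (i < n) {  // surplus threads in the last block are skipped
                b[i] = 2.0 * a[i];
            }
        }
    }
    return b;
}
```

For instance, with 256 threads per block, 16 elements need one block, while 1000 elements need four blocks (1024 threads, of which 24 fail the bounds check and stay idle).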
Run the Resulting MEX-Functions
The MEX-function in this example multiplies every element in the input array by 2 to get the values in the output array. To test it, start with a gpuArray in which every element is 1:
x = ones(4,4,'gpuArray');
y = mexGPUExample(x)
y =
     2     2     2     2
     2     2     2     2
     2     2     2     2
     2     2     2     2
Both the input and output arrays are gpuArray objects:
disp(['class(x) = ',class(x),', class(y) = ',class(y)])
class(x) = gpuArray, class(y) = gpuArray
Comparison to a CUDA Kernel
Parallel Computing Toolbox™ also supports CUDAKernel objects that can be used to integrate CUDA code with MATLAB. You can create CUDAKernel objects using CU and PTX files. Generally, using MEX-files is more flexible than using CUDAKernel objects:
MEX-files can include calls to host-side libraries, including NVIDIA® libraries such as the NVIDIA Performance Primitives (NPP) or cuFFT libraries. MEX-files can also contain calls from the host to functions in the CUDA runtime library.
MEX-files can analyze the size of the input and allocate memory of a different size, or launch grids of a different size, from C or C++ code. In comparison, MATLAB code that calls CUDAKernel objects must preallocate output memory and determine the grid size.
Access Complex Data
Complex data on a GPU device is stored in interleaved complex format. That is, for a complex gpuArray A, the real and imaginary parts of element i are stored in consecutive addresses. MATLAB uses CUDA built-in vector types to store complex data on the device (see the NVIDIA CUDA C Programming Guide).
Depending on the needs of your kernel, you can cast the pointer to complex data either as the real type or as the built-in vector type. For example, in MATLAB, suppose you create the following matrix:
a = complex(ones(4,'gpuArray'),ones(4,'gpuArray'));
If you pass a gpuArray to a MEX-function as the first argument (prhs[0]), then you can get a pointer to the complex data by using the calls:
mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]);
mwSize numel_complex = mxGPUGetNumberOfElements(A);
double2 const * d_A = (double2 const *)(mxGPUGetDataReadOnly(A));
To treat the array as a real double-precision array of twice the length, use the calls:
mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]);
mwSize numel_real = 2 * mxGPUGetNumberOfElements(A);
double const * d_A = (double const *)(mxGPUGetDataReadOnly(A));
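Both casts work because the interleaved layout places the real part of each element directly before its imaginary part. The host-only C++ sketch below illustrates the same idea with `std::complex<double>`, whose array layout the C++ standard guarantees to be equivalent to pairs of doubles. It stands in for the device-side `double2` as an analogy only and does not use the MEX API. The helper names `as_real_view`, `component_at`, and `sum_all_components` are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// View a buffer of n complex values as 2*n real values without copying.
// For std::complex<T> arrays the standard permits this cast: element k has
// its real part at index 2*k and its imaginary part at index 2*k + 1.
const double* as_real_view(const std::vector<std::complex<double>>& z) {
    return reinterpret_cast<const double*>(z.data());
}

// Read one entry of the real view, with a bounds check against the
// doubled length (the analogue of numel_real above).
double component_at(const std::vector<std::complex<double>>& z, std::size_t k) {
    assert(k < 2 * z.size());
    return as_real_view(z)[k];
}

// Sum every real and imaginary component by walking the real view, the way
// a kernel might when it treats the data as a real array of twice the length.
double sum_all_components(const std::vector<std::complex<double>>& z) {
    const std::size_t numel_real = 2 * z.size();
    double total = 0.0;
    for (std::size_t k = 0; k < numel_real; ++k) {
        total += component_at(z, k);
    }
    return total;
}
```

Applied to a host copy of the 4-by-4 matrix `a` created above (16 elements, each 1 + 1i), the real view has 32 entries that are all 1.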
Various functions exist to convert data between complex and real formats on the GPU. These operations require a copy to interleave the data. The function mxGPUCreateComplexGPUArray takes two real mxGPUArrays and interleaves their elements to produce a single complex mxGPUArray of the same length. The functions mxGPUCopyReal and mxGPUCopyImag each copy either the real or the imaginary elements into a new real mxGPUArray. (There is no equivalent of the mxGetImagData function for mxGPUArray objects.)
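A copy is unavoidable here because separate real and imaginary arrays have to be merged into, or extracted from, a single interleaved buffer. As a rough host-side analogy of what these conversions do to the data layout, the C++ sketch below interleaves two real vectors and then extracts each part again. The helper names `interleave`, `copy_strided`, `copy_real`, and `copy_imag` are hypothetical; they operate on `std::vector` in host memory, not on mxGPUArray objects, and say nothing about how the MEX functions are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build an interleaved buffer [re0, im0, re1, im1, ...] from two real
// vectors of equal length. The result is a new allocation, i.e. a copy.
std::vector<double> interleave(const std::vector<double>& re,
                               const std::vector<double>& im) {
    assert(re.size() == im.size());
    std::vector<double> out(2 * re.size());
    for (std::size_t k = 0; k < re.size(); ++k) {
        out[2 * k] = re[k];
        out[2 * k + 1] = im[k];
    }
    return out;
}

// Copy every other value of an interleaved buffer into a new real vector,
// starting at offset 0 (real parts) or offset 1 (imaginary parts).
std::vector<double> copy_strided(const std::vector<double>& z, std::size_t offset) {
    assert(z.size() % 2 == 0 && offset < 2);
    std::vector<double> out(z.size() / 2);
    for (std::size_t k = 0; k < out.size(); ++k) {
        out[k] = z[2 * k + offset];
    }
    return out;
}

std::vector<double> copy_real(const std::vector<double>& z) { return copy_strided(z, 0); }
std::vector<double> copy_imag(const std::vector<double>& z) { return copy_strided(z, 1); }
```

Interleaving and then extracting is a round trip: `copy_real(interleave(re, im))` returns a vector equal to `re`, and `copy_imag` of the same buffer returns one equal to `im`.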
Compile a GPU MEX-File
Use the mexcuda command in MATLAB to compile a MEX-file containing the CUDA code. You can compile the example file using the command:
mexcuda mexGPUExample.cu
By default, the mexcuda function compiles the CUDA code using the NVIDIA nvcc compiler installed with MATLAB. To check which compiler mexcuda is using, use the -v flag for verbose output in the mexcuda command.
If mexcuda has trouble locating the NVIDIA compiler (nvcc) in your installed CUDA toolkit, the compiler might be installed in a non-default location. You can specify the location of nvcc on your system by storing it in the environment variable MW_NVCC_PATH. You can set this variable using the MATLAB setenv command. For example,
setenv('MW_NVCC_PATH','/usr/local/CUDA/bin')
Only a subset of Visual Studio® compilers is supported for mexcuda. For details, consult the NVIDIA toolkit documentation.
Install the CUDA Toolkit (Optional)
The CUDA Toolkit installed with MATLAB does not contain all libraries that are available in the CUDA Toolkit. If you want to use a specific library that is not installed with MATLAB, install the CUDA Toolkit.
Note
You do not need the Toolkit to run MATLAB functions on a GPU or to generate CUDA-enabled MEX-functions.
The CUDA Toolkit contains CUDA libraries and tools for compilation.
The Toolkit version that you should download depends on the version of MATLAB you are using. Check which version of the Toolkit is compatible with your version of MATLAB in the following table. The recommended best practice is to use the latest version of your supported Toolkit, including any updates and patches from NVIDIA.
MATLAB Release | CUDA Toolkit Version |
---|---|
R2023a | 11.8 |
R2022b | 11.2 |
R2022a | 11.2 |
R2021b | 11.0 |
R2021a | 11.0 |
R2020b | 10.2 |
R2020a | 10.1 |
R2019b | 10.1 |
R2019a | 10.0 |
R2018b | 9.1 |
R2018a | 9.0 |
R2017b | 8.0 |
R2017a | 8.0 |
R2016b | 7.5 |
R2016a | 7.5 |
R2015b | 7.0 |
R2015a | 6.5 |
R2014b | 6.0 |
R2014a | 5.5 |
R2013b | 5.0 |
R2013a | 5.0 |
R2012b | 4.2 |
R2012a | 4.0 |
R2011b | 4.0 |
For more information about the CUDA Toolkit and to download your supported version, see CUDA Toolkit Archive (NVIDIA).
See Also
mexcuda | CUDAKernel | mex