Run MEX-Functions Containing CUDA Code

Write a MEX-File Containing CUDA Code

Like any MEX-file, a MEX-file containing CUDA® code has a single entry point, known as mexFunction. The MEX-function contains the host-side code that interacts with gpuArray objects from MATLAB® and launches the CUDA code. The CUDA code in the MEX-file must conform to the CUDA runtime API.

You should call the function mxInitGPU at the entry to your MEX-file. This ensures that the GPU device is properly initialized and known to MATLAB.

The interface you use to write a MEX-file for gpuArray objects is different from the MEX interface for standard MATLAB arrays.

You can see an example of a MEX-file containing CUDA code at:

This file contains the following CUDA kernel (a __global__ function that runs on the device):

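void __global__ TimesTwo(double const * const A,
                         double * const B,
                         int const N)
{
    /* Global linear index of this thread, assuming a 1-D grid. */
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    /* Threads past the end of the array do nothing. */
    if (i < N) {
        B[i] = 2.0 * A[i];
    }
}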

It contains the following lines to determine the array size and launch a grid of the proper size:

N = (int)(mxGPUGetNumberOfElements(A));
blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
TimesTwo<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, N);
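The block count above is a ceiling division: it rounds N / threadsPerBlock up so that every element gets a thread, and the `if (i < N)` guard in the kernel idles the surplus threads in the last block. The following host-only C++ sketch checks that arithmetic at compile time. The helper names `blocksFor` and `idleThreads` and the block size of 256 are illustrative assumptions, not taken from the example file.

```cpp
// Host-only sketch of the grid-size arithmetic; no CUDA toolkit required.

// Smallest number of blocks such that blocks * threadsPerBlock >= n.
constexpr int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Threads in the launched grid whose index is >= n. The kernel's
// bounds check makes these threads return without touching memory.
constexpr int idleThreads(int n, int threadsPerBlock) {
    return blocksFor(n, threadsPerBlock) * threadsPerBlock - n;
}

// Assumed block size for illustration only.
constexpr int kThreadsPerBlock = 256;

// A 4-by-4 array (16 elements) fits in one block; 240 threads idle.
static_assert(blocksFor(16, kThreadsPerBlock) == 1, "one block covers 16 elements");
static_assert(idleThreads(16, kThreadsPerBlock) == 240, "surplus threads are guarded");

// One element past a full block forces a second block.
static_assert(blocksFor(256, kThreadsPerBlock) == 1, "exact fit needs no extra block");
static_assert(blocksFor(257, kThreadsPerBlock) == 2, "257 elements need two blocks");
```

Plain integer division, N / threadsPerBlock, would launch too few threads whenever N is not a multiple of the block size, which is why the numerator is padded by threadsPerBlock - 1.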

Run the Resulting MEX-Functions

The MEX-function in this example multiplies every element in the input array by 2 to get the values in the output array. To test it, start with a gpuArray in which every element is 1:

x = ones(4,4,'gpuArray');
y = mexGPUExample(x)
y = 

    2    2    2    2
    2    2    2    2
    2    2    2    2
    2    2    2    2

Both the input and output arrays are gpuArray objects:

disp(['class(x) = ',class(x),', class(y) = ',class(y)])
class(x) = gpuArray, class(y) = gpuArray

Comparison to a CUDA Kernel

Parallel Computing Toolbox™ also supports CUDAKernel objects that can be used to integrate CUDA code with MATLAB. You can create CUDAKernel objects using CU and PTX files. Generally, using MEX-files is more flexible than using CUDAKernel objects:

  • MEX-files can include calls to host-side libraries, including NVIDIA® libraries such as the NVIDIA Performance Primitives (NPP) or cuFFT libraries. MEX-files can also contain calls from the host to functions in the CUDA runtime library.

  • MEX-files can analyze the size of the input and allocate memory of a different size, or launch grids of a different size, from C or C++ code. In comparison, MATLAB code that calls CUDAKernel objects must preallocate output memory and determine the grid size.

Access Complex Data

Complex data on a GPU device is stored in interleaved complex format. That is, for a complex gpuArray A, the real and imaginary parts of element i are stored in consecutive addresses. MATLAB uses CUDA built-in vector types to store complex data on the device (see the NVIDIA CUDA C Programming Guide).

Depending on the needs of your kernel, you can cast the pointer to complex data either as the real type or as the built-in vector type. For example, in MATLAB, suppose you create the following matrix:

a = complex(ones(4,'gpuArray'),ones(4,'gpuArray'));

If you pass this gpuArray to a MEX-function as the first argument (prhs[0]), you can get a pointer to the complex data with the following calls:

mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]);
mwSize numel_complex = mxGPUGetNumberOfElements(A);
/* The data is read-only, so the pointer must be const-qualified. */
double2 const * d_A = (double2 const *)(mxGPUGetDataReadOnly(A));

To treat the array instead as a real double-precision array of twice the length, use:

mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]);
mwSize numel_real = 2 * mxGPUGetNumberOfElements(A);
/* The data is read-only, so the pointer must be const-qualified. */
double const * d_A = (double const *)(mxGPUGetDataReadOnly(A));
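The two casts above describe the same memory. As a rough host-only illustration of the index mapping, the following C++ sketch uses a stand-in struct in place of CUDA's built-in double2. The names `Double2`, `realIndex`, `imagIndex`, and `realLength` are hypothetical and exist only for this sketch.

```cpp
#include <cstddef>

// Stand-in for CUDA's built-in double2 (members x and y), declared here
// so the sketch compiles without the CUDA headers.
struct Double2 {
    double x;  // real part
    double y;  // imaginary part
};

// One complex element occupies exactly two consecutive doubles.
static_assert(sizeof(Double2) == 2 * sizeof(double),
              "a complex element is two consecutive doubles");

// Position of the real part of complex element i in the flat double view.
constexpr std::size_t realIndex(std::size_t i) { return 2 * i; }

// Position of the imaginary part of complex element i in the flat double view.
constexpr std::size_t imagIndex(std::size_t i) { return 2 * i + 1; }

// Length of the flat double view for a given number of complex elements.
constexpr std::size_t realLength(std::size_t numelComplex) {
    return 2 * numelComplex;
}

// The 4-by-4 complex matrix from the text has 16 complex elements,
// which is 32 doubles when viewed as a real array.
static_assert(realLength(16) == 32, "16 complex elements span 32 doubles");
static_assert(realIndex(0) == 0 && imagIndex(0) == 1, "element 0 is stored as re, im");
```

The vector-type view suits kernels that operate on whole complex numbers, while the flat view suits elementwise operations that treat real and imaginary parts identically, such as scaling by a real constant.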

Various functions exist to convert data between complex and real formats on the GPU. These operations require a copy to interleave the data. The function mxGPUCreateComplexGPUArray takes two real mxGPUArrays and interleaves their elements to produce a single complex mxGPUArray of the same length. The functions mxGPUCopyReal and mxGPUCopyImag each copy either the real or the imaginary elements into a new real mxGPUArray. (There is no equivalent of the mxGetImagData function for mxGPUArray objects.)
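To see why these conversions need a copy, consider a host-only model of the data movement: two separate real buffers have to be woven into one interleaved buffer, and extracting one part means gathering every second value. The C++ sketch below models only the memory layout. It is not the mxGPUArray API, and the names `RealBuffer`, `InterleavedBuffer`, `interleave`, and `copyPart` are hypothetical.

```cpp
#include <cstddef>

// Hypothetical fixed-size buffers, so the sketch can be checked at compile time.
template <std::size_t N>
struct RealBuffer { double v[N]; };

template <std::size_t N>
struct InterleavedBuffer { double v[2 * N]; };  // re0, im0, re1, im1, ...

// Model of building one complex array from two real arrays of the same length.
template <std::size_t N>
constexpr InterleavedBuffer<N> interleave(RealBuffer<N> re, RealBuffer<N> im) {
    InterleavedBuffer<N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out.v[2 * i] = re.v[i];      // real part at even offsets
        out.v[2 * i + 1] = im.v[i];  // imaginary part at odd offsets
    }
    return out;
}

// Model of copying one part out of an interleaved array into a new real array.
// offset 0 selects the real parts, offset 1 the imaginary parts.
template <std::size_t N>
constexpr RealBuffer<N> copyPart(InterleavedBuffer<N> z, std::size_t offset) {
    RealBuffer<N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out.v[i] = z.v[2 * i + offset];
    }
    return out;
}

constexpr RealBuffer<3> kRe{{1.0, 2.0, 3.0}};
constexpr RealBuffer<3> kIm{{10.0, 20.0, 30.0}};
constexpr InterleavedBuffer<3> kZ = interleave(kRe, kIm);

static_assert(kZ.v[0] == 1.0 && kZ.v[1] == 10.0, "element 0 is stored as re, im");
static_assert(copyPart(kZ, 0).v[1] == 2.0, "real parts round-trip");
static_assert(copyPart(kZ, 1).v[2] == 30.0, "imaginary parts round-trip");
```

Because the real and imaginary parts are never contiguous in the interleaved buffer, neither part can be exposed as a standalone real array without gathering it into new storage.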

Compile a GPU MEX-File

Use the mexcuda command in MATLAB to compile a MEX-file containing the CUDA code. You can compile the example file using the command:


By default, the mexcuda function compiles the CUDA code using the NVIDIA nvcc compiler installed with MATLAB. To check which compiler mexcuda is using, use the -v flag for verbose output in the mexcuda command.

If mexcuda has trouble locating the NVIDIA compiler (nvcc) in your installed CUDA toolkit, it might be installed in a non-default location. You can specify the location of nvcc on your system by storing it in the environment variable MW_NVCC_PATH. You can set this variable using the MATLAB setenv command. For example,


Only a subset of Visual Studio® compilers is supported for mexcuda. For details, consult the NVIDIA toolkit documentation.

Install the CUDA Toolkit (Optional)

The CUDA Toolkit installed with MATLAB does not contain all libraries that are available in the CUDA Toolkit. If you want to use a specific library that is not installed with MATLAB, install the CUDA Toolkit.


You do not need the Toolkit to run MATLAB functions on a GPU or to generate CUDA-enabled MEX-functions.

The CUDA Toolkit contains CUDA libraries and tools for compilation.

The Toolkit version that you should download depends on the version of MATLAB you are using. Check which version of the Toolkit is compatible with your version of MATLAB in the following table. The recommended practice is to use the latest version of your supported Toolkit, including any updates and patches from NVIDIA.

MATLAB Release    CUDA Toolkit Version

For more information about the CUDA Toolkit and to download your supported version, see CUDA Toolkit Archive (NVIDIA).
