Kernels from Library Calls

GPU Coder™ supports libraries optimized for CUDA® GPUs, such as cuBLAS, cuSOLVER, cuFFT, Thrust, cuDNN, and TensorRT.

  • The cuBLAS library is an implementation of Basic Linear Algebra Subprograms (BLAS) on top of the NVIDIA® CUDA runtime. It allows you to access the computational resources of the NVIDIA GPU.

  • The cuSOLVER library is a high-level package based on the cuBLAS and cuSPARSE libraries. It provides useful LAPACK-like features, such as common matrix factorization and triangular solve routines for dense matrices, a sparse least-squares solver, and an eigenvalue solver.

  • The cuFFT library provides a high-performance implementation of the Fast Fourier Transform (FFT) algorithm on NVIDIA GPUs. The cuBLAS, cuSOLVER, and cuFFT libraries are part of the NVIDIA CUDA toolkit.

  • Thrust is a C++ template library for CUDA. The Thrust library ships with the CUDA toolkit and lets you use GPU-accelerated primitives such as sort to implement complex, high-performance parallel applications.

  • The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime library. For more information, see Code Generation for Deep Learning Networks by Using cuDNN and Code Generation for Deep Learning Networks by Using TensorRT.

GPU Coder does not require a special pragma to generate kernel calls to libraries. During code generation, when you select the Enable cuBLAS option in the GPU Coder app or set config_object.GpuConfig.EnableCUBLAS = true at the command line, GPU Coder replaces some functionality with calls to the cuBLAS library. Likewise, when you select the Enable cuSOLVER option in the GPU Coder app or set config_object.GpuConfig.EnableCUSOLVER = true at the command line, GPU Coder replaces some functionality with calls to the cuSOLVER library. For GPU Coder to replace high-level math functions with library calls, the following conditions must be met:

  • A GPU-specific library replacement must exist for these functions.

  • MATLAB® Coder™ data size thresholds must be satisfied.
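The command-line settings described above can be sketched as follows. This is a minimal sketch, not a definitive workflow: the entry-point name solveDense and the input sizes are hypothetical, and the sizes are only assumed to be large enough to qualify for replacement, since the actual thresholds are not given here.

```matlab
% Hypothetical entry-point function, saved separately as solveDense.m:
%
%   function x = solveDense(A, b) %#codegen
%   x = A \ b;   % mldivide, a candidate for cuSOLVER replacement
%   end

cfg = coder.gpuConfig('mex');          % GPU code configuration for a MEX target
cfg.GpuConfig.EnableCUBLAS = true;     % allow cuBLAS replacements
cfg.GpuConfig.EnableCUSOLVER = true;   % allow cuSOLVER replacements

% Illustrative inputs (assumed sizes, not documented thresholds)
A = rand(512, 'single');
b = rand(512, 1, 'single');

% Generate a CUDA MEX function for the hypothetical entry point
codegen -config cfg -args {A,b} solveDense
```

Whether a given call is actually replaced depends on the two conditions listed above, so inspecting the generated code is the reliable way to confirm which library calls were emitted.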

GPU Coder supports cuFFT, cuSOLVER, and cuBLAS library replacements for the functions listed in the table. For functions that have no replacements in CUDA, GPU Coder uses portable MATLAB functions that are mapped to the GPU.

| MATLAB Function | Description | MATLAB Coder LAPACK Support | cuBLAS, cuSOLVER, cuFFT, Thrust Support |
|---|---|---|---|
| mtimes | Matrix multiply | Yes | Yes |
| mldivide (‘\’) | Solve system of linear equations Ax=B for x | Yes | Yes |
| lu | LU matrix factorization | Yes | Yes |
| qr | Orthogonal-triangular decomposition | Yes | Partial |
| det | Matrix determinant | Yes | Yes |
| inv | Matrix inverse | Yes | Yes |
| chol | Cholesky factorization | Yes | Yes |
| rcond | Reciprocal condition number | Yes | Yes |
| linsolve | Solve system of linear equations Ax=B | Yes | Yes |
| eig | Eigenvalues and eigenvectors | Yes | No |
| schur | Schur decomposition | Yes | No |
| svd | Singular value decomposition | Yes | Partial |
| fft, fft2, fftn | Fast Fourier Transform | Yes | Yes |
| ifft, ifft2, ifftn | Inverse Fast Fourier Transform | Yes | Yes |
| sort | Sort array elements |  | Yes, using gpucoder.sort |
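As an illustration of how functions from the table appear in MATLAB code, the following sketch combines mtimes with gpucoder.sort. The function name sortedProduct is hypothetical, and the comments describe candidate replacements only; whether a replacement occurs depends on the configuration options and the data size thresholds.

```matlab
function s = sortedProduct(A, B) %#codegen
% sortedProduct  Hypothetical entry point using two functions from the table.
C = A * B;              % mtimes: candidate for cuBLAS replacement
s = gpucoder.sort(C);   % sorts each column in ascending order, like sort(C)
end
```

No kernel pragma appears in the function body, consistent with the statement that library calls do not require one.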

When you select the Enable cuFFT option in the GPU Coder app or set config_object.GpuConfig.EnableCUFFT = true at the command line, GPU Coder maps fft, ifft, fft2, ifft2, fftn, and ifftn function calls in your MATLAB code to the appropriate cuFFT library calls. For 2-D transforms and higher, GPU Coder creates multiple 1-D batched transforms. These batched transforms have higher performance than single transforms. GPU Coder supports only out-of-place transforms. If Enable cuFFT is not selected, GPU Coder uses C FFTW libraries where available or generates kernels from portable MATLAB FFT.

Both single-precision and double-precision data types are supported. Input and output can be real or complex-valued, but real-valued transforms are faster. The cuFFT library supports input sizes that are typically specified as a power of 2 or a value that can be factored into a product of small prime numbers. In general, the smaller the prime factor, the better the performance.
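The cuFFT setting and the guidance on input sizes can be sketched as follows. This is a sketch under stated assumptions: the entry-point name spectrum and the length n are hypothetical, and the prime-factor check is only a rough heuristic for the size guidance, not a documented GPU Coder feature.

```matlab
% Hypothetical entry-point function, saved separately as spectrum.m:
%
%   function Y = spectrum(x) %#codegen
%   Y = fft(x);
%   end

n = 4096;                          % transform length (illustrative)
largestPrime = max(factor(n));     % 4096 = 2^12, so the largest prime factor is 2

cfg = coder.gpuConfig('mex');      % GPU code configuration for a MEX target
cfg.GpuConfig.EnableCUFFT = true;  % map fft/ifft family calls to cuFFT

% Real-valued, single-precision input of length n
codegen -config cfg -args {ones(n,1,'single')} spectrum
```

A length such as 1021, which is prime, would give largestPrime equal to 1021 and is the kind of size the guidance suggests avoiding, for example by padding to 1024.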

Note

Using a CUDA library name such as cufft, cublas, or cudnn as the name of a MATLAB function results in code generation errors.
