coder.gpu.kernel
Pragma that maps for-loops to GPU kernels
Description
coder.gpu.kernel() is a loop-level pragma that you must place immediately before a for loop. It generates a kernel whose dimensions are computed from the loop parameters.
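A minimal sketch of the basic form, assuming a hypothetical entry-point function vecAdd that adds two vectors of the same length. The pragma sits directly above the loop, and GPU Coder derives the kernel dimensions from the loop bounds. When the function runs in MATLAB, the pragma has no effect and the loop executes serially.

```matlab
function c = vecAdd(a, b) %#codegen
% vecAdd: hypothetical example function, not part of GPU Coder.
% Adds two equal-length vectors element by element.
c = coder.nullcopy(zeros(size(a)));  % preallocate without initializing in generated code
coder.gpu.kernel();                  % map the loop below to a GPU kernel
for i = 1:numel(a)
    c(i) = a(i) + b(i);              % each iteration touches only element i
end
end
```

To generate a CUDA MEX function from this sketch, one option is `cfg = coder.gpuConfig('mex');` followed by `codegen -config cfg -args {zeros(1,1024), zeros(1,1024)} vecAdd`; the 1-by-1024 input sizes are an arbitrary choice for illustration.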
Note
The coder.gpu.kernel pragma overrides all parallel loop analysis checks that the software performs. Try coder.gpu.kernelfun first, before using the more advanced functionality of the coder.gpu.kernel pragma.
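For comparison, a sketch of the coder.gpu.kernelfun approach, using a hypothetical function vecScale. coder.gpu.kernelfun is a function-level pragma: loops in the function are turned into kernels only if they pass parallel loop analysis, so no per-loop override is needed for a simple independent loop like this one.

```matlab
function c = vecScale(a, s) %#codegen
% vecScale: hypothetical example function, not part of GPU Coder.
% Function-level pragma: GPU Coder tries to map this function's loops to
% kernels, keeping its parallel loop analysis checks in place.
coder.gpu.kernelfun();
c = coder.nullcopy(zeros(size(a)));
for i = 1:numel(a)
    c(i) = s * a(i);   % independent iterations, so analysis can prove safety
end
end
```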
Note
Using the coder.gpu.kernel pragma for loops that contain reductions is not recommended.
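One alternative for a sum is gpucoder.reduce, which appears under See Also. The sketch below assumes the gpucoder.reduce(A, FUNC) form, where FUNC is a handle to a binary combining function; check the gpucoder.reduce reference page for the exact signature in your release. The names vecSum and addPair are hypothetical.

```matlab
function s = vecSum(a) %#codegen
% vecSum: hypothetical example. Rather than placing coder.gpu.kernel on an
% accumulation loop, express the reduction through gpucoder.reduce
% (assumed signature: gpucoder.reduce(A, @binaryFunction)).
s = gpucoder.reduce(a, @addPair);
end

function c = addPair(x, y)
% Binary, associative combiner applied pairwise by the reduction.
c = x + y;
end
```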
coder.gpu.kernel(B,T) is a loop-level pragma that you must place immediately before a for loop. It generates a kernel with the dimensions specified by B and T.
B, of the form [Bx,By,1], is an array that defines the number of blocks in the grid along dimensions x and y (z is not used). T, of the form [Tx,Ty,Tz], is an array that defines the number of threads in each block along dimensions x, y, and z.
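When choosing B and T by hand, a common approach (an assumption here, not a rule stated on this page) is to fix the block shape T and then size B so that Bx*Tx and By*Ty cover the loop's iteration space. The sketch below does that arithmetic for a hypothetical 1000-by-500 iteration space and 16-by-16 blocks.

```matlab
% Hypothetical iteration space of a doubly nested loop.
rows = 1000;
cols = 500;

% Chosen block shape (threads per block); Tz = 1 for a 2-D problem.
T = [16, 16, 1];

% Grid shape: round up so every iteration gets a thread.
% The z component of B stays 1, because z is not used for the grid.
B = [ceil(rows / T(1)), ceil(cols / T(2)), 1];

disp(B)        % B is [63 32 1]
disp(prod(T))  % 256 threads per block
```

Rounding up launches a few more threads than iterations (63*16 = 1008 covers 1000), which is why the extra threads must not perform out-of-range work.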
A value of -1 for B and T indicates that GPU Coder™ must infer the grid and block dimensions automatically. The coder.gpu.kernel pragma generates errors for invalid grid and block dimensions.
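A sketch of the explicit form, assuming a hypothetical function vecAddFixed whose inputs always have exactly 4096 elements. The grid of 16 blocks times 256 threads per block gives 4096 threads, one per loop iteration.

```matlab
function c = vecAddFixed(a, b) %#codegen
% vecAddFixed: hypothetical example; assumes a and b have 4096 elements.
c = coder.nullcopy(zeros(size(a)));
% B = [16,1,1] blocks, T = [256,1,1] threads: 16 * 256 = 4096 threads.
coder.gpu.kernel([16, 1, 1], [256, 1, 1]);
for i = 1:4096
    c(i) = a(i) + b(i);
end
end
```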
coder.gpu.kernel(B,T,M,name) takes the same B and T arguments and also lets you specify the optional arguments M and name.
M is a positive integer that specifies the minimum number of blocks per streaming multiprocessor. Sometimes, increasing M can reduce the register usage within a kernel and improve kernel occupancy. A value of -1 for M indicates that GPU Coder must use the default value of 1.
name is a character array that lets you customize the name of the generated kernel.
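A sketch of the four-argument form, assuming a hypothetical function saxpyNamed with 1024-element inputs. M = 2 requests at least two resident blocks per streaming multiprocessor (whether that helps depends on the kernel), and 'saxpy_kernel' is an illustrative name for the generated kernel.

```matlab
function c = saxpyNamed(alpha, x, y) %#codegen
% saxpyNamed: hypothetical example; assumes x and y have 1024 elements.
c = coder.nullcopy(zeros(size(x)));
% B = [4,1,1], T = [256,1,1] covers 1024 iterations.
% M = 2: minimum blocks per streaming multiprocessor (a tuning hint).
% 'saxpy_kernel': custom name for the generated kernel.
coder.gpu.kernel([4, 1, 1], [256, 1, 1], 2, 'saxpy_kernel');
for i = 1:1024
    c(i) = alpha * x(i) + y(i);
end
end
```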
Specifying the kernel pragma overrides all parallel loop analysis checks. This override allows loops to be parallelized in situations where parallel loop analysis cannot prove that all iterations are independent of each other. Before using the pragma, ensure that the loop is safe to parallelize, that is, that no iteration reads or writes data that another iteration writes.
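A typical case for the override is a scatter through an index vector. In the hypothetical function scatterPerm below, the write target out(idx(i)) depends on run-time data, so analysis cannot prove independence. The loop is safe only under an assumption the caller must guarantee: idx is a permutation of 1:numel(in), so no two iterations write the same element.

```matlab
function out = scatterPerm(in, idx) %#codegen
% scatterPerm: hypothetical example. Writes in(i) to out(idx(i)).
% SAFE ONLY IF idx is a permutation of 1:numel(in); duplicate indices
% would make iterations race on the same element of out.
out = coder.nullcopy(zeros(size(in)));
coder.gpu.kernel();   % override: analysis cannot prove idx has no duplicates
for i = 1:numel(in)
    out(idx(i)) = in(i);
end
end
```

For example, scatterPerm([10 20 30 40], [3 1 4 2]) returns [20 40 10 30].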
This function is a code generation function. It has no effect in MATLAB®.
Examples
Version History
Introduced in R2017b
See Also
Functions
codegen | coder.gpu.kernelfun | gpucoder.stencilKernel | coder.gpu.constantMemory | gpucoder.reduce | gpucoder.sort | coder.gpu.nokernel