coder.gpu.kernel
Pragma that maps for-loops to GPU kernels
Description
coder.gpu.kernel() is a loop-level pragma that you must place immediately before a for-loop. This pragma generates a kernel and computes the launch parameters from the loop parameters.
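The placement rule above can be sketched as follows. This is a minimal illustration, not an example from the product documentation: the function name scaleAndAdd and its inputs are hypothetical, and the loop body is chosen so that every iteration is independent.

```matlab
function c = scaleAndAdd(a, b) %#codegen
% Hypothetical example function: computes 2*a + b element by element.
% a and b are assumed to be numeric arrays of the same size.
c = zeros(size(a));

% The pragma sits immediately before the for-loop it applies to.
% No arguments are given, so the launch parameters are computed
% from the loop parameters.
coder.gpu.kernel();
for i = 1:numel(a)
    c(i) = 2 * a(i) + b(i);
end
end
```

Assuming GPU Coder is installed, a call such as scaleAndAdd([1 2 3], [10 20 30]) runs in MATLAB as an ordinary loop, because the pragma only affects generated code. One way to generate code for it would be to pass a coder.gpuConfig('mex') configuration object to codegen.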
The coder.gpu.kernel pragma overrides all parallel loop analysis checks. This override allows GPU Coder™ to parallelize loops in situations where parallel loop analysis cannot prove that all iterations are independent. Consider using coder.gpu.kernelfun to parallelize loops in functions that pass the parallel loop analysis check.
Note
Using the coder.gpu.kernel pragma before a for-loop that contains reductions is not recommended.
coder.gpu.kernel(B,T) generates a kernel with the dimensions specified by B and T. B[Bx,By,Bz] is an array that defines the number of blocks in the grid along dimensions x and y (z is not used). T[Tx,Ty,Tz] is an array that defines the number of threads in the block along dimensions x, y, and z.
A value of -1 for B and T indicates that GPU Coder must infer the grid and block dimensions automatically. The coder.gpu.kernel pragma generates errors for invalid grid and block dimensions.
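As a rough sketch of the B and T arguments described above, the following hypothetical function requests an explicit launch configuration. The function name addOffset, the fixed length of 4096, and the choice of 16 blocks of 256 threads (so that blocks times threads equals the iteration count) are assumptions made for illustration only, not recommended values.

```matlab
function out = addOffset(in, offset) %#codegen
% Hypothetical example: adds a scalar offset to a 4096-element row vector.
out = zeros(1, 4096);

% B = [16, 1, 1]: 16 blocks along x, 1 along y (z is not used).
% T = [256, 1, 1]: 256 threads per block along x, 1 along y and z.
% 16 * 256 = 4096, which matches the number of loop iterations here.
coder.gpu.kernel([16, 1, 1], [256, 1, 1]);
for i = 1:4096
    out(i) = in(i) + offset;
end
end
```

Replacing the pragma call with coder.gpu.kernel(-1, -1) would instead leave both the grid and block dimensions for GPU Coder to infer.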
coder.gpu.kernel(B,T,M,name) specifies optional arguments M and name. M is a positive integer that specifies the minimum number of blocks per streaming multiprocessor. Increasing M can reduce the register usage within a kernel and improve kernel occupancy. A value of -1 for M indicates that GPU Coder must use the default value of 1. name is a character array that allows you to customize the name of the generated kernel.
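The four-argument form might be used as in the sketch below. The function name squareElements, the value M = 2, and the kernel name 'squareKernel' are hypothetical choices for illustration. Whether M = 2 helps occupancy depends on the kernel and the target GPU, and the exact identifier that appears in the generated code is not confirmed here.

```matlab
function y = squareElements(x) %#codegen
% Hypothetical example: squares each element of x.
y = zeros(size(x));

% B = -1 and T = -1: let GPU Coder infer grid and block dimensions.
% M = 2: request at least 2 blocks per streaming multiprocessor
%        (illustrative value, not a tuning recommendation).
% name = 'squareKernel': custom name for the generated kernel.
coder.gpu.kernel(-1, -1, 2, 'squareKernel');
for i = 1:numel(x)
    y(i) = x(i) * x(i);
end
end
```

Passing -1 for M as well, as in coder.gpu.kernel(-1, -1, -1, 'squareKernel'), would keep the default of 1 block per streaming multiprocessor while still setting the name.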
This function is a code generation function. It has no effect in MATLAB®.
Examples
Version History
Introduced in R2017b
See Also
Apps
Functions
codegen | coder.gpu.kernelfun | stencilfun | coder.gpu.constantMemory | gpucoder.reduce | gpucoder.sort | coder.gpu.nokernel