Maximum blocks per kernel
Description
Specify the maximum number of CUDA® blocks created during a kernel launch.
Because GPU devices have limited streaming multiprocessor (SM) resources, limiting the number of blocks for each kernel can avoid performance losses from scheduling, loading and unloading of blocks.
If the number of iterations in a loop is greater than the maximum number of blocks per kernel, the code generator creates CUDA kernels with striding.
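The striding behavior can be illustrated with a minimal CPU-side sketch in C++. It is not the code that the code generator emits. The function names `launchBlocks` and `runStrided`, the `threadsPerBlock` parameter, and the rule used to pick the grid size are all assumptions made for illustration. The sketch caps the number of blocks at a positive limit and lets each emulated thread advance by the total number of threads in the grid, so every loop iteration is still executed exactly once. It does not model the default setting of 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of blocks to launch for a 1-D kernel when the
// block count is capped at maxBlocks (assumed to be a positive limit).
std::size_t launchBlocks(std::size_t numIterations,
                         std::size_t threadsPerBlock,
                         std::size_t maxBlocks) {
    assert(threadsPerBlock > 0 && maxBlocks > 0);
    // Blocks needed to give every iteration its own thread (ceiling division).
    std::size_t needed = (numIterations + threadsPerBlock - 1) / threadsPerBlock;
    // Launch at least one block, and never more than the limit.
    return std::min(std::max<std::size_t>(needed, 1), maxBlocks);
}

// Hypothetical CPU emulation of a strided 1-D kernel.
// Returns how many times each loop iteration was executed.
std::vector<int> runStrided(std::size_t numIterations,
                            std::size_t threadsPerBlock,
                            std::size_t maxBlocks) {
    std::size_t numBlocks = launchBlocks(numIterations, threadsPerBlock, maxBlocks);
    // Total number of threads in the grid; this is the stride.
    std::size_t stride = numBlocks * threadsPerBlock;

    std::vector<int> visits(numIterations, 0);
    for (std::size_t block = 0; block < numBlocks; ++block) {
        for (std::size_t thread = 0; thread < threadsPerBlock; ++thread) {
            // Each emulated thread starts at its global index and strides
            // through the iteration space until it runs past the end.
            for (std::size_t i = block * threadsPerBlock + thread;
                 i < numIterations; i += stride) {
                ++visits[i];
            }
        }
    }
    return visits;
}
```

For example, with 1000 iterations, 32 threads per block, and a limit of 4 blocks, the sketch launches 4 blocks (128 emulated threads), and each thread handles every 128th iteration starting from its own index.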
When you specify the maximum number of blocks for each kernel, the code generator creates 1-D kernels. To force the code generator to create 2-D or 3-D kernels, use the coder.gpu.kernel (GPU Coder) pragma. The coder.gpu.kernel pragma takes precedence over the maximum number of blocks per kernel.
Category: Code Generation > GPU Code
Settings
Default: 0
Dependencies
This parameter requires a GPU Coder™ license.
To enable this parameter, select Generate GPU code on the Code Generation pane.
Command-Line Information
Parameter: GPUMaximumBlocksPerKernel
Type: integer
Value: any valid value
Default: 0