gpucoder.reduce

Optimized GPU implementation for reduction operations

Syntax

S = gpucoder.reduce(A,FUN)

S = gpucoder.reduce(A,{FUN_1,FUN_2,...,FUN_N})

S = gpucoder.reduce(___,Name=Value)

Description

S = gpucoder.reduce(A,FUN) aggregates the values in the input array A into the value S by using the function handle FUN. If A is empty, then S is a 0-by-0 array.

S = gpucoder.reduce(A,{FUN_1,FUN_2,...,FUN_N}) aggregates the values in the input array to N values by using the N function handles in the cell array. The output S is a 1-by-N array, where N is the number of function handles.

The generated GPU code uses CUDA® shuffle (shfl) intrinsics to perform reduction operations on the GPU. The function performs the reduction for each function handle inside a single kernel on the GPU.
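
As a conceptual illustration only, a shuffle-style reduction can be emulated in plain MATLAB. This sketch assumes a warp of 32 lanes and mimics a "shuffle down" step in which each lane combines its own value with the value held by a lane a fixed offset away. The variable names are hypothetical, and this is not the code that GPU Coder generates.

```matlab
% Emulate a shuffle-down reduction across one 32-lane warp (assumption).
fun = @plus;                 % binary reduction operator
vals = 1:32;                 % one value per lane
lanes = numel(vals);
offset = lanes/2;
while offset >= 1
    idx = (1:lanes) + offset;            % lane that each lane reads from
    outOfRange = idx > lanes;
    idx(outOfRange) = find(outOfRange);  % out-of-range lanes read their own value
    vals = fun(vals, vals(idx));         % combine own value with the shuffled value
    offset = offset/2;                   % halve the distance each step
end
s = vals(1);                 % first lane holds the result: 528 for 1:32 with @plus
```

After log2(32) = 5 steps, the first lane holds the reduction of all 32 values. The other lanes hold partial results that are discarded.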

S = gpucoder.reduce(___,Name=Value) aggregates the values in the input array using the options specified by one or more name-value arguments.

Examples

Aggregate Values of Arrays

This example shows how to aggregate the values of an array.

Consider this function sumArray. The function returns the sum of the elements of an array. To calculate the sum as a reduction, it uses plus.

function y = sumArray(X) %#codegen
y = 0;
if ~isempty(X)
    y = gpucoder.reduce(X,@plus);
end
end

Create a vector A, and calculate its sum using sumArray.

A = 1:10;
sumArray(A)

ans =

    55

Create a 256-by-256 array of ones named X. Generate code from sumArray. The generated code contains a kernel for the call to gpucoder.reduce.

X = ones(256);
cfg = coder.gpuConfig("mex");
codegen sumArray.m -args {X} -config cfg;

Call the generated MEX function on the array X. The MEX function uses a reduction kernel to calculate the result.

sumArray_mex(X)

ans =

       65536

Aggregate Range of Values in an Array

Find the range of values in an array by using a single GPU kernel.

Consider the function getRange, which finds the maximum and minimum values of an input x and returns their difference.

function r = getRange(x)
r = max(x,[],"all")-min(x,[],"all");
end

To calculate the maximum and minimum values in the same GPU kernel, use gpucoder.reduce with max and min as the reduction operators.

function r = getRange(x)
S = gpucoder.reduce(x,{@max,@min});
r = S(1)-S(2);
end

Create a vector A, and compute its range by using getRange.

A = [62,62,11,63,62,32,52,10];
getRange(A)

ans =

    53

Create a 101-element vector, X, by using the sin function. Generate code from getRange, and use the generated MEX function, getRange_mex, to calculate the range of values in X. The range is approximately equal to 2.

X = sin(0:100);
cfg = coder.gpuConfig("mex");
codegen getRange -args {X} -config cfg;
getRange_mex(X)

ans =

    1.9999

Determine if Values of Array Exceed a Threshold

Check if all of the elements of an array are below a threshold.

Write an entry-point function named isBelowThreshold that accepts the matrix input A and the threshold value t. Define a function handle, fh, that checks if a scalar value is less than t. To check if each value of the input is less than t, use gpucoder.reduce with and as the reduction function and fh as the preprocessing function.

function out = isBelowThreshold(A,t)
out = true;
if ~isempty(A)
    fh = @(x) x<t;
    out = gpucoder.reduce(A,@and,preprocess=fh);
end
end

Define a 3-by-3 matrix, A, and use a threshold value of 64. Check whether the elements of A are each below the threshold value. Because some elements of A are greater than the threshold value, the function returns logical 0 (false).

A = [30,68,60;65,46,66;56,3,48];
t = 64;
isBelowThreshold(A,t)

ans =

  logical

   0

Generate code for isBelowThreshold. The generated code uses GPU kernels to preprocess the array and perform the reduction.

X = rand(128);
t = 0.5;
cfg = coder.gpuConfig("mex");
codegen isBelowThreshold -config cfg -args {X,t}

Find Minimum and Sum of Rows in Arrays

Generate a CUDA MEX function that finds the minimum and sum of the rows of an array.

Write an entry-point function named minAndSum that accepts the matrix input A. To find the minimum and sum of the rows, use the gpucoder.reduce function with min and plus as reduction operators. Specify the dimension dim as 2.

function [s1, s2] = minAndSum(A) %#codegen
[s1, s2] = gpucoder.reduce(A,{@min,@plus},dim=2);
end

Create a 3-by-3 matrix, A. Assign the minimum and sum of the rows of A to the variables s1 and s2, respectively.

A = [2,8,4;8,7,1;7,4,7];
[s1,s2] = minAndSum(A)
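
One way to sanity-check these results is to compare them with the built-in min and sum functions applied along the same dimension. This is a minimal sketch that uses only base MATLAB; the names refMin and refSum are hypothetical.

```matlab
A = [2,8,4;8,7,1;7,4,7];
refMin = min(A,[],2);   % minimum of each row: [2;1;4]
refSum = sum(A,2);      % sum of each row:     [14;16;18]
```

For this integer-valued matrix, s1 and s2 are expected to equal refMin and refSum exactly. For general floating-point data, consider comparing sums with a small tolerance, because a parallel reduction may add the elements in a different order.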

Create a 32-by-32 input array, X. Run the codegen command to generate the CUDA MEX function minAndSum_mex.

X = rand(32);
cfg = coder.gpuConfig('mex');
codegen minAndSum.m -args {X} -config cfg;

Call minAndSum_mex on X.

[s1,s2] = minAndSum_mex(X);

Input Arguments

`A` — Input array
vector | matrix | array

Input array, specified as a vector, matrix, or array. For code generation, the elements of the input array must be of numeric or logical data type. If A is empty, then gpucoder.reduce returns a 0-by-0 array.

`FUN` — User-defined function
function handle

User-defined function, specified as a named or anonymous function handle. The function handle is a binary function and must:

Accept two inputs and return one output. The type of the inputs and output to the function must match the type of the preprocessed input array.
Be commutative and associative. Otherwise, the behavior of the function is undefined.

If FUN is anonymous, it can refer to variables that exist in the scope where you define the function. You can use these variables in the reduction function in addition to the two input arguments to FUN.
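
The following sketch illustrates why the reduction function needs to be associative. It evaluates the same four values in two different groupings: a left-to-right fold and a pairwise (tree) fold. The grouping that the generated code actually uses is not specified here; the two anonymous helpers foldLeft and foldTree are hypothetical and exist only for this illustration.

```matlab
x = [8 4 2 1];

% Left-to-right grouping: ((x1 op x2) op x3) op x4
foldLeft = @(op) op(op(op(x(1),x(2)),x(3)),x(4));
% Pairwise grouping: (x1 op x2) op (x3 op x4)
foldTree = @(op) op(op(x(1),x(2)),op(x(3),x(4)));

plusLeft  = foldLeft(@plus);    % 15
plusTree  = foldTree(@plus);    % 15, same result for any grouping
minusLeft = foldLeft(@minus);   % 1
minusTree = foldTree(@minus);   % 3, result depends on the grouping
```

Because plus is associative and commutative, both groupings agree. With minus, the result depends on the grouping, so it is not a suitable reduction function.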

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: gpucoder.reduce(A,{@min,@max},dim=2);

`dim` — Reduction dimension
positive integer scalar

Reduction dimension, specified as a positive integer scalar. When you specify dim, gpucoder.reduce returns N output arguments, where N is the number of function handles. Each output is the result of reducing A along the dimension with the corresponding function handle.

Example: gpucoder.reduce(A,{@min,@max},dim=2);

`preprocess` — Preprocessing function
function handle

Preprocessing function, specified as a named or anonymous function handle. By default, gpucoder.reduce does not preprocess the input array.

If the preprocess function handle is anonymous, you can refer to variables that exist in the scope where you define the function. You can create preprocessing functions that refer to these variables as well as the input array.

Example: gpucoder.reduce(A,@min,preprocess=@myScale);
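
As a rough reference model, the preprocessing step can be thought of as applying the function to each element before the reduction combines the results. The sketch below reproduces that idea in plain MATLAB for a threshold check. It is a sequential illustration under that assumption, not the parallel implementation; the names pre, fun, and P are hypothetical.

```matlab
A = [30,68,60;65,46,66;56,3,48];
t = 64;
pre = @(x) x < t;        % preprocessing function; captures t from this scope
fun = @and;              % reduction function

P = arrayfun(pre, A);    % step 1: preprocess each element
out = P(1);              % step 2: seed with the first preprocessed value
for k = 2:numel(P)
    out = fun(out, P(k));    % step 3: fold in the remaining values
end
% out is false because some elements of A are not below t
```

Note that the reduction function operates on the preprocessed values (here, logical), which is why its input and output types must match the type of the preprocessed array.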

Output Arguments

`S` — Result of reduction operation
scalar | vector | matrix | array

Result of the reduction operation, returned as a scalar, vector, matrix, or array. If A is empty, S is a 0-by-0 array. Otherwise, the function initializes S using the values of the preprocessed input array. The size and contents of S then depend on the number of function handles and on whether you specify dim, as described in this table.

Number of Function Handles	Input Argument `dim`	Output `S`
1	Unspecified	`S` is a scalar.
N	Unspecified	`S` is a 1-by-N array, where N is the number of function handles you pass to `gpucoder.reduce`. Each element of `S` is the result of reducing `A` by using the corresponding function handle.
1	Specified	`S` is the result of applying the function handle `FUN` along the dimension `dim`. The size of `S` in the dimension specified by `dim` is equal to 1, but the other dimensions of `S` match the size of the corresponding dimension of `A`. For example, if `A` is an 8-by-16-by-32 array, and `dim` is `2`, then `S` is an 8-by-1-by-32 array.
N	Specified	`gpucoder.reduce` returns N output arguments, where N is the number of function handles you pass to `gpucoder.reduce`. Each output `S` is the result of reducing the array along the dimension `dim` by using the corresponding function handle. The size of `S` in the dimension specified by `dim` is equal to 1, but the other dimensions of `S` match the size of the corresponding dimension of `A`. For example, if `A` is an 8-by-16-by-32 array, and `dim` is `2`, then `S` is an 8-by-1-by-32 array.
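
The size rule for a specified dim matches the behavior of built-in dimension-wise reductions. As a small sketch, the built-in sum function serves here as a stand-in for reducing with @plus along dim; the variable names are hypothetical.

```matlab
A = rand(8,16,32);
dim = 2;
ref = sum(A, dim);   % built-in stand-in for a reduction with @plus along dim
sz = size(ref);      % [8 1 32]: size 1 along dim, other dimensions unchanged
```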

Limitations

gpucoder.reduce does not support reducing complex arrays.
For code generation, gpucoder.reduce accepts a limited number of user-defined function handles based on the size of the output data type. For example, you can input up to 46 function handles that output the half data type or up to 11 function handles that output the double data type. If you input too many function handles, code generation fails with an error.
For inputs of an integer data type, the generated code may contain intermediate computations that reach saturation. In this case, the results from the generated code may not match the simulation results from MATLAB®.
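
The following sketch shows, in plain MATLAB, how saturation makes an integer sum depend on the order of the intermediate computations. It illustrates the general effect only and makes no claim about the order that the generated code uses; the variable names are hypothetical.

```matlab
x = int8([100 100 -100]);

% MATLAB integer arithmetic saturates at the limits of the type.
leftToRight = (x(1) + x(2)) + x(3);   % 100+100 saturates to 127, then 127-100 = 27
reordered   = (x(1) + x(3)) + x(2);   % 100-100 = 0, then 0+100 = 100
```

If intermediate saturation is a concern, one option is to use the preprocess argument to cast the input to a wider type before the reduction.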

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

Version History

Introduced in R2019b

R2024b: Use `half` data type for input arrays and anonymous function handles

You can use input arrays that have the half data type. You can also use anonymous function handles for the reduction and preprocessing functions.

R2024b: Improved performance when specifying the reduction dimension

The code generated for gpucoder.reduce has improved performance when you specify the dimension name-value argument dim.
