gpucoder.matrixMatrixKernel

Optimized GPU implementation of functions containing matrix-matrix operations

Syntax

C = gpucoder.matrixMatrixKernel(fun,A,B)

C = gpucoder.matrixMatrixKernel(___,orientation)

C = gpucoder.matrixMatrixKernel(___,vectorizedSim)

Description

C = gpucoder.matrixMatrixKernel(fun,A,B) generates kernels from functions that contain GEMM-like operations, that is, operations with the same access pattern as general matrix-matrix multiplication. For example, you can match feature points between two images by using:

The sum of absolute differences (SAD) — F() = @(a,b)abs(a-b)
The sum of squared differences (SSD) — F() = @(a,b)(a-b).*(a-b)
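
As a rough illustration of the SAD case, the sketch below wraps the call in an entry-point function. The function name sadScores and the layout of the inputs (f1 as M-by-K, f2 as K-by-N, with one feature vector per column of f2) are hypothetical choices made for this sketch, not part of the original example.

```matlab
function scores = sadScores(f1, f2) %#codegen
% sadScores  Hypothetical entry point: SAD between rows of f1 and columns of f2.
%   f1 is assumed to be M-by-K and f2 K-by-N (an assumption of this sketch),
%   so scores is M-by-N.

    % Element-wise absolute difference; the kernel sums the result.
    absDiff = @(a,b) abs(a-b);
    scores = gpucoder.matrixMatrixKernel(absDiff, f1, f2);
end
```

Replacing absDiff with @(a,b)(a-b).*(a-b) should give the SSD variant.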

C = gpucoder.matrixMatrixKernel(___,orientation) has the optional argument orientation that specifies the orientation of the A and B matrices.

C = gpucoder.matrixMatrixKernel(___,vectorizedSim) has the optional argument vectorizedSim that specifies whether to use vectorized operations during MATLAB® simulation and CPU code generation. The function handle fun must support vector inputs: it takes one row or column from A and one column or row from B, and outputs a vector equivalent to arrayfun(fun,A,B).

Examples

Matrix-Matrix Multiplication

This example performs a simple matrix-matrix multiplication and uses the matrixMatrixKernel design pattern to generate CUDA® code.

In one file, write an entry-point function matMul_nn that accepts two matrix inputs f1 and f2. Use the MATLAB function times to multiply f1 and f2 element by element; the @ symbol creates a handle to the function. Insert the gpucoder.matrixMatrixKernel statement. The input matrices are not transposed, so use the 'nn' option. The final argument, true, sets vectorizedSim.

function scores = matMul_nn(f1, f2)
    scores = gpucoder.matrixMatrixKernel(@times, f1, f2, 'nn',true);
end

Use the codegen function to generate a CUDA MEX function.

codegen -config coder.gpuConfig('mex') ...
    -args {ones(1024,1024,'double'),ones(1024,1024,'double')} ...
    -report matMul_nn

The generated CUDA code contains two kernels: matMul_nn_kernel1, which initializes the output matrix scores, and matrixMatrixKernel, which performs the times operation. The following is a snippet of the generated code.

  cudaMemcpy(*gpu_f2, cpu_f2, 8388608UL, cudaMemcpyHostToDevice);
  matMul_nn_kernel1<<<dim3(2048U, 1U, 1U), dim3(512U, 1U, 1U)>>>(*gpu_f2,
                                                                 *gpu_B);
  cudaMemcpy(*gpu_f1, cpu_f1, 8388608UL, cudaMemcpyHostToDevice);
  matrixMatrixKernel<<<1024U, 64U>>>(*gpu_f1, *gpu_B, *gpu_scores);
  cudaMemcpy(cpu_scores, *gpu_scores, 8388608UL, cudaMemcpyDeviceToHost);
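
To check the generated MEX function, one option is to compare its output with the built-in matrix product. The sketch below assumes that codegen produced matMul_nn_mex in the current folder and that a supported NVIDIA GPU is available; the random inputs are arbitrary choices made for illustration.

```matlab
% Inputs must match the size and type passed to -args.
f1 = rand(1024, 1024);
f2 = rand(1024, 1024);

% Call the generated CUDA MEX function (assumed name: matMul_nn_mex).
scoresGpu = matMul_nn_mex(f1, f2);

% Reference result from the built-in matrix product.
scoresRef = f1 * f2;

% Summation order can differ on the GPU, so report the largest
% absolute difference rather than testing for exact equality.
maxErr = max(abs(scoresGpu(:) - scoresRef(:)));
fprintf('Maximum absolute difference: %g\n', maxErr);
```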

Input Arguments

`fun` — Function to apply
function handle

Function to apply to the elements of the input arrays, specified as a handle to a user-defined function. The function takes one row or column from matrix A and one row or column from matrix B, and returns a vector with the same type as the input. The elements of that vector are then summed to produce a single scalar element of C.

Data Types: function_handle
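
The behavior described for fun can be sketched as a reference loop in plain MATLAB. This is only one reading of the description above for the non-transposed case, not the generated GPU kernel; the small input matrices are chosen so that the result is easy to check by hand.

```matlab
% Plain-MATLAB reference loop (a sketch of the described behavior,
% not the GPU implementation).
A = [1 2; 3 4];          % M-by-K
B = [5 6; 7 8];          % K-by-N
fun = @times;            % with @times the result matches A*B

[M, K] = size(A);
N = size(B, 2);
assert(size(B, 1) == K, 'Inner dimensions of A and B must agree.');

C = zeros(M, N, 'like', A);
for i = 1:M
    for j = 1:N
        % Pair row i of A with column j of B, apply fun element-wise,
        % then sum the resulting vector into one scalar.
        v = fun(A(i, :).', B(:, j));
        C(i, j) = sum(v);
    end
end

disp(isequal(C, A * B))
```

Swapping fun for @(a,b)abs(a-b) in the same loop gives the SAD score for each pair of row and column.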

`A`, `B` — Input array
array

Input matrices, specified as numeric arrays. A and B must either be the same size or have compatible sizes. For example, if A is an M-by-K matrix and B is a K-by-N matrix, then C is an M-by-N matrix.

`orientation` — Operation performed on input matrices
'NN' (default) | character vector | string

Orientation of the input matrices, specified as a character vector or string composed of two characters. The first character applies to A and the second to B. Each character indicates the operation performed on that matrix prior to matrix multiplication: normal ('N'), transposed ('T'), or complex conjugate transpose ('C').

Possible values are:

'nn' - Matrices A and B are normal.
'nt' - Matrix B is transposed.
'tn' - Matrix A is transposed.
'tt' - Both matrices A and B are transposed.
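
One way to read these options, assuming they follow the usual GEMM convention in which a transposed operand is stored as the transpose of the matrix that enters the product, is sketched below for the case where fun is @times. The variables At and Bt are hypothetical names for the stored, transposed inputs; the sketch uses plain MATLAB and does not call gpucoder.matrixMatrixKernel.

```matlab
A  = [1 2 3; 4 5 6];    % M-by-K (2-by-3)
B  = [1 0; 0 1; 2 2];   % K-by-N (3-by-2)
At = A.';               % A stored transposed, K-by-M
Bt = B.';               % B stored transposed, N-by-K

Cnn = A    * B;         % 'nn': both inputs used as given
Cnt = A    * Bt.';      % 'nt': second input transposed before the product
Ctn = At.' * B;         % 'tn': first input transposed before the product
Ctt = At.' * Bt.';      % 'tt': both inputs transposed before the product

% Under this reading, all four orientations describe the same product.
sameResult = isequal(Cnn, Cnt, Ctn, Ctt);
disp(sameResult)
```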

`vectorizedSim` — Use vectorized operation
false (default) | true

Specify whether to use vectorized operation during MATLAB simulation and CPU code generation.
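
Before enabling vectorizedSim, it can help to confirm that the function handle accepts whole vectors. The quick check below is a sketch in plain MATLAB: it compares one vectorized call against arrayfun on one row of A and one column of B. The sample data and variable names are arbitrary.

```matlab
fun = @(a,b) (a-b).*(a-b);   % SSD term; .* keeps it element-wise

A = [1 2 3; 4 5 6];          % M-by-K
B = [7 8; 9 10; 11 12];      % K-by-N

a = A(1, :).';               % one row of A, as a column vector
b = B(:, 1);                 % one column of B

vec = fun(a, b);             % single vectorized call
ref = arrayfun(fun, a, b);   % scalar-by-scalar evaluation

isVectorSafe = isequal(vec, ref);
disp(isVectorSafe)
```

A handle written with * instead of .* would raise a dimension error on these column vectors, which signals that it is not suitable for vectorized simulation.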

Output Arguments

`C` — Output Array
scalar | vector | matrix

Product, returned as a scalar, vector, or matrix. Array C has the same number of rows as input A and the same number of columns as input B.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.
