Pass GPU Inputs to Entry-Point Functions

Since R2024a

This example shows how to configure GPU Coder to pass GPU inputs to entry-point functions and produce GPU outputs. Doing so can improve the performance of the generated code when you integrate it with a system that produces and consumes data on the GPU. When the caller of the entry-point function creates the inputs on the GPU and the entry-point function accesses them there, the generated code avoids unnecessary memory copies between the CPU and the GPU. The same holds for outputs that stay on the GPU.

Third-Party Prerequisites

  • CUDA-enabled NVIDIA® GPU and compatible driver.

Verify GPU Environment

To verify that the compilers and libraries necessary for running this example are set up correctly, use the coder.checkGpuInstall function.

envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
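
Because Quiet suppresses the command-line report, it can help to capture the return value and inspect it. The following is a minimal sketch: it assumes only that coder.checkGpuInstall returns a structure summarizing the checks, and it does not rely on any particular field names.

```matlab
% Sketch: capture the result of the environment check instead of discarding it.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;   % also run the basic code generation check
envCfg.Quiet = 1;          % suppress the command-line report

results = coder.checkGpuInstall(envCfg);

% Display every check that was run, without assuming specific field names.
disp(results)
```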

The sobelEdgeDetection Entry-Point Function

The sobelEdgeDetection entry-point function implements a Sobel edge detection algorithm. It takes an image as input and produces an output image that shows the edges.

type sobelEdgeDetection.m
function outputImg = sobelEdgeDetection(inputImg)
%

% Copyright 2023 The MathWorks, Inc.
    coder.gpu.kernelfun();
    inputSize = size(inputImg);
    outputSize = inputSize -2;
    outputImg = zeros(outputSize, 'like', inputImg);
    inputImg = double(inputImg);
    for colIdx = 1:outputSize(2)
        for rowIdx = 1:outputSize(1)
            hDiff = inputImg(rowIdx, colIdx) + 2* inputImg(rowIdx, colIdx+1) + inputImg(rowIdx,colIdx + 2) - ...
                inputImg(rowIdx + 2, colIdx) - 2* inputImg(rowIdx + 2, colIdx+1) - inputImg(rowIdx + 2,colIdx + 2);
            vDiff = inputImg(rowIdx, colIdx) + 2* inputImg(rowIdx + 1, colIdx) + inputImg(rowIdx + 2,colIdx) - ...
                inputImg(rowIdx, colIdx + 2) - 2* inputImg(rowIdx + 1, colIdx + 2) - inputImg(rowIdx + 2,colIdx + 2);
            diff = hDiff*hDiff + vDiff*vDiff;
            if diff > 3600
                outputImg(rowIdx, colIdx) = 255;
            else
                outputImg(rowIdx, colIdx) = 0;
            end
        end
    end
end
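
Before generating code, you can check the function in MATLAB on the CPU with a small synthetic image. This is an illustrative sketch, not part of the original example: the variable names testImg and edges are hypothetical, and it assumes sobelEdgeDetection.m is on the MATLAB path. The image has a single vertical step edge, so only the output columns whose 3-by-3 window straddles the step should be marked.

```matlab
% Sketch: run the entry-point function in MATLAB on a synthetic step edge.
% Assumes sobelEdgeDetection.m (listed above) is on the path.
testImg = zeros(8, 8, 'uint8');
testImg(:, 5:8) = 200;              % dark left half, bright right half

edges = sobelEdgeDetection(testImg);

% The output is 2 pixels smaller in each dimension than the input (6-by-6).
disp(size(edges))

% Only windows that span columns 4 and 5 of the input see a horizontal
% intensity change, so output columns 3 and 4 are set to 255.
disp(edges)
```

Because the function allocates the output with 'like', the result has the same class as the input, here uint8.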

Generate GPU Code and Run gpuPerformanceAnalyzer on CPU

Use coder.gpuConfig to create a GPU code configuration object, and then use the codegen command to generate a MEX function.

cfg = coder.gpuConfig('mex');
imRGB = imread('peppers.png');
imGray = rgb2gray(imRGB);
codegen -config cfg -args {imGray} sobelEdgeDetection
Code generation successful.
gpuPerformanceAnalyzer('sobelEdgeDetection', {imGray}, Config=cfg, OutFolder='sobleEdgeWithCPUIO');
### Starting GPU code generation
Code generation successful: View report

### GPU code generation finished
### Starting application profiling
### Application profiling finished
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data

By default, GPU Coder expects the inputs on the CPU and produces the outputs on the CPU. The generated code copies the data from the CPU to the GPU before running the computation on the GPU, and then copies the results back to the CPU.

The GPU Performance Analyzer report shows that the memory copies take most of the time.

Generate GPU Code and Run gpuPerformanceAnalyzer on GPU

The Sobel edge detection algorithm sends its input to the GPU immediately, computes the edges there, and produces the final result on the GPU. If the caller passes the inputs on the GPU and takes the outputs from the GPU, the entry-point function does not require any memory copies.

Pass the inputs to the GPU by using the gpuArray function. When you pass inputs to the GPU, GPU Coder produces the outputs on the GPU when the GPU output types are supported.

imGrayGpu = gpuArray(imGray);
codegen -config cfg -args {imGrayGpu} sobelEdgeDetection
Code generation successful.

You can also use coder.typeof to represent inputs on the GPU.

inputImg = coder.typeof(imGray, 'Gpu', true);
codegen -config cfg -args {inputImg} sobelEdgeDetection
Code generation successful.
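
With either form of -args, you can call the generated MEX function with a gpuArray input. The sketch below assumes the MEX function uses the default name sobelEdgeDetection_mex and that a supported GPU is available. The variable names edgesGpu and edgesCpu are hypothetical.

```matlab
% Sketch: call the generated MEX function with data that already lives on the GPU.
% Assumes sobelEdgeDetection_mex was generated with a GPU input type as above.
imGrayGpu = gpuArray(rgb2gray(imread('peppers.png')));

edgesGpu = sobelEdgeDetection_mex(imGrayGpu);

% Check where the output lives. When the output type is supported on the
% GPU, this is expected to display true (1).
disp(isa(edgesGpu, 'gpuArray'))

% Copy the result to the CPU only when a CPU consumer needs it.
edgesCpu = gather(edgesGpu);
disp(size(edgesCpu))
```

Keeping edgesGpu on the GPU and calling gather only at the end is what lets a larger GPU pipeline avoid a round trip through CPU memory.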

Run gpuPerformanceAnalyzer with inputs and outputs on the GPU.

gpuPerformanceAnalyzer('sobelEdgeDetection', {imGrayGpu}, Config=cfg, OutFolder='sobleEdgeWithGPUIO');
### Starting GPU code generation
Code generation successful: View report

### GPU code generation finished
### Starting application profiling
### Application profiling finished
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data

With the inputs and outputs on the GPU, there are no memory copies between the CPU and the GPU in the entry-point function.
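
For a rough wall-clock comparison outside the analyzer, one option is to build the two variants under different names and time them with gputimeit. This is a sketch under stated assumptions: the MEX names sobelCpuIO_mex and sobelGpuIO_mex are hypothetical and set with the codegen -o option, and the measured times depend on the GPU, driver, and image size.

```matlab
% Sketch: time the CPU-I/O and GPU-I/O variants side by side.
cfg = coder.gpuConfig('mex');
imGray = rgb2gray(imread('peppers.png'));
imGrayGpu = gpuArray(imGray);

% Hypothetical output names, so that the two MEX files do not overwrite each other.
codegen -config cfg -args {imGray} sobelEdgeDetection -o sobelCpuIO_mex
codegen -config cfg -args {imGrayGpu} sobelEdgeDetection -o sobelGpuIO_mex

% gputimeit waits for GPU work to finish before it stops the clock.
tCpuIO = gputimeit(@() sobelCpuIO_mex(imGray));
tGpuIO = gputimeit(@() sobelGpuIO_mex(imGrayGpu));

fprintf('CPU inputs/outputs: %.3f ms\n', 1e3 * tCpuIO);
fprintf('GPU inputs/outputs: %.3f ms\n', 1e3 * tGpuIO);
```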
