Improve Performance of Small Matrix Problems on the GPU Using `pagefun`

This example shows how to use pagefun to improve the performance of independent operations applied to multiple matrices arranged in a multidimensional array.

Multidimensional arrays are an extension of 2-D matrices and use additional subscripts for indexing. A 3-D array, for example, uses three subscripts. The first two dimensions represent a matrix and the third represents pages of elements (sometimes referred to as slices). For more information, see Multidimensional Arrays.

A 4-by-4-by-3 multidimensional array.
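As a small illustration of page indexing, the following sketch builds a 4-by-4-by-3 array and extracts one page and one element. The variable names are illustrative and are not part of the original example.

```matlab
% Build a 4-by-4-by-3 array whose elements are 1 to 48 in column-major order.
A = reshape(1:48,4,4,3);

% The third subscript selects a page; each page is a 4-by-4 matrix.
page2 = A(:,:,2);

% Three subscripts address a single element: row 1, column 1 of page 3.
firstOfPage3 = A(1,1,3);   % 33, because pages 1 and 2 hold elements 1 to 32
```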

While GPUs can effectively apply small independent operations to large matrices, performance is suboptimal when these operations are applied serially, for example in a for-loop. To avoid serial processing, the arrayfun function applies a scalar operation to each element of an array in parallel on the GPU. Similarly, the pagefun function applies a function to each page of a multidimensional GPU array.

The pagefun function supports applying most element-wise functions and a number of matrix operations that support GPU array input. MATLAB® also provides a number of dedicated page-wise functions, including pagemtimes, pagemldivide, pagemrdivide, pagetranspose, pagectranspose, pageinv, pagenorm, and pagesvd. Depending on the task, these functions might simplify your code or provide better performance than using pagefun.
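To make the idea of a page-wise function concrete, here is a minimal sketch that compares a loop of matrix products with a single pagemtimes call. It runs on the CPU so that it does not need a GPU; the array sizes, the seed, and the variable names are assumptions chosen for illustration.

```matlab
% Ten independent 3-by-3 matrix products, stored as pages.
rng(0);                      % fixed seed so the sketch is repeatable
A = rand(3,3,10);
B = rand(3,3,10);

% Reference result: multiply page by page in a loop.
Cloop = zeros(3,3,10);
for k = 1:10
    Cloop(:,:,k) = A(:,:,k) * B(:,:,k);
end

% Page-wise result in a single call.
Cpage = pagemtimes(A,B);

% The two results should agree to within floating-point round-off.
maxDiff = max(abs(Cloop - Cpage),[],"all");
```

If A and B were gpuArray objects, pagefun(@mtimes,A,B) would be expected to compute the same pages.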

In this example, a robot is navigating a known map containing a large number of features that the robot can identify using its sensors. The robot locates itself in the map by measuring the relative position and orientation of those features and comparing them to the map locations. Assuming the robot is not completely lost, it can use any difference between the two to correct its position, for instance by using a Kalman Filter. This example shows an efficient way to compute the feature positions relative to the robot.

Set Up the Map

Define the dimensions of a room containing a number of features.

roomDimensions = [50 50 5];

The supporting function randomTransforms is provided at the end of this example and initializes N transforms with random values, providing a structure as output. It represents positions and orientations using 3-by-1 vectors T and 3-by-3 rotation matrices R. The N translations are packed into a 3-by-N matrix and the rotations are packed into a 3-by-3-by-N array.

Use the randomTransforms function to set up a map of 1000 features, and a start location for the robot.

numFeatures = 1000;
Map = randomTransforms(numFeatures,roomDimensions);
Robot = randomTransforms(1,roomDimensions);

The plotRobot function is provided as a supporting file with this example and plots a top-down view of the room and a close-up view of the robot and nearby features. The robot is represented by a blue box with wheels, and the features are represented by red circles with accompanying lines representing their orientation. To use this function, open the example as a live script.

Call the plotRobot function.

plotRobot(Robot,Map)

Figure: two axes, each with x and y labels, showing the top-down view of the room and the close-up view of the robot and nearby features.

Define the Equations

To correctly identify the features in the map, the robot needs to transform the map to put its sensors at the origin. Then it can find map features by comparing what it sees with what it expects to see.

For a map feature $i$, we can find its position relative to the robot, $T_{\mathrm{rel}}(i)$, and its orientation, $R_{\mathrm{rel}}(i)$, by transforming its global map location:

$\begin{array}{l} R_{\mathrm{rel}}(i) = R_{\mathrm{bot}}^{\top} R_{\mathrm{map}}(i) \\ T_{\mathrm{rel}}(i) = R_{\mathrm{bot}}^{\top} (T_{\mathrm{map}}(i) - T_{\mathrm{bot}}) \end{array}$

where $T_{\mathrm{bot}}$ and $R_{\mathrm{bot}}$ are the position and orientation of the robot, and $T_{\mathrm{map}}(i)$ and $R_{\mathrm{map}}(i)$ represent the map data. The equivalent MATLAB code looks like this:

Rrel(:,:,i) = Rbot' * Rmap(:,:,i)
Trel(:,i) = Rbot' * (Tmap(:,i) - Tbot)
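A worked instance with hand-picked numbers may help check the equations. The pose and feature values below are assumptions chosen so that the result is easy to verify by hand; they do not come from the example's random map.

```matlab
% Robot pose: rotated 90 degrees about the z-axis, located at (1,2,0).
Rbot = [0 -1 0; 1 0 0; 0 0 1];
Tbot = [1; 2; 0];

% One map feature: aligned with the global frame, located at (1,5,0).
Rmap = eye(3);
Tmap = [1; 5; 0];

% Apply the two equations.
Rrel = Rbot' * Rmap;            % orientation of the feature in the robot frame
Trel = Rbot' * (Tmap - Tbot);   % [3; 0; 0]: 3 units along the robot's x-axis

% Mapping back to global coordinates recovers the original position.
TmapCheck = Rbot * Trel + Tbot;
```

The robot's x-axis points along the global y-axis, and the feature lies 3 units away in that direction, so the relative position is [3; 0; 0].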

Perform Matrix Transforms on the CPU Using a `for`-loop

The supporting function loopingTransform is provided at the end of this example and loops over all the transforms in turn, transforming each feature to its location relative to the robot. Note the like name-value argument of the zeros function, which makes the function return an array of zeros of the same data type as a prototype array. For example, if the prototype array is a gpuArray, then zeros returns a gpuArray. This allows you to use the same code on the GPU in the next section.

Time the calculations using the timeit function. The timeit function times the execution of loopingTransform multiple times and returns the median of the measurements. Since timeit requires a function with no arguments, use the @() syntax to create an anonymous function of the right form.

cpuTime = timeit(@()loopingTransform(Robot,Map,numFeatures))

cpuTime = 
0.0048

Perform Matrix Transforms on the GPU Using a `for`-loop

To run the same code on the GPU, simply pass the input data to the function as a gpuArray. A gpuArray represents an array stored in GPU memory. Many functions in MATLAB and in other toolboxes support gpuArray objects, allowing you to run your code on GPUs with minimal changes to the code. For more information, see Run MATLAB Functions on a GPU.

Ensure that your desired GPU is available and selected.

gpu = gpuDevice;
disp(gpu.Name + " GPU selected.")

NVIDIA RTX A5000 GPU selected.

Create GPU arrays containing the position and orientation of the robot and the features in the map.

gMap.R = gpuArray(Map.R);
gMap.T = gpuArray(Map.T);
gRobot.R = gpuArray(Robot.R);
gRobot.T = gpuArray(Robot.T);

Time the calculations using the gputimeit function. The gputimeit function is the equivalent of timeit for code that includes GPU computation. It makes sure all GPU operations have finished before recording the time.

gpuTime = gputimeit(@()loopingTransform(gRobot,gMap,numFeatures))

gpuTime = 
0.1842
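To show why synchronization matters when timing GPU code, here is a rough sketch of a manual measurement that waits for the device before stopping the clock. The matrix-multiply workload and the variable names are hypothetical, and the sketch falls back to the CPU when no GPU is available.

```matlab
% Hypothetical workload: a matrix multiply on whichever device is available.
if canUseGPU
    gpu = gpuDevice;
    X = rand(1000,"gpuArray");
    tic
    Y = X * X;
    wait(gpu)        % block until the queued GPU work has finished
    elapsed = toc;
else
    % Fall back to the CPU so that the sketch still runs without a GPU.
    X = rand(1000);
    tic
    Y = X * X;
    elapsed = toc;
end
```

Without the wait call, toc could return before the GPU has finished the multiplication. A single tic/toc measurement like this is typically noisier than gputimeit, so gputimeit remains the better choice for benchmarks.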

Perform Matrix Transforms on the GPU Using `pagefun`

The GPU version is very slow, roughly 38 times slower than the CPU version in this run, because the calculations ran in series even though they are all independent. Using pagefun, we can run all the computations in parallel.

The supporting function pagefunTransform is provided at the end of this example and applies the same transforms as the loopingTransform function using pagefun instead of a for-loop. The first computation is the calculation of the rotations. This involves a matrix multiply, which translates to the function mtimes (*). Pass this to pagefun along with the two sets of rotations to be multiplied:

Rel.R = pagefun(@mtimes,Robot.R',Map.R);

Robot.R' is a 3-by-3 matrix, and Map.R is a 3-by-3-by-N array. The pagefun function matches each independent matrix from the map to the same robot rotation, and gives us the required 3-by-3-by-N output.

The translation calculation also involves a matrix multiply, but the normal rules of matrix multiplication allow this to come outside the loop without any changes:

Rel.T = Robot.R' * (Map.T - Robot.T);
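The claim that the translation can move outside the loop can be checked with a short CPU sketch. The variables Rbot, Tbot, and Tmap below are stand-ins for Robot.R, Robot.T, and Map.T, and the sizes and seed are assumptions.

```matlab
rng(0);                    % fixed seed so the sketch is repeatable
N = 5;
Rbot = orth(rand(3));      % stand-in for Robot.R
Tbot = rand(3,1);          % stand-in for Robot.T
Tmap = rand(3,N);          % stand-in for Map.T

% Loop form: transform one column at a time.
TrelLoop = zeros(3,N);
for i = 1:N
    TrelLoop(:,i) = Rbot' * (Tmap(:,i) - Tbot);
end

% Vectorized form: implicit expansion subtracts Tbot from every column,
% and one matrix multiply rotates all N columns.
TrelVec = Rbot' * (Tmap - Tbot);

maxDiff = max(abs(TrelLoop - TrelVec),[],"all");
```

The vectorized form works because multiplying a 3-by-3 matrix by a 3-by-N matrix multiplies it by each column independently.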

Time the calculations using the gputimeit function.

gpuPagefunTime = gputimeit(@()pagefunTransform(gRobot,gMap))

gpuPagefunTime = 
4.9757e-04

Compare Results

Plot the timing results.

figure
labels = categorical(["CPU Execution","GPU Execution","GPU Execution with \fontname{consolas}pagefun"]);
bar(labels,[cpuTime,gpuTime,gpuPagefunTime])
ylabel("Execution Time (s)")
set(gca,YScale="log")

Figure: bar chart of Execution Time (s) for the three approaches.

Calculate how much faster the execution using pagefun is than CPU and simple GPU execution.

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than on the CPU.\n", ...
    cpuTime/gpuPagefunTime);

Executing the transforms on the GPU using pagefun is 9.63 times faster than on the CPU.

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than using for-loops on the GPU.\n", ...
    gpuTime/gpuPagefunTime);

Executing the transforms on the GPU using pagefun is 370.29 times faster than using for-loops on the GPU.

Locate a Lost Robot Using Multiple Possible Robot Positions

If the robot is in an unknown part of the map, it can use a global search algorithm to locate itself. The algorithm tests a number of possible locations by carrying out the above computation and looking for good correspondence between the features seen by the robot's sensors and what it would expect to see at that position.

Now there are multiple possible robot positions as well as multiple features. N features and M robots require N*M transforms. To distinguish 'robot space' from 'feature space', use the 4th dimension for robot rotations and the 3rd for robot translations. That means that the robot rotations will be 3-by-3-by-1-by-M, and the translations will be 3-by-1-by-M.

Initialize the search with ten random robot locations. A good search algorithm would use topological or other clues to seed the search more intelligently.

numRobots = 10;
Robot = randomTransforms(numRobots,roomDimensions);
Robot.R = reshape(Robot.R,3,3,1,[]); % Spread along the 4th dimension
Robot.T = reshape(Robot.T,3,1,[]); % Spread along the 3rd dimension
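The effect of this layout can be seen by checking array sizes on a small case. In the sketch below, mapT, robotT, and robotR are illustrative stand-ins for Map.T, Robot.T, and Robot.R, and N and M are kept small on purpose.

```matlab
N = 4;  M = 2;                          % small illustrative sizes
mapT   = rand(3,N);                     % stand-in for Map.T
robotT = reshape(rand(3,M),3,1,[]);     % 3-by-1-by-M, as in the code above
robotR = reshape(rand(3,3,M),3,3,1,[]); % 3-by-3-by-1-by-M

% Implicit expansion pairs every feature with every robot.
D = mapT - robotT;
sizeD = size(D);          % [3 4 2], that is 3-by-N-by-M
sizeR = size(robotR);     % [3 3 1 2], that is 3-by-3-by-1-by-M
```

Because the feature index and the robot index occupy different dimensions, the subtraction produces all N*M feature-robot pairs without a loop.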

A supporting function loopingTransform2 is defined at the end of this example and performs a looping transform using two nested loops, to loop over the robots as well as over the features.

Time the calculations using timeit.

cpuTime = timeit(@()loopingTransform2(Robot,Map,numFeatures,numRobots))

cpuTime = 
0.0731

Create GPU arrays containing the robot rotations and translations.

gRobot.R = gpuArray(Robot.R);
gRobot.T = gpuArray(Robot.T);

Time the calculations on the GPU using gputimeit.

gpuTime = gputimeit(@() loopingTransform2(gRobot,gMap,numFeatures,numRobots))

gpuTime = 
2.1571

As before, the looping version runs much slower on the GPU, roughly 30 times slower than on the CPU in this run, because it is not doing calculations in parallel.

A supporting function pagefunTransform2 is provided at the end of this example and applies the same transforms as the loopingTransform2 function using two pagefun calls instead of nested for-loops. This function needs to incorporate the transpose operator as well as mtimes into a call to pagefun. The function also applies the squeeze function to the transposed robot orientations to put the spread over robots into the 3rd dimension, to match the translations. Despite this, the resulting code is considerably more compact.

The pagefun function expands dimensions appropriately, so multiplying the 3-by-3-by-1-by-M array Rt by the 3-by-3-by-N-by-1 array Map.R produces a 3-by-3-by-N-by-M array.
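The same dimension expansion can be tried on the CPU with the dedicated page-wise functions pagetranspose and pagemtimes, which is a convenient way to check the shapes without a GPU. This is a sketch under assumed sizes and stand-in variable names, not the example's pagefunTransform2 function.

```matlab
rng(0);                                   % fixed seed so the sketch is repeatable
N = 4;  M = 3;
mapR = rand(3,3,N);                       % stand-in for Map.R
mapT = rand(3,N);                         % stand-in for Map.T
robR = rand(3,3,1,M);                     % stand-in for Robot.R
robT = rand(3,1,M);                       % stand-in for Robot.T

% Reference: nested loops over features and robots.
refR = zeros(3,3,N,M);
refT = zeros(3,N,M);
for i = 1:N
    for j = 1:M
        refR(:,:,i,j) = robR(:,:,1,j)' * mapR(:,:,i);
        refT(:,i,j)   = robR(:,:,1,j)' * (mapT(:,i) - robT(:,1,j));
    end
end

% Page-wise version: singleton page dimensions expand to N-by-M.
Rt   = pagetranspose(robR);                 % 3-by-3-by-1-by-M
relR = pagemtimes(Rt,mapR);                 % 3-by-3-by-N-by-M
relT = pagemtimes(squeeze(Rt),mapT - robT); % 3-by-N-by-M

errR = max(abs(refR - relR),[],"all");
errT = max(abs(refT - relT),[],"all");
```

Both error values should be at the level of floating-point round-off, which confirms that the page-wise calls reproduce the nested loops.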

Time the calculations on the GPU using gputimeit.

gpuPagefunTime = gputimeit(@()pagefunTransform2(gRobot,gMap))

gpuPagefunTime = 
0.0016

Compare Results

Plot the timing results.

labels = categorical(["CPU Execution","GPU Execution","GPU Execution with \fontname{consolas}pagefun"]);
bar(labels,[cpuTime,gpuTime,gpuPagefunTime])
ylabel("Execution Time (s)")
set(gca,YScale="log")

Figure: bar chart of Execution Time (s) for the three approaches.

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than on the CPU.\n", ...
    cpuTime/gpuPagefunTime);

Executing the transforms on the GPU using pagefun is 45.61 times faster than on the CPU.

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than using nested for-loops on the GPU.\n", ...
    gpuTime/gpuPagefunTime);

Executing the transforms on the GPU using pagefun is 1346.60 times faster than using nested for-loops on the GPU.

Conclusion

The pagefun function supports a number of 2-D operations, as well as most of the scalar operations supported by arrayfun. Together, these functions allow you to vectorize a range of computations involving matrix algebra and array manipulation, eliminating the need for loops. In the timings above, the pagefun versions ran roughly 10 to 46 times faster than the CPU loops and several hundred to over a thousand times faster than the GPU loops.

Wherever you are doing small calculations on GPU data in a loop, you should consider converting to a vectorized implementation in this way. This can also be an opportunity to make use of the GPU to improve performance where previously it gave no performance gains.

Supporting Functions

Random Transform Function

The randomTransforms function creates matrices defining N random transforms in a room of specified dimensions. Each transform comprises a random translation T and a random rotation R. The function can be used to set up a map of features in a room and the starting position and orientation of a robot.

function Tform = randomTransforms(N,roomDimensions)
% Preallocate matrices.
Tform.T = zeros(3,N);
Tform.R = zeros(3,3,N);

for i = 1:N
    % Create random translation.
    Tform.T(:,i) = rand(3,1) .* roomDimensions';

    % Create random rotation by extracting an orthonormal
    % basis from a random 3-by-3 matrix.
    Tform.R(:,:,i) = orth(rand(3,3));
end

end

Looping Transform Function

The loopingTransform function transforms every feature to its location relative to the robot by looping over the transforms in turn.

function Rel = loopingTransform(Robot,Map,numFeatures)
% Preallocate matrices.
Rel.R = zeros(size(Map.R),like=Map.R);
Rel.T = zeros(size(Map.T),like=Map.T);

for i = 1:numFeatures
    % Find orientation of map feature relative to the robot.
    Rel.R(:,:,i) = Robot.R' * Map.R(:,:,i);
    % Find position of map feature relative to the robot.
    Rel.T(:,i) = Robot.R' * (Map.T(:,i) - Robot.T);
end

end

`pagefun` Transform Function

The pagefunTransform function transforms every feature to its location relative to the robot by applying the transforms using the pagefun function.

function Rel = pagefunTransform(Robot,Map)
% Find orientation of map feature relative to the robot.
Rel.R = pagefun(@mtimes,Robot.R', Map.R);
% Apply translation.
Rel.T = Robot.R' * (Map.T - Robot.T);
end

Nested Looping Transform Function

The loopingTransform2 function performs a looping transform using two nested loops, to loop over the robots as well as over the features. The transforms map every feature to its location relative to every robot.

function Rel = loopingTransform2(Robot,Map,numFeatures,numRobots)
% Preallocate matrices.
Rel.R = zeros(3,3,numFeatures,numRobots,like=Map.R);
Rel.T = zeros(3,numFeatures,numRobots,like=Map.T);

for i = 1:numFeatures
    for j = 1:numRobots
        % Find orientation of map feature relative to the robot.
        Rel.R(:,:,i,j) = Robot.R(:,:,1,j)' * Map.R(:,:,i);
        % Find position of map feature relative to the robot.
        Rel.T(:,i,j) = ...
            Robot.R(:,:,1,j)' * (Map.T(:,i) - Robot.T(:,1,j));
    end
end

end

Two-call `pagefun` Transform Function

The pagefunTransform2 function performs transforms to map every feature to its location relative to every robot using two calls to the pagefun function.

function Rel = pagefunTransform2(Robot,Map)
% Find orientation of map feature relative to the robot.
Rt = pagefun(@transpose,Robot.R);
Rel.R = pagefun(@mtimes,Rt,Map.R);
% Find position of map feature relative to the robot.
Rel.T = pagefun(@mtimes,squeeze(Rt), ...
    (Map.T - Robot.T));
end
