Performance Loss of Matrix-Vector Multiplication on GPU with Array Indexing

Hi,
I have a large matrix A and a vector B. I want to do a partial multiplication on the GPU using array indexing, but the performance is much lower than doing a full A*B. Below is a simple example of what I am trying to do:
A = rand(20000,'gpuArray');
B = rand(20000,1,'gpuArray');
C = A(8001:18000,1:end)*B;
GPU Device: Tesla V100
MATLAB 2020a
Any suggestion on how to improve the performance? Thank you.

Accepted Answer

Edric Ellis on 30 Apr 2020
Unfortunately, the expression A(8001:18000,:) requires a strided memory copy. Matrices in MATLAB (even on the GPU) are stored in column-major format, so picking out only certain rows is much less efficient than picking out only certain columns.
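To make the column-major point concrete, here is a minimal sketch on a small CPU matrix (the name M and the 3-by-4 size are only for illustration). Each entry is set to its own linear index, so reading a row shows how far apart its elements sit in memory, while a column turns out to be contiguous.

```matlab
% Small illustrative matrix whose entries equal their linear (memory) indices.
M = reshape(1:12, 3, 4);

% One row: consecutive elements are numRows apart in memory (strided access).
rowIdx = M(2, :);

% One column: consecutive elements are adjacent in memory (contiguous access).
colIdx = M(:, 2).';

rowStride = unique(diff(rowIdx));   % spacing between elements of a row
colStride = unique(diff(colIdx));   % spacing between elements of a column
```

The row stride equals the number of rows of M, which is why selecting a block of rows from a 20000-by-20000 matrix touches memory in widely separated chunks.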
There's a trick you can use though that takes advantage of the fact that gpuArray matrix multiplication is optimised for the transposed-times case. Try instead pre-transposing A (this is relatively expensive, but perhaps you can do it only once) and then doing:
A(:, 8001:18000).' * B;
This uses the much faster indexing pattern, and is about 2x faster on my GPU.
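A rough sketch of the whole pattern, assuming the transposed copy is kept in a separate variable (At is a name chosen here, as are n and rows, which are scaled down from the sizes in the question so the check runs quickly). It falls back to the CPU when no GPU is available, on the assumption that canUseGPU is present (R2019b or later), and confirms that both expressions give the same result.

```matlab
% Use the GPU when one is available; otherwise fall back to the CPU so the
% sketch still runs (assumption: canUseGPU exists, R2019b or later).
n    = 2000;        % illustrative size, smaller than the 20000 in the question
rows = 801:1800;    % hypothetical row block, scaled down from 8001:18000
if canUseGPU()
    A = rand(n, 'gpuArray');
    B = rand(n, 1, 'gpuArray');
else
    A = rand(n);
    B = rand(n, 1);
end

% Row indexing: requires a strided copy of the selected rows.
C1 = A(rows, :) * B;

% Pre-transpose once, then select columns and transpose back in the product.
At = A.';
C2 = At(:, rows).' * B;

% The two results should agree up to floating-point round-off.
maxErr = gather(max(abs(C1 - C2)));
```

The transpose only pays off if At can be reused across many multiplications, since computing it costs a full pass over A.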
5 Comments
Edric Ellis on 4 May 2020
Strange, I just tried on a WIN64 machine here with a V100, and got the following result:
t1 =
1.6677e-04
t2 =
4.4944e-04
(This was using R2020a).
Afshin Ahmadi on 4 May 2020
I tried again and it seems your solution is quite fast when the block size is small, which is exactly what I need. Thank you so much for the help! I will just include some information here for the people who are interested in doing the same thing.
A = gpuArray.rand(20000);
B = gpuArray.rand(20000,1);
At = A.';
t1 = gputimeit(@() At(:,500:2000).'*B)
t2 = gputimeit(@() At(:,500:5000).'*B)
t3 = gputimeit(@() At(:,500:10000).'*B)
t4 = gputimeit(@() A(500:2000,:)*B)
t5 = gputimeit(@() A(500:5000,:)*B)
t6 = gputimeit(@() A(500:10000,:)*B)
t7 = gputimeit(@() A*B)
Execution time:
t1 = 4.4423e-04
t2 = 0.0010
t3 = 0.0020
t4 = 0.0035
t5 = 0.0051
t6 = 0.0076
t7 = 0.0044
(MATLAB R2020a, Tesla V100, Linux)
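For anyone comparing the numbers, a small sketch that turns the timings above into speedup ratios. The values are copied from the results listed here; the variable names are just for illustration.

```matlab
% Timings copied from the results above (seconds).
tTransposed = [4.4423e-04, 0.0010, 0.0020];   % t1, t2, t3: At(:,cols).'*B
tRowIndexed = [0.0035, 0.0051, 0.0076];       % t4, t5, t6: A(rows,:)*B
tFull       = 0.0044;                         % t7: full A*B

% Gain from column indexing on the pre-transposed matrix, per block size.
speedup = tRowIndexed ./ tTransposed;

% Gain relative to simply computing the full product.
vsFull = tFull ./ tTransposed;
```

On these measurements the gain is largest for the smallest block and shrinks as the block grows, which matches the observation that the approach works best when the block size is small.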
