Perfomance Loss of Matrix-Vector Multilplication on GPU with Array Indexing
2 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Afshin Ahmadi
le 29 Avr 2020
Commenté : Afshin Ahmadi
le 4 Mai 2020
Hi,
I have a large matrix A and a vector B. I want to do a partial multiplication on GPU using array indexing but the peformance is much lower than doing a full A*B. Below is a simple example of what I am trying to do:
A = rand(20000,'gpuArray');
B = rand(20000,1,'gpuArray');
C = A(8001:18000,1:end)*B;
GPU Device: Tesla V100
MATLAB 2020a
Any suggestion on how to improve the performance? Thank you.
0 commentaires
Réponse acceptée
Edric Ellis
le 30 Avr 2020
Unfortunately, the expression A(8001:18000,:) requires a strided memory copy. Matrices in MATLAB (even on the GPU) are stored in column-major format, so picking out only certain rows is much less efficient than picking out only certain columns.
There's a trick you can use though that takes advantage of the fact that gpuArray matrix multiplication is optimised for the transposed-times case. Try instead pre-transposing A (this is relatively expensive, but perhaps you can do it only once) and then doing:
A(:, 8001:18000).' * B;
This uses the much-faster indexing pattern, and is about ~2x faster on my GPU.
5 commentaires
Edric Ellis
le 4 Mai 2020
Strange, I just tried on a WIN64 machine here with a V100, and got the following result:
t1 =
1.6677e-04
t2 =
4.4944e-04
(This was using R2020a).
Plus de réponses (0)
Voir également
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!