Performance Loss of Matrix-Vector Multiplication on GPU with Array Indexing
Afshin Ahmadi
on 29 Apr 2020
Commented: Afshin Ahmadi
on 4 May 2020
Hi,
I have a large matrix A and a vector B. I want to do a partial multiplication on the GPU using array indexing, but the performance is much lower than doing a full A*B. Below is a simple example of what I am trying to do:
A = rand(20000,'gpuArray');
B = rand(20000,1,'gpuArray');
C = A(8001:18000,1:end)*B;
GPU Device: Tesla V100
MATLAB R2020a
Any suggestion on how to improve the performance? Thank you.
Accepted Answer
Edric Ellis
on 30 Apr 2020
Unfortunately, the expression A(8001:18000,:) requires a strided memory copy. Matrices in MATLAB (even on the GPU) are stored in column-major format, so picking out only certain rows is much less efficient than picking out only certain columns.
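To make the column-major point concrete, here is a minimal CPU-only sketch on a tiny matrix. The matrix M and the variables rowIdx and colIdx are illustrative names, not part of the original question. It prints the linear (memory-order) indices touched by one row versus one column:

```matlab
% Tiny illustration of column-major storage (sizes chosen arbitrarily).
M = reshape(1:12, 3, 4);                      % 3-by-4; M(k) == k in memory order

% Linear indices of the elements in row 2 and in column 2.
rowIdx = sub2ind(size(M), [2 2 2 2], 1:4);    % one element from each column
colIdx = sub2ind(size(M), 1:3, [2 2 2]);      % one whole column

disp(rowIdx)   % 2 5 8 11 : strided, step equals the number of rows
disp(colIdx)   % 4 5 6    : contiguous in memory
```

Selecting a block of rows therefore gathers elements that are spread across every column, while selecting a block of columns reads one contiguous chunk.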
There's a trick you can use though that takes advantage of the fact that gpuArray matrix multiplication is optimised for the transposed-times case. Try instead pre-transposing A (this is relatively expensive, but perhaps you can do it only once) and then doing:
A(:, 8001:18000).' * B;
This uses the much faster indexing pattern, and is about 2x faster on my GPU.
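A rough sketch of how the two approaches could be compared, assuming Parallel Computing Toolbox and a GPU with enough memory for both A and its transposed copy (about 3.2 GB each in double precision at this size). The names At, rows, tRow and tCol are mine, chosen for illustration. Timings will vary with hardware, driver, and MATLAB release, so treat this as a starting point rather than a definitive benchmark:

```matlab
% Sketch: row indexing vs. column indexing of a pre-transposed copy.
% Assumes a supported GPU; At, rows, tRow, tCol are illustrative names.
n    = 20000;
rows = 8001:18000;
A    = rand(n, 'gpuArray');
B    = rand(n, 1, 'gpuArray');
At   = A.';                       % one-off transpose; costs an extra n-by-n array

C1 = A(rows, :) * B;              % original form: strided row selection
C2 = At(:, rows).' * B;           % suggested form: contiguous column selection

% Both forms should agree up to floating-point rounding.
relErr = gather(max(abs(C1 - C2)) / max(abs(C1)));
fprintf('max relative difference: %g\n', relErr);

% gputimeit runs each function several times and synchronises the GPU.
tRow = gputimeit(@() A(rows, :) * B);
tCol = gputimeit(@() At(:, rows).' * B);
fprintf('row indexing: %g s, transposed column indexing: %g s\n', tRow, tCol);
```

The transpose is only worth paying for if At can be reused across many multiplications. If A changes every iteration, the cost of A.' has to be included in the comparison.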
5 Comments
Edric Ellis
on 4 May 2020
Strange, I just tried on a WIN64 machine here with a V100, and got the following result:
t1 =
1.6677e-04
t2 =
4.4944e-04
(This was using R2020a).