GPU arrayfun is so slow, what is going on?
Hi,
I am trying to understand what GPU arrayfun is doing. The following is a test code:
clear;clc;close all
gd=gpuDevice();
reset(gd);
N=2e3;
a=rand(60,N,'single','gpuArray');
tic;
b=sum(a,1);
wait(gd);
toc;
tic;
c=arrayfun(@(i) sum(a(:,i),1),(1:N));
wait(gd);
toc;
The results are:
Elapsed time is 0.000468 seconds.
Elapsed time is 0.584521 seconds.
What is going on here? A 1000x difference?? I would expect similar runtimes, since GPU arrayfun is supposed to execute in parallel on the GPU cores. Did I make a stupid error in using arrayfun?
Thanks!
1 comment
Hao Zhang
on 13 Dec 2018
Accepted Answer
More Answers (1)
Joss Knight
on 11 Dec 2018
You haven't called GPU arrayfun here; you've called CPU arrayfun, and inside the arrayfun function you are doing work on the GPU. This is because none of the arguments to your arrayfun call is a gpuArray.
You could force it to use GPU arrayfun by converting your input:
c = arrayfun(@(i) sum(a(:,i),1), gpuArray(1:N));
However, you'll immediately find that it errors, because sum is not supported inside GPU arrayfun. Obviously this is just a toy example, but the solution here is sum(a,1), not arrayfun.
4 comments
Hao Zhang
on 11 Dec 2018
I need to do c=sum(a.*repmat(b,1,2e3),1).
Matt J
Your operation does not require repmat, nor even sum, but simply
c=b.'*a;
But even if you were to use sum, recent MATLAB no longer requires repmat; e.g.,
a=gpuArray.rand(60,2e3); b=a(:,1);
c=sum(a.*b,1);
will work fine.
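To see why these three forms agree (a sketch on the CPU, so it runs without a GPU; the variable names are illustrative): with a 60-by-N a and a 60-by-1 b, sum(a.*b,1) implicitly expands b across columns, which is exactly the inner product b.'*a computes for each column.

% CPU sketch checking the equivalence of the repmat, implicit-expansion,
% and matrix-vector forms
a = rand(60, 2e3);
b = a(:, 1);
c1 = sum(a .* repmat(b, 1, 2e3), 1);   % original repmat form
c2 = sum(a .* b, 1);                   % implicit expansion (R2016b and later)
c3 = b.' * a;                          % matrix-vector product
max(abs(c1 - c3))                      % differences only at rounding level

The matrix-vector form also lets the GPU use a single optimized GEMV-style kernel rather than an element-wise multiply followed by a reduction.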
Joss Knight
on 11 Dec 2018
Thanks Matt.
GPU arrayfun is very special, you should read the documentation and list of supported functions. It only supports element-wise functionality, so you can't do any vector operations. That means you can't index an array unless you're indexing a single element or an up-level variable, you can't call sum or any other reduction or accumulation, and you can't output anything other than a scalar. This is because your arrayfun function gets compiled into a single CUDA kernel with no inter-thread communication. So it's incredibly useful and efficient when used within its limitations.
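As an illustration of those limits (a hypothetical sketch; the function handles and variable names are mine), the bodies below are purely element-wise, so each call compiles into a single kernel:

% Legal GPU arrayfun: scalar in, scalar out, element-wise math only
x = rand(60, 2e3, 'single', 'gpuArray');
y = rand(60, 2e3, 'single', 'gpuArray');
z = arrayfun(@(p, q) p.^2 + exp(q), x, y);

% Also legal: reading an up-level variable captured by the handle
offset = gpuArray(single(5));
z2 = arrayfun(@(p) p + offset, x);

% NOT legal inside GPU arrayfun: vector indexing or reductions,
% e.g. @(i) sum(x(:, i)) would fail to compile to a kernel.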
Nearly always (in my experience) when you want to do something more complex with vector operations, you can translate your code into a series of vectorized calls to normal MATLAB matrix functions, arrayfun, and pagefun.
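For instance (a sketch, assuming a batch of small matrices), pagefun applies a matrix operation to every page of a 3-D gpuArray in one batched call, replacing a loop:

% Multiply each 4x4 page of A by the matching 4x1 page of x
A = rand(4, 4, 1000, 'gpuArray');
x = rand(4, 1, 1000, 'gpuArray');
y = pagefun(@mtimes, A, x);   % y is 4x1x1000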
Hao Zhang
on 11 Dec 2018