Question about major difference in computation speed with gpuArrays
Hunter Palcich
on 22 June 2017
Commented: Hunter Palcich
on 23 June 2017
I've been trying to optimize my code for a project recently and noticed an interesting phenomenon. I have tried to search around for what could possibly cause it, but so far nothing has given a good answer.
The following code runs extremely fast:
x = linspace(-20,20,25);
z = linspace(0,100,29);
Columns = 5;
singleframeofdata = gpuArray(rand(2816,128,'single'));
fgpu = gpuArray(rand(2816,1,'single'));
tofgpu = rand(length(z),length(x),128,'single');
SingleFrameOfDatarep = repmat(singleframeofdata,1,length(z)*length(x));
y = -2i*pi*-1*fgpu*reshape(tofgpu,1,size(tofgpu,1)*size(tofgpu,2)*size(tofgpu,3),1);
tic
holder = SingleFrameOfDatarep.*y;
toc
clear holder
tic
SingleFrameOfDatarep = SingleFrameOfDatarep.*y;
toc
The timing for holder comes out around 0.09 s, while the timing for SingleFrameOfDatarep comes out around 0.00009 s. I know that because the second calculation uses an in-place operation, it should run faster.
However, if I change x = linspace(-20,20,25) to x = linspace(-20,20,26), a drastic slowdown occurs. The timing for holder is again around 0.09 s, while SingleFrameOfDatarep now takes around 0.07 s. The original code ran ~770x faster than the modified code.
My only thought/explanation for this is that when an array gets too large, MATLAB creates a new variable (like it does for holder), and that allocation time is where the slowdown occurs. But I am not fully sure about this, nor do I know how to test or check for it.
Could anyone point me in the right direction to read up on this, or give a possible explanation/solution?
1 comment
Joss Knight
on 22 June 2017
I can't check your code right now, but I can say two things. Firstly, MATLAB does have a memory pool, and when GPU memory overflows the pool there are raw allocations; those allocations force synchronization, and that is slow. Secondly, your timing with tic and toc is flawed because the GPU operates asynchronously, which means that when toc reports the time, the previous command may still be running. What happens when you insert wait(gpuDevice) before each tic and before each toc? You may find the timings change completely.
Finally, you should use gpuArray.rand rather than gpuArray(rand(...)). The former creates the random data directly on the GPU; the latter generates it slowly on the CPU and then copies it over to the device.
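A rough sketch of how those two suggestions could look applied to the question's code (the variable names are taken from the question; treat this as an illustration, not a verified fix):
dev = gpuDevice;
% Create the random data directly on the GPU instead of building it on the CPU and copying it over:
singleframeofdata = gpuArray.rand(2816,128,'single');
fgpu = gpuArray.rand(2816,1,'single');
% Make the GPU finish all pending work before starting and before stopping the timer,
% so tic/toc measures the kernel itself rather than just the asynchronous launch:
wait(dev);
tic
holder = SingleFrameOfDatarep.*y;
wait(dev);
toc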
Accepted Answer
Matt J
on 22 June 2017
Edited: Matt J
on 22 June 2017
The times that you are seeing are probably spurious. You shouldn't be using tic() and toc() to time GPU operations; you should be using gputimeit(), as below. Implemented this way, I see no significant speed difference between any of the cases that you tested.
x = linspace(-20,20,25);
z = linspace(0,100,29);
singleframeofdata = gpuArray(rand(2816,128,'single'));
fgpu = gpuArray(rand(2816,1,'single'));
tofgpu = rand(length(z),length(x),128,'single');
SingleFrameOfDatarep = repmat(singleframeofdata,1,length(z)*length(x));
y = -2i*pi*-1*fgpu*reshape(tofgpu,1,size(tofgpu,1)*size(tofgpu,2)*size(tofgpu,3),1);
gputimeit(@() fun(SingleFrameOfDatarep,y) )
gputimeit(@() hfun(SingleFrameOfDatarep,y) )
function SingleFrameOfDatarep = fun(SingleFrameOfDatarep,y)   % assigns back to the input variable (in-place style)
SingleFrameOfDatarep = SingleFrameOfDatarep.*y;
end
function holder = hfun(SingleFrameOfDatarep,y)                % assigns to a new variable
holder = SingleFrameOfDatarep.*y;
end
Incidentally, your code should also get a bit faster (and certainly conserve memory) if you use bsxfun instead of repmat.
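For example (a generic illustration of the repmat-versus-bsxfun point, not a drop-in replacement for the exact code above, since that depends on how your dimensions line up):
A = gpuArray.rand(2816,128,'single');   % full matrix
w = gpuArray.rand(2816,1,'single');     % column vector
% With repmat, the column vector is physically replicated 128 times before the
% multiply, allocating a full 2816-by-128 temporary:
B1 = repmat(w,1,128).*A;
% With bsxfun, the singleton second dimension of w is expanded on the fly, so
% that temporary copy is never allocated:
B2 = bsxfun(@times,w,A);
% isequal(B1,B2) returns true, but the second form saves the extra memory.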
More Answers (0)