Question about major difference in computation speed with gpuArrays
Hunter Palcich
on 22 June 2017
Commented: Hunter Palcich
on 23 June 2017
I've been trying to optimize my code for a project recently and noticed an interesting phenomenon. I have tried to search around for what could possibly cause it, but so far nothing has given a good answer.
The following code runs extremely fast:
x = linspace(-20,20,25);
z = linspace(0,100,29);
Columns = 5;
singleframeofdata = gpuArray(rand(2816,128,'single'));
fgpu = gpuArray(rand(2816,1,'single'));
tofgpu = rand(length(z),length(x),128,'single');
SingleFrameOfDatarep = repmat(singleframeofdata,1,length(z)*length(x));
y = -2i*pi*-1*fgpu*reshape(tofgpu,1,size(tofgpu,1)*size(tofgpu,2)*size(tofgpu,3),1);
tic
holder = SingleFrameOfDatarep.*y;
toc
clear holder
tic
SingleFrameOfDatarep = SingleFrameOfDatarep.*y;
toc
The timing for holder comes out around 0.09 s, while the timing for SingleFrameOfDatarep comes out around 0.00009 s. I know that because the second calculation uses an in-place operation, it should run faster.
However, if I change x = linspace(-20,20,25) to x = linspace(-20,20,26), a drastic slowdown occurs. The timing for holder is again around 0.09 s, while SingleFrameOfDatarep now takes around 0.07 s. The original code ran ~770x faster than the modified code.
My only thought/explanation for this is that when an array gets too large, MATLAB creates a new variable (like it does for holder), and that allocation time is where the slowdown occurs. But I am not fully sure about this, nor do I know how to test or check for it.
Could anyone point me in the right direction to read up on this, or give a possible explanation/solution?
1 comment
Joss Knight
on 22 June 2017
I can't check your code right now, but I can say two things. Firstly, MATLAB does have a memory pool, and when GPU memory overflows the pool there are raw allocations; those allocations force synchronization, and that is slow. Secondly, your timing with tic and toc is flawed because the GPU operates asynchronously, which means that when toc reports the time, the previous command may still be running. What happens when you insert wait(gpuDevice) before each tic and before each toc? You may find the timings change completely.
Finally, you should use gpuArray.rand rather than gpuArray(rand(...)). The former creates the random data directly on the GPU; the latter generates it slowly on the CPU and then copies it over to the device.
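A rough sketch of how those two suggestions could look applied to the question's code (the variable names are taken from the question; treat this as an illustration, not a verified fix):
dev = gpuDevice;
% Create the random data directly on the GPU instead of building it on the CPU and copying it over:
singleframeofdata = gpuArray.rand(2816,128,'single');
fgpu = gpuArray.rand(2816,1,'single');
% Make the GPU finish all pending work before starting and before stopping the timer,
% so tic/toc measures the kernel itself rather than just the asynchronous launch:
wait(dev);
tic
holder = SingleFrameOfDatarep.*y;
wait(dev);
toc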
Accepted Answer
Matt J
on 22 June 2017
Edited: Matt J
on 22 June 2017
The times that you are seeing are probably spurious. You shouldn't be using tic() and toc() to time GPU operations; you should be using gputimeit(), as below. Implemented this way, I see no significant speed difference between any of the cases that you tested.
x = linspace(-20,20,25);
z = linspace(0,100,29);
singleframeofdata = gpuArray(rand(2816,128,'single'));
fgpu = gpuArray(rand(2816,1,'single'));
tofgpu = rand(length(z),length(x),128,'single');
SingleFrameOfDatarep = repmat(singleframeofdata,1,length(z)*length(x));
y = -2i*pi*-1*fgpu*reshape(tofgpu,1,size(tofgpu,1)*size(tofgpu,2)*size(tofgpu,3),1);
gputimeit(@() fun(SingleFrameOfDatarep,y) )
gputimeit(@() hfun(SingleFrameOfDatarep,y) )
function SingleFrameOfDatarep = fun(SingleFrameOfDatarep,y)   % assigns back to the input variable (in-place style)
SingleFrameOfDatarep = SingleFrameOfDatarep.*y;
end
function holder = hfun(SingleFrameOfDatarep,y)                % assigns to a new variable
holder = SingleFrameOfDatarep.*y;
end
Incidentally, your code should also get a bit faster (and certainly conserve memory) if you use bsxfun instead of repmat.
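For example (a generic illustration of the repmat-versus-bsxfun point, not a drop-in replacement for the exact code above, since that depends on how your dimensions line up):
A = gpuArray.rand(2816,128,'single');   % full matrix
w = gpuArray.rand(2816,1,'single');     % column vector
% With repmat, the column vector is physically replicated 128 times before the
% multiply, allocating a full 2816-by-128 temporary:
B1 = repmat(w,1,128).*A;
% With bsxfun, the singleton second dimension of w is expanded on the fly, so
% that temporary copy is never allocated:
B2 = bsxfun(@times,w,A);
% isequal(B1,B2) returns true, but the second form saves the extra memory.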
More Answers (0)