3D gpuArray vs cells of 2D gpuArrays major speed difference!

Question

Dan Ryan on 24 May 2013

https://fr.mathworks.com/matlabcentral/answers/76919-3d-gpuarray-vs-cells-of-2d-gpuarrays-major-speed-difference

Can anybody explain why these codes have drastically different runtimes?

Both versions share this setup routine:

clear all
y = gpuArray.rand(1000, 1000, 'single');
W = cell(1, 5);
WFull = gpuArray.zeros(1000, 1000, 5);
for j = 1:5
   W{j} = gpuArray.rand(1000, 1000, 'single');
   WFull(:,:,j) = W{j};
end

Version 1 (finishes in 1.4 seconds on my machine)

z = gpuArray.zeros(1000, 1000, 5);
tic
for i = 1:1000
   for j = 1:size(W)
      z(:,:,j) = W{j}*y;
   end
end
toc

vs. Version 2 (finishes in 39 seconds on my machine, about 27x slower)

z = gpuArray.zeros(1000, 1000, 5);
tic
for i = 1:1000
   for j = 1:size(WFull, 3)
      z(:,:,j) = WFull(:,:,j)*y;
   end
end
toc

Do you think that slicing large 3D gpuArrays is just really slow compared to looking up cell array values?

Answer 1

Matt J on 24 May 2013

Edited: Matt J on 24 May 2013

"Do you think that slicing large 3D gpuArrays is just really slow compared to looking up cell array values?"

Yes, it is faster to look up a cell than to pull a slice out of a 3D array, and that's true for normal arrays as well, as long as there are only a small number of slices/cells. Of course, you should really include the time needed to allocate memory for each W{j} in your comparison.
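The claim about normal (CPU) arrays can be checked directly. The following is a minimal sketch, assuming a MATLAB release that provides timeit; the names A, C, tSlice and tCell are illustrative and not from the thread, and the actual timings will depend on the machine.

```matlab
% CPU-only comparison: pull one slice from a 3D array vs. look up one cell.
A = rand(1000, 1000, 5, 'single');   % 3D array, same shape as WFull in the question
C = cell(1, 5);
for j = 1:5
   C{j} = A(:,:,j);                  % same data, stored as five 2D arrays
end

tSlice = timeit(@() A(:,:,3));       % time extracting slice 3 from the 3D array
tCell  = timeit(@() C{3});           % time looking up cell 3

fprintf('slice: %.3g s, cell lookup: %.3g s\n', tSlice, tCell);
```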

Another reason is a bug in your for-loop over W{j}. Because size(W) returns the vector [1 5] and the colon operator uses only the first element, the loop does 1 iteration instead of 5:

   >> for j=1:size(W), j, end 
j =
       1

This is biasing the comparison to some degree.
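The loop-bound problem can be reproduced without a GPU. This is a small sketch in base MATLAB; the variable names sz, buggyRange and fixedRange are illustrative. It indexes sz(1) explicitly, which is the value that 1:size(W) ends up using as its upper bound in the output shown above.

```matlab
W = cell(1, 5);              % same shape as the cell array in the question

sz = size(W);                % row vector [1 5], not a scalar
buggyRange = 1:sz(1);        % what 1:size(W) evaluates to: only the first element is used
fixedRange = 1:numel(W);     % 1:5; 1:size(W, 2) also works for a row cell array

fprintf('size(W)    = [%d %d]\n', sz);
fprintf('buggy loop : %d iteration(s)\n', numel(buggyRange));
fprintf('fixed loop : %d iteration(s)\n', numel(fixedRange));
```

With the original bound, Version 1 performs one matrix multiplication per outer iteration while Version 2 performs five, so the two timings are not measuring the same amount of work.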

2 comments

Dan Ryan on 24 May 2013

I also caught a couple of other issues: I had left 'single' off the gpuArray creation for some arrays but had it present for others. I changed size(W) to size(W, 2), and now the comparison is much closer.

Here is the new code:

% Version 1: cell array of 2D gpuArrays
clear all
y = gpuArray.rand(1000, 1000, 'single');
z = gpuArray.zeros(1000, 1000, 5, 'single');
W = cell(1, 5);
for j = 1:5
   W{j} = gpuArray.rand(1000, 1000, 'single');
end
tic
for i = 1:500
   for j = 1:size(W, 2)
      z(:,:,j) = W{j}*y;
   end
end
toc

% Version 2: one 3D gpuArray, sliced in every iteration
clear all
y = gpuArray.rand(1000, 1000, 'single');
z = gpuArray.zeros(1000, 1000, 5, 'single');
WMat = gpuArray.rand(1000, 1000, 5, 'single');
tic
for i = 1:500
   for j = 1:size(WMat, 3)
      z(:,:,j) = WMat(:,:,j)*y;
   end
end
toc

What is really strange to me is that the execution time is very nonlinear in the number of outer-loop iterations (the upper bound on i). There must be some sort of memory flush going on when that count gets large, but I'm not really sure why.

i = 100 -> runtimes are 0.10 and 0.14 seconds

i = 200 -> runtimes are 0.73 and 1.98 seconds

i = 500 -> runtimes are 10.3 and 11.7 seconds (notice the large jump for version 1!)

i = 1000 -> runtimes are 26.3 and 28.0 seconds!

Any idea what causes this highly nonlinear trend? I don't see why GPU memory would come into play, since I am basically just overwriting existing values and performing the exact same computations in every iteration!

Dan Ryan on 30 May 2013

James Lebak from MathWorks helped me out with a really good tip: use a wait(gpuDevice) command before the toc command when timing GPU code.

Now the timings increase linearly with number of loop iterations and the two implementations give very similar results. Good to know!
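The tip matters because GPU operations can be queued asynchronously, so toc may otherwise stop the clock before the queued work has finished. The following is a sketch of the corrected benchmark, assuming Parallel Computing Toolbox and a supported GPU. The names nIter, gpu, tCell and tSlice are illustrative, and the arrays are created on the CPU and transferred with gpuArray(...) only so that the sketch does not depend on a particular release's GPU constructor syntax.

```matlab
% Assumes Parallel Computing Toolbox and a supported GPU.
nIter = 100;                                  % illustrative outer-loop count
gpu = gpuDevice;                              % currently selected GPU device

y     = gpuArray(rand(1000, 1000, 'single'));
WFull = gpuArray(rand(1000, 1000, 5, 'single'));
W = cell(1, 5);
for j = 1:5
   W{j} = WFull(:,:,j);                       % same data in both layouts
end
z = gpuArray(zeros(1000, 1000, 5, 'single'));

% Version 1: cell array of 2D gpuArrays
wait(gpu);                                    % finish setup before starting the clock
tic
for i = 1:nIter
   for j = 1:numel(W)
      z(:,:,j) = W{j}*y;
   end
end
wait(gpu);                                    % block until all queued GPU work is done
tCell = toc;

% Version 2: one 3D gpuArray, sliced in every iteration
wait(gpu);
tic
for i = 1:nIter
   for j = 1:size(WFull, 3)
      z(:,:,j) = WFull(:,:,j)*y;
   end
end
wait(gpu);
tSlice = toc;

fprintf('cell: %.3f s, 3D slice: %.3f s\n', tCell, tSlice);
```

In releases that provide it, gputimeit is another option, since it takes care of the synchronization when timing a function handle.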
