Effective GPU Bandwidth Nvidia Quadro 6000

Question

Domenico on 18 Mar 2013


https://fr.mathworks.com/matlabcentral/answers/67673-effective-gpu-bandwidth-nvidia-quadro-6000

Hello, I would like to use GPU acceleration to speed up the computation of fft2 in my code. The GPU I'm using is an Nvidia Quadro 6000, which has a theoretical bandwidth of 144 GB/s. However, the effective bandwidth is almost 100 times lower, which makes using the GPU hardly worthwhile:

Test : 2048 x 2048
    Elapsed CPU time is : 0.109062 sec
    Elapsed GPU time is : 0.007661 sec
    Elapsed GPU time with CPU transfer is : 0.079723 sec
    Speed up : 14.236 without memory transfer
               1.36801 with memory transfer
Test : 4096 x 4096
    Elapsed CPU time is : 0.356208 sec
    Elapsed GPU time is : 0.026819 sec
    Elapsed GPU time with CPU transfer is : 0.29406 sec
    Speed up : 13.2819 without memory transfer
               1.21134 with memory transfer
Test : 8192 x 8192
    Elapsed CPU time is : 1.30381 sec
    Elapsed GPU time is : 0.121605 sec
    Elapsed GPU time with CPU transfer is : 1.17194 sec
    Speed up : 10.7217 without memory transfer
               1.11252 with memory transfer
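
The post does not show how these timings were taken. The sketch below is a hypothetical harness that produces the same three measurements. It relies on a few assumptions:

- The input is a real double matrix.
- Your release provides `wait(gpuDevice)`, which not every older release does.
- Each path is run once before timing, so that one-off initialization costs are excluded.

```matlab
% Hypothetical timing harness (not the original poster's code).
% Assumes real double input and a release that provides wait(gpuDevice).
N = 2048;                              % matrix size to test
A = rand(N);                           % real double N-by-N input
D = gpuDevice;

% Warm-up: run each path once so one-off initialization is not timed
fft2(A);
gather(fft2(gpuArray(A)));

% CPU only
tic;
F_cpu = fft2(A);
tCPU = toc;

% GPU only: the upload happens before the timer starts
gA = gpuArray(A);
wait(D);
tic;
gF = fft2(gA);
wait(D);                               % GPU calls may return before the kernel finishes
tGPU = toc;

% GPU including host -> device and device -> host transfers
tic;
F_gpu = gather(fft2(gpuArray(A)));     % gather blocks until the result is on the host
tGPUTransfer = toc;

fprintf('Speed up: %g without transfer, %g with transfer\n', ...
        tCPU / tGPU, tCPU / tGPUTransfer)
```

The GPU-only timing keeps the upload outside the timed region. That is what separates the two speed-up figures: the gap between them is mostly PCI Express transfer time.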

If I measure the effective bandwidth (with the benchmark below), it's only about 1.45 GB/s.
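
As a rough cross-check, the fft2 timings above imply a similar transfer rate. This back-of-the-envelope estimate rests on assumptions that the post does not state:

- The input is a real double N × N matrix (8 bytes per element).
- The result is complex double (16 bytes per element).
- The extra time in the "with CPU transfer" run is spent entirely on the upload and download.

For the 8192 × 8192 case:

```latex
t_{\text{transfer}} \approx t_{\text{GPU+transfer}} - t_{\text{GPU}} = 1.17194 - 0.121605 \approx 1.050\ \text{s}

\text{bytes} \approx \underbrace{8192^2 \times 8}_{\text{upload, real double}} + \underbrace{8192^2 \times 16}_{\text{download, complex double}} \approx 0.537 + 1.074 = 1.611\ \text{GB}

B_{\text{eff}} \approx \frac{1.611\ \text{GB}}{1.050\ \text{s}} \approx 1.53\ \text{GB/s}
```

This gives roughly 1.5 GB/s, in line with the benchmark figure.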

Could it be due to the version of MATLAB I'm using (R2011a), or is it normal to expect such poor performance?

Benchmark used to measure the bandwidth:

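% Measure host <-> device transfer bandwidth over the PCI Express bus
sizes = power(2, 12:26);          % transfer sizes in bytes (uint8 = 1 byte per element)
repeats = 10;                     % keep the best of several runs for each size
D = gpuDevice                     % display the selected GPU
sendTimes = inf(size(sizes));
gatherTimes = inf(size(sizes));
for ii = 1:numel(sizes)
    data = randi([0 255], sizes(ii), 1, 'uint8');
    for rr = 1:repeats
        % Host -> device
        timer = tic();
        gdata = gpuArray(data);
        sendTimes(ii) = min(sendTimes(ii), toc(timer));
        % Device -> host
        timer = tic();
        data2 = gather(gdata);
        gatherTimes(ii) = min(gatherTimes(ii), toc(timer));
    end
end
sendBandwidth = (sizes ./ sendTimes) / 1e9        % GB/s for each size
[maxSendBandwidth, maxSendIdx] = max(sendBandwidth);
fprintf('Peak send speed is %g GB/s (at %d bytes)\n', maxSendBandwidth, sizes(maxSendIdx))
gatherBandwidth = (sizes ./ gatherTimes) / 1e9    % GB/s for each size
[maxGatherBandwidth, maxGatherIdx] = max(gatherBandwidth);
fprintf('Peak gather speed is %g GB/s (at %d bytes)\n', maxGatherBandwidth, sizes(maxGatherIdx))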


Answer 1

Edric Ellis on 19 Mar 2013


https://fr.mathworks.com/matlabcentral/answers/67673-effective-gpu-bandwidth-nvidia-quadro-6000#answer_79167

Your experiment is measuring the transfer bandwidth across the PCI bus, not the bandwidth of the device's global memory. PCI bus bandwidth is discussed in an entry on Loren's blog: http://blogs.mathworks.com/loren/#1fa09fa2-c99c-4bb0-8b11-eb805fdd7040
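
To measure the device's global memory bandwidth instead of the PCI bus, the data has to stay on the GPU during the timed region. This minimal sketch is not from the original thread. It times an elementwise `gA + 1` on a gpuArray, which reads and writes every element once. It assumes that `wait(gpuDevice)` is available in your release, so the timer stops only after the kernel has finished. The array size and repeat count are arbitrary choices.

```matlab
% Sketch: time a purely on-device operation so the PCI bus is not involved.
% Assumes a release that provides wait(gpuDevice); sizes are arbitrary choices.
D = gpuDevice;
n = 2^25;                                  % 32M single-precision elements (128 MB)
gA = gpuArray(ones(n, 1, 'single'));       % one-off upload, not timed
gB = gA + 1;                               % warm-up, not timed
wait(D);

repeats = 10;
best = inf;
for rr = 1:repeats
    timer = tic();
    gB = gA + 1;                           % reads and writes n single values on the GPU
    wait(D);                               % stop the timer only after the kernel finishes
    best = min(best, toc(timer));
end

bytesMoved = 2 * n * 4;                    % 4 bytes read + 4 bytes written per element
fprintf('Approximate device memory bandwidth: %g GB/s\n', bytesMoved / best / 1e9)
```

The timing also includes MATLAB's per-call overhead, so the result is a lower bound on device bandwidth. It is the figure to compare with the 144 GB/s specification, rather than the PCI Express transfer rates.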

We have made various performance improvements to the gpuArray code since R2011a, so it would be best for you to upgrade if you can.


Answer 2

Domenico on 19 Mar 2013


https://fr.mathworks.com/matlabcentral/answers/67673-effective-gpu-bandwidth-nvidia-quadro-6000#answer_79169

OK, thank you for the clarification; I had been looking at the wrong specs. So the theoretical bandwidth of PCI Express 2.0 x16 should be 8 GB/s, right? Even so, what I measure is much lower than that.
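
For reference, the 8 GB/s figure follows from the PCI Express 2.0 signalling rate. Each lane runs at 5 GT/s with 8b/10b encoding, so only 8 of every 10 transmitted bits carry data. The arithmetic, per direction:

```latex
B_{\text{lane}} = 5\ \text{GT/s} \times \tfrac{8}{10} = 4\ \text{Gbit/s} = 0.5\ \text{GB/s}

B_{\times 16} = 16 \times 0.5\ \text{GB/s} = 8\ \text{GB/s per direction}
```

This is the raw link rate. Packet and protocol overhead lower what an actual transfer can reach, so measured figures fall below 8 GB/s even on a well-configured system.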

Do you think upgrading to a newer release would actually improve the transfer bandwidth? I wouldn't mind asking my laboratory to upgrade if you recommend it.

1 comment

Edric Ellis on 19 Mar 2013

The figures in that blog post were produced with R2012b and show that 8 GB/s is not achieved; however, they do show a decent improvement over your measured speed. It's hard to predict exactly how much of the difference is due to the software and how much is due to the different hardware.
