Effective GPU Bandwidth Nvidia Quadro 6000

Domenico on 18 Mar 2013
Hello, I would like to use GPU acceleration to speed up the fft2 computations in my code. The GPU I'm using is an NVIDIA Quadro 6000, which has a theoretical bandwidth of 144 GB/s. However, the effective bandwidth I measure is almost 100 times lower, which makes using the GPU hardly worthwhile:
Test size     CPU (s)    GPU, no transfer (s)   GPU with CPU transfer (s)   Speed-up without transfer   Speed-up with transfer
2048 x 2048   0.109062   0.007661               0.079723                    14.236                      1.36801
4096 x 4096   0.356208   0.026819               0.29406                     13.2819                     1.21134
8192 x 8192   1.30381    0.121605               1.17194                     10.7217                     1.11252
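The speed-up figures are CPU time divided by GPU time. The minimal sketch below is plain MATLAB arithmetic on the timings printed above (the variable names are illustrative). It reproduces those ratios and also shows what share of the GPU run is spent outside the FFT itself. It assumes that the extra time in the "with transfer" run is entirely host/device copying.

```matlab
% Timings copied from the output above (seconds); names are illustrative.
n        = [2048 4096 8192];               % square test sizes
tCPU     = [0.109062 0.356208 1.30381];    % fft2 on the CPU
tGPU     = [0.007661 0.026819 0.121605];   % fft2 on the GPU, no transfer
tGPUxfer = [0.079723 0.29406  1.17194];    % GPU fft2 including CPU<->GPU copies

speedupNoTransfer   = tCPU ./ tGPU;        % CPU time / GPU time
speedupWithTransfer = tCPU ./ tGPUxfer;

% Assumption: the difference between the two GPU timings is copy time.
tTransfer    = tGPUxfer - tGPU;
transferFrac = tTransfer ./ tGPUxfer;      % share of the GPU run spent copying

for k = 1:numel(n)
    fprintf('%5d x %-5d  speed-up %7.3f / %6.3f   transfer share %4.1f%%\n', ...
        n(k), n(k), speedupNoTransfer(k), speedupWithTransfer(k), 100*transferFrac(k));
end
```

At all three sizes, roughly 90% of the GPU run goes to copying. This is why the speed-up collapses to about 1x once transfers are counted.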
Measured with the benchmark below, the effective bandwidth is only about 1.45 GB/s.
Could this be due to the MATLAB release I'm using (R2011a), or is such poor performance normal?
Benchmark used to measure the bandwidth:
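% Time host->device (gpuArray) and device->host (gather) copies of uint8
% vectors from 4 KiB to 64 MiB, keeping the best time over several repeats.
sizes = power(2, 12:26);       % bytes per transfer (uint8 = 1 byte/element)
repeats = 10;
D = gpuDevice                  % select and display the current GPU
sendTimes = inf(size(sizes));
gatherTimes = inf(size(sizes));
for ii=1:numel(sizes)
    data = randi([0 255], sizes(ii), 1, 'uint8');
    for rr=1:repeats
        timer = tic();
        gdata = gpuArray(data);              % host -> device copy
        sendTimes(ii) = min(sendTimes(ii), toc(timer));
        timer = tic();
        data2 = gather(gdata);               % device -> host copy
        gatherTimes(ii) = min(gatherTimes(ii), toc(timer));
    end
end
sendBandwidth = (sizes./sendTimes)/1e9        % GB/s for each transfer size
[maxSendBandwidth,maxSendIdx] = max(sendBandwidth);
fprintf('Peak send speed is %g GB/s (at %d bytes)\n', maxSendBandwidth, sizes(maxSendIdx))
gatherBandwidth = (sizes./gatherTimes)/1e9
[maxGatherBandwidth,maxGatherIdx] = max(gatherBandwidth);
fprintf('Peak gather speed is %g GB/s (at %d bytes)\n', maxGatherBandwidth, sizes(maxGatherIdx))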

Answers (2)

Edric Ellis on 19 Mar 2013
Your experiment measures the transfer bandwidth across the PCI bus, not the bandwidth of the device's global memory. PCI bus bandwidth is discussed in this entry on Loren's blog: http://blogs.mathworks.com/loren/#1fa09fa2-c99c-4bb0-8b11-eb805fdd7040
We have made various performance improvements to the gpuArray code since R2011a, so it would be best for you to upgrade if you can.
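The sketch below helps explain why the PCI bus, rather than device memory, dominates the timings. It computes how long it takes to move one 8192 x 8192 array once at each bandwidth mentioned in this thread. It assumes a real double array (8 bytes per element), which the question does not state, and it ignores latency and protocol overhead.

```matlab
% Hypothetical example: one 8192 x 8192 real double array (8 bytes/element);
% the actual data type used in the question is not stated.
bytes = 8192 * 8192 * 8;                 % 536,870,912 bytes (512 MiB)

bw = [144 8 1.45] * 1e9;                 % bytes/s
labels = {'device memory (144 GB/s)', ...
          'PCIe 2.0 x16 peak (8 GB/s)', ...
          'measured PCI (1.45 GB/s)'};

tMove = bytes ./ bw;                     % seconds to move the array once
for k = 1:numel(bw)
    fprintf('%-28s %8.4f s\n', labels{k}, tMove(k));
end
```

A single copy at the measured 1.45 GB/s takes about 0.37 s, which is already several times the roughly 0.12 s GPU fft2 time for this size.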

Domenico on 19 Mar 2013
OK, thank you for the clarification; I had been looking at the wrong specs. So the theoretical bandwidth of a PCI Express 2.0 x16 link should be 8 GB/s, right? Still, what I get is much lower than that.
Do you think upgrading to a newer release would actually improve the transfer bandwidth? I would not mind asking my laboratory to upgrade if you recommend it.
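For reference, the 8 GB/s figure follows from the PCI Express 2.0 signalling rate of 5 GT/s per lane with 8b/10b line encoding, counted in one direction. It is a theoretical peak that excludes packet and protocol overhead, so real transfers land below it:

$$
B_{\text{PCIe 2.0 x16}} = 16\ \text{lanes} \times 5\ \tfrac{\text{GT}}{\text{s}} \times \tfrac{8}{10} \times \tfrac{1\ \text{B}}{8\ \text{bit}} = 8\ \tfrac{\text{GB}}{\text{s}}\ \text{per direction}
$$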
  1 comment
Edric Ellis on 19 Mar 2013
The figures in that blog post were produced with R2012b. They show that 8 GB/s is not achieved, but they do show a decent improvement over your measured speed. It's hard to predict exactly how much of the difference is due to the software and how much is due to the different hardware.
