Effective GPU bandwidth on an Nvidia Quadro 6000

Hello, I would like to use GPU acceleration to speed up fft2 in my code. The GPU I'm using is an Nvidia Quadro 6000, which has a theoretical bandwidth of 144 GB/s. However, the effective bandwidth I measure is almost 100 times lower, which makes using the GPU hardly worthwhile:
```text
Test : 2048 x 2048
Elapsed CPU time is : 0.109062 sec
Elapsed GPU time is : 0.007661 sec
Elapsed GPU time with CPU transfer is : 0.079723 sec
Speed up : 14.236 without memory transfer
           1.36801 with memory transfer

Test : 4096 x 4096
Elapsed CPU time is : 0.356208 sec
Elapsed GPU time is : 0.026819 sec
Elapsed GPU time with CPU transfer is : 0.29406 sec
Speed up : 13.2819 without memory transfer
           1.21134 with memory transfer

Test : 8192 x 8192
Elapsed CPU time is : 1.30381 sec
Elapsed GPU time is : 0.121605 sec
Elapsed GPU time with CPU transfer is : 1.17194 sec
Speed up : 10.7217 without memory transfer
           1.11252 with memory transfer
```
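As a rough cross-check, you can estimate the transfer bandwidth implied by these timings. Attribute the difference between the two GPU times to host/GPU transfers and divide the bytes moved by that time. The sketch assumes a real double-precision N-by-N input sent to the GPU (8 bytes per element) and a complex double result gathered back (16 bytes per element). The question does not show the actual data types, so treat these byte counts as an assumption.

```matlab
% Rough cross-check: implied host<->GPU transfer bandwidth from the fft2 timings.
% ASSUMPTION: real double input (8 bytes/element) is sent to the GPU and a
% complex double result (16 bytes/element) is gathered back to the host.
N        = [2048 4096 8192];
gpuTime  = [0.007661 0.026819 0.121605];   % GPU fft2 only (s)
gpuTotal = [0.079723 0.29406  1.17194];    % GPU fft2 including transfers (s)

bytesMoved   = N.^2 * (8 + 16);                   % bytes sent + bytes gathered
transferTime = gpuTotal - gpuTime;                % time attributed to transfers (s)
transferBW   = bytesMoved ./ transferTime / 1e9;  % GB/s

for k = 1:numel(N)
    fprintf('%5d x %-5d: %7.1f MB moved in %.3f s -> %.2f GB/s\n', ...
            N(k), N(k), bytesMoved(k) / 1e6, transferTime(k), transferBW(k));
end
```

Under these assumptions, the estimate comes out at roughly 1.4 to 1.5 GB/s for all three sizes. Transfers also account for about 90% of the "GPU time with CPU transfer". This suggests that the copies, not the FFT itself, dominate the end-to-end time.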
If I compute the effective bandwidth (see the benchmark below), it's about 1.45 GB/s.
Could this be due to the MATLAB version I'm using (R2011a), or is such poor performance to be expected?
Benchmark used to measure the bandwidth:
```matlab
% Measure host<->GPU transfer bandwidth for uint8 vectors from 4 KiB to 64 MiB
sizes = power(2, 12:26);        % number of bytes (uint8 = 1 byte per element)
repeats = 10;                   % keep the best of several runs
D = gpuDevice                   % select and display the current GPU
sendTimes = inf(size(sizes));
gatherTimes = inf(size(sizes));
for ii = 1:numel(sizes)
    data = randi([0 255], sizes(ii), 1, 'uint8');
    for rr = 1:repeats
        % Host -> GPU (send)
        timer = tic();
        gdata = gpuArray(data);
        sendTimes(ii) = min(sendTimes(ii), toc(timer));
        % GPU -> host (gather)
        timer = tic();
        data2 = gather(gdata);
        gatherTimes(ii) = min(gatherTimes(ii), toc(timer));
    end
end
sendBandwidth = (sizes ./ sendTimes) / 1e9          % GB/s for each size
[maxSendBandwidth, maxSendIdx] = max(sendBandwidth);
fprintf('Peak send speed is %g GB/s\n', maxSendBandwidth)
gatherBandwidth = (sizes ./ gatherTimes) / 1e9      % GB/s for each size
[maxGatherBandwidth, maxGatherIdx] = max(gatherBandwidth);
fprintf('Peak gather speed is %g GB/s\n', maxGatherBandwidth)
```

Answers (2)

Edric Ellis on 19 Mar 2013

Your experiment measures the transfer bandwidth across the PCI Express bus, not the device's global memory bandwidth (which is what the 144 GB/s figure refers to). PCI bus bandwidth is discussed in an entry on Loren's blog: http://blogs.mathworks.com/loren/#1fa09fa2-c99c-4bb0-8b11-eb805fdd7040.
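To measure what the 144 GB/s figure describes, the data must already be on the GPU, and the timer must cover only on-device work. Below is a minimal sketch under two assumptions: Parallel Computing Toolbox is installed, and the release supports `wait(gpuDevice)` (assumed here to be a release newer than R2011a). In newer releases, GPU operations can run asynchronously, so the code waits before reading the timer. The element-wise `g + 1` reads and writes each double once, so the sketch counts 16 bytes moved per element.

```matlab
% Sketch: estimate on-device (global memory) bandwidth rather than PCIe bandwidth.
% ASSUMPTIONS: Parallel Computing Toolbox is available and wait(gpuDevice)
% is supported (newer releases than R2011a). Does nothing if no GPU is present.
n = 2^25;                          % 2^25 doubles = 256 MiB per array
bytesPerElement = 2 * 8;           % g + 1 reads one double and writes one double
repeats = 10;

if exist('gpuDeviceCount') && gpuDeviceCount > 0
    gpu = gpuDevice;
    g = gpuArray(ones(n, 1));      % PCIe transfer happens here, outside the timing
    best = inf;
    for rr = 1:repeats
        wait(gpu);                 % make sure no earlier GPU work is pending
        t = tic();
        h = g + 1;                 % element-wise kernel; data stays on the device
        wait(gpu);                 % block until the kernel has finished
        best = min(best, toc(t));
    end
    fprintf('Device memory bandwidth ~ %.1f GB/s\n', n * bytesPerElement / best / 1e9);
end
```

The result should be far higher than the PCIe figures from the question's benchmark, but below the theoretical 144 GB/s.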
We have made various performance improvements to the gpuArray code since R2011a, so it would be best for you to upgrade if you can.
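After upgrading, one way to check whether the end-to-end speed-up has improved is to rerun the CPU/GPU comparison. The hedged sketch below assumes a newer release that provides `timeit` and `gputimeit` (neither is available in R2011a). `gputimeit` waits for GPU work to finish, so the on-device time is not underestimated. The input size and data type are illustrative choices, not the question's actual test.

```matlab
% Sketch: compare CPU fft2, GPU fft2, and GPU fft2 including transfers.
% ASSUMPTIONS: a newer release providing timeit/gputimeit and a supported GPU.
N = 2048;
A = rand(N);                                  % real double input on the host

if exist('gputimeit') && gpuDeviceCount > 0
    gA = gpuArray(A);                         % copy once for the FFT-only timing
    tCPU      = timeit(@() fft2(A));
    tGPU      = gputimeit(@() fft2(gA));                  % FFT only
    tGPUTotal = gputimeit(@() gather(fft2(gpuArray(A)))); % send + FFT + gather
    fprintf('Speed-up without transfer: %.2f\n', tCPU / tGPU);
    fprintf('Speed-up with transfer:    %.2f\n', tCPU / tGPUTotal);
end
```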
Domenico on 19 Mar 2013

OK, thank you for the clarification; I had been looking at the wrong specs. So the theoretical bandwidth of PCI Express 2.0 x16 should be 8 GB/s, right? Still, what I get is much lower than that.
Do you think upgrading to a newer release would actually improve the transfer bandwidth? I would not mind asking my laboratory to upgrade if you recommend it.
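The 8 GB/s figure follows from the PCI Express 2.0 signalling rate. Each lane runs at 5 GT/s, and 8b/10b encoding leaves 4 Gbit/s, or 0.5 GB/s, of payload per lane in each direction. An x16 link has 16 lanes. The calculation below also compares the peak with the measured 1.45 GB/s quoted in the question:

```matlab
% Theoretical PCI Express 2.0 x16 bandwidth, per direction.
transfersPerSec = 5e9;     % 5 GT/s per lane (PCIe 2.0)
encoding        = 8 / 10;  % 8b/10b line coding: 8 data bits per 10 bits sent
lanes           = 16;      % x16 slot

bytesPerSecPerLane = transfersPerSec * encoding / 8;   % 0.5 GB/s per lane
peakBW   = bytesPerSecPerLane * lanes;                 % 8 GB/s per direction
measured = 1.45e9;                                     % value quoted in the question

fprintf('Theoretical peak: %g GB/s per direction\n', peakBW / 1e9);
fprintf('Measured: %.0f%% of peak\n', 100 * measured / peakBW);
```

In practice, packet and protocol overhead keep achievable transfer rates below this theoretical peak.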

1 comment

Edric Ellis on 19 Mar 2013
The figures in that blog entry were produced with R2012b and show that 8 GB/s is not achieved. However, they do show a decent improvement over your measured speed. It's hard to predict exactly how much of the difference is due to the software and how much is due to the different hardware.

Question asked: 18 Mar 2013
