GPU time slower than CPU time, what went wrong with my GPU implementation?

Question

Ruby Fu on 19 Jan 2012

https://fr.mathworks.com/matlabcentral/answers/26552-gpu-time-slower-than-cpu-time-what-went-wrong-with-my-gpu-implementation

Commented: ALysko on 14 Apr 2015

Hi all, I have been testing the GPU computing feature in MATLAB. The code below runs and times a large matrix multiplication (1024x1024) on the CPU and on the GPU:

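    A = rand(1024);         % 1024x1024 double-precision matrix in host memory
    gA = gpuArray(A);       % copy of A in GPU memory
    % warm up both code paths
    for i = 1:10
        C = A*A;
        gC = gA*gA;
    end
    tic, C = A*A; toc;      % CPU timing
    tic, gC = gA*gA; toc;   % GPU timing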

After many trials, the CPU turns out to be faster than the GPU. I am surprised, because someone on Stack Overflow ran exactly the same test and found the GPU to be faster:

    >> A = rand(1024); gA = gpuArray(A);
    % warm up by executing the operations a couple of times, and then:
    >> tic, C = A * A; toc
    Elapsed time is 0.075396 seconds.
    >> tic, gC = gA * gA; toc
    Elapsed time is 0.008621 seconds.

The only reason I can think of is that we are using different GPUs. The other poster has a Tesla C2070, while the laptop I am using is a Dell Inspiron 17R (NVIDIA GeForce GT 525M).

Could it be that, with a lesser GPU, the computation is actually slower than on the CPU?

Thank you! Ruby

1 comment

ALysko on 14 Apr 2015

A bit of extra info regarding double precision performance:

Tesla C2070 and GeForce GT 525M are two very different GPUs (single / double precision):

Tesla C2070: 1.03 TFlops / 0.515 TFlops
GeForce GT 525M: 0.23 TFlops / 0.031 TFlops

Titan Black may need a manual switch to enable full double precision:

1) The web page http://nvidianews.nvidia.com/news/nvidia-introduces-geforce-gtx-titan-dna-of-the-world-s-fastest-supercomputer-powered-by-world-s-fastest-gpu and page 44 of the PDF "GeForce-Update-Feb-2014.pdf" say that the Titan Black has 5.1 TFlops single precision and 1.3 TFlops double precision.

2) The web page http://www.bit-tech.net/news/hardware/2014/02/18/nvidia-gtx-titan-black-launched/1 compares the Titan Black to the original Titan (the one tested by MathWorks), single / double precision:

Titan Black: 5.1 TFlops / 1.2 TFlops
Titan: 4.5 TFlops / 1.3 TFlops

(Thus, the MathWorks benchmarks for the Titan should be similar to or worse than the benchmarks for the Titan Black.)

3) The page https://devtalk.nvidia.com/default/topic/716573/gtx-titan-double-precision-flops-way-off-specs/ talks specifically about the Mathworks benchmarks with gpuBench():

Before any changes (default settings):

                        MTimes_D  Backslash_D  FFT_D  MTimes_S  Backslash_S  FFT_S
    Tesla C2075              333          246     73       696          435    163
    GeForce GTX TITAN        223           82     77      3635          179    252

After (switching the card into double precision in Control Panel):

                        MTimes_D  Backslash_D  FFT_D  MTimes_S  Backslash_S  FFT_S
    Tesla C2075              333          246     73       696          435    163
    GeForce GTX TITAN       1285          128    146      3423          182    227
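To make the effect of the switch easier to read, here is a small sketch that divides the "after" GTX TITAN row by the "before" row. The numbers are copied from the two result sets above; the variable names are just illustrative.

```matlab
% gpuBench results for the GeForce GTX TITAN, copied from the figures above.
names  = {'MTimes_D', 'Backslash_D', 'FFT_D', 'MTimes_S', 'Backslash_S', 'FFT_S'};
before = [ 223  82  77 3635 179 252];   % default settings
after  = [1285 128 146 3423 182 227];   % double precision enabled in Control Panel

ratio = after ./ before;                % > 1 means faster after the switch

for k = 1:numel(names)
    fprintf('%-12s %5.2fx\n', names{k}, ratio(k));
end
```

The double-precision columns improve (most of all MTimes_D), while MTimes_S and FFT_S drop slightly, which fits the remark that the setting limits the GPU clock boost.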

4) How to switch into double precision (which limits the GPU clock boost):

http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/59785-nvidia-geforce-gtx-titan-6gb-performance-review-2.html
http://forums.evga.com/When-to-Use-Double-Precision-under-NVIDIA-Control-Panel-Manage-3D-Settings-m2252867.aspx
http://nvidia.custhelp.com/app/answers/detail/a_id/3130/~/setting-power-management-mode-from-adaptive-to-maximum-performance

and for Linux:

http://ambermd.org/gpus/


Answer 1

Ben Tordoff on 20 Jan 2012

Hi Ruby,

I've just uploaded a benchmarking tool to the File Exchange that runs a whole batch of these timings to put your GPU in context with others on the market:

http://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench

One thing to bear in mind is that virtually all GPUs that aren't explicitly designed for scientific computing are optimized for single-precision maths (as is used by OpenGL etc.). GeForce cards, mobile or otherwise, are quite good for single-precision performance but usually about 8x worse for double. MATLAB defaults to using double-precision everywhere. Of the NVIDIA cards, only the Tesla and top-end Quadro series do well at double-precision. Add to that the fact that a mobile GPU typically has far fewer cores than a desktop one, and I'd be amazed if you saw any significant speed-ups compared to a modern mobile CPU when doing double-precision maths.
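A quick way to check the single- versus double-precision point on your own hardware is to time the same multiplication in both precisions. The sketch below assumes Parallel Computing Toolbox, a supported GPU, and MATLAB R2013b or later (for `timeit` and `gputimeit`, which did not exist when this thread started); the variable names are illustrative.

```matlab
% Time a 1024x1024 matrix multiply in double and single precision,
% on the CPU and on the GPU.
n   = 1024;
Ad  = rand(n);          % double precision (MATLAB's default)
As  = single(Ad);       % same data in single precision
gAd = gpuArray(Ad);     % copies in GPU memory
gAs = gpuArray(As);

tCpuD = timeit(@() Ad * Ad);
tCpuS = timeit(@() As * As);
tGpuD = gputimeit(@() gAd * gAd);   % gputimeit waits for the GPU to finish
tGpuS = gputimeit(@() gAs * gAs);

fprintf('double: CPU %.4f s, GPU %.4f s\n', tCpuD, tGpuD);
fprintf('single: CPU %.4f s, GPU %.4f s\n', tCpuS, tGpuS);
```

On a GeForce-class card you would expect the gap between the two GPU timings to be much larger than the gap between the two CPU timings, though the actual figures depend on the hardware.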

Anyway, give the benchmark a try and let us all know what you find.

Cheers

Ben


Answer 2

Walter Roberson on 19 Jan 2012

Your GeForce GT 525M would be handling the graphics rendering, whereas the Tesla probably would not be handling graphics (and can be specifically configured to take it off graphics duties, I seem to recall.)

The GT 525M has 96 cores at up to 1.2 GHz; the Tesla C2070 has 448 cores at 1.15 GHz, which is roughly 4.7 times the cores.
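As a back-of-the-envelope check, the core counts and clocks above can be turned into theoretical single-precision peaks. The sketch assumes each core retires one fused multiply-add (2 floating-point operations) per clock cycle, which is the usual convention for quoted peak figures; real-world throughput will be lower.

```matlab
% Theoretical single-precision peak = cores * clock (GHz) * FLOPs per cycle.
flopsPerCycle = 2;                          % assumption: one fused multiply-add per cycle

peakGT525M = 96  * 1.2  * flopsPerCycle;    % GFLOPS
peakC2070  = 448 * 1.15 * flopsPerCycle;    % GFLOPS

coreRatio = 448 / 96;
peakRatio = peakC2070 / peakGT525M;

fprintf('GT 525M:     %.1f GFLOPS\n', peakGT525M);
fprintf('Tesla C2070: %.1f GFLOPS\n', peakC2070);
fprintf('Core ratio %.2f, peak ratio %.2f\n', coreRatio, peakRatio);
```

The two peaks come out at about 0.23 and 1.03 TFlops, matching the single-precision figures quoted in the comment on the question.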

2 comments

Ruby Fu on 19 Jan 2012

Hi Walter,

Thanks for the response.

I think your answer explains why the GPU computation I performed is slower than the one performed on the Tesla. However, I am also seeing that my GPU computing time is longer than the CPU computing time for the same code. Is this also due to the different number of cores the two types of hardware provide? Is there a possibility that this MATLAB feature can be improved?

Thanks!

Walter Roberson on 19 Jan 2012

I only know some broad outlines of how things work. I know that the time to load and unload the data can overwhelm the benefits of using GPUs. Large enough matrix multiplications done on the CPU are normally farmed out to BLAS (the library underneath LAPACK), which is highly optimized and uses multiple cores. The trade-off point of "large enough" could in theory depend upon which CPU you are using, but I do not know if MATLAB takes that into account. You would need to know about the relative CPU capabilities to compare GPU/CPU figures meaningfully.

I believe that AccelerEyes' Jacket is benchmarked as faster than the native MATLAB GPU support.
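To see how much of the GPU time is data transfer rather than arithmetic, the upload, the multiply, and the download can be timed separately. This is a minimal sketch, assuming Parallel Computing Toolbox and a supported GPU; `wait` on the device handle is used because GPU operations may return before the work has finished, so a bare `tic`/`toc` around `gA*gA` can under-report the time.

```matlab
% Separate host-to-device copy, GPU compute, and device-to-host copy.
A   = rand(1024);
dev = gpuDevice;                  % handle to the currently selected GPU

tic; gA = gpuArray(A); wait(dev); tUpload   = toc;   % host -> device copy
tic; gC = gA * gA;     wait(dev); tCompute  = toc;   % multiply on the device
tic; C  = gather(gC);             tDownload = toc;   % device -> host copy

fprintf('upload %.4f s, compute %.4f s, download %.4f s\n', ...
        tUpload, tCompute, tDownload);
```

Run it a few times and ignore the first pass, since the first GPU call in a session includes one-time initialization. If the upload and download dominate, keeping data on the GPU across several operations matters more than the speed of any single multiply.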
