svds performance on GPU is up to 10 times slower than on CPU
Florian
on 30 Jan 2018
Answered: Heiko Weichelt
on 11 May 2018
Hello everyone,
I am currently trying to get a complicated algorithm to run on my GPU. It involves a lot of fft, ifft, and pointwise multiplications, so I thought it would be a good idea. However, it also involves computing the first left singular vector in each iteration. Although svds supports gpuArray inputs, it seems to be extremely slow on the GPU. Maybe this is due to my system, or maybe I made some silly mistake (this is the first time I have tried to use the GPU). When I run the code
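X = rand(1024);          % 1024-by-1024 test matrix on the CPU
Y = gpuArray(X);         % same matrix on the GPU
f = @() svds(X,1);       % largest singular triplet, CPU
g = @() svds(Y,1);       % largest singular triplet, GPU
t = timeit(f,3);         % time the call with 3 outputs [U,S,V]
gt = gputimeit(g,3);     % GPU timing, also with 3 outputs
disp([t,gt]);            % [CPU seconds, GPU seconds]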
the GPU version is always 5 to 10 times slower than the CPU version. This is annoying because I would be happy even if it were about the same speed. The rest of the algorithm is much faster on the GPU right now, but svds ruins everything again. Here is the output of gpuDevice, if this helps:
Name: 'Quadro M2000'
Index: 1
ComputeCapability: '5.2'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 8
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 4.2950e+09
AvailableMemory: 3.4038e+09
MultiprocessorCount: 6
ClockRateKHz: 1162500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
The MATLAB version is R2017b, running on 64-bit Windows 10 with an i7-7700K CPU (4.2 GHz) and 32 GB RAM.
Accepted Answer
Heiko Weichelt
on 11 May 2018
Hi Florian,
Thanks for asking this question.
GPUs are only faster than CPUs if you can keep lots of threads running, and this generally requires operating on large arrays. The array size at which an operation becomes faster on the GPU than on the CPU differs from operation to operation and from GPU to GPU. For svds, the nature of the code requires particularly large arrays, with millions of elements. In the attached plot, you can see that the GPU is faster for matrices of size 2000x2000 (four million elements) or more. Your 1024x1024 matrix has only about one million elements, so it looks like for your GPU the data just isn't big enough to get a benefit.
I repeated your example for various matrix sizes; the result is shown in the attached plot.
The exact threshold depends on the computer and GPU used. Please find attached the live script that I used to create the plot.
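A size sweep of this kind can be sketched as follows. This is not the attached live script, only a minimal sketch under stated assumptions: the list of sizes and the variable names (sizes, tCPU, tGPU, speedup) are chosen here for illustration, and the code needs Parallel Computing Toolbox and a supported GPU. It reuses the timeit and gputimeit calls from the question, requesting three outputs each time.

```matlab
% Sketch: compare CPU and GPU timings of svds over several matrix sizes.
% The sizes are illustrative assumptions, not the values from the live script.
sizes = [512 1024 2048 4096];
tCPU  = zeros(size(sizes));
tGPU  = zeros(size(sizes));

for k = 1:numel(sizes)
    n = sizes(k);
    X = rand(n);        % n-by-n test matrix in host memory
    Y = gpuArray(X);    % same data in GPU memory

    % Request three outputs (U, S, V), as in the original example.
    tCPU(k) = timeit(@() svds(X, 1), 3);
    tGPU(k) = gputimeit(@() svds(Y, 1), 3);
end

speedup = tCPU ./ tGPU; % values above 1 mean the GPU was faster
disp(table(sizes(:), tCPU(:), tGPU(:), speedup(:), ...
    'VariableNames', {'n', 'tCPU', 'tGPU', 'speedup'}));
```

The smallest n for which speedup exceeds 1 gives a rough estimate of the crossover size on a particular machine.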
If you have any further questions, feel free to get back to me.
Best, Heiko