isempty slow on GPU

ShayE on 10 Dec 2018
Commented: ShayE on 14 Dec 2018
While profiling some code, I noticed that the GPU implementation of isempty is very slow. For example, see the following code:
chk = ones(512,512,5,32); % test variable
K = 10^5; % number of tests
% CPU loop
for m = 1:K
    a = isempty(chk);
end
% GPU loop
chk = gpuArray(chk);
for m = 1:K
    a = isempty(chk);
end
The profiling results are attached; isempty for GPU seems very slow. I'm working on R2018a with a GTX 1070 and CUDA 9. Is this MATLAB, GPU, or CUDA related?
profile.png

Answers (1)

Edric Ellis on 11 Dec 2018
I think there are a couple of things going on here.
Firstly, enabling the profiler introduces overheads - particularly to GPU-related code, where we try to ensure that any asynchronous GPU activity cannot interfere with the profiling measurements. On my machine, running with the profiler enabled slows down the gpuArray version of isempty by a factor of about 3.
Secondly, on my machine using R2018b, measuring with tic and toc, I see a CPU isempty call taking around 8e-9 seconds and the GPU version around 2e-6 seconds. While the GPU version is undoubtedly much slower than the CPU version, it is still rather fast. In particular, no work is launched on the GPU for this computation: what you're seeing here is purely a host-side computation, and in this case the MATLAB object system isn't as fast as the extremely highly optimised code in place for host-side arrays. The GPU version of isempty is definitely much faster than any conceivable work that you could actually perform on the GPU. So, my question is: why is this a concern? What are you doing for which the GPU version of isempty is a performance bottleneck?
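For reference, a minimal sketch of the kind of tic/toc measurement described above, with the profiler switched off. It assumes Parallel Computing Toolbox and a supported GPU; the array size and iteration count come from the question, while the names chkGpu, tCpu and tGpu are only illustrative. Absolute numbers will vary with machine and release.

```matlab
% Average per-call cost of isempty on a host array vs. a gpuArray,
% measured with tic/toc (profiler off).
chk = ones(512,512,5,32);   % test variable from the question
K   = 1e5;                  % number of calls to average over

tic
for m = 1:K
    a = isempty(chk);
end
tCpu = toc / K;             % average seconds per call on a host array

chkGpu = gpuArray(chk);     % assumes a supported GPU is available
tic
for m = 1:K
    a = isempty(chkGpu);
end
tGpu = toc / K;             % average seconds per call on a gpuArray

fprintf('CPU: %.3g s per call, GPU: %.3g s per call\n', tCpu, tGpu);
```

Dividing the loop time by K gives an average per call, which is the only practical way to resolve times this small with tic/toc.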
  3 Comments
Edric Ellis on 13 Dec 2018
  1. Unfortunately, profiling does introduce some overheads. On the GPU, because many operations (but not all) are asynchronous by default, enabling the profiler causes them to become synchronous. This introduces overhead because you don't get the benefit of asynchronous operation, but it is necessary because otherwise all asynchronous GPU operations would appear to be unrealistically quick, and subsequent synchronous GPU operations would appear to take all the time.
  2. OK, so it appears that the GPU version of isempty is running pretty quickly on your machine.
  3. I'm still surprised that operations taking ~2e-6 seconds are a significant overhead in your overall application compared to actually performing operations on the GPU. The time it takes to launch a trivial kernel on the GPU is typically ~5x larger than that.
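The asynchronous behaviour mentioned in point 1 can be seen in a small sketch like the following. It assumes Parallel Computing Toolbox and a supported GPU; the matrix size and the variable names are arbitrary choices for illustration. Without a synchronisation point, toc can return before the GPU has finished, so the first measurement may look unrealistically short. gputimeit is shown as one way to time a single GPU operation without managing the synchronisation by hand.

```matlab
% Illustration of asynchronous GPU execution and its effect on timing.
gpu = gpuDevice;                 % currently selected GPU
A = rand(4000, 'gpuArray');      % matrix size is an arbitrary choice
wait(gpu);                       % make sure A is ready before timing

tic
B = A * A;                       % the call may return before the multiply completes
tNoWait = toc;
wait(gpu);                       % drain the queue so the next timing starts clean

tic
B = A * A;
wait(gpu);                       % block until all queued GPU work has finished
tWait = toc;

% gputimeit synchronises and repeats the call internally
tFun = gputimeit(@() A * A);

fprintf('no wait: %.3g s, with wait: %.3g s, gputimeit: %.3g s\n', ...
        tNoWait, tWait, tFun);
```

Timings taken with wait(gpu) or gputimeit reflect the work actually done on the device, whereas the unsynchronised figure mostly reflects the cost of queuing it.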
ShayE on 14 Dec 2018
Hi,
Re 3: I thought this was the bottleneck because of the profiling results. Now that I understand that the profiler results are not accurate for this type of code, my question is: how can I find the bottlenecks in GPU-based code? Is using tic/toc the only option? It's not very practical for long code...
Thanks,
Shay


Version

R2018a
