Summing array elements seems to be slow on GPU
I am timing the execution of the following function on the CPU and on the GPU:
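function UN = funTestGPU(P,U,K,UN)
    for k = 1:P
        H = exp(1i*K);            % element-wise complex exponential
        HU = U.*H;                % element-wise product; U expands along dimension 3
        UN(k,:) = sum(HU,[1,3]);  % sum over dimensions 1 and 3
    end
end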
where U and UN are complex arrays of size P×P and K is a complex array of size P×P×P. So in each iteration I perform an element-wise exp(), an element-wise multiplication of two arrays, and a summation of the elements of a 3D array along two dimensions.
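To make the shapes concrete, here is a minimal CPU-only sketch of one iteration. The small size P = 3 and the names s and s2 are chosen only for illustration and are not part of the original function.

```matlab
% Small CPU example; P = 3 is an illustrative size, not the one used in the timings.
P = 3;
U = complex(rand(P), rand(P));            % P-by-P
K = complex(rand(P,P,P), rand(P,P,P));    % P-by-P-by-P

H  = exp(1i*K);         % element-wise exponential, P-by-P-by-P
HU = U.*H;              % U is implicitly expanded along the 3rd dimension
s  = sum(HU,[1,3]);     % sum over dimensions 1 and 3, leaving a 1-by-P row

disp(size(s))           % size is [1 3]

% The vector-dimension form should match two nested single-dimension sums
% up to floating-point round-off.
s2 = sum(sum(HU,1),3);
disp(max(abs(s - s2)))
```

The 1-by-P result is what gets written into row k of UN.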
I measure the execution time on the CPU and on the GPU with the following script:
P = 200;
URe = 1/(sqrt(2))*rand(P);
UIm = 1/(sqrt(2))*rand(P);
KRe = 1/(sqrt(2))*rand(P,P,P);
KIm = 1/(sqrt(2))*rand(P,P,P);
U = complex(URe, UIm);
K = complex(KRe, KIm);
UN = complex(zeros(P), zeros(P));
fcpu = @() funTestGPU(P,U,K,UN);
tcpu = timeit(fcpu);
disp(['CPU time: ',num2str(tcpu)])
U = gpuArray(complex(URe, UIm));
K = gpuArray(complex(KRe, KIm));
UN = gpuArray(complex(zeros(P), zeros(P)));
fgpu = @() funTestGPU(P,U,K,UN);
tgpu = gputimeit(fgpu);
disp(['GPU time: ',num2str(tgpu)])
and I obtain the following results:
CPU time: 9.0315
GPU time: 3.3894
My concern is that if I remove the last operation from funTestGPU (summing the array elements), I obtain these results:
CPU time: 8.0185
GPU time: 0.0045631
So it looks like the summation is the most time-consuming operation on GPU. Is that an expected result?
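One way to check which step dominates is to time each operation on its own with gputimeit. This is only a sketch, assuming Parallel Computing Toolbox and a supported GPU; the names tExp, tMul and tSum are introduced here for illustration.

```matlab
% Sketch: time each step of one loop iteration separately on the GPU.
P = 200;
U = gpuArray(complex(rand(P), rand(P)));
K = gpuArray(complex(rand(P,P,P), rand(P,P,P)));

% Precompute the inputs of the later steps so each timing covers one operation.
H  = exp(1i*K);
HU = U.*H;

tExp = gputimeit(@() exp(1i*K));       % element-wise exponential
tMul = gputimeit(@() U.*H);            % element-wise product
tSum = gputimeit(@() sum(HU,[1,3]));   % reduction over dimensions 1 and 3

fprintf('exp: %g s, product: %g s, sum: %g s\n', tExp, tMul, tSum);
```

Multiplying each per-iteration time by P gives a rough estimate to compare against the total reported for the full loop.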
I wrote analogous code in CuPy and in PyTorch, and there the summation does not seem to be the most time-consuming operation.
I use MATLAB R2019b. My graphics card is an NVIDIA GeForce GTX 1050 Ti (768 CUDA cores), and my processor is an AMD Ryzen 7 3700X (8 physical cores).
These are the results I got on my (somewhat old) GeForce GTX 1080 Ti:
CPU time: 16.1288
GPU time: 0.96266
If I change the datatype to single I get:
CPU time: 14.9785
GPU time: 0.35102
That makes the GPU roughly 2.7x faster than with double precision.
So on the one hand, your GPU is fairly slow and your CPU is fairly fast; on the other, you could try single precision instead, if you don't mind the loss of accuracy.
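A minimal sketch of the single-precision variant of the test script follows. It assumes funTestGPU is on the path and that the data are generated as single from the start; the exact conversion used for the timings above is not shown in this thread, so treat this as one possible way to do it.

```matlab
% Sketch: same test data as the original script, but in single precision.
P = 200;
URe = 1/sqrt(2)*rand(P,'single');
UIm = 1/sqrt(2)*rand(P,'single');
KRe = 1/sqrt(2)*rand(P,P,P,'single');
KIm = 1/sqrt(2)*rand(P,P,P,'single');

U  = gpuArray(complex(URe, UIm));
K  = gpuArray(complex(KRe, KIm));
UN = gpuArray(complex(zeros(P,'single'), zeros(P,'single')));

% classUnderlying reports the element type stored in a gpuArray.
disp(classUnderlying(U))

fgpu = @() funTestGPU(P,U,K,UN);
tgpu = gputimeit(fgpu);
disp(['GPU time (single): ',num2str(tgpu)])
```

Creating the arrays as single up front, rather than casting inside the function, keeps every intermediate result (H, HU and the sum) in single precision as well.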