Optimize this program on a GPU
I'm trying to speed up my program on a GPU, but it is much slower than the CPU version, even though I am using a very powerful GPU. I haven't been able to vectorize my code because every element of my main vector has to be compared with all the other elements of that vector, so I had to use a parfor loop.
This is my program:
%% Generate data
num_tst = 5000;      % this is the key number; the CPU version takes about 10 s for 65000
binning = num_tst/100;                          % width of one histogram bin
N = num_tst/binning;                            % number of bins
timestamps = gpuArray.randi(num_tst,num_tst,1);
detectors = gpuArray.randi(4,num_tst,1);
T = N*binning;                                  % only delays dts < T are counted
timestamps = sort(timestamps);
%% To be optimized
tic;
correlations = gpuArray.zeros(1,N);
parfor i = 1:(size(timestamps,1)-1)
    a = gpuArray.zeros(1,N);
    j = i+1;
    dts = timestamps(j) - timestamps(i);
    while (dts < T) && (j <= size(timestamps,1))
        if dts == 0 && detectors(i) ~= detectors(j)
            a(1) = a(1) + 2;                    % coincident pair counts twice
        elseif detectors(i) ~= detectors(j)
            bin = floor(dts/binning)+1;         % histogram bin of this delay
            a(bin) = a(bin) + 1;
        end
        j = j + 1;
        if j <= size(timestamps,1)
            dts = timestamps(j) - timestamps(i);
        end
    end
    correlations = correlations + a;
end
toc;
How can I speed up my program? Is it possible to vectorize a program like this, or do I need to implement it in CUDA?
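Written out, the loop in the question accumulates a weighted histogram of pairwise delays. This is a reading of the code above, not a formula from the post: with sorted timestamps $t_1 \le \dots \le t_M$, detector labels $d_i$, bin width $b$ (`binning`) and $T = N b$, bin $k = 1, \dots, N$ of `correlations` is

$$
c_k = \sum_{1 \le i < j \le M} w_{ij}\,\mathbf{1}[d_i \ne d_j]\,\mathbf{1}[t_j - t_i < T]\,\mathbf{1}\!\left[\left\lfloor \frac{t_j - t_i}{b} \right\rfloor + 1 = k\right],
\qquad
w_{ij} = \begin{cases} 2, & t_j = t_i,\\ 1, & \text{otherwise.} \end{cases}
$$

Because the timestamps are sorted, $t_j - t_i$ never decreases as $j$ grows, which is why the `while` loop can stop at the first $j$ with $t_j - t_i \ge T$. Each term depends only on the pair $(i, j)$, so many pairs can be evaluated at once and then accumulated into bins.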
Edric Ellis
on 22 Apr 2013
You have managed only to run the addition on the GPU, which is why you're seeing no benefit. You should profile the code with no parallelism to see which portions might benefit. If you could make the code self-contained so that it is executable, it might be possible to see how to vectorise or parallelise things.
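A serial CPU version of the question's loop (no `parfor`, no `gpuArray`) gives the kind of baseline this comment suggests profiling, for example by wrapping it in `profile on` / `profile viewer` or simply `tic`/`toc`. This is a minimal sketch, not the poster's code: the six-event data set is made up so the result can be checked by hand, and `bin`, `M`, `dt` are illustrative names.

```matlab
% Made-up test data (not from the original post): sorted timestamps
% and the detector that recorded each event.
timestamps = [1; 1; 3; 7; 12; 20];
detectors  = [1; 2; 1; 2; 1; 2];
binning = 5;            % width of one histogram bin
N = 3;                  % number of bins
T = N*binning;          % only delays dt < T are counted

% Serial CPU version of the loop in the question: plain arrays, plain for,
% so it can be timed or profiled without any parallel overhead.
tic;
correlations = zeros(1, N);
M = numel(timestamps);
for i = 1:M-1
    for j = i+1:M
        dt = timestamps(j) - timestamps(i);
        if dt >= T
            break;      % timestamps are sorted, so every later j is too far as well
        end
        if detectors(i) ~= detectors(j)
            if dt == 0
                correlations(1) = correlations(1) + 2;  % coincident pair counts twice
            else
                bin = floor(dt/binning) + 1;            % histogram bin of this delay
                correlations(bin) = correlations(bin) + 1;
            end
        end
    end
end
toc;
```

On this data the result is `[4 3 1]`: bin 1 gets 2 for the coincident pair at t = 1 plus 1 each for the delays 2 and 4, bin 2 gets the delays 5, 6 and 8, and bin 3 gets the delay 11.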
Accepted Answer
Matt J
on 22 Apr 2013
Edited: Matt J on 22 Apr 2013
Here's a somewhat more vectorized version. The operations you're doing don't seem to be very good candidates for GPU acceleration. accumarray operations seem hard to accelerate on the GPU, judging from the conspicuous absence of a GPU-accelerated accumarray in MATLAB and in all other GPU applications I've seen. However, parfor on the CPU, with ordinary (non-gpuArray) data, might bring some advantage.
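% timestamps and detectors are assumed to be ordinary CPU arrays here
% (e.g. timestamps = gather(timestamps)), since accumarray is used on the CPU
correlations = zeros(N,1);
parfor i = 1:(size(timestamps,1)-1)
    dts = timestamps(i+1:end) - timestamps(i);          % delays to all later events
    val = detectors(i+1:end) ~= detectors(i) & dts < T; % pairs that are counted
    val = val + (val & dts == 0);                       % zero-delay pairs count twice
    keep = val > 0;                                     % drops dts >= T, so subs <= N
    subs = floor(dts(keep)/binning) + 1;                % histogram bin of each delay
    a = accumarray(subs(:), val(keep), [N,1]);
    correlations = correlations + a;
end
correlations = correlations.';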
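Going one step further, the per-`i` loop can be removed entirely: compare a block of rows against all events at once with `bsxfun`, then pass every selected pair to one `accumarray` call per block. This is an illustrative sketch under assumptions, not code from the thread: the six-event data set is made up, `blockSize` is a hypothetical tuning parameter that bounds the `blockSize`-by-M temporary matrices, and all arrays are on the CPU. Unlike the `while` loop it does not stop early at `dts >= T`, so it evaluates every pair in each block; whether it beats the `parfor` version likely depends on how large T is compared with the timestamp range.

```matlab
% Made-up test data (not from the original post).
timestamps = [1; 1; 3; 7; 12; 20];   % sorted
detectors  = [1; 2; 1; 2; 1; 2];
binning = 5; N = 3; T = N*binning;

M = numel(timestamps);
blockSize = 2;   % hypothetical; pick it so blockSize-by-M matrices fit in memory
correlations = zeros(N, 1);
for first = 1:blockSize:M-1
    I = (first:min(first+blockSize-1, M-1)).';         % rows i handled in this block
    D = bsxfun(@minus, timestamps.', timestamps(I));    % D(r,j) = t(j) - t(I(r))
    later  = bsxfun(@gt, 1:M, I);                       % keep only pairs with j > i
    differ = bsxfun(@ne, detectors.', detectors(I));    % different detectors
    keep = later & differ & D < T;
    d = D(keep);
    d = d(:);                        % force a column even when the block has one row
    w = 1 + (d == 0);                % zero-delay pairs count twice
    subs = floor(d/binning) + 1;     % histogram bin of each delay, always <= N
    correlations = correlations + accumarray(subs, w, [N, 1]);
end
correlations = correlations.';
```

`bsxfun` is used instead of implicit expansion so the sketch also runs on MATLAB releases from the time of this thread. With the made-up data it produces `[4 3 1]`, which matches a hand count of the qualifying pairs.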
Matt J
on 22 Apr 2013
Quoting an earlier comment: "If I'm not mistaken, the expression ALL(MAX(SUBS)<=SZ) will only be true if max(subs) <= 1."
No, it requires all(max(subs)<=N), where N is the number of rows requested with SZ = [N,1]. Here are some examples to illustrate the issue:
>> X=accumarray([1;2;2;2;3;3],true,[10,1]); X'
ans =
1 3 2 0 0 0 0 0 0 0
but
>> X=accumarray([1;2;2;2;3;30],true,[10,1]); X'
Error using accumarray
First input SUBS and third input SZ must satisfy ALL(MAX(SUBS)<=SZ).
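A common way around this error is to drop out-of-range indices before calling `accumarray`. A minimal sketch using the same indices as the failing call above; `inRange` is just an illustrative name. In the `parfor` body this corresponds to masking out pairs with `dts >= T` before computing `subs`, so that every remaining index is at most N.

```matlab
% Same indices as the failing example, including the out-of-range 30.
subs = [1; 2; 2; 2; 3; 30];
sz = [10, 1];

% Keep only indices that fit in the requested output size; the dropped
% entries are simply not counted.
inRange = subs >= 1 & subs <= sz(1);
X = accumarray(subs(inRange), 1, sz);
disp(X.');
```

Here the entry 30 is discarded, giving `1 3 1 0 0 0 0 0 0 0`. If such entries must be kept, SZ has to be enlarged to cover them instead.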