Vectorized code is much slower than the for-loop version

Why is the vectorized version of simple local maxima detection code significantly slower (~2-3 times) than its for-loop version?
% test data
X = rand(100000,1000);
% finding local maxima over columns of X
% for-loop version
tic;
[I,J] = size(X);
Ind = false(I,J);
for j = 1:J
Ind(:,j) = diff( sign( diff([0; X(:,j); 0]) ) ) < 0;
end
toc
% vectorized version (~3 times slower than for-loop)
tic;
Ind_ = diff(sign(diff([zeros(1,J);X;zeros(1,J)],1,1)),1,1) < 0;
toc
% result identity test
isequal(Ind,Ind_)

6 Comments

I guess it is because of
[zeros(1,J);X;zeros(1,J)]
MATLAB needs to allocate a big chunk of memory for it (and copy the data segment by segment, although that also happens in the for-loop).
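To put rough numbers on the allocation argument, here is a back-of-the-envelope sketch. It assumes double-precision data and that each call in the vectorized expression materializes a full-size temporary; both are assumptions, not something confirmed for MATLAB's internals. The variable names are only illustrative.

```matlab
% Rough size estimate, assuming double inputs and one full temporary per call.
I = 100000; J = 1000;                         % sizes from the question
bytesPerDouble = 8;

catBytes   = (I + 2) * J * bytesPerDouble;    % [zeros(1,J); X; zeros(1,J)]
diff1Bytes = (I + 1) * J * bytesPerDouble;    % inner diff(..., 1, 1)
signBytes  = (I + 1) * J * bytesPerDouble;    % sign(...)
diff2Bytes = I * J * bytesPerDouble;          % outer diff(..., 1, 1)

colBytes   = (I + 2) * bytesPerDouble;        % one padded column in the loop

fprintf('Padded matrix:     %.1f MB\n', catBytes / 2^20);
fprintf('One padded column: %.1f KB\n', colBytes / 2^10);
```

Under these assumptions every temporary in the vectorized version is a matrix of several hundred MB, while the loop only ever touches column-sized temporaries of under 1 MB, which are far more likely to stay in cache.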
Michal
Michal on 9 Sep 2019
Edited: Michal on 9 Sep 2019
@Bruno I think the problem could be in the built-in diff function, which does not seem to be implemented efficiently for the dim = 1 option. See the timing of the following code:
%% test data
X = rand(100000,1000);
%% finding local maxima over columns of X
[I,J] = size(X);
array = [zeros(1,J);X;zeros(1,J)];
% for-loop version
tic;
Ind = false(I,J);
for j = 1:J
Ind(:,j) = diff( sign( diff(array(:,j)) ) ) < 0;
end
toc
% vectorized version (~2 times slower than for-loop)
tic;
Ind_ = diff(sign(diff(array,1,1)),1,1) < 0;
toc
%% result identity test
isequal(Ind,Ind_)
Bruno Luong
Bruno Luong on 9 Sep 2019
Edited: Bruno Luong on 9 Sep 2019
Not entirely convinced. I still stick with a memory-related cause, because not only the vertical CAT but also DIFF, SIGN, DIFF create 3 big (hidden) temporary arrays.
If you add the 1,1 parameters in the for-loop
tic;
[I,J] = size(X);
Ind = false(I,J);
for j = 1:J
Ind(:,j) = diff( sign( diff(array(:,j),1,1) ),1,1) < 0;
end
toc
it's still fast. How do you explain that?
Note also that the relative difference in CPU times is smaller if you reduce the first dimension of X.
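The dependence on the first dimension can be checked with a small timing sweep like the one below. This is only a sketch: the sizes in rowCounts and the value of J are hypothetical and deliberately small so that it runs quickly, and single tic/toc measurements are noisy. To reproduce the effect from the question, the sizes would need to be scaled up toward 100000 x 1000.

```matlab
% Timing sweep over the number of rows (hypothetical sizes, scale up as needed).
J = 200;
rowCounts = [100 1000 10000];
ratio = zeros(size(rowCounts));

for k = 1:numel(rowCounts)
    I = rowCounts(k);
    X = rand(I, J);
    array = [zeros(1,J); X; zeros(1,J)];   % padded copy, built outside the timers

    % for-loop version: one column at a time
    tic;
    Ind = false(I, J);
    for j = 1:J
        Ind(:,j) = diff(sign(diff(array(:,j)))) < 0;
    end
    tLoop = toc;

    % vectorized version: whole matrix at once
    tic;
    Ind_ = diff(sign(diff(array,1,1)),1,1) < 0;
    tVec = toc;

    assert(isequal(Ind, Ind_));            % both versions must agree
    ratio(k) = tVec / tLoop;
    fprintf('I = %6d: loop %.4f s, vectorized %.4f s, ratio %.2f\n', ...
            I, tLoop, tVec, ratio(k));
end
```

If the memory explanation holds, the ratio should grow with I once the temporaries no longer fit in cache. Repeating each measurement several times and taking the median would give more trustworthy numbers.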
Michal
Michal on 9 Sep 2019
Edited: Michal on 9 Sep 2019
I guess that in this case I call diff(array(:,j),1,1), where array(:,j) is a vector, not a matrix, so diff does not have to compute over separate columns of array. Maybe the built-in diff function does not use multithreading properly in this case? But you are right, the memory allocation in the vectorized code could really be one (!) of the causes of the slowness.
Bruno Luong
Bruno Luong on 9 Sep 2019
It is possible that the DIFF implementation does not access memory sequentially for 2D array data, but instead walks the array row by row, which might slow it down.
I don't think the multithreading is wrongly implemented.
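One way to probe the memory and access-pattern explanations is a hybrid that stays vectorized but only over a block of columns at a time, so that the temporaries stay small. The sketch below is not from the thread: blockSize is a hypothetical tuning parameter, and the test matrix is kept small so the example runs quickly.

```matlab
% Blocked variant: vectorized within a block of columns, looped over blocks.
X = rand(2000, 300);              % small hypothetical test matrix
[I, J] = size(X);
blockSize = 64;                   % hypothetical; tune so a block fits in cache

Ind = false(I, J);
for j0 = 1:blockSize:J
    cols = j0:min(j0 + blockSize - 1, J);              % columns of this block
    pad = zeros(1, numel(cols));
    block = [pad; X(:, cols); pad];                    % zero-padded block
    Ind(:, cols) = diff(sign(diff(block, 1, 1)), 1, 1) < 0;
end

% Reference: fully vectorized version from the question
Ind_ = diff(sign(diff([zeros(1,J); X; zeros(1,J)], 1, 1)), 1, 1) < 0;
disp(isequal(Ind, Ind_))
```

With blockSize = 1 this reduces to the for-loop version and with blockSize = J to the fully vectorized one, so timing a few values in between on the full-size X would show where the slowdown sets in.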
Michal
Michal on 9 Sep 2019
The main problem is that the continuous development of the JIT engine keeps changing MATLAB's performance characteristics for vectorized code. In general, standard for-loop code becomes faster and faster.
I have plenty of highly vectorized MATLAB code written over the last 10 years which, over the last few years, has become slower than its for-loop counterparts. So there is no stability in code performance.

Answers (0)

Asked: 9 Sep 2019

Commented: 9 Sep 2019
