how to avoid for loop to increase speed
Afficher commentaires plus anciens
I would like to calculate the mean value of vector A for every sample as defined in vector B. So, the first point of the resulting vector C is the mean of the first 9 (=B(1)) datapoints of A. The second point of C is the mean of the following 10 (=B(2) datapoints of A. Etc. The following code works, but takes time when processing large vectors:
A=rand(91,1); % vector with 91 random samples
B=[9,10,10,8,11,10,10,9,10,6]; % vector with the number of samples
C= zeros(length(B),1); % preallocate C
first=1;
for i = 1: length(C)
last = first+B(i)-1;
interval=(first:last);
C(i) = mean(A(interval));
first =last+1;
end
Is there a way to use B in an index of A, instead of using this for loop?
2 commentaires
Shouldn't sum(B) equal length(A)?
A=rand(91,1); % vector with 91 random samples
B=[9,10,10,8,11,10,10,9,10,6]; % vector with the number of samples
sum(B)
length(A)
Bertil Veenstra
le 4 Jan 2023
Réponse acceptée
Plus de réponses (2)
Mathieu NOE
le 4 Jan 2023
hello
had to change A to 93 samples so that it matches with sum(B) = 93
see my suggestion below. On this small data we can see an iùprovement of factor 4
Elapsed time is 0.004598 seconds (your code)
Elapsed time is 0.001204 seconds. (my code)
delta = 1.0e-14 *
0
-0.0222
-0.0111
-0.0333
-0.0555
0.1110
0.0777
-0.0222
-0.0444
-0.0222
wonder if that is going to be even better for larger vectors ?
A=rand(93,1); % vector with 93 random samples
B=[9,10,10,8,11,10,10,9,10,6]; % vector with the number of samples
C= zeros(length(B),1); % preallocate C
first=1;
tic
for i = 1: length(C)
last = first+B(i)-1;
interval=(first:last);
C(i) = mean(A(interval));
first =last+1;
end
toc
% alternative code
tic
As = cumsum(A(:));
Bs = cumsum(B(:));
As = As(Bs);
Cs = [As(1); diff(As)]./B(:);
toc
delta = C - Cs
plot(C,'-*b')
hold on
plot(Cs,'dr')
hold off
7 commentaires
Bertil Veenstra
le 5 Jan 2023
Mathieu NOE
le 5 Jan 2023
My pleasure !
Mathieu NOE
le 5 Jan 2023
Modifié(e) : Mathieu NOE
le 5 Jan 2023
Just for fun I compared the speed of my solution vs Matt's code
here tested on much larger vectors (1100063 samples) , still my solution is by far better , but is numerically less accurate as Matt's code . (we are talking relative error below 10^-10)
Original code : Elapsed time is 0.449751 seconds.
My suggestion : Elapsed time is 0.011151 seconds.
delta_max = 3.4435e-11
Matt suggestion : Elapsed time is 0.058991 seconds.
delta_max = 0
% original code
B = 8 + randi(5,100000,1);
samples = sum(B);
A=rand(samples,1); % vector with 1100063 random samples
C= zeros(length(B),1); % preallocate C
first=1;
tic
for i = 1: length(C)
last = first+B(i)-1;
interval=(first:last);
C(i) = mean(A(interval));
first =last+1;
end
toc
% alternative code # 1 (me)
tic
As = cumsum(A(:));
Bs = cumsum(B(:));
As = As(Bs);
Cs = [As(1); diff(As)]./B(:);
toc
delta_max = max(abs(C - Cs))
% alternative code # 2 (Matt J)
tic
G=repelem(1:numel(B), B);
n=min(numel(A),numel(G));
G=G(:); A=A(:);
Cs=accumarray(G(1:n),A(1:n))./accumarray(G(1:n),1);
toc
delta_max = max(abs(C - Cs))
(we are talking relative error below 10^-10)
That would depend a bit on A:
% original code
B = 8 + randi(5,100000,1);
samples = sum(B);
A=exp(-linspace(0,20,samples)); % vector with 1100063 random samples
% alternative code # 2 (Matt J)
tic
G=repelem(1:numel(B), B);
n=min(numel(A),numel(G));
G=G(:); A=A(:);
C=accumarray(G(1:n),A(1:n))./accumarray(G(1:n),1);
toc
% alternative code # 1 (me)
tic
As = cumsum(A(:));
Bs = cumsum(B(:));
As = As(Bs);
Cs = [As(1); diff(As)]./B(:);
toc
maxrelError = max(abs(C(:) - Cs(:))./C(:))'*100
Bertil Veenstra
le 6 Jan 2023
Modifié(e) : Bertil Veenstra
le 6 Jan 2023
@Stephen23 already showed you earlier that you can use accumarray to apply any function to the blocks.
A = randi(5,1,9300); % vector with random integers ranging from 1 to 5
B=ones(1,930)*10; % vector the number of samples
tic
C= zeros(length(B),1); % preallocate C
first=1;
for i = 1: length(C)
last = first+B(i)-1;
interval=(first:last);
C(i) = mode(A(interval));
first =last+1;
end
toc
tic
G=repelem((1:numel(B)),B);
C=accumarray(G(:),A(:),[],@mode);
toc
It is to be expected that speed-up is more modest, unfortunately. Accumarray isn't as well optimized for arbitrary functions.
Bertil Veenstra
le 6 Jan 2023
And instead of the mean, I am interested in the mode.
This method should offer speed-up for a generic function, provided that it can ignore NaNs and provided the blocks don't vary too greatly in length:
B = randi([5,10],1,9300);
A = randi(5,1,sum(B));
discrepancy = max( abs(loopMethod(A,B)-altMethod(A,B)),[],'all')
timeit(@() loopMethod(A,B))
timeit(@() altMethod(A,B))
function C=loopMethod(A,B)
C= zeros(1,length(B)); % preallocate C
first=1;
for i = 1: length(C)
last = first+B(i)-1;
interval=(first:last);
C(i) = mode(A(interval));
first =last+1;
end
end
function C=altMethod(A,B)
bmax=max(B);
I=(1:bmax)'<=B;
T=nan(size(I));
T(I)=A(:);
C=mode(T,1);
end
1 commentaire
Bertil Veenstra
le 6 Jan 2023
Catégories
En savoir plus sur Matrix Indexing dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!