I have coded a formulation using for loops but it takes forever when my matrices are very large. Can anyone help me make this code more efficient without for loops?
I'm coding the formulations in page two (in the lower half page) of this document: http://www-stat.wharton.upenn.edu/~tcai/paper/Testing-2-Covariance-Matrices.pdf
And here are my codes:
if true
% Input: X, Y
p=size(X,2);
n1=size(X,1); n2=size(Y,1);
sigma1=cov(X); sigma2=cov(Y);
for i=1:p
for j=1:p
for k=1:n1
A(k,:)=((X(k,i)-mean(X(:,i))).*(X(k,j)-mean(X(:,j)))-sigma1(i,j)).^2;
end
Teta_bar1(i,j)=1./n1 .* sum(A);
for k=1:n2
B(k,:)=((Y(k,i)-mean(Y(:,i))).*(Y(k,j)-mean(Y(:,j)))-sigma2(i,j)).^2;
end
Teta_bar2(i,j)=1./n2 .* sum(B);
M(i,j)=((sigma1(i,j)-sigma2(i,j)).^2)./(Teta_bar1(i,j)/n1+Teta_bar2(i,j)/n2);
end
end
end

 Réponse acceptée

Geoff Hayes
Geoff Hayes le 15 Oct 2014

1 vote

MJ - what does it takes forever when my matrices are very large mean? Seconds, minutes, hours? Please provide an example of a matrix (perhaps just the dimensions) along with the time it takes to evaluate the above.
Before looking at vectorizing your code, you may want to profile your code to see where the bottlenecks are.
A couple things that you may want to do (for your optimization) is to allocate or pre-size matrices before you start to use them. For example, M appears to be a pxp matrix, so why not size it as such before you start to use it
p=size(X,2);
n1=size(X,1);
n2=size(Y,1);
sigma1=cov(X);
sigma2=cov(Y);
% pre-size matrices
M = zeros(p,p);
Teta_bar1 = zeros(p,p);
Teta_bar2 = zeros(p,p);
A = ??
B = ??
Look also at your inner loops that iterate over k
for k=1:n1
A(k,:)=((X(k,i)-mean(X(:,i))).*(X(k,j)-mean(X(:,j)))-sigma1(i,j)).^2;
end
Note that you calculate
mean(X(:,i))
on each iteration of k even though this average does not depend on k at all. It doesn't even depend upon j, so the code is calculating the same average p*n1 times when it only needs to do this once. Similarly, the
mean(X(:,j)
is calculated n1 times when it is only needed once. Try changing this to
for i=1:p
meanXi = mean(X(:,i)); % <--- calculate the mean
for j=1:p
meanXj = mean(X(:,j)); % <--- calculate the mean
for k=1:n1
A(k,:)=((X(k,i)-meanXi).*(X(k,j)-meanXj)-sigma1(i,j)).^2;
end
You can apply the same logic to the second inner loop that iterates over k to avoid the duplicate mean calculations.

4 commentaires

Pierre Benoit
Pierre Benoit le 15 Oct 2014
Modifié(e) : Pierre Benoit le 15 Oct 2014
Even better than to calculate the mean in the loop, just do it before the loop :
meanX = mean(X);
then access it with i and j
for k=1:n1
A(k,:)=((X(k,i)-meanX(i)).*(X(k,j)-meanX(j))-sigma1(i,j)).^2;
end
MJ
MJ le 15 Oct 2014
Thanks for your answer. The size of X and Y in my problem are 3322 x 6 and 378000 x 6. I run the code about two hours ago and it is still running...
MJ
MJ le 15 Oct 2014
Wonderful!! Just applied your instructions and got the results in ONLY 3 seconds! You made my day, sir! :) Thanks a lot!
Geoff Hayes
Geoff Hayes le 15 Oct 2014
Glad it worked out!

Connectez-vous pour commenter.

Plus de réponses (0)

Catégories

En savoir plus sur Loops and Conditional Statements dans Centre d'aide et File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by