parallel computation for a for loop

Maryam

on 12 Apr 2019
Latest activity: commented on by Maryam on 12 Apr 2019

Below is the part of my code in which I am trying to use parallel computation. However, it gives me the following error:
"Error: The variable k_1 in a parfor cannot be classified."
In each parfor iteration, some specific rows of the "k_1" matrix are updated independently of the rest, so I cannot see why I get this message. Any help in this regard will be highly appreciated. Please find the parallel portion of my code below:
k_1 = k;
M_1 = M;
% pi = 0;
parfor ppi=1:NP
    for pii=1:kkk
        % pi= pi + 1;
        pi = (ppi-1)*kkk+pii;
        Tempo1 = zeros(1,1);
        Tempo1_M = zeros(1,1);
        Tempo2 = zeros(1,1);
        Tempo2_M = zeros(1,1);
        for irow = 1:p(pi)
            [xx,inside_angle] = find(irow==[p;q]);
            Tempo1(irow) = [(xx-1)*alpha(inside_angle)+(2-xx)]* ...
                [k(p(inside_angle),p(pi))+lambda(pi)*k(p(inside_angle),q(pi))] + ...
                [(2-xx)*lambda(inside_angle)+(xx-1)]* ...
                [k(q(inside_angle),p(pi))+lambda(pi)*k(q(inside_angle),q(pi))];
            Tempo1_M(irow) = [(xx-1)*alpha(inside_angle)+(2-xx)]* ...
                [M(p(inside_angle),p(pi))+lambda(pi)*M(p(inside_angle),q(pi))] + ...
                [(2-xx)*lambda(inside_angle)+(xx-1)]* ...
                [M(q(inside_angle),p(pi))+lambda(pi)*M(q(inside_angle),q(pi))];
        end
        for irow = 1:q(pi)
            [xx,inside_angle] = find(irow==[p;q]);
            Tempo2(irow) = [(xx-1)*alpha(inside_angle)+(2-xx)]* ...
                [alpha(pi)*k(p(inside_angle),p(pi))+k(p(inside_angle),q(pi))] + ...
                [(2-xx)*lambda(inside_angle)+(xx-1)]* ...
                [alpha(pi)*k(q(inside_angle),p(pi))+k(q(inside_angle),q(pi))];
            Tempo2_M(irow) = [(xx-1)*alpha(inside_angle)+(2-xx)]* ...
                [alpha(pi)*M(p(inside_angle),p(pi))+M(p(inside_angle),q(pi))] + ...
                [(2-xx)*lambda(inside_angle)+(xx-1)]* ...
                [alpha(pi)*M(q(inside_angle),p(pi))+M(q(inside_angle),q(pi))];
        end
        %Assign tempos to k
        for irow = 1:p(pi)
            k_1(irow,p(pi)) = Tempo1(irow);
            k_1(p(pi),irow) = Tempo1(irow);
            M_1(irow,p(pi)) = Tempo1_M(irow);
            M_1(p(pi),irow) = Tempo1_M(irow);
        end
        for irow = 1:q(pi)
            k_1(irow,q(pi)) = Tempo2(irow);
            k_1(q(pi),irow) = Tempo2(irow);
            M_1(irow,q(pi)) = Tempo2_M(irow);
            M_1(q(pi),irow) = Tempo2_M(irow);
        end
    end
end
k=k_1;
M=M_1;
Please note that the matrices k and M are defined earlier in my code.
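For readers who hit the same message, here is a minimal sketch of why `k_1` cannot be classified, using small hypothetical data (`n`, `cols`, `A`, and `out` are made up for illustration). parfor treats an output as *sliced* only when one subscript is the loop variable itself (or the loop variable plus or minus a constant) and the list of subscripts is the same at every occurrence. In the question, `k_1` is indexed through `p(pi)` rather than the loop variable `ppi`, and it is also indexed with the subscripts swapped, so parfor cannot prove the writes are disjoint even when they are. Writing each iteration's result into a cell indexed directly by the loop variable, then scattering the results after the loop, avoids the problem.

```matlab
% Hypothetical data: iteration i owns column cols(i) of A.
n = 6;
cols = [2 3 1 4 5 6];
A = zeros(n);

% This form fails with "The variable A in a parfor cannot be classified",
% because A is indexed through cols(i) instead of directly by i:
%   parfor i = 1:n
%       A(:, cols(i)) = i;
%   end

% Workaround: a cell indexed directly by the loop variable is a valid
% sliced output; the scatter into A happens serially after the loop.
out = cell(n, 1);
parfor i = 1:n
    out{i} = i * ones(n, 1);    % stand-in for the real per-iteration work
end
for i = 1:n
    A(:, cols(i)) = out{i};
end
disp(A)
```

Without Parallel Computing Toolbox, parfor runs as an ordinary serial loop, so the sketch also runs there.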

Maryam

on 12 Apr 2019
In other words, I don't understand why it is not parallelizable, or what changes I should make to run it in parallel.
Matt J

on 12 Apr 2019
No, you need to explain to us why you think it is parallelizable. Parallelizable means the operations done by your loop iterations (over ppi) are independent of one another and could just as well be done on separate computers. Since all loop iterations in the code you've shown appear to be writing into the same matrix, it is hard to see what kind of independence from each other you think the loop iterations can have.
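Matt J's criterion can be probed on a small case: if running the iterations forward and backward gives different results, the iterations depend on each other. Getting the same result on one input does not prove independence, but a difference proves a dependency. In this sketch `magic(4)` is a hypothetical stand-in for `k`. Reading only from the original matrix (as the question's code does with `k` and `M`) makes the order irrelevant, while reading from the matrix being updated does not.

```matlab
k = magic(4);              % hypothetical stand-in for the poster's k
n = size(k, 2);

% Independent: every iteration reads the ORIGINAL k and writes its own column.
a = k;  b = k;
for c = 1:n
    a(:, c) = k(:, c) + sum(k(1, :));
end
for c = n:-1:1
    b(:, c) = k(:, c) + sum(k(1, :));
end

% Dependent: every iteration reads the matrix it is updating,
% so later iterations see the writes of earlier ones.
d = k;  e = k;
for c = 1:n
    d(:, c) = d(:, c) + sum(d(1, :));
end
for c = n:-1:1
    e(:, c) = e(:, c) + sum(e(1, :));
end

fprintf('independent version, same result in both orders: %d\n', isequal(a, b));
fprintf('dependent version, same result in both orders:   %d\n', isequal(d, e));
```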
Maryam

on 12 Apr 2019
Well, the reason I think it is parallelizable is that in each iteration (over ppi) I produce 4 arrays, which are independent. Then I assign those arrays to columns of k_1 and M_1, and no two iterations use the same column numbers. For example, for a 6x6 matrix with "NP" equal to 3 (meaning I want 3 processors to do the job independently of each other), processor 1 overwrites 2 specific columns of k_1 (say 2 and 3), processor 2 overwrites another 2 columns (1 and 4, for example), and the last processor overwrites the remaining two columns (5 and 6). These pairs are computed at the start of my code (which is not shown here), and it is proven that no number is repeated across the pairs.
So, based on this explanation, could you please tell me where I am making a mistake? In other words, which part of my explanation is wrong in terms of the parallel computation concept?
Again, thank you so much for your time and consideration.
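The disjointness claim itself can be checked at run time. This sketch uses the hypothetical 6x6 example from the comment, with the pairs (2,3), (1,4), and (5,6) stored as `p = [2 1 5]` and `q = [3 4 6]`. Note that parfor classifies variables by static analysis of the code, before any data exist, so passing this check does not change how `k_1` is classified; it only confirms that no column is written by two iterations, which is what a gather-then-assemble approach such as Catalytic's answer relies on.

```matlab
% Hypothetical pairs from the 6x6 example: (2,3), (1,4), (5,6).
p = [2 1 5];
q = [3 4 6];
n = 6;

allCols = [p q];
% True when every column 1..n appears exactly once across all pairs.
isPartition = numel(allCols) == n && isequal(sort(allCols), 1:n);
fprintf('pairs partition the columns: %d\n', isPartition);
```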

Catalytic

on 12 Apr 2019

subsCell = cell(NP*kkk,1); % column cell arrays, so cell2mat below stacks them vertically
kValCell = cell(NP*kkk,1);
MValCell = cell(NP*kkk,1);
parfor pi=1:NP*kkk % one iteration per pair; parfor splits these into chunks itself
    Tempo1 = zeros( p(pi) ,1 );
    Tempo1_M = zeros( p(pi) ,1);
    Tempo2 = zeros(q(pi) ,1);
    Tempo2_M = zeros(q(pi) ,1);
    pSubs = zeros( p(pi) ,2 ); %new: (row,col) subscripts for the Tempo1 values
    qSubs = zeros(q(pi) ,2);   %     (row,col) subscripts for the Tempo2 values
    for irow = 1:p(pi)
        [xx,inside_angle] = find(irow==[p;q]);
        Tempo1(irow) = [(xx-1)*alpha(inside_angle)+(2-xx)]* ...
            [k(p(inside_angle),p(pi))+lambda(pi)*k(p(inside_angle),q(pi))] + ...
            [(2-xx)*lambda(inside_angle)+(xx-1)]* ...
            [k(q(inside_angle),p(pi))+lambda(pi)*k(q(inside_angle),q(pi))];
        Tempo1_M(irow) = [(xx-1)*alpha(inside_angle)+(2-xx)]* ...
            [M(p(inside_angle),p(pi))+lambda(pi)*M(p(inside_angle),q(pi))] + ...
            [(2-xx)*lambda(inside_angle)+(xx-1)]* ...
            [M(q(inside_angle),p(pi))+lambda(pi)*M(q(inside_angle),q(pi))];
        pSubs(irow,:)=[irow,p(pi)]; %new
    end
    for irow = 1:q(pi)
        [xx,inside_angle] = find(irow==[p;q]);
        Tempo2(irow) = [(xx-1)*alpha(inside_angle)+(2-xx)]* ...
            [alpha(pi)*k(p(inside_angle),p(pi))+k(p(inside_angle),q(pi))] + ...
            [(2-xx)*lambda(inside_angle)+(xx-1)]* ...
            [alpha(pi)*k(q(inside_angle),p(pi))+k(q(inside_angle),q(pi))];
        Tempo2_M(irow) = [(xx-1)*alpha(inside_angle)+(2-xx)]* ...
            [alpha(pi)*M(p(inside_angle),p(pi))+M(p(inside_angle),q(pi))] + ...
            [(2-xx)*lambda(inside_angle)+(xx-1)]* ...
            [alpha(pi)*M(q(inside_angle),p(pi))+M(q(inside_angle),q(pi))];
        qSubs(irow,:)=[irow,q(pi)]; %new
    end
    % Sliced outputs: indexed directly by the loop variable, so parfor can classify them
    subsCell{pi}=[pSubs;qSubs]; %new
    kValCell{pi}=[Tempo1;Tempo2];
    MValCell{pi}=[Tempo1_M;Tempo2_M];
end
subs=cell2mat(subsCell);
kVal=cell2mat(kValCell);
MVal=cell2mat(MValCell);
% Scatter into upper-triangular matrices; entries never written stay 0,
% so this assumes the pairs cover every column (as stated in the question)
k_1=accumarray(subs,kVal,size(k));
M_1=accumarray(subs,MVal,size(M));
k=k_1 + tril(k_1.',-1); %make symmetric: mirror the strict upper triangle
M=M_1 + tril(M_1.',-1);
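A self-contained miniature of the same gather-then-`accumarray` pattern, with made-up values so it runs without the rest of the poster's code (`colOf` and the value formula are hypothetical). Each iteration returns (row, column) subscripts for the upper triangle of the column it owns, plus the values for those entries. `accumarray` scatters them into a matrix (it sums repeated subscripts, which is harmless here because none repeat), and adding the strictly lower triangle of the transpose mirrors the result into a symmetric matrix.

```matlab
n = 4;
colOf = [3 1 4 2];          % hypothetical: iteration i owns column colOf(i)
N = numel(colOf);

subsCell = cell(N, 1);      % column cells so cell2mat stacks vertically
valCell  = cell(N, 1);
parfor i = 1:N
    c = colOf(i);
    rows = (1:c).';                      % upper triangle of column c: rows 1..c
    subsCell{i} = [rows, repmat(c, c, 1)];
    valCell{i}  = 10 * c + rows;         % made-up values: 10*column + row
end

subs = cell2mat(subsCell);
vals = cell2mat(valCell);
U = accumarray(subs, vals, [n n]);       % upper triangle, including the diagonal
S = U + tril(U.', -1);                   % mirror into the strict lower triangle
disp(S)
```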

Maryam

on 12 Apr 2019
Another thing: the number of parallel processors is limited (say 5), while this formulation requires "NP*kkk" iterations, which might be large (about 1000, for instance). That is why I tried to do block parallelism and divide the jobs among a fixed number of processors, and why, instead of using "parfor pi=1:NP*kkk", I used:
parfor ppi=1:NP
    for pii=1:kkk
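The block index used in the question, `pi = (ppi-1)*kkk + pii`, is a column-major linear index into a `kkk`-by-`NP` grid. A single flat loop over `1:NP*kkk` therefore visits exactly the same (ppi, pii) pairs, and `ind2sub` recovers them if the loop body needs them. A sketch with hypothetical sizes:

```matlab
NP  = 3;                 % hypothetical number of blocks
kkk = 4;                 % hypothetical block size
N   = NP * kkk;

ppiOf = zeros(N, 1);
piiOf = zeros(N, 1);
parfor idx = 1:N
    % idx = (ppi-1)*kkk + pii is column-major order on a kkk-by-NP grid
    [pii, ppi] = ind2sub([kkk, NP], idx);
    piiOf(idx) = pii;
    ppiOf(idx) = ppi;
end

% Applying the question's formula to the recovered pairs gives 1..N back.
rebuilt = (ppiOf - 1) * kkk + piiOf;
disp([(1:N).', ppiOf, piiOf])
```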
Matt J

on 12 Apr 2019
You are doing manually what the parfor machinery already does for you. Parfor is smart enough, without your intervention, to break the loop into chunks and distribute them to the workers.
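If the concern is how many workers run at once rather than how many iterations there are, parfor also accepts an optional upper bound on the number of workers, `parfor (loopVar = initVal:endVal, M)`; the iterations are still divided into chunks automatically. In this sketch the cap of 5 and the loop body are hypothetical stand-ins.

```matlab
N = 1000;                % e.g. NP*kkk from the thread
maxWorkers = 5;          % hypothetical cap on the number of workers
out = zeros(N, 1);
parfor (idx = 1:N, maxWorkers)
    out(idx) = sqrt(idx);    % stand-in for the real per-iteration work
end
```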
Maryam

on 12 Apr 2019
I understand! I wasn't aware of that. Thank you so much for the clarification and your help.