How do Matlab workers work?

16 views (in the last 30 days)
Tobias Brambier on 10 May 2022
Commented: Edric Ellis on 23 May 2022
So I have been working on optimizing/parallelizing an existing piece of code as part of a project in my studies, and while doing so I have encountered a rather strange problem, at least to me.
The 'problem' occurs when I run the code in a normal for-loop versus a parfor-loop. According to the tic/toc commands, the parfor-loop's runtime using a single worker is about half the runtime of a standard for-loop.
My problem with this is that, as I understand it, standalone MATLAB is single-threaded. Yet a single worker that also uses only one thread is still faster, much faster even, and this is exactly where my understanding of the situation leaves me confused.
And yes, I have checked: there is actually only a single worker using a single thread.
On top of that, I need an explanation for this behaviour in the final report for this project.
I would really appreciate any explanation, or even a hint, about this (to me) strange behaviour.
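For reference, the comparison described above was set up roughly like this (a minimal sketch with a hypothetical placeholder workload, not the original assembly code; `N` and the `svd(rand(20))` stand-in are assumptions for illustration):

```matlab
% Sketch of the for vs. parfor timing comparison (placeholder workload).
N = 1e4;
parpool(1);                     % pool with exactly one worker
                                % (errors if a pool is already open)
tic
for e = 1:N
    y = svd(rand(20));          % stand-in for the per-element work
end
fprintf('for:    %.3f s\n', toc);

tic
parfor e = 1:N
    y = svd(rand(20));          % same stand-in work
end
fprintf('parfor: %.3f s\n', toc);
```

With a workload like this, the two timings are normally comparable; the roughly 2x speedup reported in the question is specific to the sparse-matrix assembly shown further down.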
  4 comments
Tobias Brambier on 11 May 2022
Since I don't know how to quote, or if it is even possible:
Question from Edric Ellis: "What happens if you run parfor with no parallel pool?"
Answer: Running the parfor loop without a parallel pool results in a runtime close to the time I get when running a standard for-loop. Although a bit faster, still nowhere close to the times I got using parpool of size 1.
Recommendation from Walter Roberson: "Also try experimenting with setting numcompthread to 1."
Answer: I would assume you mean the NumThreads option in the Preference Menu, then yes I tried doing that too. Actually that was what I meant when I wrote: "And yes, I have checked, there is actually only a single worker using a single thread." I now realise, I could have clarified that with a bit more detail.
Assumption from Edric Ellis: "Providing you're using parpool("local") (and not "threads")..."
Answer: Yes, since the local cluster was the default one, I used it for most of the testing. Using threads instead results in runtimes very similar to, but still slower than, the ones using local.
Question from Edric Ellis: "Is your for loop in a script or a function?"
Answer: It is currently used in a script. Since I am unsure what exactly you mean, I just tried using the function keyword, which resulted in runtimes even slower than just using it in a script.
Question from Edric Ellis: "Are you able to narrow this down to a simple reproduction that you can share?"
Answer: I will ask my supervisors if I am allowed to simply share the piece of example code that is causing the confusion, although I don't know how helpful that would even be. I can also try to reduce the code down to an even simpler version to share and see if the situation stays the same.
Thanks so far for your recommendations! What I also forgot to mention is that the code is simply used to assemble stiffness matrices for use in a finite element method program called "DAEdalon", made by the supervising professor.
Walter Roberson on 11 May 2022
https://www.mathworks.com/help/matlab/ref/maxnumcompthreads.html
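The linked function can be used to pin the client session to a single computational thread for the duration of a timing run, along these lines (a minimal sketch; the "timed code" placeholder is an assumption):

```matlab
% Restrict the MATLAB client to one computational thread,
% run the timing, then restore the previous setting.
oldThreads = maxNumCompThreads(1);  % returns the previous thread count
% ... run the for-loop timing here ...
maxNumCompThreads(oldThreads);      % restore the original setting
```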


Answers (1)

Tobias Brambier on 19 May 2022
Edited: Tobias Brambier on 19 May 2022
OK, so after I found out that the formerly implemented waitbar created some huge overhead, here is a simplified version of the code that still causes the same confusion:
% this is only used for generating the usually externally provided input data
% this number (sz) is the one to increase to change the workload!
sz = 120;
for i = 1 : sz
    for j = 1 : sz
        x( ( i - 1 ) * sz + j, : ) = [ j - 1, i - 1 ];
    end
end
for j = 1 : sz - 1
    for i = 1 : sz
        inc = ( j - 1 ) * sz + i;
        conn( inc, : ) = [ inc, inc + sz ];
    end
end
for j = 1 : sz
    for i = 1 : sz - 1
        inc = ( j - 1 ) * sz + i;
        conn( end + 1, : ) = [ inc, inc + 1 ];
    end
end
numnp   = size(x, 1);
ndm     = size(x, 2);
ndf     = ndm;
numelem = size(conn, 1);
k_parallel = sparse(numnp*ndf, numnp*ndf);
% this is the important part
parpool( 1 )
tic
parfor e = 1:numelem
    k_elem_gs = sparse(numnp*ndf, numnp*ndf);
    I = conn(e,1);
    J = conn(e,2);
    t = x(J,:) - x(I,:);
    L = norm(t);
    t = t/L;
    B = [-t, t]/L;
    k_elem = B'*B*L;
    for i = 1:2
        I = conn(e, i);
        for j = 1:2
            J = conn(e, j);
            k_elem_gs(ndf*(I-1)+1:I*ndf, ndf*(J-1)+1:J*ndf) = ...
                k_elem_gs(ndf*(I-1)+1:I*ndf, ndf*(J-1)+1:J*ndf) ...
                + k_elem(ndf*(i-1)+1:i*ndf, ndf*(j-1)+1:j*ndf);
        end
    end
    k_parallel = k_parallel + k_elem_gs;
end
toc
  1 comment
Edric Ellis on 23 May 2022
I don't have a definitive answer, but here's what I think is going on. The serial MATLAB profiler reveals that the expensive pieces of the computation are the increments to k_elem_gs, and the update of k_parallel. In particular, I think the updates to k_parallel get more expensive as the number of non-zero elements increases. This is significant, because a parfor loop reorders these additions. Even when using a single worker, parfor divides up the work of the loop into "subranges", which execute separately. Each of these subranges will start the addition of k_parallel from scratch - i.e. starting from "cheap" additions. So whereas the client for loop does this:
k = k + k + k + k + k + k + k;
the single-worker parfor does something more like this:
k = {k + k + k} + {k + k} + {k + k};
(where each k on the right-hand side is a different matrix, of course). There are still the same number of additions, but not all of them are expensive additions.
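One way to see the effect described above, outside of parfor entirely, is to accumulate the same set of sparse terms one-by-one versus in subranges and compare the timings. This is a hedged sketch of the idea, not the actual parfor internals; the sizes, density, and group size are made-up values:

```matlab
% Adding many sparse terms one-by-one vs. in subranges.
% As k accumulates non-zeros, each further "k = k + term" gets costlier,
% since sparse addition cost grows with the operands' non-zero counts.
% Summing subranges first keeps most additions between sparser operands.
n = 2e4; m = 400;
terms = cell(1, m);
for t = 1:m
    terms{t} = sprand(n, n, 1e-4);
end

tic                                  % sequential, like the client for-loop
k1 = sparse(n, n);
for t = 1:m
    k1 = k1 + terms{t};
end
toc

tic                                  % grouped, like parfor subranges
k2 = sparse(n, n);
groupSize = 50;
for g = 1:groupSize:m
    kg = sparse(n, n);
    for t = g:min(g + groupSize - 1, m)
        kg = kg + terms{t};          % cheap: kg stays relatively sparse
    end
    k2 = k2 + kg;                    % few expensive group-level additions
end
toc
```

Both loops perform the same total number of additions and produce the same result, but the grouped version spends far less time in additions between two heavily-populated sparse matrices.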

