Discarded Messages with SPMD and labReceive ... why?

EvanThomas on 18 Jul 2022
Edited: EvanThomas on 20 Jul 2022
Hello,
I am using SPMD and trying to get some workers to communicate with each other. There is a flag they need to send and receive: whichever worker gets its job done and committed first sends out the flag, which the remaining workers should receive so that they do not commit their own work.
Here is some abstract code that hopefully gets across what I am trying to do. I would have thought the labBarrier at the bottom would ensure that every worker finishing second or later receives the flag from the first worker to finish. Some do, but I also get many warning messages similar to the following:
Lab 1:
Warning: An incoming message was discarded from lab 2 (tag: 2)
Indeed, some workers do miss the message, even when they finish seconds after the flag was sent out.
How does labSend work? Am I missing something here?
----------------------------
% Emulating workers doing some variable time task
pause(randi([1 15]));
% See if other workers got there first and sent an update
for i=1:1:length(agentVec)
if i==labindex
Updates(i)=0;
else
if labProbe(i,2)
[Updates(i),srcWkrIdx,tag] = labReceive(i,2);
else
Updates(i) = 0;
end
end
end
if ~any(Updates)
% Commit work
flag = 1
else
% Otherwise take a nap
flag = 0
end
labSend(flag,agentVec(agentVec ~= labindex),2);
labBarrier;
3 Comments
EvanThomas on 19 Jul 2022
Edited: EvanThomas on 19 Jul 2022
Hi Edric, thanks for the response. You can ignore the mismatched end, as this isn't the code I am actually running. It was only meant to be a simple, abstract example demonstrating the concept of what I am trying to do. (I have, however, edited the post and removed the extra "end".)
Right now, the flag is literally just a 1 or 0, which seems about as small as a message can get. Is it still possible that MPI considers this "large"?
I get the impression that MPI isn't very reliable as a communication tool, at least in terms of predictable behaviour. Is this generally true? If so, maybe I can't achieve what I am hoping for?
Edric Ellis on 20 Jul 2022
I would actually say exactly the opposite - MPI is (generally) very reliable and predictable. I shall post an answer with a suggestion as to how you might proceed.
In the code that you've written, each worker is guaranteed to labSend to each other worker. However, each worker is not guaranteed to labReceive from each other worker. There are guaranteed to be mismatched send/receives.
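A minimal sketch of one way to remove that mismatch, under a few assumptions not in the thread: agentVec is simply 1:numlabs, tag 2 is used only for this flag, the pool has at least two workers, and (as in the question's own code) a scalar message is small enough that labSend returns without waiting for the receiver. Because every worker sends exactly one tag-2 message to every other worker, each worker can remember whom it has already heard from and, after sending its own flag, call labReceive for the rest. My understanding is that messages still unreceived when the spmd block ends are what trigger the "discarded" warning, so pairing every send with a receive should avoid it. This does not remove the race Edric describes: two workers finishing at almost the same moment can still both commit.

```matlab
spmd
    flagTag  = 2;                          % tag reserved for the flag (assumption)
    others   = setdiff(1:numlabs, labindex);
    received = false(1, numlabs);          % which workers we have already heard from
    updates  = zeros(1, numlabs);          % flag value received from each worker

    % Emulate a task of variable length, as in the question
    pause(randi([1 15]));

    % Non-blocking check: collect only flags that have already arrived
    for src = others
        if labProbe(src, flagTag)
            updates(src)  = labReceive(src, flagTag);
            received(src) = true;
        end
    end

    % Commit only if nobody has announced a commit yet (1 = committed, 0 = skipped)
    flag = double(~any(updates));
    labSend(flag, others, flagTag);

    % Pair every remaining send with a receive so nothing is left
    % unreceived when the spmd block ends. These calls block until the
    % slower workers have sent their flag, so no labBarrier is needed.
    for src = others(~received(others))
        updates(src) = labReceive(src, flagTag);
    end
end
```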


Answers (1)

Edric Ellis on 20 Jul 2022
Using conditional receives in this way is not a robust way to get the workers to collaborate - you have an ordering problem that cannot be solved. I think you can probably achieve your goal by using one of the "reduction" functions which are designed to collect together results from multiple workers. In particular, you could try gcat to allow each worker to find out what happened on every other worker. gcat (effectively) collects values from all workers and concatenates them together on each worker. In this way, you don't need the labBarrier call either. Something a bit like this:
myResult = doSomeWork();
allResults = gcat(myResult);
% Now, choose what to do based on the results from all workers.
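A sketch of how gcat could replace the flag messages in the question's scenario. The "earliest finisher commits" rule, the tic/toc timing and the variable names are assumptions for illustration, not part of the answer above. Each worker measures how long its task took, gcat gives every worker the full vector of those times, and every worker applies the same deterministic rule, so they all agree on who commits without any labSend/labReceive pairing. Times are measured from each worker's own entry into the block, which is only approximately simultaneous.

```matlab
spmd
    t0 = tic;
    pause(randi([1 15]));              % emulated variable-length task
    myFinishTime = toc(t0);            % seconds this worker took

    % Every worker receives the 1-by-numlabs vector of all finish times
    allFinishTimes = gcat(myFinishTime);

    % Same rule on every worker: the fastest one commits.
    % min returns the first index on ties, so exactly one worker commits.
    [~, winner] = min(allFinishTimes);
    flag = double(labindex == winner);
end
```

The trade-off is that gcat is collective: no worker gets past it until every worker has reached it.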
1 Comment
EvanThomas on 20 Jul 2022
Edited: EvanThomas on 20 Jul 2022
Thanks again for the feedback. Unfortunately, I'm not sure that will work for me, as it looks like gcat waits until it hears back from all workers, which makes sense given that it concatenates all their responses and they are running asynchronously. So, if I understand correctly, the function containing gcat won't complete until all workers are done. That takes away the asynchronous behavior I need at the next step.
For example, whenever Agent A is done, it needs data only from the workers that finished up to that point. So I would need a "partial" gcat, or some way to concatenate results from only the subset of workers that finished before Agent A. I'm not sure that is possible, though. Hopefully my description makes sense.
I thought labSend and labReceive would be the only way to accomplish this, but unfortunately that is not working either.
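One possible sketch of such a "partial" collection, assuming each worker sends its result to every other worker as soon as it finishes (tag 3 and the placeholder result are hypothetical, and the pool is assumed to have at least two workers). On finishing, a worker drains only the messages that have already arrived, using the non-blocking labProbe('any', tag), and treats those as the results of workers that finished before it. It then announces its own result and, before the spmd block ends, receives whatever is still outstanding so that no message is discarded. Which workers count as "earlier" depends on message arrival timing, so the outcome is inherently nondeterministic.

```matlab
spmd
    resultTag = 3;                         % hypothetical tag for result messages
    others    = setdiff(1:numlabs, labindex);
    pause(randi([1 15]));                  % emulated variable-length task
    myResult  = labindex * 10;             % placeholder for a real result

    % Collect results from workers whose messages have already arrived
    results = cell(1, numlabs);            % results{k} holds worker k's result
    got     = false(1, numlabs);
    while labProbe('any', resultTag)
        [data, src] = labReceive('any', resultTag);
        results{src} = data;
        got(src)     = true;
    end
    earlierWorkers = find(got);            % workers that finished before this one
    % ... use results(earlierWorkers) here

    % Announce this worker's result to everyone else
    labSend(myResult, others, resultTag);

    % Receive the remaining results so nothing is left unreceived
    for src = others(~got(others))
        results{src} = labReceive(src, resultTag);
    end
end
```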


Categories: MATLAB

Version: R2019b
