Parfor Freezing during computation

Samuel Léveillé on 24 Jun 2019
Commented: John Meluso on 17 Mar 2020
Hi,
I'm running an optimization in which the function being optimized uses a parfor loop to speed up its calculation.
The function looks something like this:
% DATA is a 10-by-10 array indexed as DATA(x, y) (shown only for the data format)
parfor x = 1:10
    for y = 1:10
        dosomething(DATA(x, y)); % uses quad and fzero, but I don't think that matters
    end
end
The problem is this: the whole program takes 3-4 days to compute, and while the code runs (on a 16-core Xeon server), it sometimes stalls and stops iterating. This can happen after 15 minutes or after 40+ hours. CPU usage drops to zero, but there is no error message. (For reference, I have managed to run the entire program a couple of times without any issues, but I need it to be extremely reliable.) I also see a couple of workers popping in and out of the process list, but all of them are at 0.1% load. At first I thought it was a problem with the optimization routine, but I accidentally discovered that when I kill some workers from the command prompt, an error message pops up saying a worker was aborted, and then the program resumes iterating! However, it continues only on the remaining workers that I didn't kill. I found this by trial and error and have not managed to identify the cause.
Any advice? I tried feeding the dataset in through a parallel.pool.Constant and reading only the values needed in each parfor iteration. Referring to the example above:
C = parallel.pool.Constant(DATA);
parfor x = 1:10
    data = C.Value(x, :); % copy only the row needed for this iteration
    for y = 1:10
        dosomething(data(y)); % uses quad and fzero, but I don't think that matters
    end
end
But this approach produced the same stalls, and this time even sooner than usual (though that might be random).
As I said, the problem seems totally random and sometimes does not happen at all in a given test run. I tried both simulated (random) data and a deterministic time series, and the issue occurred with both. Each time it happened, I stopped the program and restarted it, and it did not stall at the same place as the previous run.
PS: It also happened on my personal laptop (an old 2-core machine), so I'm pretty sure the problem isn't the server I use. In the MATLAB window, the code stalls and the Run button stays on Pause, but there is no CPU load and no error message.
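One way to at least detect such a stall, rather than wait on it indefinitely, might be to submit each row with parfeval and collect results with a timeout. The following is only a minimal sketch under stated assumptions, not a confirmed fix for the hang: processRow is a hypothetical stand-in for the inner loop over y, DATA is placeholder data, and timeoutSec is an assumed upper bound on how long one row should take.

```matlab
% Sketch only: processRow, DATA and timeoutSec are illustrative assumptions.
DATA = rand(10, 10);      % placeholder for the real 10-by-10 dataset
timeoutSec = 600;         % assumed upper bound for one row; tune to the workload
pool = gcp;               % start or reuse the current parallel pool
nRows = size(DATA, 1);
results = cell(nRows, 1);

% Submit one asynchronous task per row instead of one parfor iteration.
for x = 1:nRows
    F(x) = parfeval(pool, @processRow, 1, DATA(x, :));
end

% Collect results as they complete; fetchNext returns an empty index
% when no further task finishes within the timeout.
for k = 1:nRows
    [idx, value] = fetchNext(F, timeoutSec);
    if isempty(idx)
        unfinished = find(arrayfun(@(f) ~strcmp(f.State, 'finished'), F));
        fprintf('No result within %d s; unfinished rows: %s\n', ...
            timeoutSec, mat2str(unfinished));
        cancel(F);        % cancels queued and running tasks; finished ones are unaffected
        break
    end
    results{idx} = value;
end

function out = processRow(row)
    % Hypothetical stand-in for the loop over y that calls dosomething.
    out = zeros(size(row));
    for y = 1:numel(row)
        out(y) = row(y)^2; % placeholder computation
    end
end
```

This does not explain why the workers go idle, but it records which rows were outstanding when the stall occurred, and the cancelled rows could then be resubmitted.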
Thanks
  1 comment
John Meluso on 17 Mar 2020
Hi Samuel, I'm curious if you ever found a reliable solution to this problem? I'm running into the same issue running a simulation on a computing cluster and -- despite the plethora of people who seem to have the same problem -- I haven't seen anyone else offer a solution. Thanks!

Answers (0)
