Calling parpool with SpmdEnabled = False
9 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Jose Sanchez
le 29 Jan 2020
Réponse apportée : Edric Ellis
le 30 Jan 2020
Hello,
I am having an issue with my University cluster. It is that when one worker crash, for a reason like "communication with the worker is lost probably due to a network problem", then all the remaining workers crash, and it may happen again and again!
Because I am using parfor and it doesn't require communication among the workers like spmd, then I would like all the other workers to finish their job!
I learnt online that you can solve this issue by calling your HPC cluster with the parameter "SpmdEnabled" equals False. What I did but MATLAB ignored my request as you can see in the Warning message at the end.
My question is, how can I solve this issue?
---------------------------------------------------------------------------------------------------------
parpool('HPCServerProfile1', 160, 'SpmdEnabled', false)
Starting parallel pool (parpool) using the 'HPCServerProfile1' profile ...
Warning: Disabling SPMD on parallel pools is not supported on this cluster type. Set 'SpmdEnabled' value to true.
Connected to the parallel pool (number of workers: 160).
0 commentaires
Réponse acceptée
Edric Ellis
le 30 Jan 2020
Unfortunately, only MJS and Local cluster types support SpmdEnabled = false. You might be able to use the "cluster parfor" approach though - see the documentation. Basically, you would transform your main parfor loop like so:
% Important: do *not* create a parallel pool prior to running this!
% In fact, you may wish to call "delete(gcp('nocreate'))" to be sure.
% Get the cluster object
c = parcluster('HPCServerProfile1');
% Create the parforOptions structure
opts = parforOptions(c);
% Run the parfor loop directly on the cluster with no parallel pool
parfor (idx = 1:10000, opts)
% Body of the parfor loop
end
This should be more robust, however it does incur more overhead than the interactive parpool approach.
0 commentaires
Plus de réponses (0)
Voir également
Catégories
En savoir plus sur Parallel Computing Fundamentals dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!