Scaling a parfor to more than one node on a cluster

PatrizioGraziosi on 26 Dec 2021
Hello everybody,
My goal is to run an "outer" function that calls a specific function several times in a parfor loop:
function my_outer_function
    parpool('local', numworkers);
    ...
    parfor id = 1:n
        [temp1, temp2, temp3] = myfunction(id);
        out1(id).x = temp1.x;
        out1(id).y = temp1.y;
        out2(id).x = temp2.x;
        ...
    end
    ...
end
where out1, out2, ... and temp1, temp2, ... are structs.
When the outer function runs on a single node of an HPC cluster, it opens a parpool in 'local' mode at the beginning and keeps it open for the whole run, and everything works fine.
But I cannot figure out how to make it work on more than one node, i.e. by opening a parpool that spans 2 or more nodes. Is it possible? Do you have any suggestions, please?
Thanks
Patrizio
To clarify the nature of the problem: I am running independent calculations of a physical quantity, one per 'id', then I collect the data and integrate them. Each 'id' calculation can run independently on a worker.

Accepted Answer

Raymond Norris on 27 Dec 2021
Ciao Patrizio,
We've worked with Cineca directly in the past. To begin with, yes, we need to resolve the MATLAB Parallel Server license. Secondly, there are several implementations of MATLAB Parallel Server at Cineca. The one you're pointing to is slightly out of date (I can tell by the documentation). I looked online for the other, but can't find it at the moment.
If you can wait until after the new year, we can schedule a time to get you set up. Please reach me offline.
Thanks,
Raymond

More Answers (1)

Raymond Norris on 27 Dec 2021
Parallel Computing Toolbox provides a "local" profile for running multi-core jobs on the machine running the MATLAB client. If you want processes to run across multiple nodes, you'll need to use MATLAB Parallel Server along with a scheduler to submit jobs. If you don't have a scheduler, MATLAB Parallel Server provides one (MJS). In doing so, you'll create a new profile, as described in the documentation, which instructs MATLAB how to communicate with the scheduler. Technical Support can walk you through the process of creating a new profile and submitting jobs to the scheduler.
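As a minimal sketch of what changes in the code once such a profile exists: the pool is opened from the cluster profile instead of from 'local'. The profile name `myClusterProfile` and the worker count are placeholders, not values from this thread's cluster, and the snippet assumes MATLAB Parallel Server is installed and licensed on the cluster.

```matlab
% Assumption: 'myClusterProfile' is a hypothetical profile name; replace it
% with the profile created for your scheduler.
c = parcluster('myClusterProfile');   % cluster object built from the saved profile

% Assumption: illustrative size only; it must not exceed c.NumWorkers.
numworkers = 90;
pool = parpool(c, numworkers);        % pool whose workers may sit on several nodes

% ... the parfor loop runs here unchanged ...

delete(pool);                         % release the workers when finished
```

The parfor body itself does not need to know where the workers are; only the profile passed to `parpool` changes.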
1 Comment
PatrizioGraziosi on 27 Dec 2021
Thank you Raymond!
First, I am happy to know that I can open a multi-node parpool.
I should actually have a SLURM scheduler, and the configuration instructions are detailed here
but I cannot figure out how to open a pool spanning two or more nodes. My code seems to "freeze" when launching the parpool, and this morning I found the attached message in "debug".
It looks to me like I am facing a license problem, but from the link above it should be possible to open a pool on more than one node, as far as I understand. So I wonder whether I'm doing something wrong or whether I must set up an MJS properly...
I think that my case is "Example 1" in the link above; however, if I launch
j = c.batch(@parallel_example, 1, 'Pool', 44)
or
j = c.batch(@parallel_example, 2, 'Pool', 89)
I get the attached error messages. (There are 48 cores per node in the cluster; I set 45 per node in the parcluster c: c = parcluster('myaccount'); c.NumWorkers = 90; c.AdditionalProperties.NumberOfNodes = 2; c.AdditionalProperties.NumWorkers = 90; c.AdditionalProperties.ProcsPerNode = 45; c.saveProfile)
I would be happy to receive any suggestion or indication, or a confirmation that I must contact Technical Support.
Patrizio
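A note on the pool sizes used above: a `batch` job submitted with `'Pool', N` occupies N + 1 workers, N for the pool plus one that runs the submitted function. The arithmetic can be sketched with a small helper; `batchPoolSize` is a hypothetical name introduced here for illustration and is not part of any toolbox.

```matlab
function poolSize = batchPoolSize(numNodes, procsPerNode)
%BATCHPOOLSIZE Largest 'Pool' value that fits on the requested nodes.
%   Hypothetical helper for illustration only. batch(..., 'Pool', N)
%   occupies N + 1 workers: N pool workers plus one worker that runs
%   the submitted function itself.
    validateattributes(numNodes, {'numeric'}, {'scalar', 'integer', 'positive'});
    validateattributes(procsPerNode, {'numeric'}, {'scalar', 'integer', 'positive'});
    totalWorkers = numNodes * procsPerNode;  % every worker the job may occupy
    poolSize = totalWorkers - 1;             % keep one worker for the batch function
end
```

With 45 workers per node, `batchPoolSize(1, 45)` gives 44 and `batchPoolSize(2, 45)` gives 89, which matches the two `c.batch` calls above. The requested sizes are therefore consistent with `c.NumWorkers = 90`, so the sizing alone would not be expected to cause the failure.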

Version

R2020b
