How to avoid broadcast variable while optimizing a cost function in parallel computing?

Question

I'm trying to minimize a heavy cost function (the largest matrix in it is 2500x2500) using PSO with parallel computing. A single (!) iteration takes a couple of days, and I'm not sure why. I'd be very thankful for any help.

I use parallel computing to speed things up, but right now I get the message "The entire array or structure 'CostFunction' is a broadcast variable. This might result in unnecessary communication overhead". These are the problematic lines:

parfor i = 1:nPop
    % Evaluation (position value in the cost function)
    particle(i).Cost = CostFunction(particle(i).Position);
end

Here CostFunction is a function handle I defined earlier in the code, and its input changes on each iteration.
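
A common way to avoid re-sending a function handle (together with any data it captures, such as a 2500x2500 matrix, roughly 50 MB as doubles) on every parfor loop is to wrap it in parallel.pool.Constant, which copies the value to each worker once so that later parfor loops can reuse it. The sketch below assumes Parallel Computing Toolbox is available; M, nVar, the toy CostFunction and the particle setup are hypothetical stand-ins for the real PSO code. Because parallel.pool.Constant evaluates a function-handle argument on each worker, the handle is wrapped in @() so that C.Value holds the handle itself instead of the result of calling it with no inputs.

```matlab
% Hypothetical toy cost function: it captures a 2500x2500 matrix, so the
% handle's workspace is about 50 MB and is costly to broadcast repeatedly.
M = rand(2500);
CostFunction = @(x) sum((M * x(:)).^2);

nPop = 8;        % hypothetical swarm size
nVar = 2500;     % must match size(M, 2) for this toy function
particle = repmat(struct('Position', [], 'Cost', []), nPop, 1);
for i = 1:nPop
    particle(i).Position = rand(1, nVar);
end

% Start a pool only if none is open yet (default cluster profile).
if isempty(gcp('nocreate'))
    parpool;
end

% Create the Constant ONCE, before the main PSO loop. The zero-input
% anonymous function is evaluated on each worker and returns the handle.
C = parallel.pool.Constant(@() CostFunction);

% Inside each PSO iteration, use C instead of broadcasting CostFunction.
parfor i = 1:nPop
    f = C.Value;                               % worker-local copy of the handle
    particle(i).Cost = f(particle(i).Position);
end
```

Creating C inside the iteration loop would transfer the data again on every iteration, which defeats the purpose.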

Using the MATLAB Profiler, I managed to get statistics on my code's running time, which show that most of the running time is spent in that single parfor loop.

Here ICF is my original cost function, and diss and null are its children. As I understand the flame graph, ICF and its children are not children of the parfor loop, so the running time is split between the loop and the cost function separately. I don't know what the time-consuming Java method is, but I do know it's part of the parallel process.

So I'm basically asking two questions:

Is the broadcast variable problem the cause of the long running time?
How can I avoid broadcasting my cost function?

Thanks in advance.

Answer 1

Edric Ellis on 7 Dec 2022

Investigating performance of parfor loops can be a bit tricky. Here are a few pointers:

Do you happen to know whether your function already benefits from MATLAB's intrinsic multithreading? (Check using your system's Task Manager or equivalent.) If so, using only local workers with PCT will not speed things up, because you are already using all of your machine's resources. (Process workers run in single-threaded mode, so each worker might well process things more slowly than your client, but if you have several of them, you can still get an overall speedup.)
You can check the data transfer size using ticBytes and tocBytes. However, 2500x2500 is not particularly large (about 50 MB as a double matrix), and I wouldn't expect it to make things take that long.
You can use mpiprofile to profile the execution time on the workers; the client profile only shows that you're waiting for the workers to complete their work. (This works fine with parfor, despite the name.)
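
The three checks above can be combined into one short diagnostic script. This is a sketch that assumes Parallel Computing Toolbox and a local process pool; the toy CostFunction, positions and nPop are hypothetical placeholders for the real PSO data. maxNumCompThreads reports how many computational threads the client and a worker use, ticBytes/tocBytes measure how much data one parfor loop transfers, and mpiprofile collects worker-side profiling information for the parfor loop.

```matlab
% Hypothetical placeholders for the real PSO data.
M = rand(2500);
CostFunction = @(x) sum((M * x(:)).^2);
nPop = 8;
positions = rand(nPop, 2500);
costs = zeros(nPop, 1);

% Reuse an open pool, or start one with the default profile.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool;
end

% 1) Threads: the client vs. one process worker.
fprintf('Client computational threads: %d\n', maxNumCompThreads);
fut = parfeval(pool, @maxNumCompThreads, 1);
fprintf('Worker computational threads: %d\n', fetchOutputs(fut));

% 2) Data transferred by one parfor loop.
ticBytes(pool);
parfor i = 1:nPop
    costs(i) = CostFunction(positions(i, :));
end
bytes = tocBytes(pool);   % one row per worker: [sent to, received from]
fprintf('Total MB sent to workers: %.1f\n', sum(bytes(:, 1)) / 1e6);

% 3) Profile what the workers actually spend their time on.
mpiprofile on
parfor i = 1:nPop
    costs(i) = CostFunction(positions(i, :));
end
mpiprofile viewer         % opens the parallel profiler with worker data
```

If the megabytes sent per loop are small relative to the loop's run time, data transfer is unlikely to be the bottleneck, and the worker-side profile is the more useful place to look.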
