How can I improve the parfor performance in my code?

Philip Muscarella on 3 May 2021
Commented: Jan on 5 May 2021
I have a parfor loop with a nested for loop that takes about 8 minutes to run on 16 workers. I also have a version of this code that runs in about the same amount of time on a single processor. If anyone has suggestions for improving performance, that would be great.
parfor jj = 1:ny
    y = Y(jj);
    for ii = 1:nx
        x = X(ii);
        CosTerm = cos(wnkcosDir*x+wnksinDir*y+gamma);
        SinTerm = sin(wnkcosDir*x+wnksinDir*y+gamma);
        eta(ii,jj) = sum(sum((spec_dfdt.*CosTerm),1),2) ;
        u(ii,jj) = grav*sum(sum((wnkcosDir.*spec_dfdt.*CosTerm.*romega),1),2) ;
        v(ii,jj) = grav*sum(sum((wnksinDir.*spec_dfdt.*CosTerm.*romega),1),2) ;
        w(ii,jj) = grav*sum(sum((wnk.*spec_dfdt.*SinTerm.*romega),1),2) ;
        deta_dx(ii,jj) = - sum(sum((spec_dfdt.*wnkcosDir.*SinTerm),1),2) ;
        deta_dy(ii,jj) = - sum(sum((spec_dfdt.*wnksinDir.*SinTerm),1),2) ;
    end
end
Here are all the sizes/types of variables:
Name            Size              Bytes  Class     Attributes
X               3001x3001      72048008  double
Y               3001x3001      72048008  double
deta_dx         3001x3001      72048008  double
deta_dy         3001x3001      72048008  double
eta             3001x3001      72048008  double
gamma            123x93           91512  double
grav               1x1                8  double
romega           123x93           91512  double
spec_dfdt        123x93           91512  double
u               3001x3001      72048008  double
v               3001x3001      72048008  double
w               3001x3001      72048008  double
wnk              123x93           91512  double
wnkcosDir        123x93           91512  double
wnksinDir        123x93           91512  double
I also have been using ticBytes/tocBytes to track the communications to the workers.
         BytesSentToWorkers    BytesReceivedFromWorkers
         __________________    ________________________
1        9.9847e+07            2.7339e+07
2        9.9703e+07            2.7195e+07
3        1.0057e+08            2.8062e+07
4        9.913e+07             2.6621e+07
5        9.9706e+07            2.7197e+07
6        9.913e+07             2.6621e+07
7        9.9565e+07            2.7057e+07
8        9.9565e+07            2.7057e+07
9        9.8986e+07            2.6475e+07
10       9.9274e+07            2.6764e+07
11       9.8983e+07            2.6472e+07
12       9.9127e+07            2.6617e+07
13       9.9559e+07            2.705e+07
14       9.9988e+07            2.7481e+07
15       1.01e+08              2.8492e+07
16       1.0013e+08            2.7624e+07
Total    1.5943e+09            4.3412e+08
Thanks in advance.
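A note on the ticBytes/tocBytes figures: in the loop as posted, X is indexed by the inner loop variable ii, so parfor treats it as a broadcast variable and sends the whole 3001x3001 array (about 72 MB) to every worker. That may account for a large share of the roughly 1e8 bytes sent per worker. The sketch below shows one possible way to shrink the transfer by extracting only the elements the loop actually reads. The names xv, yv and out, the small sizes, and the placeholder computation are assumptions for illustration, not part of the original code.

```matlab
% Hypothetical small stand-ins for the 3001x3001 arrays X and Y
nx = 5;
ny = 4;
X = rand(nx, nx);
Y = rand(ny, ny);

% Extract only the elements the loops read: X(ii) for ii = 1:nx, Y(jj) for jj = 1:ny
xv = X(1:nx);                 % 1-by-nx vector, broadcast instead of the full matrix
yv = Y(1:ny);                 % 1-by-ny vector, sliced by the parfor index

out = zeros(nx, ny);
parfor jj = 1:ny
    y = yv(jj);
    for ii = 1:nx
        out(ii, jj) = xv(ii) + y;   % placeholder for the real computation
    end
end
```

Wrapping the loop in ticBytes(gcp) and tocBytes(gcp) before and after such a change would show whether the transfer volume actually drops in your setup.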
1 comment
Edric Ellis on 4 May 2021
I would check your CPU usage (using top, taskmgr or similar) while running the for-loop version of your code. You might well find that MATLAB's intrinsic multi-threading is already doing a good job of parallelising your code. If this is the case, then parfor can never win because you don't have any more CPUs for it to take advantage of. parfor wins when your for-loop code cannot be multi-threaded by MATLAB, or when you can offload the computations onto more CPUs by using a cluster.
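Besides watching a system monitor, one rough way to probe the multi-threading point from inside MATLAB is to time the same kernel with the default thread count and again with a single computational thread, using maxNumCompThreads. This is only a sketch: the arrays A and B, the repetition count nReps, and the kernel itself are stand-ins chosen to resemble one inner iteration of the loop in the question. Whether the two timings differ will depend on array size, release and hardware.

```matlab
% Hypothetical stand-in arrays with the same 123x93 shape as in the question
A = rand(123, 93);
B = rand(123, 93);
nReps = 200;                         % assumed repetition count, only for timing

nDefault = maxNumCompThreads;        % number of computational threads currently in use

tic
for k = 1:nReps
    s = sum(sum(A .* cos(B * k)));   % similar work to one inner iteration
end
tDefault = toc;

maxNumCompThreads(1);                % restrict MATLAB to one computational thread
tic
for k = 1:nReps
    s = sum(sum(A .* cos(B * k)));
end
tSingle = toc;

maxNumCompThreads(nDefault);         % restore the previous setting

fprintf('threads: %d, default: %.3f s, single thread: %.3f s\n', ...
        nDefault, tDefault, tSingle);
```

If the single-thread time is clearly larger, the serial code is already using several cores, which is the situation described above.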
One final note: in recent releases of MATLAB, you can pass the 'all' flag to sum to perform the summation along all dimensions at once:
sum(magic(4), 'all')
ans = 136
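For releases where the 'all' flag is not available, the same total can be obtained by flattening the array with (:) before summing. A minimal sketch, with the variable names chosen here for illustration:

```matlab
A = magic(4);

totalNested = sum(sum(A, 1), 2);   % nested form, as used in the question
totalColon  = sum(A(:));           % flatten to a column, then sum once

disp([totalNested, totalColon])    % both totals equal 136
```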


Accepted Answer

Jan on 4 May 2021
Edited: Jan on 4 May 2021
Move all repeated calculations out of the loop:
% Loop-invariant products, computed once outside the loops
C1 = wnkcosDir.*spec_dfdt.*romega;
C2 = wnksinDir.*spec_dfdt.*romega;
C3 = wnk.*spec_dfdt.*romega;
C4 = -spec_dfdt.*wnkcosDir;
C5 = -spec_dfdt.*wnksinDir;
parfor jj = 1:ny
    y = Y(jj);
    C6 = wnksinDir*y+gamma;   % y-dependent part of the phase, once per jj
    C7 = wnksinDir*y+gamma;   % identical to C6, kept as posted
    for ii = 1:nx
        x = X(ii);
        CosTerm = cos(wnkcosDir*x + C6);
        SinTerm = sin(wnkcosDir*x + C7);
        eta(ii,jj) = sum(spec_dfdt .* CosTerm, 'all') ;
        u(ii,jj) = grav*sum(C1 .* CosTerm, 'all') ;
        v(ii,jj) = grav*sum(C2 .* CosTerm, 'all') ;
        w(ii,jj) = grav*sum(C3 .* SinTerm, 'all') ;
        deta_dx(ii,jj) = sum(C4 .* SinTerm, 'all');
        deta_dy(ii,jj) = sum(C5 .* SinTerm, 'all');
    end
end
Avoiding repeated calculations is cheaper than distributing them to multiple threads.
Calling SUM once with 'all' dimensions is more efficient than calling it twice.
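Taking the idea of removing repeated work one step further, the sums over the spectral arrays can be written as matrix products by splitting the phase into an x-dependent part P and a y-dependent part Q and applying the angle-addition identities. This is a different technique from the loop above, not something proposed in the thread. The sketch assumes the x and y coordinates are available as 1-D vectors (called xv and yv here, hypothetical names) and uses small random stand-ins for the arrays in the question. It relies on implicit expansion, so it needs R2016b or later. It computes eta and w and compares them against the double loop.

```matlab
% Small hypothetical stand-ins for the arrays in the question (sizes reduced)
nK1 = 5; nK2 = 4; nx = 7; ny = 6;
wnkcosDir = rand(nK1, nK2);
wnksinDir = rand(nK1, nK2);
wnk       = rand(nK1, nK2);
romega    = rand(nK1, nK2);
spec_dfdt = rand(nK1, nK2);
gamma     = 2*pi*rand(nK1, nK2);
grav      = 9.81;
xv = linspace(0, 10, nx);      % assumed 1-D x coordinates
yv = linspace(0, 5, ny);       % assumed 1-D y coordinates

% Flatten the spectral arrays into K-by-1 columns (K = nK1*nK2)
kx = wnkcosDir(:);
ky = wnksinDir(:);
S  = spec_dfdt(:);
C3 = wnk(:) .* S .* romega(:);

P = xv(:) * kx.';              % nx-by-K: x-dependent part of the phase
Q = ky * yv(:).' + gamma(:);   % K-by-ny: y-dependent part plus gamma
cP = cos(P);  sP = sin(P);
cQ = cos(Q);  sQ = sin(Q);

% cos(P+Q) = cos(P)cos(Q) - sin(P)sin(Q); the matrix product sums over K
eta = cP * (S .* cQ) - sP * (S .* sQ);
% sin(P+Q) = sin(P)cos(Q) + cos(P)sin(Q)
w = grav * (sP * (C3 .* cQ) + cP * (C3 .* sQ));

% Reference: the double loop from the question, for comparison
etaRef = zeros(nx, ny);
wRef   = zeros(nx, ny);
for jj = 1:ny
    for ii = 1:nx
        phase = wnkcosDir*xv(ii) + wnksinDir*yv(jj) + gamma;
        etaRef(ii,jj) = sum(sum(spec_dfdt .* cos(phase)));
        wRef(ii,jj)   = grav * sum(sum(wnk .* spec_dfdt .* sin(phase) .* romega));
    end
end
errEta = max(abs(eta(:) - etaRef(:)));
errW   = max(abs(w(:) - wRef(:)));
fprintf('max abs difference: eta %.2e, w %.2e\n', errEta, errW);
```

The remaining fields follow the same pattern with C1, C2, C4 and C5 as weights. At the sizes in the question, each of cP, sP, cQ and sQ would hold 3001 x 11439 doubles (roughly 275 MB apiece), so memory is the trade-off to weigh against the speed of the matrix products.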
2 comments
Philip Muscarella on 4 May 2021
I am on R2018a, so the 'all' option is not available. I will look into updating.
Why do you need C6 and C7 if they are the same?
Jan on 5 May 2021
Oh, C6 and C7 are identical. Good point. I overlooked this. You know, all the small letters look like tiny flies when viewed from a certain distance. So please take my code only as a demonstration of how to move repeated computations out of the loop.
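Putting the two points from these comments together, a possible tidied variant of the demonstration computes the phase only once per inner iteration and avoids the 'all' flag so that it also runs on R2018a. This is a sketch on small random stand-in data. The coordinate vectors xv and yv are assumed names, and only eta and deta_dx are shown; the other fields would follow the same pattern.

```matlab
% Small hypothetical stand-ins for the arrays in the question
nK1 = 5; nK2 = 4; nx = 4; ny = 3;
wnkcosDir = rand(nK1, nK2);
wnksinDir = rand(nK1, nK2);
spec_dfdt = rand(nK1, nK2);
gamma     = 2*pi*rand(nK1, nK2);
xv = linspace(0, 10, nx);            % assumed 1-D x coordinates
yv = linspace(0, 5, ny);             % assumed 1-D y coordinates

C4 = -spec_dfdt .* wnkcosDir;        % loop-invariant weight for deta_dx

eta     = zeros(nx, ny);
deta_dx = zeros(nx, ny);
parfor jj = 1:ny
    C6 = wnksinDir * yv(jj) + gamma; % y-dependent phase, once per jj
    for ii = 1:nx
        phase   = wnkcosDir * xv(ii) + C6;   % full phase, computed once
        CosTerm = cos(phase);
        SinTerm = sin(phase);
        eta(ii, jj)     = sum(sum(spec_dfdt .* CosTerm));  % no 'all' flag needed
        deta_dx(ii, jj) = sum(sum(C4 .* SinTerm));
    end
end
```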
