# How can I improve the parfor performance in my code?

Philip Muscarella on 3 May 2021
Commented: Jan on 5 May 2021
I have a parfor loop with a nested for loop that takes about 8 minutes to run on 16 workers. I also have a serial version of this code that runs in about the same time on a single process. Any suggestions for improving the performance would be great.
```matlab
parfor jj = 1:ny
    y = Y(jj);      % indexed by the parfor variable jj
    for ii = 1:nx
        x = X(ii);  % indexed by the inner-loop variable ii only
        % The same phase is evaluated twice, once for cos and once for sin
        CosTerm = cos(wnkcosDir*x+wnksinDir*y+gamma);
        SinTerm = sin(wnkcosDir*x+wnksinDir*y+gamma);
        eta(ii,jj) = sum(sum((spec_dfdt.*CosTerm),1),2) ;
        u(ii,jj) = grav*sum(sum((wnkcosDir.*spec_dfdt.*CosTerm.*romega),1),2) ;
        v(ii,jj) = grav*sum(sum((wnksinDir.*spec_dfdt.*CosTerm.*romega),1),2) ;
        w(ii,jj) = grav*sum(sum((wnk.*spec_dfdt.*SinTerm.*romega),1),2) ;
        deta_dx(ii,jj) = - sum(sum((spec_dfdt.*wnkcosDir.*SinTerm),1),2) ;
        deta_dy(ii,jj) = - sum(sum((spec_dfdt.*wnksinDir.*SinTerm),1),2) ;
    end
end
```
Here are all the sizes/types of variables:
| Name | Size | Bytes | Class |
|---|---|---|---|
| X | 3001x3001 | 72048008 | double |
| Y | 3001x3001 | 72048008 | double |
| deta_dx | 3001x3001 | 72048008 | double |
| deta_dy | 3001x3001 | 72048008 | double |
| eta | 3001x3001 | 72048008 | double |
| gamma | 123x93 | 91512 | double |
| grav | 1x1 | 8 | double |
| romega | 123x93 | 91512 | double |
| spec_dfdt | 123x93 | 91512 | double |
| u | 3001x3001 | 72048008 | double |
| v | 3001x3001 | 72048008 | double |
| w | 3001x3001 | 72048008 | double |
| wnk | 123x93 | 91512 | double |
| wnkcosDir | 123x93 | 91512 | double |
| wnksinDir | 123x93 | 91512 | double |
I have also been using ticBytes/tocBytes to track the data transferred to and from the workers:
| Worker | BytesSentToWorkers | BytesReceivedFromWorkers |
|---|---|---|
| 1 | 9.9847e+07 | 2.7339e+07 |
| 2 | 9.9703e+07 | 2.7195e+07 |
| 3 | 1.0057e+08 | 2.8062e+07 |
| 4 | 9.913e+07 | 2.6621e+07 |
| 5 | 9.9706e+07 | 2.7197e+07 |
| 6 | 9.913e+07 | 2.6621e+07 |
| 7 | 9.9565e+07 | 2.7057e+07 |
| 8 | 9.9565e+07 | 2.7057e+07 |
| 9 | 9.8986e+07 | 2.6475e+07 |
| 10 | 9.9274e+07 | 2.6764e+07 |
| 11 | 9.8983e+07 | 2.6472e+07 |
| 12 | 9.9127e+07 | 2.6617e+07 |
| 13 | 9.9559e+07 | 2.705e+07 |
| 14 | 9.9988e+07 | 2.7481e+07 |
| 15 | 1.01e+08 | 2.8492e+07 |
| 16 | 1.0013e+08 | 2.7624e+07 |
| Total | 1.5943e+09 | 4.3412e+08 |
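In the loop above, X is indexed by the inner-loop variable ii rather than by the parfor variable jj, so parfor treats it as a broadcast variable and copies the whole 3001x3001 matrix (72,048,008 bytes) to every worker. That accounts for most of the roughly 99 to 101 MB sent to each worker in the tocBytes output. The following is a minimal sketch, assuming that X(ii) and Y(jj) (linear indexing, exactly as in the question) really are the intended coordinates: copy just those values into small vectors before the loop. Small random arrays stand in for the real data, `gam` is a hypothetical stand-in for the question's `gamma` (renamed so it does not shadow MATLAB's gamma function), and only eta is computed; the other five outputs follow the same pattern.

```matlab
% Hypothetical small stand-ins for the question's data.
nx = 40;  ny = 30;
X = rand(50);  Y = rand(50);                 % full grids, as in the question
wnkcosDir = rand(12, 9);  wnksinDir = rand(12, 9);
gam = rand(12, 9);        spec_dfdt = rand(12, 9);   % gam stands in for gamma

% Copy only the coordinates the loops read: X(1:nx) and Y(1:ny) are the
% same elements that X(ii) and Y(jj) pick out by linear indexing.
xv = X(1:nx);    % still broadcast, but nx doubles instead of the whole matrix
yv = Y(1:ny);    % indexed by jj inside parfor, so it is sliced

eta = zeros(nx, ny);
parfor jj = 1:ny
    y = yv(jj);
    for ii = 1:nx
        ph = wnkcosDir*xv(ii) + wnksinDir*y + gam;   % phase computed once
        eta(ii, jj) = sum(spec_dfdt .* cos(ph), 'all');
    end
end
```

Without a parallel pool, parfor runs this loop serially, so the sketch also works as a plain correctness check against the original indexing.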
Edric Ellis on 4 May 2021
I would check your CPU usage (with top, Task Manager or similar) while the for-loop version of your code is running. You might well find that MATLAB's intrinsic multithreading is already doing a good job of parallelising your code. If so, parfor can never win, because there are no spare CPUs for it to use. parfor wins when MATLAB cannot multithread your for-loop code, or when you can offload the computation onto more CPUs by using a cluster.
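Besides watching top or Task Manager, a rough in-MATLAB check is to time one inner-loop kernel with MATLAB's default thread count and again restricted to a single thread via `maxNumCompThreads`. This is a sketch under assumptions: the 123x93 arrays are random stand-ins with the question's sizes, the constants 0.3 and 0.7 are arbitrary stand-ins for x and y, and whether elementwise functions on arrays this small are multithreaded at all depends on the MATLAB release and hardware.

```matlab
% Random stand-ins with the same 123x93 size as the question's spectra.
A = rand(123, 93);  B = rand(123, 93);  G = rand(123, 93);  S = rand(123, 93);
kernel = @() sum(S .* cos(A*0.3 + B*0.7 + G), 'all');   % one inner iteration

nDefault = maxNumCompThreads;        % current number of computational threads
tMulti   = timeit(kernel);           % time with the default thread count

prev     = maxNumCompThreads(1);     % restrict MATLAB to a single thread
tSingle  = timeit(kernel);
maxNumCompThreads(prev);             % restore the previous setting

fprintf('threads: %d, multi: %.3g s, single: %.3g s, ratio: %.2f\n', ...
    nDefault, tMulti, tSingle, tSingle / tMulti);
```

A ratio close to the number of threads suggests the serial loop is already using the cores, as described above; a ratio close to 1 suggests it is not, and the lack of parfor speed-up has some other cause.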
One final note: in recent releases of MATLAB, you can pass the 'all' option to sum to add up along all dimensions at once:
```matlab
sum(magic(4), 'all')
% ans = 136
```
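As a small illustration, the three forms below all add up every element of a 2-D array; the last one also works in releases without the 'all' option. For non-integer data the forms may add the numbers in a different order, so results can differ slightly in the last bits.

```matlab
A = magic(4);
s1 = sum(sum(A, 1), 2);   % two calls, as in the original loop
s2 = sum(A, 'all');       % one call over all dimensions
s3 = sum(A(:));           % one call on a column view of A
disp([s1 s2 s3])          % all three equal 136
```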

Jan on 4 May 2021
Edited: Jan on 4 May 2021
Move all repeated calculations out of the loop:
```matlab
% Loop-invariant products, computed once before the parfor loop
C1 = wnkcosDir.*spec_dfdt.*romega;
C2 = wnksinDir.*spec_dfdt.*romega;
C3 = wnk.*spec_dfdt.*romega;
C4 = -spec_dfdt.*wnkcosDir;
C5 = -spec_dfdt.*wnksinDir;
parfor jj = 1:ny
    y = Y(jj);
    % Terms that depend only on jj, computed once per column
    C6 = wnksinDir*y+gamma;
    C7 = wnksinDir*y+gamma;   % identical to C6
    for ii = 1:nx
        x = X(ii);
        CosTerm = cos(wnkcosDir*x + C6);
        SinTerm = sin(wnkcosDir*x + C7);
        % A single sum over all dimensions replaces sum(sum(...,1),2)
        eta(ii,jj) = sum(spec_dfdt .* CosTerm, 'all') ;
        u(ii,jj) = grav*sum(C1 .* CosTerm, 'all') ;
        v(ii,jj) = grav*sum(C2 .* CosTerm, 'all') ;
        w(ii,jj) = grav*sum(C3 .* SinTerm, 'all') ;
        deta_dx(ii,jj) = sum(C4 .* SinTerm, 'all');
        deta_dy(ii,jj) = sum(C5 .* SinTerm, 'all');
    end
end
```
Avoiding repeated calculations is cheaper than distributing them to multiple threads.
Calling SUM once with the 'all' option is more efficient than calling it twice.
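Taking the idea of removing repeated work one step further, a different technique (not part of the answer above) is to separate the x and y dependence with the angle-addition identities cos(p+q) = cos p cos q - sin p sin q and sin(p+q) = sin p cos q + cos p sin q, where p = wnkcosDir*x and q = wnksinDir*y + gamma. Each weighted sum over the spectral grid then becomes a matrix product, so all nx*ny grid points are evaluated at once. This is a sketch under stated assumptions: small random arrays stand in for the real data, `xv` and `yv` are hypothetical column vectors holding the values the loops read as X(ii) and Y(jj), `gam` stands in for `gamma`, and the code relies on implicit expansion (R2016b or later). The speed-up on the full-size problem has not been measured here.

```matlab
% Hypothetical small stand-ins; for the real data use
% xv = reshape(X(1:nx), [], 1) and yv = reshape(Y(1:ny), [], 1).
nx = 60;  ny = 50;
xv = rand(nx, 1);  yv = rand(ny, 1);
wnkcosDir = rand(12, 9);  wnksinDir = rand(12, 9);  gam = rand(12, 9);
spec_dfdt = rand(12, 9);  romega = rand(12, 9);     wnk = rand(12, 9);
grav = 9.81;

% Flatten the spectral arrays into K-by-1 columns (K = numel(spec_dfdt)).
a = wnkcosDir(:);  b = wnksinDir(:);  g = gam(:);
s = spec_dfdt(:);  r = romega(:);     k = wnk(:);

% Trig tables: 2*(nx + ny)*K evaluations instead of 2*nx*ny*K.
Cx  = cos(xv * a.');             % nx-by-K, cos(a*x)
Sx  = sin(xv * a.');             % nx-by-K, sin(a*x)
CyT = cos(yv * b.' + g.').';     % K-by-ny, cos(b*y + gamma)
SyT = sin(yv * b.' + g.').';     % K-by-ny, sin(b*y + gamma)

% Weighted sums over the spectrum of cos(theta) and sin(theta), every (x, y).
cosSum = @(wt) Cx * (wt .* CyT) - Sx * (wt .* SyT);
sinSum = @(wt) Sx * (wt .* CyT) + Cx * (wt .* SyT);

eta     = cosSum(s);
u       = grav * cosSum(a .* s .* r);
v       = grav * cosSum(b .* s .* r);
w       = grav * sinSum(k .* s .* r);
deta_dx = -sinSum(s .* a);
deta_dy = -sinSum(s .* b);
```

On the full problem each trig table is 3001 x 11439 doubles (about 275 MB), so the four tables plus temporaries need more than a gigabyte; if that is too much, process xv in blocks of rows and fill the matching rows of each output. The matrix products are handled by MATLAB's multithreaded BLAS, so this version does not need parfor.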
##### 2 Comments
Jan on 5 May 2021
Oh, C6 and C7 are identical. Good point, I overlooked this. You know, all the small letters look like tiny flies when seen from a certain distance. So please take my code only as a demonstration of how to move repeated computations out of the loop.
