Hi Community members,
I am trying to run a very simple parfor loop and a for loop to compare results. However, to my surprise, parfor is almost 100 times slower than the for loop. Can anyone please explain this? I intend to run code with almost 10000000000 iterations and need to decide how to make it as fast as possible. Your suggestions will be very helpful in this regard.
arr=[];
tic
parfor x=1:50
arr(x) = x;
end
toc
Elapsed time is 0.385357 seconds.
arr=[];
tic
for x=1:50
arr(x) = x;
end
toc
Elapsed time is 0.004795 seconds.

Answers (2)

Raymond Norris on 20 Sep 2021

0 votes

There are several considerations:
  • How many dedicated workers are there? Most often, the more workers, the less time the loop takes.
  • How long does the unit of work take to run? Does it take more time to send the code than to run the code?
Your example is trivial. If your code really takes 0.005s to run all your sims, then parfor is not needed. Conversely, here's a better trivial example:
tic
parfor idx = 1:50
pause(2)
end
toc
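One thing that can distort a timing like this: if no parallel pool is running, the first parfor typically starts one, and that startup time lands inside tic/toc. Below is a minimal sketch, assuming Parallel Computing Toolbox is installed and a default pool can be started. It opens the pool first and then times the same work serially and in parallel. The values of nIter and workTime are arbitrary stand-ins, not numbers from the question.

```matlab
% Start (or reuse) the pool before timing, so pool startup is not measured.
% Assumes default parallel preferences, where gcp creates a pool if none exists.
pool = gcp;
nIter = 20;                 % arbitrary number of iterations
workTime = 0.1;             % arbitrary stand-in for one unit of real work (seconds)

% Serial reference run
tic
for idx = 1:nIter
    pause(workTime)
end
tSerial = toc;

% Same work distributed over the pool workers
tic
parfor idx = 1:nIter
    pause(workTime)
end
tParallel = toc;

fprintf('for: %.2f s, parfor: %.2f s on %d workers\n', ...
    tSerial, tParallel, pool.NumWorkers);
```

With work this coarse, the parfor time should approach the serial time divided by the number of workers; as the unit of work shrinks toward the cost of dispatching it, that advantage disappears.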

5 comments

Qammar Abbas on 20 Sep 2021
Hi Raymond,
Thank you for your reply.
  • I am using 2 workers
  • What do you mean by 'send the code?'
By sending the code I mean the action of the MATLAB client sending the instructions (i.e., the code) that the two workers need to run.
Looking at your file writing example:
  • The two loops are not identical
parfor x=1:5
fprintf( w.Value, '%s\n', strcat(a{x},b{x}));
end
for x=1:5
fprintf( fid, '%s\n', strcat(a{x},b{x}));
end
The parfor will write the concatenation in any order (x might come out as 2 3 1 5 4). Does order matter?
  • You don't need a loop to write the output
tic
AB = strcat(a,b);
fprintf( fid, '%s\n', AB{:});
toc
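If a and b are large, the full concatenated array AB may not fit in memory. The same vectorized call could then be applied block by block. This is only a sketch: blockSize and the file name are hypothetical choices, and the small cell arrays stand in for the real data.

```matlab
% Illustrative data; in practice a and b would be the large cell arrays
a = {'1rh8iu' 'kddhr7865' 'x' '74' '62'};
b = {'14swf' 'hs' '92' '0o' '3jh43'};

blockSize = 2;                      % hypothetical; tune to available memory
outFile = 'blocks.txt';             % hypothetical file name
fid = fopen(outFile, 'wt');
assert(fid ~= -1, 'Could not open %s for writing', outFile);
closer = onCleanup(@() fclose(fid));    % close the file even if an error occurs

n = numel(a);
for first = 1:blockSize:n
    last = min(first + blockSize - 1, n);
    % Only one block of concatenated strings exists in memory at a time
    AB = strcat(a(first:last), b(first:last));
    fprintf(fid, '%s\n', AB{:});
end
clear closer                        % destroys the onCleanup object, which calls fclose
```

This keeps most of the benefit of the vectorized fprintf (one call per block rather than one per string) while bounding the size of the temporary array.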
Qammar Abbas on 21 Sep 2021
Hi Raymond,
Order does not matter in my application. I am going to try your code and will get back with results. Thank you.
Qammar Abbas on 21 Sep 2021
Your idea is very good, but I cannot use it because my cell arrays a and b are very large. strcat will generate another large array that has to be stored (in AB in your case), and MATLAB runs out of memory. Therefore, I have opted to use a loop that generates each combination explicitly and writes it directly to a file without storing it in any variable, just to avoid the out-of-memory error. Any other suggestions will be appreciated.
Qammar Abbas on 21 Sep 2021
For more details, please see the Actual problem I posted.


Jan on 20 Sep 2021

0 votes

The main work in your example is the iterative growing of the array. This is a waste of time in both sequential and parallel code. Pre-allocate the output properly.
Starting parallel threads takes some time. For such trivial code, the overhead is expected to be higher than the payload. Compare this with instructing 8 people to say the numbers 1 to 50. It is much faster to do this on your own.
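A minimal sketch of pre-allocation for the loop in the question, with n as an illustrative size:

```matlab
n = 50;

% Serial: allocate the full array once, then fill it in place
arr = zeros(1, n);
for x = 1:n
    arr(x) = x;
end

% Parallel: arr2 is a sliced output variable; allocating it up front
% fixes its size and type before the loop runs
arr2 = zeros(1, n);
parfor x = 1:n
    arr2(x) = x;
end

% For this particular computation, no loop is needed at all
arr3 = 1:n;
```

All three produce the same vector. For work this small, the vectorized form is the one to prefer.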

3 comments

Hi Jan,
Thank you for your reply. My actual aim is to write to a text file inside a parfor loop, something like the code below. In my case, I have to run almost 1 billion iterations. To speed up the code, I wanted to use parfor, but an experiment on a smaller dataset produced rather unexpected timing results. Can you help explain this?
clc
clear all
close all
a = {'1rh8iu' 'kddhr7865' 'x' '74' '62'};
b = {'14swf' 'hs' '92' '0o' '3jh43'};
% One output file per worker, opened once and closed when w is cleared
fcn = @() fopen( sprintf( 'parfor_%d.txt', labindex ), 'wt' );
w = WorkerObjWrapper( fcn, {}, @fclose );
tic
parfor x=1:5
    fprintf( w.Value, '%s\n', strcat(a{x},b{x}));
end
toc
% Open for writing; fopen without a permission argument opens read-only
fid = fopen('for.txt', 'wt');
tic
for x=1:5
    fprintf( fid, '%s\n', strcat(a{x},b{x}));
end
toc
fclose(fid);
clear w;
Walter Roberson on 20 Sep 2021
a{x} and b{x} have to be transferred to the worker. That requires communication time while the worker sends back its iteration number, the controller gathers the data and sends it over, and the worker puts it into the appropriate internal variables. Then you do the strcat() on the worker, which is not much work.
Then you do the fprintf(), which is not much work. You did not use 'W' permissions, so the file is flushed after each fprintf(), which requires talking to the operating system, which has to talk to the file system, which has to process the flush().
If you were writing much more data, then you would run into the problem that the files are all on the same drive, and drive writing is usually most efficient when two (sometimes four) processes per controller are writing. Not per drive, but rather per controller. The reason is that the memory bandwidth is per controller, so you can use up all of the memory bandwidth with just one drive. Two processes allow there to be I/O requests in the queue immediately after one I/O has finished, and these days a lot of drives are able to re-order I/O requests according to rotational latency requirements.
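To illustrate the flushing point: MATLAB's fopen accepts an uppercase 'W' (and 'A') permission, which opens the file for writing without automatic flushing. Here is a rough sketch comparing the two modes; the file names and line count are arbitrary, and actual timings depend on the operating system and drive.

```matlab
nLines = 1e4;                       % arbitrary size for illustration
modes = {'w', 'W'};                 % 'W' = write without automatic flushing
elapsed = zeros(1, numel(modes));

for k = 1:numel(modes)
    fname = sprintf('flush_test_%d.txt', k);   % hypothetical file names
    fid = fopen(fname, modes{k});
    assert(fid ~= -1, 'Could not open %s', fname);
    tic
    for x = 1:nLines
        fprintf(fid, '%d\n', x);    % one small write per iteration
    end
    fclose(fid);                    % closing writes out any buffered data
    elapsed(k) = toc;
    fprintf('mode ''%s'': %.3f s\n', modes{k}, elapsed(k));
end
```

The fclose call is inside the timed region so that the buffered mode is charged for writing its data out as well.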
Steven Lord on 20 Sep 2021
There's more information on the poster's actual use case in this other Answers post.


Version

R2020a
