parfor calculations take longer than for
Mikhail on 25 Oct 2014
Commented: Paul Safier on 16 Jun 2022
I am starting to work with the Parallel Computing Toolbox and have just constructed a simple example to compare for and parfor:
tic
a=rand(4000,4000);
k=size(a,1);
(par)for i=1:k
    for j=1:k
        a(i,j)=2/(2+a(i,j));
    end
end
toc
I ran this with both for and parfor (I wrote "par" in parentheses: the first run has no "par", the second one does). The computation takes several seconds, but the parfor version is two to three times slower. I also opened a pool with matlabpool (4 workers) beforehand.
What is the problem?
Thanks in advance
Accepted Answer
Mohammad Abouali on 25 Oct 2014
Edited: Mohammad Abouali on 25 Oct 2014
It has to do with communication overhead and with the way you address memory, that is, how the variable is sliced.
In general, a loop like this causes too much interprocess communication for the small amount of work done per element, and it also accesses memory in an uncoalesced (non-contiguous) fashion.
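One way to see the communication cost is to count the bytes moved to and from the workers. The following is only a sketch, assuming the Parallel Computing Toolbox and a release that provides ticBytes and tocBytes (R2016b or later, so newer than the matlabpool used in the question). The array size n and the variable names are chosen for illustration.

```matlab
% Sketch: measure data transfer for a parfor loop sliced by column.
pool = gcp;                 % current pool; one is started if needed (default preferences)
n = 1000;                   % assumed size, smaller than the 4000 used in the question
a = rand(n);
expected = 2 ./ (2 + a);    % reference result for checking

ticBytes(pool);             % start counting bytes sent to and received from the workers
tic
parfor j = 1:n
    col = a(:, j);          % a is sliced by column, so each iteration needs one column
    a(:, j) = 2 ./ (2 + col);
end
tLoop = toc;
tocBytes(pool)              % no semicolon: prints the per-worker byte counts

fprintf('parfor loop time: %.3f s\n', tLoop);
```

Because the pool is opened before tic, the pool startup time is not included in the measurement. The whole matrix still has to travel to the workers and back, which is a lot of traffic for one division per element.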
Note that MATLAB, unlike C, stores arrays with the first index varying fastest (column-major order). This means that if A is a double-precision array with 4000x4000 elements, then A(1,1) and A(2,1) are next to each other in memory, while A(1,1) and A(1,2) are separated by 4000*sizeof(double) bytes, i.e., 32,000 bytes. FORTRAN also stores arrays with the first index varying fastest. Perhaps this has something to do with the original implementation of MATLAB, which was written in FORTRAN, and they just wanted to keep it that way (not fact-checked; this is what I have heard).
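The storage order can be checked directly with linear indexing. This is a small sketch using only built-in functions; the variable names are illustrative.

```matlab
% Sketch: column-major storage seen through linear indices.
A = reshape(1:12, 3, 4);        % reshape fills down the columns first
disp(A(:).')                    % elements in linear (memory) order

sz = [4000 4000];
idx21 = sub2ind(sz, 2, 1);      % linear index of A(2,1)
idx12 = sub2ind(sz, 1, 2);      % linear index of A(1,2)
strideBytes = (idx12 - 1) * 8;  % distance of A(1,2) from A(1,1), 8 bytes per double

fprintf('A(2,1) -> %d, A(1,2) -> %d, stride = %d bytes\n', idx21, idx12, strideBytes);
```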
This means that
for i=...
  for j=...
    A(i,j)=...
  end
end
increases cache misses (there is too much traffic between RAM and the CPU), while
    for j=...
      for i=...
        A(i,j)=...
      end
    end
works on memory addresses that are close to each other, which increases cache hits and consequently the performance. To get a feeling for this, compare implementations 1 and 3: implementation 1 took 0.884058 seconds on my system, while implementation 3 took only 0.451835 seconds. The difference grows as the array size increases and on systems with less CPU cache. By the way, the best way to implement this calculation is implementation 5. Let MATLAB handle the looping as much as possible; underneath, the built-in operations are fine-tuned to use as many resources as possible.
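The same comparison can be made with the call to rand kept outside the timed region, so that only the update itself is measured. This is a sketch under assumptions: the size n is smaller than in the thread, the variable names are illustrative, and the actual timings will depend on the machine and MATLAB release.

```matlab
% Sketch: time only the update, for both loop orders and the vectorized form.
n = 2000;                       % assumed size, smaller than the 4000 used in the thread
a0 = rand(n);                   % generated once, outside the timed regions

a = a0;
tic
for i = 1:n                     % rows in the outer loop: inner loop strides across columns
    for j = 1:n
        a(i,j) = 2/(2 + a(i,j));
    end
end
tRow = toc;
rowResult = a;

a = a0;
tic
for j = 1:n                     % columns in the outer loop: inner loop walks contiguous memory
    for i = 1:n
        a(i,j) = 2/(2 + a(i,j));
    end
end
tCol = toc;
colResult = a;

tic
vecResult = 2 ./ (2 + a0);      % vectorized: no explicit loop
tVec = toc;

fprintf('row-outer: %.3f s, column-outer: %.3f s, vectorized: %.3f s\n', tRow, tCol, tVec);
```

All three variants compute the same values; only the memory access pattern and the looping overhead differ.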
Hope this helps. Below are a few different implementations.
disp('1') % for: rows in the outer loop, columns in the inner loop (strided access)
tic
a=rand(4000,4000);
k=size(a,1);
for i=1:k
    for j=1:k
        a(i,j)=2/(2+a(i,j));
    end
end
toc
disp('2') % parfor over rows, inner for over columns
tic
a=rand(4000,4000);
k=size(a,1);
parfor i=1:k
    for j=1:k
        a(i,j)=2/(2+a(i,j));
    end
end
toc
disp('3') % for: columns in the outer loop, rows in the inner loop (contiguous access)
tic
a=rand(4000,4000);
k=size(a,1);
for j=1:k
    for i=1:k
        a(i,j)=2/(2+a(i,j));
    end
end
toc
disp('4') % parfor over columns, inner for over rows
tic
a=rand(4000,4000);
k=size(a,1);
parfor j=1:k
    for i=1:k
        a(i,j)=2/(2+a(i,j));
    end
end
toc
disp('5') % vectorized: no explicit loop
tic
a=rand(4000,4000);
a=2./(2+a);
toc
1
Elapsed time is 0.884058 seconds.
2
Elapsed time is 8.502811 seconds.
3
Elapsed time is 0.451835 seconds.
4
Elapsed time is 3.407372 seconds.
5
Elapsed time is 0.216000 seconds.