Parpool slow with chol operation
Francesco C
on 18 Aug 2016
Commented: Francesco C
on 25 Aug 2016
Hi, I found a bottleneck in my code and I can't understand what is happening. I tried this on several computers and MATLAB versions (2014 and 2016), and I found more or less the same pattern.
Suppose you have a large matrix operation, such as a Cholesky decomposition of a large matrix. If I parallelize across several cores (4, 8, or 16; the number doesn't matter), each single operation runs much slower. Example: create these objects:
z=randn(5200); Z=z*z'; zm=randn(1,5200);
Open your parpool and run a Cholesky decomposition in a parfor; it takes, for instance, 3 seconds:
tic; parfor j=1:16; chol(Z,'lower'); end; toc
Now if you do the same in a standard sequential loop, you get the same result in a similar time (or even a bit less)!
tic; for j=1:16; chol(Z,'lower'); end; toc
Why does this happen? What can I do to parallelize this code (my package does a lot more than just a Cholesky decomposition) without penalizing the performance of a single task so much?
Many thanks in advance.
2 Comments
José-Luis
on 18 Aug 2016
Edited: José-Luis
on 18 Aug 2016
I am not sure I follow. How do you think you are parallelizing the decomposition? To me it looks like each parfor iteration is computing the full decomposition: you are just doing the same operation multiple times instead of dividing one operation between multiple threads.
I don't think you can achieve what you want with parfor in this case.
Accepted Answer
Edric Ellis
on 23 Aug 2016
MATLAB's chol implementation is intrinsically multi-threaded. Therefore, chol is already fully utilising all the cores on your machine. If you have only local workers available, then you can't do any better than that. (Also note that Parallel Computing Toolbox workers run in a single-threaded mode by default to avoid oversubscribing the cores on your machine, which is why a single invocation of chol is slower inside parfor.)
The only way to go faster is to use additional hardware - in the form of MATLAB Distributed Computing Server workers on additional machines.
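The multi-threading point can be checked on the client without a pool. The following is only a minimal sketch, not part of the original answer: it times chol with the default number of computational threads and again with a single thread, using maxNumCompThreads and timeit. The matrix size of 2000 (smaller than the 5200 in the question, to keep the run short) and the added n*eye(n) term (to keep Z safely positive definite) are assumptions made for illustration.

```matlab
% Sketch: compare chol timing with the default thread count and with one thread.
n = 2000;                        % assumption: smaller than the 5200 used in the question
z = randn(n);
Z = z*z' + n*eye(n);             % assumption: diagonal shift keeps Z positive definite

nDefault = maxNumCompThreads;    % current number of computational threads on the client

% Time chol with the default (multi-threaded) setting.
tMulti = timeit(@() chol(Z, 'lower'));

% Restrict the client to one thread; the call returns the previous setting.
oldN = maxNumCompThreads(1);
tSingle = timeit(@() chol(Z, 'lower'));
maxNumCompThreads(oldN);         % restore the previous thread count

fprintf('%d thread(s): %.3f s, 1 thread: %.3f s\n', nDefault, tMulti, tSingle);
```

On a multi-core machine the single-threaded timing would be expected to be noticeably larger, which is roughly what each worker experiences inside parfor; the actual ratio depends on the hardware.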
More Answers (1)
Matt J
on 18 Aug 2016
Edited: Matt J
on 18 Aug 2016
The problem is that your Z-matrices need to be cloned and broadcast to each of the workers, which carries considerable overhead given their 5200x5200 size. If you create the Z matrices on the workers themselves, you get a more favorable comparison, e.g.,
>> tic; for j=1:16; z=randn(5200); Z=z*z'; chol(Z,'lower'); end; toc
Elapsed time is 29.711892 seconds.
>> tic; parfor j=1:16; z=randn(5200); Z=z*z'; chol(Z,'lower'); end; toc
Elapsed time is 18.471961 seconds.
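One way to see how much data the broadcast of Z actually moves is to measure it. The following is a hedged sketch, not part of the original answer: it assumes Parallel Computing Toolbox with the ticBytes and tocBytes functions (available in R2016b and later), and it uses a 2000x2000 matrix rather than 5200x5200 purely to keep the run short.

```matlab
% Sketch: measure the data sent to the workers when Z is a broadcast variable.
pool = gcp;                      % get the current pool, starting one if none is open

n = 2000;                        % assumption: smaller than the 5200 used in the question
z = randn(n);
Z = z*z';                        % built on the client, so parfor must send it to each worker

ticBytes(pool);                  % start counting bytes transferred to and from the pool
parfor j = 1:16
    chol(Z, 'lower');            % Z is broadcast to every worker that runs iterations
end
bytes = tocBytes(pool);          % one row per worker: [bytes sent to worker, bytes received]

fprintf('Total sent to workers: %.1f MB\n', sum(bytes(:, 1)) / 1e6);
```

Comparing this figure with the same loop where Z is created inside the parfor body gives a rough idea of how much of the difference is transfer overhead rather than computation.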