Distributed array to solve a system of linear equations on a cluster

Melissa Zirps
Melissa Zirps on 2 Dec 2021
Commented: Oli Tissot on 12 Jan 2023
I'm trying to solve a system of linear equations in parallel on a computing cluster using iterative methods and distributed arrays. Right now my code looks like:
cores = 42;
cluster = parcluster;
parpool(cluster,cores);
K_solve_dist = distributed(K_solve);
force_vec_solv_dist = distributed(force_vec_solv);
[res_disp, flag_solv{ii,kk}(n,1)] = cgs(K_solve_dist,force_vec_solv_dist,tol_iter,max_iter);
However, regardless of how many cores I use, the run time seems to stay the same (this runtime is also the same as if I don't use distributed arrays at all). If I run it without the line "parpool(cluster,cores)" it runs almost 50% faster, but only uses 12 cores, even though there are more cores available. I'm trying to figure out if there's a way to use more than 12 cores and speed up the time it takes to perform this calculation.

Answers (2)

Sam Marshalik
Sam Marshalik on 7 Dec 2021
Hey Melissa,
I would not think of distributed arrays as a way to speed up your computation. Distributed arrays are useful when your data does not fit into one machine's memory, so you spread the contents of the matrix across multiple machines. This will not make the code run faster; in fact, as you saw, it will probably run slower, since you are introducing communication overhead into the equation.
It is worth pointing out that using distributed arrays on a single machine will not give you any benefit, since you are still limited to that one computer's hardware. If you have access to MATLAB Parallel Server on your cluster, then using distributed arrays with your computation will be helpful.
In short, you will truly see the benefit of using distributed arrays when you are working with very large data that can't fit on one machine. If you want to try to speed things up, you will want to take a look at things such as parfor, parfeval, gpuArray, and such parallel constructs.
  2 comments
Melissa Zirps
Melissa Zirps on 7 Dec 2021
Hi Sam,
Thanks for your response. However, as far as I can see, parfor, parfeval, and gpuArray can't be used to solve a system of linear equations? Is there a way to use parallelization while solving a system of linear equations?
Joss Knight
Joss Knight on 10 Dec 2021
gpuArray supports all the iterative solvers including cgs. However, it is mainly optimized for sparse matrices. If your matrix is dense you'll be better off using a direct solver (i.e. mldivide). This is of course also supported by gpuArray.
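As a minimal sketch of the gpuArray route (reusing the variable names from the question, and assuming K_solve is a sparse matrix), moving the inputs onto the GPU is enough for cgs to run there:

```matlab
% Move the (sparse) system to the GPU; cgs then executes on the device.
K_gpu = gpuArray(K_solve);
f_gpu = gpuArray(force_vec_solv);
[res_disp_gpu, flag_solv] = cgs(K_gpu, f_gpu, tol_iter, max_iter);
res_disp = gather(res_disp_gpu);   % bring the solution back to host memory
```

For a dense K_solve, the same pattern works with mldivide: `res_disp = gather(K_gpu \ f_gpu);`.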



Eric Machorro
Eric Machorro on 11 Jan 2023
Piggy-backing on this question:
Setting aside the speed-up factor momentarily: how can I use cgs (or almost any Krylov-type solver, for that matter) with very big/long vectors that I can't hold entirely in memory? There are four variants of the problem:
  1. I have a sparse matrix
  2. I have a symmetric, sparse matrix (think Cholesky factor)
  3. I have a function handle that serves as the matrix-operator
  4. (revisiting the speed-up issue) I'd like to use this in conjunction with non-GPU parallelization as well. Is this possible?
Does anyone have advice on any one of these?
Respectfully,
  1 comment
Oli Tissot
Oli Tissot on 12 Jan 2023
All the Krylov methods are supported for distributed arrays, so 1. and 3. work just as in base MATLAB; you can also supply your own preconditioner through a function handle. However, 2. is not supported as such, because MATLAB has no built-in notion of a "symmetric matrix": to MATLAB, a Cholesky factor is not a symmetric matrix but a triangular one, so there is an ambiguity between symmetric and triangular, and MATLAB treats those matrices as triangular. Of course, 3. is more generic than 2., so you can achieve 2. via 3. by implementing the operator yourself, if that makes sense.
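As an illustration of achieving 2. via 3., one can wrap the Cholesky factor in a function handle that applies the full operator; the variable names here are hypothetical, with R the upper-triangular factor so that A = R'*R:

```matlab
% Apply A*x through the Cholesky factor R without ever forming A = R'*R.
% Each call costs two sparse triangular products instead of one dense one.
afun = @(x) R' * (R * x);
[x, flag] = cgs(afun, b, tol, maxit);
```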
Regarding 4., distributed arrays are multi-threaded: they'll use NumThreads as per your cluster profile configuration. Note the default is set to 1, so no multi-threading.
To use the distributed version, you simply need to call the Krylov method you'd like to use but with A being a distributed array.
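For instance, a minimal sketch (variable names assumed), run on a pool backed by MATLAB Parallel Server:

```matlab
% Spread the system across the pool's workers; the cgs call is unchanged.
A_dist = distributed(A);
b_dist = distributed(b);
[x_dist, flag] = cgs(A_dist, b_dist, tol, maxit);
x = gather(x_dist);   % collect the solution back on the client
```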
Finally, regarding speed-up and performance: the most expensive operations are usually the matrix-vector product and the preconditioner. If you know a clever way to apply your operator, you should use it; likewise if you know a clever problem-specific preconditioner. There is usually a non-trivial balance to find between an extremely good preconditioner that is very costly to apply (the extreme case here is \) and a poor preconditioner that leads to very slow convergence, or even no convergence to the prescribed accuracy (the extreme case here is no preconditioner at all).
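As a sketch of passing a problem-specific preconditioner as a function handle, here with a simple Jacobi (diagonal) preconditioner purely for illustration:

```matlab
% Jacobi preconditioner: mfun must return M \ x, here with M = diag(A).
d = diag(A);
mfun = @(x) x ./ d;
[x, flag, relres, iter] = cgs(A, b, tol, maxit, mfun);
```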


Version

R2020a
