GPU and CPU Parallelization and Bicg Optimization

Question

0 votes

I use a MATLAB script to solve a large sparse linear system with the bicg function. In simplified form, my code looks like this:

for i=1:n
    ...
    [Pvect] = bicg(AS, BS, tol, maxit, L, U); % AS, BS, L, and U are different in each iteration
    % AS is a 10^6x10^6 sparse complex double
    % BS is a 10^6x1 sparse complex double
    % L and U are 10^6x10^6 sparse complex doubles
    ...
end

Every loop iteration is independent. I recently parallelized this script using parfor. The computer I use has 128 CPU cores, but I noticed that with a pool of more than 32 workers (i.e., parpool with anything above 32) the run time no longer decreases significantly. However, I usually use n=32 (i.e., I run the loop for 32 different scenarios), so this is not a big issue for me. The code currently looks something like this:

parpool(32)
parfor i=1:n
    ...
    [Pvect] = bicg(AS, BS, tol, maxit, L, U); % AS, BS, L, and U are different in each iteration
    % AS is a 10^6x10^6 sparse complex double
    % BS is a 10^6x1 sparse complex double
    % L and U are 10^6x10^6 sparse complex doubles
    ...
end

I want to further speed up the code using gpuArray (which bicg supports). The main reason is that I also use another script in which I run the bicg function sequentially many times. In that case n is 1, but the many repeated solves make it computationally expensive. If possible, I also want to use gpuArrays for cases where n is 32 or more (i.e., the code described above).

I checked the documentation and other user questions, but I am a little lost on how to use CPU and GPU power concurrently. The computer I use has 3 GPUs available.

- Should I try to use only the GPUs for both the parfor loop and solution of bicg?

- Or should I run the parfor loop on the CPU and use all the GPUs for the bicg solves? If so, how can I do this? As far as I understand, GPU resources will be distributed among the workers in this case.

- Or what would be your suggestion on doing this properly? Thank you very much for any kind of guidance in advance!

The computer that I use is the following (I can also try to use 2 of these computers/nodes in the future. Do you think that would help with any of the scenarios described above?):

GPU: 3x NVIDIA A100 PCIE 40GB (1 per socket)
  gpu0: socket 0
  gpu1: socket 1
  gpu2: socket 1
GPU Memory: 40 GB HBM2
CPU: 2x AMD EPYC 7763 64-Core Processor ("Milan")
Total cores per node: 128 cores on two sockets (64 cores / socket)
Hardware threads per core: 1 per core
Hardware threads per node: 128 x 1 = 128
Clock rate: 2.45 GHz
RAM: 256 GB
Cache: 32 KB L1 data cache per core
  512 KB L2 per core
  32 MB L3 per core complex (1 core complex contains 8 cores)
  256 MB L3 total (8 core complexes)
  Each socket can cache up to 288 MB (sum of L2 and L3 capacity)
Local storage: 144 GB /tmp partition on a 288 GB SSD.

Answer 1

Alvaro on 26 Jan 2023

1 vote

You cannot run a parfor loop on a GPU, but each worker can access a GPU to perform computations.

https://www.mathworks.com/matlabcentral/answers/36235-parfor-on-gpu#answer_45381

The documentation shows how to assign GPUs to workers, but whether each worker needs its own GPU is not straightforward.

https://www.mathworks.com/help/parallel-computing/run-matlab-functions-on-multiple-gpus.html#MultiGPUExample-2

https://www.mathworks.com/matlabcentral/answers/120592-can-i-use-gpu-computing-inside-of-a-parfor-loop#answer_127456

https://www.mathworks.com/matlabcentral/answers/1779625-run-a-function-using-gpu-inside-a-parfor-loop-or-a-for-loop
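As a rough illustration of the worker-per-GPU idea from the links above, here is a minimal sketch. It assumes Parallel Computing Toolbox is installed and relies on the default behavior described in the first documentation link, where each worker in a pool is given its own GPU when enough are available. The small tridiagonal systems are hypothetical stand-ins for the asker's AS and BS, no preconditioners are used, and the names nScenarios, results, and flags are made up for this example.

```matlab
% Sketch: one pool worker per GPU, each worker solving its own scenario.
n          = 500;      % stand-in size (the real systems are 10^6 x 10^6)
nScenarios = 6;        % stand-in for the 32 scenarios in the question
tol        = 1e-8;
maxit      = 200;

results = cell(nScenarios, 1);
flags   = zeros(nScenarios, 1);

% Open a pool with as many workers as GPUs, unless a pool already exists.
% An existing pool is left as it is.
if canUseGPU && isempty(gcp("nocreate"))
    parpool(gpuDeviceCount);
end

parfor i = 1:nScenarios
    % Hypothetical per-scenario assembly; replace with the real AS and BS.
    e  = ones(n, 1);
    AS = spdiags([-e (3 + i)*e -e], -1:1, n, n);
    BS = AS * e;                 % exact solution is a vector of ones

    if canUseGPU
        AS = gpuArray(AS);       % sparse gpuArray on this worker's GPU
        BS = gpuArray(BS);
    end

    [x, f]     = bicg(AS, BS, tol, maxit);
    results{i} = gather(x);      % gather is a no-op for CPU arrays
    flags(i)   = f;
end
```

With 3 GPUs this limits the pool to 3 workers. Whether that beats 32 CPU workers depends on the problem, so it is worth timing both setups on a representative case.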

To have bicg use GPU resources, simply pass the arguments as gpuArray objects.

https://www.mathworks.com/help/parallel-computing/run-matlab-functions-on-a-gpu.html
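A minimal sketch of what passing gpuArray inputs to bicg could look like for a single solve is shown here. The small tridiagonal system is a hypothetical stand-in for the asker's AS and BS, and the preconditioners L and U are left out on purpose, since the supported preconditioner forms for gpuArray inputs should be checked against the bicg documentation. The code falls back to the CPU when no usable GPU is found.

```matlab
% Sketch: single bicg solve on the GPU with a CPU fallback.
n  = 1000;                                % stand-in size
e  = ones(n, 1);
AS = spdiags([-e 4*e -e], -1:1, n, n);    % sparse tridiagonal test matrix
BS = AS * e;                              % exact solution is a vector of ones

tol   = 1e-8;
maxit = 200;

if canUseGPU
    AS_gpu = gpuArray(AS);                % transfer the sparse matrix once
    BS_gpu = gpuArray(BS);
    [Pvect_gpu, flag] = bicg(AS_gpu, BS_gpu, tol, maxit);
    Pvect = gather(Pvect_gpu);            % bring the solution back to the CPU
else
    [Pvect, flag] = bicg(AS, BS, tol, maxit);
end
```

For the sequential case in the question (many solves with n = 1), keeping intermediate data on the GPU between solves and calling gather only when a result is needed on the CPU should reduce transfer overhead.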

Note that there are some limitations.

https://www.mathworks.com/help/matlab/ref/bicg.html#refsect-extended-capabilities

1 comment

Zulkuf Azizoglu on 26 Jun 2023

Thank you!
