loading a sequence of datasets using parfor
I want to run code that:
1) at each iteration loads a .mat file (containing an image), and
2) then converts the array into a gpuArray and applies some function.
Something like this:
total=100;
% CPU version: load each file and compute the matrix power on the CPU
tic
for Iteraciones=1:total
    fname = strcat('roisandparameters',num2str(Iteraciones),'.mat');
    sample_folder='C:\Users\User\Dropbox\oscar\particulas\2021\tutorial_parallele_computing\Prueba Datos';
    A=load(fullfile(sample_folder,fname));
    B=A.roiwide;
    C=B^4;
end
t_forCPU=toc
% GPU version: same loop, but the array is moved to the GPU before the computation
tic
for Iteraciones=1:total
    fname = strcat('roisandparameters',num2str(Iteraciones),'.mat');
    sample_folder='C:\Users\User\Dropbox\oscar\particulas\2021\tutorial_parallele_computing\Prueba Datos';
    A=load(fullfile(sample_folder,fname));
    B=A.roiwide;
    B=gpuArray(B);
    wait(gpuDevice) % block until the GPU has finished its pending work
    C=B^4;
end
t_forGPU=toc
Why is the code slower if I replace for with parfor?
Why does converting to gpuArray take more time? That is, t_forGPU is larger than t_forCPU.
Answers (2)
Walter Roberson
on 9 Apr 2021
To use the GPU you have to transfer the data to it, synchronize with it, wait for it to be ready, and read the results back. All of those steps take time.
To gain speed, the GPU must finish the operation faster than the CPU by a margin large enough to cover those overheads.
If you do not send the GPU a big enough chunk of work, the overhead will outweigh the benefit.
C=B^4;
It is not clear whether B is a scalar or a square matrix. If it is a scalar, you can be certain that the overheads are much higher than any performance gain from using the GPU. If it is a square matrix, whether there is a gain depends on its size.
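One way to see where the break-even point lies is to time the same operation at several matrix sizes. The following is only a rough sketch: the sizes are arbitrary placeholders, it assumes real double matrices, and it relies on canUseGPU (available from R2019b) to skip the GPU part when no supported device is present. Note that gputimeit here measures only the computation on data that is already on the GPU, so the transfer cost is not included.

```matlab
% Sketch: compare CPU and GPU time for B^4 at a few matrix sizes.
sizes  = [10 100 500 1000];      % hypothetical test sizes, adjust to your images
hasGPU = canUseGPU();            % true only if a supported GPU is available
t_cpu  = zeros(size(sizes));
t_gpu  = nan(size(sizes));       % stays NaN when there is no GPU
for k = 1:numel(sizes)
    B = rand(sizes(k));
    t_cpu(k) = timeit(@() B^4);              % CPU matrix power
    if hasGPU
        Bg = gpuArray(B);                    % transfer happens outside the timing
        t_gpu(k) = gputimeit(@() Bg^4);      % GPU matrix power, synchronized
    end
    fprintf('n = %4d: CPU %.3g s, GPU %.3g s\n', sizes(k), t_cpu(k), t_gpu(k));
end
```

For very small matrices timeit may warn that the function is too fast to measure accurately, which is itself a sign that there is not enough work to justify the GPU.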
for Iteraciones=1:total
fname = strcat('roisandparameters',num2str(Iteraciones),'.mat');
sample_folder='C:\Users\User\Dropbox\oscar\particulas\2021\tutorial_parallele_computing\Prueba Datos';
A=load(fullfile(sample_folder,fname));
Remember that you are timing the load() as well as the GPU computation. For a fairer comparison, first load all of the data into a cell array outside the timed region, and then time only the computational loop.
Oscar Martinez
on 12 Apr 2021
1 comment
Walter Roberson
on 12 Apr 2021
There are potential trade-offs about when you gather and when you create the arrays that are worth testing.
Some of the approaches below risk exceeding GPU memory, so the timings include clearing the arrays from the GPU (that is overhead you need to take into account).
total=100;
sample_folder='C:\Users\User\Dropbox\oscar\particulas\2021\tutorial_parallele_computing\Data';
%load everything first so that load() is not part of any timing
GC = cell(total,1);
for Iteraciones = 1:total
    fname = strcat('roisandparameters',num2str(Iteraciones),'.mat');
    A=load(fullfile(sample_folder,fname));
    GC{Iteraciones}=A.roiwide;
end
%how long by CPU?
C = cell(total,1);
tic
for Iteraciones = 1 : total
    C{Iteraciones} = GC{Iteraciones}^4;
end
time_by_cpu = toc;
%common GPU processing: do operations on data and gather() the results
C = cell(total,1);
tic
for Iteraciones = 1 : total
    B = gpuArray(GC{Iteraciones});
    C{Iteraciones} = gather(B^4);
end
clear B
time_by_gpu1 = toc;
%potential GPU processing: postpone the gather()
B = cell(total,1);
tic
for Iteraciones = 1 : total
    B{Iteraciones} = gpuArray(GC{Iteraciones})^4;
end
C = cell(total,1);
for Iteraciones = 1 : total
    C{Iteraciones} = gather(B{Iteraciones});
end
clear B
time_by_gpu2 = toc;
%potential GPU processing: create all the GPU arrays first
B = cell(total,1);
tic
for Iteraciones = 1 : total
    B{Iteraciones} = gpuArray(GC{Iteraciones});
end
C = cell(total,1);
for Iteraciones = 1 : total
    C{Iteraciones} = B{Iteraciones}^4;
end
D = cell(total,1);
for Iteraciones = 1 : total
    D{Iteraciones} = gather(C{Iteraciones});
end
clear B C
time_by_gpu3 = toc;
%potential GPU processing: gather all at the end
B = cell(total,1);
tic
for Iteraciones = 1 : total
    B{Iteraciones} = gpuArray(GC{Iteraciones})^4;
end
C = cell(total,1);
[C{:}] = gather(B{:});
clear B
time_by_gpu4 = toc;
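Before trying the variants that keep every array on the GPU at once, it may be worth estimating whether they fit. The sketch below is only a rough check under stated assumptions: the two random matrices stand in for the loaded images in GC, the data is assumed to be real double (8 bytes per element), and the factor of 2 is a guess that inputs and same-size results are resident at the same time. It reads the AvailableMemory property of the current gpuDevice and skips that step when canUseGPU reports no usable device.

```matlab
% Sketch: estimate whether all arrays would fit on the GPU at once.
GC = {rand(200), rand(300)};                    % stand-in data for illustration only
bytesPerArray = cellfun(@(x) 8*numel(x), GC);   % assumes real double data
bytesNeeded   = 2*sum(bytesPerArray);           % assumes inputs plus same-size results
fprintf('Estimated GPU memory needed: %.2f MB\n', bytesNeeded/2^20);
if canUseGPU()
    g = gpuDevice;                              % currently selected device
    fitsOnGPU = bytesNeeded < g.AvailableMemory;
    fprintf('Available: %.2f MB, fits: %d\n', g.AvailableMemory/2^20, fitsOnGPU);
end
```

If the estimate is close to the available memory, the variant that gathers inside the loop is the safer choice, since it holds only one array on the GPU at a time.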