Handling memory when working with very large data (.mat) files

Luqman Saleem
Luqman Saleem on 29 Aug 2024
Commented: Luqman Saleem on 31 Aug 2024
I am working with two 5D arrays (A5D and B5D) saved in a file named big_mat_file.mat. Their sizes are given in the code below. The file is around 20 GB in total. I want to perform three simple operations on these arrays, as shown in the code. I have access to my university's computing cluster. When I run the following code with 120 workers and 400 GB of memory, I receive the following error:
In distcomp/remoteparfor/handleIntervalErrorResult (line 245)
In distcomp/remoteparfor/getCompleteIntervals (line 395)
In parallel_function>distributed_execution (line 746)
In parallel_function (line 578)
Can someone please help me understand what is causing this error? Is it because of low memory? Is there any other way to perform the following operations?
clear; clc;
load("big_mat_file.mat");
% it has two very huge 5D arrays "A5D" and "B5D", and two small arrays "as" and "bs"
% size of both A5D and B5D is [41 16 8 80 82]
% size of "as" is [1 80] and size of "bs" is [1 82]
xs = -12:0.1:12;
NX = length(xs);
ys = 0:0.4:12;
NY = length(ys);
total_iterations = NX * NY;
results = zeros(total_iterations , 41 , 16, 8);
XXs = zeros(total_iterations, 1);
YYs = zeros(total_iterations, 1);
parfor idx = 1:total_iterations
    [ix, iy] = ind2sub([NX, NY], idx);
    x = xs(ix);
    y = ys(iy);
    term1 = 1./(exp(1/y*(A5D-x)) + 10); %operation 1
    to_integrate = B5D.*term1; %operation 2
    XXs(idx) = x;
    YYs(idx) = y;
    results(idx, :, :, :) = trapz(as,trapz(bs,to_integrate,5),4); %operation 3
end
XXs = reshape(XXs, [NX, NY]);
YYs = reshape(YYs, [NX, NY]);
results = reshape(results, [NX, NY, 41, 16, 8]);
clear A5D B5D
save('saved_data.mat','-v7.3');

Accepted Answer

Saurabh
Saurabh on 30 Aug 2024
Edited: Saurabh on 30 Aug 2024
It seems that you encounter an error when operating on the large 5D arrays (about 20 GB in total) on your university's computing cluster.
A heterogeneous cluster environment could be one cause of this issue.
The link above lists the system requirements of MATLAB Parallel Server, not Parallel Computing Toolbox, but it makes an important point:
"Parallel processing constructs that work on the infrastructure enabled by parpool—parfor, parfeval spmd, distributed arrays, and message passing functions—cannot be used on a heterogeneous cluster configuration. The underlying MPI infrastructure requires that all cluster computers have matching word sizes and processor endianness."
The same information can be found here:
If that is not the case, try giving each worker more memory (in your case each worker is allocated roughly 3-3.5 GB). If this resolves the issue, the workers had insufficient memory.
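The per-worker figure above can be checked with a short calculation. This is only a rough sketch: the 400 GB and 120 workers come from the question, while `bytesPerElement` (real doubles) and `nLiveArrays` (how many full-size arrays exist at once inside one iteration) are assumptions, not measured values. Note that the stated dimensions give only about 0.275 GB per real double array, far less than the roughly 20 GB file size, so the real arrays may be larger than stated; substitute measured sizes where available.

```matlab
% Budget stated in the question
totalMemGB     = 400;
nWorkers       = 120;
memPerWorkerGB = totalMemGB / nWorkers;          % about 3.33 GB per worker

% Size of one array from the stated dimensions
dims            = [41 16 8 80 82];
bytesPerElement = 8;                             % assumption: real double
arrayGB         = prod(dims) * bytesPerElement / 1e9;

% Assumption: A5D, B5D, term1, to_integrate and one hidden temporary
% (for example A5D - x) are alive at the same time on every worker.
nLiveArrays     = 5;
needPerWorkerGB = nLiveArrays * arrayGB;

% Largest worker count that stays within the total memory budget
maxWorkers = floor(totalMemGB / needPerWorkerGB);

fprintf('Memory per worker : %.2f GB\n', memPerWorkerGB);
fprintf('One array         : %.3f GB\n', arrayGB);
fprintf('Need per worker   : %.3f GB\n', needPerWorkerGB);
fprintf('Max workers       : %d\n', maxWorkers);
```

If the measured array size is much larger than the value derived from `dims`, `needPerWorkerGB` grows in proportion and `maxWorkers` drops accordingly, which is consistent with fewer workers resolving the problem.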
If the workers are running out of memory, refer to the link below for troubleshooting steps:
I hope this helps.
1 Comment
Luqman Saleem
Luqman Saleem on 31 Aug 2024
Thank you very much. It was a memory problem. Using fewer workers worked.


More Answers (1)

Sam Marshalik
Sam Marshalik on 30 Aug 2024
You are likely running out of memory on the workers. You are not using sliced input variables (see Sliced Variables - MATLAB & Simulink (mathworks.com)) to access the 5D arrays, so a full copy of each array is sent to every worker. The arrays are likely big enough that those machines run out of memory. I would suggest running fewer workers (which gives each worker more memory), using sliced input variables so that only part of each array is passed to each worker, or running on machines with more memory.
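One possible way to get sliced inputs here is to let the parfor loop run over the first dimension of the arrays and move the x and y loops inside. The sketch below is a restructuring for illustration, not code from the thread. It uses small synthetic stand-ins (`n1`, `n2`, `n3`, `na`, `nb` and the random data are assumptions), and `ys` starts at 0.4 instead of 0 so the demo avoids dividing by zero.

```matlab
% Small synthetic stand-ins for the real data (sizes are assumptions)
n1 = 4; n2 = 3; n3 = 2; na = 5; nb = 6;
A5D = rand(n1, n2, n3, na, nb);
B5D = rand(n1, n2, n3, na, nb);
as  = linspace(0, 1, na);
bs  = linspace(0, 1, nb);
xs  = -1:0.5:1;
ys  = 0.4:0.4:1.2;            % starts above 0 to avoid 1/0 in this demo
NX  = numel(xs);
NY  = numel(ys);

resultsByK = zeros(n1, NX, NY, n2, n3);

parfor k = 1:n1
    % A5D(k,:,:,:,:) and B5D(k,:,:,:,:) are sliced inputs, so each
    % worker receives only the slabs for its own values of k.
    Ak = reshape(A5D(k, :, :, :, :), [n2, n3, na, nb]);
    Bk = reshape(B5D(k, :, :, :, :), [n2, n3, na, nb]);
    slab = zeros(NX, NY, n2, n3);
    for ix = 1:NX
        for iy = 1:NY
            term1 = 1 ./ (exp((Ak - xs(ix)) / ys(iy)) + 10);
            % integrate over bs (now dimension 4), then over as (dimension 3)
            slab(ix, iy, :, :) = trapz(as, trapz(bs, Bk .* term1, 4), 3);
        end
    end
    resultsByK(k, :, :, :, :) = slab;   % sliced output
end

% Reorder to the [NX, NY, n1, n2, n3] layout used in the question
results = permute(resultsByK, [2 3 1 4 5]);
```

With this layout each iteration holds only one slab of each array, and its temporaries are the same reduced size. The trade-off is that the loop has only as many iterations as the first dimension has entries (41 in the question), which caps the number of workers that can be kept busy.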
To test this theory, you can run your job and monitor memory usage on those machines. If this is the issue, you should see it max out.
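How to monitor the cluster nodes themselves depends on the scheduler and operating system, which the thread does not specify. As a local proxy, the footprint of one iteration can be measured with `whos` on scaled-down data and then scaled up. This is a sketch under assumptions: the array sizes and the values of `x` and `y` are made up, and `whos` counts only named variables, not the hidden temporaries MATLAB creates while evaluating an expression. It is run outside parfor, because `whos` is not permitted inside a parfor body.

```matlab
% Scaled-down stand-ins (assumption: same class as the real arrays)
A5D = rand(4, 3, 2, 5, 6);
B5D = rand(4, 3, 2, 5, 6);
x = 0.5;
y = 0.4;

% Body of one iteration from the question
term1        = 1 ./ (exp((A5D - x) / y) + 10);
to_integrate = B5D .* term1;

% Bytes held by the full-size named arrays in one iteration
info       = whos('A5D', 'B5D', 'term1', 'to_integrate');
namedBytes = sum([info.bytes]);

% Scale to the dimensions stated in the question
scale         = prod([41 16 8 80 82]) / numel(A5D);
estimatedGB   = namedBytes * scale / 1e9;

fprintf('Named arrays, demo size   : %d bytes\n', namedBytes);
fprintf('Named arrays, stated size : %.3f GB per worker\n', estimatedGB);
```

Treat the result as a lower bound on what each worker needs, since temporaries and the worker process itself add to it.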
1 Comment
Luqman Saleem
Luqman Saleem on 31 Aug 2024
Thank you. It was a memory problem. Using fewer workers worked.


Release

R2024a
