Using memory allocation to split table
Below is an example of splitting a table into chunks of chunkSize rows and saving each chunk as an individual .mat file. Here, chunkSize is defined as a number of rows.
How can I use the same concept, but split by memory size instead of by number of rows?
(I do not control how the data is created, so I cannot split it at the source.)
%%%% Example code
load patients
%% loading vars
patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);
chunkSize = 28; % chunk size from number of rows
noOfChunks = ceil(size(patients,1) / chunkSize);
%% To Output chunks
for idx = 1:noOfChunks
    if idx == noOfChunks
        % last chunk: take whatever rows remain
        data = patients(1:end,:);
        patients(1:end,:) = [];
    else
        % take the first chunkSize rows, then delete them from patients
        data = patients(1:chunkSize,:);
        patients(1:chunkSize,:) = [];
    end
    % Save data to data1.mat, data2.mat, ...
    savefile = strcat('data',num2str(idx));
    save(savefile, 'data');
end
Walter Roberson
on 6 Jan 2021
patients(1:chunkSize,:) = []; % delete rows to save memory
That does not save memory.
You already have the entire patients table in memory, so you do not need to save memory in order to make room to add more entries.
What actually happens is that, to do each deletion, MATLAB has to make a copy of the patients table without the indicated rows, replace the patients table with the new version, and release the old version. That means temporary copies of the table, repeatedly, with no benefit other than making the code marginally easier to write (because you can use fixed indices).
It might be worthwhile to clear the entire patients table after you have saved all of the chunks, but not otherwise, unless you were also growing the table at the same time through some other process.
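A minimal sketch of what this comment suggests, assuming the same patients.mat data and chunkSize = 28 as in the question: each chunk is copied out by row range, so the full table is never rebuilt inside the loop, and it is cleared only once at the end. The firstRow, lastRow and nRows names are illustrative, not from the thread.

```matlab
% Sketch: row-count chunking without deleting rows from the source table.
load patients
patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);

chunkSize  = 28;                              % rows per chunk, as in the question
nRows      = height(patients);
noOfChunks = ceil(nRows / chunkSize);

for idx = 1:noOfChunks
    firstRow = (idx - 1) * chunkSize + 1;
    lastRow  = min(idx * chunkSize, nRows);   % the last chunk may be shorter
    data = patients(firstRow:lastRow, :);     % copy out only this chunk
    save(sprintf('data%d', idx), 'data');     % writes data1.mat, data2.mat, ...
end

clear patients   % release the full table once every chunk has been written
```

The file names and chunk contents match the original loop; only the deletions are gone.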
Accepted Answer
Walter Roberson
on 6 Jan 2021
Is LastName a fixed width character array, or is it a cell array of character vectors or is it a string array?
If it is not a fixed-width character array, then you cannot predict the memory requirements of each row from the type alone; you have to query the data to find them. It can be done; probably the easiest way would be:
name_bytes = cellfun(@length, patients.LastName)*2 + 104
The 104 is the basic size needed per cell array entry, to which you add the number of bytes occupied by the characters, at 2 bytes per character.
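A sketch that applies this rule of thumb to the LastName column and prints the summed estimate next to what whos reports for the same cell array, so the 104-byte figure can be checked on your own MATLAB version (the per-element overhead may differ between releases). The names and info variables are illustrative.

```matlab
% Sketch: rule-of-thumb byte estimate per row for a cellstr column.
load patients
patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);

names = patients.LastName;                     % cell array of char vectors
% 104 bytes of overhead per cell element + 2 bytes per character
name_bytes = cellfun(@length, names) * 2 + 104;

% Compare the summed estimate with what whos reports for the same cell array
info = whos('names');
fprintf('estimated: %d bytes, whos reports: %d bytes\n', ...
        sum(name_bytes), info.bytes);
```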
Gender,Age,Height,Weight,Smoker,Systolic,Diastolic
Those look to me to have a fixed number of bytes per entry, though it would depend on how Systolic and Diastolic are recorded.
It looks to me as if the basic size of a table is 768 bytes, plus 210 bytes per variable.
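Putting these estimates together, here is a sketch of the memory-based split the question asks for. Assumptions to flag: maxChunkBytes = 10000 is an arbitrary example budget; the 104 + 2-bytes-per-character rule and the 768 + 210-per-variable table overhead are the empirical figures from this answer, not documented constants; any column that is not a cell array of character vectors is treated as fixed width, with its whos total spread evenly over the rows. As far as I can tell, Gender in the patients.mat shipped with MATLAB is also a cell array of character vectors, so the loop measures it the same way as LastName. The budget applies to in-memory size; the .mat files on disk will usually be smaller, since the default MAT-file format compresses data.

```matlab
% Sketch: split a table into chunks by estimated in-memory size.
load patients
patients = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);

maxChunkBytes = 10000;                          % hypothetical budget per chunk
tableOverhead = 768 + 210 * width(patients);    % empirical estimate from above

% Estimate the bytes contributed by each row, one variable at a time
varNames = patients.Properties.VariableNames;
rowBytes = zeros(height(patients), 1);
for k = 1:numel(varNames)
    col = patients.(varNames{k});
    if iscellstr(col)
        % variable-width text: 104 bytes per cell + 2 bytes per character
        rowBytes = rowBytes + cellfun(@numel, col) * 2 + 104;
    else
        % assumed fixed width: spread the whos total evenly across rows
        info = whos('col');
        rowBytes = rowBytes + info.bytes / height(patients);
    end
end

% Greedy pass: start a new chunk when the next row would exceed the budget
chunkId = zeros(height(patients), 1);
current = 1;
used = tableOverhead;
for r = 1:height(patients)
    if used + rowBytes(r) > maxChunkBytes && used > tableOverhead
        current = current + 1;                  % close the current chunk
        used = tableOverhead;
    end
    chunkId(r) = current;
    used = used + rowBytes(r);
end

% Save each chunk to data1.mat, data2.mat, ...
for idx = 1:max(chunkId)
    data = patients(chunkId == idx, :);
    save(sprintf('data%d', idx), 'data');
end
```

The greedy pass keeps rows in their original order and never splits a row across files; a single row larger than the budget still gets its own chunk instead of producing an empty one.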