Efficient Way To Split Dataset Into Subsets
Hello,
I need to split a large dataset (a D-by-N numeric array) into multiple subsets. I can use the code below, where groupIDs is an N-by-1 vector of integer IDs giving the group to which each data point belongs.
groups = unique(groupIDs);
for i = 1:numel(groups)
    tempData = data(:, groupIDs == groups(i));
    % do work on tempData
end
However, 90% of the run time of the above code is spent just creating tempData! That amounts to over a minute every time I want to do this. Is there a more efficient way to split data by groupIDs? I tried splitapply() but it doesn't seem to be any faster.
Are there any MATLAB gurus out there who know a trick? Thanks!
5 comments
Jos (10584)
on 24 Nov 2017
How large is "large"?
E
on 24 Nov 2017
Use the second (or third? I always have to guess and check between the two) output of unique(groupIDs).
Edit: This likely isn't faster, because you still need a comparison check inside the loop. I always forget that part about the third output of unique.
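One way the third output of unique could be used without a comparison inside the loop is to sort the group indices once and then read each group's columns from a contiguous run of the sort order. The following is only a minimal sketch under stated assumptions: the small data and groupIDs values are made up for illustration, results and the sum call are placeholders for the real per-group work, and whether it is actually faster for a given dataset would need to be timed.

```matlab
% Illustrative inputs (assumed); substitute the real data and groupIDs.
data     = [1 2 3 4 5 6; 10 20 30 40 50 60];   % D-by-N numeric array
groupIDs = [3; 1; 3; 2; 1; 3];                  % N-by-1 integer IDs

% The third output of unique maps every data point to a group index 1..G.
[groups, ~, idx] = unique(groupIDs);

% Sort the group indices once; order lists the columns group by group.
[~, order] = sort(idx);

% First and last position of each group's run within order.
counts   = accumarray(idx, 1);      % number of data points per group
lastPos  = cumsum(counts);
firstPos = lastPos - counts + 1;

results = cell(numel(groups), 1);   % placeholder container (hypothetical)
for i = 1:numel(groups)
    cols = order(firstPos(i):lastPos(i));   % column indices of group i
    tempData = data(:, cols);
    % do work on tempData; sum is only a stand-in for the real work
    results{i} = sum(tempData, 2);
end
```

This replaces one length-N comparison per group with a single sort, but each tempData is still a copy of the selected columns, so for very large arrays the copying itself may remain the dominant cost.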
Jos (10584)
on 24 Nov 2017
12 GB? That is quite a lot. If this doesn't fit in memory, swapping to disk is the likely bottleneck ...
E
on 26 Nov 2017
Answers (0)