Explicit indices for k-fold partitioning

Is there any way to explicitly provide the indices of each partition in a k-fold partition? I'd like to find optimal hyperparameters, but all the methods seem to divide up the data either sequentially or randomly. My data evolves over time, and each time step has a different number of observations. Dividing it either sequentially or randomly results in 'looking into the future'. I'd like the partitions to reflect the information I have up to a given time, and to predict the response for the next time step to obtain a kfoldLoss.
(Time itself has no relevance, however, so this isn't amenable to time-series-type analysis. It's a classification problem.)
thanks in advance
anthony

8 comments

Anthony Diaco on 11 Sep 2020
Well yes, I know how to actually partition the data. The problem is that you can't throw a cell array of separate data sets into fitcensemble and have it calculate a k-fold loss across the entire thing.
Adam Danz on 11 Sep 2020
It sounds like you need to train on partition n-1 and test on partition n, where the data in n-1 occurred before the data in n. Is that correct?
Anthony Diaco on 11 Sep 2020
Yes, that's exactly correct. But I need to optimize over all N.
Adam Danz on 11 Sep 2020
If you group the data by temporal segments using a grouping variable, I think the stratified partitions I mentioned in my answer are the way to go, but I haven't done what you're doing so I can't be certain.
Anthony Diaco on 11 Sep 2020
I was just looking at doing it that way. It's not clear to me from the documentation what exactly it does with each 'group'. I'm not sure it trains on each one separately, which is what I would need. I'll give it a shot.
Adam Danz on 14 Sep 2020
Anthony Diaco, I looked deeper into this today. With stratified sampling, the partitions ensure that each group is represented equally or close to equally. It doesn't sound like that's what you're looking for.
I think you should make your own partitions.
I'll update my answer with more detail.
Anthony Diaco on 14 Sep 2020
Yes, I came to the same conclusion. Didn't even bother testing it. I can easily make my own partitions. Indeed, they're already made. The question is how I can input them into fitcensemble. There should be a way to just say what you want the test sets to be. I can't figure it out. I'd like to use MATLAB's built-in kfoldLoss functionality and ideally its optimizable ensemble methodology. But right now I'm just using my own loop, which I know isn't the best.
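A hand-written loop can still score a hyperparameter setting across all time steps if the per-step losses are averaged for each candidate value. The following is only a sketch under stated assumptions: the data are synthetic, `t` is a hypothetical time-step label for each row, the grid over `NumLearningCycles` is made up, and the unweighted mean across steps is an illustrative choice rather than what kfoldLoss computes internally.

```matlab
% Sketch: forward-chaining loss for each candidate hyperparameter value.
% Requires Statistics and Machine Learning Toolbox. All data are synthetic.
rng(1);                                      % reproducible demo
counts = [40 25 60 35 50];                   % made-up observations per time step
t = repelem((1:numel(counts))', counts');    % hypothetical time-step label per row
X = randn(numel(t), 4);                      % made-up predictors
Y = (X(:,1) - X(:,2) + 0.5*randn(numel(t),1)) > 0;   % made-up binary labels

steps = unique(t);                           % sorted time-step values
candidates = [10 30 100];                    % hypothetical NumLearningCycles grid
cvLoss = zeros(size(candidates));

for c = 1:numel(candidates)
    stepLoss = zeros(numel(steps)-1, 1);
    for n = 2:numel(steps)
        trainIdx = t <  steps(n);            % only observations from the past
        testIdx  = t == steps(n);            % the next time step
        mdl = fitcensemble(X(trainIdx,:), Y(trainIdx), ...
            'Method', 'Bag', 'NumLearningCycles', candidates(c));
        stepLoss(n-1) = loss(mdl, X(testIdx,:), Y(testIdx), ...
            'LossFun', 'classiferror');      % misclassification rate on step n
    end
    cvLoss(c) = mean(stepLoss);              % unweighted mean across time steps
end

[~, best] = min(cvLoss);
fprintf('Best NumLearningCycles: %d (loss %.3f)\n', candidates(best), cvLoss(best));
```

Weighting each step's loss by its number of test observations is a reasonable alternative when the time steps differ a lot in size.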
Anthony Diaco on 14 Sep 2020
Thanks so much for your attention on this, btw!

Answers (1)

Adam Danz on 11 Sep 2020
Edited: Adam Danz on 14 Sep 2020
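Perhaps something like this:
x = 1:100;            % demo vector of observation indices, in temporal order
k = 5;                % number of partitions
folds = cell(k,1);    % folds{i} will hold the indices assigned to partition i
for i = 1:k
    folds{i} = x(i:k:end);   % every k-th index, starting at i
end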
Those partitions are far from randomized, but they maintain temporal order within each fold. To randomize them, you could create a grouping variable for each segment, randomize the segments, and then execute the loop above on the randomized segments.
Alternatively, you could use stratified sampling within subgroups, but that only ensures that each group is represented equally; it will not maintain the temporal order of your data.
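If the folds should follow time steps of unequal size rather than every k-th element, the index sets can be built from a time-step label instead. This is a minimal sketch, assuming a hypothetical vector `t` that records the time step of each observation (the counts are made up): fold n trains on everything up to step n and tests on step n+1.

```matlab
% Sketch: explicit train/test indices for forward-chaining folds.
counts = [3 5 2 4];                          % made-up observations per time step
t = repelem((1:numel(counts))', counts');    % hypothetical time-step label per row

steps  = unique(t);                          % sorted time-step values
nFolds = numel(steps) - 1;                   % the first step is only ever trained on
trainIdx = cell(nFolds, 1);
testIdx  = cell(nFolds, 1);
for n = 1:nFolds
    trainIdx{n} = find(t <= steps(n));       % all rows up to and including step n
    testIdx{n}  = find(t == steps(n+1));     % rows of the following step
end

% Check that no fold trains on data from its test step or later.
for n = 1:nFolds
    assert(max(t(trainIdx{n})) < min(t(testIdx{n})));
end
```

The cell arrays can then drive any training loop, for example by indexing the predictors with `X(trainIdx{n},:)` and `X(testIdx{n},:)`.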

1 comment

Anthony Diaco on 11 Sep 2020
Thanks. I'll investigate that. Any other ideas are welcome. I just don't see why there wouldn't be an easy way to just input the indices you want in each partition.
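For readers on newer releases: cvpartition appears to accept a "CustomPartition" argument from R2023b onward, which was not available when this thread was written. Treat the release number and the exact behavior as assumptions to confirm in the documentation for your version. A sketch with synthetic data, assigning each observation to the test set of its time step:

```matlab
% Sketch: user-defined k-fold test sets (assumes R2023b or later).
% Requires Statistics and Machine Learning Toolbox. All data are synthetic.
rng(2);                                      % reproducible demo
counts = [30 45 25 50];                      % made-up observations per time step
t = repelem((1:numel(counts))', counts');    % test-set number = time step
X = randn(numel(t), 3);                      % made-up predictors
Y = (X(:,1) + 0.5*randn(numel(t),1)) > 0;    % made-up binary labels

c = cvpartition("CustomPartition", t);       % test set i = rows where t == i
cvMdl = fitcensemble(X, Y, 'Method', 'Bag', 'CVPartition', c);
L = kfoldLoss(cvMdl);                        % loss over the custom folds
```

This is still ordinary k-fold cross-validation: the model tested on time step i is trained on every other step, including later ones, so it does not by itself prevent looking into the future.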

This question is closed.

Closed: 20 Aug 2021
