Would kfold loss values vary if cross validation is performed after model training?
Charles Bergen on 9 May 2025
Edited: the cyclist on 10 May 2025
I am concerned about a possible difference in cross-validated (CV) predictions (kfoldPredict) for regression bagged ensembles (fitrensemble) when CV is performed after a model has been trained. If I understand this correctly, a fitrensemble model trained without CV has access to all observations in the data set, so its trees will have node split values different from those found in trees generated by fitrensemble with CV on. Differences in these split values would then lead to an overall difference in possible outcomes for the constructed trees in the two models.
I guess this boils down to: do crossval and the subsequent kfoldLoss or kfoldPredict (really any CV predict function) account for these differences when supplied a model that did not perform initial cross-validation?
If there is an error in my thinking, please let me know.
I have supplied an example of my question below.
% No initial CV
Mdl = fitrensemble(looperValues(:,1:cherrios), allratios2, ...
    'Learners',t,'Weights',W1,'Method','Bag', ...
    'NumLearningCycles',numblearningcyc,'Options',statset('UseParallel',true));
Mdl_CV_After_Training = crossval(Mdl, 'KFold', 10);
Mdl_CV_After_Training_kfold_predictions = kfoldPredict(Mdl_CV_After_Training);
vs.
% Yes initial CV
Mdl = fitrensemble(looperValues(:,1:cherrios), allratios2, ...
    'Learners',t,'CrossVal','on','Weights',W1,'Method','Bag', ...
    'NumLearningCycles',numblearningcyc,'Options',statset('UseParallel',true));
Mdl_Yes_CV_kfold_predictions = kfoldPredict(Mdl);
% Would Mdl_CV_After_Training_kfold_predictions == Mdl_Yes_CV_kfold_predictions?
0 comments
Accepted Answer
the cyclist on 9 May 2025
        The predictions will be identical, as long as you use the same fold assignments:
% Set seed, for reproducibility
rng default
% Simulate some data
N = 100;
X = randn(N,3);
y = sum(X+0.5*randn(N,1),2);
% Define a partition (which will be used for both models)
p = cvpartition(N,'KFold',10);
% Train one model using cross-validation during training
mdl_1 = fitrensemble(X,y,'CrossVal','on','CVPartition',p);
% Train a second model without using cross-validation during training, but apply it afterward
mdl_2 = fitrensemble(X,y);        
mdl2_cv = crossval(mdl_2,'CVPartition',p);
% Make the k-fold predictions
y1 = kfoldPredict(mdl_1);
y2 = kfoldPredict(mdl2_cv);
% See if they are equal -- THEY ARE!
isequal(y1,y2)
If you do not make sure the two models use exactly the same fold assignments, the predictions will not be identical, but they will be statistically equivalent.
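To illustrate that last point, here is a minimal sketch reusing X, y, and N from the code above (the names p1, p2, mdl_a, and mdl_b are just for illustration): two independent, unseeded partitions give element-wise different predictions but nearly identical k-fold losses.
% Two different fold assignments (no shared partition, no reseeding)
p1 = cvpartition(N,'KFold',10);
p2 = cvpartition(N,'KFold',10);
mdl_a = crossval(fitrensemble(X,y),'CVPartition',p1);
mdl_b = crossval(fitrensemble(X,y),'CVPartition',p2);
% The element-wise predictions differ ...
isequal(kfoldPredict(mdl_a),kfoldPredict(mdl_b))   % almost surely false
% ... but the overall k-fold losses are close
[kfoldLoss(mdl_a) kfoldLoss(mdl_b)]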
3 comments
the cyclist on 9 May 2025
Edited: the cyclist on 10 May 2025
To make an analogy ...
If you used
N = 1000;
x1 = randn(N,1);
x2 = randn(N,1);
to draw two samples of (pseudo)randomly generated values from a normal distribution, you would not expect those samples to be identical unless you set the seed each time to get the same sequence. However, you would expect the two samples to have the same statistical properties (to within sampling error): the same mean, standard deviation, etc.
Similarly, I would not expect your predictions to be identical, but I would expect all their properties to be the same to within sampling error.
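For example, continuing that snippet, a quick check makes this concrete:
% The two samples are not identical ...
isequal(x1,x2)                          % false
% ... but their summary statistics agree to within sampling error
[mean(x1) mean(x2); std(x1) std(x2)]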
More Answers (0)