How can I use repeated, k-fold cross-validation results with rocmetrics?

Thomas Kirsh on 6 Oct 2023
Commented: the cyclist on 11 Oct 2023
I have 10-repeat 5-fold cross-validation scores and labels for a model, and I'm trying to plot ROC curves for them efficiently using rocmetrics. When I run the line
robj = rocmetrics(target, prediction, 1);
I get the error
Error using rocmetrics>validateScoresLabelsAndWeights
The cell array of cross-validated scores must be a vector.
Each cell in target and prediction is a double array of size 54x1 or 55x1, and the sizes match cell for cell between the two. I'm confused by this error because my scores (predictions) are clearly vectors. I suspect the problem is the repeated cross-validation. How should I format target and prediction so that I can use rocmetrics with my results?
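A hedged illustration of the shape issue, assuming (the question does not state it explicitly) that target and prediction are stored as 10-by-5 cell arrays, one row per repeat and one column per fold. Such an array is a matrix of cells rather than a vector, even though every cell holds a column vector, which would match the error text. The sketch below is in Python, not MATLAB, and uses a hypothetical helper, flatten_repeated_folds, to turn an M-by-k grid of folds into a flat list of M*k folds while keeping each fold's labels and scores paired. The MATLAB analogue would presumably be passing target(:) and prediction(:) to rocmetrics, but that call is an untested assumption, not something confirmed in this thread.

```python
from typing import List, Sequence, Tuple

# One fold = (labels, scores). Names here are hypothetical, for illustration only.
Fold = Tuple[List[int], List[float]]


def flatten_repeated_folds(
    labels_grid: Sequence[Sequence[Sequence[int]]],
    scores_grid: Sequence[Sequence[Sequence[float]]],
) -> List[Fold]:
    """Turn an M-by-k grid of folds (repeats x folds) into a flat list of M*k folds.

    Labels and scores stay paired fold by fold; within each fold the label
    vector and score vector must have the same length.
    """
    if len(labels_grid) != len(scores_grid):
        raise ValueError("labels and scores must have the same number of repeats")
    folds: List[Fold] = []
    for rep_labels, rep_scores in zip(labels_grid, scores_grid):
        if len(rep_labels) != len(rep_scores):
            raise ValueError("each repeat must have the same number of folds")
        for y, s in zip(rep_labels, rep_scores):
            if len(y) != len(s):
                raise ValueError("labels and scores differ in length within a fold")
            folds.append((list(y), list(s)))
    return folds
```

The order of the flattened folds (repeat-major here; MATLAB's (:) is column-major) does not matter for per-fold statistics, as long as labels and scores are flattened the same way.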

Answers (1)

the cyclist on 6 Oct 2023
Note the following line from the rocmetrics documentation:
"For cross-validated data, you must specify Labels, Scores, and Weights as cell arrays with the same number of elements. rocmetrics treats an element in the cell arrays as data from one cross-validation fold and computes pointwise confidence intervals for the performance metrics. The length of Labels{i} and the number of rows in Scores{i} must be equal."
You need to supply the fold weights.
Alternatively, you could loop over the folds to see ROC metrics on each fold, or decide prior to calling rocmetrics how you want to combine the folds into a single prediction.
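To illustrate the first alternative (per-fold ROC curves) together with the pointwise intervals mentioned in the documentation quote, here is a Python sketch of vertical averaging: compute each fold's empirical ROC curve, read off its TPR at a fixed grid of false positive rates, and average across folds. The function names are hypothetical, label 1 is assumed to be the positive class (matching the 1 passed to rocmetrics above), and the normal-approximation interval (mean ± 1.96 standard errors) is an assumption chosen for simplicity; it is not necessarily how rocmetrics computes its confidence intervals.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Curve = Tuple[List[float], List[float]]  # (FPR values, TPR values)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> Curve:
    """Empirical ROC curve for one fold; label 1 is assumed to be the positive class."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("each fold needs both positive and negative samples")
    # Walk thresholds from the highest score down; tied scores form one threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def tpr_at(curve: Curve, x: float) -> float:
    """TPR at false positive rate x (0 <= x <= 1), interpolating linearly between ROC points."""
    fpr, tpr = curve
    # Last point with FPR <= x; at a vertical step this is the highest TPR.
    k = max(j for j in range(len(fpr)) if fpr[j] <= x)
    if fpr[k] == x or k == len(fpr) - 1:
        return tpr[k]
    w = (x - fpr[k]) / (fpr[k + 1] - fpr[k])
    return tpr[k] + w * (tpr[k + 1] - tpr[k])


def vertical_average(
    folds: Sequence[Tuple[Sequence[int], Sequence[float]]],
    fpr_grid: Sequence[float],
    z: float = 1.96,
) -> List[Tuple[float, float, float, float]]:
    """Mean TPR across folds at each grid FPR, with a normal-approximation interval.

    Returns (fpr, mean_tpr, lower, upper) tuples; bounds are clipped to [0, 1].
    The interval method is an assumption, not rocmetrics' documented algorithm.
    """
    curves = [roc_points(y, s) for y, s in folds]
    result = []
    for x in fpr_grid:
        values = [tpr_at(c, x) for c in curves]
        m = mean(values)
        half = z * stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
        result.append((x, m, max(0.0, m - half), min(1.0, m + half)))
    return result
```

Looping roc_points over the folds gives one curve per fold (the first alternative); vertical_average is one way of summarizing all M*k curves at once.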
7 comments
Thomas Kirsh on 11 Oct 2023
Thank you, that's a good workaround! My only concern is whether this gives an accurate mean ROC for my experiment. Wouldn't it make more sense to concatenate the folds first and then reshape?
the cyclist on 11 Oct 2023
I have to admit that I don't really have experience specifically with repeated k-fold cross-validation, so I don't know what is conventional in terms of combining information from repeats and folds. My impression is that one treats it as M*k results, which is what my code is doing. I don't think concatenating folds is typically done, because that would look like you had a dataset that was k times larger.
Also ... I hope the data you posted isn't your real data. The model performance is no better than random.
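The two options discussed in this exchange can give different numbers. As a Python sketch (function names hypothetical, label 1 assumed to be the positive class), the code below computes (a) the mean of the M*k per-fold AUCs, treating every fold of every repeat as a separate result, and (b) one AUC per repeat from that repeat's k test folds concatenated together, averaged over repeats. Within one repeat the k test folds together cover each sample once; pooling does assume that scores produced by different fold models are on a comparable scale. AUC here uses the rank (Mann-Whitney) formulation rather than rocmetrics.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("need both positive and negative samples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_auc(
    labels_grid: Sequence[Sequence[Sequence[int]]],
    scores_grid: Sequence[Sequence[Sequence[float]]],
) -> Tuple[float, float]:
    """Return (mean of per-fold AUCs, mean over repeats of pooled-fold AUC).

    Grids are indexed [repeat][fold]; the helper name is hypothetical.
    """
    per_fold: List[float] = []
    per_repeat_pooled: List[float] = []
    for rep_y, rep_s in zip(labels_grid, scores_grid):
        pooled_y: List[int] = []
        pooled_s: List[float] = []
        for y, s in zip(rep_y, rep_s):
            per_fold.append(auc(y, s))  # option (a): one result per fold
            pooled_y.extend(y)          # option (b): concatenate this repeat's folds
            pooled_s.extend(s)
        per_repeat_pooled.append(auc(pooled_y, pooled_s))
    return mean(per_fold), mean(per_repeat_pooled)


if __name__ == "__main__":
    # One repeat, two folds: the first ranks perfectly, the second is inverted.
    labels = [[[0, 1], [0, 1]]]
    scores = [[[0.2, 0.8], [0.6, 0.4]]]
    print(summarize_auc(labels, scores))  # (0.5, 0.75)
```

On this toy input the per-fold mean is 0.5 while the pooled AUC is 0.75, because pooling also compares positives and negatives that came from different folds.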


Categories

Statistics and Machine Learning Toolbox

