crossval
Cross-validate machine learning model
Description
CVMdl = crossval(Mdl,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the fraction of data for holdout validation or the number of folds to use in the cross-validated model.
Examples
Load the ionosphere
data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b'
) or good ('g'
).
load ionosphere
rng(1); % For reproducibility
Train a support vector machine (SVM) classifier. Standardize the predictor data and specify the order of the classes.
SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
SVMModel
is a trained ClassificationSVM
classifier. 'b'
is the negative class and 'g'
is the positive class.
Cross-validate the classifier using 10-fold cross-validation.
CVSVMModel = crossval(SVMModel)
CVSVMModel =
  ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {'x1' 'x2' 'x3' 'x4' 'x5' 'x6' 'x7' 'x8' 'x9' 'x10' 'x11' 'x12' 'x13' 'x14' 'x15' 'x16' 'x17' 'x18' 'x19' 'x20' 'x21' 'x22' 'x23' 'x24' 'x25' 'x26' 'x27' 'x28' 'x29' 'x30' 'x31' 'x32' 'x33' 'x34'}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1×1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'
  Properties, Methods
CVSVMModel
is a ClassificationPartitionedModel
cross-validated classifier. During cross-validation, the software completes these steps:
1. Randomly partition the data into 10 sets of roughly equal size.
2. Train an SVM classifier on nine of the sets.
3. Repeat steps 1 and 2 k = 10 times. The software leaves out one partition each time and trains on the other nine partitions.
4. Combine generalization statistics for each fold.
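The steps above can be sketched by hand with cvpartition. This is only an illustration of the idea, not the internal implementation of crossval; the variable names (cvp, foldLoss, manualLoss) are chosen here for the example, and the per-fold results need not match those of crossval exactly because the random partition differs.

```matlab
load ionosphere                      % X: predictors, Y: labels 'b'/'g'
rng(1);                              % For reproducibility

k = 10;
cvp = cvpartition(Y,'KFold',k);      % stratified partition into k folds
foldLoss = zeros(k,1);

for i = 1:k
    trainIdx = training(cvp,i);      % logical index of the training observations
    testIdx  = test(cvp,i);          % logical index of the held-out fold

    % Train on the other k-1 folds.
    foldMdl = fitcsvm(X(trainIdx,:),Y(trainIdx), ...
        'Standardize',true,'ClassNames',{'b','g'});

    % Misclassification rate on the held-out fold.
    foldLoss(i) = loss(foldMdl,X(testIdx,:),Y(testIdx));
end

manualLoss = mean(foldLoss);         % unweighted average of the per-fold losses
```

In practice, crossval followed by kfoldLoss performs the equivalent bookkeeping in two calls.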
Display the first model in CVSVMModel.Trained
.
FirstModel = CVSVMModel.Trained{1}
FirstModel =
  CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b' 'g'}
           ScoreTransform: 'none'
                    Alpha: [78×1 double]
                     Bias: -0.2209
         KernelParameters: [1×1 struct]
                       Mu: [0.8888 0 0.6320 0.0406 0.5931 0.1205 0.5361 0.1286 0.5083 0.1879 0.4779 0.1567 0.3924 0.0875 0.3360 0.0789 0.3839 9.6066e-05 0.3562 -0.0308 0.3398 -0.0073 0.3590 -0.0628 0.4064 -0.0664 0.5535 -0.0749 0.3835 … ] (1×34 double)
                    Sigma: [0.3149 0 0.5033 0.4441 0.5255 0.4663 0.4987 0.5205 0.5040 0.4780 0.5649 0.4896 0.6293 0.4924 0.6606 0.4535 0.6133 0.4878 0.6250 0.5140 0.6075 0.5150 0.6068 0.5222 0.5729 0.5103 0.5061 0.5478 0.5712 0.5032 … ] (1×34 double)
           SupportVectors: [78×34 double]
      SupportVectorLabels: [78×1 double]
  Properties, Methods
FirstModel
is the first of the 10 trained classifiers. It is a CompactClassificationSVM
classifier.
You can estimate the generalization error by passing CVSVMModel
to kfoldLoss
.
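As a brief sketch of that last step, the following repeats the training and cross-validation from this example and then calls kfoldLoss twice: once for the averaged loss and once with 'Mode','individual' to return one loss per fold. The variable names genError and foldErrors are chosen here for illustration.

```matlab
load ionosphere
rng(1); % For reproducibility

SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
CVSVMModel = crossval(SVMModel);

% Misclassification rate averaged over the 10 folds
genError = kfoldLoss(CVSVMModel);

% One misclassification rate per fold
foldErrors = kfoldLoss(CVSVMModel,'Mode','individual');
```

Inspecting the per-fold values can help you judge how much the estimate varies from fold to fold.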
Specify a holdout sample proportion for cross-validation. By default, crossval
uses 10-fold cross-validation to cross-validate a naive Bayes classifier. However, you have several other options for cross-validation. For example, you can specify a different number of folds or a holdout sample proportion.
Load the ionosphere
data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b'
) or good ('g'
).
load ionosphere
Remove the first two predictors for stability.
X = X(:,3:end);
rng('default'); % For reproducibility
Train a naive Bayes classifier using the predictors X
and class labels Y
. A recommended practice is to specify the class names. 'b'
is the negative class and 'g'
is the positive class. fitcnb
assumes that each predictor is conditionally and normally distributed.
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});
Mdl
is a trained ClassificationNaiveBayes
classifier.
Cross-validate the classifier by specifying a 30% holdout sample.
CVMdl = crossval(Mdl,'Holdout',0.3)
CVMdl =
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {'x1' 'x2' 'x3' 'x4' 'x5' 'x6' 'x7' 'x8' 'x9' 'x10' 'x11' 'x12' 'x13' 'x14' 'x15' 'x16' 'x17' 'x18' 'x19' 'x20' 'x21' 'x22' 'x23' 'x24' 'x25' 'x26' 'x27' 'x28' 'x29' 'x30' 'x31' 'x32'}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 1
              Partition: [1×1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'
  Properties, Methods
CVMdl
is a ClassificationPartitionedModel
cross-validated, naive Bayes classifier.
Display the properties of the classifier trained using 70% of the data.
TrainedModel = CVMdl.Trained{1}
TrainedModel =
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b' 'g'}
            ScoreTransform: 'none'
         DistributionNames: {1×32 cell}
    DistributionParameters: {2×32 cell}
  Properties, Methods
TrainedModel
is a CompactClassificationNaiveBayes
classifier.
Estimate the generalization error by passing CVMdl
to kfoldLoss
.
kfoldLoss(CVMdl)
ans = 0.2095
The out-of-sample misclassification error is approximately 21%.
Reduce the generalization error by choosing the five most important predictors.
idx = fscmrmr(X,Y);
Xnew = X(:,idx(1:5));
Train a naive Bayes classifier using the new predictors.
Mdlnew = fitcnb(Xnew,Y,'ClassNames',{'b','g'});
Cross-validate the new classifier by specifying a 30% holdout sample, and estimate the generalization error.
CVMdlnew = crossval(Mdlnew,'Holdout',0.3);
kfoldLoss(CVMdlnew)
ans = 0.1429
The out-of-sample misclassification error is reduced from approximately 21% to approximately 14%.
Train a regression generalized additive model (GAM) by using fitrgam
, and create a cross-validated GAM by using crossval
and the holdout option. Then, use kfoldPredict
to predict responses for validation-fold observations using a model trained on training-fold observations.
Load the patients
data set.
load patients
Create a table that contains the predictor variables (Age
, Diastolic
, Smoker
, Weight
, Gender
, SelfAssessedHealthStatus
) and the response variable (Systolic
).
tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);
Train a GAM that contains linear terms for predictors.
Mdl = fitrgam(tbl,'Systolic');
Mdl
is a RegressionGAM
model object.
Cross-validate the model by specifying a 30% holdout sample.
rng('default') % For reproducibility
CVMdl = crossval(Mdl,'Holdout',0.3)
CVMdl =
  RegressionPartitionedGAM
       CrossValidatedModel: 'GAM'
            PredictorNames: {'Age' 'Diastolic' 'Smoker' 'Weight' 'Gender' 'SelfAssessedHealthStatus'}
     CategoricalPredictors: [3 5 6]
              ResponseName: 'Systolic'
           NumObservations: 100
                     KFold: 1
                 Partition: [1×1 cvpartition]
         NumTrainedPerFold: [1×1 struct]
         ResponseTransform: 'none'
    IsStandardDeviationFit: 0
  Properties, Methods
The crossval
function creates a RegressionPartitionedGAM
model object CVMdl
with the holdout option. During cross-validation, the software completes these steps:
Randomly select and reserve 30% of the data as validation data, and train the model using the rest of the data.
Store the compact, trained model in the Trained property of the cross-validated model object RegressionPartitionedGAM.
You can choose a different cross-validation setting by using the 'CrossVal'
, 'CVPartition'
, 'KFold'
, or 'Leaveout'
name-value argument.
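For instance, a 5-fold variant of the same GAM workflow might look like the following sketch. The names CVMdl5 and mse5 are chosen here for illustration, and the resulting loss is not shown because it depends on the random partition.

```matlab
load patients
tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);
Mdl = fitrgam(tbl,'Systolic');

rng('default') % For reproducibility
CVMdl5 = crossval(Mdl,'KFold',5);   % five training/validation splits instead of one

% Mean squared error averaged over the five validation folds
mse5 = kfoldLoss(CVMdl5);
```

With k-fold cross-validation, every observation is used for validation exactly once, so kfoldPredict returns a prediction for every row rather than NaN for the training-fold rows.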
Predict responses for the validation-fold observations by using kfoldPredict
. The function predicts responses for the validation-fold observations by using the model trained on the training-fold observations. The function assigns NaN
to the training-fold observations.
yFit = kfoldPredict(CVMdl);
Find the validation-fold observation indexes, and create a table containing the observation index, observed response values, and predicted response values. Display the first eight rows of the table.
idx = find(~isnan(yFit));
t = table(idx,tbl.Systolic(idx),yFit(idx), ...
    'VariableNames',{'Observation Index','Observed Value','Predicted Value'});
head(t)
    Observation Index    Observed Value    Predicted Value
    _________________    ______________    _______________
            1                 124              130.22
            6                 121              124.38
            7                 130              125.26
           12                 115              117.05
           20                 125              121.82
           22                 123              116.99
           23                 114                 107
           24                 128              122.52
Compute the regression error (mean squared error) for the validation-fold observations.
L = kfoldLoss(CVMdl)
L = 43.8715
Cross-validate an ECOC classifier with SVM binary learners, and estimate the generalized classification error.
Load Fisher's iris data set. Specify the predictor data X
and the response data Y
.
load fisheriris
X = meas;
Y = species;
rng(1); % For reproducibility
Create an SVM template, and standardize the predictors.
t = templateSVM('Standardize',true)
t =
Fit template for SVM.
    Standardize: 1
t
is an SVM template. Most of the template object properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values.
Train the ECOC classifier, and specify the class order.
Mdl = fitcecoc(X,Y,'Learners',t,...
    'ClassNames',{'setosa','versicolor','virginica'});
Mdl
is a ClassificationECOC
classifier. You can access its properties using dot notation.
Cross-validate Mdl
using 10-fold cross-validation.
CVMdl = crossval(Mdl);
CVMdl
is a ClassificationPartitionedECOC
cross-validated ECOC classifier.
Estimate the generalized classification error.
genError = kfoldLoss(CVMdl)
genError = 0.0400
The generalized classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.
Compute the quantile loss for a quantile neural network regression model, first partitioned using holdout validation and then partitioned using 5-fold cross-validation. Compare the two losses.
Load the carbig
data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration
, Cylinders
, Displacement
, and so on, as well as the response variable MPG
. View the first eight observations.
load carbig
cars = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Origin,Weight,MPG);
head(cars)
    Acceleration    Cylinders    Displacement    Horsepower    Model_Year    Origin    Weight    MPG
    ____________    _________    ____________    __________    __________    ______    ______    ___
        12              8            307            130            70         USA       3504     18
        11.5            8            350            165            70         USA       3693     15
        11              8            318            150            70         USA       3436     18
        12              8            304            150            70         USA       3433     16
        10.5            8            302            140            70         USA       3449     17
        10              8            429            198            70         USA       4341     15
         9              8            454            220            70         USA       4354     14
         8.5            8            440            215            70         USA       4312     14
Remove rows of cars
where the table has missing values.
cars = rmmissing(cars);
Categorize the cars based on whether they were made in the USA.
cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");
Partition the data using cvpartition
. First, create a partition for holdout validation, using approximately 80% of the observations for the training data and 20% for the test data. Then, create a partition for 5-fold cross-validation.
rng(0,"twister") % For reproducibility
holdoutPartition = cvpartition(height(cars),Holdout=0.20);
kfoldPartition = cvpartition(height(cars),KFold=5);
Train a quantile neural network regression model using the cars
data. Specify MPG
as the response variable, and standardize the numeric predictors. Use the default 0.5 quantile (median).
Mdl = fitrqnet(cars,"MPG",Standardize=true);
Create the partitioned quantile regression models using crossval
.
holdoutMdl = crossval(Mdl,CVPartition=holdoutPartition)
holdoutMdl =
  RegressionPartitionedQuantileModel
      CrossValidatedModel: 'QuantileNeuralNetwork'
           PredictorNames: {'Acceleration' 'Cylinders' 'Displacement' 'Horsepower' 'Model_Year' 'Origin' 'Weight'}
    CategoricalPredictors: 6
             ResponseName: 'MPG'
          NumObservations: 392
                    KFold: 1
                Partition: [1×1 cvpartition]
        ResponseTransform: 'none'
                Quantiles: 0.5000
  Properties, Methods
kfoldMdl = crossval(Mdl,CVPartition=kfoldPartition)
kfoldMdl =
  RegressionPartitionedQuantileModel
      CrossValidatedModel: 'QuantileNeuralNetwork'
           PredictorNames: {'Acceleration' 'Cylinders' 'Displacement' 'Horsepower' 'Model_Year' 'Origin' 'Weight'}
    CategoricalPredictors: 6
             ResponseName: 'MPG'
          NumObservations: 392
                    KFold: 5
                Partition: [1×1 cvpartition]
        ResponseTransform: 'none'
                Quantiles: 0.5000
  Properties, Methods
Compute the quantile loss for holdoutMdl
and kfoldMdl
by using the kfoldLoss
object function.
holdoutL = kfoldLoss(holdoutMdl)
holdoutL = 0.9488
kfoldL = kfoldLoss(kfoldMdl)
kfoldL = 0.9628
holdoutL
is the quantile loss computed using one holdout set, while kfoldL
is an average quantile loss computed using five holdout sets. Cross-validation metrics tend to be better indicators of a model's performance on unseen data.
Input Arguments
Machine learning model, specified as a full classification, regression, or quantile regression model object, as given in the following tables of supported models.
Classification Model Object
Model | Full Classification Model Object |
---|---|
Discriminant analysis classifier | ClassificationDiscriminant |
Multiclass error-correcting output codes (ECOC) model | ClassificationECOC |
Ensemble classifier | ClassificationEnsemble , ClassificationBaggedEnsemble |
Generalized additive model | ClassificationGAM |
k-nearest neighbor model | ClassificationKNN |
Naive Bayes model | ClassificationNaiveBayes |
Neural network model | ClassificationNeuralNetwork |
Support vector machine for one-class and binary classification | ClassificationSVM |
Binary decision tree for multiclass classification | ClassificationTree |
Regression Model Object
Model | Full Regression Model Object |
---|---|
Regression ensemble model | RegressionEnsemble , RegressionBaggedEnsemble |
Gaussian process regression (GPR) model | RegressionGP (If you supply a custom ActiveSet value in the call to fitrgp , then you cannot cross-validate the GPR model.) |
Generalized additive model (GAM) | RegressionGAM |
Neural network model | RegressionNeuralNetwork (If you use multiple response variables in the call to fitrnet , then you cannot cross-validate the neural network model.) |
Support vector machine regression model | RegressionSVM |
Regression tree model | RegressionTree |
Quantile Regression Model Object
Model | Full Quantile Regression Model Object |
---|---|
Quantile linear regression model (since R2025a) | RegressionQuantileLinear |
Quantile neural network model for regression (since R2025a) | RegressionQuantileNeuralNetwork |
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN
, where Name
is
the argument name and Value
is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name
in quotes.
Example: crossval(Mdl,KFold=3)
specifies to use three folds in the
cross-validated model.
Cross-validation partition, specified as a cvpartition
object that specifies the type of cross-validation and the
indexing for the training and validation sets.
To create a cross-validated model, you can specify only one of these four name-value
arguments: CVPartition
, Holdout
,
KFold
, or Leaveout
.
Example: Suppose you create a random partition for 5-fold cross-validation on 500
observations by using cvp = cvpartition(500,KFold=5)
. Then, you can
specify the cross-validation partition by setting
CVPartition=cvp
.
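A minimal sketch of this usage follows. The 500 observations are synthetic data generated only for illustration, and the regression tree is an arbitrary choice of model; any supported full model with 500 observations would work the same way.

```matlab
rng(0) % For reproducibility

% Synthetic data (assumption for this sketch): 500 observations, 3 predictors
n = 500;
X = randn(n,3);
Y = X*[1; -2; 0.5] + 0.1*randn(n,1);

Mdl = fitrtree(X,Y);

% Random partition for 5-fold cross-validation on 500 observations
cvp = cvpartition(n,KFold=5);

% Reuse the partition when cross-validating the model
CVMdl = crossval(Mdl,CVPartition=cvp);
```

Passing the same cvp object to several models lets you compare them on identical training and validation splits.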
Fraction of the data used for holdout validation, specified as a scalar value in the range
(0,1). If you specify Holdout=p
, then the software completes these
steps:
1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2. Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value
arguments: CVPartition
, Holdout
,
KFold
, or Leaveout
.
Example: Holdout=0.1
Data Types: double
| single
Number of folds to use in the cross-validated model, specified as a positive integer value
greater than 1. If you specify KFold=k
, then the software completes
these steps:
1. Randomly partition the data into k sets.
2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3. Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value
arguments: CVPartition
, Holdout
,
KFold
, or Leaveout
.
Example: KFold=5
Data Types: single
| double
Leave-one-out cross-validation flag, specified as "on"
or
"off"
. If you specify Leaveout="on"
, then for
each of the n observations (where n is the number
of observations, excluding missing observations, specified in the
NumObservations
property of the model), the software completes
these steps:
Reserve the one observation as validation data, and train the model using the other n – 1 observations.
Store the n compact trained models in an n-by-1 cell vector in the
Trained
property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value
arguments: CVPartition
, Holdout
,
KFold
, or Leaveout
.
Example: Leaveout="on"
Data Types: char
| string
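As a small illustration of leave-one-out cross-validation, the sketch below uses Fisher's iris data and a k-nearest neighbor classifier. Both are choices made here for the example (5 neighbors is arbitrary); they keep the run time short, because leave-one-out trains one model per observation.

```matlab
load fisheriris

% Nearest neighbor classifier; NumNeighbors=5 is an arbitrary choice for this sketch
Mdl = fitcknn(meas,species,'NumNeighbors',5);

% One trained model per observation
CVMdl = crossval(Mdl,Leaveout="on");
nModels = numel(CVMdl.Trained);

% Leave-one-out estimate of the misclassification rate
looError = kfoldLoss(CVMdl);
```

Here nModels equals the number of observations in Mdl, so this option can be expensive for large data sets or slow learners.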
Printout frequency, specified as a positive integer or
"off"
.
To track the number of folds trained by the software so far, specify a positive integer m. The software displays a message to the command line every time it finishes training m folds.
If you specify "off"
, the software does not display a message
when it completes training folds.
Note
You can specify NPrint
only if Mdl
is
a ClassificationEnsemble
or RegressionEnsemble
model object.
Example: NPrint=5
Data Types: single
| double
| char
| string
Options for computing in parallel, specified as a structure. Create the
Options
structure using statset
.
You need Parallel Computing Toolbox™ to run computations in parallel.
You can specify Options
only if Mdl
is a
ClassificationECOC
model object.
Example: Options=statset(UseParallel=true)
Data Types: struct
Output Arguments
Cross-validated machine learning model, returned as one of the cross-validated
(partitioned) model objects in the following tables, depending on the input model
Mdl
.
Classification Model Object
Model | Classification Model (Mdl ) | Cross-Validated Model (CVMdl ) |
---|---|---|
Discriminant analysis classifier | ClassificationDiscriminant | ClassificationPartitionedModel |
Multiclass error-correcting output codes (ECOC) model | ClassificationECOC | ClassificationPartitionedECOC |
Ensemble classifier | ClassificationEnsemble , ClassificationBaggedEnsemble | ClassificationPartitionedEnsemble |
Generalized additive model | ClassificationGAM | ClassificationPartitionedGAM |
k-nearest neighbor model | ClassificationKNN | ClassificationPartitionedModel |
Naive Bayes model | ClassificationNaiveBayes | ClassificationPartitionedModel |
Neural network model | ClassificationNeuralNetwork | ClassificationPartitionedModel |
Support vector machine for one-class and binary classification | ClassificationSVM | ClassificationPartitionedModel |
Binary decision tree for multiclass classification | ClassificationTree | ClassificationPartitionedModel |
Regression Model Object
Model | Regression Model (Mdl ) | Cross-Validated Model (CVMdl ) |
---|---|---|
Regression ensemble model | RegressionEnsemble , RegressionBaggedEnsemble | RegressionPartitionedEnsemble |
Gaussian process regression model | RegressionGP | RegressionPartitionedGP |
Generalized additive model | RegressionGAM | RegressionPartitionedGAM |
Neural network model | RegressionNeuralNetwork | RegressionPartitionedNeuralNetwork |
Support vector machine regression model | RegressionSVM | RegressionPartitionedSVM |
Regression tree model | RegressionTree | RegressionPartitionedModel |
Quantile Regression Model Object
Model | Quantile Regression Model (Mdl ) | Cross-Validated Model (CVMdl ) |
---|---|---|
Quantile linear regression model (since R2025a) | RegressionQuantileLinear | RegressionPartitionedQuantileModel |
Quantile neural network model for regression (since R2025a) | RegressionQuantileNeuralNetwork | RegressionPartitionedQuantileModel |
Tips
- Assess the predictive performance of Mdl on cross-validated data using the kfold functions and properties of CVMdl, such as kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge for classification; kfoldPredict and kfoldLoss for regression; and kfoldPredict and kfoldLoss for quantile regression.
- Return a partitioned classifier with stratified partitioning by using the name-value argument KFold or Holdout.
- Create a cvpartition object cvp using cvp = cvpartition(n,KFold=k). Return a partitioned classifier with nonstratified partitioning by using the name-value argument CVPartition=cvp.
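The difference between stratified and nonstratified partitioning can be inspected with a sketch like the following, which counts the class members in the first validation fold of each partition. The data set, the tree model, and the variable names are choices made here for illustration.

```matlab
load fisheriris
Mdl = fitctree(meas,species);
rng(1) % For reproducibility

% Stratified: crossval builds the partition from the class labels
CVStrat = crossval(Mdl,KFold=5);

% Nonstratified: the partition is built from the observation count only
cvp = cvpartition(Mdl.NumObservations,KFold=5);
CVNonstrat = crossval(Mdl,CVPartition=cvp);

% Class counts in the first validation fold of each partition
stratCounts = countcats(categorical(species(test(CVStrat.Partition,1))));
nonstratCounts = countcats(categorical(species(test(CVNonstrat.Partition,1))));
```

Comparing stratCounts with nonstratCounts shows how evenly each approach spreads the classes across a fold; the nonstratified counts can deviate from the overall class proportions.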
Alternative Functionality
Instead of training a model and then cross-validating it, you can create a cross-validated
model directly by using a fitting function and specifying one of these name-value arguments:
CVPartition
, Holdout
, KFold
,
or Leaveout
.
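For example, the following sketch creates a cross-validated classification tree in a single call to the fitting function. The choice of fitctree and Fisher's iris data is an assumption made for illustration; other fitting functions accept the same cross-validation arguments.

```matlab
load fisheriris
rng(1) % For reproducibility

% Train and cross-validate in one step
CVMdl = fitctree(meas,species,KFold=5);

% Cross-validated misclassification rate
err = kfoldLoss(CVMdl);
```

This route skips the full model entirely, so use crossval instead when you also need the model trained on all of the data.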
Extended Capabilities
Usage notes and limitations:
- This function fully supports GPU arrays for a trained classification model specified as a ClassificationECOC, ClassificationEnsemble, ClassificationKNN, ClassificationNeuralNetwork, ClassificationSVM, or ClassificationTree object.
- This function fully supports GPU arrays for a trained regression model specified as a RegressionEnsemble, RegressionNeuralNetwork, RegressionSVM, or RegressionTree object.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2012a
You can cross-validate RegressionQuantileLinear and RegressionQuantileNeuralNetwork model objects by using the crossval object function. The function returns RegressionPartitionedQuantileModel model objects.
crossval
fully supports GPU arrays for RegressionNeuralNetwork
and ClassificationNeuralNetwork
.
Starting in R2023b, a cross-validated regression neural network model is a RegressionPartitionedNeuralNetwork
object. In previous releases, a cross-validated regression neural network model was a RegressionPartitionedModel
object.
You can create a RegressionPartitionedNeuralNetwork
object in two ways:
- Create a cross-validated model from a regression neural network model object RegressionNeuralNetwork by using the crossval object function.
- Create a cross-validated model by using the fitrnet function and specifying one of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout.
Starting in R2022b, a cross-validated Gaussian process regression (GPR) model is a RegressionPartitionedGP
object. In previous releases, a cross-validated GPR
model was a RegressionPartitionedModel
object.
You can create a RegressionPartitionedGP
object in two ways:
- Create a cross-validated model from a GPR model object RegressionGP by using the crossval object function.
- Create a cross-validated model by using the fitrgp function and specifying one of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout.
Regardless of whether you train a full or cross-validated GPR model first, you cannot specify an ActiveSet
value in the call to fitrgp
.