Classification edge for cross-validated classification model
E = kfoldEdge(CVMdl) returns the classification edge obtained by the cross-validated classification model CVMdl. For every fold, kfoldEdge computes the classification edge for validation-fold observations using a classifier trained on training-fold observations. CVMdl.X and CVMdl.Y contain both sets of observations.
Estimate k-fold Edge of Classifier
Compute the k-fold edge for a model trained on Fisher's iris data.
Load Fisher's iris data set.
load fisheriris
Train a classification tree classifier.
tree = fitctree(meas,species);
Cross-validate the classifier using 10-fold cross-validation.
cvtree = crossval(tree);
Compute the k-fold edge.
edge = kfoldEdge(cvtree)
edge = 0.8578
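To inspect the per-fold edges rather than the overall average, you can pass the Mode and Folds name-value arguments described later on this page; a brief sketch continuing the example above:

```matlab
% Compute one edge value per fold instead of the overall average.
% 'Mode','individual' returns a 10-by-1 vector for 10-fold cross-validation.
edgePerFold = kfoldEdge(cvtree,'Mode','individual');

% Restrict the computation to a subset of folds, for example folds 1 through 5.
edgeSubset = kfoldEdge(cvtree,'Folds',1:5);
```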
Compute K-Fold Edge of Held-Out Observations
Compute the k-fold edge for an ensemble trained on the Fisher iris data.
Load the sample data set.
load fisheriris
Train an ensemble of 100 boosted classification trees.
t = templateTree('MaxNumSplits',1); % Weak learner template tree object
ens = fitcensemble(meas,species,'Learners',t);
Create a cross-validated ensemble from
ens and find the classification edge.
rng(10,'twister') % For reproducibility
cvens = crossval(ens);
E = kfoldEdge(cvens)
E = 3.2033
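For a cross-validated ensemble, 'Mode','cumulative' shows how the edge evolves as weak learners are added; a sketch continuing the example above:

```matlab
% Cumulative edge: element j is the average edge over all folds,
% obtained by using ensembles trained with weak learners 1:j.
Ecum = kfoldEdge(cvens,'Mode','cumulative');

% Plot the cumulative edge against the number of weak learners.
plot(Ecum)
xlabel('Number of weak learners')
ylabel('Cumulative k-fold edge')
```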
CVMdl — Cross-validated partitioned classifier
ClassificationPartitionedModel object | ClassificationPartitionedEnsemble object | ClassificationPartitionedGAM object
Cross-validated partitioned classifier, specified as a ClassificationPartitionedModel, ClassificationPartitionedEnsemble, or ClassificationPartitionedGAM object. You can create the object in two ways:
Pass a trained classification model listed in the following table to the crossval object function.
Train a classification model using a function listed in the following table and specify one of the cross-validation name-value arguments for the function.
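As a sketch, both approaches produce a cross-validated model (shown here with a classification tree):

```matlab
load fisheriris

% Way 1: train a classification model, then pass it to crossval.
mdl = fitctree(meas,species);
cv1 = crossval(mdl);                     % 10-fold cross-validation by default

% Way 2: specify a cross-validation name-value argument at training time.
cv2 = fitctree(meas,species,'KFold',10);
```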
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name in quotes.
Example: kfoldEdge(CVMdl,'Folds',[1 2 3 5]) specifies to use the first, second, third, and fifth folds to compute the classification edge, but to exclude the fourth fold.
Folds — Fold indices to use
1:CVMdl.KFold (default) | positive integer vector
Fold indices to use, specified as a positive integer vector. The elements of Folds must be within the range from 1 to CVMdl.KFold. The software uses only the folds specified in Folds.
Example: 'Folds',[1 4 10]
IncludeInteractions — Flag to include interaction terms
Flag to include interaction terms of the model, specified as true or false. This argument is valid only for a generalized additive model (GAM). That is, you can specify this argument only when CVMdl is a ClassificationPartitionedGAM object.
The default value is true if the models in CVMdl contain interaction terms. The value must be false if the models do not contain interaction terms.
Mode — Aggregation level for output
'average' (default) | 'individual' | 'cumulative'
Aggregation level for the output, specified as 'average', 'individual', or 'cumulative'.
|'average'|The output is a scalar average over all folds.|
|'individual'|The output is a vector of length k containing one value per fold, where k is the number of folds.|
|'cumulative'|The output is a vector of cumulative averages over all folds. If you want to specify this value, CVMdl must be a ClassificationPartitionedEnsemble object or a ClassificationPartitionedGAM object.|
E — Classification edge
numeric scalar | numeric column vector
Classification edge, returned as a numeric scalar or numeric column vector.
If Mode is 'average', then E is the average classification edge over all folds.
If Mode is 'individual', then E is a k-by-1 numeric column vector containing the classification edge for each fold, where k is the number of folds.
If Mode is 'cumulative' and CVMdl is a ClassificationPartitionedEnsemble object, then E is a min(CVMdl.NumTrainedPerFold)-by-1 numeric column vector. Each element j is the average classification edge over all folds that the function obtains by using ensembles trained with weak learners 1:j.
If Mode is 'cumulative' and CVMdl is a ClassificationPartitionedGAM object, then the output value depends on the IncludeInteractions value:
If IncludeInteractions is false, then E is a (1 + min(NumTrainedPerFold.PredictorTrees))-by-1 numeric column vector. The first element of E is the average classification edge over all folds that is obtained using only the intercept (constant) term. The (j + 1)th element of E is the average edge obtained using the intercept term and the first j predictor trees per linear term.
If IncludeInteractions is true, then E is a (1 + min(NumTrainedPerFold.InteractionTrees))-by-1 numeric column vector. The first element of E is the average classification edge over all folds that is obtained using the intercept (constant) term and all predictor trees per linear term. The (j + 1)th element of E is the average edge obtained using the intercept term, all predictor trees per linear term, and the first j interaction trees per interaction term.
The classification edge is the weighted mean of the classification margins.
One way to choose among multiple classifiers, for example to perform feature selection, is to choose the classifier that yields the greatest edge.
The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class. The classification margin for multiclass classification is the difference between the classification score for the true class and the maximal score for the false classes.
If the margins are on the same scale (that is, the score values are based on the same score transformation), then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.
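Because the edge is the weighted mean of the margins, you can reproduce the result of kfoldEdge from kfoldMargin when the observation weights are uniform; a minimal sketch:

```matlab
load fisheriris
cvtree = crossval(fitctree(meas,species));

m = kfoldMargin(cvtree);   % one classification margin per observation
e = mean(m);               % weighted mean reduces to mean for uniform weights

% e matches kfoldEdge(cvtree) up to numerical precision.
```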
kfoldEdge computes the classification edge as described in the corresponding edge object function. For a model-specific description, see the edge function reference page for the appropriate classifier in the following list:
Discriminant analysis classifier
Generalized additive model classifier
k-nearest neighbor classifier
Naive Bayes classifier
Neural network classifier
Support vector machine classifier
Binary decision tree for multiclass classification
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
This function fully supports GPU arrays for the following cross-validated model objects:
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2011a
R2023b: Observations with missing predictor values are used in resubstitution and cross-validation computations
Starting in R2023b, the following classification model object functions use observations with missing predictor values as part of resubstitution ("resub") and cross-validation ("kfold") computations for classification edges, losses, margins, and predictions.
In previous releases, the software omitted observations with missing predictor values from the resubstitution and cross-validation computations.