
resubLoss

Resubstitution classification loss for classification ensemble model

Description


L = resubLoss(ens) returns the classification loss L by resubstitution for the trained classification ensemble model ens, using the training data stored in ens.X and the corresponding true class labels stored in ens.Y. That is, resubLoss computes the loss for the same data that fitcensemble uses to create ens.

The classification loss (L) is a resubstitution quality measure. Its interpretation depends on the loss function (LossFun) and weighting scheme, but in general, better classifiers yield smaller classification loss values. The default LossFun value is "classiferror" (misclassification rate in decimal).

L = resubLoss(ens,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the loss function, the indices of the weak learners to use for calculating the loss, and the aggregation level for the output.

Input Arguments


ens — Classification ensemble model
Classification ensemble model, specified as a ClassificationEnsemble model object trained with fitcensemble.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: resubLoss(ens,LossFun="exponential",UseParallel=true) specifies to use an exponential loss function, and to perform computations in parallel.

Learners — Indices of weak learners
Indices of weak learners in the ensemble to use in resubLoss, specified as a vector of positive integers in the range [1,ens.NumTrained]. By default, resubLoss uses all learners. For an illustration, see the sketch below.

Example: Learners=[1 2 4]

Data Types: single | double
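
For example, this sketch (assuming an ensemble ens trained with fitcensemble, as in the Examples section) computes the loss using only the first half of the trained learners:

halfIdx = 1:floor(ens.NumTrained/2);    % indices of the first half of the learners
Lhalf = resubLoss(ens,Learners=halfIdx) % loss using only those learners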

LossFun — Loss function
Loss function, specified as a built-in loss function name or a function handle.

The following table describes the values for the built-in loss functions.

Value            Description
"binodeviance"   Binomial deviance
"classifcost"    Observed misclassification cost
"classiferror"   Misclassification rate in decimal
"exponential"    Exponential loss
"hinge"          Hinge loss
"logit"          Logistic loss
"mincost"        Minimal expected misclassification cost (for classification scores that are posterior probabilities)
"quadratic"      Quadratic loss

  • "mincost" is appropriate for classification scores that are posterior probabilities.

  • Bagged and subspace ensembles return posterior probabilities by default (ens.Method is "Bag" or "Subspace").

  • To use posterior probabilities as classification scores when the ensemble method is "AdaBoostM1", "AdaBoostM2", "GentleBoost", or "LogitBoost", you must specify the double-logit score transform by entering the following (a fuller sketch appears after this list):

    ens.ScoreTransform = "doublelogit";

  • For all other ensemble methods, the software does not support posterior probabilities as classification scores.
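
For example, the following sketch (assuming a boosted ensemble ens, such as the AdaBoostM2 model trained in the Examples section) applies the double-logit transform and then computes the minimal expected misclassification cost:

ens.ScoreTransform = "doublelogit";     % scores now behave as posterior probabilities
Lmin = resubLoss(ens,LossFun="mincost") % minimal expected misclassification cost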

You can specify your own function using function handle notation (a minimal sketch appears after the following list). Suppose that n is the number of observations in ens.X, and K is the number of distinct classes (numel(ens.ClassNames), where ens is the input model). Your function must have the signature

lossvalue = lossfun(C,S,W,Cost)

where:

  • The output argument lossvalue is a scalar.

  • You specify the function name (lossfun).

  • C is an n-by-K logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in ens.ClassNames.

    Create C by setting C(p,q) = 1, if observation p is in class q, for each row. Set all other elements of row p to 0.

  • S is an n-by-K numeric matrix of classification scores, similar to the output of predict. The column order corresponds to the class order in ens.ClassNames.

  • W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes the weights to sum to 1.

  • Cost is a K-by-K numeric matrix of misclassification costs. For example, Cost = ones(K) - eye(K) specifies a cost of 0 for correct classification and 1 for misclassification.
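
For illustration, here is a minimal sketch of a custom loss function that computes a weighted misclassification rate. The name weightedError is a placeholder, not a toolbox function; save it as weightedError.m on the MATLAB path.

function lossvalue = weightedError(C,S,W,~)
% Weighted misclassification rate; the Cost input is unused in this sketch.
[~,predicted] = max(S,[],2);   % index of the highest-scoring class per observation
[~,actual] = max(C,[],2);      % index of the true class per observation
lossvalue = sum(W.*(predicted ~= actual)); % W is already normalized to sum to 1
end

You can then pass the function as a handle:

L = resubLoss(ens,LossFun=@weightedError)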

For more details on loss functions, see Classification Loss.

Example: LossFun="binodeviance"

Example: LossFun=@lossfun

Data Types: char | string | function_handle

Mode — Aggregation level for output
Aggregation level for the output, specified as "ensemble", "individual", or "cumulative".

Value          Description
"ensemble"     The output is a scalar value, the loss for the entire ensemble.
"individual"   The output is a vector with one element per trained learner.
"cumulative"   The output is a vector in which element J is obtained by using learners 1:J from the input list of learners.

Example: Mode="individual"

Data Types: char | string
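
For example, this sketch (assuming the ensemble ens trained in the Examples section) computes the cumulative loss and plots it against the number of weak learners:

Lcum = resubLoss(ens,Mode="cumulative"); % element J uses learners 1:J
plot(Lcum)
xlabel("Number of weak learners")
ylabel("Resubstitution loss")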

UseParallel — Flag to run in parallel
Flag to run in parallel, specified as a numeric or logical 1 (true) or 0 (false). If you specify UseParallel=true, the resubLoss function executes for-loop iterations by using parfor. The loop runs in parallel when you have Parallel Computing Toolbox™.

Example: UseParallel=true

Data Types: logical
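
For example, this sketch computes the per-learner losses in parallel (assuming an ensemble ens; without Parallel Computing Toolbox, the loop simply runs serially):

Lind = resubLoss(ens,Mode="individual",UseParallel=true);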

Examples


Load Fisher's iris data set.

load fisheriris

Train a classification ensemble of 100 decision trees using AdaBoostM2. Specify tree stumps as the weak learners.

t = templateTree(MaxNumSplits=1);
ens = fitcensemble(meas,species,Method="AdaBoostM2",Learners=t);

Estimate the resubstitution classification error.

loss = resubLoss(ens)
loss = 0.0333
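
As a follow-up sketch (not part of the original example), you can evaluate the same ensemble under a different built-in loss function, here the exponential loss:

Lexp = resubLoss(ens,LossFun="exponential")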


Version History

Introduced in R2011a
