cvloss

Classification error by cross-validation for classification tree model

Syntax

E = cvloss(tree)

E = cvloss(tree,Name=Value)

[E,SE,Nleaf,BestLevel] = cvloss(___)

Description

E = cvloss(tree) returns the cross-validated classification error (loss) E for the trained classification tree model tree. The cvloss function uses stratified partitioning to create cross-validated sets. That is, for each fold, each partition of the data has roughly the same class proportions as in the data used to train tree.
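As a rough illustration of what stratified partitioning means (this is a sketch, not the internal code of cvloss), the following uses cvpartition, which stratifies by class when you pass it the class labels, and compares the proportion of 'g' labels in each test fold with the proportion in the full ionosphere data. The variable names overallProp and foldProp are chosen only for this illustration.

```matlab
load ionosphere                      % X: predictors, Y: cell array of labels 'b' or 'g'
rng(1);                              % For reproducibility
c = cvpartition(Y,'KFold',10);       % Stratified 10-fold partition on the class labels
overallProp = mean(strcmp(Y,'g'));   % Proportion of class 'g' in the full data
foldProp = zeros(c.NumTestSets,1);
for k = 1:c.NumTestSets
    idx = test(c,k);                 % Logical index of the test observations in fold k
    foldProp(k) = mean(strcmp(Y(idx),'g'));
end
disp([overallProp; foldProp])        % First entry is the overall proportion
```

Each fold proportion should stay close to the overall proportion, which is the property that the description above refers to.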


E = cvloss(tree,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the pruning level, tree size, and number of cross-validation samples.


[E,SE,Nleaf,BestLevel] = cvloss(___) also returns the standard error of E, the number of leaf nodes of tree, and the optimal pruning level for tree, using any of the input argument combinations in the previous syntaxes.


Examples


Compute the Cross-Validation Error


Compute the cross-validation error for a default classification tree.

Load the ionosphere data set.

load ionosphere

Grow a classification tree using the entire data set.

Mdl = fitctree(X,Y);

Compute the cross-validation error.

rng(1); % For reproducibility
E = cvloss(Mdl)

E = 
0.1140

E is the 10-fold misclassification error.

Find the Best Pruning Level Using Cross Validation


Apply k-fold cross validation to find the best level to prune a classification tree for all of its subtrees.

Load the ionosphere data set.

load ionosphere

Grow a classification tree using the entire data set. View the resulting tree.

Mdl = fitctree(X,Y);
view(Mdl,'Mode','graph')

Figure: Classification tree viewer showing the tree Mdl.

Compute the 5-fold cross-validation error for each subtree except for the highest pruning level. Specify to return the best pruning level over all subtrees.

rng(1); % For reproducibility
m = max(Mdl.PruneList) - 1

m = 
7

[E,~,~,bestLevel] = cvloss(Mdl,'Subtrees',0:m,'KFold',5)

bestLevel = 
6

Of the pruning levels 0 through 7 that were evaluated, the best pruning level is 6.

Prune the tree to the best level. View the resulting tree.

MdlPrune = prune(Mdl,'Level',bestLevel);
view(MdlPrune,'Mode','graph')

Figure: Classification tree viewer showing the pruned tree MdlPrune.

Input Arguments


`tree` — Classification tree model
`ClassificationTree` model object

Classification tree model, specified as a ClassificationTree model object trained with fitctree.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: [E,SE,Nleaf,BestLevel] = cvloss(tree,KFold=5) specifies to use 5 cross-validation samples.

`Subtrees` — Pruning level
`0` (default) | vector of nonnegative integers | `"all"`

Pruning level, specified as a vector of nonnegative integers in ascending order or "all".

If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree, and max(tree.PruneList) indicates the completely pruned tree (that is, just the root node).

If you specify "all", then cvloss operates on all subtrees (in other words, the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList).

cvloss prunes tree to each level specified by Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments.

For the function to invoke Subtrees, the properties PruneList and PruneAlpha of tree must be nonempty. In other words, grow tree by setting Prune="on" when you use fitctree, or by pruning tree using prune.

Example: Subtrees="all"

Data Types: single | double | char | string
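The following minimal sketch shows how the size of Subtrees determines the size of the outputs. It assumes the ionosphere data set and a tree grown with the default fitctree settings, as in the examples above. The variable name levels is introduced here only for display.

```matlab
load ionosphere
Mdl = fitctree(X,Y);                        % Same default tree as in the examples above
rng(1);                                     % For reproducibility
[E,SE,Nleaf] = cvloss(Mdl,Subtrees="all");  % One entry per pruning level
levels = 0:max(Mdl.PruneList);              % Equivalent explicit form of "all"
disp(table(levels(:),Nleaf(:),E(:),SE(:), ...
    VariableNames=["Level","Nleaf","E","SE"]))
```

Each row of the displayed table corresponds to one pruning level, from the full tree (level 0) down to the root node alone.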

`TreeSize` — Tree size
`"se"` (default) | `"min"`

Tree size, specified as one of these values:

"se" — cvloss returns the best pruning level (BestLevel), which corresponds to the highest pruning level with the loss within one standard deviation of the minimum (L+se, where L and se relate to the smallest value in Subtrees).
"min" — cvloss returns the best pruning level, which corresponds to the element of Subtrees with the smallest loss. This element is usually the smallest element of Subtrees.

Example: TreeSize="min"

Data Types: char | string
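To compare the two TreeSize settings, a sketch along these lines can be used. It resets the random seed before each call so that both calls should use the same cross-validation folds. The names bestSE and bestMin are illustrative.

```matlab
load ionosphere
Mdl = fitctree(X,Y);
rng(1);  % Same seed before each call so the folds match
[~,~,~,bestSE] = cvloss(Mdl,Subtrees="all",TreeSize="se");
rng(1);
[~,~,~,bestMin] = cvloss(Mdl,Subtrees="all",TreeSize="min");
fprintf("TreeSize=""se"":  best level %d\n",bestSE);
fprintf("TreeSize=""min"": best level %d\n",bestMin);
```

Because "se" selects the highest level whose loss is within the tolerance of the minimum, you would expect bestSE to be at least as large as bestMin, that is, a tree pruned at least as far.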

`KFold` — Number of cross-validation samples
10 (default) | positive integer value greater than 1

Number of cross-validation samples, specified as a positive integer value greater than 1.

Example: KFold=8

Data Types: single | double
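As a small sketch of how KFold is used, the loop below estimates the error of the unpruned tree with 5 and with 10 cross-validation samples. The data set and seed follow the examples above, and the loop variable k is illustrative.

```matlab
load ionosphere
Mdl = fitctree(X,Y);
for k = [5 10]
    rng(1);                          % For reproducibility
    [E,SE] = cvloss(Mdl,KFold=k);    % Scalars, because Subtrees defaults to 0
    fprintf("KFold=%d: E = %.4f, SE = %.4f\n",k,E,SE);
end
```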

Output Arguments


`E` — Cross-validation classification error
numeric vector

Cross-validation classification error (loss), returned as a numeric vector of the same length as Subtrees.

`SE` — Standard error
numeric vector

Standard error of E, returned as a numeric vector of the same length as Subtrees.

`Nleaf` — Number of leaf nodes
vector of integer values

Number of leaf nodes in the pruned subtrees, returned as a vector of integer values that has the same length as Subtrees. Leaf nodes are terminal nodes, which give responses, not splits.

`BestLevel` — Best pruning level
numeric scalar

Best pruning level, returned as a numeric scalar whose value depends on TreeSize:

When TreeSize is "se", cvloss returns the highest pruning level whose loss is within one standard deviation of the minimum (L+se, where L and se relate to the smallest value in Subtrees).
When TreeSize is "min", cvloss returns the element of Subtrees with the smallest loss, usually the smallest element of Subtrees.

Alternatives

You can construct a cross-validated tree model with crossval, and call kfoldLoss instead of cvloss. If you are going to examine the cross-validated tree more than once, then the alternative can save time.

However, unlike cvloss, kfoldLoss does not return SE, Nleaf, or BestLevel. kfoldLoss also does not allow you to examine any error other than the classification error.
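A minimal sketch of this alternative, assuming the same ionosphere data and default tree as in the examples above, might look like the following. The names CVMdl, L, and Lfold are illustrative.

```matlab
load ionosphere
Mdl = fitctree(X,Y);
rng(1);                                       % For reproducibility
CVMdl = crossval(Mdl,KFold=10);               % Cross-validated (partitioned) model
L = kfoldLoss(CVMdl);                         % Average classification error over folds
Lfold = kfoldLoss(CVMdl,Mode="individual");   % One loss value per fold
disp(L)
```

Because CVMdl stores the models trained on each fold, you can reuse it for further cross-validation queries without retraining.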

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2011a
