
cvloss

Regression error by cross validation

Description


E = cvloss(tree) returns the cross-validated regression error (loss) for tree, a regression tree.

[E,SE] = cvloss(tree) also returns the standard error of E.

[E,SE,Nleaf] = cvloss(tree) also returns the number of leaves (terminal nodes) in tree.

[E,SE,Nleaf,bestLevel] = cvloss(tree) also returns the optimal pruning level for tree.


[___] = cvloss(tree,Name=Value) cross-validates with additional options specified by one or more name-value arguments.

Examples


Compute the cross-validation error for a default regression tree.

Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.

load carsmall
X = [Displacement Horsepower Weight];

Grow a regression tree using the entire data set.

Mdl = fitrtree(X,MPG);

Compute the cross-validation error.

rng(1); % For reproducibility
E = cvloss(Mdl)
E = 27.6976

E is the 10-fold weighted average MSE, where each fold's contribution is weighted by its number of test observations.
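As a rough sketch of how this weighted average is formed (illustrative only, not the exact internal implementation; it assumes the carsmall variables X and MPG from above, and its fold partition will generally differ from the one cvloss uses):

```matlab
% Sketch: 10-fold weighted average MSE via an explicit cvpartition loop.
rng(1); % For reproducibility
cvp = cvpartition(numel(MPG),"KFold",10);
sse = 0;                                  % squared error accumulated over all folds
for k = 1:cvp.NumTestSets
    trIdx = training(cvp,k);
    teIdx = test(cvp,k);
    mdlK  = fitrtree(X(trIdx,:),MPG(trIdx));
    yhat  = predict(mdlK,X(teIdx,:));
    sse   = sse + sum((MPG(teIdx) - yhat).^2,"omitnan");
end
E = sse/numel(MPG)    % dividing the total by n weights each fold by its test size
```

Because each fold's squared errors are summed before the single division by the total number of observations, larger test folds contribute proportionally more, which is what "weighted by the number of test observations" means.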

Apply k-fold cross-validation to find the best level at which to prune a regression tree, considering all of its subtrees.

Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.

load carsmall
X = [Displacement Horsepower Weight];

Grow a regression tree using the entire data set. View the resulting tree.

Mdl = fitrtree(X,MPG);
view(Mdl,Mode="graph")

Figure: Regression tree viewer showing the full, unpruned tree.

Compute the 5-fold cross-validation error for each subtree, excluding the two lowest and the highest pruning levels. Specify to return the best pruning level over the evaluated subtrees.

rng(1); % For reproducibility
m = max(Mdl.PruneList) - 1
m = 15
[~,~,~,bestLevel] = cvloss(Mdl,Subtrees=2:m,KFold=5)
bestLevel = 14

Of the 15 pruning levels, the best pruning level is 14.

Prune the tree to the best level. View the resulting tree.

MdlPrune = prune(Mdl,Level=bestLevel);
view(MdlPrune,Mode="graph")

Figure: Regression tree viewer showing the tree pruned to the best level.

Input Arguments


tree — Trained regression tree, specified as a RegressionTree object created using the fitrtree function.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: E = cvloss(tree,Subtrees="all") prunes all subtrees.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: E = cvloss(tree,"Subtrees","all") prunes all subtrees.

Subtrees — Pruning level, specified as a vector of nonnegative integers in ascending order or "all".

If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree and max(tree.PruneList) indicates the completely pruned tree (in other words, just the root node).

If you specify "all", then cvloss operates on all subtrees (in other words, the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList).

cvloss prunes tree to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments.

To use Subtrees, the PruneList and PruneAlpha properties of tree must be nonempty. In other words, grow tree by setting Prune="on", or prune tree using the prune function.

Example: Subtrees="all"

Data Types: single | double | char | string
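To illustrate the vector behavior described above, a short hedged sketch (it reuses the carsmall setup from the examples; exact values depend on the data and the random seed):

```matlab
% Sketch: a vector of pruning levels yields vector outputs of the same length.
load carsmall
X = [Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG);
levels = 0:2:max(Mdl.PruneList);    % every other pruning level in the sequence
rng(1); % For reproducibility
[E,SE,Nleaf] = cvloss(Mdl,Subtrees=levels);
numel(E) == numel(levels)           % each output has one entry per level in Subtrees
```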

TreeSize — Tree size, specified as one of the following:

  • "se" — The cvloss function uses the smallest tree whose cost is within one standard error of the minimum cost.

  • "min" — The cvloss function uses the minimal cost tree.

Example: TreeSize="min"

KFold — Number of cross-validation folds, specified as an integer greater than 1. The default is 10.

Example: KFold=8

Output Arguments


E — Cross-validation mean squared error (loss), returned as a numeric vector of the same length as Subtrees.

SE — Standard error of E, returned as a numeric vector of the same length as Subtrees.

Nleaf — Number of leaf nodes in the pruned subtrees, returned as a numeric vector of the same length as Subtrees. Leaf nodes are terminal nodes, which give responses rather than splits.

bestLevel — Best pruning level, as defined by the TreeSize name-value argument, returned as a numeric scalar whose value depends on TreeSize:

  • If TreeSize is "se", then bestLevel is the largest pruning level that achieves a value of E within SE of the minimum error.

  • If TreeSize is "min", then bestLevel is the smallest value in Subtrees.
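A hedged sketch comparing the two rules (it reuses the carsmall setup from the examples; the specific levels returned depend on the data and the random seed):

```matlab
% Sketch: the "se" rule tends to select a smaller tree (a larger pruning
% level) than the "min" rule, because it accepts any tree within one
% standard error of the minimum cross-validated cost.
load carsmall
X = [Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG);
rng(1); % For reproducibility
[~,~,~,lvlSE]  = cvloss(Mdl,Subtrees="all",TreeSize="se");
rng(1);
[~,~,~,lvlMin] = cvloss(Mdl,Subtrees="all",TreeSize="min");
[lvlSE lvlMin]   % lvlSE is at least lvlMin
```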

Alternatives

You can create a cross-validated tree model using crossval, and call kfoldLoss instead of cvloss. If you are going to examine the cross-validated tree more than once, then the alternative can save time.

However, unlike cvloss, kfoldLoss does not return SE, Nleaf, or bestLevel.
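A sketch of that alternative workflow (again using the carsmall setup from the examples): the partitioned model is built once by crossval, so repeated loss queries do not re-run the cross-validation.

```matlab
% Sketch: build the cross-validated model once, then query its loss.
load carsmall
X = [Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG);
rng(1); % For reproducibility
CVMdl = crossval(Mdl);       % 10-fold cross-validated model by default
E = kfoldLoss(CVMdl)         % cross-validated MSE, comparable to cvloss(Mdl)
```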


Version History

Introduced in R2011a