# cvloss

Regression error by cross validation

## Syntax

`E = cvloss(tree)`

`[E,SE] = cvloss(tree)`

`[E,SE,Nleaf] = cvloss(tree)`

`[E,SE,Nleaf,bestLevel] = cvloss(tree)`

`[___] = cvloss(tree,Name=Value)`

## Description


`E = cvloss(tree)` returns the cross-validated regression error (loss) for `tree`, a regression tree.

`[E,SE] = cvloss(tree)` also returns the standard error of `E`.

`[E,SE,Nleaf] = cvloss(tree)` also returns the number of leaves (terminal nodes) in `tree`.

`[E,SE,Nleaf,bestLevel] = cvloss(tree)` also returns the optimal pruning level for `tree`.


`[___] = cvloss(tree,Name=Value)` cross-validates with additional options specified by one or more name-value arguments.

## Examples


Compute the cross-validation error for a default regression tree.

Load the `carsmall` data set. Consider `Displacement`, `Horsepower`, and `Weight` as predictors of the response `MPG`.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
```

Grow a regression tree using the entire data set.

`Mdl = fitrtree(X,MPG);`

Compute the cross-validation error.

```matlab
rng(1); % For reproducibility
E = cvloss(Mdl)
```

```
E = 27.6976
```

`E` is the 10-fold, weighted average MSE, where each fold's MSE is weighted by the number of test observations in that fold.
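As an illustration of this weighting (not part of the `cvloss` interface), the per-fold errors combine as follows, where `mseFold` and `nTest` are hypothetical vectors of per-fold MSEs and per-fold test-set sizes:

```matlab
% Hypothetical sketch: forming a weighted average of per-fold MSEs.
% mseFold(k) is the MSE on the test observations of fold k;
% nTest(k) is the number of test observations in fold k.
E = sum(nTest .* mseFold) / sum(nTest);
```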

Apply k-fold cross-validation to find the best level at which to prune a regression tree, searching over all of its subtrees.

Load the `carsmall` data set. Consider `Displacement`, `Horsepower`, and `Weight` as predictors of the response `MPG`.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
```

Grow a regression tree using the entire data set. View the resulting tree.

```matlab
Mdl = fitrtree(X,MPG);
view(Mdl,Mode="graph")
```

Compute the 5-fold cross-validation error for each subtree except the two lowest and the highest pruning levels. Specify to return the best pruning level over all subtrees.

```matlab
rng(1); % For reproducibility
m = max(Mdl.PruneList) - 1
```

```
m = 15
```

```matlab
[~,~,~,bestLevel] = cvloss(Mdl,Subtrees=2:m,KFold=5)
```

```
bestLevel = 14
```

Of the `15` pruning levels, the best pruning level is `14`.

Prune the tree to the best level. View the resulting tree.

```matlab
MdlPrune = prune(Mdl,Level=bestLevel);
view(MdlPrune,Mode="graph")
```

## Input Arguments


`tree` — Trained regression tree, specified as a `RegressionTree` object created using the `fitrtree` function.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: `E = cvloss(tree,Subtrees="all")` prunes all subtrees.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `E = cvloss(tree,"Subtrees","all")` prunes all subtrees.

`Subtrees` — Pruning level, specified as a vector of nonnegative integers in ascending order or `"all"`.

If you specify a vector, then all elements must be at least `0` and at most `max(tree.PruneList)`. `0` indicates the full, unpruned tree and `max(tree.PruneList)` indicates the completely pruned tree (in other words, just the root node).

If you specify `"all"`, then `cvloss` operates on all subtrees (in other words, the entire pruning sequence). This specification is equivalent to using `0:max(tree.PruneList)`.

`cvloss` prunes `tree` to each level indicated in `Subtrees`, and then estimates the corresponding output arguments. The size of `Subtrees` determines the size of some output arguments.

To use `Subtrees`, the `PruneList` and `PruneAlpha` properties of `tree` must be nonempty. In other words, grow `tree` by setting `Prune="on"` when calling `fitrtree`, or prune `tree` using the `prune` function.

Example: `Subtrees="all"`

Data Types: `single` | `double` | `char` | `string`
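One way to guard against an empty pruning sequence before specifying `Subtrees` is to prune the tree first; a minimal sketch, assuming `Mdl` is a `RegressionTree`:

```matlab
% Ensure the pruning sequence exists before using Subtrees (sketch).
if isempty(Mdl.PruneList)
    Mdl = prune(Mdl);   % populates PruneList and PruneAlpha
end
E = cvloss(Mdl,Subtrees="all");
```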

`TreeSize` — Tree size, specified as one of the following:

• `"se"` — The `cvloss` function uses the smallest tree whose cost is within one standard error of the minimum cost.

• `"min"` — The `cvloss` function uses the minimal cost tree.

Example: `TreeSize="min"`

`KFold` — Number of folds to use in a cross-validated tree, specified as a positive integer greater than 1.

Example: `KFold=8`

## Output Arguments


`E` — Cross-validation mean squared error (loss), returned as a numeric vector of the same length as `Subtrees`.

`SE` — Standard error of `E`, returned as a numeric vector of the same length as `Subtrees`.

`Nleaf` — Number of leaf nodes in the pruned subtrees, returned as a numeric vector of the same length as `Subtrees`. Leaf nodes are terminal nodes, which give responses, not splits.

`bestLevel` — Best pruning level, as defined in the `TreeSize` name-value argument, returned as a numeric scalar whose value depends on `TreeSize`:

• If `TreeSize` is `"se"`, then `bestLevel` is the largest pruning level that achieves a value of `E` within `SE` of the minimum error.

• If `TreeSize` is `"min"`, then `bestLevel` is the smallest value in `Subtrees`.
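The `"se"` rule above can be sketched by hand from the `cvloss` outputs. This is an illustration of the selection criterion, not the internal implementation:

```matlab
% Sketch of the one-standard-error rule over the full pruning sequence.
[E,SE] = cvloss(Mdl,Subtrees="all");
[minE,idx] = min(E);
% Largest pruning level whose error is within one SE of the minimum.
% Subtrees="all" corresponds to levels 0:max(Mdl.PruneList), so subtract 1.
bestLevel = max(find(E <= minE + SE(idx))) - 1;
```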

## Alternatives

You can create a cross-validated tree model using `crossval`, and call `kfoldLoss` instead of `cvloss`. If you are going to examine the cross-validated tree more than once, then the alternative can save time.

However, unlike `cvloss`, `kfoldLoss` does not return `SE`, `Nleaf`, or `bestLevel`.
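A sketch of this alternative workflow, using the same `carsmall` predictors as the examples above:

```matlab
% Cross-validate once, then reuse the partitioned model (sketch).
load carsmall
X = [Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG);
CVMdl = crossval(Mdl);    % 10-fold cross-validated tree
E = kfoldLoss(CVMdl);     % cross-validated MSE
```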

## Version History

Introduced in R2011a