RegressionPartitionedModel

Cross-validated regression model

Description

RegressionPartitionedModel is a set of regression models trained on cross-validated folds. Estimate the quality of the regression by cross-validation using one or more “kfold” methods: kfoldPredict, kfoldLoss, and kfoldfun. Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations.

For example, suppose you cross-validate using five folds. In this case, every training fold contains roughly 4/5 of the data and every test fold contains roughly 1/5 of the data. The first model, stored in Trained{1}, was trained on X and Y with the first 1/5 excluded; the second model, stored in Trained{2}, was trained on X and Y with the second 1/5 excluded; and so on. When you call kfoldPredict, it computes predictions for the first 1/5 of the data using the first model, for the second 1/5 of the data using the second model, and so on. In short, kfoldPredict computes the response for every observation using the model trained without that observation.
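The fold logic described above can be checked by hand. The following is a minimal sketch, assuming the carsmall sample data and a five-fold partition; the variable names (cvtree, yhat, yManual) and the fixed random seed are illustrative choices, not part of the original page. It predicts each test fold with the model that was trained without that fold and compares the result with kfoldPredict.

```matlab
load carsmall
X = [Horsepower Weight];
rng(1)                                   % fix the seed so the random partition is repeatable
cvtree = fitrtree(X,MPG,'KFold',5);      % five-fold cross-validated regression tree

yhat = kfoldPredict(cvtree);             % out-of-fold predictions for every observation

% Rebuild the same predictions fold by fold.
yManual = nan(size(yhat));
for k = 1:numel(cvtree.Trained)
    idx = test(cvtree.Partition,k);      % logical index of the observations held out in fold k
    yManual(idx) = predict(cvtree.Trained{k},cvtree.X(idx,:));
end

maxDiff = max(abs(yhat - yManual))       % expected to be zero or within round-off
```

The loop uses cvtree.X rather than the original X because the model can store fewer observations than the input data when the response contains missing values.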

Creation

Description

You can create a RegressionPartitionedModel object in two ways:

Create a cross-validated model from a regression tree model object RegressionTree by using the crossval object function.
Create a cross-validated model by using the fitrtree function and specifying one of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout.
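
As a minimal sketch of the two creation paths listed above, assuming the carsmall sample data (the variable names tree, cvtree1, and cvtree2 are illustrative):

```matlab
load carsmall
X = [Horsepower Weight];

% Path 1: train a full regression tree, then cross-validate it.
tree = fitrtree(X,MPG);
cvtree1 = crossval(tree);                % uses 10 folds unless you specify otherwise

% Path 2: request cross-validation directly when training.
cvtree2 = fitrtree(X,MPG,'KFold',5);     % five folds

numFolds1 = numel(cvtree1.Trained)
numFolds2 = numel(cvtree2.Trained)
```

Both paths return a RegressionPartitionedModel object; the first is convenient when you already have a trained tree, and the second avoids storing the full model.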

Properties

`BinEdges` — Bin edges for numeric predictors
Read-only: cell array of p numeric vectors

This property is read-only.

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the NumBins name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the NumBins value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for the numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

Data Types: cell

`CategoricalPredictors` — Categorical predictor indices
Read-only: vector of positive integers | `[]`

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

`CrossValidatedModel` — Name of cross-validated model
Read-only: character vector

This property is read-only.

Name of the cross-validated model, returned as a character vector.

Data Types: char

`ModelParameters` — Parameters of cross-validated model
object

Parameters of the cross-validated model, returned as an object.

`NumObservations` — Number of observations in training data
Read-only: positive integer

This property is read-only.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double

`Partition` — Partition used in cross-validation
Read-only: `CVPartition` object

This property is read-only.

Partition used in cross-validation, returned as a CVPartition object.

`PredictorNames` — Predictor names
Read-only: cell array of character vectors

This property is read-only.

Predictor names in order of their appearance in the predictor data X, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns in X.

Data Types: cell

`ResponseName` — Response variable name
character vector

Response variable name, specified as a character vector.

Data Types: char

`ResponseTransform` — Function for transforming responses
`'none'` (default) | function handle

Function for transforming the raw response values (mean squared error), specified as a function handle or 'none'. The default 'none' means no transformation; equivalently, 'none' means @(x)x. A function handle must accept a matrix of response values and return a matrix of the same size.

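Add or change a ResponseTransform function using dot notation, where cvtree is the cross-validated model and function is a placeholder for the name of your transformation function:

cvtree.ResponseTransform = @function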

Data Types: char | function_handle
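
One possible use of ResponseTransform, sketched under the assumption that the model is trained on a log-transformed response: setting the property to @exp returns cross-validated predictions on the original scale. The carsmall data, the variable names, and the choice of @exp are illustrative and do not come from the original page.

```matlab
load carsmall
X = [Horsepower Weight];
rng(1)                                       % repeatable partition
cvtree = fitrtree(X,log(MPG),'KFold',5);     % model the logarithm of the response

yhatLog = kfoldPredict(cvtree);              % predictions on the log scale

cvtree.ResponseTransform = @exp;             % map predictions back to the MPG scale
yhatMPG = kfoldPredict(cvtree);              % predictions after the transformation
```

Note that the stored response Y remains on the log scale, so a loss computed after setting the transformation compares transformed predictions with untransformed responses.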

`Trained` — Trained learners
cell array of compact regression models

Trained learners, returned as a cell array of compact regression models.

Data Types: cell

`W` — Scaled observation weights
Read-only: numeric vector

This property is read-only.

Scaled observation weights, returned as a numeric vector. W has length n, the number of rows in the training data. The sum of the elements of W is 1.

Data Types: double

`X` — Predictor values
Read-only: real matrix | table

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

`Y` — Response values
Read-only: numeric vector

This property is read-only.

Response values corresponding to the observations in X, returned as a numeric vector. Each row of Y represents the response for the corresponding row of X.

Object Functions

`gather`	Gather properties of Statistics and Machine Learning Toolbox object from GPU
`kfoldLoss`	Loss for cross-validated partitioned regression model
`kfoldPredict`	Predict responses for observations in cross-validated regression model
`kfoldfun`	Cross-validate function for regression
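
The following is a brief sketch of kfoldLoss and kfoldfun, assuming the carsmall sample data; the helper maeFun and the variable names are hypothetical. It computes the mean squared error for each fold and then a custom statistic (mean absolute error) for each fold.

```matlab
load carsmall
X = [Horsepower Weight];
rng(1)                                       % repeatable partition
cvtree = fitrtree(X,MPG,'KFold',5);

% Mean squared error for each fold instead of the average over folds.
perFoldMSE = kfoldLoss(cvtree,'Mode','individual');

% Custom per-fold statistic. kfoldfun calls the function once per fold with
% the compact model for that fold and the training and test data.
maeFun = @(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest) ...
    mean(abs(Ytest - predict(CMP,Xtest)));
perFoldMAE = kfoldfun(cvtree,maeFun);
```

Per-fold values are useful for judging how much the error estimate varies across the partition.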

Examples

Evaluate Cross-Validation Error

Load the sample data. Create a variable X containing the Horsepower and Weight data.

load carsmall
X = [Horsepower Weight];

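Construct a cross-validated regression tree using the sample data. Because 'crossval' is 'on', fitrtree returns a RegressionPartitionedModel object, which uses 10-fold cross-validation by default.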

cvtree = fitrtree(X,MPG,'crossval','on');

Evaluate the cross-validation error of the carsmall data using Horsepower and Weight as predictor variables for mileage (MPG).

L = kfoldLoss(cvtree)

L = 
25.5338

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

RegressionPartitionedModel can be a cross-validated regression tree trained by using fitrtree with GPU array input arguments.
The object functions of a RegressionPartitionedModel model fully support GPU arrays.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2011a
