RegressionPartitionedEnsemble

Cross-validated regression ensemble

Description

RegressionPartitionedEnsemble is a set of regression ensembles trained on cross-validated folds. You can estimate the quality of the regression by using one or more kfold functions: kfoldfun, kfoldLoss, and kfoldPredict.

Each kfold function uses models trained on training-fold (in-fold) observations to predict the response for validation-fold (out-of-fold) observations. For example, when you use kfoldPredict with a k-fold cross-validated model, the software estimates a response for every observation using the model trained without that observation. For more information, see Partitioned Models.
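As a minimal sketch of the out-of-fold prediction described above, the following code trains a cross-validated ensemble directly and calls kfoldPredict. The carsmall predictors, the seed, and the choice of 5 folds are illustrative assumptions, not requirements.

```matlab
% Sketch: out-of-fold predictions from a cross-validated ensemble.
load carsmall
X = [Cylinders Displacement Horsepower Weight];
y = MPG;

rng(1,"twister") % Arbitrary seed, used only for reproducibility
cvMdl = fitrensemble(X,y,KFold=5); % 5 folds is an illustrative choice

% Each prediction comes from the fold model trained without that row.
yHat = kfoldPredict(cvMdl);

% yHat has one element per observation stored in the model.
disp(size(yHat))
```

Because every element of yHat is produced by a model that never saw the corresponding observation, comparing yHat with cvMdl.Y gives an estimate of generalization error rather than training error.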

Creation

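You can create a RegressionPartitionedEnsemble object in two ways:

Create a cross-validated model from a RegressionEnsemble object by using the crossval object function.

Create a cross-validated model by using the fitrensemble function and specifying one of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout.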
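The following sketch shows two common creation routes side by side: cross-validating an existing ensemble with crossval, and requesting cross-validation at training time with fitrensemble. The data set, variable names, and the 5-fold setting are assumptions chosen for illustration.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
y = MPG;

rng(1,"twister") % Arbitrary seed, used only for reproducibility

% Route 1: train a full ensemble, then cross-validate it.
mdl = fitrensemble(X,y);
cvMdl1 = crossval(mdl,KFold=5);

% Route 2: request cross-validation directly when training.
cvMdl2 = fitrensemble(X,y,KFold=5);

% Both results are partitioned ensembles with the requested fold count.
disp(cvMdl1.KFold)
disp(cvMdl2.KFold)
```

The first route is convenient when you also need the full ensemble trained on all the data. The second avoids fitting that extra model when only the cross-validation results matter.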

Properties

Cross-Validation Properties

CrossValidatedModel

This property is read-only.

Name of the cross-validated model, returned as a character vector.

Data Types: char

KFold

This property is read-only.

Number of folds in the cross-validated ensemble, returned as a positive integer.

Data Types: double

ModelParameters

This property is read-only.

Parameters of the cross-validated ensemble, returned as an object.

NumTrainedPerFold

This property is read-only.

Number of weak learners used to train each trained learner in Trained, returned as a positive integer.

Data Types: double

Partition

This property is read-only.

Partition used in the cross-validation, returned as a cvpartition object.

Trainable

This property is read-only.

Trained learners, returned as a KFold-length cell array of full ensembles. Every ensemble is full, meaning it contains its training data and weights.

Data Types: cell

Trained

This property is read-only.

Trained learners, returned as a KFold-length cell array of compact ensembles.

Data Types: cell

Other Regression Properties

BinEdges

This property is read-only.

Bin edges for numeric predictors, returned as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the NumBins name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the NumBins value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for the numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

Data Types: cell

CategoricalPredictors

This property is read-only.

Categorical predictor indices, returned as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

NumObservations

This property is read-only.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double

PredictorNames

This property is read-only.

Predictor names in order of their appearance in the predictor data X, returned as a cell array of character vectors. The length of PredictorNames is equal to the number of columns in X.

Data Types: cell

ResponseName

This property is read-only.

Name of the response variable, returned as a character vector.

Data Types: char

ResponseTransform

Function for transforming the predicted response values, specified as "none" or a function handle. "none" means no transformation; equivalently, "none" means @(x)x. A function handle must accept a matrix of response values and return a matrix of the same size.

To change the function for transforming the predicted response values, use dot notation. For example, for a model Mdl and a function that you define (named myFunction here as a placeholder), you can specify:

Mdl.ResponseTransform = @myFunction;

Data Types: char | string | function_handle
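As a hedged illustration of setting the transform with dot notation, the sketch below fits a cross-validated ensemble to a log-scaled response and then assigns @exp as the transform. Modeling log(MPG) is an assumption made only for this example, as are the seed and fold count.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
y = log(MPG); % Log-scale response (illustrative modeling choice)

rng(1,"twister") % Arbitrary seed, used only for reproducibility
cvMdl = fitrensemble(X,y,KFold=5);

% exp accepts a matrix and returns a matrix of the same size,
% so it satisfies the stated requirement for a transform function.
cvMdl.ResponseTransform = @exp;

yHat = kfoldPredict(cvMdl);
```

If the transform is applied to predicted responses as the property description indicates, yHat is on the original MPG scale rather than the log scale. Compare a few values against MPG to confirm this on your release.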

W

This property is read-only.

Scaled weights in the ensemble, returned as a numeric vector. W has length n, the number of rows in the training data. The sum of the elements of W is 1.

Data Types: double

X

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

Y

This property is read-only.

Response data, returned as a numeric column vector with the same number of rows as X. Each entry in Y is the response to the data in the corresponding row of X.

Data Types: double

Object Functions

gather        Gather properties of Statistics and Machine Learning Toolbox object from GPU
kfoldLoss     Loss for cross-validated partitioned regression model
kfoldPredict  Predict responses for observations in cross-validated regression model
kfoldfun      Cross-validate function for regression
resume        Resume training of cross-validated regression ensemble model
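Of the functions listed above, kfoldfun is the most general: it evaluates a function that you supply on each fold. The sketch below computes a per-fold mean absolute error. The handle name maeFun, the data set, and the fold count are hypothetical choices for illustration. The seven-argument signature (compact fold model, training predictors, responses, and weights, then the same three for the validation fold) is the form that kfoldfun expects.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
y = MPG;

rng(1,"twister") % Arbitrary seed, used only for reproducibility
cvMdl = fitrensemble(X,y,KFold=5);

% Hypothetical per-fold metric: mean absolute error on the validation fold.
% CMP is the compact model trained on the other folds.
maeFun = @(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest) ...
    mean(abs(Ytest - predict(CMP,Xtest)));

foldMAE = kfoldfun(cvMdl,maeFun); % One row per fold
disp(foldMAE)
```

Use kfoldLoss when mean squared error or another built-in loss is sufficient, and kfoldfun when you need a metric that kfoldLoss does not provide.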

Examples

Create a partitioned regression ensemble using 10-fold cross-validation, and examine the loss for each fold.

Load the carsmall data set.

load carsmall

Create a subset of variables.

X = [Cylinders Displacement Horsepower Weight];
y = MPG;

Create a regression ensemble model.

rens = fitrensemble(X,y);

Create a 10-fold cross-validated ensemble from rens.

rng(10,"twister") % For reproducibility
cvrens = crossval(rens);

Examine the losses across the folds.

L = kfoldLoss(cvrens,Mode="individual")
L = 10×1

   21.4489
   48.4388
   28.2560
   17.5354
   29.9441
   49.5254
   51.2372
   31.0152
   31.6388
    8.9607

L is a vector containing the loss for each trained learner in the ensemble.
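As a follow-on sketch, the same cross-validated ensemble can be summarized with the other kfoldLoss modes. The code repeats the setup so that it runs on its own. The variable names Lavg and Lcum are illustrative.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
y = MPG;

rens = fitrensemble(X,y);
rng(10,"twister") % For reproducibility
cvrens = crossval(rens);

% Default mode: a single loss averaged over all folds.
Lavg = kfoldLoss(cvrens);

% Cumulative mode: element j is the cross-validated loss
% obtained by using weak learners 1 through j.
Lcum = kfoldLoss(cvrens,Mode="cumulative");

plot(Lcum)
xlabel("Number of weak learners")
ylabel("Cross-validated MSE")
```

Plotting the cumulative loss is a common way to judge whether the ensemble has more weak learners than it needs. One would expect the last element of Lcum to agree with Lavg, because both use every weak learner.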

Version History

Introduced in R2011a