RegressionEnsemble

Ensemble regression

Description

RegressionEnsemble combines a set of trained weak learner models and data on which these learners were trained. It can predict ensemble response for new data by aggregating predictions from its weak learners.

Creation

Description

Create a regression ensemble object using fitrensemble.

Properties

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
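
To see how discretize assigns bin indices in the loop above, the following minimal sketch applies the same call to a small vector. The values in x and edges are hypothetical and do not come from a trained model.

```matlab
% Hypothetical predictor values and bin edges (not from a trained model).
x = [1.5; 3.2; NaN; 7.8];
edges = [2; 5];

% Same call as in the loop above: pad the edges with -Inf and Inf so that
% every non-missing value falls into some bin.
xbinned = discretize(x,[-inf; edges; inf]);
disp(xbinned')   % bin indices 1, 2, NaN, 3
```

Two edges produce three bins, and the NaN entry stays NaN, which matches the description of Xbinned above.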

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

How the ensemble combines weak learner weights, returned as either 'WeightedAverage' or 'WeightedSum'.

Data Types: char

Expanded predictor names, returned as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

Fit information, returned as a numeric array. The FitInfoDescription property describes the content of this array.

Data Types: double

Description of the information in FitInfo, returned as a character vector or cell array of character vectors.

Data Types: char | cell

Description of the cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the OptimizeHyperparameters name-value pair is nonempty at creation. The value depends on the Optimizer setting of the HyperparameterOptimizationOptions name-value pair at creation:

• 'bayesopt' (default) — Object of class BayesianOptimization

• 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
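
As a rough sketch of the grid-search case, the following code optimizes one hyperparameter and inspects the stored results. The choice of NumLearningCycles, the small NumGridDivisions value, and the variable names opts and results are assumptions made only to keep the illustration short.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;

rng(1)  % For a reproducible cross-validation partition
% Assumed settings: grid search over 3 grid points, no plots or printed output.
opts = struct('Optimizer','gridsearch','NumGridDivisions',3, ...
    'ShowPlots',false,'Verbose',0);
Mdl = fitrensemble(X,Y,'OptimizeHyperparameters',{'NumLearningCycles'}, ...
    'HyperparameterOptimizationOptions',opts);

% With 'gridsearch', the results are stored as a table.
results = Mdl.HyperparameterOptimizationResults;
disp(class(results))
```

With the default 'bayesopt' optimizer, the same property would instead hold a BayesianOptimization object.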

Names of weak learners in ensemble, returned as a cell array of character vectors. The name of each learner appears just once. For example, if you have an ensemble of 100 trees, LearnerNames is {'Tree'}.

Data Types: cell

Method that fitrensemble uses to create the ensemble, returned as a character vector.

Data Types: char

Parameters used in training the ensemble, returned as an EnsembleParams object. The properties of ModelParameters include the type of ensemble, either 'classification' or 'regression', the Method used to create the ensemble, and other parameters, depending on the ensemble.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double
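
The following sketch compares the number of input rows with NumObservations for the carsmall data used in the example on this page, where MPG contains missing values. The variable names nRows and nMissingY are illustrative.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;

Mdl = fitrensemble(X,Y);

nRows = size(X,1);           % rows passed to fitrensemble
nMissingY = sum(isnan(Y));   % rows whose response is missing
fprintf('%d rows, %d missing responses, %d observations used\n', ...
    nRows,nMissingY,Mdl.NumObservations)
```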

Number of trained weak learners in the ensemble, returned as a positive integer.

Data Types: double

Predictor names, specified as a cell array of character vectors. The order of the entries in PredictorNames is the same as in the training data.

Data Types: cell

Reason that fitrensemble stopped adding weak learners to the ensemble, returned as a character vector.

Data Types: char

Result of using the regularize method on the ensemble, returned as a structure. Use Regularization with shrink to lower resubstitution error and shrink the ensemble.

Data Types: struct
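
A minimal sketch of the regularize-then-shrink workflow follows. The lambda values are arbitrary, and choosing the second weight column is an assumption made for illustration rather than a recommended setting.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;

ens = fitrensemble(X,Y,'Method','LSBoost');

% Fill the Regularization property. The lambda values here are arbitrary.
ens = regularize(ens,'lambda',[0.001 0.1]);
disp(ens.Regularization)

% Shrink the ensemble using the weights found for the second lambda value.
cmp = shrink(ens,'weightcolumn',2);
fprintf('Learners before: %d, after: %d\n',ens.NumTrained,cmp.NumTrained)
```

In practice, cvshrink can help pick lambda and threshold values before calling shrink.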

Name of the response variable, returned as a character vector.

Data Types: char

Function for transforming raw response values, specified as a function handle or function name. The default is 'none', which means @(y)y, or no transformation. The function should accept a vector (the original response values) and return a vector of the same size (the transformed response values).

Example: Suppose you create a function handle that applies an exponential transformation to an input vector by using myfunction = @(y)exp(y). Then, you can specify the response transformation as 'ResponseTransform',myfunction.

Data Types: char | string | function_handle
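
One possible use of this property, sketched below, is to train on a log-transformed response and then set ResponseTransform so that predict returns values on the original scale. Training on log(Y) is an assumption made for the illustration and is not part of the example on this page.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;

% Fit on the log scale.
Mdl = fitrensemble(X,log(Y));
yLog = predict(Mdl,X(1:3,:));   % predictions on the log scale

% Map predictions back to the original scale with exp.
Mdl.ResponseTransform = @exp;
yMPG = predict(Mdl,X(1:3,:));   % predictions in miles per gallon
disp([yLog yMPG])
```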

Trained regression models, returned as a cell vector. The entries of the cell vector contain the corresponding compact regression models.

If Method is 'LogitBoost' or 'GentleBoost', then the ensemble stores trained learner j in the CompactRegressionLearner property of the object stored in Trained{j}. That is, to access trained learner j, use ens.Trained{j}.CompactRegressionLearner.

Data Types: cell

Trained weights for the weak learners in the ensemble, returned as a numeric vector. TrainedWeights has T elements, where T is the number of weak learners in the ensemble. The ensemble computes the predicted response by aggregating weighted predictions from its learners.

Data Types: double
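
The arithmetic behind the two combination rules named earlier on this page, 'WeightedSum' and 'WeightedAverage', can be sketched with plain matrix operations. The prediction matrix P and weight vector w below are made-up numbers, and the code illustrates the two formulas only; it is not the toolbox's internal implementation.

```matlab
% Hypothetical per-learner predictions: rows are observations,
% columns are weak learners.
P = [20 22 24;
     10 14 18];
w = [2; 1; 1];           % hypothetical learner weights

ySum = P*w;              % weighted sum of learner predictions
yAvg = (P*w)/sum(w);     % weighted average of learner predictions

disp([ySum yAvg])        % rows: 86 21.5 and 52 13
```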

Scaled observation weights in the ensemble, returned as a numeric vector. W has length n, the number of rows in the training data.

Data Types: double

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

Response values corresponding to the rows of X, returned as a numeric column vector. Each row of Y represents the response for the corresponding row of X.

Data Types: single | double

Object Functions

compact                  Reduce size of regression ensemble model
crossval                 Cross-validate machine learning model
cvshrink                 Cross-validate pruning and regularization of regression ensemble
gather                   Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime                     Local interpretable model-agnostic explanations (LIME)
loss                     Regression error for regression ensemble model
partialDependence        Compute partial dependence
plotPartialDependence    Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                  Predict responses using regression ensemble model
predictorImportance      Estimates of predictor importance for regression ensemble of decision trees
regularize               Find optimal weights for learners in regression ensemble
removeLearners           Remove members of compact regression ensemble
resubLoss                Resubstitution loss for regression ensemble model
resubPredict             Predict response of regression ensemble by resubstitution
resume                   Resume training of regression ensemble model
shapley                  Shapley values
shrink                   Prune regression ensemble
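
As a brief, non-exhaustive sketch, the following code calls a few of these functions on an ensemble trained on the carsmall data. The variable names are illustrative, and kfoldLoss is a function of the cross-validated model that crossval returns rather than of RegressionEnsemble itself.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;

Mdl = fitrensemble(X,Y,'Method','LSBoost','CategoricalPredictors',2);

mseResub = resubLoss(Mdl);         % mean squared error on the training data
imp = predictorImportance(Mdl);    % one importance estimate per predictor

rng(1)                             % for a reproducible partition
CVMdl = crossval(Mdl,'KFold',5);   % 5-fold cross-validated model
mseCV = kfoldLoss(CVMdl);          % cross-validated mean squared error

CMdl = compact(Mdl);               % compact model without the training data
yhat = predict(CMdl,[4000 4]);     % prediction for one new observation
fprintf('Resubstitution MSE: %.2f, 5-fold MSE: %.2f\n',mseResub,mseCV)
```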

Examples

Load the carsmall data set. Consider a model that explains a car's fuel economy (MPG) using its weight (Weight) and number of cylinders (Cylinders).

load carsmall
X = [Weight Cylinders];
Y = MPG;

Train a boosted ensemble of 100 regression trees using the LSBoost method. Specify that Cylinders is a categorical variable.

Mdl = fitrensemble(X,Y,'Method','LSBoost',...
    'PredictorNames',{'W','C'},'CategoricalPredictors',2)

Mdl =
  RegressionEnsemble
           PredictorNames: {'W'  'C'}
             ResponseName: 'Y'
    CategoricalPredictors: 2
        ResponseTransform: 'none'
          NumObservations: 94
               NumTrained: 100
                   Method: 'LSBoost'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [100x1 double]
       FitInfoDescription: {2x1 cell}
           Regularization: []

Mdl is a RegressionEnsemble model object that contains the training data, among other things.

Mdl.Trained is the property that stores a 100-by-1 cell vector of the trained regression trees (CompactRegressionTree model objects) that compose the ensemble.

Plot a graph of the first trained regression tree.

view(Mdl.Trained{1},'Mode','graph')

By default, fitrensemble grows shallow trees for boosted ensembles of trees.

Predict the fuel economy of 4,000-pound cars with 4, 6, and 8 cylinders.

XNew = [4000*ones(3,1) [4; 6; 8]];
mpgNew = predict(Mdl,XNew)
mpgNew = 3×1

19.5926
18.6388
15.4810

Tips

For an ensemble of regression trees, the Trained property contains a cell vector of ens.NumTrained CompactRegressionTree model objects. For a textual or graphical display of tree t in the cell vector, enter

view(ens.Trained{t})

Version History

Introduced in R2011a