
ClassificationEnsemble

Package: classreg.learning.classif
Superclasses: CompactClassificationEnsemble

Ensemble classifier

Description

ClassificationEnsemble combines a set of trained weak learner models and data on which these learners were trained. It can predict ensemble response for new data by aggregating predictions from its weak learners. It stores data used for training, can compute resubstitution predictions, and can resume training if desired.

Construction

Create a classification ensemble object (ens) using fitcensemble.
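
For example, the following sketch (using the ionosphere sample data that ships with Statistics and Machine Learning Toolbox, and an illustrative templateTree setting rather than a required default) creates a boosted ensemble of 100 classification trees:

load ionosphere                        % sample predictors X and labels Y ('b' or 'g')
t = templateTree('MaxNumSplits',10);   % illustrative weak learner template
ens = fitcensemble(X,Y,'Method','AdaBoostM1', ...
    'NumLearningCycles',100,'Learners',t);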

Properties

BinEdges

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
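
For instance, the following sketch (assuming the ionosphere sample data) trains a boosted tree ensemble with binned numeric predictors so that BinEdges is nonempty; you can then run the reproduction code above on the resulting model mdl:

load ionosphere
mdl = fitcensemble(X,Y,'Method','AdaBoostM1','NumBins',50);  % bin each numeric predictor into at most 50 bins
edges = mdl.BinEdges;   % cell array with one vector of bin edges per numeric predictor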

CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

ClassNames

List of the elements in Y with duplicates removed. ClassNames can be a numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)

CombineWeights

Character vector describing how ens combines weak learner weights, either 'WeightedSum' or 'WeightedAverage'.

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is read-only.
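
You can supply a misclassification cost matrix when training. The following sketch (ionosphere data assumed, with arbitrary illustrative costs) makes misclassifying a true 'g' observation twice as costly as misclassifying a true 'b' observation:

load ionosphere
C = [0 1; 2 0];   % rows are true classes, columns are predicted classes, ordered as in ClassNames ({'b','g'})
ens = fitcensemble(X,Y,'Method','AdaBoostM1','Cost',C);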

ExpandedPredictorNames

Expanded predictor names, stored as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

FitInfo

Numeric array of fit information. The FitInfoDescription property describes the content of this array.

FitInfoDescription

Character vector describing the meaning of the FitInfo array.

HyperparameterOptimizationResults

Description of the cross-validation optimization of hyperparameters, stored as a BayesianOptimization object or a table of hyperparameters and associated values. Nonempty when the OptimizeHyperparameters name-value pair is nonempty at creation. Value depends on the setting of the HyperparameterOptimizationOptions name-value pair at creation:

  • 'bayesopt' (default) — Object of class BayesianOptimization

  • 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
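
A minimal sketch, assuming the ionosphere data and a deliberately small optimization budget, that produces a nonempty HyperparameterOptimizationResults property:

load ionosphere
rng('default')   % for reproducibility of the optimization
ens = fitcensemble(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',10,'ShowPlots',false));
results = ens.HyperparameterOptimizationResults;   % BayesianOptimization object with the default 'bayesopt' optimizer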

LearnerNames

Cell array of character vectors with names of weak learners in the ensemble. The name of each learner appears just once. For example, if you have an ensemble of 100 trees, LearnerNames is {'Tree'}.

Method

Character vector describing the method that creates ens.

ModelParameters

Parameters used in training ens.

NumObservations

Numeric scalar containing the number of observations in the training data.

NumTrained

Number of trained weak learners in ens, a scalar.

PredictorNames

Cell array of names for the predictor variables, in the order in which they appear in X.

Prior

Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only.

ReasonForTermination

Character vector describing the reason fitcensemble stopped adding weak learners to the ensemble.

ResponseName

Character vector with the name of the response variable Y.

RowsUsed

Rows of the original training data stored in the model, specified as a logical vector. This property is empty if all rows are stored in X and Y.

ScoreTransform

Function handle for transforming scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree.

Add or change a ScoreTransform function using dot notation:

ens.ScoreTransform = 'function'

or

ens.ScoreTransform = @function
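
For instance, assuming an existing ensemble ens, the following sketch applies the built-in 'logit' transformation, the equivalent anonymous function, and then removes the transformation:

ens.ScoreTransform = 'logit';             % built-in transformation
ens.ScoreTransform = @(x)1./(1+exp(-x));  % equivalent custom function handle
ens.ScoreTransform = 'none';              % remove the transformation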

Trained

A cell vector of trained classification models.

  • If Method is 'LogitBoost' or 'GentleBoost', then ClassificationEnsemble stores trained learner j in the CompactRegressionLearner property of the object stored in Trained{j}. That is, to access trained learner j, use ens.Trained{j}.CompactRegressionLearner.

  • Otherwise, cells of the cell vector contain the corresponding compact classification models.
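
For example, the following sketch (assuming the ionosphere data) retrieves the first weak learner in both cases:

load ionosphere
gb = fitcensemble(X,Y,'Method','GentleBoost');  % boosting by fitting regression learners
wl1 = gb.Trained{1}.CompactRegressionLearner;   % first weak learner (a CompactRegressionTree)

ada = fitcensemble(X,Y,'Method','AdaBoostM1');  % any other aggregation method
wl2 = ada.Trained{1};                           % first weak learner (a CompactClassificationTree)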

TrainedWeights

Numeric vector of trained weights for the weak learners in ens. TrainedWeights has T elements, where T is the number of weak learners in ens.

UsePredForLearner

Logical matrix of size P-by-NumTrained, where P is the number of predictors (columns) in the training data X. UsePredForLearner(i,j) is true when learner j uses predictor i, and is false otherwise. For each learner, the predictors have the same order as the columns in the training data X.

If the ensemble is not of type Subspace, all entries in UsePredForLearner are true.
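
A minimal sketch, assuming the ionosphere data and k-nearest neighbor weak learners, that inspects which predictors each subspace learner uses:

load ionosphere
sub = fitcensemble(X,Y,'Method','Subspace','Learners','knn', ...
    'NumLearningCycles',30,'NPredToSample',10);  % each learner samples 10 of the 34 predictors
usage = sub.UsePredForLearner;                   % 34-by-30 logical matrix
nPerLearner = sum(usage,1);                      % number of predictors used by each learner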

W

Scaled weights, a vector with length n, the number of rows in X. The sum of the elements of W is 1.

X

Matrix or table of predictor values that trained the ensemble. Each column of X represents one variable, and each row represents one observation.

Y

Numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. Each row of Y represents the classification of the corresponding row of X.

Object Functions

compact - Reduce size of classification ensemble model
compareHoldout - Compare accuracies of two classification models using new data
crossval - Cross-validate classification ensemble model
edge - Classification edge for classification ensemble model
gather - Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime - Local interpretable model-agnostic explanations (LIME)
loss - Classification loss for classification ensemble model
margin - Classification margins for classification ensemble model
partialDependence - Compute partial dependence
plotPartialDependence - Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict - Classify observations using ensemble of classification models
predictorImportance - Estimates of predictor importance for classification ensemble of decision trees
resubEdge - Resubstitution classification edge for classification ensemble model
resubLoss - Resubstitution classification loss for classification ensemble model
resubMargin - Resubstitution classification margins for classification ensemble model
resubPredict - Classify observations in classification ensemble model
resume - Resume training of classification ensemble model
shapley - Shapley values
testckfold - Compare accuracies of two classification models by repeated cross-validation
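
For example, assuming a boosted ensemble trained on the ionosphere data, you might chain several of these functions:

load ionosphere
Mdl = fitcensemble(X,Y,'Method','AdaBoostM1');
L0 = resubLoss(Mdl);              % resubstitution classification error
Mdl2 = resume(Mdl,50);            % add 50 more weak learners to the ensemble
L1 = loss(Mdl2,X,Y);              % classification loss on new (here, the training) data
imp = predictorImportance(Mdl2);  % predictor importance estimates for the tree ensemble
CMdl = compact(Mdl2);             % discard training data to reduce model size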

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects.

Examples


Load the ionosphere data set.

load ionosphere

Train a boosted ensemble of 100 classification trees using all measurements and the AdaBoostM1 method.

Mdl = fitcensemble(X,Y,'Method','AdaBoostM1')
Mdl = 
  ClassificationEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
               NumTrained: 100
                   Method: 'AdaBoostM1'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [100x1 double]
       FitInfoDescription: {2x1 cell}


Mdl is a ClassificationEnsemble model object.

Mdl.Trained is the property that stores a 100-by-1 cell vector of the trained classification trees (CompactClassificationTree model objects) that compose the ensemble.

Plot a graph of the first trained classification tree.

view(Mdl.Trained{1},'Mode','graph')

Figure: The Classification tree viewer displays a graph of the first trained classification tree.

By default, fitcensemble grows shallow trees for boosted ensembles of trees.

Predict the label of the mean of X.

predMeanX = predict(Mdl,mean(X))
predMeanX = 1x1 cell array
    {'g'}

Tips

For an ensemble of classification trees, the Trained property of ens stores an ens.NumTrained-by-1 cell vector of compact classification models. For a textual or graphical display of tree t in the cell vector, enter:

  • view(ens.Trained{t}.CompactRegressionLearner) for ensembles aggregated using LogitBoost or GentleBoost.

  • view(ens.Trained{t}) for all other aggregation methods.


Version History

Introduced in R2011a
