crossval

Class: ClassificationNaiveBayes

Cross-validated naive Bayes classifier

Description

CVMdl = crossval(Mdl) returns a partitioned naive Bayes classifier (CVMdl) from a trained naive Bayes classifier (Mdl).

By default, crossval uses 10-fold cross validation on the training data to create CVMdl.

CVMdl = crossval(Mdl,Name,Value) returns a partitioned naive Bayes classifier with additional options specified by one or more Name,Value pair arguments.

For example, you can specify a holdout sample proportion.

Input Arguments

Mdl — Fully trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.
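The partition-object workflow above can be sketched end to end. This sketch assumes the ionosphere data set used in the Examples section (and the Statistics and Machine Learning Toolbox); the variable names are illustrative:

```matlab
% Sketch: tie a custom 5-fold partition to crossval.
load ionosphere                           % predictors X, labels Y (351 observations)
X = X(:,3:end);                           % remove the first two predictors for stability
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'}); % train the naive Bayes classifier
cvp = cvpartition(numel(Y),'KFold',5);    % random 5-fold partition
CVMdl = crossval(Mdl,'CVPartition',cvp);  % cross-validate using that partition
```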

Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:

  1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

  2. Store the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Holdout',0.1

Data Types: double | single

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:

  1. Randomly partition the data into k sets.

  2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

  3. Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',5

Data Types: single | double
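For instance, the three steps above with k = 5 can be requested as follows, assuming Mdl is a classifier trained by fitcnb as in the Examples section:

```matlab
CVMdl = crossval(Mdl,'KFold',5);  % partition into 5 sets instead of the default 10
models = CVMdl.Trained;           % 5-by-1 cell vector of compact trained models
```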

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations (where n is the number of observations excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

  1. Reserve the observation as validation data, and train the model using the other n – 1 observations.

  2. Store the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Leaveout','on'
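A minimal sketch of the leave-one-out workflow, again assuming a trained classifier Mdl. Note that this trains n models, which can be slow for large data sets:

```matlab
CVMdl = crossval(Mdl,'Leaveout','on');  % one fold per observation
err = kfoldLoss(CVMdl);                 % leave-one-out misclassification estimate
```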

Output Arguments

CVMdl — Cross-validated naive Bayes classifier, returned as a ClassificationPartitionedModel model.

Examples

Load the ionosphere data set.

load ionosphere
X = X(:,3:end); % Remove first two predictors for stability
rng(1);         % For reproducibility

Train a naive Bayes classifier. It is good practice to define the class order. Assume that each predictor is conditionally normally distributed given its label.

Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});

Mdl is a trained ClassificationNaiveBayes classifier. 'b' is the negative class and 'g' is the positive class.

Cross validate the classifier using 10-fold cross validation.

CVMdl = crossval(Mdl)
CVMdl = 
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'


  Properties, Methods

Display the first model trained on a cross-validation fold.

FirstModel = CVMdl.Trained{1}
FirstModel = 
  classreg.learning.classif.CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}


  Properties, Methods

CVMdl is a ClassificationPartitionedModel cross-validated classifier. The software:

  1. Randomly partitions the data into 10 equally sized sets.

  2. Trains a naive Bayes classifier on nine of the sets.

  3. Repeats step 2 a total of k = 10 times, excluding one set each time and training on the other nine.

  4. Combines the generalization statistics from each fold.

FirstModel is the first of the 10 trained classifiers. It is a CompactClassificationNaiveBayes model.

You can estimate the generalization error by passing CVMdl to kfoldLoss.
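Continuing this example, that estimate is a one-liner:

```matlab
genError = kfoldLoss(CVMdl);  % average misclassification rate over the 10 folds
```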

By default, crossval uses 10-fold cross validation to cross validate a naive Bayes classifier. You have several other options, such as specifying a different number of folds or holdout-sample proportion. This example shows how to specify a holdout-sample proportion.

Load the ionosphere data set.

load ionosphere
X = X(:,3:end); % Remove first two predictors for stability
rng(1);         % For reproducibility

Train a naive Bayes classifier. It is good practice to define the class order. Assume that each predictor is conditionally normally distributed given its label.

Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});

Mdl is a trained ClassificationNaiveBayes classifier. 'b' is the negative class and 'g' is the positive class.

Cross validate the classifier by specifying a 30% holdout sample.

CVMdl = crossval(Mdl,'Holdout',0.30)
CVMdl = 
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 1
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'


  Properties, Methods

Display the trained model.

TrainedModel = CVMdl.Trained{1}
TrainedModel = 
  classreg.learning.classif.CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}


  Properties, Methods

CVMdl is a ClassificationPartitionedModel. TrainedModel is a CompactClassificationNaiveBayes classifier trained using 70% of the data.

Estimate the generalization error.

kfoldLoss(CVMdl)
ans = 0.2571

The out-of-sample misclassification error is approximately 25.7%.

Tips

Assess the predictive performance of Mdl on cross-validated data by using the "kfold" functions and properties of CVMdl, such as kfoldLoss.

Alternatives

Instead of training a naive Bayes classifier and then cross-validating it, you can create a cross-validated classifier directly by using fitcnb and specifying any of these name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
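A minimal sketch of the direct route, using the same ionosphere data as the Examples section:

```matlab
load ionosphere
X = X(:,3:end);   % remove the first two predictors for stability
rng(1)            % for reproducibility
CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'KFold',10);  % cross-validate during training
L = kfoldLoss(CVMdl);  % generalization error estimate
```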