classperf

Evaluate classifier performance

Description

classperf without input arguments displays the properties of a classperformance object. For more information, see classperformance Properties.


cp = classperf(groundTruth) creates an empty classperformance object cp using the true labels groundTruth for every observation in your data set.

cp = classperf(groundTruth,classifierOutput) creates a classperformance object cp using the true labels groundTruth and then updates the object properties using the classifier results in classifierOutput. Use this syntax to evaluate the classifier performance on a single validation run.


classperf(cp,classifierOutput) updates the classperformance object cp with the classifier results in classifierOutput. Use this syntax to update the classifier performance iteratively, for example inside a for loop over multiple cross-validation runs.


classperf(cp,classifierOutput,testIdx) compares the classifier results in classifierOutput to the subset of true labels selected by testIdx and updates cp. testIdx identifies the test observations (a subset of the ground truth) used in the current validation run.

classperf(___,Name,Value) specifies additional options with one or more Name,Value pair arguments. Specify these options after all other input arguments.

Examples


Create indices for the 10-fold cross-validation and classify measurement data for the Fisher iris data set. The Fisher iris data set contains width and length measurements of petals and sepals from three species of irises.

Load the data set.

load fisheriris

Create indices for the 10-fold cross-validation.

indices = crossvalind('Kfold',species,10);

Initialize an object to measure the performance of the classifier.

cp = classperf(species);

Perform the classification using the measurement data and report the error rate, which is the number of incorrectly classified samples divided by the total number of classified samples.

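for i = 1:10
    test = (indices == i);     % logical mask for the current test fold
    train = ~test;             % all remaining rows form the training set
    predicted = classify(meas(test,:),meas(train,:),species(train,:));
    classperf(cp,predicted,test);   % update cp with this fold's results
end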
cp.ErrorRate
ans = 0.0200
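The error-rate definition above can be checked by hand on a small set of labels. This sketch uses made-up labels rather than the iris results, assumes every sample is classified (no inconclusive results), and computes the ratio directly instead of calling classperf.

```matlab
% Made-up true and predicted labels for five samples (illustrative only).
truth     = {'setosa';'setosa';'virginica';'versicolor';'virginica'};
predicted = {'setosa';'versicolor';'virginica';'versicolor';'setosa'};

% A sample is misclassified when its predicted label differs from its true label.
isError = ~strcmp(predicted, truth);

% Error rate = misclassified samples / classified samples (all 5 are classified here).
errorRate = sum(isError) / numel(truth);   % 2/5 = 0.4
```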

Suppose you want to use the observation data from the setosa and virginica species only and exclude the versicolor species from cross-validation.

labels = {'setosa','virginica'};
indices = crossvalind('Kfold',species,10,'Classes',labels);

indices now contains zeros for the rows that belong to the versicolor species.
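The hypothetical sketch below shows why zero entries keep versicolor out of every test fold. It is not how crossvalind assigns folds (crossvalind assigns them randomly, while this toy version assigns them cyclically), but it produces an index vector with the same property: rows outside labels get index 0, so test = (indices == i) never selects them. Note that in the loop that follows, train = ~test is true for those zero-index rows, so versicolor observations remain in each training set; the variable trainKeptOnly is an illustrative alternative mask, not part of the original example.

```matlab
% Hypothetical toy data standing in for the species labels.
species = {'setosa';'versicolor';'virginica';'setosa'; ...
           'virginica';'versicolor';'setosa';'virginica'};
labels = {'setosa','virginica'};
k = 2;   % number of folds in this toy sketch

% Rows whose class is not in labels keep fold index 0.
keep = ismember(species, labels);
indices = zeros(numel(species), 1);

% Assign folds 1..k cyclically to the kept rows (crossvalind does this randomly).
keptRows = find(keep);
indices(keptRows) = mod(0:numel(keptRows)-1, k)' + 1;

for i = 1:k
    test = (indices == i);          % never true for a row with index 0
    train = ~test;                  % as in the example: includes index-0 rows
    trainKeptOnly = ~test & keep;   % alternative: also drop index-0 rows from training
    assert(~any(test & ~keep))
    assert(all(train(~keep)))
    assert(~any(trainKeptOnly(~keep)))
end
```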

Perform the classification again.

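for i = 1:10
    test = (indices == i);     % rows with index 0 (versicolor) are never tested
    train = ~test;             % all remaining rows form the training set
    predicted = classify(meas(test,:),meas(train,:),species(train,:));
    classperf(cp,predicted,test);   % update cp with this fold's results
end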
cp.ErrorRate
ans = 0.0160
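Assuming that cp.ErrorRate pools every validation run stored in cp (as the name of the separate LastErrorRate property suggests) and that classify returns no inconclusive results, the two reported values correspond to simple counts: the first loop tests all 150 observations once, and the second loop tests only the 100 setosa and virginica observations once.

```latex
\begin{aligned}
\text{first loop:} \quad & 0.0200 \times 150 = 3 \text{ errors} \\
\text{both loops:} \quad & 0.0160 \times (150 + 100) = 0.0160 \times 250 = 4 \text{ errors} \\
\text{second loop alone:} \quad & 4 - 3 = 1 \text{ error out of } 100 \text{ test samples}
\end{aligned}
```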

Load the data set.

load fisheriris
X = meas;
Y = species;

X is a 150-by-4 numeric matrix that contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 irises. Y contains the true class names (labels) of the corresponding iris species.

Initialize the classperformance object using the true labels.

cp = classperf(Y)
                        Label: ''
                  Description: ''
                  ClassLabels: {3x1 cell}
                  GroundTruth: [150x1 double]
         NumberOfObservations: 150
               ControlClasses: [2x1 double]
                TargetClasses: 1
            ValidationCounter: 0
           SampleDistribution: [150x1 double]
            ErrorDistribution: [150x1 double]
    SampleDistributionByClass: [3x1 double]
     ErrorDistributionByClass: [3x1 double]
               CountingMatrix: [4x3 double]
                  CorrectRate: NaN
                    ErrorRate: NaN
              LastCorrectRate: 0
                LastErrorRate: 0
             InconclusiveRate: NaN
               ClassifiedRate: NaN
                  Sensitivity: NaN
                  Specificity: NaN
      PositivePredictiveValue: NaN
      NegativePredictiveValue: NaN
           PositiveLikelihood: NaN
           NegativeLikelihood: NaN
                   Prevalence: NaN
              DiagnosticTable: [2x2 double]

Perform the classification using a k-nearest neighbor classifier with three neighbors. Repeat the validation 10 times, each time holding out 5 randomly selected samples as the test set and using the remaining 145 samples as the training set. After each run, update the classifier performance object with the results.

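for i = 1:10
    % Hold out 5 random samples for testing; train on the other 145.
    [train,test] = crossvalind('LeaveMOut',Y,5);
    mdl = fitcknn(X(train,:),Y(train),'NumNeighbors',3);
    predictions = predict(mdl,X(test,:));
    classperf(cp,predictions,test);   % update cp with this run's results
end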
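Each call to crossvalind('LeaveMOut',Y,5) returns complementary logical vectors train and test, with 5 test observations. For readers without the toolbox, the sketch below builds masks of the same shape using only randperm; it is a hypothetical stand-in, not the crossvalind implementation.

```matlab
N = 150;   % number of observations, as in the iris data
M = 5;     % observations held out per run

% Pick M distinct rows at random for the test set.
test = false(N, 1);
test(randperm(N, M)) = true;

% Every other row is used for training.
train = ~test;

fprintf('%d training, %d test samples\n', sum(train), sum(test));   % prints 145 training, 5 test samples
```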

Report the classification error rate, which is the number of incorrectly classified samples divided by the total number of classified samples.

cp.ErrorRate
ans = 0.0467

Input Arguments


groundTruth

True labels for all observations in your data set, specified as a vector of integers, logical vector, string vector, or cell array of character vectors.

classifierOutput

Classification results from a classifier, specified as a vector of integers, logical vector, string vector, or cell array of character vectors. When classifierOutput is a cell array of character vectors or a string vector, an empty character vector or string represents an inconclusive result. When classifierOutput is a vector of integers, NaN represents an inconclusive result.

  • If you do not specify testIdx, classifierOutput must be the same size and data type as groundTruth.

  • If you specify testIdx as a vector of integers, classifierOutput must have the same number of elements as testIdx. If testIdx is a logical vector, the number of elements in classifierOutput must equal sum(testIdx).
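The sizing and inconclusive-value rules above can be made concrete with a few lines of base MATLAB. The labels are invented for illustration, and the code only checks the rules; it does not call classperf.

```matlab
% Illustrative ground truth for five observations (made-up data).
groundTruth = {'setosa';'virginica';'setosa';'versicolor';'virginica'};

% A logical testIdx selects observations 2, 3, and 5.
testIdx = logical([0; 1; 1; 0; 1]);

% classifierOutput therefore needs sum(testIdx) = 3 elements,
% matched element by element to groundTruth(testIdx).
classifierOutput = {'virginica'; ''; 'versicolor'};   % '' marks an inconclusive result
assert(numel(classifierOutput) == sum(testIdx))
truthSubset = groundTruth(testIdx);                    % {'virginica';'setosa';'virginica'}

% Inconclusive entries in a cell array of character vectors are empty...
inconclusiveCell = cellfun(@isempty, classifierOutput);

% ...while in an integer-label output, NaN plays the same role.
numericOutput = [2; NaN; 3];
inconclusiveNum = isnan(numericOutput);

% Conclusive results that disagree with the true label are errors.
isError = ~inconclusiveCell & ~strcmp(classifierOutput, truthSubset);
```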

cp

Classifier performance information, specified as a classperformance object. For details, see classperformance Properties.

testIdx

Subset of true labels (groundTruth), specified as a vector of integers or a logical vector. testIdx identifies the true labels that belong to the current test set: the function uses it as an index vector into groundTruth, that is, groundTruth(testIdx).

  • If testIdx is a logical vector, its length must equal the total number of observations (cp.NumberOfObservations).

  • If testIdx is a vector of integers, it cannot contain duplicate integers, and each integer must be greater than 0 but less than or equal to the total number of observations.
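The two accepted forms of testIdx are interchangeable. The sketch below uses made-up integer labels to show that a logical mask and the equivalent integer indices select the same labels, and applies the validity rules listed above; numObs is a hypothetical stand-in for cp.NumberOfObservations.

```matlab
% Illustrative integer labels for six observations (made-up data).
groundTruth = [1; 2; 2; 3; 1; 3];
numObs = numel(groundTruth);   % stands in for cp.NumberOfObservations

% Logical form: one element per observation, true for test observations.
testLogical = logical([0; 1; 0; 1; 1; 0]);

% Integer form: the same observations as unique row indices.
testInteger = find(testLogical);   % [2; 4; 5]

% Both forms select the same labels, groundTruth(testIdx).
subsetFromLogical = groundTruth(testLogical);   % [2; 3; 1]
subsetFromInteger = groundTruth(testInteger);

% Checks that mirror the rules listed above.
logicalIsValid = numel(testLogical) == numObs;
integerIsValid = numel(unique(testInteger)) == numel(testInteger) && ...
                 all(testInteger >= 1 & testInteger <= numObs);
```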

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: cp = classperf(groundTruth,classifierOutput,'Positive',[1 2 3]) specifies the labels for the target (diseased) classes.

'Positive'

Labels for the target classes, specified as the comma-separated pair consisting of 'Positive' and a vector of integers, logical vector, string vector, or cell array of character vectors.

  • If groundTruth is a vector of integers, the positive label and negative label (specified by the 'Negative' name-value pair argument) must be vectors of integers.

  • If groundTruth is a string vector or cell array of character vectors, the positive label and negative label can be string vectors, cell arrays of character vectors, or vectors of positive integers. The entries must be a subset of grp2idx(groundTruth).

By default, the positive label corresponds to the first class returned by grp2idx(groundTruth) and the negative label corresponds to all other classes.

The function uses the positive label to set the TargetClasses property of the cp object.

The positive and negative labels are disjoint subsets of unique(groundTruth). For example, suppose you have a data set that contains data from six patients. Five patients have ovarian, lung, prostate, skin, or brain cancer, and one patient does not have cancer. Then ClassLabels = {'Ovarian', 'Lung', 'Prostate', 'Skin', 'Brain', 'Healthy'}. You can test a classifier for lung cancer only by setting the positive label to [2] and the negative label to [1 3 4 5 6]. Alternatively, you can test for any type of cancer by setting the positive label to [1 2 3 4 5] and the negative label to [6].

In clinical tests, the function counts inconclusive values (empty character vector '' or NaN) as false negatives to calculate the specificity and as false positives to calculate the sensitivity. The function does not count any tested observation whose true class is not in the union of the positive and negative labels. However, if the true class of a tested observation is in that union but its predicted class is not one of the classes in groundTruth, the function counts that observation as inconclusive.

Example: 'Positive',[1 2]
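The lung-cancer scenario above can be illustrated by collapsing class indices into positive and negative groups and applying the standard definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP). The true and predicted class indices below are made up, and the sketch ignores inconclusive results; it illustrates the definitions, not classperf's internal bookkeeping.

```matlab
% Class labels from the example above; positive = Lung, negative = all others.
classLabels = {'Ovarian','Lung','Prostate','Skin','Brain','Healthy'};
positive = 2;
negative = [1 3 4 5 6];

% Made-up true and predicted class indices for six tested observations.
truthIdx = [2; 2; 1; 6; 2; 4];
predIdx  = [2; 1; 1; 6; 2; 2];

% Every tested observation's true class lies in the union of the two labels.
assert(all(ismember(truthIdx, [positive negative])))

% Collapse each class index to positive (true) or negative (false).
isPosTrue = ismember(truthIdx, positive);
isPosPred = ismember(predIdx, positive);

% Confusion counts for the two-group problem.
TP = sum(isPosTrue & isPosPred);
FN = sum(isPosTrue & ~isPosPred);
TN = sum(~isPosTrue & ~isPosPred);
FP = sum(~isPosTrue & isPosPred);

sensitivity = TP / (TP + FN);
specificity = TN / (TN + FP);
fprintf('Testing for %s: sensitivity %.3f, specificity %.3f\n', ...
        classLabels{positive}, sensitivity, specificity);   % prints 0.667 for both
```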

'Negative'

Labels for the control classes, specified as the comma-separated pair consisting of 'Negative' and a vector of integers, logical vector, string vector, or cell array of character vectors.

  • If groundTruth is a vector of integers, the positive label (specified by the 'Positive' name-value pair argument) and negative label must be vectors of integers.

  • If groundTruth is a string vector or cell array of character vectors, the positive label and negative label can be string vectors, cell arrays of character vectors, or vectors of positive integers. The entries must be a subset of grp2idx(groundTruth).

By default, the positive label corresponds to the first class returned by grp2idx(groundTruth) and the negative label corresponds to all other classes.

The function uses the negative label to set the ControlClasses property of the cp object. For details on how the function uses the positive and negative labels, see 'Positive'.

Example: 'Negative',[3]

Introduced before R2006a