crossvalind

Generate cross-validation indices

Syntax

cvIndices = crossvalind(cvMethod,N,M)
[train,test] = crossvalind(cvMethod,N,M)
___ = crossvalind(___,Name,Value)

Description

cvIndices = crossvalind(cvMethod,N,M) returns the indices cvIndices after applying cvMethod on N observations using M as the selection parameter.

[train,test] = crossvalind(cvMethod,N,M) returns the logical vectors train and test, representing observations that belong to the training set and the test (evaluation) set, respectively. You can use this syntax with any supported cross-validation method except 'Kfold', which returns a single vector of fold indices instead.

___ = crossvalind(___,Name,Value) specifies additional options using one or more name-value pair arguments in addition to the arguments in previous syntaxes. For example, cvIndices = crossvalind('Kfold',Groups,10,'Class',{'Cancer','Control'}) specifies to use observations from the Cancer and Control groups when generating indices using the 10-fold cross-validation.

Examples

Create indices for 10-fold cross-validation and classify measurement data for the Fisher iris data set. The Fisher iris data set contains width and length measurements of petals and sepals from three species of irises.

Load the data set.

load fisheriris

Create indices for 10-fold cross-validation.

indices = crossvalind('Kfold',species,10);

Initialize an object to measure the performance of the classifier.

cp = classperf(species);

Perform the classification using the measurement data and report the error rate, which is the ratio of the number of incorrectly classified samples to the total number of classified samples.

for i = 1:10
    test = (indices == i); 
    train = ~test;
    class = classify(meas(test,:),meas(train,:),species(train,:));
    classperf(cp,class,test);
end
cp.ErrorRate
ans = 0.0200

Suppose you want to use the observation data from the setosa and virginica species only and exclude the versicolor species from cross-validation.

labels = {'setosa','virginica'};
indices = crossvalind('Kfold',species,10,'Classes',labels);

indices now contains zeros for the rows that belong to the versicolor species.

Perform the classification again.

for i = 1:10
    test = (indices == i); 
    train = ~test;
    class = classify(meas(test,:),meas(train,:),species(train,:));
    classperf(cp,class,test);
end
cp.ErrorRate
ans = 0.0160

Load the carbig data set.

load carbig;
x = Displacement; 
y = Acceleration;
N = length(x);

Train a second-degree polynomial model using leave-one-out cross-validation repeated 100 times, and evaluate the average cross-validation error.

sse = 0; % Initialize the sum of squared error.
for i = 1:100
    [train,test] = crossvalind('LeaveMOut',N,1);
    yhat = polyval(polyfit(x(train),y(train),2),x(test));
    sse = sse + sum((yhat - y(test)).^2);
end
CVerr = sse / 100
CVerr = 3.5310

Input Arguments

cvMethod — Cross-validation method, specified as a character vector or string.

This table describes the valid cross-validation methods. Depending on the method, the third input argument (M) has different meanings and requirements.

'Kfold'

M is the fold parameter, most commonly known as K in the K-fold cross-validation. M must be a positive integer. The default value is 5.

The method uses K-fold cross-validation to generate indices. This method uses M-1 folds for training and the remaining fold for evaluation. The method repeats this process M times, leaving out a different fold for evaluation each time.

'HoldOut'

M is the proportion of observations to hold out for the test set. M must be a scalar between 0 and 1. The default value is 0.5, corresponding to a 50% holdout.

The method randomly selects approximately N*M observations to hold out for the test (evaluation) set. Using this method within a loop is similar to using K-fold cross-validation one time outside the loop, except that nondisjoint subsets are assigned to each evaluation.

'LeaveMOut'

M is the number of observations to leave out for the test set. M must be a positive integer. The default value is 1, corresponding to the leave-one-out cross-validation (LOOCV).

The method randomly selects M observations to hold out for the evaluation set. Using this cross-validation method within a loop does not guarantee disjoint evaluation sets. To guarantee disjoint evaluation sets, use 'Kfold' instead.

'Resubstitution'

M must be specified as a two-element vector [P,Q]. Each element must be a scalar between 0 and 1. The default value is [1,1], corresponding to full resubstitution.

The method randomly selects N*P observations for the evaluation set and N*Q observations for the training set. The method selects the sets while minimizing the number of observations used in both sets.

Setting Q = 1-P corresponds to holding out 100*P% of the observations.

Example: 'Kfold'

Data Types: char | string
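The 'HoldOut' and 'Resubstitution' methods do not appear in the Examples section. As a sketch of how they might be called, based on the descriptions above (the exact set sizes vary because the selection is random):

```matlab
% Hold out roughly 30% of 100 observations for the test (evaluation) set;
% the remaining ~70% form the training set.
[train,test] = crossvalind('HoldOut',100,0.3);
sum(test)   % approximately 100*0.3 = 30 observations held out

% Resubstitution with [P,Q] = [0.8,1]: about 80% of observations go to
% the evaluation set and all observations go to the training set, so
% some overlap between the sets is unavoidable here (P + Q > 1).
[train,test] = crossvalind('Resubstitution',100,[0.8,1]);
```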

N — Total number of observations or grouping information, specified as a positive integer, vector of positive integers, logical vector, or cell array of character vectors.

N can be a positive integer specifying the total number of samples in your data set.

N can also be a vector of positive integers or logical values, or a cell array of character vectors, containing grouping information or labels for your samples. The partition of the groups depends on the type of cross-validation. For 'Kfold', each group is divided into M subsets, approximately equal in size. For all other methods, approximately equal numbers of observations from each group are selected for the evaluation (test) set. The training set contains at least one observation from each group regardless of the cross-validation method you use.

Example: 100

Data Types: double | cell

M — Cross-validation parameter, specified as a scalar between 0 and 1, a positive integer, or a two-element vector. The requirements for M depend on the cross-validation method. For details, see cvMethod.

Example: 5

Data Types: double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: [train,test] = crossvalind('LeaveMOut',groups,1,'Min',3) specifies to have at least three observations in each group in the training set when performing the leave-one-out cross-validation.

Class or group information, specified as the comma-separated pair consisting of 'Classes' and a vector of positive integers, character vector, string, string vector, or cell array of character vectors. This option lets you restrict the observations to only the specified groups.

This name-value pair argument is applicable only when you specify N as a grouping variable. The data type of 'Classes' must match that of N. For example, if you specify N as a cell array of character vectors containing class labels, you must use a cell array of character vectors to specify 'Classes'. The output arguments you specify contain the value 0 for observations belonging to excluded classes.

Example: 'Classes',{'Cancer','Control'}

Data Types: double | cell

Minimum number of observations for each group in the training set, specified as the comma-separated pair consisting of 'Min' and a positive integer. Setting a large value can help to balance the training groups, but causes partial resubstitution when there are not enough observations.

This name-value pair argument is not applicable for the 'Kfold' method.

Example: 'Min',3

Data Types: double
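As a sketch of the 'Min' option, using a small hypothetical grouping vector with two groups of unequal size:

```matlab
% Hypothetical grouping vector: six observations in group 1, four in group 2.
groups = [1 1 1 1 1 1 2 2 2 2];

% Leave one observation out for evaluation, while guaranteeing that the
% training set keeps at least three observations from each group.
[train,test] = crossvalind('LeaveMOut',groups,1,'Min',3);
```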

Output Arguments

cvIndices — Cross-validation indices, returned as a vector.

If you are using 'Kfold' as the cross-validation method, cvIndices contains equal (or approximately equal) proportions of the integers 1 through M, which define a partition of the N observations into M disjoint subsets.

For other cross-validation methods, cvIndices is a logical vector containing 1s for observations that belong to the training set and 0s for observations that belong to the test (evaluation) set.
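For example, a minimal sketch of inspecting the partition produced by 'Kfold' (here the fold counts come from base MATLAB's accumarray rather than any toolbox function):

```matlab
% Partition 10 observations into 5 folds and count the fold sizes.
cvIndices = crossvalind('Kfold',10,5);   % 10-by-1 vector of integers 1..5
accumarray(cvIndices,1)                  % each fold holds approximately 10/5 = 2 observations
```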

train — Training set, returned as a logical vector. This argument specifies which observations belong to the training set.

test — Test set, returned as a logical vector. This argument specifies which observations belong to the test set.

Introduced before R2006a