summary
Summarize cross-validation partition with stratification or grouping variable
Since R2025a
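Syntax
summaryTbl = summary(c)
Description
summaryTbl = summary(c) returns a table summarizing the validation partition c, which must have been created with a stratification variable or one or more grouping variables.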
Examples
Create a cvpartition object using a grouping variable. Display a summary of the cross-validation.
Load data on tsunami occurrences, and create a table from the data. Display the first eight observations in the table.
Tbl = readtable("tsunamis.xlsx");
head(Tbl)
    Latitude    Longitude    Year    Month    Day    Hour    Minute    Second    ValidityCode            Validity             CauseCode          Cause           EarthquakeMagnitude          Country                   Location             MaxHeight    IidaMagnitude    Intensity    NumDeaths    DescDeaths
    ________    _________    ____    _____    ___    ____    ______    ______    ____________    _________________________    _________    __________________    ___________________    ___________________    __________________________    _________    _____________    _________    _________    __________
      -3.8        128.3      1950     10       8       3       23       NaN           2          {'questionable tsunami' }        1        {'Earthquake'    }            7.6            {'INDONESIA'      }    {'JAVA TRENCH, INDONESIA'}       2.8            1.5            1.5          NaN          NaN    
      19.5         -156      1951      8      21      10       57       NaN           4          {'definite tsunami'     }        1        {'Earthquake'    }            6.9            {'USA'            }    {'HAWAII'                }       3.6            1.8            NaN          NaN          NaN    
     -9.02       157.95      1951     12      22     NaN      NaN       NaN           2          {'questionable tsunami' }        6        {'Volcano'       }            NaN            {'SOLOMON ISLANDS'}    {'KAVACHI'               }         6            2.6            NaN          NaN          NaN    
     42.15       143.85      1952      3       4       1       22        41           4          {'definite tsunami'     }        1        {'Earthquake'    }            8.1            {'JAPAN'          }    {'SE. HOKKAIDO ISLAND'   }       6.5            2.7              2           33            1    
      19.1         -155      1952      3      17       3       58       NaN           4          {'definite tsunami'     }        1        {'Earthquake'    }            4.5            {'USA'            }    {'HAWAII'                }         1            NaN            NaN          NaN          NaN    
      43.1        -82.4      1952      5       6     NaN      NaN       NaN           1          {'very doubtful tsunami'}        9        {'Meteorological'}            NaN            {'USA'            }    {'LAKE HURON, MI'        }      1.52            NaN            NaN          NaN          NaN    
     52.75        159.5      1952     11       4      16       58       NaN           4          {'definite tsunami'     }        1        {'Earthquake'    }              9            {'RUSSIA'         }    {'KAMCHATKA'             }        18            4.2              4         2236            3    
        50        156.5      1953      3      18     NaN      NaN       NaN           3          {'probable tsunami'     }        1        {'Earthquake'    }            5.8            {'RUSSIA'         }    {'N. KURIL ISLANDS'      }       1.5            0.6            NaN          NaN          NaN    
Create a random nonstratified partition for 5-fold cross-validation on the observations in Tbl. Ensure that observations with the same Country value are in the same fold by using the GroupingVariables name-value argument.
rng(0,"twister") % For reproducibility
c = cvpartition(size(Tbl,1),KFold=5, ...
    GroupingVariables=Tbl.Country)
c = 
Group k-fold cross validation partition
    NumObservations: 162
        NumTestSets: 5
          TrainSize: [126 130 130 131 131]
           TestSize: [36 32 32 31 31]
           IsCustom: 0
          IsGrouped: 1
       IsStratified: 0
  Properties, Methods
c is a cvpartition object. The IsGrouped property value is 1 (true), indicating that at least one grouping variable was used to create the object.
Display a summary of the cvpartition object c.
summaryTbl = summary(c)
summaryTbl=150×5 table
      Set       SetSize        GroupLabel         GroupCount    PercentInSet
    ________    _______    ___________________    __________    ____________
    "train1"      126      {'INDONESIA'      }        25           19.841   
    "train1"      126      {'USA'            }        15           11.905   
    "train1"      126      {'SOLOMON ISLANDS'}        10           7.9365   
    "train1"      126      {'JAPAN'          }        19           15.079   
    "train1"      126      {'RUSSIA'         }        19           15.079   
    "train1"      126      {'FIJI'           }         1          0.79365   
    "train1"      126      {'GREENLAND'      }         1          0.79365   
    "train1"      126      {'CHILE'          }         6           4.7619   
    "train1"      126      {'GREECE'         }         5           3.9683   
    "train1"      126      {'ECUADOR'        }         1          0.79365   
    "train1"      126      {'VANUATU'        }         5           3.9683   
    "train1"      126      {'TONGA'          }         1          0.79365   
    "train1"      126      {'PHILIPPINES'    }         7           5.5556   
    "train1"      126      {'CANADA'         }         1          0.79365   
    "train1"      126      {'ATLANTIC OCEAN' }         1          0.79365   
    "train1"      126      {'FRANCE'         }         1          0.79365   
      ⋮
The first row in summaryTbl shows that 25 of the 126 observations in the first training set Tbl(training(c,1),:) (approximately 20%) have the Country value INDONESIA. The software ensures that the first test set Tbl(test(c,1),:) does not contain any observations with this value.
Check the Country values for the observations in the first test set.
summaryTest1 = summaryTbl(summaryTbl.Set=="test1",:)
summaryTest1=6×5 table
      Set      SetSize         GroupLabel         GroupCount    PercentInSet
    _______    _______    ____________________    __________    ____________
    "test1"      36       {'PAPUA NEW GUINEA'}        13           36.111   
    "test1"      36       {'MEXICO'          }         8           22.222   
    "test1"      36       {'PERU'            }         9               25   
    "test1"      36       {'JAPAN SEA'       }         1           2.7778   
    "test1"      36       {'MONTSERRAT'      }         4           11.111   
    "test1"      36       {'TURKEY'          }         1           2.7778   
As expected, the first test set does not contain any observations with the Country value INDONESIA.
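The same check can be made directly from the partition indices, without going through the summary table. The following is a minimal sketch that assumes the Tbl and cvpartition call from this example; the names numShared, trainCountries, and testCountries are introduced here for illustration only.

```matlab
% Rebuild the grouped partition from this example.
Tbl = readtable("tsunamis.xlsx");
rng(0,"twister") % For reproducibility
c = cvpartition(size(Tbl,1),KFold=5, ...
    GroupingVariables=Tbl.Country);

% For each fold, count the Country values that appear in both the
% training set and the test set. (numShared is an illustrative name.)
numShared = zeros(1,c.NumTestSets);
for k = 1:c.NumTestSets
    trainCountries = unique(Tbl.Country(training(c,k)));
    testCountries = unique(Tbl.Country(test(c,k)));
    numShared(k) = numel(intersect(trainCountries,testCountries));
end
disp(numShared)
```

If the grouping holds for every fold, each entry of numShared should be 0.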
Create a cvpartition object using a stratification variable. Display a summary of the cross-validation, and then modify the summary display.
Load the fisheriris data set. The matrix meas contains flower measurements for 150 different flowers. The variable species lists the species for each flower.
load fisheriris
Create a random stratified partition for 3-fold cross-validation. Use the species variable as the stratification variable.
rng(0,"twister") % For reproducibility
c = cvpartition(species,KFold=3)
c = 
K-fold cross validation partition
    NumObservations: 150
        NumTestSets: 3
          TrainSize: [100 100 100]
           TestSize: [50 50 50]
           IsCustom: 0
          IsGrouped: 0
       IsStratified: 1
  Properties, Methods
c is a cvpartition object. The IsStratified property value is 1 (true), indicating that a stratification variable was used to create the object.
Display a summary of the cvpartition object c.
summaryTbl = summary(c)
summaryTbl=21×5 table
      Set       SetSize    StratificationLabel    StratificationCount    PercentInSet
    ________    _______    ___________________    ___________________    ____________
    "all"         150        {'setosa'    }               50                33.333   
    "all"         150        {'versicolor'}               50                33.333   
    "all"         150        {'virginica' }               50                33.333   
    "train1"      100        {'setosa'    }               34                    34   
    "train1"      100        {'versicolor'}               33                    33   
    "train1"      100        {'virginica' }               33                    33   
    "test1"        50        {'setosa'    }               16                    32   
    "test1"        50        {'versicolor'}               17                    34   
    "test1"        50        {'virginica' }               17                    34   
    "train2"      100        {'setosa'    }               33                    33   
    "train2"      100        {'versicolor'}               33                    33   
    "train2"      100        {'virginica' }               34                    34   
    "test2"        50        {'setosa'    }               17                    34   
    "test2"        50        {'versicolor'}               17                    34   
    "test2"        50        {'virginica' }               16                    32   
    "train3"      100        {'setosa'    }               33                    33   
      ⋮
The first row in summaryTbl shows that 50 of the 150 flowers in the data set (approximately 33%) are setosa flowers.
Modify the summary display to include test set information only.
testSummaryTbl = summaryTbl(contains(summaryTbl.Set,"test"),:)
testSummaryTbl=9×5 table
      Set      SetSize    StratificationLabel    StratificationCount    PercentInSet
    _______    _______    ___________________    ___________________    ____________
    "test1"      50         {'setosa'    }               16                  32     
    "test1"      50         {'versicolor'}               17                  34     
    "test1"      50         {'virginica' }               17                  34     
    "test2"      50         {'setosa'    }               17                  34     
    "test2"      50         {'versicolor'}               17                  34     
    "test2"      50         {'virginica' }               16                  32     
    "test3"      50         {'setosa'    }               17                  34     
    "test3"      50         {'versicolor'}               16                  32     
    "test3"      50         {'virginica' }               17                  34     
The first row in testSummaryTbl shows that 16 of the 50 flowers in the first test set (approximately 32%) are setosa flowers.
Modify summaryTbl to include setosa information only.
setosaSummaryTbl = summaryTbl(summaryTbl.StratificationLabel=="setosa",:)
setosaSummaryTbl=7×5 table
      Set       SetSize    StratificationLabel    StratificationCount    PercentInSet
    ________    _______    ___________________    ___________________    ____________
    "all"         150          {'setosa'}                 50                33.333   
    "train1"      100          {'setosa'}                 34                    34   
    "test1"        50          {'setosa'}                 16                    32   
    "train2"      100          {'setosa'}                 33                    33   
    "test2"        50          {'setosa'}                 17                    34   
    "train3"      100          {'setosa'}                 33                    33   
    "test3"        50          {'setosa'}                 17                    34   
The second row in setosaSummaryTbl shows that 34 of the 100 flowers in the first training set are setosa flowers.
Display summary information with a separate column for each of the three flower species.
speciesSummaryTbl = unstack(summaryTbl(:,1:4), ...
    "StratificationCount","StratificationLabel")
speciesSummaryTbl=7×5 table
      Set       SetSize    setosa    versicolor    virginica
    ________    _______    ______    __________    _________
    "all"         150        50          50           50    
    "train1"      100        34          33           33    
    "test1"        50        16          17           17    
    "train2"      100        33          33           34    
    "test2"        50        17          17           16    
    "train3"      100        33          34           33    
    "test3"        50        17          16           17    
The second row in speciesSummaryTbl shows that of the 100 flowers in the first training set, 34 are setosa flowers, 33 are versicolor flowers, and 33 are virginica flowers.
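One way to gauge how closely each fold preserves the class proportions is to compare every PercentInSet value with the corresponding value for the full data set. The sketch below assumes the same partition as in this example; the variable names allRows, DiffFromAll, and maxAbsDiff are illustrative and not part of the summary output.

```matlab
load fisheriris
rng(0,"twister") % For reproducibility
c = cvpartition(species,KFold=3);
summaryTbl = summary(c);

% Percentages for the full data set, one row per species.
allRows = summaryTbl(summaryTbl.Set=="all",:);

% Match each row of summaryTbl to the "all" row with the same label.
[~,idx] = ismember(summaryTbl.StratificationLabel, ...
    allRows.StratificationLabel);

% Signed difference, in percentage points, from the full-data share.
summaryTbl.DiffFromAll = summaryTbl.PercentInSet - allRows.PercentInSet(idx);

% Largest absolute difference across all sets and species.
maxAbsDiff = max(abs(summaryTbl.DiffFromAll))
```

A small maxAbsDiff value indicates that the training and test sets have roughly the same species mix as the full data set, which is the purpose of stratification.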
Input Arguments
Validation partition, specified as a cvpartition object. The validation partition type of c, c.Type, must be 'kfold' or 'holdout'. The IsGrouped or IsStratified property of c must be 1 (true).
summary does not support validation partitions created using
            tall arrays.
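The requirements on c can be checked before calling summary. The following is a minimal sketch; canSummarize is a hypothetical helper defined here for illustration, not a cvpartition function, and it does not check for partitions created from tall arrays.

```matlab
load fisheriris

% Hypothetical helper (not part of cvpartition): true when the partition
% type is 'kfold' or 'holdout' and the partition is grouped or stratified.
canSummarize = @(p) ismember(p.Type,["kfold","holdout"]) && ...
    (p.IsGrouped || p.IsStratified);

cPlain = cvpartition(numel(species),KFold=3); % neither stratified nor grouped
cStrat = cvpartition(species,KFold=3);        % stratified by species

% Call summary only when the requirements are met.
if canSummarize(cStrat)
    summaryTbl = summary(cStrat);
end

tf = [canSummarize(cPlain) canSummarize(cStrat)]
```

Here cPlain is created from the number of observations only, so it has no stratification or grouping information for summary to report.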
Output Arguments
Summary table describing the validation partition c, returned
            as a table.
- The first column, Set, describes the specific data set for which information is displayed. Possible values include "all" (the full data set), "train1" (the first training set), "test1" (the first test set), and so on.
- The second column, SetSize, describes the size of each data set listed in Set.
- The remaining columns depend on the properties of c.
  - If c.IsStratified is 1 (true), then the remaining columns are StratificationLabel, StratificationCount, and PercentInSet. StratificationLabel describes the label of interest in the stratification variable. StratificationCount describes the number of observations in the data set Set with the label StratificationLabel. PercentInSet describes the percentage of observations in the data set Set with the label StratificationLabel.
  - If c.IsGrouped is 1 (true), then the number of remaining columns varies based on the number of grouping variables. For two or more grouping variables, GroupLabel1 describes the label in the first grouping variable, GroupLabel2 describes the label in the second grouping variable, and so on. GroupCount describes the number of observations in the data set Set with the combination of labels in GroupLabel1, GroupLabel2, and so on. PercentInSet is the percentage of observations in the data set Set with the combination of labels in GroupLabel1, GroupLabel2, and so on. For one grouping variable, the columns are similar, with only one GroupLabel column.
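The column descriptions suggest that PercentInSet equals 100 times the count divided by SetSize, and that the counts within each set add up to SetSize. This relationship is an inference from the values displayed in the examples rather than a documented formula. A minimal sketch that checks it for a stratified partition follows; the names recomputed, maxDiff, totalCount, and setSize are illustrative.

```matlab
load fisheriris
rng(0,"twister") % For reproducibility
c = cvpartition(species,KFold=3);
summaryTbl = summary(c);

% Recompute the percentage from the count and set size columns.
recomputed = 100*summaryTbl.StratificationCount./summaryTbl.SetSize;
maxDiff = max(abs(recomputed - summaryTbl.PercentInSet))

% Within each set, the per-label counts should add up to the set size.
G = findgroups(summaryTbl.Set);
totalCount = splitapply(@sum,summaryTbl.StratificationCount,G);
setSize = splitapply(@max,summaryTbl.SetSize,G);
countsMatch = isequal(totalCount,setSize)
```

For a grouped partition, the same check would use the GroupCount column in place of StratificationCount.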
 
Version History
Introduced in R2025a
See Also
cvpartition | test | training