Classification Learner

The Classification Learner app lets you train models to classify data using supervised machine learning.

Using Classification Learner, you can perform common machine learning tasks such as interactively exploring your data, selecting features, specifying validation schemes, training models, and assessing results. Choose from several classifier types, including decision trees, support vector machines (SVM), and k-nearest neighbors, and from ensemble methods such as bagging, boosting, and random subspace.
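
Bagging (bootstrap aggregation) trains each ensemble member on a bootstrap sample, drawn with replacement from the training data, and combines the members by majority vote. The sketch below illustrates that idea in Python rather than MATLAB. It is not how Classification Learner implements bagged trees. The one-feature decision-stump base learner and the names `fit_stump` and `bagged_predict` are hypothetical, chosen only to keep the example short.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Hypothetical base learner: a one-feature decision stump.

    Tries each observed value as a threshold and keeps the first one that
    minimizes training misclassifications.
    """
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]  # never empty: includes x == t
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagged_predict(xs, ys, x_new, n_learners=25, seed=0):
    """Bagging: fit one stump per bootstrap sample, then take a majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    votes = []
    for _ in range(n_learners):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        votes.append(stump(x_new))
    return Counter(votes).most_common(1)[0][0]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(bagged_predict(xs, ys, 1.5), bagged_predict(xs, ys, 7.5))
```

Because each member sees a slightly different resample of the data, the vote averages out some of the variance of a single unstable learner. The app's bagged-trees option applies the same principle with full decision trees.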

Classification Learner helps you choose the best model for your data by letting you assess and compare models using confusion matrices and ROC curves. Export classification models to the MATLAB workspace to generate predictions on new data, or generate MATLAB code to integrate models into applications such as computer vision, signal processing, and data analytics.
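
As a rough illustration of the assessment step, the Python sketch below builds a confusion matrix from true and predicted labels and derives the misclassification rate from it. The function names are hypothetical and the labels are made-up toy values. In this sketch, rows are true classes and columns are predicted classes.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count observations per (true class, predicted class) pair.

    Rows index the true class and columns index the predicted class.
    """
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def misclassification_rate(matrix):
    """Fraction of observations off the diagonal (that is, 1 - accuracy)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


classes = ["setosa", "versicolor", "virginica"]
true_y = ["setosa", "setosa", "versicolor", "virginica", "virginica"]
pred_y = ["setosa", "versicolor", "versicolor", "virginica", "setosa"]
cm = confusion_matrix(true_y, pred_y, classes)
print(cm)                          # [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
print(misclassification_rate(cm))  # 0.4
```

The off-diagonal cells show which classes are confused with which, information that a single accuracy number hides.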


Classification Learner Features

  • Import a table or matrix directly from the MATLAB workspace.
  • Choose a validation scheme: k-fold cross-validation, holdout validation, or resubstitution (no held-out data).
  • View the distribution of data across response classes using pairwise scatter plots.
  • Selectively include features in each model.
  • Decision trees: Deep tree, medium tree, and shallow tree
  • Support vector machines: Linear SVM, fine Gaussian SVM, medium Gaussian SVM, coarse Gaussian SVM, quadratic SVM, and cubic SVM
  • Nearest neighbor classifiers: Fine KNN, medium KNN, coarse KNN, cosine KNN, cubic KNN, and weighted KNN
  • Ensemble classifiers: Boosted trees (AdaBoost, RUSBoost), bagged trees, subspace KNN, and subspace discriminant
  • Assess classifier performance using confusion matrices, ROC curves, or scatter plots.
  • Compare model accuracy using the misclassification rate on the validation set.
  • Improve model accuracy with advanced options and feature selection.
  • Export the best model to the workspace to make predictions on new data.
  • Generate MATLAB code to train classifiers on new data.
  • Use MATLAB code in machine learning applications such as computer vision and signal processing.
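
To make the validation and misclassification-rate bullets concrete, here is a small Python sketch of k-fold cross-validation for a k-nearest neighbor classifier. It is an illustration under simplifying assumptions (Euclidean distance, unweighted majority vote, folds assigned round-robin without shuffling) rather than the app's implementation, and the function names are hypothetical. Note that "k" appears twice: once for the number of folds and once for the number of neighbors.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, neighbors):
    """Majority vote among the `neighbors` closest training points (Euclidean)."""
    ranked = sorted((math.dist(p, x), y) for p, y in zip(train_x, train_y))
    votes = Counter(y for _, y in ranked[:neighbors])
    return votes.most_common(1)[0][0]


def fold_indices(n, folds):
    """Assign observation i to fold i % folds (round-robin, no shuffling)."""
    return [[i for i in range(n) if i % folds == f] for f in range(folds)]


def kfold_misclassification(xs, ys, folds=5, neighbors=1):
    """Predict each observation once, using a model trained without its fold."""
    errors = 0
    for test_idx in fold_indices(len(xs), folds):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in test_idx:
            if knn_predict(train_x, train_y, xs[i], neighbors) != ys[i]:
                errors += 1
    return errors / len(xs)


xs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(kfold_misclassification(xs, ys, folds=4, neighbors=1))  # 0.0
```

A holdout scheme corresponds to training once and scoring a single held-out fold. Resubstitution instead scores the model on its own training data. For a 1-nearest-neighbor model that error is typically zero, because each point is its own nearest neighbor, which is why resubstitution tends to overstate accuracy.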

Classification Learner Example Datasets

After you've installed Classification Learner, use the following example datasets to get started.

Download the MAT-file ClassificationLearner_Example_Datasets.mat, then double-click it, either from within MATLAB or from your operating system's file browser, to load the datasets.

Fisher Iris

The Fisher Iris dataset consists of samples from three species of iris (Iris setosa, Iris virginica, and Iris versicolor). For each sample, four features were measured in centimeters: sepal length, sepal width, petal length, and petal width.

Number of predictors: 4
Number of observations: 150
Number of classes: 3

Credit Rating

The credit rating dataset contains financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings (AAA, AA, A, BBB, BB, B, CCC) assigned by a rating agency.

Number of predictors: 6
Number of observations: 3932
Number of classes: 7

Ovarian

The ovarian cancer dataset consists of high-resolution ovarian cancer data generated using the WCX2 protein array. The sample set includes 95 controls and 121 ovarian cancer samples.

Number of predictors: 100
Number of observations: 216
Number of classes: 2

Don’s Easter Egg

Don’s Easter Egg is a simulated dataset with only two predictors and two classes. Because the boundary separating the two classes is highly nonlinear, it poses a challenging exploratory problem.

Number of predictors: 2
Number of observations: 1873
Number of classes: 2

Arrhythmia

The heart arrhythmia dataset consists of patient information and a response variable that indicates the presence or absence of cardiac arrhythmia in each patient and, where present, its class. In this medical diagnostic application, misclassifying a patient as "normal" may have more severe consequences than misclassifying a patient as "has arrhythmia."
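
Unequal consequences like these are commonly handled with a misclassification cost matrix. Instead of predicting the most probable class, the classifier predicts the class with the lowest expected cost. The Python sketch below illustrates that decision rule for a simplified two-class case. The posterior probabilities and cost values are made-up numbers, not derived from the arrhythmia data, and the function name is hypothetical. Here cost[i][j] is the cost of predicting class j when the true class is i.

```python
def min_expected_cost_class(posteriors, cost, classes):
    """Pick the class whose prediction has the lowest expected cost.

    posteriors[i]: estimated probability that the true class is classes[i].
    cost[i][j]:    cost of predicting classes[j] when the true class is classes[i].
    """
    n = len(classes)
    expected = [sum(posteriors[i] * cost[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=lambda j: expected[j])
    return classes[best], expected


classes = ["normal", "has arrhythmia"]
posteriors = [0.7, 0.3]   # hypothetical model output for one patient
cost = [[0, 1],           # true "normal": a false alarm costs 1 (assumed)
        [5, 0]]           # true "has arrhythmia": a missed case costs 5 (assumed)
label, expected = min_expected_cost_class(posteriors, cost, classes)
print(label, expected)
```

Although "normal" is the more probable class here, the higher cost of a missed arrhythmia makes "has arrhythmia" the lower-cost prediction (expected costs 0.7 versus 1.5). With equal costs for both kinds of error, the rule reduces to picking the most probable class.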

Number of predictors: 279
Number of observations: 150
Number of classes: 16

Ionosphere

The ionosphere dataset comprises preprocessed radar signals collected by a phased array of 16 high-frequency antennas. "Good" radar returns show evidence of some type of structure in the ionosphere; "bad" returns do not, because their signals pass through the ionosphere.

Number of predictors: 34
Number of observations: 351
Number of classes: 2