# CompactRegressionSVM

Package: classreg.learning.regr

Compact support vector machine regression model

## Description

`CompactRegressionSVM` is a compact support vector machine (SVM) regression model. It consumes less memory than a full, trained support vector machine model (`RegressionSVM` model) because it does not store the data used to train the model.

Because the compact model does not store the training data, you cannot use it to perform certain tasks, such as cross-validation. However, you can use a compact SVM regression model to predict responses for new input data.

## Construction

`compactMdl = compact(mdl)` returns a compact SVM regression model `compactMdl` from a full, trained SVM regression model, `mdl`. For more information, see `compact`.

### Input Arguments

`mdl`: Full, trained SVM regression model, specified as a `RegressionSVM` model returned by `fitrsvm`.

## Properties

### Alpha

Dual problem coefficients, specified as a vector of numeric values. `Alpha` contains m elements, where m is the number of support vectors in the trained SVM regression model. The dual problem introduces two Lagrange multipliers for each support vector. The values of `Alpha` are the differences between the two estimated Lagrange multipliers for the support vectors. For more details, see Understanding Support Vector Machine Regression.

If you specified to remove duplicates using `RemoveDuplicates`, then, for a particular set of duplicate observations that are support vectors, `Alpha` contains one coefficient corresponding to the entire set. That is, MATLAB® attributes a nonzero coefficient to one observation from the set of duplicates and a coefficient of `0` to all other duplicate observations in the set.

Data Types: `single` | `double`
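For a nonlinear kernel, `predict` evaluates the kernel between each support vector and a new observation and combines the results using `Alpha`. The following NumPy sketch illustrates that dual-form prediction; it is not MATLAB code, and the support vectors, coefficients, bias, and Gaussian kernel are made-up values for illustration only:

```python
import numpy as np

# Hypothetical trained quantities: m = 3 support vectors over p = 2 predictors
SV = np.array([[0.0, 1.0],
               [2.0, 0.0],
               [1.0, 1.0]])          # rows play the role of Mdl.SupportVectors
Alpha = np.array([0.7, -0.3, 0.1])   # one dual coefficient per support vector
Bias = 5.0                           # plays the role of Mdl.Bias

def gaussian_kernel(u, v):
    # G(u, v) = exp(-||u - v||^2); any scaling by KernelParameters.Scale
    # is assumed to have been applied to the inputs already
    return np.exp(-np.sum((u - v) ** 2))

def predict_one(x):
    # f(x) = sum_j Alpha_j * G(sv_j, x) + Bias
    return sum(a * gaussian_kernel(sv, x) for a, sv in zip(Alpha, SV)) + Bias

yhat = predict_one(np.array([0.0, 1.0]))
```

Because only the support vectors, `Alpha`, and the bias enter this sum, the compact model can predict without the original training data.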

### Beta

Primal linear problem coefficients, stored as a numeric vector of length p, where p is the number of predictors in the SVM regression model.

The values in `Beta` are the linear coefficients for the primal optimization problem.

If the model is obtained using a kernel function other than `'linear'`, this property is empty (`[]`).

The `predict` method computes predicted response values for the model as `YFIT = (X/S)*Beta + Bias`, where `S` is the kernel scale stored in the `KernelParameters.Scale` property.

Data Types: `double`
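For a linear kernel, the prediction is just this affine map. The following NumPy sketch shows the same arithmetic; it is not MATLAB code, and the coefficients, bias, scale, and data are made-up values for illustration only:

```python
import numpy as np

# Hypothetical fitted quantities for a linear-kernel SVM regression model
Beta = np.array([0.5, -1.2, 2.0])   # plays the role of Mdl.Beta, length p = 3
Bias = 10.0                         # plays the role of Mdl.Bias
S = 2.0                             # plays the role of Mdl.KernelParameters.Scale

# Two new observations, one per row
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# YFIT = (X/S)*Beta + Bias: scale the predictors, take the inner
# product with the coefficients, then add the bias
yfit = (X / S) @ Beta + Bias
```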

### Bias

Bias term in the SVM regression model, stored as a scalar value.

Data Types: `double`

### CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty (`[]`).

Data Types: `single` | `double`

### ExpandedPredictorNames

Expanded predictor names, stored as a cell array of character vectors.

If the model uses encoding for categorical variables, then `ExpandedPredictorNames` includes the names that describe the expanded variables. Otherwise, `ExpandedPredictorNames` is the same as `PredictorNames`.

Data Types: `cell`

### KernelParameters

Kernel function parameters, stored as a structure with the following fields.

| Field | Description |
| --- | --- |
| `Function` | Kernel function name (a character vector) |
| `Scale` | Numeric scale factor used to divide predictor values |

You can specify values for `KernelParameters.Function` and `KernelParameters.Scale` by using the `KernelFunction` and `KernelScale` name-value pair arguments in `fitrsvm`, respectively.

Data Types: `struct`

### Mu

Predictor means, stored as a vector of numeric values.

If the training data is standardized, then `Mu` is a numeric vector of length p, where p is the number of predictors used to train the model. In this case, the `predict` method centers predictor matrix `X` by subtracting the corresponding element of `Mu` from each column.

If the training data is not standardized, then `Mu` is empty (`[]`).

Data Types: `single` | `double`

### PredictorNames

Predictor names, stored as a cell array of character vectors containing the names of the predictors, in the order in which they appear in `X`. `PredictorNames` has length equal to the number of columns in `X`.

Data Types: `cell`

### ResponseName

Response variable name, stored as a character vector.

Data Types: `char`

### ResponseTransform

Response transformation function, specified as `'none'` or a function handle. `ResponseTransform` describes how the software transforms raw response values.

For a MATLAB function, or a function that you define, enter its function handle. For example, you can enter `Mdl.ResponseTransform = @function`, where `function` accepts a numeric vector of the original responses and returns a numeric vector of the same size containing the transformed responses.

Data Types: `char` | `function_handle`
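The contract such a transform must satisfy is simple: take a vector of raw responses, return a same-size vector of transformed responses. The following Python sketch illustrates that contract; it is not MATLAB code, and the exponential transform is a hypothetical choice (e.g., for a model trained on log responses):

```python
import numpy as np

def response_transform(y):
    # Accepts a numeric vector of raw responses and returns a
    # same-size vector of transformed responses.
    return np.exp(y)  # hypothetical transform back from the log scale

raw = np.array([0.0, np.log(2.0)])
transformed = response_transform(raw)
```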

### Sigma

Predictor standard deviations, stored as a vector of numeric values.

If the training data is standardized, then `Sigma` is a numeric vector of length p, where p is the number of predictors used to train the model. In this case, the `predict` method scales the predictor matrix `X` by dividing each column by the corresponding element of `Sigma`, after centering each element using `Mu`.

If the training data is not standardized, then `Sigma` is empty (`[]`).

Data Types: `single` | `double`
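Together, `Mu` and `Sigma` describe the standardization that `predict` applies to new data before evaluating the model. The following NumPy sketch shows that preprocessing step; it is not MATLAB code, and the statistics and data are made-up values for illustration only:

```python
import numpy as np

# Hypothetical standardization statistics from training (p = 2 predictors)
Mu = np.array([1.0, 10.0])       # per-predictor training means (Mdl.Mu)
Sigma = np.array([2.0, 5.0])     # per-predictor training std devs (Mdl.Sigma)

X = np.array([[3.0, 20.0],
              [1.0,  5.0]])      # new observations, one per row

# predict centers each column by Mu, then divides it by Sigma
X_std = (X - Mu) / Sigma
```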

### SupportVectors

Support vectors, stored as an m-by-p matrix of numeric values. m is the number of support vectors (`sum(Mdl.IsSupportVector)`), and p is the number of predictors in `X`.

If you specified to remove duplicates using `RemoveDuplicates`, then for a given set of duplicate observations that are support vectors, `SupportVectors` contains one unique support vector.

Data Types: `single` | `double`

## Methods

- `discardSupportVectors`: Discard support vectors
- `loss`: Regression error for support vector machine regression model
- `predict`: Predict responses using support vector machine regression model

## Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

## Examples

This example shows how to reduce the size of a full, trained SVM regression model by discarding the training data and some information related to the training process.

This example uses the abalone data from the UCI Machine Learning Repository. Download the data and save it in your current directory with the name `'abalone.data'`. Read the data into a `table`.

```matlab
tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false);
rng default  % for reproducibility
```

The sample data contains 4177 observations. All of the predictor variables are continuous except for `sex`, which is a categorical variable with possible values `'M'` (for males), `'F'` (for females), and `'I'` (for infants). The goal is to predict the number of rings on the abalone, and thereby determine its age, using physical measurements.

Train an SVM regression model using a Gaussian kernel function and an automatic kernel scale. Standardize the data.

```matlab
mdl = fitrsvm(tbl,'Var9','KernelFunction','gaussian','KernelScale','auto','Standardize',true)
```

```matlab
mdl =

  RegressionSVM
           PredictorNames: {1x8 cell}
             ResponseName: 'Var9'
    CategoricalPredictors: 1
        ResponseTransform: 'none'
                    Alpha: [3635x1 double]
                     Bias: 10.8144
         KernelParameters: [1x1 struct]
                       Mu: [1x10 double]
                    Sigma: [1x10 double]
          NumObservations: 4177
           BoxConstraints: [4177x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [4177x1 logical]
                   Solver: 'SMO'

  Properties, Methods
```

Compact the model.

```matlab
compactMdl = compact(mdl)
```

```matlab
compactMdl =

  classreg.learning.regr.CompactRegressionSVM
           PredictorNames: {1x8 cell}
             ResponseName: 'Var9'
    CategoricalPredictors: 1
        ResponseTransform: 'none'
                    Alpha: [3635x1 double]
                     Bias: 10.8144
         KernelParameters: [1x1 struct]
                       Mu: [1x10 double]
                    Sigma: [1x10 double]
           SupportVectors: [3635x10 double]

  Properties, Methods
```

The compacted model discards the training data and some information related to the training process.

Compare the size of the full model `mdl` and the compact model `compactMdl`.

```matlab
vars = whos('compactMdl','mdl');
[vars(1).bytes,vars(2).bytes]
```

```matlab
ans =

      323793      775968
```

The compact model consumes less than half the memory of the full model.

## References

1. Nash, W. J., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. *The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait.* Sea Fisheries Division, Technical Report No. 48, 1994.

2. Waugh, S. *Extending and Benchmarking Cascade-Correlation.* Ph.D. thesis, Computer Science Department, University of Tasmania, 1995.

3. Clark, D., Z. Schreter, and A. Adams. "A Quantitative Comparison of Dystal and Backpropagation." Submitted to the Australian Conference on Neural Networks, 1996.

4. Lichman, M. UCI Machine Learning Repository, http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.