Fit a support vector machine regression model
fitrsvm trains or cross-validates a support vector machine (SVM) regression model on a low- through moderate-dimensional predictor data set. fitrsvm supports mapping the predictor data using kernel functions, and supports SMO, ISDA, or L1 soft-margin minimization via quadratic programming for objective-function minimization.
To train a linear SVM regression model on a high-dimensional data set, that is, a data set that includes many predictor variables, use fitrlinear instead.
To train an SVM model for binary classification, see fitcsvm for low- through moderate-dimensional predictor data sets, or fitclinear for high-dimensional data sets.
Mdl = fitrsvm(Tbl,ResponseVarName) returns a full, trained support vector machine (SVM) regression model Mdl trained using the predictor values in the table Tbl and the response values in Tbl.ResponseVarName.
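For example, a minimal sketch that assumes the carsmall sample data set (shipped with Statistics and Machine Learning Toolbox) is available:

load carsmall
Tbl = table(Horsepower, Weight, MPG);   % predictors and response in one table
Mdl = fitrsvm(Tbl, 'MPG');              % MPG names the response variable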
Mdl = fitrsvm(___,Name,Value) returns an SVM regression model with additional options specified by one or more name-value pair arguments, using any of the previous syntaxes. For example, you can specify the kernel function or train a cross-validated model.
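For instance, a sketch (again assuming the carsmall sample data set) that specifies a Gaussian kernel:

load carsmall
Mdl = fitrsvm(table(Horsepower, Weight, MPG), 'MPG', ...
    'KernelFunction', 'gaussian');      % Gaussian (RBF) kernel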
fitrsvm supports low- through moderate-dimensional data sets. For high-dimensional data sets, use fitrlinear instead.
Unless your data set is large, always try to standardize the predictors (see Standardize). Standardization makes predictors insensitive to the scales on which they are measured.
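For example, a sketch assuming the carsmall sample data set, where Horsepower and Weight are measured on very different scales:

load carsmall
X = [Horsepower Weight];                % predictors on different scales
Mdl = fitrsvm(X, MPG, 'Standardize', true);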
It is good practice to cross-validate using the KFold name-value pair argument. The cross-validation results determine how well the SVM model generalizes.
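For example, a sketch (assuming the carsmall sample data set) that trains a 5-fold cross-validated model and estimates the generalization error:

load carsmall
CVMdl = fitrsvm([Horsepower Weight], MPG, 'Standardize', true, 'KFold', 5);
kfoldLoss(CVMdl)    % cross-validated mean squared error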
Sparsity in support vectors is a desirable property of an SVM model. To decrease the number of support vectors, set the BoxConstraint name-value pair argument to a large value. This action also increases the training time.
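For example, a sketch (assuming the carsmall sample data set) that compares the number of support vectors for the default box constraint and a larger one:

load carsmall
X = [Horsepower Weight]; Y = MPG;
Mdl1 = fitrsvm(X, Y, 'Standardize', true);                       % default BoxConstraint
Mdl2 = fitrsvm(X, Y, 'Standardize', true, 'BoxConstraint', 10);  % larger BoxConstraint
[sum(Mdl1.IsSupportVector) sum(Mdl2.IsSupportVector)]            % the larger box constraint typically yields fewer support vectors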
For optimal training time, set CacheSize as high as the memory limit on your computer allows.
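For example (assuming the carsmall sample data set), you can let the software allocate as much cache as it needs:

load carsmall
Mdl = fitrsvm([Horsepower Weight], MPG, 'CacheSize', 'maximal');
% or pass a numeric value in megabytes, for example 'CacheSize',2000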
If you expect many fewer support vectors than observations in the training set, then you can significantly speed up convergence by shrinking the active set using the 'ShrinkagePeriod' name-value pair argument. It is good practice to use 'ShrinkagePeriod',1000.
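For example, a sketch (assuming the carsmall sample data set):

load carsmall
Mdl = fitrsvm([Horsepower Weight], MPG, 'Standardize', true, ...
    'KernelFunction', 'gaussian', 'ShrinkagePeriod', 1000);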
Duplicate observations that are far from the regression line do not affect convergence. However, just a few duplicate observations that occur near the regression line can slow down convergence considerably. To speed up convergence, specify 'RemoveDuplicates',true if:
Your data set contains many duplicate observations.
You suspect that a few duplicate observations can fall near the regression line.
However, to maintain the original data set during training, fitrsvm must temporarily store separate data sets: the original and one without the duplicate observations. Therefore, if you specify true for data sets containing few duplicates, then fitrsvm consumes close to double the memory of the original data.
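For example, a sketch that appends hypothetical duplicate rows to the carsmall sample data before training:

load carsmall
X = [Horsepower Weight]; Y = MPG;
Xdup = [X; X(1:20,:)];                  % hypothetical duplicated observations
Ydup = [Y; Y(1:20)];
Mdl = fitrsvm(Xdup, Ydup, 'Standardize', true, 'RemoveDuplicates', true);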
After training a model, you can generate C/C++ code that predicts responses for new data. Generating C/C++ code requires MATLAB Coder™. For details, see Introduction to Code Generation.
For the mathematical formulation of linear and nonlinear SVM regression problems and the solver algorithms, see Understanding Support Vector Machine Regression.
NaN, <undefined>, empty character vector (''), empty string (""), and <missing> values indicate missing data values.
fitrsvm removes entire rows of data corresponding to a missing response. When normalizing weights, fitrsvm ignores any weight corresponding to an observation with at least one missing predictor. Consequently, observation box constraints might not equal BoxConstraint.

fitrsvm removes observations that have zero weight.
If you set 'Standardize',true and 'Weights', then fitrsvm standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, fitrsvm standardizes predictor j (xj) using

xj* = (xj − μj*)/σj*

where xjk is observation k (row) of predictor j (column), μj* is the weighted mean of predictor j, and σj* is the corresponding weighted standard deviation.
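For example, a sketch (assuming the carsmall sample data set and hypothetical observation weights) that combines standardization with weights:

load carsmall
rng(0)                                  % for reproducibility
w = 0.5 + rand(numel(MPG), 1);          % hypothetical positive observation weights
Mdl = fitrsvm([Horsepower Weight], MPG, 'Standardize', true, 'Weights', w);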
If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.
The PredictorNames property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then PredictorNames is a 1-by-3 cell array of character vectors containing the original names of the predictor variables.
The ExpandedPredictorNames property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then ExpandedPredictorNames is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables.
Similarly, the Beta property stores one beta coefficient for each predictor, including the dummy variables.
The SupportVectors property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are m support vectors and three predictors, one of which is a categorical variable with three levels. Then SupportVectors is an m-by-5 matrix.
The X property stores the training data as originally input. It does not include the dummy variables. When the input is a table, X contains only the columns used as predictors.
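For example, a sketch with synthetic data (hypothetical variable names) that has two numeric predictors and one categorical predictor with three levels:

rng(1)                                           % for reproducibility
n  = 50;
X1 = randn(n,1);
X2 = randn(n,1);
C  = categorical(randi(3,n,1), 1:3, {'low','medium','high'});   % 3 levels
Y  = X1 - 0.5*X2 + randn(n,1);
Mdl = fitrsvm(table(X1, X2, C, Y), 'Y');

Mdl.PredictorNames            % 1-by-3: the original predictor names
Mdl.ExpandedPredictorNames    % 1-by-5: X1, X2, and one dummy variable per level of C
size(Mdl.SupportVectors)      % m-by-5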
For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.
For a variable having k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is -1 for levels up to j, and +1 for levels j + 1 through k.
The names of the dummy variables stored in the ExpandedPredictorNames property indicate the first level with the value +1. The software stores k – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., k.
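For example, a sketch with synthetic data (hypothetical variable names) in which the categorical predictor is ordinal:

rng(2)                                           % for reproducibility
n  = 50;
X1 = randn(n,1);
C  = categorical(randi(3,n,1), 1:3, {'low','medium','high'}, 'Ordinal', true);
Y  = X1 + randn(n,1);
Mdl = fitrsvm(table(X1, C, Y), 'Y');
Mdl.ExpandedPredictorNames    % k - 1 = 2 dummy variables replace the ordinal predictor C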
All solvers implement L1 soft-margin minimization.
Let p be the proportion of outliers that you expect in the training data. If you set 'OutlierFraction',p, then the software implements robust learning. In other words, the software attempts to remove 100p% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.
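For example, a sketch (assuming the carsmall sample data set) that expects roughly 5% of the training observations to be outliers:

load carsmall
Mdl = fitrsvm([Horsepower Weight], MPG, 'Standardize', true, ...
    'OutlierFraction', 0.05);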
CompactRegressionSVM | predict | RegressionPartitionedSVM | RegressionSVM