kfoldPredict
Predict responses for observations in cross-validated linear regression model
Description
YHat = kfoldPredict(CVMdl) returns cross-validated predicted responses by the cross-validated linear regression model CVMdl. For every fold, kfoldPredict predicts the responses for validation-fold observations using a model trained on training-fold observations. YHat contains predicted responses for each regularization strength in the linear regression models that compose CVMdl.
YHat = kfoldPredict(CVMdl,PredictionForMissingValue=prediction) uses the prediction value as the predicted response for observations with missing values in the predictor data. By default, kfoldPredict uses the median of the observed response values in the training-fold data. (since R2023b)
Examples
Simulate 10000 observations from this model:
$y = x_{100} + 2x_{200} + e$
X, whose columns are $x_1, \ldots, x_{1000}$, is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
e is random normal error with mean 0 and standard deviation 0.3.
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Cross-validate a linear regression model.
CVMdl = fitrlinear(X,Y,'CrossVal','on')
CVMdl = 
  RegressionPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 10000
                  KFold: 10
              Partition: [1×1 cvpartition]
      ResponseTransform: 'none'
Mdl1 = CVMdl.Trained{1}
Mdl1 = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000×1 double]
                 Bias: 0.0107
               Lambda: 1.1111e-04
              Learner: 'svm'
By default, fitrlinear implements 10-fold cross-validation. CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 10-by-1 cell array holding 10 RegressionLinear models, each of which the software trained using the training-fold observations of one fold.
Predict responses for observations that fitrlinear did not use in training the folds.
yHat = kfoldPredict(CVMdl);
Because there is one regularization strength in Mdl1, yHat is a numeric vector.
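The fold-by-fold behavior described in this example can be checked by hand. The following is a minimal sketch, not the internal implementation: it uses a small dense data set (an assumption made to keep the run short) and rebuilds the predictions by applying each model in Trained to the observations that its fold held out. The names yManual and maxDiff are illustrative.

```matlab
rng(1)                                  % For reproducibility
n = 500;                                % Small illustrative data set (assumption)
X = randn(n,10);
Y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);

CVMdl = fitrlinear(X,Y,'KFold',5);      % 5-fold cross-validated model
yHat = kfoldPredict(CVMdl);

% Rebuild the predictions one fold at a time
yManual = zeros(n,1);
for k = 1:CVMdl.KFold
    idx = test(CVMdl.Partition,k);      % Logical index of validation-fold observations
    yManual(idx) = predict(CVMdl.Trained{k},X(idx,:));
end

maxDiff = max(abs(yManual - yHat))      % Should be zero up to round-off
```

Each observation belongs to exactly one validation fold, so the loop fills every element of yManual exactly once.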
Simulate 10000 observations as in Predict Cross-Validated Responses.
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Create a set of 15 logarithmically spaced regularization strengths from $10^{-5}$ through $10^{-1}$.
Lambda = logspace(-5,-1,15);
Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Specify using least squares with a lasso penalty and optimizing the objective function using SpaRSA.
X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
CVMdl is a RegressionPartitionedLinear model. Its Trained property contains a 5-by-1 cell array of trained RegressionLinear models, each of which holds out a different fold during training. Because fitrlinear trained using 15 regularization strengths, you can think of each RegressionLinear model as 15 models.
Predict cross-validated responses.
YHat = kfoldPredict(CVMdl);
size(YHat)
ans = 1×2
10000 15
YHat(2,:)
ans = 1×15
-1.7338 -1.7332 -1.7319 -1.7299 -1.7266 -1.7239 -1.7135 -1.7210 -1.7324 -1.7063 -1.6397 -1.5112 -1.2631 -0.7841 -0.0096
YHat is a 10000-by-15 matrix. YHat(2,:) contains the cross-validated responses for observation 2, one for each of the 15 regularization strengths.
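One common use of the per-strength columns is to compare regularization strengths. The sketch below assumes a small dense data set rather than the sparse one in this example, and computes a pooled mean squared error for each column of YHat. The names pooledMSE, jBest, and bestLambda are illustrative, and the pooled error is not guaranteed to match the fold-averaged value that kfoldLoss reports.

```matlab
rng(1)                                  % For reproducibility
n = 500;                                % Small illustrative data set (assumption)
X = randn(n,10);
Y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);

Lambda = logspace(-5,-1,15);
CVMdl = fitrlinear(X,Y,'KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');

YHat = kfoldPredict(CVMdl);             % n-by-15, one column per regularization strength

% Pooled squared error over all held-out predictions, per column
pooledMSE = mean((YHat - Y).^2,1);      % 1-by-15
[~,jBest] = min(pooledMSE);

% Read the strength from the trained model so the index matches the columns of YHat
bestLambda = CVMdl.Trained{1}.Lambda(jBest)
```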
Input Arguments
CVMdl
Cross-validated, linear regression model, specified as a RegressionPartitionedLinear model object. You can create a RegressionPartitionedLinear model using fitrlinear and specifying any one of the cross-validation name-value arguments, for example, CrossVal.
To obtain estimates, kfoldPredict applies the same data used to cross-validate the linear regression model (X and Y).
PredictionForMissingValue (since R2023b)
Predicted response value to use for observations with missing predictor values, specified as "median", "mean", or a numeric scalar.
Value | Description |
---|---|
"median" | kfoldPredict uses the median of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
"mean" | kfoldPredict uses the mean of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
Numeric scalar | kfoldPredict uses this value as the predicted response value for observations with missing predictor values. |
Example: "mean"
Example: NaN
Data Types: single | double | char | string
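The three kinds of value can be compared side by side. This is a hedged sketch that requires R2023b or later and assumes that fitrlinear accepts predictor data containing a NaN value. The data set and the names yMedian, yMean, and yNaN are illustrative.

```matlab
rng(1)                                  % For reproducibility
n = 200;                                % Small illustrative data set (assumption)
X = randn(n,5);
Y = X(:,1) + 0.3*randn(n,1);
X(3,2) = NaN;                           % Observation 3 has a missing predictor value

CVMdl = fitrlinear(X,Y,'KFold',5);

yMedian = kfoldPredict(CVMdl);                                  % Default: training-fold median
yMean   = kfoldPredict(CVMdl,PredictionForMissingValue="mean"); % Training-fold mean
yNaN    = kfoldPredict(CVMdl,PredictionForMissingValue=NaN);    % Fixed numeric scalar

% Only the entry for observation 3 should depend on the option
[yMedian(3) yMean(3) yNaN(3)]
```

Passing NaN marks the affected observations in the output instead of filling them with a summary of the training-fold responses.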
Output Arguments
YHat
Cross-validated predicted responses, returned as an n-by-L numeric array. n is the number of observations in the predictor data that created CVMdl (see X) and L is the number of regularization strengths in CVMdl.Trained{1}.Lambda. YHat(i,j) is the predicted response for observation i using the linear regression model that has regularization strength CVMdl.Trained{1}.Lambda(j).
The predicted response using the model with regularization strength j is $\hat{y}_j = x\beta_j + b_j$.
x is an observation from the predictor data matrix X, and is a row vector.
$\beta_j$ is the estimated column vector of coefficients. The software stores this vector in Mdl.Beta(:,j).
$b_j$ is the estimated, scalar bias, which the software stores in Mdl.Bias(j).
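The linear form of the prediction (a row of X times the coefficient column, plus the bias) can be evaluated directly from the stored properties. The sketch below assumes a small dense data set and no response transformation. The names yFormula, yPredict, and maxDiff are illustrative.

```matlab
rng(1)                                  % For reproducibility
n = 500;                                % Small illustrative data set (assumption)
X = randn(n,10);
Y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);

Lambda = logspace(-5,-1,15);
CVMdl = fitrlinear(X,Y,'KFold',5,'Lambda',Lambda,'Learner','leastsquares');

Mdl = CVMdl.Trained{1};                 % Model trained without fold 1
idx = test(CVMdl.Partition,1);          % Validation-fold observations of fold 1

% Column j of the result is X*Beta(:,j) + Bias(j); Bias is forced to a row vector
yFormula = X(idx,:)*Mdl.Beta + Mdl.Bias(:)';
yPredict = predict(Mdl,X(idx,:));

maxDiff = max(abs(yFormula(:) - yPredict(:)))   % Should be zero up to round-off
```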
Extended Capabilities
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2016a
kfoldPredict fully supports GPU arrays.
Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the PredictionForMissingValue name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.
This table lists the object functions that support the PredictionForMissingValue name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.
Model Type | Model Objects | Object Functions |
---|---|---|
Gaussian process regression (GPR) model | RegressionGP, CompactRegressionGP | loss, predict, resubLoss, resubPredict |
Gaussian process regression (GPR) model | RegressionPartitionedGP | kfoldLoss, kfoldPredict |
Gaussian kernel regression model | RegressionKernel | loss, predict |
Gaussian kernel regression model | RegressionPartitionedKernel | kfoldLoss, kfoldPredict |
Linear regression model | RegressionLinear | loss, predict |
Linear regression model | RegressionPartitionedLinear | kfoldLoss, kfoldPredict |
Neural network regression model | RegressionNeuralNetwork, CompactRegressionNeuralNetwork | loss, predict, resubLoss, resubPredict |
Neural network regression model | RegressionPartitionedNeuralNetwork | kfoldLoss, kfoldPredict |
Support vector machine (SVM) regression model | RegressionSVM, CompactRegressionSVM | loss, predict, resubLoss, resubPredict |
Support vector machine (SVM) regression model | RegressionPartitionedSVM | kfoldLoss, kfoldPredict |
In previous releases, the regression model loss and predict functions listed above used NaN predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.
See Also
RegressionPartitionedLinear | predict | RegressionLinear | fitrlinear