fitlmcens
Syntax
mdl = fitlmcens(tbl,ResponseVarName,Censoring=cens)
mdl = fitlmcens(tbl,ResponseVarName,modelspec,Censoring=cens)
mdl = fitlmcens(___,Name=Value)
Description
mdl = fitlmcens(tbl,ResponseVarName,Censoring=cens) returns a censored linear regression model fit to the input data in tbl, using the response variable specified by ResponseVarName and the censoring information in cens. If the response variable is in the last column of tbl, you do not need to specify ResponseVarName.
mdl = fitlmcens(tbl,ResponseVarName,modelspec,Censoring=cens) additionally specifies the linear regression model to use for fitting.
mdl = fitlmcens(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify categorical variables, observations to exclude, and observation weights.
Examples
Load the readmissiontimes sample data.
load readmissiontimes
The variables Age, Weight, and ReadmissionTime contain data for patient age, weight, and time of readmission. The Censored variable contains censoring information for ReadmissionTime.
Save the variables in a table, and fit a censored linear regression model to the data using ReadmissionTime as the response and Censored as the censoring information.
tbl = table(Age,Weight,ReadmissionTime,Censored);
mdl = fitlmcens(tbl,"ReadmissionTime",Censoring="Censored")
mdl =
Censored linear regression model
ReadmissionTime ~ 1 + Age + Weight
Estimated Coefficients:
Estimate SE tStat pValue
_________ ________ ________ __________
(Intercept) 28.62 3.5313 8.1047 1.7047e-12
Age -0.060686 0.061984 -0.97905 0.33001
Weight -0.11977 0.017199 -6.9638 4.1162e-10
Sigma: 4.245
Number of observations: 100, Error degrees of freedom: 96
25 right-censored observations
75 uncensored observations
Likelihood ratio statistic vs. constant model: 39, p-value = 3.47e-09
mdl is a CensoredLinearModel object that contains the results of fitting the model to the data. The small p-value for the Weight term indicates that it has a statistically significant effect on patient readmission time.
Load the readmissiontimes sample data.
load readmissiontimes
The variables Age, Weight, and ReadmissionTime contain data for patient age, weight, and time of readmission. The Censored variable contains censoring information for ReadmissionTime.
Save Age, Weight, and ReadmissionTime in a table.
tbl = table(Age,Weight,ReadmissionTime);
Fit a censored linear regression model using Age and Weight as the predictor variables, ReadmissionTime as the response, and Censored as the censoring information. Because ReadmissionTime is the last column in tbl, you do not need to specify the ResponseVarName argument.
mdl1 = fitlmcens(tbl,Censoring=Censored)
mdl1 =
Censored linear regression model
ReadmissionTime ~ 1 + Age + Weight
Estimated Coefficients:
Estimate SE tStat pValue
_________ ________ ________ __________
(Intercept) 28.62 3.5313 8.1047 1.7047e-12
Age -0.060686 0.061984 -0.97905 0.33001
Weight -0.11977 0.017199 -6.9638 4.1162e-10
Sigma: 4.245
Number of observations: 100, Error degrees of freedom: 96
25 right-censored observations
75 uncensored observations
Likelihood ratio statistic vs. constant model: 39, p-value = 3.47e-09
mdl1 is a CensoredLinearModel object that includes the results of fitting a censored linear regression model to the data. The output display includes information about the model, statistics for each model term, and the censored observations. The p-values for the Weight and Age terms indicate that Weight has a statistically significant effect on patient readmission time and Age does not.
Fit another model to the data, using only the Weight term.
mdl2 = fitlmcens(tbl,"ReadmissionTime~Weight",Censoring=Censored)
mdl2 =
Censored linear regression model
ReadmissionTime ~ 1 + Weight
Estimated Coefficients:
Estimate SE tStat pValue
________ _______ _______ __________
(Intercept) 26.398 2.7107 9.7387 4.9168e-16
Weight -0.12041 0.01729 -6.9642 3.9554e-10
Sigma: 4.273
Number of observations: 100, Error degrees of freedom: 97
25 right-censored observations
75 uncensored observations
Likelihood ratio statistic vs. constant model: 38, p-value = 7.06e-10
The p-value of the likelihood ratio statistic vs. constant model is smaller for mdl2 (7.06e-10) than for mdl1 (3.47e-09), which suggests that mdl2 is a slightly better fit than mdl1.
Load the censoreddata sample data.
load censoreddata
The matrix X contains data for three predictors, and the matrix yint contains censoring information for a response variable. Display yint.
yint
yint = 10×2
-Inf 13.9492
-Inf -0.1978
-Inf 6.9939
64.7670 Inf
4.2314 Inf
-1.1874 Inf
0.2764 2.2764
36.1247 38.1247
2.5400 4.5400
30.4107 32.4107
The first three rows of yint specify left-censored observations. The fourth to sixth rows specify right-censored observations. The remaining rows specify interval-censored observations.
Fit a linear regression model to the censored data in X and yint.
mdl = fitlmcens(X,yint)
mdl =
Censored linear regression model
y ~ 1 + x1 + x2 + x3
Estimated Coefficients:
Estimate SE tStat pValue
________ ______ ________ _______
(Intercept) 17.317 10.189 1.6996 0.14995
x1 9.401 8.7053 1.0799 0.3295
x2 -3.2891 13.057 -0.25191 0.81114
x3 -10.134 7.947 -1.2751 0.2583
Sigma: 25.7
Number of observations: 10, Error degrees of freedom: 5
4 interval-censored observations
3 right-censored observations
3 left-censored observations
Likelihood ratio statistic vs. constant model: 2.11, p-value = 0.551
The large p-values indicate that not enough evidence exists to conclude that any model terms have a statistically significant effect on the response.
Load the readmissiontimes sample data.
load readmissiontimes
The variables Age, Weight, Smoker, and ReadmissionTime contain data for patient age, weight, smoking status, and time of readmission. The Censored variable contains censoring information for ReadmissionTime.
Save the Age, Weight, ReadmissionTime, and Censored variables in a table, and create a vector of indices for observations corresponding to smokers.
tbl = table(Age,Weight,ReadmissionTime,Censored);
idx = Smoker==1;
Fit a censored linear regression model to the data for nonsmokers using ReadmissionTime as the response and Censored as the censoring information. Specify an interactions model.
mdl = fitlmcens(tbl,"ReadmissionTime","interactions",Censoring="Censored",ExcludeObservations=idx)
mdl =
Censored linear regression model
ReadmissionTime ~ 1 + Age*Weight
Estimated Coefficients:
Estimate SE tStat pValue
_________ _________ _______ _________
(Intercept) 49.413 16.878 2.9276 0.0047949
Age -0.57333 0.44326 -1.2934 0.20073
Weight -0.25837 0.1084 -2.3834 0.020282
Age:Weight 0.0035564 0.0028401 1.2522 0.21527
Sigma: 4.604
Number of observations: 66, Error degrees of freedom: 61
16 right-censored observations
50 uncensored observations
Likelihood ratio statistic vs. constant model: 24.2, p-value = 2.23e-05
The small p-value for Weight indicates that patient weight has a statistically significant effect on readmission time.
Input Arguments
Input data, specified as a table. tbl includes data for the predictor variables, and can also contain data for the response variable and the censoring information. The predictor variables can be numeric, logical, categorical, character, or string. The response variable must be numeric or logical. When tbl contains censoring information, it must be in the integer vector format described in cens.
When you specify tbl without specifying ResponseVarName or y, fitlmcens uses the variable in the last column of the table as the response variable and the rest as the predictor variables.
- To use a different column as the response variable, set the ResponseVar name-value argument.
- To use a subset of the columns as predictors, set the PredictorVars name-value argument.
- To define a model specification, set the modelspec argument using a formula or terms matrix. The formula or terms matrix specifies which columns to use as the predictor or response variables.
The variable names in the table do not have to be valid MATLAB® identifiers, but the names must not contain leading or trailing blanks. If the names are not valid, you cannot use a formula when you fit or adjust a model.
You can verify the variable names in tbl
by using the isvarname function. If the variable names are
not valid, then you can convert them by using the matlab.lang.makeValidName function.
Data Types: table
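As a hedged illustration of the name check described above, the following sketch tests a few hypothetical variable names (invented for this example, not taken from any sample data set) with isvarname and converts them with matlab.lang.makeValidName.

```matlab
% Hypothetical variable names; only the first is a valid MATLAB identifier.
names = ["Age","Readmission Time","2ndVisit"];

% isvarname tests one name at a time, so apply it to each element.
isValid = arrayfun(@(s) isvarname(char(s)),names);

% Convert every name to a valid MATLAB identifier.
fixedNames = matlab.lang.makeValidName(names);

% Confirm that all of the converted names now pass the check.
allValid = all(arrayfun(@(s) isvarname(char(s)),fixedNames));
```

Assuming the names belong to a table tbl, you could then assign them with tbl.Properties.VariableNames = fixedNames before fitting a model with a formula.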
Name of the variable to use as the response, specified as a string scalar or character vector.
ResponseVarName indicates
which variable in tbl contains
the response data. When you specify
ResponseVarName, you must
also specify the tbl input
argument.
Data Types: char | string
Censoring information for the observations, specified as an integer vector, an interval, or a
variable name. You cannot use cens to specify interval censoring. To
specify interval censoring, see y.
When you specify cens as an integer vector, it must have the same number
of elements as the number of observations in the input data. Each element of
cens must be -1, 0, or
1 to indicate that the corresponding observation is left-censored,
uncensored, or right-censored, respectively.
When you specify cens as an interval, it must be a two-element numeric vector [L R] where L < R. fitlmcens censors observations according to their response values.
- Response values less than or equal to L are left-censored at L.
- Response values inside the interval are uncensored.
- Response values greater than or equal to R are right-censored at R.
When you specify cens as a variable name, you must also specify tbl. tbl must include a variable of the same name that contains censoring information in the integer vector format described above.
You cannot specify cens when y is a two-column matrix.
Example: [-10,10]
Example: [-1*ones(10,1);zeros(10,1);ones(10,1)]
Example: "censvar"
Data Types: single | double | string | char
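The relationship between the interval format and the integer vector format can be sketched as follows. The response values and limits are hypothetical, and the code only mirrors the classification rule stated above. It is not a description of how fitlmcens processes the interval internally.

```matlab
% Hypothetical response values and censoring limits [L R].
y = [-12; -10; 0; 4; 10; 15];
limits = [-10 10];

% Integer vector encoding: -1 left-censored, 0 uncensored, 1 right-censored.
cens = zeros(size(y));
cens(y <= limits(1)) = -1;   % at or below L
cens(y >= limits(2)) = 1;    % at or above R
```

In this sketch, the first two observations are classified as left-censored and the last two as right-censored.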
Model specification, specified as one of the following values.
- A character vector or string scalar containing the model name.
| Value | Model Description |
|---|---|
| "constant" | Model contains only a constant (intercept) term |
| "linear" | Model contains an intercept and linear term for each predictor |
| "interactions" | Model contains an intercept, linear term for each predictor, and all products of pairs of distinct predictors (no squared terms) |
| "purequadratic" | Model contains an intercept term and linear and squared terms for each predictor |
| "quadratic" | Model contains an intercept term, linear and squared terms for each predictor, and all products of pairs of distinct predictors |
| "polyijk" | Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on. Specify the maximum degree for each predictor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, "poly13" has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms, where x1 and x2 are the first and second predictors, respectively. |
- A t-by-(p + 1) terms matrix that specifies the terms in the model, where t is the number of terms, p is the number of predictor variables, and +1 accounts for the response variable. A terms matrix is convenient when the number of predictors is large and you want to generate the terms programmatically. For more information, see Terms Matrix.
- A character vector or string scalar formula in the form "y ~ terms", where the terms are in Wilkinson Notation. The variable names in the formula must be variable names in tbl or variable names specified by VarNames. Also, the variable names must be valid MATLAB identifiers.
The software determines the order of terms in a fitted model by using the order of terms in tbl or X. Therefore, the order of terms in the model can be different from the order of terms in the specified formula. For more information, see Formula.
When you specify modelspec, you cannot use the
PredictorVars name-value argument to specify the predictor
variables.
Example: "quadratic"
Example: "y ~ x1 + x2^2 + x1:x2"
Data Types: single | double | char | string
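To make the "polyijk" rule concrete, the following sketch enumerates the terms that the description implies for "poly13" and stores them as a terms matrix. It is an illustration under the stated rule only. The variable names are hypothetical, and the row order might differ from the order that the software uses.

```matlab
% Maximum degree for each of the two predictors, as in "poly13".
maxDeg = [1 3];

% Enumerate every exponent pair (a,b) with a <= 1 and b <= 3.
[a,b] = ndgrid(0:maxDeg(1),0:maxDeg(2));

% Keep the terms whose total degree does not exceed the largest
% specified degree.
keep = (a + b) <= max(maxDeg);

% Terms matrix: one row per term, with a final column of zeros for
% the response variable.
T = [a(keep) b(keep) zeros(nnz(keep),1)];
```

The result has seven rows, which correspond to the intercept and the x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms. The x1*x2^3 term is excluded because its total degree is 4.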
Predictor variables, specified as an n-by-p matrix,
where n is the number of observations and p is
the number of predictor variables. Each column of X represents
one variable, and each row represents one observation.
By default, the model includes a constant term unless you explicitly remove it, so do not
include a column of 1s in X.
Data Types: single | double
Response variable, specified as an n-by-2 numeric matrix, where
n is the number of observations. Each row in
yint corresponds to the same row in tbl or
X.
Each row of yint contains the lower and upper bounds for the interval-censored observation.
- To specify a left-censored observation, set the lower bound to -Inf.
- To specify a right-censored observation, set the upper bound to Inf.
- To specify an uncensored observation, set the upper and lower bounds to the response value of the observation.
Data Types: single | double
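The bound rules above can be sketched by converting a response vector and an integer censoring vector into the two-column form. The values of y and cens are hypothetical and serve only to illustrate the encoding.

```matlab
% Hypothetical responses with integer censoring codes:
% -1 left-censored, 0 uncensored, 1 right-censored.
y = [5; 7; 9];
cens = [-1; 0; 1];

% Start with both bounds equal to the response (uncensored form).
yint = [y y];

% Left-censored observations have a lower bound of -Inf.
yint(cens == -1,1) = -Inf;

% Right-censored observations have an upper bound of Inf.
yint(cens == 1,2) = Inf;
```

The resulting rows are [-Inf 5], [7 7], and [9 Inf].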
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: fitlmcens(X,y,Censoring=cens,ExcludeObservations=1:5,Intercept=false)
fits a linear regression model without an intercept to the censored data in
X and y, excluding the first five
observations.
Categorical predictor list, specified as a string array or cell array of character
vectors containing categorical predictor names in the table tbl,
or a logical or numeric index vector indicating which predictor columns are categorical.
- If the predictor data is in tbl, then, by default, fitlmcens treats all categorical values, logical values, character arrays, string arrays, and cell arrays of character vectors as categorical predictors.
- If the predictor data is in a matrix X, then the default value of CategoricalVars is an empty matrix []. That is, no predictor is categorical unless you specify it as categorical.
For example, you can specify the second and third variables out of six as categorical using either of the following examples.
Example: CategoricalVars=[2 3]
Example: CategoricalVars=logical([0 1 1 0 0 0])
Data Types: single | double | logical | string | cell
Observations to exclude from the fit, specified as a logical or numeric index vector indicating which observations to exclude.
For example, you can exclude the second and third observations of six using either of the following examples.
Example: ExcludeObservations=[2 3]
Example: ExcludeObservations=logical([0 1 1 0 0 0])
Data Types: single | double | logical
Indicator for the constant term (intercept) in the fit, specified as a logical
1 (true) to include the term in the model, or
0 (false) to remove the term from the model.
By default, the model includes a constant term unless you explicitly remove it.
Use Intercept only when specifying the model using a
character vector or string scalar, not a formula or matrix.
Example: Intercept=false
Data Types: logical
Predictor variables to use in the fit, specified as a string array or cell array
of character vectors of the variable names in the table tbl, or a
logical or numeric index vector indicating which columns are predictor
variables.
The string values or character vectors must be names in tbl
or names you specify using the VarNames name-value
argument.
The default value is all variables in X, or all variables in
tbl except ResponseVar.
When you specify PredictorVars, you cannot use the
modelspec input argument to specify a terms matrix.
For example, you can specify the second and third variables as the predictor variables using either of the following examples.
Example: PredictorVars=[2 3]
Example: PredictorVars=logical([0 1 1 0 0 0])
Data Types: single | double | logical | string | cell
Names of variables, specified as a string array or cell array of character vectors
that includes the names for the columns of X first, and the name
for the response variable y last.
The variable names do not have to be valid MATLAB identifiers, but the names must not contain leading or trailing blanks. If the names are not valid, you cannot use a formula when you fit or adjust a model.
You can verify the variable names by using the isvarname function. If the variable names are not valid, then you can
convert them by using the matlab.lang.makeValidName
function.
You cannot specify VarNames when you specify input data using
the tbl input argument.
Example: VarNames=["Horsepower","Acceleration","Model_Year","MPG"]
Data Types: string | cell
Observation weights, specified as an n-by-1 vector of nonnegative scalar values, where n is the number of observations.
Data Types: single | double
Output Arguments
Censored linear model, returned as a CensoredLinearModel object.
More About
A terms matrix
T is a t-by-(p + 1) matrix that
specifies the terms in a model, where t is the number of terms,
p is the number of predictor variables, and +1 accounts for the
response variable. The value of T(i,j) is the exponent of variable
j in term i.
For example, suppose that an input includes three predictor variables, x1,
x2, and x3, and the response variable
y in the order x1, x2,
x3, and y. Each row of T
represents one term:
- [0 0 0 0]: Constant term (intercept)
- [0 1 0 0]: x2; equivalently, x1^0 * x2^1 * x3^0
- [1 0 1 0]: x1*x3
- [2 0 0 0]: x1^2
- [0 1 2 0]: x2*(x3^2)
The 0 at the end of each term represents the response variable. In
general, a column vector of zeros in a terms matrix represents the position of the response
variable. If the predictor and response variables are in a matrix and column vector,
respectively, then you must include 0 for the response variable in the
last column of each row.
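The following sketch shows how the exponents in a terms matrix map to columns of predictor products. It uses the five example rows above with a small hypothetical predictor matrix. It illustrates the exponent rule only and is not the fitting code.

```matlab
% Terms matrix from the example: intercept, x2, x1*x3, x1^2, x2*(x3^2).
T = [0 0 0 0
     0 1 0 0
     1 0 1 0
     2 0 0 0
     0 1 2 0];

% Hypothetical predictor data: two observations of x1, x2, and x3.
X = [1 2 3
     4 5 6];

p = size(X,2);                        % number of predictors
D = zeros(size(X,1),size(T,1));       % one column per term

for i = 1:size(T,1)
    % Raise each predictor to its exponent in term i and multiply
    % across predictors. The last column of T (response) is ignored.
    D(:,i) = prod(X.^T(i,1:p),2);
end
```

For the first observation, the columns of D are 1, 2, 3, 1, and 18. For the second observation, they are 1, 5, 24, 16, and 180.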
A formula for model specification is a character vector or string scalar of the form "y ~ terms".
- y is the response name.
- terms represents the predictor terms in a model using Wilkinson notation.
To represent predictor and response variables, use the variable names of the input
argument tbl or the variable names specified by using the
VarNames name-value argument.
For example:
- "y ~ x1 + x2 + x3" specifies a three-variable linear model with an intercept.
- "y ~ x1 + x2 + x3 - 1" specifies a three-variable linear model without an intercept. Note that formulas include a constant (intercept) term by default. To exclude a constant term from the model, you must include -1 in the formula.
Wilkinson notation describes the terms in a model. The notation relates to the terms included in the model, not to the multipliers (coefficients) of those terms.
Wilkinson notation uses these symbols:
- + means include the next variable.
- - means do not include the next variable.
- : defines an interaction, which is a product of the terms.
- * defines an interaction and all lower order terms.
- ^ raises the predictor to a power, exactly as in * repeated, so ^ includes lower order terms as well.
- () groups the terms.
This table shows typical examples of Wilkinson notation.
| Wilkinson Notation | Terms in Standard Notation |
|---|---|
| 1 | Constant (intercept) term |
| x1^k, where k is a positive integer | x1, x1^2, ..., x1^k |
| x1 + x2 | x1, x2 |
| x1*x2 | x1, x2, x1*x2 |
| x1:x2 | x1*x2 only |
| -x2 | Do not include x2 |
| x1*x2 + x3 | x1, x2, x3, x1*x2 |
| x1 + x2 + x3 + x1:x2 | x1, x2, x3, x1*x2 |
| x1*x2*x3 - x1:x2:x3 | x1, x2, x3, x1*x2, x1*x3, x2*x3 |
| x1*(x2 + x3) | x1, x2, x3, x1*x2, x1*x3 |
For more details, see Wilkinson Notation.
Algorithms
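fitlmcens fits a censored linear regression model to the data using a modified version of maximum likelihood estimation for linear regression. Maximum likelihood estimation maximizes
$$\mathcal{L}(\beta_0,\beta,\sigma) = \prod_{i=1}^{n} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \beta_0 - x_i\beta}{\sigma}\right),$$
where $y_i$ is the response for the ith observation, $x_i$ is the corresponding row of predictor values, $\beta_0$ is the intercept term in the model formula, $\beta$ is the vector of model coefficients, $\sigma$ is the standard deviation of the error term, and $\phi$ is the standard normal probability density function. For censored observations $y_i$, fitlmcens replaces the density factor $\frac{1}{\sigma}\phi(\cdot)$ in the equation above with a probability computed from the standard normal cumulative distribution function $\Phi$.
- If $y_i$ is left-censored, fitlmcens uses $\Phi\!\left(\frac{y_i - \beta_0 - x_i\beta}{\sigma}\right)$.
- If $y_i$ is right-censored, fitlmcens uses $1 - \Phi\!\left(\frac{y_i - \beta_0 - x_i\beta}{\sigma}\right)$.
- If $y_i$ is interval-censored, fitlmcens uses $\Phi\!\left(\frac{R_i - \beta_0 - x_i\beta}{\sigma}\right) - \Phi\!\left(\frac{L_i - \beta_0 - x_i\beta}{\sigma}\right)$, where $L_i$ and $R_i$ are the left and right bounds for the interval corresponding to $y_i$.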
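As a rough sketch of the censored likelihood described above, the following code evaluates the log-likelihood of a normal-error linear model at fixed parameter values, with the response given in two-column bound form. The data, the parameter values b0, b, and sigma, and the helper names phi and Phi are all hypothetical. The code assumes normally distributed errors and is not the fitlmcens implementation, which also has to maximize this quantity over the parameters.

```matlab
% Standard normal density and cumulative distribution, written with
% base MATLAB functions so that the sketch needs no toolbox.
phi = @(z) exp(-z.^2/2)/sqrt(2*pi);
Phi = @(z) 0.5*erfc(-z/sqrt(2));

% Hypothetical data: one predictor and response bounds [lower upper].
x = [1; 2; 3; 4];
yint = [-Inf 2      % left-censored at 2
         3   3      % uncensored, response is 3
         5   Inf    % right-censored at 5
         6   8];    % interval-censored between 6 and 8

% Hypothetical parameter values at which to evaluate the likelihood.
b0 = 1;
b = 1.5;
sigma = 2;

% Standardized distances of the bounds from the fitted mean.
mu = b0 + x*b;
zLo = (yint(:,1) - mu)/sigma;
zHi = (yint(:,2) - mu)/sigma;

% Uncensored observations contribute a density value. Censored
% observations contribute the probability of the interval. Because
% Phi(-Inf) is 0 and Phi(Inf) is 1, the same difference covers the
% left-, right-, and interval-censored cases.
isExact = yint(:,1) == yint(:,2);
contrib = zeros(size(x));
contrib(isExact) = phi(zHi(isExact))/sigma;
contrib(~isExact) = Phi(zHi(~isExact)) - Phi(zLo(~isExact));

logL = sum(log(contrib));
```

Working with the sum of logarithms rather than the product of the contributions is a common choice for numerical stability. Maximizing either one gives the same parameter estimates.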
Version History
Introduced in R2025a