fitlmcens
Syntax
mdl = fitlmcens(tbl,ResponseVarName,Censoring=cens)
mdl = fitlmcens(tbl,ResponseVarName,modelspec,Censoring=cens)
mdl = fitlmcens(___,Name=Value)
Description
mdl = fitlmcens(tbl,ResponseVarName,Censoring=cens) returns a censored linear regression model fit to the input data in tbl, using the response variable specified by ResponseVarName and the censoring information in cens. If the response variable is in the last column of tbl, you do not need to specify ResponseVarName.
mdl = fitlmcens(tbl,ResponseVarName,modelspec,Censoring=cens) additionally specifies the linear regression model to use for fitting.
mdl = fitlmcens(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify categorical variables, observations to exclude, and observation weights.
Examples
Load the readmissiontimes
sample data.
load readmissiontimes
The variables Age
, Weight
, and ReadmissionTime
contain data for patient age, weight, and time of readmission. The Censored
variable contains censoring information for ReadmissionTime
.
Save the variables in a table, and fit a censored linear regression model to the data using ReadmissionTime
as the response and Censored
as the censoring information.
tbl = table(Age,Weight,ReadmissionTime,Censored);
mdl = fitlmcens(tbl,"ReadmissionTime",Censoring="Censored")
mdl = 
Censored linear regression model
    ReadmissionTime ~ 1 + Age + Weight

Estimated Coefficients:
                    Estimate       SE         tStat       pValue  
                   _________    ________    ________    __________

    (Intercept)        28.62      3.5313      8.1047    1.7047e-12
    Age            -0.060686    0.061984    -0.97905       0.33001
    Weight          -0.11977    0.017199     -6.9638    4.1162e-10

Sigma: 4.245

Number of observations: 100, Error degrees of freedom: 96
25 right-censored observations
75 uncensored observations
Likelihood ratio statistic vs. constant model: 39, p-value = 3.47e-09
mdl
is a CensoredLinearModel
object that contains the results of fitting the model to the data. The small p-value for the Weight
term indicates that it has a statistically significant effect on patient readmission time.
Load the readmissiontimes
sample data.
load readmissiontimes
The variables Age
, Weight
, and ReadmissionTime
contain data for patient age, weight, and time of readmission. The Censored
variable contains censoring information for ReadmissionTime
.
Save Age
, Weight
, and ReadmissionTime
in a table.
tbl = table(Age,Weight,ReadmissionTime);
Fit a censored linear regression model using Age and Weight as the predictor variables, ReadmissionTime as the response, and Censored as the censoring information. Because ReadmissionTime is the last column in tbl, you do not need to specify the ResponseVarName argument.
mdl1 = fitlmcens(tbl,Censoring=Censored)
mdl1 = 
Censored linear regression model
    ReadmissionTime ~ 1 + Age + Weight

Estimated Coefficients:
                    Estimate       SE         tStat       pValue  
                   _________    ________    ________    __________

    (Intercept)        28.62      3.5313      8.1047    1.7047e-12
    Age            -0.060686    0.061984    -0.97905       0.33001
    Weight          -0.11977    0.017199     -6.9638    4.1162e-10

Sigma: 4.245

Number of observations: 100, Error degrees of freedom: 96
25 right-censored observations
75 uncensored observations
Likelihood ratio statistic vs. constant model: 39, p-value = 3.47e-09
mdl1
is a CensoredLinearModel
object that includes the results of fitting a censored linear regression model to the data. The output display includes information about the model, statistics for each model term, and the censored observations. The p-values for the Weight
and Age
terms indicate that Weight
has a statistically significant effect on patient readmission time and Age
does not.
Fit another model to the data, using only the Weight
term.
mdl2 = fitlmcens(tbl,"ReadmissionTime~Weight",Censoring=Censored)
mdl2 = 
Censored linear regression model
    ReadmissionTime ~ 1 + Weight

Estimated Coefficients:
                   Estimate       SE        tStat       pValue  
                   ________    _______    _______    __________

    (Intercept)      26.398     2.7107     9.7387    4.9168e-16
    Weight         -0.12041    0.01729    -6.9642    3.9554e-10

Sigma: 4.273

Number of observations: 100, Error degrees of freedom: 97
25 right-censored observations
75 uncensored observations
Likelihood ratio statistic vs. constant model: 38, p-value = 7.06e-10
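The Likelihood ratio statistic vs. constant model results show that mdl2 is slightly preferable to mdl1: it has one fewer term, yet its test against the constant model has a smaller p-value (7.06e-10 versus 3.47e-09).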
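The two likelihood ratio statistics can also be combined into an approximate nested-model test. Because mdl2 is mdl1 without the Age term and both models use the same 100 observations, the difference of their statistics against the constant model equals 2(log L1 - log L2), which is approximately chi-square distributed with 1 degree of freedom. This sketch assumes the rounded values 39 and 38 shown in the displays (the unrounded statistics can differ slightly), and computes the chi-square(1) upper tail with erfc so that no toolbox function is needed.

```matlab
% Approximate likelihood ratio test of mdl2 (Weight only) nested in mdl1 (Age + Weight).
% Assumption: uses the rounded statistics from the displays, not exact model values.
lrFull    = 39;                  % mdl1 vs. constant model (as displayed)
lrReduced = 38;                  % mdl2 vs. constant model (as displayed)
lrDiff    = lrFull - lrReduced;  % equals 2*(logL1 - logL2); one extra parameter (Age)

% Upper tail of a chi-square distribution with 1 degree of freedom:
% P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x/2))
p = erfc(sqrt(lrDiff/2))
```

The resulting p-value of roughly 0.32 is in line with the pValue of 0.33 for Age in mdl1, which supports dropping Age from the model.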
Load the censoreddata
sample data.
load censoreddata
The matrix X
contains data for three predictors, and the matrix yint
contains censoring information for a response variable. Display yint
.
yint
yint = 10×2
-Inf 13.9492
-Inf -0.1978
-Inf 6.9939
64.7670 Inf
4.2314 Inf
-1.1874 Inf
0.2764 2.2764
36.1247 38.1247
2.5400 4.5400
30.4107 32.4107
The first three rows of yint
specify left-censored observations. The fourth to sixth rows specify right-censored observations. The remaining rows specify interval-censored observations.
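To check this classification programmatically, you can test which bounds are infinite. The following sketch copies the rounded values from the display above (so it runs without the sample data file) and counts each censoring type; the names isLeft, isRight, isExact, and isInterval are illustrative only.

```matlab
% yint values copied (rounded) from the display of the censoreddata sample
yint = [-Inf    13.9492
        -Inf    -0.1978
        -Inf     6.9939
        64.7670  Inf
         4.2314  Inf
        -1.1874  Inf
         0.2764  2.2764
        36.1247 38.1247
         2.5400  4.5400
        30.4107 32.4107];

isLeft     =  isinf(yint(:,1)) & ~isinf(yint(:,2));  % lower bound is -Inf
isRight    = ~isinf(yint(:,1)) &  isinf(yint(:,2));  % upper bound is Inf
isExact    = yint(:,1) == yint(:,2);                 % equal bounds: uncensored
isInterval = ~isLeft & ~isRight & ~isExact;          % finite, distinct bounds

% Counts in the order: left, right, interval, uncensored
counts = [sum(isLeft) sum(isRight) sum(isInterval) sum(isExact)]
```

The counts [3 3 4 0] match the censoring summary that fitlmcens reports for these data.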
Fit a linear regression model to the censored data in X
and yint
.
mdl = fitlmcens(X,yint)
mdl = 
Censored linear regression model
    y ~ 1 + x1 + x2 + x3

Estimated Coefficients:
                   Estimate      SE        tStat       pValue 
                   ________    ______    ________    _______

    (Intercept)     17.317     10.189      1.6996    0.14995
    x1               9.401     8.7053      1.0799     0.3295
    x2             -3.2891     13.057    -0.25191    0.81114
    x3             -10.134      7.947     -1.2751     0.2583

Sigma: 25.7

Number of observations: 10, Error degrees of freedom: 5
4 interval-censored observations
3 right-censored observations
3 left-censored observations
Likelihood ratio statistic vs. constant model: 2.11, p-value = 0.551
The large p-values indicate that there is not enough evidence to conclude that any model term has a statistically significant effect on the response.
Load the readmissiontimes
sample data.
load readmissiontimes
The variables Age
, Weight
, Smoker
, and ReadmissionTime
contain data for patient age, weight, smoking status, and time of readmission. The Censored
variable contains censoring information for ReadmissionTime
.
Save the Age
, Weight
, ReadmissionTime
, and Censored
variables in a table, and create a vector of indices for observations corresponding to smokers.
tbl = table(Age,Weight,ReadmissionTime,Censored);
idx = Smoker==1;
Fit a censored linear regression model to the data for nonsmokers using ReadmissionTime
as the response and Censored
as the censoring information. Specify an interactions model.
mdl = fitlmcens(tbl,"ReadmissionTime","interactions",Censoring="Censored",ExcludeObservations=idx)
mdl = 
Censored linear regression model
    ReadmissionTime ~ 1 + Age*Weight

Estimated Coefficients:
                    Estimate        SE         tStat       pValue  
                   _________    _________    _______    _________

    (Intercept)       49.413       16.878     2.9276    0.0047949
    Age             -0.57333      0.44326    -1.2934      0.20073
    Weight          -0.25837       0.1084    -2.3834     0.020282
    Age:Weight     0.0035564    0.0028401     1.2522      0.21527

Sigma: 4.604

Number of observations: 66, Error degrees of freedom: 61
16 right-censored observations
50 uncensored observations
Likelihood ratio statistic vs. constant model: 24.2, p-value = 2.23e-05
The small p-value for Weight
indicates that patient weight has a statistically significant effect on readmission time.
Input Arguments
Input data, specified as a table. tbl includes data for the predictor variables, and can also contain data for the response variable and the censoring information. The predictor variables can be numeric, logical, categorical, character, or string. The response variable must be numeric or logical. When tbl contains censoring information, it must be in the integer vector format described in cens.
When you specify tbl
without specifying
ResponseVarName
or y
,
fitlmcens
uses the variable in the last column of the table as
the response variable and the rest as the predictor variables.
- To use a different column as the response variable, set the ResponseVar name-value argument.
- To use a subset of the columns as predictors, set the PredictorVars name-value argument.
- To define a model specification, set the modelspec argument using a formula or terms matrix. The formula or terms matrix specifies which columns to use as the predictor or response variables.
The variable names in the table do not have to be valid MATLAB® identifiers, but the names must not contain leading or trailing blanks. If the names are not valid, you cannot use a formula when you fit or adjust a model.
You can verify the variable names in tbl
by using the isvarname
function. If the variable names are
not valid, then you can convert them by using the matlab.lang.makeValidName
function.
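For example, the following sketch checks a set of candidate variable names with isvarname and converts them with matlab.lang.makeValidName. The names are hypothetical; for a table, you could then assign the converted names to tbl.Properties.VariableNames.

```matlab
% Hypothetical column names: two are not valid MATLAB identifiers
names = {'Patient Age', 'Weight (kg)', 'ReadmissionTime'};

% Flag which names are already valid identifiers
ok = cellfun(@isvarname, names)

% Convert all names to valid identifiers (valid names are left unchanged)
validNames = matlab.lang.makeValidName(names)
```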
Data Types: table
Name of the variable to use as the response, specified as a string scalar or character vector.
ResponseVarName
indicates
which variable in tbl
contains
the response data. When you specify
ResponseVarName
, you must
also specify the tbl
input
argument.
Data Types: char
| string
Censoring information for the observations, specified as an integer vector, a two-element interval [L R], or a variable name. Although cens can be an interval of censoring bounds, you cannot use cens to specify interval-censored observations. To specify interval censoring, see y.
When you specify cens
as an integer vector, it must have the same number
of elements as the number of observations in the input data. Each element of
cens
must be -1
, 0
, or
1
to indicate that the corresponding observation is left-censored,
uncensored, or right-censored, respectively.
When you specify cens as an interval, it must be a two-element numeric vector [L R] where L < R. fitlmcens censors observations according to their response values:
- Response values less than or equal to L are left-censored at L.
- Response values inside the interval are uncensored.
- Response values greater than or equal to R are right-censored at R.
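The interval rule above can be illustrated in a few lines. This is a minimal sketch of the documented behavior, not the fitlmcens implementation; the response values y and the bounds [-10 10] are hypothetical. It produces the equivalent integer-vector censoring codes and the response values as recorded at the censoring bounds.

```matlab
% Hypothetical responses and censoring interval [L R]
y      = [-12; -10; 0; 9.5; 10; 15];
bounds = [-10 10];

cens = zeros(size(y));            % 0 = uncensored
cens(y <= bounds(1)) = -1;        % at or below L: left-censored at L
cens(y >= bounds(2)) =  1;        % at or above R: right-censored at R

% Censored responses are recorded at the bound they crossed
yObserved = min(max(y, bounds(1)), bounds(2));
```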
When you specify cens
as a variable name, you must also specify tbl
. tbl
must include a variable of the same name that contains censoring information in the integer vector format described above.
You cannot specify cens
when y
is a two-column matrix.
Example: [-10,10]
Example: [-1*ones(10,1);zeros(10,1);ones(10,1)]
Example: "censvar"
Data Types: single
| double
| string
| char
Model specification, specified as one of the following values.
- A character vector or string scalar containing the model name.

| Value | Model Description |
|---|---|
| "constant" | Model contains only a constant (intercept) term |
| "linear" | Model contains an intercept and linear term for each predictor |
| "interactions" | Model contains an intercept, linear term for each predictor, and all products of pairs of distinct predictors (no squared terms) |
| "purequadratic" | Model contains an intercept term and linear and squared terms for each predictor |
| "quadratic" | Model contains an intercept term, linear and squared terms for each predictor, and all products of pairs of distinct predictors |
| "polyijk" | Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on. Specify the maximum degree for each predictor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, "poly13" has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms, where x1 and x2 are the first and second predictors, respectively. |

- A t-by-(p + 1) terms matrix that specifies the terms in the model, where t is the number of terms, p is the number of predictor variables, and +1 accounts for the response variable. A terms matrix is convenient when the number of predictors is large and you want to generate the terms programmatically. For more information, see Terms Matrix.
- A character vector or string scalar formula in the form "y ~ terms", where the terms are in Wilkinson Notation. The variable names in the formula must be variable names in tbl or variable names specified by VarNames. Also, the variable names must be valid MATLAB identifiers. The software determines the order of terms in a fitted model by using the order of terms in tbl or X. Therefore, the order of terms in the model can be different from the order of terms in the specified formula. For more information, see Formula.
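The "polyijk" rule can be checked by enumerating exponents. This sketch (an illustration, not how fitlmcens parses model names) lists every exponent pair allowed by "poly13", keeps those whose total degree does not exceed max(i,j) = 3, and appends a zero column for the response so that the result has the terms-matrix layout described in Terms Matrix. The variable maxDeg is hypothetical.

```matlab
% Maximum degree per predictor for "poly13": i = 1 (x1), j = 3 (x2)
maxDeg = [1 3];

% All exponent combinations (e1 for x1, e2 for x2)
[e1, e2] = ndgrid(0:maxDeg(1), 0:maxDeg(2));
T = [e1(:) e2(:)];

% Interaction degree may not exceed the largest specified degree
T = T(sum(T, 2) <= max(maxDeg), :);

% Last column: response variable, always exponent 0
T = [T zeros(size(T, 1), 1)]
```

The seven rows correspond to the intercept and the x1, x2, x1*x2, x2^2, x1*x2^2, and x2^3 terms listed for "poly13"; x1*x2^3 is excluded because its degree is 4.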
When you specify modelspec
, you cannot use the
PredictorVars
name-value argument to specify the predictor
variables.
Example: "quadratic"
Example: "y ~ x1 + x2^2 + x1:x2"
Data Types: single
| double
| char
| string
Predictor variables, specified as an n-by-p matrix,
where n is the number of observations and p is
the number of predictor variables. Each column of X
represents
one variable, and each row represents one observation.
By default, the model includes a constant term unless you explicitly remove it, so do not
include a column of 1s in X
.
Data Types: single
| double
Response variable, specified as an n-by-2 numeric matrix, where n is the number of observations. Each row in yint corresponds to the same row in tbl or X.
Each row of yint contains the lower and upper bounds for the corresponding observation.
- To specify a left-censored observation, set the lower bound to -Inf.
- To specify a right-censored observation, set the upper bound to Inf.
- To specify an uncensored observation, set the lower and upper bounds to the response value of the observation.
- To specify an interval-censored observation, set the lower and upper bounds to the finite endpoints of the interval.
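If your censoring information is in the integer-vector format used by cens, you can build the equivalent two-column matrix by applying the rules above. The sketch uses hypothetical values for y and cens; the conversion only restates the two documented formats.

```matlab
% Hypothetical responses and integer censoring codes (-1 left, 0 none, 1 right)
y    = [3.2; 5.0; 7.1];
cens = [-1; 0; 1];

yint = [y y];                  % uncensored: both bounds equal the response
yint(cens == -1, 1) = -Inf;    % left-censored: lower bound -Inf
yint(cens ==  1, 2) =  Inf;    % right-censored: upper bound Inf
```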
Data Types: single
| double
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN
, where Name
is
the argument name and Value
is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: fitlmcens(X,y,Censoring=cens,ExcludeObservations=1:5,Intercept=false)
fits a linear regression model without an intercept to the censored data in
X
and y
, excluding the first five
observations.
Categorical predictor list, specified as a string array or cell array of character
vectors containing categorical predictor names in the table tbl
,
or a logical or numeric index vector indicating which predictor columns are categorical.
- If the predictor data is in tbl, then, by default, fitlmcens treats all categorical values, logical values, character arrays, string arrays, and cell arrays of character vectors as categorical predictors.
- If the predictor data is in a matrix X, then the default value of CategoricalVars is an empty matrix []. That is, no predictor is categorical unless you specify it as categorical.
For example, you can specify the second and third variables out of six as categorical using either of the following examples.
Example: CategoricalVars=[2 3]
Example: CategoricalVars=logical([0 1 1 0 0 0])
Data Types: single
| double
| logical
| string
| cell
Observations to exclude from the fit, specified as a logical or numeric index vector indicating which observations to exclude.
For example, you can exclude the second and third observations of six using either of the following examples.
Example: Exclude=[2 3]
Example: Exclude=logical([0 1 1 0 0 0])
Data Types: single
| double
| logical
Indicator for the constant term (intercept) in the fit, specified as a logical
1
(true
) to include the term in the model, or
0
(false
) to remove the term from the model.
By default, the model includes a constant term unless you explicitly remove it.
Use Intercept
only when specifying the model using a
character vector or string scalar, not a formula or matrix.
Example: Intercept=false
Data Types: logical
Predictor variables to use in the fit, specified as a string array or cell array
of character vectors of the variable names in the table tbl
, or a
logical or numeric index vector indicating which columns are predictor
variables.
The string values or character vectors must be names in tbl
or names you specify using the VarNames
name-value
argument.
The default value is all variables in X
, or all variables in
tbl
except ResponseVar
.
When you specify PredictorVars
, you cannot use the
modelspec
input argument to specify a terms matrix.
For example, you can specify the second and third variables as the predictor variables using either of the following examples.
Example: PredictorVars=[2 3]
Example: PredictorVars=logical([0 1 1 0 0 0])
Data Types: single
| double
| logical
| string
| cell
Names of variables, specified as a string array or cell array of character vectors
that includes the names for the columns of X
first, and the name
for the response variable y
last.
The variable names do not have to be valid MATLAB identifiers, but the names must not contain leading or trailing blanks. If the names are not valid, you cannot use a formula when you fit or adjust a model.
You can verify the variable names by using the isvarname
function. If the variable names are not valid, then you can
convert them by using the matlab.lang.makeValidName
function.
You cannot specify VarNames
when you specify input data using
the tbl
input argument.
Example: VarNames=["Horsepower","Acceleration","Model_Year","MPG"]
Data Types: string
| cell
Observation weights, specified as an n-by-1 vector of nonnegative scalar values, where n is the number of observations.
Data Types: single
| double
Output Arguments
Censored linear model, returned as a CensoredLinearModel
object.
More About
A terms matrix
T
is a t-by-(p + 1) matrix that
specifies the terms in a model, where t is the number of terms,
p is the number of predictor variables, and +1 accounts for the
response variable. The value of T(i,j)
is the exponent of variable
j
in term i
.
For example, suppose that an input includes three predictor variables, x1, x2, and x3, and the response variable y in the order x1, x2, x3, and y. Each row of T represents one term:
- [0 0 0 0]: Constant term (intercept)
- [0 1 0 0]: x2; equivalently, x1^0 * x2^1 * x3^0
- [1 0 1 0]: x1*x3
- [2 0 0 0]: x1^2
- [0 1 2 0]: x2*(x3^2)
The 0
at the end of each term represents the response variable. In
general, a column vector of zeros in a terms matrix represents the position of the response
variable. If the predictor and response variables are in a matrix and column vector,
respectively, then you must include 0
for the response variable in the
last column of each row.
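To see how the exponents in T define model terms, the following hypothetical sketch evaluates each row of the example terms matrix above on two observations of x1, x2, and x3. Each column of D is the product of the predictors raised to the exponents in the corresponding row of T, with the response column skipped. This illustrates the definition of T(i,j) and is not necessarily how fitlmcens builds its design matrix.

```matlab
% Two hypothetical observations of x1, x2, x3
X = [1 2 3
     2 1 0.5];

% Terms matrix from the example above (last column: response y)
T = [0 0 0 0; 0 1 0 0; 1 0 1 0; 2 0 0 0; 0 1 2 0];

p = size(X, 2);                       % number of predictors
D = zeros(size(X, 1), size(T, 1));    % one column per term
for i = 1:size(T, 1)
    % Term i is the product over j of x_j^T(i,j); the response column is ignored
    D(:, i) = prod(X .^ T(i, 1:p), 2);
end
D
```

For the first observation (1, 2, 3), the columns evaluate to 1, 2, 3, 1, and 18.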
A formula for model specification is a character vector or string scalar of the form 'y ~ terms'.
- y is the response name.
- terms represents the predictor terms in a model using Wilkinson notation.
To represent predictor and response variables, use the variable names of the input argument tbl or the variable names specified by using the VarNames name-value argument.
For example, if VarNames is ["x1","x2",...,"xn","y"], then:
- "y ~ x1 + x2 + x3" specifies a three-variable linear model with an intercept.
- "y ~ x1 + x2 + x3 - 1" specifies a three-variable linear model without an intercept. Note that formulas include a constant (intercept) term by default. To exclude a constant term from the model, you must include -1 in the formula.
Wilkinson notation describes the terms in a model. The notation relates to the terms included in the model, not to the multipliers (coefficients) of those terms.
Wilkinson notation uses these symbols:
+ means include the next variable.
- means do not include the next variable.
: defines an interaction, which is a product of the terms.
* defines an interaction and all lower order terms.
^ raises the predictor to a power, exactly as in * repeated, so ^ includes lower order terms as well.
() groups the terms.
This table shows typical examples of Wilkinson notation.
| Wilkinson Notation | Terms in Standard Notation |
|---|---|
| 1 | Constant (intercept) term |
| x1^k, where k is a positive integer | x1, x1^2, ..., x1^k |
| x1 + x2 | x1, x2 |
| x1*x2 | x1, x2, x1*x2 |
| x1:x2 | x1*x2 only |
| -x2 | Do not include x2 |
| x1*x2 + x3 | x1, x2, x3, x1*x2 |
| x1 + x2 + x3 + x1:x2 | x1, x2, x3, x1*x2 |
| x1*x2*x3 - x1:x2:x3 | x1, x2, x3, x1*x2, x1*x3, x2*x3 |
| x1*(x2 + x3) | x1, x2, x3, x1*x2, x1*x3 |
For more details, see Wilkinson Notation.
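The Wilkinson examples also map directly onto terms matrices. As a sketch, the following code builds a terms matrix with the same terms as "y ~ x1*x2*x3 - x1:x2:x3" by listing every 0/1 exponent pattern for x1, x2, and x3 (dec2bin is used only as a convenient enumerator) and removing the three-way interaction. The all-zero row is the intercept, which formulas include by default. Row order may differ from the order fitlmcens uses in the fitted model.

```matlab
% All 0/1 exponent patterns for x1, x2, x3 (8 rows; dec2bin returns chars)
T = dec2bin(0:7) - '0';

% "- x1:x2:x3" removes the three-way interaction term [1 1 1]
T = T(~all(T == 1, 2), :);

% Append the response column (exponent 0 for y)
T = [T zeros(size(T, 1), 1)]
```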
Algorithms
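fitlmcens fits a censored linear regression model to the data using a modified version of maximum likelihood estimation for linear regression. Maximum likelihood estimation maximizes

$$\sum_{i=1}^{n}\log\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i-\beta_0-x_i\beta}{\sigma}\right)\right],$$

where $y_i$ is the response for the ith observation, $x_i$ is the row vector of predictor values for the ith observation, $\beta_0$ is the intercept term in the model formula, $\beta$ is the vector of model coefficients, $\sigma$ is the standard deviation of the error term, and $\phi$ is the standard normal probability density function. For censored observations $y_i$, fitlmcens replaces the density term $\frac{1}{\sigma}\phi\!\left(\frac{y_i-\beta_0-x_i\beta}{\sigma}\right)$ in the equation above, where $\Phi$ is the standard normal cumulative distribution function.
- If $y_i$ is left-censored, fitlmcens uses
$$\Phi\!\left(\frac{y_i-\beta_0-x_i\beta}{\sigma}\right)$$
- If $y_i$ is right-censored, fitlmcens uses
$$1-\Phi\!\left(\frac{y_i-\beta_0-x_i\beta}{\sigma}\right)$$
- If $y_i$ is interval-censored, fitlmcens uses
$$\Phi\!\left(\frac{R_i-\beta_0-x_i\beta}{\sigma}\right)-\Phi\!\left(\frac{L_i-\beta_0-x_i\beta}{\sigma}\right)$$
where $L_i$ and $R_i$ are the left and right bounds for the interval corresponding to $y_i$.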
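The objective above can be written out directly. The following sketch evaluates the censored log-likelihood for fixed, hypothetical parameter values and toy data stored in the two-column format described for yint; it does not perform the maximization and is not the fitlmcens implementation. A single expression log(Phi(upper) - Phi(lower)) covers all three censored cases, because Phi(-Inf) = 0 and Phi(Inf) = 1. Phi is computed from erfc, and the density term is computed on the log scale.

```matlab
% Standard normal CDF and log-PDF (base MATLAB only)
Phi    = @(z) 0.5 * erfc(-z / sqrt(2));
logphi = @(z) -0.5 * z.^2 - 0.5 * log(2*pi);

% Hypothetical data: one predictor, four observations
X    = [1; 2; 3; 4];
yint = [1.1  1.1        % uncensored
        2.0  Inf        % right-censored at 2.0
       -Inf  2.5        % left-censored at 2.5
        3.5  4.5];      % interval-censored in [3.5, 4.5]

% Hypothetical parameter values (beta0, beta, sigma)
beta0 = 0; beta = 1; sigma = 1;

mu = beta0 + X * beta;                 % linear predictor beta0 + x_i*beta
lo = (yint(:,1) - mu) / sigma;         % standardized lower bounds
hi = (yint(:,2) - mu) / sigma;         % standardized upper bounds

exact = yint(:,1) == yint(:,2);        % uncensored rows
ll = zeros(size(mu));
ll(exact)  = logphi(lo(exact)) - log(sigma);           % log of (1/sigma)*phi(z)
ll(~exact) = log(Phi(hi(~exact)) - Phi(lo(~exact)));   % censored probability mass
logL = sum(ll)
```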
Version History
Introduced in R2025a