coefTest

Linear hypothesis test on linear regression model coefficients

Description

p = coefTest(mdl) computes the p-value for an F-test that all coefficient estimates in mdl, except for the intercept term, are zero.

p = coefTest(mdl,H) performs an F-test that H × B = 0, where B represents the coefficient vector. Use H to specify the coefficients to include in the F-test.

p = coefTest(mdl,H,C) performs an F-test that H × B = C.

[p,F] = coefTest(___) also returns the F-test statistic F using any of the input argument combinations in the previous syntaxes.

[p,F,r] = coefTest(___) also returns the numerator degrees of freedom r for the test.

Examples

Fit a linear regression model and test the coefficients of the fitted model to see if they are zero.

Load the carsmall data set and create a table in which the Model_Year predictor is categorical.

load carsmall
Model_Year = categorical(Model_Year);
tbl = table(MPG,Weight,Model_Year);

Fit a linear regression model of mileage as a function of the weight, weight squared, and model year.

mdl = fitlm(tbl,'MPG ~ Model_Year + Weight^2')
mdl =
Linear regression model:
MPG ~ 1 + Weight + Model_Year + Weight^2

Estimated Coefficients:
                  Estimate         SE         tStat       pValue
                 __________    __________    _______    __________

(Intercept)          54.206        4.7117     11.505    2.6648e-19
Weight            -0.016404     0.0031249    -5.2493    1.0283e-06
Model_Year_76        2.0887       0.71491     2.9215     0.0044137
Model_Year_82        8.1864       0.81531     10.041    2.6364e-16
Weight^2         1.5573e-06    4.9454e-07      3.149     0.0022303

Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 2.78
R-squared: 0.885,  Adjusted R-Squared: 0.88
F-statistic vs. constant model: 172, p-value = 5.52e-41

The last line of the model display shows the F-statistic value of the regression model and the corresponding p-value. The small p-value indicates that the model fits significantly better than a degenerate model consisting of only an intercept term. You can return these two values by using coefTest.

[p,F] = coefTest(mdl)
p = 5.5208e-41
F = 171.8844
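
The default test can also be written with an explicit hypothesis matrix. The following is a minimal sketch that continues from the workspace above; it assumes that the intercept is the first coefficient of mdl, as in the model display. The names H_all, p_all, F_all, and r_all are illustrative and do not come from the coefTest interface.

```matlab
% Total number of coefficients in the fitted model, including the intercept
s = mdl.NumCoefficients;

% One row per non-intercept coefficient: a zero column for the intercept
% followed by an identity block that selects each remaining coefficient
H_all = [zeros(s-1,1) eye(s-1)];

% Joint F-test that all non-intercept coefficients are zero
[p_all,F_all,r_all] = coefTest(mdl,H_all);

% Compare with the default syntax; both differences should be close to zero
[p,F] = coefTest(mdl);
disp([abs(p_all - p), abs(F_all - F)])
```

Because H_all has one row for each non-intercept coefficient, r_all is expected to be s-1, which is 4 for this model.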

Fit a linear regression model and test the significance of a specified coefficient in the fitted model by using coefTest. You can also use anova to test the significance of each predictor in the model.

Load the carsmall data set and create a table in which the Model_Year predictor is categorical.

load carsmall
Model_Year = categorical(Model_Year);
tbl = table(MPG,Acceleration,Weight,Model_Year);

Fit a linear regression model of mileage as a function of acceleration, weight, and model year.

mdl = fitlm(tbl,'MPG ~ Acceleration + Model_Year + Weight')
mdl =
Linear regression model:
MPG ~ 1 + Acceleration + Weight + Model_Year

Estimated Coefficients:
                  Estimate         SE         tStat        pValue
                 __________    __________    ________    __________

(Intercept)          40.523        2.5293      16.021    5.8302e-28
Acceleration      -0.023438       0.11353    -0.20644       0.83692
Weight           -0.0066799    0.00045796     -14.586    2.5314e-25
Model_Year_76        1.9898       0.80696      2.4657      0.015591
Model_Year_82        7.9661       0.89745      8.8763    6.7725e-14

Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 2.93
R-squared: 0.873,  Adjusted R-Squared: 0.867
F-statistic vs. constant model: 153, p-value = 5.86e-39

The model display includes the p-value for the t-statistic for each coefficient to test the null hypothesis that the corresponding coefficient is zero.

You can also test the significance of a coefficient by using coefTest. For example, test the significance of the Acceleration coefficient. According to the model display, Acceleration is the second coefficient, after the intercept. Specify the coefficient by using a numeric index vector.

[p_Acceleration,F_Acceleration,r_Acceleration] = coefTest(mdl,[0 1 0 0 0])
p_Acceleration = 0.8369
F_Acceleration = 0.0426
r_Acceleration = 1

p_Acceleration is the p-value corresponding to the F-statistic value F_Acceleration, and r_Acceleration is the numerator degrees of freedom for the F-test. The returned p-value indicates that Acceleration is not statistically significant in the fitted model. Note that p_Acceleration is equal to the p-value of the t-statistic (tStat) in the model display, and F_Acceleration is the square of tStat.
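
The relationship between the F-test and the t-test for a single coefficient can be checked directly from the Coefficients table. This sketch continues from the workspace above; the variable names t_Acceleration, p_t, and F_from_t are illustrative.

```matlab
% Read the t-statistic and its p-value for Acceleration from the coefficient table
t_Acceleration = mdl.Coefficients{'Acceleration','tStat'};
p_t = mdl.Coefficients{'Acceleration','pValue'};

% For a hypothesis on a single coefficient, the F-statistic is the squared t-statistic
F_from_t = t_Acceleration^2;

% Both differences should be close to zero
disp([F_from_t - F_Acceleration, p_t - p_Acceleration])
```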

Test the significance of the categorical predictor Model_Year. Instead of testing Model_Year_76 and Model_Year_82 separately, you can perform a single test for the categorical predictor Model_Year. Specify Model_Year_76 and Model_Year_82 by using a numeric index matrix.

[p_Model_Year,F_Model_Year,r_Model_Year] = coefTest(mdl,[0 0 0 1 0; 0 0 0 0 1])
p_Model_Year = 2.7408e-14
F_Model_Year = 45.2691
r_Model_Year = 2

The returned p-value indicates that Model_Year is statistically significant in the fitted model.

You can also return these values by using anova.

anova(mdl)
ans=4×5 table
                 SumSq     DF    MeanSq        F          pValue
                _______    __    _______    ________    __________

Acceleration    0.36613     1    0.36613    0.042618       0.83692
Weight           1827.7     1     1827.7      212.75    2.5314e-25
Model_Year       777.81     2      388.9      45.269    2.7408e-14
Error            764.59    89      8.591

Input Arguments

mdl: Linear regression model object, specified as a LinearModel object created by using fitlm or stepwiselm, or a CompactLinearModel object created by using compact.

H: Hypothesis matrix, specified as an r-by-s numeric index matrix, where r is the number of coefficients to include in an F-test, and s is the total number of coefficients.

• If you specify H, then the output p is the p-value for an F-test that H × B = 0, where B represents the coefficient vector.

• If you specify H and C, then the output p is the p-value for an F-test that H × B = C.

Example: [1 0 0 0 0] tests the first coefficient among five coefficients

Data Types: single | double

C: Hypothesized value for testing the null hypothesis, specified as a numeric vector with the same number of rows as H.

If you specify H and C, then the output p is the p-value for an F-test that H × B = C, where B represents the coefficient vector.

Data Types: single | double
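
The H and C arguments can be combined to test a coefficient against a nonzero value. The following is a minimal sketch that assumes the carsmall model from the examples; the hypothesized value -0.007 is purely illustrative and is not taken from the documentation. The sketch builds H from the CoefficientNames property so that the column position does not have to be counted by hand.

```matlab
% Fit the model used in the examples
load carsmall
Model_Year = categorical(Model_Year);
tbl = table(MPG,Acceleration,Weight,Model_Year);
mdl = fitlm(tbl,'MPG ~ Acceleration + Model_Year + Weight');

% Locate the Weight coefficient by name instead of counting columns
idx = strcmp(mdl.CoefficientNames,'Weight');

% Single-row hypothesis matrix that selects the Weight coefficient
H = double(idx);

% Hypothesized value (illustrative only); C has one row because H has one row
C = -0.007;

% F-test of the null hypothesis H*B = C
[p_Weight,F_Weight,r_Weight] = coefTest(mdl,H,C);
```

Building H from coefficient names keeps the test valid if the model formula changes and the coefficient order shifts.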

Output Arguments

p: p-value for the F-test, returned as a numeric value in the range [0,1].

F: Value of the test statistic for the F-test, returned as a numeric value.

r: Numerator degrees of freedom for the F-test, returned as a positive integer. The F-statistic has r degrees of freedom in the numerator and mdl.DFE degrees of freedom in the denominator.
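
The three outputs are linked through the F-distribution. As a minimal sketch, assuming the carsmall model from the examples, the p-value can be recovered from F, r, and mdl.DFE by using the upper tail of fcdf. The name p_check is illustrative.

```matlab
% Fit the model used in the examples
load carsmall
Model_Year = categorical(Model_Year);
tbl = table(MPG,Acceleration,Weight,Model_Year);
mdl = fitlm(tbl,'MPG ~ Acceleration + Model_Year + Weight');

% Default test: all coefficients except the intercept are zero
[p,F,r] = coefTest(mdl);

% Upper-tail probability of an F-distribution with r numerator and
% mdl.DFE denominator degrees of freedom, evaluated at the statistic F
p_check = fcdf(F,r,mdl.DFE,'upper');
```

The 'upper' option is used here instead of 1 - fcdf(F,r,mdl.DFE) because it avoids loss of precision when the p-value is very small.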

Algorithms

The p-value, F-statistic, and numerator degrees of freedom are valid under these assumptions:

• The data comes from a model represented by the formula in the Formula property of the fitted model.

• The observations are independent, conditional on the predictor values.

When these assumptions hold, let β represent the (unknown) coefficient vector of the linear regression. Suppose H is a full-rank matrix of size r-by-s, where r is the number of coefficients to include in an F-test, and s is the total number of coefficients. Let c be a column vector with r rows. The following is a test statistic for the hypothesis that Hβ = c:

$F=\frac{1}{r}{\left(H\hat{\beta}-c\right)}^{\prime}{\left(HV{H}^{\prime}\right)}^{-1}\left(H\hat{\beta}-c\right).$

Here $\hat{\beta}$ is the estimate of the coefficient vector β, stored in the Coefficients property, and V is the estimated covariance of the coefficient estimates, stored in the CoefficientCovariance property. When the hypothesis is true, the test statistic F has an F-distribution with r and u degrees of freedom, where u is the degrees of freedom for error, stored in the DFE property.
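
The computation can be sketched from the model properties named above. This is an illustration under stated assumptions, not the internal implementation of coefTest: it assumes the carsmall model from the examples and tests the two Model_Year coefficients. The sketch divides the quadratic form by r, which puts the statistic on the scale of the F-distribution with r and u degrees of freedom and agrees with the F value that anova reports as a ratio of mean squares. The names b, d, F_manual, and p_manual are illustrative.

```matlab
% Fit the model used in the examples
load carsmall
Model_Year = categorical(Model_Year);
tbl = table(MPG,Acceleration,Weight,Model_Year);
mdl = fitlm(tbl,'MPG ~ Acceleration + Model_Year + Weight');

% Hypothesis: both Model_Year coefficients are zero
H = [0 0 0 1 0; 0 0 0 0 1];
c = zeros(size(H,1),1);

b = mdl.Coefficients.Estimate;    % coefficient estimates
V = mdl.CoefficientCovariance;    % estimated covariance of the estimates
r = size(H,1);                    % numerator degrees of freedom (H is full rank)
u = mdl.DFE;                      % error degrees of freedom

% Quadratic form in the deviation H*b - c, divided by r
d = H*b - c;
F_manual = (d' * ((H*V*H') \ d)) / r;

% p-value from the upper tail of the F-distribution
p_manual = fcdf(F_manual,r,u,'upper');

% Reference values from coefTest for comparison
[p,F] = coefTest(mdl,H,c);
disp([F_manual - F, p_manual - p])
```

The backslash operator solves the linear system with H*V*H' directly, which is generally preferable to forming the inverse explicitly.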

Alternative Functionality

• The values of commonly used test statistics are available in the Coefficients property of a fitted model.

• anova provides tests for each model predictor and groups of predictors.