# devianceTest

Analysis of deviance for generalized linear regression model

## Syntax

``tbl = devianceTest(mdl)``

## Description

`tbl = devianceTest(mdl)` returns an analysis of deviance table for the generalized linear regression model `mdl`. The table `tbl` gives the result of a test that determines whether the model `mdl` fits significantly better than a constant model.

## Examples

Perform a deviance test on a generalized linear regression model.

Generate sample data using Poisson random numbers with two underlying predictors `X(:,1)` and `X(:,2)`.

```
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
```

Create a generalized linear regression model of Poisson data.

`mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson')`
```
mdl = 
Generalized linear regression model:
    log(y) ~ 1 + x1 + x2
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE        tStat     pValue
                   ________    _________    ______    ______

    (Intercept)     1.0405     0.022122     47.034      0
    x1              0.9968     0.003362     296.49      0
    x2               1.987     0.0063433    313.24      0


100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0
```

Test whether the model differs from a constant in a statistically significant way.

`tbl = devianceTest(mdl)`
```
tbl=2×4 table
                             Deviance     DFE     chi2Stat     pValue
                            __________    ___    __________    ______

    log(y) ~ 1              2.9544e+05    99                         
    log(y) ~ 1 + x1 + x2         107.4    97     2.9533e+05      0   
```

The small p-value indicates that the model significantly differs from a constant. Note that the model display of `mdl` includes the statistics shown in the second row of the table.
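The numbers in the second row can be checked by hand: `chi2Stat` is the constant-model deviance minus the full-model deviance, and `pValue` is the upper tail of a chi-squared distribution with 2 degrees of freedom (3 estimated coefficients minus 1). A quick sketch of that arithmetic outside MATLAB (Python, standard library only; the deviance values are read from the table above):

```python
import math

# Deviances from the analysis of deviance table above
dev_constant = 2.9544e+05   # log(y) ~ 1
dev_full     = 107.4        # log(y) ~ 1 + x1 + x2

# chi2Stat is the drop in deviance from the constant model to the full model
chi2_stat = dev_constant - dev_full

# With 2 degrees of freedom, the chi-squared upper-tail probability has the
# closed form exp(-x/2)
p_value = math.exp(-chi2_stat / 2)

print(chi2_stat)  # ~2.9533e+05, matching the chi2Stat column
print(p_value)    # underflows to 0.0, matching pValue
```

The statistic is so large that the upper-tail probability underflows to exact zero, which is why the table reports a p-value of 0.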

## Input Arguments

`mdl` — Generalized linear regression model, specified as a `GeneralizedLinearModel` object created using `fitglm` or `stepwiseglm`, or a `CompactGeneralizedLinearModel` object created using `compact`.

## Output Arguments

`tbl` — Analysis of deviance summary statistics, returned as a table.

`tbl` contains analysis of deviance statistics for both a constant model and the model `mdl`. The table includes these columns for each model.

| Column | Description |
| --- | --- |
| `Deviance` | Twice the difference between the loglikelihoods of the corresponding model (`mdl` or constant) and the saturated model. For more information, see Deviance. |
| `DFE` | Degrees of freedom for the error (residuals), equal to n – p, where n is the number of observations and p is the number of estimated coefficients. |
| `chi2Stat` | F-statistic or chi-squared statistic, depending on whether the dispersion is estimated (F-statistic) or not (chi-squared statistic). The F-statistic is the difference between the deviance of the constant model and the deviance of the full model, divided by the estimated dispersion. The chi-squared statistic is the same difference in deviances, without dividing by the dispersion. |
| `pValue` | p-value associated with the test: chi-squared statistic with p – 1 degrees of freedom, or F-statistic with p – 1 numerator degrees of freedom and `DFE` denominator degrees of freedom, where p is the number of estimated coefficients. |

## More About

### Deviance

Deviance is a generalization of the residual sum of squares. It measures the goodness of fit compared to a saturated model.

The deviance of a model M1 is twice the difference between the loglikelihood of the model M1 and the saturated model Ms. A saturated model is a model with the maximum number of parameters that you can estimate.

For example, if you have n observations yi, i = 1, 2, ..., n, with potentially different values for `${x}_{i}^{T}\beta$`, then you can define a saturated model with n parameters. Let L(b,y) denote the maximum value of the likelihood function for a model with the parameters b. Then the deviance of the model M1 is

`$-2\left(\log L\left({b}_{1},y\right)-\log L\left({b}_{S},y\right)\right),$`

where b1 and bS contain the estimated parameters for the model M1 and the saturated model, respectively. The deviance has a chi-squared distribution with n – p degrees of freedom, where n is the number of parameters in the saturated model and p is the number of parameters in the model M1.
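For the Poisson case this definition has a convenient closed form: the saturated model fits each observation exactly (its fitted mean equals yi), and the loglikelihood difference collapses to D = 2 Σ [yi log(yi/μi) − (yi − μi)]. A minimal sketch of both computations (Python, standard library only; the counts and fitted means are hypothetical):

```python
import math

def poisson_loglik(y, mu):
    """Poisson loglikelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """Deviance = -2*(logL(model) - logL(saturated)); saturated uses mu_i = y_i."""
    return -2 * (poisson_loglik(y, mu) - poisson_loglik(y, y))

# Hypothetical observed counts and fitted means from some model M1
y  = [3, 7, 2, 9, 5]
mu = [3.5, 6.0, 2.5, 8.0, 5.5]

d = poisson_deviance(y, mu)

# Same value from the closed form 2*sum(y*log(y/mu) - (y - mu)):
# the log(y!) terms cancel between the model and the saturated model
d_closed = 2 * sum(yi * math.log(yi / mi) - (yi - mi) for yi, mi in zip(y, mu))
print(d, d_closed)  # the two agree
```

The agreement holds because the log(y!) terms are identical in both loglikelihoods and cancel in the difference.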

Assume you have two different generalized linear regression models M1 and M2, where M2 contains a subset of the terms in M1. You can assess the fit of the models by comparing their deviances D1 and D2. The difference of the deviances is

`$D={D}_{2}-{D}_{1}=-2\left(\log L\left({b}_{2},y\right)-\log L\left({b}_{S},y\right)\right)+2\left(\log L\left({b}_{1},y\right)-\log L\left({b}_{S},y\right)\right)=-2\left(\log L\left({b}_{2},y\right)-\log L\left({b}_{1},y\right)\right).$`

Asymptotically, the difference D has a chi-squared distribution with degrees of freedom v equal to the difference in the number of parameters estimated in M1 and M2. You can obtain the p-value for this test by using `1 - chi2cdf(D,v)`.

Typically, you examine D using a model M2 with a constant term and no predictors. Therefore, D has a chi-squared distribution with p – 1 degrees of freedom. If the dispersion is estimated, the difference divided by the estimated dispersion has an F distribution with p – 1 numerator degrees of freedom and n – p denominator degrees of freedom.
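The expression `1 - chi2cdf(D,v)` is MATLAB; as a cross-check, when v is even the chi-squared upper-tail probability has the closed form exp(−D/2) Σ from k = 0 to v/2 − 1 of (D/2)^k / k!, which is easy to sketch in plain Python (standard library only):

```python
import math

def chi2_sf_even(x, v):
    """Upper-tail probability P(X > x) for a chi-squared variable with EVEN dof v.

    Uses the closed-form series exp(-x/2) * sum_{k=0}^{v/2-1} (x/2)^k / k!,
    which equals MATLAB's 1 - chi2cdf(x, v) when v is even.
    """
    assert v > 0 and v % 2 == 0
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k)
                                 for k in range(v // 2))

# With v = 2 the series collapses to exp(-x/2); the 5% critical value of a
# chi-squared(2) distribution is about 5.99
print(chi2_sf_even(5.99, 2))  # ~0.05
```

For odd v the closed form involves the error function instead; in practice you would use MATLAB's `chi2cdf` (or an equivalent library routine), which handles any v.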

## Version History

Introduced in R2012a