# summarize

Distribution summary statistics of Bayesian linear regression model for predictor variable selection

## Syntax

``summarize(Mdl)``
``SummaryStatistics = summarize(Mdl)``

## Description

To obtain a summary of a standard Bayesian linear regression model, see `summarize`.


`summarize(Mdl)` displays a tabular summary of the random regression coefficients and disturbance variance of the Bayesian linear regression model `Mdl` at the command line. For each parameter, the summary includes the:

• Standard deviation (square root of the variance)

• 95% equitailed credible intervals

• Probability that the parameter is greater than 0

• Description of the distributions, if known

• Marginal probability that a coefficient should be included in the model, for stochastic search variable selection (SSVS) predictor-variable-selection models


`SummaryStatistics = summarize(Mdl)` returns a structure array with a table summarizing the regression coefficients and disturbance variance, and a description of the joint distribution of the parameters.

## Examples


Consider the multiple linear regression model that predicts the US real gross national product (`GNPR`) using a linear combination of industrial production index (`IPI`), total employment (`E`), and real wages (`WR`).

`${\text{GNPR}}_{t}={\beta }_{0}+{\beta }_{1}{\text{IPI}}_{t}+{\beta }_{2}{\text{E}}_{t}+{\beta }_{3}{\text{WR}}_{t}+{\epsilon }_{t}.$`

For all $t$, ${\epsilon }_{t}$ is a series of independent Gaussian disturbances with a mean of 0 and variance ${\sigma }^{2}$.

Assume these prior distributions for $\mathit{k}$ = 0,...,3:

• ${\beta }_{k}|{\sigma }^{2},{\gamma }_{k}={\gamma }_{k}\sigma \sqrt{{V}_{k1}}{Z}_{1}+\left(1-{\gamma }_{k}\right)\sigma \sqrt{{V}_{k2}}{Z}_{2}$, where ${\mathit{Z}}_{1}$ and ${\mathit{Z}}_{2}$ are independent, standard normal random variables. Therefore, the coefficients have a Gaussian mixture distribution. Assume all coefficients are conditionally independent, a priori, but they are dependent on the disturbance variance.

• ${\sigma }^{2}\sim IG\left(A,B\right)$. $A$ and $B$ are the shape and scale, respectively, of an inverse gamma distribution.

• ${\gamma }_{\mathit{k}}\in \left\{0,1\right\}$ is the random variable-inclusion regime variable, and it has a discrete uniform distribution.
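The mixture becomes explicit once the inclusion variable is averaged out. As a sketch, write ${\pi }_{k}=P\left({\gamma }_{k}=1\right)$ (a symbol introduced here for illustration; it equals 0.5 under the discrete uniform assumption) and let $\varphi \left(b;m,v\right)$ denote the Gaussian density with mean $m$ and variance $v$ evaluated at $b$. The conditional prior density of each coefficient is then

`$f\left({\beta }_{k}|{\sigma }^{2}\right)={\pi }_{k}\varphi \left({\beta }_{k};0,{\sigma }^{2}{V}_{k1}\right)+\left(1-{\pi }_{k}\right)\varphi \left({\beta }_{k};0,{\sigma }^{2}{V}_{k2}\right).$`

Both components are centered at 0, so the two regimes differ only in how much prior variance they allow the coefficient.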

Create a prior model for SSVS. Specify the number of predictors `p`.

```
p = 3;
VarNames = ["IPI" "E" "WR"];
PriorMdl = bayeslm(p,'ModelType','mixconjugateblm','VarNames',VarNames);
```

`PriorMdl` is a `mixconjugateblm` Bayesian linear regression model object for SSVS predictor selection representing the prior distribution of the regression coefficients and disturbance variance.

Summarize the prior distribution.

`summarize(PriorMdl)`
```
           |  Mean     Std          CI95        Positive      Distribution
------------------------------------------------------------------------------
 Intercept |  0       1.5890  [-3.547,  3.547]    0.500   Mixture distribution
 IPI       |  0       1.5890  [-3.547,  3.547]    0.500   Mixture distribution
 E         |  0       1.5890  [-3.547,  3.547]    0.500   Mixture distribution
 WR        |  0       1.5890  [-3.547,  3.547]    0.500   Mixture distribution
 Sigma2    | 0.5000   0.5000  [ 0.138,  1.616]    1.000   IG(3.00,    1)
```

The function displays a table of summary statistics and other information about the prior distribution at the command line.
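The `Mean` and `Std` entries in this display can be cross-checked by hand. The following is a minimal sketch in base MATLAB, not part of the original example. It takes the shape and scale from the `IG(3.00, 1)` entry and assumes that the two mixture components have variance factors 10 and 0.1. That is an assumption about the `bayeslm` defaults that this page does not state, so inspect `PriorMdl.V` to confirm it. All variable names are illustrative.

```matlab
% Hyperparameters: A and B come from the IG(3.00, 1) entry in the display.
A = 3;
B = 1;
% ASSUMPTION: variance factors of the two mixture components (check PriorMdl.V).
V = [10 0.1];
% Discrete uniform inclusion regime: P(gamma_k = 1) = 0.5.
g = 0.5;

% Moments of the inverse gamma disturbance variance. With B = 1, the
% scale-versus-rate convention does not affect the result.
meanSigma2 = 1/(B*(A - 1));
stdSigma2 = meanSigma2/sqrt(A - 2);

% Law of total variance for a zero-mean mixture:
% Var(beta_k) = E[sigma^2]*(g*V_k1 + (1 - g)*V_k2).
stdBeta = sqrt(meanSigma2*(g*V(1) + (1 - g)*V(2)));

fprintf('Sigma2: mean %.4f, std %.4f\n',meanSigma2,stdSigma2);
fprintf('Coefficient std: %.4f\n',stdBeta);
```

Under these assumptions, the printed values agree with the `Mean` and `Std` columns of the prior summary (0.5000 and 0.5000 for `Sigma2`, 1.5890 for each coefficient).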

Load the Nelson-Plosser data set, and create variables for the predictor and response data.

```
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable.GNPR;
```

Estimate the posterior distributions. Suppress the estimation display.

`PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);`

`PosteriorMdl` is an `empiricalblm` model object that contains the posterior distributions of $\beta$ and ${\sigma }^{2}$.

Obtain summary statistics from the posterior distribution.

`summary = summarize(PosteriorMdl);`

`summary` is a structure array containing two fields: `MarginalDistributions` and `JointDistribution`.

Display the marginal distribution summary by using dot notation.

`summary.MarginalDistributions`
```
ans=5×5 table
                    Mean          Std                  CI95              Positive    Distribution 
                 __________    _________    ________________________    ________    _____________

    Intercept        -18.66       10.348       -37.006        0.8406      0.0412    {'Empirical'}
    IPI              4.4555      0.15287        4.1561        4.7561           1    {'Empirical'}
    E            0.00096765    0.0003759    0.00021479     0.0016644      0.9968    {'Empirical'}
    WR               2.4739      0.36337        1.7607        3.1882           1    {'Empirical'}
    Sigma2           47.773       8.6863        33.574        67.585           1    {'Empirical'}
```

The `MarginalDistributions` field is a table of summary statistics and other information about the posterior distribution.
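Because the field is an ordinary MATLAB table with row names, standard table indexing applies to it. The sketch below is illustrative and not part of the original example. It rebuilds two columns from the displayed values so that it runs without the toolbox, and the variable names `tbl`, `ipiInterval`, and `flagged` are hypothetical. With the example workspace available, `tbl = summary.MarginalDistributions` would replace the first block.

```matlab
% Rebuild part of the displayed table from the printed values.
RowNames = {'Intercept','IPI','E','WR','Sigma2'};
Mean = [-18.66; 4.4555; 0.00096765; 2.4739; 47.773];
CI95 = [-37.006 0.8406; 4.1561 4.7561; 0.00021479 0.0016644; ...
    1.7607 3.1882; 33.574 67.585];
tbl = table(Mean,CI95,'RowNames',RowNames);

% Extract the credible interval of one parameter by row name.
ipiInterval = tbl{'IPI','CI95'};

% Flag parameters whose 95% credible interval excludes zero.
excludesZero = tbl.CI95(:,1) > 0 | tbl.CI95(:,2) < 0;
flagged = tbl.Properties.RowNames(excludesZero);
disp(flagged)
```

For the values shown, only the interval for `Intercept` contains zero, which is consistent with its `Positive` entry of 0.0412.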

## Input Arguments


Bayesian linear regression model for predictor variable selection, specified as a model object in this table.

| Model Object | Description |
| --- | --- |
| `mixconjugateblm` | Dependent, Gaussian-mixture-inverse-gamma conjugate model for SSVS predictor variable selection, returned by `bayeslm` |
| `mixsemiconjugateblm` | Independent, Gaussian-mixture-inverse-gamma semiconjugate model for SSVS predictor variable selection, returned by `bayeslm` |
| `lassoblm` | Bayesian lasso regression model returned by `bayeslm` |

## Output Arguments


Parameter distribution summary, returned as a structure array containing the following fields.
`MarginalDistributions`

Table containing a summary of the parameter distributions. Rows correspond to parameters. Columns correspond to the:

• Estimated posterior mean (`Mean`)

• Standard deviation (`Std`)

• 95% equitailed credible interval (`CI95`)

• Posterior probability that the parameter is greater than 0 (`Positive`)

• Description of the marginal or conditional posterior distribution of the parameter (`Distribution`)

Row names are the names in `Mdl.VarNames`. The name of the last row is `Sigma2`.

`JointDistribution`

A string scalar that describes the distributions of the regression coefficients (`Beta`) and the disturbance variance (`Sigma2`) when known.

For distribution descriptions:

• `N(Mu,V)` denotes the normal distribution with mean `Mu` and variance matrix `V`. This distribution can be multivariate.

• `IG(A,B)` denotes the inverse gamma distribution with shape `A` and scale `B`.

• `Mixture distribution` denotes a Student’s t mixture distribution.

Note

If `Mdl` is a `lassoblm` model and `Mdl.Probability` is a function handle representing the regime probability distribution, then `summarize` cannot estimate prior distribution statistics for the coefficients. Therefore, entries corresponding to coefficient statistics are `NaN` values.


### Bayesian Linear Regression Model

A Bayesian linear regression model treats the parameters $\beta$ and ${\sigma }^{2}$ in the multiple linear regression (MLR) model ${y}_{t}={x}_{t}\beta +{\varepsilon }_{t}$ as random variables.

For times $t$ = 1,...,$T$:

• ${y}_{t}$ is the observed response.

• ${x}_{t}$ is a 1-by-($p$ + 1) row vector of observed values of $p$ predictors. To accommodate a model intercept, ${x}_{1t}=1$ for all $t$.

• $\beta$ is a ($p$ + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of ${x}_{t}$.

• ${\varepsilon }_{t}$ is the random disturbance with a mean of zero and $\text{Cov}\left(\varepsilon \right)={\sigma }^{2}{I}_{T\times T}$, while $\varepsilon$ is a $T$-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is

`$\ell \left(\beta ,{\sigma }^{2}|y,x\right)=\prod _{t=1}^{T}\varphi \left({y}_{t};{x}_{t}\beta ,{\sigma }^{2}\right).$`

$\varphi \left({y}_{t};{x}_{t}\beta ,{\sigma }^{2}\right)$ is the Gaussian probability density with mean ${x}_{t}\beta$ and variance ${\sigma }^{2}$ evaluated at ${y}_{t}$.

Before considering the data, you impose a joint prior distribution assumption on $\left(\beta ,{\sigma }^{2}\right)$. In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of $\left(\beta ,{\sigma }^{2}\right)$ or the conditional posterior distributions of the parameters.
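As a concrete illustration of the likelihood, the following base MATLAB sketch simulates data from an MLR model and evaluates the log of the likelihood at the simulating parameter values. The sample size, coefficients, and variable names are arbitrary choices made for this sketch and do not come from the examples on this page.

```matlab
rng(1);                          % Fix the seed for reproducibility
T = 50;                          % Number of observations
p = 2;                           % Number of predictors
X = [ones(T,1) randn(T,p)];      % First column of ones accommodates the intercept
beta = [1; 0.5; -2];             % (p + 1)-by-1 coefficient vector
sigma2 = 0.25;                   % Disturbance variance
y = X*beta + sqrt(sigma2)*randn(T,1);

% Log-likelihood: sum over t of the log Gaussian density of y_t
% with mean x_t*beta and variance sigma2.
resid = y - X*beta;
logL = -0.5*T*log(2*pi*sigma2) - 0.5*sum(resid.^2)/sigma2;

% Same quantity, evaluated term by term from the density formula.
logLTerms = sum(log(exp(-0.5*resid.^2/sigma2)/sqrt(2*pi*sigma2)));
fprintf('Log-likelihood: %.4f (term by term: %.4f)\n',logL,logLTerms);
```

Working on the log scale avoids the numerical underflow that the product of $T$ small densities can cause.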

## Algorithms

• If `Mdl` is a `lassoblm` model object and `Mdl.Probability` is a numeric vector, then the 95% credible intervals on the regression coefficients are `Mean + [-2 2]*Std`, where `Mean` and `Std` are variables in the summary table.

• If `Mdl` is a `mixconjugateblm` or `mixsemiconjugateblm` model object, then the 95% credible intervals on the regression coefficients are estimated from the mixture cdf. If the estimation fails, then `summarize` returns `NaN` values instead.
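To make the mixture-cdf idea concrete, the sketch below numerically inverts a mixture cdf for the prior used in the Examples section. It is not the routine that `summarize` runs internally, which this page does not describe. It assumes that, after integrating out the disturbance variance, each mixture component is a scaled Student's t distribution with `2*A` degrees of freedom and scale `sqrt(V/(A*B))`, and that the variance factors are 10 and 0.1 (an assumed default; check `PriorMdl.V`). The call to `tcdf` requires Statistics and Machine Learning Toolbox, and the variable names are illustrative.

```matlab
% Hyperparameters of the example prior; V is an ASSUMED default.
A = 3;
B = 1;
V = [10 0.1];
g = 0.5;                          % P(gamma_k = 1), discrete uniform

% ASSUMPTION: each component is a scaled Student's t after integrating
% out sigma^2, with 2*A degrees of freedom and these scales.
s = sqrt(V/(A*B));
mixcdf = @(x) g*tcdf(x/s(1),2*A) + (1 - g)*tcdf(x/s(2),2*A);

% Equitailed 95% interval: solve mixcdf(x) = 0.025 and mixcdf(x) = 0.975.
lower = fzero(@(x) mixcdf(x) - 0.025,[-20 0]);
upper = fzero(@(x) mixcdf(x) - 0.975,[0 20]);
fprintf('Equitailed 95%% interval: [%.3f, %.3f]\n',lower,upper);
```

Under these assumptions, the bounds land near ±3.55, in line with the `CI95` column of the prior summary in the Examples section.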

## Version History

Introduced in R2018b