Create Regression Models with ARIMA Errors

Default Regression Model with ARIMA Errors

This example shows how to use the shorthand regARIMA(p,D,q) syntax to specify a regression model with ARIMA errors.

Specify the default regression model with ARIMA(3,1,2) errors:

$\begin{array}{c}{y}_{t}=c+{X}_{t}\beta +{u}_{t}\\ \left(1-{a}_{1}L-{a}_{2}{L}^{2}-{a}_{3}{L}^{3}\right)\left(1-L\right){u}_{t}=\left(1+{b}_{1}L+{b}_{2}{L}^{2}\right){\epsilon }_{t}.\end{array}$

Mdl = regARIMA(3,1,2)
Mdl =
regARIMA with properties:

Description: "ARIMA(3,1,2) Error Model (Gaussian Distribution)"
Distribution: Name = "Gaussian"
Intercept: NaN
Beta: [1×0]
P: 4
D: 1
Q: 2
AR: {NaN NaN NaN} at lags [1 2 3]
SAR: {}
MA: {NaN NaN} at lags [1 2]
SMA: {}
Variance: NaN

The software sets each parameter to NaN, and the innovation distribution to Gaussian. The AR coefficients are at lags 1 through 3, and the MA coefficients are at lags 1 and 2. The property P = p + D = 3 + 1 = 4. Therefore, the software requires at least four presample values to initialize the time series.
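To see where P = 4 comes from, it can help to expand the product of the AR polynomial and the differencing operator on the left side of the error equation. The expansion below is a worked illustration that uses the same symbols as the model above:

$\left(1-{a}_{1}L-{a}_{2}{L}^{2}-{a}_{3}{L}^{3}\right)\left(1-L\right)=1-\left(1+{a}_{1}\right)L+\left({a}_{1}-{a}_{2}\right){L}^{2}+\left({a}_{2}-{a}_{3}\right){L}^{3}+{a}_{3}{L}^{4}.$

The compound polynomial has degree 4, so ${u}_{t}$ depends on its four most recent values, which matches P = 4.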

Pass Mdl and data into estimate to estimate the parameters that are set to NaN. By default, Beta is empty, which the display shows as [1×0]. If you pass a matrix of predictors (${X}_{t}$) into estimate, then estimate estimates Beta and infers the number of regression coefficients from the number of columns in ${X}_{t}$.

Simulation and forecasting with simulate and forecast do not accept models that contain NaN parameter values. Use dot notation to set those values.
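As a minimal sketch of setting parameter values with dot notation, the following code replaces each NaN in the default model with a value. It assumes Econometrics Toolbox is installed; the coefficient values are illustrative, and the names params and hasNaN are hypothetical helpers introduced only for this check.

```matlab
% Requires Econometrics Toolbox.
Mdl = regARIMA(3,1,2);        % all coefficients start as NaN

% Replace every NaN with a value by using dot notation.
Mdl.Intercept = 0;
Mdl.AR = {0.7, -0.3, 0.1};    % AR coefficients at lags 1 through 3
Mdl.MA = {0.5, 0.2};          % MA coefficients at lags 1 and 2
Mdl.Variance = 1;

% Collect the parameter values and check whether any NaN remains.
params = [Mdl.Intercept, cell2mat(Mdl.AR), cell2mat(Mdl.MA), Mdl.Variance];
hasNaN = any(isnan(params));
```

Beta stays empty here, so the resulting model has no regression component.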

Be aware that the regression model intercept (Intercept) is not identifiable when the errors are integrated, as they are here (D = 1). If you want to estimate Mdl, then you must first set Intercept to a value by using, for example, dot notation. Otherwise, estimate might return a spurious estimate of Intercept.

ARIMA Error Model Without an Intercept

This example shows how to specify a regression model with ARIMA errors without a regression intercept.

Specify the regression model with ARIMA(3,1,2) errors and no intercept:

$\begin{array}{c}{y}_{t}={X}_{t}\beta +{u}_{t}\\ \left(1-{a}_{1}L-{a}_{2}{L}^{2}-{a}_{3}{L}^{3}\right)\left(1-L\right){u}_{t}=\left(1+{b}_{1}L+{b}_{2}{L}^{2}\right){\epsilon }_{t}.\end{array}$

Mdl = regARIMA('ARLags',1:3,'MALags',1:2,'D',1,'Intercept',0)
Mdl =
regARIMA with properties:

Description: "ARIMA(3,1,2) Error Model (Gaussian Distribution)"
Distribution: Name = "Gaussian"
Intercept: 0
Beta: [1×0]
P: 4
D: 1
Q: 2
AR: {NaN NaN NaN} at lags [1 2 3]
SAR: {}
MA: {NaN NaN} at lags [1 2]
SMA: {}
Variance: NaN

The software sets Intercept to 0, but all other parameters in Mdl are NaN values by default.

Because Intercept is not NaN, estimate treats it as an equality constraint. In other words, if you pass Mdl and data into estimate, then estimate holds Intercept at 0 during estimation.
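The equality constraint can be checked with a small experiment. The sketch below assumes Econometrics Toolbox is installed and uses synthetic data: the sample size T, the predictor matrix X, and the data-generating model TrueMdl are all hypothetical choices made for illustration, not part of the original example.

```matlab
% Requires Econometrics Toolbox. The data are synthetic and illustrative only.
rng(1);                                   % reproducible random numbers
T = 300;                                  % hypothetical sample size
X = randn(T,2);                           % hypothetical predictors, two columns

% Data-generating model with assumed coefficient values.
TrueMdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6], ...
    'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},'Variance',1,'D',1);
y = simulate(TrueMdl,T,'X',X);

% Model to estimate: Intercept is fixed at 0, all other parameters are NaN.
Mdl = regARIMA('ARLags',1:3,'MALags',1:2,'D',1,'Intercept',0);
EstMdl = estimate(Mdl,y,'X',X,'Display','off');

interceptHeld = (EstMdl.Intercept == 0);  % true if the constraint was respected
numBeta = numel(EstMdl.Beta);             % one coefficient per column of X
```

The estimated model should keep Intercept at 0 and return one regression coefficient for each column of X.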

In general, if you want to use estimate to estimate a regression model with ARIMA errors where D > 0 or s > 0 (that is, with nonseasonal or seasonal integration), then you must set Intercept to a value before estimation.

You can modify the properties of Mdl using dot notation.

ARIMA Error Model with Nonconsecutive Lags

This example shows how to specify a regression model with ARIMA errors, where the nonzero AR and MA terms are at nonconsecutive lags.

Specify the regression model with ARIMA(8,1,4) errors:

$\begin{array}{c}{y}_{t}={X}_{t}\beta +{u}_{t}\\ \left(1-{a}_{1}L-{a}_{4}{L}^{4}-{a}_{8}{L}^{8}\right)\left(1-L\right){u}_{t}=\left(1+{b}_{1}L+{b}_{4}{L}^{4}\right){\epsilon }_{t}.\end{array}$

Mdl = regARIMA('ARLags',[1,4,8],'D',1,'MALags',[1,4],...
'Intercept',0)
Mdl =
regARIMA with properties:

Description: "ARIMA(8,1,4) Error Model (Gaussian Distribution)"
Distribution: Name = "Gaussian"
Intercept: 0
Beta: [1×0]
P: 9
D: 1
Q: 4
AR: {NaN NaN NaN} at lags [1 4 8]
SAR: {}
MA: {NaN NaN} at lags [1 4]
SMA: {}
Variance: NaN

The AR coefficients are at lags 1, 4, and 8, and the MA coefficients are at lags 1 and 4. The software sets the coefficients at the interim lags to 0.

Pass Mdl and data into estimate. The software estimates all parameters that have the value NaN and holds the coefficients at all interim lags at 0 during estimation.
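One way to inspect the interim lags is to read the AR and MA properties directly. This sketch assumes Econometrics Toolbox is installed, and it assumes that the AR and MA properties store one cell per lag up to the largest specified lag, with zeros at the lags that were not requested. The variable names arCoeffs and maCoeffs are hypothetical.

```matlab
% Requires Econometrics Toolbox.
Mdl = regARIMA('ARLags',[1,4,8],'D',1,'MALags',[1,4],'Intercept',0);

% Under the stated assumption, each property has one cell per lag.
arCoeffs = Mdl.AR;    % NaN expected at lags 1, 4, and 8; 0 at the other lags
maCoeffs = Mdl.MA;    % NaN expected at lags 1 and 4; 0 at lags 2 and 3

% Lags whose coefficients are still to be estimated.
arFreeLags = find(cellfun(@isnan,arCoeffs));
maFreeLags = find(cellfun(@isnan,maCoeffs));
```

If the assumption holds, arFreeLags and maFreeLags reproduce the lag lists passed to 'ARLags' and 'MALags'.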

Known Parameter Values for a Regression Model with ARIMA Errors

This example shows how to specify values for all parameters of a regression model with ARIMA errors.

Specify the regression model with ARIMA(3,1,2) errors:

$\begin{array}{c}{y}_{t}={X}_{t}\left[\begin{array}{l}2.5\\ -0.6\end{array}\right]+{u}_{t}\\ \left(1-0.7L+0.3{L}^{2}-0.1{L}^{3}\right)\left(1-L\right){u}_{t}=\left(1+0.5L+0.2{L}^{2}\right){\epsilon }_{t},\end{array}$

where ${\epsilon }_{t}$ is Gaussian with unit variance.

Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],...
'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},...
'Variance',1,'D',1)
Mdl =
regARIMA with properties:

Description: "Regression with ARIMA(3,1,2) Error Model (Gaussian Distribution)"
Distribution: Name = "Gaussian"
Intercept: 0
Beta: [2.5 -0.6]
P: 4
D: 1
Q: 2
AR: {0.7 -0.3 0.1} at lags [1 2 3]
SAR: {}
MA: {0.5 0.2} at lags [1 2]
SMA: {}
Variance: 1

The parameters in Mdl contain no NaN values, so there is nothing to estimate. You can simulate or forecast responses by passing Mdl to simulate or forecast.
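A minimal sketch of both tasks follows. It assumes Econometrics Toolbox is installed. The in-sample length T, the horizon h, and the predictor matrix X are hypothetical values chosen for illustration, since the model needs two predictor columns to match Beta.

```matlab
% Requires Econometrics Toolbox. Predictor data are synthetic.
rng(1);                  % reproducible random numbers
Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6], ...
    'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2}, ...
    'Variance',1,'D',1);

T = 100;                 % hypothetical in-sample length
h = 5;                   % hypothetical forecast horizon
X = randn(T + h,2);      % hypothetical predictors for both periods

% Simulate one response path over the in-sample period.
y = simulate(Mdl,T,'X',X(1:T,:));

% Forecast h periods ahead, conditioning on the simulated path.
[yF,yMSE] = forecast(Mdl,h,'Y0',y,'X0',X(1:T,:),'XF',X(T+1:end,:));
```

Here yF holds the h forecasted responses and yMSE holds the corresponding forecast mean square errors.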

Regression Model with ARIMA Errors and t Innovations

This example shows how to set the innovation distribution of a regression model with ARIMA errors to a t distribution.

Specify the regression model with ARIMA(3,1,2) errors:

$\begin{array}{c}{y}_{t}={X}_{t}\left[\begin{array}{l}2.5\\ -0.6\end{array}\right]+{u}_{t}\\ \left(1-0.7L+0.3{L}^{2}-0.1{L}^{3}\right)\left(1-L\right){u}_{t}=\left(1+0.5L+0.2{L}^{2}\right){\epsilon }_{t},\end{array}$

where ${\epsilon }_{t}$ has a t distribution with the default degrees of freedom and unit variance.

Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],...
'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},'Variance',1,...
'Distribution','t','D',1)
Mdl =
regARIMA with properties:

Description: "Regression with ARIMA(3,1,2) Error Model (t Distribution)"
Distribution: Name = "t", DoF = NaN
Intercept: 0
Beta: [2.5 -0.6]
P: 4
D: 1
Q: 2
AR: {0.7 -0.3 0.1} at lags [1 2 3]
SAR: {}
MA: {0.5 0.2} at lags [1 2]
SMA: {}
Variance: 1

The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate.

Specify a ${t}_{10}$ distribution.

Mdl.Distribution = struct('Name','t','DoF',10)
Mdl =
regARIMA with properties:

Description: "Regression with ARIMA(3,1,2) Error Model (t Distribution)"
Distribution: Name = "t", DoF = 10
Intercept: 0
Beta: [2.5 -0.6]
P: 4
D: 1
Q: 2
AR: {0.7 -0.3 0.1} at lags [1 2 3]
SAR: {}
MA: {0.5 0.2} at lags [1 2]
SMA: {}
Variance: 1

You can simulate or forecast responses by passing Mdl to simulate or forecast because Mdl is completely specified.

In applications such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2) for DoF > 2), while the kurtosis of the distribution is preserved.
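As a worked illustration of this normalization, assume that the rescaling is a simple multiplicative one (the exact internal computation is not stated here). Let ${z}_{t}$ be a standard t random variable with $\nu$ = DoF degrees of freedom, and let ${\sigma }^{2}$ = Variance. Then

$\mathrm{Var}\left({z}_{t}\right)=\frac{\nu }{\nu -2},\quad {\epsilon }_{t}=\sigma \sqrt{\frac{\nu -2}{\nu }}\,{z}_{t}\quad \Rightarrow \quad \mathrm{Var}\left({\epsilon }_{t}\right)={\sigma }^{2}.$

For the ${t}_{10}$ model above, the theoretical variance is 10/8 = 1.25, so with Variance equal to 1 the scale factor is $\sqrt{0.8}\approx 0.894$. Multiplying by a constant does not change kurtosis, so the innovations keep the heavy tails of the t distribution.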