Main Content

Specify Gradient for HMC Sampler

This example shows how to set up a Bayesian linear regression model for efficient posterior sampling using the Hamiltonian Monte Carlo (HMC) sampler. The prior distribution of the coefficients is a multivariate t, and the disturbance variance has an inverse gamma prior.

Consider the multiple linear regression model that predicts U.S. real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).

$$\texttt{GNPR}_t=\beta_0+\beta_1\texttt{IPI}_t+\beta_2\texttt{E}_t+\beta_3\texttt{WR}_t+\varepsilon_t.$$

For all $t$, $\varepsilon_t$ is a series of independent Gaussian disturbances with a mean of 0 and variance $\sigma^2$. Assume these prior distributions.

  • $\beta_j\vert\sigma^2$ is a 4-D t distribution with 30 degrees of freedom for each component, correlation matrix C, location ct, and scale st.

  • $\sigma^2 \sim IG(10r_1,10r_2)$, with shape $10r_1$ and scale $10r_2$.

bayeslm treats these assumptions and the data likelihood as if the corresponding posterior is analytically intractable.

Declare a MATLAB® function that:

  • Accepts values of $\beta$ and $\sigma^2$ together in a column vector, and accepts values of the hyperparameters.

  • Returns the log of the joint prior distribution, $\pi\left(\beta,\sigma^2\right)$, given the values of $\beta$ and $\sigma^2$, and returns the gradient of the log prior density with respect to $\beta$ and $\gamma$.

function [logPDF,gradient] = priorMVTIGHMC(params,ct,st,dof,C,a,b)
%priorMVTIGHMC Log density of multivariate t times inverse gamma and
%gradient
%   priorMVTIGHMC passes params(1:end-1) to the multivariate t density
%   function with dof degrees of freedom for each component and positive
%   definite correlation matrix C, and passes params(end) to the inverse
%   gamma distribution with shape a and scale b.  priorMVTIG returns the
%   log of the product of the two evaluated densities and its gradient.
%   After applying the log, terms involving constants only are dropped from
%   logPDF.
%   
%   params: Parameter values at which the densities are evaluated, an
%           m-by-1 numeric vector.
%
%       ct: Multivariate t distribution component centers, an (m-1)-by-1
%           numeric vector.  Elements correspond to the first m-1 elements
%           of params.
%
%       st: Multivariate t distribution component scales, an (m-1)-by-1
%           numeric vector.  Elements correspond to the first m-1 elements 
%           of params.
%
%      dof: Degrees of freedom for the multivariate t distribution, a
%           numeric scalar or (m-1)-by-1 numeric vector. priorMVTIG expands
%           scalars such that dof = dof*ones(m-1,1). Elements of dof
%           correspond to the elements of params(1:end-1).
%
%        C: correlation matrix for the multivariate t distribution, an
%           (m-1)-by-(m-1) symmetric, positive definite matrix. Rows and
%           columns correspond to the elements of params(1:end-1).
% 
%        a: Inverse gamma shape parameter, a positive numeric scalar.
%
%        b: Inverse gamma scale parameter, a positive scalar.
%
beta = params(1:(end-1));
numCoeffs = numel(beta);
sigma2 = params(end);

tVal = (beta - ct)./st;
expo = -(dof + numCoeffs)/2;
logmvtDensity = expo*log(1 + (1/dof)*((tVal'/C)*tVal));
logigDensity = (-a - 1)*log(sigma2) - 1/(sigma2*b);
logPDF = logmvtDensity + logigDensity;

grad_beta = (expo/(1 + (1/dof)*(tVal'/C*tVal)))*((2/dof)*((tVal'/C)'./st));
grad_sigma = (-a - 1)/sigma2 + 1/(sigma2^2*b);
gradient = [grad_beta; grad_sigma];
end

Create an anonymous function that operates like priorMVTIGHMC, but accepts the parameter values only and holds the hyperparameter values fixed at arbitrarily chosen values.

rng(1); % For reproducibility
dof = 30;
V = rand(4,4);
Sigma = V'*V;
st = sqrt(diag(Sigma));
C = Sigma./(st*st');
ct = -10*rand(4,1);
a = 10*rand;
b = 10*rand;
logPDF = @(params)priorMVTIGHMC(params,ct,st,dof,C,a,b);

Create a custom joint prior model for the linear regression parameters. Specify the number of predictors, p. Also, specify the function handle for priorMVTIGHMC and the variable names.

p = 3;
PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF,...
    'VarNames',["IPI" "E" "WR"]);

PriorMdl is a customblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance.

Load the Nelson-Plosser data set. Create variables for the response and predictor series. For numerical stability, convert the series to returns.

load Data_NelsonPlosser
X = price2ret(DataTable{:,PriorMdl.VarNames(2:end)});
y = price2ret(DataTable{:,'GNPR'});

Estimate the marginal posterior distributions of $\beta$ and $\sigma^2$ using the HMC sampler. To monitor the progress of the sampler, specify a verbosity level of 2. For numerical stability, specify reparameterization of the disturbance variance.

options = sampleroptions('Sampler','hmc','VerbosityLevel',2)
PosteriorMdl = estimate(PriorMdl,X,y,'Options',options,'Reparameterize',true);
options = 

  struct with fields:

                        Sampler: 'HMC'
           StepSizeTuningMethod: 'dual-averaging'
         MassVectorTuningMethod: 'iterative-sampling'
    NumStepSizeTuningIterations: 100
          TargetAcceptanceRatio: 0.6500
                  NumStepsLimit: 2000
                 VerbosityLevel: 2
                       NumPrint: 100

o Tuning mass vector using method: iterative-sampling 

o Initial step size for dual-averaging = 0.0625

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  4.312313e+01 |   1.048e-01 |          48 |   6.300e-01 |           3 |
Finished mass vector tuning iteration 1 of 5. 

o Initial step size for dual-averaging = 0.5

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  4.278350e+01 |   5.691e-01 |           9 |   6.300e-01 |           0 |

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  4.080779e+01 |   4.112e-02 |           5 |   8.000e-01 |           0 |
Finished mass vector tuning iteration 2 of 5. 

o Initial step size for dual-averaging = 2

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  4.044744e+01 |   3.088e-01 |          16 |   5.800e-01 |           0 |

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  3.823729e+01 |   2.779e-01 |           4 |   7.700e-01 |           0 |
|      200 |  4.085115e+01 |   1.140e-01 |          15 |   8.433e-01 |           0 |
Finished mass vector tuning iteration 3 of 5. 

o Initial step size for dual-averaging = 0.25

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  4.132405e+01 |   3.157e-01 |          16 |   6.500e-01 |           0 |

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  4.357148e+01 |   1.591e-02 |           2 |   8.050e-01 |           0 |
|      200 |  3.890737e+01 |   2.841e-01 |          16 |   8.533e-01 |           0 |
Finished mass vector tuning iteration 4 of 5. 

o Initial step size for dual-averaging = 0.25

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  4.358423e+01 |   3.442e-01 |          15 |   6.700e-01 |           0 |

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  4.048687e+01 |   9.933e-02 |           4 |   7.950e-01 |           0 |
|      200 |  4.334420e+01 |   3.098e-01 |           4 |   8.433e-01 |           0 |
Finished mass vector tuning iteration 5 of 5. 

o Tuning step size using method: dual-averaging. Target acceptance ratio = 0.65

o Initial step size for dual-averaging = 0.5

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 |  4.302411e+01 |   2.995e-01 |          17 |   6.100e-01 |           0 |

Method: MCMC sampling with 10000 draws
Number of observations: 61
Number of predictors:   4
 
           |   Mean     Std         CI95        Positive  Distribution 
-----------------------------------------------------------------------
 Intercept |  0.0077  0.0086  [-0.009,  0.024]    0.812     Empirical  
 IPI       | -0.1632  0.1461  [-0.476,  0.099]    0.123     Empirical  
 E         |  2.3694  0.5464  [ 1.396,  3.564]    1.000     Empirical  
 WR        | -0.2309  0.3023  [-0.844,  0.338]    0.219     Empirical  
 Sigma2    |  0.0036  0.0007  [ 0.002,  0.005]    1.000     Empirical  
 

PosteriorMdl is an empiricalblm model object storing the draws from the posterior distributions.

View trace plots and autocorrelation function (ACF) plots of the draws from the posteriors.

for j = 1:4
    figure;
    subplot(2,1,1)
    plot(PosteriorMdl.BetaDraws(j,:));
    title(sprintf('Trace plot - Beta %d',j));
    xlabel('MCMC draw')
    ylabel('Simulation index')
    subplot(2,1,2)
    autocorr(PosteriorMdl.BetaDraws(j,:))
end

figure;
subplot(2,1,1)
plot(PosteriorMdl.Sigma2Draws);
title('Trace plot - Disturbance Variance');
xlabel('MCMC draw')
ylabel('Simulation index')
subplot(2,1,2)
autocorr(PosteriorMdl.Sigma2Draws)

The trace plots show adequate mixing and no transient behavior. The autocorrelation plots show low correlation among subsequent samples.

See Also

Objects

Functions

Topics