Hat Matrix and Leverage

Hat Matrix

Purpose

The hat matrix provides a measure of leverage. It is useful for investigating whether one or more observations are outlying with regard to their X values, and therefore might be excessively influencing the regression results.

Definition

The hat matrix is also known as the projection matrix because it projects the vector of observations, y, onto the vector of predictions, $\hat{y}$, thus putting the "hat" on y. The hat matrix H is defined in terms of the data matrix X:

$H=X{({X}^{T}X)}^{-1}{X}^{T}$

and determines the fitted or predicted values since

$\hat{y}=Hy=Xb.$

The diagonal elements of H, $h_{ii}$, are called leverages and satisfy

$\begin{array}{l}0\le {h}_{ii}\le 1\\ \sum _{i=1}^{n}{h}_{ii}=p,\end{array}$

where p is the number of coefficients, and n is the number of observations (rows of X) in the regression model. HatMatrix is an n-by-n matrix in the Diagnostics table.
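
The definition can be checked numerically. The following is a minimal sketch in base MATLAB, assuming a small synthetic design matrix; the data, the coefficient values, and the variable names are illustrative and do not come from any fitted model in this topic. It forms H explicitly, then confirms that Hy matches Xb and that the leverages sum to p.

```matlab
% Sketch: build the hat matrix for a small synthetic design matrix.
rng(0);                          % reproducible random data (illustrative only)
n = 20;                          % number of observations
X = [ones(n,1), randn(n,2)];     % intercept column plus two predictors, so p = 3
y = X*[1; 2; -1] + 0.1*randn(n,1);

H = X*((X'*X)\X');               % H = X (X'X)^(-1) X', using backslash instead of inv
b = X\y;                         % least-squares coefficients
yhat = H*y;                      % fitted values obtained through the hat matrix

h = diag(H);                     % leverages h_ii
p = size(X,2);                   % number of coefficients

fprintf('max |H*y - X*b|  = %g\n', max(abs(yhat - X*b)));
fprintf('sum of leverages = %g (p = %d)\n', sum(h), p);
fprintf('leverage range   = [%g, %g]\n', min(h), max(h));
```

Forming H explicitly is reasonable only for small n, because the matrix has n^2 entries.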

How To

After obtaining a fitted model, say, mdl, using fitlm or stepwiselm, you can:

• Display the HatMatrix by indexing into the property using dot notation

mdl.Diagnostics.HatMatrix
When n is large, computing and storing the n-by-n HatMatrix might be computationally expensive. In those cases, you can obtain the diagonal values directly, using

mdl.Diagnostics.Leverage
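
Outside of a fitted model object, the diagonal of H can also be computed without forming the full n-by-n matrix. The sketch below assumes a full-column-rank design matrix X (synthetic here) and uses an economy-size QR factorization, for which H = QQ' and each leverage is the squared norm of a row of Q. This is one standard approach and is not necessarily how fitlm computes Leverage internally.

```matlab
% Sketch: leverages from an economy-size QR factorization (no n-by-n matrix needed).
rng(1);                          % reproducible random data (illustrative only)
n = 500;
X = [ones(n,1), randn(n,3)];     % hypothetical design matrix with an intercept column

[Q,~] = qr(X,0);                 % Q is n-by-p with orthonormal columns
h = sum(Q.^2, 2);                % h_ii = squared norm of the ith row of Q

% Cross-check against the explicit hat matrix (affordable here because n is modest)
H = X*((X'*X)\X');
maxDiff = max(abs(h - diag(H)));
fprintf('largest difference from diag(H): %g\n', maxDiff);
```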

Leverage

Purpose

Leverage is a measure of the effect of a particular observation on the regression predictions due to the position of that observation in the space of the inputs. In general, the farther a point is from the center of the input space, the more leverage it has. Because the leverage values sum to p, their mean is p/n. An observation i can be considered an outlier with regard to its X values if its leverage substantially exceeds this mean, for example, if it is larger than 2*p/n.

Definition

The leverage of observation i is the value of the ith diagonal term, $h_{ii}$, of the hat matrix, H, where

$H=X{({X}^{T}X)}^{-1}{X}^{T}.$

The diagonal terms satisfy

$\begin{array}{l}0\le {h}_{ii}\le 1\\ \sum _{i=1}^{n}{h}_{ii}=p,\end{array}$

where p is the number of coefficients in the regression model, and n is the number of observations. The minimum value of $h_{ii}$ is 1/n for a model with a constant term. If the fitted model goes through the origin, then the minimum leverage value is 0 for an observation at x = 0.
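
The two minimum values can be seen in a small worked example. The sketch below uses five hypothetical x values centered at 0 and compares the leverages of a model with a constant term against those of a model through the origin; the data are made up for illustration.

```matlab
% Sketch: minimum leverage with and without a constant term.
x = [-2; -1; 0; 1; 2];           % mean(x) is 0, so the third point sits at the center
n = numel(x);

Xc = [ones(n,1), x];             % design matrix with a constant term
hc = diag(Xc*((Xc'*Xc)\Xc'));    % leverages: 0.6, 0.3, 0.2, 0.3, 0.6

X0 = x;                          % design matrix for a fit through the origin
h0 = diag(X0*((X0'*X0)\X0'));    % leverages: 0.4, 0.1, 0, 0.1, 0.4

disp([hc, h0])                   % column 1: with constant, column 2: through origin
```

With the constant term, the smallest leverage is 1/n = 0.2 at the central point; through the origin, the observation at x = 0 has leverage 0.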

It is possible to express the fitted values, $\hat{y}$, by the observed values, y, since

$\hat{y}=Hy=Xb.$

Hence, $h_{ii}$ expresses how much the observed value $y_i$ affects the fitted value ${\hat{y}}_{i}$. A large value of $h_{ii}$ indicates that the ith case is distant from the center of the X values of all n cases and therefore has more leverage. Leverage is an n-by-1 column vector in the Diagnostics table.
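
One way to see this interpretation is to perturb a single response value and refit. The following sketch assumes synthetic data and an arbitrarily chosen observation index; because the fitted values are linear in y, changing $y_i$ by an amount delta changes the ith fitted value by $h_{ii}$ times delta.

```matlab
% Sketch: h_ii is the change in the ith fitted value per unit change in y_i.
rng(2);                          % reproducible random data (illustrative only)
n = 15;
X = [ones(n,1), randn(n,2)];     % hypothetical design matrix
y = X*[0.5; 1; -2] + 0.2*randn(n,1);
H = X*((X'*X)\X');

i = 4;                           % arbitrary observation to perturb
delta = 1;                       % size of the change applied to y(i)
yPert = y;
yPert(i) = yPert(i) + delta;

yhat     = X*(X\y);              % fitted values from the original response
yhatPert = X*(X\yPert);          % fitted values after perturbing y(i)

change = yhatPert(i) - yhat(i);  % agrees with H(i,i)*delta up to rounding
fprintf('change in fitted value: %g, h_ii*delta: %g\n', change, H(i,i)*delta);
```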

How To

After obtaining a fitted model, say, mdl, using fitlm or stepwiselm, you can:

• Display the Leverage vector by indexing into the property using dot notation

mdl.Diagnostics.Leverage

• Plot the leverage for the values fitted by your model using

plotDiagnostics(mdl)
See the plotDiagnostics method of the LinearModel class for details.

Determine High Leverage Observations

This example shows how to compute Leverage values and assess high leverage observations. Load the sample data and define the response and independent variables.

load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));

Fit a linear regression model.

mdl = fitlm(X,y);

Plot the leverage values.

plotDiagnostics(mdl)

For this example, the model has p = 5 coefficients (an intercept plus four predictors) and n = 100 observations, so the recommended threshold value is 2*5/100 = 0.1. There is no indication of high leverage observations.
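
The same comparison can be made numerically rather than by reading the plot. This is a minimal sketch that repeats the steps of this example and then compares each leverage value with 2*p/n; it assumes the hospital sample data set is available, and the variable names lev, threshold, and highIdx are illustrative.

```matlab
% Sketch: list observations whose leverage exceeds the 2*p/n rule of thumb.
load hospital                                   % sample data set used in this example
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
mdl = fitlm(X,y);

lev = mdl.Diagnostics.Leverage;                 % n-by-1 vector of leverages
threshold = 2*mdl.NumCoefficients/mdl.NumObservations;   % 2*p/n
highIdx = find(lev > threshold);                % row indices above the threshold (may be empty)

fprintf('threshold = %g, observations above it: %d\n', threshold, numel(highIdx));
```

The 2*p/n cutoff is a rule of thumb, so observations just above it deserve a closer look rather than automatic removal.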