
stepwise

Interactive stepwise regression

Syntax

stepwise
stepwise(X,y)
stepwise(X,y,inmodel,penter,premove)

Description

stepwise, called with no input arguments, uses the sample data in hald.mat to display a graphical user interface for performing stepwise regression of the response values in heat on the predictive terms in ingredients.

Graphical user interface for stepwise regression.

The upper left of the interface displays estimates of the coefficients for all potential terms, with horizontal bars indicating 90% (colored) and 95% (grey) confidence intervals. Red indicates that a term is not in the model; initially, no terms are. For a term not in the model, the values displayed in the table are those that would result if that term were added to the model.

The middle portion of the interface displays summary statistics for the entire model. These statistics are updated with each step.

The lower portion of the interface, Model History, displays the root mean squared error (RMSE) of the model. The plot tracks the RMSE from step to step, so you can compare the fit of different models. Hover over a blue dot in the history to see which terms were in the model at that step. Click a blue dot to open a copy of the interface initialized with the terms in the model at that step.
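For reference, RMSE in a regression setting is conventionally computed from the residual sum of squares (SSE) and the residual degrees of freedom. The sketch below assumes that definition, RMSE = sqrt(SSE / (n - k)), where n is the number of observations and k counts the fitted coefficients including the constant term. The helper name `rmse` is hypothetical and is not part of the MATLAB API.

```python
import math

def rmse(residuals, k):
    """Root mean squared error of a regression fit (hypothetical helper).

    residuals: observed minus fitted responses, one per observation.
    k: number of estimated coefficients, including the constant term.
    Assumes the residual-degrees-of-freedom convention SSE / (n - k).
    """
    n = len(residuals)
    if n <= k:
        raise ValueError("need more observations than coefficients")
    sse = sum(r * r for r in residuals)   # residual sum of squares
    return math.sqrt(sse / (n - k))

# A constant-only model (k = 1) with these residuals:
print(rmse([1.0, -1.0, 1.0, -1.0], k=1))   # about 1.155, i.e. sqrt(4/3)
```

Dividing by n - k rather than n accounts for the coefficients estimated from the same data, so adding a term that barely reduces SSE can still raise RMSE.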

Initial models, as well as entrance/exit tolerances for the p-values of F-statistics, are specified using additional input arguments to stepwise. Defaults are an initial model with no terms, an entrance tolerance of 0.05, and an exit tolerance of 0.10.

To center and scale the input data (compute z-scores) to improve conditioning of the underlying least-squares problem, select Scale Inputs from the Stepwise menu.
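Scale Inputs standardizes each column of the predictor data. The minimal sketch below shows that transformation, assuming the sample standard deviation (n - 1 denominator, the default of MATLAB's zscore) and no constant columns; `zscore_columns` is a hypothetical name, not the code behind the menu item.

```python
from statistics import mean, stdev

def zscore_columns(X):
    """Center and scale each column of X, given as a list of rows.

    Hypothetical helper. Assumes the sample standard deviation (n - 1
    denominator) and no constant columns, which would divide by zero.
    """
    columns = list(zip(*X))                   # transpose: rows -> columns
    mus = [mean(c) for c in columns]
    sigmas = [stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in X]

print(zscore_columns([[1, 10], [2, 20], [3, 30]]))
# [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

Because the model always includes a constant term, this affine rescaling changes the coefficient estimates (they become per standard deviation of each predictor) but not the fitted values or residuals.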

You proceed through a stepwise regression in one of two ways:

  1. Click Next Step to select the recommended next step. The recommended next step either adds the most significant term or removes the least significant term. When the regression reaches a local minimum of RMSE, the recommended next step is “Move no terms.” You can perform all of the recommended steps at once by clicking All Steps.

  2. Click a line in the plot or in the table to toggle the state of the corresponding term. Clicking a red line, corresponding to a term not currently in the model, adds the term to the model and changes the line to blue. Clicking a blue line, corresponding to a term currently in the model, removes the term from the model and changes the line to red.

To call addedvarplot and produce an added variable plot from the stepwise interface, select Added Variable Plot from the Stepwise menu. A list of terms is displayed. Select the term you want to add, and then click OK.

Click Export to display a dialog box that allows you to select information from the interface to save to the MATLAB® workspace. Check the information you want to export and, optionally, change the names of the workspace variables to be created. Click OK to export the information.

stepwise(X,y) displays the interface using the p predictive terms in the n-by-p matrix X and the response values in the n-by-1 vector y. Distinct predictive terms should appear in different columns of X.

Note

stepwise automatically includes a constant term in all models. Do not enter a column of 1s directly into X.

stepwise treats NaN values in either X or y as missing values and ignores them.
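The sketch below illustrates both points under stated assumptions: any observation (row) with a NaN in X or y is dropped as a whole, and the constant term is supplied by prepending a column of 1s. That is why X itself should not contain such a column: a second column of 1s would be perfectly collinear with the first. The function `prepare_design` is hypothetical.

```python
import math

def prepare_design(X, y):
    """Build the design matrix used for fitting (hypothetical helper).

    X: n rows of p predictor values; y: n response values.
    Assumes any observation with a NaN in X or y is dropped as a whole row.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    design, response = [], []
    for xi, yi in zip(X, y):
        if math.isnan(yi) or any(math.isnan(v) for v in xi):
            continue                          # missing value: skip this observation
        design.append([1.0] + list(xi))       # constant term added automatically
        response.append(yi)
    return design, response

nan = float("nan")
D, r = prepare_design([[1.0, 2.0], [nan, 3.0], [4.0, 5.0]], [1.0, 2.0, nan])
print(D, r)   # [[1.0, 1.0, 2.0]] [1.0]
```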

stepwise(X,y,inmodel,penter,premove) additionally specifies the initial model (inmodel) and the entrance (penter) and exit (premove) tolerances for the p-values of F-statistics. inmodel is either a logical vector whose length equals the number of columns of X, or a vector of column indices with values from 1 to the number of columns of X. The value of penter must be less than or equal to the value of premove.
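A minimal sketch of how these inputs might be validated, assuming the two accepted forms of inmodel described above and the default tolerances of 0.05 and 0.10. Both helper names are hypothetical; this is not the MATLAB input parser.

```python
def normalize_inmodel(inmodel, p):
    """Convert inmodel to a length-p list of booleans (hypothetical helper).

    Accepts either a logical vector of length p or 1-based column indices.
    """
    if len(inmodel) == p and all(isinstance(v, bool) for v in inmodel):
        return list(inmodel)                  # already a logical mask
    mask = [False] * p
    for idx in inmodel:
        if isinstance(idx, bool) or not 1 <= idx <= p:
            raise ValueError(f"invalid column index {idx!r} for p = {p}")
        mask[idx - 1] = True                  # 1-based index -> 0-based position
    return mask

def check_tolerances(penter=0.05, premove=0.10):
    """Validate the entrance/exit tolerances (hypothetical helper)."""
    if penter > premove:
        raise ValueError("penter must be less than or equal to premove")
    return penter, premove

print(normalize_inmodel([2, 4], 4))               # [False, True, False, True]
print(normalize_inmodel([True, False, True], 3))  # [True, False, True]
```

The ordering constraint prevents cycling: adding term j to a model and removing j from the resulting model test the same hypothesis, so if penter exceeded premove, a term whose p-value fell between the two would be added by one step and removed by the next, indefinitely.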

Algorithms

Stepwise regression is a systematic method for adding and removing terms from a multilinear model based on their statistical significance in a regression. The method begins with an initial model and then compares the explanatory power of incrementally larger and smaller models. At each step, the p-value of an F-statistic is computed to test models with and without a potential term. If a term is not currently in the model, the null hypothesis is that the term would have a zero coefficient if added to the model. If there is sufficient evidence to reject the null hypothesis, the term is added to the model. Conversely, if a term is currently in the model, the null hypothesis is that the term has a zero coefficient. If there is insufficient evidence to reject the null hypothesis, the term is removed from the model. The method proceeds as follows:

  1. Fit the initial model.

  2. If any terms not in the model have p-values less than an entrance tolerance (that is, if it is unlikely that they would have a zero coefficient if added to the model), add the one with the smallest p-value and repeat this step; otherwise, go to step 3.

  3. If any terms in the model have p-values greater than an exit tolerance (that is, if there is insufficient evidence to reject the hypothesis of a zero coefficient), remove the one with the largest p-value and go to step 2; otherwise, end.
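The three steps above translate into a short loop. In this sketch, the F-test itself is left to a hypothetical callback `pvalue(j, model)`, assumed to return the p-value for adding term j to `model` when j is not in it, and for removing j when it is. Term indices are 0-based here, and `max_steps` is a safeguard added for the sketch, not a documented parameter.

```python
def stepwise_select(p, pvalue, inmodel=(), penter=0.05, premove=0.10, max_steps=100):
    """Add/remove loop following steps 1-3 (a sketch, not MATLAB's code).

    p: number of candidate terms, indexed 0..p-1.
    pvalue(j, model): hypothetical callback giving the F-test p-value for
        adding j (j not in model) or removing j (j in model).
    Returns the final model and the list of models visited.
    """
    model = set(inmodel)                      # step 1: start from the initial model
    history = [frozenset(model)]
    for _ in range(max_steps):
        current = frozenset(model)
        # Step 2: add the excluded term with the smallest p-value, if below penter.
        excluded = [j for j in range(p) if j not in model]
        if excluded:
            best = min(excluded, key=lambda j: pvalue(j, current))
            if pvalue(best, current) < penter:
                model.add(best)
                history.append(frozenset(model))
                continue                      # repeat step 2
        # Step 3: remove the included term with the largest p-value, if above premove.
        if model:
            worst = max(sorted(model), key=lambda j: pvalue(j, current))
            if pvalue(worst, current) > premove:
                model.remove(worst)
                history.append(frozenset(model))
                continue                      # go back to step 2
        break                                 # no term qualifies: end
    return model, history

def toy_pvalue(j, model):
    # Illustrative values only: term 0 loses significance once term 1 is in.
    if j == 0:
        return 0.5 if 1 in model else 0.01
    return 0.02 if j == 1 else 0.9

final, hist = stepwise_select(3, toy_pvalue)
print(sorted(final))   # [1]
```

With these toy p-values, term 0 enters first, term 1 enters next, and term 0 is then removed because term 1 makes it redundant, so the visited models are {}, {0}, {0, 1}, and {1}.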

Depending on the terms included in the initial model and the order in which terms are moved in and out, the method may build different models from the same set of potential terms. The method terminates when no single step improves the model. There is no guarantee, however, that a different initial model or a different sequence of steps will not lead to a better fit. In this sense, stepwise models are locally optimal, but may not be globally optimal.
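The gap between local and global optimality shows up even on a toy problem. The sketch below swaps the F-test rule for a simpler criterion, moving to whichever single-term toggle lowers RMSE the most, and uses made-up RMSE values for every subset of three hypothetical terms A, B, and C. Starting from the empty model the search stops at {A}, while starting from {B} it reaches {B, C}, which an exhaustive search confirms is the best subset.

```python
from itertools import combinations

# Hypothetical RMSE for every subset of three candidate terms (illustrative only).
RMSE = {
    frozenset(): 10.0,
    frozenset("A"): 6.0, frozenset("B"): 7.0, frozenset("C"): 7.0,
    frozenset("AB"): 6.5, frozenset("AC"): 6.5, frozenset("BC"): 3.0,
    frozenset("ABC"): 6.2,
}
TERMS = "ABC"

def greedy(start):
    """Toggle one term at a time, always taking the largest RMSE drop."""
    model = frozenset(start)
    while True:
        neighbors = [model ^ {t} for t in TERMS]   # add or remove a single term
        best = min(neighbors, key=RMSE.__getitem__)
        if RMSE[best] >= RMSE[model]:
            return model                          # no single step improves: local minimum
        model = best

def exhaustive():
    """Check every subset; feasible only for a handful of terms."""
    subsets = [frozenset(c) for k in range(len(TERMS) + 1)
               for c in combinations(TERMS, k)]
    return min(subsets, key=RMSE.__getitem__)

print(sorted(greedy("")))    # ['A']
print(sorted(greedy("B")))   # ['B', 'C']
print(sorted(exhaustive()))  # ['B', 'C']
```

Exhaustive search over p candidate terms examines 2^p subsets, which is why stepwise methods trade the guarantee of a global optimum for far fewer model fits.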

Version History

Introduced before R2006a