surrogateAssociation

Mean predictive measure of association for surrogate splits in decision tree

Syntax

ma = surrogateAssociation(tree)
ma = surrogateAssociation(tree,N)

Description

ma = surrogateAssociation(tree) returns a matrix of mean predictive measures of association between the predictors in tree.

ma = surrogateAssociation(tree,N) returns a matrix of predictive measures of association averaged over the nodes in vector N.

Input Arguments

tree

A regression tree constructed with fitrtree, or a compact regression tree constructed with compact.

N

Vector of node numbers in tree.
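You can also build N from the tree's own properties instead of listing node numbers by hand. The sketch below assumes the carsmall tree from the Examples section. Because the predictors are passed as a matrix, they get the default names x1, x2, and x3. The sketch selects all branch nodes through the IsBranchNode property, and then only the nodes whose optimal split is on x2 (Horsepower) through the CutPredictor property. It is an illustration, not a prescribed workflow.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
tree = fitrtree(X,MPG,'Surrogate','on');

% Node numbers of all branch (non-leaf) nodes
Nbranch = find(tree.IsBranchNode);
maBranch = surrogateAssociation(tree,Nbranch);

% Node numbers of the nodes whose optimal split is on x2 (Horsepower).
% CutPredictor holds an empty character vector for leaves, so strcmp
% excludes them automatically.
Nx2 = find(strcmp(tree.CutPredictor,'x2'));
maX2 = surrogateAssociation(tree,Nx2);

% Only row 2 is of interest here: it averages surrogate associations
% over the nodes whose optimal split is on x2
maX2(2,:)
```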

Output Arguments

ma

  • ma = surrogateAssociation(tree) returns a P-by-P matrix, where P is the number of predictors in tree. ma(i,j) is the mean predictive measure of association between the optimal splits on predictor i and the surrogate splits on predictor j. For more details, see Algorithms.

  • ma = surrogateAssociation(tree,N) returns a P-by-P matrix of the predictive measures of association between predictors, averaged over the nodes in the vector N. The elements of N are node numbers from 1 to tree.NumNodes.
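A common use of ma is to find, for each predictor used as an optimal split, the predictor that serves as its strongest surrogate on average. The sketch below assumes the carsmall tree from the Examples section. It zeroes the diagonal, because each predictor is trivially associated with itself, and then takes the maximum of each row. The variable names are illustrative only.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
tree = fitrtree(X,MPG,'Surrogate','on');
ma = surrogateAssociation(tree);

% Zero the diagonal so that a predictor is never chosen as its own surrogate
maOff = ma - diag(diag(ma));

% Row i: optimal-split predictor; bestSurrogate(i): column j with the
% largest mean predictive measure of association
[bestAssoc,bestSurrogate] = max(maOff,[],2);
table((1:size(ma,1))',bestSurrogate,bestAssoc, ...
    'VariableNames',{'OptimalPredictor','BestSurrogate','MeanAssociation'})
```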

Examples

Load the carsmall data set. Specify Displacement, Horsepower, and Weight as predictor variables.

load carsmall
X = [Displacement Horsepower Weight];

Grow a regression tree using MPG as the response, and use surrogate splits to handle missing values.

tree = fitrtree(X,MPG,'surrogate','on');

Find the mean predictive measure of association between the predictor variables.

ma = surrogateAssociation(tree)
ma = 3×3

    1.0000    0.2167    0.5083
    0.4521    1.0000    0.3769
    0.2540    0.2659    1.0000

Find the mean predictive measure of association averaged over the odd-numbered nodes in tree.

N = 1:2:tree.NumNodes;
ma = surrogateAssociation(tree,N)
ma = 3×3

    1.0000    0.1250    0.6875
    0.5632    1.0000    0.5861
    0.3333    0.3148    1.0000

Algorithms

Element ma(i,j) is the predictive measure of association averaged over surrogate splits on predictor j for which predictor i is the optimal split predictor. This average is computed by summing the positive values of the predictive measure of association over optimal splits on predictor i and surrogate splits on predictor j, and then dividing by the total number of optimal splits on predictor i. The denominator includes splits for which the predictive measure of association between predictors i and j is negative; such splits contribute zero to the sum but still count as splits.
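Written as a formula, with T_i denoting the set of branch nodes whose optimal split is on predictor i and λ_ij(t) denoting the predictive measure of association at node t between that optimal split and the surrogate split on predictor j, the averaging rule above reads:

$$
\mathrm{ma}(i,j) = \frac{1}{\lvert T_i \rvert} \sum_{t \in T_i} \max\bigl(\lambda_{ij}(t),\, 0\bigr)
$$

The following sketch applies this rule to small, hypothetical per-node data. cutVar and lambda are made-up inputs for illustration and are not values that surrogateAssociation exposes. In this sketch, rows for predictors that are never an optimal split stay at zero, which is an assumption of the sketch and not a statement about the toolbox.

```matlab
% Hypothetical per-node data (illustration only):
%   cutVar(t)   - index of the optimal-split predictor at branch node t
%   lambda(t,j) - predictive measure of association at node t between the
%                 optimal split and the surrogate split on predictor j
%                 (1 for the optimal-split predictor itself)
cutVar = [1; 1; 2; 3];
lambda = [ 1.0  0.3 -0.2;
           1.0  0.5  0.4;
           0.1  1.0  0.6;
          -0.5  0.2  1.0];

P = size(lambda,2);
ma = zeros(P);   % rows with no optimal splits stay zero (sketch assumption)
for i = 1:P
    rows = (cutVar == i);   % nodes whose optimal split is on predictor i
    nSplits = nnz(rows);    % denominator counts every such split
    if nSplits > 0
        % Clip negative associations to zero: they add nothing to the sum
        % but still count in nSplits
        ma(i,:) = sum(max(lambda(rows,:),0),1) / nSplits;
    end
end
ma
```

Here ma(1,3) is 0.2. Dropping the negative split entirely would give 0.4, and keeping the negative value in the sum would give 0.1.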