# surrogateAssociation

Mean predictive measure of association for surrogate splits in classification tree

## Syntax

``ma = surrogateAssociation(tree)``
``ma = surrogateAssociation(tree,N)``

## Description


`ma = surrogateAssociation(tree)` returns a matrix of predictive measures of association for the predictors in `tree`.

`ma = surrogateAssociation(tree,N)` returns a matrix of predictive measures of association averaged over the nodes in vector `N`.

## Examples


Load Fisher's iris data set.

`load fisheriris`

Grow a classification tree using `species` as the response. Specify to use surrogate splits for missing values.

`tree = fitctree(meas,species,'surrogate','on');`

Find the mean predictive measure of association between the predictor variables.

`ma = surrogateAssociation(tree)`
```
ma = 4×4

    1.0000         0         0         0
         0    1.0000         0         0
    0.4633    0.2500    1.0000    0.5000
    0.2065    0.1413    0.4022    1.0000
```

Find the mean predictive measure of association averaged over the odd-numbered nodes in `tree`.

```
N = 1:2:tree.NumNodes;
ma = surrogateAssociation(tree,N)
```

```
ma = 4×4

    1.0000         0         0         0
         0    1.0000         0         0
    0.7600    0.5000    1.0000    1.0000
    0.4130    0.2826    0.8043    1.0000
```
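One way to read the matrix is to look up, for each predictor, which other predictor has the largest mean association with it. The following is a minimal sketch of that lookup, not part of the original example. The variable names `offDiag`, `bestAssoc`, and `bestSurrogate` are illustrative.

```matlab
load fisheriris
tree = fitctree(meas,species,'Surrogate','on');
ma = surrogateAssociation(tree);

% Mask the diagonal, which holds the association of each predictor with itself.
P = size(ma,1);
offDiag = ma;
offDiag(logical(eye(P))) = -Inf;

% For each row i, find the column j with the largest mean association.
[bestAssoc,bestSurrogate] = max(offDiag,[],2);
table((1:P)',bestSurrogate,bestAssoc, ...
    'VariableNames',{'Predictor','BestSurrogate','MeanAssociation'})
```

For the first matrix displayed above, row 3 has its largest off-diagonal value in column 4 (0.5000) and row 4 in column 3 (0.4022). Rows 1 and 2 are zero off the diagonal, so the index reported for those rows carries no information.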

## Input Arguments


`tree`: Trained classification tree, specified as a `ClassificationTree` or `CompactClassificationTree` model object. That is, `tree` is a trained classification model returned by `fitctree` or `compact`.

`N`: Nodes, specified as a vector of node numbers in `tree`.
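As an illustration of building a node vector, the sketch below restricts the average first to the root node and then to the branch (non-leaf) nodes. It assumes a full `ClassificationTree` trained with surrogate splits and uses its `IsBranchNode` property. The names `maRoot` and `maBranch` are illustrative.

```matlab
load fisheriris
tree = fitctree(meas,species,'Surrogate','on');

% Average over the root node only (node 1).
maRoot = surrogateAssociation(tree,1);

% Average over every branch node, that is, every node that has a split.
N = find(tree.IsBranchNode);
maBranch = surrogateAssociation(tree,N);
```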

## Output Arguments


`ma`: Predictive measures of association for predictors in `tree`, returned as a matrix.

• `ma = surrogateAssociation(tree)` returns a `P`-by-`P` matrix, where `P` is the number of predictors in `tree`. `ma(i,j)` is the predictive measure of association between the optimal split on variable `i` and a surrogate split on variable `j`. For more details, see Algorithms.

• `ma = surrogateAssociation(tree,N)` returns a `P`-by-`P` matrix representing the predictive measure of association between variables averaged over nodes in the vector `N`. `N` contains node numbers from `1` to `max(tree.NumNodes)`.


### Predictive Measure of Association

The predictive measure of association is a value that indicates the similarity between decision rules that split observations. Among all possible decision splits that are compared to the optimal split (found by growing the tree), the best surrogate decision split yields the maximum predictive measure of association. The second-best surrogate split has the second-largest predictive measure of association.

Suppose $x_j$ and $x_k$ are predictor variables $j$ and $k$, respectively, and $j \neq k$. At node $t$, the predictive measure of association between the optimal split $x_j < u$ and a surrogate split $x_k < v$ is

$\lambda_{jk} = \frac{\min\left(P_L, P_R\right) - \left(1 - P_{L_j L_k} - P_{R_j R_k}\right)}{\min\left(P_L, P_R\right)}.$

• $P_L$ is the proportion of observations in node $t$ such that $x_j < u$. The subscript $L$ stands for the left child of node $t$.

• $P_R$ is the proportion of observations in node $t$ such that $x_j \geq u$. The subscript $R$ stands for the right child of node $t$.

• $P_{L_j L_k}$ is the proportion of observations at node $t$ such that $x_j < u$ and $x_k < v$.

• $P_{R_j R_k}$ is the proportion of observations at node $t$ such that $x_j \geq u$ and $x_k \geq v$.

• Observations with missing values for $x_j$ or $x_k$ do not contribute to the proportion calculations.

$\lambda_{jk}$ is a value in $(-\infty, 1]$. If $\lambda_{jk} > 0$, then $x_k < v$ is a worthwhile surrogate split for $x_j < u$.
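To make the formula concrete, the following sketch evaluates it by hand for one node. The data, cut points, and variable names are hypothetical. The sketch also assumes unweighted observations and that "do not contribute" means dropping any observation with a missing value in either predictor before computing the proportions.

```matlab
% Hypothetical predictor values for the observations that reach node t.
xj = [1 2 3 4 5 6 7 8 NaN 10]';   % optimal split predictor
xk = [2 1 4 5 6 5 8 NaN 9 10]';   % candidate surrogate predictor
u = 4.5;                          % optimal split:   xj < u
v = 4.5;                          % surrogate split: xk < v

% Assumption: drop observations with a missing value in either predictor.
keep = ~isnan(xj) & ~isnan(xk);
xj = xj(keep);
xk = xk(keep);

PL  = mean(xj <  u);              % sent left by the optimal split
PR  = mean(xj >= u);              % sent right by the optimal split
PLL = mean(xj <  u & xk <  v);    % sent left by both splits
PRR = mean(xj >= u & xk >= v);    % sent right by both splits

lambda = (min(PL,PR) - (1 - PLL - PRR)) / min(PL,PR)
```

With these values, eight observations remain, the two splits disagree on one of them, and `lambda` evaluates to 0.75. The term `1 - PLL - PRR` is the proportion of observations that the surrogate sends to a different child than the optimal split does, so a surrogate that never disagrees gives `lambda = 1`.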

### Surrogate Decision Splits

A surrogate decision split is an alternative to the optimal decision split at a given node in a decision tree. The optimal split is found by growing the tree; the surrogate split uses a similar or correlated predictor variable and split criterion.

When the value of the optimal split predictor for an observation is missing, the observation is sent to the left or right child node using the best surrogate predictor. When the value of the best surrogate split predictor for the observation is also missing, the observation is sent to the left or right child node using the second-best surrogate predictor, and so on. Candidate splits are sorted in descending order by their predictive measure of association.
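The fallback order described above can be sketched as a loop over the candidate splits at one node. This is an illustration only, not the toolbox implementation. The split list, cut points, and observation are hypothetical, and every split is assumed to send an observation left when its value is below the cut point.

```matlab
% Hypothetical splits at one node: the optimal split first, then surrogates
% sorted in descending order of predictive measure of association.
predictorIdx = [3 4 1];           % column of x used by each split
cutPoint     = [2.45 0.8 5.45];   % go left when x(predictorIdx) < cutPoint

x = [5.1 3.5 NaN 0.2];            % the optimal split predictor (column 3) is missing

goLeft = [];                      % stays empty if every candidate value is missing
usedSplit = 0;                    % index of the split that routed the observation
for s = 1:numel(predictorIdx)
    value = x(predictorIdx(s));
    if ~isnan(value)
        goLeft = value < cutPoint(s);
        usedSplit = s;
        break
    end
end
```

Here the optimal split cannot be evaluated, so the first surrogate (column 4) decides the direction: `usedSplit` is 2 and `goLeft` is `true`.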

## Algorithms

Element `ma(i,j)` is the predictive measure of association averaged over surrogate splits on predictor `j` for which predictor `i` is the optimal split predictor. This average is computed by summing positive values of the predictive measure of association over optimal splits on predictor `i` and surrogate splits on predictor `j` and dividing by the total number of optimal splits on predictor `i`, including splits for which the predictive measure of association between predictors `i` and `j` is negative.
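The averaging rule can be illustrated with a few made-up numbers. In this sketch, `lambda` holds hypothetical per-node associations between optimal splits on predictor `i` and surrogate splits on predictor `j`. The names `lambda`, `numOptimalSplits`, and `maij` are illustrative.

```matlab
% Hypothetical associations, one per node whose optimal split uses predictor i.
lambda = [0.8 -0.2 0.5 0.3];

% Only positive values enter the sum, but every optimal split on predictor i
% counts in the denominator, including the one with a negative association.
numOptimalSplits = numel(lambda);
maij = sum(lambda(lambda > 0)) / numOptimalSplits
```

The positive values sum to 1.6 and there are four optimal splits, so `maij` is approximately 0.4. A plain mean over all four values would instead give 0.35.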

## Version History

Introduced in R2014b