# cmdscale

Classical multidimensional scaling

## Syntax

```Y = cmdscale(D) [Y,e] = cmdscale(D) [Y,e] = cmdscale(D,p) ```

## Description

`Y = cmdscale(D)` takes an `n`-by-`n` distance matrix `D`, and returns an `n`-by-`p` configuration matrix `Y`. Rows of `Y` are the coordinates of `n` points in `p`-dimensional space for some `p < n`. When `D` is a Euclidean distance matrix, the distances between those points are given by `D`. `p` is the dimension of the smallest space in which the `n` points whose inter-point distances are given by `D` can be embedded.

`[Y,e] = cmdscale(D)` also returns the eigenvalues of `Y*Y'`. When `D` is Euclidean, the first `p` elements of `e` are positive, the rest zero. If the first `k` elements of `e` are much larger than the remaining `(n-k)`, then you can use the first `k` columns of `Y` as `k`-dimensional points whose inter-point distances approximate `D`. This can provide a useful dimension reduction for visualization, e.g., for `k = 2`.

`D` need not be a Euclidean distance matrix. If it is non-Euclidean or a more general dissimilarity matrix, then some elements of `e` are negative, and `cmdscale` chooses `p` as the number of positive eigenvalues. In this case, the reduction to `p` or fewer dimensions provides a reasonable approximation to `D` only if the negative elements of `e` are small in magnitude.

`[Y,e] = cmdscale(D,p)` also accepts a positive integer `p` between 1 and `n`. `p` specifies the dimensionality of the desired embedding `Y`. If a `p` dimensional embedding is possible, then `Y` will be of size `n`-by-`p` and `e` will be of size `p`-by-1. If only a `q` dimensional embedding with `q < p` is possible, then `Y` will be of size `n`-by-`q` and `e` will be of size `p`-by-1. Specifying `p` may reduce the computational burden when `n` is very large.

You can specify `D` as either a full dissimilarity matrix, or in upper triangle vector form such as is output by `pdist`. A full dissimilarity matrix must be real and symmetric, and have zeros along the diagonal and positive elements everywhere else. A dissimilarity matrix in upper triangle form must have real, positive entries. You can also specify `D` as a full similarity matrix, with ones along the diagonal and all other elements less than one. `cmdscale` transforms a similarity matrix to a dissimilarity matrix in such a way that distances between the points returned in `Y` equal or approximate `sqrt(1-D)`. To use a different transformation, you must transform the similarities prior to calling `cmdscale`.

## Examples

collapse all

This example shows how to construct a map of 10 US cities based on the distances between those cities, using `cmdscale`.

First, create the distance matrix and pass it to `cmdscale`. In this example, `D` is a full distance matrix: it is square and symmetric, has positive entries off the diagonal, and has zeros on the diagonal.

```cities = ... {'Atl','Chi','Den','Hou','LA','Mia','NYC','SF','Sea','WDC'}; D = [ 0 587 1212 701 1936 604 748 2139 2182 543; 587 0 920 940 1745 1188 713 1858 1737 597; 1212 920 0 879 831 1726 1631 949 1021 1494; 701 940 879 0 1374 968 1420 1645 1891 1220; 1936 1745 831 1374 0 2339 2451 347 959 2300; 604 1188 1726 968 2339 0 1092 2594 2734 923; 748 713 1631 1420 2451 1092 0 2571 2408 205; 2139 1858 949 1645 347 2594 2571 0 678 2442; 2182 1737 1021 1891 959 2734 2408 678 0 2329; 543 597 1494 1220 2300 923 205 2442 2329 0]; [Y,eigvals] = cmdscale(D);```

Next, look at the eigenvalues returned by `cmdscale`. Some of these are negative, indicating that the original distances are not Euclidean. This is because of the curvature of the earth.

```format short g [eigvals eigvals/max(abs(eigvals))]```
```ans = 10×2 9.5821 0.0000 1.6868 0.0000 0.0082 0.0000 0.0014 0.0000 0.0005 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0009 -0.0000 -0.0055 -0.0000 -0.0355 -0.0000 ```

However, in this case, the two largest positive eigenvalues are much larger in magnitude than the remaining eigenvalues. So, despite the negative eigenvalues, the first two coordinates of `Y` are sufficient for a reasonable reproduction of `D`.

```Dtriu = D(find(tril(ones(10),-1)))'; maxrelerr = max(abs(Dtriu-pdist(Y(:,1:2))))./max(Dtriu)```
```maxrelerr = 0.0075371 ```

Here is a plot of the reconstructed city locations as a map. The orientation of the reconstruction is arbitrary.

```plot(Y(:,1),Y(:,2),'.') text(Y(:,1)+25,Y(:,2),cities) xlabel('Miles') ylabel('Miles')``` Determine how the quality of reconstruction varies when you reduce points to distances using different metrics.

Generate ten points in 4-D space that are close to 3-D space. Take a linear transformation of the points so that their transformed values are close to a 3-D subspace that does not align with the coordinate axes.

```rng default % Set the seed for reproducibility A = [normrnd(0,1,10,3) normrnd(0,0.1,10,1)]; B = randn(4,4); X = A*B;```

Reduce the points in `X` to distances by using the Euclidean metric. Find a configuration `Y` with the inter-point distances.

```D = pdist(X,'euclidean'); Y = cmdscale(D);```

Compare the quality of the reconstructions when using 2, 3, or 4 dimensions. The small `maxerr3` value indicates that the first 3 dimensions provide a good reconstruction.

`maxerr2 = max(abs(pdist(X)-pdist(Y(:,1:2)))) `
```maxerr2 = 0.1631 ```
`maxerr3 = max(abs(pdist(X)-pdist(Y(:,1:3)))) `
```maxerr3 = 0.0187 ```
`maxerr4 = max(abs(pdist(X)-pdist(Y)))`
```maxerr4 = 9.6589e-15 ```

Reduce the points in `X` to distances by using the `'cityblock'` metric. Find a configuration `Y` with the inter-point distances.

```D = pdist(X,'cityblock'); [Y,e] = cmdscale(D);```

Evaluate the quality of the reconstruction. `e` contains at least one negative element of large magnitude, which might account for the poor quality of the reconstruction.

`maxerr = max(abs(pdist(X)-pdist(Y)))`
```maxerr = 9.0488 ```
`min(e)`
```ans = -5.6586 ```

## References

 Seber, G. A. F. Multivariate Observations. Hoboken, NJ: John Wiley & Sons, Inc., 1984.