I have three questions regarding the dendrogram and linkage functions. I have data for some models x, e.g.,

x1=0.1576

x2=0.9706

x3=0.9572

x4=0.4854

x5=0.8003

x6=0.1419

x7=0.4218

x8=0.9157

x9=0.7922

x10=0.9595

1) I run the following code to draw the dendrogram. What do columns one and two of the linkage(X) output correspond to?

X = [0.1576; 0.9706; 0.9572; 0.4854; 0.8003; 0.1419; 0.4218; 0.9157; 0.7922; 0.9595];

tree = linkage(X,'average');

figure()

dendrogram(tree)

The tree variable is

tree =

3.0000 10.0000 0.0023

5.0000 9.0000 0.0081

2.0000 11.0000 0.0122

1.0000 6.0000 0.0157

8.0000 13.0000 0.0467

4.0000 7.0000 0.0636

12.0000 15.0000 0.1545

14.0000 16.0000 0.3039

17.0000 18.0000 0.5976

I am struggling to understand what the numbers (14, 12, 16) in columns one and two correspond to.

2) How are the numbers on the horizontal axis of the dendrogram assigned? I thought these would be columns one and two; however, they are not.

3) I would like to replace the numbers on the horizontal axis with the names of the models they correspond to, so that the axis shows the model names of similar clusters instead of numbers. The names could be, e.g., x1, x2, x3.

Kindly help me sort this out.

Pratyush Roy
on 18 May 2021

Edited: Pratyush Roy
on 18 May 2021

Hi Imran,

1) linkage returns the agglomerative hierarchical cluster tree as a numeric matrix Z (your tree variable). Z is an (m-1)-by-3 matrix, where m is the number of observations in the original data. Columns 1 and 2 of Z contain cluster indices linked in pairs to form a binary tree. The leaf nodes are numbered from 1 to m. Leaf nodes are the singleton clusters from which all higher clusters are built. Each newly formed cluster, corresponding to row Z(I,:), is assigned the index m + I. The entries Z(I,1) and Z(I,2) contain the indices of the two component clusters that form cluster m + I. The m-1 higher clusters correspond to the interior nodes of the clustering tree. Z(I,3) contains the linkage distance between the two clusters merged in row Z(I,:). In your case m = 10, so any index above 10 in columns 1 and 2 refers to a cluster created in an earlier row: index 10 + I is the cluster formed in row I.

For example, consider building a tree with 30 initial nodes. Suppose that cluster 5 and cluster 7 are combined at step 12, and that the distance between them at that step is 1.5. Then Z(12,:) is [5 7 1.5]. The newly formed cluster has index 12 + 30 = 42. If cluster 42 appears in a later row, then the function is combining the cluster created at step 12 into a larger cluster.
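To make the indexing rule concrete for the data in the question, here is a minimal sketch that walks through the rows of tree and lists which original observations each cluster index contains. The variable names m and members are my own, and linkage requires the Statistics and Machine Learning Toolbox.

```matlab
% Expand every cluster index used in columns 1 and 2 of tree into its observations.
X = [0.1576; 0.9706; 0.9572; 0.4854; 0.8003; 0.1419; 0.4218; 0.9157; 0.7922; 0.9595];
tree = linkage(X, 'average');

m = size(tree, 1) + 1;          % number of observations (10 here)
members = cell(2*m - 1, 1);     % members{k} lists the observations in cluster k
for k = 1:m
    members{k} = k;             % indices 1..m are the original observations (leaves)
end
for i = 1:m-1
    % Row i merges clusters tree(i,1) and tree(i,2) into the new cluster m + i.
    members{m + i} = sort([members{tree(i,1)}, members{tree(i,2)}]);
    fprintf('cluster %d = [%s], formed at distance %.4f\n', ...
        m + i, num2str(members{m + i}), tree(i,3));
end
```

Reading this against the tree printed in the question: 14 is the cluster formed in row 4 (observations 1 and 6), 12 is the cluster formed in row 2 (observations 5 and 9), and 16 is the cluster formed in row 6 (observations 4 and 7).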

2) The horizontal axis numbers are the leaf node indices for the tree. If there are 30 or fewer data points in the original data set, then each leaf in the dendrogram corresponds to one data point, so with your 10 points the number k on the axis is simply the k-th row of X. If there are more than 30 data points, then dendrogram collapses lower branches so that there are 30 leaf nodes. As a result, some leaves in the plot correspond to more than one data point. See the dendrogram documentation for more information.
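If you want the left-to-right order of the axis numbers as a variable rather than reading it off the plot, dendrogram can return it as its third output. A small sketch using the data from the question (the output names H, T and outperm follow the documentation, but any names work):

```matlab
X = [0.1576; 0.9706; 0.9572; 0.4854; 0.8003; 0.1419; 0.4218; 0.9157; 0.7922; 0.9595];
tree = linkage(X, 'average');

figure()
[H, T, outperm] = dendrogram(tree);  % outperm: leaf indices in plotting order, left to right
disp(outperm)                        % a permutation of 1:10, not columns 1 and 2 of tree
```

The axis shows each observation index exactly once, which is why it does not match columns 1 and 2 of tree: those columns also contain the indices of merged clusters (11 and above).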

3) You can use the "Labels" name-value pair to change the horizontal labels in the dendrogram. The snippet below shows how to build the names and pass them as labels; Labels(k) is used for the k-th observation:

labels = cellstr(num2str((1:10)', 'x%d')); % Cell array of character vectors: {'x1'; 'x2'; ...; 'x10'}

dendrogram(tree, 'Labels', labels) % Pass the linkage output (tree), not the raw data X

Hope this helps!
