Fast K-means

Fast K-means implementation with optional weights and K-means++ style seeding.

Tim Benham

Version 1.1.0.0 (3,94 ko)

3,8K téléchargements

(6)

4 mai 2011

Télécharger

Ouvrir dans MATLAB Online

Suivre

Télécharger

Ouvrir dans MATLAB Online

Suivre

[L, C, D] = FKMEANS(X, k) partitions the vectors in the n-by-p matrix X
into k (or, rarely, fewer) clusters by applying the well known batch
K-means algorithm. Rows of X correspond to points, columns correspond to
variables. The output k-by-p matrix C contains the cluster centroids. The
n-element output column vector L contains the cluster label of each
point. The k-element output column vector D contains the residual cluster
distortions as measured by total squared distance of cluster members from
the centroid.

FKMEANS(X, C0) where C0 is a k-by-p matrix uses the rows of C0 as the
initial centroids instead of choosing them randomly from X.

FKMEANS(X, k, options) allows optional parameter name/value pairs to
be specified. Parameters are:

'weight' - n-by-1 weight vector used to adjust centroid and distortion
calculations. Weights should be positive.
'careful' - binary option that determines whether "careful seeding"
as recommended by Arthur and Vassilvitskii is used when
choosing initial centroids. This option should be used
with care because numerical experiments suggest it may
be counter-productive when the data is noisy.

NOTES

(1) The careful seeding procedure chooses the first centroid at random
from X, and each successive centroid from the remaining points according
to the categorical distribution with selection probabilities proportional
to the point's minimum squared Euclidean distance from the already chosen
centroids. This tends to spread the points out more evenly, and, if the
data is made of k well separated clusters, is likely to choose an initial
centroid from each cluster. This can speed convergence and reduce the
likelihood of getting a bad solution [1]. However, in experiments where
5% uniformly distributed noise data was added to such naturally clustered
data the results were frequently worse then when centroids were chosen at
random.

(2) If, as is possible, a cluster is empty at the end of an iteration,
then there may be fewer than k clusters returned. In practice this seems
to happen very rarely.

(3) Unlike the Mathworks KMEANS this implementation does not perform a
final, slow, phase of incremental K-means ('onlinephase') that guarantees
convergence to a local minimum.

References
[1] "k-means++: The Advantages of Careful Seeding", by David Arthur and
Sergei Vassilvitskii, SODA 2007.

D’autres ont également téléchargé

MITT

805 téléchargements

5,00 / 5 (2)

plotwithmarkers.m

2,7K téléchargements

3,00 / 5 (4)

ballplot2

2,4K téléchargements

3,00 / 5 (2)

Citation pour cette source

Tim Benham (2026). Fast K-means (https://fr.mathworks.com/matlabcentral/fileexchange/31274-fast-k-means), MATLAB Central File Exchange. Extrait(e) le avril 20, 2026.

Remerciements

A inspiré : Sparsified K-Means

Catégories

En savoir plus sur Statistics and Machine Learning Toolbox dans Help Center et MATLAB Answers

Compatibilité avec les versions de MATLAB

Compatible avec toutes les versions

Plateformes compatibles

Windows
macOS
Linux

Ouvrir dans un nouvel onglet

Version	Publié le	Notes de version	Action
1.1.0.0	4 mai 2011	Change order of output parameters to better conform with Mathworks kmeans.	Télécharger
1.0.0.0	2 mai 2011		Télécharger