Understanding Gaussian Mixture Models
I'm trying to understand Gaussian mixture models (GMMs) by reading the sources available online. I have already done clustering with k-means and wanted to see how a GMM compares to it.
Here is what I have understood; please let me know if my understanding is wrong:
A GMM is like k-means in the sense that clustering is achieved in both cases. In a GMM, however, each cluster has its own mean and covariance. Furthermore, k-means performs hard assignments of data points to clusters, whereas a GMM gives a collection of independent Gaussian distributions, and for each data point we get the probability that it belongs to each of the distributions.
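To make the hard versus soft distinction concrete, here is a minimal sketch on a toy 1-D problem. All numbers (the points, means, variances and mixing weights) are made up for illustration and are not taken from my data.

```matlab
% Toy 1-D example with two clusters; every value below is an assumption.
x      = [0.0, 1.9, 4.0];   % three data points
mu     = [0.0, 4.0];        % cluster means
sigma2 = [1.0, 1.0];        % cluster variances (used by the GMM only)
prior  = [0.5, 0.5];        % mixing weights (used by the GMM only)

% k-means style hard assignment: index of the nearest mean.
dist2 = bsxfun(@minus, x(:), mu).^2;              % numPoints x numClusters
[~, hardLabel] = min(dist2, [], 2);

% GMM style soft assignment: posterior probability of each cluster.
lik   = bsxfun(@rdivide, exp(-0.5 * bsxfun(@rdivide, dist2, sigma2)), ...
                sqrt(2 * pi * sigma2));           % Gaussian densities
joint = bsxfun(@times, lik, prior);               % prior times likelihood
resp  = bsxfun(@rdivide, joint, sum(joint, 2));   % each row sums to 1
```

The point at 1.9 is assigned outright to cluster 1 by the hard rule, while its posteriors are roughly 0.6 and 0.4, which shows how close it is to the boundary.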
To understand it better, I have coded it in MATLAB to achieve the desired clustering. I used SIFT features for feature extraction and k-means clustering to initialize the values. (This is from the VLFeat documentation.)
% images is a 459 x 1 cell array where each cell contains a training image
[~, all_feats] = vl_dsift(single(images{1}), 'fast', 'step', 50); % all_feats is 128 x number of keypoints detected
for i = 2:numel(images)
    [~, feats] = vl_dsift(single(images{i}), 'fast', 'step', 50);
    all_feats = cat(2, all_feats, feats); % concatenate the features column-wise
end
numClusters = 50; % Just a random selection.
% Run KMeans to pre-cluster the data
[initMeans, assignments] = vl_kmeans(single(all_feats), numClusters, ...
    'Algorithm', 'Lloyd', ...
    'MaxNumIterations', 5);
initMeans = double(initMeans); % GMM needs it to be double
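% Find the initial means, covariances and priors
numData = size(all_feats, 2);
initPriors = zeros(1, numClusters);
initCovariances = zeros(size(all_feats, 1), numClusters);
for i = 1:numClusters
    data_k = double(all_feats(:, assignments == i));
    initPriors(i) = size(data_k, 2) / numData; % fraction of points in cluster i, so the priors sum to 1
    if size(data_k, 2) < 2
        % empty or single-point cluster: fall back to the variances of the whole data set
        initCovariances(:, i) = diag(cov(double(all_feats')));
    else
        initCovariances(:, i) = diag(cov(data_k'));
    end
end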
% Run EM starting from the given parameters
[means, covariances, priors, ll, posteriors] = vl_gmm(double(all_feats), numClusters, ...
    'initialization', 'custom', ...
    'InitMeans', initMeans, ...
    'InitCovariances', initCovariances, ...
    'InitPriors', initPriors);
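One thing that can be done with the posteriors output is to turn it into hard cluster labels. The sketch below uses a small made-up matrix in place of the real output, and assumes posteriors has one row per cluster and one column per data point (worth confirming with size(posteriors) on the real result).

```matlab
% Stand-in for the posteriors output: numClusters x numData (here 3 x 4).
% The values are made up for illustration; each column sums to 1.
posteriors = [0.7 0.1 0.2 0.05;
              0.2 0.8 0.3 0.05;
              0.1 0.1 0.5 0.90];

% Hard label per data point: the cluster with the highest posterior.
% confidence holds that highest posterior value for each point.
[confidence, label] = max(posteriors, [], 1);

% Number of points assigned to each cluster.
counts = accumarray(label(:), 1, [size(posteriors, 1), 1]);
```

A low value in confidence flags a point that sits between clusters, which is information k-means does not provide.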
Based on the above, I have the means, covariances and priors. My main question is: what now? I am rather lost at this point.
Also, the means and covariances matrices are each of size 128 x 50. I was expecting them to be 1 x 50, since each column is a cluster: won't each cluster have only one mean and one covariance? (I know 128 is the dimension of the SIFT descriptors, but I was expecting a single mean and covariance per cluster.)
In k-means I used the MATLAB command knnsearch(X,Y), which finds the nearest neighbour in X for each point in Y.
So how do I achieve this with a GMM? I know it is a collection of probabilities, and of course the cluster with the highest probability will be the winning cluster. This is where I am confused: all the tutorials online show how to obtain the means and covariances, but say little about how to actually use them for clustering.
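For reference, one possible counterpart of the nearest-neighbour step is to evaluate each new point under every Gaussian and pick the cluster with the highest posterior. The following is only a sketch: it assumes diagonal covariances stored as one column of variances per cluster, and it uses made-up toy values (D = 2, K = 2) rather than the real 128 x 50 matrices. The variable Y and all numbers are hypothetical.

```matlab
% Toy stand-ins with made-up values: D = 2 dimensions, K = 2 clusters.
means       = [0 5; 0 5];        % D x K, one column per cluster
covariances = [1 1; 1 1];        % D x K, diagonal variances per cluster
priors      = [0.5; 0.5];        % K x 1 mixing weights
Y           = [ 0.2 4.8 2.0;     % D x N new points to assign
               -0.1 5.1 2.0];

K = size(means, 2);
N = size(Y, 2);
logJoint = zeros(K, N);
for k = 1:K
    delta  = bsxfun(@minus, Y, means(:, k));                       % D x N
    maha   = sum(bsxfun(@rdivide, delta.^2, covariances(:, k)), 1); % 1 x N
    logDet = sum(log(2 * pi * covariances(:, k)));
    % log(prior) + log density of a diagonal Gaussian
    logJoint(k, :) = log(priors(k)) - 0.5 * (logDet + maha);
end

% Normalise in the log domain so high-dimensional data does not underflow.
logJoint = bsxfun(@minus, logJoint, max(logJoint, [], 1));
post = exp(logJoint);
post = bsxfun(@rdivide, post, sum(post, 1));   % K x N, columns sum to 1

[~, winner] = max(post, [], 1);                % winning cluster per point
```

Working with log densities matters for 128-dimensional descriptors, where the raw densities can be too small to represent in floating point.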
Thank you
2 comments
Adam
on 23 Sep 2014
Edited: Adam
on 23 Sep 2014
I don't have a clue what SIFT is, but if it produces 128 'features' and you have 50 clusters, then each cluster presumably has a mean in each feature.
When I did a GMM, I was doing it on multi-dimensional images, so the Gaussians in question were n-dimensional for n images, as they had a value in each image for any given location.
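As a rough illustration of why the matrices come out as features x clusters, here is a small sketch with made-up data (D = 3 features, K = 2 clusters, hypothetical labels). Each cluster gets one mean and one variance per feature, so both results are D x K.

```matlab
% Toy data, made up for illustration: 3 features, 6 points, 2 clusters.
X = [1 2 3 10 11 12;
     0 1 2 20 21 22;
     5 5 5  7  8  9];
labels = [1 1 1 2 2 2];      % assumed cluster label of each column
K = 2;
D = size(X, 1);

clusterMeans = zeros(D, K);
clusterVars  = zeros(D, K);
for k = 1:K
    Xk = X(:, labels == k);
    clusterMeans(:, k) = mean(Xk, 2);      % one mean per feature
    clusterVars(:, k)  = var(Xk, 0, 2);    % one variance per feature
end
```

The per-feature variances are the diagonal of each cluster's covariance matrix, which is why a diagonal-covariance model stores them as a column rather than as a full D x D matrix.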
Answers (0)