Implementing the k-means algorithm on spike sorting data

Question

0 votes

Hi there, I am trying to implement my own k-means function without using the built-in function 'kmeans'.

I started with some complex waveform data, reduced the dimensionality to 2 principal components (PCs), and plotted them on a scatter plot; 3 distinct clusters emerge.

To do k-means, I first set random centroids within the range of the data, e.g.

k=3
%state the number of clusters%
centroids = min(wav_pca) + (max(wav_pca)-min(wav_pca)).* rand(k,1) %create random centroids in the range of test data%
scatter(wav_pca(:,1),wav_pca(:,2))
hold on
scatter(centroids(:,1),centroids(:,2),'x');
hold off

This gives me starting centroids; however, I don't think the distribution is as random as I'd like.

Then I have to compute the Euclidean distance from each point to each centroid and assign the point to the centroid with the shortest distance:
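One likely reason the starting centroids look less random than expected: `rand(k,1)` draws a single random number per centroid, and implicit expansion applies that same number to both columns, so every centroid falls on the diagonal of the data's bounding box. A minimal sketch of two alternatives, assuming `wav_pca` is an N-by-2 matrix (the stand-in data and the names `lo`, `hi`, and `centroids_from_data` are made up for illustration):

```matlab
% Stand-in data for illustration only; replace with the real wav_pca (N-by-2).
rng(0);                                  % fix the seed so the run is repeatable
wav_pca = randn(300, 2);

k  = 3;                                  % number of clusters
lo = min(wav_pca);                       % 1-by-2 minimum of each PC
hi = max(wav_pca);                       % 1-by-2 maximum of each PC

% rand(k,2) gives an independent random number for every coordinate,
% so the centroids can land anywhere inside the bounding box.
centroids = lo + (hi - lo) .* rand(k, 2);

% Alternative: start from k distinct data points picked at random.
centroids_from_data = wav_pca(randperm(size(wav_pca, 1), k), :);
```

Starting from actual data points has the side benefit that no centroid begins in an empty region of the plot.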

for j=1:k
    for i=1:length(wav_pca)
        distance=sqrt( (centroids(j,1)- wav_pca(i,1))^2 + (centroids(j,2)- wav_pca(i,2)^2) )
    end
end

I tried to use this for loop, but it's not creating the matrix of distances that I need.

Then each point must be assigned to its closest centroid, giving it a cluster ID.

The cluster centroids then need to be recomputed as the average of all their assigned points, and the points reassigned. This needs to be iterated until the assignments stop changing, and I am unsure how to do this.

Thanks for any help you can give. If you need any more info, let me know, and apologies for being new to MATLAB.
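The steps described above (assign, recompute, repeat) can be sketched roughly as follows. This is one possible arrangement, not a definitive implementation: the stand-in data, the iteration cap `max_iter`, and the names `cluster_id`, `new_id`, `d2`, and `members` are assumptions made for illustration, and implicit expansion (R2016b or later) is assumed.

```matlab
% Made-up stand-in data: three separated blobs (replace with the real wav_pca).
rng(1);
wav_pca = [randn(50,2); randn(50,2) + 10; randn(50,2) + [10 -10]];

k = 3;
N = size(wav_pca, 1);
centroids  = wav_pca(randperm(N, k), :); % start from k random data points
cluster_id = zeros(N, 1);                % no assignments yet
max_iter   = 100;                        % safety cap (assumed value)

for iter = 1:max_iter
    % Squared distance from every point to every centroid (N-by-k)
    d2 = (wav_pca(:,1) - centroids(:,1)').^2 + ...
         (wav_pca(:,2) - centroids(:,2)').^2;

    % Assign each point to its nearest centroid
    [~, new_id] = min(d2, [], 2);

    % Stop once no point changes cluster
    if isequal(new_id, cluster_id)
        break
    end
    cluster_id = new_id;

    % Move each centroid to the mean of its assigned points
    for j = 1:k
        members = (cluster_id == j);
        if any(members)                  % leave an empty cluster where it is
            centroids(j,:) = mean(wav_pca(members,:), 1);
        end
    end
end
```

Comparing the new assignments with the previous ones gives the stopping condition; the cap on iterations only guards against an unexpectedly long run.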


Answer 1

Aditya Patil on 17 Feb 2021

0 votes

Note that the parentheses are wrong in the second part of the equation. The square must be taken of the whole y1 - y2 term, not just of y2 (wav_pca(i,2) in your case).

The correct equation would be

sqrt((centroids(j,1) - wav_pca(i,1))^2 + (centroids(j,2) - wav_pca(i,2))^2)
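Separately from the parentheses, the original loop assigns to a scalar `distance` on every pass, so only the last value survives. A sketch of one way to keep every value, assuming `wav_pca` is N-by-2 and `centroids` is k-by-2 (the small arrays below are made-up stand-ins), is to preallocate a matrix and index it with `i` and `j`. It also uses `size(wav_pca,1)` instead of `length`, because `length` returns the largest dimension rather than the number of rows.

```matlab
% Made-up stand-in values for illustration.
wav_pca   = [0 0; 3 4; 6 8];             % N-by-2 points
centroids = [0 0; 3 4];                  % k-by-2 centroids

N = size(wav_pca, 1);                    % number of points (rows)
k = size(centroids, 1);                  % number of centroids
distance = zeros(N, k);                  % preallocate the N-by-k result

for j = 1:k
    for i = 1:N
        % Euclidean distance from point i to centroid j
        distance(i, j) = sqrt((centroids(j,1) - wav_pca(i,1))^2 + ...
                              (centroids(j,2) - wav_pca(i,2))^2);
    end
end
```

Row `i` of `distance` then holds the distances from point `i` to each centroid.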

You can further simplify the code by using vectorization as follows

sqrt((centroids(:,1) - wav_pca(i,1)).^2 + (centroids(:,2) - wav_pca(i,2)).^2)

This calculates the distance from point i to all centroids at once, rather than to one centroid at a time. You can also do it the other way around, computing the distance from all points at once for each centroid.

Further, the sqrt is unnecessary, as you are only interested in the relative distances, not their exact values: the square root is monotonically increasing, so the nearest centroid is the same with or without it.
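Going one step further, both loops can be dropped. The sketch below assumes implicit expansion is available (R2016b or later) and uses made-up stand-in arrays; the name `dist` is an assumption. Subtracting a 1-by-k row of centroid coordinates from an N-by-1 column of point coordinates expands to an N-by-k matrix.

```matlab
% Made-up stand-in values for illustration.
wav_pca   = [0 0; 3 4; 6 8];             % N-by-2 points
centroids = [0 0; 3 4];                  % k-by-2 centroids

% Column of point coordinates (N-by-1) minus row of centroid
% coordinates (1-by-k) expands to an N-by-k matrix of differences.
dx = wav_pca(:,1) - centroids(:,1)';
dy = wav_pca(:,2) - centroids(:,2)';

dist = sqrt(dx.^2 + dy.^2);              % dist(i,j): point i to centroid j
```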

(centroids(:,1) - wav_pca(i,1)).^2 + (centroids(:,2) - wav_pca(i,2)).^2
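To illustrate why dropping the square root is safe, the sketch below assigns each point to its nearest centroid twice, once from squared distances and once from true distances, and the cluster IDs come out the same. The stand-in arrays and the names `d2`, `id_sq`, and `id_sqrt` are assumptions for illustration; implicit expansion (R2016b or later) is assumed.

```matlab
% Made-up stand-in values for illustration.
wav_pca   = [0 0; 3 4; 6 8];             % N-by-2 points
centroids = [0 0; 3 4];                  % k-by-2 centroids

% Squared distances from every point to every centroid (N-by-k)
d2 = (wav_pca(:,1) - centroids(:,1)').^2 + ...
     (wav_pca(:,2) - centroids(:,2)').^2;

% The second output of min along dimension 2 is the column index of the
% smallest entry in each row, i.e. the nearest centroid's cluster ID.
[~, id_sq]   = min(d2, [], 2);           % from squared distances
[~, id_sqrt] = min(sqrt(d2), [], 2);     % from true distances
```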
