How can I reassign clusters based on similarity or any other method?
Hi @Med Future ,
I have modified the code you shared on the forum so that it can reassign clusters based on similarity.
% Define cell1 and cell2
cell1 = [1, 2, 3; 4, 5, 6]; % Example data for cell1
cell2 = [7, 8, 9; 10, 11, 12]; % Example data for cell2
% Normalize the rows of the cells for cosine similarity
cell1_norm = cell1 ./ sqrt(sum(cell1.^2, 2));
cell2_norm = cell2 ./ sqrt(sum(cell2.^2, 2));
% Compute the cosine similarity matrix
similarity_matrix = cell1_norm * cell2_norm';
% Average similarity score
similarity_score = mean(similarity_matrix(:));
% Display the similarity score
fprintf('Average Cosine Similarity Score: %f\n', similarity_score);
% Define the threshold for similarity to reassign clusters
similarity_threshold = 0.9;
if similarity_score > similarity_threshold
% Combine the data from both cells
combinedData = [cell1; cell2];
% Apply K-means clustering
k = 2; % Define the number of clusters 'k'
[idx, C] = kmeans(combinedData, k);
% Calculate centroid distances for cluster reassignment
centroid_distances = pdist(C); % Calculate pairwise distances between centroids
avg_distance = mean(centroid_distances); % Calculate the average centroid distance
% Reassign clusters if centroid distances exceed a certain threshold
centroid_threshold = 5; % Define a threshold for centroid distances
if avg_distance > centroid_threshold
% Calculate the pairwise distances between data points and centroids
distances = pdist2(combinedData, C);
% Find the minimum distance for each data point
[~, min_indices] = min(distances, [], 2);
% Update the cluster assignments in 'idx' based on the minimum distances
idx = min_indices;
end
% Iterate over the clusters and check for different features
unique_clusters = unique(idx); % Get the unique cluster labels
num_clusters = numel(unique_clusters); % Get the number of clusters
for i = 1:num_clusters
cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
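% Check for different features within the cluster
% (kmeans with k = 2 needs at least two points, so skip smaller clusters)
if size(cluster_data, 1) >= 2 && any(range(cluster_data) > 1)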
% Split the cluster into subclusters with similar features
subclusters = kmeans(cluster_data, 2);
% Update the cluster assignments in 'idx' for the subclusters
idx(idx == unique_clusters(i)) = subclusters + max(idx);
end
end
% Merge clusters with similar features
unique_clusters = unique(idx); % Get the updated unique cluster labels
num_clusters = numel(unique_clusters); % Get the updated number of clusters
for i = 1:num_clusters
cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
% Check for similar features with other clusters
for j = i+1:num_clusters
other_cluster_data = combinedData(idx == unique_clusters(j), :); % Get the data points for the other cluster
% Check for similar features using a threshold
if max(pdist2(cluster_data, other_cluster_data)) < 1
% Merge the clusters into a single cluster
idx(idx == unique_clusters(j)) = unique_clusters(i);
end
end
end
% Display the updated clustering results
figure;
gscatter(combinedData(:,1), combinedData(:,2), idx);
title('Modified Clustering Results');
% Save the modified clustering results
save('modified_clustered_data.mat', 'idx', 'combinedData');
else
fprintf('Similarity score is less than %f, not reassigning clusters.\n', similarity_threshold);
end
I will go through the code step by step. First, the code defines two matrices, cell1 and cell2, which hold example data for clustering (despite the names, they are ordinary numeric matrices, not cell arrays). Each matrix represents one of the clusters to be reassigned based on similarity, with one data point per row.
cell1 = [1, 2, 3; 4, 5, 6]; % Example data for cell1
cell2 = [7, 8, 9; 10, 11, 12]; % Example data for cell2
Next, the code scales each row of both matrices to unit Euclidean length. After this step, the dot product of any two rows equals their cosine similarity.
cell1_norm = cell1 ./ sqrt(sum(cell1.^2, 2));
cell2_norm = cell2 ./ sqrt(sum(cell2.^2, 2));
After normalizing, the code computes the cosine similarity matrix between cell1_norm and cell2_norm. Entry (i, j) of this matrix is the cosine similarity between row i of cell1 and row j of cell2.
similarity_matrix = cell1_norm * cell2_norm';
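As a cross-check on this step, the same matrix can be obtained from pdist2 with the 'cosine' distance, which returns one minus the cosine similarity. This is only a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; the names S_manual, S_pdist2 and max_abs_diff are illustrative and not part of the original code.

```matlab
% Example data from the post
cell1 = [1, 2, 3; 4, 5, 6];
cell2 = [7, 8, 9; 10, 11, 12];

% Manual route: scale each row to unit length, then take dot products
cell1_norm = cell1 ./ sqrt(sum(cell1.^2, 2));
cell2_norm = cell2 ./ sqrt(sum(cell2.^2, 2));
S_manual = cell1_norm * cell2_norm';   % entry (i,j) = cosine of the angle between rows

% Toolbox route: the 'cosine' distance is 1 - cosine similarity
S_pdist2 = 1 - pdist2(cell1, cell2, 'cosine');

% The two routes should agree up to floating-point round-off
max_abs_diff = max(abs(S_manual(:) - S_pdist2(:)));
fprintf('Largest difference between the two routes: %g\n', max_abs_diff);
```

For this example data every pairwise similarity is above 0.95, which is why the average clears the 0.9 threshold used later.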
To determine the average similarity score between the clusters, the code calculates the mean of all elements in the similarity matrix.
similarity_score = mean(similarity_matrix(:));
The code then displays the average cosine similarity score.
fprintf('Average Cosine Similarity Score: %f\n', similarity_score);
Next, the code defines a similarity threshold that decides whether the clusters are reassigned.
similarity_threshold = 0.9;
If the similarity score exceeds the threshold, the code combines the data from both matrices and applies K-means clustering with k = 2.
if similarity_score > similarity_threshold
% Combine the data from both cells
combinedData = [cell1; cell2];
% Apply K-means clustering
k = 2; % Define the number of clusters 'k'
[idx, C] = kmeans(combinedData, k);
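Because kmeans starts from random initial centroids, repeated runs can label the clusters differently. The sketch below shows one way to make the step repeatable, assuming a fixed seed and a few replicates are acceptable for your data; the seed value 1 and the replicate count 5 are arbitrary choices, not part of the original code.

```matlab
% Fix the random seed so the clustering is repeatable (seed value is arbitrary)
rng(1);

% Same combined example data as in the post
combinedData = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12];
k = 2;

% Run kmeans several times and keep the best solution
[idx, C] = kmeans(combinedData, k, 'Replicates', 5);

% idx has one label per row of combinedData, C has one centroid per row
disp(idx');
disp(C);
```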
The code then calculates the pairwise distances between the cluster centroids (with k = 2 there is only one such distance). If the average centroid distance exceeds a threshold, each data point is reassigned to its nearest centroid.
centroid_distances = pdist(C); % Calculate pairwise distances between centroids
avg_distance = mean(centroid_distances); % Calculate the average centroid distance
% Reassign clusters if centroid distances exceed a certain threshold
centroid_threshold = 5; % Define a threshold for centroid distances
if avg_distance > centroid_threshold
% Calculate the pairwise distances between data points and centroids
distances = pdist2(combinedData, C);
% Find the minimum distance for each data point
[~, min_indices] = min(distances, [], 2);
% Update the cluster assignments in 'idx' based on the minimum distances
idx = min_indices;
end
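The nearest-centroid rule used in this step can be tried in isolation. The following is a small sketch with hand-picked centroids (hypothetical values chosen for illustration, not output from kmeans), assuming the Statistics and Machine Learning Toolbox for pdist2.

```matlab
% Toy data and fixed centroids (hypothetical values, for illustration only)
X = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12];
C = [2.5, 3.5, 4.5; 8.5, 9.5, 10.5];

% Distance from every point (rows) to every centroid (columns)
distances = pdist2(X, C);

% For each point, the column of the smallest distance is its cluster label
[~, idx] = min(distances, [], 2);

disp(idx');
```

When kmeans has converged, its own labels usually already satisfy this rule, so in practice this step often leaves idx unchanged.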
The code then iterates over the clusters and checks how much the features vary within each one. If any feature of a cluster spans a range greater than 1, the cluster is split into two subclusters with K-means.
unique_clusters = unique(idx); % Get the unique cluster labels
num_clusters = numel(unique_clusters); % Get the number of clusters
for i = 1:num_clusters
cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
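% Check for different features within the cluster
% (kmeans with k = 2 needs at least two points, so skip smaller clusters)
if size(cluster_data, 1) >= 2 && any(range(cluster_data) > 1)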
% Split the cluster into subclusters with similar features
subclusters = kmeans(cluster_data, 2);
% Update the cluster assignments in 'idx' for the subclusters
idx(idx == unique_clusters(i)) = subclusters + max(idx);
end
end
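The label bookkeeping in the split step is easy to get wrong, so here is a small sketch of it on hand-written labels. The sub-labels are made up rather than produced by kmeans, and idx_compact is an illustrative name; compressing the labels afterwards is an optional extra, not something the original code does.

```matlab
% Hypothetical labels: points 1-2 in cluster 1, points 3-4 in cluster 2
idx = [1; 1; 2; 2];

% Suppose cluster 1 was split and its two points received sub-labels 1 and 2
subclusters = [1; 2];

% Offset by the current maximum so the new labels (3 and 4) cannot clash
% with any label already in use
idx(idx == 1) = subclusters + max(idx);

% Optional: compress the labels back to consecutive integers 1..K
[~, ~, idx_compact] = unique(idx);

disp(idx');
disp(idx_compact');
```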
After splitting, the code merges clusters with similar features. It compares every pair of clusters and merges them if the largest distance between any point in one cluster and any point in the other is below 1.
unique_clusters = unique(idx); % Get the updated unique cluster labels
num_clusters = numel(unique_clusters); % Get the updated number of clusters
for i = 1:num_clusters
cluster_data = combinedData(idx == unique_clusters(i), :); % Get the data points for the current cluster
% Check for similar features with other clusters
for j = i+1:num_clusters
other_cluster_data = combinedData(idx == unique_clusters(j), :); % Get the data points for the other cluster
% Check for similar features using a threshold
if max(pdist2(cluster_data, other_cluster_data)) < 1
% Merge the clusters into a single cluster
idx(idx == unique_clusters(j)) = unique_clusters(i);
end
end
end
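The merge test can be checked on two tiny clusters. This is a sketch with made-up points, assuming pdist2 from the Statistics and Machine Learning Toolbox; the names A, B, largest_gap and merge_threshold are illustrative.

```matlab
% Two small hypothetical clusters in 2-D
A = [0, 0; 0.3, 0.4];
B = [0.5, 0; 0, 0.5];

% All pairwise distances between points of A (rows) and points of B (columns)
D = pdist2(A, B);

% Largest distance between any point in A and any point in B
largest_gap = max(D(:));

% Merge only if even the farthest pair is closer than the threshold
merge_threshold = 1;
should_merge = largest_gap < merge_threshold;

fprintf('Largest gap: %.3f, merge: %d\n', largest_gap, should_merge);
```

Testing the farthest pair is a strict criterion (the same idea as complete linkage), so with a threshold of 1 only very tight, nearby clusters are merged.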
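Finally, the code plots the data points grouped by their assigned cluster (using the first two columns of the data) and saves the results to a MAT-file. If the similarity score did not exceed the threshold, it prints a message instead.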
% Display the updated clustering results
figure;
gscatter(combinedData(:,1), combinedData(:,2), idx);
title('Modified Clustering Results');
% Save the modified clustering results
save('modified_clustered_data.mat', 'idx', 'combinedData');
else
fprintf('Similarity score is less than %f, not reassigning clusters.\n', similarity_threshold);
end
In a nutshell, this modified code reassigns clusters based on similarity: it splits clusters whose features differ and merges clusters whose features are similar, using K-means clustering and cosine similarity. Please see the attached plot along with the test results.
Hope this answers your question.