Clustering of data in a 1-D vector

paganelle on 28 Oct 2020
Commented: paganelle on 28 Oct 2020
I have a large logical vector that looks like V = [0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 ..............]
I need to find the position of each group of 1s (say, the center of each group), but if two groups of 1s are too close to each other (say, fewer than 3 zeros in between), I need to treat them as a single group. That is, in the first stage I need to find the groups, merging any that are too close, and then find the center element of each group (a shift of +/-1 element does not matter).
1st stage (clustering): [0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 ..............]
2nd stage (find the center of each cluster): [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ..............]
The way I have implemented it so far is as follows: I smooth the entire vector (it has a couple of million elements). The span is chosen to equal the maximum expected length of a group, and then I look for local maxima (islocalmax) with 'MinSeparation' set to the minimum distance between groups. It works, but it is really slow (I have 360x180 = 64800 vectors; yes, it is a LAT/LONG grid with ~10M elements in each vector).
Is there any way to speed this up? I believe there should be some "textbook" examples of this!

Accepted Answer

Adam Danz on 28 Oct 2020
Edited: Adam Danz on 28 Oct 2020
There are lots of alternatives.
  • Input A is a vector of 1s and 0s.
  • n is the minimum number of 0s that must lie between two groups of 1s for them to be treated as separate groups.
  • T is a table listing the start index, stop index, and length of each group of 1s, where consecutive groups split by fewer than n zeros are joined into one.
A = [0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1];
% Length of each group of consecutive 1s
T = table();
T.OnesLength = diff(find([0;A(:);0]==0))-1;
T(T.OnesLength==0,:) = [];
% Index of 1st '1' in each group of consecutive 1s
T.OnesStart = find(diff([0;A(:)])==1);
% Index of last '1' in each group of consecutive 1s
T.OnesStop = T.OnesStart + T.OnesLength - 1;
% Determine the number of 0s between consecutive 1s
ZerosBetween = [T.OnesStart(2:end) - T.OnesStop(1:end-1); NaN]-1;
disp(T)
    OnesLength    OnesStart    OnesStop
    __________    _________    ________
         3             4           6
         3             9          11
         6            18          23
         2            29          30
         1            32          32
         2            34          35
         1            37          37
         4            42          45
% join groups of consecutive 1s with less than n zeros between.
n = 3;
joinGroups = ZerosBetween < n;
t = find(diff([0;joinGroups])==1);
f = find(diff([0;joinGroups])==-1);
T.remove = false(height(T),1);
for i = 1:numel(t)
    T.OnesStop(t(i)) = T.OnesStop(f(i));
    T.OnesLength(t(i)) = sum(T.OnesLength(t(i):f(i))) + sum(ZerosBetween(t(i):f(i)-1));
    T.remove(t(i)+1:f(i)) = true;
end
T(T.remove,:) = [];
T.remove = [];
disp(T)
    OnesLength    OnesStart    OnesStop
    __________    _________    ________
         8             4          11
         6            18          23
         9            29          37
         4            42          45
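For very long vectors, the same merging could also be sketched without a table or a loop, which may or may not be faster in practice. This is only an illustration, not part of the answer above: the names starts, stops, gaps, keep, mergedStart and mergedStop are made up for the example, and the snippet assumes A contains at least one 1.

```matlab
A = [0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1];
n = 3;                                  % gaps shorter than n zeros are bridged

% Pad with zeros so runs touching either end of A are detected too
d = diff([0; A(:); 0]);
starts = find(d == 1);                  % index of the first 1 in each run
stops  = find(d == -1) - 1;             % index of the last 1 in each run

% Number of zeros between each pair of consecutive runs
gaps = starts(2:end) - stops(1:end-1) - 1;

% A gap of at least n zeros is a real boundary between merged groups
keep = gaps >= n;

% Assumption: A has at least one 1, otherwise this indexing errors
mergedStart = starts([true; keep]);     % first run, plus runs after a real boundary
mergedStop  = stops([keep; true]);      % runs before a real boundary, plus last run
```

For the example vector this should give the same start and stop indices as the merged table above.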
Now you can use the segment length and the start/stop indices to compute the segment centers.
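A minimal sketch of that last step, assuming T holds the merged table printed above (re-entered by hand here so the snippet runs on its own) and that rounding the midpoint is acceptable, since the question says a shift of +/-1 element does not matter. The names centers and C are illustrative.

```matlab
% Merged groups from the example above, re-entered so this snippet is self-contained
T = table([8; 6; 9; 4], [4; 18; 29; 42], [11; 23; 37; 45], ...
    'VariableNames', {'OnesLength', 'OnesStart', 'OnesStop'});

% Center index of each group: midpoint of start and stop, rounded to an integer
centers = round((T.OnesStart + T.OnesStop) / 2);

% Optional: logical vector the same length as A, true only at the centers
C = false(1, 45);       % 45 = numel(A) in the example
C(centers) = true;
```

The vector C corresponds to the "2nd stage" output described in the question.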
1 Comment
paganelle on 28 Oct 2020
Perfect way, thank you!
It is ~5 times faster than the method I used previously.
