Finding the max number of non-zero repeating elements for each row in a large matrix
3 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
I have a (large) matrix of type 'uint8' or 'uint16', for (a small) example
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
One can make the following assumptions on A:
- size(A,2) is relatively small, e.g.

- size(A,1) is big, i.e. a few hundred thousands of rows, possibly even a few millions;
- the rows of A are sorted in an ascending manner;
- the elements of A are bounded above by a constant M that is sufficiently smaller than the maximum the type allows, e.g.
for 'uint8' and
for 'uint16'.
I would like to compute a vector
b = [2;
3;
0;
3];
assigning to each row of A the max number of repetitions of the non-zero elements contained in that row. Now, the way I do this currently is the following vectorized code running on my GPU:
a = permute(unique(A(A~=0)),[3 2 1]); % determine the unique non-zero values of A;
b = max(sum(uint8(bsxfun(@eq,A,a)),2,'native'),[],3); % compare A with a and count each value, then take max.
But if the matrix A is large and has a lot of variance (hence so does a), there appears to be a lot of unnecessary computation. So I was wondering if there is a faster way to do this since I have to do it for a lot of such large matrices.
0 commentaires
Réponse acceptée
Matt J
le 28 Juil 2022
Modifié(e) : Matt J
le 28 Juil 2022
size(A,2) is relatively small,
That being the case, I'd just use a loop:
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
[m,n]=size(A);
count=zeros(m,1);
b=-inf(m,1);
Alast=A(:,1);
for i=1:size(A,2)
Ai=A(:,i);
increm=(Ai==Alast & Alast~=0);
count(increm)=count(increm)+1;
reset=~increm;
count(reset)=(Ai(reset)~=0);
b=max(b,count);
Alast=Ai;
end
b
2 commentaires
Plus de réponses (2)
Bruno Luong
le 29 Juil 2022
Modifié(e) : Bruno Luong
le 29 Juil 2022
This code would not requires too much RAM beside a copy of A transposed.
It somesort of vectorized runlength encoding. A little bit cryptic code I must admit.
The for-loop posted by Matt is the best IMO
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
[m,n] = size(A);
B = [A.'; Inf(1,m)];
[c,r] = find([ones(1,m); diff(B)]);
Aij = B(c+(r-1)*(n+1));
runlength = diff(c).*(diff(r)==0);
runvalue = Aij(1:end-1);
lgtnz = runlength.*(runvalue>0); % does count the length of sequence of 0s
maxlgt = accumarray(r(1:end-1), lgtnz, [m,1], @max)
0 commentaires
Matt J
le 28 Juil 2022
Modifié(e) : Matt J
le 28 Juil 2022
This should be fine for the uint8 case. For the uint16 case, it might strain RAM.
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
N=size(A,1);
M=max(A(:));
[I,J,S]=find(A);
b=max( accumarray([I,S],1,[N,M]) ,[],2)
2 commentaires
Bruno Luong
le 29 Juil 2022
Modifié(e) : Bruno Luong
le 29 Juil 2022
I think because
[I,S]
is converted to the lower class of I and S, which is uint8/uint16. Thus I is truncated to intmax of the class.
you can fix by replace with
[I,double(S)]
Voir également
Catégories
En savoir plus sur Matrix Indexing dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!