Fastest way to find unique cells in a logical cell array

11 vues (au cours des 30 derniers jours)
Haider Ali le 7 Mar 2020
Commenté : Haider Ali le 8 Mar 2020
Hello all,
I have a large array of cells (currently 1x4096 but likely to far bigger than that) with each cell of 6x6 logical values. Each cells contains 6 values that are 1 and rest are 0s. Only the placement of these 1s and 0s is different.
I want to find the fastest way to find unique cells in this array.
Currently I am using a for loop for finding unique cells like this:
for idx2 = 1:length(A)-1
for k = idx2 + 1 : length(A)
if(A{idx2} == A{k})
matching_indx(cnt) = uint16(k);
cnt=cnt+1;
end
end
end
But its really slow and does not scale well to larger cell arrays with larger values in each cell (8x8).
What would be the fastest way to achieve this?
Also what would be the most memory efficient way for these type of values? Would uisng sparse help in saving memory AND computations?
Regards
2 commentairesAfficher AucuneMasquer Aucune
Ameer Hamza le 7 Mar 2020
Can you attach a sample dataset so that we can test a solution and compare the efficiency?
Haider Ali le 7 Mar 2020
@Ameer Hamza, yes sure.

Connectez-vous pour commenter.

Réponse acceptée

Guillaume le 7 Mar 2020
Modifié(e) : Guillaume le 7 Mar 2020
Why are you using a cell array to start with? A 6x6x4096 matrix would be more efficient in term of speed and memory. It also makes finding the histogram of your 6x6 array dead easy:
%converting cell array into N x N x numel(A) matrix
m = cat(3, A{:}); %better variable names than A and m required!
%finding unique NxN matrices and their count:
masrows = reshape(m, [], size(m, 3)).'; %a numel(A) x (NxN) matrix
[uniquem, ~, id] = unique(masrows, 'rows'); %find unique matrices and assign corresponding ID to each row
uniquem = reshape(uniquem.', size(m, 1), size(m, 2), []); %reshape into N x N x numuniquematrices
count = accumarray(id, 1); %one of the many ways to calculate histograms
If you're happy to store the matrices as a count x N x N array instead of a N x N x count array, you would avoid the two transpose above which would give you a speed gain.
Also, in matlab it's better to avoid using integer types unless you really need to.
8 commentairesAfficher 6 commentaires plus anciensMasquer 6 commentaires plus anciens
Ameer Hamza le 7 Mar 2020
Modifié(e) : Ameer Hamza le 7 Mar 2020
@Haider Ali, I am not sure about speed gain. Remember in my code above I artificially increased the size of A
A = [A A A A A A]; % to increase the size of the dataset
If you use A, then the speed gain is close to 2 on my system too.
I could not think of an efficient way in MATLAB to typecast a row of your matrix A into uint64 without slowing down the above codes. I think @Guillaume suggests to change the generation of matrix A such that it saves values as uint64.
Another way to overcome the limited precision of double is to partition your matrix into sub-matrices. As we know that the double will work up to a 7x7 matrix, i.e., its binary representation has 49 digits. Now consider the matrix has a dimension of 8x8, i.e., 64 digits. We can partition it into [1x49 1x15]. So change the lines in my code as follow,
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:64)*2.^(14:-1:0)'];
Similarly for 9x9 matrix
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:81)*2.^(31:-1:0)'];
and for 10x10 matrix
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:98)*2.^(48:-1:0)' masrows(:, 99:100)*2.^(1:-1:0)'];
I hope the pattern is clear. You can easily write a statement to automate the above line depending on the dimension of matrix A.
The gain in speed, for the case of 10x10 matrix, is shown by the following script
% generating a pseudo dataset with 128 unique matrices
rng(0);
A = rand(10, 10, 128) > 0.5;
A = A(:,:,randi(128, 1, 4096));
A = mat2cell(A, 10, 10, ones(1,4096)); % convert to cell cell array to be compatilbe with functions
% artifically augment dataset
A = [A A A A A A];
[unique1,count1] = fun1(A);
[unique2,count2] = fun2(A);
isequal1 = isequal(unique1, unique2)
isequal2 = isequal(count1, count2)
t1 = timeit(@() fun1(A))
t2 = timeit(@() fun2(A))
function [uniquem, count] = fun1(A)
m = cat(3, A{:});
masrows = reshape(m, [], size(m, 3)).';
[uniquem, ~, id] = unique(masrows, 'rows');
uniquem = reshape(uniquem.', size(m, 1), size(m, 2), []);
count = accumarray(id, 1);
end
function [uniquem, count] = fun2(A)
m = cat(3, A{:});
masrows = reshape(m, [], size(m, 3)).';
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:98)*2.^(48:-1:0)' masrows(:, 99:100)*2.^(1:-1:0)'];
[~, ia, id] = unique(binaryVal, 'rows');
uniquem = reshape(masrows(ia,:).', size(m, 1), size(m, 2), []);
count = accumarray(id, 1);
end
The result is
isequal1 =
logical
1
isequal2 =
logical
1
t1 =
0.0864
t2 =
0.0162
About 5.5x speed gain. Note that this method works for arbitrary dimension matrix.
Haider Ali le 8 Mar 2020
@Ameer Hamza, thank you! You have been really helpful.

Connectez-vous pour commenter.

Catégories

En savoir plus sur Sparse Matrices dans Help Center et File Exchange

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by