Fastest way to find unique cells in a logical cell array

Hello all,
I have a large array of cells (currently 1x4096 but likely to far bigger than that) with each cell of 6x6 logical values. Each cells contains 6 values that are 1 and rest are 0s. Only the placement of these 1s and 0s is different.
I want to find the fastest way to find unique cells in this array.
Currently I am using a for loop for finding unique cells like this:
for idx2 = 1:length(A)-1
for k = idx2 + 1 : length(A)
if(A{idx2} == A{k})
matching_indx(cnt) = uint16(k);
cnt=cnt+1;
end
end
end
But its really slow and does not scale well to larger cell arrays with larger values in each cell (8x8).
What would be the fastest way to achieve this?
Also what would be the most memory efficient way for these type of values? Would uisng sparse help in saving memory AND computations?
Regards

2 commentaires

Can you attach a sample dataset so that we can test a solution and compare the efficiency?
@Ameer Hamza, yes sure.

Connectez-vous pour commenter.

 Réponse acceptée

Guillaume
Guillaume le 7 Mar 2020
Modifié(e) : Guillaume le 7 Mar 2020
Why are you using a cell array to start with? A 6x6x4096 matrix would be more efficient in term of speed and memory. It also makes finding the histogram of your 6x6 array dead easy:
%converting cell array into N x N x numel(A) matrix
m = cat(3, A{:}); %better variable names than A and m required!
%finding unique NxN matrices and their count:
masrows = reshape(m, [], size(m, 3)).'; %a numel(A) x (NxN) matrix
[uniquem, ~, id] = unique(masrows, 'rows'); %find unique matrices and assign corresponding ID to each row
uniquem = reshape(uniquem.', size(m, 1), size(m, 2), []); %reshape into N x N x numuniquematrices
count = accumarray(id, 1); %one of the many ways to calculate histograms
If you're happy to store the matrices as a count x N x N array instead of a N x N x count array, you would avoid the two transpose above which would give you a speed gain.
Also, in matlab it's better to avoid using integer types unless you really need to.

8 commentaires

@Guillaume, yes you are right. I should be using 3d arrays as I am creating these 6x6 matrices myself.
I have attached a sample A.mat. I run your code on A.mat and its giving me an error on 4th line "Brace indexing is not supported for variables of this type."
But I think only finding the unique matrices is enough for my purpose.
What other measures can I take for memory and performance optimizations for my use case?
Should I use sparse to further save memory as my data consists of a lot of zeros?
Would using tall arrays help with performace efficieny in case of larger datasets?
Thank you very much for your response.
I fixed all the bugs in my answer a while back. Try it again.
How large are we talking about?
If you store the matrices flattened as rows of a 2D matrix (which is the masrows in my answer), you can indeed store that as sparse, and unique works with sparse, so that's indeed a solution. However, a 6x6x4096 logical matrix is only ~140 kB so there's probably no need. Even a 6x6x30e9 (30 million matrices) is only about 1 GB.
Tall arrays would only be beneficial you store the matrices flattened as well.
Speed can be further improved by passing a compressing the rows of the masrows matrix before passing to the unique function. For example, consider each row as a binary number and calculate its decimal representation. For example, change the lines
[uniquem, ~, id] = unique(masrows, 'rows');
uniquem = reshape(uniquem.', size(m, 1), size(m, 2), []);
to
binaryVal = masrows*2.^(35:-1:0)'; % calculate the decimal representation for each row
[~, ia, id] = unique(binaryVal);
uniquem = reshape(masrows(ia,:).', size(m, 1), size(m, 2), []);
The speed gain is easily visible if you increase the size of the input dataset. Consider the results below.
load('A.mat')
A = [A A A A A A]; % to increase the size of dataset
[unique1,count1] = fun1(A);
[unique2,count2] = fun2(A);
isequal1 = isequal(unique1, unique2)
isequal2 = isequal(count1, count2)
t1 = timeit(@() fun1(A))
t2 = timeit(@() fun2(A))
function [uniquem, count] = fun1(A)
m = cat(3, A{:});
masrows = reshape(m, [], size(m, 3)).';
[uniquem, ~, id] = unique(masrows, 'rows');
uniquem = reshape(uniquem.', size(m, 1), size(m, 2), []);
count = accumarray(id, 1);
end
function [uniquem, count] = fun2(A)
m = cat(3, A{:});
masrows = reshape(m, [], size(m, 3)).';
binaryVal = masrows*2.^(35:-1:0)';
[~, ia, id] = unique(binaryVal);
uniquem = reshape(masrows(ia,:).', size(m, 1), size(m, 2), []);
count = accumarray(id, 1);
end
The output is
isequal1 =
logical
1
isequal2 =
logical
1
t1 =
0.0332
t2 =
0.0093
About 4x speed gain.
However, due to the precision of double data type, this method will work as long as the matrix is of dimension 6x6. It should also work for a 7x7 matrix. For 8x8 matrices, the precision will not be enough to give the correct result.
Good idea about the binary conversion.Also saves on memory.
8x8 would be doable if you store as uint64, the actual conversion to uint64 being the most difficult part. Chunking into bytes (uint8) and typecasting to uint64 would be the easiest.
@Ameer Hamza, great idea! I was also thinking about encoding/representing each row with a single value. But the time gain on my latop is 2x instead of 4x (t1=0.0265, t2=0.0124). I wonder why that is.
Thank you.
@Guillaume, how would this binary conversion solution scale for bigger matrices (8x8, 9x9, 10x10)? Would you please share your solution?
Thanks.
@Haider Ali, I am not sure about speed gain. Remember in my code above I artificially increased the size of A
A = [A A A A A A]; % to increase the size of the dataset
If you use A, then the speed gain is close to 2 on my system too.
I could not think of an efficient way in MATLAB to typecast a row of your matrix A into uint64 without slowing down the above codes. I think @Guillaume suggests to change the generation of matrix A such that it saves values as uint64.
Another way to overcome the limited precision of double is to partition your matrix into sub-matrices. As we know that the double will work up to a 7x7 matrix, i.e., its binary representation has 49 digits. Now consider the matrix has a dimension of 8x8, i.e., 64 digits. We can partition it into [1x49 1x15]. So change the lines in my code as follow,
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:64)*2.^(14:-1:0)'];
Similarly for 9x9 matrix
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:81)*2.^(31:-1:0)'];
and for 10x10 matrix
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:98)*2.^(48:-1:0)' masrows(:, 99:100)*2.^(1:-1:0)'];
I hope the pattern is clear. You can easily write a statement to automate the above line depending on the dimension of matrix A.
The gain in speed, for the case of 10x10 matrix, is shown by the following script
% generating a pseudo dataset with 128 unique matrices
rng(0);
A = rand(10, 10, 128) > 0.5;
A = A(:,:,randi(128, 1, 4096));
A = mat2cell(A, 10, 10, ones(1,4096)); % convert to cell cell array to be compatilbe with functions
% artifically augment dataset
A = [A A A A A A];
[unique1,count1] = fun1(A);
[unique2,count2] = fun2(A);
isequal1 = isequal(unique1, unique2)
isequal2 = isequal(count1, count2)
t1 = timeit(@() fun1(A))
t2 = timeit(@() fun2(A))
function [uniquem, count] = fun1(A)
m = cat(3, A{:});
masrows = reshape(m, [], size(m, 3)).';
[uniquem, ~, id] = unique(masrows, 'rows');
uniquem = reshape(uniquem.', size(m, 1), size(m, 2), []);
count = accumarray(id, 1);
end
function [uniquem, count] = fun2(A)
m = cat(3, A{:});
masrows = reshape(m, [], size(m, 3)).';
binaryVal = [masrows(:, 1:49)*2.^(48:-1:0)' masrows(:, 50:98)*2.^(48:-1:0)' masrows(:, 99:100)*2.^(1:-1:0)'];
[~, ia, id] = unique(binaryVal, 'rows');
uniquem = reshape(masrows(ia,:).', size(m, 1), size(m, 2), []);
count = accumarray(id, 1);
end
The result is
isequal1 =
logical
1
isequal2 =
logical
1
t1 =
0.0864
t2 =
0.0162
About 5.5x speed gain. Note that this method works for arbitrary dimension matrix.
@Ameer Hamza, thank you! You have been really helpful.

Connectez-vous pour commenter.

Plus de réponses (0)

Produits

Version

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by