Speed up a procedure that uses ACCUMARRAY and UNIQUE functions
Afficher commentaires plus anciens
Given is a matrix A, e.g:
A=[1 2; 2 3; 2 5; 3 1; 4 2; 5 1; 5 3; 5 3; 5 5; 6 1; 7 1; 7 1; 7 2]
I would like to get a new matrix B, where the first column is a vector containing the unique elements of the first column of matrix A, and the second column is a vector obtained by adding the values of the second column of matrix A having the same value in the first column of matrix A for a given unique element.
One way to do this is using the following code:
[B1,~,b1_indx] = unique(A(:,1));
B2= accumarray(b1_indx,A(:,2));
B= [B1, B2];
However, this operation is relatively slow when the size of matrix A is rather large. Is there a way to do the same procedure in a faster way, e.g., without using UNIQUE function?
Thanks!
11 commentaires
Stephen23
le 27 Mai 2018
"However, this operation is relatively slow when the size of matrix A is rather large."
Which operation is slow: the unique call or the accumarray call? Can you show us tic / toc or profiler measurements ?
Igor Dakic
le 27 Mai 2018
@Igor: Optimizing code depends on the input data. Please post a meaningful and relevant set of data, either as (compressed) MAT file or created dynamically by rand/randi/... It matters if the data have 1e6 elements and the tool is called 10 times, or if the data size is 10 and the tool is called 1e6 times. If e.g. the first column contains only positive integers between 0 and 255 this could be done very efficiently with a loop in a C-mex.
Igor Dakic
le 27 Mai 2018
Modifié(e) : Igor Dakic
le 27 Mai 2018
Igor Dakic
le 27 Mai 2018
per isakson
le 27 Mai 2018
Modifié(e) : per isakson
le 27 Mai 2018
"go from 0 to more than 255" thus, try
B2 = accumarray( A(:,1)+1, A(:,2), [],[],nan );
Ameer Hamza
le 27 Mai 2018
Thus will not necessarily be helpful since OP question still requires B1 in the final answer.
Jan
le 27 Mai 2018
@Igor Dakic: Could you please post a relevant input instead of a rough description? "0 to more than 255" is not exhaustive enough. What about integer numbers from 0 to 65535? And how large are the inputs? It is not efficient, if the readers guess, what your data are.
Igor Dakic
le 28 Mai 2018
There is no attached matrix. -- Ah, I've found one in the comment below. Please post relevant data in the question, because it is not easy to search them in comments to answers.
Réponses (3)
Walter Roberson
le 27 Mai 2018
0 votes
4 commentaires
Ameer Hamza
le 27 Mai 2018
This solution is slower than the unique and accumarray.
A = randi([0 25], 1000000, 2);
tic
[B1,~,b1_indx] = unique(A(:,1));
B2= accumarray(b1_indx,A(:,2));
B= [B1, B2];
toc
Elapsed time is 0.133922 seconds.
tic
[Agroups, B1_] = findgroups(A(:,1));
B2_ = splitapply(@sum, A(:,2), Agroups);
B_ = [B1_ B2_];
toc
Elapsed time is 0.242804 seconds.
Also the accumarray and unique are almost identical
tic
[B1,~,b1_indx] = unique(A(:,1));
toc
Elapsed time is 0.134523 seconds.
tic
[Agroups, B1_] = findgroups(A(:,1));
toc
Elapsed time is 0.122399 seconds.
But splitapply is way slower than accumarray
tic
B2= accumarray(b1_indx,A(:,2));
toc
Elapsed time is 0.029304 seconds.
tic
B2_ = splitapply(@sum, A(:,2), Agroups);
toc
Elapsed time is 0.151380 seconds.
It appears that the unique and accumarray might be the optimal choices in this case
Jan
le 27 Mai 2018
splitapply calls accumarray internally, such that I assume, it it slower in every case due to the additional overhead.
Guillaume
le 27 Mai 2018
Not only does splitapply calls accumarray but it only uses it to build a cell array of indices to group. The actual applying of the function is done with a traditional loop. splitapply is guaranteed to be much slower than accumarray.
Igor Dakic
le 27 Mai 2018
John D'Errico
le 27 Mai 2018
Modifié(e) : John D'Errico
le 27 Mai 2018
But how much more than 255? You keep being almost evasive in answering questions.
For example, this would work, IF the elements of A are ALWAYS purely positive integers, of a limited size:
A=[1 2; 2 3; 2 5; 3 1; 4 2; 5 1; 5 3; 5 3; 5 5; 6 1; 7 1; 7 1; 7 2]
B2 = accumarray(A(:,1),A(:,2));
B1 = find(B2);
B = [B1,B2(B1)];
The result is the same. But, it requires that ALL elements of the first column of A are positive integers, that none of them are HUGE, that none of the elements of B(:,2) are negative or zero (because then the find may get screwed up. And the find MAY be necessary.)
A = randi([1,100],[1e6,2]);
tic
B2 = accumarray(A(:,1),A(:,2));
B1 = find(B2);
B = [B1,B2(B1)];
toc
Elapsed time is 0.048715 seconds.
tic
[B1,~,b1_indx] = unique(A(:,1));
B2= accumarray(b1_indx,A(:,2));
B= [B1, B2];
toc
Elapsed time is 0.226216 seconds.
So significantly faster. HOWEVER, if the elements of the first column of A were huge, so on the order of 1e10 or so? Then what I have shown you would be completely worthless. Likewise, if some of the elements of B(:,2) are zero or negative? Again, the wrong answer may be the result.
The point is, algorithms are very often dependent on the properties of the data being fed to it. I can actually think of another algorithm that would work efficiently under different assumptions about the characteristics of A. But if we are given no serious clue as to what to expect, then it is impossible to be helpful. We are just making random guesses then.
1 commentaire
Igor Dakic
le 28 Mai 2018
Jan
le 28 Mai 2018
I will C-mex this in the evening. But here the idea of an efficient solution:
Data = load('matrix_A.mat');
A = Data.A;
n = max(A(:, 1));
R = zeros(n, 1);
for k = 1:size(A, 1)
a1 = A(k);
R(a1) = R(a1) + A(k, 2);
end
index = find(R);
B = [index(:), R(index)];
This is 200 times slower than the accumarray approach for the test data provided by John. But in C this might be faster.
4 commentaires
wei zhang
le 18 Fév 2021
How about the speed with c mex? Is it faster than accumarray?
Jan
le 19 Fév 2021
It depends on the inputs. If e.g. the input is sorted, or all values match the limits of UINT8, a lookup table will most likely be faster. For the general case accumarray is flexible and fast enough.
Do you have some specific inputs?
@Jan My inputs are a large COO format sparse matrix. I have a same problem with accumarray based on 2d index. The details are in the link . I wrote a c++ mex, which is similar to your idea to speed up, but failed. I expected you could give me some suggestion.
Besides, what is a lookup table? similar to hash map?
Do you think the cuda/ thrust would be a better attempt?
Jan
le 20 Fév 2021
What failed in your MEX file? Is it a programming problem or is the speed too low?
A lookup table us very efficient, if there is a "small" set of data, which can be used as an index. Example:
% Check is value is member of a set:
Set = uint8(randperm(255, 100));
Data = randi([1, 255], 1, 1e7, 'uint8');
tic;
match1 = ismember(Data, Set);
toc
tic;
LUT = false(1, 255);
LUT(Set) = true;
match2 = LUT(Data);
toc
% Elapsed time is 0.315920 seconds. ISMEMBER
% Elapsed time is 0.061450 seconds. Lookup table
This is can be applied to modify the values also, e.g. if the elements of an RGB image.
RGB = randi([0,255], 4000, 3000, 3); % Large data set
LUT = log((0:255).^3); % Expensive, but small data set
Result = LUT(RGB + 1);
Catégories
En savoir plus sur Matrix Indexing dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!