Find the desired row in the matrix

Question

Chenglin Li on 24 Oct 2022

Direct link to this question:

https://fr.mathworks.com/matlabcentral/answers/1833643-find-the-desired-row-in-the-matrix

Commented: Jan on 25 Oct 2022

matrix.xlsx

Hello! I have a matrix in which the first three columns are the x, y, and z coordinates of the points, the fourth column is the sum of the first three columns, and the fifth column is the product of the first three columns. I want to extract the indices of the rows whose sum and product together occur only once in the matrix. For example, the first row has a sum of 90 and a product of 26040; this pair is unique, so I extract it. Rows 10 and 22 have the same sum but different products, so they are extracted separately. Rows 55 and 56 have the same sum and the same product, so only one row of data needs to be extracted.

Can anyone help me with this as I'm completely new with MATLAB. I would be grateful.

3 comments

Chenglin Li on 24 Oct 2022

Well, thank you for your answer. Thank you very much!

Jan on 24 Oct 2022

@Rik: I also thought of unique or histcounts, but did not find a solution. Please check my answer. I'd be glad to see a less fiddly solution.


Answer 1

Jan on 24 Oct 2022

Direct link to this answer:

https://fr.mathworks.com/matlabcentral/answers/1833643-find-the-desired-row-in-the-matrix#answer_1081953

Edited: Jan on 24 Oct 2022


While removing duplicate rows is easy using unique(x, 'rows'), I did not find a built-in function to identify the rows that occur exactly once.
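To illustrate the difference, here is a minimal sketch. The key matrix K is a made-up sample (the same sum/product pairs used in the test data further down), not the data from matrix.xlsx. unique with 'rows' keeps one copy of every distinct row, so a repeated key is still present once in the output rather than being dropped entirely.

```matlab
% Hypothetical sample keys (columns 4:5 of the test matrix used below)
K = [1,2; 2,3; 1,2; 4,2; 1,5; 1,2];

% One copy per distinct row, in order of first appearance
U = unique(K, 'rows', 'stable');

% U is [1,2; 2,3; 4,2; 1,5]: the key [1,2] occurs three times in K,
% yet it still appears once in U.
disp(U)
```

If keeping one representative of each repeated row is what is wanted, this call alone may already be enough; the code that follows instead removes every row whose key is repeated.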

If the data set is small (a few hundred rows), a nested loop is fine:

% Keep only the rows of M whose columns 4:5 occur exactly once:
% Assuming that M is your matrix:
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
%    ^ just some stuff
off = false;         % Slightly faster
A   = M(:, 4:5);     % Columns used for comparison
nA  = size(A, 1);    % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = all(A(iA, :) == A, 2);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end
Result = M(T, :)     % Only the rows that occur exactly once
Result = 3×5
    0.8932    0.9660    0.0837    2.0000    3.0000
    0.0193    0.2833    0.5621    4.0000    2.0000
    0.6671    0.1563    0.3340    1.0000    5.0000

Testing the columns separately is most likely slightly faster (verify this with tic/toc):

A4  = M(:, 4);       % Columns used for comparison
A5  = M(:, 5);       % Columns used for comparison
nA  = size(A4, 1);   % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = (A4(iA) == A4 & A5(iA) == A5);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end

The cost of these nested loops grows as O(n^2), so doubling the size of the input takes about 4 times longer to process. This gets very slow for huge data sets, e.g. with millions of rows. In that case, sort the rows first and compare neighbors:

% Keep only the rows of M whose columns 4:5 occur exactly once:
[A, idx]    = sortrows(M(:, 4:5));
nextEq      = [true; any(diff(A, 1, 1), 2)];  % TRUE where a sorted row differs from the previous one
ini         = strfind(nextEq.', [true, false]);
nextEq(ini) = false;                 % Also mark the 1st occurrence of each repeated row
T           = false(size(A, 1), 1);  % Pre-allocation, TRUE or FALSE doesn't matter
T(idx)      = nextEq;                % Original order
Result      = M(T, :)
Result = 3×5
    0.8932    0.9660    0.0837    2.0000    3.0000
    0.0193    0.2833    0.5621    4.0000    2.0000
    0.6671    0.1563    0.3340    1.0000    5.0000
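To see why the sorting approach works, here is a hedged trace of the intermediate values for the sample keys in columns 4:5 above. The names K, isNew and first are chosen for this illustration only, and the values in the comments were worked out by hand.

```matlab
K = [1,2; 2,3; 1,2; 4,2; 1,5; 1,2];   % columns 4:5 of the sample M

[A, idx] = sortrows(K);
% A   = [1,2; 1,2; 1,2; 1,5; 2,3; 4,2]
% idx = [1; 3; 6; 5; 2; 4]

% TRUE where a sorted row differs from its predecessor (start of a group)
isNew = [true; any(diff(A, 1, 1), 2)];
% isNew = [1; 0; 0; 1; 1; 1]

% A group start followed by a non-start means the group has 2+ members
first = strfind(isNew.', [true, false]);
% first = 1
isNew(first) = false;
% isNew = [0; 0; 0; 1; 1; 1]: only groups of size 1 remain TRUE

T      = false(size(K, 1), 1);
T(idx) = isNew;                        % back to the original row order
% T = [0; 1; 0; 1; 1; 0]
disp(K(T, :))
```

The sort dominates the cost, so this scales roughly as O(n log n) instead of O(n^2).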

1 comment

Chenglin Li on 25 Oct 2022

Thank you very much, this program has helped me a lot and has given me an idea for the next step!


Answer 2

Rik on 25 Oct 2022

Direct link to this answer:

https://fr.mathworks.com/matlabcentral/answers/1833643-find-the-desired-row-in-the-matrix#answer_1082688


Inspired by the answer and comment by Jan, I gave it a try as well. However, at least for this size, the answers from Jan are faster. Perhaps the functions I use would scale better, but I did not test that.

Perhaps accumarray would have a better performance than histcounts. If this is really a bottleneck in your code, you could try that.
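A minimal sketch of that accumarray idea, under the assumption that only the counting step is swapped and the rest of the approach stays the same. It has not been timed here, so whether it is actually faster than histcounts remains to be tested.

```matlab
% Same kind of sample data as in the test below
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
A = M(:, 4:5);

% indA: first row of each distinct key; indB: group number of every row
[~, indA, indB] = unique(A, 'rows', 'stable');

% counts(k) = number of rows that belong to group k
counts = accumarray(indB, 1);

% Keep only the groups with a single member
Result = M(indA(counts == 1), :);
disp(Result)
```

For this sample, counts is [3; 1; 1; 1], so the rows with keys [2,3], [4,2] and [1,5] are kept.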

% Assuming that M is your matrix:
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
%    ^ just some stuff
% Confirm the results match:
Jan_v1(M) , Rik(M)
ans = 3×5
    0.7516    0.4447    0.5350    2.0000    3.0000
    0.0193    0.3578    0.2807    4.0000    2.0000
    0.1917    0.4433    0.3679    1.0000    5.0000
ans = 3×5
    0.7516    0.4447    0.5350    2.0000    3.0000
    0.0193    0.3578    0.2807    4.0000    2.0000
    0.1917    0.4433    0.3679    1.0000    5.0000
% do warmup rounds first (only needed online), then test the timing for
% each implementation
for n=1:3,timeit(@()Jan_v1(M));timeit(@()Jan_v2(M));timeit(@()Jan_v3(M));timeit(@()Rik(M));end
timeit(@()Jan_v1(M)),timeit(@()Jan_v2(M)),timeit(@()Jan_v3(M)),timeit(@()Rik(M))
ans = 1.5764e-05
ans = 1.1022e-05
ans = 1.9246e-05
ans = 6.7721e-05
function output=Rik(M)
% Return the rows of the matrix for which the entries in the 4th and 5th column are unique.
% First create a temporary matrix that only contains the relevant columns.
A = M(:, 4:5);
% indA contains indices to A to create the unique list
% indB contains indices to the unique list to get back to A
% We need to use 'stable' to avoid sorting.
[~,indA,indB] = unique(A,'rows','stable');
% Count how often every index occurs
counts = histcounts(indB,0.5:(0.5+max(indB))); % bin edges 0.5, 1.5, ..., max(indB)+0.5 (0.5 to 4.5 for the example M)
RowsWithOneOccurrence = indA(counts==1);
output = M(RowsWithOneOccurrence,:);
end
function Result=Jan_v1(M)
off = false;         % Slightly faster
A   = M(:, 4:5);     % Columns used for comparison
nA  = size(A, 1);    % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = all(A(iA, :) == A, 2);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end
Result = M(T, :);    % Only the rows that occur exactly once
end
function Result=Jan_v2(M)
off = false;         % Slightly faster
A4  = M(:, 4);       % Columns used for comparison
A5  = M(:, 5);       % Columns used for comparison
nA  = size(A4, 1);   % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = (A4(iA) == A4 & A5(iA) == A5);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end
Result = M(T, :);     % Only the rows that occur exactly once
end
function Result=Jan_v3(M)
% Keep only the rows of M whose columns 4:5 occur exactly once:
[A, idx]    = sortrows(M(:, 4:5));
nextEq      = [true; any(diff(A, 1, 1), 2)];
ini         = strfind(nextEq.', [true, false]);
nextEq(ini) = false;                 % Also mark the 1st occurrence of each repeated row
T           = false(size(A, 1), 1);  % Pre-allocation, TRUE or FALSE doesn't matter
T(idx)      = nextEq;                % Original order
Result      = M(T, :);
end

2 comments

Chenglin Li on 25 Oct 2022

Thank you. I'll try again. Thank you very much indeed

Jan on 25 Oct 2022


Thanks, @Rik, for this comparison. While my loop versions have some speed advantage for tiny inputs, they are far too slow for large data. With

n  = 1e6;
M1 = [rand(n, 3), randi([0, 1000], n, 2)];  % Few repeated values
M2 = [rand(n, 3), randi([0, 10], n, 2)];    % Many repeated values
for k=1:1, timeit(@()Jan_v3(M1));timeit(@()Rik(M1));end  % Warm-up round
timeit(@()Jan_v3(M1))
timeit(@()Rik(M1))
timeit(@()Jan_v3(M2))
timeit(@()Rik(M2))

Sorry, I hesitate to post the timings, because they vary from run to run by 25%! The difference between the two functions is smaller than this run-to-run deviation. My conclusion: both have almost the same high speed.

function output=Rik(M)
A = M(:, 4:5);
[~,indA,indB] = unique(A,'rows','stable');
counts = histcounts(indB,0.5:(0.5+max(indB))); % bin edges 0.5, 1.5, ..., max(indB)+0.5
RowsWithOneOccurrence = indA(counts==1);
output = M(RowsWithOneOccurrence,:);
end
function Result=Jan_v3(M)
[A, idx]    = sortrows(M(:, 4:5));
nextEq      = [true; diff(A(:, 1)) | diff(A(:, 2))];
% nextEq      = [true; any(diff(A, 1, 1), 2)];
ini         = strfind(nextEq.', [true, false]);
nextEq(ini) = false;                 % Also mark the 1st occurrence of each repeated row
T           = false(size(A, 1), 1);  % Pre-allocation, TRUE or FALSE doesn't matter
T(idx)      = nextEq;                % Original order
Result      = M(T, :);
end
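One way to cope with the run-to-run variation mentioned above is to repeat the measurement and look at the median and the spread rather than at a single number. The following is only a sketch: the workload (sortrows on random keys), the input size and the repetition count nRep are arbitrary choices for illustration, not the benchmark used in this thread.

```matlab
% Hypothetical workload: sort 1e5 random key pairs
K = randi([0, 10], 1e5, 2);
f = @() sortrows(K);

nRep = 5;                     % number of repeated measurements (arbitrary)
t    = zeros(nRep, 1);
for k = 1:nRep
    t(k) = timeit(f);         % timeit itself already runs f several times
end

% Report the median and the relative spread between the runs
spread = 100 * (max(t) - min(t)) / median(t);
fprintf('median: %.3g s, spread: %.0f%%\n', median(t), spread);
```

If the spread is larger than the difference between two candidate functions, as reported above, the measurement cannot tell them apart.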
