Comparing and removing rows of an array that are within 5% of each other

Michael Costa on 26 Aug 2021
I have an array that is roughly 30 million x 14, sorted in ascending order by the first element of each row. I am trying to compare each row to the previous row and remove it if all 14 values are within 5% of the previous row's 14 values. The idea is that a row within 5% of the previous row can be treated as a duplicate, and I don't want duplicates in my final data set. Since the array is large, I would prefer to use logical indexing if possible, but I am also willing to use a for loop if necessary.

Answers (1)

Image Analyst on 26 Aug 2021
Try this:
data = 10 + rand(6, 4) % Sample data
columns = size(data, 2);
% Find the relative difference between each element and the one above it,
% measured against the previous row. The first row has no previous row, so pad it with Inf.
percentDifferences = [inf(1, columns); abs(diff(data, 1, 1) ./ data(1:end-1, :))]
% Find out which rows have all differences within 5% of the previous row.
rowsToDelete = all(percentDifferences <= 0.05, 2)
% Do the deletions.
data(rowsToDelete, :) = []
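
One thing to be aware of: the code above compares each row with its immediate neighbor in the original array, so a run of rows that each drift by less than 5% from the one before is removed entirely, apart from the first. If you would rather compare each row with the last row that was actually kept, I think a loop is needed. Here is a minimal sketch of that variant. The sample values and the names tol, ref, keep, and filtered are made up for illustration, and it assumes data has at least one row.

```matlab
data = [100 200; 103 204; 106 208; 120 240];  % Small made-up example
tol  = 0.05;                                   % 5% tolerance

keep = false(size(data, 1), 1);  % Preallocated logical mask of rows to keep
keep(1) = true;                  % The first row has nothing to be compared with
ref = data(1, :);                % Last row that was kept
for k = 2:size(data, 1)
    % Keep the row if any column differs from the reference by more than tol.
    if any(abs(data(k, :) - ref) > tol * abs(ref))
        keep(k) = true;
        ref = data(k, :);
    end
end
filtered = data(keep, :)         % Rows 1, 3, and 4 survive for this sample
```

For this sample, the neighbor-to-neighbor comparison would delete rows 2 and 3, while this loop deletes only row 2, because row 3 differs from row 1 by 6% in the first column. The loop only touches one row at a time and builds a logical mask, so it avoids creating extra full-size temporary arrays, which may matter at 30 million rows.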

Version

R2019b
