Identify duplicate values in an array and replace with NaN

Poulomi Ganguli on 7 Sep 2023
Hello: I have an array of dimension 40,000x3, shown below. The third column often contains duplicate values. I need to identify the duplicate values and replace them with NaN. Kindly help.
2.39300000000000 0 6.16800000000000
2.38720000000000 0 6.16800000000000
2.38480000000000 0 6.16800000000000
2.37380000000000 0 6.16800000000000
2.37410000000000 0 6.16800000000000
2.37020000000000 0 6.16800000000000
2.36880000000000 0 6.16800000000000
2.36350000000000 0 6.16800000000000

Accepted Answer

Mrutyunjaya Hiremath on 7 Sep 2023
If you want to replace only the duplicates with 'NaN' and keep one occurrence of each value intact, here is the code:
% Sample array (Replace this with your 40,000x3 array)
array = [
2.3930, 0, 6.1680;
2.3872, 0, 6.1680;
2.3848, 0, 6.1780;
2.3738, 0, 6.1680;
2.3741, 0, 6.1690;
2.3702, 0, 6.1780;
2.3688, 0, 6.1690;
2.3635, 0, 6.1780;
];
% Extract the third column
third_col = array(:, 3);
% Find unique values and, for each row, the index of its value in that list
[unique_vals, ~, ic] = unique(third_col);
% Count the occurrence of each unique value
counts = accumarray(ic, 1);
% Identify values that occur more than once (duplicates)
duplicate_vals = unique_vals(counts > 1);
% Replace only duplicates with NaN, keep one occurrence of each value
for val = duplicate_vals'
    idx = find(third_col == val);
    third_col(idx(2:end)) = NaN; % Keep the first occurrence, replace the rest with NaN
end
% Update the third column in the original array
array(:, 3) = third_col;
% Display the updated array
disp(array);
    2.3930         0    6.1680
    2.3872         0       NaN
    2.3848         0    6.1780
    2.3738         0       NaN
    2.3741         0    6.1690
    2.3702         0       NaN
    2.3688         0       NaN
    2.3635         0       NaN
For each duplicated value, the loop finds its indices in the third column using find, keeps the first occurrence (idx(1)), and replaces the rest (idx(2:end)) with NaN. This leaves one instance of each value in the third column and replaces only the repeats with NaN.
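The answer above keeps one occurrence of each value. If the intent is instead to blank out every occurrence of a value that appears more than once, the same unique/accumarray counts can be mapped back onto the rows. A minimal sketch, assuming that reading of the question; the sample data and the name is_repeated are illustrative and not from the original post:

```matlab
% Hypothetical sample data: 6.2000 appears once, the other values repeat
array = [
    2.3930, 0, 6.1680;
    2.3872, 0, 6.1680;
    2.3848, 0, 6.1780;
    2.3738, 0, 6.2000;
    2.3741, 0, 6.1780;
];
third_col = array(:, 3);
% ic maps each row to the index of its value in the sorted unique list
[~, ~, ic] = unique(third_col);
% Count how many times each unique value occurs
counts = accumarray(ic, 1);
% True for every row whose value occurs more than once
is_repeated = counts(ic) > 1;
% Replace all occurrences of repeated values with NaN
third_col(is_repeated) = NaN;
array(:, 3) = third_col;
disp(array);
```

With this sample, only the fourth row keeps its third-column value. Like the loop version, this relies on exact equality, so values that differ only by floating-point round-off are treated as distinct.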

More Answers (1)

Dyuman Joshi on 7 Sep 2023
Edited: Dyuman Joshi on 7 Sep 2023
Here's a much faster and simpler approach:
array = [2.3930, 0, 6.1680;
2.3872, 0, 6.1680;
2.3848, 0, 6.1780;
2.3738, 0, 6.1680;
2.3741, 0, 6.1690;
2.3702, 0, 6.1780;
2.3688, 0, 6.1690;
2.3635, 0, 6.1780];
%Get the unique values and the indices corresponding to their 1st occurrence,
%in the order they appear in the array
[val,first_idx] = unique(array(:,3),'stable')
val = 3×1
    6.1680
    6.1780
    6.1690
first_idx = 3×1
     1
     3
     5
%Convert all the values of column 3 to NaN
array(:,3) = NaN;
%Re-assign the values according to the indices
array(first_idx,3) = val
array = 8×3
    2.3930         0    6.1680
    2.3872         0       NaN
    2.3848         0    6.1780
    2.3738         0       NaN
    2.3741         0    6.1690
    2.3702         0       NaN
    2.3688         0       NaN
    2.3635         0       NaN
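A variant of the same idea builds a logical mask of the rows to blank instead of overwriting the whole column first, which also makes it easy to count or inspect the duplicates. This is a sketch and not part of the original answer; is_dup and num_duplicates are illustrative names:

```matlab
array = [2.3930, 0, 6.1680;
         2.3872, 0, 6.1680;
         2.3848, 0, 6.1780;
         2.3738, 0, 6.1680;
         2.3741, 0, 6.1690;
         2.3702, 0, 6.1780;
         2.3688, 0, 6.1690;
         2.3635, 0, 6.1780];
col = array(:, 3);
% Indices of the first occurrence of each value, in order of appearance
[~, first_idx] = unique(col, 'stable');
% Flag every row as a duplicate, then clear the flag on first occurrences
is_dup = true(size(col));
is_dup(first_idx) = false;
% Replace only the flagged rows with NaN
array(is_dup, 3) = NaN;
% Number of entries that were replaced
num_duplicates = nnz(is_dup);
```

For the sample data this flags rows 2, 4, 6, 7 and 8, matching the output shown above, and the mask can be reused to apply the same replacement to other columns if needed.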
