Finding Duplicate string values in two cell array 22124x1
Afficher commentaires plus anciens
I have a cell 22124x1 and it contain duplicate Values, I want to know how many times these values duplicate and their index
first cell contain these values Datacell=
'221853_s_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'222031_at'
'222031_at'
'31637_s_at'
'37796_at'
'38340_at'
'39854_r_at'
'53202_at'
'53202_at'
'60528_at'
'60528_at'
'90610_at'
'90610_at'
symbol cell:
'OR1D4 '
' OR1D5'
' HLA-DRB4 '
' HLA-DRB5 '
' LOC100133661 '
' LOC100294036'
'UTP14A '
' UTP14C'
'GTF2H2 '
'ZNF324B '
' LOC644504'
'JMJD7 '
'ZNF324B '
' JMJD7-PLA2G4B'
'OR2A20P '
' OR2A5 '
' OR2A9P'
'ZNF324B '
' ZNF584'
'WHAMM '
' WHAMML1 '
'LOC100290658 '
' WHAMML2'
'NR1D1 '
' THRA'
'C7orf25 '
' PRR5 '
' PRR5-ARHGAP8'
'LOC100290658 '
'C7orf25 '
' SAP25'
'HIP1R '
' LOC100294412'
Any help will be highly appreciated
1 commentaire
Chuck Olosky
le 3 Oct 2017
Added (2) additional lines to get names and indices:
function [dupNames, dupNdxs] = getDuplicates(aList) % find duplicate entries in the list of names
[uniqueList,~,uniqueNdx] = unique(aList);
N = histc(uniqueNdx,1:numel(uniqueList));
dupNames = uniqueList(N>1);
dupNdxs = arrayfun(@(x) find(uniqueNdx==x), find(N>1), ...
'UniformOutput',false);
end
Réponses (1)
Jos (10584)
le 26 Jan 2016
Let C be your cell array of strings, then
[UniqueC,~,k] = unique(C)
N = histc(k,1:numel(UniqueC))
will give you the unique elements in UniqueC and their frequency in N
2 commentaires
Ansam Al-Sabti
le 26 Jan 2016
Sulaymon Eshkabilov
le 4 Juil 2021
The code given by Chuck Olosky gives the duplicate string names and indexes:
...
dupNames = uniqueList(N>1); % Names
dupNdxs = arrayfun(@(x) find(uniqueNdx==x), find(N>1),'UniformOutput',false); % Indexes
Catégories
En savoir plus sur Large Files and Big Data dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!