Finding duplicate string values in two 22124x1 cell arrays

I have a 22124x1 cell array that contains duplicate values. I want to know how many times each value is duplicated and at which indices.
The first cell array, Datacell, contains these values:
'221853_s_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'222031_at'
'222031_at'
'31637_s_at'
'37796_at'
'38340_at'
'39854_r_at'
'53202_at'
'53202_at'
'60528_at'
'60528_at'
'90610_at'
'90610_at'
The second cell array, symbol, contains:
'OR1D4 '
' OR1D5'
' HLA-DRB4 '
' HLA-DRB5 '
' LOC100133661 '
' LOC100294036'
'UTP14A '
' UTP14C'
'GTF2H2 '
'ZNF324B '
' LOC644504'
'JMJD7 '
'ZNF324B '
' JMJD7-PLA2G4B'
'OR2A20P '
' OR2A5 '
' OR2A9P'
'ZNF324B '
' ZNF584'
'WHAMM '
' WHAMML1 '
'LOC100290658 '
' WHAMML2'
'NR1D1 '
' THRA'
'C7orf25 '
' PRR5 '
' PRR5-ARHGAP8'
'LOC100290658 '
'C7orf25 '
' SAP25'
'HIP1R '
' LOC100294412'
Any help will be highly appreciated.

1 comment

Added two lines to get the names and indices:
function [dupNames, dupNdxs] = getDuplicates(aList)
% Find duplicate entries in the list of names.
    [uniqueList,~,uniqueNdx] = unique(aList);
    N = histc(uniqueNdx,1:numel(uniqueList));   % count of each unique entry
    dupNames = uniqueList(N>1);                 % names that occur more than once
    dupNdxs = arrayfun(@(x) find(uniqueNdx==x), find(N>1), ...
        'UniformOutput',false);                 % positions of each duplicate in aList
end
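
To see what the function above returns, here is a minimal sketch that runs the same steps inline on the first eight rows of Datacell from the question. The variable dupCounts and the printing loop are additions made for illustration; they are not part of the original comment.

```matlab
% Sample: the first eight rows of Datacell from the question.
Datacell = {'221853_s_at'; '221971_x_at'; '221971_x_at'; '221971_x_at'; ...
            '221971_x_at'; '222031_at'; '222031_at'; '31637_s_at'};

% Same steps as getDuplicates, written inline so the script runs on its own.
[uniqueList, ~, uniqueNdx] = unique(Datacell);
N = histc(uniqueNdx, 1:numel(uniqueList));   % occurrences of each unique string
dupNames  = uniqueList(N > 1);               % strings that occur more than once
dupCounts = N(N > 1);                        % illustrative addition: how often each occurs
dupNdxs   = arrayfun(@(x) find(uniqueNdx == x), find(N > 1), ...
                     'UniformOutput', false);  % row numbers of each duplicate

% Print one line per duplicated string.
for i = 1:numel(dupNames)
    fprintf('%s occurs %d times at rows %s\n', ...
            dupNames{i}, dupCounts(i), mat2str(dupNdxs{i}(:)'));
end
```

Each cell of dupNdxs holds the row numbers in the original list for the matching entry of dupNames, so the two outputs can be read side by side.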


Answers (1)

Let C be your cell array of strings. Then
[UniqueC,~,k] = unique(C);
N = histc(k,1:numel(UniqueC));
gives you the unique elements in UniqueC and the number of times each one occurs in N.

2 comments

Thanks, but unfortunately it does not give me their indices.
The code given by Chuck Olosky gives the duplicate string names and indexes:
...
dupNames = uniqueList(N>1); % Names
dupNdxs = arrayfun(@(x) find(uniqueNdx==x), find(N>1),'UniformOutput',false); % Indexes
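
As an alternative to the histc/arrayfun combination, the counts and the row numbers can both be collected with accumarray. This is only a sketch of a swapped-in technique, not the method posted in this thread. The sample list C reuses probe IDs from the question in an arbitrary order, and the names rows, isDup, dupCounts and dupRows are made up for the example.

```matlab
% Sample: probe IDs from the question, deliberately unsorted.
C = {'53202_at'; '60528_at'; '53202_at'; '90610_at'; '60528_at'; '38340_at'};

[UniqueC, ~, k] = unique(C);   % k maps each row of C to its entry in UniqueC
k = k(:);                      % force a column, as accumarray expects

N = accumarray(k, 1);          % occurrences of each unique string

% Collect the row numbers of every unique string in one pass.
% sort() is applied because accumarray does not guarantee the order
% in which values reach the function when k is unsorted.
rows = accumarray(k, (1:numel(k))', [], @(v) {sort(v)});

isDup     = N > 1;             % strings that occur more than once
dupNames  = UniqueC(isDup);
dupCounts = N(isDup);
dupRows   = rows(isDup);       % one cell of row numbers per duplicated string
```

Because rows is built for every unique string, the positions of non-duplicated entries are also available if they are needed later.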
