Using "unique" to identify unique values AND number of occurrences of each unique value


Below are the first entries of a table:
head(hits)
ID res1 score
_____________ ____ _______
AGAP001076-RD 282 0.67229
AGAP001076-RD 285 0.75292
AGAP001076-RD 286 0.66957
AGAP001076-RD 296 0.51694
AGAP001076-RD 298 0.51655
AGAP001076-RD 310 0.54564
AGAP001076-RD 314 0.74495
AGAP010077-RA 349 0.52136
Using "unique" I can obtain the unique IDs. I would also like to obtain the number of occurrences of each unique ID, e.g. AGAP001076-RD 7 for the rows shown above.
Thank you for your attention.

Accepted Answer

Steven Lord on 19 Sep 2024
Use the groupcounts function.
A = {'AGAP001076-RD' 282 0.67229
'AGAP001076-RD' 285 0.75292
'AGAP001076-RD' 286 0.66957
'AGAP001076-RD' 296 0.51694
'AGAP001076-RD' 298 0.51655
'AGAP001076-RD' 310 0.54564
'AGAP001076-RD' 314 0.74495
'AGAP010077-RA' 349 0.52136};
[counts, groupID] = groupcounts(A(:, 1))
counts = 2×1
     7
     1
groupID = 2x1 cell array
    {'AGAP001076-RD'}
    {'AGAP010077-RA'}
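If you want each ID next to its count, the two outputs can be combined into a small table. This is only a sketch: ids repeats the first column of A from above, and summary, ID and Count are names made up for this example.

```matlab
% First column of A from the answer above, as a cell array of char vectors
ids = {'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; ...
       'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP010077-RA'};

% groupcounts returns the counts first, then the group identifiers
[counts, groupID] = groupcounts(ids);

% Pair each ID with its count (summary, ID and Count are hypothetical names)
summary = table(groupID, counts, 'VariableNames', {'ID', 'Count'});
disp(summary)
```

Note the output order: the counts come first and the group identifiers second.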
  3 Comments
Paul on 19 Sep 2024
Check the linked doc page for groupcounts to see how to call it with a table input.
Steven Lord on 19 Sep 2024
A = {'AGAP001076-RD' 282 0.67229
'AGAP001076-RD' 285 0.75292
'AGAP001076-RD' 286 0.66957
'AGAP001076-RD' 296 0.51694
'AGAP001076-RD' 298 0.51655
'AGAP001076-RD' 310 0.54564
'AGAP001076-RD' 314 0.74495
'AGAP010077-RA' 349 0.52136};
T = cell2table(A)
T = 8x3 table
           A1            A2       A3
    _________________    ___    _______
    {'AGAP001076-RD'}    282    0.67229
    {'AGAP001076-RD'}    285    0.75292
    {'AGAP001076-RD'}    286    0.66957
    {'AGAP001076-RD'}    296    0.51694
    {'AGAP001076-RD'}    298    0.51655
    {'AGAP001076-RD'}    310    0.54564
    {'AGAP001076-RD'}    314    0.74495
    {'AGAP010077-RA'}    349    0.52136
If your data is in a table array like the one I created above, you just have to tell groupcounts which variable(s) in the table is/are the grouping variable(s).
countsAndID = groupcounts(T, 'A1')
countsAndID = 2x3 table
           A1            GroupCount    Percent
    _________________    __________    _______
    {'AGAP001076-RD'}         7         87.5
    {'AGAP010077-RA'}         1         12.5
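For a table with named variables, like the hits table in the question, the same call might look like the sketch below. The rows are rebuilt from the head(hits) display only, so the real table presumably has more rows; idCounts is a name made up for this example.

```matlab
% Rebuild the rows shown by head(hits); the variable names ID, res1 and
% score are taken from that display
ID = {'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; ...
      'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP010077-RA'};
res1 = [282; 285; 286; 296; 298; 310; 314; 349];
score = [0.67229; 0.75292; 0.66957; 0.51694; 0.51655; 0.54564; 0.74495; 0.52136];
hits = table(ID, res1, score);

% Count the rows for each distinct ID
idCounts = groupcounts(hits, 'ID');

% Keep only the ID and its count, dropping the Percent variable
idCounts = idCounts(:, {'ID', 'GroupCount'});
disp(idCounts)
```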
You can use multiple grouping variables as well. Let's make some data with duplicate rows and replace the values in A2 with ones more likely to cause a collision in the combination of the grouping variables A1 and A2.
T2 = T(randi(height(T), 20, 1), :);
T2.A2 = randi(5, 20, 1)
T2 = 20x3 table
           A1            A2      A3
    _________________    __    _______
    {'AGAP001076-RD'}    2     0.51655
    {'AGAP001076-RD'}    4     0.74495
    {'AGAP001076-RD'}    1     0.75292
    {'AGAP001076-RD'}    4     0.51655
    {'AGAP001076-RD'}    5     0.54564
    {'AGAP001076-RD'}    5     0.66957
    {'AGAP001076-RD'}    5     0.51694
    {'AGAP010077-RA'}    2     0.52136
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    3     0.75292
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    4     0.74495
    {'AGAP001076-RD'}    4     0.51655
    {'AGAP001076-RD'}    4     0.51694
    {'AGAP001076-RD'}    4     0.51694
    {'AGAP001076-RD'}    2     0.67229
(display truncated: only 16 of the 20 rows are shown)
countsAndID = groupcounts(T2, ["A1", "A2"])
countsAndID = 6x4 table
           A1            A2    GroupCount    Percent
    _________________    __    __________    _______
    {'AGAP001076-RD'}    1         4           20
    {'AGAP001076-RD'}    2         2           10
    {'AGAP001076-RD'}    3         1            5
    {'AGAP001076-RD'}    4         8           40
    {'AGAP001076-RD'}    5         4           20
    {'AGAP010077-RA'}    2         1            5
Let's check. How many rows of T2 have the same A1 and A2 values as the first row of the countsAndID table?
matchesForFirstRowA1 = matches(T2.A1, countsAndID{1, "A1"});
matchesForFirstRowA2 = T2.A2 == countsAndID{1, "A2"};
result = T2(matchesForFirstRowA1 & matchesForFirstRowA2, :)
result = 4x3 table
           A1            A2      A3
    _________________    __    _______
    {'AGAP001076-RD'}    1     0.75292
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    1     0.51655
Does that match the count that groupcounts returned in that first row of countsAndID?
isequal(height(result), countsAndID{1, "GroupCount"})
ans = logical
1
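The same check can be repeated for every group rather than just the first row. The sketch below uses a small made-up table instead of the random T2 so that it gives the same result on every run; manualCounts, allMatch and totalMatches are names chosen for this example.

```matlab
% Small illustrative table with two grouping variables (values are made up)
A1 = {'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP010077-RA'; ...
      'AGAP001076-RD'; 'AGAP010077-RA'; 'AGAP010077-RA'};
A2 = [1; 1; 2; 3; 2; 1];
T2 = table(A1, A2);

countsAndID = groupcounts(T2, ["A1", "A2"]);

% For each group row, count the matching rows of T2 by hand
manualCounts = zeros(height(countsAndID), 1);
for k = 1:height(countsAndID)
    sameA1 = matches(T2.A1, countsAndID{k, "A1"});
    sameA2 = T2.A2 == countsAndID{k, "A2"};
    manualCounts(k) = nnz(sameA1 & sameA2);
end

% Every hand count should agree with GroupCount, and the counts
% should add up to the number of rows in T2
allMatch = isequal(manualCounts, countsAndID.GroupCount);
totalMatches = sum(countsAndID.GroupCount) == height(T2);
disp([allMatch, totalMatches])
```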


More Answers (1)

Animesh on 19 Sep 2024
In MATLAB, you can use the "unique" function along with the "histcounts" function to find the number of occurrences of each unique ID in your table. Here's how you can do it:
% Assume 'hits' is your table
% Extract the 'ID' column from the table
ids = hits.ID;
% Find unique IDs and their indices
[uniqueIDs, ~, idx] = unique(ids);
% Count the occurrences of each unique ID
occurrences = histcounts(idx, 1:max(idx)+1);
% Display the results
for i = 1:length(uniqueIDs)
    fprintf('%s %d\n', uniqueIDs{i}, occurrences(i));
end
You can refer to the MathWorks documentation for more information on the "histcounts" function.
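As a variation on the same idea, the index vector returned by "unique" can also be tallied with "accumarray", which avoids setting up bin edges. This is a sketch under the assumption that the IDs are stored as a cell array of character vectors; ids below just repeats the rows shown in the question.

```matlab
% IDs from the rows shown in the question, as a cell array of char vectors
ids = {'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; ...
       'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP010077-RA'};

% idx(k) is the position of ids{k} within uniqueIDs
[uniqueIDs, ~, idx] = unique(ids);

% Add 1 to the bin of each row's unique ID
occurrences = accumarray(idx, 1);

% Display the results
for i = 1:numel(uniqueIDs)
    fprintf('%s %d\n', uniqueIDs{i}, occurrences(i));
end
```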

Version

R2024a
