How to count data based on the categories
Afficher commentaires plus anciens
There is a Master matrix as follow that includes a unique ID (first column), different ID (second column) and last three columns (C/D/E) are included a number with a different range. (e.g. Column #3 included 1:2; column #4 included 1:2; column #5 included 1:10). I want a new matrix (like matrix OT in the following) that according to the unique ID find in the first column, count column 3 to 5 based on the different category. (e.g. ID = 1, column #3, category #1 = counted as 5, category #2 = counted as 0).
Master = [1 100680 1 2 1
1 36731 1 1 9
1 36731 1 2 9
1 14275 1 1 1
1 14275 1 2 8
2 14275 1 1 7
2 117633 1 1 6
2 117633 1 2 6
2 68599 1 1 8
2 31678 1 1 1
2 31678 1 1 8
3 31678 1 1 2
3 31678 1 2 7
3 44106 1 2 10]
Example of output for first ID #1
OT = [1 5 0 2 3 2 0 0 0 0 0 0 1 1 0];

1 commentaire
Réponse acceptée
Plus de réponses (2)
Image Analyst
le 22 Oct 2015
If you have the Statistics and Machine Learning Toolbox, you can use grpstats():
Master = [...
1 100680 1 2 1
1 36731 1 1 9
1 36731 1 2 9
1 14275 1 1 1
1 14275 1 2 8
2 14275 1 1 7
2 117633 1 1 6
2 117633 1 2 6
2 68599 1 1 8
2 31678 1 1 1
2 31678 1 1 8
3 31678 1 1 2
3 31678 1 2 7
3 44106 1 2 10]
statsArray = grpstats(Master, Master(:,1), 'sum')
and in the command window you'll see
statsArray =
5 202692 5 8 28
12 381496 6 7 36
9 107462 3 5 19
Columns 3-5 in statsArray are the sums in columns 3-5 broken down by category number in column 1 of Master.
2 commentaires
Moe
le 22 Oct 2015
Image Analyst
le 23 Oct 2015
Sorry, I didn't understand your definition/distinction between count and sum. If you want count, it seems to get the number of unique numbers. So you can just do
statsArray = grpstats(Master, Master(:,1), @fun)
with "fun" being defined as:
function num = fun(array)
num= length(unique(array));
It seems a lot simpler than the answer you chose, but whatever... I know I modified my answer after you had already picked a solution. If you want a :one-liner" you can still use it.
Peter Perkins
le 23 Oct 2015
Another possibility, using rowfun and a table. This code:
Master = ...
[1 100680 1 2 1
1 36731 1 1 9
1 36731 1 2 9
1 14275 1 1 1
1 14275 1 2 8
2 14275 1 1 7
2 117633 1 1 6
2 117633 1 2 6
2 68599 1 1 8
2 31678 1 1 1
2 31678 1 1 8
3 31678 1 1 2
3 31678 1 2 7
3 44106 1 2 10];
M = array2table(Master,'VariableNames',{'A' 'B' 'C' 'D' 'E'});
outNames = {'C1' 'C2' 'D1' 'D2' 'E1' 'E2' 'E3' 'E4' 'E5' 'E6' 'E7' 'E8' 'E9' 'E10'};
counts = rowfun(@fun, M,'GroupingVariables','A','InputVariables',{'C' 'D' 'E'},'OutputVariableNames',outNames)
... with this function:
function [varargout] = fun(C,D,E)
counts = [histc(C',1:2) histc(D',1:2) histc(E',1:10)];
varargout = num2cell(counts);
... produces this output:
counts =
A GroupCount C1 C2 D1 D2 E1 E2 E3 E4 E5 E6 E7 E8 E9 E10
_ __________ __ __ __ __ __ __ __ __ __ __ __ __ __ ___
1 5 5 0 2 3 2 0 0 0 0 0 0 1 2 0
2 6 6 0 5 1 1 0 0 0 0 2 1 2 0 0
3 3 3 0 1 2 0 1 0 0 0 0 1 0 0 1
There are probably better ways to provide this result, for example with a table more like this:
ans =
A GroupCount Ccounts Dcounts Ecounts
_ __________ _______ _______ _____________
1 5 5 0 2 3 [1x10 double]
2 6 6 0 5 1 [1x10 double]
3 3 3 0 1 2 [1x10 double]
but the former is more or less what you seem to be asking for.
Catégories
En savoir plus sur Logical dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!