Find and reduce a numeric array with identical columns
Dear Sir/Madam,
I would like to ask you the following question:
I have a data file like this
tmp = [...
121 12 6914 0.5625
122 -48 6853 0.29688
119 48 6914 0.17188
125 -12 6853 0.078125
125 4 6853 0.4375
119 5 6832 0.20313
119 4 6832 0.039063
119 -4 6832 0.023438]
I would like to regroup (or reduce) it under the following condition:
If columns 1 AND 3 of a row are identical to columns 1 AND 3 of any other row, reduce those rows to a single new row whose column 2 is the sum of the original column 2 values. Column 1 is kept the same; column 4 is not important.
So, for above data, I expect to have the answer:
119 5 6832 0.20313 % 5+4-4=5
122 -48 6853 0.29688
125 -8 6853 0.4375 % -12+4=-8
121 12 6914 0.5625
119 48 6914 0.17188
Which MATLAB command should I use? I would greatly appreciate it if you could include your code and its output.
I am using MATLAB R2014a.
Thank you very much
3 Comments
The order of the rows in your output is not clearly defined. What is the rule to get that order?
For example, both 121 and 122 each only occur once in the first column, but in the output matrix are listed neither in the sequence that they occur in the input matrix, nor in numeric order. How is this order supposed to be determined?
Image Analyst
on 30 Dec 2018
I was wondering the same thing. Hopefully the order doesn't matter. I'm sure you could write the code afterwards in such a way that it didn't matter.
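If the row order really is irrelevant, one way to compare two candidate results is to sort both before testing for equality. This is only a sketch: the variable names expected and computed are made up for illustration, and only columns 1 to 3 are compared because column 4 was said not to matter.

```matlab
% Expected rows from the question (columns 1-3 only).
expected = [119   5 6832
            122 -48 6853
            125  -8 6853
            121  12 6914
            119  48 6914];

% Same rows in a different order, e.g. as returned by some grouping routine.
computed = [119   5 6832
            119  48 6914
            121  12 6914
            122 -48 6853
            125  -8 6853];

% sortrows puts both matrices into the same canonical order,
% so isequal then ignores the original row order.
sameRows = isequal(sortrows(expected), sortrows(computed))
```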
John Smith
on 30 Dec 2018
Edited: John Smith
on 30 Dec 2018
Accepted Answer
More Answers (1)
Image Analyst
on 30 Dec 2018
Edited: Image Analyst
on 30 Dec 2018
What about using grpstats(), if you have the Statistics and Machine Learning Toolbox?
tmp = [...
121 12 6914 0.5625
122 -48 6853 0.29688
119 48 6914 0.17188
125 -12 6853 0.078125
125 4 6853 0.4375
119 5 6832 0.20313
119 4 6832 0.039063
119 -4 6832 0.023438]
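% Combine columns 1 and 3 into a single grouping key.
% (The key is only unique while column 3 stays below 10000.)
col5 = 10000*tmp(:, 1) + tmp(:, 3)
tmp = [tmp, col5];
% grpstats has no named 'sum' statistic, so call it twice:
% once to get the mean and once to get the count.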
outputMean = grpstats(tmp, tmp(:, 5), 'mean')
outputNumel = grpstats(tmp, tmp(:, 5), 'numel')
% Crop off temporary 5th column
output = outputMean(:, 1:4) % Initialize
% Column 2 is the sum = mean * count
output(:, 2) = outputMean(:, 2) .* outputNumel(:, 2)
The output comes back sorted by the grouping key (so effectively by the first column), though:
output =
119 5 6832 0.088544
119 48 6914 0.17188
121 12 6914 0.5625
122 -48 6853 0.29688
125 -8 6853 0.25781
That might be a problem for you. I'm not sure. Of course column 4 can be cropped off or ignored since you say it's not important.
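If the toolbox is not available, or the sorted order is a problem, the same grouping can be sketched with base MATLAB only, using unique() with the 'rows' and 'stable' options plus accumarray(). This is an alternative to the grpstats() approach, not a rewrite of it. The names keys, idx, col2sum and result are just illustrative, and 'stable' is assumed to mean that groups appear in order of first occurrence, which again is not the order shown in the question.

```matlab
tmp = [...
    121  12 6914 0.5625
    122 -48 6853 0.29688
    119  48 6914 0.17188
    125 -12 6853 0.078125
    125   4 6853 0.4375
    119   5 6832 0.20313
    119   4 6832 0.039063
    119  -4 6832 0.023438];

% Group rows by the (column 1, column 3) pair.
% idx(k) is the group number of row k; 'stable' keeps first-appearance order.
[keys, ~, idx] = unique(tmp(:, [1 3]), 'rows', 'stable');

% Sum column 2 within each group (accumarray sums by default).
col2sum = accumarray(idx, tmp(:, 2));

% Reassemble as [column 1, summed column 2, column 3]; column 4 is dropped.
result = [keys(:, 1), col2sum, keys(:, 2)]
```

Grouping on the two columns directly avoids the 10000*column1 + column3 key, so it does not depend on the size of the values in column 3.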