Find duplicate entries and sum up their associated values then put everything back in a table

20 vues (au cours des 30 derniers jours)
Hello everyone,
I have an excel sheet with two columns. The first column is a list of names, and the second column is the score of each person. Some names have multiple score and I want to find duplicate entries and sum up their associated values then put everything back in a table. Basically clean up the duplicate but want the sum of the scores. The excel file is attached as well
I would really appreciate if you have a script or a function can achieve the results.
Here where I’m at
Data = readtable("Data.xlsx")
unique(Data.Names)
[C,ia,idx] = unique(Data(:,1),'stable');
val = accumarray(idx,Data(:,2),[],@sum);
mat = [C val]

Réponse acceptée

dpb
dpb le 8 Juil 2021
tData = readtable("Data.xlsx");
tData.Properties.VariableNames={'Name','Score'}; % set a known set of variable names
tSum=rowfun(@sum,tData,'InputVariables','Score','GrouptingVariables','Name');
writetable(tSum,'PickAnOutputFileName.xls')
  2 commentaires
LeoAiE
LeoAiE le 8 Juil 2021
@dpb Thank you for your help but it didn't run corretly for some reason it giving the follwoing message
LeoAiE
LeoAiE le 8 Juil 2021
@dpb Never mind it was grouping Variables misspelled. Thank you for your help I really appreciate it

Connectez-vous pour commenter.

Plus de réponses (1)

Cris LaPierre
Cris LaPierre le 8 Juil 2021
It gets a bit trickly since you don't want to change the order of the names, but I think you could do this with groupsummary.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
Data.Order = (1:height(Data))'
Data = 50×3 table
Names Scores Order _____________________ ______ _____ {'Daniel Nelson' } 78 1 {'Henry Cunningham' } 28 2 {'Adison Dixon' } 41 3 {'Haris Carroll' } 15 4 {'Gianna Jones' } 3 5 {'Ted Alexander' } 64 6 {'Connie Hill' } 43 7 {'Abigail Clark' } 7 8 {'Nicole Brooks' } 2 9 {'Heather Tucker' } 67 10 {'Alen Crawford' } 97 11 {'Dexter Carroll' } 32 12 {'Elian Thompson' } 9 13 {'Nicholas Campbell'} 35 14 {'Alina Davis' } 72 15 {'Adelaide Gibson' } 57 16
% Use groupsummary to find sum and min of scores and order for each unique name
sumTbl = groupsummary(Data,"Names",["sum","min"],["Scores" "Order"])
sumTbl = 18×6 table
Names GroupCount sum_Scores min_Scores sum_Order min_Order _____________________ __________ __________ __________ _________ _________ {'Abigail Clark' } 3 135 7 61 8 {'Adelaide Gibson' } 3 160 45 103 16 {'Adison Dixon' } 4 242 41 107 3 {'Alen Crawford' } 2 101 4 43 11 {'Alina Davis' } 2 76 4 51 15 {'Connie Hill' } 4 81 4 107 7 {'Daniel Nelson' } 2 103 25 44 1 {'Dexter Carroll' } 2 89 32 45 12 {'Elian Thompson' } 2 95 9 47 13 {'Elian Wells' } 2 130 45 55 17 {'Gianna Jones' } 4 122 3 115 5 {'Haris Carroll' } 4 181 15 111 4 {'Heather Tucker' } 3 112 13 67 10 {'Henry Cunningham' } 2 128 28 46 2 {'Leonardo Elliott' } 2 46 9 57 18 {'Nicholas Campbell'} 2 74 35 49 14
% sort the results, which are alphabetical, so they are back in the original order
sumTbl = sortrows(sumTbl,"min_Order");
% create a new table with just Names and total scores
finalData = sumTbl(:,["Names","sum_Scores"]);
% rename 'sum_Scores' back to 'Scores'
finalData.Properties.VariableNames(2) = "Scores"
finalData = 18×2 table
Names Scores _____________________ ______ {'Daniel Nelson' } 103 {'Henry Cunningham' } 128 {'Adison Dixon' } 242 {'Haris Carroll' } 181 {'Gianna Jones' } 122 {'Ted Alexander' } 190 {'Connie Hill' } 81 {'Abigail Clark' } 135 {'Nicole Brooks' } 105 {'Heather Tucker' } 112 {'Alen Crawford' } 101 {'Dexter Carroll' } 89 {'Elian Thompson' } 95 {'Nicholas Campbell'} 74 {'Alina Davis' } 76 {'Adelaide Gibson' } 160
  3 commentaires
dpb
dpb le 8 Juil 2021
I didn't think of groupsummary, Chris. Good thinking. Was keeping the order an input requirement? I missed it if it was. I've done same as you before several times; have one humongous spreadsheet that has to get updated piecemeal -- it has a (hidden) column that contains =ROW() precisely for the purpose.
Cris LaPierre
Cris LaPierre le 8 Juil 2021
I had interpreted it as having to be in the same order, but on a re-read, that is not stated. If not, it gets much simpler.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
% Use groupsummary to find sum scores for each unique name
finalData = groupsummary(Data,"Names","sum")
This keeps the GroupCount information, but that's not bad info to have either.

Connectez-vous pour commenter.

Produits


Version

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by