How do I pull out from a table, the total mean of one item based on a particular condition (for several subjects), and then compute the difference in mean values?

4 views (last 30 days)
Monique Chatterton on 22 Oct 2021
I have a large table of data (example below).
For each "Subject", I need to determine 1) the total mean value of all "Item As" that are 'High' in "Condition Z", and 2) the total mean value of all "Item As" that are 'Low' in "Condition Z". For example, Subject 1 the answers would be: 25 (i.e. The mean for Item A for 'High' "Condition Z" items); and 50 (i.e. The mean for Item A for 'Low' "Condition Z" items)
I then need to generate a table that, for each "Subject", lists the difference in these two values as follows: ("total mean value of all "Item As" that are 'High' in "Condition Z"") - ("the total mean value of all "Item As" that are 'Low' in "Condition Z"). Thank you very much!
Subject Item A Condition Z Restudied Class
1 30 High Y A
1 20 High N B
1 45 Low N A
1 55 Low Y B
2 17 High Y A
2 45 Low N B
2 13 High Y A
3 56 High Y A
3 12 Low N B
3 34 Low N B
3 15 High N A

Dave B on 22 Oct 2021
Edited: Dave B on 22 Oct 2021
You can do the first part with groupsummary
Subject = [1 1 1 1 2 2 2 3 3 3 3]';
ItemA=[30 20 45 55 17 45 13 56 12 34 15]';
ConditionZ=["High" "High" "Low" "Low" "High" "Low" "High" "High" "Low" "Low" "High"]';
t=table(Subject,ItemA,ConditionZ)
t = 11×3 table
Subject ItemA ConditionZ _______ _____ __________ 1 30 "High" 1 20 "High" 1 45 "Low" 1 55 "Low" 2 17 "High" 2 45 "Low" 2 13 "High" 3 56 "High" 3 12 "Low" 3 34 "Low" 3 15 "High"
tsummary=groupsummary(t,["Subject" "ConditionZ"],"mean","ItemA")
tsummary = 6×4 table
Subject ConditionZ GroupCount mean_ItemA _______ __________ __________ __________ 1 "High" 2 25 1 "Low" 2 50 2 "High" 2 15 2 "Low" 1 45 3 "High" 2 35.5 3 "Low" 2 23
You might be tempted to do the second part by just subtracting tsummary.mean_ItemA(tsummary.ConditionZ=="High") - tsummary.mean_ItemA(tsummary.ConditionZ=="low")...but i think a nicer solution is to make two separate tables and join them which will make sure that all of the indices line up.
thigh = tsummary(tsummary.ConditionZ=="High",:)
thigh = 3×4 table
Subject ConditionZ GroupCount mean_ItemA _______ __________ __________ __________ 1 "High" 2 25 2 "High" 2 15 3 "High" 2 35.5
tlow = tsummary(tsummary.ConditionZ=="Low",:)
tlow = 3×4 table
Subject ConditionZ GroupCount mean_ItemA _______ __________ __________ __________ 1 "Low" 2 50 2 "Low" 1 45 3 "Low" 2 23
tdiff=join(tlow,thigh,'Keys','Subject')
tdiff = 3×7 table
Subject ConditionZ_tlow GroupCount_tlow mean_ItemA_tlow ConditionZ_thigh GroupCount_thigh mean_ItemA_thigh _______ _______________ _______________ _______________ ________________ ________________ ________________ 1 "Low" 2 50 "High" 2 25 2 "Low" 1 45 "High" 2 15 3 "Low" 2 23 "High" 2 35.5
tdiff.difference = tdiff.mean_ItemA_thigh - tdiff.mean_ItemA_tlow
tdiff = 3×8 table
Subject ConditionZ_tlow GroupCount_tlow mean_ItemA_tlow ConditionZ_thigh GroupCount_thigh mean_ItemA_thigh difference _______ _______________ _______________ _______________ ________________ ________________ ________________ __________ 1 "Low" 2 50 "High" 2 25 -25 2 "Low" 1 45 "High" 2 15 -30 3 "Low" 2 23 "High" 2 35.5 12.5
tdiff=tdiff(:,[1 end]) % just for display, no reason to eliminate the intermediate columns
tdiff = 3×2 table
Subject difference _______ __________ 1 -25 2 -30 3 12.5
Monique Chatterton on 23 Oct 2021
This worked beautifully! Thank you so much for this answer and the additional information and explanations!

dpb on 22 Oct 2021
tMEANS=rowfun(@mean,tData,'GroupingVariables'{'ConditionZ'},'InputVariables','ItemA','OutputVariableNames','GroupedMeans');
See the doc for all the skinny on grouping variables and rowfun and friends...you might also find groupsummary of interest.

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by