Duplicate x,y in table; get min/max of variables to new table

1 vue (au cours des 30 derniers jours)
newbie9
newbie9 le 2 Nov 2019
Commenté : newbie9 le 2 Nov 2019
I have a large dataset, small subset attached. There are duplicate x,y values. I need a cleaned table with "var1_max, var1_min, var2_max, var2_min" etc. for every duplicate "x,y"
I'm just not sure where to start setting up the problem. Thanks for any pointers to get going. Using Matlab R2018a, no fancy add-ons/packages
file_in = 'example.csv';
data_in = readtable(file_in);
%%table_out.Properties.VariableNames = {'id', 'x', 'y', 'var1_max', 'var1_min', ...
%% 'var2_max', 'var2_min', 'var3_max', 'var3_min', 'var4_max', 'var4_min', ...
%% 'var5_max', 'var5_min', 'var6_max', 'var6_min', 'var7_max', 'var7_min'};
  3 commentaires
Cris LaPierre
Cris LaPierre le 2 Nov 2019
Not sure I understand completely. I have the same questions a dpb. It looks like you've highlighted groups, but rows 12-14 only have duplicate x values, not x and y.
For the min/max values you want, do you want the overall min/max or just the min/max within the group (those that have the same x,y values)?
For example, rows 2 and 3 above do not have values for var1-3. What values should they have?
Rows 6-10 do have values there. What should the min/max be?
And finally, 12-14 have same x but different y. How should this be handled?
newbie9
newbie9 le 2 Nov 2019
Modifié(e) : newbie9 le 2 Nov 2019
Please disregard highlight on rows 12-14, it is in error.
I think I should add "var1_mid" ... "var7_mid" to handle cases where x,y are unique. Example in row 1:
  • x = 31.74053 | y = 27.18408 | var1_max = NaN | var1_min = NaN | var1_mid = 90 | var2_max = NaN | var2_min = NaN | var2_mid = 54 | ...etc.
Answers to other questions below:
For the min/max values you want, do you want the overall min/max or just the min/max within the group (those that have the same x,y values)?
  • just the min/max within the group (those that have the same x,y values)
For example, rows 2 and 3 above do not have values for var1-3. What values should they have?
  • NaN
Rows 6-10 do have values there. What should the min/max be?
  • x = 31.79403 | y = 27.26837 | var1_max = 79 | var1_min = 68 | var1_mid = NaN | var2_max = 251 | var2_min = 193 | var2_mid = NaN | ...etc.

Connectez-vous pour commenter.

Réponse acceptée

Cris LaPierre
Cris LaPierre le 2 Nov 2019
One potential solution that is simple to implement is to use groupsummary. You can have your data grouped by x and y, and have it return the within-group min, mean, and max for table variables you specify.
However, you prescribe wanting NaN for mid-value if there are min/max values. This won't do that .It will provide values for all 3 statistics. Also note that group summary will only return one row for each group. You can use the join function if you want to merge the two together.
newData = groupsummary(data_in,{'x','y'},{'min','mean','max'},["var1","var2","var3","var4","var5","var6","var7"])
  1 commentaire
newbie9
newbie9 le 2 Nov 2019
This is perfect, thank you. The groupsummary returns a GroupCount, so I can just use that such that if > 1 then mean = NaN

Connectez-vous pour commenter.

Plus de réponses (0)

Catégories

En savoir plus sur Tables dans Help Center et File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by