Using accumarray to organize large CSV
5 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
I have a CSV with 4 columns, containing several thousands of rows of data. I have attached a sample data sheet with a small amount of data as an example.
I'm trying to write a code to import the data in a way that will be easy to create scatter plots/visualize the data, as well as calculate some basic stats (mean, median, std. dev., etc).
I want to group the data based on TWO of the columns: SITE NUMBER and DEPTH
The other two columns, d18O and date are what I need to plot/visualize
Essentially, my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH
I started using accumarray to accomplish this but I am having issues with the arrays, they are not showing up with all of the unique rows of data
T = readtable("data1.csv", 'VariableNamingRule','preserve');
[G,ID] = findgroups(T.("Site number"));
data = accumarray(G, T{:,4},[], @(x){T{x,:}});
0 commentaires
Réponse acceptée
dpb
le 22 Sep 2023
Modifié(e) : dpb
le 22 Sep 2023
tT=readtable("data1.csv");
tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use
head(tT)
G=grpstats(tT,{'Site','Depth'},{'mean','median','std'},'DataVars',{'Date','d18O'})
hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O'},'OutputFormat','uniform');
function h=doit(x,y)
figure
h=scatter(x,y,'filled');
xlabel('Date'),ylabel('d18O')
end
2 commentaires
dpb
le 22 Sep 2023
NOTA BENE that you can get much more creative with the user function...
tT=readtable("data1.csv");
tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use
hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O','Site','Depth'},'OutputFormat','uniform');
function h=doit(x,y,s,d)
figure
h=scatter(x,y,'filled');
xlabel('Date'),ylabel('d18O')
legend(compose('Site %3d Depth %3d',s(1),d(1)))
end
Plus de réponses (1)
Voss
le 21 Sep 2023
Modifié(e) : Voss
le 21 Sep 2023
"my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH"
Something like this?
T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
gvars = ["Site number","Depth"];
vars = setdiff(T.Properties.VariableNames,gvars);
T_summary = groupsummary(T,gvars,@(x){x});
T_summary = removevars(T_summary,"GroupCount");
T_summary = renamevars(T_summary,"fun1_"+vars,vars)
T_summary.Date{1}
T_summary.d18O{1}
1 commentaire
Voss
le 22 Sep 2023
Or maybe it's more convenient just to sort the table so that rows with the same Site number and Depth are together:
T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
T_sorted = sortrows(T,["Site number","Depth"]);
disp(T_sorted);
Voir également
Catégories
En savoir plus sur Tables dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!











