Using accumarray to organize large CSV

Question

Emily le 21 Sep 2023

0
Lien

Utiliser le lien direct vers cette question

https://fr.mathworks.com/matlabcentral/answers/2024302-using-accumarray-to-organize-large-csv

Commenté : dpb le 22 Sep 2023

data1.csv

I have a CSV with 4 columns, containing several thousands of rows of data. I have attached a sample data sheet with a small amount of data as an example.

I'm trying to write a code to import the data in a way that will be easy to create scatter plots/visualize the data, as well as calculate some basic stats (mean, median, std. dev., etc).

I want to group the data based on TWO of the columns: SITE NUMBER and DEPTH

The other two columns, d18O and date are what I need to plot/visualize

Essentially, my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH

I started using accumarray to accomplish this but I am having issues with the arrays, they are not showing up with all of the unique rows of data

T = readtable("data1.csv", 'VariableNamingRule','preserve');
[G,ID] = findgroups(T.("Site number"));                                            
data = accumarray(G, T{:,4},[], @(x){T{x,:}});

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Answer 1

dpb le 22 Sep 2023

0
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/2024302-using-accumarray-to-organize-large-csv#answer_1315552

Modifié(e) : dpb le 22 Sep 2023

Ouvrir dans MATLAB Online

data1.csv

tT=readtable("data1.csv");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
tT.Properties.VariableNames(2)={'Site'};    % shorten to be more convenient to use
head(tT)
    d18O     Site    Date     Depth
    _____    ____    _____    _____

     -1.3    1004    44204     20  
     -1.9    1004    44263     20  
     -1.2    1004    44295     20  
     -1.7    1004    44567     20  
     -1.5    1004    44235     20  
     -9.3    1009    44323     50  
    -7.72    1009    44352     50  
    -7.07    1009    44445     50  
G=grpstats(tT,{'Site','Depth'},{'mean','median','std'},'DataVars',{'Date','d18O'})
G = 6×9 table
               Site    Depth    GroupCount    mean_Date    median_Date    std_Date    mean_d18O    median_d18O    std_d18O
               ____    _____    __________    _________    ___________    ________    _________    ___________    ________

    1004_20    1004     20           5          44313         44263        146.03        -1.52         -1.5       0.28636 
    1004_80    1004     80           6          44404         44489        339.01      -8.5533       -8.695       0.30137 
    1009_50    1009     50          11          44441         44412        158.55      -8.0855        -8.22        1.0068 
    1012_40    1012     40           3          44381         44263        235.02        -5.81        -4.38        2.6254 
    1014_20    1014     20           5          43936         43984        142.16       -6.326        -6.22       0.38546 
    1023_20    1023     20           2          44067         44067        237.59        -4.39        -4.39       0.39598 

rowfun

hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O'},'OutputFormat','uniform');

function h=doit(x,y)

figure

h=scatter(x,y,'filled');

xlabel('Date'),ylabel('d18O')

end

2 commentaires
Afficher AucuneMasquer Aucune

Emily le 22 Sep 2023

this produces what I need, thank you!

dpb le 22 Sep 2023

Ouvrir dans MATLAB Online

data1.csv

NOTA BENE that you can get much more creative with the user function...

tT=readtable("data1.csv");

Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.

tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use

hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O','Site','Depth'},'OutputFormat','uniform');

function h=doit(x,y,s,d)

figure

h=scatter(x,y,'filled');

xlabel('Date'),ylabel('d18O')

legend(compose('Site %3d Depth %3d',s(1),d(1)))

end

Connectez-vous pour commenter.

Answer 2

Voss le 21 Sep 2023

0
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/2024302-using-accumarray-to-organize-large-csv#answer_1315547

Modifié(e) : Voss le 21 Sep 2023

Ouvrir dans MATLAB Online

data1.csv

"my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH"

Something like this?

T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
    d18O     Site number    Date     Depth
    _____    ___________    _____    _____

     -1.3       1004        44204     20  
     -1.9       1004        44263     20  
     -1.2       1004        44295     20  
     -1.7       1004        44567     20  
     -1.5       1004        44235     20  
     -9.3       1009        44323     50  
    -7.72       1009        44352     50  
    -7.07       1009        44445     50  
    -9.88       1009        44412     50  
    -6.79       1009        44264     50  
    -6.58       1009        44538     50  
    -8.51       1009        44748     50  
    -8.34       1009        44652     50  
    -8.31       1009        44385     50  
    -8.22       1009        44235     50  
    -8.22       1009        44501     50  
    -4.38       1012        44263     40  
    -4.21       1012        44229     40  
    -8.84       1012        44652     40  
    -6.87       1014        43984     20  
    -6.54       1014        43803     20  
    -6.22       1014        44102     20  
    -6.12       1014        43773     20  
    -5.88       1014        44018     20  
    -4.67       1023        44235     20  
    -4.11       1023        43899     20  
    -8.83       1004        44593     80  
    -8.74       1004        44385     80  
    -8.75       1004        44685     80  
    -8.65       1004        44719     80  
    -8.24       1004        44204     80  
    -8.11       1004        43838     80  
gvars = ["Site number","Depth"];
vars = setdiff(T.Properties.VariableNames,gvars);
T_summary = groupsummary(T,gvars,@(x){x});
T_summary = removevars(T_summary,"GroupCount");
T_summary = renamevars(T_summary,"fun1_"+vars,vars)
T_summary = 6×4 table
    Site number    Depth        d18O             Date     
    ___________    _____    _____________    _____________

       1004         20      { 5×1 double}    { 5×1 double}
       1004         80      { 6×1 double}    { 6×1 double}
       1009         50      {11×1 double}    {11×1 double}
       1012         40      { 3×1 double}    { 3×1 double}
       1014         20      { 5×1 double}    { 5×1 double}
       1023         20      { 2×1 double}    { 2×1 double}
T_summary.Date{1}
ans = 5×1
       44204
       44263
       44295
       44567
       44235
T_summary.d18O{1}
ans = 5×1
   -1.3000
   -1.9000
   -1.2000
   -1.7000
   -1.5000

1 commentaire
Afficher -1 commentaires plus anciensMasquer -1 commentaires plus anciens

Voss le 22 Sep 2023

Ouvrir dans MATLAB Online

data1.csv

Or maybe it's more convenient just to sort the table so that rows with the same Site number and Depth are together:

T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
    d18O     Site number    Date     Depth
    _____    ___________    _____    _____

     -1.3       1004        44204     20  
     -1.9       1004        44263     20  
     -1.2       1004        44295     20  
     -1.7       1004        44567     20  
     -1.5       1004        44235     20  
     -9.3       1009        44323     50  
    -7.72       1009        44352     50  
    -7.07       1009        44445     50  
    -9.88       1009        44412     50  
    -6.79       1009        44264     50  
    -6.58       1009        44538     50  
    -8.51       1009        44748     50  
    -8.34       1009        44652     50  
    -8.31       1009        44385     50  
    -8.22       1009        44235     50  
    -8.22       1009        44501     50  
    -4.38       1012        44263     40  
    -4.21       1012        44229     40  
    -8.84       1012        44652     40  
    -6.87       1014        43984     20  
    -6.54       1014        43803     20  
    -6.22       1014        44102     20  
    -6.12       1014        43773     20  
    -5.88       1014        44018     20  
    -4.67       1023        44235     20  
    -4.11       1023        43899     20  
    -8.83       1004        44593     80  
    -8.74       1004        44385     80  
    -8.75       1004        44685     80  
    -8.65       1004        44719     80  
    -8.24       1004        44204     80  
    -8.11       1004        43838     80  
T_sorted = sortrows(T,["Site number","Depth"]);
disp(T_sorted);
    d18O     Site number    Date     Depth
    _____    ___________    _____    _____

     -1.3       1004        44204     20  
     -1.9       1004        44263     20  
     -1.2       1004        44295     20  
     -1.7       1004        44567     20  
     -1.5       1004        44235     20  
    -8.83       1004        44593     80  
    -8.74       1004        44385     80  
    -8.75       1004        44685     80  
    -8.65       1004        44719     80  
    -8.24       1004        44204     80  
    -8.11       1004        43838     80  
     -9.3       1009        44323     50  
    -7.72       1009        44352     50  
    -7.07       1009        44445     50  
    -9.88       1009        44412     50  
    -6.79       1009        44264     50  
    -6.58       1009        44538     50  
    -8.51       1009        44748     50  
    -8.34       1009        44652     50  
    -8.31       1009        44385     50  
    -8.22       1009        44235     50  
    -8.22       1009        44501     50  
    -4.38       1012        44263     40  
    -4.21       1012        44229     40  
    -8.84       1012        44652     40  
    -6.87       1014        43984     20  
    -6.54       1014        43803     20  
    -6.22       1014        44102     20  
    -6.12       1014        43773     20  
    -5.88       1014        44018     20  
    -4.67       1023        44235     20  
    -4.11       1023        43899     20  

Connectez-vous pour commenter.

Using accumarray to organize large CSV

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Réponse acceptée

2 commentaires
Afficher AucuneMasquer Aucune

Plus de réponses (1)

1 commentaire
Afficher -1 commentaires plus anciensMasquer -1 commentaires plus anciens

Voir également

Catégories

Tags

Community Treasure Hunt

Using accumarray to organize large CSV

0 commentaires Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Réponse acceptée

2 commentaires Afficher AucuneMasquer Aucune

Plus de réponses (1)

1 commentaire Afficher -1 commentaires plus anciensMasquer -1 commentaires plus anciens

Voir également

Catégories

Tags

Community Treasure Hunt

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

2 commentaires
Afficher AucuneMasquer Aucune

1 commentaire
Afficher -1 commentaires plus anciensMasquer -1 commentaires plus anciens