Saving Each Table Within a Structure as its Own Entity

Question

0 votes

Large_Table.xlsx

Hello,

I have a large table (over 10,000 rows), and each row's first column's cell has an identifier, with N identifiers total.

Eventually, within each of these N tables, I want to sum the columns from rows 3 through i.

The value for N will vary based on the imported Excel data, and the number of columns will never include columns 1 and 2 (always begin with column 3), but the value of i will also vary with the imported data.

So far, I was able to get my N tables split from the one large table.

However, these tables are stored in either a structure or a cell (I did it two different ways).

The Structure Method looks like this:

for i=1:N
    Count = strcat('v',num2str(i));
    Table.(Count) = Large_Table(Large_Table.Identifier==ID_String(i),:);
end

where the Large_Table.Identifier is the column of the table with the identifying information, and ID_String is a string that lists all the identifier's names.

This method saves everything to a structure, with N fields named v1, v2, ... vN., and each field is a table with the correct data, size, etc.

From here, I would like to add up each element in column 3 in table v1, each element in column 4 in table v1, all the way to each element in column i in table v1.

Then do the same for table v2, table v3, all the way to table vN.

What I need to do is be able to call each table, then sum each column up and repeat.

But what I'm mostly unsure about is how I will set up this for loop, specifically with referencing the variable name in the Boolean statement after the "for", and then having to re-define the number of columns of each of the tables (v1, v2, ..., vN).

I guess when it comes down to it, I am unfamiliar with naming and utlizing variables within a loop.

If this is confusing, please let me know and I can explain further.

For what it's worth, the Cell Method looks like this:

for i=1:N
    Test{i} = Large_Table(Large_Table.Identifier==ID_String(i),:);
end

which returns a cell (1xN), with each item within the cell holding a table.

From here, the same issue I have above applies here.

Thank you.

Edit:

In the attached Excel sheet, you'll see four columns.

Column A would be the identifier. The first part of my code removed the '.' and everything to the right of it.

Column B would be the 2nd column I referenced that is not important, as far as my final solution goes.

Columns C and D are the important columns that will eventually contribute to my final answers.

In this example, N=4 (J, K, L, and M) and i=4 (for the four columns).

I want to split this table into N=4 smaller tables: One with all the J's, one with the K's, the K's and the M's.

Then, with these four tables, I want to add up everything together in Column C for the J Table, and then add up everything together in Column D for the J table.

Then I want to repeat this for the K table.

Then again for the L table and M table.

So at the end of the day, I want essentially 8 values that represent the sum of every column C and column D input for the J, K, L, and M table.

4 commentaires
Afficher 2 commentaires plus anciens Masquer 2 commentaires plus anciens

dpb le 3 Août 2022

If what a gather you're after is so, you've making a very typical error in approaching calculations by grouping variables by trying to physically segregate the original into multiple other data variables. As you've discovered, this generally leads to just adding complexity rather than simplifying -- use grouping variables instead with the splitapply workflow -- or rowfun can often deal with the issue directly or even groupsummary may do it all almost automagically depending upon just what it is you're actually after.

Jon le 3 Août 2022

@dpb

I tried to recreate the code in a simpler manner but couldn't recreate my issue, and I can't post the original script due to confidentiality in my work.

What I will do which I think will help explain my ultimate goal is post a sample Excel sheet that will show what I want to do.

I'll add an edit in the original post that explains it more.

Thank you for your reply.

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Follow Question

Answer 1

dpb le 3 Août 2022

Modifié(e) : dpb le 3 Août 2022

Ouvrir dans MATLAB Online

0 votes

"I want to split this table into N=4 smaller tables:..."

Again, that's the wrongheaded approach -- it can (and should) be done in two or three lines of code...

tT=readtable('Large_Table.xlsx');                                   % read the raw data
tT.Cat=categorical(extractBefore(tT.Var1,'.'));                     % create the grouping variable
tS=groupsummary(tT,'Cat',"sum",{'Var2','Var3','Var4'});

The above returns

>> tS
tS =
  4×5 table
    Cat    GroupCount    sum_Var2    sum_Var3    sum_Var4
    ___    __________    ________    ________    ________
     J         51          464.1      634.95      1943.1 
     K         29          611.9      883.05      2612.9 
     L         21          600.6       875.7      2574.6 
     M         77         3334.1      4908.8       14345 
>> 

There's a mismatch between the description you provided for the file and the actual number of variables contained in it, I just did the ones that were finite at the beginning to illustrate; you can select whichever variables want/need.

NB: you can also select variables in a table with variables contaning a given name or by column number or variable type or... so you don't even have to have fixed column variable names as above.

Again, READ THE DOC!!! on table and the section on grouping variables and the splitapply workflow although as shown, most problems can be solved without that extra baggage.

6 commentaires
Afficher 4 commentaires plus anciens Masquer 4 commentaires plus anciens

Stephen23 le 3 Août 2022

Modifié(e) : Stephen23 le 3 Août 2022

Ouvrir dans MATLAB Online

@dpb: note that the OP's uploaded file contains spurious text (a dot) in cell F3. If using the default READTABLE settings (as shown in your answer) the first two rows are not imported:

Torg = readtable('Large_Table_original.xlsx')
Torg = 178×6 table
Var1Var2Var3Var4Var5Var6__________________________________

    {'J.3' }    1.6      1.2     5.6    NaN     {'.'     }
    {'J.4' }    1.9     1.65     6.9    NaN     {0×0 char}
    {'J.5' }    2.2      2.1     8.2    NaN     {0×0 char}
    {'J.6' }    2.5     2.55     9.5    NaN     {0×0 char}
    {'J.7' }    2.8        3    10.8    NaN     {0×0 char}
    {'J.8' }    3.1     3.45    12.1    NaN     {0×0 char}
    {'J.9' }    3.4      3.9    13.4    NaN     {0×0 char}
    {'J.10'}    3.7     4.35    14.7    NaN     {0×0 char}
    {'J.11'}      4      4.8      16    NaN     {0×0 char}
    {'J.12'}    4.3     5.25    17.3    NaN     {0×0 char}
    {'J.13'}    4.6      5.7    18.6    NaN     {0×0 char}
    {'J.14'}    4.9     6.15    19.9    NaN     {0×0 char}
    {'J.15'}    5.2      6.6    21.2    NaN     {0×0 char}
    {'J.16'}    5.5     7.05    22.5    NaN     {0×0 char}
    {'J.17'}    5.8      7.5    23.8    NaN     {0×0 char}
    {'J.18'}    6.1     7.95    25.1    NaN     {0×0 char}

Deleting the text from F3 returns all rows:

Tfix = readtable('Large_Table_fixed.xlsx')
Tfix = 180×4 table
Var1Var2Var3Var4____________________

    {'J.1' }      1      0.3       3
    {'J.2' }    1.3     0.75     4.3
    {'J.3' }    1.6      1.2     5.6
    {'J.4' }    1.9     1.65     6.9
    {'J.5' }    2.2      2.1     8.2
    {'J.6' }    2.5     2.55     9.5
    {'J.7' }    2.8        3    10.8
    {'J.8' }    3.1     3.45    12.1
    {'J.9' }    3.4      3.9    13.4
    {'J.10'}    3.7     4.35    14.7
    {'J.11'}      4      4.8      16
    {'J.12'}    4.3     5.25    17.3
    {'J.13'}    4.6      5.7    18.6
    {'J.14'}    4.9     6.15    19.9
    {'J.15'}    5.2      6.6    21.2
    {'J.16'}    5.5     7.05    22.5

Jon le 4 Août 2022

Thank you very much, that is exactly what I need.

I'll read more about that groupsummary command.

What's the old saying: the simplest solution tends to be the correct solution?

What was accomplished in one line of code I did in about 20 lines.

It's nice to have this community to help teach shortcuts.

dpb le 5 Août 2022

Moral is to spend more time up front reading/studying the doc and the example usages provided -- there's a LOT of stuff implemented but unless one goes looking and invests some time, won't find or know about it.

The time spent will pay great dividends on the investment...

Connectez-vous pour commenter.

Answer 2

Steven Lord le 3 Août 2022

Ouvrir dans MATLAB Online

0 votes

So if I understand what you want to do correctly, you have a vector of identifiers and an array or a table of data:

rng default
ID = char(randi(double('AE'), 10, 1));
data = randi([-10 10], 10, 1);
T = table(ID, data)
T = 10×2 table
    ID    data
    __    ____

    E      -7 
    E      10 
    A      10 
    E       0 
    D       6 
    A      -8 
    B      -2 
    C       9 
    E       6 
    E      10 

and you want to sum up all the rows in data corresponding to the identifier A, all the rows corresponding to identifier B, etc.? If so you can use groupsummary to sum (@sum) the values in the data variable in T, grouped by values in the ID variable in T.

result = groupsummary(T, 'ID', @sum)
result = 5×3 table
    ID    GroupCount    fun1_data
    __    __________    _________

    A         2             2    
    B         1            -2    
    C         1             9    
    D         1             6    
    E         5            19    

In T, two rows have A in the ID variable: rows 3 and 6. The sum of the values in rows 3 and 6 of the data variable is 10 + (-8) or 2. B, C, and D only appear once in T so they're pretty straightforward. The rows in the data variable corresponding to E in the ID variable are -7, 10, 0, 6, and 10 and if you add those 5 numbers together you get a value of 19.

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Jon le 3 Août 2022

Not quite my goal, but thank you.

I'm going to post a sample Excel sheet that should help explain it.

Connectez-vous pour commenter.

Saving Each Table Within a Structure as its Own Entity

4 commentaires
Afficher 2 commentaires plus anciens Masquer 2 commentaires plus anciens

Réponse acceptée

6 commentaires
Afficher 4 commentaires plus anciens Masquer 4 commentaires plus anciens

Plus de réponses (1)

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Catégories

Produits

Version

Tags

Community Treasure Hunt

Saving Each Table Within a Structure as its Own Entity

4 commentaires Afficher 2 commentaires plus anciens Masquer 2 commentaires plus anciens

Réponse acceptée

6 commentaires Afficher 4 commentaires plus anciens Masquer 4 commentaires plus anciens

Plus de réponses (1)

1 commentaire Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Catégories

Produits

Version

Tags

Voir également

Community Treasure Hunt

4 commentaires
Afficher 2 commentaires plus anciens Masquer 2 commentaires plus anciens

6 commentaires
Afficher 4 commentaires plus anciens Masquer 4 commentaires plus anciens

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens