Saving Each Table Within a Structure as its Own Entity

Hello,
I have a large table (over 10,000 rows), and each row's first column's cell has an identifier, with N identifiers total.
Eventually, within each of these N tables, I want to sum the columns from rows 3 through i.
The value for N will vary based on the imported Excel data, and the number of columns will never include columns 1 and 2 (always begin with column 3), but the value of i will also vary with the imported data.
So far, I was able to get my N tables split from the one large table.
However, these tables are stored in either a structure or a cell (I did it two different ways).
The Structure Method looks like this:
for i=1:N
Count = strcat('v',num2str(i));
Table.(Count) = Large_Table(Large_Table.Identifier==ID_String(i),:);
end
where the Large_Table.Identifier is the column of the table with the identifying information, and ID_String is a string that lists all the identifier's names.
This method saves everything to a structure, with N fields named v1, v2, ... vN., and each field is a table with the correct data, size, etc.
From here, I would like to add up each element in column 3 in table v1, each element in column 4 in table v1, all the way to each element in column i in table v1.
Then do the same for table v2, table v3, all the way to table vN.
What I need to do is be able to call each table, then sum each column up and repeat.
But what I'm mostly unsure about is how I will set up this for loop, specifically with referencing the variable name in the Boolean statement after the "for", and then having to re-define the number of columns of each of the tables (v1, v2, ..., vN).
I guess when it comes down to it, I am unfamiliar with naming and utlizing variables within a loop.
If this is confusing, please let me know and I can explain further.
For what it's worth, the Cell Method looks like this:
for i=1:N
Test{i} = Large_Table(Large_Table.Identifier==ID_String(i),:);
end
which returns a cell (1xN), with each item within the cell holding a table.
From here, the same issue I have above applies here.
Thank you.
Edit:
In the attached Excel sheet, you'll see four columns.
Column A would be the identifier. The first part of my code removed the '.' and everything to the right of it.
Column B would be the 2nd column I referenced that is not important, as far as my final solution goes.
Columns C and D are the important columns that will eventually contribute to my final answers.
In this example, N=4 (J, K, L, and M) and i=4 (for the four columns).
I want to split this table into N=4 smaller tables: One with all the J's, one with the K's, the K's and the M's.
Then, with these four tables, I want to add up everything together in Column C for the J Table, and then add up everything together in Column D for the J table.
Then I want to repeat this for the K table.
Then again for the L table and M table.
So at the end of the day, I want essentially 8 values that represent the sum of every column C and column D input for the J, K, L, and M table.

4 commentaires

Attach a subsection of the original table as a .mat file -- I can't make heads nor tails from the verbal description -- we need actual data to poke at to have any chance at all methinks...and it's always much easier even when it isn't complicated to have something to work with to illustrate...and it's far more likely somebody will take on the challenge if they don't have to try to invent a datafile that may (or may not) represent the actual situation besides.
Stephen23
Stephen23 le 3 Août 2022
Modifié(e) : Stephen23 le 3 Août 2022
"I guess when it comes down to it, I am unfamiliar with naming and utlizing variables within a loop."
That is very unlikely to be a good approach, unless you particularly want to force yourself into writing slow, complex, inefficient, insecure code that is buggy and hard to debug:
So far you have no given any reason why you cannot simply access the data using the standard approaches, e.g. very simple and efficient indexing (into the cell array) or using slightly more complex dynamic fieldnames (into the structure).
Note that splitting up the data like that is unlikely to be a good approach anyway, most likely you should be using the inbuilt methods for processing groups of data:
Also worth reading: https://xyproblem.info/
If what a gather you're after is so, you've making a very typical error in approaching calculations by grouping variables by trying to physically segregate the original into multiple other data variables. As you've discovered, this generally leads to just adding complexity rather than simplifying -- use grouping variables instead with the splitapply workflow -- or rowfun can often deal with the issue directly or even groupsummary may do it all almost automagically depending upon just what it is you're actually after.
I tried to recreate the code in a simpler manner but couldn't recreate my issue, and I can't post the original script due to confidentiality in my work.
What I will do which I think will help explain my ultimate goal is post a sample Excel sheet that will show what I want to do.
I'll add an edit in the original post that explains it more.
Thank you for your reply.

Connectez-vous pour commenter.

 Réponse acceptée

dpb
dpb le 3 Août 2022
Modifié(e) : dpb le 3 Août 2022
"I want to split this table into N=4 smaller tables:..."
Again, that's the wrongheaded approach -- it can (and should) be done in two or three lines of code...
tT=readtable('Large_Table.xlsx'); % read the raw data
tT.Cat=categorical(extractBefore(tT.Var1,'.')); % create the grouping variable
tS=groupsummary(tT,'Cat',"sum",{'Var2','Var3','Var4'});
The above returns
>> tS
tS =
4×5 table
Cat GroupCount sum_Var2 sum_Var3 sum_Var4
___ __________ ________ ________ ________
J 51 464.1 634.95 1943.1
K 29 611.9 883.05 2612.9
L 21 600.6 875.7 2574.6
M 77 3334.1 4908.8 14345
>>
There's a mismatch between the description you provided for the file and the actual number of variables contained in it, I just did the ones that were finite at the beginning to illustrate; you can select whichever variables want/need.
NB: you can also select variables in a table with variables contaning a given name or by column number or variable type or... so you don't even have to have fixed column variable names as above.
Again, READ THE DOC!!! on table and the section on grouping variables and the splitapply workflow although as shown, most problems can be solved without that extra baggage.

6 commentaires

And, if there really were a real reason to split the data (not been illustrated as to real need yet), then use groupfilter
While you're reading about table, punch the link 'Functions' and look over all the features associated with them -- you'll see how TMW has already provided you with the tools to utilize the table; use them, don't try to reinvent them.
And, that's just scratching the surface...
>> tS=groupsummary(tT,'Cat',{'sum','mean','std'},{'Var2','Var4'})
tS =
4×8 table
Cat GroupCount sum_Var2 mean_Var2 std_Var2 sum_Var4 mean_Var4 std_Var4
___ __________ ________ _________ ________ ________ _________ ________
J 51 464.1 9.1 4.4598 1943.1 38.1 19.326
K 29 611.9 21.1 2.5544 2612.9 90.1 11.069
L 21 600.6 28.6 1.8615 2574.6 122.6 8.0663
M 77 3334.1 43.3 6.7116 14345 186.3 29.083
>>
@dpb: note that the OP's uploaded file contains spurious text (a dot) in cell F3. If using the default READTABLE settings (as shown in your answer) the first two rows are not imported:
Torg = readtable('Large_Table_original.xlsx')
Torg = 178×6 table
Var1 Var2 Var3 Var4 Var5 Var6 ________ ____ ____ ____ ____ __________ {'J.3' } 1.6 1.2 5.6 NaN {'.' } {'J.4' } 1.9 1.65 6.9 NaN {0×0 char} {'J.5' } 2.2 2.1 8.2 NaN {0×0 char} {'J.6' } 2.5 2.55 9.5 NaN {0×0 char} {'J.7' } 2.8 3 10.8 NaN {0×0 char} {'J.8' } 3.1 3.45 12.1 NaN {0×0 char} {'J.9' } 3.4 3.9 13.4 NaN {0×0 char} {'J.10'} 3.7 4.35 14.7 NaN {0×0 char} {'J.11'} 4 4.8 16 NaN {0×0 char} {'J.12'} 4.3 5.25 17.3 NaN {0×0 char} {'J.13'} 4.6 5.7 18.6 NaN {0×0 char} {'J.14'} 4.9 6.15 19.9 NaN {0×0 char} {'J.15'} 5.2 6.6 21.2 NaN {0×0 char} {'J.16'} 5.5 7.05 22.5 NaN {0×0 char} {'J.17'} 5.8 7.5 23.8 NaN {0×0 char} {'J.18'} 6.1 7.95 25.1 NaN {0×0 char}
Deleting the text from F3 returns all rows:
Tfix = readtable('Large_Table_fixed.xlsx')
Tfix = 180×4 table
Var1 Var2 Var3 Var4 ________ ____ ____ ____ {'J.1' } 1 0.3 3 {'J.2' } 1.3 0.75 4.3 {'J.3' } 1.6 1.2 5.6 {'J.4' } 1.9 1.65 6.9 {'J.5' } 2.2 2.1 8.2 {'J.6' } 2.5 2.55 9.5 {'J.7' } 2.8 3 10.8 {'J.8' } 3.1 3.45 12.1 {'J.9' } 3.4 3.9 13.4 {'J.10'} 3.7 4.35 14.7 {'J.11'} 4 4.8 16 {'J.12'} 4.3 5.25 17.3 {'J.13'} 4.6 5.7 18.6 {'J.14'} 4.9 6.15 19.9 {'J.15'} 5.2 6.6 21.2 {'J.16'} 5.5 7.05 22.5
ERRATUM:
The original file is funky; with resaving only first four columns, the resutls are same as @Stephen23 shows. Since this is just a test file should be ok, but you may/may not see similar symptoms depending on the release of Excel and MATLAB you're using; it didn't seem to hit @Stephen23
>> tS=groupsummary(tT,'Cat','sum',{'Var2','Var3','Var4'})
tS =
4×5 table
Cat GroupCount fun1_Var2 fun1_Var3 fun1_Var4
___ __________ _________ _________ _________
J 53 466.4 636 1950.4
K 29 611.9 883.05 2612.9
L 21 600.6 875.7 2574.6
M 77 3334.1 4908.8 14345
>>
Thank you very much, that is exactly what I need.
I'll read more about that groupsummary command.
What's the old saying: the simplest solution tends to be the correct solution?
What was accomplished in one line of code I did in about 20 lines.
It's nice to have this community to help teach shortcuts.
Moral is to spend more time up front reading/studying the doc and the example usages provided -- there's a LOT of stuff implemented but unless one goes looking and invests some time, won't find or know about it.
The time spent will pay great dividends on the investment...

Connectez-vous pour commenter.

Plus de réponses (1)

So if I understand what you want to do correctly, you have a vector of identifiers and an array or a table of data:
rng default
ID = char(randi(double('AE'), 10, 1));
data = randi([-10 10], 10, 1);
T = table(ID, data)
T = 10×2 table
ID data __ ____ E -7 E 10 A 10 E 0 D 6 A -8 B -2 C 9 E 6 E 10
and you want to sum up all the rows in data corresponding to the identifier A, all the rows corresponding to identifier B, etc.? If so you can use groupsummary to sum (@sum) the values in the data variable in T, grouped by values in the ID variable in T.
result = groupsummary(T, 'ID', @sum)
result = 5×3 table
ID GroupCount fun1_data __ __________ _________ A 2 2 B 1 -2 C 1 9 D 1 6 E 5 19
In T, two rows have A in the ID variable: rows 3 and 6. The sum of the values in rows 3 and 6 of the data variable is 10 + (-8) or 2. B, C, and D only appear once in T so they're pretty straightforward. The rows in the data variable corresponding to E in the ID variable are -7, 10, 0, 6, and 10 and if you add those 5 numbers together you get a value of 19.

1 commentaire

Not quite my goal, but thank you.
I'm going to post a sample Excel sheet that should help explain it.

Connectez-vous pour commenter.

Catégories

Produits

Version

R2021a

Question posée :

Jon
le 3 Août 2022

Commenté :

dpb
le 5 Août 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by