Saving Each Table Within a Structure as its Own Entity
Afficher commentaires plus anciens
Hello,
I have a large table (over 10,000 rows), and each row's first column's cell has an identifier, with N identifiers total.
Eventually, within each of these N tables, I want to sum the columns from rows 3 through i.
The value for N will vary based on the imported Excel data, and the number of columns will never include columns 1 and 2 (always begin with column 3), but the value of i will also vary with the imported data.
So far, I was able to get my N tables split from the one large table.
However, these tables are stored in either a structure or a cell (I did it two different ways).
The Structure Method looks like this:
for i=1:N
Count = strcat('v',num2str(i));
Table.(Count) = Large_Table(Large_Table.Identifier==ID_String(i),:);
end
where the Large_Table.Identifier is the column of the table with the identifying information, and ID_String is a string that lists all the identifier's names.
This method saves everything to a structure, with N fields named v1, v2, ... vN., and each field is a table with the correct data, size, etc.
From here, I would like to add up each element in column 3 in table v1, each element in column 4 in table v1, all the way to each element in column i in table v1.
Then do the same for table v2, table v3, all the way to table vN.
What I need to do is be able to call each table, then sum each column up and repeat.
But what I'm mostly unsure about is how I will set up this for loop, specifically with referencing the variable name in the Boolean statement after the "for", and then having to re-define the number of columns of each of the tables (v1, v2, ..., vN).
I guess when it comes down to it, I am unfamiliar with naming and utlizing variables within a loop.
If this is confusing, please let me know and I can explain further.
For what it's worth, the Cell Method looks like this:
for i=1:N
Test{i} = Large_Table(Large_Table.Identifier==ID_String(i),:);
end
which returns a cell (1xN), with each item within the cell holding a table.
From here, the same issue I have above applies here.
Thank you.
Edit:
In the attached Excel sheet, you'll see four columns.
Column A would be the identifier. The first part of my code removed the '.' and everything to the right of it.
Column B would be the 2nd column I referenced that is not important, as far as my final solution goes.
Columns C and D are the important columns that will eventually contribute to my final answers.
In this example, N=4 (J, K, L, and M) and i=4 (for the four columns).
I want to split this table into N=4 smaller tables: One with all the J's, one with the K's, the K's and the M's.
Then, with these four tables, I want to add up everything together in Column C for the J Table, and then add up everything together in Column D for the J table.
Then I want to repeat this for the K table.
Then again for the L table and M table.
So at the end of the day, I want essentially 8 values that represent the sum of every column C and column D input for the J, K, L, and M table.
4 commentaires
dpb
le 3 Août 2022
Attach a subsection of the original table as a .mat file -- I can't make heads nor tails from the verbal description -- we need actual data to poke at to have any chance at all methinks...and it's always much easier even when it isn't complicated to have something to work with to illustrate...and it's far more likely somebody will take on the challenge if they don't have to try to invent a datafile that may (or may not) represent the actual situation besides.
"I guess when it comes down to it, I am unfamiliar with naming and utlizing variables within a loop."
That is very unlikely to be a good approach, unless you particularly want to force yourself into writing slow, complex, inefficient, insecure code that is buggy and hard to debug:
So far you have no given any reason why you cannot simply access the data using the standard approaches, e.g. very simple and efficient indexing (into the cell array) or using slightly more complex dynamic fieldnames (into the structure).
Note that splitting up the data like that is unlikely to be a good approach anyway, most likely you should be using the inbuilt methods for processing groups of data:
Also worth reading: https://xyproblem.info/
dpb
le 3 Août 2022
If what a gather you're after is so, you've making a very typical error in approaching calculations by grouping variables by trying to physically segregate the original into multiple other data variables. As you've discovered, this generally leads to just adding complexity rather than simplifying -- use grouping variables instead with the splitapply workflow -- or rowfun can often deal with the issue directly or even groupsummary may do it all almost automagically depending upon just what it is you're actually after.
Jon
le 3 Août 2022
Réponse acceptée
Plus de réponses (1)
So if I understand what you want to do correctly, you have a vector of identifiers and an array or a table of data:
rng default
ID = char(randi(double('AE'), 10, 1));
data = randi([-10 10], 10, 1);
T = table(ID, data)
and you want to sum up all the rows in data corresponding to the identifier A, all the rows corresponding to identifier B, etc.? If so you can use groupsummary to sum (@sum) the values in the data variable in T, grouped by values in the ID variable in T.
result = groupsummary(T, 'ID', @sum)
In T, two rows have A in the ID variable: rows 3 and 6. The sum of the values in rows 3 and 6 of the data variable is 10 + (-8) or 2. B, C, and D only appear once in T so they're pretty straightforward. The rows in the data variable corresponding to E in the ID variable are -7, 10, 0, 6, and 10 and if you add those 5 numbers together you get a value of 19.
1 commentaire
Jon
le 3 Août 2022
Catégories
En savoir plus sur Tables dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!