How to build a structure that is easier to work with (i.e. for looping through and adding to)
I am writing an app that builds a structure to be filled with test data for various things. There will be evaluation data and validation data for battery cells; within each of these there is a list of cells, for each cell a list of months, and for each month some data which I want as a table (I think).
So the full thing looks like this:
structure.Evaluation.Cell_1.Month_1.RawData
I've now come to realise that whilst this looks nice when you're interacting with the structure in the workspace, it's horrible to loop through, because I need a way of generating "Cell_1" and "Month_1", then on the next loop "Cell_1" and "Month_2", and so on for each cell. So at the moment I am leaning heavily on "eval" to do this, which just feels wrong.
So I think I want it to be more like:
structure.Evaluation.Cells.Months.RawData
So then indexing becomes the easy way to just loop through all the bits. But I am struggling with how this would look.
The RawData is a table and it could be any size, but it will usually have 8-10 columns and thousands of rows. Each month has its own table of raw data. There are multiple months of data for one cell, and then multiple cells. I can't visualise how this would look if I didn't use the first method, which is effectively adaptively naming my variables, and I know that is a bit of a no-no.
Can I have the raw data table held in a single cell, so that "Months(1)" is a cell or block containing a 10 x 10,000 data table?
Then, going one level up to "Cells", there would be a cell for each month.
I am sorry if this is poorly explained. I can't really get my head round it. I have attached an example of the structure as it is now.
5 comments
dpb
on 4 Sep 2023
It's lunch time and that means "now!" so not enough time to dig into the struct at the moment, but I think it would likely help if you also attached a typical dataset or two so folks can see the starting point as well.
You can programmatically reference struct fieldnames; the question on organization depends greatly also on what is to be done with the data after you have it stored -- is it across all the cells or just within a cell across time, for example? One presumes likely it would be global statistics one would be interested in?
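As a minimal sketch of referencing fieldnames programmatically, the loop below walks a small made-up version of the nested layout from the question using dynamic fieldnames instead of eval. The variable names (S, Voltage, rowCount) and the data are assumptions for illustration only, not taken from the attached MAT file.

```matlab
% Hypothetical miniature of the nested layout described in the question
S = struct();
S.Evaluation.Cell_1.Month_1.RawData = table((1:3)', 'VariableNames', {'Voltage'});
S.Evaluation.Cell_1.Month_2.RawData = table((4:6)', 'VariableNames', {'Voltage'});

cellNames = fieldnames(S.Evaluation);          % {'Cell_1'}
rowCount  = 0;
for c = 1:numel(cellNames)
    cellName   = cellNames{c};
    monthNames = fieldnames(S.Evaluation.(cellName));
    for m = 1:numel(monthNames)
        % s.(name) looks up a field whose name is held in a variable
        T = S.Evaluation.(cellName).(monthNames{m}).RawData;
        rowCount = rowCount + height(T);
    end
end
```

A name can also be generated on the fly, for example with sprintf('Month_%d', m), and used in the same s.(name) syntax.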
Bruno Luong
on 4 Sep 2023
Moved: Bruno Luong
on 4 Sep 2023
The first obvious thing NOT to do is having fields
month_0
% ...
month_5
Create a 1 x 6 array of structures, months. If the index 0, ..., 5 is important, add to the structure months a field "index" that contains a scalar integer in 0:5.
After that, fields like MonthCount and CellCount are redundant and unnecessary. Use size, length, numel, etc. to figure out how many of the structs you have.
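The structure-array layout described above could look roughly like the sketch below. The field names index and data, the Voltage column and the random placeholder values are assumptions chosen for illustration.

```matlab
% Assumed layout: a 1x6 struct array, one element per month
nMonths = 6;
months  = struct('index', cell(1, nMonths), 'data', []);
for k = 1:nMonths
    months(k).index = k - 1;                                   % month numbers 0..5
    months(k).data  = table(rand(4, 1), 'VariableNames', {'Voltage'});  % placeholder data
end

% One level up: a struct array of battery cells, each holding its months
cells = struct('months', cell(1, 2));
for c = 1:numel(cells)
    cells(c).months = months;
end

% The counts come from the arrays themselves, no MonthCount field needed
total = 0;
for c = 1:numel(cells)
    for k = 1:numel(cells(c).months)
        total = total + height(cells(c).months(k).data);
    end
end
```

With this layout, cells(2).months(3).data is reached by plain indexing, so no field name has to be generated.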
"So at the moment I am leaning heavily on "eval" to do this which just feels wrong."
Using EVAL for that is very wrong; you should be using dynamic fieldnames.
But forcing numbers into the fieldnames of a scalar structure like that, and also the rest of your rather confusing question, both indicate that you should consider using structure arrays.
I just took a look at the MAT file you uploaded: rather than lots of nested scalar structures and a separate count, you should be using a simple 1x6 structure array. Then use NUMEL to loop over them.
Also: get rid of RawData (structures with only one field are inefficient and pointless).
Also: flatten your data as much as possible. For example, if you have N EVALUATIONDATA cells which each have six months of data, then you can easily skip a few layers of nested structures by using a 6xN cell array (i.e. use arrays better!). Flatten, flatten, flatten... such deep nesting is practically unusable, unless you love lots of nested loops.
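A hedged sketch of the 6xN cell array idea: rows are months, columns are battery cells, and each element holds one raw-data table. The sizes, the Voltage column and the values are made up for illustration.

```matlab
nMonths = 6;
nCells  = 3;                         % N, assumed here
raw = cell(nMonths, nCells);
for c = 1:nCells
    for m = 1:nMonths
        % placeholder table standing in for one month of raw data
        raw{m, c} = table((1:5)' * m, 'VariableNames', {'Voltage'});
    end
end

% One function call per table, with no nested struct traversal
rowsPerTable = cellfun(@height, raw);

% A single linear loop also visits every table
totalRows = 0;
for k = 1:numel(raw)
    totalRows = totalRows + height(raw{k});
end
```

Here raw{m, c} is the table for month m of cell c, so both loop styles work without building any names.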
As Steven Lord wrote, one simple (time)table is probably the best.
Alex Mason
on 5 Sep 2023
Stephen23
on 5 Sep 2023
"My concern would be how big the table gets but I assume Matlab is OK with the potential for millions of rows?"
MATLAB has no problem with this; the limit depends more on your available computer memory.
Another option would be to use a datastore / tall arrays:
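As a rough sketch of the tall-array option, an in-memory table can be converted with tall and queried lazily. In practice a tall array would more likely be backed by a datastore over files on disk; the in-memory table, the CellID and Voltage names and the values here are assumptions that keep the example self-contained.

```matlab
% Small in-memory table standing in for a much larger data set
CellID  = [1; 1; 2; 2];
Voltage = [3.70; 3.65; 3.71; 3.68];
T  = table(CellID, Voltage);

tt = tall(T);                               % tall table, operations are deferred
m  = mean(tt.Voltage(tt.CellID == 1));      % still unevaluated at this point
meanCell1 = gather(m);                      % triggers the actual computation
```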
Accepted Answer
More Answers (2)
Steven Lord
on 4 Sep 2023
I'd probably store this either as a timetable (with the date and time data stored as the RowTimes, and as many data variables as you need) or as a table with multiple columns for your cell and month data. Then you could use logical indexing into the rows of the tabular array, either using matches or startsWith on the column containing your month "names" or using the month function on the RowTimes and selecting the appropriate month numbers.
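A small sketch of both variants described above, with made-up column names (Type, CellID, Month, Voltage) and values: a flat table selected by logical indexing, and a timetable selected with the month function on its row times.

```matlab
% Variant 1: one flat table with label columns
Type    = ["Evaluation"; "Evaluation"; "Validation"; "Evaluation"];
CellID  = [1; 1; 1; 2];
Month   = [1; 2; 1; 1];
Voltage = [3.70; 3.65; 3.71; 3.68];
T = table(Type, CellID, Month, Voltage);

rows = T.Type == "Evaluation" & T.CellID == 1;   % logical row mask
sub  = T(rows, :);

% Variant 2: a timetable, selecting rows by month of the row times
t  = datetime(2023, [1; 1; 2; 2], [5; 20; 3; 17]);
TT = timetable(t, [3.70; 3.69; 3.66; 3.65], 'VariableNames', {'Voltage'});
jan = TT(month(TT.Properties.RowTimes) == 1, :);
```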
1 comment
Alex Mason
on 5 Sep 2023
Bruno Luong
on 5 Sep 2023
Edited: Bruno Luong
on 5 Sep 2023
This is how to organize the data as a single giant table, if you want that.
IMO, if you don't need to mix parts of the tables, you should not do it this way; keeping an array of tables, as in my other solution, is better.
load('structure.mat');
NewDataStruct = struct('DataInfo', shareData.DataInfo, ...
    'Data', ConvertRawData2SingleTable(shareData, struct()))

function DataRecord = ConvertRawData2SingleTable(s, info)
% Recursively walk the nested struct s and stack every RawData table into
% one table. info accumulates the Type/Cell/Month labels on the way down.
f = fieldnames(s);
DataRecord = [];
for k=1:length(f)
    fk = f{k};
    Tmp = [];
    switch fk
        case 'RawData'
            % Leaf table: append the collected labels as extra columns
            T = s.(fk);
            infof = fieldnames(info);
            for j=1:length(infof)
                T.(infof{j})(:) = info.(infof{j});
            end
            Tmp = T;
        case {'EvaluationData', 'ValidationData'}
            info.Type = string(fk);
        otherwise
            % Field names such as Cell_3 or Month_5: record the number
            N = regexp(fk,'Month_(\d+)|Cell_(\d+)', 'tokens', 'once');
            if ~isempty(N)
                N = str2double(N{1});
                fbase = fk(1:find(fk=='_',1)-1);
                info.(fbase) = N;
            end
    end
    % Descend into nested structs, passing the labels found so far
    if isstruct(s.(fk))
        Tmp = ConvertRawData2SingleTable(s.(fk), info);
    end
    % Stack whatever this field produced under the running result
    if ~isempty(Tmp)
        if isempty(DataRecord)
            DataRecord = Tmp;
        else
            DataRecord = [DataRecord; Tmp]; %#ok
        end
    end
end
end
2 comments
Bruno Luong
on 5 Sep 2023
Edited: Bruno Luong
on 6 Sep 2023
Note the extra memory required by the single-table storage after conversion:
>> whos
  Name               Size        Bytes  Class     Attributes
  NewDataStruct      1x1     362504847  struct
  shareData          1x1     165267346  struct
dpb
on 6 Sep 2023
If you were to go to a timetable, use a datetime for the date rather than augmenting with extra month/day columns; use lookup within it for time selection when processing; retime might be of use.
In a table, the extra memory is compensated for by the handy nature of rowfun and grouping variables to do all kinds of magical analyses in very few lines of code -- again IF the nature of the analysis is by some set of variables.
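To illustrate the grouping-variable idea, here is a minimal sketch that computes a mean per cell and month with rowfun. The column names (CellID, Month, Voltage), the output name MeanVoltage and the numbers are assumptions for illustration.

```matlab
% Flat table with label columns acting as grouping variables
CellID  = [1; 1; 1; 2; 2; 2];
Month   = [1; 1; 2; 1; 2; 2];
Voltage = [3.70; 3.72; 3.65; 3.68; 3.60; 3.62];
T = table(CellID, Month, Voltage);

% Apply mean to Voltage within each (CellID, Month) group
stats = rowfun(@mean, T, ...
    'InputVariables', 'Voltage', ...
    'GroupingVariables', {'CellID', 'Month'}, ...
    'OutputVariableNames', 'MeanVoltage');
```

The result has one row per group, with the grouping columns, a GroupCount column and the MeanVoltage column.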
If it's simply iterating through each dataset one at a time, not a whole lot to be gained as Bruno says...but we've no knowledge of what your end objectives are with which to guide the tools to use.