Alternatives for using EVAL to access data in multi-layered struct?

Sjouke Rinsma on 18 Oct 2018
Edited: Sjouke Rinsma on 19 Oct 2018
So, I have read many forum topics explaining that using EVAL is bad practice, yet in my situation I feel it makes my code more compact and actually more readable. Nevertheless, I was wondering whether there are alternatives, for example based on indexing, although I do not yet see how I would implement that in this case.
In short: I have a MATLAB class that imports data from a given folder, which may contain many files of different formats (e.g. CSV, XLSX, or others). The data is stored in a structure organized as "data.(filetype).(filename).(tab)", where 'tab' applies to e.g. Excel workbook files but is omitted for CSV files. Each file's data is then stored as a table, textdata and headers, which adds another layer. To access specific data I use a recursive function that returns the structure tree as strings like {'data.xlsx.file1.tab1'; 'data.xlsx.file1.tab2'; 'data.xlsx.file1.tab1'}.
I currently use EVAL to extract specific data, such as 'variable1', from all tables. This saves me from constantly having to determine the field names and from writing 3- or 4-level nested for-loops, which become even more cumbersome once exceptions such as missing variables have to be handled. I am also pondering how to write data back to specific fields without having to use EVAL again, or some form of 'split-the-string-at-the-dots' followed by dynamically inputting the field names. But then again, maybe my whole approach for using such a multi-layered struct is already poor to begin with, so any suggestions and/or alternatives are more than welcome.
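For reference, a nested layout like the one described above can be filled without EVAL using dynamic field names, where the expression in parentheses after the dot is evaluated to a field name at run time. This is a minimal sketch, not the asker's import class: the file type, file name and tab names are made-up placeholders, and small numeric matrices stand in for the imported table/textdata/header layer.

```matlab
% Hypothetical file type, file name and tab names, for illustration only
ftype = 'xlsx';
fname = 'file1';
tabs  = {'tab1', 'tab2'};

data = struct();
for k = 1:numel(tabs)
    % data.(expr) uses the value of expr as the field name; missing levels are created
    data.(ftype).(fname).(tabs{k}) = k * ones(2, 2);   % placeholder for the imported data
end

% CSV files have no tab level, so their data sits directly under the file name
data.csv.file2 = magic(3);

% Dynamic and literal field names reach the same nested field
a = data.(ftype).(fname).(tabs{2});
b = data.xlsx.file1.tab2;
isequal(a, b)   % returns true
```

Dynamic field names only help when the number of levels is known when writing the code, which is exactly why the question turns to path strings for structures of varying depth.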

Accepted Answer

Stephen23 on 19 Oct 2018
Edited: Stephen23 on 19 Oct 2018
One simple solution is to use getfield and setfield to access nested structures. Instead of returning the structure location as one character vector like this:
S = 'data.xlsx.file1.tab1'
you should return it in a cell array of char vectors, like this:
C = {'xlsx','file1','tab1'}
(this will require only a simple change to the recursive function). Then you can trivially access the data using getfield:
getfield(data,C{:})
and that is all! Here is a simple working demonstration:
>> data.xlsx.file1.tab = 1;
>> data.xlsx.file2.tab = 2;
>> data.csv.file2 = 3;
>> C = {'xlsx','file2','tab'};
>> getfield(data,C{:})
ans = 2
No ugly loops, no evil eval, no problems!
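The "simple change to the recursive function" could look something like the sketch below. It is a hedged illustration, not the asker's actual function: to keep the snippet self-contained it walks the structure with an explicit stack instead of recursion, collects the path to every non-struct leaf as a cell array of field names, reads each leaf with getfield, and then writes a modified value back with setfield. The variable names and the doubling operation are assumptions for demonstration only.

```matlab
% Example nested structure (same shape as the demonstration above)
data = struct();
data.xlsx.file1.tab = 1;
data.xlsx.file2.tab = 2;
data.csv.file2 = 3;

% Collect every path to a non-struct leaf as a cell array of field names
paths = {};
stack = {};
top = fieldnames(data);
for i = 1:numel(top)
    stack{end+1} = top(i);                  % top(i) is a 1x1 cell: a one-level path
end
while ~isempty(stack)
    p = stack{end};
    stack(end) = [];
    node = getfield(data, p{:});
    if isstruct(node) && isscalar(node)
        children = fieldnames(node);
        for j = 1:numel(children)
            stack{end+1} = [p, children(j)]; % extend the path by one level
        end
    else
        paths{end+1} = p;                    % leaf reached (e.g. a table): keep its path
    end
end

% Read every leaf without eval
for k = 1:numel(paths)
    p = paths{k};
    fprintf('%s = %g\n', strjoin(p, '.'), getfield(data, p{:}));
end

% Write back: setfield returns a modified copy, which must be assigned
p = {'xlsx', 'file2', 'tab'};
data = setfield(data, p{:}, 2 * getfield(data, p{:}));
```

Note that setfield does not modify data in place; forgetting the `data = ...` assignment is a common mistake.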
"But then again, maybe my whole approach for using such a multi-layered struct is already poor to begin with..."
Personally I am not a big fan of nested structures, and I notice that they tend to be overused by beginners wanting to reflect the minutiae of how they see their data organization. One of the main risks (which you are doing) is encoding meta-data such as filenames and tab names into the code (as fieldnames). This is a bad way to write code: it makes the code complex and makes accessing that meta-data slow and buggy. Your approach is also very fragile, because many filenames are not valid fieldnames: what would your code do with the filename a-1.csv? Or a.2.csv? Mixing meta-data (like filenames and tab names) into the structure of the data is simply flawed and should be avoided. Meta-data is data, and it should be stored as data in its own right. Consider those example filenames: if we put them into a structure field named filename, then the code will never break because of the name itself:
S.filename = 'a-1.2.3-4.csv'
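To make the fragility concrete: a field name must be a valid MATLAB identifier, which isvarname checks, and MATLAB rejects invalid names in dynamic field assignments. This sketch uses the example filenames mentioned above (plus a harmless file1.csv) and assumes the file name without extension would be used as the field name. Storing the names as data in a struct array has no such restriction.

```matlab
files = {'file1.csv', 'a-1.csv', 'a.2.csv'};

valid = false(size(files));
for k = 1:numel(files)
    [~, stem] = fileparts(files{k});   % file name without folder and extension
    valid(k) = isvarname(stem);        % could this be used as a field name?
end
valid   % only the first entry is true

% Meta-data stored as data: any file name works
S = struct('filename', files);         % 1x3 struct array, one element per file
```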
You should consider that a table is a very powerful option and has many advantages for processing groups of data.
Personally I would probably use a single non-scalar structure, where the meta-data are simply encoded as data in fields:
data(1).type = 'xlsx'
data(1).name = 'file1'
data(1).tab = 'tab1'
data(1).data = ...
data(2).type = 'csv'
data(2).name = 'file2'
data(2).tab = [];
data(2).data = ...
This would make accessing and processing the data quite simple, and has some neat syntaxes that you will find very handy:
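The list of syntaxes that originally followed here is not preserved on this page. As a hedged sketch of the kind of syntax meant, the snippet below builds the non-scalar structure shown above (with small placeholder matrices in place of the `...` data) and uses comma-separated lists to gather a field across all elements, plus logical indexing to select elements by their meta-data.

```matlab
% Non-scalar structure: one element per file/tab, meta-data stored as data
data = struct('type', {'xlsx', 'xlsx', 'csv'}, ...
              'name', {'file1', 'file1', 'file2'}, ...
              'tab',  {'tab1', 'tab2', []}, ...
              'data', {[1 2], [3 4], [5 6]});   % placeholder data

types = {data.type};                    % comma-separated list collected into a 1x3 cell
isXlsx = strcmp({data.type}, 'xlsx');   % logical index built from the meta-data
xlsxData = data(isXlsx);                % 1x2 struct array with only the XLSX entries
allValues = [data.data];                % concatenates every element's data
fromFile1 = vertcat(data(strcmp({data.name}, 'file1')).data);   % stacks file1's tabs
```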
2 Comments
Philip Borghesani on 19 Oct 2018
Edited: Philip Borghesani on 19 Oct 2018
This does work and is simple code; however, in the long run, using this together with setfield will produce code that is considerably slower and possibly more difficult to restructure. If the performance is acceptable, this is a perfectly fine solution. It can also be combined with handle objects in some places for gradual improvement.
Sjouke Rinsma on 19 Oct 2018
Edited: Sjouke Rinsma on 19 Oct 2018
Yep, this is nice and intuitive, and for my application the most straightforward solution.
Since I don't know the depth of the struct beforehand, I will still use the recursive function to return the references to the structure data as text, split the strings, create a cell array, and that's it.
pathStr = 'xlsx.file2.tab';   % a variable named "string" would shadow the built-in string function
D = strsplit(pathStr, '.');   % {'xlsx','file2','tab'}
getfield(data, D{:})
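If the recursive function returns strings that start with the variable name (like 'data.xlsx.file1.tab1' in the question), the first element of the split has to be dropped before calling getfield. A minimal sketch, assuming a small example structure and a hypothetical cell array pathStrings standing in for the function's output:

```matlab
data = struct();
data.xlsx.file1.tab1 = 10;
data.xlsx.file1.tab2 = 20;
data.csv.file2 = 30;

% Hypothetical output of the recursive path function
pathStrings = {'data.xlsx.file1.tab1'; 'data.xlsx.file1.tab2'; 'data.csv.file2'};

values = cell(size(pathStrings));
for k = 1:numel(pathStrings)
    parts = strsplit(pathStrings{k}, '.');   % e.g. {'data','xlsx','file1','tab1'}
    parts = parts(2:end);                    % drop the leading variable name
    values{k} = getfield(data, parts{:});
end
```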
I do feel a little silly for not being aware of this set/get functionality for fields 8-)
Anyway, thank you both for responding!
EDIT:
@Stephen: I will also look into your suggestion using tables and see whether this is a fitting alternative.
@Philip: Some data files can indeed be quite large, so I'll keep your solution in mind in case performance becomes an issue. Thanks!


More Answers (1)

Philip Borghesani on 18 Oct 2018
I think this is where you went wrong: "To access specific data I use a recursive function to return the structure tree as strings like {'data.xlsx.file1.tab1'; 'data.xlsx.file1.tab2'; 'data.xlsx.file1.tab1'}."
Instead store your table objects as handle objects inside the structure data. Then have the recursive function return the handle(s) to the data object(s) in a cell array or object array. Access will then be fast and there will be no need for eval to read or modify the table objects.
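A user-defined handle class has to live in its own classdef file, so this self-contained sketch substitutes the built-in containers.Map, which is itself a handle class, to show the mechanism: the struct and the list returned by a search function hold references to the same object, so a change made through one is visible through the other, without eval or setfield. The key 'variable1' and the cell array handles are illustrative assumptions; in practice one would more likely write a small handle class that wraps the table.

```matlab
% Store a handle object (here a containers.Map) at the leaves of the nested struct
data = struct();
data.xlsx.file1.tab1 = containers.Map();
data.csv.file2 = containers.Map();

% A search function could return the handles themselves instead of path strings
handles = {data.xlsx.file1.tab1, data.csv.file2};

% Modify the object through the returned handle ...
h = handles{1};
h('variable1') = [1 2 3];

% ... and the change is visible through the struct, because both refer to one object
m = data.xlsx.file1.tab1;
m('variable1')   % returns [1 2 3]
```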
1 Comment
Sjouke Rinsma on 19 Oct 2018
"Instead store your table objects as handle objects inside the structure data."
Phew, that's a brain teaser. Okay, so this sounds like a nice solution; it never crossed my mind that I could create handles to the structure data. I found one example in the forum:
Is the accepted answer what you are referring to? If so, let me check whether I understand this correctly: I would need to create short names (or 'pointers', if you will) such as 'XLSX_Fx_Px'. Then I would pass, for example, 'data.xlsx.file1.tab1' to the hstruct class and assign the handle to it like this:
data.xlsx.file1.tab1 = hstruct(data.xlsx.file1.tab1);
XLSX_F1_P1 = data.xlsx.file1.tab1;
I would use the recursive function to apply this to all structure data and collect all short names in a cell array. The assignment in the first line does add a new layer, resulting in 'data.xlsx.file1.tab1.DATA' (I wrote the hstruct property obj.DATA in caps to avoid confusion). I can definitely see the ease of use, since I can now access and modify the same data using 'XLSX_F1_P1.DATA'. Still, I'm wondering whether this is exactly what you mean, since it does involve reorganizing my original structure.


Categories

Find more on Structures in Help Center and File Exchange

Version

R2017a
