How can I reliably extract individual filenames when the names are often very similar?

Florian on 18 Apr 2023
Commented: dpb on 19 Apr 2023
Hi experts,
I have a table with a column ('Cell_Id') that holds an identifying number. This number refers to entries in my personal notes as well as to a corresponding file (e.g. "Cell_Id" = "55" corresponds to the file "Data55.smrx"). For further processing, I want to extract the full path of the file that corresponds to the "Cell_Id" so I can feed it into a function.
The issue is that for each "Cell_Id" there are multiple conditions ("Baseline", "PD + Sound", "PD + Pupil") with respective files ("Data55.smrx", "Data55-sound.smrx", "Data55-pd.smrx").
The way I do it now is impractical and not very flexible, so I would like to ask whether there is a better, more reliable and more flexible way to extract the file path that matches each "Cell_Id" and its condition in the individual iterations. The conditions are entered by hand into the database, so I am trying to take spelling variability and errors out of the equation.
Thank you very much for your help!
Condition = {'Baseline' ;'PD + Sound' ; 'PD + Pupil';'Baseline'; 'PD + Sound' ; 'PD + Pupil'};
Cell_Id = {'55';'55';'55';'58';'58';'58'};
Folderpath = {'SomePath';'SomePath';'SomePath';'SomePath';'SomePath';'SomePath'};
data = table(Cell_Id,Condition,Folderpath);
for i = 1:size(data,1)
    if regexp(lower(data.Condition(i)),'baseline') %Looks for the DataX.smrx files
        filename = dir(fullfile(append(directory{2,1},"\",datafolderpath +"\","*",data.Cell_Id(i),".smrx")));
        input_path = append(filename.folder,'\',filename.name);
        some_func(input_path,data.Folderpath(i))
    elseif regexp(lower(data.Condition(i)), 'sound') %Looks for the DataX-sound.smrx files
        filename = dir(fullfile(append(directory{2,1},"\",datafolderpath +"\","*",data.Cell_Id(i),"*sound*.smrx")));
        input_path = append(filename.folder,'\',filename.name);
        some_func(input_path,data.Folderpath(i))
    elseif regexp(lower(data.Condition(i)), 'pupil') %Looks for the DataX-pd.smrx files
        filename = dir(fullfile(append(directory{2,1},"\",datafolderpath +"\","*",data.Cell_Id(i),"*pd*.smrx")));
        input_path = append(filename.folder,'\',filename.name);
        some_func(input_path,data.Folderpath(i))
    end
end

Accepted Answer

dpb on 18 Apr 2023
Edited: dpb on 18 Apr 2023
Putting metadata into the file names is a large part of the issue; but if the files are something other than data files that can simply store the conditions as another data field, then something of the sort is more or less necessary. If you are forced into that, the better way would be to build an actual database that holds all the metadata and stores the filename there once and for all; then retrieve whatever file you wish by selecting on the given parameters and return the filename directly.
However, you could create that database from the basics of the code below, and you could make it a little simpler since the number is (apparently) the unique element: get all the files for a number at once and then sort out the rest. Of course, if you were scrupulous in creating and using the names to meet the given description, then you don't need dir() to find them at all; you're just looking up what you already know to be the case. The exception is a misspelling in a name, in which case this may or may not work depending on what the mistake was; the search strings themselves may be the corrupted part. Only trial and error and debugging will tell you whether that is an issue, and you'll need to fix it outside the code that tries to use the names.
Before moving on to that idea, a syntax/usage comment --
filename = dir(fullfile(append(directory{2,1},"\",datafolderpath +"\","*",data.Cell_Id(i),".smrx")));
is a misuse of the intent/functionality of fullfile; it is designed specifically for the above case to eliminate string manipulation in building directory paths. I wish it had the ability to add the extension without that need as well, but it is what it is and so far there isn't another builtin that does any more/better. As written above, fullfile is completely superfluous; you've built the full string within the call to append, there's nothing left for fullfile to do. Instead, write the above as
d=dir(fullfile(directory{2,1},datafolderpath,"*"+data.Cell_Id{i}+".smrx"));
In addition, in my thinking filename is not a good variable name for the output of dir(), it returns a directory structure array, not a filename at all. If successful, the struct will contain one or more file locations and names, granted, but not them directly. Anyways, that's kinda' a nit, but...
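To illustrate the point about fullfile and about what dir() returns, here is a minimal sketch. The scratch folder and the empty placeholder files are made up for the demonstration (the names mimic the ones in the question); only the fullfile/dir usage is the point.

```matlab
% Hypothetical demo folder with empty files named as in the question.
root = tempname;                      % assumption: a scratch folder, not the real data location
mkdir(root);
sample = {'Data55.smrx','Data55-sound.smrx','Data55-pd.smrx','Data58.smrx'};
for k = 1:numel(sample)
    fclose(fopen(fullfile(root, sample{k}), 'w'));   % create an empty placeholder file
end

cellId = "55";
% fullfile supplies the platform's file separator; only the wildcard pattern is built with +
d = dir(fullfile(root, "*" + cellId + "*.smrx"));

% d is a struct array with one element per match, not a filename
found = string({d.name});
paths = string({d.folder}) + filesep + found;
disp(paths.')
```

With these sample files the pattern matches the three "55" files, so d has three elements and the condition still has to be sorted out afterwards.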
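Condition = ["" "-sound" "-pd"];   % filename suffix per condition; "" is the baseline (DataNN.smrx)
Cell_Id = {'55';'58'};
Folderpath = {'SomePath';'SomePath'};
ext=".smrx";
for i=1:numel(Cell_Id)
  % all files for this number, whatever the condition
  d=dir(fullfile(directory{2,1},Folderpath{i},"*" + Cell_Id{i} + "*" + ext));
  names=string({d.name});          % cell of names, not one concatenated char vector
  for C=Condition
    % the name must end with number + suffix + extension
    ix=endsWith(names,Cell_Id{i}+C+ext,'IgnoreCase',true);
    if nnz(ix)==1                  % skip missing or ambiguous matches
      some_func(fullfile(d(ix).folder,d(ix).name),Folderpath(i));
    end
  end
end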
NOTA BENE: The match patterns are based on your sample filenames above; the "baseline" case is simply 'DataNN.smrx', so it is the only one in which the extension follows the number directly, and that is what its search string matches.
It's not at all clear why there's a need for Folderpath at all; so just carried it along for the ride...
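Since the conditions are typed by hand, one way to take some of the spelling variability out is to reduce each entry to the filename suffix it stands for before building the name. A minimal sketch, assuming the keyword-to-suffix mapping used in the question's own loop ("sound" goes with the -sound file, "pupil" with the -pd file); the variant spellings below are made up:

```matlab
% Hand-typed conditions, including hypothetical spelling variants
Condition = ["Baseline" "PD + Sound" "pd+sound" "PD + Pupil" "pd + pupil "];

suffix = strings(size(Condition));                      % "" is kept for the baseline
suffix(contains(Condition, "sound", 'IgnoreCase', true)) = "-sound";
suffix(contains(Condition, "pupil", 'IgnoreCase', true)) = "-pd";

% Build the expected file name directly from number + suffix
file = "Data" + "55" + suffix + ".smrx";
disp(file.')
```

This only absorbs differences in case and spacing; a real typo inside the keyword itself (a misspelled "sound", say) would silently fall through to the baseline name, so it may be worth checking for unrecognized entries as well.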
2 Comments
Florian on 19 Apr 2023
Thank you very much for your help and advice!
I'm unfortunately not completely sure what you mean by "putting meta-data into the file names is a large part of the issue", though. But, given that these data-file namings kinda' have to be this way, I think I'll resort to putting the filenames into the table/database.
Anyways, thank you, again, very much for the thorough and detailed answer!
ps: The content of "Folderpath" is not important for the question, I added the column only for completeness, because the column is called in the loop by the function some_func.
dpb on 19 Apr 2023
The various pieces of the filename are metadata: not the actual data recorded, but still data that identifies the conditions/type/etc. of the data in the file. As you are seeing, it's harder to deal with when it's all combined into a string (and for the base case the information is actually the absence of a tag, rather than the naming convention being symmetric), and even more so when you have to parse it from the filename itself.
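As a rough sketch of what that parsing looks like, the snippet below pulls the number and the condition tag back out of the sample names from the question. The regular expression and the label "baseline" for the untagged case are assumptions based only on those three sample names.

```matlab
% Sample names copied from the question; the pattern is an assumption
names = {'Data55.smrx','Data55-sound.smrx','Data55-pd.smrx','Data58.smrx'};

n = numel(names);
Cell_Id = strings(n,1);
Tag     = strings(n,1);
for k = 1:n
    % id = the digits, tag = whatever follows the optional hyphen (may be empty)
    t = regexp(names{k}, '^Data(?<id>\d+)-?(?<tag>\w*)\.smrx$', 'names', 'once');
    Cell_Id(k) = string(t.id);
    Tag(k)     = string(t.tag);
end
Tag(Tag == "") = "baseline";      % the missing tag is itself the information

File = string(names(:));
parsed = table(Cell_Id, Tag, File);
disp(parsed)
```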
The solution in the case of a "regular" data file is to add fields for the Case_ID, student number, etc., etc., etc., in the file itself; then you read the file and select records based on the values of the particular features desired.
When it is a special-format file of some sort, so you can't actually store the metadata in the file itself, then create a master "database" file of all the ancillary features and store the file name/location in it. Then just read that file and look up the actual files of interest from it to get to the data.
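A minimal sketch of such a master index, kept as a MATLAB table; the rows are just the sample values from the question, and the variable names index, pick and file are made up for illustration:

```matlab
% Hypothetical master index: one row per file, filled in once
Cell_Id   = ["55";"55";"55";"58"];
Condition = categorical(["Baseline";"PD + Sound";"PD + Pupil";"Baseline"]);
File      = ["Data55.smrx";"Data55-sound.smrx";"Data55-pd.smrx";"Data58.smrx"];
index = table(Cell_Id, Condition, File);

% Look the file up by its metadata instead of searching the disk
pick = index.Cell_Id == "55" & index.Condition == 'PD + Sound';
file = index.File(pick);
disp(file)
```

Storing Condition as a categorical means the allowed spellings are listed once in the index rather than retyped per entry. Such a table could be saved with writetable and reloaded with readtable, though readtable may import Cell_Id as numeric unless the import options say otherwise.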
