Fastest way to search files by pattern name
I have a main folder with a lot of subfolders (thousands). I want to load files from only specific subfolders, which can be identified by a pattern in the subfolder name. Each of those subfolders in turn contains tens of sub-subfolders, of which I again need only specific ones, also identified by a pattern in the name. To extract the needed files, I have implemented two approaches using the dir function: 1) a single call, passing the whole path with wildcards for the subfolders and sub-subfolders; 2) first listing all matching subfolders, then searching for the sub-subfolders in a for loop over them. It turns out that the latter is much faster. Could you explain why?
% first way: one call with wildcards at both folder levels
files = dir(fullfile(main_folder,'*_data/*_file_to_load/file1.mat'));
% second way: list the matching subfolders, then search inside each one
subfolders = dir(fullfile(main_folder,'*_data/'));
files = cell(1,numel(subfolders));
for i = 1:numel(subfolders)
    files{i} = dir(fullfile(subfolders(i).folder,subfolders(i).name,'*_file_to_load/file1.mat'));
end
6 comments
Image Analyst
16 Apr 2023
@Anton Baranikov did you overlook the Answer below in the official Answer section of the page? Did you only see the comments up here at the top where people are not giving answers but are asking for clarification of the question? If you saw my Answer below, then explain why it doesn't work, or let me know that it did work.
Accepted Answer
dpb
17 Apr 2023
Edited: dpb
17 Apr 2023
As for the original question, it comes down to how the underlying OS processes the dir request. When you ask for a listing through a chain of subdirectories from a higher level, those directories aren't necessarily stored on disk in the order in which they appear, so dir has to traverse the directory structure from the top all the way to the bottom. It also doesn't know where the matches may stop, so it has to examine everything possibly reachable from the topmost location.
In the second case, you're giving it a starting point underneath the specific folder, and the chain from there to the bottom is undoubtedly only one level deep. It's just not doing nearly as much work in the second case as it must in the first.
The fastest way will be to keep each search as shallow as your a priori knowledge of the structure allows. Several shallow searches will virtually always beat one deep one.
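As a rough illustration of searching one level at a time, here is a minimal sketch. It assumes the two-level layout and name patterns from the question; the folder names in the throwaway test tree (a_data, b_other, x_file_to_load) are made up, and isfile requires R2017b or later.

```matlab
% Throwaway tree so the sketch runs anywhere; all folder names are hypothetical.
main_folder = tempname;
mkdir(fullfile(main_folder, 'a_data', 'x_file_to_load'));
mkdir(fullfile(main_folder, 'a_data', 'unrelated'));
mkdir(fullfile(main_folder, 'b_other', 'x_file_to_load'));
v = 1;
save(fullfile(main_folder, 'a_data', 'x_file_to_load', 'file1.mat'), 'v');
save(fullfile(main_folder, 'b_other', 'x_file_to_load', 'file1.mat'), 'v');

% Level 1: list only the immediate children of main_folder that match.
level1 = dir(fullfile(main_folder, '*_data'));
level1 = level1([level1.isdir]);              % keep folders, drop stray files

found = {};                                   % full paths of matching files
for i = 1:numel(level1)
    % Level 2: one shallow listing per matching subfolder.
    base   = fullfile(level1(i).folder, level1(i).name);
    level2 = dir(fullfile(base, '*_file_to_load'));
    level2 = level2([level2.isdir]);
    for j = 1:numel(level2)
        % Level 3: the file name is known, so test for it directly.
        candidate = fullfile(level2(j).folder, level2(j).name, 'file1.mat');
        if isfile(candidate)
            found{end+1} = candidate;         %#ok<AGROW>
        end
    end
end

rmdir(main_folder, 's');                      % remove the throwaway tree
```

On the real folder tree, wrapping each variant in tic/toc is the simplest way to check which one is actually faster on a given file system.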
2 comments
dpb
17 Apr 2023
You'll trade some coding complexity and thinking about the actual data structure for better performance this way. The one-time investment may well pay off in the long run if it's a case that will occur often, particularly if you can also automate the generation of the order structure programmatically.
More Answers (1)
Image Analyst
16 Apr 2023
Use contains to check whether the pattern is in the folder or file name. Process the ones you want, and skip the ones you don't want by calling continue:
if contains(thisSubFolderName, 'patternIDoNotWant')
    continue % Skip to bottom of for loop
end
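For context, the fragment above sits inside a loop over the subfolder names. A minimal sketch follows; the list of names and the 'backup' pattern are hypothetical stand-ins for what dir would return on the real tree.

```matlab
% Hypothetical subfolder names standing in for the output of dir.
subFolderNames = {'run1_data', 'run2_backup', 'run3_data'};

kept = {};
for k = 1:numel(subFolderNames)
    thisSubFolderName = subFolderNames{k};
    if contains(thisSubFolderName, 'backup')   % pattern we do not want
        continue                               % skip the rest of this iteration
    end
    kept{end+1} = thisSubFolderName;           %#ok<AGROW>
end

% Same filter without a loop: contains also accepts a cell array of names.
keptVectorized = subFolderNames(~contains(subFolderNames, 'backup'));
```

The loop form is convenient when each kept folder is processed on the spot; the vectorized form is shorter when only the filtered list is needed.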
4 comments
dpb
17 Apr 2023
Edited: dpb
17 Apr 2023
Actually, contains (and friends) work the same way...
if contains(thisSubFolderName, 'patternIWant1') || contains(thisSubFolderName, 'patternIWant2') || contains(thisSubFolderName, 'patternIWant3')
could be written as
if contains(thisSubFolderName, {'patternIWant1','patternIWant2','patternIWant3'})
You have to be careful with contains, however, that it is the comparison you want, because it matches any substring within the searched string.
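To make the substring caveat concrete, here is a small sketch with made-up folder names. It compares contains with endsWith, one of the "friends" that anchors the match to the end of the name, and shows that a cell array of patterns acts as a logical OR.

```matlab
% Hypothetical folder names; only the first one really ends in '_data'.
names = {'run1_data', 'data_old', 'metadata_run'};

% contains matches the text anywhere in the name, so all three hit.
anywhere = contains(names, 'data');

% endsWith anchors the match to the end of the name, so only the first hits.
atEnd = endsWith(names, '_data');

% A cell array of patterns is true when any one of them is found.
eitherOne = contains(names, {'run1', 'old'});
```

If the pattern has to sit at the start of the name instead, startsWith works the same way.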