
How should I fix my regular expression to parse this txt file?

Munho Noh on 18 Apr 2024
Commented: Munho Noh on 19 Apr 2024
This is part of my code that reads the attached text file and, using regular expressions, extracts the file name between 'subsystems.tbl/' and '.sub' for a given 'sub_sys' (Major Role) and 'location' (Minor Role).
if ismember(sub_sys, {'spr', 'dpr', 'bum', 'reb'})
    block_pattern = ['\/([^\/]+)\.', sub_sys];
elseif strcmp(sub_sys, 'susp')  % strcmp: ismember on two char vectors compares character by character
    block_pattern = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : suspension','[\s\S]*?Minor Role : ', location, '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];
elseif ismember(sub_sys, {'steering', 'wheel'})
    block_pattern = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : ', sub_sys, '[\s\S]*?Minor Role : ', location, '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];
elseif strcmp(sub_sys, 'tir')
    block_pattern = ['PROPERTY_FILE\s*=\s*''[^'']+\/([^\/]+)\.tir'''];
end
name_tokens = regexp(file_content, block_pattern, 'tokens', 'once', 'dotexceptnewline');
It works for the front suspension system (susp, spr, dpr, bum, reb, steering, wheel, tir) and returns the correct paths, but for the rear suspension system my code returns rr_susp_path = 'AA_TCAR_WHEEL_RR_22inch' instead of rr_susp_path = 'AA_TCAR_SUSP_RR_RWS_230607'.
It seems that my regular expression is far too broad and that this is causing the problem. How should I fix my regular expression?

Accepted Answer

Stephen23 on 18 Apr 2024
Edited: Stephen23 on 18 Apr 2024
"It seems that my regular expression is way too broad and causing this problem."
There are several locations where your regular expression matches unlimited amounts of (almost) anything:
  • [^'']+
  • [^>]+
  • [\s\S]*
I doubt that you really want unlimited matches like that.
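To illustrate how such open-ended matches can land in the wrong block, here is a minimal sketch on made-up text. The block layout, the '<db>' prefix, and the file names SUSP_FRONT, WHEEL_REAR, and SUSP_REAR are assumptions inferred from the patterns in the question, not taken from the attached file. The lazy [\s\S]*? finds "Major Role : suspension" in the first block, then skips ahead to the first "Minor Role : rear" it can find, which belongs to the wheel block.

```matlab
% Made-up sample text (assumed layout, hypothetical file names).
txt = strjoin({ ...
    '$----SUBSYSTEM', ...
    '$ Major Role : suspension', ...
    '$ Minor Role : front', ...
    ' USAGE = ''<db>/subsystems.tbl/SUSP_FRONT.sub''', ...
    '$----SUBSYSTEM', ...
    '$ Major Role : wheel', ...
    '$ Minor Role : rear', ...
    ' USAGE = ''<db>/subsystems.tbl/WHEEL_REAR.sub''', ...
    '$----SUBSYSTEM', ...
    '$ Major Role : suspension', ...
    '$ Minor Role : rear', ...
    ' USAGE = ''<db>/subsystems.tbl/SUSP_REAR.sub'''}, newline);

% Same shape as the pattern in the question, for suspension + rear.
broad = ['\$-+\s*SUBSYSTEM[\s\S]*?Major Role : suspension', ...
         '[\s\S]*?Minor Role : rear', ...
         '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems\.tbl/([^'']+)\.sub'''];

tok = regexp(txt, broad, 'tokens', 'once', 'dotexceptnewline');
disp(tok{1})   % WHEEL_REAR: Major Role came from block 1, Minor Role from block 2
```

Nothing in the pattern forces the two role lines to sit in the same block, so the match is free to stitch pieces of different blocks together.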
"How should I fix my regular expression?"
Perhaps something like this:
pf1 = 'suspension';
pf2 = 'rear';
tmp = strcat('\$\s+',{'Major';'Minor'},'\s+Role\s+:\s+',{pf1;pf2},'\s+');
rgx = ['(?<=',tmp{:},'(\$.+\s+)*USAGE\s+=.+?)\w+\.sub']
rgx =
    '(?<=\$\s+Major\s+Role\s+:\s+suspension\s+\$\s+Minor\s+Role\s+:\s+rear\s+(\$.+\s+)*USAGE\s+=.+?)\w+\.sub'
str = fileread('test_example.txt');
out = regexp(str,rgx,'match','once','dotexceptnewline')
out =
    'AA_TCAR_SUSP_RR_RWS_230607.sub'
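A different technique that avoids long-range matching altogether is to split the text into blocks first and then test each block on its own. This is only a sketch, not the approach used above: it assumes every block starts with a '$----...SUBSYSTEM' separator line (as the pattern in the question suggests), and the sample text and file names are made up.

```matlab
% Made-up sample text (assumed layout, hypothetical file names).
txt = strjoin({ ...
    '$----SUBSYSTEM', ...
    '$ Major Role : suspension', ...
    '$ Minor Role : front', ...
    ' USAGE = ''<db>/subsystems.tbl/SUSP_FRONT.sub''', ...
    '$----SUBSYSTEM', ...
    '$ Major Role : wheel', ...
    '$ Minor Role : rear', ...
    ' USAGE = ''<db>/subsystems.tbl/WHEEL_REAR.sub''', ...
    '$----SUBSYSTEM', ...
    '$ Major Role : suspension', ...
    '$ Minor Role : rear', ...
    ' USAGE = ''<db>/subsystems.tbl/SUSP_REAR.sub'''}, newline);

% Cut the text at every separator line; each cell holds one block.
blocks = regexp(txt, '\$-+\s*SUBSYSTEM', 'split');

name = '';
for k = 1:numel(blocks)
    b = blocks{k};
    % Both role lines must be found inside the same block.
    isMajor = ~isempty(regexp(b, 'Major\s+Role\s*:\s*suspension\>', 'once'));
    isMinor = ~isempty(regexp(b, 'Minor\s+Role\s*:\s*rear\>', 'once'));
    if isMajor && isMinor
        % File name between 'subsystems.tbl/' and '.sub'.
        name = regexp(b, '(?<=/subsystems\.tbl/)\w+(?=\.sub)', 'match', 'once');
        break
    end
end
disp(name)   % SUSP_REAR
```

Because each test only ever sees one block, a role line from a neighbouring block cannot satisfy the match.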
1 Comment
Munho Noh on 19 Apr 2024
Hello Stephen, your answers are always helpful, thank you.
I modified your answer slightly, as follows, to capture only the file name without the .sub extension.
block_pattern = ['(?<=\$\s+Major\s+Role\s+:\s+', sub_sys, '\s+\$\s+Minor\s+Role\s+:\s+', location, '\s+(\$.+\s+)*USAGE\s+=.+\/)(\w+)(?=\.sub)'];
Thank you for your good advice.
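Since sub_sys and location are inserted into the pattern from variables, it may be worth escaping them so that any regular-expression metacharacters they contain are matched literally. The sketch below does this with regexptranslate('escape', ...) and uses 'match' so that the lookbehind and lookahead leave only the file name in the result. The sample text, the '<db>' prefix, and the file names are made up, and the non-capturing group (?:...) is my own substitution for the capturing group in the pattern above.

```matlab
% Made-up sample text (assumed layout, hypothetical file names).
txt = strjoin({ ...
    '$----SUBSYSTEM', ...
    '$ Major Role : wheel', ...
    '$ Minor Role : rear', ...
    ' USAGE = ''<db>/subsystems.tbl/WHEEL_REAR.sub''', ...
    '$----SUBSYSTEM', ...
    '$ Major Role : suspension', ...
    '$ Minor Role : rear', ...
    ' USAGE = ''<db>/subsystems.tbl/SUSP_REAR.sub'''}, newline);

sub_sys  = 'suspension';
location = 'rear';

% Escape the inserted text so characters like '.' or '+' are taken literally.
esc = @(s) regexptranslate('escape', s);

block_pattern = ['(?<=\$\s+Major\s+Role\s+:\s+', esc(sub_sys), ...
                 '\s+\$\s+Minor\s+Role\s+:\s+', esc(location), ...
                 '\s+(?:\$.+\s+)*USAGE\s+=.+/)\w+(?=\.sub)'];

% With 'match', the lookarounds are not part of the returned text.
name = regexp(txt, block_pattern, 'match', 'once', 'dotexceptnewline');
disp(name)   % SUSP_REAR
```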
