Regexp expression to handle changing format

jimmy zubiate
jimmy zubiate on 6 Mar 2022
Commented: jimmy zubiate on 9 Mar 2022
%dummy data
% t,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
% t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501
S=fileread(filename);
myexpression = ['(?<tvar>w*,'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\w*\.*\w*),'...
'(?<HNL>\w*),'...
'(?<codeTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*,'... % <== This line handles the first line of dummy data
'(?<caprTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*\s*\d*,'... % <== This line handles the first line of dummy data
'(?<logAt>\w*\.*\w*']
parts = regexp(filtered,myexpression,'names')
The third-to-last and second-to-last variables (codeTm, caprTm) change format within the data. How can I modify the expression, or add logic, to accept 2 to 3 space-separated values within the variable "codeTm" and 3 to 4 space-separated values within the variable "caprTm"?
2 space-separated values: (000 00:00:00.00)
3 space-separated values: (000 00 00:00:00.00) or (343 19:54:20.684 8)
4 space-separated values: (21 343 19:54:20.684 8)
Thank you for the help. My apologies for making my expression so complicated. I am still learning the ins and outs of regexp expression formats for reading data.
2 comments
Stephen23
Stephen23 on 7 Mar 2022
It is not clear why you are using regular expressions to import this data: READTABLE et al. have options for handling missing field data. Have you considered using the inbuilt data importing functions?
jimmy zubiate
jimmy zubiate on 9 Mar 2022
I am in the process of learning MATLAB. I pursued the regexp function to create a structure array that I could work through to perform the analysis I need.
What I think I should pursue is prepping the file to remove unwanted white space, headers and other non-useful data, then importing it as a comma space delimited file. Then I can count the items inside each variable, marked by spaces, and move on to the next step.
The other option is to pursue the fgetl function and implement logic to read the useful data gracefully. I'm attaching dummy test data for your viewing. Thanks.


Answers (1)

Stephen23
Stephen23 on 7 Mar 2022
Edited: Stephen23 on 7 Mar 2022
You can easily make a group optional or occur a specific number of times using any suitable quantifier, for example:
(..)? % zero or one time
(..)* % zero or more times
(..){2,4} % two to four times
etc.
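As a rough illustration of the quantifier approach, the sketch below checks the sample field values from the question against patterns that allow one or two leading "digits plus space" groups. The pattern names (codePat, caprPat) and the assumption that every item is purely numeric are mine, not something the question confirms, so treat it as a starting point rather than a finished solution.

```matlab
% Sample field values copied from the dummy data in the question.
codeTm = {'000 00:00:00.00', '000 00 00:00:00.00'};
caprTm = {'343 19:54:20.684 8', '21 343 19:54:20.684 8'};

% Hypothetical patterns (assumption: every item is made of digits only).
% One or two leading "digits + space" groups, then hh:mm:ss.fraction.
codePat = '^(\d+ ){1,2}\d+:\d+:\d+\.\d+$';
% Same idea, plus one trailing " digits" item after the time.
caprPat = '^(\d+ ){1,2}\d+:\d+:\d+\.\d+ \d+$';

% With 'match','once' a non-matching input gives an empty char,
% so ~isempty tells us which values fit the pattern.
codeOk = ~cellfun(@isempty, regexp(codeTm, codePat, 'match', 'once'));
caprOk = ~cellfun(@isempty, regexp(caprTm, caprPat, 'match', 'once'));

% A value with no leading group at all is rejected by {1,2}.
noLead = regexp('00:00:00.00', codePat, 'match', 'once');
```

Changing {1,2} to a different range widens or narrows how many leading items are accepted, which is the main reason to prefer a quantifier over writing each layout out by hand.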
However, rather than trying to match specific groups of characters I would use a simpler approach of matching sets of characters. I had to fix several other bugs in your regular expression to get this working, mostly missing backslashes and parentheses.
str = fileread('test.txt')
str =
    't,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
     t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501'
rgx = ['^\s*(?<tvar>\w*),'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\d*\.?\d*),'...
'(?<HNL>\w*),'...
'(?<codeTm>[ :\w\.]*),'...
'(?<caprTm>[ :\w\.]*),'...
'(?<logAt>\d*\.?\d*)'];
parts = regexp(str,rgx,'names','lineanchors')
parts = 1×2 struct array with fields:
    tvar
    tmCodeRdr
    tmCodLvl
    HNL
    codeTm
    caprTm
    logAt
parts.codeTm
ans = '000 00:00:00.00'
ans = '000 00 00:00:00.00'
But personally I would not try to reinvent the wheel for such a data file; READTABLE is much simpler:
tbl = readtable('test.txt','delimiter',',');
tbl.Properties.VariableNames = {'tvar','tmCodeRdr','tmCodLv','HNL','codeTm','caprTm','logAt'}
tbl = 2×7 table
    tvar             tmCodeRdr            tmCodLv     HNL             codeTm                     caprTm               logAt 
    _____    _________________________    _______    _____    ______________________    _________________________    ______
    {'t'}    {'00000000CIB0000004001'}     0.47      {'L'}    {'000 00:00:00.00'   }    {'343 19:54:20.684 8'   }    22.501
    {'t'}    {'00000000CIB0000004001'}     0.47      {'L'}    {'000 00 00:00:00.00'}    {'21 343 19:54:20.684 8'}    22.501
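Once the two time fields are imported as text, counting the space-separated items in each row is a small follow-up step. The sketch below uses literal values in place of tbl.codeTm and tbl.caprTm so that it runs on its own; the helper name countItems is hypothetical, and it assumes the columns are cell arrays of character vectors, as in the table display above.

```matlab
% Literal stand-ins for tbl.codeTm and tbl.caprTm (cell arrays of char).
codeTm = {'000 00:00:00.00'; '000 00 00:00:00.00'};
caprTm = {'343 19:54:20.684 8'; '21 343 19:54:20.684 8'};

% Hypothetical helper: number of space-separated items in each row.
% strtrim guards against leading/trailing padding; strsplit collapses
% repeated spaces by default.
countItems = @(c) cellfun(@(s) numel(strsplit(strtrim(s), ' ')), c);

nCode = countItems(codeTm);   % [2; 3] for the sample rows
nCapr = countItems(caprTm);   % [3; 4] for the sample rows
```

The counts can then be used to branch on the layout of each row, for example to decide which items hold the day number and which hold the time.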
1 comment
jimmy zubiate
jimmy zubiate on 9 Mar 2022
That should work. Let me try to implement on my side and see what I get. Thanks Stephen!

