Extracting numbers after a particular string from a cell array

data={'333', 'AS C37 2021 03 28 00 05 30.000000 1 -0.884071511631E-03','abvc','400 55 a','AS G17 2021 3 28 0 17 30.000000 1 0.416843065644E-03'};
For example, in the cell array above, how can I extract all YYYY MM DD HH MM SS values (2021 03 28 00 05 30.00 and 2021 3 28 0 17 30.0)?
The relevant YYYY MM DD HH MM SS values always come after AS [A-Z][0-9][0-9] (for example, AS C37 and AS G17). Can code be written to extract these values following this rule? The original data cell array is 1x400000, so speed is also an important factor.

6 Comments

Firstly, from whence cometh the cell array?
If it's being read from a file, it looks as though it could be parsed as a fixed-field-width file on input, which would save a step and probably be faster overall.
The cell array (data) comes from the following code:
[FileName,pathname,d] = uigetfile('*.clk');
full_file_name = fullfile(pathname,FileName);
Str = fileread(full_file_name);
data = strsplit(Str, '\n');
I figured as much.
Attach the file so we can see how to read it directly, or, if it is extremely long, a representative subset.
Please move Answer to Comment. -- dpb
OK, here's a part of the file -- it's nice and regular within sections so parsing won't be any real problem -- but, what, specifically, do you want/need from the file?
3.04 C M RINEX VERSION / TYPE
CCRNXC V5.3 AIUB 10-APR-21 11:26 PGM / RUN BY / DATE
Center for Orbit Determination in Europe (CODE) COMMENT
MGEX clock information for day 2021-087 COMMENT
Consistent to the middle day of the 3-day long-arc solution COMMENT
Clock information consistent with phase and C1W/C2W code data COMMENT
Satellite/receiver clock values at intervals of 30/300 sec COMMENT
High-rate (30 sec) clock interpolation based on phase data COMMENT
Product reference: DOI 10.7892/boris.75882.3 COMMENT
GPS TIME SYSTEM ID
18 LEAP SECONDS GNSS
C GPSEST V5.3 IGS14 SYS / PCVS APPLIED
E GPSEST V5.3 IGS14 SYS / PCVS APPLIED
G GPSEST V5.3 IGS14 SYS / PCVS APPLIED
J GPSEST V5.3 IGS14 SYS / PCVS APPLIED
R GPSEST V5.3 IGS14 SYS / PCVS APPLIED
C GPSEST V5.3 CODE.OSB @ ftp.aiub.unibe.ch/CODE/ SYS / DCBS APPLIED
E GPSEST V5.3 CODE.OSB @ ftp.aiub.unibe.ch/CODE/ SYS / DCBS APPLIED
G GPSEST V5.3 CODE.OSB @ ftp.aiub.unibe.ch/CODE/ SYS / DCBS APPLIED
J GPSEST V5.3 CODE.OSB @ ftp.aiub.unibe.ch/CODE/ SYS / DCBS APPLIED
R GPSEST V5.3 CODE.OSB @ ftp.aiub.unibe.ch/CODE/ SYS / DCBS APPLIED
2 AR AS # / TYPES OF DATA
COM CODE MGEX ANALYSIS CENTER
1 # OF CLK REF
BADG00RUS 12338M002 0.000000000000E+00 ANALYSIS CLK REF
134 IGb14 # OF SOLN STA / TRF
BADG00RUS 12338M002 -838282106 3865777325 4987624574SOLN STA NAME / NUM
ABMF00GLP 97103M001 2919785797 -5383744943 1774604878SOLN STA NAME / NUM
AJAC00FRA 10077M005 4696989194 723994777 4239678729SOLN STA NAME / NUM
ALIC00AUS 50137M001 -4052052783 4212835969 -2545104517SOLN STA NAME / NUM
AMU200ATA 66040M002 57569 -201376 -6359569064SOLN STA NAME / NUM
ANKR00TUR 20805M002 4121948390 2652187845 4069023877SOLN STA NAME / NUM
AREG00PER 42202M008 1942816431 -5804077156 -1796884336SOLN STA NAME / NUM
ASCG00SHN 30602M004 6121151566 -1563978954 -872615291SOLN STA NAME / NUM
ASPA00USA 50503S006 -6100260188 -996502539 -1567977179SOLN STA NAME / NUM
AUCK00NZL 50209M001 -5105681573 461563996 -3782180963SOLN STA NAME / NUM
There may well be (probably is, no undoubtedly is) code to read these files available -- they might already have a MATLAB routine, even. Have you looked for what routines are available?
I just want to extract all dates YYYY MM DD HH MM SS (such as 2021 03 28 00 05 30.000000) from this cell array.


Accepted Answer

dpb
dpb on 2 Jul 2021
Edited: dpb on 2 Jul 2021
Oh. I see I didn't look far enough down the file -- the header stuff ends at record 170; the other data starts at record 171.
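% Skip the 170 header records and read the clock records without variable names
tCOD=readtable('COD0MGXFIN_20210870000_01D_30S_CLK.clk','FileType','text', ...
               'HeaderLines',170,'ReadVariableNames',false);
% Name the date/time columns, then combine them into a single datetime variable
tCOD.Properties.VariableNames(3:8)={'Yr','Mn','Day','Hr','Min','Sec'};
tCOD.DateTime=datetime(tCOD{:,{'Yr','Mn','Day','Hr','Min','Sec'}});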
leaves you with
>> [head(tCOD);tail(tCOD)]
ans =
16×12 table
Var1      Var2             Yr      Mn    Day    Hr    Min    Sec    Var9    Var10          Var11         DateTime
______    _____________    ____    __    ___    __    ___    ___    ____    ___________    __________    ____________________
{'AR'}    {'BADG00RUS'}    2021    3     28     0     0      0      2       0.00044149     3.7396e-11    28-Mar-2021 00:00:00
{'AR'}    {'ABMF00GLP'}    2021    3     28     0     0      0      2       -0.00024309    3.739e-11     28-Mar-2021 00:00:00
{'AR'}    {'AJAC00FRA'}    2021    3     28     0     0      0      2       -0.00038427    3.7166e-11    28-Mar-2021 00:00:00
{'AR'}    {'ALIC00AUS'}    2021    3     28     0     0      0      2       -2.4277e-09    3.7381e-11    28-Mar-2021 00:00:00
{'AR'}    {'AMU200ATA'}    2021    3     28     0     0      0      2       -2.9659e-08    3.7474e-11    28-Mar-2021 00:00:00
{'AR'}    {'ANKR00TUR'}    2021    3     28     0     0      0      2       1.9425e-08     3.7349e-11    28-Mar-2021 00:00:00
{'AR'}    {'AREG00PER'}    2021    3     28     0     0      0      2       0.00046999     3.7485e-11    28-Mar-2021 00:00:00
{'AR'}    {'ASCG00SHN'}    2021    3     28     0     0      0      2       -3.5686e-08    3.7378e-11    28-Mar-2021 00:00:00
{'AS'}    {'R16'      }    2021    3     28     23    59     30     1       -1.3127e-05    NaN           28-Mar-2021 23:59:30
{'AS'}    {'R17'      }    2021    3     28     23    59     30     1       0.00041179     NaN           28-Mar-2021 23:59:30
{'AS'}    {'R18'      }    2021    3     28     23    59     30     1       7.1344e-05     NaN           28-Mar-2021 23:59:30
{'AS'}    {'R19'      }    2021    3     28     23    59     30     1       -0.00013759    NaN           28-Mar-2021 23:59:30
{'AS'}    {'R20'      }    2021    3     28     23    59     30     1       -4.6221e-05    NaN           28-Mar-2021 23:59:30
{'AS'}    {'R21'      }    2021    3     28     23    59     30     1       -0.00019777    NaN           28-Mar-2021 23:59:30
{'AS'}    {'R22'      }    2021    3     28     23    59     30     1       -0.00010502    NaN           28-Mar-2021 23:59:30
{'AS'}    {'R24'      }    2021    3     28     23    59     30     1       3.6747e-05     NaN           28-Mar-2021 23:59:30
>>
The AS records have only two (2) variables past the time field instead of the three (3) in the AR records, hence the NaN elements for Var11 in those rows.
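Since the question only asks for the times on the AS records, the table can be filtered on its first variable. A minimal sketch, assuming the default names Var1 and Var2 shown above; tDemo is a hypothetical three-row stand-in for tCOD so the snippet runs without the file:

```matlab
% Hypothetical stand-in for tCOD, with values copied from the rows shown above
Var1 = {'AR'; 'AS'; 'AS'};
Var2 = {'BADG00RUS'; 'R16'; 'R17'};
DateTime = datetime([2021 3 28 0 0 0; 2021 3 28 23 59 30; 2021 3 28 23 59 30]);
tDemo = table(Var1, Var2, DateTime);

% Logical mask of the satellite clock (AS) records
isAS = strcmp(tDemo.Var1, 'AS');

% Times of the AS records only
asTimes = tDemo.DateTime(isAS);
disp(asTimes)
```

With the real table the same two lines would apply with tCOD in place of tDemo.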
You can either scan the file for the location of the "END OF HEADER" record to find the number of header lines to skip, or there probably is sufficient data within the file header to compute where that is -- although if the COMMENTS are freeform, there may not be a fixed number of records there, so it may just take scanning the file first...
Either way, this is much simpler and more straightforward than trying to parse the cell array...that's fraught with difficulty in comparison.
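The scan for the "END OF HEADER" record might look something like the following. This is only a sketch: the temporary file and its three lines are made-up stand-ins so the snippet runs on its own, and nHeader is a name introduced here for illustration.

```matlab
% Write a tiny stand-in file (hypothetical content, not the real header)
fname = [tempname '.clk'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '     3.04           C                   M   RINEX VERSION / TYPE', ...
    '                                                            END OF HEADER', ...
    'AS C37  2021 03 28 00 05 30.000000  1   -0.884071511631E-03');
fclose(fid);

% Count records up to and including the END OF HEADER line
fid = fopen(fname, 'r');
nHeader = 0;
found = false;
while ~found
    tline = fgetl(fid);
    if ~ischar(tline), break, end          % reached end of file
    nHeader = nHeader + 1;
    found = contains(tline, 'END OF HEADER');
end
fclose(fid);
delete(fname);

if ~found
    error('END OF HEADER record not found');
end
disp(nHeader)
```

nHeader could then be passed as the 'HeaderLines' value in the readtable call instead of a hard-coded count.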
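For comparison, if the cell array is already in memory, the rule from the question (AS, then [A-Z][0-9][0-9], then the six date fields) could be applied with regexp. A minimal sketch using the sample data from the question; the variable names expr, tok, dv and t are introduced here, and it has not been timed on the full 1x400000 array:

```matlab
data = {'333', 'AS C37 2021 03 28 00 05 30.000000 1 -0.884071511631E-03', ...
        'abvc', '400 55 a', ...
        'AS G17 2021 3 28 0 17 30.000000 1 0.416843065644E-03'};

% 'AS', a satellite ID (one letter plus two digits), then six numeric fields
expr = '^AS\s+[A-Z]\d{2}\s+(\d{4})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)';

% One call over the whole cell array; elements without a match return empty
tok = regexp(data, expr, 'tokens', 'once');
tok = tok(~cellfun(@isempty, tok));

% Stack the 1x6 token cells and convert to an N-by-6 numeric date matrix
dv = str2double(vertcat(tok{:}));
t  = datetime(dv);
disp(t)
```

The pattern is anchored at the start of each element, so lines such as '400 55 a' or the AR records are skipped rather than partially matched.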
