Using Textscan to read Sinex file rows with variable delimiters

Simon Hunter on 31 Dec 2021
Edited: Stephen23 on 31 Dec 2021
Hi Everyone,
I'm having trouble reading a block of text from a SINEX file. The general delimiter seems to be a space, but in two of the columns the delimiter varies.
+SITE/ID
*CODE PT __DOMES__ T _STATION DESCRIPTION__ APPROX_LON_ APPROX_LAT_ _APP_H_
7835 A 10002S001 L Grasse, France 6 55 16.0 43 45 16.8 1322.8
7845 A 10002S002 L Grasse CDP7845 6 55 17.6 43 45 16.6 1323.3
7848 A 10077M002 L Ajaccio 8 45 45.7 41 55 38.6 97.3
7805 A 10503S001 L Metsahovi, Finland 24 23 40.3 60 13 02.3 78.2
7806 A 10503S014 L Metsahovi, Finland 24 23 40.3 60 13 01.7 74.0
7839 A 11001S002 L Graz, fixed 15 29 36.0 47 04 01.6 539.4
.
.
.
98 lines total
-SITE/ID
*
For example, the station_description column has a comma between 'Grasse' and 'France' in the first row, a space between 'Grasse' and 'CDP7845' in the second row, and the third row's description ('Ajaccio') is a single word that ends on a delimiter. I have tried %q, CollectOutput and MultipleDelimsAsOne, with the delimiters ' ' and {' ',','}, but textscan either stops at the first delimiter on line 1, before 'France', or skips the word 'Ajaccio' completely. The next issue is then reading the long/lat as one float effectively and I'd like to end up with two columns, one for the lat and one for long, so I can use them for plotting later.
The code I am currently using is below. positions(2)-positions(1) is the end position minus the start position returned by my searcher, which finds and points to the block within the SINEX file. That part works fine.
Code:
location_import = textscan(fileID(1),'%d %c %q %c %s %s',...
positions(2)-positions(1),'Delimiter',' ','MultipleDelimsAsOne',true);
Current Output:
79x1 int32 79x1 char 79x1 cell 79x1 char 79x1 cell 79x1 cell
7835 'A' 1002S001 'L' Grasse, France
6 '5' 5 '1' 6.0 43
45 '1' 6.8 '1' 322.8
7845 'A' 10002S002 'L' Grasse CDP7845
6 '5' 5 '1' 7.6 43
45 '1' 6.6 '1' 323.3
7848 'A' 10077M002 'L' Ajaccio 8
I just can't figure out the textscan command I need to get this to work, especially given the formatting changes in the station_description column. Any help would be massively appreciated so I can learn from this.
Thanks very much!

Accepted Answer

Stephen23 on 31 Dec 2021
Edited: Stephen23 on 31 Dec 2021
This is a fixed-width file: the space characters inside the location names are a strong indication of this. The underscores in the header also seem to mark the field widths. You could probably use READTABLE's fixed-width options to import it.
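A minimal sketch of the READTABLE route, assuming the field widths implied by the header underscores (each field preceded by one space, giving widths 5, 3, 10, 2, 23, 4, 3, 5, 4, 3, 5, 8). The variable names and the two-row sample file written here are made up for illustration; check the widths against the real file before relying on them.

```matlab
% Write two sample rows in the ASSUMED fixed-width layout (illustration only).
rowfmt = ' %4s %2s %9s %1s %-22s %3d %2d %4.1f %3d %2d %4.1f %7.1f\n';
fname = [tempname '.txt'];
fid = fopen(fname,'wt');
fprintf(fid,rowfmt,'7835','A','10002S001','L','Grasse, France',6,55,16.0,43,45,16.8,1322.8);
fprintf(fid,rowfmt,'7848','A','10077M002','L','Ajaccio',8,45,45.7,41,55,38.6,97.3);
fclose(fid);

% Hypothetical variable names; the widths are an assumption read off the header line.
names = {'Code','PT','Domes','Tech','Description', ...
         'LonDeg','LonMin','LonSec','LatDeg','LatMin','LatSec','Height'};
types = {'double','string','string','string','string', ...
         'double','double','double','double','double','double','double'};
opts = fixedWidthImportOptions('NumVariables',12, ...
    'DataLines',[1 Inf], ...
    'VariableNames',names, ...
    'VariableWidths',[5 3 10 2 23 4 3 5 4 3 5 8], ...
    'VariableTypes',types);

T = readtable(fname,opts);   % one table row per data line
delete(fname);
```

Because the description is cut out by position rather than by delimiter, commas and spaces inside it do not matter. For a real SINEX file the block would first have to be isolated, or DataLines set to the block's line range.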
"The next issue is then reading the long/lat as one float effectively and I'd like to end up with two columns, one for the lat and one for long"
The long/lat each consist of three values, not one, so you will have a total of six columns, not two. If you keep the values as text then you could have two columns, but this would make processing/plotting them harder.
opt = {'MultipleDelimsAsOne',true, 'CollectOutput',true, 'HeaderLines',2};
fmt = '%d%s%d%s%22c%f%f%f%f%f%f%f';
fid = fopen('test.txt','rt');
tmp = textscan(fid,fmt,opt{:})
tmp = 1×6 cell array
{6×1 int32} {6×1 cell} {6×1 int32} {6×1 cell} {6×22 char} {6×7 double}
fclose(fid);
format short G
tmp{:}
ans = 6×1
        7835
        7845
        7848
        7805
        7806
        7839
ans = 6×1 cell array
    {'A'}
    {'A'}
    {'A'}
    {'A'}
    {'A'}
    {'A'}
ans = 6×1
       10002
       10002
       10077
       10503
       10503
       11001
ans = 6×1 cell array
    {'S001'}
    {'S002'}
    {'M002'}
    {'S001'}
    {'S014'}
    {'S002'}
ans = 6×22 char array
    'L Grasse, France '
    'L Grasse CDP7845 '
    'L Ajaccio '
    'L Metsahovi, Finland '
    'L Metsahovi, Finland '
    'L Graz, fixed '
ans = 6×7
   1.0e+00 *
     6    55    16      43    45    16.8    1322.8
     6    55    17.6    43    45    16.6    1323.3
     8    45    45.7    41    55    38.6      97.3
    24    23    40.3    60    13     2.3      78.2
    24    23    40.3    60    13     1.7      74
    15    29    36      47     4     1.6     539.4
Convert Longitude and Latitude into degrees (e.g. for plotting):
format long G
lon = tmp{6}(:,1:3) * [1;1/60;1/60/60] % longitude
lon = 6×1
    6.92111111111111
    6.92155555555556
    8.76269444444444
    24.3945277777778
    24.3945277777778
    15.4933333333333
lat = tmp{6}(:,4:6) * [1;1/60;1/60/60] % latitude
lat = 6×1
    43.7546666666667
    43.7546111111111
    41.9273888888889
    60.2173055555556
    60.2171388888889
    47.0671111111111
or into seconds:
lon = tmp{6}(:,1:3) * [60*60;60;1]
lon = 6×1
   1.0e+00 *
    24916
    24917.6
    31545.7
    87820.3
    87820.3
    55776
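The matrix products above work for the rows shown, where every degree value is non-negative. If a file contains a negative degree field (none appear in the sample, so this is an assumption about other data), the minutes and seconds must be added in the same direction as the degrees. A minimal sketch, with a made-up southern latitude in the second row:

```matlab
% dms is N-by-3: [degrees minutes seconds]. Row 2 is a made-up example value.
dms = [43 45 16.8; -33 55 30.0];

sgn = sign(dms(:,1));
sgn(sgn == 0) = 1;   % zero degrees treated as positive (assumption)

% Combine magnitudes first, then apply the sign of the degree field.
deg = sgn .* (abs(dms) * [1; 1/60; 1/3600]);
```

A field written as "-0" cannot be recovered this way, because the sign is lost once the text has been converted to a number; that case would need the sign to be read from the text itself.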

More Answers (1)

Simon Hunter on 31 Dec 2021
Stephen,
It turns out that I had to remove 'HeaderLines', because my start-position 'finder' returns the first line of data rather than the first header line. With that change it works perfectly.
Thank you very much for your help on this and I wish you a happy new year!
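A minimal sketch of that situation, assuming the file position is already at the first data row and the number of rows is known. The stand-in file, its row layout and nRows are made up for illustration; the format string and the remaining options are the ones from the accepted answer.

```matlab
% Build a small stand-in file: block start, header line, two rows, block end.
% The row layout is an assumption based on the header underscores.
rowfmt = ' %4s %2s %9s %1s %-22s %3d %2d %4.1f %3d %2d %4.1f %7.1f\n';
fname = [tempname '.txt'];
fid = fopen(fname,'wt');
fprintf(fid,'+SITE/ID\n');
fprintf(fid,'*CODE PT __DOMES__ T _STATION DESCRIPTION__ APPROX_LON_ APPROX_LAT_ _APP_H_\n');
fprintf(fid,rowfmt,'7835','A','10002S001','L','Grasse, France',6,55,16.0,43,45,16.8,1322.8);
fprintf(fid,rowfmt,'7848','A','10077M002','L','Ajaccio',8,45,45.7,41,55,38.6,97.3);
fprintf(fid,'-SITE/ID\n');
fclose(fid);

fid = fopen(fname,'rt');
fgetl(fid);   % stand-in for the 'finder': skip '+SITE/ID'
fgetl(fid);   % ... and the '*CODE' header, leaving the position at the data

nRows = 2;    % hypothetical row count; in the question this comes from positions
opt = {'MultipleDelimsAsOne',true, 'CollectOutput',true};   % no 'HeaderLines'
fmt = '%d%s%d%s%22c%f%f%f%f%f%f%f';
tmp = textscan(fid,fmt,nRows,opt{:});   % apply the format nRows times
fclose(fid);
delete(fname);
```

Passing the repeat count makes textscan stop before the closing '-SITE/ID' line instead of trying to parse it.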

Release

R2021b