How to read particular dates from an HTML script?
I have downloaded the HTML source of a few webpages; one is uploaded for reference. My task is to extract some information from this source. The latitude and longitude are given on line 851, and I extracted them using the following code:
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
fileID=fopen(filename); % fileID
open_file=textscan(fileID,'%s','%f'); % parsing the file
open_file=open_file{1,1};
lat_id=find(ismember(open_file,... % Finding the position of Latitude in text-file
'<dd>Latitude'));
long_id=find(ismember(open_file,... % Finding the position of Longitude in text-file
'Longitude'));
lat(i)=open_file(lat_id+1); % latitude
long(i)=open_file(long_id+1); % longitude
proj=open_file(long_id+3); % projection type, e.g., NAD27, NAD83
But I am not able to use similar code to read the data on line 865, which contains the time range of some of the data. The problem is that the variable open_file does not seem to contain these values. Any suggestions would be helpful.
Accepted Answer
Walter Roberson
on 29 Jan 2018
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
S = fileread(filename);
place_info = regexp(S, 'Latitude\s+(?<lat>[^ ,]+),\s*\S+\s*Longitude\s+(?<long>\S+)\s*\S+\s*(?<proj>\w+)', 'names', 'once');
periods_info = regexp(S, '''begin_date''[^\d]*(?<begin_date>\d+-\d+(-\d+)?).*?end_date[^\d]*(?<end_date>\d+-\d+(-\d+)?).*?sites_selection_links\W*(?<stats_type>[^<]+)', 'names');
other_info = regexp(S, 'site_no=\d+">(?<stats_type>.*?)</a>.*?''begin_date''[^\d]*?(?<begin_date>\d+(-\d+(-\d+)?)?).*?end_date[^\d]*?(?<end_date>\d+(-\d+(-\d+)?)?)', 'names');
combined_info = [periods_info, other_info];
Now:
place_info is a struct with fields 'lat', 'long', and 'proj' holding the latitude, longitude, and projection. lat and long are in the form they were stored in the file, so they may contain &deg;, the HTML entity for the ° symbol.
combined_info is a struct array with fields begin_date, end_date, and stats_type. stats_type describes what each period refers to. In the sample data file those are
'Daily Statistics', 'Monthly Statistics', 'Annual Statistics', 'Current / Historical Observations', 'Peak streamflow', 'Field measurements', 'Field/Lab water-quality samples', and 'Water-Year Summary'
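As a follow-up, here is a minimal sketch of how the two results might be post-processed. It does not read the uploaded file: place_info and combined_info below are made-up stand-ins with invented values, and the degrees/minutes/seconds layout of lat and long is an assumption, so the pattern dms_pat may need adjusting to match what the real file contains. The names dms_pat, to_deg, labels, and daily are only illustrative.

```matlab
% Made-up stand-ins for the structs returned by the regexp calls above.
place_info = struct('lat', '41&deg;32''30"', 'long', '93&deg;39''40"', 'proj', 'NAD27');
combined_info = struct( ...
    'stats_type', {'Daily Statistics ', 'Peak streamflow'}, ...
    'begin_date', {'1990-10-01', '1991'}, ...
    'end_date',   {'2017-09-30', '2016'});

% Assumed layout: degrees, then &deg; or a literal degree sign, minutes, ', seconds, ".
dms_pat = ['(\d+)(?:&deg;|' char(176) ')(\d+)''([\d.]+)"'];
to_deg  = @(t) str2double(t{1}) + str2double(t{2})/60 + str2double(t{3})/3600;

% Decimal degrees; the hemisphere sign is not handled here.
lat_deg  = to_deg(regexp(place_info.lat,  dms_pat, 'tokens', 'once'));
long_deg = to_deg(regexp(place_info.long, dms_pat, 'tokens', 'once'));

% Pick one record by its label; strtrim guards against captured trailing spaces.
labels = strtrim({combined_info.stats_type});
daily  = combined_info(strcmp(labels, 'Daily Statistics'));

% Only full yyyy-MM-dd dates are converted; year-only values are left as text.
daily_begin = datetime(daily.begin_date, 'InputFormat', 'yyyy-MM-dd');
daily_end   = datetime(daily.end_date,   'InputFormat', 'yyyy-MM-dd');
fprintf('Daily statistics: %s to %s\n', daily.begin_date, daily.end_date);
```

Selecting by label rather than by position keeps the code independent of the order in which the periods appear on the page.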