How to read a website and download pdf files
28 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Is there a way in matlab to essentially batch read a website and download the pdfs without having the specific pdf url?
For example, on the websites I'm looking through, each page has the following type of link where you download the pdf: https://restservice.epri.com/publicdownload/000000000001013457/0/Product
I just have an excel spreadsheet of all the URLs I want to look through, not the PDF file URLs.
0 commentaires
Réponses (2)
Saffan
le 5 Avr 2023
Here is an example code snippet to extract all the URLs present in a particular webpage:
%extract entire source code of the page
html_text = webread(url);
%extracts URLs present in the source code
all_urls = regexp(html_text,'https?://[^"]+','match');
Once you have obtained the URLs of the downloadable PDFs, you can use the "websave" function to download them. Here is an example code snippet to demonstrate this:
websave('filename.pdf',pdf_url);
1 commentaire
Voss
le 23 Août 2023
URL = readtable('EPRI NMAC Repository.xlsx','Range','I2:K638'); % Load excel from specified sheet
substr = 'Nuclear Maintenance Applications Center';
selectedrow = contains(URL{:,1},substr);
pdf_url = URL{selectedrow,3};
% make valid file names from the urls:
pdf_fn = regexprep(pdf_url,'[/\\.*<>|?"]','_');
pdf_fn = strcat(pdf_fn,'.pdf');
% download the pdf files:
for ii = 1:numel(pdf_url)
websave(pdf_fn{ii},pdf_url{ii});
end
0 commentaires
Voir également
Catégories
En savoir plus sur Downloads dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!