Extracting data from PDF files
joseph Frank
on 19 Apr 2014
Answered: Christopher Creutzig
on 27 Apr 2021
Hi,
I have around 300 PDF files with 19 pages each. I want to extract part of a table on page 4 of each file in order to build a research data set. Is it possible to do so using MATLAB? If so, which toolboxes and functions do I need? I have MATLAB R2013a.
Accepted Answer
Kristian Gennaci
on 21 Apr 2014
Hi Joseph,
Have you tried using this File Exchange submission?
This seems like the most promising solution. Alternatively, if you could convert the tables to an Excel spreadsheet or CSV format, they can then easily be parsed using MATLAB's Excel/CSV import functions.
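As a rough illustration of the CSV route, here is a minimal sketch that assumes each table has already been exported to its own CSV file by some external tool. The folder name, file pattern, column names, and row range are all hypothetical, and the script writes two tiny demo files so that it runs on its own. Note that readtable needs R2013b or later; on R2013a, csvread or xlsread would be the functions to look at instead.

```matlab
% Sketch: collect the same block of rows from many CSV files.
% Assumption: the PDF tables were exported to CSV beforehand.
% Folder, file names, columns, and row range are made up for the demo.
csvDir = fullfile(tempdir, 'pdf_tables_demo');
if ~exist(csvDir, 'dir')
    mkdir(csvDir);
end

% Write two small demo files so the sketch is self-contained.
for k = 1:2
    fid = fopen(fullfile(csvDir, sprintf('report%d.csv', k)), 'w');
    fprintf(fid, 'year,value\n2010,%d\n2011,%d\n2012,%d\n', k, 2*k, 3*k);
    fclose(fid);
end

files = dir(fullfile(csvDir, 'report*.csv'));
rowsWanted = 2:3;                 % the part of each table to keep
parts = cell(numel(files), 1);
for k = 1:numel(files)
    t = readtable(fullfile(csvDir, files(k).name));
    part = t(rowsWanted, :);
    % Remember which file each row came from.
    part.source = repmat({files(k).name}, height(part), 1);
    parts{k} = part;
end
allData = vertcat(parts{:});      % one table with rows from every file
disp(allData)
```

With real data, only csvDir, the file pattern, and rowsWanted would need to change, provided every file has the same columns.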
I'll let you know if I find any other solutions.
Best,
Kristian
More Answers (1)
Christopher Creutzig
on 27 Apr 2021
JFTR, since R2017b, extractFileText('filename.pdf','Pages',4) from Text Analytics Toolbox gives you the text on ("physical") page 4 of the PDF, from which you can then extract the parts you need with string operations (extractBetween, regexp, etc.).
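A minimal sketch of how this might look for a folder of PDFs, assuming Text Analytics Toolbox is installed. The folder name 'reports', the "Table 2" and "Notes:" markers, and the sample text are hypothetical; the parsing step is shown on a hard-coded string so it can be tried without any PDF files, and the pattern would have to be adapted to the real table layout.

```matlab
% Sketch: read page 4 of every PDF in a folder, then parse the text.
% extractFileText requires Text Analytics Toolbox (R2017b or later).
% 'reports' is a hypothetical folder; the loop is skipped if it is empty.
pdfDir = 'reports';
files = dir(fullfile(pdfDir, '*.pdf'));
pageText = strings(numel(files), 1);
for k = 1:numel(files)
    pageText(k) = extractFileText(fullfile(pdfDir, files(k).name), 'Pages', 4);
end

% Parsing step, demonstrated on made-up text standing in for pageText(k).
sample = "Table 2" + newline + ...
         "Revenue 120.5 130.0" + newline + ...
         "Costs 80.25 90.0" + newline + ...
         "Notes: figures in millions";

% Keep only the text between the two (hypothetical) markers.
block = extractBetween(sample, "Table 2" + newline, "Notes:");

% Each row: a label followed by two numbers.
tokens = regexp(char(block), '(\w+)\s+([\d.]+)\s+([\d.]+)', 'tokens');
rows = vertcat(tokens{:});          % n-by-3 cell array of char vectors
labels = rows(:, 1);
values = str2double(rows(:, 2:3));  % n-by-2 numeric matrix
disp(labels)
disp(values)
```

Text extracted from PDF tables often loses column alignment, so it is worth inspecting pageText for a few files before settling on a pattern.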