Word Count in a PDF file
Ahmed Alsaadi
on 20 Dec 2018
Edited: Omer Yasin Birey
on 21 Dec 2018
I have a PDF file, "EHP.pdf", and I want to count the total number of words in it. The file has many sections, and I want to exclude the last section from the count. Any suggestions?
Accepted Answer
Omer Yasin Birey
on 20 Dec 2018
Edited: Omer Yasin Birey
on 21 Dec 2018
Hi Ahmed, you can use extractFileText. You must choose a starter word and a finisher word. The finisher word must be unique, because counting ends at the first place MATLAB encounters it. This way you can count the words between the starter and the finisher.
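str = extractFileText("EHP.pdf"); % requires Text Analytics Toolbox
startIdx = strfind(str,"firstWord"); % replace with the first word of your PDF
endIdx = strfind(str,"lastWord"); % replace with the distinctive word where counting should stop
start = startIdx(1);
fin = endIdx(1);
extracted = extractBetween(str,start,fin-1); % text from the starter up to just before the finisher
uniqueWordNumbers = wordCloudCounts(extracted); % table of words and their counts
counter = uniqueWordNumbers(:,2); % second column holds the counts
counterArray = table2array(counter); % convert the table column to a numeric array
totalWords = sum(counterArray);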
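As a rough cross-check, the same starter/finisher idea can be sketched with base MATLAB string functions only. As far as I know, wordCloudCounts preprocesses the text before counting (for example, it may drop stop words), so its sum can come out lower than a raw word count. The sample string and the marker words "Abstract" and "References" below are assumptions for illustration; in practice str would come from extractFileText.

```matlab
% Stand-in for the text returned by extractFileText("EHP.pdf") (assumed sample).
str = "Abstract This paper has five words. References Smith 2018";

% Hypothetical markers: the first word of the document and the unique
% word that opens the section to exclude.
startMarker = "Abstract";
endMarker = "References";

startIdx = strfind(str, startMarker);
endIdx = strfind(str, endMarker);

% Keep the text from the starter up to just before the finisher.
body = extractBetween(str, startIdx(1), endIdx(1) - 1);

% Split on whitespace and count the non-empty tokens.
tokens = split(strip(body));
tokens = tokens(strlength(tokens) > 0);
totalWords = numel(tokens);
```

This counts every whitespace-separated token, so punctuation attached to a word (such as "words.") still counts as one word.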
3 Comments
Omer Yasin Birey
on 20 Dec 2018
Ah, you are right, Ahmed. I made a typo and also forgot a line there. Try this instead:
counter = uniqueWordNumbers(:,2);
counterArray = table2array(counter);
totalWords = sum(counterArray);
Add the table2array line and pass its output to sum.
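To illustrate why the conversion matters, here is a minimal sketch using a hand-made table as a stand-in for the output of wordCloudCounts. The column names Word and Count are my assumption about that output, and the values are made up. Indexing with parentheses returns a table, which table2array turns into a numeric array; dot indexing gets the numeric column directly.

```matlab
% Hand-made stand-in for the table returned by wordCloudCounts
% (column names and values are assumed for illustration).
T = table(["pdf"; "word"; "count"], [3; 2; 4], ...
    'VariableNames', {'Word', 'Count'});

% Parentheses keep the result as a table, so convert it before summing.
counter = T(:, 2);
viaArray = sum(table2array(counter));

% Dot indexing returns the numeric column directly.
viaDot = sum(T.Count);
```

Both variables hold the same total, so either form can be used in the answer's code.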