Count number of words in a PDF document.
I want to count the words in a PDF. The PDF is in Arabic, and I want to know how many times each word occurs, like a histogram. For example, if the word WORK appears in the PDF, I want to know how many times it occurs. I want this word to process as an image. Please help.
Answers (1)
KSSV
on 15 Feb 2022
You can read your pdf file using:
str = extractFileText("Test.pdf"); % give your pdf name
The above reads the content of the PDF into a string. You can then use functions such as strcmp, strcmpi, or strfind to check whether a given word is present in str, and count how many times it occurs:
word = "work"; % the word to look for (replace with your own)
s = strsplit(str); % split the text into words at whitespace
idx = strcmpi(s, word); % case-insensitive comparison against each word
nnz(idx) % number of times the word occurs
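The question asks for a count of every word, like a histogram, rather than a single word. One possible way to extend the idea above is sketched below using unique and accumarray. The function name wordCounts is hypothetical and chosen only for illustration. The sketch splits at whitespace only, so punctuation stays attached to the words.

```matlab
function T = wordCounts(str)
% wordCounts  Hypothetical helper: table of distinct words and their counts.
% Sketch only: splits at whitespace, so punctuation stays attached to words.
words = split(lower(string(str)));            % column of whitespace-separated words
words = words(strlength(words) > 0);          % drop empty entries
[uniqueWords, ~, groupIdx] = unique(words);   % map each word to its group number
counts = accumarray(groupIdx, 1);             % occurrences per distinct word
[counts, order] = sort(counts, 'descend');    % most frequent first
T = table(uniqueWords(order), counts, 'VariableNames', {'Word', 'Count'});
end
```

With the text read as above, it could be called as T = wordCounts(str); and bar(T.Count) would then plot the counts as a histogram-style chart.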
2 comments
sajid khan
on 15 Feb 2022
Image Analyst
on 15 Feb 2022
Edited: Image Analyst
on 15 Feb 2022
@KSSV I didn't know about extractFileText(). Is it in the Text Analytics Toolbox?
@sajid khan what do you mean by "I want this word to process as an image."? If you can get the words directly from the data, why render the page as an image and then try to do OCR on it?