- Every word ends with a space
- Every line ending has a carriage return and line feed
How can I get the word count of each line from an extracted PDF file
Hi, I extracted text from a PDF file with many lines/entries of comments. I want to get the word count of each line, the average word count across all lines, and the number of lines that have only one word. Is this possible? Thanks!
Answers (1)
Kiran Felix Robert
on 2 Feb 2021
Hi Yao,
I assume that you have extracted the text from the PDF file and saved it in a string variable. You can convert the string to a character array (convertStringsToChars) and then count the words and lines.
Assume that every word ends with a space and that every line ending has a carriage return and line feed.
Using the built-in MATLAB example file, the following program gives you the total line count and word count for one section of the file (the text between the headings "II" and "III").
str = extractFileText("exampleSonnets.pdf");
ii = strfind(str,"II");
iii = strfind(str,"III");
start = ii(1);
fin = iii(1);
stringText = extractBetween(str,start,fin-1);
B = convertStringsToChars(stringText);
% Define the space character and end-of-line character.
% B starts with the heading "II", so B(3) and B(4) are the two characters
% that follow it; they are assumed to be a space and a carriage return.
SpaceCharacter = B(3);
CarriageReturnCharacter = B(4);
lineCount = 0;
wordCount = 0;
for i = 1:length(B)
    if B(i) == CarriageReturnCharacter
        lineCount = lineCount + 1; % Total line count
    end
    if B(i) == SpaceCharacter
        wordCount = wordCount + 1; % Total word count
    end
end
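The loop above gives totals only. For the per-line statistics asked about in the question, one possible approach is sketched below. It is a minimal sketch, not a tested solution for your file: the variable txt is a made-up sample standing in for the extracted text, and the names crlf, textLines, wordsPerLine, avgWords, and numOneWordLines are mine. It assumes lines are separated by a carriage return and line feed (or a lone line feed) and counts a word as any run of non-whitespace characters, so it does not rely on every word ending with a space.

```matlab
% Hypothetical sample standing in for the extracted PDF text.
% Lines are separated by carriage return + line feed (assumption).
crlf = string(char([13 10]));
txt = "first comment here" + crlf + "ok" + crlf + "another longer comment line";

% Normalize CRLF to a single line feed, then split into lines.
txt = replace(txt, crlf, newline);
textLines = split(txt, newline);

% Drop empty lines so they do not count as zero-word entries.
textLines = textLines(strlength(textLines) > 0);

% Count words on each line as runs of non-whitespace characters.
wordsPerLine = zeros(numel(textLines), 1);
for k = 1:numel(textLines)
    wordsPerLine(k) = numel(regexp(textLines(k), '\S+', 'match'));
end

avgWords = mean(wordsPerLine);             % average word count over all lines
numOneWordLines = sum(wordsPerLine == 1);  % number of lines with exactly one word
```

For the sample text, wordsPerLine is [3; 1; 4] and numOneWordLines is 1. To try it on real data, replace txt with the string returned by extractFileText.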
Kiran