Using bag of words function

Ben Hischar on 31 Aug 2021
Answered: DGM on 31 Aug 2021
I am trying to use the bagOfWords function, but I am getting an error.
I'm trying to learn from this MATLAB documentation page:
https://au.mathworks.com/help/textanalytics/ref/bagofwords.html
clear all
clc
filename = "SampleText.txt";
str = extractFileText(filename);
textData = split(str,newline);
documents = tokenizedDocument(textData);
bag = bagOfWords(documents);
tbl = topkwords(bag,10);
This is the error I receive:
'extractFileText' requires Text Analytics Toolbox.
Error in Word_lengths (line 6)
str = extractFileText(filename);
Thanks again for the help.
1 Comment
DGM on 31 Aug 2021
Edited: DGM on 31 Aug 2021
All of these functions (except split() and newline) require that toolbox. If you don't have the toolbox, you can't use those functions because they won't exist. There might be alternative ways to do the same thing, but since it appears that you're discarding a lot of information in the process, one has to ask what exactly is required. If all you need is the top-k word list, finding a workaround may be simpler than if you wanted everything else.
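One way to confirm whether the toolbox is present before calling these functions is to look through the product list returned by ver. This is only a sketch: the variable names are made up for illustration, and the display name 'Text Analytics Toolbox' is assumed to match what ver reports on your installation.

```matlab
% List installed products and look for the toolbox by its display name.
% 'Text Analytics Toolbox' is the assumed product name as reported by ver.
installed = ver;
hasTextAnalytics = any(strcmp({installed.Name}, 'Text Analytics Toolbox'));

if hasTextAnalytics
    disp('Text Analytics Toolbox is installed.');
else
    disp('Text Analytics Toolbox not found; base MATLAB functions are needed instead.');
end
```

If the check reports that the toolbox is missing, extractFileText, tokenizedDocument, bagOfWords and topkwords will all fail in the same way as in the error above.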

Answers (1)

DGM on 31 Aug 2021
If you just want the word frequency table:
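filename = 'SampleText.txt';
str = fileread(filename); % read the whole file as one character vector
% extract lowercase alphanumeric runs as words (punctuation is dropped)
words = regexp(lower(str),'[a-z0-9]+','match');
uwords = unique(words).'; % unique words, sorted alphabetically
% count the occurrences of each unique word
counts = cellfun(@(x) sum(strcmp(words,x)),uwords);
[counts,idx] = sort(counts,'descend'); % most frequent first
uwords = uwords(idx);
wordfreqtable = table(uwords(1:10),counts(1:10)) % assumes at least 10 unique words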
wordfreqtable = 10×2 table
        Var1         Var2
    _____________    ____
    {'of'       }     4
    {'the'      }     4
    {'an'       }     2
    {'and'      }     2
    {'is'       }     2
    {'society'  }     2
    {'1928'     }     1
    {'another'  }     1
    {'bernays'  }     1
    {'conscious'}     1
This differs somewhat from the behavior of topkwords() in that it does not consider punctuation marks to be words, and it sorts equal-frequency elements alphabetically instead of by order of occurrence.
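If the order-of-occurrence tie-breaking matters, a variation along the following lines might get closer. It is a sketch rather than a reimplementation of topkwords(): it still ignores punctuation, the inline sample text and the column names Word and Count are made up for illustration, and it relies on unique(...,'stable') keeping first-occurrence order and on sort() preserving the order of equal elements.

```matlab
% Hypothetical sample text; replace with fileread('SampleText.txt') for a real file.
str = sprintf('the cat and the dog\nand the bird');

% Extract lowercase alphanumeric runs as words.
words = regexp(lower(str),'[a-z0-9]+','match');

% 'stable' keeps unique words in order of first occurrence;
% ic maps every word to the index of its unique entry.
[uwords,~,ic] = unique(words,'stable');
counts = accumarray(ic(:),1);

% sort() keeps equal counts in their original (first-occurrence) order.
[counts,idx] = sort(counts,'descend');
uwords = reshape(uwords(idx),[],1);

% Guard against texts with fewer than 10 unique words.
k = min(10,numel(uwords));
wordfreqtable = table(uwords(1:k),counts(1:k),'VariableNames',{'Word','Count'})
```

Using accumarray on the index vector from unique also avoids the repeated strcmp passes over the whole word list, which may help on longer texts.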

Release

R2021a
