Text Analytics Toolbox seems to make many mistakes when recognizing language and part of speech
Hi,
My input is a list of VERY BASIC ENGLISH words, shown below. I would like to find the part of speech of each one.
kid
killer
kind
king
kiss
kitchen
knee
knife
knowledge
words = {'kid','killer','kind','king','kiss','kitchen','knee','knife','knowledge'};
words = string(words);
documents = tokenizedDocument(words);
documents = addPartOfSpeechDetails(documents);
tdetails = tokenDetails(documents);
This is where the mistakes show up when I check 'tdetails' (see below).
Why does MATLAB think these words are German (the language should be 'en' for English) and adjectives (most of them should be nouns)?
tdetails =
9×7 table
Token        DocumentNumber  SentenceNumber  LineNumber  Type     Language  PartOfSpeech
___________  ______________  ______________  __________  _______  ________  ____________
"kid"        1               1               1           letters  de        adjective
"killer"     2               1               1           letters  de        adjective
"kind"       3               1               1           letters  de        adjective
"king"       4               1               1           letters  de        adjective
"kiss"       5               1               1           letters  de        adjective
"kitchen"    6               1               1           letters  de        adjective
"knee"       7               1               1           letters  de        adjective
"knife"      8               1               1           letters  de        adjective
"knowledge"  9               1               1           letters  de        adjective
Answers (1)
Christopher Creutzig
on 9 Mar 2020
Language detection works much better on longer text. It does not do a dictionary lookup (and several of your words are valid German anyway); it uses statistical information about letter distribution.
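If you already know the text is English, one option is to state the language up front instead of relying on detection. A minimal sketch, assuming the 'Language' name-value option of tokenizedDocument is available in your release:

```matlab
% Sketch: state the language explicitly so that automatic detection
% is not applied to very short, single-word documents.
words = ["kid","killer","kind","king","kiss","kitchen","knee","knife","knowledge"];
documents = tokenizedDocument(words, 'Language', 'en');
documents = addPartOfSpeechDetails(documents);
tdetails = tokenDetails(documents);
```

This only affects the Language column; each word is still tagged on its own, without any surrounding sentence.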
Part of speech detection relies heavily on the context in a sentence.
documents = tokenizedDocument("My kid is a king");
documents = addPartOfSpeechDetails(documents);
tokenDetails(documents)
ans =
5×7 table
Token   DocumentNumber  SentenceNumber  LineNumber  Type     Language  PartOfSpeech
______  ______________  ______________  __________  _______  ________  ______________
"My"    1               1               1           letters  en        pronoun
"kid"   1               1               1           letters  en        noun
"is"    1               1               1           letters  en        auxiliary-verb
"a"     1               1               1           letters  en        determiner
"king"  1               1               1           letters  en        noun