How to extract a list of unique words from a set of one-row strings

Question

Harrison on 14 Nov 2024

https://fr.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings

Commented: Harrison on 15 Nov 2024

Basically I have a set of 11 strings of words, and each string has no repeating words, but I need a list of every unique word in all 11 strings.

I've found that this works for one string at a time, but I can't get a list for all 11 strings this way.

A{1} = updatedDocuments(1,1)

B{1} = strjoin(unique(strtrim(strsplit(A{1}, ',')))', '')

Is it possible to index A{1} as updatedDocuments(1:11,1) or do something similar?

Answer 1

Madheswaran on 14 Nov 2024

https://fr.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings#answer_1545194

Edited: Madheswaran on 15 Nov 2024

Hi @Harrison,

I am assuming the following:

'updatedDocuments' is an array of 'tokenizedDocument' objects.
Each document contains comma-separated text that does not end with a comma.

To get the unique words from the entire set of strings, you can use the following approach:

% Remove the commas from the documents if you don't want a comma to be
% included in 'uniqueWords'
updatedDocuments = removeWords(updatedDocuments, ","); 
uniqueWords = updatedDocuments.Vocabulary;
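
To see the two calls above on something concrete, here is a minimal sketch on made-up data. It assumes Text Analytics Toolbox is installed; the variable 'docs' and its sample text are hypothetical and only stand in for the asker's 'updatedDocuments'.

```matlab
% Hypothetical sample data: two comma-separated documents.
% Requires Text Analytics Toolbox (tokenizedDocument, removeWords).
docs = tokenizedDocument([
    "red, green, blue"
    "green, yellow"]);

% Tokenization keeps each comma as its own token, so remove those tokens.
docs = removeWords(docs, ",");

% Vocabulary holds each distinct remaining token once, pooled over all documents.
uniqueWords = docs.Vocabulary
```

With this input, 'uniqueWords' should hold the four colour names and no comma token.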

If 'updatedDocuments' is a cell array of character vectors, you can use the following approach instead:

allWords = strjoin(updatedDocuments(1:11,1), ','); % Join all documents into one comma-separated char vector
allWords = strtrim(strsplit(allWords, ',')); % Split with comma as delimiter and trim whitespace
allWords = allWords(~cellfun(@isempty, allWords)); % Drop empty entries (e.g. from a stray trailing comma)
uniqueWords = unique(allWords); % unique words (1 x n cell where n is the number of unique words)
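
As a quick check of the cell-array approach, here is a minimal sketch on made-up data. The variable 'docs' and its three sample entries are assumptions for illustration, not the asker's actual 'updatedDocuments'.

```matlab
% Hypothetical sample data: a 3x1 cell array of comma-separated char vectors.
docs = {'red, green, blue'; 'green, yellow'; 'blue, red, violet'};

% Join all rows into one comma-separated char vector.
joined = strjoin(docs, ',');

% Split on commas and trim the whitespace around each word.
words = strtrim(strsplit(joined, ','));

% Drop any empty entries, then keep one copy of each word (sorted).
words = words(~cellfun(@isempty, words));
uniqueWords = unique(words)
```

For this sample, 'uniqueWords' should be a 1 x 5 cell array containing blue, green, red, violet, and yellow in sorted order.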

For more information, refer to the documentation for the functions used above.

Hope this helps!

Madheswaran on 15 Nov 2024

That is because I had assumed 'updatedDocuments' was a cell array of character vectors. If 'updatedDocuments' is an array of 'tokenizedDocument', the fix is straightforward. I have updated the answer to include a solution for the 'tokenizedDocument' case, in addition to the existing explanation.

Let me know if that helps!

Harrison on 15 Nov 2024

That's exactly right! Thank you!!

Answer 2

Paul on 14 Nov 2024

https://fr.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings#answer_1544974

If UpdatedDocuments is a 1D cell array of chars ...

UpdatedDocuments{1} = 'one,two,three,one';
UpdatedDocuments{2} = 'one,two,three,two';
UpdatedDocuments{3} = 'one,two,three,three';
result = cellfun(@(S) strjoin(unique(strtrim(strsplit(S, ','))),','),UpdatedDocuments,'Uni',false)
result = 1x3 cell array
    {'one,three,two'}    {'one,three,two'}    {'one,three,two'}
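
The call above deduplicates within each cell separately. If a single list across all cells is wanted, one possible variation (a sketch, not part of the original answer) is to split every cell first and pool the pieces before calling unique. The variable names and sample data below are made up for illustration.

```matlab
% Hypothetical sample data: a 1x3 cell array of comma-separated char vectors.
docs = {'one,two,three,one', 'one,two,four,two', 'five,two,three,three'};

% Split each cell on commas; each result is a 1 x k cell array of words.
parts = cellfun(@(s) strtrim(strsplit(s, ',')), docs, 'UniformOutput', false);

% Concatenate the per-document word lists into one row and deduplicate.
allWords = [parts{:}];
uniqueWords = unique(allWords)
```

For this sample, 'uniqueWords' should be a 1 x 5 cell array containing five, four, one, three, and two in sorted order.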

Paul on 15 Nov 2024

The Vocabulary property of tokenizedDocument returns the unique words in the array.

documents = tokenizedDocument([
    "an example of a short sentence  an example of a short sentence " 
    "a second short sentence a second short sentence"]);
documents
documents = 
  2x1 tokenizedDocument:

    12 tokens: an example of a short sentence an example of a short sentence
     8 tokens: a second short sentence a second short sentence
documents.Vocabulary
ans = 1x7 string array
    "an"    "example"    "of"    "a"    "short"    "sentence"    "second"
