How to find the exact location of a word in a string?

Question

Yunfei Zhang on 13 Feb 2016

https://fr.mathworks.com/matlabcentral/answers/267985-how-to-find-the-exact-location-of-a-word-in-a-string

Commented: Guillaume on 13 Feb 2016

I have the string 'chemical engineering is a challenge for electrical engineer'. I have been using the strfind function to find the exact location of the word 'engineer'. The problem is that the word 'engineering' is also included in my results. How can I get only the location of the word 'engineer' and not 'engineering'?

 list = 'chemical engineering is a challenge for electrical engineer';
 temp = strfind(list, 'engineer')   % strfind is the recommended replacement for findstr

The result is

temp =
      10    52

Answer 1

Star Strider on 13 Feb 2016

https://fr.mathworks.com/matlabcentral/answers/267985-how-to-find-the-exact-location-of-a-word-in-a-string#answer_209694

This regexp call will pick up only ‘engineer’, because \> anchors the match to the end of a word:

Str = 'chemical engineering is a challenge for electrical engineer';
idxs = regexp(Str, 'engineer\>')
idxs =
    52
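For anyone who also needs the start of the word anchored, or the end position of the match, here is a minimal sketch. The variable names startIdx and endIdx are just for illustration. \< and \> are the MATLAB regular-expression anchors for the start and end of a word.

```matlab
Str = 'chemical engineering is a challenge for electrical engineer';

% \< anchors the start of a word, \> anchors the end of a word
[startIdx, endIdx] = regexp(Str, '\<engineer\>', 'start', 'end');

% Extract the matched text to confirm what was found
matched = Str(startIdx:endIdx);
```

With both anchors in place, a longer word that merely contains 'engineer' is not matched either.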

Yunfei Zhang on 13 Feb 2016

Edited: Yunfei Zhang on 13 Feb 2016

Sorry for the confusion. I simplified the problem before asking this question. 'pre' is a cell array containing 20 documents, each of which is a long string. 'word' is a cell array containing the 1099 words that remain in these 20 documents after removing stopwords. What I want to do is build a 20-by-1099 matrix showing each word's frequency in each document. That led to the problem described above: 'engineer' may get a higher frequency than 'engineering' in the word dictionary, because every occurrence of 'engineering' is also counted as 'engineer'. However, I think the function you suggested is the correct way to find the location of each word. After finding the correct locations of a word like 'engineer', I can calculate its frequency and store it at the corresponding position using the code below.

Guillaume provided me with a method that builds the regular expression for each word, and it works. However, it trades speed for accuracy: it takes much longer when processing a large number of articles (that is, when 'pre' contains a large number of long strings).

if ~isempty(temp)
    docum(i,j) = numel(temp);   % number of matches of word j in document i
end
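To make the counting step concrete, here is a minimal sketch of the whole loop. The docs and words variables are small hypothetical stand-ins for 'pre' and 'word', and I am assuming they are plain cell arrays of character vectors.

```matlab
% Hypothetical stand-ins for the 'pre' and 'word' variables described above
docs  = {'chemical engineering is a challenge for electrical engineer', ...
         'engineering for engineering students'};
words = {'engineer', 'engineering'};

docum = zeros(numel(docs), numel(words));
for i = 1:numel(docs)
    for j = 1:numel(words)
        % Start indices of matches that end at a word boundary
        temp = regexp(docs{i}, [words{j} '\>']);
        % numel is 0 for an empty result, so no isempty check is needed
        docum(i, j) = numel(temp);
    end
end
```

Here 'engineer' is counted once in the first document and not at all in the second, while 'engineering' is counted once and twice.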

Guillaume on 13 Feb 2016

Edited: Guillaume on 13 Feb 2016

You can prebuild the regular expressions before the loops if you wish.

word = strcat(word, '\>')
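One thing to watch when prebuilding patterns this way: each word is then interpreted as a regular expression, so a word containing a metacharacter such as '.' can match more than intended. A minimal sketch of escaping the words first with regexptranslate follows. The word list and the test sentence are made up for illustration.

```matlab
word = {'engineer', 'v1.0'};   % hypothetical word list; 'v1.0' contains the metacharacter '.'

% Escape metacharacters in every word, then append the end-of-word anchor once
escaped = cellfun(@(w) regexptranslate('escape', w), word, 'UniformOutput', false);
pattern = strcat(escaped, '\>');

txt = 'release v100 follows release v1.0';
idxPlain   = regexp(txt, [word{2} '\>']);   % '.' matches any character, so 'v100' is found too
idxEscaped = regexp(txt, pattern{2});       % only the literal 'v1.0' is found
```

Note that \> only matches after a letter, digit, or underscore, so this anchor does not help for words that end in punctuation.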

Yunfei Zhang on 13 Feb 2016

Thank you! That helps a lot with keeping the processing time under control, as I also want to do feature selection and clustering on my data.

Answer 2

Guillaume on 13 Feb 2016

https://fr.mathworks.com/matlabcentral/answers/267985-how-to-find-the-exact-location-of-a-word-in-a-string#answer_209710

Edited: Guillaume on 13 Feb 2016

Another option, since the words you're trying to match are always delimited by spaces or the end of the sentence (other punctuation marks are already embedded in the words), is to add a space to the end of each word and to the end of each sentence. That way 'engineer ' no longer matches 'engineering ':

tic
docum = zeros(numel(pre), numel(word));
word2 = strcat(word, {' '}); %strcat removes trailing ' ' if it's not in a cell array
pre2 = strcat(vertcat(pre{:}), {' '}); %why is your pre a cell array of 1x1 cell arrays?
for widx = 1:numel(word)
   docum(:, widx) = cellfun(@numel, strfind(pre2, word2{widx}));
end
toc
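As a small illustration of the trailing-space idea on the sentence from the question, here is a sketch. It also shows the strcat whitespace behaviour mentioned in the code comment, and one limitation. The variable names and the 'bioengineer' example are made up.

```matlab
sentence = 'chemical engineering is a challenge for electrical engineer';
padded = [sentence ' '];                 % pad so the final word is also followed by a space
idx = strfind(padded, 'engineer ');      % 'engineering ' no longer matches

% strcat strips trailing whitespace from char inputs but not from cell inputs
plainCat = strcat('engineer', ' ');      % the space is dropped
cellCat  = strcat({'engineer'}, {' '});  % the space is kept

% Limitation: only the end of the word is checked, not the start
idxInside = strfind('bioengineer ', 'engineer ');
```

The last line still finds a match inside 'bioengineer', so this approach assumes that no word in the list is a suffix of another word in the documents.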

I'm not convinced it's going to be faster than regexp:

tic
docum = zeros(numel(pre), numel(word));
word2 = strcat(word, '\>'); 
pre2 = vertcat(pre{:}); %why is your pre a cell array of 1x1 cell arrays?
for widx = 1:numel(word)
   docum(:, widx) = cellfun(@numel, regexp(pre2, word2{widx}));
end
toc

In my testing, both take more or less the same time.
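For a steadier comparison than a single tic/toc run, timeit can be used, since it calls the function several times. This is only a sketch on a tiny made-up data set (pre2 and word below are hypothetical), so the timings it prints say nothing about the real 20-by-1099 case.

```matlab
% Hypothetical small data set; real timings depend on the size of 'pre' and 'word'
pre2  = repmat({'chemical engineering is a challenge for electrical engineer'}, 20, 1);
word  = {'engineer', 'engineering', 'challenge'};
preSp = strcat(pre2, {' '});   % sentences padded with a trailing space

% Total count of each word over all sentences, computed both ways
fStrfind = @() cellfun(@(w) sum(cellfun(@numel, strfind(preSp, [w ' ']))), word);
fRegexp  = @() cellfun(@(w) sum(cellfun(@numel, regexp(pre2, [w '\>']))), word);

tStrfind = timeit(fStrfind);
tRegexp  = timeit(fRegexp);
fprintf('strfind: %.4g s, regexp: %.4g s\n', tStrfind, tRegexp);
```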

Star Strider on 13 Feb 2016

@Guillaume — Thank you. I had to be away for a few minutes.

Guillaume on 13 Feb 2016

@Yunfei, what probably has the most effect on the processing speed is that I apply regexp or strfind to all the sentences at once. There is only one loop, over the individual words.
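To illustrate what applying the search to all the sentences at once looks like, here is a minimal sketch with three made-up sentences. regexp accepts a cell array of character vectors and returns one cell of start indices per sentence.

```matlab
% Hypothetical sentences, one per cell
sentences = {'chemical engineering'; 'electrical engineer'; 'engineer and engineer'};

hits   = regexp(sentences, 'engineer\>');   % one call, one cell of start indices per sentence
counts = cellfun(@numel, hits);             % number of whole-word matches in each sentence
```

Each column of docum in the answer above is filled the same way, with one such call per word.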
