Adding Word Frequencies in Various Text Files

Question

Mazhar Iqbal Rana on 31 Dec 2013

https://fr.mathworks.com/matlabcentral/answers/111136-adding-word-frequencies-in-various-text-files

Answered: Sarah Palfreyman on 30 Apr 2018

I have four text files and have already calculated the word frequencies for each one individually. Now I want to add those frequencies together: if Kill appears 10 times in File1, I will look for Kill in the other files too and add those occurrences to its current frequency. In the end, I want to compare all four files and produce one single combined file.

I need to add up the frequencies of matching words, regardless of the index at which they appear. For example, the word kill may be at the 10th index in one frequency file and at the 100th index in another, and I still need its total frequency. Put simply: if Kill appears 10 times in file1 and 4 times in file2, I need a count of 14 for Kill, and the same for every other word.
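One way to get combined counts without caring about each word's index is to key the totals by the word itself. The sketch below uses MATLAB's containers.Map for that; the two in-memory word lists stand in for the words{1,1} output of textscan and are hypothetical data chosen to match the Kill example (10 + 4 = 14).

```matlab
% Hypothetical example data: each cell stands in for the word list of one
% file (words{1,1} from textscan). 'kill' appears 10 times in the first
% list and 4 times in the second, as in the question.
fileWords = { [repmat({'kill'}, 1, 10), {'run'}], ...
              [repmat({'kill'}, 1, 4),  {'hide'}] };

% Totals keyed by the word itself, so the position (index) of a word
% in each file no longer matters.
totals = containers.Map('KeyType', 'char', 'ValueType', 'double');

for f = 1:numel(fileWords)
    w = fileWords{f};
    for k = 1:numel(w)
        key = w{k};
        if isKey(totals, key)
            totals(key) = totals(key) + 1;   % word seen before: add to its total
        else
            totals(key) = 1;                 % first occurrence in any file
        end
    end
end

% Print every word with its combined count (keys are returned sorted).
allWords = keys(totals);
for k = 1:numel(allWords)
    fprintf('%s\t%d\n', allWords{k}, totals(allWords{k}));
end
```

With real data, fileWords would hold one word list per file, and the printed pairs could be written to the single combined file instead.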

This is what I am currently doing to calculate the word frequencies of a single .txt file:

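fid = fopen('File.txt');            % open the text file for reading
words = textscan(fid, '%s');        % 1x1 cell holding a column cell array of whitespace-delimited words
status = fclose(fid);               % 0 if the file was closed successfully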

Then the unique words and their frequencies are calculated as follows:

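unique_words = unique(words{1,1});              % sorted list of distinct words
frequencies = zeros(numel(unique_words), 1);    % one counter per distinct word
for i = 1:numel(unique_words)
  if max(unique_words{i} ~= ' ')                % skip entries made only of spaces
    for j = 1:numel(words{1,1})                 % compare against every word in the file
      if strcmp(words{1,1}(j), unique_words{i})
        frequencies(i) = frequencies(i) + 1;
      end
    end
  end
end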
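The nested loop above compares every word in the file against every unique word, which gets slow for large files. A shorter vectorized sketch uses the third output of unique (the index of each token in the unique list) together with accumarray; the small token list is a hypothetical stand-in for words{1,1}.

```matlab
% Hypothetical stand-in for words{1,1}, the cell array from textscan(fid, '%s').
tokens = {'kill'; 'run'; 'kill'; 'hide'; 'kill'};

% unique returns the sorted distinct words and, as its third output,
% the position in unique_words of every element of tokens.
[unique_words, ~, idx] = unique(tokens);

% accumarray adds 1 at a word's position for each of its occurrences.
frequencies = accumarray(idx, 1);
```

Here unique_words is {'hide'; 'kill'; 'run'} and frequencies is [1; 3; 1].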

Please guide me if anyone can help.

Thanks a lot.

6 Comments

Mazhar Iqbal Rana on 31 Dec 2013

Edited: Walter Roberson on 31 Dec 2013

I am using MATLAB, where I calculate the unique words as well as their frequencies. The results are in matrix form.

The frequencies are calculated like this:

for i = 1:numel(unique_words)
    if max(unique_words{i} ~= ' ')
        for j = 1:numel(words{1,1})
            if strcmp(words{1,1}(j), unique_words{i})
                frequencies(i) = frequencies(i) + 1;                  
            end
        end
    end
end

This is for one file, right? Now suppose a word appears 10 times in one file and 4 times in another; its combined frequency should be 14. I want to calculate this for every word.
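Since the loop handles one file at a time, a minimal sketch of reading several files into separate word lists follows. The file names, the demo text, and the choice to lowercase every word (so that Kill and kill are counted together) are assumptions for illustration; the demo files are written to tempdir only so the sketch runs on its own.

```matlab
% For a self-contained demo, write two small files to the temp folder.
% In practice fileNames would list the four real files (names here are hypothetical).
fileNames = {fullfile(tempdir, 'demo_file1.txt'), fullfile(tempdir, 'demo_file2.txt')};
demoText  = {'Kill the bug kill it', 'run and kill'};
for f = 1:numel(fileNames)
    fid = fopen(fileNames{f}, 'w');
    fprintf(fid, '%s\n', demoText{f});
    fclose(fid);
end

% Read every file into its own cell array of words.
fileWords = cell(size(fileNames));
for f = 1:numel(fileNames)
    fid = fopen(fileNames{f}, 'r');
    if fid == -1
        error('Could not open %s', fileNames{f});
    end
    C = textscan(fid, '%s');      % whitespace-delimited tokens
    fclose(fid);
    fileWords{f} = lower(C{1});   % assumption: 'Kill' and 'kill' count as the same word
end

% Remove the demo files again.
for f = 1:numel(fileNames)
    delete(fileNames{f});
end
```

Each fileWords{f} can then be counted on its own or merged with the others by word.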

dpb on 31 Dec 2013

I still don't see anything about the secondary file. Create it and store each word (or, better yet, a hash value of the word for quicker lookup) together with its accumulated frequency for the file(s) processed so far.
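A minimal sketch of such a secondary file, storing each word next to its accumulated count as plain text and reading the pairs back so later files can be added. The word list, counts, and temporary file name are hypothetical, and the word is stored directly rather than as a hash value.

```matlab
% Running totals so far (hypothetical values).
wordList  = {'hide'; 'kill'; 'run'};
countList = [1; 14; 2];

% Store each word next to its accumulated count, one pair per line.
totalsFile = [tempname '.txt'];   % hypothetical file name
fid = fopen(totalsFile, 'w');
for k = 1:numel(wordList)
    fprintf(fid, '%s %d\n', wordList{k}, countList(k));
end
fclose(fid);

% Later: reload the pairs so counts from further files can be added to them.
fid = fopen(totalsFile, 'r');
C = textscan(fid, '%s %f');
fclose(fid);
storedWords  = C{1};   % column cell array of words
storedCounts = C{2};   % column vector of counts

delete(totalsFile);    % clean up the demo file
```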

If you have the Statistics Toolbox, there's the dataset object, which has a lot of this functionality built in. It's basically just a structure with named fields plus some higher-level features, so a structure is also one fairly simple way to do this within base MATLAB.

As Walter asks, it would help to know specifically how the data is currently stored, and to get a better view of how you're doing the processing: not the (rather trivial) counting within a given file, but the larger picture of how you get the files, which ones are processed at any given time, how you know when another update is needed, and so on.

To reiterate: why not use a central database for the results? It seems as though that would simplify life significantly.

Answer 1

Walter Roberson on 31 Dec 2013

https://fr.mathworks.com/matlabcentral/answers/111136-adding-word-frequencies-in-various-text-files#answer_119798

You cannot calculate the joint frequencies with the information you are storing in the files.

In order to calculate the joint frequencies, you need to also store the information about which word each count corresponds to.
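If each file's result is stored as its words together with their counts, the lists can be merged by word identity even when the same word sits at a different index in each file. A minimal sketch, with hypothetical per-file results, stacks the lists, groups identical words with unique, and sums their counts with accumarray.

```matlab
% Per-file results, each stored as words plus the counts that belong to them
% (hypothetical values; 'kill' sits at a different index in each list).
wordsA = {'attack'; 'kill'; 'run'};   countsA = [2; 10; 5];
wordsB = {'hide'; 'kill'};            countsB = [3; 4];

% Stack both lists, then group identical words together.
allWords  = [wordsA; wordsB];
allCounts = [countsA; countsB];
[combinedWords, ~, idx] = unique(allWords);

% Sum the counts that share a word.
combinedCounts = accumarray(idx, allCounts);
```

Here combinedWords is {'attack'; 'hide'; 'kill'; 'run'} and combinedCounts is [2; 3; 14; 5]; more files can be added by stacking further lists in the same way.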

Answer 2

Sarah Palfreyman on 30 Apr 2018

https://fr.mathworks.com/matlabcentral/answers/111136-adding-word-frequencies-in-various-text-files#answer_318002

See Text Analytics Toolbox
