Visually representing word frequencies for a large amount of text
I am working on text mining. I have some text files that each contain millions of words, and I want to determine their word frequencies. I have two problems:
- How can I process large data in MATLAB to find the unique words and their occurrence counts for any text document (containing millions of words)?
- After finding the unique words and their occurrence counts, how can I represent them graphically, for example as a Circos plot or pie chart (the unique words can number in the thousands)?
Answers (1)
Samayochita
on 18 Jun 2025
Hi moin khan,
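I understand that, for large-scale text mining in MATLAB, the goal is to process large text files to find the unique words and their frequencies, and then to represent those frequencies visually even when there are thousands of unique words.
To process large text data efficiently in MATLAB: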
Step 1: Read large files
Use fileread when the whole file fits in memory, or fopen with fgetl to read it line by line when it does not.
textData = fileread('largeTextFile.txt'); % Suitable for moderately large files
For very large files, read line by line instead:
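fid = fopen('largeTextFile.txt','r');
assert(fid ~= -1, 'Could not open largeTextFile.txt'); % fopen returns -1 on failure
while ~feof(fid)
    textLine = fgetl(fid); % one line, without the newline (avoids shadowing the built-in line function)
    % process textLine
end
fclose(fid);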
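The "process line" step in the loop above could be filled in roughly as follows. This is only a sketch: it keeps running counts in a containers.Map so that just one line of text is in memory at a time, and it writes a tiny sample file (sampleFile, a made-up name) so that it runs on its own. Treating every run of \w characters as a word is an assumption that may not suit all texts.

```matlab
% Write a tiny sample file so the sketch runs on its own.
% Replace sampleFile with the path to the real text file.
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, 'The cat saw the dog.\nThe dog ran!\n');
fclose(fid);

% Map from word (char) to running count (double).
wordCounts = containers.Map('KeyType', 'char', 'ValueType', 'double');

fid = fopen(sampleFile, 'r');
assert(fid ~= -1, 'Could not open %s', sampleFile);
while true
    textLine = fgetl(fid);
    if ~ischar(textLine)   % fgetl returns -1 at end of file
        break
    end
    % Lowercase the line and pull out runs of word characters.
    tokens = regexp(lower(textLine), '\w+', 'match');
    for k = 1:numel(tokens)
        w = tokens{k};
        if isKey(wordCounts, w)
            wordCounts(w) = wordCounts(w) + 1;
        else
            wordCounts(w) = 1;
        end
    end
end
fclose(fid);
delete(sampleFile);

% Convert the map into the two arrays used in the later steps.
uniqueWords = keys(wordCounts);
counts = cell2mat(values(wordCounts));
```

Updating the map once per word is slow in MATLAB, so this trades speed for a small memory footprint. If the file does fit in memory, the vectorized approach in Steps 2 and 3 is likely to be faster.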
Step 2: Tokenize text and clean it (optional but preferred)
Break the text into words, convert to lowercase, remove punctuation, etc.
cleanedText = lower(regexprep(textData, '[^\w\s]', '')); % remove punctuation
words = split(cleanedText); % tokenize
words = words(~cellfun('isempty',words)); % remove empty strings
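To see what the three lines above produce, here is the same cleaning applied to a short made-up string instead of a file. The sample text is purely illustrative.

```matlab
% Made-up sample text with punctuation, mixed case, and a line break.
textData = sprintf('The cat, the dog!\nA bird?');

% Same cleaning steps as above.
cleanedText = lower(regexprep(textData, '[^\w\s]', '')); % remove punctuation
words = split(cleanedText);                              % tokenize at whitespace
words = words(~cellfun('isempty', words));               % remove empty strings

disp(words')
```

Note that \w also matches digits and underscores, so tokens such as "2025" or "my_var" are kept as words. Adjust the pattern if that is not wanted.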
Step 3: Count word frequencies
Use the “unique” and “accumarray” functions, or the “tabulate” function (tabulate requires Statistics and Machine Learning Toolbox).
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx, 1);
OR
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
T = tabulate(words)
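With thousands of unique words, it usually helps to sort by count before looking at the results. A minimal sketch, reusing the toy words list from above (the variable names sortedWords, sortedCounts, and freqTable are just illustrative):

```matlab
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx, 1);

% Sort from most to least frequent and keep the words aligned.
[sortedCounts, order] = sort(counts, 'descend');
sortedWords = uniqueWords(order);

% Collect the result in a table for easy inspection or export.
freqTable = table(sortedWords(:), sortedCounts, ...
    'VariableNames', {'Word', 'Count'});
disp(freqTable)
```

The table can then be trimmed with something like freqTable(1:N, :) to focus on the most frequent words.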
Step 4: Visualize word frequencies using word cloud
A word cloud can show hundreds or thousands of words at once, with each word sized by its count.
wordcloud(uniqueWords, counts);
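As far as I know, wordcloud displays only the 100 largest words by default, and the MaxDisplayWords property raises that limit. A bar chart of the top N words is another option when exact counts matter. The sketch below shows both on the toy data; the values 20 and 200 are arbitrary choices, not recommendations.

```matlab
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx, 1);

% Keep the topN most frequent words (20 is an arbitrary cutoff).
[sortedCounts, order] = sort(counts, 'descend');
topN = min(20, numel(sortedCounts));
topWords = uniqueWords(order(1:topN));
topCounts = sortedCounts(1:topN);

% Horizontal bar chart with the most frequent word at the top.
figure;
barh(topCounts);
set(gca, 'YTick', 1:topN, 'YTickLabel', topWords, 'YDir', 'reverse');
xlabel('Occurrences');
title(sprintf('Top %d words', topN));

% Word cloud that shows up to 200 words instead of the default limit.
figure;
wordcloud(uniqueWords, counts, 'MaxDisplayWords', 200);
```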
Hope this is helpful!