Reduce the Size of Matrix
I need to reduce the size of my matrices (Xbool_last and Xfreq_last), because at the 1000th iteration of the outer loop (docx = 1000) MATLAB reports "Out of memory". The loop runs from 1 to 5549.
Please look at this part of the code, which is commented with explanations:
%% Loop over ALL the documents to create the tf-idf weight matrix
for docx = 1 : length(DBlast)
    docx
    for word = 1 : length(DBlast{docx})
        % Take each word of docx in turn
        word_xi = DBlast{docx}{word,1} ;
        for docy = 1 : length(DBlast)
            % The source word comes from docx; search for it in
            % every document docy.
            % If word_xi is found in docy, vote 1
            if sum(strcmpi(DBlast{docy},word_xi)) ~= 0
                ind = find(strcmpi(DBlast{docy},word_xi) ~= 0) ;
                Xbool(word,docy) = 1 ;
                Xfreq(word,docy) = Freqlast{docy}(ind) ;
            else
                % else vote 0
                Xbool(word,docy) = 0 ;
                Xfreq(word,docy) = 0 ;
            end
        end
    end
    Xbool_last = [Xbool_last;uint8(Xbool)];
    Xfreq_last = [Xfreq_last;uint8(Xfreq)];
    Xbool = [] ;
    Xfreq = [] ;
end
===============================================================================
So, my questions:
1. How can I reduce the size of Xbool_last and Xfreq_last? If I need to export the matrices to a .txt file (or some other format) in order to use them, how can I save and load them? Can you recommend some code?
2. How can I use the output of the code above in the tf-idf algorithm (if you know)? The tf-idf code is attached.
Accepted Answer
Guillaume
on 22 Nov 2014
Edited: Guillaume
on 22 Nov 2014
You're already using uint8 to store your values. There isn't a smaller type unless you start packing booleans into bits, which I assume is not possible for Xfreq_last anyway. Using bits to store booleans is bound to be slow and awkward in MATLAB, and there's no built-in function for that.
However, your storage looks incredibly inefficient to me. Say you're processing the first word of the first document. You find it in documents 2, 10, 150, 2048 and 4125, for example. For a start, instead of storing those document indices (which wouldn't take much memory, ~20 bytes as uint32), you store a boolean array of size 1x5549 (~5549 bytes) with only a few ones. But more importantly, when you get to document 2, you're going to be looking for the exact same word, which you'll find in the exact same documents, and you'll store that again. Why?
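To make the byte counts above concrete, here is a small check using the same five document numbers. The variable names (hits, row, hitsInfo, rowInfo) are only illustrative and do not come from the original code.

```matlab
% Compare the two ways of recording where one word occurs.
nDocs = 5549;                            % number of documents in the question

% Option A: store only the indices of the documents containing the word.
hits = uint32([2 10 150 2048 4125]);     % 5 values x 4 bytes each

% Option B: store one uint8 flag per document, as the question's code does.
row = zeros(1, nDocs, 'uint8');
row(hits) = 1;

% whos reports the memory used by each variable.
hitsInfo = whos('hits');
rowInfo  = whos('row');
fprintf('index list: %d bytes, flag row: %d bytes\n', ...
        hitsInfo.bytes, rowInfo.bytes);
```

The flag row costs the same 5549 bytes no matter how few documents contain the word, and the original loop adds one such row for every word of every document.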
Why not do the storage per word, instead of per document, and for each word, just store which documents it's found in?
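One possible way to sketch that per-word storage is a sparse word-by-document matrix with one row per distinct word. This is only an illustration, not necessarily what was meant above: the toy DBlast and Freqlast below are made-up stand-ins, and it assumes each DBlast{d} is an n-by-1 cell array of words with Freqlast{d} holding the matching counts.

```matlab
% Hypothetical toy data standing in for DBlast / Freqlast from the question.
DBlast   = { {'cat'; 'dog'}, {'dog'; 'fish'; 'bird'}, {'cat'; 'fish'} };
Freqlast = { [3; 1], [2; 5; 1], [1; 4] };
nDocs    = numel(DBlast);

% Vocabulary: every distinct word (case-insensitive) appears exactly once.
allWords = lower(vertcat(DBlast{:}));
[vocab, ~, wordIdx] = unique(allWords);   % wordIdx: vocabulary row of each entry

% Document number and frequency of each (word, document) entry,
% listed in the same order as allWords.
docIdx = cell2mat(arrayfun(@(d) d * ones(numel(DBlast{d}), 1), ...
                           (1:nDocs)', 'UniformOutput', false));
freq   = cell2mat(cellfun(@(f) double(f(:)), Freqlast(:), ...
                          'UniformOutput', false));

% Sparse matrices keep only the nonzero entries in memory.
Xfreq = sparse(wordIdx, docIdx, freq, numel(vocab), nDocs);
Xbool = Xfreq > 0;                        % sparse logical: word present in document

disp(vocab');
disp(full(Xfreq));                        % full() is only reasonable for toy sizes
```

With this layout each word is searched for and stored once, and the triple loop over documents disappears. The results can be written with save and read back with load (for example to a .mat file of your choosing), and sum(Xbool, 2) gives the number of documents containing each word, which is the quantity an idf weight is usually built from.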