How do I pass big data (loaded into a datastore object) to a classifier in Matlab?
This is my first experience working with datastores in `Matlab`, so I hope I can get some guidance here. I have a large data set, and I have saved the features and the corresponding label of each row into two `txt` files: `data.txt` and `label.txt`. Each file has `264e6 rows`. I did the following steps:
%creating datastore objects
datafile='data.txt';
ds=datastore(datafile,'TreatAsMissing','NA');
labelfile='label.txt';
ds_lbl=datastore(labelfile,'TreatAsMissing','NA');
When I pass the data to the classifier, I get the following error:
Mdl=fitcnb(read(ds),read(ds_lbl));
Error using classreg.learning.FullClassificationRegressionModel.prepareDataCR (line 201)
X and Y do not have the same number of observations.
Error in classreg.learning.classif.FullClassificationModel.prepareData (line 487)
classreg.learning.FullClassificationRegressionModel.prepareDataCR(...
Error in ClassificationNaiveBayes.prepareData (line 143)
prepareData@classreg.learning.classif.FullClassificationModel(X,Y,varargin{:},'OrdinalIsCategorical',true);
Error in classreg.learning.FitTemplate/fit (line 213)
this.PrepareData(X,Y,this.BaseFitObjectArgs{:});
Error in ClassificationNaiveBayes.fit (line 132)
this = fit(temp,X,Y);
Error in fitcnb (line 307)
this = ClassificationNaiveBayes.fit(X,Y,RemainingArgs{:});
With the default `ReadSize` of `20000`, the classifier works. But whenever I change `ReadSize` to `1e6`, I get the same error. Also, with the default `ReadSize` the classifier is only trained on `20000` records, while I have `264e6 records`.
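The `ReadSize` behaviour described above can be checked directly. A minimal sketch, reusing the file names from the question: each call to `read` returns only the next chunk of at most `ReadSize` rows (20000 by default for tabular text files), which is why `fitcnb(read(ds), read(ds_lbl))` only ever sees one chunk. If the two datastores end up with different `ReadSize` values, the chunks have different lengths and `fitcnb` reports that X and Y do not have the same number of observations. That this is what caused the error here is an assumption, not something confirmed in the thread.

```matlab
% Sketch: inspect what read() returns from two tabular text datastores.
ds     = datastore('data.txt',  'TreatAsMissing', 'NA');
ds_lbl = datastore('label.txt', 'TreatAsMissing', 'NA');

% ReadSize is the maximum number of rows returned by one call to read().
fprintf('ReadSize: %d (features), %d (labels)\n', ds.ReadSize, ds_lbl.ReadSize);

% read() returns only the NEXT chunk, not the whole file.
Xchunk = read(ds);
Ychunk = read(ds_lbl);
fprintf('Rows read: %d features, %d labels\n', height(Xchunk), height(Ychunk));

% If ReadSize is changed, change it on BOTH datastores so the chunks
% stay the same length, then start reading again from the top.
ds.ReadSize     = 1e6;
ds_lbl.ReadSize = 1e6;
reset(ds);
reset(ds_lbl);
```

Even with matching chunk sizes, this approach still trains on one chunk at a time, so it does not by itself use all `264e6` rows.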
I would really appreciate any suggestions. How can I pass a datastore to the classifier?
Answers (1)
Don Mathis
on 30 May 2017
I think you need to pass tall arrays or a tall table to fitcnb. See the documentation here: http://www.mathworks.com/help/stats/fitcnb.html?searchHighlight=fitcnb&s_tid=doc_srchtitle#bvnjlgv
and here:
You can get a tall table from a datastore like this:
tt = tall(ds)
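A minimal sketch of the tall-array route, assuming the features and labels have already been placed in a single comma-delimited file (hypothetical name `data_and_labels.txt`) whose last column is the label. `tall(ds)` does not load anything; `gather` and `fitcnb` trigger passes over the file. Taking X and Y from the same tall table means they come from the same datastore and have matching row counts. Whether `fitcnb` accepts tall inputs in a given release should be checked against that release's documentation.

```matlab
% Sketch: train naive Bayes on a tall table built from ONE datastore.
% Assumption: 'data_and_labels.txt' (hypothetical) holds the features in the
% first columns and the class label in the last column.
ds = datastore('data_and_labels.txt', 'TreatAsMissing', 'NA');

% Tall table backed by the datastore; no data is read at this point.
tt = tall(ds);

% Peek at a few rows (gather evaluates only what the preview needs).
disp(gather(head(tt, 5)))

% Split into predictors and response. Both come from the same tall table,
% so they have the same number of rows.
nVars = numel(ds.VariableNames);
X = tt{:, 1:nVars-1};   % tall matrix of features
Y = tt{:, nVars};       % tall array of labels (numeric or text)

% fitcnb reads the data in passes instead of loading it all into memory.
Mdl = fitcnb(X, Y);
disp(Mdl.ClassNames)
```

If Parallel Computing Toolbox is installed, MATLAB may start a parallel pool to evaluate the tall arrays; otherwise evaluation runs in the MATLAB client session.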
3 Comments
Don Mathis
on 5 June 2017
Edited: Don Mathis on 5 June 2017
I have not tried to do this myself, but from the error message it looks like you need to create your two tall arrays from the same datastore. So you'll need to put your labels in the same datastore as your features. I guess you could concatenate your two txt files "side by side", and then create your single datastore. After that, I think you would create a single tall array from that datastore, and then pass the 'features' columns of that as X and the 'label' column as Y, using the syntax fitcnb(X,Y).
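The "side by side" concatenation can also be done inside MATLAB, one chunk at a time, so the full files never need to fit in memory. A minimal sketch, assuming `label.txt` has exactly one column, both files list the rows in the same order, and the output name `data_and_labels.txt` is hypothetical. `writetable` with `'WriteMode','append'` requires R2020a or later. Because `read` returns at most `ReadSize` rows per call, the sketch checks every pair of chunks and stops with an error rather than writing misaligned rows.

```matlab
% Sketch: combine features and labels into one text file, chunk by chunk.
% Assumptions: label.txt has one column; rows are in the same order in both
% files; 'data_and_labels.txt' is a hypothetical output name.
dsX = datastore('data.txt',  'TreatAsMissing', 'NA');
dsY = datastore('label.txt', 'TreatAsMissing', 'NA');

% Same chunk size for both files so the chunks line up.
chunkRows    = 1e5;
dsX.ReadSize = chunkRows;
dsY.ReadSize = chunkRows;

outFile = 'data_and_labels.txt';
if isfile(outFile)
    delete(outFile);   % start from an empty output file
end

isFirstChunk = true;
while hasdata(dsX) && hasdata(dsY)
    Xc = read(dsX);
    Yc = read(dsY);

    % Refuse to write rows that would be paired with the wrong label.
    if height(Xc) ~= height(Yc)
        error('Chunk mismatch: %d feature rows vs %d label rows.', ...
              height(Xc), height(Yc));
    end

    % Rename the label column so it cannot clash with a feature column
    % (without headers, both files would otherwise start at Var1).
    Yc.Properties.VariableNames = {'Label'};
    T = [Xc, Yc];

    if isFirstChunk
        writetable(T, outFile);                    % writes the header row
        isFirstChunk = false;
    else
        writetable(T, outFile, 'WriteMode', 'append', ...
                   'WriteVariableNames', false);   % R2020a or later
    end
end

% Both files should run out of rows at the same time.
if hasdata(dsX) || hasdata(dsY)
    error('data.txt and label.txt have different numbers of rows.');
end
```

The resulting file can then be opened with a single `datastore` call, and the label is its last column, named `Label`.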