How to send big data (loaded into a datastore object) to a classifier in MATLAB?

Sara Salimi on 29 May 2017
Commented: Sara Salimi on 5 Jun 2017
This is my first experience working with datastores in `MATLAB`, and I am hoping to get some guidance here. I have a large data set whose features and corresponding row labels are saved in two `txt` files: `data.txt` and `label.txt`. Each file has `264e6` rows. I did the following steps:
%creating datastore objects
datafile='data.txt';
ds=datastore(datafile,'TreatAsMissing','NA');
labelfile='label.txt';
ds_lbl=datastore(labelfile,'TreatAsMissing','NA');
When I pass the data to the classifier, I get the following error:
Mdl=fitcnb(read(ds),read(ds_lbl));
Error using classreg.learning.FullClassificationRegressionModel.prepareDataCR (line 201)
X and Y do not have the same number of observations.
Error in classreg.learning.classif.FullClassificationModel.prepareData (line 487)
classreg.learning.FullClassificationRegressionModel.prepareDataCR(...
Error in ClassificationNaiveBayes.prepareData (line 143)
prepareData@classreg.learning.classif.FullClassificationModel(X,Y,varargin{:},'OrdinalIsCategorical',true);
Error in classreg.learning.FitTemplate/fit (line 213)
this.PrepareData(X,Y,this.BaseFitObjectArgs{:});
Error in ClassificationNaiveBayes.fit (line 132)
this = fit(temp,X,Y);
Error in fitcnb (line 307)
this = ClassificationNaiveBayes.fit(X,Y,RemainingArgs{:});
With the default `ReadSize`, which is `20000`, the classifier works. But whenever I change `ReadSize` to `1e6`, it shows the same error. The other point is that with the default `ReadSize`, the classifier is only able to classify `20000` records, while I have `264e6` records.
I would really appreciate it if you could suggest a solution. How can I send a datastore to the classifier?

Answers (1)

Don Mathis on 30 May 2017
I think you need to pass tall arrays or a tall table to fitcnb. See the documentation here: http://www.mathworks.com/help/stats/fitcnb.html?searchHighlight=fitcnb&s_tid=doc_srchtitle#bvnjlgv
and here:
You can get a tall table from a datastore like this:
tt = tall(ds)
  3 comments
Don Mathis on 5 Jun 2017
Edited: Don Mathis on 5 Jun 2017
I have not tried to do this myself, but from the error message it looks like you need to create your two tall arrays from the same datastore. So you'll need to put your labels in the same datastore as your features. I guess you could concatenate your two txt files "side by side", and then create your single datastore. After that, I think you would create a single tall array from that datastore, and then pass the 'features' columns of that as X and the 'label' column as Y, using the syntax fitcnb(X,Y).
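The steps described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the file name `combined.txt`, the variable names `f1`, `f2`, and `label`, and the tiny six-row table are hypothetical stand-ins for the merged feature/label file, and the label is assumed to be the last column. It has not been run against the 264e6-row data.

```matlab
% Hypothetical stand-in for the merged file: two feature columns, one label column.
T = table([1.0; 1.2; 0.9; 5.1; 4.8; 5.3], ...
          [2.0; 2.1; 1.9; 7.0; 6.8; 7.2], ...
          [0; 0; 0; 1; 1; 1], ...
          'VariableNames', {'f1', 'f2', 'label'});
combinedFile = fullfile(tempdir, 'combined.txt');   % hypothetical file name
writetable(T, combinedFile);                        % comma-delimited text with a header row

% One datastore for features and labels together, so X and Y stay row-aligned.
ds = datastore(combinedFile, 'TreatAsMissing', 'NA');
tt = tall(ds);                                      % tall table backed by the datastore

% Assumption: the last variable is the label, all others are features.
names = tt.Properties.VariableNames;
X = tt(:, names(1:end-1));                          % tall table of predictors
Y = tt.(names{end});                                % tall column of labels

% Train naive Bayes on the tall data using the fitcnb(X,Y) syntax.
Mdl = fitcnb(X, Y);
```

Because `X` and `Y` come from the same tall table, they have the same number of observations by construction, which is what the original error message was complaining about.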
Sara Salimi on 5 Jun 2017
Thank you so much, sir, for mentioning this. I really appreciate it. Many thanks once again for all your help and support.
