TreeBagger Training, large datasets
I want to train the TreeBagger classifier with a large dataset (a 4 million x 1 array). My PC runs out of memory if I try to do this in one run. Is there a way to run the training in a loop? I was wondering if I could first train the TreeBagger algorithm on a subset of the training data and then update it with the remaining subsets. Could I use the results of the first training run as some kind of prior for the next?
Thanks, Claire
Answers (1)
TED MOSBY on 15 Nov 2024
Edited: TED MOSBY on 18 Nov 2024
The ‘TreeBagger’ class in MATLAB does not natively support incremental learning, which means you can't directly update an existing model with new data subsets.
You can try the following approaches to reduce memory usage:
Train Multiple Models on Data Subsets:
- Divide your dataset carefully so that it's not biased
- Train on each chunk
- Combine the models by averaging all their predictions (see the sketch below)
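A minimal sketch of this chunked approach, assuming a classification problem with in-memory predictors X and labels Y; the variable names, chunk count, and tree count are illustrative, not recommendations:
% Train one TreeBagger per chunk, then average the class scores.
numChunks = 8;                            % illustrative value
numTrees  = 50;                           % trees per chunk model
N     = size(X, 1);
idx   = randperm(N);                      % shuffle so chunks are unbiased
edges = round(linspace(0, N, numChunks + 1));
models = cell(numChunks, 1);
for k = 1:numChunks
    rows      = idx(edges(k)+1 : edges(k+1));
    models{k} = TreeBagger(numTrees, X(rows, :), Y(rows));
end
% Combine: average the per-class scores from every chunk model.
% This assumes each chunk contains every class, so the score columns
% (ordered by ClassNames) line up across models.
[~, scores] = predict(models{1}, Xtest);
for k = 2:numChunks
    [~, s] = predict(models{k}, Xtest);
    scores = scores + s;
end
scores = scores / numChunks;
[~, col]   = max(scores, [], 2);
predLabels = models{1}.ClassNames(col);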
Preprocess data:
Consider downsampling or preprocessing your data before training. Feature selection, dimensionality reduction (e.g., PCA), or training on a smaller, more representative subset of the data all help reduce the memory footprint (a sketch follows below).
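A rough sketch of that preprocessing step; the sampling fraction and number of principal components below are illustrative assumptions:
% Train on a random subset projected onto a few principal components.
keepFrac = 0.25;                           % keep 25% of the rows
N    = size(X, 1);
rows = randperm(N, round(keepFrac * N));   % random representative subset
% pca centers the data; keep the mean (mu) so new data can be
% transformed the same way before prediction.
[coeff, Xscore, ~, ~, ~, mu] = pca(X(rows, :), 'NumComponents', 10);
mdl = TreeBagger(100, Xscore, Y(rows));
% At prediction time, apply the identical transform to new data:
% predict(mdl, (Xnew - mu) * coeff)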
Alternative algorithms:
If the above methods don't work, you can consider other machine learning algorithms, such as XGBoost or LightGBM, that handle large datasets efficiently.
Hope this helps!