CatBoost in MATLAB for a high-dimensional dataset
Dear friend,
Currently, I am trying various approaches to improve the performance of my model on a high-dimensional spectrometry dataset for binary classification. My aim is to improve upon the 0.74 AUC that Python's LightGBM achieves on this dataset. However, I am struggling to get anywhere close to this using the MATLAB packages for variable selection and the Statistics and Machine Learning Toolbox models. Is there a possibility to provide CatBoost for MATLAB, or a model that would perform better than LightGBM on a high-dimensional dataset (e.g. a spectrometry dataset with 6000 variables)?
Thanks,
s0810110
Answers (1)
Shubham
on 18 Jan 2024
Hi s0810110,
There isn't a direct implementation of CatBoost for MATLAB. However, there are a few strategies you could consider to potentially improve the performance of your models on high-dimensional data in MATLAB:
Feature Selection/Reduction:
- Use MATLAB's built-in functions for feature selection, such as sequentialfs (sequential feature selection), relieff (ReliefF algorithm), or fscmrmr (Minimum Redundancy Maximum Relevance). Refer to this documentation link: https://in.mathworks.com/help/stats/sequentialfs.html
- Consider dimensionality reduction techniques like PCA (pca function) or t-SNE (tsne function) to reduce the number of variables while retaining most of the variance in the data. Refer to this documentation link: https://in.mathworks.com/help/stats/tsne.html
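As a rough illustration of the two routes above, here is a minimal sketch combining MRMR ranking with a PCA fallback. The variable names X (an n-by-6000 predictor matrix), y (binary labels), and the cut-offs (200 features, 95% variance) are placeholders you would tune for your own data:

```matlab
% Sketch: rank spectral variables with MRMR, then keep the top-ranked ones.
% X (n-by-6000 predictors) and y (binary labels) are assumed placeholders.
[idx, scores] = fscmrmr(X, y);      % MRMR ranking of all predictors
Xsel = X(:, idx(1:200));            % keep the 200 highest-ranked variables

% Alternatively, project onto principal components covering 95% variance.
[coeff, score, ~, ~, explained] = pca(X);
nComp = find(cumsum(explained) >= 95, 1);
Xpca = score(:, 1:nComp);           % reduced-dimension representation
```

For 6000 correlated spectral channels, a filter method such as fscmrmr is usually much cheaper than wrapper methods like sequentialfs.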
Ensemble Methods:
- MATLAB's Statistics and Machine Learning Toolbox offers ensemble methods such as random forests (TreeBagger or fitcensemble for classification).
- You can build an ensemble of different models and use a voting scheme to improve predictions. Refer to this documentation link: https://in.mathworks.com/help/stats/select-predictors-for-random-forests.html
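For reference, boosted trees via fitcensemble are roughly the closest built-in analogue to LightGBM/CatBoost. A minimal sketch, assuming X and y as above and with illustrative (not recommended) settings:

```matlab
% Sketch: gradient-boosted trees for binary classification.
% 'LogitBoost' and the tree depth/cycle counts are example values only.
mdl = fitcensemble(X, y, ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 500, ...
    'Learners', templateTree('MaxNumSplits', 20));

cv = crossval(mdl, 'KFold', 5);     % 5-fold cross-validation
fprintf('CV classification loss: %.3f\n', kfoldLoss(cv));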
Hyperparameter Optimization:
- Use the bayesopt function, or the 'OptimizeHyperparameters' name-value option of the fitting functions, for Bayesian optimization to fine-tune the hyperparameters of your models. Refer to this documentation link: https://in.mathworks.com/help/stats/bayesopt.html
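The simplest entry point is to let the fitting function drive the Bayesian optimization itself. A sketch, again with X and y as placeholders and an arbitrary evaluation budget:

```matlab
% Sketch: built-in Bayesian optimization over the ensemble's hyperparameters.
% 'MaxObjectiveEvaluations' is an example budget, not a recommendation.
mdl = fitcensemble(X, y, ...
    'OptimizeHyperparameters', 'all', ...
    'HyperparameterOptimizationOptions', ...
        struct('MaxObjectiveEvaluations', 60, 'KFold', 5));
```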
Advanced Preprocessing:
- Normalize or standardize your data using normalize or zscore. Refer to this documentation: https://in.mathworks.com/help/matlab/ref/double.normalize.html
- Explore advanced preprocessing techniques like variable clustering or filtering methods to remove noisy features.
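One point worth spelling out for standardization: compute the statistics on the training set only and reuse them on the test set, to avoid information leakage. A sketch with hypothetical Xtrain/Xtest matrices:

```matlab
% Sketch: z-score the training data, then apply the SAME mean and standard
% deviation to the test data (Xtrain/Xtest are assumed placeholders).
[XtrainZ, mu, sigma] = zscore(Xtrain);
XtestZ = (Xtest - mu) ./ sigma;
```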
Deep Learning:
- For high-dimensional data, deep learning models might be effective. MATLAB's Deep Learning Toolbox provides functions and apps for designing, training, and evaluating deep neural networks. Refer to this documentation link: https://in.mathworks.com/help/deeplearning/referencelist.html?type=function&s_tid=CRUX_topnav
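As a quick starting point before building a custom network, fitcnet from the Statistics and Machine Learning Toolbox (R2021a or later) trains a small fully connected classifier in one call. The layer sizes below are illustrative only:

```matlab
% Sketch: a small fully connected neural network classifier.
% LayerSizes and the activation are example choices, not recommendations.
mdl = fitcnet(X, y, ...
    'LayerSizes', [64 32], ...
    'Activations', 'relu', ...
    'Standardize', true);
```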
AUC is a good metric for binary classification problems, but you should also consider others such as accuracy, precision, recall, and F1-score for a comprehensive evaluation.
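These metrics are all straightforward to compute in MATLAB. A sketch, assuming a trained model mdl, a held-out set Xtest/ytest, a positive class labelled 1, and that the second column of the score matrix is the positive-class posterior:

```matlab
% Sketch: AUC via perfcurve, plus precision/recall/F1 from a confusion matrix.
[ypred, scores] = predict(mdl, Xtest);
[~, ~, ~, auc] = perfcurve(ytest, scores(:, 2), 1);   % positive class = 1

C = confusionmat(ytest, ypred);     % rows = true class, columns = predicted
precision = C(2,2) / (C(2,2) + C(1,2));
recall    = C(2,2) / (C(2,2) + C(2,1));
f1 = 2 * precision * recall / (precision + recall);
fprintf('AUC %.3f  precision %.3f  recall %.3f  F1 %.3f\n', ...
    auc, precision, recall, f1);
```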