How to resolve if Validation and Testing accuracy are widely different?

Sahil Bajaj on 4 Jul 2021
Edited: Prince Kumar on 19 Nov 2021
Dear experts,
I wrote a MATLAB script to run a machine learning analysis (a classification problem), and I see a consistent but odd pattern in my results: training and validation accuracy are always high and reproducible, but test accuracy is always much lower. I checked all five tips mentioned here: https://stackoverflow.com/questions/48718663/validation-and-testing-accuracy-widely-different, but I am still unable to resolve the problem.
I would really appreciate it if someone could help me figure out a solution.
Thanks,
Sahil

Answers (1)

Prince Kumar on 19 Nov 2021
Edited: Prince Kumar on 19 Nov 2021
Hi Sahil Bajaj,
This generally happens when the model memorizes the training data instead of learning the underlying pattern, so it does not generalize to unseen data. This is called overfitting.
You can try the following:
  • Use a regularization technique.
  • Make sure each set (training, validation, and test) has enough samples, for example a 60%/20%/20% or 70%/15%/15% split for training, validation, and test respectively.
  • Perform k-fold cross-validation.
  • Randomly shuffle the data before doing the split, so that the data distribution is nearly the same in each set. If your data is in a datastore you can use the 'shuffle' function; otherwise you can use the 'randperm' function.
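
As a rough illustration of the shuffling, splitting, and cross-validation points, here is a minimal sketch. It assumes the Statistics and Machine Learning Toolbox is installed, and it uses the bundled fisheriris data and a k-nearest-neighbor classifier purely as stand-ins for your own data and model. The variable names are my own, and the 70/15/15 ratio is just one of the splits suggested above.

```matlab
% Sketch only: fisheriris and fitcknn are placeholders for your data and model.
rng(0);                       % fix the seed so the shuffle is reproducible
load fisheriris               % provides meas (150x4 features) and species (150x1 labels)
X = meas;
Y = species;

% Shuffle the row indices once, then split 70% / 15% / 15%.
n        = size(X, 1);
idx      = randperm(n);
nTrain   = round(0.70 * n);
nVal     = round(0.15 * n);
trainIdx = idx(1:nTrain);
valIdx   = idx(nTrain+1 : nTrain+nVal);
testIdx  = idx(nTrain+nVal+1 : end);

% Fit on the training rows only.
mdl = fitcknn(X(trainIdx, :), Y(trainIdx), 'NumNeighbors', 5);

% Accuracy on each set. A large gap between training and test accuracy
% is a sign of overfitting or of differently distributed sets.
accTrain = mean(strcmp(predict(mdl, X(trainIdx, :)), Y(trainIdx)));
accVal   = mean(strcmp(predict(mdl, X(valIdx, :)),   Y(valIdx)));
accTest  = mean(strcmp(predict(mdl, X(testIdx, :)),  Y(testIdx)));

% 5-fold cross-validation on the training + validation rows.
% The test rows are left untouched so they remain an honest final check.
devIdx = [trainIdx, valIdx];
cvp    = cvpartition(Y(devIdx), 'KFold', 5);   % stratified by class label
cvMdl  = fitcknn(X(devIdx, :), Y(devIdx), 'NumNeighbors', 5, 'CVPartition', cvp);
accCV  = 1 - kfoldLoss(cvMdl);                 % kfoldLoss returns misclassification rate

fprintf('train %.3f | val %.3f | test %.3f | 5-fold CV %.3f\n', ...
    accTrain, accVal, accTest, accCV);
```

If the cross-validated accuracy is close to the validation accuracy but the test accuracy is still much lower, it is worth checking whether the test set was collected or preprocessed differently from the rest of the data.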


Version

R2021a
