Data partitioning for machine learning

Akshita Gupta on 30 Mar 2019
What does the warning "the training set does not contain points from all groups" mean when partitioning the data? And how can it be removed?

Answers (1)

Gagan Agarwal on 30 May 2024
Hi Akshita,
This warning typically appears when you split your dataset into training and testing (or validation) sets and the training set ends up without any data points from one or more of the groups (classes) present in the original dataset. The same problem can affect any of the splits, not only the training set. It is most common when some groups have very few observations: a group with a single observation, for example, can only ever land in one of the splits.
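To make the situation concrete, here is a minimal Python sketch that lists which groups of the full dataset are absent from a given training split. It is only an illustration of the condition the warning describes, not how MATLAB performs its check; the function `missing_groups` and the example labels and indices are hypothetical.

```python
from collections import Counter


def missing_groups(labels, train_indices):
    """Return the groups present in the full dataset but absent from the given split."""
    all_groups = set(labels)
    train_groups = {labels[i] for i in train_indices}
    return sorted(all_groups - train_groups)


# Hypothetical labels: group "C" has only one observation.
labels = ["A"] * 6 + ["B"] * 5 + ["C"]
print(Counter(labels))  # per-group counts in the full dataset

# A split that happens to put the only "C" observation in the test set.
train_idx = list(range(9))      # indices 0..8 -> six "A", three "B"
test_idx = list(range(9, 12))   # indices 9..11 -> two "B", one "C"

print("Missing from training set:", missing_groups(labels, train_idx))
# -> Missing from training set: ['C']
```

With a purely random split, whether "C" lands in the training set is down to chance, which is why the warning can appear on some runs and not others.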
This situation can lead to several issues, including:
  • Biased Model Training: The model may not learn to generalize well across all groups since it hasn't seen examples from each group during training.
  • Inaccurate Evaluation: The testing or validation set may not accurately represent the performance of the model across all groups if it lacks data from some of them.
To remove the warning, consider the following techniques:
  1. Check for Small or Rare Groups: Look for any groups that have very few samples and consider merging them with similar groups or using oversampling techniques to increase their representation.
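  2. Use Stratified Splitting: Split each group separately so that every split receives a share of every group. If you are already stratifying, make sure the strategy accounts for the size and distribution of all groups, since a group smaller than the number of splits cannot appear in all of them.
  3. Use Custom Splitting Logic: Implement your own splitting routine that guarantees every group is represented in each split (or at least in the training set).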
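Techniques 2 and 3 can be combined in a short Python sketch, assuming the class labels are available as a list: each group is split separately (stratification), and at least one member of every group is always kept in the training set. Random oversampling from technique 1 is then applied to the training indices only. The functions `stratified_holdout` and `oversample` and their `test_fraction`/`min_count` parameters are hypothetical illustrations, not part of MATLAB or any other library.

```python
import random
from collections import Counter, defaultdict


def stratified_holdout(labels, test_fraction=0.3, seed=0):
    """Split indices per group so that every group appears in the training set.

    A single-observation group goes entirely to training; larger groups send
    roughly `test_fraction` of their members to the test set but always keep
    at least one member in training.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, label in enumerate(labels):
        by_group[label].append(i)

    train_idx, test_idx = [], []
    for idx in by_group.values():
        rng.shuffle(idx)
        # Round to a whole count, but never move every member to the test set.
        n_test = min(round(test_fraction * len(idx)), len(idx) - 1)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def oversample(train_idx, labels, min_count, seed=0):
    """Randomly duplicate training indices of rare groups until each has min_count."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i in train_idx:
        by_group[labels[i]].append(i)
    result = list(train_idx)
    for idx in by_group.values():
        extra = max(0, min_count - len(idx))
        result.extend(rng.choice(idx) for _ in range(extra))
    return result


labels = ["A"] * 6 + ["B"] * 5 + ["C"]  # hypothetical labels; "C" is a singleton
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.3)
print(Counter(labels[i] for i in train_idx))  # every group, including "C", is present

balanced_train = oversample(train_idx, labels, min_count=3)
print(Counter(labels[i] for i in balanced_train))
```

Oversampling only the training indices keeps duplicated observations out of the test set, so evaluation is not inflated by copies of training data. Note the trade-off: the singleton group "C" never reaches the test set, so its performance cannot be evaluated, which is where merging rare groups or collecting more data (technique 1) becomes necessary.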
I hope this helps!
