How to decrease sample size of each class in dataset?

Jacob Ebilane on 31 May 2022
Commented: dpb on 31 May 2022
I've been trying to create a limited dataset from the emnist byclass dataset. It's currently at 400k samples, and I want to downsize it to about 120k or fewer without the risk of ending up with unequal sample counts per class. Is there a way to do so without manually going through each class, finding its sample size, and reducing it?
These are the current classes/labels included:
[0 1 2 3 4 5 6 7 8 9 10 12 16 19 22 23 28 29 31 32 38 42 46 48 49 51 54 57]
3 comments
Jacob Ebilane on 31 May 2022
Would that take some time? I can probably use something to measure the frequency of each label, but my problem is how to take n samples from each class, since the classes are not all the same size and I can't really pinpoint the rows where each class starts and ends.
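Taking n samples per class does not require knowing where each class starts or ends, as long as a label is stored for every row. The following is a minimal sketch under assumptions not stated in the thread: X, y, and nPerClass are hypothetical names, the data is a small random stand-in rather than the actual dataset, and the labels are assumed to sit in a numeric vector with one entry per sample.

```matlab
% Hypothetical stand-in data: X holds one sample per row, y the class label.
rng(0);                                   % fixed seed so the draw is repeatable
y = [zeros(50,1); 5*ones(20,1); 12*ones(35,1)];
y = y(randperm(numel(y)));                % shuffle so the classes are interleaved
X = rand(numel(y), 4);

nPerClass = 10;                           % assumed target size per class

% findgroups maps each distinct label to a group number 1..K,
% regardless of where the rows sit in the dataset.
[g, classes] = findgroups(y);

keep = [];
for k = 1:numel(classes)
    rows = find(g == k);                  % row indices belonging to class k
    n = min(nPerClass, numel(rows));      % never ask for more rows than exist
    keep = [keep; rows(randperm(numel(rows), n))]; %#ok<AGROW>
end

% Balanced subset: the same rows are selected from the data and the labels.
Xsmall = X(keep, :);
ysmall = y(keep);
```

With the 28 labels listed in the question, a 120k target works out to floor(120000/28) = 4285 samples per class, which stays balanced only if the smallest class has at least that many rows.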
dpb on 31 May 2022
"limited dataset from the emnist byclass dataset"
OK, I'll bite. What's an emnist?
I don't see how you can do anything by class, however it is determined, if you can't identify the class locations in the dataset. Presumably there is either a class variable recorded per observation or a grouping variable available somewhere that serves the same purpose; otherwise, what purpose would having classes serve?
Need a reference and/or some background to have any context for the question.
However, groupsummary over whatever that grouping variable is will give you the group counts directly.
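As a rough illustration of the groupsummary suggestion, the sketch below counts rows per class on a small made-up label vector. The table T and its variable name label are assumptions for the example only; the real grouping variable in the dataset is not identified in the thread.

```matlab
% Hypothetical stand-in data: a label vector with unequal class sizes.
labels = [zeros(5,1); ones(3,1); 2*ones(7,1)];
T = table(labels, 'VariableNames', {'label'});

% With only a grouping variable given, groupsummary returns one row per
% group and a GroupCount column holding the number of rows in that group.
counts = groupsummary(T, 'label');
disp(counts)

% The smallest class bounds how many samples every class can contribute
% if the reduced dataset is to stay balanced.
nPerClassMax = min(counts.GroupCount);
```

The smallest group count is the largest per-class sample size that still allows every class to be represented equally.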


Answers (0)

Release

R2021a
