How to decrease sample size of each class in dataset?

Jacob Ebilane on 31 May 2022
Commented: dpb on 31 May 2022
I've been trying to create a limited dataset from the emnist byclass dataset. It's currently at 400k samples and I want to downsize it to about 120k or lower without the risk of having unequal samples for each class. Is there a way I can do so without having to manually go through each class, find out its sample size, and reduce it?
These are the current classes/labels included:
[0 1 2 3 4 5 6 7 8 9 10 12 16 19 22 23 28 29 31 32 38 42 46 48 49 51 54 57]
3 Comments
Jacob Ebilane on 31 May 2022
Would that take some time? I can probably use something to measure the frequency of each label, but my problem is how to take n samples from each class, since the classes are not all the same size and I can't really pinpoint the rows where each class starts and ends.
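One possible way to take n samples per class without knowing where each class starts or ends is to select rows by label value instead of by position. The following is only a minimal sketch under stated assumptions: the variable names (labels, images, nPerClass) and the random stand-in data are hypothetical, and the real EMNIST variables may be laid out differently.

```matlab
% Hypothetical stand-in data: replace with the real EMNIST labels and images.
rng(0);                                    % fixed seed so the subset is reproducible
labels = randi([0 9], 5000, 1);            % assumed N-by-1 numeric label vector
images = rand(5000, 784);                  % assumed N-by-784 matrix, one image per row

nPerClass = 100;                           % assumed target number of samples per class
classes   = unique(labels);                % class IDs actually present in the data
keep      = [];                            % row indices selected so far

for k = 1:numel(classes)
    rows = find(labels == classes(k));     % rows of this class, wherever they sit
    n    = min(nPerClass, numel(rows));    % never ask for more than the class has
    pick = rows(randperm(numel(rows), n)); % n distinct rows chosen at random
    keep = [keep; pick];                   %#ok<AGROW>
end

keep      = keep(randperm(numel(keep)));   % shuffle so classes are interleaved
subLabels = labels(keep);
subImages = images(keep, :);
```

With the 28 labels listed in the question and a budget of about 120k samples, nPerClass could be at most floor(120000/28) = 4285, provided the smallest class has at least that many rows.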
dpb on 31 May 2022
"limited dataset from the emnist byclass dataset"
OK, I'll bite. What's an emnist?
I don't see how you can do anything by class, however that is determined, if you can't identify the class locations in the dataset. One would presume there must be either a class variable in the dataset for each observation or a grouping variable available somewhere that serves the same purpose; otherwise, what purpose would having classes serve?
We need a reference and/or background to have any context for the question.
However, groupsummary over whatever that grouping variable is will give you the group counts directly.
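As a rough illustration of that suggestion, the sketch below calls groupsummary on a table with one label per row. The table T and the variable name label are assumptions made for the example, not the poster's actual data.

```matlab
% Hypothetical stand-in table; "label" is an assumed variable name.
T = table(randi([0 9], 5000, 1), 'VariableNames', {'label'});

% One output row per class; the GroupCount column holds the class size.
counts = groupsummary(T, 'label');
disp(counts)

% Largest per-class sample size that still keeps every class equal.
nMax = min(counts.GroupCount);
```

The smallest group count is the upper limit on how many samples can be drawn from every class while keeping the classes balanced.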


Answers (0)


Release

R2021a
