How to decrease sample size of each class in dataset?

I've been trying to create a limited dataset from the EMNIST ByClass dataset. It's currently at about 400k samples, and I want to downsize it to about 120k or fewer without ending up with unequal sample counts per class. Is there a way to do this without manually going through each class, finding its sample count, and trimming it?
These are the classes/labels currently included:
[0 1 2 3 4 5 6 7 8 9 10 12 16 19 22 23 28 29 31 32 38 42 46 48 49 51 54 57]

3 comments

You could count how many samples there are in each class, find the minimum count across classes, and then take that many samples from each class.
Would that take some time? I can probably use something to measure the frequency of each label, but my problem is how to take n samples from each class, since the classes are not all the same size and I can't really pinpoint the rows where each class starts and ends.
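A minimal sketch of the idea in the comments above, not code from the thread. It assumes the data is held as a label vector `labels` and a sample matrix `images` with one sample per row; both names and the random stand-in data are hypothetical. Selecting rows by comparing labels means you never need to know where a class starts or ends. With the 28 labels listed in the question, a 120k cap works out to floor(120000/28) = 4285 samples per class (119,980 in total), provided the smallest class has at least that many.

```matlab
% Hypothetical stand-in data: replace with the real labels and samples.
rng(0);                                    % reproducible sampling
labels = randi([0 9], 5000, 1);            % N-by-1 class labels
images = rand(5000, 784);                  % N-by-D, one sample per row

targetTotal = 120000;                      % upper bound on the reduced dataset size

[classes, ~, idx] = unique(labels);        % idx(i) is the class index of row i
counts = accumarray(idx, 1);               % number of samples in each class

% Same count for every class: limited by the smallest class and by the cap.
nPerClass = min(min(counts), floor(targetTotal / numel(classes)));

keep = false(size(labels));
for k = 1:numel(classes)
    rows = find(idx == k);                 % rows of class k, wherever they sit
    pick = rows(randperm(numel(rows), nPerClass));  % random subset, no repeats
    keep(pick) = true;
end

labelsSmall = labels(keep);                % balanced labels
imagesSmall = images(keep, :);             % matching samples
```

Because `keep` is a logical mask, the reduced set stays in the original row order; shuffle it afterwards if the order matters for training.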
dpb on 31 May 2022
"limited dataset from the emnist byclass dataset"
OK, I'll bite. What's an emnist?
I don't see how you can do anything by class, however that is determined, if you can't identify the class locations in the dataset. One would presume there must be either a class variable in the dataset for each observation or a grouping variable available somewhere that serves the same purpose; otherwise, what purpose would having classes serve?
We need a reference and/or some background to have any context for the question.
However, groupsummary over whatever that grouping variable is will give you the group counts directly.
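For illustration only, a small sketch of that `groupsummary` call. It assumes the labels have been placed in a table variable named `label`; the variable name and the toy data are hypothetical.

```matlab
% Hypothetical stand-in labels: replace with the dataset's grouping variable.
labels = [0; 1; 1; 2; 2; 2];
T = table(labels, 'VariableNames', {'label'});

% With no method specified, groupsummary returns one row per group
% together with a GroupCount variable.
G = groupsummary(T, 'label');

minCount = min(G.GroupCount);              % size of the smallest class
```

`minCount` is then the largest per-class sample size that still keeps every class equal.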

Answers (0)

Release

R2021a
