Why doesn't using imageDataAugmenter increase the size of my training data set?

Question

caesar on 7 Nov 2017


https://fr.mathworks.com/matlabcentral/answers/365694-why-using-imagedataaugmenter-doesn-t-increase-the-size-of-my-training-data-set

Answered: Guy Reading on 27 Sep 2019

I am trying to use imageDataAugmenter to increase the size of my training data set (the number of training images), but it seems to have no effect at all. To explain: I used a simple CNN to classify images into three categories. Each category has 200 images (120 for training, 40 for validation, and 40 for testing). Creating the imageDatastores:

[TrainDataStore,ValDataStore,TestDatastore] = splitEachLabel(imds,0.6,0.2,'randomize');

Training the network:

net = trainNetwork(TrainDataStore,mynet_1,options);

Since the number of epochs (5) and the mini-batch size (60) are the same in all cases, I get 6 iterations per epoch and 30 iterations in total. 6 iterations * 60 images per mini-batch = 360 images per epoch (120 per label).
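The arithmetic above can be checked with a few lines of MATLAB. This is only a sketch of the bookkeeping: the variable names are made up for illustration, and the floor reflects the documented behavior that trainNetwork discards a final partial mini-batch, which makes no difference here because 360 divides evenly by 60.

```matlab
% Bookkeeping for the setup described in the question.
% All variable names are illustrative, not taken from the original code.
numClasses    = 3;
trainPerClass = 120;   % 60% of 200 images per class
miniBatchSize = 60;
maxEpochs     = 5;

numTrainImages     = numClasses * trainPerClass;             % images per epoch
iterationsPerEpoch = floor(numTrainImages / miniBatchSize);  % full mini-batches only
totalIterations    = iterationsPerEpoch * maxEpochs;

fprintf('%d images, %d iterations per epoch, %d iterations in total\n', ...
    numTrainImages, iterationsPerEpoch, totalIterations);
```

Nothing in this calculation depends on whether augmentation is enabled, which is why the iteration counts reported by trainNetwork stay the same.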

I then tried data augmentation as follows:

augmenter = imageDataAugmenter('RandRotation',[0 30]);
[TrainDataStore,ValDataStore,TestDatastore] = splitEachLabel(imds,0.6,0.2,'randomize');

Traindatasource = augmentedImageSource([200 200 3],TrainDataStore,'DataAugmentation',augmenter);

net = trainNetwork(Traindatasource,mynet_1,options);

Again I ended up with 6 iterations per epoch over 5 epochs, which means the number of images per epoch is the same (360), even though I expected it to increase because of the rotation option.

I don't know how large the augmented data set should be, but it should definitely be larger than the original one. If something is missing in my approach, please let me know.


Answer 1

J on 23 Mar 2018


https://fr.mathworks.com/matlabcentral/answers/365694-why-using-imagedataaugmenter-doesn-t-increase-the-size-of-my-training-data-set#answer_311587

My guess is that when augmentation is on, training proceeds exactly as it does with augmentation off, except that a random transformation (rotation, in your case) is applied to each training example, and the transformed image is presented to the network instead of the original. Because the transformations are random, the network rarely sees the exact same training example twice. It also rarely sees exact copies of your original images: that would require the random generator to pick a rotation so close to 0 degrees that, once discretization is accounted for, the image is effectively unchanged, which is possible but improbable.

This is clearly different from what you expected, namely that it would generate one larger set of augmented training data up front and then split that set into mini-batches presented to the network again and again each epoch, so that there would be more iterations per epoch and the network would see the same images every epoch.
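One way to check this guess is to compare the number of observations before and after wrapping the data in an augmented datastore. The sketch below makes several assumptions: it uses augmentedImageDatastore (available from R2018a, the successor to the augmentedImageSource used in the question), it needs the Deep Learning Toolbox, and it replaces the asker's images with a small synthetic array so that it is self-contained.

```matlab
% Synthetic stand-in for the training set: 12 random 200x200 RGB images
% in 3 classes. This is an assumption for illustration, not the asker's data.
X = rand(200, 200, 3, 12, 'single');
Y = categorical(repelem([1; 2; 3], 4));

augmenter = imageDataAugmenter('RandRotation', [0 30]);

% Wrap the in-memory images in an augmented datastore.
augimds = augmentedImageDatastore([200 200 3], X, Y, ...
    'DataAugmentation', augmenter);

% The datastore reports the same number of observations as the input,
% so the number of mini-batches per epoch cannot grow.
fprintf('Original images: %d\n', size(X, 4));
fprintf('Observations in augmented datastore: %d\n', augimds.NumObservations);
```

If the two printed counts match, the augmenter is transforming images as they are read rather than building a larger data set up front.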

Unfortunately, this is not addressed in the R2017b documentation at least, and I doubt it is addressed in R2018a either. Your question is valid, and MathWorks should probably put more resources into the documentation and functionality of their NN Toolbox if they want to be long-term players here.


Answer 2

Xu MingJie on 8 Aug 2018


https://fr.mathworks.com/matlabcentral/answers/365694-why-using-imagedataaugmenter-doesn-t-increase-the-size-of-my-training-data-set#answer_332034

Because the data augmentation performed by imageDataAugmenter is not the traditional kind that enlarges the data set in memory. The underlying assumption is that your data set is too big to hold in memory. The MATLAB developers therefore implemented data augmentation in a way that fits within a computer's limited memory; for reference, see https://ww2.mathworks.cn/help/nnet/ug/preprocess-images-for-deep-learning.html#mw_ef499675-d7a0-4e77-8741-ea5801695193.

In more detail: after you configure the image transformation options, the size of the training data set stays the same in every epoch. However, for each iteration of training, the augmented image datastore applies a random combination of transformations to the mini-batch of training data. Thus the number of training images per epoch never changes, but each training image differs slightly from one epoch to the next because of your transformations, such as rotation.
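The per-epoch behavior described here can be observed by reading the same mini-batch twice from an augmented datastore. This is a minimal sketch under stated assumptions: it uses augmentedImageDatastore (R2018a or later) with a synthetic image array in place of real data, and it imitates the start of a new epoch by calling reset, which is a simplification of what trainNetwork does internally.

```matlab
% Synthetic stand-in data: 8 random 200x200 RGB images in 2 classes
% (illustrative only).
X = rand(200, 200, 3, 8, 'single');
Y = categorical(repelem([1; 2], 4));

augmenter = imageDataAugmenter('RandRotation', [0 30]);
augimds = augmentedImageDatastore([200 200 3], X, Y, ...
    'DataAugmentation', augmenter);
augimds.MiniBatchSize = 4;

batchA = read(augimds);   % first mini-batch, standing in for epoch 1
reset(augimds);           % rewind, as at the start of a new epoch
batchB = read(augimds);   % same source images, standing in for epoch 2

% The first table column holds the images as a cell array.
imgsA = batchA{:, 1};
imgsB = batchB{:, 1};

fprintf('Batch sizes: %d and %d\n', height(batchA), height(batchB));
fprintf('First image identical across reads: %d\n', ...
    isequal(imgsA{1}, imgsB{1}));
```

The batch sizes should match, while the pixel comparison is expected to report a difference, because each read draws a fresh random rotation for the same source image.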


Answer 3

Guy Reading on 27 Sep 2019


https://fr.mathworks.com/matlabcentral/answers/365694-why-using-imagedataaugmenter-doesn-t-increase-the-size-of-my-training-data-set#answer_393773

For all those still reading this: there is a solution!

I was making the same assumption as you, caesar. However, given J's answer, there is a workaround. If the network rarely sees the same training example twice because of what the augmenter does, we can simply increase the number of epochs in trainingOptions:

https://uk.mathworks.com/help/deeplearning/ref/trainingoptions.html

That way, although we don't present the whole augmented data set within one epoch, we present something like it over N times as many epochs, where N is the factor by which we assumed the augmenter multiplied our sample size. If we multiply the number of epochs by N, we get something like what we expected in the first place, I believe (correct me if I'm wrong!).
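The workaround can be sketched as follows. The value of N and the 'sgdm' solver are assumptions chosen for illustration (the thread specifies neither), and the variable names are hypothetical; only the epoch count of 5, the mini-batch size of 60, and the 360 training images come from the question.

```matlab
% Values from the question.
baseEpochs     = 5;
miniBatchSize  = 60;
numTrainImages = 360;

% Hypothetical augmentation factor: how many times larger the asker
% expected the augmented data set to be.
N = 4;

% Train for N times as many epochs instead of N times as many images.
maxEpochs = baseEpochs * N;
options = trainingOptions('sgdm', ...   % solver choice is an assumption
    'MaxEpochs', maxEpochs, ...
    'MiniBatchSize', miniBatchSize);

% Iterations per epoch are unchanged; the total number of randomly
% transformed images presented to the network grows with the epoch count.
iterationsPerEpoch  = floor(numTrainImages / miniBatchSize);
augmentedImagesSeen = numTrainImages * maxEpochs;

fprintf('%d epochs, %d iterations per epoch, %d augmented images seen\n', ...
    maxEpochs, iterationsPerEpoch, augmentedImagesSeen);
```

The resulting options would then be passed to trainNetwork together with the augmented datastore, in place of the original options.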
