About net.divideParam.valRatio

mike mike on 22 Sep 2018
Commented: mike mike on 26 Sep 2018
I know it's possible to use
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
to split the input data by percentage into training, validation and test sets. Now, in a classification problem, I didn't want the validation set to be too small, so I set
net.divideParam.valRatio = 0/100;
In fact, the neural network no longer seemed to apply early stopping after 6 validation failures; incidentally, I left the other parameters unchanged, so the code I wrote was
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio =0/100;
net.divideParam.testRatio = 15/100;
With these values the percentages of data assigned to training, validation and testing did not sum to 100%, yet the neural network ran just the same, without any problems and without any error messages appearing. I ran other tests, always modifying the percentages so that they did not sum to 100%, as in the following cases:
net.divideParam.trainRatio = 35/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 25/100;
or
net.divideParam.trainRatio = 35/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 65/100;
My question is how to interpret the subdivision of the dataset between training, validation and testing when the sum is not 100% and/or some ratio is set to 0%. If, for example, I set the training data to 0%, does this mean that the network is not trained at all? Or if I set the test data to 0%, does it mean that the network is not tested? If the percentages sum to less than 100%, does that mean that the remaining fraction of the dataset's inputs is simply not used? And if they sum to more than 100%, does that mean that some inputs are used both for the test set and, for example, also for the validation set?
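For reference, here is a minimal sketch of the kind of setup I am experimenting with (the dataset and hidden layer size below are just placeholders, not my real data); the training record tr returned by train reports which samples actually end up in each subset:
[x, t] = iris_dataset;                    % placeholder classification data
net = patternnet(10);                     % hypothetical hidden layer size
net.divideParam.trainRatio = 35/100;
net.divideParam.valRatio   = 15/100;
net.divideParam.testRatio  = 65/100;      % the ratios sum to 115%
[net, tr] = train(net, x, t);
numel(tr.trainInd)                        % samples actually used for training
numel(tr.valInd)                          % samples actually used for validation
numel(tr.testInd)                         % samples actually used for testing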

Answers (2)

Greg Heath on 23 Sep 2018
Edited: Greg Heath on 23 Sep 2018
1. Now, in a classification problem, I didn't want the validation set to be too small, so I set
net.divideParam.valRatio = 0/100;
% Your statement makes no sense: you have eliminated the val subset!
2. In fact, the neural network no longer seemed to apply early stopping after 6 validation failures;
% Of course! valRatio = 0 eliminates the val subset!
3. In fact, the neural network no longer seemed to apply early stopping after 6 validation failures; incidentally, I left the other parameters unchanged, so the code I wrote was
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 0/100;
net.divideParam.testRatio = 15/100;
% The program will AUTOMATICALLY CHANGE the fractions to have a unit sum
% (see the dividerand sketch after point 4). To find out what they are, use
a = net.divideParam.trainRatio
b = net.divideParam.valRatio
c = net.divideParam.testRatio
4. My question is how to interpret the subdivision of the dataset between training, validation and testing when the sum is not 100% and/or some ratio is set to 0%. If, for example, I set the training data to 0%, does this mean that the network is not trained at all? Or if I set the test data to 0%, does it mean that the network is not tested? If the percentages sum to less than 100%, does that mean that the remaining fraction of the dataset's inputs is simply not used? And if they sum to more than 100%, does that mean that some inputs are used both for the test set and, for example, also for the validation set?
See my answer to point 3.
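To see the normalization concretely, here is a minimal sketch that calls dividerand (the division function used by default) directly; Q = 100 samples is just an example value, and the exact counts depend on rounding:
Q = 100;                                  % hypothetical number of samples
[trainInd, valInd, testInd] = dividerand(Q, 35/100, 15/100, 65/100);
numel(trainInd)                           % roughly Q*0.35/1.15, about 30
numel(valInd)                             % roughly Q*0.15/1.15, about 13
numel(testInd)                            % roughly Q*0.65/1.15, about 57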
Hope this helps.
%%Thank you for formally accepting my answer%%
Greg

mike mike on 23 Sep 2018
Thank you Greg, everything is clear, but I want to explain why I want to remove the validation set. I am trying to create a neural network to predict the direction of a stock index from a set of technical-analysis indicators used as inputs. I was inspired by the following work: "Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange" [Yakup Kara, Melek Acar Boyacioglu, Ömer Kaan Baykan, 2010]; you can easily find it on the internet.
In that paper the authors talk about a training dataset and a hold-out dataset. I did some research and found that hold-out data is essentially a synonym for test data, so I thought I should drop the validation data and split the data into 50% for training and 50% for hold-out (as the authors of the article do). If I build the neural network studied in the article in Matlab, with the data included in the article and similar parameter settings, but using Matlab's default dataset split (training, validation and testing), I do not come even remotely close to the performance reported in the article for the training phase (I am not worrying about overfitting at the moment). If instead I split the data into 50% training and 50% test, at least for the training phase I get very high performance figures, comparable to the training-phase performance of the network in the article. It is obviously important that the net does not overfit and does not extrapolate, but I want to look at that in the next phase, once I have understood the meaning of hold-out; the sketch below shows roughly how I set up the 50/50 split.
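A minimal sketch of that setup, with placeholder data and an arbitrary hidden layer size rather than my actual indicators:
[x, t] = iris_dataset;                    % placeholder data, not my stock indicators
net = patternnet(10);                     % hypothetical hidden layer size
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 50/100;
net.divideParam.valRatio   = 0/100;       % no validation set, so no early stopping
net.divideParam.testRatio  = 50/100;      % the hold-out set
% net.trainParam.max_fail (default 6) has no effect without a validation set
[net, tr] = train(net, x, t);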
2 comments
Greg Heath on 26 Sep 2018
The question is:
Do you understand the purpose of the validation subset?
Greg
mike mike on 26 Sep 2018
Yes, the validation data set is intended to avoid overfitting.

