Optimal number of hidden nodes
Dear friend,
I want to determine the optimal number of hidden nodes using narnet in order to predict the next-day index. I have just one question:
I found two propositions about Hmax:
1) Hmax= Hub
or
2) Hmax = floor(Hub/10), for example, but I do not understand how the number "10" is determined.
What is the difference between these two propositions, and which one is right?
Thanks
Accepted Answer
Greg Heath
13 Jan 2015
Neither is always the right one. There are many ways to choose a value that works. I typically start searching with ~10 values in a range 1 <= Hmin <= H <= Hmax <= Hub by trial and error. The upper bound Hub is chosen so that the number of training equations, Ntrneq, is not less than the number of unknown weights Nw. For robust designs it is desired that Hmax << Hub. That is where the empirical factor of 10 comes from. For each value in the range I usually design Ntrials = 10 candidates, for a total of 100 designs. On rare occasions I have used Ntrials = 15 or 20.
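A minimal MATLAB sketch of this search, assuming a narnet design on an example dataset. The weight count Nw = (I+1)*H + (H+1)*O for a single hidden layer, and therefore the Hub formula, follow the usual single-hidden-layer convention but are stated here as assumptions, as are the variable names:
T  = simplenar_dataset;                   % example target series (cell array)
FD = 1:2;                                 % feedback delays
O  = 1;                                   % output dimension
Ntrn   = round(0.7*numel(T));             % default dividerand: ~70% training
Ntrneq = Ntrn*O;                          % number of training equations
I      = numel(FD)*O;                     % number of delayed inputs
Hub    = floor((Ntrneq - O)/(I + O + 1)); % largest H with Ntrneq >= Nw
Hmax   = max(1, floor(Hub/10));           % robust design: Hmax << Hub
Hvec   = unique(round(linspace(1, Hmax, 10)));
Ntrials = 10;
bestH = NaN; bestMSEval = Inf;
for H = Hvec
    for trial = 1:Ntrials
        rng(100*H + trial)                % reproducible weight initialization
        net = narnet(FD, H);
        [Xs, Xi, Ai, Ts] = preparets(net, {}, {}, T);
        [net, tr] = train(net, Xs, Ts, Xi, Ai);
        if tr.best_vperf < bestMSEval     % rank candidates by validation MSE
            bestMSEval = tr.best_vperf;
            bestH = H;
        end
    end
end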
I have explained this logic so many times it is ridiculous for me to say any more than search the NEWSGROUP and/or ANSWERS using any subset of the above variables. Usually
greg Hub Ntrials
is sufficient.
If there is not enough data to provide enough equations so that Ntrneq >> Nw, it is wise to use or combine alternate approaches like validation-set stopping and/or regularization. I tend to use valstop. For the latter, search on
help msereg
doc msereg
help trainbr
doc trainbr
There is recent evidence (Sorry, I lost the reference) that, for difficult designs, combining valstop and regularization can be very effective.
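As a hedged illustration of these two alternatives (the hidden sizes and the regularization ratio below are arbitrary assumptions, not recommended values):
T = simplenar_dataset;
% Bayesian regularization: trainbr tunes its own weight decay, so all
% data is typically assigned to training (no validation stopping).
netbr = narnet(1:2, 5, 'open', 'trainbr');
netbr.divideFcn = 'dividetrain';
[Xs, Xi, Ai, Ts] = preparets(netbr, {}, {}, T);
netbr = train(netbr, Xs, Ts, Xi, Ai);
% Combining valstop and regularization: keep the default dividerand
% (validation stopping) and add a regularized mse performance term.
netrv = narnet(1:2, 5);
netrv.performParam.regularization = 0.1;  % assumed ratio; tune per problem
[Xs, Xi, Ai, Ts] = preparets(netrv, {}, {}, T);
netrv = train(netrv, Xs, Ts, Xi, Ai);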
Hope this helps.
Thank you for formally accepting my answer
Greg
3 comments
Greg Heath
14 Jan 2015
No. Typically, designs are chosen based on the validation data performance because the training data performance tends to be optimistically biased.
However, the training bias can be mitigated somewhat by taking into account the corresponding loss in degrees of freedom. Consequently, instead of dividing SSEtrn by Ntrneq = Ntrn*O, it is divided by the number of degrees of freedom (DOF) that results after the Nw weights are estimated:
Ntrndof = Ntrneq - Nw
SSEtrn = sse(ttrn-ytrn)
MSEtrn = SSEtrn/Ntrneq % = mse(ttrn-ytrn)
MSEtrna = SSEtrn/Ntrndof % = Ntrneq*MSEtrn/Ntrndof
% a ==> 'a'djusted for the loss in DOF
% DOFA ==> Degree-of-Freedom-adjusted
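For a trained net these quantities can be computed directly. A sketch, assuming net is the trained network and ttrn, ytrn are the numeric training targets and outputs:
Nw      = net.numWeightElements;   % total number of weights and biases
[O, Ntrn] = size(ttrn);
Ntrneq  = Ntrn*O;                  % number of training equations
Ntrndof = Ntrneq - Nw;             % DOF left after estimating Nw weights
SSEtrn  = sse(ttrn - ytrn);
MSEtrn  = SSEtrn/Ntrneq;           % = mse(ttrn - ytrn)
MSEtrna = SSEtrn/Ntrndof;          % DOF-adjusted training MSE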
If you search in the NEWSGROUP and ANSWERS using greg and some of the above terms you will find many, many examples. For example, try subsets of
greg MSEtrna Ntrndof or Ndof
When searching to find the optimum number for H, I sometimes plot
MSEtrn, MSEtrna, MSEval and MSEtst vs H
The choice of H is based on minimizing MSEtrna or MSEval.
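A sketch of such a plot, assuming the four error measures were stored in vectors (names below are illustrative) over the candidate values Hvec during the search:
% vectors assumed collected during the H search loop
plot(Hvec, MSEtrnvec,  'b.-', Hvec, MSEtrnavec, 'c.-', ...
     Hvec, MSEvalvec,  'g.-', Hvec, MSEtstvec,  'r.-')
xlabel('H (number of hidden nodes)'), ylabel('MSE')
legend('MSEtrn', 'MSEtrna', 'MSEval', 'MSEtst')
[~, i] = min(MSEvalvec); Hopt = Hvec(i);   % or minimize MSEtrnavec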
Hope this helps.
Greg
P.S. If the training stops because MSEval goes through a minimum, obviously MSEval is also biased. However, I usually find this bias to be negligible. Nevertheless, if you have to be absolutely above board with a client and/or research sponsor, use MSEtst, which is UNBIASED and is the legal prediction of performance on nontraining data. Summary statistics over multiple trials will yield performance summary statistics, e.g., min, median, mean, stddev, and max. Typically, I would only use the statistics of the top 10 to 30 designs.