Why is my Neural Network (NN toolbox) not able to predict output from input?

Hello,
I'm new to using NNs for input/output prediction problems. I understand the theory behind NNs, especially that they are universal nonlinear function approximators. But why does mine not approximate my input/output function (at least in the range of my variables)? I'm sure there is something here that I did not understand. When I designed the static NN I did not do any data pre-processing; I used the raw data as I collected them during the measurement campaign (data as a function of time, with a lot of fluctuations). I get a so big EMS and an R very close to 1. What does it mean? That there is quite a strong relationship between my input and output but the NN couldn't capture it?
Thank you in advance for your response.

6 comments

NOT CLEAR:
Is your problem static (ID = FD =[]) or dynamic?
If it is dynamic, is Regressive min(ID) = 0 or Predictive min(ID) > 0?
If there is a sufficient deterministic relationship between the input and output and you have enough data with a large enough signal to contamination ratio, you should be able to design a net that represents the I/O underlying relationship at least as well as a linear model.
If you have enough nodes in your hidden layer you should be able to memorize the training data I/O relationship. However, that net is not guaranteed to be useful for nontraining data unless you take overfitting/overtraining precautions.
We cannot help unless you reveal more details.
I'll try to describe my problem in detail:
I want to model a solar collector with 7 inputs (T1(t),T2(t),...,T7(t)) and 1 output (A(t)); data were collected as a function of time with a time step of 3 minutes (over 12 days). First I want to start with a simple NN, a static one (is that right?), so I put T1(t),T2(t),...,T7(t) as input and A(t) as output. After using the NN toolbox for fitting problems and testing many values of N in the hidden layer, I get bad results: a very big EMS but a good correlation coefficient. The variables that I chose for modeling should normally be sufficient to model the collector's In/Out relationship, but I'm sure the measurements contain some noise. Could you tell me if this architecture is sufficient, or should I change it and use a feedback NN or a NARX NN? In that case, how can I determine the number 'd' in Ti(t), Ti(t-1),...,Ti(t-d)?
In the literature they say that before using an NN we have to do some data pre-processing... Is this step included in the MATLAB NN program, or do I have to do it myself?
I'm interested in any advice about using NNs efficiently!
Thank you
Still, not enough details.
What are the significant crosscorrelation lags between A(t) and each of the seven inputs Ti(t)? That will help choose an effective ID.
What are the significant autocorrelation lags of A(t)? That will help choose an effective FD.
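As a sketch of how those lags might be estimated (my own example, not from the thread: X is an I x N input matrix, t a 1 x N target series, nncorr is the toolbox cross-correlation function, zscore is from the Statistics Toolbox, and 1.96/sqrt(N) is the usual approximate white-noise significance threshold):

```matlab
% Hypothetical sketch: find significant lags to guide the choice of ID and FD.
N       = size(t,2);
maxlag  = floor(N/4);                     % arbitrary search range
at      = zscore(t(1,:),1);               % normalized target series
ax      = zscore(X(1,:),1);               % normalized first input series
xcorrtx = nncorr(at,ax,maxlag,'biased');  % target/input cross-correlation
acorrt  = nncorr(at,at,maxlag,'biased');  % target autocorrelation
sigthresh = 1.96/sqrt(N);                 % approx. 95% white-noise threshold
candID = find(abs(xcorrtx(maxlag+2:end)) > sigthresh) % candidate input delays
candFD = find(abs(acorrt(maxlag+2:end))  > sigthresh) % candidate feedback delays
```

Repeat the cross-correlation for each of the inputs; the union of the significant lags gives candidate ID and FD sets to prune.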
You say
"...I get a so big EMS and an R very close to 1 "
What is an EMS? Do you mean MSE = mse(target-output) is large? If so, R will be close to 0, not 1.
So apparently, you are not getting your point across.
Qualitative statements don't mean much if they don't convey useful info.
So, you need to post relevant code with informative comments.
It would help if you ran your code on one of the nndatasets and posted the results so that we can compare
help nndatasets
Greg
PS Use as many defaults as possible when starting out.
First of all, I thank you for your response. I ran my code on the pollution dataset, and here is the code with my comments and questions:
inputSeries = pollutionInputs; targetSeries = pollutionTargets;
% Create a Nonlinear Autoregressive Network with External Input
inputDelays = 1:3; % let us suppose that 3 is the optimal ID and FD...
feedbackDelays = 1:3;
hiddenLayerSize = 10;
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
% If I change the processFcns using processpca and mapstd do you think that results may be better?
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.inputs{2}.processFcns = {'removeconstantrows','mapminmax'};
% I did not understand this one, I did not get the function of preparets!
[inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
net.divideFcn = 'dividerand'; % let us suppose that I want to use 'dividerand' and not another function!
net.divideMode = 'value'; % Divide up every value
net.divideParam.trainRatio = 80/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 5/100; % suppose that I want to test the network myself with other data, so I want to use the maximum of the present data to train the network; that is why I put 80/100
net.trainFcn = 'trainlm'; % Levenberg-Marquardt ok
net.performFcn = 'mse'; % Mean squared error ok
net.plotFcns = {'plotperform','plottrainstate','plotresponse', ...
    'ploterrcorr','plotinerrcorr'};
% Train the Network
[net,tr] = train(net,inputs,targets,inputStates,layerStates);
% Test the Network. Will this test be done with the 5/100 data?
outputs = net(inputs,inputStates,layerStates);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
% Recalculate Training, Validation and Test Performance
% What does it mean?
trainTargets = gmultiply(targets,tr.trainMask);
valTargets = gmultiply(targets,tr.valMask);
testTargets = gmultiply(targets,tr.testMask);
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)
% View the Network
view(net)
% If I understood correctly, we have now created an NN without feedback; it is like the neural network fitting toolbox with the following input and output:
Input:
Our 7 input pollution variables with a 1-step time delay (Vi(t-1), with i from 1 to 7 and t from 1 to 508)
Our 7 input pollution variables with a 2-step time delay (Vi(t-2), with i from 1 to 7 and t from 1 to 508)
Our 7 input pollution variables with a 3-step time delay (Vi(t-3), with i from 1 to 7 and t from 1 to 508)
Output:
Our 3 output pollution variables without time delay (Vi(t), with i from 1 to 3 and t from 1 to 508)
% Closed Loop Network
% Now we will add to the previous inputs:
Our 3 output pollution variables with a 1-step time delay (Vi(t-1), with i from 1 to 3 and t from 1 to 508)
Our 3 output pollution variables with a 2-step time delay (Vi(t-2), with i from 1 to 3 and t from 1 to 508)
Our 3 output pollution variables with a 3-step time delay (Vi(t-3), with i from 1 to 3 and t from 1 to 508)
Is this true?
% Use this network to do multi-step prediction.
% The function CLOSELOOP replaces the feedback input with a direct
% connection from the output layer.
netc = closeloop(net);
netc.name = [net.name ' - Closed Loop'];
view(netc)
[xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries);
yc = netc(xc,xic,aic);
closedLoopPerformance = perform(netc,tc,yc)
% Early Prediction Network
% I did not understand this one either!! Could you explain it to me, please? Is it a third NN?
nets = removedelay(net);
nets.name = [net.name ' - Predict One Step Ahead'];
view(nets)
[xs,xis,ais,ts] = preparets(nets,inputSeries,{},targetSeries);
ys = nets(xs,xis,ais);
earlyPredictPerformance = perform(nets,ts,ys)
When I ran this code I got the following results:
Training: MSE=23, R2=99.7%
Validation: MSE=40, R2=99.5%
Test : MSE=40, R2=99.6%
Now I come to the most important question for me. I searched a lot but did not find a precise answer.
With this code I want to predict the output from new inputs. How can I do that? In nnftool it is so simple, because Outpredict = net(A) gives the output prediction (in this case the dimension of A will be 7*XX and 3*XX for Outpredict, where XX depends on the available time steps). With NARX I don't know how to do it; it seems complicated... To calculate Outpredict from A we first have to know at least Outpredict(1), Outpredict(2), Outpredict(3), don't we? How do we do that? Can you please write the commands to do that?
Thank you in advance.
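One commonly cited pattern for this (sketched here with hypothetical names: Xold/Told are the known input and target series as cell arrays, Xnew the new inputs) is to close the loop, run the net over the known record to obtain the final delay states, and then feed the new inputs with those states as the initial conditions:

```matlab
% Sketch of multi-step prediction with a trained open-loop NARX net.
netc = closeloop(net);                        % feedback now comes from the output
[xc,xic,aic] = preparets(netc,Xold,{},Told);  % delay states from the known record
[yc,xfc,afc] = netc(xc,xic,aic);              % run over known data; keep final states
ypred = netc(Xnew,xfc,afc);                   % multi-step prediction on new inputs
```

The final states xfc/afc play the role of Outpredict(1..3): they carry the last known inputs and outputs into the closed-loop run.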
1. You have
net.divideMode = 'value'; % Divide up every value
which looked strange to me. The documentation:
nnproperty.net_divideMode
Neural network divideMode property.
NET.divideMode
This property defines the target data dimensions which to divide up when
the data division function net.divideFcn is called.
Its default value is 'sample' for static networks and 'time' for
dynamic networks.
It may also be set to 'sampletime' to divide targets by both sample and
timestep, 'all' to divide up targets by every scalar value, or 'none'
to not divide up data at all (in which case all data is used for
training, none for validation or testing).
I do not know what 'value' does. Compare results using that with results using the default 'time'.
2. You stated
Training: MSE=23, R2=99.7%
Validation: MSE=40, R2=99.5%
Test : MSE=40, R2=99.6%
How, exactly, did you calculate those numbers?
3. Please initialize the RNG with
rng(0)
directly before [ net tr ] = train(...
Then we can compare numbers.
Greg
2)
The NN toolbox gives those results directly after the NN training.


Accepted Answer

% First of all, I thank you for your response. I ran my code on the pollution dataset, and here is the code with my comments and questions:
close all, clear all, clc %GEH1
tic
load pollution_dataset %GEH2
inputSeries = pollutionInputs;
targetSeries = pollutionTargets;
% Create a Nonlinear Autoregressive Network with External Input. Let us suppose that 3 is the optimal ID and FD...
inputDelays = 1:3;
feedbackDelays = 1:3;
hiddenLayerSize = 10;
GEH3: NO. ASSUME DELAYS ARE NONOPTIMAL. THEN EITHER ACCEPT THE RESULTS OR TRY TO IMPROVE THEM.
Will get Nw = (3*8+3*3+1)*10 + (10+1)*3 = 373 unknown weights to be estimated from Ntrneq = 0.8*(508-3)*3 = 1212 training equations. No worries about overfitting.
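These counts can be verified directly (assuming the pollution_dataset sizes: I = 8 inputs, O = 3 targets, N = 508 time steps):

```matlab
% Sanity check of the weight and training-equation counts above.
I = 8; O = 3; H = 10; NID = 3; NFD = 3; N = 508;
Nw     = (NID*I + NFD*O + 1)*H + (H+1)*O   % 373 unknown weights
Ntrneq = 0.8*(N - 3)*O                     % 1212 training equations
```

Since Ntrneq is more than three times Nw, the fit is well overdetermined.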
rng(0) % GEH4: OTHERWISE HARD TO DUPLICATE RUNS
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize)
% Q1: If I change the processFcns using processpca and mapstd do you think that results may be better?
GEH5: CANNOT TELL A PRIORI. I PREFER USING ZSCORE AND LOOKING AT COEFFICIENTS OF A LINEAR MODEL VIA SLASH AND STEPWISEFIT (BACKWARD AND FORWARD)
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.inputs{2}.processFcns = {'removeconstantrows','mapminmax'};
GEH6: Since these are defaults, CAN DELETE.
%Q2: I did not understand this one, I did not get the function of preparets!
[inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
GEH7: Just as its name implies: PREPARE "T"IME "S"eries. Look at the dimensions resulting from the command
whos inputSeries targetSeries inputs inputStates layerStates targets %GEH8
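For reference, with ID = FD = 1:3 the first max-delay = 3 time steps are consumed as initial states, so whos should show roughly the following (these dimensions assume the 8x508 / 3x508 pollution series; the open-loop narxnet has two inputs, the external input and the fed-back target):

```matlab
% Expected shapes after preparets (approximate, for ID = FD = 1:3, N = 508):
%   inputs       2x505 cell   % shifted external inputs + shifted targets
%   inputStates  2x3   cell   % 3 initial delay states for each input
%   layerStates  2x0   cell   % empty: no layer delays in the open loop
%   targets      1x505 cell   % targets aligned with the shifted inputs
```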
net.divideFcn = 'dividerand'; % let us suppose that I want to use 'dividerand' and not another function!
GEH9: NO. CANNOT MAINTAIN UNIFORM SPACING AND RELIABLE CORRELATIONS WITH DIVIDERAND
GEH10: USE DIVIDEBLOCK AND/OR DIVIDEIND TO AVOID THIS UNNECESSARY UNCERTAINTY
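GEH10's suggestion amounts to something like the following (divideblock assigns contiguous train/val/test blocks, preserving the uniform time spacing):

```matlab
% Contiguous data division instead of random sampling in time.
net.divideFcn = 'divideblock';
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
```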
net.divideMode = 'value'; % Divide up every value
net.divideParam.trainRatio = 80/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 5/100;
% suppose that I want to test the network myself with other data, so I want to use the maximum of the present data to train the network; that is why I put 80/100
GEH11: DOESN'T MAKE SENSE TO ME. THE TEST SET IS THE ONLY PERFORMANCE RESULT THAT IS UNBIASED. IF IT IS TOO SMALL IT IS UNRELIABLE. IF IT IS UNRELIABLE YOU SHOULD STATE THE CONFIDENCE LEVEL OR AT LEAST STANDARD ERROR. SEE WIKIPEDIA FOR STATISTICS OF THE CHISQUARE DISTRIBUTION. I WOULD MAKE MULTIPLE DESIGNS TO INCREASE RELIABILITY.
net.trainFcn = 'trainlm'; % Levenberg-Marquardt ok
net.performFcn = 'mse'; % Mean squared error ok
net.plotFcns = {'plotperform','plottrainstate','plotresponse', ...
    'ploterrcorr','plotinerrcorr'};
GEH12: SINCE THESE ARE DEFAULTS, CAN DELETE
% Train the Network
[net,tr] = train(net,inputs,targets,inputStates,layerStates);
[net tr Ys Es] = train(net,inputs,targets,inputStates,layerStates);%GEH13
% Test the Network. Will this test be done with the 5/100 data?
GEH14: UNFORTUNATELY, YES.
GEH15: FORTUNATELY, ALL RESULTS CAN BE RETRIEVED FROM OUTPUT Ys, ERROR Es AND THE TRAINING HISTORY OUTPUT TR. THE CALCULATIONS BELOW ARE UNNECESSARY (UNLESS YOU JUST WANT TO VERIFY WHAT YOU CAN GET FROM TR )
tr = tr % NO SEMICOLON. STUDY THE RESULT. GEH16
outputs = net(inputs,inputStates,layerStates);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
GEH17:OBTAIN SCALE FREE PERFORMANCE MEASURES NMSE AND/OR COEFFICIENT OF DETERMINATION R^2 = 1-NMSE BY NORMALIZING MSE WITH MSE00 = mse(targets-mean(ttrn,2))
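GEH17's normalization can be sketched as follows (my variable names; bsxfun keeps the subtraction compatible with MATLAB releases before implicit expansion):

```matlab
% Scale-free performance: NMSE and R^2 = 1 - NMSE.
e     = cell2mat(gsubtract(targets,outputs));  % O x N error matrix
MSE   = mse(e);
tmat  = cell2mat(targets);                     % O x N target matrix
MSE00 = mse(bsxfun(@minus,tmat,mean(tmat,2))); % MSE of the naive mean model
NMSE  = MSE/MSE00;
R2    = 1 - NMSE                               % coefficient of determination
```

An R^2 near 1 then means the net explains almost all of the target variance, regardless of the target's scale.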
% Recalculate Training, Validation and Test Performance. What does it mean?
GEH18: TEST PERFORMANCE IS THE ONLY UNBIASED ESTIMATE OF PERFORMANCE ON UNSEEN NONTRAINING DATA AND THEREFORE, THE ONLY A PRIORI UNBIASED ESTIMATE OF THE NET'S ABILITY TO "GENERALIZE". HOWEVER, IF tr.stop IS NOT "Validation stopping.", THE VALIDATION PERFORMANCE MIGHT BE CONSIDERED UNBIASED ALSO.
trainTargets = gmultiply(targets,tr.trainMask);
valTargets = gmultiply(targets,tr.valMask);
testTargets = gmultiply(targets,tr.testMask);
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)
% View the Network
view(net)
% If I understood correctly, we have now created an NN without feedback; it is like the neural network fitting toolbox with the following input and output:
Input:
Our 7 input pollution variables with a 1-step time delay (Vi(t-1), with i from 1 to 7 and t from 1 to 508)
Our 7 input pollution variables with a 2-step time delay (Vi(t-2), with i from 1 to 7 and t from 1 to 508)
Our 7 input pollution variables with a 3-step time delay (Vi(t-3), with i from 1 to 7 and t from 1 to 508)
Output:
Our 3 output pollution variables without time delay (Vi(t), with i from 1 to 3 and t from 1 to 508)
% Closed Loop Network
% Now we will add to the previous inputs:
Our 3 output pollution variables with a 1-step time delay (Vi(t-1), with i from 1 to 3 and t from 1 to 508)
Our 3 output pollution variables with a 2-step time delay (Vi(t-2), with i from 1 to 3 and t from 1 to 508)
Our 3 output pollution variables with a 3-step time delay (Vi(t-3), with i from 1 to 3 and t from 1 to 508)
% Is this true?
GEH19: INCORRECT. IF YOU COMBINE THE CELL2MAT AND WHOS COMMANDS, YOU CAN SEE EXACTLY HOW THE INFO IS FED TO THE OPEN-LOOP NET
% Use this network to do multi-step prediction.
% The function CLOSELOOP replaces the feedback input with a direct connection from the output layer.
netc = closeloop(net);
netc.name = [net.name ' - Closed Loop'];
view(netc)
[xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries);
yc = netc(xc,xic,aic);
closedLoopPerformance = perform(netc,tc,yc)
% Early Prediction Network
% I did not understand this one either!! Could you explain it to me, please? Is it a third NN?
% GEH20: NOT FAMILIAR WITH THIS. MUST RESPOND LATER. HOWEVER, USING WHOS AFTER PREPARETS USUALLY HELPS UNDERSTANDING
nets = removedelay(net);
nets.name = [net.name ' - Predict One Step Ahead'];
view(nets)
[xs,xis,ais,ts] = preparets(nets,inputSeries,{},targetSeries);
ys = nets(xs,xis,ais);
earlyPredictPerformance = perform(nets,ts,ys)
% When I ran this code I got the following results:
Training: MSE=23, R2=99.7%
Validation: MSE=40, R2=99.5%
Test : MSE=40, R2=99.6%
GEH21: THESE RESULTS ARE SUSPICIOUSLY GOOD. MY IMMEDIATE REACTIONS ARE:
1. WHY SO GOOD IF DIVIDERAND CORRUPTS CORRELATIONS?
2. WHAT IS THE MINIMUM ALLOWABLE VALUE OF H?
3. WHAT ARE THE OPTIMAL VALUES FOR ID AND FD?
4. NEED MORE DESIGNS WITH DIFFERENT RNG SETTINGS TO ESTIMATE CONFIDENCE LEVELS FOR R2tst
% Now I come to the most important question for me. I searched a lot but did not find a precise answer.
% With this code I want to predict the output from new inputs. How can I do that? In nnftool it is so simple, because Outpredict = net(A) gives the output prediction (in this case the dimension of A will be 7*XX and 3*XX for Outpredict, where XX depends on the available time steps). With NARX I don't know how to do it; it seems complicated... To calculate Outpredict from A we first have to know at least Outpredict(1), Outpredict(2), Outpredict(3), don't we? How do we do that? Can you please write the commands to do that?
% Thank you in advance.
GEH22: Excellent question. With the current design, my best answer is to use average values from the original data.
GEH23: HOWEVER, IF YOU ANTICIPATE SUCH FUTURE SCENARIOS, THEN MAKE THE MOST ROBUST DESIGN THAT YOU CAN THAT MEETS YOUR PERFORMANCE OBJECTIVE. I would try to maximize the estimation degrees of freedom
Ndof = Ntrneq - Nw
= Ntrn*O - [ (NID*I+NFD*O+1)*H + (H+1)*O ]
[ I Ntrn ] = size(xtrn)
[ O Ntrn ] = size(ttrn)
H = No. of hidden nodes
NID = numel(ID)
NFD = numel(FD)
This will take many repetitions and changes in ID, FD, H, and rng(:).
In addition, I would investigate using average values to initialize the delay buffer.
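That average-value initialization might look like the following sketch (my names; I am not certain of the exact state layout of the closed-loop net, so confirm the shapes of Xi and Ai with view(netc) and whos before relying on it):

```matlab
% Sketch: seed the closed-loop delay buffers with average values when
% no measured history is available for the new scenario.
xbar = mean(cell2mat(inputSeries),2);     % I x 1 average input
tbar = mean(cell2mat(targetSeries),2);    % O x 1 average target
netc = closeloop(net);
Xi = repmat({xbar},1,3);                  % 3 input-delay states (ID = 1:3)
Ai = [ repmat({zeros(10,1)},1,3) ;        % hidden-layer states (unused, H = 10)
       repmat({tbar},1,3) ];              % output (feedback) states (FD = 1:3)
ypred = netc(Xnew,Xi,Ai);                 % Xnew: 1xM cell array of new inputs
```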
Hope this helps.
Thank you for formally accepting my answer
Greg

1 comment

Why is the R value (obtained through the plotregression function from tr using indices) for training, validation and testing different from the regression plot obtained through the pop-up window?


More Answers (0)
