The function dlgradient only returns zeroes when applied to neural net

7 vues (au cours des 30 derniers jours)

Afficher commentaires plus anciens

Steven le 8 Nov 2023

0
Lien

Utiliser le lien direct vers cette question

https://fr.mathworks.com/matlabcentral/answers/2044732-the-function-dlgradient-only-returns-zeroes-when-applied-to-neural-net

Commenté : Steven le 9 Nov 2023

Ouvrir dans MATLAB Online

Dear all,

I am trying to replicate a machine learning method from an economics paper using the Matlab Deep Learning package.

The setup of the deep learning problem is unconventional as I need to apply the neural net multiple times hence the guides that are available are no help unfortunately.

I try to calculate the gradient of my neural net using the following code. However, the gradient is always zero hence I must be doing something wrong.

function [loss,gradients] = modelLoss(dlnet,X,T,par)
Y = forward(dlnet,normalize(X(1:5,:),par));
k1 = exp(Y(2,:));
b1 = exp(Y(3,:));
% transitions of the exogenous processes
rknext = X(1,:) * par.rho_rk + X(6,:);
rbnext = X(2,:) * par.rho_rb + X(7,:);
wnext  = X(3,:) * par.rho_w  + X(8,:);
X1 = vertcat(rknext, rbnext, k1, b1, wnext);
Y1 = forward(dlnet,normalize(X1,par));
% transitions of the exogenous processes
rknext = X(1,:) * par.rho_rk + X(9,:) ;
rbnext = X(2,:) * par.rho_rb + X(10,:);
wnext  =  X(3,:) * par.rho_w  + X(11,:);
X2 = vertcat(rknext, rbnext, k1, b1, wnext);
Y2 = forward(dlnet,normalize(X2,par));
loss = Stoch_loss(X,X1,X2,Y,Y1,Y2,T,par);
gradients = dlgradient(loss,dlnet.Learnables);
end

These are the functions that calculate the loss

function loss = Stoch_loss(X,X1,X2,Y,Y1,Y2,T,par)
 
    [R1_e1, R2_e1, R3_e1] = residuals(X,X1,Y,Y1,par);
    [R1_e2, R2_e2, R3_e2] = residuals(X,X2,Y,Y2,par);
           
    R_squared = (R1_e1 .* R1_e2) + (R2_e1 .* R2_e2) + (R3_e1 .* R3_e2);
    
    loss = l2loss(R_squared,T);
function [R1,R2,R3] = residuals(X,X1,Y,Y1,par)
    %Calculate residuals
    rk = X(1,:);
    rb = X(2,:);
    w  = X(3,:);
    k  = X(4,:);
    b  = X(5,:);
    
    c = exp(Y(1,:))+ 0.1;
    k1 = exp(Y(2,:));
    b1 = exp(Y(3,:));
    c1 = exp(Y1(1,:))+ 0.1;
    k2 = exp(Y1(2,:));
    rknext = X1(1,:);
    rbnext = X1(2,:);
    d = k1 -  par.rbar_rk.*exp(rk).*k;
    d1 = k2 - par.rbar_rk.*exp(rknext).*k1;
    
    R1 = 1 - par.beta .* (c1./c).^(-par.gamma) .*  par.rbar.*exp(rbnext);
    R2 = (w + par.rbar.*exp(rb) .* b - c - b1 - par.x0 .* abs_appr(d).^par.x1 - d);
    R3 = (1 + d .* par.x0 .* par.x1 .* abs_appr(d) .^ (par.x1-2)) - (par.beta .* (c1./c).^(-par.gamma) .* par.rbar_rk.*exp(rknext) .* (1 + d .* par.x0 .* par.x1 .* abs_appr(d1).^(par.x1-2) ));

I have already solved this problem in Python with Tensorflow hence the setup in general is correct. So it must be my Matlab application where the issue lies.

Has anyone any idea how to solve my issue?

With kind regards,

Steven

2 commentaires
Afficher AucuneMasquer Aucune

Matt J le 8 Nov 2023

However, the gradient is always zero hence I must be doing something wrong.

We don't know what "always" means. Surely there will be some cases where the gradients could all be zero, for example if there are negative activations going into all the networks ReLUs.

Steven le 9 Nov 2023