Levenberg-Marquardt algorithm as custom training function using dlupdate

Question

Hi together,

I'm trying to implement the Levenberg-Marquardt algorithm in MATLAB with dlupdate, as shown in the example "Use dlupdate to Train Network Using Custom Update Function" (https://de.mathworks.com/help/deeplearning/ref/dlupdate.html). The biggest challenge is calculating the Jacobian matrix.

The new Deep Learning Toolbox workflow only offers the solvers sgdm, rmsprop and adam; Levenberg-Marquardt is not implemented.
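For reference, the textbook Levenberg-Marquardt step for minimizing the sum of squared residuals is written below. The symbols are chosen here for illustration: w holds the P learnable parameters, N is the number of observations and mu is the damping factor.

```latex
\begin{aligned}
\mathbf{e} &= \mathbf{y}(\mathbf{w}) - \mathbf{t} \in \mathbb{R}^{N},
\qquad J_{ik} = \frac{\partial e_i}{\partial w_k}, \quad \mathbf{J} \in \mathbb{R}^{N \times P} \\
\Delta\mathbf{w} &= \left(\mathbf{J}^{\top}\mathbf{J} + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}^{\top}\mathbf{e},
\qquad \mathbf{w} \leftarrow \mathbf{w} - \Delta\mathbf{w}
\end{aligned}
```

A small mu gives the Gauss-Newton step, and a large mu gives approximately a short gradient-descent step J'*e/mu. The code in this question stores the Jacobian transposed (P-by-N), which is why J'*J appears there as J*J'.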

Is there an easy way to calculate the Jacobian matrix with dlgradient?
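dlgradient only differentiates a scalar, so a Jacobian has to be assembled from one gradient call per residual, which is what the loop in modelLoss does. To check such a Jacobian, a central-difference approximation is a cheap, toolbox-independent reference. The following minimal base-MATLAB sketch uses a hypothetical residual function and evaluation point p0. It stores the Jacobian as N-by-P (one row per residual), which is the transpose of the layout used in the question's code:

```matlab
% Hypothetical residual function e(p): R^3 -> R^3 and evaluation point p0
residualFcn = @(p) [p(1)^2; p(1)*p(2); sin(p(3))];
p0 = [1; 2; 0.5];

% Central differences: column k of the N-by-P Jacobian is de/dp(k)
h   = 1e-6;
e0  = residualFcn(p0);
Jfd = zeros(numel(e0), numel(p0));
for k = 1:numel(p0)
    dp = zeros(size(p0));
    dp(k) = h;
    Jfd(:,k) = (residualFcn(p0 + dp) - residualFcn(p0 - dp)) / (2*h);
end

% Analytic Jacobian of the same function, for comparison
Jexact = [2*p0(1), 0,     0;
          p0(2),   p0(1), 0;
          0,       0,     cos(p0(3))];
fprintf('max abs difference: %g\n', max(abs(Jfd(:) - Jexact(:))));
```

Applied to a network, the same comparison can be run on a few perturbed learnables to confirm that the dlgradient-based Jacobian has the right values and orientation.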

This is my code right now. It works somehow, but it's very slow and I am not sure whether it's correct.

clc;
clear;
%% Random data
XTrain = rand(15,1000)*0.1 - 5;     % 15 features x 1000 observations
TTrain = sum(XTrain.^2,1)/10;       % target: scaled sum of the squared features
%% Define Network
layers = [
featureInputLayer(15)
fullyConnectedLayer(20)
tanhLayer
fullyConnectedLayer(20)
tanhLayer
fullyConnectedLayer(1)
functionLayer(@(x) x)
];
net = dlnetwork(layers);
%% Training Options
miniBatchSize =  128;
numEpochs = 2;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
XTrain = dlarray(XTrain, 'CB');
TTrain = dlarray(TTrain, 'CB');
%% Train Network
for epoch = 1:numEpochs
    % Shuffle data (index the observation dimension of both arrays).
    idx = randperm(numObservations);
    XTrain = XTrain(:,idx);
    TTrain = TTrain(:,idx);
    for iteration = 1:numIterationsPerEpoch
        % Get a batch of data.
        indices = (iteration-1)*miniBatchSize+1:iteration*miniBatchSize;
        XBatch = XTrain(:,indices);
        TBatch = TTrain(:,indices);
        % Loss, per-learnable Jacobians and residuals of the batch
        [loss,J,e] = dlfeval(@modelLoss,net,XBatch,TBatch);
        e = extractdata(e);
        % dlupdate calls lmFunction once per learnable (weights/bias of each layer)
        updateFcn = @(param,Jparam) lmFunction(param,Jparam,e);
        net = dlupdate(updateFcn,net,J);
        % Report the loss
        fprintf('Loss: %f\n', extractdata(loss));
    end
end
Loss: 646.412170
Loss: 7.421673
Loss: 11.123130
Loss: 7.209948
Loss: 5.831234
Loss: 1.093452
Loss: 0.242863
Loss: 0.870988
Loss: 0.398049
Loss: 0.738563
Loss: 0.304589
Loss: 0.171688
Loss: 0.072695
Loss: 0.107126
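function [loss,J,e] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = mse(Y,T)/size(T,1);
    % Levenberg-Marquardt needs the residuals themselves, not their squares
    e = Y - T;
    numObs = size(X,2);
    % Jacobian table with the same layout as net.Learnables:
    % entry j is numel(learnable j)-by-numObs, column i = d e(i) / d learnable j
    J = net.Learnables;
    for j = 1:height(J)
        J.Value{j} = zeros(numel(net.Learnables.Value{j}),numObs);
    end
    % dlgradient only differentiates scalars, so loop over the residuals once
    % and fill the columns of every learnable from the same gradient call
    for i = 1:numObs
        grad = dlgradient(e(i),net.Learnables,'RetainData',true);
        for j = 1:height(J)
            J.Value{j}(:,i) = reshape(extractdata(grad.Value{j}),[],1);
        end
    end
end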
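function parameters = lmFunction(parameters,J,e)
    % Damped Gauss-Newton (Levenberg-Marquardt) step for ONE learnable.
    % J: numel(parameters)-by-numObs, e: 1-by-numObs residuals.
    % Because dlupdate calls this per learnable, cross-terms between different
    % learnables are ignored (block-diagonal approximation of J*J').
    % The update rule for mu is not yet implemented.
    mu = 10;
    H = J*J';                                  % Gauss-Newton approximation of the Hessian
    I = eye(size(H,1));
    lmupdate = (H + mu*I) \ (J*e');            % solve (H + mu*I)*step = J*e'
    parameters = parameters - reshape(lmupdate,size(parameters));
end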
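The missing update rule for mu is commonly the classic one: shrink mu after a step that lowers the sum of squared errors, and enlarge it and retry otherwise. The following minimal base-MATLAB sketch applies it to a hypothetical two-parameter curve fit (y = a*exp(b*x)) rather than to a network. It also uses the full Jacobian of all parameters at once instead of one block per learnable. The factor 10, the initial mu and the stopping tolerance are illustrative choices, not toolbox defaults:

```matlab
% Hypothetical test problem: fit y = a*exp(b*x) to noise-free data (a = 2, b = -1.5)
x = linspace(0,2,50)';
y = 2*exp(-1.5*x);
residualFcn = @(p) p(1)*exp(p(2)*x) - y;                  % e(p) = model - target, N-by-1
jacobianFcn = @(p) [exp(p(2)*x), p(1)*x.*exp(p(2)*x)];   % N-by-P, column k = de/dp(k)

p  = [1; 0];     % initial guess
mu = 10;         % initial damping factor (illustrative)
for iter = 1:100
    e = residualFcn(p);
    J = jacobianFcn(p);
    step = (J'*J + mu*eye(numel(p))) \ (J'*e);   % LM step on ALL parameters at once
    pTrial = p - step;
    if sum(residualFcn(pTrial).^2) < sum(e.^2)
        p  = pTrial;   % error decreased: accept and move towards Gauss-Newton
        mu = mu/10;
    else
        mu = mu*10;    % error did not decrease: reject and move towards gradient descent
    end
    if norm(step) < 1e-10
        break          % steps have become negligible
    end
end
fprintf('a = %.4f, b = %.4f\n', p(1), p(2));
```

In a network setting, p would correspond to all learnables stacked into one vector and J to the N-by-P matrix assembled from dlgradient. This keeps the cross-layer terms that a per-learnable dlupdate call drops.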

Has anybody had the same or a similar problem in the past and can help me? Or can somebody give me some useful hints? Thanks in advance.

1 Comment

Matt J on 5 Sep 2023

Edited: Matt J on 5 Sep 2023

In the network you've shown, there appears to be only 1 residual (assuming it's a regression network at all), since the final fully connected layer has only 1 output. With only 1 residual, there really is no appreciable difference between Levenberg-Marquardt and standard steepest descent.
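The single-residual case can be checked with the Sherman-Morrison identity. If J is a single row g' (one residual r), then (g*g' + mu*I)*g = (mu + norm(g)^2)*g. The LM step is therefore g*r/(mu + norm(g)^2), which points in exactly the steepest-descent direction; only the step length adapts. This holds whenever J has exactly one row. The following small base-MATLAB check uses purely illustrative values:

```matlab
% Illustrative values for a problem with a single residual r
g  = [0.5; -1.2; 2.0; 0.3; -0.7];   % gradient of r w.r.t. the P = 5 parameters
r  = 0.7;                           % value of the single residual
mu = 10;                            % damping factor
J  = g';                            % 1-by-P Jacobian

lmStep = (J'*J + mu*eye(numel(g))) \ (J'*r);   % Levenberg-Marquardt step
sdStep = g * r / (mu + g'*g);                   % scaled steepest-descent step
fprintf('difference: %g\n', norm(lmStep - sdStep));
```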

Answer 1

Matt J on 5 Sep 2023

Edited: Matt J on 5 Sep 2023

Levenberg-Marquardt would only be practical for very small networks and training data sizes. That is the case in the code you've shown, but if that is representative of your actual problem, it might be more appropriate just to use standard algorithms with 1 minibatch (i.e., with no division of the data into batches). That might improve convergence a lot, and would be a good idea to test before diving into Levenberg-Marquardt.
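To put "very small" into numbers for the network in the question: a fully connected layer with n inputs and m outputs has n*m weights plus m biases. A full LM step solves a P-by-P system at every iteration, where P is the total number of learnables, so the cost grows roughly with P^3. The following short count covers the posted 15-20-20-1 architecture. The memory figure assumes double precision:

```matlab
layerSizes = [15 20 20 1];                        % input features, then three fully connected layers
numWeights = layerSizes(1:end-1).*layerSizes(2:end);
numBiases  = layerSizes(2:end);
P = sum(numWeights + numBiases);                  % total learnable parameters
fprintf('Learnable parameters: %d\n', P);         % prints "Learnable parameters: 761"
fprintf('Size of J''*J: %d-by-%d (%.1f MB in double)\n', P, P, P^2*8/1e6);
```

For this network the system is still cheap to solve, but P and therefore the P-by-P matrix grow quickly with layer width and depth.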

3 Comments

Matt J on 6 Sep 2023

I'm less familiar with the old toolbox, but,

(a) I don't see an sgd option in the old toolbox, so I don't see how you would have compared the new and old toolboxes fairly.

(b) the new toolbox is more expressly designed for Deep Learning. The problem dimension (data size, number of unknown parameters) in deep learning is greater than what the old toolbox seems to support, which limits the choice of algorithms.

(c) It is premature to conclude that the new toolbox is slow until you fix the problem with your minibatch selection. You are using a very large number of minibatches compared to your data size, which is why I recommended that you drop down to 1 minibatch, or at least something much smaller than 128.

Leo on 7 Sep 2023

Hi Matt,

Thanks again for your answer:

a) Sorry, I meant gdm, not sgd. My fault.

b) OK, that's the reason why.

c) OK, I didn't catch that from your last answer. I will try that.

BR
