Levenberg-Marquardt algorithm as custom training function using dlupdate

Hi all,
I'm trying to implement the Levenberg-Marquardt algorithm in MATLAB with dlupdate, as shown in the example "Use dlupdate to Train Network Using Custom Update Function" (https://de.mathworks.com/help/deeplearning/ref/dlupdate.html). The biggest challenge is calculating the Jacobian matrix.
The new Deep Learning Toolbox only provides the sgdm, rmsprop, and adam algorithms; Levenberg-Marquardt is not implemented.
Is there an easy way to calculate the Jacobian matrix with dlgradient?
This is my code right now. It works somehow, but it's very slow and I am not sure if it's correct.
clc;
clear;

%% Random data
XTrain = rand(15,1000)*0.1;
XTrain = XTrain-5;
TTrain = XTrain(1,:).^2 + XTrain(2,:).^2 + XTrain(3,:).^2 + XTrain(4,:).^2 + XTrain(5,:).^2 + XTrain(6,:).^2 + XTrain(7,:).^2 + XTrain(8,:).^2 + ...
    XTrain(9,:).^2 + XTrain(10,:).^2 + XTrain(11,:).^2 + XTrain(12,:).^2 + XTrain(13,:).^2 + XTrain(14,:).^2 + XTrain(15,:).^2;
TTrain = TTrain/10;

%% Define Network
layers = [
    featureInputLayer(15)
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(1)
    functionLayer(@(x) x)
    ];
net = dlnetwork(layers);

%% Training Options
miniBatchSize = 128;
numEpochs = 2;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
XTrain = dlarray(XTrain, 'CB');
TTrain = dlarray(TTrain, 'CB');

%% Train Network
for epoch = 1:numEpochs
    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,idx);
    TTrain = TTrain(idx);
    for iteration = 1:numIterationsPerEpoch
        % Get a batch of data.
        indices = (iteration-1)*miniBatchSize+1:iteration*miniBatchSize;
        XBatch = XTrain(:,indices);
        TBatch = TTrain(:,indices);
        [loss, J, e] = dlfeval(@modelLoss,net,XBatch,TBatch);
        e = extractdata(e);
        updateFcn = @(net,J) lmFunction(net,J,e);
        net = dlupdate(updateFcn,net,J);
        % Report the loss.
        fprintf('Loss: %f\n', extractdata(loss));
    end
end
Loss: 646.412170
Loss: 7.421673
Loss: 11.123130
Loss: 7.209948
Loss: 5.831234
Loss: 1.093452
Loss: 0.242863
Loss: 0.870988
Loss: 0.398049
Loss: 0.738563
Loss: 0.304589
Loss: 0.171688
Loss: 0.072695
Loss: 0.107126
function [loss,J,e] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = mse(Y,T)/size(T,1);
    e = (Y-T).^2;
    % Compute the Jacobian matrix.
    J = dlgradient(e(1),net.Learnables);
    for j = 1:size(J.Layer,1) % iterate through the weights and biases of the layers
        layergrad = nan(numel(net.Learnables{:,"Value"}{j}),length(X));
        for i = 1:size(X,2)
            grad = dlgradient(e(i),net.Learnables);
            layergrad(:,i) = reshape(grad{j,"Value"}{1},[],1);
        end
        J{j,"Value"}{1} = layergrad;
    end
end

function parameters = lmFunction(parameters,J,e)
    % The update rule for mu is not yet implemented.
    mu = 10;
    H = J*J';
    I = eye(size(H,1));
    lmupdate = (H+mu*I)\(J*e');
    parameters = parameters - reshape(lmupdate,size(parameters));
end
Has anybody had the same or a similar problem in the past and can help me? Or can somebody give me some useful hints? Thanks in advance.
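Three properties of the posted code are worth keeping in mind. First, modelLoss differentiates the squared errors e = (Y-T).^2, whereas the usual Levenberg-Marquardt step is built from the Jacobian of the residuals Y-T. Second, dlupdate calls lmFunction once per learnable array, so H only couples parameters within the same weight or bias array. Third, the nested loops call dlgradient once per observation for every learnable, so the same backward passes are repeated. The following is an untested sketch of one alternative, not a reference implementation: it builds a single residual Jacobian over all learnables with one dlgradient call per observation and applies one damped step. The function names residualJacobian and applyStep are hypothetical, and net, XBatch and TBatch are assumed to exist as in the script above.

```matlab
% Sketch: one Levenberg-Marquardt step over ALL learnables of a dlnetwork.
% Assumes net, XBatch and TBatch exist as in the script above.
[J,r] = dlfeval(@residualJacobian,net,XBatch,TBatch);
mu = 10;                                        % fixed damping, as in the post
delta = (J'*J + mu*eye(size(J,2))) \ (J'*r);    % damped Gauss-Newton step
net = applyStep(net,delta);

function [J,r] = residualJacobian(net,X,T)
    % Row i of J is the gradient of the residual r_i = Y_i - T_i (not r_i^2).
    Y = forward(net,X);
    res = stripdims(Y) - stripdims(T);          % 1-by-N traced dlarray
    N = size(res,2);
    P = sum(cellfun(@numel,net.Learnables.Value));
    J = zeros(N,P);
    for i = 1:N
        % One backward pass per observation; RetainData keeps the trace
        % available for the remaining dlgradient calls.
        g = dlgradient(res(i),net.Learnables,'RetainData',true);
        cols = cellfun(@(v) double(reshape(extractdata(v),[],1)), ...
            g.Value,'UniformOutput',false);
        J(i,:) = vertcat(cols{:}).';
    end
    r = double(reshape(extractdata(res),[],1)); % N-by-1 residual vector
end

function net = applyStep(net,delta)
    % Scatter the flat step back into the learnables table, in table order.
    learnables = net.Learnables;
    offset = 0;
    for k = 1:height(learnables)
        v = learnables.Value{k};
        n = numel(v);
        step = reshape(single(delta(offset+1:offset+n)),size(v));
        learnables.Value{k} = v - step;
        offset = offset + n;
    end
    net.Learnables = learnables;
end
```

For the network in the post, J would be 128-by-761 per mini-batch, so the linear solve is cheap; the cost is dominated by the one backward pass per observation. The damping mu is still fixed here. A common refinement is to decrease mu after a step that lowers the loss and to increase it (and reject the step) otherwise.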

1 Comment

Matt J on 5 Sep 2023
Edited: Matt J on 5 Sep 2023
In the network you've shown there appears to be only 1 residual (assuming it's a regression network at all), since the final fully connected layer has only 1 output. With only 1 residual, there really is no appreciable difference between Levenberg-Marquardt and standard steepest descent.
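The single-residual case can be written out explicitly. Here $r$ is the one residual, $j = \partial r / \partial \theta$ is its gradient as a column vector (so the Jacobian is the single row $j^\top$), and $\mu$ is the damping factor; the symbols $j$ and $\Delta\theta$ are notation introduced for this note. Applying the Sherman-Morrison identity to the damped normal equations gives:

```latex
\Delta\theta
  = \left(j j^\top + \mu I\right)^{-1} j\, r
  = \frac{r}{\mu + j^\top j}\; j
```

The step therefore points along $j\,r$, which is the gradient of $\tfrac{1}{2} r^2$, and is only rescaled by $1/(\mu + \lVert j \rVert^2)$. This collapse relies on the Jacobian having exactly one row. If each of the $N$ observations in a batch is counted as its own residual, the Jacobian has $N$ rows and the step direction generally differs from the gradient.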

Answers (1)

Matt J on 5 Sep 2023
Edited: Matt J on 5 Sep 2023
Levenberg-Marquardt would only be practical for very small networks and training data sizes. That is the case in the code you've shown, but if that is representative of your actual problem, it might be more appropriate just to use standard algorithms with 1 minibatch (i.e., with no division of the data into batches). That might improve convergence a lot, and would be a good idea to test before diving into Levenberg-Marquardt.
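A minimal sketch of the full-batch suggestion, assuming the same data and layers as in the question (without the identity functionLayer) and using adamupdate as the standard algorithm. The iteration count and the function name fullBatchLoss are assumptions made for this illustration, and the code has not been tuned or benchmarked.

```matlab
% Full-batch training sketch: every iteration uses all 1000 observations.
XTrain = rand(15,1000)*0.1 - 5;
TTrain = sum(XTrain.^2,1)/10;

layers = [
    featureInputLayer(15)
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(1)
    ];
net = dlnetwork(layers);

X = dlarray(XTrain,'CB');
T = dlarray(TTrain,'CB');

numIterations = 500;            % assumed value, adjust as needed
avgGrad = [];                   % Adam state, initialized empty
avgSqGrad = [];
for iteration = 1:numIterations
    % No shuffling or batching: the whole data set is one batch.
    [loss,gradients] = dlfeval(@fullBatchLoss,net,X,T);
    [net,avgGrad,avgSqGrad] = adamupdate(net,gradients,avgGrad,avgSqGrad,iteration);
    if mod(iteration,50) == 0
        fprintf('Iteration %d, loss: %f\n',iteration,extractdata(loss));
    end
end

function [loss,gradients] = fullBatchLoss(net,X,T)
    % Forward pass, loss, and gradients of the loss w.r.t. all learnables.
    Y = forward(net,X);
    loss = mse(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end
```

Because each iteration sees the complete data set, the loss curve is free of mini-batch noise, which makes it easier to judge convergence before investing in a custom Levenberg-Marquardt update.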

3 Comments

Hi Matt J,
thanks a lot for your answer. The reason I want to use Levenberg-Marquardt is that I want to compare the new Deep Learning Toolbox with the old Neural Network Toolbox. In the past I used Levenberg-Marquardt (trainlm) because of its good performance, and now I want to use it in the new Deep Learning Toolbox as well. When I use e.g. sgd, the newer toolbox seems to be slower and to perform worse than the older toolbox. Would you agree? What is your opinion on that? And why does the new toolbox offer just those 3 algorithms?
BR
I'm less familiar with the old toolbox, but,
(a) I don't see an sgd option in the old toolbox, so I don't see how you would have compared the new and old toolboxes fairly.
(b) the new toolbox is more expressly designed for Deep Learning. The problem dimension (data size, number of unknown parameters) in deep learning is greater than what the old toolbox seems to support, which limits the choice of algorithms.
(c) It is premature to conclude that the new toolbox is slow until you fix the problem with your minibatch selection. You are using a very large number of minibatches compared to your data size, which is why I recommended that you drop down to 1 minibatch, or at least something much smaller than 128.
Hi Matt,
thanks again for your answer:
a) Sorry, I meant gdm, not sgd. My fault.
b) OK, that's the reason why.
c) OK, I didn't realize that from your last answer. I will try that.
BR

Version

R2022b

Asked:

Leo
on 5 Sep 2023

Commented:

Leo
on 7 Sep 2023
