1D-CNN: Replicating SeriesNetwork results using dlNetwork

Ioannis Tsitsimpelis on 23 Feb 2024
Hi,
I have trained a 1D-CNN on sequence data of varying length using trainNetwork. I would like finer control over training, so I have been trying to replicate my results with a custom training loop. However, my training loop only works if I pad each mini-batch before converting it to a dlarray, and I don't want to pad or otherwise distort my data. Could anyone advise how trainNetwork handles a mini-batch containing sequences of varying length, and how I could do the same in the training loop?
Otherwise, to get the same results both ways, could I implement a dynamic batching method common to both? See snippets 1 and 2 for reference.
% Snippet 1
rng(369,'twister')
load train_data.mat
load train_data_id.mat
data = train_data;
data_id = train_data_id;
num_outputs = numel(categories(data_id));
% Split train data to train-test-eval
[idxTrain,idxTest,idxVal] = trainingPartitions(size(data,2), [0.7 0.15 0.15]);
XTrain = data(idxTrain);
TTrain = data_id(idxTrain);
XTest = data(idxTest);
TTest = data_id(idxTest);
XVal = data(idxVal);
TVal = data_id(idxVal);
layers = [
sequenceInputLayer(1, 'MinLength',500)
convolution1dLayer(20,32)
batchNormalizationLayer
reluLayer
dropoutLayer(0.2)
convolution1dLayer(20,64)
batchNormalizationLayer
reluLayer
dropoutLayer(0.2)
globalMaxPooling1dLayer
fullyConnectedLayer(9)
softmaxLayer
classificationLayer];
options = trainingOptions('rmsprop', ...
'MaxEpochs',75, ...
'Shuffle','every-epoch', ...
'Verbose',false, ...
'Plots','training-progress',...
'MiniBatchSize',8);
% Train network
[net, info] = trainNetwork(XTrain,TTrain,layers,options);
% Snippet 2, based on the MathWorks custom training loop help page
rng(369,'twister')
load train_data.mat
load train_data_id.mat
data = train_data;
data_id = train_data_id;
% Count number of labels
numClasses = numel(categories(data_id));
classes = categories(data_id);
% Split train data to train-test-eval
[idxTrain,idxTest,idxVal] = trainingPartitions(size(data,2), [0.7 0.15 0.15]);
XTrain = data(idxTrain);
TTrain = data_id(idxTrain);
XTest = data(idxTest);
TTest = data_id(idxTest);
XVal = data(idxVal);
TVal = data_id(idxVal);
% Specify network architecture
layers = [
sequenceInputLayer(1, 'MinLength',500)
convolution1dLayer(20,32)
batchNormalizationLayer
reluLayer
dropoutLayer(0.2)
convolution1dLayer(20,64)
batchNormalizationLayer
reluLayer
dropoutLayer(0.2)
globalMaxPooling1dLayer
fullyConnectedLayer(9)
softmaxLayer];
% Create a dlnetwork object from the layer array
net = dlnetwork(layers);
% Specify the options to use during training
numEpochs = 75;
miniBatchSize = 8;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
% Initialize the average squared gradients used by RMSProp
averageSqGrad = [];
% Calculate the total number of iterations for the training progress monitor
numIterations = numEpochs * numIterationsPerEpoch;
% Initialize the TrainingProgressMonitor object. Because the timer starts
% when you create the monitor object, make sure that you create the object
% close to the training loop
monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");
% Train the model using a custom training loop. For each epoch, shuffle the
% data and loop over mini-batches of data. Update the network parameters
% using the rmspropupdate function. At the end of each iteration,
% display the training progress
iteration = 0;
epoch = 0;
while epoch < numEpochs && ~monitor.Stop
epoch = epoch + 1;
% Shuffle data
idx = randperm(numel(XTrain));
XTrain = XTrain(idx);
TTrain = TTrain(idx);
i = 0;
while i < numIterationsPerEpoch && ~monitor.Stop
i = i + 1;
iteration = iteration + 1;
% Read mini batch of data and convert the labels to dummy variables
idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
X = XTrain(idx);
T = zeros(numClasses,miniBatchSize,"single");
for c = 1:numClasses
T(c,TTrain(idx)==classes(c)) = 1;
end
% Pad the sequences on the right with zeros to the length of the longest
% sequence in the mini-batch. Each sequence is assumed to be a 1-by-T row
% vector (channel-by-time), so the padding dimension is 2.
paddedX = padsequences(X,2,Direction="right",PaddingValue=0);
% Convert the mini-batch of data to a dlarray. The padded array is
% channel-by-time-by-batch, hence the "CTB" format.
dlX = dlarray(single(paddedX),"CTB");
% If training on a GPU, then convert data to a gpuArray.
if canUseGPU
dlX = gpuArray(dlX);
end
% Evaluate the model loss, gradients, and network state using dlfeval and
% the modelLoss function.
[loss,gradients,state] = dlfeval(@modelLoss,net,dlX,T);
% Update the network state (the running statistics of the batch
% normalization layers). Without this step the statistics used at
% prediction time are never updated.
net.State = state;
% Update the network parameters using the RMSProp optimizer.
[net,averageSqGrad] = rmspropupdate(net,gradients,averageSqGrad);
% Update the training progress monitor.
recordMetrics(monitor,iteration,Loss=loss);
updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
monitor.Progress = 100 * iteration/numIterations;
end
end
function [loss,gradients,state] = modelLoss(net,dlX,T)
% Forward pass in training mode; also return the updated layer state.
[Y,state] = forward(net,dlX);
loss = crossentropy(Y,T);
gradients = dlgradient(loss,net.Learnables);
end

Answers (1)

Avadhoot on 20 Mar 2024
I understand that you are padding because your sequences have variable length, but that you want to keep the model free of any bias introduced by the padding. Regarding your question about "trainNetwork": it also pads internally while training the network. In a custom training loop you have to perform the padding yourself. There are two ways to go about it, both listed below:
1) Dynamic Batching:
As you mention in your question, dynamic batching can be used in both cases. It does not remove the need for padding completely, but it keeps the padding to a minimum. To implement it, sort your data by sequence length and then group sequences of similar length into the same mini-batch. This is a relatively easy step.
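The sorting and grouping described above can be sketched as follows. This is a minimal illustration on synthetic data, not a drop-in replacement for the loop in Snippet 2: the random sequences, the three classes, and the variable names XSorted, TSorted, batchIdx and padFraction are assumptions made for the example. It assumes each sequence is a 1-by-T row vector.

```matlab
% Synthetic stand-in for XTrain/TTrain: 1-by-T row vectors of varying length.
rng(0);
numObs = 20;
XTrain = arrayfun(@(n) rand(1,n), randi([500 900],1,numObs), UniformOutput=false);
TTrain = categorical(randi(3,1,numObs));
miniBatchSize = 8;

% Sort the observations by sequence length so that neighbours are similar.
seqLengths = cellfun(@(x) size(x,2), XTrain);
[~,order] = sort(seqLengths);
XSorted = XTrain(order);
TSorted = TTrain(order);

% Split the sorted data into complete mini-batches. A partial final batch
% is dropped, as in Snippet 2.
numBatches = floor(numObs/miniBatchSize);
batchIdx = cell(1,numBatches);
for b = 1:numBatches
    batchIdx{b} = (b-1)*miniBatchSize+1 : b*miniBatchSize;
end

% Each epoch, shuffle the order of the batches, not the observations,
% so that every batch keeps sequences of similar length together.
batchOrder = randperm(numBatches);
for b = batchOrder
    X = XSorted(batchIdx{b});
    T = TSorted(batchIdx{b});
    lens = cellfun(@(x) size(x,2), X);
    % Fraction of the padded mini-batch that would consist of padding.
    padFraction = 1 - sum(lens)/(numel(lens)*max(lens));
    fprintf("Batch %d: lengths %d to %d, padding fraction %.3f\n", ...
        b, min(lens), max(lens), padFraction);
end
```

One thing to watch with this scheme: because the data is sorted, dropping the incomplete final batch always discards the longest sequences, and the batch composition is the same in every epoch. Whether that matters depends on the data.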
2) Custom padding and masking:
This approach can be used if dynamic batching is not feasible. Here you implement your own padding and then add a masking step to the model so that the padding is ignored. This way the model is unaffected by the padding and no bias is introduced.
Here is a simplified approach to implementing this:
  1. Pad sequences: Pad the sequences to the length of the longest sequence before converting them to "dlarray". You have already done this in Snippet 2.
  2. Implement masking: Introduce a masking layer into your network. MATLAB does not have a built-in masking layer, so you will have to write a custom layer with appropriate forward and backward functions for the masking operation. Alternatively, you can manually apply a mask to the output of the network before calculating the loss so that the padded values are ignored. Take care that the gradients are still computed correctly.
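For the architecture in the question, one place where a mask could be applied is the global max pooling step. The sketch below shows the idea on plain numeric arrays only: the activation values in A and the valid lengths in validLen are made up for illustration, and wiring this into the dlnetwork (for example by splitting the network before globalMaxPooling1dLayer) is an assumption that is not shown here. Assuming the unpadded convolutions of Snippet 2, each convolution with filter size 20 shortens a sequence by 19 steps, so an input of length L would have L - 38 time steps that do not touch the padding.

```matlab
% Hypothetical activations from a padded mini-batch, as a C-by-T-by-B array
% (2 channels, 3 time steps, 3 observations).
A = cat(3, [1 2 3; 4 5 6], [2 1 9; 3 7 9], [5 8 8; 6 9 9]);

% Number of time steps per observation that are not affected by padding.
validLen = [3 2 1];

[C,T,B] = size(A);

% Logical mask that is true for valid time steps, expanded to the size of A.
mask = reshape(1:T,1,T,1) <= reshape(validLen,1,1,B);   % 1-by-T-by-B
mask = repmat(mask,[C 1 1]);                             % C-by-T-by-B

% Unmasked global max pooling over time: padded positions can win the max.
pooledNaive = squeeze(max(A,[],2));                      % C-by-B

% Masked version: set the padded positions to -Inf before taking the max.
AMasked = A;
AMasked(~mask) = -Inf;
pooledMasked = squeeze(max(AMasked,[],2));               % C-by-B

disp(pooledNaive)
disp(pooledMasked)
```

Note that this only removes the padding from the pooling step. The batch normalization layers would still see the padded time steps when computing their statistics during training, so this is a partial remedy rather than a complete one.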
In conclusion, this approach is harder to implement, but it achieves what you intended: the model will be free from the effects of padding. Otherwise, you have the option of dynamic batching, which keeps the padding to a minimum.
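On the "trainNetwork" side, dynamic batching could look like the sketch below: sort the training data by length and switch off shuffling so that the sorted order is kept. The tiny synthetic data set is an assumption for illustration. The three sequence padding options are written out explicitly with what are, to my knowledge, their default values, to show the padding that "trainNetwork" applies to each mini-batch.

```matlab
% Hypothetical stand-in for the training sequences of Snippet 1.
XTrain = {rand(1,700), rand(1,520), rand(1,900), rand(1,610)};
TTrain = categorical(["a" "b" "a" "b"]);

% Sort by length so each mini-batch groups sequences of similar length.
[~,order] = sort(cellfun(@(x) size(x,2), XTrain));
XTrain = XTrain(order);
TTrain = TTrain(order);

options = trainingOptions("rmsprop", ...
    MaxEpochs=75, ...
    MiniBatchSize=8, ...
    Shuffle="never", ...                   % keep the sorted order
    SequenceLength="longest", ...          % pad each mini-batch to its longest sequence
    SequencePaddingValue=0, ...            % pad with zeros
    SequencePaddingDirection="right", ...  % pad at the end of each sequence
    Verbose=false, ...
    Plots="training-progress");
```

With Shuffle="never" the mini-batches are identical in every epoch, which differs from the every-epoch shuffling in Snippet 1, so the two setups should not be expected to give the same training curve.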
I hope this helps.

Version

R2023b
