trainnet in parallel-CPU mode gives incorrect results

Collin Rich on 25 May 2024
Commented: Collin Rich on 25 May 2024
I'm using trainnet to train a convolutional regression network that finds the X-Y centroid of a subtle gradient region in an input image. The training data consist of 130x326 grayscale images paired with ground-truth output coordinates. On a small dataset, both the RMSE and the loss reach very small values (e.g., 10^-3) after a few minutes of training. The trained network gives the expected results when trained in single-CPU mode, but when trained in parallel-CPU mode, its predictions are significantly off.

To debug this, I scaled back to a very simple network, disabled input normalization, and trained on only two data points, fully expecting the network to memorize the training data perfectly. After single-CPU training, the network predicts the training data perfectly (as expected); after parallel-CPU training, it does not. I also added a more verbose loss function and confirmed that the losses it reports during training are consistent with the (Y,T) pairs it receives, and that the T values are read correctly from the training data.
It seems that the final network returned in parallel-CPU mode does not correctly capture the results of the training.
I'm running R2024a on a MacBook Pro (M2 Max) with Apple Accelerate BLAS. (With the default BLAS, MATLAB consistently crashed when running trainnet in parallel mode.)
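Because parallel workers are separate MATLAB processes, one thing worth ruling out is that they load a different BLAS than the client session. A minimal sketch, assuming a process-based pool from the Parallel Computing Toolbox; it only reports what version('-blas') returns on the client and on each worker, and the variable names are illustrative:

```matlab
% Compare the BLAS reported by the client session with the one each
% parallel worker reports (workers are separate MATLAB processes).
clientBlas = version('-blas');

pool = gcp;                        % current pool, or start the default one
spmd
    workerBlas = version('-blas'); % evaluated once on every worker
end

fprintf("Client:   %s\n", clientBlas);
for w = 1:pool.NumWorkers
    % Indexing the Composite copies worker w's value back to the client.
    fprintf("Worker %d: %s\n", w, workerBlas{w});
end
```

If the worker lines differ from the client line, the workers are not using the BLAS configured for the client session.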
Code snippet below...
layers = [
    imageInputLayer([130 326 1],"Name","imageinput","Normalization","none")
    convolution2dLayer([10 10],8,"DilationFactor",[2 2],"Name","conv_1")
    maxPooling2dLayer([2 2],"Name","maxpool_4")
    batchNormalizationLayer
    reluLayer("Name","relu_1")
    convolution2dLayer([2 2],16,"Name","conv_2")
    fullyConnectedLayer(2,"Name","fc")];
opts = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-7, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',500, ...
    'LearnRateDropFactor',0.25, ...
    'MaxEpochs',1000, ...
    'Verbose',false, ...
    'ExecutionEnvironment','parallel', ...   % 'cpu' for the single-CPU runs
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'OutputNetwork','last-iteration');
% trainingData is defined elsewhere; the layer array above is passed directly.
FOVCnet = trainnet(trainingData,layers,@modelLoss,opts);
function loss = modelLoss(Y,T) % custom loss function
    % Debug output: omitting the semicolons prints Y, T, and the loss at every iteration.
    Y
    T
    loss = mse(Y,T)
end
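A self-contained sketch of the two-data-point experiment described above: train the same layer array once with ExecutionEnvironment "cpu" and once with "parallel", then compare each network's predictions on its own training images. Random arrays stand in for the real images and coordinates (imgs and coords are hypothetical names), the built-in "mse" loss replaces the printing modelLoss to keep the output short, and resetting rng before each trainnet call is assumed to give both runs the same initial weights. It needs the Deep Learning Toolbox and, for the parallel run, the Parallel Computing Toolbox.

```matlab
% Random stand-ins for the two real test cases (hypothetical names).
rng(0);
imgs   = rand(130, 326, 1, 2, "single");   % H-by-W-by-C-by-N images
coords = rand(2, 2, "single");             % N-by-2 [x y] targets

layers = [
    imageInputLayer([130 326 1], "Normalization", "none")
    convolution2dLayer([10 10], 8, "DilationFactor", [2 2])
    maxPooling2dLayer([2 2])
    batchNormalizationLayer
    reluLayer
    convolution2dLayer([2 2], 16)
    fullyConnectedLayer(2)];

envs  = ["cpu" "parallel"];
preds = cell(1, numel(envs));
for k = 1:numel(envs)
    opts = trainingOptions("sgdm", ...
        "InitialLearnRate", 1e-7, ...    % learning rate from the original post
        "MaxEpochs", 1000, ...
        "ExecutionEnvironment", envs(k), ...
        "OutputNetwork", "last-iteration", ...
        "Verbose", false);
    rng(0);                               % same weight initialization for both runs
    net = trainnet(imgs, coords, layers, "mse", opts);
    preds{k} = minibatchpredict(net, imgs);   % predictions on the training images
end

% Training-set RMSE for each mode, then the largest gap between the two networks.
rmse = cellfun(@(p) sqrt(mean((p - coords).^2, "all")), preds);
disp(table(envs', rmse', 'VariableNames', ["Mode" "TrainRMSE"]))
disp(max(abs(preds{2} - preds{1}), [], "all"))
```

Replacing imgs and coords with the real two test cases turns this into a direct side-by-side check of the behavior reported above.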
3 comments
Matt J on 25 May 2024
We can't run the code without trainingData. Please attach your two-data-point test case in a .mat file (as an arrayDatastore).
Collin Rich on 25 May 2024
Here are the two test images and coordinates. (Sorry for not putting them in an arrayDatastore; I'm not sure how to put both in a single arrayDatastore. Still learning the ropes...)
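One way to package images and coordinates together is to give each its own arrayDatastore and pair them with combine, so that every read returns an {image, [x y]} pair. A minimal sketch, assuming the images sit in a 130x326x1xN array and the coordinates in an N-by-2 array; imgs and coords are hypothetical names, and the random values are stand-ins only:

```matlab
% Random stand-ins (hypothetical names) for the two real test cases:
% imgs is H-by-W-by-C-by-N, coords is N-by-2 with one [x y] row per image.
imgs   = rand(130, 326, 1, 2, "single");
coords = rand(2, 2);

% One arrayDatastore per stream; IterationDimension picks the observation axis.
imgDs = arrayDatastore(imgs, "IterationDimension", 4);
tgtDs = arrayDatastore(coords, "IterationDimension", 1);

% combine pairs the two streams, so each read returns {image, [x y]}.
trainingData = combine(imgDs, tgtDs);

% A single .mat file holding the datastore (and the data stored inside it).
save("trainingData.mat", "trainingData");
```

After load("trainingData.mat"), calling read(trainingData) returns the first {image, [x y]} pair, which is a quick way to confirm the pairing before training.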


Answers (0)

Version

R2024a