Training error when using selfAttentionLayer with dropout

I want to use selfAttentionLayer to construct a time series prediction model. However, when I use selfAttentionLayer with dropout, training fails with the following error:
Error using max
Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.
Error in nnet.internal.cnn.util.boundAwayFromZero (line 10)
x = max(x, eps(precision), 'includenan');
Error in gpuArray/internal_softmaxBackward (line 13)
Z = nnet.internal.cnn.util.boundAwayFromZero(Z);
Error in nnet.internal.cnnhost.scaledDotProductAttentionBackward (line 23)
dU = internal_softmaxBackward(matlab.lang.internal.move(dW), W, 1);
Error in gpuArray/internal_attentionBackward (line 34)
[dQ, dK, dV] = nnet.internal.cnnhost.scaledDotProductAttentionBackward(dZ, Q, K, V, ...
Error in deep.internal.recording.operations.AttentionOp/backward (line 48)
[dQ,dK,dV] = internal_attentionBackward(dZ,Q,K,V,dataForBackward,M,op.Args{1:6});
Error in deep.internal.recording.RecordingArray/backwardPass (line 99)
grad = backwardTape(tm,{y},{initialAdjoint},x,retainData,false,0);
Error in dlarray/dlgradient (line 132)
[grad,isTracedGrad] = backwardPass(y,xc,pvpairs{:});
Error in EHSAPressureStatePrediction>modelLoss (line 223)
gradients = dlgradient(loss,net.Learnables);
Error in deep.internal.dlfeval (line 17)
[varargout{1:nargout}] = fun(x{:});
Error in deep.internal.dlfevalWithNestingCheck (line 19)
[varargout{1:nargout}] = deep.internal.dlfeval(fun,varargin{:});
Error in dlfeval (line 31)
[varargout{1:nargout}] = deep.internal.dlfevalWithNestingCheck(fun,varargin{:});
The dlnetwork is defined as follows:
numIn = 5;
numOut = 2;
peDim = 6;
seqLen = 4096;
layers = [
    sequenceInputLayer(numIn,"Normalization","none","Name","Input","MinLength",seqLen)
    positionEmbeddingLayer(peDim,seqLen)
    selfAttentionLayer(5,10,"DropoutProbability",0.2)
    convolution1dLayer(3,20,"DilationFactor",2,"Padding","causal")
    layerNormalizationLayer
    convolution1dLayer(5,25,"DilationFactor",4,"Padding","causal")
    layerNormalizationLayer
    convolution1dLayer(7,30,"DilationFactor",8,"Padding","causal")
    fullyConnectedLayer(20)
    reluLayer
    fullyConnectedLayer(10)
    reluLayer
    fullyConnectedLayer(5)
    reluLayer
    fullyConnectedLayer(numOut,"Name","output")];
net = dlnetwork(layers);
% analyzeNetwork(net);
I want to know why the dropout in the selfAttentionLayer causes this error.

Accepted Answer

Ritam
about an hour ago
I was able to run the provided "dlnetwork" model code without encountering any errors. Based on this, the issue does not appear to be inherent to "selfAttentionLayer". It is more likely caused by the limited GPU memory available on your system.
As potential workarounds, you may consider the following options:
  1. Use a less memory‑intensive data type, such as single, instead of double precision.
  2. Train the network using mini‑batches. Feedforward networks do not natively support mini‑batch training, so this needs to be implemented manually.
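For the first option, the conversion can be done before the data are wrapped in dlarray and moved to the GPU. A minimal sketch is shown below; X and T are placeholders for your own input and target arrays, not variables from the original post:

```matlab
% Convert training data to single precision before moving it to the GPU.
% X and T are placeholder names for your own inputs and targets.
X = single(X);
T = single(T);

% Wrap as formatted dlarray objects (channel x batch x time) on the GPU.
dlX = dlarray(gpuArray(X),"CBT");
dlT = dlarray(gpuArray(T),"CBT");
```

Halving the element size from 8 bytes (double) to 4 bytes (single) roughly halves the memory footprint of both activations and gradients, which matters for a 4096-step attention layer.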
To implement manual mini-batch training, you can split your dataset into smaller subsets (for example, x{i} and t{i} for inputs and targets). Then set the number of training epochs to 1 within the training function and use nested loops, one for epochs and one for iterations. A simplified example is shown below:
% Train one epoch at a time over manually created mini-batches
net = feedforwardnet(10);
net.trainFcn = 'trainscg';
net.trainParam.epochs = 1;              % one pass per call to train
for e = 1:nEpochs
    for i = 1:nIterations
        net = train(net, x{i}, t{i});   % x{i}, t{i}: i-th mini-batch
    end
end
Please also ensure that batches are loaded into memory only at the time of training and not all at once.
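Since the original model is a dlnetwork trained with dlfeval/dlgradient, a minibatchqueue may be a more direct route to mini-batching than feedforwardnet. The sketch below is a hedged illustration, not the poster's actual code: XCell and TCell are assumed cell arrays of fixed-length sequences, and modelLoss stands in for the loss function referenced in the stack trace.

```matlab
% Sketch: mini-batched training of a dlnetwork with minibatchqueue.
% Assumes XCell/TCell are cell arrays of equal-length sequences and a
% modelLoss(net,dlX,dlT) function returning [loss,gradients] exists.
dsX = arrayDatastore(XCell,"OutputType","same");
dsT = arrayDatastore(TCell,"OutputType","same");
ds  = combine(dsX,dsT);

mbq = minibatchqueue(ds, ...
    "MiniBatchSize",8, ...               % smaller batches reduce GPU memory
    "MiniBatchFormat",["CBT" "CBT"], ... % channel x batch x time
    "OutputCast","single");              % single precision halves memory use

for epoch = 1:nEpochs
    shuffle(mbq);
    while hasdata(mbq)
        [dlX,dlT] = next(mbq);           % only one batch in memory at a time
        [loss,gradients] = dlfeval(@modelLoss,net,dlX,dlT);
        % ... update net here with adamupdate/sgdmupdate ...
    end
end
```

Because the queue draws one mini-batch at a time from the datastore, this also satisfies the advice above about not loading all batches into memory at once.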
If the issue persists after applying these changes, I would recommend reaching out to MathWorks Technical Support for further assistance specific to your setup at https://in.mathworks.com/company/aboutus/contact_us.html

1 comment

Chuguang Pan
about an hour ago
@Ritam Thanks for your answer; the issue was indeed caused by limited GPU memory.


More Answers (0)

Version

R2025a

Asked: 2 Apr 2026 at 2:41

Commented: about an hour ago
