Training error when using selfAttentionLayer with dropout

I want to use selfAttentionLayer to construct a time series prediction model. However, when I use selfAttentionLayer with dropout, training fails with the following error:
Error using max
Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.
Error in nnet.internal.cnn.util.boundAwayFromZero (line 10)
x = max(x, eps(precision), 'includenan');
Error in gpuArray/internal_softmaxBackward (line 13)
Z = nnet.internal.cnn.util.boundAwayFromZero(Z);
Error in nnet.internal.cnnhost.scaledDotProductAttentionBackward (line 23)
dU = internal_softmaxBackward(matlab.lang.internal.move(dW), W, 1);
Error in gpuArray/internal_attentionBackward (line 34)
[dQ, dK, dV] = nnet.internal.cnnhost.scaledDotProductAttentionBackward(dZ, Q, K, V, ...
Error in deep.internal.recording.operations.AttentionOp/backward (line 48)
[dQ,dK,dV] = internal_attentionBackward(dZ,Q,K,V,dataForBackward,M,op.Args{1:6});
Error in deep.internal.recording.RecordingArray/backwardPass (line 99)
grad = backwardTape(tm,{y},{initialAdjoint},x,retainData,false,0);
Error in dlarray/dlgradient (line 132)
[grad,isTracedGrad] = backwardPass(y,xc,pvpairs{:});
Error in EHSAPressureStatePrediction>modelLoss (line 223)
gradients = dlgradient(loss,net.Learnables);
Error in deep.internal.dlfeval (line 17)
[varargout{1:nargout}] = fun(x{:});
Error in deep.internal.dlfevalWithNestingCheck (line 19)
[varargout{1:nargout}] = deep.internal.dlfeval(fun,varargin{:});
Error in dlfeval (line 31)
[varargout{1:nargout}] = deep.internal.dlfevalWithNestingCheck(fun,varargin{:});
The dlnetwork is defined as follows:
numIn = 5;
numOut = 2;
peDim = 6;
seqLen = 4096;
layers = [
    sequenceInputLayer(numIn,"Normalization","none","Name","Input","MinLength",seqLen)
    positionEmbeddingLayer(peDim,seqLen)
    selfAttentionLayer(5,10,"DropoutProbability",0.2)
    convolution1dLayer(3,20,"DilationFactor",2,"Padding","causal")
    layerNormalizationLayer
    convolution1dLayer(5,25,"DilationFactor",4,"Padding","causal")
    layerNormalizationLayer
    convolution1dLayer(7,30,"DilationFactor",8,"Padding","causal")
    fullyConnectedLayer(20)
    reluLayer
    fullyConnectedLayer(10)
    reluLayer
    fullyConnectedLayer(5)
    reluLayer
    fullyConnectedLayer(numOut,"Name","output")];
net = dlnetwork(layers);
% analyzeNetwork(net);
I want to know why the dropout in the selfAttentionLayer causes this error.

Accepted Answer

Ritam
about an hour ago
I was able to run the provided "dlnetwork" model code without encountering any errors. Based on this, the issue does not appear to be inherent to "selfAttentionLayer". It is more likely caused by the limited GPU memory available on your system.
As potential workarounds, you may consider the following options:
  1. Use a less memory‑intensive data type, such as single, instead of double precision.
  2. Train the network using mini‑batches. Feedforward networks do not natively support mini‑batch training, so this needs to be implemented manually.
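For the first option, the conversion can be done before the data are wrapped in dlarray and moved to the GPU. A minimal sketch is shown below; X and T are placeholders for your own input and target arrays, not variables from the original post:

```matlab
% Convert training data to single precision before moving it to the GPU.
% X and T are placeholder names for your own inputs and targets.
X = single(X);
T = single(T);

% Wrap as formatted dlarray objects (channel x batch x time) on the GPU.
dlX = dlarray(gpuArray(X),"CBT");
dlT = dlarray(gpuArray(T),"CBT");
```

Halving the element size from 8 bytes (double) to 4 bytes (single) roughly halves the memory footprint of both activations and gradients, which matters for a 4096-step attention layer.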
To implement manual mini-batch training, you can split your dataset into smaller subsets (for example, x{i} and t{i} for inputs and targets). Then set the number of training epochs to 1 within the training function and use nested loops, one for epochs and one for iterations. A simplified example is shown below:
% Train one epoch at a time over manually created mini-batches
net = feedforwardnet(10);
net.trainFcn = 'trainscg';
net.trainParam.epochs = 1;              % one pass per call to train
for e = 1:nEpochs
    for i = 1:nIterations
        net = train(net, x{i}, t{i});   % x{i}, t{i}: i-th mini-batch
    end
end
Please also ensure that batches are loaded into memory only at the time of training and not all at once.
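Since the original model is a dlnetwork trained with dlfeval/dlgradient, a minibatchqueue may be a more direct route to mini-batching than feedforwardnet. The sketch below is a hedged illustration, not the poster's actual code: XCell and TCell are assumed cell arrays of fixed-length sequences, and modelLoss stands in for the loss function referenced in the stack trace.

```matlab
% Sketch: mini-batched training of a dlnetwork with minibatchqueue.
% Assumes XCell/TCell are cell arrays of equal-length sequences and a
% modelLoss(net,dlX,dlT) function returning [loss,gradients] exists.
dsX = arrayDatastore(XCell,"OutputType","same");
dsT = arrayDatastore(TCell,"OutputType","same");
ds  = combine(dsX,dsT);

mbq = minibatchqueue(ds, ...
    "MiniBatchSize",8, ...               % smaller batches reduce GPU memory
    "MiniBatchFormat",["CBT" "CBT"], ... % channel x batch x time
    "OutputCast","single");              % single precision halves memory use

for epoch = 1:nEpochs
    shuffle(mbq);
    while hasdata(mbq)
        [dlX,dlT] = next(mbq);           % only one batch in memory at a time
        [loss,gradients] = dlfeval(@modelLoss,net,dlX,dlT);
        % ... update net here with adamupdate/sgdmupdate ...
    end
end
```

Because the queue draws one mini-batch at a time from the datastore, this also satisfies the advice above about not loading all batches into memory at once.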
If the issue persists after applying these changes, I would recommend reaching out to MathWorks Technical Support for further assistance specific to your setup at https://in.mathworks.com/company/aboutus/contact_us.html

1 comment

Chuguang Pan
about an hour ago
@Ritam Thanks for your answer; the issue was indeed caused by limited GPU memory.


More Answers (0)

Version

R2025a

Asked: 2 Apr 2026 at 2:41

Commented: about an hour ago
