What is the cause of CUDA_ERROR_LAUNCH_FAILED?
I am training a neural network on multiple GPUs and occasionally receive the error CUDA_ERROR_LAUNCH_FAILED (full error and code below). What might be causing this? I ran the code to completion successfully once, then changed some hyperparameters and received this message. Reverting the hyperparameter changes did not fix the problem. Thanks in advance.
The code I ran:
%{
Test out transfer learning with pretrained model
See example 'Transfer Learning Using AlexNet'
%}
imds = imageDatastore('PetImages', ...
'IncludeSubfolders',true, ...
'LabelSource','foldernames');
[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');
net = alexnet;
inputSize = net.Layers(1).InputSize;
layersTransfer = net.Layers(1:end-3);
numClasses = numel(categories(imdsTrain.Labels));
layers = [
layersTransfer
fullyConnectedLayer(100,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20)
fullyConnectedLayer(100,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20)
fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20)
softmaxLayer
classificationLayer];
pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
'RandXReflection',true, ...
'RandXTranslation',pixelRange, ...
'RandYTranslation',pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
'DataAugmentation',imageAugmenter);
augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);
options = trainingOptions('sgdm', ...
'MiniBatchSize',1000, ...
'MaxEpochs',6, ...
'InitialLearnRate',1e-4, ...
'Shuffle','every-epoch', ...
'ValidationData',augimdsValidation, ...
'ValidationFrequency',3, ...
'Verbose',false, ...
'Plots','training-progress',...
'ExecutionEnvironment','multi-gpu');
netTransfer = trainNetwork(augimdsTrain,layers,options);
And the full error text:
Error using trainNetwork (line 150)
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_FAILED
Error in transferLearning (line 50)
netTransfer = trainNetwork(augimdsTrain,layers,options);
Caused by:
Error using nnet.internal.cnn.DistributedDispatcher/computeInParallel (line 190)
Error detected on worker 1.
Error using nnet.internal.cnn.TrainerGPUStrategy/computeAccumImage (line 23)
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_FAILED
1 Comment
Joss Knight
on 5 Nov 2018
This doesn't look great, sorry about that. Does the problem stop recurring if you reduce the MiniBatchSize?
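Reducing MiniBatchSize helps when the failure is memory-related, because the memory needed for inputs and layer activations grows roughly linearly with the mini-batch size. A rough MATLAB sketch of that scaling follows. It counts only the single-precision input images (assumed to be 227x227x3, AlexNet's input size) and ignores activations, gradients, and weights, which add considerably more, so treat the numbers as a lower bound rather than a real memory budget.

```matlab
% Lower-bound estimate: memory for one mini-batch of input images only.
% Assumes single precision (4 bytes per element) and a 227x227x3 input.
imgSize = [227 227 3];
bytesPerBatch = @(batch) batch * prod(imgSize) * 4;

fprintf('Batch 1000: %.1f MiB\n', bytesPerBatch(1000) / 2^20);  % Batch 1000: 589.7 MiB
fprintf('Batch 128:  %.1f MiB\n', bytesPerBatch(128) / 2^20);   % Batch 128:  75.5 MiB
```

To try the suggestion, change 'MiniBatchSize',1000 in the trainingOptions call above to a smaller value (128 here is only an illustrative choice) and rerun trainNetwork.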
Answers (2)
cui,xingxing
on 20 Jul 2019
1 Comment
Joss Knight
on 22 Jul 2019
This error occurs in all sorts of circumstances, usually because your card does not have enough memory. Try posting a new question that includes reproduction code and the output of gpuDevice.
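A minimal sketch of collecting the gpuDevice information requested, assuming Parallel Computing Toolbox is installed. It lists every GPU visible to MATLAB, since the failing run used 'multi-gpu'. Because gpuDevice(idx) makes that GPU the current device, it is safest to run this in a fresh MATLAB session before training.

```matlab
% Requires Parallel Computing Toolbox. Lists every GPU visible to MATLAB.
numGPUs = gpuDeviceCount;
fprintf('GPUs detected: %d\n', numGPUs);
for idx = 1:numGPUs
    d = gpuDevice(idx);   % make GPU idx the current device and return its properties
    disp(d)               % full property listing (name, driver version, TotalMemory, ...)
end
```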