CUDA_ERROR_ILLEGAL_ADDRESS when running Faster R-CNN on MATLAB R2018b

Ghada Khaled on 10 Nov 2018
Commented: Ghada Khaled on 11 Nov 2018
I'm running Faster R-CNN in `matlab 2018b` on Windows 10. I get a `CUDA_ERROR_ILLEGAL_ADDRESS` exception when I increase the number of training items or when I increase `MaxEpochs`.
Below is the output of `gpuDevice`:
CUDADevice with properties:
Name: 'GeForce GTX 1050'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 9.2000
ToolkitVersion: 9.1000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 4.2950e+09
AvailableMemory: 3.4635e+09
MultiprocessorCount: 5
ClockRateKHz: 1493000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
And this is my code:
latest_index = 0;
for i = 1:6
load(strcat('newDataset', int2str(i), '.mat'));
len = length(vehicleDataset.imageFilename);
for j = 1:len
% Advance the write index before storing (MATLAB indices start at 1).
latest_index = latest_index + 1;
filename = vehicleDataset.imageFilename{j};
fulldata.imageFilename{latest_index} = filename;
fulldata.vehicle{latest_index} = vehicleDataset.vehicle{j};
end
end
trainingDataTable = table(fulldata.imageFilename', fulldata.vehicle');
trainingDataTable.Properties.VariableNames = {'imageFilename','vehicle'};
data.trainingDataTable = trainingDataTable;
% Split data into a training and test set.
idx = floor(0.6 * height(trainingDataTable));
trainingData = trainingDataTable(1:idx,:);
testData = trainingDataTable(idx:end,:);
% Create image input layer.
inputLayer = imageInputLayer([32 32 3]);
% Define the convolutional layer parameters.
filterSize = [3 3];
numFilters = 64;
% Create the middle layers.
middleLayers = [
convolution2dLayer(filterSize, numFilters, 'Padding', 1)
convolution2dLayer(filterSize, numFilters, 'Padding', 1)
maxPooling2dLayer(3, 'Stride',2)
finalLayers = [
% Add a ReLU non-linearity.
% Add the softmax loss layer and classification layer.
layers = [
% Options for step 1.
optionsStage1 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
% Options for step 2.
optionsStage2 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
% Options for step 3.
optionsStage3 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
% Options for step 4.
optionsStage4 = trainingOptions('sgdm', ...
'MaxEpochs', 2, ...
'MiniBatchSize', 1, ...
'InitialLearnRate', 1e-3, ...
'CheckpointPath', tempdir);
options = [
doTrainingAndEval = true;
if doTrainingAndEval
% Set random seed to ensure example training reproducibility.
% Train Faster R-CNN detector. Select a BoxPyramidScale of 1.2 to allow
% for finer resolution for multiscale object detection.
detector = trainFasterRCNNObjectDetector(trainingData, layers, options, ...
'NegativeOverlapRange', [0 0.3], ...
'PositiveOverlapRange', [0.6 1], ...
'BoxPyramidScale', 1.2);
data.detector= detector;
% Load pretrained detector for the example.
detector = data.detector;
save mix_data data
if doTrainingAndEval
% Run detector on each image in the test set and collect results.
resultsStruct = struct([]);
for i = 1:height(testData)
% Read the image.
I = imread(testData.imageFilename{i});
% Run the detector.
[bboxes, scores, labels] = detect(detector, I);
% Collect the results.
resultsStruct(i).Boxes = bboxes;
resultsStruct(i).Scores = scores;
resultsStruct(i).Labels = labels;
% Convert the results into a table.
results = struct2table(resultsStruct);
data.results = results;
save mix_data data
% Load results from disk.
results = data.results;
% Extract expected bounding box locations from test data.
expectedResults = testData(:, 2:end);
% Evaluate the object detector using Average Precision metric.
[ap, recall, precision] = evaluateDetectionPrecision(results, expectedResults);
% Plot precision/recall curve
grid on
title(sprintf('Average Precision = %.2f', ap))
First it prints the following warning multiple times, then it throws the exception below:
> Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> In trainFasterRCNNObjectDetector (line 320)
In rcnn_trail (line 184)
> Error using -
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> Error in vision.internal.cnn.layer.SmoothL1Loss/backwardLoss (line 156)
idx = (X > -one) & (X < one);
> Error in nnet.internal.cnn.DAGNetwork/computeGradientsForTraining/efficientBackProp (line 585)
dLossdX = thisLayer.backwardLoss( ...
> Error in nnet.internal.cnn.DAGNetwork>@()efficientBackProp(i) (line 661)
@() efficientBackProp(i), ...
> Error in nnet.internal.cnn.util.executeWithStagedGPUOOMRecovery (line 11)
[ varargout{1:nOutputs} ] = computeFun();
> Error in nnet.internal.cnn.DAGNetwork>iExecuteWithStagedGPUOOMRecovery (line 1195)
[varargout{1:nargout}] = nnet.internal.cnn.util.executeWithStagedGPUOOMRecovery(varargin{:});
> Error in nnet.internal.cnn.DAGNetwork/computeGradientsForTraining (line 660)
theseGradients = iExecuteWithStagedGPUOOMRecovery( ...
> Error in nnet.internal.cnn.Trainer/computeGradients (line 184)
[gradients, predictions, states] = net.computeGradientsForTraining(X, Y,
needsStatefulTraining, propagateState);
> Error in nnet.internal.cnn.Trainer/train (line 85)
[gradients, predictions, states] = this.computeGradients(net, X, response,
needsStatefulTraining, propagateState);
> Error in vision.internal.cnn.trainNetwork (line 47)
trainedNet = trainer.train(trainedNet, trainingDispatcher);
> Error in fastRCNNObjectDetector.train (line 190)
[network, info] = vision.internal.cnn.trainNetwork(ds, lgraph, opts, mapping,
> Error in trainFasterRCNNObjectDetector (line 410)
[stage2Detector, fastRCNN, ~, info(2)] = fastRCNNObjectDetector.train(trainingData, fastRCNN,
options(2), iStageTwoParams(params), checkpointSaver);
> Error in rcnn_trail (line 184)
detector = trainFasterRCNNObjectDetector(trainingData, layers, options, ...
Walter Roberson on 10 Nov 2018
That should be okay. You could experiment with CUDA 10 driver instead, as long as you are not using cudamex() or GPU Coder.
Ghada Khaled on 11 Nov 2018
That is what I expected, but I'm still getting this error. Could memory size be the issue, since I only have 8 GB of memory?
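
To put the memory question in context: assuming the 8 GB mentioned here is system RAM, it is separate from the dedicated memory of the GeForce GTX 1050, which the `gpuDevice` output above reports as roughly 4 GB. The following is only a minimal sketch of that arithmetic, using the `TotalMemory` and `AvailableMemory` values copied from the output in the question. The variable names are illustrative and do not come from the original script.

```matlab
% Values copied from the gpuDevice output in the question (in bytes).
totalMemoryBytes     = 4.2950e+09;
availableMemoryBytes = 3.4635e+09;

% Convert bytes to GiB (1 GiB = 2^30 bytes).
bytesPerGiB  = 2^30;
totalGiB     = totalMemoryBytes / bytesPerGiB;
availableGiB = availableMemoryBytes / bytesPerGiB;

% Fraction of GPU memory already in use before training starts.
usedFraction = 1 - availableMemoryBytes / totalMemoryBytes;

fprintf('GPU total: %.2f GiB, available: %.2f GiB (%.0f%% in use)\n', ...
    totalGiB, availableGiB, 100 * usedFraction);

% On a machine with a supported GPU, the same figures can be read live:
% g = gpuDevice;
% fprintf('Available now: %.2f GiB\n', g.AvailableMemory / 2^30);
```

Checking `AvailableMemory` just before calling `trainFasterRCNNObjectDetector` would show how much GPU memory is free at that point, which may help judge whether GPU memory, rather than system RAM, is the constraint.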