- Close MATLAB and open a new session.
- Open the example.
- Before doing anything else in the example, change the ExecutionEnvironment value to cpu on line 100.
- Then run the script.
- If you still get a GPU-related error after that, please update this thread with the details.
Why am I getting an "out of memory on device" error when trying to run the Speaker Recognition Using X Vectors Example?
I am trying to run the live script for the Speaker Recognition Using X Vectors example available here on the MATLAB help center: https://www.mathworks.com/help/audio/ug/speaker-recognition-using-x-vectors.html. I have all the necessary toolboxes installed on my version of MATLAB. I've tried running the example without making any changes, and I get an error in the code block starting at line 108. The error occurs at line 143 and reads
Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.
X = nnet.internal.cnngpu.batchNormalizationForwardPredict(...
xdata = internal_batchnorm(xdata, offset, scale, args.Epsilon, channelDim, "inference", ...
Y = batchnorm(Y, ...
It's clear to me that something is going wrong with the use of my computer's GPU in this example, but I don't understand GPUs and parallel processing well enough to understand what. If I try to change the execution environment from gpu to cpu in line 100, I still get the same error. If I enter "gpuDevice()" into the command line, I get the following information about my computer's GPU:
CUDADevice with properties:
Name: 'GeForce RTX 2060 with Max-Q Design'
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
I'm honestly not sure what the above GPU information means, but maybe it can be helpful in understanding why I'm getting this error?
If I try to reset the device as suggested by putting "gpuDevice(1)" right above line 143, I get the following errors:
Error using gpuArray/reshape
The data no longer exists on the device.
Error in deep.internal.dlarray.extractConvolutionFilter>iConvertWeightsToInternalAPIformat (line 159)
weights = reshape(weights, szW);
Error in deep.internal.dlarray.extractConvolutionFilter>iSizesOfNumericWeights (line 149)
weights = iConvertWeightsToInternalAPIformat(weights,nbFiltersPerGroup,nbChannelsPerGroup,nSpatialDimsInX,nbGroups);
Error in deep.internal.dlarray.extractConvolutionFilter (line 40)
[weights, nbChannelsPerGroup, nbFiltersPerGroup, nbGroups] = iSizesOfNumericWeights(weightsData, nSpatialDimsInX);
Error in dlarray/dlconv (line 214)
[weights, filterSize, nbChannelsPerGroup, nbFiltersPerGroup, nbGroups, isWeightsLabeled] = deep.internal.dlarray.extractConvolutionFilter(weights, ...
Error in xvecModel (line 15)
Y = dlconv(X,parameters.conv1.Weights,parameters.conv1.Bias,'DilationFactor',1);
I've shown this to my advisor and we're going to try running it on a different computer to see if that helps at all. If you can help me understand what's going wrong on my computer and how to fix it, that would be very helpful and very much appreciated.
Brian Hemmat on 7 Jun 2021
Are you sharing that GPU with other programs? For example, is it also being used for your graphics? That can cause out-of-memory issues.
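One way to check this is to compare the device's free memory against its total before running anything. The snippet below is only a sketch: it assumes Parallel Computing Toolbox is installed, and the printed format is my own choice. Note that calling gpuDevice also initializes the GPU for MATLAB, which itself takes some memory.

```matlab
% Sketch: report how much GPU memory is already taken before the example runs.
if canUseGPU
    g = gpuDevice;   % returns the currently selected device
    usedGB  = (g.TotalMemory - g.AvailableMemory) / 1e9;
    totalGB = g.TotalMemory / 1e9;
    fprintf("%s: %.2f GB of %.2f GB already in use\n", g.Name, usedGB, totalGB);
else
    disp("No supported GPU is available to MATLAB.")
end
```

If a large share of the memory is in use in a fresh MATLAB session, other programs (or the display) are probably holding it.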
You can try reducing the miniBatchSize (see line 79). The default is 128. Try 64 or 32. Larger sizes generally train faster, so a smaller batch will cost you some extra training time.
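If you would rather not pick the size by hand, the idea can be sketched as a retry loop that halves the batch whenever an out-of-memory error appears. Everything here other than miniBatchSize is hypothetical: trainStep is a stand-in I made up that fails above 64 to imitate the GPU error, and matching on the message text is an assumption, not something the example does.

```matlab
% Hypothetical stand-in for one pass of the training loop: it errors for
% batches above 64 to imitate the GPU running out of memory.
trainStep = @(n) assert(n <= 64, "Out of memory on device.");

miniBatchSize = 128;   % default on line 79 of the example
while true
    try
        trainStep(miniBatchSize);
        break   % this size fits, stop shrinking
    catch err
        isOOM = contains(err.message, "Out of memory");
        if ~isOOM || miniBatchSize <= 32
            rethrow(err)   % unrelated error, or nothing smaller left to try
        end
        miniBatchSize = miniBatchSize / 2;
        fprintf("Retrying with miniBatchSize = %d\n", miniBatchSize);
    end
end
```

In the real example you would replace trainStep with the training code itself; simply editing line 79 by hand is the simpler route.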
Setting ExecutionEnvironment to cpu should make the issue go away. I suspect you changed the value of the dropdown but didn't run the cell again. Try the steps listed at the top of this thread.
Regarding gpuDevice(1): call it outside of any training loop, not within it. gpuDevice(1) clears data from your GPU, hence the error message saying that the data no longer exists. In this example, instead of putting it on line 143, you could have tried it before executing any lines of the example.
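As a rough sketch of that ordering, something like the following could be run once in the Command Window before the first line of the example. It assumes your GPU is device index 1, and the printed message is my own; any gpuArray variables already in the workspace become invalid after the reset.

```matlab
% Run once, before the example, never inside the training loop.
if canUseGPU
    g = gpuDevice(1);   % selects device 1; if already selected, resets it and clears its memory
    fprintf("Reset %s, %.2f GB available\n", g.Name, g.AvailableMemory / 1e9);
end
```

Because the reset happens before any network parameters are moved to the GPU, nothing the example needs later is wiped.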