'radix_sort: failed to get memory buffer' when executing accumarray on gpuArrays of certain size

Hello,
I'm trying to use accumarray on large gpuArrays, but get the error 'radix_sort: failed to get memory buffer'.
This is a minimal example that gives me the error:
a = randi(intmax, 2^28-2048, 1, 'gpuArray');
b = gpuArray(randi(3, 2^28-2048, 3, 'uint16'));
c = accumarray(b,a);
When I do the same with arrays of size [2^28-2047 1] and [2^28-2047 3] it works.
This is my gpuDevice after creating a and b:
CUDADevice with properties:
Name: 'GeForce GTX 1080 Ti'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 10.1000
ToolkitVersion: 9.1000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 1.1718e+10
AvailableMemory: 7.6692e+09
MultiprocessorCount: 28
ClockRateKHz: 1683000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Shouldn't this be enough memory for this kind of operation?
I'm running version 9.5.0.944444 (R2018b) on Linux.
I can work around this problem but I'd like to understand it so I can adapt my code accordingly.
Best wishes,
Daniel

Answers (2)

I ran the code on a Titan V and it works fine, including with the larger array size, so I don't think memory is the issue here.
Try clearing the GPU device's memory with the reset command, reset(gpuDevice), and then re-run the code.

6 comments

Thanks Ganesh.
If I reset the gpuDevice as you suggested, it works (even with 2^28 and 3*2^28 elements for a and b). However, if I run it once and then reassign a and b to arrays of a different size (e.g. one row fewer), I get the radix_sort error again.
>> reset(gpuDevice)
>> a = randi(intmax, 2^28, 1, 'gpuArray');
>> b = gpuArray(randi(3, 2^28, 3, 'uint16'));
>> c = accumarray(b,a);
>> whos
  Name           Size          Bytes  Class       Attributes

  a      268435456x1               4  gpuArray
  b      268435456x3               4  gpuArray
  c              3x3x3             4  gpuArray
>> a = randi(intmax, 2^28-1, 1, 'gpuArray');
>> b = gpuArray(randi(3, 2^28-1, 3, 'uint16'));
>> c = accumarray(b,a);
radix_sort: failed to get memory buffer
Does this mean that MATLAB doesn't properly free the RAM?
Hi Daniel,
Run the following command after executing each line, starting from the first assignment of a, b, and c:
nvidia-smi
Can you share the output so I can go through it?
Thank you. This is what I get:
>> reset(gpuDevice)
>> unix('nvidia-smi');
Mon Aug 5 11:29:00 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43 Driver Version: 418.43 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 0% 47C P2 57W / 280W | 245MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3275 C ...alysis/MATLAB/R2018b/bin/glnxa64/MATLAB 217MiB |
| 0 3520 G /usr/lib/xorg/Xorg 9MiB |
| 0 3558 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
>> a = randi(intmax, 2^28, 1, 'gpuArray');
>> unix('nvidia-smi');
Mon Aug 5 11:29:18 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43 Driver Version: 418.43 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 0% 48C P2 57W / 280W | 2297MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3275 C ...alysis/MATLAB/R2018b/bin/glnxa64/MATLAB 2269MiB |
| 0 3520 G /usr/lib/xorg/Xorg 9MiB |
| 0 3558 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
>> b = gpuArray(randi(3, 2^28, 3, 'uint16'));
>> unix('nvidia-smi');
Mon Aug 5 11:29:41 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43 Driver Version: 418.43 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 0% 50C P2 62W / 280W | 3833MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3275 C ...alysis/MATLAB/R2018b/bin/glnxa64/MATLAB 3805MiB |
| 0 3520 G /usr/lib/xorg/Xorg 9MiB |
| 0 3558 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
>> c = accumarray(b,a);
>> unix('nvidia-smi');
Mon Aug 5 11:29:57 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43 Driver Version: 418.43 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 0% 51C P2 70W / 280W | 4857MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3275 C ...alysis/MATLAB/R2018b/bin/glnxa64/MATLAB 4829MiB |
| 0 3520 G /usr/lib/xorg/Xorg 9MiB |
| 0 3558 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
>> a = randi(intmax, 2^28-1, 1, 'gpuArray');
>> unix('nvidia-smi');
Mon Aug 5 11:30:26 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43 Driver Version: 418.43 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 0% 54C P2 71W / 280W | 5881MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3275 C ...alysis/MATLAB/R2018b/bin/glnxa64/MATLAB 5853MiB |
| 0 3520 G /usr/lib/xorg/Xorg 9MiB |
| 0 3558 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
>> b = gpuArray(randi(3, 2^28-1, 3, 'uint16'));
>> unix('nvidia-smi');
Mon Aug 5 11:30:52 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43 Driver Version: 418.43 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 6% 55C P2 66W / 280W | 5369MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3275 C ...alysis/MATLAB/R2018b/bin/glnxa64/MATLAB 5341MiB |
| 0 3520 G /usr/lib/xorg/Xorg 9MiB |
| 0 3558 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
>> c = accumarray(b,a);
radix_sort: failed to get memory buffer
>> unix('nvidia-smi');
Mon Aug 5 11:31:02 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43 Driver Version: 418.43 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 7% 56C P2 76W / 280W | 4857MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3275 C ...alysis/MATLAB/R2018b/bin/glnxa64/MATLAB 4829MiB |
| 0 3520 G /usr/lib/xorg/Xorg 9MiB |
| 0 3558 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
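For reference, the memory figures in the transcript above can be compared with the expected array footprints. The following is a minimal sketch, assuming a is stored as double (8 bytes per element, the default class returned by randi) and b as uint16 (2 bytes per element); the variable names are illustrative only, and the usage values are copied by hand from the Memory-Usage column.

```matlab
% Expected footprints, assuming a is double (8 bytes) and b is uint16 (2 bytes).
n = 2^28;
expectedA_MiB = n * 8 / 2^20;        % 2048 MiB
expectedB_MiB = n * 3 * 2 / 2^20;    % 1536 MiB

% Total Memory-Usage values (MiB) reported by nvidia-smi in the transcript,
% in order: after reset, a, b, accumarray, new a, new b, failed accumarray.
usedMiB = [245 2297 3833 4857 5881 5369 4857];
deltaMiB = diff(usedMiB);            % change in usage caused by each step

fprintf('a: expected %d MiB, observed change %d MiB\n', expectedA_MiB, deltaMiB(1));
fprintf('b: expected %d MiB, observed change %d MiB\n', expectedB_MiB, deltaMiB(2));
fprintf('later steps changed usage by %s MiB\n', mat2str(deltaMiB(3:end)));
```

The first two changes are close to the expected array sizes. The later steps do not map one-to-one onto array sizes, which may reflect the memory pooling described later in this thread rather than a leak.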
Hi Daniel,
GPU memory is not cleaned up implicitly. The only ways to clear it are:
  1. Restart the session
  2. Reset the device
So once memory has been allocated, it stays in GPU memory until you clear it explicitly.
As for your accumarray case, the cause is that at some point while the internal code executes, a memory buffer is created that exceeds the RAM limit.
Thanks. So that means it's not possible to free up the memory of a variable when it's reassigned?
Is this an issue with GPUs in general (i.e. do they only support a full wipe?) or can MATLAB just not do it?
This isn't strictly true. MATLAB holds onto a quarter of GPU memory, once assigned, as an optimisation to prevent unnecessary device synchronization. Memory is then re-used. MATLAB will never return an out-of-memory error because it is holding onto the memory of a variable that has gone out of scope. However, it appears that in this case the NVIDIA thrust library is allocating its own memory buffer and MATLAB doesn't know about this, so it doesn't know to free up its memory pool to make space. This should be fixed for you in MATLAB R2019a.
In the meantime, try
feature('GpuAllocPoolSizeKb', 0);
as a temporary measure to turn off the pooling of memory that's causing this issue.
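A different workaround, which leaves pooling enabled, is to accumulate in blocks of rows so that each accumarray call works on a smaller input. This is only a sketch and has not been verified on the affected setup; it assumes that smaller inputs need a smaller temporary sort buffer. The function name chunkedAccumarray and the parameter chunkRows are hypothetical and not part of MATLAB.

```matlab
function c = chunkedAccumarray(subs, vals, sz, chunkRows)
% Hypothetical helper: accumulate in row blocks and sum the partial results.
% sz is the fixed output size, so every partial result has the same shape.
    c = zeros(sz, 'like', vals);            % same class (and GPU residency) as vals
    n = size(subs, 1);
    for first = 1:chunkRows:n
        last = min(first + chunkRows - 1, n);
        % Each call only sees rows first..last of the subscripts and values.
        c = c + accumarray(subs(first:last, :), vals(first:last), sz);
    end
end
```

For the arrays in this thread, a call might look like c = chunkedAccumarray(b, a, [3 3 3], 2^24). The result matches a single accumarray call up to floating-point rounding, because the partial sums are added in a different order.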


There is an issue in an NVIDIA library, which does not function correctly when memory is limited. This is fixed in CUDA 10 / MATLAB R2019a.
In the meantime, try
poolSize = feature('GpuAllocPoolSizeKb', 0);
as a temporary measure to turn off the pooling of memory that's underlying this issue. When you are ready to enable pooling again use
feature('GpuAllocPoolSizeKb', poolSize);
Re-enabling pooling is advisable, since turning it off will significantly reduce performance.
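The disable-and-restore pattern above can be wrapped so that the pool size is restored even if accumarray throws an error. This is a minimal sketch, assuming the feature call behaves as described in this answer (it returns the previous pool size when setting a new one). The function name accumarrayWithoutPool is hypothetical, and the code would be saved as accumarrayWithoutPool.m.

```matlab
function c = accumarrayWithoutPool(subs, vals)
% Hypothetical helper (not part of MATLAB): run accumarray with GPU memory
% pooling disabled, then restore the previous pool size.
    % Setting the pool size to 0 returns the previous value, as in the answer above.
    poolSize = feature('GpuAllocPoolSizeKb', 0);
    % onCleanup runs its function when restorePool is destroyed, that is,
    % when this function exits normally or through an error.
    restorePool = onCleanup(@() feature('GpuAllocPoolSizeKb', poolSize)); %#ok<NASGU>
    c = accumarray(subs, vals);
end
```

Calling c = accumarrayWithoutPool(b, a) then confines the performance cost of disabled pooling to this one call.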

Version

R2018b

Edited:

17 August 2019