Error when processing on HPC: Unable to allocate space for the FFT calculation. This might be due to insufficient memory on the GPU.

Hello,
Error message: Unable to allocate space for the FFT calculation. This might be due to insufficient memory on the GPU.
I received this error message while processing multiple images on a Slurm server. The code uses both the GPU and multi-core computing. The for loop over the images is not parallelized; within each image, the cores work together to produce the result for that single image.
The error appears after about 4000 images have been processed. I tried clearing all variables after each image and resetting the GPU device every 2000 images, but the error still occurs.
The error stops the calculation, yet the server reports a return code of 0 (which means a normal exit on the server).
Please help.

Accepted Answer

Joss Knight on 5 Mar 2023
At a guess, you are trying to share one GPU between multiple workers on a pool. Depending on how work is scheduled, one or two workers may have allocated all of the GPU memory, leaving none for the others.
Options:
  • Reduce the size of the pool
  • If on R2022b or later, try setting the gpuDevice CachePolicy to "minimum"
  • Place your code inside a try...catch block and ignore out-of-memory errors, or fall back to the CPU if the GPU errors
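
The second and third options could be sketched roughly as follows. This is only an illustration under stated assumptions, not the original poster's code: the function names `ifft2WithCpuFallback` and `setMinimumGpuCache` and the warning identifier are hypothetical, the choice of `ifft2` is a guess at the per-image operation, and each function would need to live in its own file on the workers' path.

```matlab
function Y = ifft2WithCpuFallback(X)
% Hypothetical helper: try the inverse FFT on the GPU and, if the GPU
% call errors for any reason (for example, out of memory), redo it on the CPU.
% X is assumed to be a numeric array in host memory.
try
    Y = gather(ifft2(gpuArray(X)));
catch err
    % Report which error triggered the fallback, then continue on the CPU.
    warning("example:gpuFallback", ...
        "GPU FFT failed (%s); using the CPU instead.", err.identifier);
    Y = ifft2(X);
end
end

function setMinimumGpuCache()
% Hypothetical helper: ask the selected GPU to cache as little memory
% as possible. The CachePolicy property requires R2022b or later.
d = gpuDevice;
d.CachePolicy = "minimum";
end
```

With a pool already open, `wait(parfevalOnAll(@setMinimumGpuCache, 0));` would apply the cache setting on every worker before the image loop starts. Catching every error, rather than a specific identifier, is a deliberate simplification here; in real code it is probably worth inspecting `err.identifier` and rethrowing anything that is not a GPU memory failure.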
  5 Comments
Joss Knight on 6 Mar 2023
The ifft is computed in a data-parallel way but there is no overlap between the computations being run on different workers that share a GPU. Some overheads will be reduced but the main gains you see will be the fact that you have 4 GPUs.
Xiguang Zhang on 10 Mar 2023
Thanks for the information. The error does not show up after reducing the number of workers.

Version

R2022b
