CNN training fails in the multi-GPU execution environment

Kevin Shi on 24 Apr 2019
Answered: Peng on 23 Dec 2019
I am trying to train a CNN using the 'multi-gpu' execution environment. Training works fine with the 'auto' or 'gpu' option, which uses only one GPU, but I want to make use of all four GPUs I have available. All four are in the local machine, which runs CentOS, and the drivers are up to date. I also tested all the GPUs in a local parallel pool with the MATLAB example found here, and it worked fine. https://www.mathworks.com/help/parallel-computing/examples/run-matlab-functions-on-multiple-gpus.html
These are the errors I receive. What can I do to make this work?
Error using trainNetwork (line 150)
The parallel pool that SPMD was using has been shut down.
Caused by:
Error using nnet.internal.cnn.DistributedDispatcher/computeInParallel (line 190)
The parallel pool that SPMD was using has been shut down.
Error using internal.matlab.desktop.editor.clearAndSetBreakpointsForFile (line 45)
The client lost connection to worker 3. This might be due to network problems, or the interactive communicating job
might have errored.
Warning: 4 worker(s) crashed while executing code in the current parallel pool. MATLAB will attempt to run the code
again on the remaining workers of the pool. View the crash dump files to determine what caused the workers to crash.
The client lost connection to worker 3. This might be due to network problems, or the interactive communicating job
might have errored.
Warning: 4 worker(s) crashed while executing code in the current parallel pool. MATLAB will attempt to run the code
again on the remaining workers of the pool. View the crash dump files to determine what caused the workers to crash.
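For anyone trying to reproduce this, the setup described in the question can be sketched roughly as below. The poster's network and data are not shown, so the digit sample dataset that ships with Deep Learning Toolbox and the small network are assumptions chosen only for illustration, as is scaling 'MiniBatchSize' by the GPU count. The sketch shows the multi-GPU configuration only; it is not a fix for the worker crashes.

```matlab
% Hypothetical reproduction: the digit sample dataset and this small CNN are
% illustrative assumptions, not the poster's actual data or network.
digitPath = fullfile(matlabroot, 'toolbox', 'nnet', 'nndemos', ...
    'nndatasets', 'DigitDataset');
imds = imageDatastore(digitPath, 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3, 8, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

numGPUs = gpuDeviceCount;   % number of CUDA devices visible to MATLAB

% Reuse the current pool only if it has one worker per GPU; otherwise
% replace it with a local pool of that size before training starts.
pool = gcp('nocreate');
if ~isempty(pool) && pool.NumWorkers ~= numGPUs
    delete(pool);
    pool = [];
end
if isempty(pool)
    parpool('local', numGPUs);
end

% 'multi-gpu' splits each mini-batch across the pool workers; multiplying
% the mini-batch size by numGPUs (an assumption) keeps the per-GPU size fixed.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'multi-gpu', ...
    'MiniBatchSize', 128 * numGPUs, ...
    'MaxEpochs', 2, ...
    'Verbose', true);

net = trainNetwork(imds, layers, options);
```

Changing 'ExecutionEnvironment' to 'gpu' in the same script gives the single-GPU run that the poster reports works, so the two runs differ only in the multi-GPU path.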

Answers (1)

Peng on 23 Dec 2019
Hi, I have the same problem. Have you solved it yet? I haven't found a solution so far. I'm using MATLAB R2018b on a computer with 2 GPUs running Ubuntu.

Categories

Parallel and Cloud

Version

R2018b
