
Parallel Computing Toolbox - memory issues - possibly a leak

6 views (last 30 days)
Andrew J
Andrew J on 4 June 2018
Commented: Andrew J on 4 June 2018
I am seeing a problem which has the appearance of a memory leak in PCT.
I have a function which I am running in parallel - I have tried using parfeval and spmd batching, both with the same result. The function runs fine when computed serially 'in-process', but when it is executed using PCT the memory footprint of the worker nodes gradually increases until it fails.
The first set of tests was on a machine with 32 GB RAM and 6 worker nodes. Before long all memory is consumed by the MATLAB processes. I repeated this on a machine with 256 GB, with the same result: all 256 GB consumed. A back-of-the-envelope calculation suggests that the memory directly required by the function should be more like 1% of this.
I looked in more detail at what is going on with respect to memory. The JVM max memory is set to around 1 GB (which is much greater than needed). I put tracing inside the worker nodes, getting info from the Java runtime and also from the MATLAB memory() function. I also traced the contents of the function and base workspaces - nothing obviously big or wrong.
The tracing on the worker nodes confirms the JVM max memory as 1 GB, and the JVM used memory is around 50 MB per worker - but the 'MATLAB used memory' grows to many GB per process (from the MATLAB memory() function, which agrees with Task Manager stats).
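For reference, per-worker tracing of this kind can be reproduced with something along these lines (a sketch, not the poster's actual tracer; note that memory() is Windows-only and exact figures will differ per machine):

```matlab
% Probe JVM and MATLAB memory use on every worker in the current pool.
probe = @() sprintf('jvm max %.0f MB, jvm used %.0f MB, MATLAB used %.0f MB', ...
    java.lang.Runtime.getRuntime.maxMemory / 2^20, ...
    (java.lang.Runtime.getRuntime.totalMemory - ...
     java.lang.Runtime.getRuntime.freeMemory) / 2^20, ...
    getfield(memory, 'MemUsedMATLAB') / 2^20);   % memory() is Windows-only
f = parfevalOnAll(gcp, probe, 1);
disp(fetchOutputs(f))   % one line per worker
```

Run repeatedly between batches of work to see whether the 'MATLAB used' figure climbs while the JVM figures stay flat, as described above.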
I took the function being evaluated apart and found that this problem occurs when calling the MATLAB inv() function inside the function being parallelized using PCT (several functions deep). Whatever is going on, the 'leaking' memory is apparently not being managed by the JVM. I have avoided the inv() function using a workaround for test purposes, but that isn't a viable long-term solution.
Could anyone suggest what to do next?
Andrew
  3 comments
Walter Roberson
Walter Roberson on 4 June 2018
inv() is generally recommended against for numerical-stability reasons (quite apart from any memory leak). Would it be possible to replace the inv() with \ (mldivide)?
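Where the inverse is only used to solve a linear system, the swap is mechanical. A generic illustration (a made-up well-conditioned matrix, not the original code):

```matlab
n = 200;
A = rand(n) + n * eye(n);   % diagonally dominant, so well conditioned
b = rand(n, 1);

x_inv = inv(A) * b;         % explicit inverse: slower, less accurate
x_bsl = A \ b;              % mldivide: factorizes and solves directly

fprintf('difference: %g\n', norm(x_inv - x_bsl));   % round-off level
```

mldivide picks an appropriate factorization (e.g. LU) and never forms the inverse explicitly, which is both faster and more accurate for a solve.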
Andrew J
Andrew J on 4 June 2018
Hello, it is a relatively small, non-sparse numeric matrix.
In this case there is no reason to think it should be a particularly demanding calculation. There are several ways I could organize it, and although inv() might not be the best way, it should certainly not consume all system memory.
What I'm concerned about is why this causes PCT to fail, consuming a large multiple of the memory that should be required, with a problem that is not seen when not using PCT.
Without being able to see how this is implemented under the bonnet, this has the feel of buffers/workspace being allocated inside the inv() function, perhaps cached and then not deallocated, but only when run inside PCT. In that light, two possibilities come to mind: first, a combination of 'unmanaged' libraries with context-dependent memory management; or second, issues around memory buffers used to handle the stdout/stderr streams, which I imagine would be handled in a context-sensitive way and might not be flushed properly. But it's all guesswork.
Either way, a large number of relatively small matrix inversions shouldn't consume all resources and cause the system to fail. And if it were legitimate memory use, I would expect serial execution to also show significant memory use, roughly scaled down by the number of worker processes - and it doesn't.


Answers (1)

Jan
Jan on 4 June 2018
Without some code which reproduces the problem, it is hard to guess what's going on. While it is generally recommended to avoid using inv, because there are really very few cases where it is needed in numerical computation, I'm sure that it does not interfere with the memory used by Java.
How do you determine the memory usage of MATLAB? If you create a lot of variables, the operating system delivers memory to MATLAB, but it would be a waste of time to release the memory when it is no longer used. Therefore the OS cleans up the memory only when it is requested later, to save time. This might look like MATLAB "consumes" a lot of memory, although most of that memory is ready to be reassigned. So please explain exactly which tool you are using and what you observe that gives the impression that memory is leaked.
  1 comment
Andrew J
Andrew J on 4 June 2018
Hi, yes, I understand what you mean. I know a reasonable amount about how memory is managed, having written lots of analytics in C++ / MPI in the past, although MATLAB is a different animal.
The JVM info was obtained by inserting tracers onto the worker nodes, which accessed the Java runtime directly and got memory stats; other info came from MATLAB using the memory() function, which returns memory-usage info. I also included something that tracks the size of my variables, as far as is possible.
Since I posted originally, I have found more info. The problem seems to be along the lines that I guessed. The key thing is what might be different when code runs under PCT, and one of those things is how the output and error streams are handled. It looks as if the issue was that some matrices are not well conditioned, producing warnings which get buffered and build up - it is something like that, and I have now found other posts that suggest this. I was using 2016b; I have now installed 2017b, and it appears to be fixed in this version. Thx A
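For anyone stuck on R2016b, a possible workaround, assuming the build-up really is buffered warning text from ill-conditioned matrices, is to stop those warnings being generated on the workers at all. The warning IDs below are the usual ones for the "close to singular or badly scaled" messages; verify the exact ID in your case with [msg, id] = lastwarn:

```matlab
% Sketch: silence the ill-conditioning warnings on every pool worker.
% Diagnose conditioning separately (e.g. with rcond) rather than
% relying on the warnings.
parfevalOnAll(gcp, @() warning('off', 'MATLAB:nearlySingularMatrix'), 0);
parfevalOnAll(gcp, @() warning('off', 'MATLAB:singularMatrix'), 0);
```

This only suppresses the symptom; upgrading to R2017b, as above, is the real fix.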


Products


Version

R2016b

