Issue with Mexfile in parfor loops
To speed up some heavy calculations I wrote a C file, which I compiled with MATLAB's mex compiler. It appears to run smoothly and gives correct results when using only a single thread (no parfor loops), and I have run it more than 100 times without any error.
However, when I run several calculations in parallel, one or two of my workers usually die, which makes the parfor loop restart. After a while, though, all workers are able to finish. These calculations are run through SLURM, so on another machine in our network. Does anyone have an idea? Perhaps my MEX file does something illegal that I am not aware of.
My main script has this structure:
parfor i=1:numWorkers
doWork();
end
and doWork() is basically like
function doWork()
doSomestuff();
[a,b,c,d,e,f] = initialize();
myMexFunc(a,b,c,d,e,f);
doMoreStuff();
end
and my Mex file is the following:
#include "mex.h"
#include "stdio.h"
void calcModulation(double* A, unsigned int* B, double* C, unsigned int* D, unsigned int L, double* E, unsigned int num_col, double* F)
{
// First Task
for(unsigned int n=0;n < L; ++n)
{
for(unsigned int m=0; m < 132; ++m)
{
A[D[n]+ 22*(B[n]+m)] = A[D[n] + 22*(B[n]+m)] + C[m+132*n];
}
}
// Second Task
for(unsigned int n=0;n < num_col; ++n)
{
for(unsigned int m=0; m < 22; ++m)
{
E[n] = E[n] + F[m + 22*(n)] * A[m + 22*(n)];
}
}
}
/* The gateway function */
void mexFunction( int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
// Names changed as part of the original code is secret
unsigned int num_col = mxGetN(prhs[0]);
unsigned int L = mxGetN(prhs[2]);
double* myMatrix_A = mxGetData(prhs[0]); // N x L
unsigned int *myVector_C, *myVector_D;
myVector_C = (unsigned int*) mxGetData(prhs[1]); // N x 1
double* myMatrix_B = mxGetData(prhs[2]); // N x L
myVector_D = (unsigned int*) mxGetData(prhs[3]); // N x 1
double* myVector_E = mxGetData(prhs[4]); //1 x L
double* myMatrix_D = mxGetData(prhs[5]); //N X L
calcModulation(myMatrix_A, myVector_C, myMatrix_B, myVector_D, L, myVector_E, num_col, myMatrix_D);
}
Is there something wrong about the way I set the pointers in the mex file?
The dimensions of the MATLAB variables are stated next to the "mxGetData" calls. All are double except for those cast to unsigned int*.
2 Comments
James Tursa
8 Dec 2020
Are the unsigned int* variables actually uint32 class at the MATLAB m-file level?
There is no way for us to determine if your indexing is correct because you don't show us the inputs, and these input values are actually used as indexing into other variables.
Also, you are modifying variables inplace, which is against the rules. I.e., the A and E in calcModulation come from prhs variables which according to the official rules are const.
And you never check that the prhs inputs are actually the class and sizes you expect before you use them.
We don't really have much else to examine based on what you have posted thus far, but I would start with the above comments.
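One way to act on the indexing concern, without knowing the real inputs, is to validate the index vectors before the loops run. The sketch below is plain C rather than MEX code, and `first_bad_index`, `numel_A`, and `numel_C` are hypothetical names. In a gateway the element counts would presumably come from something like mxGetNumberOfElements, with class checks such as mxIsUint32 and mxIsDouble alongside. It covers only the "First Task" loop.

```c
#include <stddef.h>

/* Hypothetical helper: returns the first n whose largest index into A or C
   would be out of range, or -1 if every access in the "First Task" loop
   stays within bounds. numel_A and numel_C are the element counts of A and C. */
long first_bad_index(const unsigned int *B, const unsigned int *D,
                     size_t L, size_t numel_A, size_t numel_C)
{
    for (size_t n = 0; n < L; ++n) {
        /* The inner loop runs m = 0..131, so m = 131 gives the largest index. */
        size_t max_a = (size_t)D[n] + 22u * ((size_t)B[n] + 131u);
        size_t max_c = 131u + 132u * n;
        if (max_a >= numel_A || max_c >= numel_C) {
            return (long)n;
        }
    }
    return -1;
}
```

Calling this before calcModulation and stopping (for example with mexErrMsgIdAndTxt) when the result is not -1 would turn a silent memory overrun into a readable error.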
Accepted Answer
More Answers (1)
James Tursa
8 Dec 2020
Edited: James Tursa
8 Dec 2020
Regarding the inplace modification in MATLAB, here is the actual situation:
MATLAB uses a system behind the scenes that is often known as "copy-on-write". That is, multiple variables can share the same data memory. A deep copy is only made when changes are made. The actual behaviour varies a bit depending on MATLAB version, but goes something like this in a recent version:
A = 1:10; % variable A is created, but it is sharing the same data area as a background variable you know nothing about
B = A; % variable B is sharing the same data area as A and the background variable.
% at this point in the code, there are actually three variables sharing the same data area
mymexfunction(A) % suppose this mex function changes the values of A inplace
% at this point in the code, variable B and the background variable have been changed inplace, a nasty side effect
C = 1:10; % variable C maybe gets created as a shared copy of the background variable with the changed values!!!
You are screwed at this point. MATLAB saw the 1:10 pattern when creating C so it might use the background variable for this, but you had inadvertently changed the values of that background variable inplace with your mex routine. If you subsequently did the A = 1:10 line again you would definitely be screwed since the variable is the same.
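The sharing behaviour described here can be imitated with a toy model in C. This is only an illustration of the copy-on-write idea, not MATLAB's actual internal layout. The `ToyVar` type and the `toy_*` functions are hypothetical names invented for this sketch. `toy_write` plays the role of a normal MATLAB assignment (it unshares first), while `toy_write_inplace` plays the role of a mex routine writing through a prhs pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model only: a "variable" is a header pointing at a data buffer that
   may be shared with other headers. */
typedef struct {
    double *data;
    size_t  n;
    int    *refcount;   /* number of headers sharing data */
} ToyVar;

/* Create a zero-filled variable that owns its data (no error handling). */
ToyVar toy_create(size_t n)
{
    ToyVar v;
    v.data = calloc(n, sizeof *v.data);
    v.refcount = malloc(sizeof *v.refcount);
    v.n = n;
    if (v.refcount) *v.refcount = 1;
    return v;
}

/* "B = A": a second header sharing the same data area. */
ToyVar toy_share(ToyVar *v)
{
    ++*v->refcount;
    return *v;
}

/* Well-behaved write: make a private copy first if the data is shared. */
int toy_write(ToyVar *v, size_t i, double x)
{
    if (*v->refcount > 1) {
        double *copy = malloc(v->n * sizeof *copy);
        int *rc = malloc(sizeof *rc);
        if (!copy || !rc) { free(copy); free(rc); return -1; }
        memcpy(copy, v->data, v->n * sizeof *copy);
        --*v->refcount;
        *rc = 1;
        v->data = copy;
        v->refcount = rc;
    }
    v->data[i] = x;
    return 0;
}

/* Rule-breaking write: changes every variable that shares the data. */
void toy_write_inplace(ToyVar *v, size_t i, double x)
{
    v->data[i] = x;
}
```

With `a = toy_create(3)` and `b = toy_share(&a)`, a call to `toy_write_inplace(&a, 0, 5.0)` also changes `b.data[0]`, whereas `toy_write(&a, 1, 7.0)` leaves `b` untouched.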
What to do? You can sometimes get away with modifying variables inplace in a mex routine, but only if you really, really know what you are doing and take extra precautions to make sure the variable isn't shared with any other variable prior to calling your mex routine. Since MATLAB gives you no official tools to determine this, it can be a bit of a crap shoot to know if your code is going to work as you want or expect. See this link for a nasty example:
One method that seems to work for making sure a variable is unshared is the following:
A = something potentially shared with other variables
A(1) = A(1); % MATLAB sees the assignment so it will unshare A first.
mymexfunction(A); % modifying A inplace will *probably* work OK now.
Even so, I am not sure what to expect if you are using parfor loops and each thread is trying to write into the same workspace variable inplace.
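A more conservative alternative to inplace modification is to leave the input alone and return an updated copy. The sketch below shows that pattern in plain C for the "First Task" loop only. `first_task_copy` and `numel_A` are hypothetical names, and it assumes the indices have already been validated. In a MEX gateway the analogous step would presumably be to duplicate the input (for example with mxDuplicateArray) into plhs[0] and write into that copy instead of into prhs[0].

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the "First Task": leaves A_in untouched and
   returns a newly allocated, updated copy (caller frees), or NULL if the
   allocation fails. Assumes every index is within bounds. */
double *first_task_copy(const double *A_in, size_t numel_A,
                        const unsigned int *B, const double *C,
                        const unsigned int *D, size_t L)
{
    double *A_out = malloc(numel_A * sizeof *A_out);
    if (A_out == NULL) {
        return NULL;
    }
    memcpy(A_out, A_in, numel_A * sizeof *A_out);

    for (size_t n = 0; n < L; ++n) {
        for (size_t m = 0; m < 132; ++m) {
            /* Same indexing as the original loop, applied to the copy. */
            A_out[D[n] + 22 * (B[n] + m)] += C[m + 132 * n];
        }
    }
    return A_out;
}
```

The cost is one extra copy of A per call, but each parfor worker then writes only to memory it owns, and the caller picks up the result as a return value rather than relying on a side effect.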
2 Comments
James Tursa
8 Dec 2020
You can use the crude debugger (i.e., lots of print statements to make sure your indexing is not running off the end of the valid memory areas), or e.g. in Visual Studio you can compile your mex routine in debug mode and then attach the MATLAB process to your Visual Studio session and try to do the debugging there. But I don't have any experience doing this with parfor.