Advice on constructing large sparse Jacobian arrays
I am undertaking non-linear optimization of tensor b-spline coefficients used to describe thermodynamic surfaces. Constructing the array of partial derivatives of the model predictions with respect to the model parameters (the Jacobian) is the most time-consuming and memory-limiting step. The arrays are large (>100,000 by >10,000), with the saving grace that they are sparse (<2% non-zero). I have working code that strains a Mac Studio Ultra (24 cores and 128 GB unified memory). With care in choosing problem sizes, things work as intended. However, I crash MATLAB and macOS when I am insufficiently careful about exceeding memory limits.
As a practical matter, I watch MATLAB demand more memory as it sets up the parallel loop and then release it while the loop executes. It is during loop startup that MATLAB can run out of application memory; the result is a crash giving: "JavaScript Alert - https://127.0.0.1:31515 Message Service fatally disconnected". At this point I have to force-quit MATLAB or, in some cases, restart the computer. I find that I can run a bigger problem if I submit it as a batch job rather than running the script from the command line.
At this point I am looking for advice on how to make my code faster while requiring as little memory as possible. Attached are code fragments showing what I have so far, in two versions: an "ignorant" parfor loop that lets MATLAB discover the non-zero array elements, and an "intelligent" parfor loop that knows which data sites are sensitive to which model parameters. The second is slightly faster than the first, but the speedup is not as great as I expected.
There are also implicit questions about how MATLAB handles sparse arrays:
1. Do I save memory by casting array indices to integers for the sparse array manipulations?
2. How small must a floating-point number be before MATLAB treats it as zero in a sparse array?
3. I "invented" the algorithms below to handle the parsing of array indices needed to construct the Jacobian. Perhaps there are cleaner, quicker ways to accomplish this?
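On question 2, a quick experiment (my sketch, not part of the original code) suggests that sparse storage drops only values that are exactly zero; any non-zero double, however tiny, is stored:

```matlab
% sparse() stores every element that is not exactly zero:
v = [1e-300, 0, realmin, -0.0];   % tiny value, exact zero, smallest normal, negative zero
S = sparse(1:4, ones(1,4), v, 4, 1);
nnz(S)                            % only the exact zeros (0 and -0.0) are dropped
% Any thresholding of "small" finite differences must therefore be done explicitly:
tol = 1e-12;                      % hypothetical tolerance
v(abs(v) < tol) = 0;              % zero out negligible entries before calling sparse
```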
% Following are code fragments for non-linear optimization of tensor spline
% coefficients. Two methods are shown for the construction of the
% Jacobian - the matrix of partial derivatives of the model at all data
% sites with respect to the spline parameters (the coefficients)
% Actual problems are associated with roughly 500,000 data sites and 15,000
% model parameters. The code is running on a mac studio ultra with 24
% cores and 128 GB of ram. The number of workers has to be set to less than
% 24 in order to work within memory limits. The typical time for running
% an optimization ranges from 10 to 30 minutes. The majority of the time is
% consumed in the loop shown below. Thus, the bottleneck in the calculation is
% the time and memory requirements of this loop to determine the Jacobians
% The spline coefficients are in vector "coefs"
% The collocation arrays (the spline basis function values) are in structure "Cntrl"
% The function "model" calculates a number of properties that are returned in a structure as separate
% vectors: y1, y2, ....
%
% In the following we track one prediction of the model, 'velsq'. The actual code has to work with multiple predictions
% The value for the finite difference derivative increment, dG, must be set
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Example 1 => code that is oblivious to the pattern of sparseness of the
% collocation arrays:
% ndat is number of data sites
% nspl is number of spline coefficients (model parameters)
predct1=model(coefs,Cntrl); %model output using the base set of coefficients
nz=ceil(0.02*ndat*nspl); % experience shows that the Jacobian sparse array has less than 2% non-zero elements
% preallocate an array to hold the Jacobian for model predictions:
A=sparse([],[],[],ndat,nspl,nz);
% set up parallel pool constants:
C = parallel.pool.Constant(coefs);
D=parallel.pool.Constant(Cntrl);
E=parallel.pool.Constant(predct1);
% Step through and perturb model parameter to determine finite difference
% derivatives of data as a function of model parameters - the Jacobian
parfor (i=1:nspl,num_worker)
coefs_tmp=C.Value;
coefs_tmp(i)=coefs_tmp(i)+dG;
predct2=model(coefs_tmp,D.Value);
A(:,i)=(predct2.velsq-E.Value.velsq)/dG; % blind calculation of prediction differences
%for all data sites. Let MATLAB decide which differences produce zeros for the sparse array A
end
% on exiting this loop we have the Jacobian in sparse array "A". This
% loop assumes no knowledge of the sparseness of the prediction differences
% and lets MATLAB make the call on what to place in the sparse array A
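A variation on Example 1 that may reduce peak memory (my sketch, reusing the names above): instead of assigning columns into the sparse array inside the parfor loop, return only each column's non-zero rows and values, and call sparse once after the loop:

```matlab
% Sketch: collect (row, value) pairs per column, then assemble the sparse matrix once.
% Assumes model, coefs, Cntrl, predct1, dG, ndat, nspl, num_worker as in Example 1.
rows = cell(nspl,1);
vals = cell(nspl,1);
C = parallel.pool.Constant(coefs);
D = parallel.pool.Constant(Cntrl);
E = parallel.pool.Constant(predct1);
parfor (i = 1:nspl, num_worker)
    coefs_tmp = C.Value;
    coefs_tmp(i) = coefs_tmp(i) + dG;
    predct2 = model(coefs_tmp, D.Value);
    d = (predct2.velsq - E.Value.velsq)/dG;
    k = find(d);                  % keep only the non-zero finite differences
    rows{i} = k;
    vals{i} = d(k);
end
cols = repelem((1:nspl).', cellfun(@numel, rows)); % column index for each triplet
A = sparse(vertcat(rows{:}), cols, vertcat(vals{:}), ndat, nspl);
```

This avoids repeated insertion into an existing sparse matrix, which is relatively expensive, at the cost of holding the triplet lists in memory briefly.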
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Example 2 - Requires two steps - This is faster than the first example
% Below an effort is made to reduce the numerical (and hopefully memory) demands of the loop
% Step 1
% determine indices for data impacted by each parameter (do this once in an optimization run and place in Cntrl structure)
[jg,ig]=find(Cntrl.G); % get indices of all non-zero elements of the collocation array
Indexes=[jg(:) ig(:)]; % put indices in a matrix to facilitate making a list
% The cell structure Aindx{i} contains indices for data sites impacted by parameter "i"
Aindx=cell(nspl,1); % allocate a cell variable to hold all the data site indices associated with each parameter
C = parallel.pool.Constant(Indexes);
% loop through all parameters and gather indices of the data impacted by
% each parameter - make int32 to reduce memory required
parfor i=1:nspl
tmp=int32(C.Value(C.Value(:,2)==i,1));
Aindx{i}=tmp;
end
Cntrl.Aindx=Aindx; % put results in the Cntrl structure to make it available later.
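Step 1 can also be done without a loop; accumarray can gather the row indices for each column in one call (my sketch, under the same variable names; note that accumarray does not guarantee ordering within each cell, which is harmless here because the same Aindx is used for both the model evaluation and the assembly):

```matlab
% Vectorized alternative to the Step 1 parfor loop:
[jg, ig] = find(Cntrl.G);                             % rows = data sites, cols = parameters
Aindx = accumarray(ig, jg, [nspl 1], @(x){int32(x)}); % one cell of data-site indices per parameter
Cntrl.Aindx = Aindx;
```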
% Step 2
% Construct the Jacobian - New Jacobian calculated with every iteration of the optimization
predct1=model(coefs,Cntrl); %model output using the base set of coefficients evaluated at all data sites
sdat=cell(nspl,1); % preallocate a cell structure to hold non-zero Jacobian values
% Step through and perturb model parameter to determine finite difference
% derivatives of data as a function of model parameters - the Jacobian
C = parallel.pool.Constant(coefs);
D=parallel.pool.Constant(Cntrl);
E=parallel.pool.Constant(predct1);
parfor (i=1:nspl,num_worker)
coefs_tmp=C.Value;
parm_id=D.Value.Aindx{i};
coefs_tmp(i)=coefs_tmp(i)+dG; % perturb the local copy (referencing "coefs" here would broadcast the whole vector to workers)
predict2=model(coefs_tmp,D.Value,parm_id); % calculate model values only for data sites impacted by the ith parameter
tmp=(predict2.velsq-E.Value.velsq(parm_id))/dG; % calculate finite differences only for impacted data sites
sdat{i}=tmp; % the length of each sdat{i} cell is variable and about 2% or less of total number of data
end
% at this point we have all non-zero elements of the Jacobian in a cell
% structure "sdat" with nspl cells, each cell contains the Jacobian values for
data impacted by that parameter at data sites given in Aindx. These need to be unpacked and loaded
% into a sparse array
% Note that a cell structure rather than an array is used since the number
% of elements in each cell is variable
jtot=[];
itot=[];
stot=[];
parfor i=1:nspl
jtot=[jtot;double(Cntrl.Aindx{i})]; % cast back to double: sparse() requires double index vectors
itot=[itot;i*ones(length(Cntrl.Aindx{i}),1)];
stot=[stot;sdat{i}]; % the cell variable holding the Jacobian values is "sdat"
end
A=sparse(jtot,itot,stot,ndat,nspl);
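The concatenation loop above can also be avoided entirely (my sketch, using the same variables): since Aindx and sdat are both ordered by parameter, the triplet lists can be assembled in one shot:

```matlab
% Assemble the triplet lists directly instead of growing arrays in a loop:
jtot = double(vertcat(Cntrl.Aindx{:}));                   % row (data-site) indices; sparse needs double
itot = repelem((1:nspl).', cellfun(@numel, Cntrl.Aindx)); % column index repeated once per entry
stot = vertcat(sdat{:});                                  % matching Jacobian values
A = sparse(jtot, itot, stot, ndat, nspl);
```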
% End of Example
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The function that calculates model predictions:
function out=model(coefs,Cntrl,id)
% Function to return model predictions given b spline coefficients with data site
% basis functions as described in the structure "Cntrl". Usage:
% out=model(coefs,Cntrl,id)
% where coefs are the spline parameters, Cntrl contains all the collocation
% information, id are indices of data sites influenced by a selected
% coefficient
if nargin==2
id=1:size(Cntrl.G,1);
end
%calculate all the necessary spline determined predictions
G=Cntrl.G(id,:)*coefs;
Gp=Cntrl.Gp(id,:)*coefs;
Gpp = Cntrl.Gpp(id,:)*coefs;
G3p = Cntrl.G3p(id,:)*coefs;
G4p=Cntrl.G4p(id,:)*coefs;
G4t=Cntrl.G4t(id,:)*coefs;
Gtt =Cntrl.Gtt(id,:)*coefs;
Gpt=Cntrl.Gpt(id,:)*coefs;
Vex=Cntrl.Vex(id,:)*coefs;
% predictions are placed in output structure
out.G=G; % Gibbs energy
out.V=Gp; % specific volume is Gp
out.Gtt=Gtt; % specific heat is -Gtt*T
out.velsq=Gp.^2./(Gpt.^2./Gtt-Gpp); % the combination of derivatives that gives the square of sound speeds
out.d2KdP2=G4p.*Gp.*Gpp.^(-2) + G3p./Gpp -2*G3p.^2.*Gp.*Gpp.^(-3); % the combination of derivatives giving the second derivative of the bulk modulus
out.G4t=G4t;% parameter to regularize - make "Gtt" smooth
Matt J
2 Oct 2023
Perhaps there are cleaner quicker ways to accomplish this?
That would be easier to assess if we could see the optimization problem presented in mathematical form, rather than code.
Sam Chak
2 Oct 2023
I am not a thermodynamicist, so I was curious about how thermodynamic surfaces appear. The online results primarily displayed Maxwell's sculpture and this PVT diagram. Is modeling the Pressure, Volume, and Temperature what creates the sparse Jacobian array?

Michael Brown
2 Oct 2023
Accepted Answer
More Answers (1)
Matt J
2 Oct 2023
I am undertaking non-linear optimization of tensor b spline coefficients used to describe thermodynamic surfaces.
I suggest looking at Example2D.m in,
which demonstrates efficient ways of doing 2D tensorial spline coefficient estimation.
Michael Brown
2 Oct 2023
The mathematical statement of the optimization of parameters starts with the linear problem: J*dp=devs. The idea is that solutions of a linear problem can iteratively lead to improvements of model predictions that are not linear with respect to model parameters.
There shouldn't be any need for iterative computation if J is a fixed matrix of tensorial B-spline basis functions. As illustrated at the link I gave you, because of the tensorial structure of the B-splines, J has the form of a Kronecker product J=kron(J1,J2,J3,...,J6) which can be analytically pseudo-inverted very efficiently without constructing the complete sparse Jacobian.
Michael Brown
2 Oct 2023
I still don't have a firm picture of the mathematics of the model. I'm assuming that you have a continuous function which, if it were simplified to 2D, would be parameterized something like the following, where x_ij are unknown spline coefficients and s(t) is a 1D spline basis function:

F(x,y) = sum_ij x_ij * s(x-i) * s(y-j)

If we sample F(x,y) at discrete, integer locations x=m, y=n this gives

F(m,n) = sum_ij x_ij * s(m-i) * s(n-j)

which in matrix-vector form is

F = S*X*S.'

where S(m,i)=s(m-i) is a sparse matrix. The solution for X in MATLAB code would be,
X=S\F/S.'
The thing to notice is that even though F has O(N^2) elements, the sparse matrix S has only O(N) non-zero elements. Similarly, if you did this in 6D, F would be O(N^6) but the inversion would still involve only O(N) matrices.
In the case where you are taking partial derivatives of the splines, the above would modify to things like

Fy(m,n) = sum_ij x_ij * s(m-i) * s'(n-j)

which in matrix-vector form would now involve two different matrices,

Fy = S*X*Sy.'

but both S and Sy are still going to be O(N) sparse, so the same efficiencies apply.
Michael Brown
4 Oct 2023
I premultiply the sparse basis function matrixes for each dimension which generates a sparse basis function matrix B that has dimensions of the number data by the product of the number of spline coefficients in each dimension.
You should not do this. Since the basis matrix for each dimension is O(N), combining them together through multiplication will give a total matrix that has O(N^d) elements, where d is the number of dimensions. Conversely, the computational approach I showed you keeps the basis matrices for each dimension separate, for a total memory footprint that is O(d*N).
I do not see a mathematical path that allows the basis function matrixes for the partials to be non-linearly combined to allow a linear (in spline coefficients) solution for the thermodynamic properties.
The equation I showed before,

Fy = S*X*Sy.'

is equivalent to the Kronecker product expression
Fy(:)=kron(Sy,S)*X(:)
Certain nonlinear operations, in particular the ones you seem to be using, are distributive across Kronecker products as long as the matrix dimensions align e.g.,
kron(A,B).^2 = kron(A.^2,B.^2)
kron(A,B)./ kron(C,D) = kron(A./C, B./D)
kron(A,B) / kron(C,D) = kron(A/C, B/D)
Therefore, the nonlinear operations will give a new separable basis matrix and my technique of working with separate matrices in each dimension should again be applicable. As a side note, though, I do not quite understand why you would want to element-wise divide sparse matrices, like you are doing in this line,
out.velsq=Gp.^2./(Gpt.^2./Gtt-Gpp);
since most of the resulting elements will be 0/0=NaN.
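These distributive identities are easy to sanity-check numerically (a small sketch of mine, not part of the thread):

```matlab
% Verify that element-wise operations distribute across Kronecker products:
A = rand(3); B = rand(4); C = rand(3); D = rand(4);
err1 = norm(kron(A,B).^2  - kron(A.^2, B.^2),  'fro');
err2 = norm(kron(A,B)./kron(C,D) - kron(A./C, B./D), 'fro');
% both err1 and err2 come out at round-off level
```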
although I usually solve the least square version X=(B'B)\(B'F)
I don't see why you would do that, since this is equivalent to X=B\F and B'*B has a poorer condition number than B alone. However, since B-spline matrices are well-conditioned, maybe the impact is minimal here...
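The conditioning point can be illustrated quickly (my sketch): forming the normal equations squares the condition number, since the singular values of A'*A are the squares of those of A.

```matlab
% The normal equations square the condition number of the problem:
A  = rand(100, 10);
c1 = cond(A);
c2 = cond(A'*A);   % approximately c1^2, so precision is lost for ill-conditioned A
```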
Michael Brown
4 Oct 2023
Note that all the partial derivatives (predictions Fy) are full vectors - no zeros after the multiplication of basis functions with coefficients. No divide by zero.
If your Gxx are all full vectors, not basis matrices, then I don't understand what you were talking about when you referred to "allowing the basis function matrices for the partials to be non-linearly combined". What makes you think you would have to do anything like that? It might be clearer if you start communicating what you are trying to do with the partials in terms of LaTeX-formatted equations, rather than words or code.
Michael Brown
4 Oct 2023
I don't see any partial derivatives in the equations you've posted in your last comment, so I don't think they describe any new issues that rule out what I outlined before.
Hopefully you will see that equation 5 is what I wrote down here in slightly different notation. However, I want to reiterate that equation 6 isn't the way you would calculate the coefficients a_jk in practice. Instead, you rewrite equation 5 in the matrix form,
z=B*a*C.'
and the inversion is implemented as,
a=B\z/C.'
The matrix D in your equations is just D=kron(C,B), but you would never want to explicitly construct D, because it is much more intensive, both in memory and computation, to work with D, and this gets worse and worse in higher dimensions.
Michael Brown
5 Oct 2023
What is the 3D version of this?
In 3D, you would have, using explicit Kronecker products,
a=kron(C,kron(B,A))\z(:);
To implement this using the approach I allude to above, you can download KronProd from the File Exchange, and do instead,
a=KronProd({A,B,C})\z;
Here is a small demo with 20x20 matrices, where you can already see considerable speedup:
n=20;
[A,B,C]=deal(rand(n),rand(n), rand(n));
z=rand(n^3,1);
%Use explicit Kronecker Products
tic;
D=kron(C,kron(B,A));
a1=D\z;
toc
Elapsed time is 2.927993 seconds.
%Use KronProd
tic;
Dobj=KronProd({A,B,C});
a2=Dobj\z;
toc
Elapsed time is 0.006296 seconds.
percentError=norm(a1-a2)/norm(a1)*100
percentError =
1.9261e-10
Let's also look at the difference in memory consumption:
>> whos D Dobj
Name Size Bytes Class Attributes
D 8000x8000 512000000 double
Dobj 8000x8000 10062 KronProd
Michael Brown
5 Oct 2023
It appears that my makeCmn(A,B,C) is equivalent to your KronProd(A,B,C)
But it should be much faster and more memory-efficient, as demonstrated in my example.
We are back to trying to use the available associative properties of the tensors to formulate a non-linear combination of partial derivative basis functions as a linear problem.
I don't see how we are "back" there. The partial derivatives are small arrays and we've established that they can be efficiently computed using the same kind of Kronecker product manipulations that we've discussed at length.
Once you've computed the partial derivatives, you can manipulate them however you wish without any obvious computational challenges. However, if there's something I'm missing, you can elaborate on it in a new post, since your question in this one has been answered.
Sam Marshalik
5 Oct 2023
@Michael Brown: I am not sure if you have attempted to use Threads instead of Processes when running parpool (https://www.mathworks.com/help/parallel-computing/choose-between-thread-based-and-process-based-environments.html). If not, give it a shot, as threaded workers will be able to share memory space and will not need to duplicate variables (unless changes are made to them on the worker).
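For reference, switching the pool type is a one-line change (my sketch; whether model and the sparse assembly are supported on thread-based workers must be checked against the actual parfor code):

```matlab
% Thread-based workers share memory, so parallel.pool.Constant copies are avoided:
delete(gcp('nocreate'));   % close any existing pool
parpool("Threads");        % thread-based pool instead of the default "Processes"
% ... run the parfor Jacobian loop as before ...
```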