Advice on constructing large sparse Jacobian arrays

I am undertaking non-linear optimization of tensor b-spline coefficients used to describe thermodynamic surfaces. Constructing the array of partial derivatives of the model predictions with respect to the model parameters (the Jacobian) is the most time-consuming and memory-limiting step. The arrays are large (>100,000 by >10,000), the saving grace being that they are sparse (<2% non-zero). I have working code that strains a Mac Studio Ultra (24 cores and 128 GB unified memory). With care in problem-size choices, things work as requested. However, I crash MATLAB and macOS when I am insufficiently careful about exceeding memory limits.
As a practical matter, I watch MATLAB demand greater memory as it sets up the parallel loop and then release it while the loop executes. It is during loop startup that MATLAB can run out of application memory; the result is a crash giving: "JavaScript Alert - https://127.0.0.1:31515 Message Service fatally disconnected". At this point I have to force-quit MATLAB or, in some cases, restart the computer. I find that I can run a bigger problem if I submit it as a batch job rather than running the script from the command line.
At this point I am looking for advice on how to improve my code for greater speed with as little memory as possible. Attached are code fragments showing what I have so far, in two versions - an ignorant parfor loop that lets MATLAB discover the non-zero array elements, and an intelligent parfor loop that is aware of which data sites are sensitive to which model parameters. The second is slightly faster than the first, but the speed improvement is not as great as I expected.
There are also implicit questions in how MATLAB handles sparse arrays -
1. Do I save memory by type casting array indices as integers for the sparse array manipulations?
2. How small a floating point number is needed for MATLAB to label it zero for a sparse array?
3. I "invented" algorithms below to handle the parsing of array indices as needed for construction of the Jacobian. Perhaps there are cleaner quicker ways to accomplish this?
% Following are code fragments for non-linear optimization of tensor spline
% coefficients. Two methods are shown for the construction of the
% Jacobian - the matrix of partial derivatives of the model at all data
% sites with respect to the spline parameters (the coefficients)
% Actual problems involve roughly 500,000 data sites and 15,000
% model parameters. The code runs on a Mac Studio Ultra with 24
% cores and 128 GB of RAM. The number of workers has to be set to less than
% 24 in order to stay within memory limits. The typical time for running
% an optimization ranges from 10 to 30 minutes. The majority of the time is
% consumed in the loop shown below. Thus, the bottleneck in the calculation is
% the time and memory required by this loop to determine the Jacobians.
% The spline coefficients are in vector "coefs"
% The collocation arrays (the spline basis function values) are in structure "Cntrl"
% The function "model" calculates a number of properties that are returned in a structure as separate
% vectors: y1, y2, ....
%
% In the following we track one prediction of the model, 'velsq'. The actual code has to work with multiple predictions
% The value for the finite difference derivative increment, dG, must be set
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Example 1 => code that is oblivious to the pattern of sparseness of the
% collocation arrays:
% ndat is number of data sites
% nspl is number of spline coefficients (model parameters)
predct1=model(coefs,Cntrl); %model output using the base set of coefficients
nz=0.02*ndat*nspl; % experience shows that the Jacobian sparse array has less than 2% non-zero elements
% preallocate an array to hold the Jacobian for model predictions:
A=sparse([],[],[],ndat,nspl,nz);
% set up parallel pool constants:
C = parallel.pool.Constant(coefs);
D=parallel.pool.Constant(Cntrl);
E=parallel.pool.Constant(predct1);
% Step through and perturb model parameter to determine finite difference
% derivatives of data as a function of model parameters - the Jacobian
parfor (i=1:nspl,num_worker)
    coefs_tmp = C.Value;
    coefs_tmp(i) = coefs_tmp(i) + dG;
    predct2 = model(coefs_tmp,D.Value);
    % blind calculation of prediction differences for all data sites;
    % let MATLAB decide which differences produce zeros for the sparse array A
    A(:,i) = (predct2.velsq - E.Value.velsq)/dG;
end
% on exiting this loop we have the Jacobian in sparse array "A". This
% loop assumes no knowledge of the sparseness of the prediction differences
% and lets MATLAB make the call on what to place in the sparse array A
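% A lower-overhead variant of Example 1 (a sketch only; model, dG, ndat, nspl,
% num_worker and the pool Constants C, D, E are assumed as defined above).
% Instead of indexed assignment into a preallocated sparse matrix, which
% reshuffles the compressed-column storage on every column write, accumulate
% (row, value) pairs per column and call sparse() once at the end:

```matlab
rows = cell(nspl,1); vals = cell(nspl,1); % per-column triplet storage
parfor (i = 1:nspl, num_worker)
    coefs_tmp = C.Value;
    coefs_tmp(i) = coefs_tmp(i) + dG;
    predct2 = model(coefs_tmp, D.Value);
    d = (predct2.velsq - E.Value.velsq)/dG;
    k = find(d);          % keep only the non-zero differences
    rows{i} = k;
    vals{i} = d(k);
end
% expand column indices to match the stacked rows, then build A in one call
cols = repelem((1:nspl).', cellfun(@numel, rows));
A = sparse(vertcat(rows{:}), cols, vertcat(vals{:}), ndat, nspl);
```

% This defers all sparse-storage construction to a single sparse() call,
% which is typically both faster and lighter on peak memory than growing A column by column.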
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Example 2 - requires two steps. This is faster than the first example.
% Below, an effort is made to reduce the numerical (and hopefully memory) demands of the loop
% Step 1
% determine indices for data impacted by each parameter (do this once in an optimization run and place in Cntrl structure)
[jg,ig]=find(Cntrl.G); % get indices of all non-zero elements of the collocation array
Indexes=[jg(:) ig(:)]; % put indices in a matrix to facilitate making a list
% The cell structure Aindx{i} contains indices for data sites impacted by parameter "i"
Aindx=cell(nspl,1); % allocate a cell variable to hold all the data site indices associated with each parameter
C = parallel.pool.Constant(Indexes);
% loop through all parameters and gather indices of the data impacted by
% each parameter - make int32 to reduce memory required
parfor i=1:nspl
    tmp = int32(C.Value(C.Value(:,2)==i,1));
    Aindx{i} = tmp;
end
Cntrl.Aindx=Aindx; % put results in the Cntrl structure to make it available later.
% Step 2
% Construct the Jacobian - New Jacobian calculated with every iteration of the optimization
predct1=model(coefs,Cntrl); %model output using the base set of coefficients evaluated at all data sites
sdat=cell(nspl,1); % preallocate a cell structure to hold non-zero Jacobian values
% Step through and perturb model parameter to determine finite difference
% derivatives of data as a function of model parameters - the Jacobian
C = parallel.pool.Constant(coefs);
D=parallel.pool.Constant(Cntrl);
E=parallel.pool.Constant(predct1);
parfor (i=1:nspl,num_worker)
    coefs_tmp = C.Value;
    parm_id = D.Value.Aindx{i};
    coefs_tmp(i) = coefs_tmp(i) + dG;
    predct2 = model(coefs_tmp,D.Value,parm_id); % model values only at data sites impacted by the ith parameter
    tmp = (predct2.velsq - E.Value.velsq(parm_id))/dG; % finite differences only at impacted data sites
    sdat{i} = tmp; % the length of each sdat{i} is variable, about 2% or less of the total number of data
end
% at this point we have all non-zero elements of the Jacobian in a cell
% array "sdat" with nspl cells; each cell contains the Jacobian values for
% data impacted by that parameter at the data sites given in Aindx. These need to be unpacked and loaded
% into a sparse array.
% Note that a cell array rather than a matrix is used since the number
% of elements in each cell is variable
jtot=[];
itot=[];
stot=[];
parfor i=1:nspl
    jtot=[jtot; Cntrl.Aindx{i}];
    itot=[itot; i*ones(length(Cntrl.Aindx{i}),1)];
    stot=[stot; sdat{i}];
end
A=sparse(jtot,int32(itot),stot,ndat,nspl);
% End of Example
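% The final index-unpacking loop above can also be written without a parfor
% reduction (a sketch, assuming Aindx and sdat as built above); this answers
% the "cleaner, quicker ways" question for the assembly step:

```matlab
jtot = double(vertcat(Cntrl.Aindx{:}));                   % data-site (row) indices, stacked
itot = repelem((1:nspl).', cellfun(@numel, Cntrl.Aindx)); % matching parameter (column) indices
stot = vertcat(sdat{:});                                  % non-zero Jacobian values
A = sparse(jtot, itot, stot, ndat, nspl);
```

% vertcat and repelem are vectorized, so this usually beats a parfor loop
% whose body does nothing but concatenate.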
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The function that calculates model predictions:
function out=model(coefs,Cntrl,id)
% Function to return model predictions given b spline coefficients with data site
% basis functions as described in the structure "Cntrl". Usage:
% out=model(coefs,Cntrl,id)
% where coefs are the spline parameters, Cntrl contains all the collocation
% information, id are indices of data sites influenced by a selected
% coefficient
if nargin==2
    id = 1:size(Cntrl.G,1);
end
%calculate all the necessary spline determined predictions
G=Cntrl.G(id,:)*coefs;
Gp=Cntrl.Gp(id,:)*coefs;
Gpp = Cntrl.Gpp(id,:)*coefs;
G3p = Cntrl.G3p(id,:)*coefs;
G4p=Cntrl.G4p(id,:)*coefs;
G4t=Cntrl.G4t(id,:)*coefs;
Gtt =Cntrl.Gtt(id,:)*coefs;
Gpt=Cntrl.Gpt(id,:)*coefs;
Vex=Cntrl.Vex(id,:)*coefs;
% predictions are placed in output structure
out.G=G; % Gibbs energy
out.V=Gp; % specific volume is Gp
out.Gtt=Gtt; % specific heat is -Gtt*T
out.velsq=Gp.^2./(Gpt.^2./Gtt-Gpp); % the combination of derivatives that gives the square of sound speeds
out.d2KdP2=G4p.*Gp.*Gpp.^(-2) + G3p./Gpp -2*G3p.^2.*Gp.*Gpp.^(-3); % the combination of derivatives giving the second derivative of the bulk modulus
out.G4t=G4t; % parameter to regularize - make "Gtt" smooth
end

4 comments

Perhaps there are cleaner quicker ways to accomplish this?
That would be easier to assess if we could see the optimization problem presented in mathematical form, rather than code.
I am not a thermodynamicist, so I was curious about how thermodynamic surfaces appear. The online results primarily displayed Maxwell's sculpture and this PVT diagram. Is modeling the Pressure, Volume, and Temperature what creates the sparse Jacobian array?
The mathematical statement of the optimization of parameters starts with the linear problem:
J*dp=devs
where J (the Jacobian) is the array of partial derivatives of the model with respect to parameters evaluated at data sites, dp is a vector of changes to the parameters and vector devs is the differences between model predictions and data. One inverts this to obtain revised (better) model parameters:
dp = inv(J)*devs. The most straightforward MATLAB methods are dp = J\devs or, in least-squares form, dp = (J'*J)\(J'*devs).
The idea is that solutions of a linear problem can iteratively lead to improvements of model predictions that are not linear with respect to model parameters. This is standard and underlies a whole body of "inverse theory" algorithms that is not the focus here. Many algorithms start with the need to construct the Jacobian which is, in my application, the most numerically intensive step.
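The iteration described above, in schematic MATLAB (a sketch only; `predictions`, `jacobian`, `data`, `maxit`, and `tol` are placeholders for the machinery shown in the question):

```matlab
for iter = 1:maxit
    devs = data - predictions(coefs, Cntrl);  % residuals at current parameters
    J = jacobian(coefs, Cntrl);               % the expensive step (parfor loops above)
    dp = (J'*J) \ (J'*devs);                  % least-squares update (or dp = J\devs)
    coefs = coefs + dp;                       % revised (better) parameters
    if norm(dp) < tol*norm(coefs), break; end % simple convergence test
end
```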
Although the application is not important here, I am representing Gibbs energy with tensor b splines. Appropriate derivatives of G give the surfaces for individual phases as shown in the image above. With good representations, we predict those surfaces and their intersections. I published pioneering work here and further examples here
Also - all my publications come with matlab live scripts that demo the released versions of my codes which are also in GitHub repository linked to SeaFreeze.org -


Accepted Answer

Walter Roberson on 1 Oct 2023
Edited: Walter Roberson on 1 Oct 2023
1. Do I save memory by type casting array indices as integers for the sparse array manipulations?
No.
A1 = sparse(zeros(20,20));
A2 = sparse(zeros(20,20)); A2(20,20) = 1;
A3 = sparse(zeros(20,20)); A3(uint32(20),uint32(20)) = 2;
A4 = sparse(uint32(20), uint32(20), 3);
whos A1 A2 A3 A4
  Name      Size      Bytes  Class     Attributes

  A1        20x20       184  double    sparse
  A2        20x20       184  double    sparse
  A3        20x20       184  double    sparse
  A4        20x20       184  double    sparse
Observe that they are all exactly the same size (though one might suspect that the completely empty one might be smaller because it has no non-zero entries at all.)
2. How small a floating point number is needed for MATLAB to label it zero for a sparse array?
format long g
idx = -350:-300;
A = (10.^idx).';
S = sparse(A);
[firstnz, ~, val_there] = find(S, 1);
idx(firstnz)
ans =
-323
val_there
val_there =
9.88131291682493e-324
B = sparse([eps(0), 0, -0])
B =
(1,1) 4.94065645841247e-324
So the boundary is at the smallest representable (subnormal) number. sparse() is testing for exact zeros, treating negative zero the same as 0.

More Answers (1)

Matt J on 2 Oct 2023

I am undertaking non-linear optimization of tensor b spline coefficients used to describe thermodynamic surfaces.
I suggest looking at Example2D.m in,
which demonstrates efficient ways of doing 2D tensorial spline coefficient estimation.

16 comments

Hi, I'm working in 3D with 6th order b splines
Matt J on 2 Oct 2023
Edited: Matt J on 2 Oct 2023
Shouldn't matter. The example should be easy enough to generalize to higher dimensions and spline orders.
Matt J on 2 Oct 2023
Edited: Matt J on 2 Oct 2023
The mathematical statement of the optimization of parameters starts with the linear problem: J*dp=devs. The idea is that solutions of a linear problem can iteratively lead to improvements of model predictions that are not linear with respect to model parameters.
There shouldn't be any need for iterative computation if J is a fixed matrix of tensorial B-spline basis functions. As illustrated at the link I gave you, because of the tensorial structure of the B-splines, J has the form of a Kronecker product J=kron(J1,J2,J3,...,J6) which can be analytically pseudo-inverted very efficiently without constructing the complete sparse Jacobian.
Thank you for your reply; I am really trying to follow the ideas expressed. As I understand it, you are suggesting that I can form a non-linear combination of basis functions, making use of their tensorial properties, that then allows a linear evaluation of a property. One of my examples is Gp^2/(Gpt^2/Gtt-Gpp), where each Gxx is a matrix of basis functions and the subscripts denote which partial derivative of the underlying representation is specified. I am not yet understanding the steps in the linked example to see how it applies to my application. Any suggested further reading?
Matt J on 2 Oct 2023
Edited: Matt J on 2 Oct 2023
I still don't have a firm picture of the mathematics of the model. I'm assuming that you have a continuous function which, if it were simplified to 2D, would be parameterized something like F(x,y) = sum_ij x_ij*s(x-i)*s(y-j), where x_ij are unknown spline coefficients and s(t) is a 1D spline basis function.
If we sample F(x,y) at discrete, integer locations x=m, y=n this gives F(m,n) = sum_ij x_ij*s(m-i)*s(n-j),
which in matrix-vector form is F = S*X*S.',
where S(m,i)=s(m-i) is a sparse matrix. The solution for X in MATLAB code would be,
X=S\F/S.'
The thing to notice is that even though F has O(N^2) elements, the sparse matrix S has only O(N) non-zero elements. Similarly, if you did this in 6D, F would be O(N^6) but the inversion would still involve only O(N) matrices.
In the case where you are taking partial derivatives of the splines, the above would modify to things like F_y(m,n) = sum_ij x_ij*s(m-i)*s'(n-j),
which in matrix-vector form now involves 2 different matrices, F_y = S*X*S_y.',
but both S and S_y are still going to be O(N) sparse, so the same efficiencies apply.
You are stating the linear problem as I implement it - although I usually solve the least-squares version X=(B'B)\(B'F), where B is the matrix of basis functions, F is the vector of data (flattened from the 3D array of x,y,z data points), and X are the unknown spline coefficients (flattened from the 3D array). And yes, the O(N) scaling of basis functions is the saving feature of the tensor problem. I premultiply the sparse basis function matrices for each dimension, which generates a sparse basis function matrix B that has dimensions of the number of data by the product of the number of spline coefficients in each dimension. Partial derivatives of the representation each have their own basis function matrix. I predict all the needed partial derivatives as Fy=By*X, where Fy is a specified partial derivative of the representation at all data locations, By is the sparse matrix of basis functions associated with that partial derivative, and X is flattened from a 3D array to a vector (as are the data). The partial derivative predictions, in non-linear combinations, lead to predictions of thermodynamic properties. I do not see a mathematical path that allows the basis function matrices for the partials to be non-linearly combined to allow a linear (in spline coefficients) solution for the thermodynamic properties.
Matt J on 4 Oct 2023
Edited: Matt J on 4 Oct 2023
I premultiply the sparse basis function matrices for each dimension which generates a sparse basis function matrix B that has dimensions of the number of data by the product of the number of spline coefficients in each dimension.
You should not do this. Since the basis matrix for each dimension is O(N), combining them together through multiplication will give a total matrix that has O(N^d) elements, where d is the number of dimensions. Conversely, the computational approach I showed you keeps the basis matrices for each dimension separate, for a total memory footprint that is O(d*N).
I do not see a mathematical path that allows the basis function matrixes for the partials to be non-linearly combined to allow a linear (in spline coefficients) solution for the thermodynamic properties.
The equation I showed before, Fy = S*X*Sy.', is equivalent to the Kronecker product expression,
Fy(:)=kron(Sy,S)*X(:)
Certain nonlinear operations, in particular the ones you seem to be using, are distributive across Kronecker products as long as the matrix dimensions align e.g.,
kron(A,B).^2 = kron(A.^2,B.^2)
kron(A,B)./ kron(C,D) = kron(A./C, B./D)
kron(A,B) / kron(C,D) = kron(A/C, B/D)
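These identities can be checked numerically, e.g. with small random matrices (a quick sketch; the divisions require conformable, invertible factors):

```matlab
A = rand(3); B = rand(4); C = rand(3); D = rand(4);
norm(kron(A,B).^2         - kron(A.^2, B.^2), 'fro')   % ~0
norm(kron(A,B)./kron(C,D) - kron(A./C, B./D), 'fro')   % ~0
norm(kron(A,B)/kron(C,D)  - kron(A/C, B/D),  'fro')    % ~0 up to round-off
```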
Therefore, the nonlinear operations will give a new separable basis matrix and my technique of working with separate matrices in each dimension should again be applicable. As a side note, though, I do not quite understand why you would want to element-wise divide sparse matrices, like you are doing in this line,
out.velsq=Gp.^2./(Gpt.^2./Gtt-Gpp);
since most of the resulting elements will be 0/0=NaN.
although I usually solve the least square version X=(B'B)\(B'F)
I don't see why you would do that, since this is equivalent to X=B\F and B'*B has a poorer condition number than B alone. However, since B-spline matrices are well-conditioned, maybe the impact is minimal here...
I will work through your ideas above.
Note that all the partial derivatives (predictions Fy) are full vectors - no zeros after the multiplication of basis functions with coefficients. No divide by zero.
With my collection of basis functions, I find B\F to be significantly slower while giving essentially the same result.
Matt J on 4 Oct 2023
Edited: Matt J on 4 Oct 2023
Note that all the partial derivatives (predictions Fy) are full vectors - no zeros after the multiplication of basis functions with coefficients. No divide by zero.
If your Gxx are all full vectors, not basis matrices, then I don't understand what you were talking about when you referred to "allowing the basis function matrices for the partials to be non-linearly combined". What makes you think you would have to do anything like that? It might be clearer if you start communicating what you are trying to do with the partials in terms of LaTeX-formatted equations, rather than words or code.
To be clear, here is my articulation of the problem [equations posted as images],
with the solution in MATLAB language [posted as an image].
The extension to three independent variables is an extension of this process.
Matt J on 4 Oct 2023
Edited: Matt J on 4 Oct 2023
I don't see any partial derivatives in the equations you've posted in your last comment, so I don't think they describe any new issues that rule out what I outlined before.
Hopefully you will see that equation 5 is what I wrote down here in slightly different notation. However, I want to reiterate that equation 6 isn't the way you would calculate the coefficients a_jk in practice. Instead, you rewrite equation 5 in the matrix form,
z=B*a*C.'
and the inversion is implemented as,
a=B\z/C.'
The matrix D in your equations is just D=kron(C,B), but you would never want to explicitly construct D because it is much more intensive, in both memory and computation, to work with D, and this gets worse and worse in higher dimensions.
The partials are inherent in the B and C matrices of basis functions. If B is for the x coordinate and C for the y coordinate, and a subscript means the derivative set of basis functions, then I calculate Dy=kron(Cy,B) and Gy=Dy*X (X being the coefficients and Gy(x,y) being the partial of G with y).
I do calculate D in both 2 and 3 dimensions. It is large but not impossible. I am not an applied mathematician. I looked at the form of the tensor multiplications, didn't know about "kron", and created my own function for 2D and 3D (below). I've just compared my calculations with those using kron. Exactly the same.
The calculations of the inverse problem using D take a few seconds - not really a show stopper.
You have the 2D problem:
z=B*a*C.'
What is the 3D version of this?
function D=makeCmn(A,B,C)
% This function combines collocation matrices A(i,j), B(k,l), C(p,q) for
% three dimensions (associated with a 4D hypersurface tensor spline)
% into a single matrix of size i*k*p by j*l*q. The idea is that the
% tensor spline calculation is given as:
%    sum sum sum (ABC*a)
% where A, B, and C are collocation matrices and a is an array of coefficients.
% This can be remapped into a single matrix multiply D*a(:)
% where D is the matrix returned here.
% If working in 3D, pass only matrices A and B to this function.
% If A and B are row vectors, makeCmn returns a
% single row corresponding to a single "data" point.
% JMB 2011
% Below I take advantage of sparseness -
[ai,aj]=size(A);
[bk,bl]=size(B);
% Determine whether there is another dimension to the problem
switch nargin
    case 3
        [cp,cq]=size(C);
    case 2
        C=1;
        cp=1; cq=1;
end
% Determine (1) which elements are non-zero (A, B, C are sparse) and (2) the sizes of the matrices
nR = ai*bk*cp;
nC = aj*bl*cq;
[ii,jj,sa]=find(A);
[kk,ll,sb]=find(B);
[pp,qq,sc]=find(C);
ni=length(ii);
nk=length(kk);
np=length(pp);
nmm=ni*nk*np;
% preallocate vectors
mm=zeros(nmm,1);
nn=zeros(nmm,1);
v=zeros(nmm,1);
% Do the work
for p=1:np
    for k=1:nk
        im = (kk(k)-1)*ai + bk*ai*(pp(p)-1);
        in = (ll(k)-1)*aj + bl*aj*(qq(p)-1);
        count = (k-1)*ni + ni*nk*(p-1);
        for i=1:ni
            mm(count+i) = ii(i) + im;
            nn(count+i) = jj(i) + in;
            v(count+i)  = sa(i)*sb(k)*sc(p);
        end
    end
end
% Assemble the resultant sparse matrix
D=sparse(mm,nn,v,nR,nC);
end
Matt J on 5 Oct 2023
Edited: Matt J on 5 Oct 2023
What is the 3D version of this?
In 3D, you would have, using explicit Kronecker products,
a=kron(C,kron(B,A))\z(:);
To implement this using the approach I allude to above, you can download KronProd from the File Exchange, and do instead,
a=KronProd({A,B,C})\z;
Here is a small demo with 20x20 matrices, where you can already see considerable speedup:
n=20;
[A,B,C]=deal(rand(n),rand(n), rand(n));
z=rand(n^3,1);
%Use explict Kronecker Products
tic;
D=kron(C,kron(B,A));
a1=D\z;
toc
Elapsed time is 2.927993 seconds.
%Use KronProd
tic;
Dobj=KronProd({A,B,C});
a2=Dobj\z;
toc
Elapsed time is 0.006296 seconds.
percentError=norm(a1-a2)/norm(a1)*100
percentError =
1.9261e-10
Let's also look at the difference in memory consumption:
>> whos D Dobj
Name Size Bytes Class Attributes
D 8000x8000 512000000 double
Dobj 8000x8000 10062 KronProd
Sorry - my LaTeX skills are nil
It appears that my makeCmn(A,B,C) is equivalent to your KronProd(A,B,C) and the first lines of your response are exactly the problem I routinely set up.
i.e., you write:
a=KronProd({A,B,C})\z;
and I have in my linear problem code for independent variables (p,t,m) giving G with coefficients a
a=makeCmn(A(p),B(t),C(m))\G;
or for a partial derivative of G with p: dG/dp=Gp=makeCmn(Ap,B,C)*a
where Ap are the first derivative basis functions associated with A
or G with p and t : d2G/dp/dt=Gpt=makeCmn(Ap,Bt,C)*a
We are back to trying to use the available associative properties of the tensors to formulate a non-linear combination of partial derivative basis functions as a linear problem.
Matt J on 5 Oct 2023
Edited: Matt J on 5 Oct 2023
It appears that my makeCmn(A,B,C) is equivalent to your KronProd(A,B,C)
But it should be much faster and more memory-efficient, as demonstrated in my example.
We are back to trying to use the available associative properties of the tensors to formulate a non-linear combination of partial derivative basis functions as a linear problem.
I don't see how we are "back" there. The partial derivatives are small arrays and we've established that they can be efficiently computed using the same kind of Kronecker product manipulations that we've discussed at length.
Once you've computed the partial derivatives, you can manipulate them however you wish without any obvious computational challenges. However, if there's something I'm missing, you can elaborate on it in a new post, since your question in this one has been answered.
@Michael Brown: I am not sure if you have attempted to use Threads instead of Processes when running parpool (https://www.mathworks.com/help/parallel-computing/choose-between-thread-based-and-process-based-environments.html). If not, give it a shot, as threaded workers will be able to share memory space and will not need to duplicate variables (unless changes are made to them on the worker).
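For reference, switching to a thread-based pool is a small change (a sketch only; thread workers cannot run every function, so check that `model` uses only thread-supported operations):

```matlab
delete(gcp('nocreate'));   % shut down any existing process-based pool
parpool('Threads');        % thread workers share memory with the client
% parallel.pool.Constant still works, but no copies of coefs/Cntrl/predct1
% need to be serialized to each worker, which removes the memory spike
% observed at parfor startup
```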


Version
R2023b
