can I pass these nonlinear constraints to lsqnonlin?


SA-W on 15 June 2023


https://fr.mathworks.com/matlabcentral/answers/1983454-can-i-pass-these-nonlinear-constraints-to-lsqnonlin

Commented: Matt J on 30 June 2023

Accepted answer: Matt J

Let

$f(x,y) = \sum_{i=1}^{n} E_i N_i(x,y)$

denote a function of two variables and

$E = (E_1, \dots, E_n)$

the parameters of the optimization problem

$\min_E \| g^{sim}(f) - g^{exp} \|^2$

which I want to solve with lsqnonlin. To calculate

$g^{sim}$, f must be convex at all iterations. Otherwise, it is purely a matter of luck that the identification works.

My idea is to enforce the convexity by enforcing the Hessian of f to be positive definite. The determinant of the Hessian is given by (please correct me if I am wrong)

$\det H = f_{xx} f_{yy} - f_{xy}^2$

Given that, I would implement the constraints c in a function

function [c,ceq] = mycon(x)
   ceq = [];
   c = ...     % the above formula
end

I evaluate the above equation at n points in the {x,y} space, i.e., c is a vector with n entries.

Can I implement those constraints as nonlinear constraints or do I make things too complicated?


Matt J on 15 June 2023

To calculate g^sim, f must be convex at all iterations.

Convex as a function of (x,y), I assume you mean. It is already a convex function of E.


Accepted answer

Matt J on 15 June 2023

https://fr.mathworks.com/matlabcentral/answers/1983454-can-i-pass-these-nonlinear-constraints-to-lsqnonlin#answer_1256449

Edited: Matt J on 15 June 2023

Can I implement those constraints as nonlinear constraints

You can, but you have a few problems:

(1) Nonlinear constraints cannot be enforced at all iterations. In fact, only simple bound constraints can be.

(2) det(A)>0 is not a sufficient condition for the positive definiteness of A. By Sylvester's criterion, you need to ensure

$\partial^2 f / \partial x^2 > 0$

as well, although that will be a simpler, linear constraint in E.

(3) The constraints need to be satisfied for all (x,y) in whatever domain f(x,y) is defined on. In theory, that gives you an uncountably infinite number of constraints. To be practical, you could relax the constraint, imposing it on some discrete grid of points (x_j, y_j), but in theory, you could not be sure that f(x,y) is convex everywhere between these points.

or do I make things too complicated?

Maybe. You would need to tell us more about the properties of the N_i(x,y) functions and what g^sim() is doing. If the N_i(x,y) functions are all convex individually, then it would be sufficient (although not necessary) to impose the much simpler constraints E_i>=0.
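Points (2) and (3) can be sketched as a constraint function evaluated on a grid. This is only an illustration under made-up assumptions: the toy basis N_1 = x^2, N_2 = y^2, N_3 = x*y and the 4-by-4 grid are hypothetical and chosen so the second derivatives are trivial; they are not the basis functions from the question.

```matlab
% Hypothetical basis: f(x,y) = E(1)*x^2 + E(2)*y^2 + E(3)*x*y
% Second derivatives of f (constant in x,y for this toy basis)
fxx = @(E,x,y) 2*E(1) + 0*x;
fyy = @(E,x,y) 2*E(2) + 0*x;
fxy = @(E,x,y) E(3)   + 0*x;

% Grid of points (x_j, y_j) where the constraints are imposed
[xg, yg] = meshgrid(linspace(0,1,4));
xg = xg(:);  yg = yg(:);

% Sylvester's criterion for the 2x2 Hessian, written as c(E) <= 0:
%   first leading minor  fxx > 0              ->  -fxx <= 0
%   determinant          fxx*fyy - fxy^2 > 0  ->  -(fxx*fyy - fxy^2) <= 0
conFun = @(E) [ -fxx(E,xg,yg); ...
                -(fxx(E,xg,yg).*fyy(E,xg,yg) - fxy(E,xg,yg).^2) ];

% Two-output wrapper in the [c,ceq] form expected of a nonlinear constraint
nonlcon = @(E) deal(conFun(E), []);

cOK  = conFun([1 1 0]);   % convex: Hessian = [2 0; 0 2]
cBad = conFun([1 1 3]);   % not convex: det = 4 - 9 < 0
```

Here c has two entries per grid point, so the solver sees 2*n inequalities. Satisfying them says nothing about convexity between the grid points, as noted in (3).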

SA-W on 15 June 2023

interp.png

Thank you!

The condition $\partial^2 f/\partial x^2 > 0$ can be easily implemented in A*x<b.

" Maybe. You would need to tell us more about the properties of the N_i(x,y) functions and what g^sim() is doing. If the N_i(x,y) functions are all convex, then it would be sufficient (although not necessary) to impose the much simpler constraints E_i>=0. "

Think about the N_i(x,y) as interpolation functions (piecewise polynomials). Attached is a picture where n=2 and f being a function just of x: as you can see, the N_i look like hat functions, they are 1 at one support point and zero at all others. The support points x_i are the points where the parameters E_i are defined, i.e. f(x_i) = E_i.

In my case, they are a little bit smoother such that I can take second derivatives of them, but the shape is like in the picture which means that they are not convex. E_i >=0 are lower bounds of my optimization, but this does not really help given the shape of the N_i, right?

SA-W on 15 June 2023

I have to think about that. Doing this local approximation of f is the novelty of my work, so I should research this in more depth.

As for your point (1):

I am aware that only bound constraints can be enforced at all iterations. We already talked about this issue in https://de.mathworks.com/matlabcentral/answers/1921445-fmincon-any-way-to-enforce-linear-inequality-constraints-at-intermediate-iterations?s_tid=prof_contriblnk

There, you suggested to (i) have the objective return NaN in case of evaluation failures, or to (ii) project the current point onto the set of feasible points that satisfy A*x<=b using lsqlin.

Returning NaN leads to exitflag=2 most of the time (although I have step_tolerance=1e-12) and projecting using lsqlin makes most of my constraints active, which is also not the best way to go. It would be better to have them inactive.

What really helped me is to multiply my constraints by a large number (A*x<=b becomes (1e4*A)*x<=b). Just passing A and b to fmincon or lsqnonlin, the constraints were basically violated at all iterations. At the linked question above, you wrote

"it violates the theoretical assumptions of fmincon, and probably all the Optimization Toolbox solvers as well, when the domain of your objective function and constraints is not an open subset of $\mathbb{R}^n$. If you forbid evaluation outside the closed set defined by your inequality constraints, that is essentially the situation you are creating."

Can you comment a little bit more on this? At the end, is it not the same situation that I also create when (i) returning NaN or (ii) projecting the current point onto the feasible set?

Matt J on 15 June 2023

Edited: Matt J on 15 June 2023

I have to think about that. Doing this local approximation of f is the novelty of my work, so I should research this in more depth.

Might you be able to choose basis functions that are convex? Then you would still have some semi-local control over f through the E coefficients. Beyond that, it doesn't really make sense to be controlling f locally when f must satisfy a global property like convexity. A function generally cannot be perturbed locally without destroying convexity.

Can you comment a little bit more on this?

The theory behind fmincon's algorithms was derived under the assumption that the objective and constraint functions are continuously differentiable over the domain D where you might want the algorithm to search. In order to be differentiable over D, it is necessary that these functions be defined on an open superset of D.

The following 1D function is an example that violates this, assuming the search domain is D= {x|x>=0}.

Since D is closed and f is not defined throughout any open superset of D, it is not possible to evaluate the derivative of f at x=0. Note that even though f has a right-hand derivative at x=0, it lacks a left-hand derivative, and therefore is not differentiable.

When f is undefined outside a closed domain D, there are also practical difficulties for finite differencing. If the solution xopt is on the boundary of D, and if you are approximating the derivatives using central finite differences, you have to take the finite differences with smaller and smaller step sizes as xopt is approached. Otherwise, the finite differencing will eventually, at some iteration, have to reach for x that are outside of D where f is undefined. On the other hand, if you let the step size shrink arbitrarily, it will eventually fall below floating point precision and the derivative estimation will fail.
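The finite-differencing difficulty can be seen in a small sketch. The objective below is a made-up toy (not the function from the example above), defined only on D = {x|x>=0}; "undefined" is emulated by returning NaN outside D.

```matlab
% Toy objective defined only on D = {x >= 0}; the term 0./(x>=0) evaluates
% to 0 inside D and to NaN (0/0) outside, emulating "undefined".
f = @(x) x.^2 + 0./(x >= 0);

x0 = 0;        % point on the boundary of D
h  = 1e-6;     % finite-difference step

dCentral = (f(x0+h) - f(x0-h)) / (2*h);   % needs f(-h), which is outside D
dForward = (f(x0+h) - f(x0))   / h;       % stays inside D
```

The central difference is NaN because it reaches outside D, while the one-sided difference stays finite. A one-sided stencil only helps when the solver knows which side is admissible, which is not the case for general constraints.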

At the end, is it not the same situation that I also create when (i) returning NaN

Yes, but the common situations where people return NaN from the objective and constraints are problems where the domain D is naturally open. For example, f(x)=x-log(x) has an open search domain D={x|x>0}, and therefore an open superset for D is D itself. Therefore, setting f(x)=NaN for x<=0 doesn't give the above difficulties.

or (ii) projecting the current point onto the feasible set?

No. Let D be a non-empty search domain whose points x satisfy

(i) f(x) is defined and

(ii) x is feasible.

Also, let P(x) be a projection from any x \in R^n onto D. Then f(P(x)) will be defined throughout R^n due to (i).
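A minimal sketch of the f(P(x)) construction, assuming the feasible set is the polyhedron {x : A*x <= b} and using lsqlin for the Euclidean projection, as mentioned earlier in the thread. The constraint data and the objective f are hypothetical placeholders.

```matlab
% Hypothetical feasible set D = {x : A*x <= b}
A = [1 1];  b = 1;

% Euclidean projection P(x) = argmin ||z - x||^2  subject to  A*z <= b
opts = optimoptions('lsqlin','Display','off');
P = @(x) lsqlin(eye(numel(x)), x(:), A, b, [],[],[],[],[], opts);

% Toy objective that is only meaningful on D, wrapped as f(P(x))
f  = @(z) sum((z - [2;2]).^2);
fP = @(x) f(P(x));

xProj = P([2;2]);      % infeasible point, projected onto the line x1 + x2 = 1
val   = fP([2;2]);     % defined even though [2;2] itself is infeasible
```

Because P maps every point into D, fP can be evaluated anywhere in R^n. As noted earlier in the thread, the projected points tend to land on the boundary, so many constraints end up active.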

SA-W on 16 June 2023

"The theory behind fmincon's algorithms were derived under the assumption that the objective and constraint functions are continuously differentiable over the domain D where you might want the algorithm to search. In order to be differentiable over D, it is necessary that these functions be defined on an open superset of D."

I think I got the gist of your examples. So you would not multiply the A matrix by a large number to have fmincon pay attention to the linear constraints? In my case, I return the Jacobian in the objective function and there are no finite-difference calculations involved. I can not really grasp what practical implications (1e4*A) has and when this might lead to problems.

"Might you be able to choose basis functions that are convex? Then you would still have some semi-local control over f through the E coefficients. Beyond that, it doesn't really make sense to be controlling f locally when f must satisfy a global property like convexity. A function generally cannot be perturbed locally withour destroying convexity."

I agree with you that controlling f(x,y) locally with finite element basis functions is not the best way to go. In 1d, f=f(x), it worked well with local basis functions but in 2d, f=f(x,y), I have the issue that the {xy} space is not "sampled" uniformly when calculating g^sim. Consider a grid with n uniformly spaced support points, f is mainly evaluated along the grid diagonal since there is a little correlation between x and y. Then, of course, I can not calibrate the parameters E_i far off the diagonal which is the main issue I currently face.

Maybe semi-local control of f is a workaround. Can you explain what "semi-local control over f through the E coefficients" means? And to answer your question, my basis functions do not necessarily need to be convex. What are convex basis functions that you think might be appropriate?

SA-W on 16 June 2023

Edited: SA-W on 16 June 2023

"Imagine your basis functions were Gaussian lobes."

Gaussian lobes are Gaussian basis functions

or did you refer to something else?

"No way for me to know. The physics of the problem and what it says about how f() should look have not been described to us."

Let me try to give you a few more details; maybe you can then give a suggestion that allows me to explore new things...

The pde that I am solving is

, which is the balance of linear momentum (Newton's law). It is a purely mechanical boundary-value problem and I solve it by means of the finite element method. P is a second-order tensor (stress tensor, 3x3 matrix) which can be derived from the strain energy density f as follows:

, where C is an SPD second-order tensor (strain tensor, 3x3 matrix). f is a scalar-valued function (strain energy density) characterizing the constitutive behavior of the material under consideration. If we assume the material to be isotropic (same response regardless of the loading direction), then we can express $f$ in terms of (some of) the principal invariants of $C$, i.e., $f(C) = f(I_1(C), I_2(C))$. Renaming the invariants to x and y, we end up with $f = f(x,y)$

as I defined it in my question. From the physics we know that f must vanish for zero deformation, which is the case for

, where it should also attain its global minimum. f should have no other stationary points. Also we know that

based on the properties of C and

since energy can not be negative. Also, $f$ should go to infinity if either x or y approach infinity and is ideally convex. However,

are reasonable upper bounds for most physical applications. Those are the main properties we know about f.

What we usually do in material modeling is to prescribe the global shape like

and fit a quite small number of material parameters (here

) to some measurements. However, this approximation a priori assumes a particular shape of f, and it turns out that this global approximation is not really satisfactory, which is why people try to find better approximations of f to make it a good model for nearly all loading cases of the above boundary-value problem. My idea is/was to discretize f (locally) over the space of invariants (x and y), distribute some support points in that space, and find the values of f at those support points (the $E_i$) with optimization. This works excellently in a one-dimensional invariant space but, as I told you, not so well in a two-dimensional invariant space because the principal invariants of an SPD tensor show some correlation. In other words, there are some support points (x_a, y_a) that I do not really "touch" when I solve my boundary-value problem, in particular the ones at the boundary. I highly doubt it works to identify the parameters on those support points with a purely local approximation.

As you said, maybe doing a semi-local approximation helps in a two-dimensional invariant space. What I can think of is to keep identifying some of the $E_i$ (the ones at the "most activated" support points) with optimization, and computing the $E_i$ at the boundary based on the optimized $E_i$ and the convexity property.

Can this be achieved by a special class of (convex) basis functions? I am open for any input in any direction!

SA-W on 17 June 2023

Edited: SA-W on 17 June 2023

You don't necessarily need to situate a basis function at every nodal point.

Above is an example

, where the N_i are the local finite element basis functions that I have used so far. The blue dots are the sampled points when I solve the pde. You can clearly see the correlation between x and y, which makes it nearly impossible to identify all E_i. Replacing my N_i with your proposed

N_ij(x,y) = E_ij*log( 1 + exp(a_ij*(x-i)) + exp(b_ij*(y-j)) )/log(3)

will also not remedy this issue I think.

But what I would do now is to situate three basis functions at the support points building the diagonal, i.e.,

with

N_i(x,y) = E_i*log( 1 + exp(a_i*(x-x_i)) + exp(b_i*(y-y_i)) )/log(3)

In words, I would only situate a basis function at the most frequently sampled points (here, the three points along the diagonal). Do you think this makes sense? I think I no longer have the problem that most of the E_i are not sampled when solving the pde.
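A minimal sketch of the three-function expansion described above, to make the formula concrete. All numerical values (centers, slopes, coefficients, test points) are hypothetical. With nonnegative E_i, each term is a scaled log-sum-exp of affine functions and therefore convex, so the sum is convex as well; the midpoint check at the end is a spot check for one pair of points, not a proof.

```matlab
% Hypothetical centers, slopes and coefficients for three basis functions
xc = [1 2 3];  yc = [1 2 3];   % support points along the diagonal
a  = [1 1 1];  b  = [1 1 1];   % slope parameters a_i, b_i
E  = [0.5 1.0 2.0];            % nonnegative coefficients E_i

% N_i(x,y) = E_i*log(1 + exp(a_i*(x-x_i)) + exp(b_i*(y-y_i)))/log(3)
Ni = @(i,x,y) E(i)*log(1 + exp(a(i)*(x-xc(i))) + exp(b(i)*(y-yc(i))))/log(3);
f  = @(x,y) Ni(1,x,y) + Ni(2,x,y) + Ni(3,x,y);

% Each N_i equals E_i at its own center (x_i, y_i)
atCenter = Ni(2, xc(2), yc(2));

% Midpoint convexity check between two arbitrary points p and q
p = [0.2 2.7];  q = [3.1 0.4];  m = (p + q)/2;
gap = (f(p(1),p(2)) + f(q(1),q(2)))/2 - f(m(1),m(2));   % >= 0 if convex
```

Note that the basis functions overlap everywhere, so f(x_i, y_i) is no longer equal to E_i as it was with the interpolating hat functions.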

Matt J on 19 June 2023

Edited: Matt J on 20 June 2023

Most of the f(x,y) that I am working on are indeed of the form f(x,y)=h(x)+g(y).

In that case, the basis decomposition I would probably use is,

N_i(x) = C_i*log( 1 + exp(a_i*(x-x_i)) ) = C_i*softplus(a_i*(x-x_i))
M_j(y) = D_j*log( 1 + exp(b_j*(y-y_j)) ) = D_j*softplus(b_j*(y-y_j))
f(x,y) = sum_i N_i(x) + sum_j M_j(y)

Also, I happen to have numerically stable code for the softplus(z)=log(1+exp(z)) function that I can give you (below).

I think x_i,y_i,E_i<=10 are reasonable upper bounds, so overflow should not be a matter of concern.

I don't see why that removes the concern unless you can also bound the a_i and b_i values.

Why in particular 33 (a_i*(x-x_i)<=33)?

I believe z>=33 is the threshold past which f(z)=log(1+exp(z)) and f(z)=z can't be distinguished in double float precision.

function y=softplus(x)
%Accurate implementation of log(1+exp(x))
    idx=x<=33;
    y=x;
    y(idx)=log1p( exp(x(idx)) );
    
end
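For comparison, here is a short sketch of why the naive formula needs care. It uses a different but standard rearrangement, max(z,0) + log1p(exp(-abs(z))), rather than the threshold approach in the function above; both aim to avoid overflow in exp. The coefficients in the small separable model h are hypothetical.

```matlab
naive  = @(z) log(1 + exp(z));                  % overflows for large z
stable = @(z) max(z,0) + log1p(exp(-abs(z)));   % same function, rearranged

zBig = 1000;
yNaive  = naive(zBig);     % exp(1000) overflows, so the result is Inf
yStable = stable(zBig);    % stays finite

% Separable convex model h(x) = sum_i C_i*softplus(a_i*(x - x_i)), C_i >= 0
C = [1 2];  aSlope = [0.5 1.5];  xi = [2 6];    % hypothetical values
h = @(x) C(1)*stable(aSlope(1)*(x - xi(1))) + C(2)*stable(aSlope(2)*(x - xi(2)));
```

The rearranged form never exponentiates a positive number, so no bound on a_i*(x-x_i) is needed to avoid overflow.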

Matt J on 20 June 2023

Edited: Matt J on 20 June 2023

I also mentioned the possibility of adding an additional parameter on the exponentials.

N=@(x) 10*log( 1 + A*exp(a*(x-5)));

With the right selection of A, it doesn't seem too bad

A=0.1;
avalues=linspace(-0.5,0.5,9);
for a=avalues
    N=@(x) 10*log( 1 + A*exp(a*(x-5)));
    hold on
    fplot(N,[0,10])
    hold off
end; legend("a="+avalues,'Location','north')

SA-W on 20 June 2023

I also mentioned the possibility of adding an additional parameter on the exponentials.

Yes, but the additional parameters require additional constraints and, more importantly, increase the number of parameters by

. So it is probably a trade-off: if I do not situate a basis function at every support point, I have to introduce the additional parameters on the exponentials to make the basis more flexible. If I have a higher density of basis functions, I can probably set them to one. Does that make sense intuitively?

More answers (1)

Matt J on 20 June 2023

https://fr.mathworks.com/matlabcentral/answers/1983454-can-i-pass-these-nonlinear-constraints-to-lsqnonlin#answer_1259224

Edited: Matt J on 27 June 2023

Most of the f(x,y) that I am working on are indeed of the form f(x,y)=h(x)+g(y).

If this is true, then ensuring the convexity of f(x,y) is the same as ensuring the convexity of h(x) and g(y) as 1D functions, which is much simpler. If you use 1D linear interpolating basis functions,

h=@(x) interp1([E(1),E(2),E(3),...,E(n)] ,x)

then you can ensure the convexity of h just by ensuring that the second-order finite differences are non-negative, which is a simple linear constraint on the E(i),

E(i+1)-2*E(i)+E(i-1)>=0

No need for nonlinear constraints at all. Moreover, if you make the change of variables

E=cumsum([C2,cumsum([C1,D])]);

where D(j), C1, and C2 are the new set of unknowns, then the constraints on D required for convexity are simple positivity bounds D(j)>=0. As you'll recall, bound constraints can indeed be enforced at all iterations, which you said is what you wanted.

D=rand(1,20);
C1=-5;
C2=0;
E=cumsum([C2,cumsum([C1,D])]);
fplot( @(x) interp1( E,x ) ,[1, numel(E)] ); xlabel x; ylabel h(x)
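If you keep E as the unknowns instead of changing variables, the second-difference condition can be passed as a linear inequality. A minimal sketch, assuming the solver's A*E <= b convention and a hypothetical n = 6:

```matlab
n = 6;                      % number of coefficients E(1..n) (hypothetical)

% Rows of D2 are [1 -2 1] stencils, so D2*E equals diff(E,2)
D2 = diff(eye(n), 2);       % size (n-2)-by-n

% Convexity D2*E >= 0 rewritten in the solver's A*E <= b form
A = -D2;
b = zeros(n-2, 1);

Econvex    = ((1:n).^2).';  % second differences all equal 2
Enonconvex = -Econvex;      % second differences all equal -2

okConvex    = all(A*Econvex    <= b);
okNonconvex = all(A*Enonconvex <= b);
```

Unlike the bound constraints obtained after the change of variables, these linear inequalities are not guaranteed to hold at intermediate iterates.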

Matt J on 20 June 2023

Edited: Matt J on 20 June 2023

If we have f=f(x) and there are n support points x_i, there are also n parameters E_i.

f(x) is a continuous function on some interval [fmin, fmax]. You have a parametric model

where the s_i are known shifts which can be thought of as the "locations" of the basis functions. The function F(x) is also continuous and defined on some possibly larger domain [Fmin,Fmax]. You want these functions to agree at certain discrete sample points,

and to choose the unknown E_i to achieve this agreement as closely as possible.

You seem to be assuming that for every basis function location s_i there needs to be a corresponding x_j=s_i, but there is no reason to have that assumption. There is no necessary relationship between the s_i and x_j at all other than that they should be chosen to ensure,

Additionally, there are certain mathematical simplifications and conveniences if you choose the s_i so that

how do I transform the "real" parameters into D?

The inverse of

E=cumsum([C2,cumsum([C1,D])])

is

D=diff(E,2)

For example,

D=rand(1,8)
D = 1×8
    0.1401    0.6240    0.2104    0.2853    0.7782    0.2184    0.6423    0.4381
C1=rand; C2=rand; 
E=cumsum([C2,cumsum([C1,D])]);
D2=diff(E,2)
D2 = 1×8
    0.1401    0.6240    0.2104    0.2853    0.7782    0.2184    0.6423    0.4381
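The same round trip can be extended to recover C1 and C2 as well, which follows from expanding the two cumsum calls: E(1) = C2 and E(2) = C2 + C1. The sketch below uses hypothetical, exactly representable values, and the bound vector assumes the unknowns are stacked as p = [C2, C1, D].

```matlab
% Forward map from the unknowns (C2, C1, D) to the nodal values E
toE = @(C2, C1, D) cumsum([C2, cumsum([C1, D])]);

D  = [0.5 0 1.5 2];        % hypothetical curvature parameters, D(j) >= 0
C1 = -3;                   % initial slope  E(2) - E(1)
C2 = 4;                    % initial value  E(1)
E  = toE(C2, C1, D);

% Inverse map
Dback  = diff(E, 2);
C1back = E(2) - E(1);
C2back = E(1);

% Bounds for the unknown vector p = [C2, C1, D]: only D is sign-constrained
lb = [-Inf, -Inf, zeros(1, numel(D))];
```

With this parametrization, convexity of the interpolant is expressed purely through lb, which is the kind of constraint that can be enforced at every iteration.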

Matt J on 21 June 2023

You said, I can make the E-grid a little larger than the f-grid

Yes, but I think it's moot at this point. I think we've determined that you can make the grids the same size, and the convexity conditions E(i+1)-2*E(i)+E(i-1)>=0 do not have to be satisfied at all i. It is sufficient for them to hold for i=2,...,n-1.

But how can that work with linear basis functions? Here, we have , but how would I determine and . We said f(x) is not defined there.

One way would be to not use linear interpolating basis functions. We talked about basis functions like

E_i log(1+A_i*exp(a_i*(x-s_i))), i=1...n

which do not have a compact footprint. Instead, each basis function has a footprint over the entire domain of x. The value F(x0) at any x0 is therefore influenced by all n basis functions. So there is reason to think that all 3*n parameters could be estimated, assuming you have 3*n sample points f(x_j), wherever they happen to be located, although the more spread out the x_j, the better conditioned the estimation problem is likely to be.

Another approach would be to add a penalty function on the roughness of the coefficients E_i to your objective function, e.g.,

Penalty(x)= smallweight * sum_i abs(E(i)-E(i+1)) , i=1...n-1

Certainly if you have linear interpolating basis functions, we expect the E(i) to step smoothly from one i to the next, and the above penalty would enforce that, even without having any samples f(x_j) that provide information on a particular E_i.
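A minimal sketch of that penalty for a scalar objective (for example with fmincon). The weight and the stand-in data-fit term are hypothetical; only the roughness term follows the formula above.

```matlab
smallweight = 1e-2;                                % hypothetical weight
roughness   = @(E) smallweight*sum(abs(diff(E)));  % sum_i |E(i+1)-E(i)|

% Hypothetical data-fit term standing in for the real objective;
% it only "sees" E(1) and E(2), so E(3) and E(4) are unsampled
dataFit   = @(E) sum((E(1:2) - [1 2]).^2);
objective = @(E) dataFit(E) + roughness(E);

E   = [1 2 4 4];
pen = roughness(E);        % 1e-2*(1 + 2 + 0)
```

In this toy setup the penalty is the only term that depends on E(3) and E(4), which is how it supplies information for coefficients the data never touch.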

SA-W on 26 June 2023

Edited: SA-W on 26 June 2023

@Matt J

I would appreciate your opinion regarding my new results when approximating f(x,y) = g(x) + h(y).

g(x) is the above plot and h(y) the plot below, each of which is approximated with five parameters and linear basis functions. The objective function is

||g^sim(f(x,y)) - g^exp ||^2

"No noise" means that I solve the pde with the reference parameters (green curve) to obtain g^exp, and "1% noise" means that I add random noise to g^exp with a standard deviation of 1%*max(g^exp) to emulate real experimental data.

As you can see, in the noiseless case I have perfect fitting results, not so in the case with noise.
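For readers who want to reproduce the "1% noise" setup, a minimal sketch follows. The signal gExp is a made-up placeholder for the simulated data, and the fixed random seed is only there to make the run repeatable.

```matlab
rng(0);                                   % reproducible noise
gExp   = sin(linspace(0, pi, 1000)).';    % hypothetical noise-free data
sigma  = 0.01*max(gExp);                  % standard deviation: 1% of max(gExp)
gNoisy = gExp + sigma*randn(size(gExp));  % additive Gaussian noise
```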

As I told you, x and y are somehow correlated at some intervals in the {xy} space; see the plot below, where I obtained g^exp using a coupled approximation to f(x,y) with nine parameters.

Do you think the correlation between x and y is the reason why the identification fails for the noisy case above? Here, h(y)=sqrt(y) is concave in the example above, which is why I implemented E(i+1)-2*E(i)+E(i-1) <=0, i=2,...,4 and E(4)<=E(5).

I mean, due to correlation, the columns of the Jacobian associated with x and y are nearly identical and I was sceptical that the decoupled approximation can work at all under those circumstances. But it clearly worked in the noiseless case.

Matt J on 27 June 2023

Edited: Matt J on 27 June 2023

From the documentation, exitflag=2 means "Change in x is less than the specified tolerance, or Jacobian at x is undefined."

Jacobian at x is undefined. It would be easy for you to use assert() to trap the case where you have NaN's or inf in your Jacobian.
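A minimal sketch of such a check, intended to be called inside the objective just before the Jacobian is returned. The error identifier 'myfit:badJacobian' and the test matrices are made up for illustration.

```matlab
% Check intended for use inside the objective before returning [r, J]
checkJ = @(J) assert(all(isfinite(J(:))), ...
    'myfit:badJacobian', 'Jacobian contains NaN or Inf entries.');

Jgood = [1 2; 3 4];
Jbad  = [1 NaN; 3 4];

checkJ(Jgood);              % passes silently
try
    checkJ(Jbad);           % raises the error
    caught = false;
catch err
    caught = strcmp(err.identifier, 'myfit:badJacobian');
end
```

If the assertion never fires during a run that ends with exitflag=2, the undefined-Jacobian explanation can be ruled out, leaving the step-size explanation.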

Change in x is less than the specified tolerance: If the objective is very flat at the current iteration, the optimizer can be fooled into thinking it is close to a minimum, even if it is not. This can result in it taking prematurely small steps or stopping outright. This can happen even if you get an exitflag=1. In the example below, the minimum at x=1 is reached with an error of more than 11%, which some might consider large:

opts=optimoptions('lsqnonlin','StepTolerance',1e-12);
[x,fval,res,exitflag]=lsqnonlin(@(x) (x-1).^4, 1.2,[],[],opts)
Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.
x = 1.1125
fval = 2.5658e-08
res = 1.6018e-04
exitflag = 1

SA-W on 27 June 2023

Edited: SA-W on 27 June 2023

I have very tight tolerances:

opts=optimoptions('lsqnonlin','StepTolerance',1e-12, 'FunctionTolerance', 1e-12, 'OptimalityTolerance', 1e-9);

With these tolerances, the error in your example decreases further to around 3%, which, of course, may be still considered large.

We would expect ground truth to be fairly close to the minimum and for the objective to be getting pretty flat in that general vicinity. Yet, the optimizer strongly pulls away from it.

If you consider noise with 1%*max(gexp) standard deviation small, I agree that the minimum should be close to ground truth. But why would you expect the objective to be getting flat in that general vicinity? If I think of minimizing f(x)=x^2, the objective is far from being flat at the minimum.

Also, if my objective were really flat around ground truth, would this not explain the inaccuracies in g and h? If I think of a nearly flat valley, there are many parameter combinations giving the same objective function value.

Matt J on 27 June 2023

Edited: Matt J on 27 June 2023

With these tolerances, the error in your example decreases further to around 3%, which, of course, may be still considered large.

But remember also that we have no idea precisely what the Taylor expansion of your actual function looks like at the minimum, and whether your tolerances are appropriate to it. I can modify my example as below to get poor accuracy again, even with your actual tolerances.

opts=optimoptions('lsqnonlin','StepTolerance',1e-12, 'FunctionTolerance', 1e-12, 'OptimalityTolerance', 1e-9);
[x,fval,res]=lsqnonlin(@(x) (x-1).^6, 1.2,[],[],opts)
Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.
x = 1.1157
fval = 5.7788e-12
res = 2.4039e-06

If I think of minimizing f(x)=x^2, the objective is far from being flat at the minimum.

When I say "flat", I mean that qualitatively speaking the gradients are getting small. Even in my example with (x-1)^4, the objective is not perfectly flat anywhere, but you can still see that it is flat enough to cause early termination.

Matt J on 27 June 2023

Edited: Matt J on 27 June 2023

If your objective is nearly flat in a broad area around ground truth, it would explain why the function is sensitive to noise. The addition of noise can "tilt" the floor of the valley so that water runs downhill away from ground truth. We do know that once you added the 1% noise, you tilted the valley away from ground truth significantly, because the optimizer pulled strongly away from ground truth when initialized there.

One thing you could try is to add curvature penalty terms to your objective function. For lsqnonlin, this would be equivalent to extending your residual vector,

residual =[residual; smallnumber*[D_g;D_h]]

where D_g and D_h are the D parameters (from our earlier discussion) of g and h. This will discourage g and h from bending more than necessary.
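A minimal sketch of the extended residual for a single 1D function, assuming the solver works with the nodal values E and the D parameters are recovered as diff(E,2) inside the objective. The data residual is a hypothetical stand-in for gsim-gexp; with two functions g and h, the two curvature vectors would simply be stacked.

```matlab
smallnumber = 1e-3;                     % hypothetical penalty weight

% Hypothetical data residual standing in for gsim(E) - gexp
dataResidual = @(E) E(:) - [0; 1; 4; 9; 16];

% Curvature parameters recovered from E, then appended to the residual
curvature   = @(E) diff(E(:), 2);
augResidual = @(E) [dataResidual(E); smallnumber*curvature(E)];

E0 = [0; 1; 4; 9; 16];
r  = augResidual(E0);                   % 5 data terms + 3 curvature terms
```

Since lsqnonlin minimizes the sum of squares of the residual vector, the appended entries contribute smallnumber^2 times the sum of squared second differences to the objective.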

SA-W on 28 June 2023

If so, that could be another reason why your StepTolerance stopping threshold is being met prematurely, resulting in exitflag=2.

Indeed, if I do forego the linear constraints, which works surprisingly for a quite small number of initial guesses, I have exitflag=3 with FunctionTolerance=1e-12. Reducing it to say 1e-6, sometimes results in exitflag=1.

But why is the higher weighting of the constraints (1e4*A) a constellation that can lead to exitflag=2? I know that this somehow restricts the domain of the parameters. But exitflag=2 makes sense to me if I think of a nearly flat objective as discussed, but not if the search domain is restricted.

Yes, if lsqnonlin is handing an E vector to your objective function code, you can transform from E to D within each call to your objective function code and use D for whatever computations are needed.

But I mean if I do not make the transformation to D-space, but keep the E vector. In that case, what would the curvature penalty term look like (if possible)?

Matt J on 28 June 2023

Edited: Matt J on 28 June 2023

Instead of passing these linear constraints to lsqnonlin, you want to attach them to the residual.

No, you would still keep the constraints as they are. They are not related to what we are doing with the residuals.

The constraints are forcing the derivatives to be non-negative.

Conversely, the residual terms are not being used to control whether the second derivatives are positive or negative in sign. Rather, they are trying to keep the second derivatives small in magnitude. Moreover, the algorithm will normally not push the residuals to zero precisely, so the second derivatives will merely be encouraged to be small in magnitude rather than required to have some exact value, as with constraints. Finally, the residuals can be prioritized by weighting them differently, unlike the constraints. By varying the penalty weight factor smallnumber, we can put different priority on those residuals as compared to the data fitting residual terms gsim-gexp.

SA-W on 29 June 2023

I see. I will append the residual vector by the constraints once my solver is available and report again then.

I did not fully understand your comment,

It's an algorithm-dependent thing, but basically if you force the algorithm to prioritize the constraints over progress in reducing the objective function, it may need to take much smaller steps than it otherwise would, and this can trigger the StepTolerance stopping criterion.

regarding my question

But why is the higher weighting of the constraints (1e4*A) a constellation that can lead to exitflag=2

Say the algorithm takes some smaller steps to fulfill the constraints (due to 1e4), which are currently violated. Once they are satisfied, the algorithm can devote itself again to reducing the objective function and may take larger steps again. What I want to say is that fulfilling the constraints should not trigger a step tolerance of 1e-12.

Most of the time when I have exitflag=2, the following happens: The algorithm takes some regular steps, but from a certain iteration on, the sum of squares remains static (far away from ground truth) and the parameters change only at high digits after the decimal point:

Iteration 50: resnorm=234, step size = 1e-3,

Iteration 51: resnorm=234, step size = 1e-4,

...

Iteration 58: resnorm=234, step size = 1e-11,

--> step tolerance is passed, algorithm terminates

As I said, this happens at points far away from the minimum. Do you think this can also be traced back to the weighting of A? If so, this is not tangible to me at the moment: if constraints at iteration 50 were violated, let the algorithm do some iterations with maybe smaller steps, but then proceed again with a bigger step size.

Matt J on 29 June 2023

Edited: Matt J on 29 June 2023

data.mat

Well, my intuition was that if the update step is (approximately) indifferent to the objective function, then you can imagine it may take steps as if the objective function were some trivial flat value with gradient zero everywhere. In other words, it would have no reason to take large steps once the constraints were satisfied. But that was just my intuition...

I cannot come up with an example to show how scaling up the constraints can force an exitflag=2, but I can show one (below) which demonstrates that scaling the constraints can slow convergence when the objective is ill-conditioned:

load data
opts=optimoptions('fmincon','Algorithm','interior-point','Display','none',...
                   'MaxFunEvals',inf);
%%%Optimization 1
[E1,f1,exitflag1,out1]=fmincon(f,E0,A,b,[],[],[],[],[],opts);
%%%%Optimization 2: scaled constraints
s=1e4;
opts.MaxIterations=out1.iterations;
[E2,f2,exitflag2]=fmincon(f,E0,s*A,s*b,[],[],[],[],[],opts);
f1,f2
f1 = 7.2114
f2 = 6.3699e+04

SA-W on 29 June 2023

Thanks for the demo. Well, sounds plausible to me. I just varied the scale factor of the constraints between 1e0 and 1e4 for an initial guess that was successful with 1e4. In either case, the optimizer returned exitflag=2 and sometimes, the optimized parameters were wrong. Long story short, one would have to narrow that down much more to better understand what's going on.

What I described before, namely that the optimizer does not reduce the sum of squares from a certain iteration on, but decreases the step size even further like

Iteration 50: resnorm=234, step size = 1e-3,

Iteration 51: resnorm=234, step size = 1e-4,

...

Iteration 58: resnorm=234, step size = 1e-11,

do you have an intuition what might be the problem here? As I said, this happens at points which are far off the ground truth. Is it maybe that the optimizer reached a region in the parameter space where the objective is flat, and the scaling of the constraints amplifies this even more?

SA-W on 29 June 2023

Edited: SA-W on 29 June 2023

I'm not convinced that resnorm is not changing, but maybe it's only doing so to decimal places that are beyond the precision of the display.

Yes, resnorm still changes but at the fifth/sixth digit after the decimal point only.

Or maybe you've landed in a region where the resnorm is unchanging, but the constraints have not yet been satisfied.

What could be a region where the resnorm is unchanging? Where one parameter say increases the objective value and another parameter cancels this out again?

So, more iterations are necessary.

But in those iterations (to satisfy the constraints), I would expect the parameters to change also, which however happens only at the 7th, 8th, 9th digit ... after the decimal point.

Matt J on 30 June 2023

What could be a region where the resnorm is unchanging?

For example,

fmincon(@(x) 0 ,[5,1],[1,1],1,[],[],[0,0],[],[],optimoptions('fmincon','Display','iter'))
                                            First-order      Norm of
 Iter F-count            f(x)  Feasibility   optimality         step
    0       3    0.000000e+00    5.000e+00    0.000e+00
    1       7    0.000000e+00    4.829e+00    0.000e+00    1.211e-01
    2      10    0.000000e+00    2.691e+00    0.000e+00    1.527e+00
    3      13    0.000000e+00    0.000e+00    0.000e+00    2.934e+00

Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in 
feasible directions, to within the value of the optimality tolerance,
and constraints are satisfied to within the value of the constraint tolerance.
ans = 1×2
    0.7572    0.1630

which however happens only at the 7th, 8th, 9th digit ... after the decimal point.

Why expect larger changes? You have a StepTolerance of 1e-12.
