Note: This page has been translated by MathWorks. Click here to see

To view all translated materials including this page, select Country from the country navigator on the bottom of this page.

To view all translated materials including this page, select Country from the country navigator on the bottom of this page.

Given a set of *n* nonlinear functions *F _{i}*(

`fsolve`

attempts to solve
systems of equations by minimizing the sum of squares of the components.
If the sum of squares is zero, the system of equation is solved. `fsolve`

has
three algorithms:

Trust-region

Trust-region dogleg

Levenberg-Marquardt

All algorithms are large-scale; see Large-Scale vs. Medium-Scale Algorithms.

The `fzero`

function solves
a single one-dimensional equation.

The `mldivide`

function
solves systems of linear equations.

Many of the methods used in Optimization
Toolbox™ solvers
are based on *trust regions,* a simple yet powerful
concept in optimization.

To understand the trust-region approach to optimization, consider
the unconstrained minimization problem, minimize *f*(*x*),
where the function takes vector arguments and returns scalars. Suppose
you are at a point *x* in *n*-space
and you want to improve, i.e., move to a point with a lower function
value. The basic idea is to approximate *f* with
a simpler function *q*, which reasonably reflects
the behavior of function *f* in a neighborhood *N* around
the point *x*. This neighborhood is the trust region.
A trial step *s* is computed by minimizing (or approximately
minimizing) over *N*. This is the trust-region subproblem,

$$\underset{s}{\mathrm{min}}\left\{q(s),\text{}s\in N\right\}.$$ | (1) |

The current point is updated to be *x* + *s* if *f*(*x* + *s*) < *f*(*x*);
otherwise, the current point remains unchanged and *N*,
the region of trust, is shrunk and the trial step computation is repeated.

The key questions in defining a specific trust-region approach
to minimizing *f*(*x*) are how to
choose and compute the approximation *q* (defined
at the current point *x*), how to choose and modify
the trust region *N*, and how accurately to solve
the trust-region subproblem. This section focuses on the unconstrained
problem. Later sections discuss additional complications due to the
presence of constraints on the variables.

In the standard trust-region method ([48]), the quadratic approximation *q* is
defined by the first two terms of the Taylor approximation to *F* at *x*;
the neighborhood *N* is usually spherical or ellipsoidal
in shape. Mathematically the trust-region subproblem is typically
stated

$$\mathrm{min}\left\{\frac{1}{2}{s}^{T}Hs+{s}^{T}g\text{suchthat}\Vert Ds\Vert \le \Delta \right\},$$ | (2) |

where *g* is the gradient of *f* at the current point
*x*, *H* is the Hessian matrix (the symmetric
matrix of second derivatives), *D* is a diagonal scaling matrix, Δ
is a positive scalar, and ∥ . ∥ is the 2-norm. Good algorithms exist for solving
Equation 2 (see [48]); such algorithms typically involve the computation of all eigenvalues of
*H* and a Newton process applied to the secular equation

$$\frac{1}{\Delta}-\frac{1}{\Vert s\Vert}=0.$$

Such algorithms provide an accurate solution to Equation 2. However, they
require time proportional to several factorizations of *H*.
Therefore, for trust-region problems a different approach is needed.
Several approximation and heuristic strategies, based on Equation 2, have been
proposed in the literature ([42] and [50]). The approximation approach followed
in Optimization
Toolbox solvers is to restrict the trust-region
subproblem to a two-dimensional subspace *S* ([39] and [42]).
Once the subspace *S* has been computed, the work
to solve Equation 2 is
trivial even if full eigenvalue/eigenvector information is needed
(since in the subspace, the problem is only two-dimensional). The
dominant work has now shifted to the determination of the subspace.

The two-dimensional subspace *S* is
determined with the aid of a preconditioned
conjugate gradient process described below. The solver defines *S* as
the linear space spanned by *s*_{1} and *s*_{2},
where *s*_{1} is in the direction
of the gradient *g*, and *s*_{2} is
either an approximate Newton direction, i.e.,
a solution to

$$H\cdot {s}_{2}=-g,$$ | (3) |

or a direction of negative curvature,

$${s}_{2}^{T}\cdot H\cdot {s}_{2}<0.$$ | (4) |

The philosophy behind this choice of *S* is
to force global convergence (via the steepest descent direction or
negative curvature direction) and achieve fast local convergence (via
the Newton step, when it exists).

A sketch of unconstrained minimization using trust-region ideas is now easy to give:

Formulate the two-dimensional trust-region subproblem.

Solve Equation 2 to determine the trial step

*s*.If

*f*(*x*+*s*) <*f*(*x*), then*x*=*x*+*s*.Adjust Δ.

These four steps are repeated until convergence. The trust-region
dimension Δ is adjusted according to standard rules. In particular,
it is decreased if the trial step is not accepted, i.e., *f*(*x* + *s*)
≥ *f*(*x*).
See [46] and [49] for a discussion of this aspect.

Optimization
Toolbox solvers treat a few important special
cases of *f* with specialized functions: nonlinear
least-squares, quadratic functions, and linear least-squares. However,
the underlying algorithmic ideas are the same as for the general case.
These special cases are discussed in later sections.

A popular way to solve large symmetric positive definite systems
of linear equations *Hp* = –*g* is the
method of Preconditioned Conjugate Gradients (PCG). This iterative
approach requires the ability to calculate matrix-vector products
of the form *H·v* where *v* is
an arbitrary vector. The symmetric positive definite matrix *M *is
a* preconditioner* for *H*.
That is, *M* = *C*^{2},
where *C*^{–1}*HC*^{–1} is
a well-conditioned matrix or a matrix with clustered eigenvalues.

In a minimization context, you can assume that the Hessian matrix *H* is
symmetric. However, *H* is guaranteed to be positive
definite only in the neighborhood of a strong minimizer. Algorithm
PCG exits when a direction of negative (or zero) curvature is encountered,
i.e., *d ^{T}Hd* ≤ 0. The
PCG output direction,

Another approach is to solve a linear system of equations to
find the search direction, namely, Newton's method says to solve for
the search direction *d _{k}* such
that

*J*(*x _{k}*)

where *J*(*x _{k}*)
is the

$$J\left({x}_{k}\right)=\left[\begin{array}{c}\nabla {F}_{1}{\left({x}_{k}\right)}^{T}\\ \nabla {F}_{2}{\left({x}_{k}\right)}^{T}\\ \vdots \\ \nabla {F}_{n}{\left({x}_{k}\right)}^{T}\end{array}\right].$$

Newton's method can run into difficulties. *J*(*x _{k}*)
may be singular, and so the Newton step

Using trust-region techniques (introduced in Trust-Region Methods for Nonlinear Minimization) improves
robustness when starting far from the solution and handles the case
when *J*(*x _{k}*)
is singular. To use a trust-region strategy, a merit function is needed
to decide if

$$\underset{d}{\mathrm{min}}f(d)=\frac{1}{2}F{\left({x}_{k}+d\right)}^{T}F\left({x}_{k}+d\right).$$

But a minimum of * f*(*d*)
is not necessarily a root of *F*(*x*).

The Newton step *d _{k}* is
a root of

*M*(*x _{k}* +

and so it is also a minimum of *m*(*d*),
where

$$\begin{array}{c}\underset{d}{\mathrm{min}}m(d)=\frac{1}{2}{\Vert M\left({x}_{k}+d\right)\Vert}_{2}^{2}=\frac{1}{2}{\Vert F\left({x}_{k}\right)+J\left({x}_{k}\right)d\Vert}_{2}^{2}\\ =\frac{1}{2}F{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right)+{d}^{T}J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right)+\frac{1}{2}{d}^{T}J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)d.\end{array}$$ | (5) |

Then *m*(*d*) is a better
choice of merit function than *f*(*d*),
and so the trust-region subproblem is

$$\underset{d}{\mathrm{min}}\left[\frac{1}{2}F{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right)+{d}^{T}J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right)+\frac{1}{2}{d}^{T}J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)d\right],$$ | (6) |

such that ∥*D*·*d*∥
≤ Δ. This subproblem can be efficiently
solved using a dogleg strategy.

For an overview of trust-region methods, see Conn [4], and Nocedal [31].

The key feature of this algorithm is the use of the Powell dogleg
procedure for computing the step *d*, which minimizes Equation 6. For a detailed description, see Powell [34].

The step *d* is constructed from a convex combination
of a Cauchy step (a step along the steepest descent direction) and
a Gauss-Newton step for *f*(*x*).
The Cauchy step is calculated as

*d _{C}* = –

where *α* is chosen to minimize Equation 5.

The Gauss-Newton step is calculated by solving

*J*(*x _{k}*)·

using the MATLAB^{®} `mldivide`

(matrix
left division) operator.

The step *d* is chosen so that

*d* = *d _{C}* +

where *λ* is the largest value in the
interval [0,1] such that ∥*d*∥
≤ Δ. If *J _{k}* is
(nearly) singular,

The dogleg algorithm is efficient since it requires only one linear solve per iteration (for the computation of the Gauss-Newton step). Additionally, it can be more robust than using the Gauss-Newton method with a line search.

The Levenberg-Marquardt [25], and [27] method uses a search direction that is a solution of the linear set of equations

$$\left(J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)+{\lambda}_{k}I\right){d}_{k}=-J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right),$$ | (7) |

or, optionally, of the equations

$$\left(J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)+{\lambda}_{k}diag\left(J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)\right)\right){d}_{k}=-J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right),$$ | (8) |

where the scalar *λ _{k}* controls
both the magnitude and direction of

`ScaleProblem`

to `'none'`

to
choose Equation 7,
and set `ScaleProblem`

to `'Jacobian'`

to
choose Equation 8.When *λ _{k}* is zero,
the direction

This algorithm is described in the MATLAB arithmetic operators
section for `mldivide`

.

`fzero`

attempts to find the root of a scalar
function *f* of a scalar variable *x*.

`fzero`

looks for an interval around an initial
point such that *f*(*x*) changes
sign. If you give an initial interval instead of an initial point, `fzero`

checks
to make sure *f*(*x*) has different
signs at the endpoints of the interval. The initial interval must
be finite; it cannot contain ±`Inf`

.

`fzero`

uses a combination of interval bisection,
linear interpolation, and inverse quadratic interpolation in order
to locate a root of *f*(*x*). See `fzero`

for more information.