Writing Scalar Objective Functions
Function Files
A scalar objective function file accepts one input, say x, and returns one real scalar output, say f. The input x can be a scalar, vector, or matrix. A function file can return more outputs (see Including Gradients and Hessians).
For example, suppose your objective is a function of three variables, x, y, and z:
f(x) = 3*(x - y)^4 + 4*(x + z)^2 / (1 + x^2 + y^2 + z^2) + cosh(x - 1) + tanh(y + z).
Write this function as a file that accepts the vector xin = [x;y;z] and returns f:

function f = myObjective(xin)
f = 3*(xin(1)-xin(2))^4 + 4*(xin(1)+xin(3))^2/(1+norm(xin)^2) ...
    + cosh(xin(1)-1) + tanh(xin(2)+xin(3));
Save it as a file named myObjective.m to a folder on your MATLAB® path. Check that the function evaluates correctly:
myObjective([1;2;3])
ans = 9.2666
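As a minimal usage sketch (the starting point [1;1;1] is an arbitrary choice for illustration, not from a worked example), you can pass a handle to this file directly to an unconstrained solver such as fminunc:

x0 = [1;1;1];                         % arbitrary starting point
[x,fval] = fminunc(@myObjective,x0)   % minimize with default fminunc options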
For information on how to include extra parameters, see Passing Extra Parameters. For more complex examples of function files, see Minimization with Gradient and Hessian Sparsity Pattern or Minimization with Bound Constraints and Banded Preconditioner.
Local Functions and Nested Functions
Functions can exist inside other files as local functions or nested functions. Using local functions or nested functions can lower the number of distinct files you save. Using nested functions also lets you access extra parameters, as shown in Nested Functions.
For example, suppose you want to minimize the myObjective.m objective function, described in Function Files, subject to the ellipseparabola.m constraint, described in Nonlinear Constraints. Instead of writing two files, myObjective.m and ellipseparabola.m, write one file that contains both functions as local functions:
function [x,fval] = callObjConstr(x0,options)
% Using local functions so the example needs just one file
if nargin < 2
    options = optimoptions('fmincon','Algorithm','interior-point');
end
[x,fval] = fmincon(@myObjective,x0,[],[],[],[],[],[], ...
    @ellipseparabola,options);

function f = myObjective(xin)
f = 3*(xin(1)-xin(2))^4 + 4*(xin(1)+xin(3))^2/(1+sum(xin.^2)) ...
    + cosh(xin(1)-1) + tanh(xin(2)+xin(3));

function [c,ceq] = ellipseparabola(x)
c(1) = (x(1)^2)/9 + (x(2)^2)/4 - 1;
c(2) = x(1)^2 - x(2) - 1;
ceq = [];
Solve the constrained minimization starting from the point [1;1;1]:

[x,fval] = callObjConstr(ones(3,1))
Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in
feasible directions, to within the default value of the function tolerance,
and constraints are satisfied to within the default value of the constraint tolerance.

x =
    1.1835
    0.8345
   -1.6439

fval =
    0.5383
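For comparison, here is a hedged sketch of the nested-function approach mentioned above. The file name callObjNested and the parameter a (which replaces the coefficient 3) are illustrative assumptions; a nested objective can read variables of the enclosing function directly, which is one way to pass an extra parameter:

function [x,fval] = callObjNested(x0,a)
% Nested objective reads the extra parameter a from this workspace
if nargin < 2
    a = 3; % hypothetical default matching the original coefficient
end
[x,fval] = fminunc(@nestedObjective,x0);

    function f = nestedObjective(xin)
        f = a*(xin(1)-xin(2))^4 + 4*(xin(1)+xin(3))^2/(1+sum(xin.^2)) ...
            + cosh(xin(1)-1) + tanh(xin(2)+xin(3));
    end
end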
Anonymous Function Objectives
Use anonymous functions to write simple objective functions. For more information about anonymous functions, see What Are Anonymous Functions?. Rosenbrock's function is simple enough to write as an anonymous function:
anonrosen = @(x)(100*(x(2) - x(1)^2)^2 + (1-x(1))^2);
anonrosen evaluates correctly at [-1 2]:

anonrosen([-1 2])

ans = 104
Using anonrosen with fminunc yields the following results:

options = optimoptions(@fminunc,'Algorithm','quasi-newton');
[x,fval] = fminunc(anonrosen,[-1;2],options)
Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

x =
    1.0000
    1.0000

fval =
   1.2266e-10
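The three-variable objective from Function Files is also short enough to write as an anonymous function; this hedged sketch (the name anonObjective is illustrative) reproduces myObjective:

anonObjective = @(xin) 3*(xin(1)-xin(2))^4 + 4*(xin(1)+xin(3))^2/(1+sum(xin.^2)) ...
    + cosh(xin(1)-1) + tanh(xin(2)+xin(3));
anonObjective([1;2;3]) % returns about 9.2666, matching myObjective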
Including Gradients and Hessians
Provide Derivatives For Solvers
For fmincon and fminunc, you can include gradients in the objective function. Generally, solvers are more robust, and can be slightly faster, when you include gradients. See Benefits of Including Derivatives. To also include second derivatives (Hessians), see Including Hessians.
The following table shows which algorithms can use gradients and Hessians.
Solver | Algorithm | Gradient | Hessian
---|---|---|---
fmincon | active-set | Optional | No
fmincon | interior-point | Optional | Optional (see Hessian for fmincon interior-point algorithm)
fmincon | sqp | Optional | No
fmincon | trust-region-reflective | Required | Optional (see Hessian for fminunc trust-region or fmincon trust-region-reflective algorithms)
fminunc | quasi-newton | Optional | No
fminunc | trust-region | Required | Optional (see Hessian for fminunc trust-region or fmincon trust-region-reflective algorithms)
How to Include Gradients
1. Write code that returns:
   - The objective function (scalar) as the first output
   - The gradient (vector) as the second output
2. Set the SpecifyObjectiveGradient option to true using optimoptions. If appropriate, also set the SpecifyConstraintGradient option to true.
3. Optionally, check that your gradient function matches a finite-difference approximation. See Checking Validity of Gradients or Jacobians; a hedged sketch appears after the fminunc example below.
Tip
For most flexibility, write conditionalized code. Conditionalized means that the number of function outputs can vary, as shown in the following example. Conditionalized code does not error depending on the value of the SpecifyObjectiveGradient option. Unconditionalized code requires you to set options appropriately.
For example, consider Rosenbrock's function

f(x) = 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2,

which is described and plotted in Constrained Nonlinear Problem Using Optimize Live Editor Task or Solver. The gradient of f(x) is

∇f(x) = [ -400*(x(2) - x(1)^2)*x(1) - 2*(1 - x(1));  200*(x(2) - x(1)^2) ].
rosentwo is a conditionalized function that returns whatever the solver requires:
function [f,g] = rosentwo(x)
% Calculate objective f
f = 100*(x(2) - x(1)^2)^2 + (1-x(1))^2;

if nargout > 1 % gradient required
    g = [-400*(x(2)-x(1)^2)*x(1)-2*(1-x(1));
        200*(x(2)-x(1)^2)];
end
nargout checks the number of output arguments that the calling function requests. See Find Number of Function Arguments.
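For instance (a brief illustration at the arbitrary point [-1;2]), the number of requested outputs controls whether the gradient branch runs:

f = rosentwo([-1;2])     % nargout is 1 inside rosentwo, so only f is computed
[f,g] = rosentwo([-1;2]) % nargout is 2, so the gradient g is computed as well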
The fminunc solver, designed for unconstrained optimization, allows you to minimize Rosenbrock's function. Tell fminunc to use the gradient by setting options:
options = optimoptions(@fminunc,'Algorithm','trust-region',...
    'SpecifyObjectiveGradient',true);
Run fminunc starting at [-1;2]:

[x,fval] = fminunc(@rosentwo,[-1;2],options)
Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

x =
    1.0000
    1.0000

fval =
   1.9886e-17
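To carry out the optional check from step 3 of How to Include Gradients, one hedged approach is to enable the CheckGradients option, which compares your supplied gradient with a finite-difference estimate before the optimization starts:

options = optimoptions(@fminunc,'Algorithm','trust-region',...
    'SpecifyObjectiveGradient',true,'CheckGradients',true); % compare supplied gradient to finite differences
[x,fval] = fminunc(@rosentwo,[-1;2],options);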
If you have a Symbolic Math Toolbox™ license, you can calculate gradients and Hessians automatically, as described in Calculate Gradients and Hessians Using Symbolic Math Toolbox.
Including Hessians
You can include second derivatives with the fmincon 'trust-region-reflective' and 'interior-point' algorithms, and with the fminunc 'trust-region' algorithm. There are several ways to include Hessian information, depending on the type of information and on the algorithm.
You must also include gradients (set SpecifyObjectiveGradient to true and, if applicable, SpecifyConstraintGradient to true) in order to include Hessians.
Hessian for fminunc trust-region or fmincon trust-region-reflective algorithms. These algorithms either have no constraints, or have only bound or linear equality constraints. Therefore, the Hessian is the matrix of second derivatives of the objective function.
Include the Hessian matrix as the third output of the objective function. For example, the Hessian H(x) of Rosenbrock's function is (see How to Include Gradients)

H(x) = [ 1200*x(1)^2 - 400*x(2) + 2,  -400*x(1);  -400*x(1),  200 ].
Include this Hessian in the objective:
function [f,g,H] = rosenboth(x)
% Calculate objective f
f = 100*(x(2) - x(1)^2)^2 + (1-x(1))^2;

if nargout > 1 % gradient required
    g = [-400*(x(2)-x(1)^2)*x(1)-2*(1-x(1));
        200*(x(2)-x(1)^2)];

    if nargout > 2 % Hessian required
        H = [1200*x(1)^2-400*x(2)+2, -400*x(1);
            -400*x(1), 200];
    end
end
Set HessianFcn to 'objective'. For example,
options = optimoptions('fminunc','Algorithm','trust-region',...
    'SpecifyObjectiveGradient',true,'HessianFcn','objective');
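A hedged usage sketch with these options, reusing the starting point [-1;2] from the earlier gradient example:

[x,fval] = fminunc(@rosenboth,[-1;2],options)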
Hessian for fmincon interior-point algorithm. The Hessian is the Hessian of the Lagrangian, where the Lagrangian L(x,λ) is

L(x,λ) = f(x) + Σ_i λ_g,i g_i(x) + Σ_j λ_h,j h_j(x).

g and h are vector functions representing all inequality and equality constraints respectively (meaning bound, linear, and nonlinear constraints), so the minimization problem is

min_x f(x)  subject to  g(x) ≤ 0,  h(x) = 0.

For details, see Constrained Optimality Theory. The Hessian of the Lagrangian is

∇²_xx L(x,λ) = ∇²f(x) + Σ_i λ_g,i ∇²g_i(x) + Σ_j λ_h,j ∇²h_j(x).   (1)
To include a Hessian, write a function with the syntax
hessian = hessianfcn(x,lambda)
hessian is an n-by-n matrix, sparse or dense, where n is the number of variables. If hessian is large and has relatively few nonzero entries, save running time and memory by representing hessian as a sparse matrix. lambda is a structure with the Lagrange multiplier vectors associated with the nonlinear constraints: lambda.ineqnonlin and lambda.eqnonlin.
fmincon computes the structure lambda and passes it to your Hessian function. hessianfcn must calculate the sums in Equation 1. Indicate that you are supplying a Hessian by setting these options:
options = optimoptions('fmincon','Algorithm','interior-point',...
    'SpecifyObjectiveGradient',true,'SpecifyConstraintGradient',true,...
    'HessianFcn',@hessianfcn);
For example, to include a Hessian for Rosenbrock's function constrained to the unit disk x(1)^2 + x(2)^2 ≤ 1, notice that the constraint function g(x) = x(1)^2 + x(2)^2 - 1 has gradient [2*x(1); 2*x(2)] and second derivative matrix 2*eye(2).
Write the Hessian function as
function Hout = hessianfcn(x,lambda)
% Hessian of objective
H = [1200*x(1)^2-400*x(2)+2, -400*x(1);
    -400*x(1), 200];
% Hessian of nonlinear inequality constraint
Hg = 2*eye(2);
Hout = H + lambda.ineqnonlin*Hg;
Save hessianfcn on your MATLAB path. To complete the example, the constraint function including gradients is
function [c,ceq,gc,gceq] = unitdisk2(x)
c = x(1)^2 + x(2)^2 - 1;
ceq = [];

if nargout > 2
    gc = [2*x(1);2*x(2)];
    gceq = [];
end
Solve the problem including gradients and Hessian.
fun = @rosenboth;
nonlcon = @unitdisk2;
x0 = [-1;2];
options = optimoptions('fmincon','Algorithm','interior-point',...
    'SpecifyObjectiveGradient',true,'SpecifyConstraintGradient',true,...
    'HessianFcn',@hessianfcn);
[x,fval,exitflag,output] = fmincon(fun,x0,[],[],[],[],[],[],nonlcon,options);
For other examples using an interior-point Hessian, see fmincon Interior-Point Algorithm with Analytic Hessian and Calculate Gradients and Hessians Using Symbolic Math Toolbox.
Hessian Multiply Function. Instead of a complete Hessian function, both the fmincon interior-point and trust-region-reflective algorithms allow you to supply a Hessian multiply function. This function gives the result of a Hessian-times-vector product, without computing the Hessian directly. This can save memory. The SubproblemAlgorithm option must be 'cg' for a Hessian multiply function to work; this is the trust-region-reflective default.
The syntaxes for the two algorithms differ.
For the interior-point algorithm, the syntax is

W = HessMultFcn(x,lambda,v);

The result W should be the product H*v, where H is the Hessian of the Lagrangian at x (see Equation 1), lambda is the Lagrange multiplier (computed by fmincon), and v is a vector of size n-by-1. Set options as follows:

options = optimoptions('fmincon','Algorithm','interior-point','SpecifyObjectiveGradient',true,...
    'SpecifyConstraintGradient',true,'SubproblemAlgorithm','cg','HessianMultiplyFcn',@HessMultFcn);

Supply the function HessMultFcn, which returns an n-by-1 vector, where n is the number of dimensions of x. The HessianMultiplyFcn option enables you to pass the result of multiplying the Hessian by a vector without calculating the Hessian.
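As a hedged sketch tied to the unit-disk example above (reusing the HessMultFcn name for illustration only), the multiply function can apply the Lagrangian Hessian to v without storing a separate Hessian argument:

function W = HessMultFcn(x,lambda,v)
% Hessian of Rosenbrock's function at x
H = [1200*x(1)^2-400*x(2)+2, -400*x(1);
    -400*x(1), 200];
% The unit-disk constraint Hessian is 2*eye(2), so its contribution to H*v is 2*lambda*v
W = H*v + 2*lambda.ineqnonlin*v;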
The trust-region-reflective algorithm does not involve lambda:

W = HessMultFcn(H,v);
The result W = H*v. fmincon passes H as the value returned in the third output of the objective function (see Hessian for fminunc trust-region or fmincon trust-region-reflective algorithms). fmincon also passes v, a vector or matrix with n rows. The number of columns in v can vary, so write HessMultFcn to accept an arbitrary number of columns. H does not have to be the Hessian; rather, it can be anything that enables you to calculate W = H*v. Set options as follows:
options = optimoptions('fmincon','Algorithm','trust-region-reflective',...
    'SpecifyObjectiveGradient',true,'HessianMultiplyFcn',@HessMultFcn);
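As a minimal hedged sketch (the structured form B + U*U' and the field names of Hinfo are illustrative assumptions, not the linked example), suppose the objective returns, as its third output, a structure Hinfo describing a Hessian of the form B + U*U' with B sparse and U tall and thin. The multiply function can then form H*v without ever building the dense matrix U*U':

function W = HessMultFcn(Hinfo,v)
% W = (B + U*U')*v, computed without forming U*U' explicitly
W = Hinfo.B*v + Hinfo.U*(Hinfo.U'*v);

Grouping the product as U*(U'*v) keeps the computation to two thin matrix products instead of an n-by-n multiply.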
For an example using a Hessian multiply function with the trust-region-reflective algorithm, see Minimization with Dense Structured Hessian, Linear Equalities.
Benefits of Including Derivatives
If you do not provide gradients, solvers estimate gradients via finite differences. If you provide gradients, your solver does not need to perform this finite-difference estimation, so it can save time and be more accurate, although a finite-difference estimate can be faster for complicated derivatives. Furthermore, solvers use an approximate Hessian, which can be far from the true Hessian; providing a Hessian can yield a solution in fewer iterations. For example, see the end of Calculate Gradients and Hessians Using Symbolic Math Toolbox.
For constrained problems, providing a gradient has another advantage. A solver can reach a point x that is feasible, but for which finite differences around x always lead to an infeasible point. Suppose further that the objective function at an infeasible point returns a complex value, Inf, NaN, or an error. In this case, a solver can fail or halt prematurely. Providing a gradient allows a solver to proceed. To obtain this benefit, you might also need to include the gradient of a nonlinear constraint function, and set the SpecifyConstraintGradient option to true. See Nonlinear Constraints.
Choose Input Hessian Approximation for interior-point fmincon
The fmincon interior-point algorithm has many options for selecting an input Hessian approximation. For syntax details, see Hessian as an Input. Here are the options, along with estimates of their relative characteristics.
Hessian | Relative Memory Usage | Relative Efficiency |
---|---|---|
'bfgs' (default) | High (for large problems) | High |
'lbfgs' | Low to Moderate | Moderate |
'fin-diff-grads' | Low | Moderate |
'HessianMultiplyFcn' | Low (can depend on your code) | Moderate |
'HessianFcn' | ? (depends on your code) | High (depends on your code) |
Use the default 'bfgs' Hessian unless you:

- Run out of memory. Try 'lbfgs' instead of 'bfgs'. If you can provide your own gradients, try 'fin-diff-grads', and set the SpecifyObjectiveGradient and SpecifyConstraintGradient options to true.
- Want more efficiency. Provide your own gradients and Hessian. See Including Hessians, fmincon Interior-Point Algorithm with Analytic Hessian, and Calculate Gradients and Hessians Using Symbolic Math Toolbox.
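For instance, a hedged sketch of selecting the limited-memory approximation (assuming your release accepts the HessianApproximation option described on the Hessian as an Input page):

options = optimoptions('fmincon','Algorithm','interior-point',...
    'HessianApproximation','lbfgs');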
The 'lbfgs' Hessian approximation has only moderate efficiency for two reasons: it uses relatively expensive Sherman-Morrison updates, and the resulting iteration step can be somewhat inaccurate because of the limited memory that 'lbfgs' retains.
The reason 'fin-diff-grads' and HessianMultiplyFcn have only moderate efficiency is that they use a conjugate gradient approach. They accurately estimate the Hessian of the objective function, but they do not generate the most accurate iteration step. For more information, see fmincon Interior Point Algorithm and its discussion of the LDL approach and the conjugate gradient approach to solving Equation 38.