Minimization with Dense Structured Hessian, Linear Equalities

Hessian Multiply Function for Lower Memory

The fmincon interior-point and trust-region-reflective algorithms, and the fminunc trust-region algorithm, can solve problems where the Hessian is dense but structured. For these problems, fmincon and fminunc do not compute H*Y with the Hessian H directly, because forming H would be memory-intensive. Instead, you must provide fmincon or fminunc with a function that, given a matrix Y and information about H, computes W = H*Y.

In this example, the objective function is nonlinear and linear equalities exist so fmincon is used. The description applies to the trust-region reflective algorithm; the fminunc trust-region algorithm is similar. For the interior-point algorithm, see the HessianMultiplyFcn option in Hessian Multiply Function. The objective function has the structure

$f (x) = f_{}^{ˆ} (x) - \frac{1}{2} x^{T} V V^{T} x,$

where $V$ is a 1000-by-2 matrix. The Hessian of $f$ is dense, but the Hessian of $f_{}^{ˆ}$ is sparse. If the Hessian of $f_{}^{ˆ}$ is $H_{}^{ˆ}$ , then $H$ , the Hessian of $f$ , is

$H = H_{}^{ˆ} - V V^{T} .$

To avoid excessive memory usage that could happen by working with $H$ directly, the example provides a Hessian multiply function, hmfleq1. This function, when passed a matrix Y, uses sparse matrices Hinfo, which corresponds to $H_{}^{ˆ}$ , and V to compute the Hessian matrix product

W = H*Y = (Hinfo - V*V')*Y.

In this example, the Hessian multiply function needs $H_{}^{ˆ}$ , and V to compute the Hessian matrix product. V is a constant, so you can capture V in a function handle to an anonymous function.

However, $H_{}^{ˆ}$ is not a constant and must be computed at the current x. You can do this by computing $H_{}^{ˆ}$ in the objective function and returning $H_{}^{ˆ}$ as Hinfo in the third output argument. By using optimoptions to set the HessianMultiplyFcn option to a handle to the hmfleq1 Hessian multiply function listed here, fmincon knows to get the Hinfo value from the objective function and pass it to the Hessian multiply function hmfleq1.

Step 1: Write a file brownvv.m that computes the objective function, the gradient, and the sparse part of the Hessian.

The example passes brownvv to fmincon as the objective function. The brownvv.m file is long and is not listed here. You can view the code using the command

type brownvv

Because brownvv computes the gradient as well as the objective function, the example (Step 3) uses optimoptions to set the SpecifyObjectiveGradient option to true.

Step 2: Write a function to compute Hessian-matrix products for H given a matrix Y.

Now, define a function hmfleq1 that uses Hinfo, which is computed in brownvv, and V, which you can capture in a function handle to an anonymous function, to compute the Hessian matrix product W where W = H*Y = (Hinfo - V*V')*Y. This function must have the form

W = hmfleq1(Hinfo,Y)

The first argument must be the same as the third argument returned by the objective function brownvv. The second argument to the Hessian multiply function is the matrix Y (of W = H*Y).

Because fmincon expects the second argument Y to be used to form the Hessian matrix product, Y is always a matrix with n rows, where n is the number of dimensions in the problem. The number of columns in Y can vary. Finally, you can use a function handle to an anonymous function to capture V, so V can be the third argument to hmfleq1. This technique is described in more detail in Passing Extra Parameters.

function W = hmfleq1(Hinfo,Y,V)
%HMFLEQ1 Hessian-matrix product function for BROWNVV objective.
% Documentation example.
% W = hmfbx4(Hinfo,Y,V) computes W = (Hinfo-V*V')*Y
% where Hinfo is a sparse matrix computed by BROWNVV 
% and V is a 2 column matrix.

%   Copyright 1984-2008 The MathWorks, Inc.
  
W = Hinfo*Y - V*(V'*Y);
end

Step 3: Call a nonlinear minimization routine with a starting point and linear equality constraints.

Load the problem parameter, V, and the sparse equality constraint matrices, Aeq and beq, from fleq1.mat, which is available when you run this example. Use optimoptions to set the SpecifyObjectiveGradient option to true and to set the HessianMultiplyFcn option to a function handle that points to hmfleq1. Call fmincon with objective function brownvv and with V as an additional parameter. This is the code that runs this procedure:

function [fval,exitflag,output,x] = runfleq1
% RUNFLEQ1 demonstrates the HessMult option for FMINCON with linear
% equalities.

problem = load("fleq1");   % Get V, Aeq, beq
V = problem.V; Aeq = problem.Aeq; beq = problem.beq;
n = 1000;             % problem dimension
xstart = -ones(n,1); xstart(2:2:n,1) = ones(length(2:2:n),1); % starting point
options = optimoptions(@fmincon,...
    Algorithm="trust-region-reflective",...
    SpecifyObjectiveGradient=true, ...
    HessianMultiplyFcn=@(Hinfo,Y)hmfleq1(Hinfo,Y,V),...
    Display="iter",...
    OptimalityTolerance=1e-9,...
    FunctionTolerance=1e-9);
[x,fval,exitflag,output] = fmincon(@(x)brownvv(x,V),xstart,[],[],Aeq,beq,[],[], ...
                                    [],options);
end

To run this code, enter

[fval,exitflag,output,x] = runfleq1;

                                Norm of      First-order 
 Iteration        f(x)          step          optimality   CG-iterations
     0            2297.63                      1.41e+03                
     1            1084.59         6.3903            578           1
     2            1084.59            100            578           3
     3            1084.59             25            578           0
     4            1084.59           6.25            578           0
     5            1047.61         1.5625            240           0
     6            761.592          3.125           62.4           2
     7            761.592           6.25           62.4           4
     8            746.478         1.5625            163           0
     9            546.578          3.125           84.1           2
    10            274.311           6.25           26.9           2
    11            55.6193        11.6597             40           2
    12            55.6193             25             40           3
    13            22.2964           6.25           26.3           0
    14            -49.516           6.25             78           1
    15           -93.2772         1.5625             68           1
    16           -207.204          3.125           86.5           1
    17           -434.162           6.25           70.7           1
    18           -681.359           6.25           43.7           2
    19           -681.359           6.25           43.7           4
    20           -698.041         1.5625            191           0
    21           -723.959          3.125            256           7
    22            -751.33        0.78125            154           3
    23           -793.974         1.5625           24.4           3
    24           -820.831        2.51937           6.11           3
    25           -823.069       0.562132           2.87           3
    26           -823.237       0.196753          0.486           3
    27           -823.245      0.0621202          0.386           3
    28           -823.246      0.0199951           0.11           6
    29           -823.246     0.00731333         0.0404           7
    30           -823.246     0.00505883         0.0185           8
    31           -823.246     0.00126471        0.00268           9
    32           -823.246     0.00149326        0.00521           9
    33           -823.246    0.000373314        0.00091           9

Local minimum possible.

fmincon stopped because the final change in function value relative to 
its initial value is less than the value of the function tolerance.

<stopping criteria details>

Convergence is rapid for a problem of this size with the PCG iteration cost increasing modestly as the optimization progresses. Feasibility of the equality constraints is maintained at the solution.

problem = load("fleq1");   % Get V, Aeq, beq
V = problem.V;
Aeq = problem.Aeq;
beq = problem.beq;
norm(Aeq*x-beq,inf)

ans = 
2.3093e-14

Preconditioning

In this example, fmincon cannot use H to compute a preconditioner because H only exists implicitly. Instead of H, fmincon uses Hinfo, the third argument returned by brownvv, to compute a preconditioner. Hinfo is a good choice because it is the same size as H and approximates H to some degree. If Hinfo were not the same size as H, fmincon would compute a preconditioner based on some diagonal scaling matrices determined from the algorithm. Typically, this would not perform as well.