MATLAB Answers

1

Quadratic Programming (quadprog):

Asked by Logan Turton on 16 Aug 2019
Latest activity Edited by Bruno Luong
on 16 Aug 2019
Hey everyone,
I'm currently trying to get my head around how quadprog works to apply it to a problem I'm working on. In the current problem I'm looking to minimise a ridge regression problem, such:
H = 2(A'*A + lambda.*eye(size(A'*A));
where A is an overdetermined matrix of dimensions [m,n], m > n. H is non-symettric and has negative values however is a positive definite as tested by:
%H Must be Positive Definite to use quadprog check p = 0:
[~,p] = chol(H);
and f = -2*A'*b where b is an [mx1] matrix of measurements.
My confusion arises when reading the description of how quadprog works:
"x = quadprog(H,f) returns a vector x that minimizes1/2*x'*H*x + f'*x. The input H must be positive definite for the problem to have a finite minimum. If H is positive definite, then the solution x = H\(-f)."
Given H is positive definite, my expectation is then that x1 = quadprog(H,f) and x2 = H\(-f) would give identical solutions.
For my problem quadprog converges on a solution (exit flag =1) after 95580 iterations. However the solution is very different from x2 = H\(-f). For my current setup, I know the target norm for the solution, norm(x2) is very close to this (5.0703e6 compared to true norm = 5e6) whilst the norm(x1) = 2.1072e5 which is significantly smaller than required for a correct solution.
What's going on here? Am I misinterpreting the description for quadprog somehow?

  7 Comments

Bruno Luong
on 16 Aug 2019
You don't need quadprog for unconstrained case can do
x = [A; lambda*speye(size(A,2))] \ [b; zeros(size(A,2),1)]
I request the four residual values, could you post them?
I know I don't need quadprog, the issue I'm having is that quadprog doesn't seem to be doing what the description says it should: x = quadprog(H,f) = H \ (-f).
As for the residuals:
clear,close all; clc; dbstop if error;
load('Atestmatrix.mat')
testobject = ones(size(A,2),1);
testobject(1:25,1) = 1e6;
nto = norm(testobject);
B = A*testobject(:);
cond(A)
%cond = 4.1544e9, solution here is near perfect with identical norm to 5dp
solution = A\B;
nsol1 = norm(solution);
resid1 = norm(A*solution - B)^2 + 0 * norm(solution)^2;
%Given no noise is added, regularisation parameter chosen to be
%min(A'*A)*eps effectively giving zero regularisation for this test problem
lambda = 5.3555e-33;
H = 2.*(A'*A + lambda.*eye(size(A'*A)));
f = -2.*A'*B;
%This solution does not change even given numerous runs such
%I believe it is stable:
solution2 = H\(-f);
nsol2 = norm(solution2);
resid2 = norm(A*solution2 - B)^2 + lambda * norm(solution2)^2;
%Elements 1:25 approx 1e6 as expected
%abs(Elements 26:50 ) <= 1.26e4 - some blurring
%abs(Elements 51:75 ) <= 266 - some blurring again
%The aim of the problem is to identify the object with values 1e6 from the
%background of 1s. The solution here has a similar norm to testobject so is
%a reasonable solution.
%I would then expect, given the description the following to give an
%identical result:
[solution3,fval,exitflag,output,lambdaoutput] = quadprog(H,f);
%Exitflag = 0 ==> does not converge in the given iterations - output isn't
%worth considering
options = optimset('Display', 'final-detailed','LargeScale', 'off','MaxIter',1000000);
[solution4,fval2,exitflag2,output2,lambdaouput2] = quadprog(H,f,[],[],[],[],[],[],[],options);
%Exitflag = 1 ==> converged on a solution within default tolerences
nsol4 = norm(solution4);
%nsol4 = 2.1072e5, significantly less than the result in solution 2 despite
%the description suggesting these should be identical
resid4 = norm(A*solution4 - B)^2 + lambda * norm(solution4)^2;
%The solution also suggests the object to smoothly span elements 26:75 at
%around 1e4 and having values <1e3 in elements 1:25 which is not at all
%representative of the solution.
resid1 = 2.3595e-32;
resid2 = 6.0411e-18;
resid4 = 2.6998e-4;
As you can see (and check with the code) the 'converged solution' from quadprog is significantly less accurate compared to the direct route H\(-f).
I'm unsure as to why H\(-f) =/ quadprog(H,f).
Bruno Luong
on 16 Aug 2019
Your A is pratically singular (you might miss to add boundary condition, or such when you discretize your problem).
Your regularization is too small to make any effective effect.
The two combined lead to "strange" thing that you observe.
Remember: condition number plays two roles:
  • sensitive of solution due to noise, including numerical round-off noise
  • bad convergence of iterative (conjugate graient) method where quadprog is based on
Anything system condition number >= 1e6 is pratically singular.
clear
caseid = 1; % or 2
switch caseid
case 1
load('Atestmatrix.mat');
testobject = zeros(size(A,2),1);
testobject(1:25,1) = 1;
case 2
A = rand(50,10);
testobject = rand(size(A,2),1);
end
B = A*testobject(:);
lambda = norm(A)*1e-6;
f = -A'*B;
resfun = @(x) norm(A*x - B)^2 + lambda * norm(x)^2;
%% backslash on A
backslash_x = [A; lambda*speye(size(A,2))] \ [B; zeros(size(A,2),1)];
backslash_residu = resfun(backslash_x);
%% backslash on A'*A
H = (A'*A + lambda.*eye(size(A'*A)));
Hdivided_x = H\(-f);
Hdivided_residu = resfun(Hdivided_x);
%% quandprog
options = optimset('Display', 'final-detailed','LargeScale', 'off','MaxIter',1000000);
[quadprog_x,fval2,exitflag2,output2,lambdaouput2] = quadprog(H,f,[],[],[],[],[],[],[],options);
qp_residu = resfun(quadprog_x);
close all
h1=plot(testobject)
hold on
h2=plot(backslash_x)
h3=plot(Hdivided_x)
h4=plot(quadprog_x)
legend([h1 h2 h3 h4],'exact','backslash','Hdivided','quadprog');
resid0 = resfun(testobject) % 3.0871e-11
backslash_residu % 2.5272e-11
Hdivided_residu % 3.3919e-14
qp_residu % 7.5741e-14

Sign in to comment.

1 Answer

Matt J
Answer by Matt J
on 16 Aug 2019
Edited by Matt J
on 16 Aug 2019
 Accepted Answer

Using your posted A matrix I find that
>> cond(H)
ans =
7.5703e+18
For numerical purposes, this is a singular matrix and shouldn't be treated as positive definite (but rather positive semi-definite). The test you did with chol() isn't reproducible,
>> [~,p] = chol(H); p
p =
1
but even if p had been 0, it would be irrelevant. chol's judgement on whether a matrix is sufficiently positive definite to compute a Cholesky decomposition does not mean it is sufficiently positive definite for everything.
For quadprog, the only thing that matters here is that, due to the ill-conditioning of H, the equations H*x+f=0 will have a non-unique set of solutions that all approximately minimize the quadprog objective. You simply cannot be sure which of these solutions different algorithms will produce, with such an ill-conditioned H.
As for why the residual is so much worse for quadprog, that is likely because quadprog is an iterative solver and cannot use the residual to decide when to stop iterating (because it doesn't have access to A and B). In fact, it almost certainly uses something like a stopping tolerance on the gradient norm,
gradnorm = norm(H*x+f)^2 < tolerance
Since H*x+f=2*A.'*(A*x-B), we can see that the gradient norm can be significantly smaller than the residual vector r= A*x-B depending on the values of A.'

  0 Comments

Sign in to comment.