Hey everyone,
I'm currently trying to get my head around how quadprog works in order to apply it to a problem I'm working on. Currently I'm looking to minimise a ridge regression problem, such that:
H = 2*(A'*A + lambda.*eye(size(A'*A)));
where A is an overdetermined matrix of dimensions [m,n], m > n. H is non-symmetric and has negative values; however, it is positive definite, as tested by:
%H must be positive definite to use quadprog; check p == 0:
[~,p] = chol(H);
and f = -2*A'*b, where b is an [m x 1] vector of measurements.
My confusion arises when reading the description of how quadprog works:
"x = quadprog(H,f) returns a vector x that minimizes1/2*x'*H*x + f'*x. The input H must be positive definite for the problem to have a finite minimum. If H is positive definite, then the solution x = H\(-f)."
Given H is positive definite, my expectation is then that x1 = quadprog(H,f) and x2 = H\(-f) would give identical solutions.
For my problem, quadprog converges on a solution (exit flag = 1) after 95580 iterations. However, the solution is very different from x2 = H\(-f). For my current setup I know the target norm for the solution: norm(x2) is very close to it (5.0703e6 compared to the true norm of 5e6), whilst norm(x1) = 2.1072e5, which is significantly smaller than required for a correct solution.
What's going on here? Am I misinterpreting the description for quadprog somehow?
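To make the expectation concrete, here is a small sketch (NumPy with synthetic data, not the actual problem; a crude gradient-descent loop stands in for quadprog's iterative solver) showing that for a well-conditioned symmetric positive definite H the iterative minimiser and H\(-f) do coincide:

```python
import numpy as np

# Synthetic, well-conditioned SPD H (not the poster's data), to check
# the documented equivalence quadprog(H,f) == H\(-f).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
H = M @ M.T + 5.0 * np.eye(5)        # symmetric positive definite by construction
f = rng.standard_normal(5)

x_direct = np.linalg.solve(H, -f)    # the closed form H\(-f)

# crude gradient descent on 1/2*x'Hx + f'x, standing in for quadprog
x = np.zeros(5)
step = 1.0 / np.linalg.norm(H, 2)    # step 1/L, L = largest eigenvalue
for _ in range(20000):
    x -= step * (H @ x + f)          # gradient of the objective is H*x + f

print(np.allclose(x, x_direct, atol=1e-8))  # True: same minimiser
```

With a condition number this small, both routes land on the same x to machine precision; the question is why that breaks down for the real problem.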

7 comments

What does cond(H) (or, for sparse H, condest(H)) return?
ADSW121365
ADSW121365 on 16 Aug 2019
I'm working in electromagnetism so the problem is naturally horribly ill-posed. For my current regularisation parameter cond(H) is 1.5248e8.
I know from linear testing at lower n that, using x = A\B, I can get an accurate solution x even for cond(A) = 4.1544e9.
However due to the ill-posed nature of the problem cond(A) becomes much larger as the number of elements I try to resolve increases, hence the introduction of a regularisation.
My understanding of the problem is then (and I'm happy to be corrected if wrong) that if cond(H) is similar to or smaller than the above, I would also expect a stable solution, with some smoothing due to the choice of regularisation.
The result from quadprog(H,f) is stable in that it uses the same number of iterations and reaches the same solution over multiple runs (I tested 10,000 runs). Similarly, H\(-f) is stable and throws up no warnings.
From the description in quadprog quoted above, I would expect the two to give identical results, yet quadprog gives me a result which is a significantly worse fit to the problem.
I'm hoping to use quadprog to add additional constraints to my problem; however, I'm first trying to ensure it works as I'd expect, which it currently does not.
Bruno Luong
Bruno Luong on 16 Aug 2019
Edited: Bruno Luong on 16 Aug 2019
Sorry: I wouldn't expect anything stable when the condition number is larger than 1e8. Converging after 95580 iterations can bring you to the moon.
If you don't trust that, then display the residual
norm(A*x - b)^2 + lambda * norm(x)^2
for 4 cases:
x arbitrary (e.g., zeros(n,1))
x = solution of quadprog
x = A\b
x = -H\f
I expect the last three to give much smaller residuals than the first one, despite being "different" to your liking.
I'm not quite sure I follow what you're saying. However, I will attach some example data I'm working with, and the associated code, to see if that clarifies the situation/my confusion:
clear,close all; clc; dbstop if error;
load('Atestmatrix.mat')
testobject = ones(size(A,2),1);
testobject(1:25,1) = 1e6;
nto = norm(testobject);
B = A*testobject(:);
cond(A)
%cond = 4.1544e9, solution here is near perfect with identical norm to 5dp
solution = A\B;
nsol1 = norm(solution);
%Given no noise is added, regularisation parameter chosen to be
%min(A'*A)*eps effectively giving zero regularisation for this test problem because I know the linear solution works
lambda = 5.3555e-33;
H = 2.*(A'*A + lambda.*eye(size(A'*A)));
f = -2.*A'*B;
%This solution does not change even over numerous runs, such that
%I believe it is stable:
solution2 = H\(-f);
nsol2 = norm(solution2);
%Elements 1:25 approx 1e6 as expected
%abs(Elements 26:50 ) <= 1.26e4 - some blurring
%abs(Elements 51:75 ) <= 266 - some blurring again
%The aim of the problem is to identify the object with values 1e6 from the
%background of 1s. The solution here has a similar norm to testobject so is
%a reasonable solution.
%I would then expect, given the description the following to give an
%identical result:
[solution3,fval,exitflag,output,lambdaoutput] = quadprog(H,f);
%Exitflag = 0 ==> does not converge in the given iterations - output isn't
%worth considering
options = optimset('Display', 'final-detailed','LargeScale', 'off','MaxIter',1000000);
[solution4,fval2,exitflag2,output2,lambda2] = quadprog(H,f,[],[],[],[],[],[],[],options);
%Exitflag = 1 ==> converged on a solution within default tolerances
nsol4 = norm(solution4);
%nsol4 = 2.1072e5, significantly less than the result in solution 2 despite
%the description suggesting these should be identical
%The solution also suggests the object to smoothly span elements 26:75 at
%around 1e4 and having values <1e3 in elements 1:25 which is not at all
%representative of the solution.
I know that in a realistic situation noise is added to B, which for the current condition number is very ill-posed and will therefore give a terrible solution. Hence the wish to use quadprog with some regularisation term and potentially further constraints on the solution. However, as detailed above, quadprog's output after 'converging' by its own definition is very different from the one its own description suggests I should get.
You don't need quadprog; for the unconstrained case you can do
x = [A; lambda*speye(size(A,2))] \ [b; zeros(size(A,2),1)]
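As a sketch of that stacked least-squares formulation (NumPy, synthetic data in place of the attached A and b; note that reproducing the penalty lambda*norm(x)^2 exactly requires sqrt(lambda) on the identity block, since the stacked rows enter the squared residual):

```python
import numpy as np

# Synthetic stand-ins for A and b.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
lam = 1e-3

# ridge normal equations: (A'A + lam*I) x = A'b
x_normal = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ b)

# stacked form; sqrt(lam) makes the extra rows contribute
# exactly lam*||x||^2 to the squared residual
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(10)])
b_aug = np.concatenate([b, np.zeros(10)])
x_stacked, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

print(np.allclose(x_normal, x_stacked))  # True: the two routes agree
```

The stacked route is also numerically preferable here, because it works with cond(A) rather than cond(A'A) = cond(A)^2.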
I requested the four residual values; could you post them?
I know I don't need quadprog, the issue I'm having is that quadprog doesn't seem to be doing what the description says it should: x = quadprog(H,f) = H \ (-f).
As for the residuals:
clear,close all; clc; dbstop if error;
load('Atestmatrix.mat')
testobject = ones(size(A,2),1);
testobject(1:25,1) = 1e6;
nto = norm(testobject);
B = A*testobject(:);
cond(A)
%cond = 4.1544e9, solution here is near perfect with identical norm to 5dp
solution = A\B;
nsol1 = norm(solution);
resid1 = norm(A*solution - B)^2 + 0 * norm(solution)^2;
%Given no noise is added, regularisation parameter chosen to be
%min(A'*A)*eps effectively giving zero regularisation for this test problem
lambda = 5.3555e-33;
H = 2.*(A'*A + lambda.*eye(size(A'*A)));
f = -2.*A'*B;
%This solution does not change even over numerous runs, such that
%I believe it is stable:
solution2 = H\(-f);
nsol2 = norm(solution2);
resid2 = norm(A*solution2 - B)^2 + lambda * norm(solution2)^2;
%Elements 1:25 approx 1e6 as expected
%abs(Elements 26:50 ) <= 1.26e4 - some blurring
%abs(Elements 51:75 ) <= 266 - some blurring again
%The aim of the problem is to identify the object with values 1e6 from the
%background of 1s. The solution here has a similar norm to testobject so is
%a reasonable solution.
%I would then expect, given the description the following to give an
%identical result:
[solution3,fval,exitflag,output,lambdaoutput] = quadprog(H,f);
%Exitflag = 0 ==> does not converge in the given iterations - output isn't
%worth considering
options = optimset('Display', 'final-detailed','LargeScale', 'off','MaxIter',1000000);
[solution4,fval2,exitflag2,output2,lambdaoutput2] = quadprog(H,f,[],[],[],[],[],[],[],options);
%Exitflag = 1 ==> converged on a solution within default tolerances
nsol4 = norm(solution4);
%nsol4 = 2.1072e5, significantly less than the result in solution 2 despite
%the description suggesting these should be identical
resid4 = norm(A*solution4 - B)^2 + lambda * norm(solution4)^2;
%The solution also suggests the object to smoothly span elements 26:75 at
%around 1e4 and having values <1e3 in elements 1:25 which is not at all
%representative of the solution.
resid1 = 2.3595e-32;
resid2 = 6.0411e-18;
resid4 = 2.6998e-4;
As you can see (and check with the code) the 'converged solution' from quadprog is significantly less accurate compared to the direct route H\(-f).
I'm unsure as to why H\(-f) ~= quadprog(H,f).
Bruno Luong
Bruno Luong on 16 Aug 2019
Edited: Bruno Luong on 16 Aug 2019
Your A is practically singular (you might have missed adding a boundary condition, or similar, when you discretised your problem).
Your regularisation is too small to have any effect.
The two combined lead to the "strange" things that you observe.
Remember: the condition number plays two roles:
  • sensitivity of the solution to noise, including numerical round-off noise
  • poor convergence of the iterative (conjugate gradient) methods that quadprog is based on
Any system with condition number >= 1e6 is practically singular.
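A minimal sketch of the first role, with made-up numbers rather than the poster's data: a condition number around 1e10 lets a tiny perturbation of b grow by that same factor in the solution.

```python
import numpy as np

# Illustration with synthetic numbers: a system with cond(A) = 1e10
# amplifies a tiny perturbation of the right-hand side enormously.
A = np.diag([1.0, 1e-10])
x_true = np.array([1.0, 1.0])
b = A @ x_true

x1 = np.linalg.solve(A, b)                            # clean data
x2 = np.linalg.solve(A, b + np.array([0.0, 1e-12]))   # b nudged by 1e-12

print(np.linalg.norm(x2 - x1))  # ~0.01: the 1e-12 nudge grew by ~1e10
```

Round-off noise of order eps plays the same role as the explicit nudge here, which is why results diverge even without added measurement noise.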
clear
caseid = 1; % or 2
switch caseid
case 1
load('Atestmatrix.mat');
testobject = zeros(size(A,2),1);
testobject(1:25,1) = 1;
case 2
A = rand(50,10);
testobject = rand(size(A,2),1);
end
B = A*testobject(:);
lambda = norm(A)*1e-6;
f = -A'*B;
resfun = @(x) norm(A*x - B)^2 + lambda * norm(x)^2;
%% backslash on A
backslash_x = [A; lambda*speye(size(A,2))] \ [B; zeros(size(A,2),1)];
backslash_residu = resfun(backslash_x);
%% backslash on A'*A
H = (A'*A + lambda.*eye(size(A'*A)));
Hdivided_x = H\(-f);
Hdivided_residu = resfun(Hdivided_x);
%% quadprog
options = optimset('Display', 'final-detailed','LargeScale', 'off','MaxIter',1000000);
[quadprog_x,fval2,exitflag2,output2,lambdaoutput2] = quadprog(H,f,[],[],[],[],[],[],[],options);
qp_residu = resfun(quadprog_x);
close all
h1 = plot(testobject);
hold on
h2 = plot(backslash_x);
h3 = plot(Hdivided_x);
h4 = plot(quadprog_x);
legend([h1 h2 h3 h4],'exact','backslash','Hdivided','quadprog');
resid0 = resfun(testobject) % 3.0871e-11
backslash_residu % 2.5272e-11
Hdivided_residu % 3.3919e-14
qp_residu % 7.5741e-14


Accepted Answer

Matt J
Matt J on 16 Aug 2019
Edited: Matt J on 16 Aug 2019

0 votes

Using your posted A matrix I find that
>> cond(H)
ans =
7.5703e+18
For numerical purposes, this is a singular matrix and shouldn't be treated as positive definite (but rather positive semi-definite). The test you did with chol() isn't reproducible,
>> [~,p] = chol(H); p
p =
1
but even if p had been 0, it would be irrelevant. chol's judgement on whether a matrix is sufficiently positive definite to compute a Cholesky decomposition does not mean it is sufficiently positive definite for everything.
For quadprog, the only thing that matters here is that, due to the ill-conditioning of H, the equations H*x+f=0 will have a non-unique set of solutions that all approximately minimize the quadprog objective. You simply cannot be sure which of these solutions different algorithms will produce, with such an ill-conditioned H.
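A tiny synthetic illustration of this non-uniqueness: when H has a near-zero eigenvalue, the objective is almost flat along the corresponding direction, so wildly different x values give essentially the same objective value, and any of them can pass a solver's optimality test.

```python
import numpy as np

# H with a near-zero eigenvalue (synthetic, not the poster's H).
H = np.diag([2.0, 2e-16])
f = np.array([-2.0, 0.0])
obj = lambda x: 0.5 * x @ H @ x + f @ x   # the quadprog objective

x_a = np.array([1.0, 0.0])          # the H\(-f) minimiser
x_b = x_a + np.array([0.0, 1e4])    # a wildly different candidate

print(obj(x_b) - obj(x_a))  # ~1e-8: essentially the same objective value
```

Note that x_a and x_b differ in norm by 1e4 even though the objective cannot tell them apart, mirroring the norm discrepancy between solution2 and solution4 above.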
As for why the residual is so much worse for quadprog, that is likely because quadprog is an iterative solver and cannot use the residual to decide when to stop iterating (because it doesn't have access to A and B). In fact, it almost certainly uses something like a stopping tolerance on the gradient norm,
gradnorm = norm(H*x+f)^2 < tolerance
Since H*x+f = 2*A.'*(A*x-B), we can see that the gradient norm can be significantly smaller than the norm of the residual vector r = A*x-B, depending on the singular values of A.
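A small synthetic example of that gap: when A has a tiny singular value, the gradient 2*A.'*(A*x-B) can be negligible while the residual itself is large, so a gradient-based stopping test accepts a point with a poor residual.

```python
import numpy as np

# Synthetic 2x2 example: A has a tiny singular value (1e-8),
# mimicking the ill-conditioning in the thread.
A = np.diag([1.0, 1e-8])
B = np.array([0.0, 1.0])

x = np.zeros(2)           # a candidate a gradient-based test would accept
r = A @ x - B             # residual vector
grad = 2 * A.T @ r        # gradient of ||A*x - B||^2, i.e. H*x + f

print(np.linalg.norm(r))     # 1.0   -- residual is large
print(np.linalg.norm(grad))  # 2e-8  -- gradient is tiny
```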

More Answers (0)
