Backslash does not provided the solution with the smallest 2-norm
Afficher commentaires plus anciens
I was debugging a constraint solver when I encountered a problem with the backslash operator in MATLAB.
I have isolated the problem into the attached minimal working example. The issue occurs when solving a consistent but under-determined linear system Ax=b. In this case, we expect that MATLAB returns the solution x which has the smallest 2-norm, right? I believe that the attached script demonstrates that this is not the case and that the computed result depends on whether A is treated as a sparse or a dense matrix.
Would you please take a look and see if I have missed something or if this is indeed an issue that should be fixed?
Kind regards
Carl Christian K. Mikkelsen.
% The backslash operator and underdetermined linear systems
%
% The result of A\b depends on whether A is dense or sparse and
% the computed results do not minimize the 2-norm over the set
% of solutions of Ax=b.
%
% PROGRAMMING by Carl Christian Kjelgaard Mikkelsen (spock@cs.umu.se)
% 2022-08-04 Initial programming and testing
% Define a wide matrix
A=[1 1 0; 0 1 1];
% Define a solution
t=(1:3)';
% Define the right hand side so that Ax=b is consistent
b=A*t;
% Solve the underdetermined linear equation Ax=b
% using a dense representation of A
x=A\b;
% Solve the underdetermined linear equation Ax=b
% using a sparse representation of A
y=sparse(A)\b;
% Solve the underdetermined linear equation Ax=b
% in the linear least squares sense
z=A'*((A*A')\b);
% Compute the residuals
rx=b-A*x;
ry=b-A*y;
rz=b-A*z;
% Compute the 2-norm of the solutions
nx=norm(x);
ny=norm(y);
nz=norm(z);
% Display the results
fprintf('The solutions\n');
display([x y z]);
fprintf('The residuals\n');
display([rx ry rz]);
fprintf('The norm of the solutions\n');
display([nx ny nz]);
15 commentaires
Steven Lord
le 4 Août 2022
In this case, we expect that MATLAB returns the solution x which has the smallest 2-norm, right?
Can you state what led you to this expectation? If there was a specific documentation page or some other webpage that stated this, could you link to that page here?
Carl Christian Kjelgaard Mikkelsen
le 4 Août 2022
In the case of a consistent and underdetermined linear system, there are solutions and it makes no sense to minimize the residual, rather it is the norm of the solution which is minimized.
What about
x + y + z = 1
-(x + y + z) = -2
?
Underdetermined, but no solution that makes the residual = 0. And who can see for a large wide system if it is consistent ... So it makes sense in all cases (tall,wide and square) to use the terminology "in the least-squares sense" to mean a solution that minimizes the residual.
Carl Christian Kjelgaard Mikkelsen
le 5 Août 2022
Torsten
le 5 Août 2022
You are always talking about your application. I was talking about a senseful definition of "least-squares solution for underdetermined linear systems" in general. And I think that the MATLAB definition of a solution with minimum residual for all classes of linear problems (wide,tall,square) is the only senseful one.
I agree with @Carl Christian Kjelgaard Mikkelsen that help and the documentation should be clarified. The help states
s = split(string(help('mldvide')),'.');
s = strtrim(s(8))
Note the phrase "the solution."
"
- If A is a rectangular m-by-n matrix with m ~= n, and B is a matrix with m rows, then A\B returns a least-squares solution to the system of equations A*x= B."
Note the phrase "a ... solution."
So if reading the former, it's reasonable to expect that mldivide returns "the [unique] solution" and if the latter it returns "a ... solution [of many]."
As has been pointed out, if A is not square, then there can be many solutions that yield the same, minimum 2-norm of the residual. So, IMO, the help should be changed to not refer to "the solution."
What would be the harm in modifying the doc page (and the help I suppose) to read something like "... then A\B returns a least-squares solution to the system of equations A*x= B, i.e., a solution that mininizes norm(A*x - b)." Then there would be no confusion between least-squares and minimum norm solutions.
It would be even better to also explain that in some cases such a solution is unique and in others it isn't, and for the latter cases explain which criteria is used to return the result, unless such logic is too dificult to explain when considering all of the possible paths that mldivide can take.
Bruno Luong
le 5 Août 2022
Modifié(e) : Bruno Luong
le 5 Août 2022
@Paul "it's reasonable to expect that mldivide returns "the [unique] solution"
I don't know well english but one cannot use "the" to design one of many existing instants?
Let me give an exemple:
Can Sandy points a finger to a specific car she saw and speaks to her friend Bobby "look at the car!"? Or does she have to say "look at a car!'? To me "the" means a "the [specific]" and not exclusively used for "the [unique]".
Beside if users who reads TMW doc think "the solution" implies unique, they must wonder about "underdetermined system" specifically written in the doc next to "the soluion" and means just the opposite of "unique".
Paul
le 5 Août 2022
You are correct, I shouldn't have used "unique." According to my dictionary "the" is a definite article to "indicate that a following noun or noun equivalent is definite or has been previously specified by context or by circumstance." By pointing her finger, Sandy is making clear by circumstance the specific car. On the other hand, it would be quite confusing if Sandy said "look at the car" w/o any other indication if there are many cars to see.
If stating "the solution" it should be clear to the reader which solution is the referent, which might or might not be a unique solution, but at least we'll know which solution we're talking about.
In this case, there can, in general, be many least-squares solutions, so refering to "the (define) least-squares solution" w/o any additional context is not correct, IMO.
OTOH, the indefinite article "a" means the referent is uspecified. Sandy pointed at a car. I don't know which car she pointed to. mldivide returns a least-squares solution. I don't know which one it returns, just that whetever it returns is a least-squares solution.
Though I woudn't use the minimum norm solution synonymously with least-square solution, apparently others do as shown by @Carl Christian Kjelgaard Mikkelsen in this commentand also here. So if one were to (correctly IMO) read "the solution" as referring to a particular solution, it makes a lot of sense to think it's refering to the unique, minimum norm solution for the underdetermined problem.
And now I realzie that @Carl Christian Kjelgaard Mikkelsen already pointed out the "a" / "the" issue in this comment. I just should have responded there.
Bruno Luong
le 5 Août 2022
Modifié(e) : Bruno Luong
le 5 Août 2022
"the (define) least-squares solution" w/o any additional context is not correct, IMO.
The context is simply this: "The solution" designates "A\b" where the page of document is focus on.
Next time when you read other document pages, such as eig, when you see:
"[V,D] = eig(A) returns diagonal matrix D of eigenvalues and matrix V whose columns are the corresponding right eigenvectors, so that A*V = V*D"
Please don't ever believe that TMW implies eigenvectors are unique.
Paul
le 5 Août 2022
If I didn't know better, I might very well believe the eigenvectors in V are unique. But I do know better, so I do believe that the documentation writers are not being precise, or perhaps not even correct, with the language. That statement should delete "the."
Bruno Luong
le 5 Août 2022
Modifié(e) : Bruno Luong
le 5 Août 2022
Or from your own words: the "the" is a definite article to "indicate that a following noun or noun equivalent is definite or has been previously specified by context or by circumstance.", which is the first output argument of [V,D] = EIG. Has you consider that?
I can probably find out the word "THE" used that way in a tone of other document pages. Might be they are all wrong as you know better.
Paul
le 5 Août 2022
I did consider that. To me, that interpretation sounds like
Sandy: "The function eig returns a diagonal matrix D of eigenvalues and matrix V whose columns are the corresponding right eigenvectors, so that A*V = V*D"
Bob: "Well, there are many, many eigenvectors that correspond to each eigenvalue. Which specific eigenvectors are actually returned?"
Sandy: "The specific eigenvectors that are the ouput of eig."
Perhaps I misunderstand what I should have considered.
Would you agree that the doc page could just have accurately said: "... the matriix V whose columns are one set of many possible sets of right eigenvectors that satisfy A*V = V*D." ?
The bottom line for mldivide is that "a solution" and "the solution" are not gramatically equivalent. So either the doc page or the help should be modified. They can't both be correct. For me, it's pretty clear that the help should change.
Bruno Luong
le 5 Août 2022
Modifié(e) : Bruno Luong
le 5 Août 2022
They could, but to me at all cost writing a superbe precise statement, one lost the main substance of the document. The user who works with eigen vectors MUST know they are defined up to a non-zero scaling, TMW is not responsible to remind that fact. The purpose of the function document is to describe what is the output argument it returns. Adding "one of many possible bla bla bla" just make the sentense long and does add any useful information. That is just my point of view you might disagree.
Bruno Luong
le 5 Août 2022
"a solution" and "the solution" are not gramatically equivalent."
Granted, but they are not exclusive.
Given that
- "the solution" (in a proper context) is "a solution".
They can perfectly both used, without any contradiction. So I don't see the reason they cannot be coexist.
Les Beckham
le 5 Août 2022
Holy cow, guys. This has pretty much degenerated into "that depends on what the definition of "is" is". Let it go, already. Have a good night. :)
Réponse acceptée
Plus de réponses (2)
Bruno Luong
le 4 Août 2022
Modifié(e) : Bruno Luong
le 4 Août 2022
Few other methods to get least-norm solution
A = rand(3,6)
b = rand(3,1)
% Methode 1, Christine, recommended
x = lsqminnorm(A,b)
% Method 2, Pseudo inverse
x = pinv(A)*b
% Method 3, BS on KKT
[m,n] = size(A);
y=[eye(n), A'; A, zeros(m)] \ [zeros(n,1); b];
x = y(1:n)
% Method 4, QR on A'
[Q,R,p] = qr(A',0);
x = Q*(R'\b(p,:))
2 commentaires
Carl Christian Kjelgaard Mikkelsen
le 4 Août 2022
Bruno Luong
le 4 Août 2022
Modifié(e) : Bruno Luong
le 4 Août 2022
"Please note that my issue is not the problem of computing the least squares solution"
I think you need to reset the definition of what is "least squares solution". I guess all the confusion is that you associate with minimum norm solusion, which is not in general.
I know, and guess most of people working sometime with MATLAB also know it.
John D'Errico
le 5 Août 2022
Modifié(e) : John D'Errico
le 5 Août 2022
It seems the gist of your question comes down to the idea that for a wide matrix, MATLAB should (in your eyes only) always produce a solution that minimizes the norm of x. For square matrices, or matrices that are taller than wide, it should do something else? That alone seems highly inconsistent to me.
So first, where have you ever seen the claim that this is true? No place in the documentation is this claimed. I even took a quick look. Maybe you have confused the idea of a solution that minimizes the norm of the residiuals to a solution that minimizes the norm of x itself. We can't really know where you got the idea.
Should something be fixed? Of course not! MATLAB already provides multiple solutions that yield the minimum norm of x when they are used.
A = [1 2 3;4 5 7];
b = [2;3];
A\b
So backslash produces a basic solution, as expected for the underdetermined case, with one element exactly zero. For a solution that is minimum norm in x, we already have:
lsqminnorm(A,b)
pinv(A)*b
lsqr(A,b)
5 commentaires
Carl Christian Kjelgaard Mikkelsen
le 5 Août 2022
Bruno Luong
le 5 Août 2022
Modifié(e) : Bruno Luong
le 5 Août 2022
@Carl Christian Kjelgaard Mikkelsen " I have explained elsewhere on this page how the current documentation can be read to support my interpretation "
Not really, or not at all period, because it implies a big incoherent about the sentence on the warning of eventual different between "\" and "PINV" that you completely miss (or denies) when you read it.
I think beside you no one here read like you. That must count for something right?
In the doc there are flowcharts of the algorithms used for each matrix types, dense and sparse, please scroll to the bottom just above the bibliography references. Wasn't that enough for you?
Paul
le 5 Août 2022
The flowchart says "Use QR solver" for both full and sparse, non-square. Wtih that exact same wording, I think it's reasonable for users like me who have no expertise in the mechanics of such solvers to expect the same result (to within floating point roundoff stuff) for both sparse and full cases. There would be no harm in addiing a line at the bottom of the flowcharts alerting the user to not expect the same results for sparse and full, at least for the non-square path (I didn't look at all of the others).
John D'Errico
le 6 Août 2022
The only place I can see where you claim the documentation says the solution is a minimum norm solution is here:
"The key is the use of the article "the" as in "the solution". It implies uniqueness. Therefore, the reader will believe that x=A\b minimizes 2-norm of the residual when A is tall matrix of full column rank and that x=A\b has the smallest 2-norm of all the solutions of Ax=b when A is a wide matrix of full row rank. Why? Because these two problems connected to the 2-norm have unique solutions."
NO place in the documentation for backslash has it said anything about minimum norm for x. You have made a jump to an unfounded conclusion from nothing more than their use of the word "THE", to decide that this implies your version of what you want to be true.
The problem is, there are certainly other ways to choose a UNIQUE solution. For example, why assume a minimum 2-norm?
And to jump to a conclusion from the inclusion of a three letter word in the documentation is a bit strange. Surely if they were going to SPECIFICALLY do something different for one shape matrix than for another shape matrix, they would have said something more than just an obique reference with the word "the"?
We see this line:
"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B"
The phrase "In the least squares sense" means a solution that minimizes norm(A*X-b). It says nothing about norm(x).
Honestly, I think you are grasping at straws here, based only on the use of the word "the".
Bruno Luong
le 6 Août 2022
" grasping at straws"
I disagree, he only grasping at the straw. ;-)
Catégories
En savoir plus sur Mathematics and Optimization dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

