Backslash does not provided the solution with the smallest 2-norm

I was debugging a constraint solver when I encountered a problem with the backslash operator in MATLAB.
I have isolated the problem into the attached minimal working example. The issue occurs when solving a consistent but under-determined linear system Ax=b. In this case, we expect that MATLAB returns the solution x which has the smallest 2-norm, right? I believe that the attached script demonstrates that this is not the case and that the computed result depends on whether A is treated as a sparse or a dense matrix.
Would you please take a look and see if I have missed something or if this is indeed an issue that should be fixed?
Kind regards
Carl Christian K. Mikkelsen.
% The backslash operator and underdetermined linear systems
%
% The result of A\b depends on whether A is dense or sparse and
% the computed results do not minimize the 2-norm over the set
% of solutions of Ax=b.
%
% PROGRAMMING by Carl Christian Kjelgaard Mikkelsen (spock@cs.umu.se)
% 2022-08-04 Initial programming and testing
% Define a wide matrix
A=[1 1 0; 0 1 1];
% Define a solution
t=(1:3)';
% Define the right hand side so that Ax=b is consistent
b=A*t;
% Solve the underdetermined linear equation Ax=b
% using a dense representation of A
x=A\b;
% Solve the underdetermined linear equation Ax=b
% using a sparse representation of A
y=sparse(A)\b;
% Solve the underdetermined linear equation Ax=b
% in the linear least squares sense
z=A'*((A*A')\b);
% Compute the residuals
rx=b-A*x;
ry=b-A*y;
rz=b-A*z;
% Compute the 2-norm of the solutions
nx=norm(x);
ny=norm(y);
nz=norm(z);
% Display the results
fprintf('The solutions\n');
The solutions
display([x y z]);
-2.0000 3.0000 0.3333 5.0000 0 2.6667 0 5.0000 2.3333
fprintf('The residuals\n');
The residuals
display([rx ry rz]);
1.0e-14 * 0.1776 0 0.0444 0.1776 0 -0.1776
fprintf('The norm of the solutions\n');
The norm of the solutions
display([nx ny nz]);
5.3852 5.8310 3.5590

15 commentaires

In this case, we expect that MATLAB returns the solution x which has the smallest 2-norm, right?
Can you state what led you to this expectation? If there was a specific documentation page or some other webpage that stated this, could you link to that page here?
@Steven Lord Thank your for your response. I have discussed this point elsewhere on this page, but I shall develop the point further right here.
Some authors will carefully distinguish between the solving a tall linear system in the least squares sense and computing the least norm solution of a wide linear systems. However, it is quite common to use the term "least squares solution" to cover both case. In the case of a consistent and underdetermined linear system, there are solutions and it makes no sense to minimize the residual, rather it is the norm of the solution which is minimized.
Therefore, when a user reads the text provided by the command "help \" there is only one way to interpret the line:
"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B"
The key is the use of the article "the" as in "the solution". It implies uniqueness. Therefore, the reader will believe that x=A\b minimizes 2-norm of the residual when A is tall matrix of full column rank and that x=A\b has the smallest 2-norm of all the solutions of Ax=b when A is a wide matrix of full row rank. Why? Because these two problems connected to the 2-norm have unique solutions.
Torsten
Torsten le 4 Août 2022
Modifié(e) : Torsten le 4 Août 2022
In the case of a consistent and underdetermined linear system, there are solutions and it makes no sense to minimize the residual, rather it is the norm of the solution which is minimized.
What about
x + y + z = 1
-(x + y + z) = -2
?
Underdetermined, but no solution that makes the residual = 0. And who can see for a large wide system if it is consistent ... So it makes sense in all cases (tall,wide and square) to use the terminology "in the least-squares sense" to mean a solution that minimizes the residual.
@Torsten Thank you for your input. I am dealing with a physics application for which the set of inconsistent problems have measure zero and the smallest 2-norm solution is required.
You are always talking about your application. I was talking about a senseful definition of "least-squares solution for underdetermined linear systems" in general. And I think that the MATLAB definition of a solution with minimum residual for all classes of linear problems (wide,tall,square) is the only senseful one.
I agree with @Carl Christian Kjelgaard Mikkelsen that help and the documentation should be clarified. The help states
s = split(string(help('mldvide')),'.');
s = strtrim(s(8))
s =
"If A is an M-by-N rectangular matrix with M~=N and B is a column vector with M components, or a matrix with several such columns, then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B"
Note the phrase "the solution."
However, the doc page mldivide, \ states:
"
  • If A is a rectangular m-by-n matrix with m ~= n, and B is a matrix with m rows, then A\B returns a least-squares solution to the system of equations A*x= B."
Note the phrase "a ... solution."
So if reading the former, it's reasonable to expect that mldivide returns "the [unique] solution" and if the latter it returns "a ... solution [of many]."
As has been pointed out, if A is not square, then there can be many solutions that yield the same, minimum 2-norm of the residual. So, IMO, the help should be changed to not refer to "the solution."
What would be the harm in modifying the doc page (and the help I suppose) to read something like "... then A\B returns a least-squares solution to the system of equations A*x= B, i.e., a solution that mininizes norm(A*x - b)." Then there would be no confusion between least-squares and minimum norm solutions.
It would be even better to also explain that in some cases such a solution is unique and in others it isn't, and for the latter cases explain which criteria is used to return the result, unless such logic is too dificult to explain when considering all of the possible paths that mldivide can take.
Bruno Luong
Bruno Luong le 5 Août 2022
Modifié(e) : Bruno Luong le 5 Août 2022
@Paul "it's reasonable to expect that mldivide returns "the [unique] solution"
I don't know well english but one cannot use "the" to design one of many existing instants?
Let me give an exemple:
Can Sandy points a finger to a specific car she saw and speaks to her friend Bobby "look at the car!"? Or does she have to say "look at a car!'? To me "the" means a "the [specific]" and not exclusively used for "the [unique]".
Beside if users who reads TMW doc think "the solution" implies unique, they must wonder about "underdetermined system" specifically written in the doc next to "the soluion" and means just the opposite of "unique".
You are correct, I shouldn't have used "unique." According to my dictionary "the" is a definite article to "indicate that a following noun or noun equivalent is definite or has been previously specified by context or by circumstance." By pointing her finger, Sandy is making clear by circumstance the specific car. On the other hand, it would be quite confusing if Sandy said "look at the car" w/o any other indication if there are many cars to see.
If stating "the solution" it should be clear to the reader which solution is the referent, which might or might not be a unique solution, but at least we'll know which solution we're talking about.
In this case, there can, in general, be many least-squares solutions, so refering to "the (define) least-squares solution" w/o any additional context is not correct, IMO.
OTOH, the indefinite article "a" means the referent is uspecified. Sandy pointed at a car. I don't know which car she pointed to. mldivide returns a least-squares solution. I don't know which one it returns, just that whetever it returns is a least-squares solution.
Though I woudn't use the minimum norm solution synonymously with least-square solution, apparently others do as shown by @Carl Christian Kjelgaard Mikkelsen in this commentand also here. So if one were to (correctly IMO) read "the solution" as referring to a particular solution, it makes a lot of sense to think it's refering to the unique, minimum norm solution for the underdetermined problem.
And now I realzie that @Carl Christian Kjelgaard Mikkelsen already pointed out the "a" / "the" issue in this comment. I just should have responded there.
Bruno Luong
Bruno Luong le 5 Août 2022
Modifié(e) : Bruno Luong le 5 Août 2022
"the (define) least-squares solution" w/o any additional context is not correct, IMO.
The context is simply this: "The solution" designates "A\b" where the page of document is focus on.
Next time when you read other document pages, such as eig, when you see:
"[V,D] = eig(A) returns diagonal matrix D of eigenvalues and matrix V whose columns are the corresponding right eigenvectors, so that A*V = V*D"
Please don't ever believe that TMW implies eigenvectors are unique.
If I didn't know better, I might very well believe the eigenvectors in V are unique. But I do know better, so I do believe that the documentation writers are not being precise, or perhaps not even correct, with the language. That statement should delete "the."
Bruno Luong
Bruno Luong le 5 Août 2022
Modifié(e) : Bruno Luong le 5 Août 2022
Or from your own words: the "the" is a definite article to "indicate that a following noun or noun equivalent is definite or has been previously specified by context or by circumstance.", which is the first output argument of [V,D] = EIG. Has you consider that?
I can probably find out the word "THE" used that way in a tone of other document pages. Might be they are all wrong as you know better.
I did consider that. To me, that interpretation sounds like
Sandy: "The function eig returns a diagonal matrix D of eigenvalues and matrix V whose columns are the corresponding right eigenvectors, so that A*V = V*D"
Bob: "Well, there are many, many eigenvectors that correspond to each eigenvalue. Which specific eigenvectors are actually returned?"
Sandy: "The specific eigenvectors that are the ouput of eig."
Perhaps I misunderstand what I should have considered.
Would you agree that the doc page could just have accurately said: "... the matriix V whose columns are one set of many possible sets of right eigenvectors that satisfy A*V = V*D." ?
The bottom line for mldivide is that "a solution" and "the solution" are not gramatically equivalent. So either the doc page or the help should be modified. They can't both be correct. For me, it's pretty clear that the help should change.
Bruno Luong
Bruno Luong le 5 Août 2022
Modifié(e) : Bruno Luong le 5 Août 2022
They could, but to me at all cost writing a superbe precise statement, one lost the main substance of the document. The user who works with eigen vectors MUST know they are defined up to a non-zero scaling, TMW is not responsible to remind that fact. The purpose of the function document is to describe what is the output argument it returns. Adding "one of many possible bla bla bla" just make the sentense long and does add any useful information. That is just my point of view you might disagree.
"a solution" and "the solution" are not gramatically equivalent."
Granted, but they are not exclusive.
Given that
  • "the solution" (in a proper context) is "a solution".
They can perfectly both used, without any contradiction. So I don't see the reason they cannot be coexist.
Holy cow, guys. This has pretty much degenerated into "that depends on what the definition of "is" is". Let it go, already. Have a good night. :)

Connectez-vous pour commenter.

 Réponse acceptée

Matt J
Matt J le 4 Août 2022
Modifié(e) : Matt J le 4 Août 2022
In this case, we expect that MATLAB returns the solution x which has the smallest 2-norm, right?
No, the backslash operator will use a QR-solver to produce a solution in that case. This won't necessarily be the least 2-norm solution.

12 commentaires

There is a function lsqminnorm which returns the solution x with smallest 2-norm. As Matt says, backslash just returns a solution x (the one that's cheapest to compute).
Firstly, I would like to thank you both for reading and responding to my question.
While I am grateful for references to the various functions that can be used to compute the least squares solution of an underdetermined linear system, this was never an issue for me.
However, two issues remain:
Firstly, the computed result depends on the representation of the matrix. We get one result if we treat the matrix as a dense matrix and we get another result which is completely different if we treat the matrix as a sparse matrix. Is this really the intended behavior? If the answer is yes, then this should be made very clear in the documentation.
Secondly, the documentation does not reflect the reality of the software. I quote from the website: https://se.mathworks.com/help/matlab/ref/mldivide.html, specifically the line:
"If A is a rectangular m-by-n matrix with m~=n, and B is a matrix with m rows, then A\B is a least-squares solution to the system of equations A*x=B."
The website is correct in the sense that A\B is a solution, but as I have demonstrated it is not the least squares solution. I also quote from the internal documentation from MATLAB 2020b, i.e., the command "help \":
"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B. The effective rank, K, of A is determined from the QR decomposition with pivoting. A solution X is computed which has at most K nonzero components per column. If K < N this will usually not be the same solution as PINV(A)*B. A\EYE(SIZE(A)) produces a generalized inverse of A."
The internal documentation is wrong because as I have demonstrated, A\B is not the least squares solution. Moreover, the first and the third sentence of the internal documentation appear to contradict each other.
Least squares solution and least squares solution with minimum norm are two different things.
Bruno Luong
Bruno Luong le 4 Août 2022
Modifié(e) : Bruno Luong le 4 Août 2022
As I understood, least square solution means the square-norm of the residu is minimal
|A*x - b|
Not necessary the solution with minimal norm |x| in case (K < N) where multiple solutions exist.
Once you know, the doc can be interpret coherently. I must admit that sometime TMW doc is not rigorously written.
We get one result if we treat the matrix as a dense matrix and we get another result which is completely different if we treat the matrix as a sparse matrix. Is this really the intended behavior? If the answer is yes, then this should be made very clear in the documentation.
Even if you had agreement between dense-type and sparse-type, you could not count on getting the same result on different computer architectures.
When a continuuum of solutions exists to any minimization problem, then the solution is unstable. It is never possible to guarantee which one you will get from a numerical routine. This is a general fact from numerical analysis. See below how much the solution can change when we add small random perturbations to A and b in your proposed example:
% Define a wide matrix
A=[1 1 0; 0 1 1];
% Define a solution
t=(1:3)';
% Define the right hand side so that Ax=b is consistent
b=A*t
b = 2×1
3 5
t1=A\b %solution 1
t1 = 3×1
-2.0000 5.0000 0
noise=@(z) z+0.0001*randn(size(z));
An=noise(A), bn=noise(b), %add noise
An = 2×3
0.9999 1.0001 -0.0002 0.0000 1.0000 1.0000
bn = 2×1
2.9999 5.0003
t2=An\bn %noisy solution
t2 = 3×1
0 2.9999 2.0003
Thank you all for taking the time to respond. I shall reply to each individual.
@Torsten You are certainly correct that the least squares solution of a tall linear system is distinctly different from the least norm solution of a wide linear system. However, it is quite common to use the term least squares solution to refer to both situations. When I read the internal documentation I took note of the use of definite article "the" as in "the solution", a terminology which always implies uniqueness. I therefore assumed that A\b would minimize the 2-norm of the residual when the matrix is tall and minimize the 2-norm of the solution when the matrix is wide.
@Bruno Luong In my case of a consistent underdetermined linear system the residual is zero. It is therefore natural that the reader assumes that the term "the solution in the least squares sense" refers to problem of finding the solution with the smallest 2-norm. It is very kind of you to admit that the documentation is not always rigourously written. We now know how to improve this specific section.
@Matt J Thank you for demonstrating that MATLAB's backslash operator is unstable for this use case. The algorithm that I use, i.e. the expression z=A'(A*A')\b, does not suffer from this problem as the central matrix is exceedingly well-conditioned for the example that I have provided.
Matt J
Matt J le 4 Août 2022
Modifié(e) : Matt J le 4 Août 2022
The algorithm that I use, i.e. the expression z=A'(A*A')\b is exceedingly well-conditioned for the example that I have provided.
It's an apples and oranges comparison, though. Your expression z=A'(A*A')\b is the solution of of different minimization problem, one with a unique minimum. It is also more computationally demanding than a simple, unqualified least squares minimization.
Bruno Luong
Bruno Luong le 4 Août 2022
Modifié(e) : Bruno Luong le 4 Août 2022
Assume that the word "least-square solution" has other meaning, and define as minimal norm solution (I still beg you to find a link/book to backup this definition that you pretend to be usual), then you should notice that the bellow sentense from the doc wouldn't then make any logical sense:
"The effective rank, K, of A is determined from the QR decomposition with pivoting. A solution X is computed which has at most K nonzero components per column. If K < N this will usually not be the same solution as PINV(A)*B. "
You should first question the correctness of the doc instead of believe (obviously wrongly) the doc.
@Bruno Luong Regarding the terminology. Here is an example from a course at the University of Wisconsin:
In any case, the sentence that your are refering to clashes with a previous sentence that implied the existence of a unique solution. Here the issue is the use of the article "the" as in "the solution".
@Matt J You are quite far from the issue at hand, i.e., the result of x=A\b depends on the data type of A and the 2-norm of x is not minimal, when A is wide and has full row rank. In general, one would solve a wide linear system using a QR factorization of the tall matrix instead of using the formula .
Matt J
Matt J le 5 Août 2022
Modifié(e) : Matt J le 5 Août 2022
You are quite far from the issue at hand, i.e., the result of x=A\b depends on the data type of A and the 2-norm of x is not minima
The fact that the 2-norm of x is not minimized is not the issue. We have already established that A\b does not pledge to deliver an x with this property, either for sparse or for full A.
The fact that the solution depends on the data type of A is the only issue that remains, and my comments are applicable to it. I have argued that when the solution is under-determined, you cannot expect different numerical routines - nor even the same numerical routine on a different computer - to deliver the same solution, ever. So, it is not clear why you think the results should be the same.
Perhaps you think that sparse and full should at least use the same algorithm to select one of the non-unique solutions? That defeats the purpose, though, of having sparse type. The purpose of sparse type is to process things differently from full type, and in a way that exploits the sparsity of A.
@Matt J No, I am afraid that I cannot agree with you. I believe that we have established that the documentation of backslash can be improved to clarify which problem is being solved and that the terminology is not consistent across the international community or even within the USA, see the link to Wiscon state university that I provided above.
I have already explained to @Steven Lord how the current documentation can be read to support my interpretation. I have also stated clearly that if the desired behavior is to produce different solution for the same matrix depending on the data-type, then this should figure prominently in the documentation.
Matt J
Matt J le 5 Août 2022
Modifié(e) : Matt J le 5 Août 2022
@Carl Christian Kjelgaard Mikkelsen Even if you disagree or think the documentation should be modified, your question has been answered right?
It should now be clear that A\b purports to do no more than to minimize . When there exists more than one solution to this problem, there is no commitment on the MathWorks' part as to which of the non-unique solutions you will get. Furthermore, there is no commitment that the solutions will be the same from full-to-sparse or from computer-to-computer. None of this is anything the MathWorks regards as a bug.

Connectez-vous pour commenter.

Plus de réponses (2)

Few other methods to get least-norm solution
A = rand(3,6)
A = 3×6
0.2748 0.9301 0.9530 0.4196 0.5850 0.3316 0.4791 0.8315 0.6904 0.6636 0.2678 0.6999 0.4191 0.4202 0.0789 0.2761 0.9976 0.3208
b = rand(3,1)
b = 3×1
0.4586 0.9898 0.8222
% Methode 1, Christine, recommended
x = lsqminnorm(A,b)
x = 6×1
0.5627 0.0942 -0.3752 0.5117 0.2032 0.7243
% Method 2, Pseudo inverse
x = pinv(A)*b
x = 6×1
0.5627 0.0942 -0.3752 0.5117 0.2032 0.7243
% Method 3, BS on KKT
[m,n] = size(A);
y=[eye(n), A'; A, zeros(m)] \ [zeros(n,1); b];
x = y(1:n)
x = 6×1
0.5627 0.0942 -0.3752 0.5117 0.2032 0.7243
% Method 4, QR on A'
[Q,R,p] = qr(A',0);
x = Q*(R'\b(p,:))
x = 6×1
0.5627 0.0942 -0.3752 0.5117 0.2032 0.7243

2 commentaires

Thank you for reading and responding to my question.
Please note that my issue is not the problem of computing the least squares solution of a consistent underdetermined linear system, but the fact that the backslash operator provides very different results depending on the matrix representation and that the computed results do not minimize the 2-norm over the set of solutions.
Bruno Luong
Bruno Luong le 4 Août 2022
Modifié(e) : Bruno Luong le 4 Août 2022
"Please note that my issue is not the problem of computing the least squares solution"
I think you need to reset the definition of what is "least squares solution". I guess all the confusion is that you associate with minimum norm solusion, which is not in general.
I know, and guess most of people working sometime with MATLAB also know it.

Connectez-vous pour commenter.

It seems the gist of your question comes down to the idea that for a wide matrix, MATLAB should (in your eyes only) always produce a solution that minimizes the norm of x. For square matrices, or matrices that are taller than wide, it should do something else? That alone seems highly inconsistent to me.
So first, where have you ever seen the claim that this is true? No place in the documentation is this claimed. I even took a quick look. Maybe you have confused the idea of a solution that minimizes the norm of the residiuals to a solution that minimizes the norm of x itself. We can't really know where you got the idea.
Should something be fixed? Of course not! MATLAB already provides multiple solutions that yield the minimum norm of x when they are used.
A = [1 2 3;4 5 7];
b = [2;3];
A\b
ans = 3×1
-1.0000 0 1.0000
So backslash produces a basic solution, as expected for the underdetermined case, with one element exactly zero. For a solution that is minimum norm in x, we already have:
lsqminnorm(A,b)
ans = 3×1
-1.0571 0.2857 0.8286
pinv(A)*b
ans = 3×1
-1.0571 0.2857 0.8286
lsqr(A,b)
lsqr converged at iteration 2 to a solution with relative residual 1.7e-14.
ans = 3×1
-1.0571 0.2857 0.8286

5 commentaires

@John D'Errico Thank your for your input.
The backslash operator is already overloaded and the choice of algorithm depends on the input. All I am asking is that documentation clearly establishes which problem is being solved and that the output depends on the datatype.
I have explained elsewhere on this page how the current documentation can be read to support my interpretation and that the terminology varies even with the USA.
Bruno Luong
Bruno Luong le 5 Août 2022
Modifié(e) : Bruno Luong le 5 Août 2022
@Carl Christian Kjelgaard Mikkelsen " I have explained elsewhere on this page how the current documentation can be read to support my interpretation "
Not really, or not at all period, because it implies a big incoherent about the sentence on the warning of eventual different between "\" and "PINV" that you completely miss (or denies) when you read it.
I think beside you no one here read like you. That must count for something right?
In the doc there are flowcharts of the algorithms used for each matrix types, dense and sparse, please scroll to the bottom just above the bibliography references. Wasn't that enough for you?
The flowchart says "Use QR solver" for both full and sparse, non-square. Wtih that exact same wording, I think it's reasonable for users like me who have no expertise in the mechanics of such solvers to expect the same result (to within floating point roundoff stuff) for both sparse and full cases. There would be no harm in addiing a line at the bottom of the flowcharts alerting the user to not expect the same results for sparse and full, at least for the non-square path (I didn't look at all of the others).
The only place I can see where you claim the documentation says the solution is a minimum norm solution is here:
"The key is the use of the article "the" as in "the solution". It implies uniqueness. Therefore, the reader will believe that x=A\b minimizes 2-norm of the residual when A is tall matrix of full column rank and that x=A\b has the smallest 2-norm of all the solutions of Ax=b when A is a wide matrix of full row rank. Why? Because these two problems connected to the 2-norm have unique solutions."
NO place in the documentation for backslash has it said anything about minimum norm for x. You have made a jump to an unfounded conclusion from nothing more than their use of the word "THE", to decide that this implies your version of what you want to be true.
The problem is, there are certainly other ways to choose a UNIQUE solution. For example, why assume a minimum 2-norm?
And to jump to a conclusion from the inclusion of a three letter word in the documentation is a bit strange. Surely if they were going to SPECIFICALLY do something different for one shape matrix than for another shape matrix, they would have said something more than just an obique reference with the word "the"?
We see this line:
"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B"
The phrase "In the least squares sense" means a solution that minimizes norm(A*X-b). It says nothing about norm(x).
Honestly, I think you are grasping at straws here, based only on the use of the word "the".
" grasping at straws"
I disagree, he only grasping at the straw. ;-)

Connectez-vous pour commenter.

Catégories

En savoir plus sur Mathematics and Optimization dans Centre d'aide et File Exchange

Produits

Version

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by