Backslash does not provide the solution with the smallest 2-norm

Question

1 vote

I was debugging a constraint solver when I encountered a problem with the backslash operator in MATLAB.

I have isolated the problem into the attached minimal working example. The issue occurs when solving a consistent but under-determined linear system Ax=b. In this case, we expect that MATLAB returns the solution x which has the smallest 2-norm, right? I believe that the attached script demonstrates that this is not the case and that the computed result depends on whether A is treated as a sparse or a dense matrix.

Would you please take a look and see if I have missed something or if this is indeed an issue that should be fixed?

Kind regards

Carl Christian K. Mikkelsen.

% The backslash operator and underdetermined linear systems
% 
% The result of A\b depends on whether A is dense or sparse and 
% the computed results do not minimize the 2-norm over the set
% of solutions of Ax=b.
%
% PROGRAMMING by Carl Christian Kjelgaard Mikkelsen (spock@cs.umu.se)
%   2022-08-04 Initial programming and testing
% Define a wide matrix 
A=[1 1 0; 0 1 1];
% Define a solution
t=(1:3)';
% Define the right hand side so that Ax=b is consistent
b=A*t;
% Solve the underdetermined linear equation Ax=b
% using a dense representation of A
x=A\b;
% Solve the underdetermined linear equation Ax=b 
% using a sparse representation of A
y=sparse(A)\b;
% Solve the underdetermined linear equation Ax=b
% in the linear least squares sense
z=A'*((A*A')\b);
% Compute the residuals
rx=b-A*x;
ry=b-A*y;
rz=b-A*z;
% Compute the 2-norm of the solutions
nx=norm(x);
ny=norm(y);
nz=norm(z);
% Display the results
fprintf('The solutions\n');
The solutions
display([x y z]);
   -2.0000    3.0000    0.3333
    5.0000         0    2.6667
         0    5.0000    2.3333
fprintf('The residuals\n');
The residuals
display([rx ry rz]);
   1.0e-14 *

    0.1776         0    0.0444
    0.1776         0   -0.1776
fprintf('The norm of the solutions\n');
The norm of the solutions
display([nx ny nz]);
    5.3852    5.8310    3.5590

15 comments

Carl Christian Kjelgaard Mikkelsen on 4 Aug 2022

@Steven Lord Thank you for your response. I have discussed this point elsewhere on this page, but I shall develop the point further right here.

Some authors will carefully distinguish between solving a tall linear system in the least squares sense and computing the least norm solution of a wide linear system. However, it is quite common to use the term "least squares solution" to cover both cases. In the case of a consistent and underdetermined linear system, there are solutions and it makes no sense to minimize the residual; rather, it is the norm of the solution which is minimized.

Therefore, when a user reads the text provided by the command "help \" there is only one way to interpret the line:

"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B"

The key is the use of the article "the" as in "the solution". It implies uniqueness. Therefore, the reader will believe that x=A\b minimizes the 2-norm of the residual when A is a tall matrix of full column rank and that x=A\b has the smallest 2-norm of all the solutions of Ax=b when A is a wide matrix of full row rank. Why? Because these two problems connected to the 2-norm have unique solutions.
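For reference, the two problems with unique solutions mentioned here can be written out as follows. This is only a sketch in standard notation (the symbols x_LS and x_MN are labels introduced for this illustration and do not appear in the MATLAB documentation):

```latex
% Tall A (m > n), full column rank: unique minimizer of the residual
x_{\mathrm{LS}} = \arg\min_{x} \|Ax - b\|_2 = (A^{T}A)^{-1}A^{T}b

% Wide A (m < n), full row rank: unique minimum-norm solution of Ax = b
x_{\mathrm{MN}} = \arg\min_{x} \{\, \|x\|_2 : Ax = b \,\} = A^{T}(AA^{T})^{-1}b
```

The second formula is the expression z=A'*((A*A')\b) used in the script attached to the question.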

Paul on 5 Aug 2022

I agree with @Carl Christian Kjelgaard Mikkelsen that help and the documentation should be clarified. The help states

s = split(string(help('mldivide')),'.');
s = strtrim(s(8))
s = 
    "If A is an M-by-N rectangular matrix with M~=N and B is a column
         vector with M components, or a matrix with several such columns,
         then X = A\B is the solution in the least squares sense to the
         under- or overdetermined system of equations A*X = B"

Note the phrase "the solution."

However, the doc page mldivide, \ states:

"If A is a rectangular m-by-n matrix with m ~= n, and B is a matrix with m rows, then A\B returns a least-squares solution to the system of equations A*x= B."

Note the phrase "a ... solution."

So if reading the former, it's reasonable to expect that mldivide returns "the [unique] solution" and if the latter it returns "a ... solution [of many]."

As has been pointed out, if A is not square, then there can be many solutions that yield the same, minimum 2-norm of the residual. So, IMO, the help should be changed to not refer to "the solution."

What would be the harm in modifying the doc page (and the help I suppose) to read something like "... then A\B returns a least-squares solution to the system of equations A*x= B, i.e., a solution that minimizes norm(A*x - b)." Then there would be no confusion between least-squares and minimum norm solutions.

It would be even better to also explain that in some cases such a solution is unique and in others it isn't, and for the latter cases explain which criterion is used to return the result, unless such logic is too difficult to explain when considering all of the possible paths that mldivide can take.

Paul on 5 Aug 2022

You are correct, I shouldn't have used "unique." According to my dictionary "the" is a definite article to "indicate that a following noun or noun equivalent is definite or has been previously specified by context or by circumstance." By pointing her finger, Sandy is making clear by circumstance the specific car. On the other hand, it would be quite confusing if Sandy said "look at the car" w/o any other indication if there are many cars to see.

If stating "the solution" it should be clear to the reader which solution is the referent, which might or might not be a unique solution, but at least we'll know which solution we're talking about.

In this case, there can, in general, be many least-squares solutions, so referring to "the (definite) least-squares solution" w/o any additional context is not correct, IMO.

OTOH, the indefinite article "a" means the referent is unspecified. Sandy pointed at a car. I don't know which car she pointed to. mldivide returns a least-squares solution. I don't know which one it returns, just that whatever it returns is a least-squares solution.

Though I wouldn't use the minimum norm solution synonymously with least-squares solution, apparently others do, as shown by @Carl Christian Kjelgaard Mikkelsen in this comment and also here. So if one were to (correctly IMO) read "the solution" as referring to a particular solution, it makes a lot of sense to think it's referring to the unique, minimum norm solution for the underdetermined problem.

And now I realize that @Carl Christian Kjelgaard Mikkelsen already pointed out the "a" / "the" issue in this comment. I just should have responded there.

Bruno Luong on 5 Aug 2022

"a solution" and "the solution" are not grammatically equivalent."

Granted, but they are not exclusive.

Given that

"the solution" (in a proper context) is "a solution".

They can perfectly well both be used without any contradiction, so I don't see why they cannot coexist.

Les Beckham on 5 Aug 2022

Holy cow, guys. This has pretty much degenerated into "that depends on what the definition of "is" is". Let it go, already. Have a good night. :)


Answer 1

Matt J on 4 Aug 2022

Edited: Matt J on 4 Aug 2022

2 votes

In this case, we expect that MATLAB returns the solution x which has the smallest 2-norm, right?

No, the backslash operator will use a QR-solver to produce a solution in that case. This won't necessarily be the least 2-norm solution.
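To illustrate what a QR-based solve can produce for a wide matrix, here is a minimal sketch that builds a "basic" solution from a QR factorization with column pivoting. It is an illustration of the idea only and is not claimed to be the code path that mldivide actually takes; the variable names (x_basic, x_min, w, c) are made up for this example, and rank(A) stands in for the effective rank estimate.

```matlab
% Sketch: a basic solution of a wide system via QR with column pivoting.
% Illustration only; not MATLAB's internal mldivide implementation.
A = [1 1 0; 0 1 1];
b = [3; 5];
[m, n] = size(A);

[Q, R, P] = qr(A);          % A*P = Q*R, where P is a permutation matrix
k = rank(A);                % stand-in for the effective rank estimate
c = Q' * b;

% Solve for the k pivoted unknowns and set the other n-k unknowns to zero.
w = zeros(n, 1);
w(1:k) = R(1:k, 1:k) \ c(1:k);
x_basic = P * w;

% Minimum 2-norm solution, for comparison.
x_min = pinv(A) * b;

disp(norm(b - A*x_basic));            % residual of the basic solution
disp(nnz(x_basic));                   % at most k nonzero entries
disp([norm(x_basic), norm(x_min)]);   % the basic solution is not shorter
```

Both vectors solve the system, but only x_min is guaranteed to have the smallest 2-norm; the basic solution trades that property for having at most k nonzero entries.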

12 comments

Carl Christian Kjelgaard Mikkelsen on 4 Aug 2022

Firstly, I would like to thank you both for reading and responding to my question.

While I am grateful for references to the various functions that can be used to compute the least squares solution of an underdetermined linear system, this was never an issue for me.

However, two issues remain:

Firstly, the computed result depends on the representation of the matrix. We get one result if we treat the matrix as a dense matrix and we get another result which is completely different if we treat the matrix as a sparse matrix. Is this really the intended behavior? If the answer is yes, then this should be made very clear in the documentation.

Secondly, the documentation does not reflect the reality of the software. I quote from the website: https://se.mathworks.com/help/matlab/ref/mldivide.html, specifically the line:

"If A is a rectangular m-by-n matrix with m~=n, and B is a matrix with m rows, then A\B is a least-squares solution to the system of equations A*x=B."

The website is correct in the sense that A\B is a solution, but as I have demonstrated it is not the least squares solution. I also quote from the internal documentation from MATLAB 2020b, i.e., the command "help \":

"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B. The effective rank, K, of A is determined from the QR decomposition with pivoting. A solution X is computed which has at most K nonzero components per column. If K < N this will usually not be the same solution as PINV(A)*B. A\EYE(SIZE(A)) produces a generalized inverse of A."

The internal documentation is wrong because as I have demonstrated, A\B is not the least squares solution. Moreover, the first and the third sentence of the internal documentation appear to contradict each other.

Matt J on 4 Aug 2022

Edited: Matt J on 4 Aug 2022

We get one result if we treat the matrix as a dense matrix and we get another result which is completely different if we treat the matrix as a sparse matrix. Is this really the intended behavior? If the answer is yes, then this should be made very clear in the documentation.

Even if you had agreement between dense-type and sparse-type, you could not count on getting the same result on different computer architectures.

When a continuum of solutions exists to any minimization problem, then the solution is unstable. It is never possible to guarantee which one you will get from a numerical routine. This is a general fact from numerical analysis. See below how much the solution can change when we add small random perturbations to A and b in your proposed example:

% Define a wide matrix 
A=[1 1 0; 0 1 1];
% Define a solution
t=(1:3)';
% Define the right hand side so that Ax=b is consistent
b=A*t
b = 2×1
     3
     5
t1=A\b  %solution 1
t1 = 3×1
   -2.0000
    5.0000
         0
noise=@(z) z+0.0001*randn(size(z));
An=noise(A), bn=noise(b),  %add noise
An = 2×3
    0.9999    1.0001   -0.0002
    0.0000    1.0000    1.0000
bn = 2×1
    2.9999
    5.0003
t2=An\bn  %noisy solution
t2 = 3×1
         0
    2.9999
    2.0003

Carl Christian Kjelgaard Mikkelsen on 4 Aug 2022

Thank you all for taking the time to respond. I shall reply to each individual.

@Torsten You are certainly correct that the least squares solution of a tall linear system is distinctly different from the least norm solution of a wide linear system. However, it is quite common to use the term least squares solution to refer to both situations. When I read the internal documentation I took note of the use of the definite article "the" as in "the solution", a terminology which always implies uniqueness. I therefore assumed that A\b would minimize the 2-norm of the residual when the matrix is tall and minimize the 2-norm of the solution when the matrix is wide.

@Bruno Luong In my case of a consistent underdetermined linear system the residual is zero. It is therefore natural that the reader assumes that the term "the solution in the least squares sense" refers to the problem of finding the solution with the smallest 2-norm. It is very kind of you to admit that the documentation is not always rigorously written. We now know how to improve this specific section.

@Matt J Thank you for demonstrating that MATLAB's backslash operator is unstable for this use case. The algorithm that I use, i.e. the expression z=A'*((A*A')\b), does not suffer from this problem as the central matrix A*A' is exceedingly well-conditioned for the example that I have provided.
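The conditioning claim can be checked directly. The following is a minimal sketch for the 2-by-3 example from the question; the names G and kappa are introduced here for illustration only.

```matlab
% Sketch: conditioning of the central matrix A*A' for the example matrix.
A = [1 1 0; 0 1 1];
b = [3; 5];

G = A * A';            % equals [2 1; 1 2], with eigenvalues 1 and 3
kappa = cond(G);       % 2-norm condition number, 3 for this matrix

% Minimum-norm solution via the normal equations of the second kind.
z = A' * (G \ b);

disp(kappa);
disp(z');
disp(norm(b - A*z));   % residual of the computed solution
```

For this particular matrix the condition number is 3. In general cond(A*A') equals cond(A)^2 when A has full row rank, so this approach is best suited to matrices that are themselves well-conditioned.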

Matt J on 5 Aug 2022

Edited: Matt J on 5 Aug 2022

You are quite far from the issue at hand, i.e., the result of x=A\b depends on the data type of A and the 2-norm of x is not minimal.

The fact that the 2-norm of x is not minimized is not the issue. We have already established that A\b does not pledge to deliver an x with this property, either for sparse or for full A.

The fact that the solution depends on the data type of A is the only issue that remains, and my comments are applicable to it. I have argued that when the solution is under-determined, you cannot expect different numerical routines - nor even the same numerical routine on a different computer - to deliver the same solution, ever. So, it is not clear why you think the results should be the same.

Perhaps you think that sparse and full should at least use the same algorithm to select one of the non-unique solutions? That defeats the purpose, though, of having sparse type. The purpose of sparse type is to process things differently from full type, and in a way that exploits the sparsity of A.
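To make the non-uniqueness concrete, here is a small sketch using the dense and sparse results reported in the question (the vectors are hard-coded from that output rather than recomputed, since the values returned by backslash are exactly what is not guaranteed). The names d, N and z are introduced for this illustration.

```matlab
% Sketch: the two reported solutions differ by a null-space vector of A.
A = [1 1 0; 0 1 1];
b = [3; 5];
x = [-2; 5; 0];        % dense result reported in the question
y = [3; 0; 5];         % sparse result reported in the question

d = x - y;
disp(A * d);           % zero vector: d lies in the null space of A

N = null(A);           % orthonormal basis of the null space (one column)
disp(norm(d - N*(N'*d)));   % essentially zero: d is a multiple of N

% Removing the null-space component from either solution gives the
% minimum-norm solution.
z = x - N*(N'*x);
disp(z');
disp([norm(x), norm(y), norm(z)]);
```

Every solution of this system has the form z + s*N for some scalar s, so the dense and sparse results are simply two different points on the same line.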

Carl Christian Kjelgaard Mikkelsen on 5 Aug 2022

@Matt J No, I am afraid that I cannot agree with you. I believe that we have established that the documentation of backslash can be improved to clarify which problem is being solved and that the terminology is not consistent across the international community or even within the USA, see the link to Wisconsin state university that I provided above.

I have already explained to @Steven Lord how the current documentation can be read to support my interpretation. I have also stated clearly that if the desired behavior is to produce different solutions for the same matrix depending on the data type, then this should figure prominently in the documentation.

Matt J on 5 Aug 2022

Edited: Matt J on 5 Aug 2022

@Carl Christian Kjelgaard Mikkelsen Even if you disagree or think the documentation should be modified, your question has been answered, right?

It should now be clear that A\b purports to do no more than to minimize norm(A*x-b). When there exists more than one solution to this problem, there is no commitment on the MathWorks' part as to which of the non-unique solutions you will get. Furthermore, there is no commitment that the solutions will be the same from full-to-sparse or from computer-to-computer. None of this is anything the MathWorks regards as a bug.


Answer 2

Bruno Luong on 4 Aug 2022

Edited: Bruno Luong on 4 Aug 2022

1 vote

A few other methods to get the least-norm solution

A = rand(3,6)
A = 3×6
    0.2748    0.9301    0.9530    0.4196    0.5850    0.3316
    0.4791    0.8315    0.6904    0.6636    0.2678    0.6999
    0.4191    0.4202    0.0789    0.2761    0.9976    0.3208
b = rand(3,1)
b = 3×1
    0.4586
    0.9898
    0.8222
% Method 1, Christine, recommended
x = lsqminnorm(A,b)
x = 6×1
    0.5627
    0.0942
   -0.3752
    0.5117
    0.2032
    0.7243
% Method 2, Pseudo inverse
x = pinv(A)*b
x = 6×1
    0.5627
    0.0942
   -0.3752
    0.5117
    0.2032
    0.7243
% Method 3, BS on KKT
[m,n] = size(A);
y=[eye(n), A'; A, zeros(m)] \ [zeros(n,1); b];
x = y(1:n)
x = 6×1
    0.5627
    0.0942
   -0.3752
    0.5117
    0.2032
    0.7243
% Method 4, QR on A'
[Q,R,p] = qr(A',0);
x = Q*(R'\b(p,:))
x = 6×1
    0.5627
    0.0942
   -0.3752
    0.5117
    0.2032
    0.7243
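For readers wondering why Method 3 works, here is a short derivation sketch. It assumes A has full row rank; the multiplier symbol lambda is introduced here and corresponds to the trailing entries y(n+1:end) in the code above.

```latex
% Minimum-norm problem and its Lagrangian
\min_{x} \tfrac{1}{2}\, x^{T}x \quad \text{subject to} \quad Ax = b,
\qquad
L(x,\lambda) = \tfrac{1}{2}\, x^{T}x + \lambda^{T}(Ax - b)

% Stationarity and feasibility give the KKT system solved in Method 3
\begin{bmatrix} I & A^{T} \\ A & 0 \end{bmatrix}
\begin{bmatrix} x \\ \lambda \end{bmatrix}
=
\begin{bmatrix} 0 \\ b \end{bmatrix}

% Eliminating lambda recovers the closed form
x = -A^{T}\lambda, \qquad \lambda = -(AA^{T})^{-1}b, \qquad
x = A^{T}(AA^{T})^{-1}b
```

The block matrix is square and nonsingular under the full row rank assumption, so backslash has a unique answer to return in this case.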

2 comments

Carl Christian Kjelgaard Mikkelsen on 4 Aug 2022

Thank you for reading and responding to my question.

Please note that my issue is not the problem of computing the least squares solution of a consistent underdetermined linear system, but the fact that the backslash operator provides very different results depending on the matrix representation and that the computed results do not minimize the 2-norm over the set of solutions.

Bruno Luong on 4 Aug 2022

Edited: Bruno Luong on 4 Aug 2022

"Please note that my issue is not the problem of computing the least squares solution"

I think you need to reset your definition of what a "least squares solution" is. I guess all the confusion is that you associate it with the minimum norm solution, which it is not in general.

I know this, and I guess most people who have worked with MATLAB for some time know it too.

https://www.math-forums.com/threads/underdetermined-systems-backslash.188674/

https://fr.mathworks.com/matlabcentral/answers/814450-mldivide-algorithm-for-an-underdetermined-system-of-equations

https://fr.mathworks.com/matlabcentral/answers/1755095-why-matrix-division-returns-different-answers?s_tid=srchtitle


Answer 3

John D'Errico on 5 Aug 2022

Edited: John D'Errico on 5 Aug 2022

0 votes

It seems the gist of your question comes down to the idea that for a wide matrix, MATLAB should (in your eyes only) always produce a solution that minimizes the norm of x. For square matrices, or matrices that are taller than wide, it should do something else? That alone seems highly inconsistent to me.

So first, where have you ever seen the claim that this is true? No place in the documentation is this claimed. I even took a quick look. Maybe you have confused the idea of a solution that minimizes the norm of the residuals with a solution that minimizes the norm of x itself. We can't really know where you got the idea.

Should something be fixed? Of course not! MATLAB already provides multiple solutions that yield the minimum norm of x when they are used.

A = [1 2 3;4 5 7];
b = [2;3];
A\b
ans = 3×1
   -1.0000
         0
    1.0000

So backslash produces a basic solution, as expected for the underdetermined case, with one element exactly zero. For a solution that is minimum norm in x, we already have:

lsqminnorm(A,b)
ans = 3×1
   -1.0571
    0.2857
    0.8286
pinv(A)*b
ans = 3×1
   -1.0571
    0.2857
    0.8286
lsqr(A,b)
lsqr converged at iteration 2 to a solution with relative residual 1.7e-14.
ans = 3×1
   -1.0571
    0.2857
    0.8286
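One way to see the difference between the basic solution and the minimum-norm solution is to look at their components in the null space of A. The sketch below hard-codes the backslash result shown above (rather than recomputing it) and uses illustrative names x_bs, x_min and N.

```matlab
% Sketch: the minimum-norm solution has no component in null(A);
% the basic solution reported above does.
A = [1 2 3; 4 5 7];
b = [2; 3];

x_bs  = [-1; 0; 1];        % basic solution reported above for A\b
x_min = pinv(A) * b;       % minimum-norm solution

N = null(A);               % orthonormal basis of the null space (one column)

disp(norm(A*x_bs - b));    % both vectors solve the system
disp(norm(A*x_min - b));
disp(abs(N' * x_bs));      % clearly nonzero
disp(abs(N' * x_min));     % zero up to rounding
```

Any solution can be written as x_min plus a null-space vector, and since the two parts are orthogonal, adding a null-space component can only increase the 2-norm.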

5 comments

John D'Errico on 6 Aug 2022

@Carl Christian Kjelgaard Mikkelsen

The only place I can see where you claim the documentation says the solution is a minimum norm solution is here:

"The key is the use of the article "the" as in "the solution". It implies uniqueness. Therefore, the reader will believe that x=A\b minimizes the 2-norm of the residual when A is a tall matrix of full column rank and that x=A\b has the smallest 2-norm of all the solutions of Ax=b when A is a wide matrix of full row rank. Why? Because these two problems connected to the 2-norm have unique solutions."

NO place in the documentation for backslash has it said anything about minimum norm for x. You have made a jump to an unfounded conclusion from nothing more than their use of the word "THE", to decide that this implies your version of what you want to be true.

The problem is, there are certainly other ways to choose a UNIQUE solution. For example, why assume a minimum 2-norm?

And to jump to a conclusion from the inclusion of a three letter word in the documentation is a bit strange. Surely if they were going to SPECIFICALLY do something different for one shape matrix than for another shape matrix, they would have said something more than just an oblique reference with the word "the"?

We see this line:

"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B"

The phrase "In the least squares sense" means a solution that minimizes norm(A*X-b). It says nothing about norm(x).

Honestly, I think you are grasping at straws here, based only on the use of the word "the".

Bruno Luong on 6 Aug 2022

"grasping at straws"

I disagree, he is only grasping at the straw. ;-)
