Inconsistency in behavior of SVD

39 views (last 30 days)
Thomas Humphries
Thomas Humphries on 20 Dec 2021
Edited: John D'Errico on 23 Dec 2021
This came out of an example that I gave my class on computing SVD of low-rank flags (based on an exercise from Gilbert Strang's book). I had students compute low-rank approximations to different flags and then plot the singular values of the flags on a semilog scale. Some students' plots ended up looking different than others, and I realized it depended on whether they called svd with one or three output arguments. Here is a minimal working example:
F = zeros(1000,1800); F(:,1:600) = 37; F(:,601:1200) = 255; F(:,1201:end) = 102; %A tricolor flag, eg France or Italy
[u,s,v] = svd(F); s(100,100) %s is a matrix
ans =
2.143808870680404e-11
s = svd(F); s(100) %s is a vector
ans =
7.312718101034703e-79
Now, this flag has rank 1, so obviously the 100th singular value is mathematically zero. But what accounts for the nearly 70 orders of magnitude difference here? (which obviously looks very different on a semilog plot!) My assumption was that the svd calculation should be the same whether one is requesting the entire SVD or just the singular values, but obviously something different is happening behind the scenes...

Accepted Answer

Christine Tobler
Christine Tobler on 23 Dec 2021
Firstly, I agree completely with John's great explanations above: any singular value below eps times the first singular value should be treated as zero.
The behavior here isn't wrong, since in both cases the outputs are accurate up to the expected round-off. It is clearly puzzling, though, and we would like to make the two calls' results match more closely (so that they would look similar when plotted). That said, we don't aim for matching down to the last decimal between different calls like this; we prefer to get the best possible performance for each individual call.
Some background: Computing the SVD of a matrix A is done in two stages:
  1. A = X*B*Y', where X and Y are orthogonal matrices and B is a bidiagonal matrix. This incurs a round-off error of eps*norm(A).
  2. B = Q*S*P' where Q and P are orthogonal matrices and S is a diagonal matrix of singular values.
For step 2, there is an algorithm which is specialized for cases where the scaling changes a lot along the diagonals of B, in which case it can compute singular values that differ by much more than a factor eps from largest to smallest. However, this algorithm only computes the singular values and not the singular vectors.
That algorithm has been added to the LAPACK library (a standard linear algebra library which we call for SVD and other computations in MATLAB), where it is used when no singular vectors are requested. This is what MATLAB calls, and why you are seeing the difference in behavior. We can't practically change this behavior from MATLAB without always computing the singular vectors even when they aren't requested, which would be a huge decrease in performance.
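To illustrate the difference, here is a minimal sketch (the graded bidiagonal matrix below is just an illustrative construction, not something from LAPACK or MATLAB internals): for a matrix that is already bidiagonal and strongly graded, the one-output call can resolve tiny singular values with high relative accuracy, while the three-output path cannot distinguish values below eps times the largest one.
d = 10.^(0:-4:-36);                     % diagonal spanning 1 down to 1e-36
B = diag(d) + diag(d(1:end-1)/10, 1);   % strongly graded upper bidiagonal matrix
s1 = svd(B);                            % one-output call: values-only algorithm
[~,S,~] = svd(B); s3 = diag(S);         % three-output call
semilogy(1:numel(d), s1, 'o-', 1:numel(d), s3, 's:')
legend('svd(B)', 'diag(S) from [U,S,V] = svd(B)')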
Some more comments:
  • These differences in behavior largely affect matrices with large differences in scale (either the norms of each row or the norms of each column have large differences between them). Other matrices are usually not noticeably affected.
  • While the singular values computed in the one-output case are accurate for the bidiagonal matrix B above, they are only accurate for the original matrix A if the round-off in step 1 was much smaller than usual (for example, if A was already bidiagonal to start with). Otherwise, we first introduce round-off in step 1, and then compute a very accurate picture of the singular values of a matrix that already doesn't match A at the accuracy that step 2 is working to.
So in short, the reason for the difference is that a more accurate algorithm is used in the one-output case which doesn't exist for the three-output case. However, this algorithm is only more accurate if the input matrix satisfies some very special preconditions; otherwise it just represents the round-off error slightly differently, with no effect on accuracy. Singular values below eps*norm(A) should not be trusted, except when you have some very specific knowledge about your input matrix.
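As a practical sketch of that takeaway, one can threshold at the same default tolerance that rank uses before plotting or drawing conclusions:
s = svd(F);
tol = max(size(F)) * eps(max(s));   % the default tolerance used by rank
trusted = s(s >= tol)               % for the flag matrix, only s(1) survives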
  4 comments
John D'Errico
John D'Errico on 23 Dec 2021
Edited: John D'Errico on 23 Dec 2021
We are truly grateful that Christine is at TMW. Certainly I am. I have never seen a case where her input on a question was not insightful and well thought out. Thank you again.
I'll try out a random thought here, just for kicks. Suppose we made the decision today that in future releases, svd will always compare the computed singular values to the maximum singular value, and ZERO out any that fall below eps times that maximum; a sketch of what I mean follows below. Essentially, we would just be conceding that our best estimate down there is zero anyway. Returning a garbage value seems a bit silly. So just round down to zero, as that is our best estimate of the value at that point.
Would that break anything? Well, it might, at least in the sense that it would make svd not completely backwards compatible. But I could argue that if anyone has been truly drawing conclusions based on those small singular values, those conclusions were essentially random fluff.
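For illustration, here is a minimal sketch of that idea as a user-side wrapper (svdZeroed is my own hypothetical name, not a MathWorks function):
function s = svdZeroed(A)
% Hypothetical wrapper: singular values of A, with anything below
% eps*max(s) set to exactly zero, since our best estimate there is zero anyway.
s = svd(A);
s(s < eps(max(s))) = 0;
end
Calling s = svdZeroed(F) on the flag matrix would return s(1) unchanged and exact zeros everywhere else.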
Does svd produce correct estimates of the singular values for a matrix that is known to be non-singular, but is numerically singular? For example, we could consider hilb(25). It looks to have a numerical rank of 14 here.
N = 25;
A = hilb(N);
rankA = rank(A)
rankA = 14
But in fact, we know it is not truly singular, just a difficult problem. Let's look at the singular values there.
[U,S,V] = svd(A);
s = svd(A);
semilogy(1:N,s,'g-o')
hold on
semilogy(1:N,diag(S),'b:s')
semilogy(1:N,double(svd(sym(hilb(N)))),'m--+')
xline(rankA)
yline(eps*max(s),'r')
hold off
Interestingly, even though the last singular value that rank trusted was the 14th one, two more singular values seem to have been accurately produced, since they closely follow the symbolic result. As well, I see that both the 1 and 3 output versions of SVD have produced the same results for the smaller singular values here. That is very different from the flag call from @Thomas Humphries.
All of that suggests we would not have wanted to broadly replace all singular values below
eps*max(s)
ans = 4.3338e-16
with zero.
I would also note that both versions of SVD produce the same result for the small singular values, at least for hilb(25). However, suppose I tried it for hilb(26)? :-) In fact, for any value of N greater than 25, we get a different result. I'll try N=26 here as an example.
N = 26;
A = hilb(N);
rankA = rank(A)
rankA = 14
[U,S,V] = svd(A);
s = svd(A);
semilogy(1:N,s,'g-o')
hold on
semilogy(1:N,diag(S),'b:s')
xline(rankA)
yline(eps*max(s),'r')
Interesting behavior from svd here: beyond N=25, the two versions of svd diverge in their behavior for the small singular values. Regardless, it does appear that on at least some matrices, SOME of the singular values below eps*max(s) contain useful information.
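To quantify that, here is a sketch (again requiring the Symbolic Math Toolbox) comparing each version against the exact values, extending the N=25 comparison above:
sExact = double(svd(sym(hilb(N))));
semilogy(1:N, abs(s - sExact)./sExact, 'g-o')       % relative error, 1-output svd
hold on
semilogy(1:N, abs(diag(S) - sExact)./sExact, 'b:s') % relative error, 3-output svd
hold off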
Thomas Humphries
Thomas Humphries on 23 Dec 2021
Thanks for this great answer, Christine!

Sign in to comment.

More Answers (2)

John D'Errico
John D'Errico on 21 Dec 2021
Edited: John D'Errico on 21 Dec 2021
F = zeros(1000,1800); F(:,1:600) = 37; F(:,601:1200) = 255; F(:,1201:end) = 102; %A tricolor flag, eg France or Italy
[U,S,V] = svd(F);
s = svd(F);
s(1)
ans = 2.1466e+05
eps(s(1))
ans = 2.9104e-11
S(100,100)
ans = 2.1438e-11
s(100)
ans = 2.1902e-113
So that 100th singular value is roughly no larger than eps*s(1). In fact, even the second singular value is roughly that small.
The first rule of computing, and especially of linear algebra, is to NEVER trust anything in the least significant bits, especially when there will be subtly different computations involved on different algorithmic paths.
rank(F)
ans = 1
semilogy(s)
semilogy(diag(S))
It looks like they use different code IF you just want the singular values. And that makes some sense: since it is all trash at that point, see rule number 1.
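Putting both trailing values on a relative scale makes the point (a quick sketch continuing the session above):
[S(100,100) s(100)] / s(1)   % roughly 1e-16 and 1e-118: both below eps
eps                          % 2.2204e-16, for comparison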
  4 comments
Paul
Paul on 22 Dec 2021
I'm not as concerned about how to interpret floating point trash outputs as I am about using the results that are not floating point trash. For example
rng(100);
M = rand(3);
S1 = svd(M)
S1 = 3×1
    1.4243
    0.5670
    0.2862
y1 = 2*S1(1);
[U,S2,V] = svd(M,"vector");
S2
S2 = 3×1
    1.4243
    0.5670
    0.2862
y2 = 2*S2(1);
isequal(y1,y2)
ans = logical
0
Suppose I have some code that only needs S, and then a few months later I modify that code to also use U and V but do not change any computations using S. I could be in for a surprise when my S-based results change.
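One defensive pattern for that scenario (my suggestion, not something from this thread): request all the outputs you may eventually need from a single call and derive everything from it, so a later refactor cannot silently switch the code path behind S.
[U,S,V] = svd(M, "vector");   % one call, one code path
y = 2*S(1);                   % all S-based results now come from that call only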
John D'Errico
John D'Errico on 22 Dec 2021
Again, the results are identical to within floating point trash! NEVER TRUST THE LEAST SIGNIFICANT BITS OF TWO COMPUTATIONS THAT ARE NOT DONE IN IDENTICALLY THE SAME SEQUENCE.
rng(100);
M = rand(3);
[U,S1,V] = svd(M,"vector");
format long g
S1
S1 = 3×1
          1.42429794861722
         0.567016326172269
         0.286243430498377
S2 = svd(M)
S2 = 3×1
          1.42429794861722
         0.567016326172269
         0.286243430498377
S1 == S2
ans = 3×1 logical array
   0
   0
   0
norm(S1 - S2)
ans =
2.98936698014091e-16
So even though the two vectors look the same, the least significant bits vary just slightly. This is expected. It is true for almost any algebraic computation: when two operations are performed that are mathematically equivalent, you should NEVER trust that they will produce the same result down to the least significant bits.
A = rand(10);
B = rand(10);
C = rand(10);
(A + B - C) == (-C + B + A)
ans = 10×10 logical array
   1   1   1   1   1   1   1   1   1   1
   1   1   1   1   0   1   1   1   1   0
   1   1   1   1   1   0   1   1   1   0
   0   1   1   1   1   1   1   1   0   1
   1   1   1   1   1   0   0   0   0   0
   1   1   1   1   1   1   1   1   1   0
   0   1   0   1   1   1   1   0   0   1
   1   1   1   0   1   1   1   0   1   1
   1   1   0   1   1   1   1   1   1   1
   1   0   1   1   1   0   1   1   1   0
You should not be surprised at finding tiny differences there, any more than at the SVD producing differences on the order of machine epsilon when a different path is likely chosen in the code. Should MathWorks document that there are differences in the code between the two paths inside svd? As I said in my answer, that just begs for more problems explaining why you should not be trusting the LSBs of those numbers. And worse, suppose next year they change the SVD code in some way so that it does use the same path? Then they would need to change the documentation again and re-educate people.
The simple rule is best: NEVER trust the LSBs of your numbers in floating point computations.
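The practical alternative to == is a tolerance-based comparison. A sketch (the factor of a few eps here is a judgment call, not a universal constant):
tol = 4*eps(max(S1));        % a few ulps at the scale of the largest value
all(abs(S1 - S2) <= tol)     % true: the vectors agree to round-off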

Sign in to comment.


Chunru
Chunru on 21 Dec 2021
F = zeros(1000,1800);
F(:,1:600) = 37; F(:,601:1200) = 255; F(:,1201:end) = 102; %A tricolor flag, eg France or Italy
This is a rank 1 matrix as shown by:
rank(F)
ans = 1
For a rank-1 matrix, there is only one non-zero singular value. MATLAB arranges the singular values in descending order, so the 1st value is the desired singular value. All the rest should ideally be zero, subject to computational accuracy.
So s(100,100) for the matrix s, and s1(100) for the vector s1, both correspond to a true singular value of 0.
[u,s,v] = svd(F);
s(100,100) %s is a matrix
ans = 2.1438e-11
s1 = svd(F);
s1(100) %s1 is a vector
ans = 2.1902e-113
We cannot be sure of the exact internal implementation of svd inside MATLAB, but the results with the different output forms are both accurate to roughly the computing accuracy of double precision.
format longg
[diag(s) s1]
ans = 1000×2
      214659.730736811      214659.730736811
  2.97801336572204e-10  2.97104202825509e-10
   2.1438088706804e-11  1.03894314937022e-12
   2.1438088706804e-11  1.06656371608964e-23
   2.1438088706804e-11  7.52960822699182e-24
   2.1438088706804e-11  7.33435968746888e-24
   2.1438088706804e-11  5.73717383511201e-24
   2.1438088706804e-11  4.92384115399226e-24
   2.1438088706804e-11  4.28392707240299e-24
   2.1438088706804e-11  3.40957046848442e-24
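As a final sketch (mine, not part of the original answer), one can confirm the rank-1 structure directly by rebuilding F from the leading singular triplet and checking the relative residual:
F1 = s(1,1) * u(:,1) * v(:,1)';   % best rank-1 approximation to F
norm(F - F1) / norm(F)            % on the order of eps: F is rank 1 up to round-off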
