Single precision matrix multiplication

Script:
M = 10000; N = 100;
A = round(10000 * complex(randn(M, N), randn(M, N)));
B = single(A') * single(A);
C = B' - B;
max(abs(C(:))) %// ==> answer is non-zero
A = single(A);
B = A' * A;
C = B' - B;
max(abs(C(:))) %// ==> answer is zero
Not sure what caused the difference. Thanks!

5 Comments

Bruno Luong
Bruno Luong on 5 Aug 2022
Edited: Bruno Luong on 5 Aug 2022
UPDATE: The conclusion of the first part of the question must be revised:
M = 10000; N = 100;
A = round(10000 * complex(randn(M, N), randn(M, N)));
B = single(A') * single(A);
C = B' - B;
max(abs(C(:))) %// ==> answer is zero in R2022a, was non-zero in 2019
ans = single 0
single(A') * single(A) is numerically Hermitian.
James Tursa
James Tursa on 5 Aug 2022
Edited: James Tursa on 5 Aug 2022
Interesting. I wonder if the parser or underlying BLAS routines have gotten smarter? Or is this purely an online result that doesn't hold for non-online MATLAB?
Bruno Luong
Bruno Luong on 5 Aug 2022
I have also tested on my PC; same result.
This test seems to show that there is no extra intelligent parsing; rather, the expression C*D simply returns a numerically Hermitian result when C == D' in R2022a.
M = 10000; N = 100;
A = round(10000 * complex(randn(M, N), randn(M, N)));
A = single(A);
SAc = single(A');
isequal(SAc, A')
ans = logical
1
tic
B1 = SAc * A;
toc
Elapsed time is 0.011465 seconds.
norm(B1-B1','fro')
ans = single 0
tic
B2 = A' * A;
toc
Elapsed time is 0.006590 seconds.
norm(B2-B2','fro')
ans = single 0
norm(B1-B2,'fro')/norm(B2)
ans = single 4.6414e-07
James Tursa
James Tursa on 6 Aug 2022
Edited: James Tursa on 6 Aug 2022
I can only guess, then, that the underlying BLAS library algorithms have changed. The timings and B1-B2 differences confirm that two different BLAS subroutines are called as has been the case historically. I don't have the latest version installed to investigate this further.

Sign in to comment.

 Accepted Answer

Matt J
Matt J on 12 Dec 2019
Edited: Matt J on 12 Dec 2019
The Matlab interpreter runs a special dedicated "ctranspose-multiply" function to evaluate expressions of the form,
P'*Q
It does not do any explicit transposition of P, thus saving time and memory. This special function likely does a pre-check for the special case where P and Q are the same variable. In that case, it knows it only has to compute the lower (or upper) triangular part of the resulting matrix, and can just copy the conjugate of that to the upper (lower) triangle. Thus, the output of A'*A will always be perfectly Hermitian.
However, for the expression,
single(P')*single(Q)
no special functions are used. Every operation in this expression (single, ctranspose, mtimes) is done separately and explicitly, and can generate floating-point errors that break the Hermitian symmetry of the final matrix.
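To make the triangle-plus-conjugate-copy mechanism concrete, here is a minimal Python sketch (an illustration only, not MATLAB's or any BLAS vendor's actual code; the function name is mine) of a self-adjoint multiply that evaluates just the lower triangle of A'*A and conjugate-copies it to the upper triangle, so the output is Hermitian by construction:

```python
def ctranspose_multiply_self(A):
    """Compute B = A' * A for a complex matrix A (list of rows),
    the way a dedicated routine might: evaluate only the lower
    triangle, then conjugate-copy it to the upper triangle."""
    m, n = len(A), len(A[0])
    B = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):                 # lower triangle only (j <= i)
            s = 0j
            for k in range(m):
                s += A[k][i].conjugate() * A[k][j]
            B[i][j] = s
            B[j][i] = s.conjugate()            # exact conjugate copy
    return B

A = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 2j], [4 - 3j, 1 + 0j]]
B = ctranspose_multiply_self(A)
# B[i][j] == conj(B[j][i]) holds exactly, whatever rounding occurred in s.
```

Because the off-diagonal entries come from a copy rather than from an independent accumulation, the symmetry is exact even when the individual sums are rounded.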

2 Comments

James Tursa
James Tursa on 12 Dec 2019
Edited: James Tursa on 12 Dec 2019
Yes, but "A'*A will always be perfectly symmetric" should instead read "A'*A will always be perfectly Hermitian". And the triangular copy is a conjugate copy, not a straight copy.
Matt J
Matt J on 12 Dec 2019
Right, you are! Fixed.

Sign in to comment.

More Answers (2)

James Tursa
James Tursa on 12 Dec 2019
Edited: James Tursa on 12 Dec 2019
To illustrate what Matt is saying, a simple timing test:
>> format longg
>> S = round(10000*single(rand(5000)+rand(5000)*1i));
>> St = S';
>> tic;SaS=S'*S;toc
Elapsed time is 1.675010 seconds.
>> tic;StS=St*S;toc
Elapsed time is 2.942037 seconds.
>> isequal(SaS,StS)
ans =
logical
0
>> isequal(SaS',SaS)
ans =
logical
1
>> isequal(StS',StS)
ans =
logical
0
The difference in timing is because the S'*S method only does about half the multiplication work with some added conjugate copying. And since the off-diagonal elements are the result of a copy, you get an exact Hermitian result for this case. Not so for the generic multiply case ... more on that below.
Another thing to note is that even though the individual elements are made up of integers, the intermediate sums overflow the precision of a single-precision variable. Hence the results differ depending on the order of calculations, even though the exact result would also be made up of integers. E.g.,
>> SaS(1,1)
ans =
single
3.353673e+11
>> StS(1,1)
ans =
single
3.353673e+11 + 3408i
>> dot(S(:,1),S(:,1))
ans =
single
3.353671e+11
>> S(:,1)'*S(:,1)
ans =
single
3.353672e+11
Four different answers for the (1,1) spot depending on how we do the calculation.
Now lower the integer values so the intermediate sums do not overflow the precision of a single precision variable:
>> S = round(10*single(rand(5000)+rand(5000)*1i));
>> St = S';
>> tic;SaS=S'*S;toc
Elapsed time is 1.807634 seconds.
>> tic;StS=St*S;toc
Elapsed time is 2.977281 seconds.
>> isequal(SaS,StS)
ans =
logical
1
>> SaS(1,1)
ans =
single
329958
>> StS(1,1)
ans =
single
329958
>> dot(S(:,1),S(:,1))
ans =
single
329958
>> S(:,1)'*S(:,1)
ans =
single
329958
Here we get exactly the same results for the four different methods because the sums didn't overflow the precision of the single type.
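The order-dependence demonstrated above can be reproduced deterministically outside MATLAB. Below is a stdlib-only Python sketch (the values and names are mine, purely for illustration), simulating single-precision rounding with `struct`: the same three integral terms, accumulated in two orders, give two different answers once a partial sum exceeds 2^24.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest IEEE 754 single."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Singles represent consecutive integers exactly only up to 2^24 = 16777216.
big, one = 16777216.0, 1.0

# Accumulate the same three terms in two orders, rounding every
# partial sum to single precision as a single-precision multiply would.
left  = f32(f32(big + one) + one)    # (big + 1) + 1
right = f32(big + f32(one + one))    # big + (1 + 1)

print(left)    # 16777216.0 -- big + 1 is a rounding tie and drops back to big
print(right)   # 16777218.0 -- 1 + 1 = 2 first, and big + 2 is representable
```

The same effect, spread over thousands of terms, is why SaS, StS, dot, and the vector product above all land on slightly different integers.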

7 Comments

Ryan Dreifuerst
Ryan Dreifuerst on 5 Aug 2022
This is strange and more than a little troublesome, especially since MATLAB is so casual about data types, yet does not implement any kind of increased precision for single-single floating-point multiplication.
I just spent 2 days debugging some code where large enough "rounding errors" from single-precision Hermitian inner products caused our final results to be incorrect by ~3x. I only realized it was a precision error (not sure you can necessarily call it that, considering it's specific to how MATLAB handles single-precision multiplication; this did not occur in Python, where using single-precision values results in about 10% error vs. double) because the resulting matrix had slightly complex values on the diagonal elements. And one would normally expect the most efficient implementation to be single precision, yet that is clearly not the case. Why would a similarly efficient implementation not be included for single precision? These little 'gotcha' moments really add unnecessary complexity to MATLAB, in my opinion.
Matt J
Matt J on 5 Aug 2022
10% still seems like a lot.
Bruno Luong
Bruno Luong on 5 Aug 2022
Edited: Bruno Luong on 5 Aug 2022
@Ryan Dreifuerst you are probably dealing with an ill-conditioned problem, so the error is amplified in finite-precision arithmetic. There is an entire branch of mathematics on how to deal with such problems. Plugging in single precision and hoping your software automatically deals with the precision and does the best it can is just a wrong expectation. You have to write the code to be robust for such problems.
Ryan Dreifuerst
Ryan Dreifuerst on 5 Aug 2022
Edited: Ryan Dreifuerst on 5 Aug 2022
@Bruno Luong You are definitely right about the ill-conditioned inverse, though the problem will always be such, and we do have a number of different methods for handling it via pseudo-inverse, matrix inversion lemma, or factorization. As you said, there is a whole slew of mathematics on such topics. The error is actually still quite substantial via the pseudo-inverse, though, which is normally not the case, but that is not really the point.
My complaint instead stems from how MATLAB handles data type changes quietly, yet one of the most universal data type changes (increased precision for multiplies and divides before reducing precision) is not done. This is visible when comparing other programming languages on the same data. Furthermore, that single-precision Hermitian inner products are slower than double-precision ones means there is no (ok, extremely rarely a) point in ever taking a single-precision Hermitian inner product. Instead, either warnings suggesting a switch or just blatant type casting should be done under the hood, as MATLAB already does in many other situations.
James Tursa
James Tursa on 5 Aug 2022
Edited: James Tursa on 5 Aug 2022
@Ryan Dreifuerst MATLAB calls a 3rd party BLAS library to do matrix multiplication. That BLAS library optimizes for speed via multithreading and smart cache usage. If MATLAB knows that there is symmetry involved (e.g. because it can detect when operands are the same variable), it will call special symmetric BLAS functions which are faster and guarantee Hermitian results. Otherwise it will call generic functions which will not guarantee Hermitian results. There are separate single precision and double precision versions of the matrix multiply routines in the BLAS library.
Bottom line is whatever intermediate data type changes you would like done to increase accuracy would have to be done inside the BLAS library because that is where all the intermediate calculations are done. I have no insight into what criteria Mathworks uses to choose their BLAS/LAPACK library vendors, but I am guessing that some compromise of speed and $$$ was at the top of the list. When comparing to other programming languages you have to ask yourself how is matrix arithmetic done in that language? If it is through a BLAS library then there is the same potential issue and you are at the mercy of whatever BLAS library that language uses and what criteria the writers of that library used.
Q: Can you post simple example code that shows single precision Hermitian inner products are slower than the double precision version? I wouldn't expect this to be the case for simple BLAS calls.
James Tursa
James Tursa on 5 Aug 2022
Edited: James Tursa on 5 Aug 2022
If speed was not an issue, you could always write your own C mex routine to do the matrix multiply as a series of dot products using the BLAS dsdot( ) routine, which computes the dot product of single precision inputs using a double precision accumulator. This would be very easy to code and could even be multi-threaded, but would not run as fast as the single precision matrix multiply routines.
Furthermore, that single precision hermitian inner products are slower than double precision ones means there is no (ok, extremely extremely rarely? a) point in ever taking a single precision hermitian inner product.
S = round(10*single(rand(5000)+rand(5000)*1i));
St = S';
tic;SaS=S'*S;toc
Elapsed time is 1.596208 seconds.
tic;StS=St*S;toc
Elapsed time is 3.040935 seconds.
dS = double(S);
dSt = double(St);
tic; dSaS = dS'*dS;toc
Elapsed time is 3.155149 seconds.
tic; dStS = dSt*dS;toc
Elapsed time is 5.954180 seconds.
Double precision looks slower?
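The dsdot idea James mentions, single-precision inputs with a double-precision accumulator, is easy to sketch. The following stdlib-only Python illustration (again simulating single-precision rounding with `struct`; it is a sketch of the accumulation strategy, not the BLAS routine itself, and the function names are mine) shows that the double-precision accumulator recovers the exact integral dot product where a single-precision accumulator may drift:

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest IEEE 754 single."""
    return struct.unpack('f', struct.pack('f', x))[0]

def sdot(x, y):
    """Dot product with a single-precision accumulator:
    every product and every partial sum is rounded to single."""
    s = 0.0
    for a, b in zip(x, y):
        s = f32(s + f32(a * b))
    return s

def dsdot_style(x, y):
    """dsdot-style dot product: the inputs are single precision,
    but the products are accumulated in double precision."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b              # double-precision accumulation
    return s

# Single-precision integral inputs whose partial sums exceed 2^24,
# so the single-precision accumulator has to round.
x = [f32(float(v)) for v in range(3000, 3100)]
single_acc = sdot(x, x)
double_acc = dsdot_style(x, x)
exact = float(sum(v * v for v in range(3000, 3100)))
print(double_acc == exact)      # True: the double accumulator is exact here
```

A final rounding of double_acc back to single, f32(double_acc), would then be correctly rounded for this inner product, which is exactly the accuracy benefit of the dsdot approach.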

Sign in to comment.

Donald Liu
Donald Liu on 12 Dec 2019

After reading Matt's answer, I realized the following:
In single(A), we're converting element a + j * b to single precision a1 + j * b1.
In single(A'), the same element becomes a + j * (-b) and it is converted to single precision yielding a2 + j * (-b2).
Due to rounding differences of positive and negative numbers, b1 and b2 may not be equal. Thus single(A') * single(A) is not strictly Hermitian.
However, even single(A)' * single(A) is not strictly Hermitian, which is still puzzling.
Thanks!

4 Comments

James Tursa
James Tursa on 12 Dec 2019
No, the differences in your example are not the result of rounding differences. See my Answer.
Donald Liu
Donald Liu on 12 Dec 2019
Since each element has integral real and imaginary parts, there should be no rounding errors?
James Tursa
James Tursa on 12 Dec 2019
You do the rounding before you do the multiply, so all the downstream calculations start with exactly the same values. The rounding you did has no effect on this. If everything starts with integral values, it won't matter what order you do the calculations AS LONG AS you don't overflow the precision of the variable type you are using. But in your case you did overflow this, hence the differences.
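James's "AS LONG AS you don't overflow" point can also be checked deterministically. In this stdlib-only Python sketch (single-precision rounding again simulated with `struct`, purely for illustration), integral terms whose partial sums all stay below 2^24 are accumulated exactly, so the order cannot change the result:

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest IEEE 754 single."""
    return struct.unpack('f', struct.pack('f', x))[0]

def sum_f32(values):
    """Accumulate in single precision: round after every addition."""
    s = 0.0
    for v in values:
        s = f32(s + v)
    return s

# Integral terms; the total is 499500, far below 2^24 = 16777216, so every
# partial sum is exactly representable and no addition ever rounds.
small = [float(v) for v in range(1, 1000)]
print(sum_f32(small) == sum_f32(small[::-1]))   # True: order is irrelevant
```

With the 10000-scaled values in the question, by contrast, the partial sums blow past 2^24 almost immediately, and every accumulation order becomes its own answer.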
Matt J
Matt J on 12 Dec 2019
Edited: Matt J on 12 Dec 2019
However, even single(A)' * single(A) is not strictly Hermitian, which is still puzzling.
Note that the first single(A) in the expression will not occupy the same memory address as the second single(A). Therefore, Matlab's ctranspose-multiply routine might not be recognizing them as the same matrix. Example:
>> format debug; A=rand(3); B=single(A), C=single(A),
B =
3×3 single matrix
Structure address = 13f3b4f00 %<-----Memory Address
m = 3
n = 3
pr = 12ebdd7e0
0.7265 0.2352 0.7033
0.6667 0.6863 0.5338
0.0327 0.2811 0.3319
C =
3×3 single matrix
Structure address = 13f37b270 %<-----Memory Address
m = 3
n = 3
pr = 12eb5dae0
0.7265 0.2352 0.7033
0.6667 0.6863 0.5338
0.0327 0.2811 0.3319
By contrast, in the following, Q and A do occupy the same memory address:
M = 10000; N = 100;
A = round(10000 * complex(randn(M, N), randn(M, N)));
A = single(A);
Q=A;
B = Q' * A;
C = B' - B;
max(abs(C(:)))
ans =
single
0

Sign in to comment.
