A single precision matrix multiplication problem

Script:
a=[-0.1708981,-0.1415759,-0.0727073,-0.0554292;
0.0000000,-0.1715775,-0.1147511,-0.1098115;
0.0000000,0.0000000,-0.4013846,-0.3982548;
0.0000000,0.0000000,0.0000000,-0.0430463];
a=single(a);
b=[16557086;2829575;3241423;7206131];
b=single(b);
y=a*b;
z=zeros(4,1);
z=single(z);
for ii=1:1:4
for jj=1:1:4
z(ii)=a(ii,jj)*b(jj)+z(ii);
end
end
c=y-z;
plot(c) %=> It's not all 0
Mathematically, y and z should be the same, but c is not all zeros. Why is this?
Even multiplying individual rows and columns gives a difference.

9 comments

Paul
Paul on 18 May 2023
Edited: Paul on 18 May 2023
Running here on Answers, the result c is all zeros. I'm not sure if I should be surprised by that result or not, as it suggests that, for these inputs, the loop is doing computations in exactly the same order as mtimes.
a=[-0.1708981,-0.1415759,-0.0727073,-0.0554292;
0.0000000,-0.1715775,-0.1147511,-0.1098115;
0.0000000,0.0000000,-0.4013846,-0.3982548;
0.0000000,0.0000000,0.0000000,-0.0430463];
a=single(a);
b=[16557086;2829575;3241423;7206131];
b=single(b);
y=a*b;
z=zeros(4,1);
z=single(z);
for ii=1:1:4
for jj=1:1:4
z(ii)=a(ii,jj)*b(jj)+z(ii);
end
end
c=y-z;
max(abs(c))
ans = single 0
plot(c) %=> It's not all 0
That's strange. I ran it in MATLAB R2019a and the result was this:
clear all
clc
a=[-0.1708981,-0.1415759,-0.0727073,-0.0554292;
0.0000000,-0.1715775,-0.1147511,-0.1098115;
0.0000000,0.0000000,-0.4013846,-0.3982548;
0.0000000,0.0000000,0.0000000,-0.0430463];
a=single(a);
b=[16557086;2829575;3241423;7206131];
b=single(b);
y=a*b;
z=zeros(4,1);
z=single(z);
for ii=1:1:4
for jj=1:1:4
z(ii)=a(ii,jj)*b(jj)+z(ii);
end
end
c=y-z;
max(abs(c))
plot(c)
But running it here on Answers gives the same result as yours.
Now it's even more confusing.
Dyuman Joshi
Dyuman Joshi on 18 May 2023
Edited: Dyuman Joshi on 18 May 2023
I ran it in my laptop with R2021a and it produced the same results as the Live editor is producing here.
Edit - I don't think there has been any updates regarding the functions used in this code from 2019 onwards.
Paul
Paul on 18 May 2023
The screen cap indicates you used R2021a Update 5.
Dyuman Joshi
Dyuman Joshi on 18 May 2023
Edited: Dyuman Joshi on 18 May 2023
My bad. I used the command "version" to show the version of my MATLAB, but typed out 2019. It seems my mind still confuses 2021 with 2019 :/
I will edit my comment above. Thanks for pointing it out.
Paul
Paul on 18 May 2023
Can you comment on the results in this thread, where Answers (R2023a) and R2021a Update 5 have mtimes yield the same result as the loop, but that is not the case with R2019a? I didn't see any changes to mtimes in the release notes since R2019a. Is it possible mtimes works differently (order of computations) on different hardware?
If I recall correctly we upgraded some of the libraries used to perform matrix multiplication during one or more of the releases between release R2019a and release R2021a to fix bugs, improve performance, and/or improve accuracy.
Paul
Paul on 19 May 2023
I checked the release notes from R2019a onwards and didn't see any mention of changes to mtimes, which is surprising if such a heavily used function actually had a behavior change. I didn't check all of the bug fix pages, but I'd be surprised to find it there, at least with respect to this issue.
James Tursa
James Tursa on 19 May 2023
Edited: James Tursa on 19 May 2023
"... Is it possible mtimes works differently (order of computations) on different hardware? ..."
Yes. The function mtimes( ) for full double and single matrices is performed in the background by a multi-threaded BLAS library. Anytime that library changes you should expect there might be small differences in the outputs. Different MATLAB versions might use different libraries, and different machines/hardware might use different libraries or call different routines in the libraries (e.g., symmetric vs generic), or maybe the same library is used but with a different number of threads available. All of this can affect order of computations. MATLAB used to get their BLAS & LAPACK libraries from a 3rd party. I haven't checked lately but I presume they still do. Bottom line is you shouldn't expect floating point calculations to necessarily match exactly when comparing different MATLAB environments.
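The effect of accumulation order alone is easy to demonstrate directly. A minimal sketch of the principle (not what any BLAS actually does internally):

```matlab
% Accumulate the same single-precision values in two explicit orders.
rng(0);                        % fixed seed so the example is repeatable
x = single(rand(1, 1e6));
s1 = single(0);  s2 = single(0);
for k = 1:numel(x),     s1 = s1 + x(k); end   % forward accumulation
for k = numel(x):-1:1,  s2 = s2 + x(k); end   % reverse accumulation
s1 - s2   % often a small nonzero value; neither result is "wrong"
```

Both loops perform the same multiplications-free additions on the same data, yet each rounds differently along the way, which is exactly the kind of difference a change of BLAS library or thread count can introduce.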


Accepted Answer

John D'Errico
John D'Errico on 18 May 2023
Edited: John D'Errico on 18 May 2023
Why are you confused? NEVER trust the least significant bits of a floating point number.
The numbers in a are singles. So the least significant bits in a are of magnitude
eps('single')
ans = single 1.1921e-07
Now you add and subtract them, multiplying by numbers that are on the order of 1e7. Do you appreciate that when MATLAB does the computations you wrote, there is no presumption that it will add and subtract them in EXACTLY the same sequence as the explicit for loops you wrote? And that means again, NEVER TRUST THE LEAST SIGNIFICANT BITS OF A FLOATING POINT RESULT.
The difference is now on the order of 1, actually, 0.25. WHAT A SURPRISE! NOT.
Floating point arithmetic is NOT exact mathematically correct arithmetic. In fact, it is just an approximation to exact mathematically correct arithmetic. A very good approximation in most cases. (Slightly less good for singles than doubles, but the difference is irrelevant in this respect.) As an approximation, there are always tiny errors made. And then when you magnify those errors, you can find them, IF you look very closely.
So, just for kicks, suppose I asked you to compute the result
2/3 - 1/3 - 1/3
BUT, you must use 4 digit decimal arithmetic, first approximating each number as a 4 digit decimal number. You would now get:
0.6667 - 0.3333 - 0.3333 = 0.0001
So even though the answer is mathematically zero, when we use a finite number of digits in any form, the result need not be zero.
Does order make a difference?
x = single([0.1 0.2 0.3]);
x(3) - x(1) - x(2)
ans = single 1.4901e-08
-(x(1) + x(2)) + x(3)
ans = single 0
Again, NEVER TRUST THE LEAST SIGNIFICANT BITS OF A RESULT. In both cases, the answer is effectively zero, to within a tolerance of eps('single').
The answer is to learn to use tolerances. Learn to not trust those least significant bits. Never test for exact equality of floating point numbers, at least not until you know enough about them that you fully understand when you can.
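Applied to the code in the question, a tolerance-based comparison might look like this (a sketch; the scale factor here is one reasonable choice, not the only one):

```matlab
% Compare y and z to within a tolerance scaled by the data,
% instead of testing for exact bit-level equality.
tol = 10*eps('single')*max(abs(y));
all(abs(y - z) <= tol)   % the two results agree to within rounding
```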

9 comments

jian
jian on 18 May 2023
You're right, the last few digits of a floating-point number are inaccurate.
But in my case the input values are the same for both methods, and they should be calculated the same way: direct matrix multiplication should be the same as the multiplications and additions. Since the input data and the calculation method are consistent, the results should be the same, even for floating-point numbers. This is my personal understanding.
This is reflected in the figure: only the first element is not equal to 0; everything else is 0. So I want to figure out why the first element doesn't equal 0. This is where I get confused.
Thanks
DID YOU READ MY ANSWER?
x = single([0.1 0.2 0.3]);
x(3) - x(1) - x(2)
ans = single 1.4901e-08
-(x(1) + x(2)) + x(3)
ans = single 0
Are those computations identically the same, in terms of mathematics?
Do you understand that just because you write y=a*b, the BLAS used to compute the matrix-vector product need not perform those additions and multiplications in exactly the same sequence as a set of nested for loops? If you understand that, then go back and look at the simple example I just ran. Both results should be identically zero, AND identically the same, right?
I'm sorry, but your personal understanding is simply wrong.
The input values are the same for both methods,
This is true.
and they should be calculating it the same way,direct multiplication of matrices should be the same as multiplication and addition.
Where specifically in the documentation do we state that matrix multiplication performs the same operations in the same order as the naive (in the sense of the fourth definition in the Adjective part of the English entry on Wiktionary, "Intuitive; designed to follow the way ordinary people approach a problem.") for loop implementation?
If you can find a place in the documentation where we state that, please let me know so I can have that section of the documentation corrected.
One minor suggestion about the code you posted:
z=zeros(4,1);
z=single(z);
This will allocate a double precision array (for the first line) then immediately convert it to single precision and throw away the memory allocated for the double precision array. For a vector this small that's not a big deal, but if you want to create a larger single precision array, a more efficient approach is to tell zeros that you want a single precision array from the start, so it allocates the single precision array without creating a double precision array first.
z2 = zeros(4, 1, 'single');
isequal(z, z2)
ans = logical
1
jian
jian on 18 May 2023
No, I'm not claiming that direct matrix multiplication is done with a for loop; I just mean that when the input data is consistent and the calculation method is consistent, the result should be the same.
So the reason this problem arises is that matrix multiplication in MATLAB is not simply the multiplications and additions of the textbook definition?
Also, thank you for your advice; I will apply it going forward.
when the input data is consistent and the calculation method is consistent, the result should be the same.
When the input data is the same and the calculation method is consistent, the result should be consistent.
If you were making the same chocolate chip cookie recipe as your friend, you'd both be trying to achieve the same results. But your friend may have more experience making these cookies and know a more streamlined approach that lets them start baking their cookies a minute or two earlier than you. So their cookies may be very slightly different from yours. Does that make either cookie wrong? Not necessarily.
So the reason why this problem arises is because matrix multiplication in matlab is not simply multiplication and addition by definition?
The implementation of matrix multiplication in MATLAB is not the naive for loop approach. So if you're expecting exact, down-to-the-last-bit equality your expectation may not be satisfied.
jian
jian on 19 May 2023
Ok, thank you for your reply. I know the answer to the question.
But I still have a question. Why is the result calculated by my roommate's MATLAB 2020 different from mine with MATLAB 2019? This is the same question that Paul raised above.
Matrix multiplication calls upon libraries that take into account hardware cache line sizes and cache sizes. Computations that fit within the fastest cache can be considerably more efficient, so the code prioritizes efficiency. That can involve performing operations in a different order than you might otherwise expect, such as dividing summations into blocks and totaling the partial sums.
There is no reason to assume that different versions of MATLAB MUST use exactly the same BLAS version. Matrix multiplications are performed by the BLAS, which stands for Basic Linear Algebra Subroutines as I recall. MATLAB farms that out, to highly optimized code in the BLAS.
The end result is that subtle things can influence the EXACT order of arithmetic operations, and as I have shown several times, the EXACT order of floating point adds and subtracts may cause tiny differences in the result. That is something you need to learn not to worry about. NEVER test for exact equality of floating point numbers, because the way they were generated can easily be influenced by things not under your control.
Again, remember that floating point arithmetic is only an approximation to real arithmetic. Think of it as a model. But as a model, it can suffer from tiny flaws. In the case of floating point arithmetic, the flaws are in the least significant bits. That is something you need to live with, unless you are willing to operate in infinite precision. And infinite precision will take too long to use for any computation.
Even in the case of high precision tools, like the HPF code I have posted on the file exchange, it is still floating point arithmetic. So there are still flaws in the least significant bits, but now the depth of those flaws drops down by many orders of magnitude. You cannot get around that.
Note by the way that the high performance libraries may be different depending on the processor line, such as Intel vs AMD.
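The blocked-summation idea mentioned above can be sketched with the data from the question (the block size here is hypothetical; real libraries choose blocking to fit cache):

```matlab
% Dot product of row 1 of a with b, accumulated two different ways.
s_seq = single(0);
for k = 1:4
    s_seq = s_seq + a(1,k)*b(k);   % strictly sequential accumulation
end
% Blocked accumulation: two partial sums, combined at the end.
s_blk = (a(1,1)*b(1) + a(1,2)*b(2)) + (a(1,3)*b(3) + a(1,4)*b(4));
s_seq - s_blk   % may be a small nonzero value in single precision
```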


More Answers (1)

James Tursa
James Tursa on 19 May 2023
Edited: James Tursa on 22 May 2023
Somewhat related side comments:
In general, the BLAS/LAPACK libraries do not contain mixed type argument routines. E.g., there are no BLAS matrix multiply routines to multiply a single precision matrix with a double precision matrix. To accomplish this with BLAS routines one must first make a temporary copy of the single matrix as a double matrix and then call the appropriate BLAS routine. This of course will chew up extra time and memory. The only exceptions to this are some accumulation routines (e.g., dot product of single precision inputs but using a double precision accumulator). That being said, I don't know if MATLAB ever makes use of these extra precision accumulator routines that are available.
Related to that, there are also no mixed real/complex argument routines in the BLAS/LAPACK libraries. Either all arguments have to be real or all arguments have to be complex. And all of the complex variables are assumed to be interleaved memory model (this was true even back in the R2017b and earlier days).
MATLAB R2017b and prior used a separate complex memory model, meaning the real part of a variable was held in contiguous memory that was separate from the imaginary part which was also held in contiguous memory. For MATLAB R2018a and later this changed to an interleaved complex memory model, where the real part is interleaved with the imaginary part in memory. This now aligns better with the BLAS/LAPACK libraries and how complex variables are typically held in memory in other languages such as Fortran and C/C++, but has implications for how the BLAS/LAPACK routines can be used in the background. A couple of examples:
(Real Matrix) * (Complex Matrix)
R2017b and earlier: This can be accomplished with a series of Real BLAS routine calls on the individual real and imaginary parts. No need for temporary copies.
R2018a and later: This requires that a temporary copy of the real matrix be made as an interleaved complex matrix with imaginary part 0 so that the Complex BLAS routine can be used. Takes extra time and memory compared to R2017b and earlier, and also lots of unnecessary 0 multiplies will be done (which might give some NaN results that don't appear in R2017b if there are inf's involved).
inv(Complex Matrix)
R2017b and earlier: This involves calling complex LAPACK routines, so the separate real and imaginary parts of the variable must first be copied into an equivalent interleaved variable. Then the appropriate LAPACK routines are called. Then the interleaved result must be copied back into a separate memory model variable. Takes extra time and memory compared to R2018a and later.
R2018a and later: This can be accomplished with direct complex LAPACK routine calls. No need for temporary copies.
Symmetric Matrix Results:
A = rand(5000);
AT = A'; % Explicit transpose constructed in memory
tic; x = A' * A; toc % Symmetric matrix multiply routine called, A' not explicitly formed
Elapsed time is 0.737080 seconds.
tic; y = AT * A; toc % Generic matrix multiply routine called using explicitly formed A'
Elapsed time is 1.300470 seconds.
isequal(x,y)
ans = logical
1
isequal(x,x') % Strictly symmetric result
ans = logical
1
isequal(y,y') % Might not be strictly symmetric result
ans = logical
1
So, with this last example, you can influence which BLAS routines are called in the background by how you write your code. Care must be taken to maintain strict symmetry of results. In the A' * A example, MATLAB can recognize the symmetry of the inputs and call a symmetric routine in the background without explicitly forming the transpose A'. That takes less time and guarantees a strictly symmetric result. In the AT * A example, the transpose A' is explicitly formed but MATLAB has no clue that AT is the exact transpose of A, so a generic matrix multiply routine is called, which takes longer and doesn't guarantee a strictly symmetric result. In that last case we happened to get strict symmetry, but this will depend heavily on your MATLAB/BLAS version and can't be guaranteed.

7 comments

Does MATLAB identify strict symmetry for temporary variables? That is, if I have
C = B' * A' * A * B
I don't think that MATLAB would do the A' * A first using the symmetric routine (I recall seeing an mlint message about that). If, instead, I do
C = (A*B)' * (A*B)
does MATLAB recognize the symmetry and use the symmetric routine?
For this:
C = B' * A' * A * B
MATLAB will do this left to right, so ...
C = ((B' * A') * A) * B % all generic routines called
For this:
C = (A*B)' * (A*B) % all generic routines called?
Unless the parser is smarter than it used to be, the first A*B will be calculated, and the second A*B will separately be calculated. Since these are physically two different matrices, MATLAB does not recognize the symmetry and generic routines will be called for the middle multiply.
The correct way to do this calculation to guarantee MATLAB will recognize and use the symmetry is:
AB = A*B; % generic routine called
C = AB' * AB; % symmetric routine called, C guaranteed symmetric
This leaves AB in your workspace, of course, which is not the case with the temporary A*B copies above. That's the price you pay to make sure the symmetric routine is called. But it's a very small price given that you can always clear AB right afterwards if you don't want it hanging around.
Now a test of online MATLAB:
A = rand(5000);
B = rand(5000);
tic; AB = A*B; C = AB' * AB; toc
Elapsed time is 2.475560 seconds.
tic; D = (A*B)' * (A*B); toc
Elapsed time is 4.003737 seconds.
isequal(C,D)
ans = logical
1
isequal(C',C)
ans = logical
1
isequal(D',D)
ans = logical
1
So based on the timings it looks like the online parser isn't yet smart enough to recognize the symmetry in the (A*B)' * (A*B) case, and generic matrix multiply routines are called in the background.
When I run this code on my desktop PC version R2021a I get these results:
Elapsed time is 2.058999 seconds.
Elapsed time is 3.581072 seconds.
ans =
logical
0
ans =
logical
1
ans =
logical
1
So C and D are both symmetric, but they are not equal and you get two different results with this MATLAB/BLAS version. Which should not be surprising given that different background routines are called. When you compare the difference:
max(abs((C(:)-D(:))./C(:)))
ans =
3.6221e-16
You can see that the relative difference is down in the least significant bits. Both answers are "correct" in that sense. But you can save execution time and temporary memory and guarantee symmetric results if you take care in how you write your code.
Do you know if there is any efficiency to be gained by assigning to C and then overwriting it, instead of creating the temporary AB (assuming a case where A*B is square so C doesn't have to change size)?
%AB = A*B; % generic routine called
%C = AB' * AB; % symmetric routine called, C guaranteed symmetric
C = A*B;
C = C' * C;
James Tursa
James Tursa on 23 May 2023
Edited: James Tursa on 23 May 2023
There is probably no way to answer this definitively. It will depend on the MATLAB parser and memory manager as to what exactly happens in the background for cases like this, and those rules may change between MATLAB versions. Knowing when the mxArray structure addresses and data pointers get reused vs reallocated is not something TMW publishes. For small sized matrices you might see subtle differences in timings depending on how you code things. But for large sized matrices you will definitely see timings dominated by unnecessary temporary memory copies and/or generic vs symmetric routines.
This does bring up a side discussion about so-called "inplace" operations. If you call a function from within another function using R = myfunction(R,etc.), and that function has a signature like R = myfunction(R,etc.), then MATLAB can sometimes do inplace operations on the variable R (writing into current data pointer) rather than reallocating new memory and freeing old memory.
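The calling pattern that can enable this looks like the following (the function name is made up for illustration):

```matlab
function R = accumulate(R, x)
% Input and output deliberately share the same variable name. When
% called from another function as R = accumulate(R, x), MATLAB can
% sometimes update R in place rather than allocating a new array.
R = R + x;
end
```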
Do you know if there is a symmetric multiplication BLAS routine that does the computations in place that Matlab could potentially take advantage of? Something like
subroutine symmmlt(A) ! returns trans(A)*A in A
James Tursa
James Tursa on 23 May 2023
Edited: James Tursa on 23 May 2023
You can look here to get an idea of what the routine signatures are for the BLAS: https://netlib.org/blas/
Note that the symmetric routines work with only the upper or lower triangle. If you want the full result you have to manually write extra code to copy one triangle into the other before returning the result back to MATLAB.
The Level 3 routines do the matrix multiplies, and all of the signatures (generic and symmetric) indicate that the multiplies involving A (and B) accumulate into a separate matrix C. Of course you could pass the pointer to A in the C spot, but my guess is that this is undefined behavior since the routines are likely allowed to accumulate into C in multiple steps. I.e., A would be overwritten in the middle of the matrix multiply algorithms, invalidating the result. It wouldn't be too hard to code up a mex routine to see what the MATLAB BLAS does in this situation.
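The triangle-copying step described above can be sketched in MATLAB itself, for a real matrix where only the upper triangle holds valid data:

```matlab
% Mirror the upper triangle into the lower triangle.
U = triu(magic(4));       % only the upper triangle is meaningful
F = U + triu(U, 1).';     % full matrix, strictly symmetric by construction
isequal(F, F.')           % returns logical 1
```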
James Tursa
James Tursa on 23 May 2023
Edited: James Tursa on 23 May 2023
So I went ahead and did this test and got what I expected, an invalid result. With the correct mex code:
>> A = rand(5000);
>> B = rand(5000);
>> C = A*B;
>> Cmex = dgemm_inplace_test(A,B);
>> isequal(C,Cmex)
ans =
logical
1
With the incorrect inplace mex code:
>> Cmex = dgemm_inplace_test(A,B);
>> isequal(C,A) % comparing to A because that's where inplace result should be
ans =
logical
0
>> max(abs(A(:)-C(:)))
ans =
1.1143e+27
And here is the code, calling with square matrices so that A and C are the same size for the inplace test:
// Does generic matrix multiply using BLAS dgemm( )
// C = A * B
/* Includes ----------------------------------------------------------- */
#include "mex.h"
#include "blas.h"
/* Gateway ------------------------------------------------------------ */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
char TRANS = 'N';
mwSignedIndex M, N, K, LDA, LDB, LDC;
double *A, *B, *C;
double ALPHA = 1.0, BETA = 0.0;
if( nlhs > 1 )
mexErrMsgTxt("Too many outputs.");
if( nrhs != 2 || !mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]) ||
mxIsSparse(prhs[0]) || mxIsSparse(prhs[1]) ||
mxIsComplex(prhs[0]) || mxIsComplex(prhs[1]) ||
mxGetNumberOfDimensions(prhs[0]) != 2 || mxGetNumberOfDimensions(prhs[1]) != 2 )
mexErrMsgTxt("Need exactly two full double real 2D matrix inputs.");
M = mxGetM(prhs[0]);
K = mxGetN(prhs[0]);
N = mxGetN(prhs[1]);
LDA = M;
LDB = mxGetM(prhs[1]);
LDC = M;
A = mxGetData(prhs[0]);
B = mxGetData(prhs[1]);
if( LDB != K )
mexErrMsgTxt("Incorrect dimensions for matrix multiplication.");
plhs[0] = mxCreateDoubleMatrix( M, N, mxREAL );
C = mxGetData(plhs[0]);
// dgemm(&TRANS, &TRANS, &M, &N, &K, &ALPHA, A, &LDA, B, &LDB, &BETA, C, &LDC); // the correct code
dgemm(&TRANS, &TRANS, &M, &N, &K, &ALPHA, A, &LDA, B, &LDB, &BETA, A, &LDC); // the incorrect inplace code
}
Compiled using this helper function:
function compile_blas(varargin)
libdir = 'microsoft';
lib_blas = fullfile(matlabroot,'extern','lib',computer('arch'),libdir,'libmwblas.lib');
mex(varargin{:},lib_blas,'-largeArrayDims');
end


Version

R2019a
