64-bit binary number representation in MATLAB without rounding off

I have a 64-bit binary number made of 1s and 0s, and I need all 64 bits of it for further processing. However, MATLAB gives me the number rounded and in exponential form, i.e. 1.01010101010101e+41. Can anyone please tell me how to get the full 64 bits without losing precision? Thanks in advance.

Accepted Answer

Guillaume on 20 Jul 2019
If MATLAB shows the number as 1.01010101010101e+41, then you do not have a binary number. You have a decimal integer consisting of just the digits 0 and 1. Additionally, that decimal number is stored as a 64-bit double.
Any integer greater than flintmax (= 9007199254740992), so anything with 16 digits or more, may not be stored exactly as a double. The greater the number, the less likely it is that you can store it exactly, e.g. 100000000000000010 (17 digits) is stored as 100000000000000016. Nothing can be done about that; it is how double numbers work.
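The rounding described above can be checked directly. This is a minimal sketch using only built-in functions; the variable names are illustrative and not part of the original answer.

```matlab
% Doubles represent every integer exactly only up to flintmax = 2^53.
limit = flintmax;                     % 9007199254740992
sameAsNext = (limit + 1 == limit);    % true: limit + 1 rounds back to limit

% Both 17-digit literals below are rounded to the same double when parsed,
% so MATLAB cannot tell them apart.
isSame = (100000000000000010 == 100000000000000016);   % true
```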
You could store your decimal representation as a 64-bit integer instead (uint64) to be able to store a few more digits but that won't get you far. The maximum decimal number consisting of 0s and 1s that you can store as uint64 is 11111111111111111111 (20 digits). Anything above that cannot be stored.
So, the first thing you need to realise is that you cannot store a number consisting of 64 0s and 1s as a decimal number, whether double or uint64 (unless you resort to the Symbolic Math Toolbox, but that would be slow).
You could store it as a char vector, as suggested by Bruno.
In my opinion, the simplest way to store a 64-bit number is to store its value as a uint64, i.e. store the binary 1111111111111111111111111111111111111111111111111111111111111111b as uint64(18446744073709551615).
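As a rough illustration of that suggestion, the sketch below builds a uint64 from a 64-character string of '0' and '1' using integer-only operations, so the value never passes through a double. The variable names and the example input are assumptions made for the illustration.

```matlab
% Build a uint64 from a 64-character bit string, most significant bit first.
b = repmat('1', 1, 64);               % example input: 64 ones
v = uint64(0);
for k = 1:numel(b)
    % Shift the accumulated value left by one bit and append the next bit.
    v = bitor(bitshift(v, 1), uint64(b(k) == '1'));
end
% v now equals intmax('uint64')
```

The same loop works for shorter strings, which are then treated as if padded with leading zeros.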

21 Comments

Thanks for your answer. However, here is an example of the problem with this strategy. Suppose I have a decimal number, num = -108.308, which I convert to binary using b = reshape(flip(dec2bin(typecast(num,'uint32'),32)),1,[]) and then convert to uint64 using num = typecast(flip(uint32(bin2dec(reshape(b,32,2)'))),'uint64'). How do I then decode this uint64 number back to the original number, num = -108.308?
The conversion to uint64 can be achieved more simply with:
numdouble = -108.308
numuint64 = typecast(numdouble, 'uint64')
No need to go through a binary representation as characters.
From that you can see that decoding it back to double is:
orignumber = typecast(numuint64, 'double')
Nits on 20 Jul 2019
Edited: Nits on 20 Jul 2019
Thank you so much for your answer. However, the uint64 I get from >> numuint64 = typecast(numdouble, 'uint64') comes from a buffer in Simulink, in the form of a vector, bufferout [1;3;8;6;0;6;9;3;9;5;1;7;3;6;0;4;0;1;2;9]. How do I convert this vector into a form that can be fed into >> orignumber = typecast(numuint64, 'double') to get back the original number, -108.308?
What datatype is the vector bufferout? At the moment it looks plausible that it is double but that the maximum value is 9?
It's going to be a one-dimensional vector of 20 bits.
That does not tell me class() of the data.
It is not obvious to me how to connect [1;3;8;6;0;6;9;3;9;5;1;7;3;6;0;4;0;1;2;9] to the idea that this is a vector of bit values.
Bruno Luong on 21 Jul 2019
Edited: Bruno Luong on 21 Jul 2019
  • bufferout [1;3;8;6;0;6;9;3;9;5;1;7;3;6;0;4;0;1;2;9].
  • original number,-108.308
  • 64 bits binary number having 1's and 0's
  • number in exponential form after rounding i.e. 1.01010101010101e+41
These are picked from your various posts. Wow, I don't know if anybody can guess the relation between those objects and understand what you are trying to do. I certainly can't.
Nits on 21 Jul 2019
Edited: Nits on 21 Jul 2019
Sorry about that. The output from the buffer is going to be a vector of class uint64. I am trying to concatenate all 20 bits from the buffer into one number and then decode it into a decimal number. I hope this answers your question.
20 bits from one number, but the number is expected to decode to a double precision value that normally takes 64 bits to represent ?? Or did you take the first 20 bits of a 32 bit single precision number?
Yes, the uint64 class 20 bits are expected to decode to a double.
The first bit of an IEEE 754 double is the sign bit. The 11 bits after that are the exponent. The 52 after that are the mantissa. Could you confirm that you are keeping the sign bit, the 11 exponent bits, and the first 8 mantissa bits, and discarding everything after that? So a full range up to 1e308 but only a precision of 1/512?
I agree with the others, it's very confusing, particularly as you appear to mix up the terms bits, digits and elements.
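To make the layout concrete, here is a small sketch that splits -108.308 into the three fields and then keeps only the top 20 bits, as asked about above. It uses only built-in functions; the variable names are illustrative, and the reconstruction formula applies to normal (not denormal) numbers only.

```matlab
% Split a double into its IEEE 754 fields.
x = -108.308;
u = typecast(x, 'uint64');

signBit  = bitshift(u, -63);                         % 1 bit
expField = bitand(bitshift(u, -52), uint64(2047));   % 11 bits, biased by 1023
fracBits = bitand(u, uint64(2^52 - 1));              % 52 stored mantissa bits

% Rebuild the value from the fields (normal numbers only).
rebuilt = (-1)^double(signBit) * (1 + double(fracBits) / 2^52) ...
          * 2^(double(expField) - 1023);

% Keep only the top 20 bits: sign, exponent and first 8 mantissa bits.
mask = bitshift(intmax('uint64'), 44);               % low 44 bits cleared
truncated = typecast(bitand(u, mask), 'double');     % -108.25
```

With only 8 stored mantissa bits left, -108.308 collapses to -108.25, which shows how much precision such a truncation would cost.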
bufferout = [1;3;8;6;0;6;9;3;9;5;1;7;3;6;0;4;0;1;2;9]
is a vector with 20 elements. These elements appear to be single decimal digits (i.e. from 0 to 9), which is unusual. They're certainly not bits since bits can only have two different values. Depending on the class of bufferout, the number of bits used by each element will be 8 (if class is uint8 or int8), 16 (if class is uint16 or int16), 32 (if class is uint32 or int32 or single) or 64 (if class is uint64, int64 or double). However, if the numbers are indeed decimal digits how many bits they occupy is irrelevant.
It is unclear what these digits represent and how they could be converted to a 64 bit double number.
edited for typos
Maybe each vector entry encodes 4 consecutive bits? But that would give 80 bits. Is it somehow a dump of an Intel 80 bit extended precision floating point register??
Perhaps, but then it's a bit of bad luck that the demo vector does not have any number from 10 to 15.
But if that's the case, then another question is what is expected to be done with that number since matlab doesn't support long double. Round to nearest double?
mmap = dec2bin(0:15,4);
typecast(uint8(bin2dec(reshape(mmap(bufferout(1:16)+1,:).',8,[]).')),'double')
ans =
2.26694861213593e-288
At the very least we have a bit order incompatibility.
Guillaume on 22 Jul 2019
Edited: Guillaume on 22 Jul 2019
If the numbers indeed represent an 80-bit long double, I don't think you can just take the first 64 bits and interpret them as a double. The two representations are significantly different (in particular, long double has an explicit integer part that double doesn't store).
Ah, you are correct, the representation is different and would need more work to convert.
Nits on 24 Jul 2019
Edited: Nits on 24 Jul 2019
Finally, this is how I ended up doing it.
A=-108.308;
numuint64 = typecast(A, 'uint64')
B=[1;3;8;6;0;6;9;3;9;5;1;7;3;6;0;4;0;1;2;9];
B=B';
validateattributes(B(1,1:10), {'numeric'}, {'integer', 'nonnegative', '<', 10});
y1 = strrep(num2str(polyval(B(1,1:10), 10)),'0.', '.')
validateattributes(B(1,11:20), {'numeric'}, {'integer', 'nonnegative', '<', 10});
y2 = strrep(num2str(polyval(B(1,11:20), 10)),'0.', '.')
numuint=floor(vpa(str2num(y1)*10^10,20)+vpa(str2num(y2),20))
orignumber = typecast(uint64(numuint), 'double')
Thanks everyone.
Please have a look at my next question regarding simulink on the link: https://www.mathworks.com/matlabcentral/answers/473131-simulink-code-generation
No need for vpa
B=[1;3;8;6;0;6;9;3;9;5;1;7;3;6;0;4;0;1;2;9];
C=uint64(0);
for i=1:length(B)
C=10*C+B(i);
end
typecast(C, 'double')
Wow! It certainly wasn't obvious that the input was simply the digits of a uint64 integer.
Another way of obtaining the result:
B=[1;3;8;6;0;6;9;3;9;5;1;7;3;6;0;4;0;1;2;9];
C = sum(10.^uint64(numel(B)-1:-1:0) .* uint64(B'), 'native');
N = typecast(C, 'double')
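For context, the sketch below shows one way such a digit vector could arise and be undone: the bit pattern of the double is viewed as a uint64, printed as decimal digits, and then rebuilt with integer-only arithmetic. How the Simulink buffer actually produced the digits is not stated in the thread, so the first half is an assumption for illustration.

```matlab
% Hypothetical round trip: double -> uint64 bit pattern -> decimal digits -> double.
x = -108.308;
u = typecast(x, 'uint64');            % same 64 bits, viewed as an integer
digits = sprintf('%u', u) - '0';      % row vector of decimal digits

% Rebuild the integer using uint64 arithmetic only.
C = uint64(0);
for k = 1:numel(digits)
    C = uint64(10) * C + uint64(digits(k));
end
y = typecast(C, 'double');            % recovers x exactly
```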
"Wow! It certainly wasn't obvious that the input was simply the digits of a uint64 integer."
Actually I did suspect that since 10^20 covers just the maximum of uint64 numbers.
But to me the real surprise is casting uint64 to double! What's the purpose (quick-and-dirty encryption)? And what's the rounding issue he mentioned in the first post?


More Answers (3)

KALYAN ACHARJYA on 20 Jul 2019
Edited: KALYAN ACHARJYA on 20 Jul 2019
num=1.01010101010101e+41;
format long g; % Set the display format; this only changes how values are shown
bin_num=dec2bin(num);
Result:
bin_num =
'10010100011010111100011011101001001110100101101001011000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
Please adapt this to your requirements.

2 Comments

Note that, as per its documentation, dec2bin cannot accurately convert any number greater than flintmax (because it uses a double representation internally), so it's not suitable for 64-bit integers. It fails miserably for e.g.
>> dec2bin(intmax('uint64'))
ans =
'10000000000000000000000000000000000000000000000000000000000000000'
when the answer should be
ans =
'1111111111111111111111111111111111111111111111111111111111111111'
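One way to sidestep the issue is to read the bits directly from the integer. This is a minimal sketch using the built-in bitget; the variable names are illustrative.

```matlab
% Exact 64-character binary string for a uint64, read bit by bit.
v = intmax('uint64');
bits = char(double(bitget(v, 64:-1:1)) + '0');    % most significant bit first

% A second example with a small value.
bits5 = char(double(bitget(uint64(5), 64:-1:1)) + '0');   % 61 zeros, then '101'
```

Because bitget works on the integer itself, no conversion to double takes place and every bit is preserved.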
Agree. Thanks.


Bruno Luong on 20 Jul 2019
Edited: Bruno Luong on 20 Jul 2019
Assuming you have your binary number as a string of length 64, such as
b=repmat('0',1,64);
b([1 end])='1',
You can convert to uint64 integer class by
num = typecast(flip(uint32(bin2dec(reshape(b,32,2)'))),'uint64')
If your binary string has fewer than 64 chars, you might first need to pad it with leading '0's before casting
b = [repmat('0',1,64-length(b)) b]

2 Comments

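For num of class 'uint64', conversion to binary can be done like this
b = reshape(flip(dec2bin(typecast(num,'uint32'),32)).',1,[])
This avoids the inaccuracy that occurs when dec2bin is applied directly to num > flintmax. The transpose before reshape keeps the 32 characters of each half together, since reshape reads the 2-by-32 char array column by column.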
Thanks for your answer. However, here is an example of the problem with this strategy. Suppose I have a decimal number, num = -108.308, which I convert to binary using b = reshape(flip(dec2bin(typecast(num,'uint32'),32)),1,[]) and then convert to uint64 using num = typecast(flip(uint32(bin2dec(reshape(b,32,2)'))),'uint64'). How do I then decode this uint64 number back to the original number, num = -108.308?


John D'Errico on 21 Jul 2019
Edited: John D'Errico on 21 Jul 2019
Maybe you are looking for something like this. I wrote a little utility recently that fully extracts the binary form of any numeric class.
num = 108.308
num =
108.308
>> B = num2bin(num)
B =
struct with fields:
Class: 'double'
Sign: 1
Exponent: 6
Mantissa: '11011000100111011011001000101101000011100101011000001'
BinaryExpansion: [6 5 3 2 -2 -5 -6 -7 -9 -10 -12 -13 -16 -20 -22 -23 -25 -30 -31 -32 -35 -37 -39 -40 -46]
BiSci: '1.1011000100111011011001000101101000011100101011000001 B6'
BiCimal: '1101100.0100111011011001000101101000011100101011000001'
>> sum(2.^B.BinaryExpansion)
ans =
108.308
>> B.Sign*sum(2.^B.BinaryExpansion) == num
ans =
logical
1
So B contains the bits that are in num, and shows the binary form of num in several ways, thus in a scientific binary form, and a binary form with a decimal point, and as a list of powers of 2, that can then be summed to recover num.
It thus allows you to easily recover the original number in num, and do so in a way that should be exact. It even understands what denormal numbers are.
I'll probably play with the names of the fields before I post it on the file exchange, but I've attached it to this answer.
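Since num2bin itself is not reproduced here, the following is only a rough sketch of how the list of powers of 2 could be obtained with built-in functions. It is not the attached utility; it assumes a positive, normal double, and the variable names are illustrative.

```matlab
% List the powers of 2 that sum to a positive, normal double.
num = 108.308;
u = typecast(num, 'uint64');
e = double(bitand(bitshift(u, -52), uint64(2047))) - 1023;   % unbiased exponent
mantissa = [1, double(bitget(u, 52:-1:1))];   % implicit leading 1, then 52 stored bits
powers = e - (find(mantissa) - 1);            % exponent of each set bit
recovered = sum(2 .^ powers);                 % equals num
```

For this input, powers matches the BinaryExpansion field shown above, and the sum is exact because all set bits fit within one 53-bit mantissa.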
