Subject: Binary floating point division
Date: Sat, 24 Jan 1998 17:11:52 +0100
From: Bernd Liebermann

Hi Alexander,

thank you very much for your quick reply to my question.

> It's a known dictum that the output is at best as good as input.
> Assume A and B are exact numbers while a and b are their
> respective approximations: |A - a| < e1 and |B - b| < e2.
> In your case, A = 11.8, B = 5. a and b are their approximations
> in, say, 64 bit binary.

So, e2=0.
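A quick way to check this (a minimal Python sketch, assuming the machine uses IEEE 754 doubles): float.hex shows the bits a 64-bit double actually stores, and 5 comes out exact while 11.8 has to be cut off and rounded as a repeating binary fraction.

    # The hexadecimal form exposes the exact bits stored in an IEEE 754 double.
    print((5.0).hex())   # 0x1.4000000000000p+2  -- exact, so e2 = 0
    print((11.8).hex())  # 0x1.799999999999ap+3  -- the repeating ...1001 1001... pattern,
                         #   rounded off after 52 fraction bits, so e1 > 0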

> I do not know how numbers are represented in
> a 64 bit PC. I would assume that 1/5 of the bits (or, probably, 13 bits) are
> occupied by the exponent, 1 goes for the sign, and 50 remain for the mantissa.

Assuming 50 bits for the mantissa (I think there are only 49), 11.8 would correspond to the binary:

1011 .1100 1100 1100 1100 1100 1100 1100 1100 1100 1100 1100 11

which equals 11.799999999999997 in decimal notation. If what the JS interpreter actually has the machine calculate is 11.799999999999997 / 5, then one would expect the returned result to be slightly smaller than the correct one. Strangely enough, it is slightly larger (2.3600000000000003). But it gets even stranger (at least to me): if you let the interpreter assign 11.799999999999999 / 5 to x, x has the value - you guessed it - 2.36.
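Here is a minimal Python sketch of what the machine actually stores (assuming IEEE 754 binary64 doubles: 1 sign bit, 11 exponent bits, 52 fraction bits; any interpreter built on them, Netscape's included, should behave the same way):

    from decimal import Decimal

    # Exact decimal values of the stored doubles:
    print(Decimal(11.8))
    # 11.800000000000000710542735760100185871124267578125   (slightly LARGER than 11.8)
    print(Decimal(11.799999999999999))
    # 11.7999999999999989341858963598497211933135986328125  (slightly smaller)

    # The two divisions from above, printed the way the interpreter prints them:
    print(11.8 / 5)                # 2.3600000000000003
    print(11.799999999999999 / 5)  # 2.36

Under that assumption the stored 11.8 is a little too large, so a quotient slightly above 2.36 is just what rounding to the nearest double produces.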

> Everything fits snugly.

That's too much for me. I'm not a professional mathematician (rather, a student of psychology), and I have great difficulty understanding what's going on here.

I had a program compiled with Turbo Pascal do this division with all types of floating-point variables: single (4 bytes), double (8), extended (10). What I found is that the program always returns the correct result in scientific notation (15 digits fixed for double), but in ordinary notation, when I control how many digits are displayed, 18.2 is represented as 18.200000000000000700, and the result is the same as the one from Netscape's JavaScript interpreter.

What I came to understand from that experiment is that every program that deals with arithmetic in any way seems to have a built-in routine that chops all trailing zeros if there is no non-zero digit up to the last significant one. For the double representation, the last significant digit is obviously the 15th. And this seems to be exactly the point where Netscape's JS interpreter goes wrong: it considers the 16th digit to be the last significant one, but the information this digit contains is meaningless in most cases.
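That is easy to probe with a minimal Python sketch (assuming the formatting routine simply rounds to a requested number of significant digits and drops trailing zeros): 15 significant digits hide the extra digit, while 17, the number needed to tell every 64-bit double apart, bring it back.

    q = 11.8 / 5

    print("%.15g" % q)  # 2.36
    print("%.17g" % q)  # 2.3600000000000003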

Okay. What remains to be explained is where the 16th digit in the double representation comes from. If I were a computer and had to convert the result of 11.8(10) / 5(10) from binary notation to decimal, I would compute 1*2^1 + 0*2^0 + 0*2^-1 + 1*2^-2 + ... + 0*2^-48; that's 2.35999999999999. It seems to be periodic, so let's say 2.36000000000000. Now take some waste from the stack or anywhere else and use it to fill the remaining positions up to 20 digits total: 2.3600000000000003000. Check whether there is any non-zero digit after 2.36 in a significant position; no, there isn't, so chop the rest and return 2.36. Done.
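For comparison, a minimal Python sketch of that conversion as an IEEE 754 machine might actually perform it (assuming round-to-nearest division and shortest round-trip printing): the stored quotient really is slightly above 2.36, and 2.3600000000000003 is simply the shortest decimal string that identifies that particular double.

    from decimal import Decimal

    q = 11.8 / 5

    # Exact value of the quotient double -- the non-zero digit in the 16th
    # decimal place is part of the stored number, not waste from the stack:
    print(Decimal(q))  # 2.36000000000000031974423109204508364200592041015625

    # Shortest decimal string that rounds back to exactly this double:
    print(repr(q))     # 2.3600000000000003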

I feel my empathy for computers might not be sufficient. Do you have a clue how it works exactly? Am I fundamentally wrong anywhere?

- I continued exploring your site, and I must say it's absolutely great. You're doing really good educational work.

Regards,

Bernd Liebermann


