py/parsenum: Implement exact float parsing using integer mpz (WIP) #6024

dpgeorge · 2020-05-10T15:23:04Z

This PR attempts to implement exact parsing of floats in mp_parse_num_decimal(). There have been quite a few issues related to this, and parsing floats correctly is important, so I thought it'd be worth trying to fix this properly.

Other implementations I could find (eg musl's) are pretty complex and use a lot of stack. The implementation here is designed and written from scratch (but it's probably not original) and reuses the existing MicroPython big-int functions to do the bulk of the conversion.

The point is that all operations during the conversion must be exact (ie not do any rounding or else the rounding will compound), except as an optimisation the very last bit can use the FP hardware because it'll round in the correct way.

The idea is to parse the input number as a big-int and then apply the exponent also using big-int arithmetic. The trick is that 10**e can be factored into (2*5)**e = 2**e * 5**e, of which the 2**e part can be done exactly in hardware because 2 is the natural base of the FP representation. So only the 5**e calculation needs to be done using big-ints. If e is negative then the big-int is pre-multiplied by a large enough power of 2 (namely 8**|e|) so it can then be divided by 5**|e| without loss.
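As a rough illustration of the negative-exponent case, here is a minimal Python sketch (not the C code in this PR). The function name, the 64 extra guard bits and the sticky bit that records a non-zero remainder are all assumptions of the sketch. It also ignores results in the subnormal range, where the final ldexp would round a second time.

```python
import math

def parse_negative_exp(mantissa: int, exp10: int) -> float:
    """Sketch: nearest double to mantissa * 10**exp10 for exp10 < 0 (normal range only)."""
    assert exp10 < 0
    n = -exp10
    # Pre-multiply by 8**n (a left shift by 3*n bits), plus 64 guard bits.
    # The guard bits are an assumption of this sketch, not taken from the PR.
    shift = 3 * n + 64
    quotient, remainder = divmod(mantissa << shift, 5 ** n)
    if remainder:
        # Sticky bit: remember that the true quotient is not an integer, so
        # the single rounding in float() below cannot land on a false tie.
        quotient |= 1
    # float() rounds the big-int once; ldexp then only adjusts the exponent.
    return math.ldexp(float(quotient), exp10 - shift)

print(parse_negative_exp(1, -1) == 0.1)
print(parse_negative_exp(314159, -5) == 3.14159)
```

Because 8**n is larger than 5**n, the quotient always keeps at least 64 significant bits here, which leaves the sticky bit well below the 53 bits that survive the conversion to a double.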

The entire parsing could be done without any FP hardware, ie just using big-ints to construct the bits in the floating point representation (mantissa, exponent, sign bit). But it seems OK to do the last bit using standard float operations (converting the big-int to a float then adjusting by the power-of-2 exponent) and this makes the code a bit simpler (reuses existing functions, like ldexp).

Tested against a set of known difficult numbers (from CPython test suite) and it parses all of them correctly (in double precision mode).

TODO/outstanding:

do more tests
test with 32-bit floats
don't allocate heap memory for temporary mpz's, use the stack instead
benchmark to see if it has acceptable performance

dpgeorge · 2020-05-11T11:51:07Z

For an example of a number that did not parse correctly before this PR, but parses correctly now, consider 74e46. This used to parse to b'\xee\xad\xa0\xec\xd73\xe0I' but now parses correctly to b'\xef\xad\xa0\xec\xd73\xe0I' (note LSB). And getting this correct is hard because 74 can be represented exactly in a double, so the error is introduced by applying the exponent, ie the multiplication by 10**46. This exponent cannot be represented exactly on its own so pow(10, 46) is inexact, and then doing the multiplication 74 * pow(10, 46) propagates the error.
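To make the "note LSB" remark concrete, the two byte strings can be compared as 64-bit integers. This is a small sketch for CPython using only the standard struct module; the variable names are made up for illustration.

```python
import struct

before = b"\xee\xad\xa0\xec\xd73\xe0I"  # old result quoted above
after = b"\xef\xad\xa0\xec\xd73\xe0I"   # new result quoted above

# Read the 8 little-endian bytes as an integer. Neighbouring doubles of the
# same sign have neighbouring integer encodings, so the difference is the
# distance in units in the last place.
ulp_distance = int.from_bytes(after, "little") - int.from_bytes(before, "little")
print(ulp_distance)

# Compare against CPython's own float parser.
print(struct.pack("<d", float("74e46")) == after)
```

The old result is off by exactly one unit in the last place, which is the kind of error that is easy to miss when only the printed decimal value is inspected.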

The way this number is parsed in this PR is:

74 parsed as a big-int, with exponent 46 parsed as a machine int
5**exp computed as a big-int
74 * 5**exp computed as a big-int (still exact)
normalisation of 74 * 5**exp which does correct rounding (this step is not strictly needed though)
convert 74 * 5**exp to a float, no longer exact but as close as possible to the true value (due to IEEE properties of FP)
multiply by 2**exp using ldexp(float(74 * 5**exp), exp), which is an exact FP operation (basically just adjusts the FP exponent of 74 * 5**exp)
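The steps above can be followed in plain Python, whose built-in ints play the role of the mpz big-ints. This is only a sketch of the idea, not the C implementation. The normalisation step is skipped since it is described as not strictly needed, and CPython's float(int) is relied on to round to nearest.

```python
import math
import struct

mantissa, exp = 74, 46             # digits as a big-int, exponent as a machine int
pow5 = 5 ** exp                    # 5**exp as a big-int
product = mantissa * pow5          # 74 * 5**exp, still exact
as_float = float(product)          # the only rounding step in the whole conversion
result = math.ldexp(as_float, exp) # multiply by 2**exp, exact

print(struct.pack("<d", result))
print(result == float("74e46"))
```

Only one operation in this chain can round, so there is no earlier error for the final result to inherit.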

stinos · 2020-05-13T07:13:17Z

This looks really good. Didn't check the code in detail but the logic is sound and if tests pass the code should be correct :) With that in mind: it's probably worth adding as many 'special' cases as needed to cover a good range. Then if the performance isn't abysmal this is the way forward I think.

dpgeorge · 2020-05-13T07:22:08Z

With that in mind: it's probably worth adding as many 'special' cases as needed to cover a good range.

Not sure what you mean by this, which "special cases"?

stinos · 2020-05-13T07:29:00Z

The ones which don't parse correctly now like in your previous comment. And picked across the whole range of doubles, i.e. also very large and very small ones.

dpgeorge · 2020-05-13T07:48:48Z

Aah, I see, you mean add specific tests. Yes indeed.

For 32-bit parsing it should be possible to test every value (although not as part of the test suite, it'd take too long).
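A possible shape for such an exhaustive check is sketched below. The function name and the step parameter are hypothetical, and using repr() to produce the decimal strings is an assumption that makes the loop depend on the formatter as well as the parser. A real test would want reference strings from an independent source. On CPython floats are doubles, so this only shows the structure of the loop.

```python
import struct

def float32_roundtrip_failures(start=0, stop=0x7F800000, step=1):
    """Return float32 bit patterns whose decimal text does not parse back to the same bits.

    The default range covers every non-negative finite float32 (inf and nan
    start at 0x7F800000 and are excluded).
    """
    failures = []
    for bits in range(start, stop, step):
        raw = struct.pack("<I", bits)
        value = struct.unpack("<f", raw)[0]
        parsed = float(repr(value))          # format to text, then parse again
        if struct.pack("<f", parsed) != raw:
            failures.append(bits)
    return failures

# A coarse stride keeps this quick; step=1 walks all ~2 billion patterns.
print(len(float32_roundtrip_failures(step=0x10001)))
```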

ddiminnie · 2020-06-05T23:13:06Z

If you would like some additional 'special case' double-precision tests, feel free to use these (or feel free to bin them - won't hurt my feelings :-) )

(All cases refer to 17-digits-of-precision decimal inputs (sans trailing zeros), with expected results expressed as (little-endian) IEEE-754 binary64 values.)

Overflow/largest double boundary:
1.7976931348623159e308 (should overflow, 0x000000000000f07f )
1.7976931348623158e308 (should yield the max double, 0xffffffffffffef7f )

Normalized/denormalized double boundary
2.2250738585072012e-308 (should yield smallest normalized double, 0x0000000000001000 )
2.2250738585072011e-308 (should yield largest denormalized double, 0xffffffffffff0f00 )

Shortest (up to) 17-digit input that converts to smallest denormalized double:
5e-324 (should yield smallest denormalized double, 0x0100000000000000 )

Closest 17-digit input to the smallest denormalized double:
4.9406564584124654e-324 (should yield smallest denormalized double, 0x0100000000000000 )

The next boundary will depend on how good the ldexp implementation is on the target platform:
Smallest denormalized double/underflow boundary:

2.4703282292062328e-324 (should yield smallest denormalized double, 0x0100000000000000 )
(Note that this value is greater than 2**-1075 and therefore should round up. 64-bit CPython 3.7.5 on win32 gets this right. Your mileage may vary, since the 54 most significant bits of the result are 0b1.00000000000000000000000000000000000000000000000000000 x 2**-1075.)

2.4703282292062327e-324 (should underflow to zero: 0x0000000000000000 )
(Note that this value is less than 2**-1075 and therefore should round down to zero.)
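The cases above can be collected into a small table-driven check. The sketch below assumes CPython with the standard struct module; the names CASES and binary64_hex are made up for illustration, and the expected strings are the little-endian hex values listed above.

```python
import struct

# Decimal input -> expected little-endian IEEE-754 binary64 bytes, as hex.
CASES = {
    "1.7976931348623159e308": "000000000000f07f",   # overflows to +inf
    "1.7976931348623158e308": "ffffffffffffef7f",   # max double
    "2.2250738585072012e-308": "0000000000001000",  # smallest normalized
    "2.2250738585072011e-308": "ffffffffffff0f00",  # largest denormalized
    "5e-324": "0100000000000000",                   # smallest denormalized
    "4.9406564584124654e-324": "0100000000000000",  # smallest denormalized
    "2.4703282292062328e-324": "0100000000000000",  # just above 2**-1075
    "2.4703282292062327e-324": "0000000000000000",  # just below 2**-1075
}

def binary64_hex(text: str) -> str:
    """Parse text as a float and return its little-endian byte encoding as hex."""
    return struct.pack("<d", float(text)).hex()

mismatches = {text: binary64_hex(text)
              for text, expected in CASES.items()
              if binary64_hex(text) != expected}
print(mismatches)
```

An empty dict means every input produced the expected bytes. On a target where ldexp handles denormalized results poorly, the last two entries are the ones most likely to show up.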

projectgus · 2024-03-07T23:53:26Z

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it
does, then next time you rebase the PR (or merge from master) then you should
please replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.

github-actions · 2025-06-12T00:28:10Z

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +448 +0.052% standard
      stm32:  +140 +0.036% PYBV10
     mimxrt:  -728 -0.195% TEENSY40
        rp2:  +288 +0.031% RPI_PICO_W
       samd:  +128 +0.048% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:  +182 +0.040% VIRT_RV32

codecov · 2025-06-12T00:38:23Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base (9bde125) to head (e4af676).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #6024   +/-   ##
=======================================
  Coverage   98.54%   98.54%           
=======================================
  Files         169      169           
  Lines       21941    21968   +27     
=======================================
+ Hits        21621    21648   +27     
  Misses        320      320



dpgeorge added the py-core Relates to py/ directory in source label May 10, 2020

dpgeorge mentioned this pull request May 11, 2020

py/mkrules.mk: workaround fused multiply-add inaccuracy #5995

Closed

dpgeorge force-pushed the py-parsenum-exact-float branch from 489efde to 1c5238c Compare May 16, 2020 07:38

ddiminnie mentioned this pull request Mar 23, 2021

Slight inaccuracy in decimal formatting of floating-point values #7066

Closed

This was referenced May 18, 2021

ports/unix/Makefile: make float() constistent across cpu architectures #4246

Closed

Trailing .0 in float literal can cause rounding error #5831

Closed

dlech mentioned this pull request Jun 4, 2021

py/parsenum: fix rounding error when float ends in .0 #5832

Closed

Gadgetoid mentioned this pull request Feb 29, 2024

global: Remove the STATIC macro. #13763

Merged

dpgeorge force-pushed the py-parsenum-exact-float branch from 1c5238c to b34b502 Compare June 12, 2025 00:15

dpgeorge mentioned this pull request Jun 12, 2025

py/formatfloat: Improve accuracy of float formatting code. #17444

Merged

dpgeorge force-pushed the py-parsenum-exact-float branch from b34b502 to 39a6e9e Compare June 12, 2025 00:32

dpgeorge and others added 3 commits June 12, 2025 11:07

py/parsenum: Implement exact float parsing using integer mpz.

6b1a98c

Float parsing should now be exact. Costs about 200 bytes on stm32.

TODO:
- test more
- don't allocate heap memory, use the stack instead
- benchmark to see if it's acceptable

Signed-off-by: Damien George <[email protected]>

py/parsenum: Improve parsing of floats with large number of digits.

dc33f73

Signed-off-by: Damien George <[email protected]>

tests/float: Workaround float formatting error.

92d4fa8

Signed-off-by: Damien George <[email protected]>

dpgeorge force-pushed the py-parsenum-exact-float branch from 39a6e9e to 371c84c Compare June 12, 2025 01:08

tests/float: Add test for exact float parsing.

e4af676

Thanks to @ddiminnie for the extra test cases.

Signed-off-by: Damien George <[email protected]>

dpgeorge force-pushed the py-parsenum-exact-float branch from 371c84c to e4af676 Compare June 12, 2025 01:16
