Thanks to visit codestin.com
Credit goes to github.com

Skip to content

py/parsenum: Implement exact float parsing using integer mpz (WIP) #6024

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

dpgeorge
Copy link
Member

This PR attempts to implement exact parsing of floats in mp_parse_num_decimal(). There have been quite a few issues related to this, and parsing floats correctly is important, so I thought it'd be worth trying to fix this properly.

Other implementations I could find (eg musl's) are pretty complex and use a lot of stack. The implementation here is deigned and written from scratch (but it's probably not original) and reuses the existing MicroPython big-int functions to do the bulk of the conversion.

The point is that all operations during the conversion must be exact (ie not do any rounding or else the rounding will compound), except as an optimisation the very last bit can use the FP hardware because it'll round in the correct way.

The idea is to parse the input number as a big-int and then apply the exponent also using big-int arithmetic. The trick is that the exponent of 10**e can be factored into (2*5)**e of which the 2**e part can be done exactly in hardware because that's the natural base of the FP representation. So only the 5**e calculation needs to be done using big-ints. If e is negative then the big-int is pre-multiplied by a large enough number (namely 8**e) so it can then be divided by 5**e without loss.

The entire parsing could be done without any FP hardware, ie just using big-ints to construct the bits in the floating point representation (mantissa, exponent, sign bit). But it seems OK to do the last bit using standard float operations (converting the big-int to a float then adjusting by the power-of-2 exponent) and this makes the code a bit simpler (reuses existing functions, like ldexp).

Tested against a set of known difficult numbers (from CPython test suite) and it parses all of them correctly (in double precision mode).

TODO/outstanding:

  • do more tests
  • test with 32-bit floats
  • don't allocate heap memory for temporary mpz's, use the stack instead
  • benchmark to see if it has acceptable performance

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label May 10, 2020
@dpgeorge
Copy link
Member Author

For an example of a number that did not parse correctly before this PR, but parses correctly now, consider 74e46. This used to parse to b'\xee\xad\xa0\xec\xd73\xe0I' but now parses correctly to b'\xef\xad\xa0\xec\xd73\xe0I' (note LSB). And getting this correct is hard because 74 can be represented exactly in a double, so the error is introduced by applying the exponent, ie the multiplication by 10**46. This exponent cannot be represented exactly on its own so pow(10, 46) is inexact, and then doing the multiplication 74 * pow(10, 46) propagates the error.

The way this number is parsed in this PR is:

  • 74 parsed as a big-int, with exponent 46 parsed as a machine int
  • 5**exp computed as a big-int
  • 74 * 5**exp computed as a big-int (still exact)
  • normalisation of 74 * 5**exp which does correct rounding (this step is not strictly needed though)
  • convert 74 * 5**exp to a float, no longer exact but as close as possible to the true value (due to IEEE properties of FP)
  • multiply byt 2**exp by: ldexp(float(74 * 5**exp), exp) which is an exact FP operation (basically just adjusts the FP exponent of 74 * 5**exp)

@stinos
Copy link
Contributor

stinos commented May 13, 2020

This looks really good. Didn't check the code in detail but the logic is sound and if tests pass the code should be correct :) With that in mind: it's probably worth adding as much 'speical' cases as needed to cover a good range. Then if the performance isn't abysmal this is the way forward I think.

@dpgeorge
Copy link
Member Author

With that in mind: it's probably worth adding as much 'speical' cases as needed to cover a good range.

Not sure what you mean by this, which "special cases"?

@stinos
Copy link
Contributor

stinos commented May 13, 2020

The ones which don't parse correctly now like in your previous comment. And picked across the whole range of doubles, i.e. also very large and very small ones.

@dpgeorge
Copy link
Member Author

Aah, I see, you mean add specific tests. Yes indeed.

For 32-bit parsing it should be possible to test every value (although not as part of the test suite, it'd take too long).

@dpgeorge dpgeorge force-pushed the py-parsenum-exact-float branch from 489efde to 1c5238c Compare May 16, 2020 07:38
@ddiminnie
Copy link

ddiminnie commented Jun 5, 2020

If you would like some additional 'special case' double-precision tests, feel free to use these (or feel free to bin them - won't hurt my feelings :-) )

(All cases refer to 17-digits-of-precision decimal inputs (sans trailing zeros), with expected results expressed as (little-endian) IEEE-754 binary64 values.)

Overflow/largest double boundary:
1.7976931348623159e308 (should overflow, 0x000000000000f07f )
1.7976931348623158e308 (should yield the max double, 0xffffffffffffef7f )

Normalized/denormalized double boundary
2.2250738585072012e-308 (should yield smallest normalized double, 0x0000000000001000 )
2.2250738585072011e-308 (should yield largest denormalized double, 0xffffffffffff0f00 )

Shortest (up to) 17-digit input that converts to smallest denormalized double:
5e-324 (should yield smallest denormalized double, 0x0100000000000000 )

Closest 17-digit input to the smallest denormalized double:
4.9406564584124654e-324 (should yield smallest denormalized double, 0x0100000000000000 )

The next boundary will depend on how good the ldexp implementation is on the target platform:
Smallest denormalized double/underflow boundary:

2.4703282292062328e-324 (should yield smallest denormalized double, 0x0100000000000000 )
(Note that this value is greater than 2**-1075 and therefore should round up. 64-bit CPython 3.7.5 on win32 gets this right. Your mileage may vary, since the 54 most significant bits of the result are 0b1.00000000000000000000000000000000000000000000000000000 x 2**-1075.)

2.4703282292062327e-324 (should underflow to zero: 0x0000000000000000 )
(Note that this value is less than 2**-1075 and therefore should round down to zero.)

@projectgus
Copy link
Contributor

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it
does, then next time you rebase the PR (or merge from master) then you should
please replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.

Copy link

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +448 +0.052% standard
      stm32:  +140 +0.036% PYBV10
     mimxrt:  -728 -0.195% TEENSY40
        rp2:  +288 +0.031% RPI_PICO_W
       samd:  +128 +0.048% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:  +182 +0.040% VIRT_RV32

@dpgeorge dpgeorge force-pushed the py-parsenum-exact-float branch from b34b502 to 39a6e9e Compare June 12, 2025 00:32
Copy link

codecov bot commented Jun 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base (9bde125) to head (e4af676).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6024   +/-   ##
=======================================
  Coverage   98.54%   98.54%           
=======================================
  Files         169      169           
  Lines       21941    21968   +27     
=======================================
+ Hits        21621    21648   +27     
  Misses        320      320           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

dpgeorge and others added 3 commits June 12, 2025 11:07
Float parsing should now be exact.  Costs about 200 bytes on stm32.

TODO:
- test more
- don't allocate heap memory, use the stack instead
- benchmark to see if it's acceptable

Signed-off-by: Damien George <[email protected]>
@dpgeorge dpgeorge force-pushed the py-parsenum-exact-float branch from 39a6e9e to 371c84c Compare June 12, 2025 01:08
Thanks to @ddiminnie for the extra test cases.

Signed-off-by: Damien George <[email protected]>
@dpgeorge dpgeorge force-pushed the py-parsenum-exact-float branch from 371c84c to e4af676 Compare June 12, 2025 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants