py/parse: Add support for math module constants and float folding #16666

Open · wants to merge 1 commit into master from MICROPY_COMP_FLOAT_CONST
Conversation

yoctopuce
Contributor

@yoctopuce yoctopuce commented Jan 28, 2025

Summary

This is the first of four pull requests providing enhancements to the MicroPython parser, mainly targeting mpy-cross, with the aim of reducing the footprint of compiled mpy files to save flash and RAM. I have previously opened a discussion in the MicroPython discussion forum and asked for comments.

The first new feature extends the use of compile-time const() expressions to some previously unhandled cases that we found useful in the MicroPython implementation of our programming API. Our API uses named constants to refer to specific cases, such as INVALID_MEASURE. For floating-point methods, this requires a definition such as:

_INVALID_MEASURE = const(math.nan)

or

_INVALID_MEASURE = const(-math.inf)

However, this is not supported by the current implementation, because MICROPY_COMP_MODULE_CONST and MICROPY_COMP_CONST_FOLDING are restricted to integer constants.

So we have introduced a new MICROPY_COMP_FLOAT_CONST feature which reuses the code of MICROPY_COMP_CONST_FOLDING to also support folding of floating point constants, and to include math module constants when MICROPY_COMP_MODULE_CONST is defined. This makes it possible to use compile-time math constants such as:

_DEG_TO_GRADIANT = const(math.pi/180)
_INVALID_VALUE = const(math.nan)

The commit explicitly enables this feature for mpy-cross, as it makes the most sense there; it is otherwise limited to ports using MICROPY_CONFIG_ROM_LEVEL_FULL_FEATURES.

Testing

We have verified that the new code in mpy-cross works properly on both Windows and Linux. As targets for running the mpy code, we have been testing various Windows and Linux versions, as well as our custom board, which uses a Texas Instruments ARM Cortex processor very similar to the cc3200 port.

MicroPython integration testing has uncovered some tricky corner cases, which have been solved:

  • Constants 0.0 and -0.0 must not be merged during code emission, even though they are identical according to ==.
  • The string-based encoding used for floats in the .mpy file must be done carefully, as the mp_parse_num_float() function used when loading .mpy constants has some quirks (due to the use of a float to build the mantissa from the decimal form) which can cause a decrease in precision when more decimals are added. For instance, the number returned by mp_parse_num_float() when parsing 2.7182818284590451 is smaller than the one returned for 2.718281828459045, and therefore a less accurate representation of math.e, although it should actually be closer. But relying on 16 decimal places to represent double precision does not work in all cases either, such as properly encoding/decoding 2.0**100. So we ended up checking at compile time whether mp_parse_num_float() would give back the exact same number from the shortest representation, and adding an extra digit only if this is not the case. This empirical method seems to work for all test cases.
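Both corner cases can be sketched in plain CPython (an illustration of the behaviour described above, not the actual mpy-cross code; the helper name encode_float is invented):

```python
import math

def encode_float(x):
    # Shortest-round-trip check: emit 16 significant digits, and add an
    # extra digit only if parsing the short form does not give back the
    # exact same number.
    s = "{:.16g}".format(x)
    if float(s) != x:
        s = "{:.17g}".format(x)
    return s

# 0.0 and -0.0 compare equal, yet must stay distinct in the constant pool:
assert 0.0 == -0.0
assert math.copysign(1.0, 0.0) != math.copysign(1.0, -0.0)

# 2.0**100 needs a 17th digit to round-trip exactly:
assert float(encode_float(2.0 ** 100)) == 2.0 ** 100
```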

Another way to solve the mp_parse_num_float() problem would have been to avoid the mantissa overflow altogether by using a pair of floats instead of a single float, but this would have required a change to the runtime code that is otherwise not needed by this pull request, and would have caused:

  1. an increase in code size in the runtime code
  2. possible loss of precision when using new mpy-cross binaries with unpatched runtime

There are two qemu ports for which the integration tests show failures that appear to be related to this mp_parse_num_float() problem. We could investigate these further if you can provide information on how to reproduce the test environment.

Trade-offs and Alternatives

This pull request only affects the code size of mpy-cross and of ports using MICROPY_CONFIG_ROM_LEVEL_FULL_FEATURES, for which the negative impact of increased code size is unlikely to be relevant.

Folding floating-point expressions at compile time generally reduces the memory footprint of .mpy files, by saving some opcodes and even some qstrs used to reference math constants. The saving is even greater for use cases like ours, where a global definition such as _INVALID_MEASURE = -math.inf can be replaced by a compile-time const() expression, removing all references to the qstr _INVALID_MEASURE.


codecov bot commented Jan 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.54%. Comparing base (f5d10c3) to head (bf8cda8).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #16666   +/-   ##
=======================================
  Coverage   98.54%   98.54%           
=======================================
  Files         169      169           
  Lines       21898    21910   +12     
=======================================
+ Hits        21579    21591   +12     
  Misses        319      319           



github-actions bot commented Jan 28, 2025

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +384 +0.045% standard
      stm32:  +224 +0.057% PYBV10
     mimxrt:  +232 +0.062% TEENSY40
        rp2:  +208 +0.023% RPI_PICO_W
       samd:  +216 +0.080% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:  +266 +0.059% VIRT_RV32

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Jan 28, 2025
@dpgeorge
Member

Thanks for the contribution! At first glance this looks like a good enhancement.

Please can you add tests to get 100% coverage of the new code.

@yoctopuce
Contributor Author

Sure, I will update to get full coverage.

I have also identified the issue with qemu arm integration test failing and will fix it.

Should I convert to 'draft' or is it fine to leave the pull request as-is in the meantime?

@yoctopuce
Contributor Author

Follow-up on the qemu arm integration failure: the problem is indeed linked to the suboptimal result provided by mp_parse_num_float(). Contrary to what I initially thought, the problem is not linked to a mantissa overflow but to a round-up correction after multiplying by the exponent.

I wrote a small piece of test code that compares the value provided by mp_parse_num_float() to the value computed by casting the result of bigint arithmetic to a float, which is expected to provide the closest value. The two resulting values are displayed using a format string showing extra digits, to make the difference evident. As illustrated by the run below with MICROPY_FLOAT_IMPL_FLOAT, mp_parse_num_float() currently fails to provide the best floating-point number even in some very simple cases:

float('1.2e30'):
    => 1.2e+30
    vs 1.2e+30
float('1.26e30'):
    => 1.2599999020249624675e+30  FAIL
    vs 1.2599999775828091928e+30
float('1.267e30'):
    => 1.2669999610545268366e+30
    vs 1.2669999610545268366e+30
float('1.2676e30'):
    => 1.2676000416158554855e+30  FAIL
    vs 1.2675999660720408409e+30
float('1.26765e30'):
    => 1.2676499097910229878e+30  FAIL
    vs 1.2676499853488888585e+30
float('1.267650e30'):
    => 1.2676499097910229878e+30  FAIL
    vs 1.2676499853488888585e+30
float('1.2676506e30'):
    => 1.2676505898138652462e+30
    vs 1.2676505898138652462e+30
float('1.26765060e30'):
    => 1.2676505898138652462e+30
    vs 1.2676505898138652462e+30
float('1.267650600e30'):
    => 1.2676505898138652462e+30
    vs 1.2676505898138652462e+30
float('1.2676506002e30'):
    => 1.2676505898138652462e+30
    vs 1.2676505898138652462e+30
float('1.26765060022e30'):
    => 1.2676505142560004287e+30  FAIL
    vs 1.2676505898138652462e+30
float('1.267650600228e30'):
    => 1.2676505142560004287e+30  FAIL
    vs 1.2676505898138652462e+30
float('1.2676506002282e30'):
    => 1.2676505142560004287e+30  FAIL
    vs 1.2676505898138652462e+30
float('1.26765060022822e30'):
    => 1.2676505898138652462e+30
    vs 1.2676505898138652462e+30
float('1.267650600228229e30'):
    => 1.2676504386981526132e+30  FAIL
    vs 1.2676505898138652462e+30

I will see if I can fix that rounding issue properly.

@yoctopuce yoctopuce force-pushed the MICROPY_COMP_FLOAT_CONST branch from 6c82468 to cde2ff7 Compare January 30, 2025 18:49
@yoctopuce
Contributor Author

The code has been improved since the previous review, and coverage has been fixed.

The two outstanding failed checks are due to inaccurate float parsing when compiling with MICROPY_FLOAT_IMPL_FLOAT, but they should be fixed once my other pull request, py/parsenum.c: reduce code footprint of mp_parse_num_float #16672, is integrated.

@yoctopuce yoctopuce force-pushed the MICROPY_COMP_FLOAT_CONST branch from 67497a0 to 902db0a Compare February 18, 2025 10:06
@yoctopuce
Contributor Author

As mentioned in my previous comment, this pull request depends on the parsenum.c improvement in another pull request (#16672), so I have rebased it accordingly so that the unit-test checks show the actual status once it is pulled in.

@yoctopuce yoctopuce force-pushed the MICROPY_COMP_FLOAT_CONST branch from 902db0a to 9b59aa4 Compare March 3, 2025 13:57
@yoctopuce
Contributor Author

Rebased to master now that #16672 has been merged in.

@yoctopuce yoctopuce force-pushed the MICROPY_COMP_FLOAT_CONST branch 2 times, most recently from cfab33f to fdecfc5 Compare March 5, 2025 13:04
@yoctopuce
Contributor Author

Rebased to head revision.

@yoctopuce yoctopuce force-pushed the MICROPY_COMP_FLOAT_CONST branch from fdecfc5 to 9c601a2 Compare May 16, 2025 14:07
@yoctopuce
Contributor Author

Rebased to head revision.

@dpgeorge dpgeorge added this to the release-1.26.0 milestone May 20, 2025
// Extra constants as defined by a port
MICROPY_PORT_CONSTANTS
};
static MP_DEFINE_CONST_MAP(mp_constants_map, mp_constants_table);
#endif

static bool binary_op_maybe(mp_binary_op_t op, mp_obj_t lhs, mp_obj_t rhs, mp_obj_t *res) {
    nlr_buf_t nlr;
    if (nlr_push(&nlr) == 0) {
Member

This nlr_push is relatively expensive.

What are the cases in which it's needed? From what I can gather it's:

  • divide by zero (both int and float)
  • when complex numbers are produced without complex enabled (float only)
  • negative shift (int only)
  • negative power without float enabled (int only)
  • unsupported operation (eg shifting a float)

The int cases are already handled explicitly. Maybe it's not much extra code to explicitly guard against the invalid float operations, so that this nlr_push protection is not needed?

It's a bit of a trade-off here, whether to use nlr_push and let mp_binary_op dictate what's valid and what isn't, or to make the code in the parser here know what's valid.
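In Python terms, the guarded fold behaves like a try/except around the operation; this sketch mirrors the nlr_push approach rather than the explicit-guard alternative (fold_maybe is an invented name, not MicroPython API):

```python
import operator

def fold_maybe(op, lhs, rhs):
    # Attempt the constant operation; decline to fold if it raises,
    # covering divide-by-zero, negative shifts, unsupported operand
    # types, and similar corner cases in one place.
    try:
        return True, op(lhs, rhs)
    except (ZeroDivisionError, TypeError, OverflowError, ValueError):
        return False, None

assert fold_maybe(operator.truediv, 1.0, 2.0) == (True, 0.5)
assert fold_maybe(operator.truediv, 1, 0) == (False, None)   # divide by zero
assert fold_maybe(operator.lshift, 1, -1) == (False, None)   # negative shift
assert fold_maybe(operator.lshift, 1.5, 2) == (False, None)  # shifting a float
```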

Contributor Author

I thought about that initially, but I was afraid of missing a corner case which would then break compilation. For instance, on builds with limited integers (longlong), an exception might be raised if the result of an operation is beyond the supported long long range. I was afraid that going into such detailed tests would somehow break the modularity of MicroPython, hence my preference for catching the exception.

Are you mostly concerned by the impact on stack space, or more by compilation time?

# They should no longer be expected to always fail:
#
# test_syntax("A = const(1 / 2)")
# test_syntax("A = const(1 ** -2)")
Member

I think it's worth putting these in a new test file tests/micropython/const_float.py, along with some other basic tests of float constant folding.

Contributor Author

I have added two test files, one for simple const floats and one for const expressions referencing math constants (if math is available).
I have also added a note to point out that the const folding code is actually mostly tested by running the full coverage tests via mpy, which involve lots of float constant folding.

Member

I have also added a note to point out that the const folding code is actually mostly tested by running the full coverage tests via mpy, which involve lots of float constant folding.

Does it really? Going via mpy doesn't change the parsing or compilation process. Can you point to a specific existing test that tests float folding?

Contributor Author

@yoctopuce yoctopuce May 30, 2025


diff.txt
I was originally enabling float folding only in mpy-cross, hence my comment. Now that float folding is enabled in CORE, it is actually exercised by the float test cases regardless of whether mpy is used or not.

Looking at the difference in generated bytecode for floating-point test cases demonstrates the effect of float folding. Here is an extract (the complete diff file is attached to this message if you want to take a deeper look):

--- no-const-float.dump	2025-05-30 09:09:53.530062733 +0200
+++ const-float.dump	2025-05-30 09:09:06.069969310 +0200
@@ -1,6 +1,6 @@
 mpy_source_file: float1.mpy
 source_file: float1.py
-obj_table: [0.12, 1.0, 1.2, 0.0, b'1.2', b'3.4', 2.0, 3.4, 1.847286994360591]
+obj_table: [0.12, 1.0, 1.2, 0.0, b'1.2', b'3.4', -1.2, 0.5, 3.4, -3.4, 1.8472869943605905]
 simple_name: <module>
   11:16       LOAD_NAME print
   23:00       LOAD_CONST_OBJ 0.12
@@ -189,17 +189,13 @@
   59          POP_TOP 
   11:16       LOAD_NAME print
   23:02       LOAD_CONST_OBJ 1.2
-  d0          UNARY_OP 0 __pos__ 
   34:01       CALL_FUNCTION 1
   59          POP_TOP 
   11:16       LOAD_NAME print
-  23:02       LOAD_CONST_OBJ 1.2
-  d1          UNARY_OP 1 __neg__ 
+  23:06       LOAD_CONST_OBJ -1.2
   34:01       CALL_FUNCTION 1
   59          POP_TOP 
-  81          LOAD_CONST_SMALL_INT 1 
-  82          LOAD_CONST_SMALL_INT 2 
-  f7          BINARY_OP 32 __truediv__ 
+  23:07       LOAD_CONST_OBJ 0.5
   16:1a       STORE_NAME x
   11:16       LOAD_NAME print
   11:1a       LOAD_NAME x
...

Another extract from float2int_doubleprec_intbig.py

   11:19       LOAD_NAME is_64bit
-  44:66       POP_JUMP_IF_FALSE 38
-  23:0a       LOAD_CONST_OBJ 1.00000005
-  d1          UNARY_OP 1 __neg__ 
-  23:01       LOAD_CONST_OBJ 2.0
-  23:0b       LOAD_CONST_OBJ 62.0
-  f9          BINARY_OP 34 __pow__ 
-  f4          BINARY_OP 29 __mul__ 
+  44:52       POP_JUMP_IF_FALSE 18
+  23:0b       LOAD_CONST_OBJ -4.611686249011688e+18
   16:27       STORE_NAME neg_bad_fp
-  23:01       LOAD_CONST_OBJ 2.0
-  23:0b       LOAD_CONST_OBJ 62.0
-  f9          BINARY_OP 34 __pow__ 
+  23:0c       LOAD_CONST_OBJ 4.611686018427388e+18
   16:28       STORE_NAME pos_bad_fp
-  23:01       LOAD_CONST_OBJ 2.0
-  23:0b       LOAD_CONST_OBJ 62.0
-  f9          BINARY_OP 34 __pow__ 
-  d1          UNARY_OP 1 __neg__ 
+  23:0d       LOAD_CONST_OBJ -4.611686018427388e+18
   16:29       STORE_NAME neg_good_fp
-  23:0c       LOAD_CONST_OBJ 0.9999999299999999
-  23:01       LOAD_CONST_OBJ 2.0
-  23:0b       LOAD_CONST_OBJ 62.0
-  f9          BINARY_OP 34 __pow__ 
-  f4          BINARY_OP 29 __mul__ 
+  23:0e       LOAD_CONST_OBJ 4.6116856956093665e+18
   16:2a       STORE_NAME pos_good_fp

Another nice one from math_fun.py

   10:19       LOAD_CONST_STRING pow
   11:19       LOAD_NAME pow
-  23:0d       LOAD_CONST_OBJ (1.0, 0.0)
-  23:0e       LOAD_CONST_OBJ (0.0, 1.0)
-  23:0f       LOAD_CONST_OBJ (2.0, 0.5)
-  23:10       LOAD_CONST_OBJ 3.0
-  d1          UNARY_OP 1 __neg__ 
-  23:11       LOAD_CONST_OBJ 5.0
-  2a:02       BUILD_TUPLE 2
-  23:10       LOAD_CONST_OBJ 3.0
-  d1          UNARY_OP 1 __neg__ 
-  23:12       LOAD_CONST_OBJ 4.0
-  d1          UNARY_OP 1 __neg__ 
-  2a:02       BUILD_TUPLE 2
-  2a:05       BUILD_TUPLE 5
+  23:17       LOAD_CONST_OBJ ((1.0, 0.0), (0.0, 1.0), (2.0, 0.5), (-3.0, 5.0), (-3.0, -4.0))
   2a:03       BUILD_TUPLE 3

By the way, this dump shows that float folding shortcuts the true intent of some of these tests, as some float operations that were supposed to execute at runtime are now executed at compile time. Should we rewrite them using a temporary variable to prevent float folding?

@yoctopuce yoctopuce force-pushed the MICROPY_COMP_FLOAT_CONST branch 3 times, most recently from c2352bb to f40be29 Compare May 26, 2025 21:56
// If the length increases by more than one digit, it means we are
// digging too far (eg. 1.234499999999), so we keep the short form
char ebuf[32];
mp_format_float(o_val, ebuf, sizeof(ebuf), 'g', precision + 1, '\0');
Member

I'm still trying to understand why this code is necessary.

Is it only necessary when floats are double precision? Or also single precision?

Can you give an example of a number that needs this additional digit of precision?

Contributor Author

@yoctopuce yoctopuce May 30, 2025

The numbers that require this extra digit are not the same for single precision and double precision (as the rounding effects differ), but a typical example for double precision is 2.0 ** 100, which is the test case I have added to float_parse_doubleprec.py. Without the improved repr code, numbers that appear to be the same as per repr() would actually not match with regard to the == operator:

>>> n = float('2.0')**100
>>> n2 = float(repr(n))
>>> n
1.267650600228229e+30
>>> n2
1.267650600228229e+30
>>> n2 == n
False

With the improved repr code, we get the expected behaviour:

>>> n = float('2.0')**100
>>> n2 = float(repr(n))
>>> n
1.2676506002282294e+30
>>> n2
1.2676506002282294e+30
>>> n2 == n
True

With float folding enabled, large expanded decimal numbers become more frequent in mpy files and this could quickly cause functional differences.

Without this improved repr code, the coverage tests fail in float_float2int_intbig due to the missing half-digit.

Member

OK, thanks for the info. I now understand the problem. Let me restate it in two ways:

  1. Printing a double with 16 digits of precision is not enough to represent every number. 17 digits is a little bit more than needed.
  2. There exist multiple different numbers whose repr at 16 digits is the same. At 17 digits they all differ.

MicroPython at the moment uses 16 digits which is not enough to faithfully represent doubles.


So we do need to fix something here, but it's not clear what or how. My testing shows that in CPython doing '{:.17g}'.format(n) is enough digits to properly represent every number. Ie:

n == float('{:.17g}'.format(n))

The issue is that in MicroPython the above is not true. In fact there's no repr precision in MicroPython that allows you to satisfy the above for all n (eg even .20g doesn't work).

For some numbers, eg 39.0**100, the above is true in MicroPython for precision of 17 and 19 but false for 16, 18 and 20!

Numbers like 12.0**100 and 18.0**100 get worse going from 16 to 17 digits (in MicroPython).

(And others like 40.0**100 are accurate at 17, but at 16 are printed only with 15, and so the length increases by 2 digits to get full accuracy.)
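For contrast, the 16-versus-17-digit behaviour is easy to verify in CPython, whose float formatting and parsing are correctly rounded (so these assertions hold there, though not necessarily in MicroPython):

```python
n = 2.0 ** 100
assert float("{:.16g}".format(n)) != n  # 16 significant digits lose the value
assert float("{:.17g}".format(n)) == n  # 17 digits always round-trip a double
assert repr(n) == "1.2676506002282294e+30"  # CPython's shortest repr needs 17
```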

The real issue here is that MicroPython's mp_format_float() code is inaccurate. Eg:

$ python -c "print(('{:.22g}'.format(2.0**100)))" 
1.267650600228229401497e+30
$ micropython -c "print(('{:.22g}'.format(2.0**100)))"
1.267650600228229428959e+30

That's a problem! It means we can't really use mp_format_float() to store floats in .mpy files, because it's inaccurate.


What to do?

There is #12008 which stores floats as binary in .mpy files. That's definitely one way to fix it, but I don't want to change the .mpy version number at this stage.

We could try to improve mp_format_float(), but I think that's very difficult, and probably needs a rewrite to make it more accurate.

Note that currently (without this PR) certain floats are already stored inaccurately in .mpy files, due to using only 16 digits of precision. So does this PR make things worse? I think yes, because now more floats have the chance to be stored, and hence stored inaccurately. Eg those in tests/float/float2int_intbig.py.

That said, I don't think this PR (constant folding) will make things that much worse in .mpy files.

My suggestion to move forward is:

  1. Switch to using 17 digits of precision for all repr printing (and I guess 7 or 8 for single precision). And don't worry about things like 1.2345 rendering as 1.23499999999. I think if you use repr(a_float) then you want accuracy. If you want a set number of digits, you'd be using a precision specifier like '{:.10g}'.format(a_float). (It's pretty expensive to call mp_format_float() so I'd rather not do it twice.)
  2. Accept that floats in .mpy files are possibly inaccurate. Users can use array.array with binary representation, or escape the const folding with eg float("2.0") + 3.4 if they need 100% accuracy.
  3. In the future, try to improve things.

If you don't like (1) because it makes 1.2345 print out as 1.234999999 at the REPL, then maybe we can adjust the code in py/persistentcode.c:save_obj to save floats explicitly with a higher precision (17 digits for double). (From my testing, always using 17 digits is slightly better than using 17 only if it grows 16 by one digit.)

(I didn't investigate single precision floats, but I assume all the above logic/reasoning applies there as well.)

Contributor Author

@yoctopuce yoctopuce Jun 4, 2025

Your understanding is correct, and indeed, single precision runs into the exact same kind of issues.

As you have guessed, I am not a fan of 1.234499999999, but if your testing shows that using 17 digits provides overall better results than conditionally shortening to 16, then this is the way to go.

Member

I made a test:

import os, random, array, math
stats = [0, 0, 0, 0, 0]   
N = 10000000
for _ in range(N):
    #f = random.random()       
    f = array.array("d", os.urandom(8))[0]
    while math.isinf(f) or math.isnan(f):
        f = array.array("d", os.urandom(8))[0]
    str_f_repr = repr(f)           
    str_f_16g = "{:.16g}".format(f)                
    str_f_17g = "{:.17g}".format(f)
    str_f_18g = "{:.18g}".format(f)                                       
    str_f_19g = "{:.19g}".format(f)                                        
    f_repr = float(str_f_repr)                                      
    f_16g = float(str_f_16g)                                            
    f_17g = float(str_f_17g)
    f_18g = float(str_f_18g)
    f_19g = float(str_f_19g)
    stats[0] += f_repr == f                                           
    stats[1] += f_16g == f                                        
    stats[2] += f_17g == f                                              
    stats[3] += f_18g == f                                        
    stats[4] += f_19g == f             
print(list(100 * s / N for s in stats))

That generates 10 million random doubles and checks repr and formatting with 16 through 19 digits of precision, to see if the rendered number is equivalent to the original.

On CPython it gives:

[100.0, 54.61923, 100.0, 100.0, 100.0]

So that means 100% of the numbers are accurate using repr and 17 or more digits of precision. Using 16 digits, only 54.6% represent correctly.

MicroPython unix port with this PR (ie repr using 16 or 17 digits) gives:

[52.24606, 37.99564, 53.94176, 54.37388, 54.12582]

That shows ... it's pretty terrible. But at least we see that using repr with this PR is much better than master which just uses 16 digits: 52.2% for repr vs 38% for always using 16 digits. But also, always using 17 digits gives a little more accuracy than repr, in this case 53.9% which is about 1.7% more than repr.

Note that MicroPython may also have inaccuracies converting str to float, so that may add to the errors here. But that's representative of .mpy loading because the loader uses the same float parsing code.

Also note that MicroPython was faster than CPython in the above test! That might be because MicroPython's float printing code is taking shortcuts. It also means that rendering the float twice in repr to try and extract the extra half digit is not taking up that much extra time.

Open questions:

  • is the extra 1% or so of accuracy from always showing 17 digits worth it, or is it better to show a nicer number and do the 16/17 trick?
  • is it worth trying to improve mp_format_float() using a different algorithm, eg fully integer based?

Member

I tried extending the 16/17 trick to the following:

--- a/py/objfloat.c
+++ b/py/objfloat.c
@@ -133,7 +133,7 @@ static void float_print(const mp_print_t *print, mp_obj_t o_in, mp_print_kind_t
         // digging too far (eg. 1.234499999999), so we keep the short form
         char ebuf[32];
         mp_format_float(o_val, ebuf, sizeof(ebuf), 'g', precision + 1, '\0');
-        if (strlen(ebuf) == strlen(buf) + 1) {
+        if (strlen(ebuf) - strlen(buf) <= 3) {
             memcpy(buf, ebuf, sizeof(buf));
         }
     }

That uses the 17-digit form whenever it is at most 3 characters longer than the 16-digit one. That gives better results in the above test:

[53.85271, 37.94894, 53.94477, 54.36183, 54.16032]

That means repr is almost as good as unconditionally doing 17 digits, 53.85% vs 53.94%. But note that these percentages change a bit across runs, due to the randomness.

Contributor Author

I have been working on this today. It is now clear to me as well that the proper solution is indeed to fix mp_format_float(), rather than adding a quick fix in repr.

I have implemented today an alternate algorithm for mp_format_float(), based on the same idea as my previous enhancement to mp_parse_num_float(): using an mp_float_uint_t to convert the mantissa, rather than working directly on floats. This brings the correct percentage up to 60%, which is a good start.

Once the mantissa is computed using an integer, it is possible to fix it incrementally to ensure that the parse algorithm gets back to the original number. I have been working on that, and got up to 95% correct conversions using 18 digits. I still have a bit of work to fix corner cases (forced round-ups) before I can push that code.
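The integer-mantissa idea can be illustrated in Python with math.frexp(): the float is decomposed into an exact 53-bit integer mantissa and a binary exponent, after which digit manipulation can be done in exact integer arithmetic (an illustration of the approach, not the actual C code; mantissa_as_int is an invented name):

```python
import math

def mantissa_as_int(x):
    # x == m * 2**e with 0.5 <= |m| < 1; scaling m by 2**53 gives an
    # exact integer for IEEE doubles, so no precision is lost.
    m, e = math.frexp(x)
    return int(m * 2.0 ** 53), e - 53  # x == mant * 2**(e - 53)

mant, exp = mantissa_as_int(0.1)
assert 0.1 == mant * 2.0 ** exp  # reconstruction is exact
```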

Do you want me to send the new mp_format_float() code in a separate PR?

Member

I have implemented today an alternate algorithm for mp_format_float(), based on the same idea as my previous enhancement to mp_parse_num_float(): using an mp_float_uint_t to convert the mantissa, rather than working directly on floats.

Wow, very nice!

Do you want me to send the new mp_format_float() code in a separate PR?

Yes please.

@yoctopuce yoctopuce force-pushed the MICROPY_COMP_FLOAT_CONST branch 6 times, most recently from db08842 to 860cc5c Compare May 30, 2025 13:46
yoctopuce added a commit to yoctopuce/micropython that referenced this pull request Jun 6, 2025
Following discussions in PR micropython#16666, this commit updates the
float formatting code to reduce the `repr` reversibility error,
i.e. the percentage of valid floating point numbers that
do not parse back to the same number when formatted by `repr`.

The baseline before this commit is an error rate of ~46%,
when using double-precision floats.

This new code initially brings the error down to ~41%,
using an integer representation of the decimal mantissa
rather than working on floats. It will also improve the
rounding in some conditions.

An additional improvement to the accuracy can be turned on
to bring the error down to 4.5%, by iterative refinement.
This extra code however makes the code slightly slower
than CPython, when tested on ports/unix.

The residual error rate appears to be due to the parse
code itself, which is unable to produce some specific
floating point values, regardless of the string provided
as input.

Signed-off-by: Yoctopuce dev <[email protected]>
yoctopuce added a commit to yoctopuce/micropython that referenced this pull request Jun 13, 2025
Following discussions in PR micropython#16666, this commit updates the
float formatting code to reduce the `repr` reversibility error,
i.e. the percentage of valid floating point numbers that
do not parse back to the same number when formatted by `repr`.

The baseline before this commit is an error rate of 46%,
when using double-precision floats.

This new code initially brings the error down to 0.17%,
by using an integer representation of the decimal mantissa
rather than working on floats, followed by incremental
improvements.

The incremental improvement code makes the conversion
slightly slower than CPython when tested on ports/unix,
but not much. It can be disabled via a mpconfig setting,
MICROPY_FLOAT_HIGH_QUALITY_REPR.

The residual error rate appears to be due to the parse
code itself, which is unable to produce some specific
floating point values, regardless of the string provided
as input. When the number obtained by parsing the `repr`
string is not the same as the original one, the maximum
relative error is < 2.7e-16, which is ~2**-52.

On single-precision floats, the new code brings the
error down to 0.29%, with a maximum relative error
under 1e-7, which is ~2**-23.
For bare-metal machines with very limited resources,
disabling MICROPY_FLOAT_HIGH_QUALITY_REPR would bring
the error rate to 5.11%, with a maximum relative
error of 1e-6.

Signed-off-by: Yoctopuce dev <[email protected]>
yoctopuce added a commit to yoctopuce/micropython that referenced this pull request Jun 17, 2025
Following discussions in PR micropython#16666, this commit updates the
float formatting code to improve the `repr` reversibility,
i.e. the percentage of valid floating point numbers that
do parse back to the same number when formatted by `repr`.

This new code initially offers a choice of 3 float conversion
methods, depending on the desired tradeoff between code
footprint and precision:
- BASIC method has the smallest code footprint
- APPROX method uses an iterative method to approximate
  the exact representation, which is a bit slower but
  does not have a big impact on code size
- EXACT method uses higher-precision floats during conversion,
  which provides the best results but has a higher impact on
  code size. It is faster than the APPROX method.

Here is a table comparing the impact of the three conversion
methods on code footprint on PYBV10 (using single-precision
floats), and the reversibility rate for both single-precision
and double-precision floats. The table includes the current
situation as a baseline for comparison:

          PYBV10    FLOAT   DOUBLE
current = 364136   85.47%   37.90%
basic   = 364188   97.78%   62.18%
approx  = 364396   99.70%   99.84%
exact   = 365608  100.00%  100.00%

The commit also includes two minor fixes for nanbox that
were preventing the new CI tests from running properly on
that port.

Signed-off-by: Yoctopuce dev <[email protected]>
yoctopuce added a commit to yoctopuce/micropython that referenced this pull request Jun 19, 2025
Following discussions in PR micropython#16666, this commit updates the
float formatting code to improve the `repr` reversibility,
i.e. the percentage of valid floating point numbers that
do parse back to the same number when formatted by `repr`.

This new code offers a choice of 3 float conversion methods,
depending on the desired tradeoff between code size and
conversion precision:
- BASIC method has the smallest code footprint
- APPROX method uses an iterative method to approximate
  the exact representation, which is a bit slower but
  does not have a big impact on code size.
  It provides `repr` reversibility in >99.8% of cases
  in double precision, and in >98.5% in single precision.
- EXACT method uses higher-precision floats during conversion,
  which provides the best results but has a higher impact on
  code size. It is faster than the APPROX method, and faster
  than the equivalent CPython implementation. It is however
  not available on all compilers when using FLOAT_IMPL_DOUBLE.

Here is a table comparing the impact of the three conversion
methods on code footprint on PYBV10 (using single-precision
floats), and the reversibility rate for both single-precision
and double-precision floats. The table includes the current
situation as a baseline for comparison:

          PYBV10    FLOAT   DOUBLE
current = 364136   27.57%   37.90%
basic   = 364188   91.01%   62.18%
approx  = 364396   98.50%   99.84%
exact   = 365608  100.00%  100.00%

The commit also includes two minor fixes for nanbox that
were preventing the new CI tests from running properly on
that port.

Signed-off-by: Yoctopuce dev <[email protected]>
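The single-precision reversibility measured in the FLOAT column can be illustrated with a small host-side sketch (the helper names below are assumptions for illustration, not code from the commit): emulate float32 storage by packing through `struct`, then search for the shortest decimal string that parses back to the same float32 value.

```python
import struct

def as_float32(x):
    # Round a Python double to the nearest IEEE single-precision
    # value (returned as a double holding that exact float32 value).
    return struct.unpack('<f', struct.pack('<f', x))[0]

def shortest_repr32(v):
    # Search for the shortest decimal string that parses back to the
    # same float32 value -- the property the FLOAT column measures.
    assert v == as_float32(v)
    for prec in range(1, 10):
        s = f"{v:.{prec}g}"
        if as_float32(float(s)) == v:
            return s
    return repr(v)

print(shortest_repr32(as_float32(0.1)))  # -> 0.1
```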
yoctopuce added a commit to yoctopuce/micropython that referenced this pull request Jun 19, 2025
Following discussions in PR micropython#16666, this commit updates the
float formatting code to improve the `repr` reversibility,
i.e. the percentage of valid floating point numbers that
do parse back to the same number when formatted by `repr`.

This new code offers a choice of 3 float conversion methods,
depending on the desired tradeoff between code size and
conversion precision:
- BASIC method is the smallest code footprint
- APPROX method uses an iterative method to approximate
  the exact representation, which is a bit slower but
  but does not have a big impact on code size.
  It provides `repr` reversibility on >99.8% of the cases
  in double precision, and on >98.5% in single precision.
- EXACT method uses higher-precision floats during conversion,
  which provides best results but, has a higher impact on code
  size. It is faster than APPROX method, and faster than
  CPython equivalent implementation. It is however not available
  on all compilers when using FLOAT_IMPL_DOUBLE.

Here is the table comparing the impact of the three conversion
methods on code footprint on PYBV10 (using single-precision
floats) and reversibility rate for both single-precision and
double-precision floats. The table includes current situation
as a baseline for the comparison:

          PYBV10    FLOAT   DOUBLE
current = 364136   27.57%   37.90%
basic   = 364188   91.01%   62.18%
approx  = 364396   98.50%   99.84%
exact   = 365608  100.00%  100.00%

The commit also include two minor fix for nanbox, that were
preventing the new CI tests to run properly on that port.

Signed-off-by: Yoctopuce dev <[email protected]>
yoctopuce added a commit to yoctopuce/micropython that referenced this pull request Jun 19, 2025
Following discussions in PR micropython#16666, this commit updates the
float formatting code to improve the `repr` reversibility,
i.e. the percentage of valid floating point numbers that
do parse back to the same number when formatted by `repr`.

This new code offers a choice of 3 float conversion methods,
depending on the desired tradeoff between code size and
conversion precision:
- BASIC method is the smallest code footprint
- APPROX method uses an iterative method to approximate
  the exact representation, which is a bit slower but
  but does not have a big impact on code size.
  It provides `repr` reversibility on >99.8% of the cases
  in double precision, and on >98.5% in single precision.
- EXACT method uses higher-precision floats during conversion,
  which provides best results but, has a higher impact on code
  size. It is faster than APPROX method, and faster than
  CPython equivalent implementation. It is however not available
  on all compilers when using FLOAT_IMPL_DOUBLE.

Here is the table comparing the impact of the three conversion
methods on code footprint on PYBV10 (using single-precision
floats) and reversibility rate for both single-precision and
double-precision floats. The table includes current situation
as a baseline for the comparison:

          PYBV10    FLOAT   DOUBLE
current = 364136   27.57%   37.90%
basic   = 364188   91.01%   62.18%
approx  = 364396   98.50%   99.84%
exact   = 365608  100.00%  100.00%

The commit also include two minor fix for nanbox, that were
preventing the new CI tests to run properly on that port.

Signed-off-by: Yoctopuce dev <[email protected]>
yoctopuce added a commit to yoctopuce/micropython that referenced this pull request Jun 19, 2025
Following discussions in PR micropython#16666, this commit updates the
float formatting code to improve the `repr` reversibility,
i.e. the percentage of valid floating point numbers that
parse back to the same number when formatted by `repr`.

This new code offers a choice of 3 float conversion methods,
depending on the desired tradeoff between code size and
conversion precision:
- BASIC method has the smallest code footprint
- APPROX method uses an iterative method to approximate
  the exact representation, which is a bit slower but
  does not have a big impact on code size.
  It provides `repr` reversibility in >99.8% of cases
  in double precision, and in >98.5% in single precision.
- EXACT method uses higher-precision floats during conversion,
  which provides the best results but has a higher impact on
  code size. It is faster than the APPROX method, and faster
  than the equivalent CPython implementation. It is however
  not available on all compilers when using FLOAT_IMPL_DOUBLE.

Here is the table comparing the impact of the three conversion
methods on code footprint on PYBV10 (using single-precision
floats) and reversibility rate for both single-precision and
double-precision floats. The table includes the current
situation as a baseline for the comparison:

          PYBV10    FLOAT   DOUBLE
current = 364136   27.57%   37.90%
basic   = 364188   91.01%   62.18%
approx  = 364396   98.50%   99.84%
exact   = 365608  100.00%  100.00%

The commit also includes two minor fixes for nanbox, which
were preventing the new CI tests from running properly on
that port. It also fixes a similar math.nan sign error in
REPR_C (i.e. copysign(0.0, math.nan) should return 0.0).

Signed-off-by: Yoctopuce dev <[email protected]>
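The REPR_C sign error mentioned above can be illustrated with a small check. This is a sketch of the expected behaviour, not the port's test: CPython's `math.nan` carries a positive sign bit, so `copysign` must propagate a positive sign onto zero:

```python
import math

# copysign(magnitude, sign_source) takes the sign bit of the
# second argument. math.nan has a positive sign bit in CPython,
# so the result must be +0.0, not -0.0.
z = math.copysign(0.0, math.nan)
print(repr(z))                # '0.0'
print(math.copysign(1.0, z))  # 1.0 -> confirms the zero is positive
```

A buggy nan-boxing representation could lose the sign bit of nan and yield `-0.0` here, which is what the fix addresses.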
yoctopuce added a commit to yoctopuce/micropython that referenced this pull request Jun 23, 2025
Following discussions in PR micropython#16666, this commit updates the
float formatting code to improve the `repr` reversibility,
i.e. the percentage of valid floating point numbers that
do parse back to the same number when formatted by `repr`.

This new code offers a choice of 3 float conversion methods,
depending on the desired tradeoff between code size and
conversion precision:
- BASIC method has the smallest code footprint
- APPROX method uses an iterative method to approximate
  the exact representation, which is a bit slower but
  does not have a big impact on code size.
  It provides `repr` reversibility in >99.8% of cases
  in double precision, and in >98.5% in single precision.
- EXACT method uses higher-precision floats during conversion,
  which provides the best results but has a higher impact on
  code size. It is faster than the APPROX method, and faster
  than the equivalent CPython implementation. It is however
  not available on all compilers when using FLOAT_IMPL_DOUBLE.

Here is the table comparing the impact of the three conversion
methods on code footprint on PYBV10 (using single-precision
floats) and reversibility rate for both single-precision and
double-precision floats. The table includes the current
situation as a baseline for the comparison:

          PYBV10    FLOAT   DOUBLE
current = 364136   27.57%   37.90%
basic   = 364188   91.01%   62.18%
approx  = 364396   98.50%   99.84%
exact   = 365608  100.00%  100.00%

Signed-off-by: Yoctopuce dev <[email protected]>
Labels
py-core Relates to py/ directory in source