Add OBJ_REPR_E for full 32-bit float precision #18401

jepler · 2025-11-11T04:26:35Z

Summary

Add a new object representation called OBJ_REPR_E.

This work is loosely inspired by Float Self-Tagging but the shift & tag values are not any of those proposed in the paper.

Testing

I locally ran the testsuite. Some native tests fail, which I'll have to resolve one way or another. It may be #18108 because the small int size is 30 bits, not 31.

Trade-offs and Alternatives

I did not see a way to organize the new repr while retaining 31-bit ints. This means we hit #18105 and #18108 -- no uctypes, and no mpy-compiled native/viper code.

It adds about 1.5 kB of code size.

This builds on #18396

github-actions · 2025-11-11T04:37:41Z

Code size report:

Reference:  tests/serial_test.py: Allow up to 2 seconds between bytes. [2762fe6]
Comparison: core: Add OBJ_REPR_E. [merge of 156444f]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

codecov · 2025-11-11T04:39:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.38%. Comparing base (2762fe6) to head (156444f).

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #18401   +/-   ##
=======================================
  Coverage   98.38%   98.38%           
=======================================
  Files         171      171           
  Lines       22294    22294           
=======================================
  Hits        21933    21933           
  Misses        361      361



jepler · 2025-11-11T16:27:16Z

@yoctopuce is there any tuning of the float to string conversion that can be done for this port? I noticed that for instance 2.**28 isn't printed exactly (but it's also not printed exactly in the unix longlong build so maybe it's expected?)

$ ./ports/unix/build-repr_e/micropython -c 'print(2.**28, int(2.**28))'
268435460.0 268435456
$ ./ports/unix/build-longlong/micropython -c 'print(2.**28, int(2.**28))'
268435500.0 268435456
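For reference, both printed strings can be checked against float32 rounding on a host CPython. This is only a sketch and not part of the PR; `to_f32` is a hypothetical helper that rounds a Python float to the nearest float32 by packing it with `struct`.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

exact = 2.0 ** 28  # 268435456.0, exactly representable in float32

# Adjacent float32 values just above 2**28 are 32 apart, so a decimal
# string close enough to the exact value reads back as the same float32.
repr_e_reads_back = to_f32(268435460.0) == exact
longlong_reads_back = to_f32(268435500.0) == exact

print(repr_e_reads_back, longlong_reads_back)
```

The repr_e string is not the exact decimal expansion, but it reads back as the same float32; the longlong string lands on the next float32 up (268435488.0).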

yoctopuce · 2025-11-11T17:08:00Z

@jepler Interesting PR :-) I started on the same path back in June, but feedback on this idea was not very positive at that time: https://github.com/orgs/micropython/discussions/17566
So I ended up with improving REPR_C accuracy only.

Regarding your question, you can try to enable

#define MICROPY_FLOAT_FORMAT_IMPL (MICROPY_FLOAT_FORMAT_IMPL_EXACT)

and see if that helps for these edge cases.


jepler · 2025-11-12T14:36:19Z

I changed the repr_e and longlong variants slightly so they were more similar and I could make a size comparison. It looks like repr_e is approximately +1560 bytes.

disable uctypes & use LONGINT_IMPL_LONGLONG in both

diff --git a/ports/unix/variants/longlong/mpconfigvariant.h b/ports/unix/variants/longlong/mpconfigvariant.h
index d50d360b1f..7ac596085f 100644
--- a/ports/unix/variants/longlong/mpconfigvariant.h
+++ b/ports/unix/variants/longlong/mpconfigvariant.h
@@ -40,5 +40,8 @@
 // Set base feature level.
 #define MICROPY_CONFIG_ROM_LEVEL (MICROPY_CONFIG_ROM_LEVEL_EXTRA_FEATURES)
 
+// Disable uctypes so size comparison with REPR_E is relevant
+#define MICROPY_PY_UCTYPES       (0)
+
 // Enable extra Unix features.
 #include "../mpconfigvariant_common.h"
diff --git a/ports/unix/variants/repr_e/mpconfigvariant.h b/ports/unix/variants/repr_e/mpconfigvariant.h
index 82fd84c633..bf23592199 100644
--- a/ports/unix/variants/repr_e/mpconfigvariant.h
+++ b/ports/unix/variants/repr_e/mpconfigvariant.h
@@ -30,6 +30,9 @@
 // for full float range & precision. Therefore this variant should be built
 // using MICROPY_FORCE_32BIT=1
 
+// Use MICROPY_LONGINT_IMPL_LONGLONG so size comparison with longlong is relevant.
+#define MICROPY_LONGINT_IMPL           (MICROPY_LONGINT_IMPL_LONGLONG)
+
 #define MICROPY_OBJ_REPR               (MICROPY_OBJ_REPR_E)
 #define MICROPY_FLOAT_IMPL             (MICROPY_FLOAT_IMPL_FLOAT)
 
diff --git a/ports/unix/variants/repr_e/mpconfigvariant.mk b/ports/unix/variants/repr_e/mpconfigvariant.mk
index 5765e86bc0..02f4f3842d 100644
--- a/ports/unix/variants/repr_e/mpconfigvariant.mk
+++ b/ports/unix/variants/repr_e/mpconfigvariant.mk
@@ -6,3 +6,5 @@ MICROPY_FORCE_32BIT := 1
 MICROPY_PY_FFI := 0
 
 RUN_TESTS_MPY_CROSS_FLAGS = --mpy-cross-flags='-march=x86 -msmall-int-bits=30'
+
+MPY_TOOL_FLAGS = -mlongint-impl longlong

jepler · 2025-11-12T17:32:46Z

Is there any particular numeric workload it would be instructive to try, to make sure the improved accuracy is working?

What sorts of tests should be written specifically for REPR_E? For instance, should I test that float arithmetic on "modest numbers" doesn't do allocations?

All bits from "struct.unpack" come through

$ build-repr_e/micropython
>>> struct.unpack("5f", struct.pack("5i", 0x3f800000, 0x3f800001, 0x3f800002, 0x3f800003, 0x3f800004))
(1.0, 1.0000001, 1.0000002, 1.0000004, 1.0000005)
$ build-longlong/micropython
>>> struct.unpack("5f", struct.pack("5i", 0x3f800000, 0x3f800001, 0x3f800002, 0x3f800003, 0x3f800004))
(1.0, 1.0, 1.0, 1.0, 1.0000008)
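As a host-side cross-check (a sketch run on CPython, not on either MicroPython build), the five bit patterns decode to five distinct float32 values spaced one ULP apart. The variable names are illustrative only.

```python
import struct

patterns = [0x3F800000, 0x3F800001, 0x3F800002, 0x3F800003, 0x3F800004]

# Reinterpret each 32-bit pattern as a float32 (widened to a Python float).
values = [struct.unpack("<f", struct.pack("<I", p))[0] for p in patterns]

for k, v in enumerate(values):
    # Just above 1.0, one float32 ULP is 2**-23.
    assert v == 1.0 + k * 2.0 ** -23
    print(hex(patterns[k]), repr(v))
```

A build that keeps all 32 bits should therefore show five different numbers, as the repr_e output does.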

A lengthy sum() exactly matches numpy.float32 arithmetic

$ ./build-repr_e/micropython -c 'print(sum(1/i for i in range(1, 1000)))'
7.4844784
$ python -c 'import numpy; print(sum(1/numpy.float32(i) for i in range(1, 1000)))'
7.4844785
>>> numpy.float32("7.4844784") == numpy.float32("7.4844785")
np.True_
$ ./build-longlong/micropython -c 'print(sum(1/i for i in range(1, 1000)))'
7.484432

Heap/boxing behavior

>>> e = 20; micropython.heap_lock(); print(2.**e)
1048576.0
>>> e = -20; micropython.heap_lock(); print(2.**e)
9.5367432e-07
>>> e = 50; micropython.heap_lock(); print(2.**e)
MemoryError: memory allocation failed, heap is locked
>>> e = -50; micropython.heap_lock(); print(2.**e)
MemoryError: memory allocation failed, heap is locked

jepler · 2025-11-12T17:39:07Z

# Sum the harmonic series until an addend is so small it doesn't contribute
# to the sum.
#
# cpython: np.float32(15.403683) 4176757c
# repr_e: 15.403683 4176757c
# longlong: 13.218584 41537f50


import struct
import sys
if sys.implementation.name == "cpython":
    from numpy import float32 as ff
else:
    ff = float

f = ff(1)
g = ff(0)
one = ff(1)  # keep every operand an ff value so NumPy never widens the sum
while True:
    h = g + one / f
    if g == h:
        break
    f += one
    g = h

print(repr(g), "%08x" % struct.unpack('I', struct.pack("f", g)))

dhalbert · 2025-11-12T17:45:50Z

@jepler This is very clever. I think it would be worth including quite a bit more explanation in mpconfig.h or somewhere else. The comment "30-bit fp. zero/inf/nan/large/small boxed." does not say a lot, and you say you chose a scheme different from the paper. It would also be good to give a reference to the paper.

In Repr E, the full precision of 32-bit floating point values can be used, but absolute values smaller than 3.7252904e-09 or bigger than 5.36871e+08 are stored as boxed values, requiring allocations. Repr E reduces the width of small ints to 30 bits. Signed-off-by: Jeff Epler <[email protected]>
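The limits in that description can be turned into a small host-side predictor. This is a rough sketch that takes the two quoted limits at face value; `expect_boxed` is a hypothetical helper, not a MicroPython API, and behavior exactly at the limits is not something the description settles.

```python
import math

# Limits quoted in the description above, treated here as given.
SMALLEST_UNBOXED = 3.7252904e-09  # close to 2**-28
LARGEST_UNBOXED = 5.36871e+08     # close to 2**29

def expect_boxed(x):
    """Guess whether Repr E would need a heap allocation for the float x."""
    if math.isnan(x):
        return True
    a = abs(x)
    # Zero and infinity fall outside the unboxed range as well.
    return a < SMALLEST_UNBOXED or a > LARGEST_UNBOXED

for e in (20, -20, 50, -50):
    print(e, expect_boxed(2.0 ** e))
```

The four exponents are the ones used in the heap_lock experiment earlier in the thread, and the predictions agree with it: 2.**20 and 2.**-20 print with the heap locked, while 2.**50 and 2.**-50 raise MemoryError.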

jepler · 2025-11-12T17:56:44Z

Good points @dhalbert , updated. (as well as some other fixes)

dpgeorge · 2025-11-13T00:45:11Z

> I started on the same path back in June, but feedback on this idea was not very positive at that time: https://github.com/orgs/micropython/discussions/17566

@yoctopuce indeed it's quite a big change to lose precision of small ints. I otherwise agree with your comments on that discussion, that in some applications more float precision is better than more small ints being able to exist without heap allocation. So maybe your idea could be a new object representation (although not necessarily used by any ports here).

I did actually suggest a related idea for this PR to @jepler, but my idea was to retain (30+1)-bits for small ints.

@jepler I did not read the paper you cite, and I should have made this clearer in my email to you, but I was thinking to just use exactly the same object layout as the existing representation C (which has 31-bit small ints), that boxes qstr and immediate objects inside the float nan space. Then just create floats on the heap if they can't be exactly represented when truncating the least two bits. Does that make sense?
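The "box only when truncation loses bits" test can be sketched on a host CPython as follows. This illustrates the idea as described in the comment, not code from either PR; `f32_bits` and `survives_truncation` are hypothetical names, and the input is assumed to already be a float32 value.

```python
import struct

def f32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def survives_truncation(x):
    """True if clearing the two least significant bits leaves x unchanged."""
    return f32_bits(x) & 0b11 == 0

print(survives_truncation(1.0))  # pattern 0x3f800000
print(survives_truncation(0.1))  # pattern 0x3dcccccd

# Over any run of consecutive bit patterns, one in four has both low bits clear.
base = 0x3F800000
hits = sum(1 for p in range(base, base + 1024) if p & 0b11 == 0)
print(hits / 1024)
```

Under this scheme 1.0 would stay inline while 0.1 would go to the heap, independent of magnitude.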

jepler · 2025-11-13T01:09:11Z

@dpgeorge indeed, it's likely I misunderstood, because I had this paper in mind.

What's interesting about the approach in the paper is that "presumably common floats" (ones within some range of exponents around 0) are all unboxed, while it's the ones with extreme exponents that get boxed. As opposed to just adding boxed floats to REPR_C, where 3/4 of all floats will end up boxed uniformly without respect to exponent.

But I just can't figure out a way to also keep 31-bit small ints while using an "add and roll" encoding. I can arrange for the tag bits of floats to be ...10 instead of ...11 by using an offset of 0x1000_0000, but that doesn't let me put the other values in with the NaNs, because NaNs don't end up tagged ...10; they would be tagged ...00. And taking another tag bit from the exponent means the argument that "plausibly common floats" all fit becomes more and more false.
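A toy "add and roll" encoding makes the tag observation concrete. Only the offset 0x1000_0000 comes from the comment above; the 3-bit rotate and the helper names are assumptions made for illustration, and this sketch does not reproduce the exact layout or unboxed range used in the PR.

```python
import struct

OFFSET = 0x1000_0000  # offset mentioned in the comment above
ROTATE = 3            # assumption: brings the sign and top two exponent bits to the bottom
MASK = 0xFFFF_FFFF

def f32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def encode(bits):
    """Add the offset, then rotate left so the tag lands in the low two bits."""
    v = (bits + OFFSET) & MASK
    return ((v << ROTATE) | (v >> (32 - ROTATE))) & MASK

def decode(word):
    """Undo encode(): rotate right, then subtract the offset."""
    v = ((word >> ROTATE) | (word << (32 - ROTATE))) & MASK
    return (v - OFFSET) & MASK

def tag(bits):
    """Low two bits of the encoded word."""
    return encode(bits) & 0b11

for x in (1.0, 2.0 ** 20, 2.0 ** 50, 2.0 ** -50, float("nan")):
    b = f32_bits(x)
    assert decode(encode(b)) == b  # the encoding is reversible
    print(x, format(tag(b), "02b"))
```

In this toy, floats with exponents near zero come out tagged 10, floats with extreme exponents get some other tag (and so would be boxed), and a quiet NaN comes out tagged 00, which is why the NaN space cannot share the float tag.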

If that means this approach is not likely to be merged that's fine, I got a good education in object representation out of it. Assuming I understand you now, I can come back and try the approach that strictly extends REPR_C instead in a fresh PR.

jepler changed the title ~~Add OBJ_REPR_E~~ Add OBJ_REPR_E for full 32-bit float precision Nov 11, 2025

jepler added 2 commits November 11, 2025 08:23

Docs: Clarify which object representation is described.

018d771

Signed-off-by: Jeff Epler <[email protected]>

ci: Move embedding commands to ci.sh.

aeba401

Signed-off-by: Jeff Epler <[email protected]>

jepler force-pushed the repr_e branch from 86f3a02 to 1cc0cfc Compare November 11, 2025 14:23

core: Flexibly define float constant objects.

217f6ce

Now, use of an identifier like `MP_CONST_FLOAT_e` introduces a floating point constant object. Signed-off-by: Jeff Epler <[email protected]>

jepler mentioned this pull request Nov 12, 2025

core: Flexibly define float constant objects. #18396

Closed

4 tasks

jepler force-pushed the repr_e branch from 1cc0cfc to 156444f Compare November 12, 2025 17:56

dpgeorge added the py-core Relates to py/ directory in source label Nov 13, 2025

jepler mentioned this pull request Nov 13, 2025

float: Introduce MICROPY_FLOAT_BOX_AS_NEEDED. #18413

Open
