sjrd commented Jun 4, 2025

These changes make the worst-worst case (we reach unsignedDivModHelper and b is small, so we need all 11 iterations of the loop) as fast as the best-worst case (we also reach the method but b is large and we only need 1 iteration). The cases where b <= 2^21 even become faster.

sjrd force-pushed the opt-rt-long-divide branch 2 times, most recently from 3883616 to b500b8f (June 13, 2025 18:33)
sjrd force-pushed the opt-rt-long-divide branch from b500b8f to 442e307 (June 23, 2025 15:06)
sjrd force-pushed the opt-rt-long-divide branch 3 times, most recently from 65fd9b0 to 1d9f4bd (July 22, 2025 11:06)
sjrd changed the title from "WiP Optimize RuntimeLong division and remainder." to "Optimize RuntimeLong division and remainder." (Jul 22, 2025)
sjrd force-pushed the opt-rt-long-divide branch 2 times, most recently from 6aa0198 to 16de707 (July 22, 2025 12:20)
sjrd changed the title from "Optimize RuntimeLong division and remainder." to "Optimize the worst case of RuntimeLong division and remainder." (Jul 22, 2025)
sjrd marked this pull request as ready for review (July 22, 2025 12:25)
sjrd requested a review from gzm0 (July 22, 2025 12:25)
sjrd commented Jul 22, 2025

@gzm0 This is now ready for review. The code is straightforward, but the proof is hairy.

sjrd force-pushed the opt-rt-long-divide branch 5 times, most recently from 6423179 to e6fe198 (July 28, 2025 17:04)
@@ -617,14 +617,14 @@ object RuntimeLong {
*
* We convert the unsigned value num = (lo, hi) to a Double value
* approxNum. This is an approximation. It can lose as many as
- * 64 - 53 = 11 low-order bits. Hence |approxNum - num| <= 2^12.
+ * 64 - 53 = 11 low-order bits. Hence |approxNum - num| <= 2^10.
Contributor

I think the correction here is correct, but the argumentation isn't anymore: If we lose 10 bits (not 11 bits), we should lose at most 2^10-1 < 2^10 precision (2^9 + 2^8 + ... + 2^0 = 2^10 - 1).

But it feels like 64 - 53 = 11 should be 64 - 54 = 10: If we interpret the long as unsigned, the sign bit of the double also counts as part of our number.

Member Author

The sign bit of the double will always be 0, so it does not contribute to any precision. It's not that we lose 10 bits. We do still lose 11 bits. But because we round, rather than truncate, the result can only be (2^11)/2 = 2^10 away from the original value. I elaborated the reasoning. LMK if that's clearer.
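
The rounding argument can be checked numerically. The following is a minimal sketch, not code from the PR: it assumes a JVM where `BigInt.toDouble` rounds to nearest, and the names `UnsignedToDoubleError`, `unsignedValue` and `roundingError` are made up for illustration. It measures |approxNum - num| exactly for a given 64-bit pattern.

```scala
object UnsignedToDoubleError {
  /** Interprets the 64 bits of `x` as an unsigned integer. */
  def unsignedValue(x: Long): BigInt =
    BigInt(java.lang.Long.toUnsignedString(x))

  /** Computes |approxNum - num| exactly, where approxNum is num rounded to a Double. */
  def roundingError(x: Long): BigInt = {
    val num = unsignedValue(x)
    val approxNum: Double = num.toDouble // assumed to round to nearest
    // BigDecimal.exact keeps the exact binary value of the double.
    val approxExact = BigDecimal.exact(approxNum).toBigInt
    (approxExact - num).abs
  }

  def main(args: Array[String]): Unit = {
    // 2^63 + 2^10 sits halfway between two doubles that are 2^11 apart.
    println(roundingError(Long.MinValue + 1024L))
    // 2^64 - 1 rounds up to 2^64.
    println(roundingError(-1L))
  }
}
```

For `Long.MinValue + 1024L` (2^63 + 2^10 as an unsigned value) the error reaches the 2^10 bound exactly, which suggests the bound cannot be tightened.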

* in a particular bit width, they correspond to the computer semantics of
* unsigned integer division and remainder.
*
* For all a and b, 0 <= rem(a, b) < b.
Contributor

Suggested change
- * For all a and b, 0 <= rem(a, b) < b.
+ * For all a and b, 0 <= rem(a, b) < |b|.

Otherwise this is obviously wrong for negative b.

Alternatively (this seems to fit better with what comes later):

Suggested change
- * For all a and b, 0 <= rem(a, b) < b.
+ * For all non-negative a and b, 0 <= rem(a, b) < b.
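
To make the objection concrete, here is a small sketch (the object and method names are hypothetical, not from the PR). It shows that neither the truncated nor the floored signed remainder can satisfy `0 <= rem(a, b) < b` when b is negative, while the bound does hold once both operands are read as unsigned values.

```scala
object RemRangeSketch {
  /** True if rem(a, b) < b when a and b are read as unsigned 64-bit values. */
  def unsignedRemInRange(a: Long, b: Long): Boolean = {
    val r = java.lang.Long.remainderUnsigned(a, b)
    // r >= 0 holds trivially for an unsigned value; only the upper bound needs a check.
    java.lang.Long.compareUnsigned(r, b) < 0
  }

  def main(args: Array[String]): Unit = {
    println(7 % -3)                      // truncated remainder: 1, not < -3
    println(Math.floorMod(7, -3))        // floored remainder: -2, not >= 0
    println(unsignedRemInRange(-7L, 3L)) // true
  }
}
```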

* Case 0 < b < 2²¹
* ================
*
* In this case, b = blo, as bhi = 0.
Contributor

Suggested change
- * In this case, b = blo, as bhi = 0.
+ * In this case, b = blo, and bhi = 0.

* the computation can be performed modulo 2³², using `int` operations.
*
* We still need to compute quotLo. By construction, k < b < 2²¹. Therefore,
* 2³²∙k + alo < 2⁵³. Since both operands of the div of quotLo are < 2⁵³,
Contributor

Suggested change
- * 2³²∙k + alo < 2⁵³. Since both operands of the div of quotLo are < 2⁵³,
+ * 2³²∙k + alo < 2³²∙(k + 1) <= 2³²∙b < 2⁵³. Since both operands of the div of quotLo are < 2⁵³,

For ease of understanding?
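
As a rough illustration of the bound being discussed, the sketch below divides an unsigned 64-bit value by a divisor below 2²¹ using only `Double` arithmetic on the two 32-bit words. It is an assumption-laden stand-in, not the PR's code: it uses a `Long` input instead of (lo, hi) `Int` pairs, and `SmallDivisorSketch` and `divideUnsignedSmall` are invented names.

```scala
object SmallDivisorSketch {
  private final val TwoPow32 = 4294967296.0 // 2^32, exactly representable

  /** Unsigned a / b for 0 < b < 2^21, using double arithmetic on 32-bit words. */
  def divideUnsignedSmall(a: Long, b: Int): Long = {
    require(b > 0 && b < (1 << 21), "sketch only covers 0 < b < 2^21")
    val ahi = (a >>> 32).toDouble        // high word, in [0, 2^32)
    val alo = (a & 0xffffffffL).toDouble // low word, in [0, 2^32)
    val bd = b.toDouble

    val quotHi = Math.floor(ahi / bd)
    val k = ahi - quotHi * bd            // k = rem(ahi, b), so 0 <= k < b

    // 2^32*k + alo < 2^32*(k + 1) <= 2^32*b < 2^53, so the dividend is an
    // exact double and flooring the rounded quotient gives the integer quotient.
    val quotLo = Math.floor((TwoPow32 * k + alo) / bd)

    (quotHi.toLong << 32) + quotLo.toLong
  }

  def main(args: Array[String]): Unit = {
    val a = -1L // 2^64 - 1 as an unsigned value
    println(java.lang.Long.toUnsignedString(divideUnsignedSmall(a, 3)))
    println(java.lang.Long.toUnsignedString(java.lang.Long.divideUnsigned(a, 3L)))
  }
}
```

The two printed values should agree; `java.lang.Long.divideUnsigned` serves only as a reference here.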

* We will prove that |q̂ - q| <= 1, where q = div(a, b) is the exact integer
* quotient.
*
* Since a >= 0 and b >= 2²¹, we know that â >= 0 and b̂ >= 2²¹,
Contributor

Just to check my understanding: the statement about b̂ is only correct because the boundary we compare to is a power of 2. So we know in floating point representation it is a rounding point (so we'll never round "across" it).

Member Author

Yes, that's right. It doesn't need to be a power of 2. Any value that is exactly representable as a double will do. I've made that more formal with a no-round-across-boundary property in the preliminaries.

* = ⌊◦(â / b̂)⌋
*
* We write δa = â - a. From elementary properties of ◦() and the range of a,
* we have that â is an integer (and so is δa) and |δa| <= 2¹⁰.
Contributor

Refer to the more detailed argument in toUnsignedStringLarge?

*
* We will need the following lemma:
*
* (Lemma 1) For all 0 <= x, y <= 2⁵² such that |x - y| <= 1/2,
Contributor

Suggested change
- * (Lemma 1) For all 0 <= x, y <= 2⁵² such that |x - y| <= 1/2,
+ * (Lemma 1) For all {x, y} ⊆ [0, 2⁵²] such that |x - y| <= 1/2,

I interpreted this as two inequalities, one on x (0 <= x) and one on y (y <= 2⁵²).

Comment on lines 1262 to 1264
* Proof. Observe that, in the given range, all multiples of 1/2 are
* representable as doubles. Rewrite x = n + f with n an integer and
* 0 <= f < 1. Then x - 1/2 = n + f - 1/2 <= y <= n + f + 1/2 = x + 1/2.
Contributor

It's a bit hard to see which property is used where. How about something like:

Suggested change
- * Proof. Observe that, in the given range, all multiples of 1/2 are
- * representable as doubles. Rewrite x = n + f with n an integer and
- * 0 <= f < 1. Then x - 1/2 = n + f - 1/2 <= y <= n + f + 1/2 = x + 1/2.
+ * Proof.
+ * From |x - y| <= 1/2, we have x - 1/2 <= y <= x + 1/2.
+ * Rewrite x = n + f with n an integer and 0 <= f < 1.
+ * Then n + f - 1/2 <= y <= n + f + 1/2.

* n - 1/2 <= y < n + 1, so n - 1 <= ⌊y⌋ <= n.
* Therefore |⌊◦(x)⌋ - ⌊y⌋| <= 1, as desired.
*
* Otherwise, 1/2 <= f < 1, then n + 1/2 <= ◦(x) <= n + 1 and
Contributor

IIUC this is where we need "Observe that, in the given range, all multiples of 1/2 are representable as doubles."? Consider moving it closer in the argument.

/** Helper for `unsigned_/` and `unsigned_%`.
*
* If `askQuotient` is true, computes the quotient, otherwise computes the
* remainder. Stores the hi word of the result in `hiReturn`, and returns
* the lo word.
*/
@inline // inlined twice; specializes for askQuotient
Contributor

Are you sure the increased code size is worth it for the speed? (IIUC this is why we'd inline here).

Or do we actually get smaller code with inlining?

Member Author

It produces slightly larger code with inlining. Less than 1 KB. But surprisingly, the gzipped size is smaller with inlining. Go figure 🤷‍♂️

Regardless, IMO the inlining is worth it. I don't remember having benchmarked that particular difference. However the generated code is much cleaner when specialized for the concrete value of askQuotient:
https://gist.github.com/sjrd/28eb776066031218535c8b3cd2abe7e7
Most of the duplicated code comes from the multiplication.
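
The specialization effect can be pictured with a toy helper. This is only a sketch: `divModHelper` here is a hypothetical stand-in that delegates to the JDK, not the real `RuntimeLong` helper, and whether the branch is actually folded depends on the optimizer in use.

```scala
object InlineSpecializationSketch {
  /** Hypothetical stand-in for the real helper: one body, two behaviours. */
  @inline
  private def divModHelper(a: Long, b: Long, askQuotient: Boolean): Long = {
    if (askQuotient) java.lang.Long.divideUnsigned(a, b)
    else java.lang.Long.remainderUnsigned(a, b)
  }

  // Each call site passes a literal, so once the helper is inlined an
  // optimizer can fold the `if` and keep only the branch that site needs.
  def unsignedDiv(a: Long, b: Long): Long = divModHelper(a, b, askQuotient = true)
  def unsignedRem(a: Long, b: Long): Long = divModHelper(a, b, askQuotient = false)
}
```

The trade-off raised in the review is visible even here: the shared body is duplicated at both call sites, in exchange for straight-line code at each one.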

sjrd force-pushed the opt-rt-long-divide branch from e6fe198 to 8a9f400 (September 1, 2025 12:11)
*
* Since a >= 0 and b >= 2²¹, we know that â >= 0 and b̂ >= 2²¹,
* and therefore q̂₀ >= 0. Likewise, since a < 2⁶⁴, we have â <= 2⁶⁴.
* Therefore q̂₀ <= ◦(2⁶⁴ / 2²¹) = 2⁴³.
Member Author

I made the "rounding boundary" argument more precise in this section and elsewhere.

*
* Since 0 <= q̂₀ < 2⁶⁴, we conclude that
*
* q̂ = ⌊q̂₀⌋
Member Author

Now developed in excruciating detail 😅

Comment on lines +1133 to +1136
* Rounding never goes "farther than necessary" in any direction. Formally,
* for all reals x, y such that x >= y and y is exactly representable as a
* `double` value, we have ◦(x) >= y. Similarly for x <= y. We refer to this
* as the no-round-across-boundary property.
Member Author

This paragraph is new.
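
A quick empirical check of the no-round-across-boundary property, restricted to integer inputs so that `Long.toDouble` can play the role of ◦(). This is a sketch under that assumption; `NoRoundAcrossBoundarySketch` and `holdsAround` are illustrative names.

```scala
object NoRoundAcrossBoundarySketch {
  /** Checks ◦(x) >= y for integers x in [y, y + span] and ◦(x) <= y for x in [y - span, y]. */
  def holdsAround(y: Long, span: Int): Boolean = {
    // The property only applies when the boundary itself is an exact double.
    require(BigDecimal.exact(y.toDouble) == BigDecimal(y),
        "y must be exactly representable as a double")
    val yd = y.toDouble
    (0 to span).forall { i =>
      (y + i).toDouble >= yd && (y - i).toDouble <= yd
    }
  }

  def main(args: Array[String]): Unit = {
    // Around 2^60 doubles are 256 apart above and 128 apart below,
    // so most of the tested integers do get rounded.
    println(holdsAround(1L << 60, 1000))
  }
}
```

The boundary does not have to be a power of 2: `(1L << 60) + 256` is also exactly representable and passes the same check.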

Comment on lines 1254 to 1282
* Since 0 <= a < 2⁶⁴ and 2²¹ <= b < 2⁶⁴, and since 0, 2²¹ and 2⁶⁴ are all
* exactly representable as doubles, the no-round-across-boundary property
* tells us that 0 <= â <= 2⁶⁴ and b̂ >= 2²¹.
* Therefore, â / b̂ <= 2⁶⁴ / 2²¹ = 2⁴³. Since 2⁴³ is also exactly
* representable, we have q̂₀ = ◦(â / b̂) <= 2⁴³.
*
* If a = 0, then q̂₀ = 0 and q̂₀ / 2³² = 0, hence ◦(q̂₀ / 2³²) is exact.
* Otherwise, a >= 1, hence â >= 1, and q̂₀ >= 1 / 2⁶⁴, which means q̂₀ is a
* *normal* `double` value. q̂₀ / 2³² >= 1 / 2⁹⁶ is also a normal `double`
* value. Therefore, ◦(q̂₀ / 2³²) cannot underflow, and it is exact because it
* divides by a power of 2.
* Hence in all cases, we have ◦(q̂₀ / 2³²) = q̂₀ / 2³².
*
* We will use Theorem D3 of Hacker's Delight (section 9-1):
* > For x real, d an integer ≠ 0: ⌊⌊x⌋ / d⌋ = ⌊x / d⌋.
*
* We can now develop q̂ as
*
* q̂ = 2³²∙wrap32(◦(q̂₀ / 2³²)) + wrap32(q̂₀)
* = 2³²∙rem(⌊◦(q̂₀ / 2³²)⌋, 2³²) + rem(⌊q̂₀⌋, 2³²) def of wrap32
* = 2³²∙rem(⌊ q̂₀ / 2³² ⌋, 2³²) + rem(⌊q̂₀⌋, 2³²) because ◦(q̂₀ / 2³²) = q̂₀ / 2³²
* = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙div(⌊q̂₀ / 2³²⌋, 2³²)) + (⌊q̂₀⌋ - 2³²∙div(⌊q̂₀⌋, 2³²)) def of rem
* = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙⌊⌊q̂₀ / 2³²⌋ / 2³²⌋ ) + (⌊q̂₀⌋ - 2³²∙⌊⌊q̂₀⌋ / 2³²⌋ ) def of div
* = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙⌊ q̂₀ / 2³² / 2³²⌋ ) + (⌊q̂₀⌋ - 2³²∙⌊ q̂₀ / 2³²⌋ ) Theorem D3
* = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙⌊ q̂₀ / 2⁶⁴ ⌋ ) + (⌊q̂₀⌋ - 2³²∙⌊ q̂₀ / 2³²⌋ )
* = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙0 ) + (⌊q̂₀⌋ - 2³²∙⌊ q̂₀ / 2³²⌋ ) because q̂₀ <= 2⁴³ < 2⁶⁴
* = 2³²∙⌊q̂₀ / 2³²⌋ + ⌊q̂₀⌋ - 2³²∙⌊q̂₀ / 2³²⌋
* = ⌊q̂₀⌋
* = ⌊◦(â / b̂)⌋
Member Author

This whole section is basically new.
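
The end result of that development, q̂ = ⌊q̂₀⌋, can be replayed on concrete values. The sketch below assumes wrap32(x) = rem(⌊x⌋, 2³²) for x >= 0, as in the "def of wrap32" step of the quoted derivation; the Scala names are hypothetical and the result is held in a `Long` rather than a (lo, hi) pair.

```scala
object Wrap32Sketch {
  private final val TwoPow32 = 4294967296.0 // 2^32

  /** rem(⌊x⌋, 2^32) for x >= 0, returned as a Long in [0, 2^32). */
  def wrap32(x: Double): Long =
    (Math.floor(x) % TwoPow32).toLong

  /** 2^32 * wrap32(q0 / 2^32) + wrap32(q0), the two-word reconstruction of q̂. */
  def toUnsignedLong(q0: Double): Long =
    (wrap32(q0 / TwoPow32) << 32) + wrap32(q0)

  def main(args: Array[String]): Unit = {
    val q0 = 8796093022207.5 // 2^43 - 1/2, exactly representable
    println(toUnsignedLong(q0))
    println(Math.floor(q0).toLong)
  }
}
```

Both lines should print the same integer, since dividing by 2³² is exact for these inputs and the high and low words then recombine into ⌊q̂₀⌋.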

Comment on lines +1291 to +1311
* (Lemma 1) For all reals x and y in [0, 2⁵²) such that |x - y| <= 1/2,
* it holds that |⌊◦(x)⌋ - ⌊y⌋| <= 1.
*
* Proof.
* From |x - y| <= 1/2, we have x - 1/2 <= y <= x + 1/2.
* Rewrite x = n + f with n an integer and 0 <= f < 1.
* Then n + f - 1/2 <= y <= n + f + 1/2.
*
* Observe that, in the range [0, 2⁵²], all multiples of 1/2 are exactly
* representable as doubles. n, n + 1/2 and n + 1 all belong to that range
* and are multiples of 1/2, so they are exactly representable.
*
* If 0 <= f < 1/2, then n <= ◦(x) <= n + 1/2 (no-round-across-boundary)
* and ⌊◦(x)⌋ = n.
* n - 1/2 <= y < n + 1, so n - 1 <= ⌊y⌋ <= n.
* Therefore |⌊◦(x)⌋ - ⌊y⌋| <= 1, as desired.
*
* Otherwise, 1/2 <= f < 1, then n + 1/2 <= ◦(x) <= n + 1 (no-round-across-boundary)
* and n <= ⌊◦(x)⌋ <= n + 1.
* n <= y < n + 3/2, so n <= ⌊y⌋ <= n + 1.
* Therefore |⌊◦(x)⌋ - ⌊y⌋| <= 1 as well, as desired.
Member Author

The lemma proof was significantly rearranged.
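
A concrete instance may help show why the lemma's bound is 1 and not 0. In the sketch below (hypothetical names, reals given as decimal strings, and `String.toDouble` assumed to round to nearest), x = 2⁵¹ + 0.9 is not a double: it rounds up to 2⁵¹ + 1, so ⌊◦(x)⌋ already exceeds ⌊x⌋ by one.

```scala
object Lemma1Example {
  /** ⌊◦(x)⌋ for a real x given as a decimal literal. */
  def floorOfRounded(x: String): Long =
    Math.floor(x.toDouble).toLong

  /** Exact ⌊y⌋ for a real y given as a decimal literal. */
  def exactFloor(y: String): Long =
    BigDecimal(y).setScale(0, BigDecimal.RoundingMode.FLOOR).toLongExact

  /** |⌊◦(x)⌋ - ⌊y⌋| */
  def gap(x: String, y: String): Long =
    Math.abs(floorOfRounded(x) - exactFloor(y))

  def main(args: Array[String]): Unit = {
    val x = "2251799813685248.9" // 2^51 + 0.9; doubles are 0.5 apart here
    val y = "2251799813685248.4" // x - 1/2
    println(gap(x, y))
  }
}
```

With y = x - 1/2 the gap is exactly 1, matching the second case of the proof (1/2 <= f < 1).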

* Such huge divisors are practically useless, but they defeat the
* correction code of the algorithm above.
*
* Since b >= 2^62 and a < 2^64, we know that a < 4*b (mathematically).
Member Author

Fixed a <= 2^64 into a < 2^64. (If it could be == 2^64, the conclusion would be wrong.)
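
The consequence of a < 4*b is that the quotient can only be 0, 1, 2 or 3. The sketch below illustrates that with plain compare-and-subtract on unsigned `Long` values; it is an assumption about how such a case could be handled, not a transcription of the PR's code, and the names are invented.

```scala
object HugeDivisorSketch {
  /** Unsigned a / b for b >= 2^62; the result is always in 0..3. */
  def divideUnsignedHuge(a: Long, b: Long): Int = {
    require(java.lang.Long.compareUnsigned(b, 1L << 62) >= 0,
        "sketch only covers b >= 2^62")
    var rem = a
    var quot = 0
    // Since a < 2^64 <= 4*b, this loop runs at most 3 times.
    while (java.lang.Long.compareUnsigned(rem, b) >= 0) {
      rem -= b
      quot += 1
    }
    quot
  }

  def main(args: Array[String]): Unit = {
    println(divideUnsignedHuge(-1L, 1L << 62))     // (2^64 - 1) / 2^62
    println(divideUnsignedHuge(-1L, Long.MinValue)) // (2^64 - 1) / 2^63
  }
}
```

If a could equal 2^64, the quotient for b = 2^62 would be 4 and the "at most 3" reasoning would break, which is the point of the strict inequality.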

sjrd requested a review from gzm0 (September 1, 2025 12:15)
sjrd force-pushed the opt-rt-long-divide branch 2 times, most recently from faf785b to 2faf7ae (September 1, 2025 12:23)
sjrd force-pushed the opt-rt-long-divide branch from 2faf7ae to e2a3b53 (September 7, 2025 13:26)