Optimize the worst case of RuntimeLong division and remainder. #5190

Open

sjrd wants to merge 1 commit into scala-js:main from sjrd:opt-rt-long-divide

Member

sjrd commented Jun 4, 2025 (edited)

These changes make the worst-worst case (we reach unsignedDivModHelper and b is small, so we need all 11 iterations of the loop) as fast as the best-worst case (we also reach the method but b is large and we only need 1 iteration). The cases where b <= 2^21 even become faster.

sjrd force-pushed the opt-rt-long-divide branch 2 times, most recently from 3883616 to b500b8f Compare

June 13, 2025 18:33

sjrd force-pushed the opt-rt-long-divide branch from b500b8f to 442e307 Compare

June 23, 2025 15:06

sjrd force-pushed the opt-rt-long-divide branch 3 times, most recently from 65fd9b0 to 1d9f4bd Compare

July 22, 2025 11:06

sjrd changed the title ~~WiP Optimize RuntimeLong division and remainder.~~ Optimize RuntimeLong division and remainder.

sjrd force-pushed the opt-rt-long-divide branch 2 times, most recently from 6aa0198 to 16de707 Compare

July 22, 2025 12:20

sjrd changed the title ~~Optimize RuntimeLong division and remainder.~~ Optimize the worst case of RuntimeLong division and remainder.

sjrd marked this pull request as ready for review

July 22, 2025 12:25

sjrd requested a review from gzm0

July 22, 2025 12:25

Member Author

sjrd commented Jul 22, 2025

@gzm0 This is now ready for review. The code is straightforward, but the proof is hairy.

sjrd force-pushed the opt-rt-long-divide branch 5 times, most recently from 6423179 to e6fe198 Compare

July 28, 2025 17:04

gzm0 requested changes

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

@@ -617,14 +617,14 @@ object RuntimeLong {
                    *
                    * We convert the unsigned value num = (lo, hi) to a Double value
                    * approxNum. This is an approximation. It can lose as many as
-                   * 64 - 53 = 11 low-order bits. Hence |approxNum - num| <= 2^12.
+                   * 64 - 53 = 11 low-order bits. Hence |approxNum - num| <= 2^10.

Contributor

gzm0 Aug 30, 2025

I think the correction here is correct, but the argumentation isn't anymore: If we lose 10 bits (not 11 bits), we should lose at most 2^10-1 < 2^10 precision (2^9 + 2^8 + ... + 2^0 = 2^10 - 1).

But it feels like 64 - 53 = 11 should be 64 - 54 = 10: If we interpret the long as unsigned, the sign bit of the double also counts as part of our number.

Member Author

sjrd Sep 1, 2025

The sign bit of the double will always be 0, so it does not contribute to any precision. It's not that we lose 10 bits. We do still lose 11 bits. But because we round, rather than truncate, the result can only be (2^11)/2 = 2^10 away from the original value. I elaborated the reasoning. LMK if that's clearer.
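
The rounding argument above can be checked numerically. The following is a minimal sketch, not code from this PR: `RoundingErrorSketch`, `unsignedToDouble` and `conversionError` are hypothetical names, and the conversion is assumed to be done by combining the two 32-bit words, so that the final addition is the only rounding step.

```scala
object RoundingErrorSketch {
  private val TwoPow32 = 4294967296.0 // 2^32, exactly representable

  /** Converts the unsigned 64-bit value stored in `x` to the nearest Double. */
  def unsignedToDouble(x: Long): Double = {
    val hi = (x >>> 32).toDouble        // exact: at most 32 significant bits
    val lo = (x & 0xffffffffL).toDouble // exact, for the same reason
    hi * TwoPow32 + lo                  // the addition is the only rounding step
  }

  /** Exact value of |approxNum - num|, computed with arbitrary precision. */
  def conversionError(x: Long): BigInt = {
    val exact = BigInt(java.lang.Long.toUnsignedString(x))
    // java.math.BigDecimal(double) is an exact binary-to-decimal conversion.
    val approx = BigInt(new java.math.BigDecimal(unsignedToDouble(x)).toBigIntegerExact)
    (approx - exact).abs
  }
}
```

With these definitions, `conversionError((1L << 63) + 1024)` is exactly 2^10: the value sits halfway between two adjacent doubles that are 2^11 apart, so the bound is reached but not exceeded.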

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

+                 * in a particular bit width, they correspond to the computer semantics of
+                 * unsigned integer division and remainder.
+                 *
+                 * For all a and b, 0 <= rem(a, b) < b.

Contributor

gzm0 Aug 30, 2025

Suggested change

-                 * For all a and b, 0 <= rem(a, b) < b.
+                 * For all a and b, 0 <= rem(a, b) < |b|.

Otherwise this is obviously wrong for negative b.

Alternatively (this seems to fit what comes later better):

Suggested change

-                 * For all a and b, 0 <= rem(a, b) < b.
+                 * For all non-negative a and b, 0 <= rem(a, b) < b.

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

+                 * Case 0 < b < 2²¹
+                 * ================
+                 *
+                 * In this case, b = blo, as bhi = 0.

Contributor

gzm0 Aug 30, 2025

Suggested change

-                 * In this case, b = blo, as bhi = 0.
+                 * In this case, b = blo, and bhi = 0.

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

+                 * the computation can be performed modulo 2³², using `int` operations.
+                 *
+                 * We still need to compute quotLo. By construction, k < b < 2²¹. Therefore,
+                 * 2³²∙k + alo < 2⁵³. Since both operands of the div of quotLo are < 2⁵³,

Contributor

gzm0 Aug 31, 2025

Suggested change

-                 * 2³²∙k + alo < 2⁵³. Since both operands of the div of quotLo are < 2⁵³,
+                 * 2³²∙k + alo < 2³²∙(k + 1) ≤ 2³²∙b < 2⁵³. Since both operands of the div of quotLo are < 2⁵³,

For ease of understanding?
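
To make the bound concrete, here is a minimal sketch of a two-word division for 0 < b < 2²¹. It is an illustration built from the quoted comment, not the PR's implementation: `SmallDivisorSketch` and `divideUnsignedSmall` are hypothetical names, and the high-word step (`quotHi`, `k`) is assumed to be a plain schoolbook division in base 2³².

```scala
object SmallDivisorSketch {
  private val TwoPow32 = 4294967296.0 // 2^32

  /** Unsigned a / b for 0 < b < 2^21 (hypothetical helper). */
  def divideUnsignedSmall(a: Long, b: Int): Long = {
    require(b > 0 && b < (1 << 21), "sketch assumes 0 < b < 2^21")
    val ahi = a >>> 32         // high word, in [0, 2^32)
    val alo = a & 0xffffffffL  // low word, in [0, 2^32)
    val quotHi = ahi / b
    val k = ahi % b            // k < b < 2^21
    // 2^32*k + alo < 2^32*(k + 1) <= 2^32*b < 2^53, so this sum is exact.
    val mid = k.toDouble * TwoPow32 + alo.toDouble
    // Both operands are integers below 2^53, so flooring the rounded
    // Double quotient gives the exact integer quotient, which is < 2^32.
    val quotLo = Math.floor(mid / b.toDouble).toLong
    (quotHi << 32) | quotLo
  }
}
```

The result can be compared against `java.lang.Long.divideUnsigned` for any divisor in range.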

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

+                 * We will prove that |q̂ - q| <= 1, where q = div(a, b) is the exact integer
+                 * quotient.
+                 *
+                 * Since a >= 0 and b >= 2²¹, we know that â >= 0 and b̂ >= 2²¹,

Contributor

gzm0 Aug 31, 2025

Just to check my understanding: the statement about b̂ is only correct because the boundary we compare to is a power of 2. So we know in floating point representation it is a rounding point (so we'll never round "across" it).

Member Author

sjrd Sep 1, 2025

Yes, that's right. It doesn't need to be a power of 2. Any value that is exactly representable as a double will do. I've made that more formal with a no-round-across-boundary property in the preliminaries.
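
For readers following the thread, the approximate-then-correct idea can be sketched as below. This is not the PR's code: the names are hypothetical, the divisor range 2²¹ <= b < 2⁶² is an assumption taken from the surrounding comments, and the shape of the correction step is a guess that relies only on the stated bound |q̂ - q| <= 1.

```scala
object ApproxQuotientSketch {
  private val TwoPow32 = 4294967296.0 // 2^32

  /** Nearest Double to the unsigned 64-bit value stored in `x`. */
  private def unsignedToDouble(x: Long): Double =
    (x >>> 32).toDouble * TwoPow32 + (x & 0xffffffffL).toDouble

  /** Unsigned a / b, assuming 2^21 <= b < 2^62 (so b is positive as a signed Long). */
  def divideUnsigned(a: Long, b: Long): Long = {
    require(b >= (1L << 21) && b < (1L << 62), "divisor outside the sketched range")
    // Approximate quotient computed in floating point; at most 1 away from the truth.
    val approxQuot = Math.floor(unsignedToDouble(a) / unsignedToDouble(b)).toLong
    // Mathematically a - approxQuot*b lies in (-b, 2b), which fits in a signed
    // Long, so wrapping 64-bit arithmetic yields the correct signed value.
    val rem = a - approxQuot * b
    if (rem < 0L) approxQuot - 1L      // overshot by one
    else if (rem >= b) approxQuot + 1L // undershot by one
    else approxQuot
  }
}
```

For example, with a = 2⁶⁴ - 1 and b = 2²¹ both operands round up to powers of two, the approximate quotient is 2⁴³, and the correction brings it down to 2⁴³ - 1.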

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

+                 *     = ⌊◦(â / b̂)⌋
+                 *
+                 * We write δa = â - a. From elementary properties of ◦() and the range of a,
+                 * we have that â is an integer (and so is δa) and |δa| <= 2¹⁰.

Contributor

gzm0 Aug 31, 2025

Refer to the more detailed argument in toUnsignedStringLarge?

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

+                 *
+                 * We will need the following lemma:
+                 *
+                 *   (Lemma 1) For all 0 <= x, y <= 2⁵² such that |x - y| <= 1/2,

Contributor

gzm0 Aug 31, 2025

Suggested change

-                 *   (Lemma 1) For all 0 <= x, y <= 2⁵² such that |x - y| <= 1/2,
+                 *   (Lemma 1) For all {x, y} ∈ [0, 2⁵²] such that |x - y| <= 1/2,

I interpreted this as two inequalities, one on x (0 <= x) and one on y (y <= 2⁵²).

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

Comment on lines 1262 to 1264

+                 *   Proof. Observe that, in the given range, all multiples of 1/2 are
+                 *   representable as doubles. Rewrite x = n + f with n an integer and
+                 *   0 <= f < 1. Then x - 1/2 = n + f - 1/2 <= y <= n + f + 1/2 = x + 1/2.

Contributor

gzm0 Aug 31, 2025

It's a bit hard to see which property is used where. How about something like:

Suggested change

-                 *   Proof. Observe that, in the given range, all multiples of 1/2 are
-                 *   representable as doubles. Rewrite x = n + f with n an integer and
-                 *   0 <= f < 1. Then x - 1/2 = n + f - 1/2 <= y <= n + f + 1/2 = x + 1/2.
+                 *   Proof.
+                 *   From |x - y| <= 1/2, we have x - 1/2 <= y <= x + 1/2.
+                 *   Rewrite x = n + f with n an integer and 0 <= f < 1.
+                 *   Then n + f - 1/2 <= y <= n + f + 1/2.

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

+                 *   n - 1/2 <= y < n + 1, so n - 1 <= ⌊y⌋ <= n.
+                 *   Therefore |⌊◦(x)⌋ - ⌊y⌋| <= 1, as desired.
+                 *
+                 *   Otherwise, 1/2 <= f < 1, then n + 1/2 <= ◦(x) <= n + 1 and

Contributor

gzm0 Aug 31, 2025

IIUC this is where we need "Observe that, in the given range, all multiples of 1/2 are representable as doubles."? Consider moving it closer in the argument.

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala

                 /** Helper for `unsigned_/` and `unsigned_%`.
                  *
                  *  If `askQuotient` is true, computes the quotient, otherwise computes the
                  *  remainder. Stores the hi word of the result in `hiReturn`, and returns
                  *  the lo word.
                  */
+                @inline // inlined twice; specializes for askQuotient

Contributor

gzm0 Aug 31, 2025

Are you sure the increased code size is worth it for the speed? (IIUC this is why we'd inline here).

Or do we actually get smaller code with inlining?

Member Author

sjrd Sep 1, 2025

It produces slightly larger code with inlining. Less than 1 KB. But surprisingly, the gzipped size is smaller with inlining. Go figure 🤷‍♂️

Regardless, IMO the inlining is worth it. I don't remember having benchmarked that particular difference. However the generated code is much cleaner when specialized for the concrete value of askQuotient:
https://gist.github.com/sjrd/28eb776066031218535c8b3cd2abe7e7
Most of the duplicated code comes from the multiplication.

sjrd force-pushed the opt-rt-long-divide branch from e6fe198 to 8a9f400 Compare

September 1, 2025 12:11

sjrd commented

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

+                 *
+                 * Since a >= 0 and b >= 2²¹, we know that â >= 0 and b̂ >= 2²¹,
+                 * and therefore q̂₀ >= 0. Likewise, since a < 2⁶⁴, we have â <= 2⁶⁴.
+                 * Therefore q̂₀ <= ◦(2⁶⁴ / 2²¹) = 2⁴³.

Member Author

sjrd Sep 1, 2025

I made the "rounding boundary" argument more precise in this section and elsewhere.

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

+                 *
+                 * Since 0 <= q̂₀ < 2⁶⁴, we conclude that
+                 *
+                 *   q̂ = ⌊q̂₀⌋

Member Author

sjrd Sep 1, 2025

Now developed in excruciating detail 😅

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala

Comment on lines +1133 to +1136

+                 * Rounding never goes "farther than necessary" in any direction. Formally,
+                 * for all reals x, y such that x >= y and y is exactly representable as a
+                 * `double` value, we have ◦(x) >= y. Similarly for x <= y. We refer to this
+                 * as the no-round-across-boundary property.

Member Author

sjrd Sep 1, 2025

This paragraph is new.
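
As a small illustration of the property, the sketch below checks it for integer inputs, using the JVM's `Long` to `Double` conversion as the rounding function ◦(). The object and method names are hypothetical, and the restriction to 0 <= y <= 2⁶² is only there to keep the representability check simple.

```scala
object RoundBoundarySketch {
  /** True if `y` is exactly representable as a Double (checked for 0 <= y <= 2^62). */
  def isExact(y: Long): Boolean = {
    require(y >= 0L && y <= (1L << 62), "sketch only handles 0 <= y <= 2^62")
    y.toDouble.toLong == y
  }

  /** Checks that rounding x does not cross the exactly representable boundary y. */
  def holdsFor(x: Long, y: Long): Boolean = {
    require(isExact(y), "the boundary must be exactly representable")
    val rounded = x.toDouble // round-to-nearest conversion
    if (x >= y) rounded >= y.toDouble else rounded <= y.toDouble
  }
}
```

For instance, 2⁶⁰ + 256 is representable (doubles in [2⁶⁰, 2⁶¹) are 256 apart), so `holdsFor((1L << 60) + 257, (1L << 60) + 256)` and `holdsFor((1L << 60) + 255, (1L << 60) + 256)` both hold: each input rounds onto the boundary, never past it.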

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala Outdated

Comment on lines 1254 to 1282

+                 * Since 0 <= a < 2⁶⁴ and 2²¹ <= b < 2⁶⁴, and since 0, 2²¹ and 2⁶⁴ are all
+                 * exactly representable as doubles, the no-round-across-boundary property
+                 * tells us that 0 <= â <= 2⁶⁴ and b̂ >= 2²¹.
+                 * Therefore, â / b̂ <= 2⁶⁴ / 2²¹ = 2⁴³. Since 2⁴³ is also exactly
+                 * representable, we have q̂₀ = ◦(â / b̂) <= 2⁴³.
+                 *
+                 * If a = 0, then q̂₀ = 0 and q̂₀ / 2³² = 0, hence ◦(q̂₀ / 2³²) is exact.
+                 * Otherwise, a >= 1, hence â >= 1, and q̂₀ >= 1 / 2⁶⁴, which means q̂₀ is a
+                 * *normal* `double` value. q̂₀ / 2³² >= 1 / 2⁹⁶ is also a normal `double`
+                 * value. Therefore, ◦(q̂₀ / 2³²) cannot underflow, and it is exact because it
+                 * divides by a power of 2.
+                 * Hence in all cases, we have ◦(q̂₀ / 2³²) = q̂₀ / 2³².
+                 *
+                 * We will use Theorem D3 of Hacker's Delight (section 9-1):
+                 * > For x real, d an integer ≠ 0: ⌊⌊x⌋ / d⌋ = ⌊x / d⌋.
+                 *
+                 * We can now develop q̂ as
+                 *
+                 *   q̂ = 2³²∙wrap32(◦(q̂₀ / 2³²))                     + wrap32(q̂₀)
+                 *     = 2³²∙rem(⌊◦(q̂₀ / 2³²)⌋, 2³²)                 + rem(⌊q̂₀⌋, 2³²)              def of wrap32
+                 *     = 2³²∙rem(⌊  q̂₀ / 2³² ⌋, 2³²)                 + rem(⌊q̂₀⌋, 2³²)              because ◦(q̂₀ / 2³²) = q̂₀ / 2³²
+                 *     = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙div(⌊q̂₀ / 2³²⌋, 2³²)) + (⌊q̂₀⌋ - 2³²∙div(⌊q̂₀⌋, 2³²))  def of rem
+                 *     = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙⌊⌊q̂₀ / 2³²⌋ / 2³²⌋  ) + (⌊q̂₀⌋ - 2³²∙⌊⌊q̂₀⌋ / 2³²⌋  )  def of div
+                 *     = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙⌊ q̂₀ / 2³²  / 2³²⌋  ) + (⌊q̂₀⌋ - 2³²∙⌊ q̂₀  / 2³²⌋  )  Theorem D3
+                 *     = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙⌊ q̂₀ / 2⁶⁴       ⌋  ) + (⌊q̂₀⌋ - 2³²∙⌊ q̂₀  / 2³²⌋  )
+                 *     = 2³²∙(⌊q̂₀ / 2³²⌋ - 2³²∙0                   ) + (⌊q̂₀⌋ - 2³²∙⌊ q̂₀  / 2³²⌋  )  because q̂₀ <= 2⁴³ < 2⁶⁴
+                 *     = 2³²∙⌊q̂₀ / 2³²⌋ + ⌊q̂₀⌋ - 2³²∙⌊q̂₀ / 2³²⌋
+                 *     = ⌊q̂₀⌋
+                 *     = ⌊◦(â / b̂)⌋

Member Author

sjrd Sep 1, 2025

This whole section is basically new.
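
The conclusion of the derivation, q̂ = ⌊q̂₀⌋, can be spot-checked with a small sketch. It is an illustration only: `Wrap32Sketch`, `wrap32` and `fromWords` are hypothetical names, and `wrap32` is modelled directly as rem(⌊x⌋, 2³²) on doubles, as in the "def of wrap32" step, rather than as the runtime's actual int conversion.

```scala
object Wrap32Sketch {
  private val TwoPow32 = 4294967296.0    // 2^32
  private val TwoPow43 = 8796093022208.0 // 2^43

  /** rem(floor(x), 2^32) for a finite x >= 0; exact for x <= 2^43. */
  def wrap32(x: Double): Long = {
    val f = Math.floor(x)
    (f - TwoPow32 * Math.floor(f / TwoPow32)).toLong
  }

  /** Rebuilds the quotient from its two words: 2^32 * wrap32(q0 / 2^32) + wrap32(q0). */
  def fromWords(q0: Double): Long = {
    require(q0 >= 0.0 && q0 <= TwoPow43, "sketch assumes 0 <= q0 <= 2^43")
    // Each word is below 2^32, so shifting and OR-ing is the same as 2^32*hi + lo.
    (wrap32(q0 / TwoPow32) << 32) | wrap32(q0)
  }
}
```

For any q̂₀ in the stated range, `fromWords(q0)` should agree with `Math.floor(q0).toLong`.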

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala

Comment on lines +1291 to +1311

+                 *   (Lemma 1) For all reals x and y in [0, 2⁵²) such that |x - y| <= 1/2,
+                 *   it holds that |⌊◦(x)⌋ - ⌊y⌋| <= 1.
+                 *
+                 *   Proof.
+                 *   From |x - y| <= 1/2, we have x - 1/2 <= y <= x + 1/2.
+                 *   Rewrite x = n + f with n an integer and 0 <= f < 1.
+                 *   Then n + f - 1/2 <= y <= n + f + 1/2.
+                 *
+                 *   Observe that, in the range [0, 2⁵²], all multiples of 1/2 are exactly
+                 *   representable as doubles. n, n + 1/2 and n + 1 all belong to that range
+                 *   and are multiples of 1/2, so they are exactly representable.
+                 *
+                 *   If 0 <= f < 1/2, then n <= ◦(x) <= n + 1/2 (no-round-across-boundary)
+                 *   and ⌊◦(x)⌋ = n.
+                 *   n - 1/2 <= y < n + 1, so n - 1 <= ⌊y⌋ <= n.
+                 *   Therefore |⌊◦(x)⌋ - ⌊y⌋| <= 1, as desired.
+                 *
+                 *   Otherwise, 1/2 <= f < 1, then n + 1/2 <= ◦(x) <= n + 1 (no-round-across-boundary)
+                 *   and n <= ⌊◦(x)⌋ <= n + 1.
+                 *   n <= y < n + 3/2, so n <= ⌊y⌋ <= n + 1.
+                 *   Therefore |⌊◦(x)⌋ - ⌊y⌋| <= 1 as well, as desired.

Member Author

sjrd Sep 1, 2025

The lemma proof was significantly rearranged.

linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala

+                   * Such huge divisors are practically useless, but they defeat the
+                   * correction code of the algorithm above.
+                   *
+                   * Since b >= 2^62 and a < 2^64, we know that a < 4*b (mathematically).

Member Author

sjrd Sep 1, 2025

Fixed a <= 2^64 into a < 2^64. (If it could be == 2^64, the conclusion would be wrong.)
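
Because a < 4*b in this case, the quotient is one of 0, 1, 2 or 3. One way to exploit that, shown here purely as a sketch and not necessarily how the PR handles it, is to subtract b at most three times. `HugeDivisorSketch` and `divRemUnsignedHuge` are hypothetical names.

```scala
object HugeDivisorSketch {
  /** Unsigned (a / b, a % b), assuming b >= 2^62 when read as an unsigned value. */
  def divRemUnsignedHuge(a: Long, b: Long): (Long, Long) = {
    require(java.lang.Long.compareUnsigned(b, 1L << 62) >= 0, "sketch assumes b >= 2^62")
    var quot = 0L
    var rem = a
    // a < 2^64 <= 4*b, so this loop runs at most 3 times.
    while (java.lang.Long.compareUnsigned(rem, b) >= 0) {
      rem -= b
      quot += 1L
    }
    (quot, rem)
  }
}
```

If a could equal 2⁶⁴, the quotient could reach 4 for b = 2⁶², which is why the strict inequality matters.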

sjrd requested a review from gzm0

September 1, 2025 12:15

sjrd force-pushed the opt-rt-long-divide branch 2 times, most recently from faf785b to 2faf7ae Compare

September 1, 2025 12:23


Optimize the worst case of RuntimeLong division and remainder.

e2a3b53

sjrd force-pushed the opt-rt-long-divide branch from 2faf7ae to e2a3b53 Compare

September 7, 2025 13:26
