[libc][math][c23] Add rsqrtf16() function #137545

amemov · 2025-04-27T19:57:10Z

github-actions · 2025-04-27T19:59:34Z

✅ With the latest revision this PR passed the C/C++ code formatter.

amemov · 2025-04-30T20:30:32Z

I am trying to work out the best way to compute the result.
So far the polynomial below produces the fewest errors (higher-degree polynomials give only negligible improvement):
P = fpminimax(1/sqrt(x), [|0,1,2,3,4,5|], [|SG...|], [0.5, 1]);
Its error is 1.0 ULP.
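To make the polynomial step concrete, here is a minimal sketch of evaluating a degree-5 approximation of 1/sqrt(m) on [0.5, 1] with Horner's scheme. The coefficients are copied from the later revision of the patch (libc/src/__support/math/rsqrtf16.h), not from the Sollya command above. The function name `rsqrt_poly` is hypothetical, and the plain loop stands in for `fputil::polyeval`. Hex float literals require C++17.

```cpp
#include <cassert>

// Hypothetical helper: evaluates the degree-5 polynomial from the patch at m,
// where m is the mantissa returned by frexp, i.e. 0.5 <= m <= 1.
inline float rsqrt_poly(float m) {
  assert(m >= 0.5f && m <= 1.0f);
  // Coefficients ordered from the constant term up to the degree-5 term,
  // in the same order the patch passes them to fputil::polyeval.
  constexpr float C[6] = {0x1.9c81fap1f,  -0x1.e2c63ap2f, 0x1.91e9b8p3f,
                          -0x1.899abep3f, 0x1.9eddeap2f,  -0x1.6bdb48p0f};
  // Horner's scheme: ((((C5*m + C4)*m + C3)*m + C2)*m + C1)*m + C0.
  float acc = C[5];
  for (int i = 4; i >= 0; --i)
    acc = acc * m + C[i];
  return acc;
}
```

On its own the polynomial is only a starting estimate. The patch refines it with a Newton-Raphson step before rounding to float16.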

I also found this existing implementation:

llvm-project/libc/src/__support/fixed_point/sqrt.h

Line 39 in ae6b4b2

// P = fpminimax(sqrt(x), 1, [|8, 8|], [i * 2^-4, (i + 1)*2^-4],

It also has some interesting ideas that I came across during my research, specifically Newton's method.

Update: I tried adding 2 iterations of Newton's method. Each iteration significantly reduced the number of errors, but some remain.
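For reference, a minimal sketch of the Newton iteration for the reciprocal square root, written in double precision for clarity. It applies Newton's method to f(y) = 1/y^2 - x, which gives the update y <- y * (1.5 - 0.5 * x * y^2), the same form as the `factor` computation in the patch. The function names and the starting guess used in the test are illustrative assumptions.

```cpp
#include <cassert>

// One Newton-Raphson step for f(y) = 1/y^2 - x, whose positive root is
// 1/sqrt(x). No division is needed.
inline double newton_rsqrt_step(double x, double y) {
  return y * (1.5 - 0.5 * x * y * y);
}

// Hypothetical driver: refine an initial guess y0 a fixed number of times.
inline double newton_rsqrt(double x, double y0, int iterations) {
  assert(x > 0.0 && y0 > 0.0);
  double y = y0;
  for (int i = 0; i < iterations; ++i)
    y = newton_rsqrt_step(x, y);
  return y;
}
```

If the current estimate has relative error e, one step leaves a relative error of roughly 1.5 * e^2, so each iteration about doubles the number of correct bits. That is consistent with every added iteration removing many of the failing cases.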

llvmbot · 2025-09-13T01:08:34Z

@llvm/pr-subscribers-libc

Author: Anton Shepelev (amemov)

Changes

Addresses #132818
Part of #95250

Patch is 22.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137545.diff

22 Files Affected:

(modified) libc/config/linux/x86_64/entrypoints.txt (+1)
(modified) libc/docs/headers/math/index.rst (+3-1)
(modified) libc/include/math.yaml (+7)
(modified) libc/shared/math.h (+2)
(added) libc/shared/math/rsqrtf16.h (+29)
(modified) libc/src/__support/math/CMakeLists.txt (+16)
(added) libc/src/__support/math/rsqrtf16.h (+139)
(modified) libc/src/math/CMakeLists.txt (+2)
(modified) libc/src/math/generic/CMakeLists.txt (+12-1)
(added) libc/src/math/generic/rsqrtf16.cpp (+15)
(added) libc/src/math/rsqrtf16.h (+21)
(modified) libc/test/shared/CMakeLists.txt (+2)
(modified) libc/test/shared/shared_math_test.cpp (+2)
(modified) libc/test/src/math/CMakeLists.txt (+11)
(added) libc/test/src/math/rsqrtf16_test.cpp (+42)
(modified) libc/test/src/math/smoke/CMakeLists.txt (+11)
(added) libc/test/src/math/smoke/rsqrtf16_test.cpp (+37)
(modified) libc/utils/MPFRWrapper/MPCommon.cpp (+6)
(modified) libc/utils/MPFRWrapper/MPCommon.h (+1)
(modified) libc/utils/MPFRWrapper/MPFRUtils.cpp (+2)
(modified) libc/utils/MPFRWrapper/MPFRUtils.h (+1)
(modified) utils/bazel/llvm-project-overlay/libc/BUILD.bazel (+25)

diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 1fef16f190af6..0bb8a683c5b01 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -784,6 +784,7 @@ if(LIBC_TYPES_HAS_FLOAT16)
     libc.src.math.rintf16
     libc.src.math.roundevenf16
     libc.src.math.roundf16
+    libc.src.math.rsqrtf16
     libc.src.math.scalblnf16
     libc.src.math.scalbnf16
     libc.src.math.setpayloadf16
diff --git a/libc/docs/headers/math/index.rst b/libc/docs/headers/math/index.rst
index 6c0e2190808df..7d5b341ba674a 100644
--- a/libc/docs/headers/math/index.rst
+++ b/libc/docs/headers/math/index.rst
@@ -255,6 +255,7 @@ Basic Operations
 Higher Math Functions
 =====================
 
+
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+----------++------------+------------------------+----------------------------+
 | <Func>    | <Func_f> (float) | <Func> (double) | <Func_l> (long double) | <Func_f16> (float16) | <Func_f128> (float128) | <Func_bf16> (bfloat16) | C23 Definition Section | C23 Error Handling Section |
 +===========+==================+=================+========================+======================+========================+========================+========================+============================+
@@ -342,7 +343,7 @@ Higher Math Functions
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
 | rootn     |                  |                 |                        |                      |                        |                        | 7.12.7.8               | F.10.4.8                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
-| rsqrt     |                  |                 |                        |                      |                        |                        | 7.12.7.9               | F.10.4.9                   |
+| rsqrt     |                  |                 |                        | |check|              |                        |                        | 7.12.7.9               | F.10.4.9                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
 | sin       | |check|          | |check|         |                        | |check|              |                        |                        | 7.12.4.6               | F.10.1.6                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
@@ -363,6 +364,7 @@ Higher Math Functions
 | tgamma    |                  |                 |                        |                      |                        |                        | 7.12.8.4               | F.10.5.4                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
 
+
 Legends:
 
 * |check| : correctly rounded for all 4 rounding modes.
diff --git a/libc/include/math.yaml b/libc/include/math.yaml
index 17f26fcfcb308..6c800a0e2aa28 100644
--- a/libc/include/math.yaml
+++ b/libc/include/math.yaml
@@ -2349,6 +2349,13 @@ functions:
     return_type: long double
     arguments:
       - type: long double
+  - name: rsqrtf16
+    standards:
+      - stdc
+    return_type: _Float16
+    arguments:
+      - type: _Float16
+    guard: LIBC_TYPES_HAS_FLOAT16
   - name: scalbln
     standards:
       - stdc
diff --git a/libc/shared/math.h b/libc/shared/math.h
index 69d785b3e0291..4f20095912bf1 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -53,4 +53,6 @@
 #include "math/ldexpf128.h"
 #include "math/ldexpf16.h"
 
+#include "math/rsqrtf16.h"
+
 #endif // LLVM_LIBC_SHARED_MATH_H
diff --git a/libc/shared/math/rsqrtf16.h b/libc/shared/math/rsqrtf16.h
new file mode 100644
index 0000000000000..54c7499214636
--- /dev/null
+++ b/libc/shared/math/rsqrtf16.h
@@ -0,0 +1,29 @@
+//===-- Shared rsqrtf16 function -------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SHARED_MATH_RSQRTF16_H
+#define LLVM_LIBC_SHARED_MATH_RSQRTF16_H
+
+#include "include/llvm-libc-macros/float16-macros.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "shared/libc_common.h"
+#include "src/__support/math/rsqrtf16.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::rsqrtf16;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LIBC_TYPES_HAS_FLOAT16
+
+#endif // LLVM_LIBC_SHARED_MATH_RSQRTF16_H
diff --git a/libc/src/__support/math/CMakeLists.txt b/libc/src/__support/math/CMakeLists.txt
index 39dc0e57f4472..ed5f314b0a9b5 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -109,6 +109,22 @@ add_header_library(
     libc.src.__support.macros.properties.types
 )
 
+
+add_header_library(
+  rsqrtf16
+  HDRS
+    rsqrtf16.h
+  DEPENDS
+    libc.src.__support.FPUtil.cast
+    libc.src.__support.FPUtil.fenv_impl
+    libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.FPUtil.multiply_add
+    libc.src.__support.FPUtil.polyeval
+    libc.src.__support.FPUtil.manipulation_functions
+    libc.src.__support.macros.optimization
+    libc.src.__support.macros.properties.types
+)
+
 add_header_library(
   asin_utils
   HDRS
diff --git a/libc/src/__support/math/rsqrtf16.h b/libc/src/__support/math/rsqrtf16.h
new file mode 100644
index 0000000000000..b410f258450d8
--- /dev/null
+++ b/libc/src/__support/math/rsqrtf16.h
@@ -0,0 +1,139 @@
+//===-- Implementation header for rsqrtf16 ----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC___SUPPORT_MATH_RSQRTF16_H
+#define LLVM_LIBC_SRC___SUPPORT_MATH_RSQRTF16_H
+
+#include "include/llvm-libc-macros/float16-macros.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/ManipulationFunctions.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/cast.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/macros/optimization.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace math {
+
+static constexpr float16 rsqrtf16(float16 x) {
+  using FPBits = fputil::FPBits<float16>;
+  FPBits xbits(x);
+
+  uint16_t x_u = xbits.uintval();
+  uint16_t x_abs = x_u & 0x7fff;
+  uint16_t x_sign = x_u >> 15;
+
+  // x is NaN
+  if (LIBC_UNLIKELY(xbits.is_nan())) {
+    if (xbits.is_signaling_nan()) {
+      fputil::raise_except_if_required(FE_INVALID);
+      return FPBits::quiet_nan().get_val();
+    }
+    return x;
+  }
+
+  // |x| = 0
+  if (LIBC_UNLIKELY(x_abs == 0x0)) {
+    fputil::raise_except_if_required(FE_DIVBYZERO);
+    fputil::set_errno_if_required(ERANGE);
+    return FPBits::inf(Sign::POS).get_val();
+  }
+
+  // -inf <= x < 0
+  if (LIBC_UNLIKELY(x_sign == 1)) {
+    fputil::raise_except_if_required(FE_INVALID);
+    fputil::set_errno_if_required(EDOM);
+    return FPBits::quiet_nan().get_val();
+  }
+
+  // x = +inf => rsqrt(x) = 0
+  if (LIBC_UNLIKELY(xbits.is_inf())) {
+    return fputil::cast<float16>(0.0f);
+  }
+
+  // x is valid, estimate the result
+  // Range reduction:
+  // x can be expressed as m*2^e, where e - int exponent and m - mantissa
+  // rsqrtf16(x) = rsqrtf16(m*2^e)
+  // rsqrtf16(m*2^e) = 1/sqrt(m) * 1/sqrt(2^e) = 1/sqrt(m) * 1/2^(e/2)
+  // 1/sqrt(m) * 1/2^(e/2) = 1/sqrt(m) * 2^(-e/2)
+
+  // Compute in float throughout to minimize cost while preserving accuracy.
+  float xf = x;
+  int exponent = 0;
+  float mantissa = fputil::frexp(xf, exponent);
+
+  float result = 0.0f;
+  int exp_floored = -(exponent >> 1);
+
+  if (mantissa == 0.5f) {
+    // When mantissa is 0.5f, x was a power of 2 (or subnormal that normalizes
+    // this way). 1/sqrt(0.5f) = sqrt(2.0f).
+    // If exponent is odd (exponent = 2k + 1):
+    //   rsqrt(x) = (1/sqrt(0.5)) * 2^(-(2k+1)/2) = sqrt(2) * 2^(-k-0.5)
+    //            = sqrt(2) * 2^(-k) * (1/sqrt(2)) = 2^(-k)
+    //   exp_floored = -((2k+1)>>1) = -(k) = -k
+    //   So result = ldexp(1.0f, exp_floored)
+    // If exponent is even (exponent = 2k):
+    //   rsqrt(x) = (1/sqrt(0.5)) * 2^(-2k/2) = sqrt(2) * 2^(-k)
+    //   exp_floored = -((2k)>>1) = -(k) = -k
+    //   So result = ldexp(sqrt(2.0f), exp_floored)
+    if (exponent & 1) {
+      result = fputil::ldexp(1.0f, exp_floored);
+    } else {
+      constexpr float SQRT_2_F = 0x1.6a09e6p0f; // sqrt(2.0f)
+      result = fputil::ldexp(SQRT_2_F, exp_floored);
+    }
+  } else {
+    // Degree-5 polynomial (float coefficients) generated with Sollya:
+    // P = fpminimax(1/sqrt(x) + 2^-28, 5, [|single...|], [0.5,1])
+    float y =
+        fputil::polyeval(mantissa, 0x1.9c81fap1f, -0x1.e2c63ap2f, 0x1.91e9b8p3f,
+                         -0x1.899abep3f, 0x1.9eddeap2f, -0x1.6bdb48p0f);
+
+    // Newton-Raphson iteration in float (use multiply_add to leverage FMA when
+    // available):
+    float y2 = y * y;
+    float factor = fputil::multiply_add(-0.5f * mantissa, y2, 1.5f);
+    y = y * factor;
+
+    result = fputil::ldexp(y, exp_floored);
+    if (exponent & 1) {
+      constexpr float ONE_OVER_SQRT2 = 0x1.6a09e6p-1f; // 1/sqrt(2)
+      result *= ONE_OVER_SQRT2;
+    }
+
+    // Targeted post-correction: for the specific half-precision mantissa
+    // pattern M == 0x011F we observe a consistent -1 ULP bias across exponents.
+    // Apply a tiny upward nudge to cross the rounding boundary in all modes.
+    const uint16_t half_mantissa = static_cast<uint16_t>(x_abs & 0x3ff);
+    if (half_mantissa == 0x011F) {
+      // Nudge up to fix consistent -1 ULP at that mantissa boundary
+      result = fputil::multiply_add(result, 0x1.0p-21f,
+                                    result); // result *= (1 + 2^-21)
+    } else if (half_mantissa == 0x0313) {
+      // Nudge down to fix +1 ULP under upward rounding at this mantissa
+      // boundary
+      result = fputil::multiply_add(result, -0x1.0p-21f,
+                                    result); // result *= (1 - 2^-21)
+    }
+  }
+
+  return fputil::cast<float16>(result);
+}
+
+} // namespace math
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LIBC_TYPES_HAS_FLOAT16
+
+#endif // LLVM_LIBC_SRC___SUPPORT_MATH_RSQRTF16_H
diff --git a/libc/src/math/CMakeLists.txt b/libc/src/math/CMakeLists.txt
index e418a8b0e24b9..a6f400c873b7e 100644
--- a/libc/src/math/CMakeLists.txt
+++ b/libc/src/math/CMakeLists.txt
@@ -516,6 +516,8 @@ add_math_entrypoint_object(roundevenf16)
 add_math_entrypoint_object(roundevenf128)
 add_math_entrypoint_object(roundevenbf16)
 
+add_math_entrypoint_object(rsqrtf16)
+
 add_math_entrypoint_object(scalbln)
 add_math_entrypoint_object(scalblnf)
 add_math_entrypoint_object(scalblnl)
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index 263c5dfd0832b..ca7baeccae01a 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -973,7 +973,7 @@ add_entrypoint_object(
 )
 
 add_entrypoint_object(
-    roundevenbf16
+  roundevenbf16
   SRCS
     roundevenbf16.cpp
   HDRS
@@ -988,6 +988,17 @@ add_entrypoint_object(
     ROUND_OPT
 )
 
+add_entrypoint_object(
+  rsqrtf16
+  SRCS
+    rsqrtf16.cpp
+  HDRS
+    ../rsqrtf16.h
+  DEPENDS
+    libc.src.__support.math.rsqrtf16
+    libc.src.errno.errno
+)
+
 add_entrypoint_object(
   lround
   SRCS
diff --git a/libc/src/math/generic/rsqrtf16.cpp b/libc/src/math/generic/rsqrtf16.cpp
new file mode 100644
index 0000000000000..fb166b131d673
--- /dev/null
+++ b/libc/src/math/generic/rsqrtf16.cpp
@@ -0,0 +1,15 @@
+//===-- Half-precision rsqrt function -------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception.
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/rsqrtf16.h"
+#include "src/__support/math/rsqrtf16.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+LLVM_LIBC_FUNCTION(float16, rsqrtf16, (float16 x)) { return math::rsqrtf16(x); }
+} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/math/rsqrtf16.h b/libc/src/math/rsqrtf16.h
new file mode 100644
index 0000000000000..c88ab5256ce88
--- /dev/null
+++ b/libc/src/math/rsqrtf16.h
@@ -0,0 +1,21 @@
+//===-- Implementation header for rsqrtf16 ----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_RSQRTF16_H
+#define LLVM_LIBC_SRC_MATH_RSQRTF16_H
+
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/properties/types.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+float16 rsqrtf16(float16 x);
+
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SRC_MATH_RSQRTF16_H
diff --git a/libc/test/shared/CMakeLists.txt b/libc/test/shared/CMakeLists.txt
index 48241d3f55287..495d6f0a81a4c 100644
--- a/libc/test/shared/CMakeLists.txt
+++ b/libc/test/shared/CMakeLists.txt
@@ -48,4 +48,6 @@ add_fp_unittest(
     libc.src.__support.math.ldexpf
     libc.src.__support.math.ldexpf128
     libc.src.__support.math.ldexpf16
+    libc.src.__support.math.rsqrtf16
+
 )
diff --git a/libc/test/shared/shared_math_test.cpp b/libc/test/shared/shared_math_test.cpp
index 2e5a2d51146d4..aa459f88c29f5 100644
--- a/libc/test/shared/shared_math_test.cpp
+++ b/libc/test/shared/shared_math_test.cpp
@@ -17,6 +17,8 @@ TEST(LlvmLibcSharedMathTest, AllFloat16) {
 
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::acoshf16(1.0f16));
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::acospif16(1.0f16));
+  EXPECT_FP_EQ(0x1p+0f16, LIBC_NAMESPACE::shared::rsqrtf16(1.0f16));
+
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::asinf16(0.0f16));
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::asinhf16(0.0f16));
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::atanf16(0.0f16));
diff --git a/libc/test/src/math/CMakeLists.txt b/libc/test/src/math/CMakeLists.txt
index 378eadcf9e70b..9d644703a61ae 100644
--- a/libc/test/src/math/CMakeLists.txt
+++ b/libc/test/src/math/CMakeLists.txt
@@ -1678,6 +1678,17 @@ add_fp_unittest(
     libc.src.math.sqrtl
 )
 
+add_fp_unittest(
+  rsqrtf16_test
+  NEED_MPFR
+  SUITE
+    libc-math-unittests
+  SRCS
+    rsqrtf16_test.cpp
+  DEPENDS
+    libc.src.math.rsqrtf16
+)
+
 add_fp_unittest(
   sqrtf16_test
   NEED_MPFR
diff --git a/libc/test/src/math/rsqrtf16_test.cpp b/libc/test/src/math/rsqrtf16_test.cpp
new file mode 100644
index 0000000000000..d2f3fe8f49b92
--- /dev/null
+++ b/libc/test/src/math/rsqrtf16_test.cpp
@@ -0,0 +1,42 @@
+//===-- Exhaustive test for rsqrtf16 --------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/rsqrtf16.h"
+#include "test/UnitTest/FPMatcher.h"
+#include "test/UnitTest/Test.h"
+#include "utils/MPFRWrapper/MPFRUtils.h"
+
+using LlvmLibcRsqrtf16Test = LIBC_NAMESPACE::testing::FPTest<float16>;
+
+namespace mpfr = LIBC_NAMESPACE::testing::mpfr;
+
+// Range: [0, Inf]
+static constexpr uint16_t POS_START = 0x0000U;
+static constexpr uint16_t POS_STOP = 0x7c00U;
+
+// Range: [-Inf, 0]
+static constexpr uint16_t NEG_START = 0x8000U;
+static constexpr uint16_t NEG_STOP = 0xfc00U;
+
+TEST_F(LlvmLibcRsqrtf16Test, PositiveRange) {
+  for (uint16_t v = POS_START; v <= POS_STOP; ++v) {
+    float16 x = FPBits(v).get_val();
+
+    EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Rsqrt, x,
+                                   LIBC_NAMESPACE::rsqrtf16(x), 0.5);
+  }
+}
+
+TEST_F(LlvmLibcRsqrtf16Test, NegativeRange) {
+  for (uint16_t v = NEG_START; v <= NEG_STOP; ++v) {
+    float16 x = FPBits(v).get_val();
+
+    EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Rsqrt, x,
+                                   LIBC_NAMESPACE::rsqrtf16(x), 0.5);
+  }
+}
diff --git a/libc/test/src/math/smoke/CMakeLists.txt b/libc/test/src/math/smoke/CMakeLists.txt
index b8d5ecf4d77e5..93243e0ca9e5a 100644
--- a/libc/test/src/math/smoke/CMakeLists.txt
+++ b/libc/test/src/math/smoke/CMakeLists.txt
@@ -3502,6 +3502,17 @@ add_fp_unittest(
     libc.src.math.sqrtl
 )
 
+add_fp_unittest(
+  rsqrtf16_test
+  SUITE
+    libc-math-smoke-tests
+  SRCS
+    rsqrtf16_test.cpp
+  DEPENDS
+    libc.src.errno.errno
+    libc.src.math.rsqrtf16
+)
+
 add_fp_unittest(
   sqrtf16_test
   SUITE
diff --git a/libc/test/src/math/smoke/rsqrtf16_test.cpp b/libc/test/src/math/smoke/rsqrtf16_test.cpp
new file mode 100644
index 0000000000000..a229ca6cdaaaf
--- /dev/null
+++ b/libc/test/src/math/smoke/rsqrtf16_test.cpp
@@ -0,0 +1,37 @@
+//===-- Unittests for rsqrtf16 --------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception.
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/__support/libc_errno.h"
+#include "src/math/rsqrtf16.h"
+#include "test/UnitTest/FPMatcher.h"
+#include "test/UnitTest/Test.h"
+
+using LlvmLibcRsqrtf16Test = LIBC_NAMESPACE::testing::FPTest<float16>;
+TEST_F(LlvmLibcRsqrtf16Test, SpecialNumbers) {
+  LIBC_NAMESPACE::libc_errno = 0;
+  EXPECT_FP_EQ(aNaN, LIBC_NAMESPACE::rsqrtf16(aNaN));
+  EXPECT_MATH_ERRNO(0);
+
+  EXPECT_FP_EQ_WITH_EXCEPTION(aNaN, LIBC_NAMESPACE::rsqrtf16(sNaN), FE_INVALID);
+  EXPECT_MATH_ERRNO(0);
+
+  EXPECT_FP_EQ(inf, LIBC_NAMESPACE::rsqrtf16(0.0f));
+  EXPECT_MATH_ERRNO(ERANGE);
+
+  EXPECT_FP_EQ(1.0f, LIBC_NAMESPACE::rsqrtf16(1.0f));
+  EXPECT_MATH_ERRNO(0);
+
+  EXPECT_FP_EQ(0.0f, LIBC_NAMESPACE::rsqrtf16(inf));
+  EXPECT_MATH_ERRNO(0);
+
+  EXPECT_FP_EQ(aNaN, LIBC_NAMESPACE::rsqrtf16(neg_inf));
+  EXPECT_MATH_ERRNO(EDOM);
+
+  EXPECT_FP_EQ(aNaN, LIBC_NAMESPACE::rsqrtf16(-2.0f));
+  EXPECT_MATH_ERRNO(EDOM);
+}
diff --git a/libc/utils/MPFRWrapper/MPCommon.cpp b/libc/utils/MPFRWrapper/MPCommon.cpp
index c255220774110..6b78bee6e7cae 100644
--- a/libc/utils/MPFRWrapper/MPCommon.cpp
+++ b/libc/utils/MPFRWrapper/MPCommon.cpp
@@ -393,6 +393,12 @@ MPFRNumber MPFRNumber::rint(mpfr_rnd_t rnd) const {
   return result;
 }
 
+MPFRNumber MPFRNumber::rsqrt() const {
+  MPFRNumber result(*this);
+  mpfr_rec_sqrt(result.value, value, mpfr_rounding);
+  return result;
+}
+
 MPFRNumber MPFRNumber::mod_2pi() const {
   MPFRNumber result(0.0, 1280);
   MPFRNumber _2pi(0.0, 1280);
diff --git a/libc/utils/MPFRWrapper/MPCommon.h b/libc/utils/MPFRWrapper/MPCommon.h
index 25bdc9bc00250..9f4107a7961d2 100644
--- a/libc/utils/MPFRWrapper/MPCommon.h
+++ b/libc/utils/MPFRWrapper/MPCommon.h
@@ -222,6 +222,7 @@ class MPFRNumber {
   bool round_to_long(long &result) const;
   bool round_to_long(mpfr_rnd_t rnd, long &result) const;
   MPFRNumber rint(mpfr_rnd_t rnd) const;
+  MPFRNu...
[truncated]

amemov · 2025-09-13T01:09:01Z

@overmighty @lntue

- The accuracy improved drastically, but it still fails
- Refactored the implementation to match the proposal for constexpr
- Added rsqrtf16 in Bazel build

libc/src/__support/math/rsqrtf16.h

lntue · 2025-09-13T01:38:09Z

Can you compare the performance of this with

  fputil::cast<float16>(1.0f / fputil::sqrt(fputil::cast<float>(x)));

amemov · 2025-09-13T16:10:07Z

I wrote this test to check the performance of the implementation and ran the tests for rsqrtf16 a few times:

TEST_F(LlvmLibcRsqrtf16Test, PositiveRange_OneOverSqrtFputil) {
  for (uint16_t v = POS_START; v <= POS_STOP; ++v) {
    float16 x = FPBits(v).get_val();

    float16 y = LIBC_NAMESPACE::fputil::cast<float16, float>(
        1.0f / LIBC_NAMESPACE::fputil::sqrt<float, float>(
                   LIBC_NAMESPACE::fputil::cast<float, float16>(x)));

    EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Rsqrt, x, y, 1.0);
  }
}

It turns out that my implementation is ~3x slower than calling 1.0f / fputil::sqrt directly.
I am not sure why: either there are too many branches, or the approximation is over-complicated. I understand it probably won't be as fast as a built-in CPU instruction, but still. The version you see is the most minimal I have been able to derive so far: I started with a degree-7 polynomial and 2 iterations of Newton's method and reduced it to degree 5 and 1 iteration. What do you think?
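Timing the two variants through the MPFR-backed unit test presumably also measures the cost of the reference computation, so a small dedicated harness may give a cleaner comparison. The following is a sketch only, in standard C++ with `float` inputs. The names `make_inputs`, `rsqrt_div` and `average_ns_per_call` are hypothetical and not part of LLVM libc, and the input set is an arbitrary workload rather than the exhaustive float16 sweep.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Baseline: the straightforward sqrt + div in single precision.
inline float rsqrt_div(float x) { return 1.0f / std::sqrt(x); }

// Hypothetical workload: n evenly spaced inputs in [0.5, 1.5). Requires n > 0.
inline std::vector<float> make_inputs(std::size_t n) {
  std::vector<float> xs;
  xs.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    xs.push_back(0.5f + static_cast<float>(i) / static_cast<float>(n));
  return xs;
}

// Calls f on every input `repeats` times and returns the mean time per call
// in nanoseconds. The accumulated sum is written to `sink` so the compiler
// cannot discard the calls.
template <typename F>
double average_ns_per_call(F f, const std::vector<float> &xs, int repeats,
                           float &sink) {
  const auto start = std::chrono::steady_clock::now();
  float acc = 0.0f;
  for (int r = 0; r < repeats; ++r)
    for (float x : xs)
      acc += f(x);
  const auto stop = std::chrono::steady_clock::now();
  sink = acc;
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / (static_cast<double>(repeats) * xs.size());
}
```

Passing `rsqrt_div` and the candidate implementation to `average_ns_per_call` with the same inputs gives two per-call figures to compare. Absolute numbers depend on the compiler flags and the machine.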

lntue · 2025-09-13T16:40:03Z

This is actually expected, because single- and double-precision division and square root are quite efficient on modern hardware.
See, for example, the Zen 3 entries in https://www.agner.org/optimize/instruction_tables.pdf:
SQRTSS and DIVSS have latencies of 14 and 10.5 clocks respectively, while ADDSS/MULPS and VFMA take 3 and 4 clocks.

So unless you can get down to maybe 3 or 4 multiply-adds, the extra logic around the computation (branching, exponent reduction, and so on) will make it slower than the straightforward sqrt + div in single precision.
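As a rough back-of-the-envelope check of this argument, the sketch below adds up the quoted latencies along the dependent chain of each approach. It is a deliberately simplified model, which is an assumption of this example: it ignores throughput, out-of-order execution, and the cost of frexp, ldexp, branches and the final cast. The function names are hypothetical.

```cpp
#include <cassert>

// Latencies in clock cycles, as quoted above for Zen 3.
constexpr double kSqrtssLatency = 14.0;
constexpr double kDivssLatency = 10.5;
constexpr double kAddMulLatency = 3.0;
constexpr double kFmaLatency = 4.0;

// Straightforward approach: one square root followed by one division.
inline double sqrt_div_latency() { return kSqrtssLatency + kDivssLatency; }

// Horner evaluation of a degree-d polynomial: d dependent multiply-adds.
inline double horner_latency(int degree) {
  assert(degree >= 0);
  return degree * kFmaLatency;
}

// One Newton step as written in the patch: y*y, then a multiply-add,
// then y*factor, each depending on the previous result.
inline double newton_step_latency() {
  return kAddMulLatency + kFmaLatency + kAddMulLatency;
}
```

Under this model sqrt + div costs 24.5 clocks, while a degree-5 polynomial plus one Newton step already costs 30 clocks before any of the surrounding logic is counted. Three or four multiply-adds cost 12 to 16 clocks, which leaves some room for that logic.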

For rsqrtf16, the Newton-Raphson method will beat sqrt + div on targets without single-precision hardware, such as some embedded systems. But in that case you will need to implement the polynomial approximation and Newton-Raphson in integer / fixed-point arithmetic to gain the efficiency.
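To illustrate what an integer-only variant could look like, here is a minimal sketch of a polynomial estimate followed by Newton-Raphson in fixed-point arithmetic. Everything specific in it is an assumption made for the example and not taken from the PR or from LLVM libc: the unsigned Q16.16 format, the linear initial guess 1.7 - 0.75 * x, the three iterations, and the names `fx_mul`, `fx_rsqrt` and `fx_to_double`. The input is expected to be already range-reduced to [0.5, 1].

```cpp
#include <cassert>
#include <cstdint>

// Unsigned Q16.16 fixed point: real value = raw / 65536.
constexpr int kFracBits = 16;
constexpr std::uint32_t kOne = 1u << kFracBits;

// Fixed-point multiply with a 64-bit intermediate, truncating the result.
inline std::uint32_t fx_mul(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >>
                                    kFracBits);
}

// Approximates 1/sqrt(x) for x in [0.5, 1] using only integer operations.
inline std::uint32_t fx_rsqrt(std::uint32_t x) {
  assert(x >= kOne / 2 && x <= kOne);
  // Degree-1 initial estimate y0 = 1.7 - 0.75 * x (illustrative coefficients).
  std::uint32_t y = 111411u - fx_mul(49152u, x);
  // Newton-Raphson: y <- y * (3 - x * y^2) / 2.
  for (int i = 0; i < 3; ++i) {
    const std::uint32_t xy2 = fx_mul(x, fx_mul(y, y));
    const std::uint32_t factor = (3u * kOne - xy2) >> 1;
    y = fx_mul(y, factor);
  }
  return y;
}

// Conversion used only for inspecting results.
inline double fx_to_double(std::uint32_t raw) {
  return static_cast<double>(raw) / static_cast<double>(kOne);
}
```

With this crude starting estimate, three iterations bring the result to within a few units in the last place of the Q16.16 format. A better initial polynomial would allow fewer iterations, which is the trade-off discussed above.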

amemov force-pushed the rsqrtf16-for-c23 branch 2 times, most recently from aaa897a to 1fdc319 Compare September 13, 2025 01:06

amemov marked this pull request as ready for review September 13, 2025 01:08

amemov requested review from rupprecht, keith and aaronmondal as code owners September 13, 2025 01:08

llvmbot added libc bazel "Peripheral" support tier build system: utils/bazel labels Sep 13, 2025

amemov added 9 commits September 12, 2025 18:22

- rsqrtf16 refactored

f99067f

Clang-formated the files

19ccc51

Replaced the computation for valid X with polynomial approximation

6b0f603

Added range reduction to the approximation

71162aa

Added Newton-Raphson iterations

2503252

-The accuracy improved drastically, but it still fails

Added separate handling for mantissa == 0.5f. Resulted in fewer errors

877f647

- Fixed ULP errors

a97ccde

- Refactored the implementation to match the proposal for constexpr - Added rsqrtf16 in Bazel build

clang-formatted the files

7ab2ca9

Formatted BUILD.Bazel w/ buildifier

07c8ad7

amemov force-pushed the rsqrtf16-for-c23 branch from cd0b0d4 to 07c8ad7 Compare September 13, 2025 01:24

lntue reviewed Sep 13, 2025

libc/src/__support/math/rsqrtf16.h (resolved)
