Conversation

@petermcneeleychromium (Contributor) commented May 21, 2025

After some investigation, it has been determined that tanh has a maximum absolute error of 1e-5 on some devices (NVIDIA).

We discussed polyfilling this function (via sinh/cosh), but the inaccuracy is likely intentional, since tanh is commonly used as a sigmoid-style activation function in ML.
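For concreteness, a minimal sketch of what such a polyfill could look like in WGSL (the function name and clamp bound are illustrative, not from this PR):

```wgsl
// Hypothetical polyfill: tanh built from the sinh/cosh builtins.
// The clamp keeps sinh/cosh from overflowing f32 (which happens near
// |x| ~ 89); for |x| > 10, tanh(x) is already +/-1 to f32 precision.
fn tanh_polyfill(x: f32) -> f32 {
  let c = clamp(x, -10.0, 10.0);
  return sinh(c) / cosh(c);
}
```

As noted in the WG minutes further down, a polyfill along these lines measured roughly 2x slower than the builtin.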

crbug.com/390221422


github-actions bot commented May 21, 2025

Previews, as seen when this build job started (b4f9a70):
WebGPU webgpu.idl | Explainer | Correspondence Reference
WGSL grammar.js | wgsl.lalr.txt

@petermcneeleychromium (Contributor, Author) commented

Assuming this PR gets approved, the corresponding CTS pull request is waiting in the wings:
gpuweb/cts#4392

@dneto0 (Contributor) left a comment


Please attach images of the errors you found for the NVIDIA and Intel devices.

This spec change will need approval by the WG, but I think the CTS change should land sooner.

@dneto0 added this to the Milestone 1 milestone May 22, 2025
@petermcneeleychromium (Contributor, Author) commented

I evaluated the precision of the existing WebGPU tanh on NVIDIA (HLSL) and Intel (Vulkan). NVIDIA simply has low precision.

Here is the data, shared via Google Sheets (Chromium account):
https://docs.google.com/spreadsheets/d/1i5n8sFlkHi0QrFwa00oXVcY8iBo6VftXzxOsziWzrsg/edit?gid=176471445#gid=176471445
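The measurement harness itself isn't shown in this thread, but the GPU side of such an evaluation can be as small as a compute shader that applies the builtin to a batch of inputs so the host can compare the results against a double-precision reference. A sketch, with a hypothetical binding layout:

```wgsl
// Hypothetical probe shader: evaluate the builtin tanh over a batch
// of inputs; the host compares the outputs against a float64
// reference to estimate absolute error and ULP distance.
@group(0) @binding(0) var<storage, read> inputs : array<f32>;
@group(0) @binding(1) var<storage, read_write> outputs : array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
  let i = gid.x;
  if (i < arrayLength(&inputs)) {
    outputs[i] = tanh(inputs[i]);
  }
}
```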

@petermcneeleychromium (Contributor, Author) commented

Waiting on
#5206

@petermcneeleychromium merged commit 9cd0b02 into gpuweb:main Jul 30, 2025
4 checks passed
@jimblandy (Contributor) commented

I'm fine with this change to the spec. I just wanted to note what I assume is the history here, which is pretty funny:

  1. Long ago, tanh somehow made it into the canon of mathy functions everyone has to implement, even though uses of hyperbolic trig functions are few and far between compared to circular trig functions. tanh is implemented with similar accuracy to the other trig functions.

  2. Someone notices that tanh has a nice 'S' shape, and it's easily available, so it becomes popular as an ML activation function.

  3. Activation functions need to be fast, but nobody really cares exactly what they compute as long as they have that nice 'S' shape, so GPU vendors change their tanh implementations to be faster and less accurate to win ML benchmarks.

  4. Denouement: Anyone who actually wants to compute a hyperbolic tangent needs to evaluate 1 - 2/(exp(2u) + 1) themselves.
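A WGSL sketch of the manual evaluation from step 4 (illustrative; the clamp bound is an assumption added to keep exp(2u) finite in f32):

```wgsl
// Hypothetical manual hyperbolic tangent: tanh(u) = 1 - 2 / (exp(2u) + 1).
// The clamp keeps exp(2 * c) from overflowing f32; for |u| > 10 the
// true value is already +/-1 to f32 precision, so clamping loses nothing.
fn tanh_manual(u: f32) -> f32 {
  let c = clamp(u, -10.0, 10.0);
  return 1.0 - 2.0 / (exp(2.0 * c) + 1.0);
}
```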

llvm-beanz added a commit to llvm/offload-test-suite that referenced this pull request Aug 12, 2025
Unlike the standard trig functions, which may be implemented as natural calculations or as hyperbolic approximations, the hyperbolic functions are guaranteed to be hyperbolic approximations and thus have consistent precision characteristics.

These can all be guaranteed within 2 ULP, which gives a better match and
will resolve the test failures on AMD GPUs.

The `cosh.16` test is not updated in this PR because it is already using
ULP rules and 2 ULP range specification.
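For illustration, a 2 ULP acceptance check can be phrased by mapping the f32 bit patterns onto a monotone integer line and comparing the integer distance. A sketch in WGSL (not the test suite's actual implementation; NaN handling is out of scope):

```wgsl
// Hypothetical 2 ULP check: reinterpret each f32 as a u32 key that
// increases monotonically with the float value, then compare keys.
fn ulp_distance(a: f32, b: f32) -> u32 {
  let ba = bitcast<u32>(a);
  let bb = bitcast<u32>(b);
  // Negative floats sort in reverse bit order; flip them so the
  // key ordering matches the numeric ordering.
  let ka = select(ba | 0x80000000u, ~ba, (ba & 0x80000000u) != 0u);
  let kb = select(bb | 0x80000000u, ~bb, (bb & 0x80000000u) != 0u);
  return max(ka, kb) - min(ka, kb);
}

fn within_2_ulp(actual: f32, expected: f32) -> bool {
  return ulp_distance(actual, expected) <= 2u;
}
```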

I've captured a spec issue to follow up on the very wide range
requirements for `tanh` on NVIDIA GPUs:
microsoft/hlsl-specs#601

This issue has also been observed by WGSL and is reflected in their
spec:
gpuweb/gpuweb#5199

Fixes #326
@Kangz (Contributor) commented Aug 19, 2025

GPU Web WG 2025-07-30 Atlantic-time
  • PM: this is NVIDIA. Either accept wider range, or polyfill. Polyfill is at least 2x slower. David found that NVIDIA was choosing lower precision.
  • DN: this is only on recent GPUs with tensor cores.
  • PM: sigmoid function's used in ML, so assume NVIDIA's optimizing its speed.
  • DN: have similar things for acos / asin on various Intel GPUs for example. Suggest we relax it.
  • JB: users can write (?? a specific approximation?) if they want.
  • PM: agree. 2x slower in my benchmarking though.
  • CW: consensus via voiced agreement to lower the precision.
