@kvark kvark commented Sep 1, 2021

Closes #2077
Fixes #2076

  • On SPIR-V, this would decorate affected expressions with NoContraction.
  • On Metal, this would add "-fno-fast-math" (or a subset of it) to the affected MTLLibrary.
  • On DX12 this would add precise to the variable declarations used by the affected functions.

Note: SignedZeroInfNanPreserve and other features of VK_KHR_shader_float_controls are intentionally not included.


github-actions bot commented Sep 1, 2021

Previews, as seen when this build job started (c2e4178):
WebGPU | IDL
WGSL
Explainer


@dneto0 dneto0 left a comment

Thanks for taking this stab. It's getting there.

wgsl/index.bs Outdated
<tr><td><dfn noexport dfn-for="attribute">`precise_math`</dfn>
<td>*None*

Indicates that the arithmetic computations in the function need to be performed with
Contributor

I have trouble with the word "precision" here, because that means "with more bits represented".
(Never mind the "precise" part of the attribute name, inherited from GLSL. It's good to reuse the GLSL word.)

Also, this should be constrained to floating point, I think.

How about:

Indicates that the floating point arithmetic computations in the function should be performed

  • without [=reassociation/reassociating=] subexpressions
  • while preserving infinities, NaNs, and signed zeroes

Apply this attribute when the correctness of the function is numerically sensitive, and it is acceptable to incur potential performance loss when forbidding such optimizations.

blah blah blah?

Contributor Author

Thank you!
I took the liberty of modifying this a bit more. Let me know if it needs more fixing!

Contributor Author

I felt that it was important to refer to the floating point evaluation section from here

@kvark kvark requested a review from dneto0 September 1, 2021 20:21

github-actions bot commented Sep 1, 2021

Previews, as seen when this build job started (9c8c768):
WebGPU | IDL
WGSL
Explainer

dneto0 previously approved these changes Sep 1, 2021

@dneto0 dneto0 left a comment

Seems ok to me now.

The key word is "should", instead of "must"

The group should review this.

@dneto0 dneto0 added the wgsl WebGPU Shading Language Issues label Sep 1, 2021
@dneto0 dneto0 added this to the V1.0 milestone Sep 1, 2021

litherum commented Sep 7, 2021

Metal exposes fastmath on the entire module: https://developer.apple.com/documentation/metal/mtlcompileoptions?language=objc. So this is a good idea, but it should be elevated to module-level (either by something at the global scope in the language, or as additional data to createShaderModule()).


litherum commented Sep 7, 2021

The SPIR-V registry says SignedZeroInfNanPreserve is missing before version 1.4. The earliest version of Vulkan to require SPIR-V 1.4 is Vulkan 1.2, which I thought was unavailable on most Android devices. Can we really require it?


kvark commented Sep 7, 2021

I don't think we require this SignedZeroInfNanPreserve. The precise_math is basically a "best effort" attribute. If SPIR-V doesn't support SignedZeroInfNanPreserve, then we don't use it.

> So this is a good idea, but it should be elevated to module-level

There is definitely value in having it exposed in a more granular level than the module scope:

  • SPIR-V implementations can use it
  • Metal implementations that generate MTLLibrary at pipeline creation time (wgpu and Dawn, at least) can generate some entry points as precise and others with fast math, if requested by the user.


kdashg commented Sep 8, 2021

WGSL meeting minutes 2021-09-07
  • DN: Further discussion today: Concerns that it’s not testable. Also concerned that it’s not stable, no way to make sure that (because untestable) it will keep working. One possibility is to make it an extension with actual strict requirements, but without signing us up for these sometimes-impossible strict requirements in core.
  • MM: In version of spir-v that wgsl is targeting, there’s no way to guarantee/require ieee floats?
  • DN: OpenCL can, but not spir-v in general.
  • DM: Spir-v doesn’t support some things, but not everything we need?
  • DN: NoContraction is visible, feature in vulkan-spirv. Not sure whether the vulkan conformance tests are strict enough to guarantee what we need. I would need more time to test whether it’s feasible on vulkan.
  • MM: When this extension is enabled, we can test that the math would be right. Problem is when we don’t have the extension, where we can’t really guarantee anything.
  • DN: When this was “SHOULD”, it’s easy to try for. Making an extension would require more work to see if we can support “MUST”.
  • GR: Why was it said that this would have no effect on DX12?
  • DN: I think the original poster did minor testing and didn’t have issues, so didn’t look into this?
  • GR: We do have precise which should work, but yeah, not sure how to test non-precise.
  • DM: precise is applied to variables? (yes)
  • MM: Another question: can global variables be precise? This leads me to a recommendation that it be per-module, which is also all that Metal can handle.
  • DN: Could (galaxy-brain idea) compile multiple times with and without precise as needed, since we control when entrypoints are used?
  • MM: Compiling multiple times would be bad, because compiling is already slow.
  • DM: I think we sort of already handle this (function granularity) in Metal backends.
  • MM: Why per-function, when no native API does that. Metal is per-module, others are per-variable.
  • DN: There’s some concerns about how deep (into function calls?) to propagate precise when on variables, at least in a way that’s not super verbose.
  • (timebox hit, tabled to next meeting)


dneto0 commented Sep 8, 2021

To fill in some detail:

  • First, @kvark is right about the implications of the best-effort framing
  • That said, SignedZeroInfNanPreserve was first made available in the Vulkan extension VK_KHR_shader_float_controls / SPIR-V extension SPV_KHR_float_controls, which has support going back almost three years.
  • You can use the SPIR-V extension in pre 1.4 SPIR-V modules if you declare the extension (OpExtension "SPV_KHR_float_controls"). Saying the feature was "incorporated into 1.4" means you can use the feature without having to declare the extension.


kvark commented Sep 8, 2021

My reading of the current state of the debate is that we need to decide if this functionality is testable or not. I believe having it testable would make a stronger API, and thus we need to explore this path before proceeding (with this PR as it stands now).

It sounds like DX12 and Metal support this "precise" mode unconditionally, and there is a chance we'll be able to test it. In Vulkan, it's more complicated. As @dneto0 noted, there is an extension. However, one has to check for the properties of this extension before using them: https://vulkan.gpuinfo.org/listpropertiesextensions.php?extension=VK_KHR_shader_float_controls&platform=all
It's concerning to see "shaderSignedZeroInfNanPreserveFloat32" only supported by "46%" of reports.

If we make this an optional feature, we'd deny access to it for users who either don't care about shaderSignedZeroInfNanPreserveFloat32 specifically, or are happy with the default Vulkan driver behavior. I don't think we want to end up in a situation where people write `if features.contains(PreciseMath) || IsVulkan()`.


dneto0 commented Sep 8, 2021

> My reading of the current state of the debate is that we need to decide if this functionality is testable or not.

Agreed. I wasn't sure on the call yesterday, so I investigated what Vulkan does to test NoContraction:

The NoContraction feature has been supported by SPIR-V / Vulkan from the start.
Its test is here

The test tempts the compiler to fuse a multiply-add into one operation (FMA).
FMA is spec'd to produce a rounded result where the intermediate results are computed with infinite precision and accuracy. The test uses sample values that produce catastrophic cancellation. A fused operation would produce a tiny-magnitude number (2**-46), but a non-fused result produces either zero or a small but larger number (2**-24).

This depends crucially on the fact that certain basic operations (add, subtract, multiply) are "correctly rounded" (as defined by IEEE 754, and adopted by Vulkan and WGSL).
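The effect described above can be reproduced on the CPU. The following is a minimal sketch, not the Vulkan CTS test itself: it emulates float32 arithmetic with `Math.fround`, and the helper names and sample values are assumptions chosen so that the fused result is 2**-46 while the unfused result is zero.

```typescript
// Emulate binary32 arithmetic: Math.fround rounds a double to the nearest float32.
const f32 = Math.fround;

// Unfused: the product is rounded to float32 before the add, i.e. RN(RN(a * b) + c).
function mulAddUnfused(a: number, b: number, c: number): number {
  return f32(f32(a * b) + c);
}

// Fused (emulated): a single rounding at the end, RN(a * b + c).
// Assumption: for the inputs below, a * b + c is exact in double precision,
// so this matches a true float32 FMA. It is not a general-purpose FMA.
function mulAddFusedEmulated(a: number, b: number, c: number): number {
  return f32(a * b + c);
}

const x = f32(1 + Math.pow(2, -23));      // 1 + 2^-23, exactly representable in float32
const c = f32(-(1 + Math.pow(2, -22)));   // -(1 + 2^-22), also exactly representable

// x * x = 1 + 2^-22 + 2^-46. Rounding the product to float32 drops the 2^-46 term.
const unfused = mulAddUnfused(x, x, c);       // cancellation leaves nothing
const fused = mulAddFusedEmulated(x, x, c);   // the 2^-46 term survives

console.log(unfused, fused);
```

Because the two results differ by construction, a test that compares against the unfused value can detect whether a compiler contracted the multiply and add.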

In general, catastrophic cancellation can be used to magnify errors for other undesirable cases: reassociation, distribution of multiply over addition.

So I think fusing, reassociation, and distribution aspects are testable.
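As an illustration of why reassociation is observable, here is a small sketch in emulated float32 (again via `Math.fround`). The function names and the sample values are assumptions picked for this example; they are not taken from any conformance test.

```typescript
// Emulate binary32 arithmetic: Math.fround rounds a double to the nearest float32.
const f32 = Math.fround;

// Evaluation order as written in the source: (x + y) + z.
function addLeftToRight(x: number, y: number, z: number): number {
  return f32(f32(x + y) + z);
}

// What a reassociating optimizer might produce instead: x + (y + z).
function addReassociated(x: number, y: number, z: number): number {
  return f32(x + f32(y + z));
}

const big = Math.pow(2, 25); // float32 spacing near 2^25 is larger than 1

// (big + -big) + 1: the large terms cancel first, so the 1 is kept.
const leftToRight = addLeftToRight(big, -big, 1);

// big + (-big + 1): the 1 is absorbed when -big + 1 is rounded to float32.
const reassociated = addReassociated(big, -big, 1);

console.log(leftToRight, reassociated);
```

The two orders give different float32 results for the same inputs, so a test can tell them apart by comparing against the left-to-right value.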


mrshannon commented Sep 8, 2021

In an ideal world precise (meaning no fusing, reassociation, or distribution) and the other fast math optimizations would be separate. It seems they can be on DX12 and Vulkan, but as far as I could tell Metal is all or nothing. I think a majority of use cases could be solved with precise alone. So perhaps a lesser feature could be made core where at some level of:

  • module
  • function
  • variable

precise mode could be enabled which would not enable shaderSignedZeroInfNanPreserveFloat32 (as support for that is not great, even on desktops) but would just:

  • Use precise on DX12
  • Use NoContraction and possibly Invariant on Vulkan
  • Use -fno-fast-math on Metal.

We already have the invariant qualifier which maps to precise in HLSL but it can only be used for the built-in position output. Also this maps to Invariant in SPIR-V and not NoContraction while precise in HLSL implies both.


kvark commented Sep 8, 2021

Metal is not exactly all or nothing. As @kainino0x pointed out in #2076 (comment), we can pick a subset of fast-math stuff. It sounds like you are suggesting to adopt the current PR but cut out everything related to VK_KHR_shader_float_controls, since it's not universally available. This means the Metal compiler wouldn't need "-fno-signed-zeros", for example, and possibly other things. Do I understand your proposal, @mrshannon ?

Then we can have an optional feature exposing something that captures VK_KHR_shader_float_controls functionality, as a follow-up.


mrshannon commented Sep 8, 2021

> Do I understand your proposal, @mrshannon ?

Yes, just disable fusing, reassociation, and distribution. With signed zeros and such not universal, and given the lack of example code that would be affected by them, I am proposing scaling back to only what precise in HLSL promises, as there is plenty of rendering code in the wild which relies on that.

> Metal is not exactly all or nothing. As @kainino0x pointed out in #2076 (comment), we can pick a subset of fast-math stuff.

I was not sure if that was kosher since it was not documented in the Metal spec.

> Then we can have an optional feature exposing something that captures VK_KHR_shader_float_controls functionality, as a follow-up.

Or you could wait until someone needs it. It's probably a failure of my imagination, but I can't think of a case where asymptotic limits would be of use in rendering.


kvark commented Sep 8, 2021

The last commit here describes these semantics. I'm sure @dneto0 would want to put in more technical details of what is preserved, adding examples and such, and I'm hoping we can follow up with this.

@munrocket

Love to see where it is going, thanks @kvark. Floating point expansions definitely do not rely on NaN/Infinity/signed zeros.


litherum commented Sep 9, 2021

From talking with the Metal team, we haven't gotten requests to apply fastMath per function rather than per MTLLibrary.

This makes intuitive sense, because the use cases that need IEEE precision are things like scientific computing, where it's likely that all the functions in the library will need to be precise. Conversely, for use cases like games, it's likely that none of the functions in the library will need to be precise.

(Games do need things like the invariant keyword, but that's a different thing.)


litherum commented Sep 9, 2021

> Metal is not exactly all or nothing. As @kainino0x pointed out in #2076 (comment), we can pick a subset of fast-math stuff.

These things aren't API. Ideally, WebGPU / WGSL wouldn't rely on anything that isn't API in the 3 backend APIs. The API is a single boolean switch.

(Anything that isn't API is unsupported, and able/willing to be removed at any point in the future.)


litherum commented Sep 9, 2021

It would be unfortunate to make fastMath a "best effort" attribute.

From an author's perspective, what's the point of a precision guarantee if the guarantee isn't actually guaranteed?

From an implementor's perspective, why would an implementor implement any of the feature at all if it just slows down code and doesn't actually have any expected (testable) behavior? Or, stated a different way: Let's say I want to implement this feature in a particular WebGPU implementation, and I sit down and start typing code into the computer to do it. How do I know when I'm done? Why shouldn't I consider myself to be done implementing the feature before writing a single line of code?


kvark commented Sep 9, 2021

@litherum it sounds like the desire to have this behavior testable is shared between all parties, so it's good to have this settled. The last version of the PR, which I mentioned in #2080 (comment), already makes it normative. It just doesn't spell out the exact norms affecting it, which is intended to be written at some point. So, no "best effort" any more.

As for the scope of the change, I'm curious what use cases are to consider. From the distance, it felt useful to be able to make, say, vertex shaders precise but not the fragment shaders. Or even just computation of one specific output of a vertex shader. But I haven't used this myself, so happy to hear ISV feedback!

@mrshannon could you share the intended usage of this attribute? Would you be doing it for the whole module, or potentially more granularly?


mrshannon commented Sep 9, 2021

> @mrshannon could you share the intended usage of this attribute? Would you be doing it for the whole module, or potentially more granularly?

First, I am specifically talking about precise as it exists in HLSL (rearrangement etc), not signed zero and the rest. We have two use cases:

The first is extremely large scale terrain generation in a compute shader which requires double precision. An existing example of this is Elite Dangerous which uses real doubles on some cards and emulated doubles (which require precise) on others. Their reason for emulation is because it's faster than the real thing on some cards, our reason is because we don't have real doubles at all. In this case, while there will be calculations in the compute shader which do not require precise it is likely that at least half of the compute shader module will require it.

Use in the wild: Generating the Universe in Elite Dangerous

The second case is when rendering very large objects (which cannot be handled in other ways). To avoid jitter we need to perform the model to camera space transform in double precision sometimes. Therefore, again emulated doubles. But in this case the calculation is in the vertex shader and is pretty narrow in scope as it is just used for the model to world transform and furthermore is only used on a small subset of vertices (those close to the camera). Therefore it would be undesirable to require precise at the module level since variable, statement, or function level would allow the disabling of the optimizations at a narrow scope for a tiny part of the vertex shader and in our case only on some invocations.

Use in the wild: 3D Engine Design for Virtual Globes

> Conversely, for use cases like games, it's likely that none of the functions in the library will need to be precise.
>
> (Games do need things like the invariant keyword, but that's a different thing.)

This is not true, see Generating the Universe in Elite Dangerous. What is required is not IEEE but specifically the guarantees that HLSL gives with its precise decorator which is more than what invariant guarantees, except on DX12 where invariant maps to precise.

In general there are cases where floating point error needs to be mitigated, even in rendering, which requires controlling the order of operations.

@kainino0x

> As @kainino0x pointed out in #2076 (comment), we can pick a subset of fast-math stuff.

FWIW the flags I pointed to can probably only be used when invoking an MSL compiler via command line, but not via newLibraryWithSource. However I found the associated clang pragmas:
https://clang.llvm.org/docs/LanguageExtensions.html#extensions-to-specify-floating-point-flags
(I haven't tested them, and it's entirely possible they don't actually work because MSL's LLVM backend doesn't understand them.)

Of course @litherum's point that these aren't officially supported still stands.


mrshannon commented Sep 14, 2021

> Here is an implementation of emulated doubles in WebGPU that works right now in Chrome/Firefox on a MacBook and on a PC with Linux; uncomment the trick with mix if you are testing precise math. https://codepen.io/munrocket/pen/vYZgyqa

@munrocket Not sure that it is working on Windows. Is the top of the fractal supposed to be filled with strange bands?

Also not sure you need mix, the select, or an if statement, should be enough.


munrocket commented Sep 14, 2021

@mrshannon yes, it shows float32 with its limited precision. It’s intentional.

I am starting to think that fast-math is pretty OK even for this purpose, because the Dekker multiplication algorithm becomes about 10x smaller (2 FLOP vs 17 FLOP) with a hardware fma instruction. It is implicitly inherited from fma(a,b,c) in the current WebGPU implementations in Chrome/Firefox. Also, with the select trick you can implement NoContraction for Moller/Knuth’s summation, and it will be calculated correctly but a little bit slower. At least on my machines it all works pretty well. 😍

If you are going to expose precise math in this PR, then fma(a, b, c) will become the twice-rounded expression RN(RN(a * b) + c), and you will need to use a slower algorithm. I don’t know whether you could add support for hardware fma in this PR or not. But currently it is a trade-off.

Fast multiplication and slow summation VS slow multiplication and fast summation
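For readers unfamiliar with the summation mentioned above, here is a sketch of a Knuth-style TwoSum in emulated float32 (using `Math.fround`). It is not the code from the linked pen; the function and variable names are assumptions. It shows why the order of operations matters: every intermediate is algebraically redundant, so an optimizer that treats floats as real numbers could simplify the error term to zero.

```typescript
// Emulate binary32 arithmetic: Math.fround rounds a double to the nearest float32.
const f32 = Math.fround;

// TwoSum: returns the rounded sum and the rounding error, so that
// sum + err equals a + b exactly (barring overflow).
function twoSum(a: number, b: number): { sum: number; err: number } {
  const sum = f32(a + b);
  const bVirtual = f32(sum - a);        // the part of b that made it into sum
  const aVirtual = f32(sum - bVirtual); // the part of a that made it into sum
  const bRoundoff = f32(b - bVirtual);  // what was lost from b
  const aRoundoff = f32(a - aVirtual);  // what was lost from a
  return { sum, err: f32(aRoundoff + bRoundoff) };
}

// 2^-30 is far below half an ulp of 1.0 in float32, so the plain sum loses it,
// but TwoSum recovers it in the error term.
const result = twoSum(1, Math.pow(2, -30));
console.log(result.sum, result.err);
```

An emulated double is then a (sum, err) pair. If the compiler reassociates the subtractions, the recovered error is lost and the pair degrades to plain float32.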


mrshannon commented Sep 14, 2021

@munrocket We just tested the select trick in our implementation and it works. Thanks for the idea.

We are likely to use it over this PR (even if it is merged in) as it has better performance on Metal due to not disabling all optimizations and works at the expression and not at the module level.

@munrocket

Glad to help. The only reason someone would still need to turn off fast-math is if they detect that the host doesn’t support FMA in hardware. In that case, fma emulation with the select trick will be painful.

I removed mix from the NoContraction trick, thanks, because reordering is turned off without it. Also, if you find that some devices do not support this, please share.


kvark commented Sep 15, 2021

Hey users, if you keep finding nice hacks and workarounds for this, we'll have no incentive to do anything with the spec! 🤪


munrocket commented Sep 15, 2021

Ha-ha, that was fun.

It's actually a miracle that it works, because the current rounding is UB and not specified, and neither is fast-math mode. This PR still has potential. For example, if somebody figures out how to turn on correct math and fused multiply-add at the same time, mrshannon will probably use it.


kdashg commented Sep 15, 2021

WGSL meeting minutes 2021-09-14
  • DN: Discussing earlier and with MS. Our concern is two things
    • Making sure when targeting FXC that the math survives as well as we hope. We’re concerned about stability/reliability here
    • The name precise may end up promising more than all underlying platforms can do, so we may want to revisit the name if we can’t guarantee its behavior everywhere.
    • Thanks for the feedback about infinities and NaNs not being too important
  • MS: Requires operations to not be reordered
  • DN: Ops that are correctly rounded are +/-/*, but division is a harder request. Do you need division? (maybe?)
  • DM: MM worried about spooky action at a distance. (SAAAD)
  • KN: If this is defined as best-effort, at least for Metal, I think would prefer to use clang pragmas rather than SAAAD. I’m worried that we wouldn’t want to use all the flags, for perf reasons. Just reordering is less bad.
  • DN: Rough consensus that we want this to be testable, rather than a pure hint.
  • DN: I want an investigation to show that assured non-reordering and non-reassociation are both implementable. On DX11 (FXC), DX12, and Vulkan (desktop and mobile).
  • DM: Sounds like something we need to figure out before v1.
  • AB: Why?
  • DM: We see real issues, MS and users of MoltenVk exist and they need this.
  • JG: Is this a need that e.g. webgl already had.
  • MS: We’re going from desktop directly to WebGPU. There’s extant code for this. Without this, we would need to do vert shading on CPU. We do that today, but we’re expecting to want to change this. We’d be pretty sad to not have this.


dneto0 commented Sep 15, 2021

Hey @munrocket thanks for this technique!

And thank you also for a nice compact test case. We had been discussing the need for a good way to test the behaviour.

Some thoughts:

  1. Will this continue to work with future implementations? I think basically yes. The select introduces a data dependency and possibly a control dependency on the test condition. The compiler must not be able to know the value of that condition (statically). The safest thing (from the programmer's perspective) is to pass in a known-to-be-zero value from the outside (a parameter buffer), and compare a value against that. This is what GLSL fuzz does (blog paper)

  2. What's the performance cost? I think probably low? That assumes a few things: (a) we care most about throughput (b) the test condition is cheap (e.g. compare against opaque zero) (c) the implementation evaluates both options, and uses predicated execution, and at least one side is cheap. Then I would guess the additional costs are small, so performance is probably going to be pretty good.

So two thumbs up for this technique!
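To make the shape of the technique concrete, here is a hedged sketch of what a shielded multiply-then-add could look like, held as a WGSL source string in TypeScript. This is not the code from the linked pen: `Params`, `opaque_zero`, and `mul_then_add` are hypothetical names, the binding layout is an assumption, and whether a given compiler actually refrains from fusing is exactly what would need testing.

```typescript
// Hypothetical WGSL source illustrating the "select with an opaque zero" idea.
// All identifiers inside the shader are illustrative, not from the PR or the pen.
const shieldedMulAddWgsl: string = `
struct Params {
  // The host always writes 0.0 here; the shader compiler cannot know that statically.
  opaque_zero: f32,
}
@group(0) @binding(0) var<uniform> params: Params;

fn mul_then_add(a: f32, b: f32, c: f32) -> f32 {
  let product = a * b;
  // select(f, t, cond) yields f when cond is false, so at run time this is
  // always product. The add below consumes the select result rather than the
  // multiply directly, which is intended to keep the two from being fused.
  let shielded = select(product, 0.0, params.opaque_zero != 0.0);
  return shielded + c;
}
`;

console.log(shieldedMulAddWgsl);
```

The "other" branch is a constant and the condition is a single comparison, which matches the assumption in point 2 that both the condition and the untaken side are cheap under predicated execution.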


dneto0 commented Sep 15, 2021

> Hey users, if you keep finding nice hacks and workarounds for this, we'll have no incentive to do anything with the spec!

It's a feature, not a bug. :-)
And this is why open processes can be so great.


dneto0 commented Sep 15, 2021

Another thing about the performance cost: Yes, this prevents the compiler from rearranging code to go faster, but that's exactly what the programmer wanted.

@munrocket

> Will this continue to work with future implementations?

It works with round-to-nearest-even floating point rounding, which is usually the default but is not specified for some reason, in DX11 for example. Also, as mentioned: floating point arithmetic is not associative, and muladd should be allowed only for fma.

> What's the performance cost?

Usually, emulated double addition takes 20 flops, and multiplication takes 24 flops with software FMA or 9 flops with hardware FMA. So it's cheap. With select I don't know, but it is possible to measure. Probably select is not so perfect; is it branching?

> This is what GLSL fuzz does, papers

Interesting. If we need stronger confidence, we can pass a variable there.


dneto0 commented Sep 15, 2021

About the performance cost, I meant the additional performance cost of using the select. Thanks for the extra info for the cost of the double precision emulation. :-)

Right, rounding mode is not specified for graphics APIs because some devices use round-to-even, some use round-to-zero (which is cheaper in hardware).

Does select do branching? It is common for GPUs to use predicated execution: they execute both paths, but selectively turn off side effects of the path "not taken", and then only use the chosen result (wikipedia). This trades off possibly wasting cycles stepping through the dead code path, but saves the machine from taking a branch and destroying internal state.

So that's why I would hope to make the evaluation of the "other" path and the condition cheap: we want that so on a predicated execution they don't waste too much extra time.


litherum commented Sep 21, 2021

@mrshannon

> The first is extremely large scale terrain generation in a compute shader which requires double precision.
>
> The second case is when rendering very large objects ... in this case the calculation is in the vertex shader and is pretty narrow in scope

Both of these use cases are supported by putting the precise attribute on the entire module rather than the individual function - just put these two entry points and their dependent functionality in a separate module. Linking a vertex shader from one module and a fragment shader from a different module is supported.


kvark commented Sep 21, 2021

If I understand correctly, we are mostly fine with introducing precise_math attribute (in a way that we can test), we just can't agree on what scope it covers:

  1. function scope (the current shape of this PR):
    - This is nice for SPIR-V and HLSL.
    - On Metal, it has a SAaaD effect: adding an attribute to one function can end up affecting other functions. On wgpu and Dawn, it would affect all the functions in the call graph of a specific entry point (if one of them has the attribute). On Safari's implementation, it would affect all the functions in the module.
  2. entry point scope:
    - This is nice for Metal via wgpu or Dawn, since they build MTLLibrary per entry point.
    - Has SAaaD on SPIR-V and HLSL, since using a function from another entry point (which has the attribute enabled) would make it slower for other entry points using this function.
    - Has SAaaD on Metal in Safari, since one entry point will affect the others.
  3. module scope:
    - Can be mapped to all of the APIs
    - less optimal for SPIR-V and HLSL
    - No SAaaD


dneto0 commented Sep 21, 2021

The user has a reasonable workaround, and it appears to be performant and likely stable over time. I thought this was an easy "not in V1.0" decision.


kvark commented Sep 21, 2021

I'm not happy about this workaround becoming sort of a tribal knowledge thing.
If we consider it good that people do this, can we hide the workaround behind something like:

fn compute(val: T) -> T

So doing let x = compute(a * b) + c would effectively put the NoContraction decoration on the intermediate result (and precise in HLSL). On Metal, it could use the select trick internally.

@mrshannon

> Both of these use cases are supported by putting the precise attribute on the entire module rather than the individual function - just put these two entry points and their dependent functionality in a separate module. Linking a vertex shader from one module and a fragment shader from a different module is supported.

It would be wasteful in the 2nd case. Perhaps as much as 10% of vertices (depending on camera location) in any given object need emulated double vertex position. The rest can take the faster 32-bit float path as they are further from the camera.

@mrshannon

> So doing let x = compute(a * b) + c would effectively put the NoContraction decoration on the intermediate result (and precise in HLSL). On Metal, it could use the select trick internally.

Either this or actual function scope (including Metal) would keep us from using the select trick. I agree with the tribal knowledge issue but I am not going to severely harm performance to avoid it. Not sure compute is the right term but I can't think of anything better at the moment.


kvark commented Sep 28, 2021

@litherum it looks like MSL supports [[clang::optnone]] on functions - KhronosGroup/SPIRV-Cross#1746 . We could consider it as a direct effect of [[precise_math]] in WGSL.

@kainino0x

[[clang::optnone]] seems like far too heavy of a hammer to me.

> The optnone attribute suppresses essentially all optimizations on a function or method, regardless of the optimization level applied to the compilation unit as a whole. This is particularly useful when you need to debug a particular function, but it is infeasible to build the entire application without optimization. Avoiding optimization on the specified function can improve the quality of the debugging information for that function.

@kdashg kdashg modified the milestones: V1.0, post-V1 Sep 28, 2021

kdashg commented Sep 29, 2021

WGSL meeting minutes 2021-09-28
  • (Previously: MM: Let’s postpone.)
  • DM: Offline, discussed MSL’s clang::optnone, which is on function scope. Kai noted it’s a big hammer and may not do what’s wanted.
  • MM: I thought we postponed this until after MVP.
  • DM: New information came up.
  • MM: Would like to re-propose postpone to MVP. That attribute is not part of the Metal API. So you can’t rely on it. Don’t think it’s a good solution.
  • DM: Other idea is to expose a function that shields optimizations across its argument vs. its result. Think we can implement it well on all backends. (NoContract on SPIR-V, or select trick as discussed on the issue.)
  • MM: Not familiar with the technique, and I didn’t prepare; think it was a mistake to put this on the agenda.
  • DN: Also think this can be postponed until after MVP. Have workaround, even if it’s in the “lore” category.
  • JG: Will mark as milestone Post-V1.


greggman commented Apr 7, 2024

I'm not sure this idea appeared above, but what about a module-level flag that only works if a feature like "high-precision" exists? So you check if the adapter supports "high-precision". If it does, you request a device with {requiredFeatures: ['high-precision']}. Now you can pass 'high-precision' to createShaderModule.

This way, if a GPU/driver can't pass the high-precision CTS tests, it doesn't advertise the feature.

If you don't like features bleeding into WGSL you could move the check into pipeline creation where you use the precision keywords/options in WGSL but when you go try to make a pipeline, if you didn't request the 'high-precision' feature then you get an error your shader isn't supported on this device.
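A rough sketch of the feature check described in this comment follows. Note that 'high-precision' is the hypothetical feature name from the comment and is not part of WebGPU; `FeatureSetLike` and `requiredFeaturesFor` are likewise illustrative names, with the interface standing in for the set-like `features` object on an adapter.

```typescript
// Hypothetical feature name taken from the comment above; not a real WebGPU feature.
const HIGH_PRECISION = "high-precision";

// Minimal stand-in for the part of an adapter's feature set used here.
interface FeatureSetLike {
  has(name: string): boolean;
}

// Decide which features to request: ask for the precise-math feature only
// when the adapter advertises it, otherwise request nothing extra.
function requiredFeaturesFor(adapterFeatures: FeatureSetLike): string[] {
  return adapterFeatures.has(HIGH_PRECISION) ? [HIGH_PRECISION] : [];
}

// Two mock adapters: one that advertises the feature and one that does not.
const supporting: FeatureSetLike = { has: (name) => name === HIGH_PRECISION };
const nonSupporting: FeatureSetLike = { has: () => false };

const withFeature = requiredFeaturesFor(supporting);
const withoutFeature = requiredFeaturesFor(nonSupporting);

console.log(withFeature, withoutFeature);
```

In a real page the returned list would be passed as `requiredFeatures` when requesting the device. How the flag would then reach `createShaderModule`, or pipeline creation in the alternative described above, is left open by the comment and is not sketched here.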

@TimTheBig

Is there any way I can get this moving again?

* without [=Reassociation|reassociating=] subexpressions

Note: this translates to `NoContraction` decoration in SPIR-V, `precise` qualifier in HLSL,
and a subset of `"-fno-fast-math" group of compile options in MSL.

    and a subset of `"-fno-fast-math" group of compile options in MSL.

Should the subset not be documented?

Development

Successfully merging this pull request may close these issues.

Add method to disable fast-math on a per-shader basis.

9 participants