device-libs: Guard trig reduction quadrant index against NaN UB #2752
adelejjeh wants to merge 1 commit into
Guard the `(int)fn & 0x3` quadrant index computation in trig reduction functions against NaN input. `fptosi NaN` is UB in C and produces `poison` in LLVM IR, which the compiler exploits during constant-folding to return garbage from `cos(inf)`, `sin(inf)`, etc.

Fix by adding an isnan check: `isnan(fn) ? 0 : ((int)fn & 0x3)`. The AMDGPU backend folds away the guard at codegen since `v_cvt_i32_f32` already returns 0 for NaN (see llvm#200960).

Fixes: LCOMPILER-2150
Co-Authored-By: Claude Opus 4.6 <[email protected]>
69edd27 to cf5e0cb
      struct redret ret;
      ret.hi = MATH_MAD(t, -0.5, x);
    - ret.i = (int)t & 0x3;
    + ret.i = BUILTIN_ISNAN_F64(t) ? 0 : ((int)t & 0x3);
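A minimal host-side sketch of the guarded expression in the hunk above, for readers who want to try it outside device-libs. It assumes the standard `isnan` from `<math.h>` in place of the `BUILTIN_ISNAN_F64` macro, and `quadrant_index` is a hypothetical helper name that does not appear in the patch. It also assumes, as in the reduction code, that any non-NaN `t` is small enough to fit in an `int`.

```c
#include <math.h>

/* Hypothetical helper (not part of the patch): low two bits of the
 * truncated value select the quadrant. */
static int quadrant_index(double t)
{
    /* The NaN test runs first, so the float-to-int cast is never
     * evaluated on NaN, which would be undefined behavior in C.
     * Assumption: a non-NaN t is within the range of int. */
    return isnan(t) ? 0 : ((int)t & 0x3);
}
```

With this guard, `quadrant_index(NAN)` is well defined and yields 0, while ordinary inputs such as `5.0` still map to their quadrant (here 1).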
Can you rewrite this as is-inf-or-nan(x)? It's harder to prove that `t` isn't a NaN based on the input, but only inf or NaN inputs should produce NaN results.
I don't want this statement to result in any instructions besides the `cvt_i32_f64`, and similarly for the other types.
@b-sumner The upstream PR pattern matches the generated LLVM IR and replaces it with a single `llvm.fptosi.sat`.
@arsenm If we change the check to test `x` instead of `t`, it becomes harder to pattern match and replace with the saturating intrinsic.
Ultimately, the pattern matching in InstCombine will fold the resulting checks and we end up with the same code, just with an `fptosi.sat` instead of an `fptosi`.
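For context on the saturating intrinsic discussed in this thread, here is a rough C emulation of the 32-bit signed case as documented in the LLVM language reference: NaN converts to 0, out-of-range values clamp to the integer limits, and everything else truncates toward zero. The name `fptosi_sat_i32` is hypothetical and this is only a sketch of the semantics, not how the compiler or the backend implements it.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical emulation of a saturating double -> i32 conversion. */
static int32_t fptosi_sat_i32(double x)
{
    if (isnan(x))
        return 0;                 /* NaN is defined to give 0 */
    if (x <= (double)INT32_MIN)
        return INT32_MIN;         /* clamp below, including -inf */
    if (x >= (double)INT32_MAX)
        return INT32_MAX;         /* clamp above, including +inf */
    return (int32_t)x;            /* in range: truncate toward zero */
}
```

Because every input has a defined result, masking the output with `& 0x3` is also well defined, which is why replacing the guarded `fptosi` with the saturating form removes the UB without changing the NaN result of 0.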
Guard the `(int)fn & 0x3` quadrant index computation in trig reduction functions against NaN input. `fptosi NaN` is UB in C and produces `poison` in LLVM IR, which the compiler exploits during constant-folding to return garbage from `cos(inf)`, `sin(inf)`, etc.

Fix: `ret.i = BUILTIN_ISNAN(fn) ? 0 : ((int)fn & 0x3);`

Applied to 7 locations across 6 files (trigredsmall F/D, trigred H, trigpired F/D/H). Upstream PR llvm#201435 pattern matches the isnan guard, replacing it with the saturating intrinsic, which removes the UB.

Verified: identical instruction count, all reproducer variants pass.

Fixes: LCOMPILER-2150