Summary
When the miopen-provider CI was pointed at the shared cross-provider integration suite (#7433), bf16 3D conv weight-gradient tests began failing flakily (Smoke/IntegrationGpuConvWrw3dBfp16.Correctness/14, ConvolutionWgrad_0::DW). The cause on the test side is a tolerance regression: the shared suite validates conv-wgrad with a fixed tolerance (getToleranceWrw<bf16>() = 0.2), whereas the pre-migration miopen-provider tests used a dynamic, per-shape tolerance (calculateConvWrwTolerance<...>), which for this shape was on the order of hundreds.
The fixed 0.2 is below the bf16 reduction-noise floor for deep conv-wgrad reductions, so it intermittently flags results that are within expected bf16 error. (There is also a real CK kernel precision bug that makes some kernels noisier than necessary — filed separately; see Related. This issue is specifically about the test tolerance being wrong, which should be fixed regardless.)
Concrete code
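New (shared) test: dnn-providers/integration-tests/src/integration_tests/conv/IntegrationGpuConvBackwardWeights.cpp:95 uses getToleranceWrw<bfloat16>() = 0.2 (projects/hipdnn/test_sdk/include/hipdnn_test_sdk/utilities/TestTolerances.hpp:228) with rtol 0.2. The per-element check in CpuFpReferenceValidation.hpp fails an element when |impl - ref| > atol + rtol·|ref|.
Old (pre-migration) test: dddc256cb50~1:dnn-providers/miopen-provider/integration_tests/IntegrationGpuConvBackwardWeights.cpp (deleted by [ci][miopen-provider] Run cross-provider integration suite in CI for Miopen Provider #7433) used the per-shape calculateConvWrwTolerance<...>.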
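A minimal sketch of that per-element rule, assuming it is applied independently to every output element exactly as written above. countPerElementFailures is a hypothetical name, not the CpuFpReferenceValidation.hpp API, and doubles stand in for the bf16 values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the per-element rule
//   |impl[i] - ref[i]| > atol + rtol * |ref[i]|  =>  element i fails.
// Returns how many elements violate the tolerance.
std::size_t countPerElementFailures(const std::vector<double>& impl,
                                    const std::vector<double>& ref,
                                    double atol,
                                    double rtol)
{
    assert(impl.size() == ref.size());
    std::size_t failures = 0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double absErr = std::fabs(impl[i] - ref[i]);
        // The allowed error grows with |ref|, so small-magnitude elements get the tightest bound.
        if (absErr > atol + rtol * std::fabs(ref[i])) {
            ++failures;
        }
    }
    return failures;
}
```

With atol = rtol = 0.2, the reported element (ref 0.69, impl 1.09) fails because 0.40 > 0.2 + 0.2·0.69 = 0.338. The same 0.40 absolute error on a weight near 135 passes easily.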
Why neither old nor new tolerance is right
For this case (dyDims=[8,1,16,16,16], range [-1,1]): reduction depth N·D·H·W = 32768, sumAbsProductBound = 32768.
Old dynamic tolerance (calculateConvWrwTolerance): ≈ gamma·sumAbsProductBound (≈258) + output-cast term (eps_bf16·32768 ≈ 256) ≈ 500 atol. This is a worst-case coherent bound: γₙ·Σ|xᵢ| ≈ n²·u when every |xᵢ| ≈ 1, and it scales with Σ|products| rather than with the actual, heavily cancelled |DW| (≈100). It is so loose that it would pass almost anything, which is why dynamic tolerances were disabled.
New fixed 0.2 atol: too tight. It is below the per-element bf16 reduction noise for a 32768-deep reduction (legitimate absolute error is ~eps_bf16·|partials| ≈ O(0.5–2)), and it ignores both reduction depth and output magnitude.
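The two tolerances above can be written side by side. This only restates the issue's own approximations. Here u is the unit roundoff used for the γ term (the issue does not state which precision it assumes; γₙ is only defined for n·u < 1), and n = 32768.

```latex
\begin{aligned}
\gamma_n &= \frac{n u}{1 - n u} \approx n u \qquad (n u < 1),\\
\gamma_n \sum_{i=1}^{n} |x_i| &\approx n u \cdot n = n^2 u \qquad \text{when every } |x_i| \approx 1,\\
\tau_{\text{old}} &\approx \gamma_n \sum_{i=1}^{n} |x_i| + \varepsilon_{\text{bf16}} \sum_{i=1}^{n} |x_i|
  \approx 258 + 2^{-7} \cdot 32768 = 258 + 256 = 514,\\
\tau_{\text{new}} &= 0.2 + 0.2\,|\mathrm{ref}| \qquad \text{(independent of } n\text{)}.
\end{aligned}
```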
A proper tolerance should be statistical and shape/depth-aware: for example, an absolute term that scales like eps_bf16 · √(reduction_depth) · (typical product magnitude), plus a small relative term for the output cast. It should be neither a flat constant nor the worst-case linear/coherent bound.
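The arithmetic for this shape can be sketched as follows. All function names are hypothetical, and `scale`, `rtol`, and the "typical product magnitude" are illustrative knobs, not values from the test SDK. Using 1.0 for the product magnitude assumes the upper bound implied by inputs in [-1, 1].

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// bf16 stores 7 fraction bits, so its machine epsilon is 2^-7.
constexpr double kEpsBf16 = 0.0078125;

// Reduction depth for this conv-wgrad case: with a 1x1x1 filter, each weight
// accumulates one product per (n, d, h, w) output position.
std::int64_t wgradReductionDepth(std::int64_t n, std::int64_t d, std::int64_t h, std::int64_t w)
{
    return n * d * h * w;
}

// Output-cast term of the old dynamic bound as quoted in the issue:
// eps_bf16 * sumAbsProductBound, which grows linearly with depth.
double linearCastTerm(double sumAbsProductBound)
{
    return kEpsBf16 * sumAbsProductBound;
}

// Hypothetical statistical tolerance: an absolute term growing like sqrt(depth)
// plus a relative term for the final bf16 cast of the output.
double statisticalTolerance(std::int64_t depth,
                            double typicalProductMagnitude,
                            double refMagnitude,
                            double scale,
                            double rtol)
{
    assert(depth > 0);
    const double atol = scale * kEpsBf16 * std::sqrt(static_cast<double>(depth)) *
                        typicalProductMagnitude;
    return atol + rtol * std::fabs(refMagnitude);
}
```

For dyDims = [8,1,16,16,16], wgradReductionDepth(8, 16, 16, 16) is 32768 and linearCastTerm(32768) is 256, which matches the output-cast term above. With scale 1, product magnitude 1 and no relative term, statisticalTolerance(32768, 1.0, 0.0, 1.0, 0.0) gives √2 ≈ 1.41, because 2^-7·√(2^15) = 2^0.5. That value sits inside the O(0.5–2) noise band, above the fixed 0.2, and far below the ~500 worst-case bound. The constants would still need calibrating against real kernel output.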
Why the failure is on a small weight (and why MIOpen's own tests miss it)
DW here is 16 weights spanning ~[0.7, 135]. The bf16 reduction error is roughly uniform in absolute terms across weights, so it only trips the per-element check on the small-magnitude weight (e.g. ref 0.69 → impl 1.09; abs err 0.40 > 0.2+0.2·0.69=0.338). The relative-L2 / RMS metric MIOpen and MIOpenDriver -V use is dominated by the large weights (‖ref‖≈200), so the same error is ~0.017 ≪ threshold → it passes there. The per-element check the shared suite uses is the more correct check; the issue is purely that its tolerance value is wrong for this op/shape.
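The sketch below shows why the two metrics disagree by comparing the shared suite's per-element rule with a relative-L2 metric. It assumes the MIOpen-style metric is ‖impl - ref‖ / ‖ref‖ (the issue does not give the exact formula). Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed aggregate metric: ||impl - ref||_2 / ||ref||_2.
// Large-magnitude elements dominate both norms.
double relativeL2Error(const std::vector<double>& impl, const std::vector<double>& ref)
{
    assert(impl.size() == ref.size() && !ref.empty());
    double errSq = 0.0;
    double refSq = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double diff = impl[i] - ref[i];
        errSq += diff * diff;
        refSq += ref[i] * ref[i];
    }
    return std::sqrt(errSq) / std::sqrt(refSq);
}

// Per-element rule: element i fails if |impl[i] - ref[i]| > atol + rtol * |ref[i]|.
std::size_t countPerElementFailures(const std::vector<double>& impl,
                                    const std::vector<double>& ref,
                                    double atol,
                                    double rtol)
{
    assert(impl.size() == ref.size());
    std::size_t failures = 0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        if (std::fabs(impl[i] - ref[i]) > atol + rtol * std::fabs(ref[i])) {
            ++failures;
        }
    }
    return failures;
}
```

The example uses synthetic vectors ref = {0.69, 40, 80, 135} and impl = {1.09, 40.4, 79.6, 135.4}. Each element has a uniform 0.4 absolute error, and only the 0.69 → 1.09 pair comes from the failing run. countPerElementFailures(impl, ref, 0.2, 0.2) flags only the first element. relativeL2Error(impl, ref) is about 0.005, so an aggregate check passes the same data.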
Asks
Replace the fixed getToleranceWrw (and the other fixed conv tolerances, if similarly affected) with a realistic shape/depth-aware tolerance for conv-wgrad, or restore a corrected dynamic tolerance that uses a statistical (not worst-case coherent) bound.
Coordinate the chosen tolerance with the CK kernel precision fix (Related) so the test reflects achievable accuracy once the kernel keeps split-k partials in fp32.
Environment
Surfaced on: Test miopenprovider (gfx94X-dcgpu) cross-provider integration check, MI300X, ROCm 7.14.
Failing case: Smoke/IntegrationGpuConvWrw3dBfp16.Correctness/14 (NDHWC, x:[8,16,16,16,16], w:[1,16,1,1,1], bf16).
Related