BUG: "overflow encountered in divide" depending on the number of zeros #25097
Comments
Sounds like it might be connected to mixed SIMD (8 elements) + normal (last element) code. @seiko2plus any thoughts? |
The reason will be that the SIMD operation fills up a partial vector with ones to avoid exactly this issue, but Not sure what can actually be done to have partial loads there if we really worry about this, filling with EDIT: If NaN doesn't work (generally), the only alternative would be to ensure the divisor is 1, while currently we ensure the dividend is for |
Yes, that is the main reason. Handling the remaining lanes of such a basic operation with manual SIMD is important to avoid unexpected compiler optimizations that may raise FP errors, mainly with a wide range of Clang versions once the compiler detects a clear contiguous path.
Filling the remaining lanes of the vector divisor with ones is what actually matters to avoid division by zero FPe while the dividend
It doesn't affect performance; even scalarising the remaining lanes would be okay, since there's another loop that handles the full-vector loads and stores. I think the following patch can fix this issue:
diff --git a/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src b/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src
index 30111258d6..4f083a5efa 100644
--- a/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src
+++ b/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src
@@ -138,7 +138,7 @@ NPY_NO_EXPORT void NPY_CPU_DISPATCH_CURFX(@TYPE@_@kind@)
npyv_store_@sfx@((@type@*)(dst + vstep), r1);
}
for (; len > 0; len -= hstep, src0 += vstep, dst += vstep) {
- #if @is_div@ || @is_mul@
+ #if @is_mul@
npyv_@sfx@ a = npyv_load_till_@sfx@((const @type@*)src0, len, 1.0@c@);
#else
npyv_@sfx@ a = npyv_load_tillz_@sfx@((const @type@*)src0, len); |
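To make the effect of the patch concrete, here is a rough pure-Python model of the tail handling. It is only a sketch: the lane count of 8, the helper name `tail_divide`, and the input values (a single leftover zero divided by a tiny scalar) are assumptions chosen for illustration, not the reporter's reproducer and not NumPy's actual code path. An infinite result in any lane stands in for the hardware overflow flag, which is raised even when the lane's result is later discarded.

```python
import math

LANES = 8  # assumed vector width, for illustration only


def tail_divide(tail, scalar_divisor, pad):
    """Model a partial-vector divide: pad the dividend, divide every lane.

    Returns the results for the real elements plus a flag that is True if
    ANY lane (including a padded one) overflowed to infinity.
    """
    n = len(tail)
    lanes = list(tail) + [pad] * (LANES - n)        # partial load with padding
    results = [x / scalar_divisor for x in lanes]   # all lanes are computed
    overflow = any(math.isinf(r) and not math.isinf(x)
                   for x, r in zip(lanes, results))
    return results[:n], overflow                    # only n results are stored


tiny = 1e-320  # subnormal divisor: 1.0 / tiny exceeds the double range

# Dividend padded with 1.0 (behaviour before the patch, as described above)
old_result, old_flag = tail_divide([0.0], tiny, pad=1.0)

# Dividend padded with 0.0 (what the zero-filling partial load gives)
new_result, new_flag = tail_divide([0.0], tiny, pad=0.0)

print(old_result, old_flag)
print(new_result, new_flag)
```

In this model both variants store the same value for the real element, but only the one-padded dividend trips the overflow flag, because the padded lanes compute 1.0 / tiny.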
Right, maybe that is the sane solution. Padding with |
Yes, but it may open the door to compiler bugs, most likely divide-by-zero FP errors, as I mentioned.
Agreed. I think you mean it will lead to raising two separate exceptions, invalid-value and divide-by-zero, rather than only "divide-by-zero".
Looks like a practical solution to me; #25129 applies it.
The value used within the padded vector isn't general and should always be defined based on the nature of the operation we are dealing with. |
Thanks everyone! |
This is fixed in 1.26.3 and later. |
Describe the issue:
(Cross-post from https://stackoverflow.com/q/77451924/)
I am getting overflow RuntimeWarnings for an array of 9 elements but not for an array of 8 (or 200, ...) elements. It seems I get warnings iff I have more than 8 zeros and their number is not divisible by 4.
Reproduce the code example:
Error message:
Runtime information:
1.26.1
3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]
WARNING: threadpoolctl not found in system! Install it by pip install threadpoolctl. Once installed, try np.show_runtime again for more detailed build information
[{'numpy_version': '1.26.1',
'python': '3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 '
'64 bit (AMD64)]',
'uname': uname_result(system='Windows', node='hostname', release='10', version='10.0.19045', machine='AMD64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': []}}]
None
Context for the issue:
No response