
BUG: "overflow encountered in divide" depending on the number of zeros #25097

Closed
bersbersbers opened this issue Nov 9, 2023 · 7 comments · Fixed by #25129
Labels
00 - Bug component: SIMD Issues in SIMD (fast instruction sets) code or machinery

Comments

@bersbersbers
Contributor

bersbersbers commented Nov 9, 2023

Describe the issue:

(Cross-post from https://stackoverflow.com/q/77451924/)

I am getting overflow RuntimeWarnings for an array of 9 elements but not for an array of 8 (or 200, ...) elements. It seems I get warnings iff I have more than 8 zeros and their number is not divisible by 4.

Reproduce the code example:

import numpy as np

# This works fine
np.zeros(1) / np.float64(1e-309)
np.zeros(4) / np.float64(1e-309)
np.zeros(7) / np.float64(1e-309)
np.zeros(8) / np.float64(1e-309)
np.zeros(12) / np.float64(1e-309)
np.zeros(16) / np.float64(1e-309)
np.zeros(100) / np.float64(1e-309)
np.zeros(200) / np.float64(1e-309)

# This gives "RuntimeWarning: overflow encountered in divide"
np.zeros(9) / np.float64(1e-309)

Error message:

<stdin>:1: RuntimeWarning: overflow encountered in divide
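To map which sizes trigger the warning on a given machine, a sketch like the following may help. It assumes an affected build (the report uses NumPy 1.26.1 with AVX-512 available). The helper name `sizes_that_warn` is hypothetical, and the exact list it prints depends on which SIMD target NumPy dispatches to.

```python
import warnings

import numpy as np


def sizes_that_warn(max_n=32, divisor=1e-309):
    """Return the sizes n for which np.zeros(n) / divisor emits a RuntimeWarning."""
    divisor = np.float64(divisor)  # 1e-309 is subnormal, so 1 / divisor overflows float64
    hits = []
    for n in range(1, max_n + 1):
        # Record warnings and force "warn" so a user-level np.seterr cannot hide them.
        with warnings.catch_warnings(record=True) as caught, np.errstate(all="warn"):
            warnings.simplefilter("always")
            result = np.zeros(n) / divisor
        # The stored values are zeros either way; only the FP status flags differ.
        assert not result.any()
        if any(issubclass(w.category, RuntimeWarning) for w in caught):
            hits.append(n)
    return hits


print(sizes_that_warn())
```

On releases that include the fix (1.26.3 and later, per the last comment in this thread), the printed list should be empty.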

Runtime information:

1.26.1
3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]

WARNING: threadpoolctl not found in system! Install it by pip install threadpoolctl. Once installed, try np.show_runtime again for more detailed build information
[{'numpy_version': '1.26.1',
  'python': '3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 '
            '64 bit (AMD64)]',
  'uname': uname_result(system='Windows', node='hostname', release='10', version='10.0.19045', machine='AMD64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2',
                                'AVX512F',
                                'AVX512CD',
                                'AVX512_SKX',
                                'AVX512_CLX',
                                'AVX512_CNL',
                                'AVX512_ICL'],
                      'not_found': []}}]

Context for the issue:

No response

seberg added the component: SIMD label Nov 9, 2023
@mattip
Member

mattip commented Nov 10, 2023

Sounds like it might be connected to mixed SIMD (8 elements) + normal (last element) code. @seiko2plus any thoughts?

@seberg
Member

seberg commented Nov 10, 2023

The reason will be that the SIMD operation fills up a partial vector with ones to avoid exactly this issue, but 1/1e-309 overflows to inf.

Not sure what can actually be done to have partial loads there if we really worry about this; filling with NaN would be safe w.r.t. FPEs, if it doesn't stall performance for some reason.

EDIT: If NaN doesn't work (generally), the only alternative would be to ensure the divisor is 1 in the padded lanes. Currently, for a / scalar, we ensure the dividend is 1 instead (the scalar would need to be padded with 1, not a).
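The padding mechanism described in this comment can be modeled in plain Python. This is a simplified sketch, not NumPy's kernel: `LANES = 4` and the helper `tail_overflows` are hypothetical, and Python floats stand in for SIMD lanes. It compares the current dividend padding of 1.0 with the two alternatives mentioned (NaN in the dividend, or 1 in the divisor).

```python
import math

LANES = 4  # hypothetical vector width in float64 lanes; the real width depends on the SIMD target


def tail_overflows(tail, scalar, dividend_fill=1.0, divisor_fill=None):
    """Return True if any lane of one padded partial-vector step of `a / scalar` overflows.

    The len(tail) valid lanes hold the real data; the unused lanes get `dividend_fill`
    in the dividend and `divisor_fill` in the divisor (None keeps the broadcast scalar).
    FP flags are sticky, so an overflow in a padded lane is reported even though
    that lane's result is never stored.
    """
    pad = LANES - len(tail)
    dividend = list(tail) + [dividend_fill] * pad
    divisor = [scalar] * len(tail) + [scalar if divisor_fill is None else divisor_fill] * pad
    return any(math.isfinite(a) and math.isinf(a / b) for a, b in zip(dividend, divisor))


tail = [0.0]  # e.g. the single element left over from np.zeros(9)
print(tail_overflows(tail, 1e-309))                          # True: 1.0 / 1e-309 -> inf
print(tail_overflows(tail, 1e-309, dividend_fill=math.nan))  # False: NaN / x is NaN
print(tail_overflows(tail, 1e-309, divisor_fill=1.0))        # False: 1.0 / 1.0 == 1.0
```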

@seiko2plus
Member

seiko2plus commented Nov 11, 2023

> The reason will be that the SIMD operation fills up a partial vector with ones to avoid exactly this issue, but 1/1e-309 overflows to inf.

Yes, that's the main reason. It's also why handling the remaining lanes of such basic operations via manual SIMD is important: it avoids unexpected compiler optimizations that may raise FP errors, mainly with a wide range of Clang versions once the compiler detects a clearly contiguous path.

> Not sure what can actually be done to have partial loads there if we really worry about this; filling with NaN would be safe w.r.t. FPEs

What actually matters is filling the remaining lanes of the divisor vector with ones, to avoid a divide-by-zero FPE; a dividend padded with NaN or zero should be okay.

> if it doesn't stall performance for some reason.

It doesn't affect performance. Even scalarising the remaining lanes would be okay, since there's another loop that handles the full-vector loads and stores.
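A rough sketch of the "scalarise the remaining lanes" approach, again in plain Python with a hypothetical `LANES = 4` and a hypothetical helper name: full groups stand in for SIMD vectors, and the leftover elements are handled one at a time, so no padded lane is ever computed.

```python
import math

LANES = 4  # hypothetical vector width in float64 lanes


def divide_by_scalar(values, scalar):
    """Sketch of a vector main loop plus a scalarised remainder for `values / scalar`.

    Full groups of LANES elements stand in for SIMD vectors; the remaining
    len(values) % LANES elements are processed one at a time, so no padded
    lanes are computed. Returns (result, overflowed).
    """
    out = []
    overflowed = False
    n_full = len(values) - len(values) % LANES
    for i in range(0, n_full, LANES):  # "vector" loop: full vectors only
        for x in values[i:i + LANES]:
            q = x / scalar
            overflowed |= math.isinf(q) and math.isfinite(x)
            out.append(q)
    for x in values[n_full:]:  # scalar remainder: no padding needed
        q = x / scalar
        overflowed |= math.isinf(q) and math.isfinite(x)
        out.append(q)
    return out, overflowed


print(divide_by_scalar([0.0] * 9, 1e-309))  # nine zeros, no spurious overflow flag
```

In C, the caveat raised earlier in this thread still applies: a compiler may auto-vectorize such a scalar tail loop and reintroduce padded lanes.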

I think the following patch can fix this issue:

diff --git a/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src b/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src
index 30111258d6..4f083a5efa 100644
--- a/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src
+++ b/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src
@@ -138,7 +138,7 @@ NPY_NO_EXPORT void NPY_CPU_DISPATCH_CURFX(@TYPE@_@kind@)
                 npyv_store_@sfx@((@type@*)(dst + vstep), r1);
             }
             for (; len > 0; len -= hstep, src0 += vstep, dst += vstep) {
-            #if @is_div@ || @is_mul@
+            #if @is_mul@
                 npyv_@sfx@ a = npyv_load_till_@sfx@((const @type@*)src0, len, 1.0@c@);
             #else
                 npyv_@sfx@ a = npyv_load_tillz_@sfx@((const @type@*)src0, len);
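As we understand NumPy's universal intrinsics, `npyv_load_till_*` loads the first `len` lanes and fills the rest with the given value, while `npyv_load_tillz_*` fills them with zero. The Python stand-ins below (`load_till`, `load_tillz` and `tail_quotients` are hypothetical names, with a hypothetical width of 4 lanes) illustrate what the patch changes for the `a / scalar` tail: the padded lanes compute `0.0 / scalar` instead of `1.0 / scalar`.

```python
import math

LANES = 4  # hypothetical vector width in float64 lanes


def load_till(src, nlane, fill):
    """Python stand-in for npyv_load_till_*: first nlane elements, remaining lanes = fill."""
    n = min(nlane, LANES)
    return list(src[:n]) + [fill] * (LANES - n)


def load_tillz(src, nlane):
    """Python stand-in for npyv_load_tillz_*: first nlane elements, remaining lanes = 0.0."""
    return load_till(src, nlane, 0.0)


def tail_quotients(src, scalar, patched):
    """Divide one partial vector by a broadcast scalar, before or after the proposed patch."""
    a = load_tillz(src, len(src)) if patched else load_till(src, len(src), 1.0)
    return [x / scalar for x in a]


before = tail_quotients([0.0], 1e-309, patched=False)
after = tail_quotients([0.0], 1e-309, patched=True)
print(any(math.isinf(q) for q in before))  # True: padded 1.0 / 1e-309 overflows
print(any(math.isinf(q) for q in after))   # False: padded 0.0 / 1e-309 is 0.0
```

If the scalar itself is 0, the zero-padded lanes compute 0 / 0 and can raise an invalid-value flag, which is the concern raised in the next comment.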

@seberg
Member

seberg commented Nov 11, 2023

> even scalarising the remaining lanes

Right, maybe that is the sane solution. Padding with 0 isn't quite right either because 0 / 0 is different from nonzero / 0: FPE will be "invalid value" vs. "overflow".
Padding with NaN should work for everything (it never triggers an FPE, though comparisons need care). But it's not nice if NaNs might stall things in some other way.
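The difference between padding with 0 and padding with NaN can be checked with a small model of the IEEE 754 flags, which classifies finite nonzero / 0 as divide-by-zero rather than overflow. The helpers `div_flags` and `vector_flags` and the 4-lane width are hypothetical, and only the divide-by-zero, invalid and overflow flags are modeled.

```python
import math


def div_flags(a, b):
    """IEEE 754 flags raised by a / b, modeling only divide-by-zero, invalid and overflow."""
    if math.isnan(a) or math.isnan(b):
        return set()  # quiet NaN operands propagate without raising a flag
    if b == 0.0:
        if a == 0.0:
            return {"invalid"}  # 0 / 0
        return set() if math.isinf(a) else {"divide-by-zero"}  # finite nonzero / 0
    if math.isinf(a) and math.isinf(b):
        return {"invalid"}  # inf / inf
    return {"overflow"} if math.isinf(a / b) and math.isfinite(a) else set()


def vector_flags(valid, scalar, fill, lanes=4):
    """Union of flags for one partial vector `valid / scalar` padded with `fill`.

    Hardware flags are sticky, so padded lanes contribute flags even though
    their results are discarded. Note: ordered comparisons such as < on NaN
    operands may signal "invalid", which is why comparisons need separate care.
    """
    padded = list(valid) + [fill] * (lanes - len(valid))
    flags = set()
    for a in padded:
        flags |= div_flags(a, scalar)
    return flags


# A genuine 1 / 0 in the valid lane should raise only divide-by-zero.
print(sorted(vector_flags([1.0], 0.0, fill=0.0)))       # ['divide-by-zero', 'invalid']
print(sorted(vector_flags([1.0], 0.0, fill=math.nan)))  # ['divide-by-zero']
```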

@seiko2plus
Member

> Right, maybe that is the sane solution.

Yes, but it may open the door to compiler issues, likely spurious divide-by-zero FP errors, as I mentioned.

> different from nonzero / 0: FPE will be "invalid value" vs. "overflow".

Agreed. I think you mean it will raise two separate exceptions, invalid-value and divide-by-zero, rather than only divide-by-zero.

> Padding with NaN

Looks like a practical solution to me; #25129 applies it.

> But it's not nice if NaNs might stall things in some other way.

The value used within the padded vector can't be generalized; it should always be chosen based on the nature of the operation we're dealing with.
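To illustrate that the padding value depends on the operation, here is a hypothetical table of fill values for array-array kernels. The zero fill for add/subtract and the 1.0 fill for multiply loosely mirror the `load_tillz` and `load_till(..., 1.0)` branches in the diff in this thread; the divide entry (dividend 0.0, divisor 1.0) follows the suggestion to pad the divisor with ones. The names `PAD_VALUES` and `padded_lane_is_quiet` are assumptions, and the check only asks whether the padded lane itself yields a finite result.

```python
import math
import operator

# Hypothetical per-operation padding values: (operation, first-operand fill, second-operand fill).
PAD_VALUES = {
    "add": (operator.add, 0.0, 0.0),
    "subtract": (operator.sub, 0.0, 0.0),
    "multiply": (operator.mul, 1.0, 1.0),
    "divide": (operator.truediv, 0.0, 1.0),
}


def padded_lane_is_quiet(op, fill_a, fill_b):
    """True if computing op(fill_a, fill_b) in a padded lane gives a finite result.

    Python raises ZeroDivisionError where hardware would set a divide-by-zero or
    invalid flag, so that case counts as not quiet.
    """
    try:
        return math.isfinite(op(fill_a, fill_b))
    except ZeroDivisionError:
        return False


for name, (op, fill_a, fill_b) in PAD_VALUES.items():
    print(name, padded_lane_is_quiet(op, fill_a, fill_b))

# A fill that is safe for one operation can be wrong for another:
print(padded_lane_is_quiet(operator.truediv, 1.0, 1e-309))  # False: overflows to inf
print(padded_lane_is_quiet(operator.truediv, 0.0, 0.0))     # False: 0 / 0
```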

@bersbersbers
Contributor Author

Thanks everyone!

@bersbersbers
Contributor Author

This is fixed in 1.26.3 and later.

4 participants