BUG: "overflow encountered in divide" depending on the number of zeros #25097

bersbersbers · 2023-11-09T13:16:15Z

Describe the issue:

(Cross-post from https://stackoverflow.com/q/77451924/)

I am getting overflow RuntimeWarnings for an array of 9 elements but not for an array of 8 (or 200, ...) elements. It seems I get warnings iff I have more than 8 zeros and their number is not divisible by 4.
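The pattern described above can be checked against the array sizes listed in the reproducer of this report. The sketch below only encodes the reporter's hypothesis as a predicate and tests it on those sizes. The function name `expects_warning` is made up for illustration and is not part of NumPy.

```python
# Sizes taken from the reproducer in this report.
no_warning = [1, 4, 7, 8, 12, 16, 100, 200]
warning = [9]


def expects_warning(n):
    """Reporter's hypothesis: more than 8 elements, count not divisible by 4."""
    return n > 8 and n % 4 != 0


# The hypothesis is consistent with every size listed in the report.
assert not any(expects_warning(n) for n in no_warning)
assert all(expects_warning(n) for n in warning)

# Sizes up to 20 that the hypothesis predicts would warn on an affected build
# (a prediction only, not something verified here).
predicted = [n for n in range(1, 21) if expects_warning(n)]
print(predicted)
```

This does not explain the warning; it only shows that the stated rule fits the reported data points.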

Reproduce the code example:

import numpy as np

# This works fine
np.zeros(1) / np.float64(1e-309)
np.zeros(4) / np.float64(1e-309)
np.zeros(7) / np.float64(1e-309)
np.zeros(8) / np.float64(1e-309)
np.zeros(12) / np.float64(1e-309)
np.zeros(16) / np.float64(1e-309)
np.zeros(100) / np.float64(1e-309)
np.zeros(200) / np.float64(1e-309)

# This gives "RuntimeWarning: overflow encountered in divide"
np.zeros(9) / np.float64(1e-309)

Error message:

<stdin>:1: RuntimeWarning: overflow encountered in divide

Runtime information:

1.26.1
3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]

WARNING: threadpoolctl not found in system! Install it by pip install threadpoolctl. Once installed, try np.show_runtime again for more detailed build information
[{'numpy_version': '1.26.1',
'python': '3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 '
'64 bit (AMD64)]',
'uname': uname_result(system='Windows', node='hostname', release='10', version='10.0.19045', machine='AMD64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': []}}]
None

Context for the issue:

No response


mattip · 2023-11-10T07:01:25Z

Sounds like it might be connected to mixed SIMD (8 elements) + normal (last element) code. @seiko2plus any thoughts?

seberg · 2023-11-10T07:38:32Z

The reason will be that the SIMD operation fills up a partial vector with ones to avoid exactly this issue, but 1/1e-309 overflows to inf.

Not sure what can actually be done to have partial loads there if we really worry about this; filling with NaN would be safe w.r.t. FPEs, if it doesn't stall performance for some reason.

EDIT: If NaN doesn't work (generally), the only alternative would be to ensure the divisor is 1, whereas currently we ensure the dividend is 1 for a / scalar (the scalar would need to be padded with 1, not a).
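The mechanism described here can be illustrated with a toy model in plain Python. This is a sketch only, not NumPy's actual code: `load_till` and `divide_lanes` are hypothetical helpers, and the lane count of 4 is an assumption chosen for the example.

```python
import math


def load_till(values, lanes, fill):
    """Toy partial SIMD load: copy `values`, pad up to `lanes` with `fill`."""
    return list(values) + [fill] * (lanes - len(values))


def divide_lanes(vector, scalar):
    """Divide every lane, padding included, as one SIMD divide would."""
    return [lane / scalar for lane in vector]


tiny = 1e-309   # subnormal divisor from the report
tail = [0.0]    # one leftover element, e.g. the 9th element of np.zeros(9)

padded_with_one = divide_lanes(load_till(tail, 4, 1.0), tiny)
padded_with_nan = divide_lanes(load_till(tail, 4, math.nan), tiny)

# Only the first len(tail) lanes would be stored back, but the padding lanes
# are still computed, and 1.0 / 1e-309 exceeds the float64 range.
print(padded_with_one)   # [0.0, inf, inf, inf]
print(padded_with_nan)   # [0.0, nan, nan, nan]
```

In real hardware the overflow in a padding lane sets the floating-point status flag, which is what surfaces as the RuntimeWarning even though no stored result is inf.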

seiko2plus · 2023-11-11T17:37:22Z

> The reason will be that the SIMD operation fills up a partial vector with ones to avoid exactly this issue, but 1/1e-309 overflows to inf.

Yes, that is the main reason. Handling the remaining lanes of such a basic operation via manual SIMD is important to avoid unexpected compiler optimizations that may raise FP errors, mainly across a wide range of Clang versions once they detect a clear contiguous path.

> Not sure what can actually be done to have partial loads there if we really worry about this, filling with NaN would be safe w.r.t. FPEs

Filling the remaining lanes of the vector divisor with ones is what actually matters to avoid a division-by-zero FPE, while a dividend of NaN or zero should be okay.

> if it doesn't stall performance for some reason.

It doesn't affect performance. Even scalarising the remaining lanes would be okay, since there's another loop that handles the full-vector loads and stores.

I think the following patch can fix this issue:

diff --git a/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src b/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src
index 30111258d6..4f083a5efa 100644
--- a/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src
+++ b/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src
@@ -138,7 +138,7 @@ NPY_NO_EXPORT void NPY_CPU_DISPATCH_CURFX(@TYPE@_@kind@)
                 npyv_store_@sfx@((@type@*)(dst + vstep), r1);
             }
             for (; len > 0; len -= hstep, src0 += vstep, dst += vstep) {
-            #if @is_div@ || @is_mul@
+            #if @is_mul@
                 npyv_@sfx@ a = npyv_load_till_@sfx@((const @type@*)src0, len, 1.0@c@);
             #else
                 npyv_@sfx@ a = npyv_load_tillz_@sfx@((const @type@*)src0, len);

seberg · 2023-11-11T18:34:43Z

> even scalarising the remaining lanes

Right, maybe that is the sane solution. Padding with 0 isn't quite right either because 0 / 0 is different from nonzero / 0: FPE will be "invalid value" vs. "overflow".
Padding with NaN should work for everything (it never triggers an FPE, although comparisons need care). But it's not nice if NaNs might stall things in some other way.
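To compare the padding candidates, here is a simplified model of which IEEE 754 exception a single division would signal. It is a sketch under stated assumptions: `division_fpe` is a hypothetical helper, it treats every NaN as quiet, and it ignores the underflow and inexact flags. In IEEE 754 terms, a finite nonzero value divided by zero signals divide-by-zero, while 0 / 0 signals invalid.

```python
import math


def division_fpe(a, b):
    """Return the IEEE 754 exception a / b would signal, or None.

    Simplified: quiet NaNs only; underflow and inexact are not modelled.
    """
    if math.isnan(a) or math.isnan(b):
        return None                  # a quiet NaN propagates silently
    if b == 0.0:
        if a == 0.0:
            return "invalid"         # 0 / 0
        if math.isinf(a):
            return None              # inf / 0 is exactly inf
        return "divide-by-zero"      # finite nonzero / 0
    if math.isinf(a) and math.isinf(b):
        return "invalid"             # inf / inf
    if math.isinf(a) or math.isinf(b):
        return None                  # exact result (inf or 0)
    if math.isinf(a / b):
        return "overflow"            # finite operands, result out of range
    return None


# Candidate padding values for the dividend, tried against a few scalars.
for pad in (1.0, 0.0, math.nan):
    flags = {scalar: division_fpe(pad, scalar) for scalar in (1e-309, 0.0, 2.0)}
    print(pad, flags)
```

Under this model, padding with 1.0 overflows for the tiny divisor, padding with 0.0 is invalid when the scalar is zero, and only NaN padding stays silent for all three scalars.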

seiko2plus · 2023-11-13T00:27:01Z

> Right, maybe that is the sane solution.

Yes, but it may open the door to compiler bugs, likely divide-by-zero FP errors, as I mentioned.

> different from nonzero / 0: FPE will be "invalid value" vs. "overflow".

Agreed. I think you mean it will lead to raising two separate exceptions, invalid-value and divide-by-zero, rather than only "divide-by-zero".

> Padding with NaN

Looks like a practical solution to me; #25129 applies it.

> But it's not nice if NaNs might stall things in some other way.

The value used to pad the vector can't be generalized; it should always be chosen based on the nature of the operation we are dealing with.

bersbersbers · 2023-11-15T21:36:33Z

Thanks everyone!

bersbersbers · 2024-04-11T06:57:53Z

This is fixed in 1.26.3 and later.

bersbersbers added the 00 - Bug label Nov 9, 2023

seberg added the "component: SIMD" label (issues in SIMD, i.e. fast instruction sets, code or machinery) Nov 9, 2023

seiko2plus mentioned this issue Nov 13, 2023

BUG: Fix FP overflow error in division when the divisor is scalar #25129

Merged

seberg closed this as completed in #25129 Nov 15, 2023

charris mentioned this issue Nov 19, 2023

BUG: Fix FP overflow error in division when the divisor is scalar #25191

Merged
