
TST: Add new tests for array coercion #16571


Merged: 13 commits into numpy:master on Jun 19, 2020

Conversation

@seberg (Member) commented Jun 11, 2020

Our current array-coercion tests are not all that bad; however, they are somewhat scattered around.
This adds two main (extensive) tests. First, that our scalars (not Python scalars!) behave identically when assigned or coerced, compared to normal casting.

Second, it adds some additional tests for nested arrays.

The main change is that our scalars (with the one exception of void, because it is both structured and not structured) always behave the same.

There are two larger changes that are a bit less obvious:

  • np.array([np.float64("nan")], dtype=int) and some other things (identical to the datetime paths here) would use float(item) or int(item) and then fail for NaN. For our scalars, in my branch these do the same thing as casting, which currently means they succeed.
  • Coercing float128 to complex256 used to lose precision, I believe; it will not lose precision now, since it uses the normal casting machinery.

To be clear, the handling of Python scalars does not change in my branch. There are a couple of potential corner cases to map out, but please do not let that stop this here. It is probably easier to expand on it in a second PR anyway.

  • Subclasses of Python scalars are unmodified by my changes in most cases, because they are considered "unknown" (except during dtype discovery).
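
To illustrate the first bullet above, here is a small hedged sketch of the two code paths. The exact integer produced when casting NaN is undefined and platform dependent, and newer NumPy versions may emit a RuntimeWarning for it.

```python
import numpy as np

# The int(item) path would call int() on the NaN scalar and fail;
# the casting machinery instead performs a C-level cast (the
# resulting integer value is undefined, but no error is raised).
res = np.array(np.float64("nan")).astype(np.int64)
assert res.dtype == np.int64

# By contrast, Python's int() on a NaN float raises:
try:
    int(float("nan"))
except ValueError:
    print("int(NaN) raises ValueError")
```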

I am open to reorganizing this a bit, but it's pretty tedious, and I am not sure it's worth optimizing the code to avoid repeating patterns much, considering these are tests...

seberg added 3 commits June 10, 2020 20:22
These tests have many xfails (some technically maybe not correct),
which describe in detail what will change when merging the
array-coercion changes; since all of those xfails are going
to be removed.
seberg added 2 commits June 11, 2020 16:07
These should never happen and rightly should lead to undefined
behaviour (and preferably errors), but they exercise some trickier
code branches and should probably not crash.
@seberg (Member, Author) commented Jun 12, 2020

@mattip do you know off the top of your head why int(np.complex128(3)) fails on PyPy? Is it because the ComplexWarning is causing problems?

It's not an issue; I will just xfail the test on PyPy for now, I think. Just curious. It will probably work on my branch anyway.

PyPy seems to have issues with int(numpy_complex), maybe because
it gives a warning during conversion (Python does not define it).
So simply mark it as xfail; it should work in my branch.
Also some smaller cleanups.
@mattip (Member) commented Jun 12, 2020

Why does int(np.complex128(3)) fail on PyPy?

Calling int(np.complex128(3)) fails, but directly calling __int__ succeeds. Strange, since the internal PyPy code for int seems to look up and call __int__.

$ pypy3 -Walways
Python 3.6.9 (2ad108f17bdb, Apr 07 2020, 02:59:05)
[PyPy 7.3.1 with GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> import numpy as np
>>>> int(np.complex128(3)) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert complex to int
>>>> np.complex64(3)
(3+0j)
>>>> np.complex64(3).__int__()
__main__:1: ComplexWarning: Casting complex values to real discards the imaginary part
3
>>>> np.complex128(3).__int__()
__main__:1: ComplexWarning: Casting complex values to real discards the imaginary part
3

@mattip (Member) commented Jun 12, 2020

This is a PyPy bug in subclasses of complex, now fixed and will be part of the next release. Thanks for pointing it out.

@charris (Member) commented Jun 16, 2020

Hmm, running the tests with -vv isn't as informative as might be wished, the parametrized values are simply numbered so that it is difficult to tell what failed. Let me think about this for a while...

@seberg (Member, Author) commented Jun 16, 2020

I think there was some way to name the parameters? I have to look up how it worked though...

@charris (Member) commented Jun 16, 2020

I think there was some way to name the parameters?

I couldn't think of good names :) It also wouldn't fit nicely into your generator approach. Maybe if the yielded values could simply have a comment scalar0, scalar1, ... to mark them. But one would still need to look at the code to see what failed and it would require more maintenance effort.

@charris (Member) commented Jun 16, 2020

Could have the generator functions return a (named)tuple of two lists, one of names and the other of values. Could then pass that as scalar.names, scalar.values, or use it to make a pytest fixture. Alternatively, return a list of tuples and unzip them:

>>> l = [(1,2), (3,4), (8,9)]
>>> list(zip(*l))
[(1, 3, 8), (2, 4, 9)]

@seberg (Member, Author) commented Jun 16, 2020

OK, I want to look into a few new tests (and clean up the fallout that always comes when you test edge cases of edge cases) and probably a couple of small additional benchmarks, just to document the changes.

The named tuple I roughly know how to do. I am not even sure the yields are ideal; I could also just make lists, but I doubt that would help much. Some custom fixture, or hypothesis-like things, could also work I guess, but I'd have to find out how to do that first...

seberg added 2 commits June 16, 2020 12:47
The failures will go away, since this must again be related to
implementing int() and float().
@charris (Member) commented Jun 16, 2020

Named tuples don't look useful for this application. I think lists are an option.

assert arr.shape == (1,)

def test_char_special_case(self):
arr = np.array("string", dtype="c")
Review comment (Member):

Is this operation the same as arr = np.array("string", dtype="S1")?
If I use dtype "S1" it gives a different result:

>>> arr = np.array("string", dtype="S1")
>>> arr
array(b's', dtype='|S1')

Review reply (Member, Author):

Yeah, it's weird: "c" is the same as "S1", but with the special case that arr.dtype.char is c, and that here the result gives 6 individual characters instead of the single "string". We should get rid of it one of these days IMO.
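
For concreteness, the difference described here can be sketched like this (standard NumPy behaviour):

```python
import numpy as np

# dtype="c" special-cases strings: the string is split into
# individual one-byte characters, giving a length-6 array.
arr_c = np.array("string", dtype="c")

# dtype="S1" instead truncates the whole string to a single byte.
arr_s = np.array("string", dtype="S1")

print(arr_c.shape)       # (6,)
print(arr_c.dtype.char)  # 'c'
print(arr_s.shape)       # () -- a zero-dimensional b's'
```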

Review comment (Member):

IIRC, we have been planning to get rid of c for many years. It may be used somewhere, though. I think it was originally for data input for old applications.

Review reply (Member, Author):

Yeah, we definitely should get rid of it; I just didn't feel like looking into it too deeply, because it seemed just as easy to add the special case for now. I think c is actually partially deprecated on the C-API side, but not on the Python side...

nested = [nested]

arr = np.array(nested, dtype='c')
assert arr.shape == (1,) * (np.MAXDIMS - 1) + (6,)
Review comment (Member):

This test will also fail if I replace dtype='c' with dtype='S1'.

yield np.uint8(2)
yield np.uint16(2)
yield np.uint32(2)
yield np.uint64(2)
Review comment (Member):

Is there a reason np.bool8 is not included here, or is it not necessary?

Review reply (Member, Author):

Thanks; it seems I also forgot strings while refactoring...

Review reply (Member, Author):

Argg, I commented it out for now; I had not thought of gh-9875, and my array-coercion PR of course homogenizes behaviour in that regard, although mainly for the numpy types, not for general strings.
We need to figure it out: either rip off the band-aid and fix it, or add some type of band-aid before 1.20. For now I will open an issue for this and the bad float cases...

@seberg (Member, Author) commented Jun 16, 2020

Thought I remembered that there was something: https://docs.pytest.org/en/latest/example/parametrize.html#paramexamples: you can yield a pytest.param object, which can have an id.
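
A minimal sketch of that pattern (the function and test names here are illustrative, not the PR's actual code):

```python
import numpy as np
import pytest

def scalar_instances():
    # Yield pytest.param objects so that `pytest -vv` prints a
    # readable id for each case instead of scalar0, scalar1, ...
    for scalar in (np.uint8(2), np.float64(2.0), np.complex128(2.0)):
        yield pytest.param(scalar, id=scalar.dtype.name)

@pytest.mark.parametrize("scalar", scalar_instances())
def test_coercion_keeps_dtype(scalar):
    # Coercing a NumPy scalar to an array should preserve its dtype.
    assert np.array(scalar).dtype == scalar.dtype
```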

@charris (Member) commented Jun 16, 2020

Thought I remembered that there was something:

Looks like that should work.

@charris (Member) commented Jun 16, 2020

Looks much better. Strings need ids, and it looks like the nested parametrized cases in TestTimeScalars could also use some.

@seberg (Member, Author) commented Jun 18, 2020

@charris I think I improved the cases you were looking at, with the last commit?

@charris charris merged commit f253a7e into numpy:master Jun 19, 2020
@charris (Member) commented Jun 19, 2020

Thanks Sebastian.

@seberg seberg deleted the new-coercion-tests branch June 19, 2020 14:12