
TST: Add new tests for array coercion #16571


Merged: 13 commits into numpy:master on Jun 19, 2020

Conversation

@seberg (Member) commented Jun 11, 2020

Our current array-coercion tests are not all that bad; however, they are somewhat scattered around.
This adds two main (extensive) tests. First, that our scalars (not Python scalars!) behave identically when assigned or coerced, compared to normal casting.

Second, it adds some additional tests for nested arrays.

The main change is that our scalars (with the one exception of void, because it is both structured and not structured) always behave the same.

There are two larger changes that are a bit less obvious:

  • np.array([np.float64("nan")], dtype=int) and some other things (identical to the datetime paths here) would use float(item) or int(item) and then fail for NaN. For our scalars, in my branch these do the same thing as casting, which currently means they succeed.
  • Coercing float128 to complex256 used to lose precision, I believe; it will not lose precision now, since it uses the normal casting machinery.

To be clear, the handling of Python scalars does not change in my branch. There are a couple of potential corner cases to map out, but please do not let that stop this here. It is probably easier to expand on it in a second PR anyway.

  • Subclasses of Python scalars are unmodified by my changes in most cases, because they are considered "unknown" (except during dtype discovery).
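
To illustrate the first bullet above, here is a small hedged sketch of the two code paths. The exact integer produced when casting NaN is undefined and platform dependent, and newer NumPy versions may emit a RuntimeWarning for it.

```python
import numpy as np

# The int(item) path would call int() on the NaN scalar and fail;
# the casting machinery instead performs a C-level cast (the
# resulting integer value is undefined, but no error is raised).
res = np.array(np.float64("nan")).astype(np.int64)
assert res.dtype == np.int64

# By contrast, Python's int() on a NaN float raises:
try:
    int(float("nan"))
except ValueError:
    print("int(NaN) raises ValueError")
```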

I am open to reorganizing this a bit, but it's pretty tedious, and I am not sure it's worth optimizing the code to avoid repeating patterns much, considering these are tests...

seberg added 3 commits June 10, 2020 20:22
These tests have many xfails (some technically maybe not correct),
which describe in detail what will change when merging the
array-coercion changes; since all of those xfails are going
to be removed.
seberg added 2 commits June 11, 2020 16:07
These should never happen and rightly should lead to undefined
behaviour (and preferably errors), but they exercise some trickier
code branches and should probably not crash.
@seberg (Member, Author) commented Jun 12, 2020

@mattip do you know off the top of your head why int(np.complex128(3)) fails on PyPy? Is it because the ComplexWarning is causing problems?

It's not an issue; I will just xfail the test on PyPy for now, I think. Just curious. It will probably work on my branch anyway.

PyPy seems to have issues with int(numpy_complex), maybe because
it gives a warning during conversion (Python does not define it).
So simply mark it as xfail; it should work in my branch.
Also some smaller cleanups.
@mattip (Member) commented Jun 12, 2020

Why does int(np.complex128(3)) fail on PyPy?

Calling int(np.complex128(3)) fails, but directly calling __int__ succeeds. Strange, since the internal PyPy code for int seems to look up and call __int__.

$ pypy3 -Walways
Python 3.6.9 (2ad108f17bdb, Apr 07 2020, 02:59:05)
[PyPy 7.3.1 with GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> import numpy as np
>>>> int(np.complex128(3)) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert complex to int
>>>> np.complex64(3)
(3+0j)
>>>> np.complex64(3).__int__()
__main__:1: ComplexWarning: Casting complex values to real discards the imaginary part
3
>>>> np.complex128(3).__int__()
__main__:1: ComplexWarning: Casting complex values to real discards the imaginary part
3

@mattip (Member) commented Jun 12, 2020

This is a PyPy bug in subclasses of complex, now fixed and will be part of the next release. Thanks for pointing it out.

@charris (Member) commented Jun 16, 2020

Hmm, running the tests with -vv isn't as informative as might be wished, the parametrized values are simply numbered so that it is difficult to tell what failed. Let me think about this for a while...

@seberg (Member, Author) commented Jun 16, 2020

I think there was some way to name the parameters? I have to look up how it worked though...

@charris (Member) commented Jun 16, 2020

I think there was some way to name the parameters?

I couldn't think of good names :) It also wouldn't fit nicely into your generator approach. Maybe if the yielded values could simply have a comment scalar0, scalar1, ... to mark them. But one would still need to look at the code to see what failed and it would require more maintenance effort.

@charris (Member) commented Jun 16, 2020

Could have the generator functions return a (named)tuple of two lists, one of names and the other of values. Could then pass that as scalar.names, scalar.values, or use it to make a pytest fixture. Alternatively, return a list of tuples and unzip them:

>>> l = [(1,2), (3,4), (8,9)]
>>> list(zip(*l))
[(1, 3, 8), (2, 4, 9)]

@seberg (Member, Author) commented Jun 16, 2020

OK, I want to look into a few new tests (and clean up the fallout that always comes when you test edge cases of edge cases) and probably a couple of small additional benchmarks, just to document the changes.

The named tuple I roughly know how to do. I am not even sure the yields are ideal; I could also just make lists, but I doubt that would help much. Some custom fixture, or hypothesis-like things, could also work I guess, but I'd have to find out how to do that first...

seberg added 2 commits June 16, 2020 12:47
The failures will go away, since this must again be related to
implementing int() and float().
@charris (Member) commented Jun 16, 2020

Named tuples don't look useful for this application. I think lists are an option.

assert arr.shape == (1,)

def test_char_special_case(self):
arr = np.array("string", dtype="c")
Review comment (Member):

Is this operation the same as arr = np.array("string", dtype="S1")?
If I use dtype "S1" it gives a different result:

>>> arr = np.array("string", dtype="S1")
>>> arr
array(b's', dtype='|S1')

Review reply (Member, Author):

Yeah, it's weird: "c" is the same as "S1", but with the special case that arr.dtype.char is c, and that here the result gives 6 individual characters instead of the single "string". We should get rid of it one of these days IMO.
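
For concreteness, the difference described here can be sketched like this (standard NumPy behaviour):

```python
import numpy as np

# dtype="c" special-cases strings: the string is split into
# individual one-byte characters, giving a length-6 array.
arr_c = np.array("string", dtype="c")

# dtype="S1" instead truncates the whole string to a single byte.
arr_s = np.array("string", dtype="S1")

print(arr_c.shape)       # (6,)
print(arr_c.dtype.char)  # 'c'
print(arr_s.shape)       # () -- a zero-dimensional b's'
```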

Review comment (Member):

IIRC, we have been planning to get rid of c for many years. It may be used somewhere, though. I think it was originally for data input for old applications.

Review reply (Member, Author):

Yeah, we definitely should get rid of it; I just didn't feel like looking into it too deeply, because it seemed just as easy to add the special case for now. I think c is actually partially deprecated on the C-API side, but not on the Python side...

nested = [nested]

arr = np.array(nested, dtype='c')
assert arr.shape == (1,) * (np.MAXDIMS - 1) + (6,)
Review comment (Member):

This test will also fail if I replace dtype='c' with dtype='S1'.

yield np.uint8(2)
yield np.uint16(2)
yield np.uint32(2)
yield np.uint64(2)
Review comment (Member):

Is there a reason np.bool8 is not included here, or is it not necessary?

Review reply (Member, Author):

Thanks; it seems I also forgot strings while refactoring...

Review reply (Member, Author):

Argg, I commented it out for now; I had not thought of gh-9875, and my array-coercion PR of course homogenizes behaviour in that regard, although mainly for the numpy types, not for general strings.
We need to figure it out: either rip off the band-aid and fix it, or add some type of band-aid before 1.20. For now I will open an issue for this and the bad float cases...

@seberg (Member, Author) commented Jun 16, 2020

Thought I remembered that there was something: https://docs.pytest.org/en/latest/example/parametrize.html#paramexamples: you can yield a pytest.param object, which can have an id.
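
A minimal sketch of that pattern (the function and test names here are illustrative, not the PR's actual code):

```python
import numpy as np
import pytest

def scalar_instances():
    # Yield pytest.param objects so that `pytest -vv` prints a
    # readable id for each case instead of scalar0, scalar1, ...
    for scalar in (np.uint8(2), np.float64(2.0), np.complex128(2.0)):
        yield pytest.param(scalar, id=scalar.dtype.name)

@pytest.mark.parametrize("scalar", scalar_instances())
def test_coercion_keeps_dtype(scalar):
    # Coercing a NumPy scalar to an array should preserve its dtype.
    assert np.array(scalar).dtype == scalar.dtype
```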

@charris (Member) commented Jun 16, 2020

Thought I remembered that there was something:

Looks like that should work.

@charris (Member) commented Jun 16, 2020

Looks much better. Strings need ids, and it looks like the nested parametrized cases in TestTimeScalars could also use some.

@seberg (Member, Author) commented Jun 18, 2020

@charris I think I improved the cases you were looking at, with the last commit?

@charris charris merged commit f253a7e into numpy:master Jun 19, 2020
@charris (Member) commented Jun 19, 2020

Thanks Sebastian.

@seberg seberg deleted the new-coercion-tests branch June 19, 2020 14:12