TST: Add new tests for array coercion #16571
These tests have many xfails (some maybe not technically correct), which describe in detail what will change when the array-coercion changes are merged, since all of those xfails are then going to be removed.
These should never happen and rightly should lead to undefined behaviour (and preferably errors), but they exercise some more tricky code branches and should probably not crash.
@mattip do you know off the top of your head why It's not an issue, I will just xfail the test on PyPy for now, I think. Just curious. It will probably work on my branch anyway.
PyPy seems to have issues with int(numpy_complex), maybe because it gives a warning during conversion (Python does not define it). So simply mark it as xfail; it should work in my branch. Also some smaller cleanups.
Calling
This is a PyPy bug in subclasses of complex, now fixed and will be part of the next release. Thanks for pointing it out.
Hmm, running the tests with
I think there was some way to name the parameters? Have to look up how it worked though...
I couldn't think of good names :) It also wouldn't fit nicely into your generator approach. Maybe if the yielded values could simply have a comment
Could have the generator functions return a (named)tuple of two lists, one of names and the other of values. Could then pass that as
OK, I want to look into a few new tests (and clean up the fallout that always comes if you test edge-cases of edge-cases) and probably a couple of small, additional benchmarks just to document the changes. The named tuple I know roughly how to do. I am not even sure the yields are perfect; I could also just make lists, but I doubt that would help much. Some custom fixture, or Hypothesis-like things, could also work I guess, but I have to find out how to do that first...
The failures will go away, since this must again be related to implementing int() and float().
Named tuples don't look useful for this application. I think lists are an option.
        assert arr.shape == (1,)

    def test_char_special_case(self):
        arr = np.array("string", dtype="c")
Is this operation the same as arr = np.array("string", dtype="S1")? If I use S1 as the dtype, it gives a different result:
>>> arr = np.array("string", dtype="S1")
>>> arr
array(b's', dtype='|S1')
Yeah, it's weird: "c" is the same as "S1", but with the special case that arr.dtype.char is c and that here the result gives 6 individual characters instead of the "string". We should get rid of it one of these days IMO.
IIRC, we have been planning to get rid of c for many years. It may be used somewhere, though. I think it was originally for data input for old applications.
Yeah, we definitely should get rid of it. I just didn't feel like looking into it too deeply, because it seemed just as easy to add the special case for now. I think c is actually partially deprecated on the C-API side, but not on the Python side...
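For reference, the difference between the two dtypes discussed in this thread can be reproduced with a short sketch. The variable names are illustrative only, and the output comments reflect the behaviour described in the comments above:

```python
import numpy as np

# dtype="c" splits the string into individual one-byte characters.
arr_c = np.array("string", dtype="c")
print(arr_c.shape)       # (6,)
print(arr_c.dtype.char)  # c

# dtype="S1" keeps a single 0-d element, truncated to one byte.
arr_s1 = np.array("string", dtype="S1")
print(arr_s1.shape)      # ()
print(arr_s1.item())     # b's'
```

This is why swapping "c" for "S1" changes the resulting shape, not just the reported dtype character.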
            nested = [nested]

        arr = np.array(nested, dtype='c')
        assert arr.shape == (1,) * (np.MAXDIMS - 1) + (6,)
This test will also fail if I replace dtype='c' with dtype='S1'.
    yield np.uint8(2)
    yield np.uint16(2)
    yield np.uint32(2)
    yield np.uint64(2)
Is there a reason np.bool8 is not included here, or is it not necessary?
Thanks, seems I also forgot strings while refactoring...
Argh, I commented it out for now. I had not thought of gh-9875, and my array-coercion PR of course homogenizes behaviour in that regard, although mainly for the NumPy types, not for general strings.
We need to figure it out: either rip off the band-aid and fix it, or add some type of band-aid before 1.20. For now I will open an issue for this and the bad float cases...
Thought I remembered that there was something: https://docs.pytest.org/en/latest/example/parametrize.html#paramexamples you can yield a
Looks like that should work.
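A minimal sketch of what the linked pytest documentation allows, assuming the idea is to yield `pytest.param(..., id=...)` objects from the generator so each case gets a readable name. The generator and test names here are hypothetical, not the ones used in this PR:

```python
import numpy as np
import pytest


def scalar_instances():
    # Hypothetical generator: each yielded value carries its own test id.
    yield pytest.param(np.uint8(2), id="uint8")
    yield pytest.param(np.uint16(2), id="uint16")
    yield pytest.param(np.uint32(2), id="uint32")
    yield pytest.param(np.uint64(2), id="uint64")


@pytest.mark.parametrize("scalar", list(scalar_instances()))
def test_scalar_coercion(scalar):
    # Coercing a NumPy scalar without a dtype should keep its dtype.
    arr = np.array(scalar)
    assert arr.dtype == scalar.dtype
```

With ids attached this way, the names show up in the pytest report without changing the generator structure.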
Looks much better. Strings need ids, and it looks like the nested parametrized cases in
@charris I think I improved the cases you were looking at, with the last commit?
Thanks Sebastian.
Our current array-coercion tests are not all that bad; however, they are partially scattered around.
This adds two main (extensive) tests. First, that our scalars (not Python scalars!) behave identically when assigned or coerced, compared to normal casting.
Second, it adds some additional tests for nested arrays.
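As a rough illustration of the first kind of test (a sketch only, with an arbitrarily chosen scalar and target dtype, not the actual test code), the three paths being compared look like this:

```python
import numpy as np

scalar = np.uint8(2)
target = np.dtype("float32")

# Path 1: coerce the scalar directly with the requested dtype.
coerced = np.array(scalar, dtype=target)

# Path 2: assign the scalar into an existing array of that dtype.
assigned = np.empty((), dtype=target)
assigned[()] = scalar

# Path 3: ordinary casting of an array that holds the scalar.
cast = np.array(scalar).astype(target)

# All three paths are expected to agree on dtype and value.
assert coerced.dtype == assigned.dtype == cast.dtype == target
assert coerced.item() == assigned.item() == cast.item() == 2.0
```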
The main changes are that our scalars (with the one exception of void, because it is both structured and unstructured) always behave the same.
There are two larger "changes" that are a bit less obvious:
np.array([np.float64("nan")], dtype=int) and some other things (identical to the datetime paths here) would use float(item) or int(item) and then fail for NaN... For our scalars, in my branch these do the same thing as casting, which currently succeeds.
To be clear, the handling of Python scalars does not change in my branch. There are a couple of potential corner cases to map out, but please do not let this stop this here. It is probably easier to expand in a second PR anyway.
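The two code paths contrasted here can be sketched as follows. This is an illustration only: it shows Python's int() versus ordinary casting, and does not assert which of the two np.array(..., dtype=int) takes in any particular NumPy version:

```python
import warnings

import numpy as np

item = np.float64("nan")

# Going through Python's int() fails for NaN.
try:
    int(item)
    int_failed = False
except ValueError:
    int_failed = True

# Ordinary casting does not raise. The resulting integer value is
# unspecified, and newer NumPy versions may emit a RuntimeWarning.
with warnings.catch_warnings(), np.errstate(invalid="ignore"):
    warnings.simplefilter("ignore")
    cast = np.array([item]).astype(int)

print(int_failed)  # True
print(cast.shape)  # (1,)
```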
I am open to reorganizing this a bit, but it's pretty tedious, and I am not sure it's worth optimizing the code to reduce repeated patterns, considering these are tests...