BUG: Don't convert inputs to np.float64 in digitize #11464

Merged
merged 1 commit into from
Jul 8, 2018

Conversation

eric-wieser
Member

This converts digitize to a pure-python function that falls back on searchsorted.

Performance doesn't really matter here anyway - if you care about performance, then you should just call searchsorted directly, rather than checking the order of the bins.

Partially fixes gh-11022
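
To make the description above concrete, here is a minimal sketch of what a pure-Python digitize layered on np.searchsorted could look like. It is not the code merged in this PR: the name `digitize_sketch` and the comparison-based monotonicity check are assumptions made for illustration, and the sketch assumes 1-D bins.

```python
import numpy as np


def digitize_sketch(x, bins, right=False):
    """Illustrative stand-in for np.digitize built on np.searchsorted."""
    x = np.asarray(x)
    bins = np.asarray(bins)

    # Compare neighbours directly instead of subtracting, so unsigned
    # integer bins cannot wrap around and nothing is cast to float.
    increasing = bool(np.all(bins[1:] >= bins[:-1]))
    decreasing = bool(np.all(bins[1:] <= bins[:-1]))
    if not (increasing or decreasing):
        raise ValueError("bins must be monotonically increasing or decreasing")

    # The sides look swapped because searchsorted answers "where would x
    # be inserted into bins", while `right` says which bin edge is closed.
    side = "left" if right else "right"
    if increasing:
        return np.searchsorted(bins, x, side=side)
    # Decreasing bins: search the reversed array and flip the index.
    return len(bins) - np.searchsorted(bins[::-1], x, side=side)
```

The monotonicity check is the only extra work compared with calling np.searchsorted directly, which is why callers who already know their bins are sorted can skip digitize entirely.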



def digitize(x, bins, right=False):
    """
Member Author

Docstring is copied verbatim

@eric-wieser
Member Author

Also makes np.digitize(x, []) == np.zeros_like(x), rather than erroring, which seems correct to me
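
The empty-bins behaviour follows from how np.searchsorted treats an empty sorted array: every value is inserted at index 0. A small sketch of that, calling searchsorted directly so the result does not depend on which NumPy version's digitize is installed (the sample values are arbitrary):

```python
import numpy as np

x = np.array([0.5, 2.0, -1.0])
empty_bins = np.array([])

# With no bin edges, every element of x falls into "bin 0".
idx = np.searchsorted(empty_bins, x, side="right")

# The indices compare equal to an all-zero array of the same shape as x.
all_zero = bool(np.all(idx == np.zeros_like(x)))
```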

x = 2**54 # loses precision in a float
assert_equal(np.digitize(x, [x - 1, x + 1]), 1)

@dec.knownfailureif(True, "np.core.multiarray._monoticity loses precision")
Contributor

This needs:

from numpy.testing import dec

I took a look at the source for knownfailureif, and it uses nose under the hood. I'd normally suggest @pytest.mark.xfail instead, but the NumPy roadmap seems to imply that we want to avoid "pytest magic" and mostly just use pytest as a runner (it seems Guido agrees, but if nose is unmaintained this may need some thought). I'm not sure whether the nose machinery behind dec (and perhaps elsewhere?) will eventually have to be replaced, though.

Member

NumPy roadmap seems to imply that we want to avoid using "pytest magic" and mostly just use it as a runner

Don't use anything with nose in it: we don't want the dependency, nose itself is unmaintained, and the version up on pip is not Python 3.7 compatible. Definitely use xfail instead.

Member

We keep dec for backwards compatibility with folks using the numpy testing framework; you will note that numpy itself no longer uses it anywhere except for testing numpy.testing itself.

Member

Guido has a point about the pytest "documentation by example": the voluminous documentation for pytest is very hard to use for reference and learning, but I expect that at some point someone will make a "real" reference :)

Member

Unless there's an equally concise alternative, I don't see a problem with @pytest.mark.xfail. The xfail and skip marks are kind of essential.

@eric-wieser
Member Author

Switched to xfail - I didn't check to see if the dec.knownfailureif example was within the meta-test stuff
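
For readers unfamiliar with the mark discussed in this thread, a known failure can be expressed with pytest alone, with no nose dependency. The sketch below is not the test from this PR: `classify` and `test_large_int_binning` are hypothetical names, and `classify` deliberately round-trips through float to reproduce the kind of precision loss the 2**54 test targets.

```python
import pytest


def classify(value, edges):
    """Hypothetical helper: count edges <= value after converting to float."""
    return sum(1 for e in edges if float(e) <= float(value))


@pytest.mark.xfail(reason="float conversion loses precision above 2**53")
def test_large_int_binning():
    x = 2**54
    # With exact integers only x - 1 is <= x, so the answer should be 1.
    # After conversion, x - 1, x and x + 1 are all the same float, so
    # classify returns 2 and pytest reports the test as an expected failure.
    assert classify(x, [x - 1, x + 1]) == 1
```

Once the underlying bug is fixed, such a test shows up as an unexpected pass, which is the cue to remove the mark.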

@eric-wieser
Member Author

Rebased on #11474

@charris
Member

charris commented Jul 8, 2018

Thanks Eric. I'm guessing this has pretty much the same performance as before when the arrays have significant size. Might be good to have a benchmark at some point.

@charris charris merged commit 7cd94f2 into numpy:master Jul 8, 2018
@charris
Member

charris commented Jul 8, 2018

Was there a downside to float64?

@eric-wieser
Member Author

Yes - conversion from uint64 to float64 is lossy, so digitize(uint64_array, uint64_bins) would produce incorrect results
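
A small sketch of why the cast matters, using np.searchsorted directly so it does not depend on the installed NumPy's digitize. The specific values are chosen for illustration: float64 has a 53-bit significand, so near 2**63 adjacent representable values are 2048 apart.

```python
import numpy as np

# Three distinct uint64 bin edges just above 2**63, and a value equal
# to the middle edge.
bins = np.array([2**63, 2**63 + 1, 2**63 + 2], dtype=np.uint64)
x = np.array([2**63 + 1], dtype=np.uint64)

# Searching in the original dtype keeps the edges distinct.
exact = np.searchsorted(bins, x, side="right")

# After casting to float64, all three edges and x collapse to the
# same value, so x appears to lie past every edge.
bins_f = bins.astype(np.float64)
x_f = x.astype(np.float64)
lossy = np.searchsorted(bins_f, x_f, side="right")

print(exact, lossy)
```

Here the uint64 search places x in bin 2, while the float64 search places it in bin 3.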

Development

Successfully merging this pull request may close these issues.

BUG: np.digitize casts integers to float64