[MRG] MNT Use isinstance instead of dtype.kind check for scalar validation. #7394

Closed

Conversation

raghavrv
Member

@raghavrv raghavrv commented Sep 12, 2016

Addresses one of the issues listed in #5053

Uses isinstance(..., (np.integer, numbers.Integral)) instead of the dtype.kind based check.

NOTE np 1.6's int does not seem to inherit from numbers.Integral, hence we need to check for both numbers.Integral and np.integer...

Minor PR @vene @jnothman @amueller



@jnothman
Member

Test errors.

Why is kind testing bad?

In at least some versions of numpy, I think, some variants of the int/float types are not registered as subclasses of Integral. Why do you prefer this way? Perhaps we should create a helper isfloat/isint rather than potentially break some variant ways of specifying scalars... In which case, I'd prefer the kind approach.

@raghavrv raghavrv force-pushed the model_selection_enhancements branch from f0e0776 to 6d8dd0c Compare September 12, 2016 11:22
@raghavrv
Member Author

It should pass now. This issue was raised during the review of #4294... by @vene or @amueller

And which versions of numpy exactly? Because we use isinstance check for scalar validation across sklearn.

grep of "dtype.kind" shows it (dtype.kind validation/testing) is only done for X / y arrays...

@jnothman
Member

Usually scalars will be unwrapped from arrays, but not necessarily.


@raghavrv
Member Author

The current test failures answered my question ;) Is it worth doing the isint / isfloat for validation of scalars like you suggested or simply close this PR and leave it as dtype.kind?

@jnothman
Member

The point, I suppose, is that it may be possible for this change to be
backwards-incompatible... except for the fact that it's being applied to
model_selection before release. I think generally we're better off
allowing numpy scalars, using a helper that mixes isscalar with a
dtype.kind check.
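
A helper of the kind suggested here, mixing `np.isscalar` with a `dtype.kind` check, could be sketched as follows. The names `is_int_scalar` and `is_float_scalar` are assumptions made for this example and do not come from scikit-learn.

```python
import numpy as np


def is_int_scalar(val):
    """Hypothetical helper: a scalar whose NumPy dtype kind is integer."""
    # np.isscalar rejects lists and arrays, including 0-d arrays.
    # Kind 'i' is signed integer and kind 'u' is unsigned integer.
    return np.isscalar(val) and np.asarray(val).dtype.kind in "iu"


def is_float_scalar(val):
    """Hypothetical helper: a scalar whose NumPy dtype kind is floating."""
    return np.isscalar(val) and np.asarray(val).dtype.kind == "f"


print(is_int_scalar(np.int32(5)), is_int_scalar([5]))
print(is_float_scalar(0.5), is_float_scalar(1))
```

Because the decision is made from the dtype rather than from the class hierarchy, this variant does not depend on how NumPy scalar types relate to the `numbers` ABCs.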


@jnothman
Member

It's a bit weird that this code doesn't validate that they're scalars, but
I don't think there's any hurry to fix it.


@raghavrv
Member Author

raghavrv commented Sep 12, 2016

Indeed this was aimed at separating the leftover unaddressed issues at #5053 and closing it subsequently...

"""
Check if val is a scalar of type int (including all numpy int types)
"""
print val
Member


!

Member Author


Oops

@raghavrv
Member Author

I've added the isint / isfloat which check if the val is scalar and also int / float. Take a look and also let me know if this should be extended to other places across scikit-learn... (NOTE: No urgency for 0.18...)

@amueller
Member

if you want to add these helpers, I guess we should use them consistently.
Try

git grep numbers | grep import

to see where numbers is used.

But I don't really see the point.

import numbers
import numpy as np
isinstance(np.array([1])[0], numbers.Integral)

True

So using isinstance should work in all places, and we don't need to mess with numpy kind?

@amueller
Member

amueller commented Sep 12, 2016

For example, in iforest the INTEGER_TYPES seems redundant because np.integer is a subclass of numbers.Integral, from what it looks like. (Hm, though, trying to inspect the class hierarchy I don't see np.int extending numbers.Integral, that's weird.)
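
One way to look into the puzzle raised here is sketched below. The explanation in the comments is an assumption rather than something stated in this thread: NumPy versions that pass the isinstance check appear to register their scalar types with the `numbers` ABCs as virtual subclasses, instead of inheriting from them.

```python
import numbers

import numpy as np

# The method resolution order lists only real base classes.
print([cls.__name__ for cls in np.int32.__mro__])

# ABC registration does not add the ABC to the MRO ...
print(numbers.Integral in np.int32.__mro__)

# ... but issubclass and isinstance still consult the registration.
print(issubclass(np.int32, numbers.Integral))
print(isinstance(np.int32(2), numbers.Integral))
```

If that assumption holds, inspecting the hierarchy will never show `numbers.Integral`, even on installations where the isinstance check succeeds.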

@raghavrv
Member Author

Exactly. Even I assumed isinstance(..., numbers.Integral) would work until this failure proved me wrong.

@raghavrv
Member Author

if you want to add these helpers, I guess we should use them consistently.

Indeed I was just waiting for a confirmation... I'll proceed making them consistent now...

@GaelVaroquaux
Member

import numbers
import numpy as np
isinstance(np.array([1])[0], numbers.Integral)

True

So using isinstance should work in all places, and we don't need to mess with
numpy kind?

+1

@raghavrv
Member Author

raghavrv commented Sep 12, 2016

@GaelVaroquaux isinstance(np.int32(2), numbers.Integral) is False in python2... That is because, as Andy pointed out, np.int doesn't subclass numbers.Integral...

@amueller
Member

@raghavrv can you please explain the failure case? oh it's Python2? Let's drop support ;)

@raghavrv
Member Author

lol

@amueller
Member

(for real, please use the helper everywhere and document in the helper why it is needed).
Please also git grep kind and git grep isinstance.

@raghavrv
Member Author

Yes indeed sure in 10 mins... ;)

@raghavrv
Member Author

Selfnote: at the end try switching to

isinstance(..., (numbers.Integral, np.integer))

and

isinstance(..., (float, np.floating))

Would avoid the cost of np.asarray?
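
The question in this note could be explored with a rough timing sketch like the one below. The function names are hypothetical, and the timings depend on the machine and the NumPy version, so no particular numbers are claimed here.

```python
import numbers
import timeit

import numpy as np


def isint_kind(val):
    # dtype.kind route: builds a 0-d array just to read its dtype.
    return np.asarray(val).dtype.kind in "iu"


def isint_isinstance(val):
    # isinstance route: no array is created.
    return isinstance(val, (numbers.Integral, np.integer))


for val in (3, np.int64(3)):
    t_kind = timeit.timeit(lambda: isint_kind(val), number=10000)
    t_inst = timeit.timeit(lambda: isint_isinstance(val), number=10000)
    print(type(val).__name__, "kind: %.4fs" % t_kind, "isinstance: %.4fs" % t_inst)
```

The two routes are not fully interchangeable: a Python `bool` passes the isinstance check because `bool` subclasses `int`, while its dtype kind is `'b'`.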

@GaelVaroquaux
Member

@GaelVaroquaux isinstance(np.int32(2), numbers.Integral) is False in python2...

Is that really true? I can't seem to reproduce:

In [1]: a = np.int32(2)
In [2]: import numbers
In [3]: isinstance(a, numbers.Integral)
Out[3]: True
In [4]: import sys
In [5]: sys.version
Out[5]: '2.7.12 (default, Jul  1 2016, 15:12:24) \n[GCC 5.4.0 20160609]'

@amueller
Member

the failure was on 2.6....

@raghavrv
Member Author

Yes. 2.6... Sorry for not being clear...

BTW When are we dropping support for 2.6?

@amueller
Member

we are dropping 2.6 in 0.19

@amueller
Member

I can't reproduce on 2.6.9 either... but maybe with older numpy?

@amueller
Member

ok actually it's just the old numpy, not python version dependent. It fails in numpy 1.6

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 12, 2016 via email

@raghavrv
Member Author

bump?

@jnothman
Member

is this urgent?

@raghavrv
Member Author

No... :) Let's revisit after 0.18 maybe...

@raghavrv
Member Author

raghavrv commented Dec 5, 2016

Should we review and merge this now? This is really a trivial but subtle change that would decide the fate of numpy ints which are not compatible with numbers.Integral being permitted where we accept ints...

@lesteve
Member

lesteve commented Dec 5, 2016

For the record isinstance(np.int32(2), numbers.Integral) returns True only for numpy >= 1.9.

@raghavrv
Member Author

raghavrv commented Dec 5, 2016

Thanks for the comment! Indeed which is why I've tried to do isinstance(..., (numbers.Integral, np.integer))... @lesteve Could I trouble you for a review of the PR btw? ;)

@lesteve
Member

lesteve commented Dec 5, 2016

I have read the discussion but I am not sure why the idea of using helper functions was dropped. This is quite hard to get right across versions of numpy as this issue showed.

Also is the idea of this PR to do the change over the whole scikit-learn codebase or only in the model_selection package?

@lesteve
Member

lesteve commented Dec 5, 2016

For the record isinstance(np.int32(2), numbers.Integral) returns True only for numpy >= 1.9.

Thanks for the comment! Indeed which is why I've tried to do isinstance(..., (numbers.Integral, np.integer))

Yeah, I said that only because when you read the discussion you think Python 2.6 is to blame, until you realise the Python version has nothing to do with it. And then numpy 1.6 is blamed, but what I meant to emphasize is that the issue also happens with more recent numpy versions (any numpy <= 1.8).

@raghavrv
Member Author

raghavrv commented Dec 5, 2016

Yeah, I said that only because when you read the discussion you think Python 2.6 is to blame, until you realise the Python version has nothing to do with it.

True. Side effect of multi-tasking ;(

BTW this PR handles it in model_selection only as previously in cross_validation we accepted numpy integers... We can raise a dedicated issue for the whole codebase so it can be taken on by multiple contributors if needed....

@lesteve
Member

lesteve commented Dec 5, 2016

Even if you change the code only in model_selection, I would be +1 for a helper function like is_int_like/is_float_like (or better names) somewhere in utils.

I guess a check_int_like/check_float_like would work in your case but is a tad less convenient for more complicated cases (like max_df in the CountVectorizer, which can be a float or an int). Full disclosure: I was mildly annoyed the other day when I realised there was no equivalent of numpy.testing.assert_array_equal that returns a boolean instead of raising an AssertionError, and I am still aching a bit.
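
The trade-off between boolean helpers and raising helpers can be sketched as below. The helper names follow the ones floated in this comment, but the bodies are assumptions, and `validate_max_df` is purely illustrative rather than scikit-learn's actual validation of `max_df`.

```python
import numbers

import numpy as np


def is_int_like(val):
    """Hypothetical boolean helper for integer scalars."""
    return isinstance(val, (numbers.Integral, np.integer))


def is_float_like(val):
    """Hypothetical boolean helper for floating point scalars."""
    return isinstance(val, (float, np.floating))


def check_int_like(val, name):
    """Hypothetical raising variant built on the boolean helper."""
    if not is_int_like(val):
        raise ValueError("%s must be an integer, got %r" % (name, val))
    return val


def validate_max_df(max_df):
    """Illustrative only: accept a proportion (float) or a count (int)."""
    if is_float_like(max_df):
        if not 0.0 <= max_df <= 1.0:
            raise ValueError("float max_df must be in [0.0, 1.0]")
        return "proportion"
    if is_int_like(max_df):
        if max_df < 0:
            raise ValueError("integer max_df must be non-negative")
        return "count"
    raise ValueError("max_df must be a float or an int, got %r" % (max_df,))


print(validate_max_df(0.5), validate_max_df(np.int32(3)))
```

The boolean helpers compose naturally when a parameter accepts more than one type, which is where a raising-only helper becomes awkward.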

@raghavrv
Member Author

raghavrv commented Dec 6, 2016

@GaelVaroquaux @amueller @jnothman Are we all on the same page? Do you guys also want the isint/isfloat reinstated in this PR? (I think I removed them as I felt @GaelVaroquaux was not entirely happy with such helpers...)

@jnothman
Member

jnothman commented Dec 6, 2016 via email

@jnothman
Member

jnothman commented Dec 6, 2016 via email

@raghavrv
Member Author

I won't have time to do this. I'm tagging this "Need contributor" and "Easy". The idea is to define a list of classes that correspond to int and string, and use that list to validate all ints and strings.

@raghavrv raghavrv removed this from the 0.19 milestone Jun 12, 2017
@jnothman jnothman added the Easy Well-defined and straightforward way to resolve label Jun 14, 2017
@jnothman jnothman added this to the 0.20 milestone Jun 14, 2017
@jnothman
Member

jnothman commented Feb 6, 2018

Superseded by #10017

@jnothman jnothman closed this Feb 6, 2018
Labels
Easy Well-defined and straightforward way to resolve help wanted Stalled
5 participants