Fix _estimate_mi discrete_features str and value check #13497

Merged
10 commits merged into scikit-learn:master on Apr 1, 2019

Contributor

@hermidalc hermidalc commented Mar 23, 2019

Reference Issues/PRs

Fixes #13481

What does this implement/fix? Explain your changes.

In _estimate_mi, the discrete_features parameter can be the string 'auto' as well as an array of indices or a boolean mask. The code currently checks discrete_features == 'auto' directly, which will raise an error in future versions of NumPy when discrete_features is an array. The fix first checks whether discrete_features is a str instance and then checks for a valid value.
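
As an illustration of the check described above, here is a minimal sketch of guarding the string comparison behind an isinstance test. The helper name resolve_discrete_features and its x_is_sparse argument are hypothetical stand-ins (the real code calls issparse(X) inside _estimate_mi); only the 'auto' value and the error message are taken from this thread.

```python
def resolve_discrete_features(discrete_features, x_is_sparse):
    """Resolve the 'auto' option without comparing an array to a string.

    Hypothetical helper for illustration; `x_is_sparse` stands in for
    the result of issparse(X).
    """
    # Only strings are compared against 'auto', so index arrays and
    # boolean masks never reach the == comparison.
    if isinstance(discrete_features, str):
        if discrete_features == 'auto':
            return x_is_sparse
        raise ValueError("Invalid string value for discrete_features.")
    return discrete_features


print(resolve_discrete_features('auto', x_is_sparse=False))  # False
print(resolve_discrete_features([0, 2], x_is_sparse=False))  # [0, 2]
```

Any string other than 'auto' raises ValueError in this sketch, while non-string values are passed through untouched for later validation.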

@hermidalc
Contributor Author

Sorry, I'm new to codecov. When I look at the details and see the two lines of code highlighted in red, is it telling me that I need tests that cover those cases?

@adrinjalali
Member

adrinjalali commented Mar 23, 2019 via email

@hermidalc
Contributor Author

> yes, but only if those are caused by your changes (I haven't checked in your case).

One of the two highlighted red lines of code is from my change, one isn't. So I shouldn't also add a test that covers the line of code that I didn't change?

@@ -247,8 +247,11 @@ def _estimate_mi(X, y, discrete_features='auto', discrete_target=False,
     X, y = check_X_y(X, y, accept_sparse='csc', y_numeric=not discrete_target)
     n_samples, n_features = X.shape

-    if discrete_features == 'auto':
-        discrete_features = issparse(X)
+    if isinstance(discrete_features, str):
Member


Maybe have this whole block:

    if isinstance(discrete_features, str):
        if discrete_features == 'auto':
            discrete_features = issparse(X)
        else:
            raise ValueError("Invalid string value for discrete_features.")

    if isinstance(discrete_features, bool):
        discrete_mask = np.empty(n_features, dtype=bool)
        discrete_mask.fill(discrete_features)
    else:
        discrete_features = np.asarray(discrete_features)
        if discrete_features.dtype != 'bool':
            discrete_mask = np.zeros(n_features, dtype=bool)
            discrete_mask[discrete_features] = True
        else:
            discrete_mask = discrete_features

in a series of if/elses and also use utils.validation.check_array instead of
np.asarray?

Then in the last else you can raise an exception and complain about the given
discrete_features not supported.

Contributor Author

@hermidalc hermidalc Mar 25, 2019


@adrinjalali if I'm understanding you correctly, in order to change this to a series of if/elifs with a final else that raises an exception, I will need to explicitly test whether discrete_features is "array-like" instead of letting this case fall through to the final else. Currently the numpy.asarray conversion does that implicitly. For the second elif, is there a better way to do that?

if isinstance(discrete_features, str):
    ...
elif isinstance(discrete_features, bool):
    ...
elif check_array(discrete_features):
    discrete_features = check_array(discrete_features)
    ...
else:
    raise ValueError("Invalid value for discrete_features.")

Member


Yes, I think it's a good idea to explicitly check if it's array-like. Just note that check_array returns the array, not a boolean. Maybe something like:

if isinstance(discrete_features, str):
    ...
elif isinstance(discrete_features, bool):
    ...
else:
    try:
        discrete_features = check_array(discrete_features)
        ...
    except:
        raise ValueError("Invalid value for discrete_features.")
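
To illustrate the point about check_array returning the array rather than a boolean: using the returned array as an elif condition asks NumPy for the truth value of the whole array, which is ambiguous when it holds more than one element. The sketch below uses np.asarray as a stand-in for check_array (an assumption, made to keep the example free of scikit-learn), and validate is a hypothetical name.

```python
import numpy as np


def validate(values):
    # Stand-in for check_array (assumption): like check_array, it
    # returns the converted array, not a True/False verdict.
    return np.asarray(values)


result = validate([0, 2])
print(type(result).__name__)  # ndarray

try:
    # Using the returned array as a condition fails for arrays with
    # more than one element.
    if result:
        pass
    condition_error = None
except ValueError as exc:
    condition_error = exc

print(condition_error is not None)  # True
```

This is why the validation call is better placed inside a branch body than in the branch condition.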

Contributor Author

@hermidalc hermidalc Mar 26, 2019


if isinstance(discrete_features, str):
    ...
elif isinstance(discrete_features, bool):
    ...
else:
    try:
        discrete_features = check_array(discrete_features)
        ...
    except:
        raise ValueError("Invalid value for discrete_features.")

I don't think we need the try/except around check_array, since the function already raises appropriate errors itself. Also, since the str and bool cases use overlapping code, I structured the if statement a bit differently. You will see the finished implementation in the latest commit.
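
The finished implementation is not reproduced in this thread. As a rough sketch of how the str and bool cases could share the mask-filling code while array-like input is handled separately, consider the following. build_discrete_mask and x_is_sparse are hypothetical names, and np.asarray is used in place of check_array so the example runs with NumPy alone; the merged scikit-learn code may differ.

```python
import numpy as np


def build_discrete_mask(discrete_features, n_features, x_is_sparse=False):
    """Return a boolean mask of length n_features (illustrative sketch)."""
    if isinstance(discrete_features, (str, bool)):
        if isinstance(discrete_features, str):
            if discrete_features == 'auto':
                # Stand-in for issparse(X) in the real function.
                discrete_features = x_is_sparse
            else:
                raise ValueError(
                    "Invalid string value for discrete_features.")
        # Shared by the str and bool cases: one flag for every feature.
        return np.full(n_features, discrete_features, dtype=bool)

    # Array-like input: either a boolean mask or an array of indices.
    # np.asarray replaces check_array here (assumption).
    discrete_features = np.asarray(discrete_features)
    if discrete_features.dtype != bool:
        mask = np.zeros(n_features, dtype=bool)
        mask[discrete_features] = True
        return mask
    return discrete_features


print(build_discrete_mask('auto', 3).tolist())   # [False, False, False]
print(build_discrete_mask(True, 3).tolist())     # [True, True, True]
print(build_discrete_mask([0, 2], 3).tolist())   # [True, False, True]
```

In this sketch an index list and the equivalent boolean mask produce the same result, and an unrecognised string is rejected before any array handling takes place.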

@adrinjalali
Member

I'd say that the other codecov complaint is also related to your code, and refactoring and fixing the code and adding and fixing all the related tests would be in the scope of this PR.

@hermidalc
Contributor Author

hermidalc commented Mar 26, 2019

@adrinjalali I'm not sure what is going on with the CircleCI doc and doc-min-dependencies tests. It looks like they couldn't download some external artifacts (404 error), which is unrelated to what I did?

@jnothman
Member

jnothman commented Mar 27, 2019 via email

Member

@adrinjalali adrinjalali left a comment


LGTM, thanks @hermidalc, the tests seem unrelated to this PR, you can rebase on master though.

@hermidalc
Contributor Author

hermidalc commented Mar 27, 2019

> LGTM, thanks @hermidalc, the tests seem unrelated to this PR, you can rebase on master though.

@adrinjalali sorry to ask a stupid question, what does one do to rebase on master? What specific git commands should I execute and in which branches?

@hermidalc
Contributor Author

@jnothman @adrinjalali I will add tests for discrete_features as an array since none existed.

@jnothman
Member

jnothman commented Mar 28, 2019 via email

@hermidalc
Contributor Author

> LGTM, thanks @hermidalc, the tests seem unrelated to this PR, you can rebase on master though.

@adrinjalali @jnothman I added tests we needed. Do you need to approve again or should I just rebase now?

@adrinjalali
Member

@hermidalc rebasing is kinda independent of the reviews. You can do it anytime.

Member

@jnothman jnothman left a comment


Please add a |Fix| entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@jnothman
Member

Oh I forgot the functionality was not currently broken. Perhaps then we don't need a what's new.

@adrinjalali
Member

Thanks @hermidalc

@adrinjalali adrinjalali merged commit 2a5c845 into scikit-learn:master Apr 1, 2019
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
…13497)

* discrete_features str and value check

* Update if logic

* Add discrete_features bad str value test

* Remove unnecessary nested isinstance str check

* Add back nested isinstance str check

* New/updates to tests

* Add v0.21 whats new entry

* Undo v0.21 whats new entry
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Jun 26, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
@hermidalc hermidalc deleted the mi_discrete_features_fix branch June 14, 2022 15:07
Development

Successfully merging this pull request may close these issues.

Comparing string to array in _estimate_mi