Fix _estimate_mi discrete_features str and value check #13497
Sorry, I'm new to Codecov. When I look at the details and see the two lines of code highlighted in red, is it telling me that I need tests that cover those cases?
Yes, but only if those are caused by your changes (I haven't checked in your case).
On Sat, Mar 23, 2019, 15:56 Leandro Hermida wrote:
> Sorry I'm new to codecov. When I look at the details and see the two lines of code that are highlighted in red is it telling me that I need tests that cover those cases?
One of the two lines highlighted in red is from my change; the other isn't. So I don't also need to add a test for the line I didn't change?
    @@ -247,8 +247,11 @@ def _estimate_mi(X, y, discrete_features='auto', discrete_target=False,
         X, y = check_X_y(X, y, accept_sparse='csc', y_numeric=not discrete_target)
         n_samples, n_features = X.shape

    -    if discrete_features == 'auto':
    -        discrete_features = issparse(X)
    +    if isinstance(discrete_features, str):
Maybe have this whole block:

    if isinstance(discrete_features, str):
        if discrete_features == 'auto':
            discrete_features = issparse(X)
        else:
            raise ValueError("Invalid string value for discrete_features.")
    if isinstance(discrete_features, bool):
        discrete_mask = np.empty(n_features, dtype=bool)
        discrete_mask.fill(discrete_features)
    else:
        discrete_features = np.asarray(discrete_features)
        if discrete_features.dtype != 'bool':
            discrete_mask = np.zeros(n_features, dtype=bool)
            discrete_mask[discrete_features] = True
        else:
            discrete_mask = discrete_features

in a series of if/elses, and also use utils.validation.check_array instead of np.asarray? Then in the last else you can raise an exception and complain that the given discrete_features is not supported.
@adrinjalali if I'm understanding you correctly, changing this to a series of if/elifs with a final else that raises an exception means I will need to explicitly test whether discrete_features is "array like" instead of letting that case fall through to the final else. Currently the numpy.asarray conversion does that implicitly. Is there a better way to write the second elif?

    if isinstance(discrete_features, str):
        ...
    elif isinstance(discrete_features, bool):
        ...
    elif check_array(discrete_features):
        discrete_features = check_array(discrete_features)
        ...
    else:
        raise ValueError("Invalid value for discrete_features.")
Yes, I think it's a good idea to explicitly check if it's array-like. Just note that check_array returns the array, not a boolean. Maybe something like:

    if isinstance(discrete_features, str):
        ...
    elif isinstance(discrete_features, bool):
        ...
    else:
        try:
            discrete_features = check_array(discrete_features)
            ...
        except:
            raise ValueError("Invalid value for discrete_features.")
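To illustrate why a validator that returns an array cannot serve directly as an elif condition, here is a small sketch. It uses np.asarray as a stand-in for check_array (an assumption, so that the example needs only NumPy), and `validate` is a hypothetical name, not part of scikit-learn.

```python
import numpy as np


def validate(values):
    # Hypothetical stand-in for an array validator: it returns the
    # converted array rather than True/False.
    return np.asarray(values)


indices = [0, 2]
try:
    if validate(indices):  # the returned array is evaluated for truthiness
        outcome = "branch taken"
    else:
        outcome = "branch skipped"
except ValueError as exc:
    # NumPy refuses to reduce a multi-element array to a single bool.
    outcome = "ValueError: " + str(exc)

print(outcome)
```

With a two-element input the condition itself raises, which is why the validation call belongs inside the branch body rather than in the branch test.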
> if isinstance(discrete_features, str):
>     ...
> elif isinstance(discrete_features, bool):
>     ...
> else:
>     try:
>         discrete_features = check_array(discrete_features)
>         ...
>     except:
>         raise ValueError("Invalid value for discrete_features.")

I don't think we need the try/except around check_array, since the function already raises appropriate errors from within. Also, since the str and bool cases use overlapping code, I structured the if statement a bit differently. You will see the finished implementation in the latest commit.
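A minimal sketch of one way the str and bool cases could share code, as described above. This is not taken from the commit itself: the helper name `resolve_discrete_mask` and the `x_is_sparse` parameter are hypothetical, and np.asarray stands in for the validator so the example depends only on NumPy.

```python
import numpy as np


def resolve_discrete_mask(discrete_features, n_features, x_is_sparse=False):
    """Return a boolean mask of length n_features (hypothetical helper)."""
    if isinstance(discrete_features, (str, bool)):
        if isinstance(discrete_features, str):
            if discrete_features == 'auto':
                # 'auto' resolves to a single bool based on sparsity.
                discrete_features = x_is_sparse
            else:
                raise ValueError("Invalid string value for discrete_features.")
        # A single bool applies to every feature.
        return np.full(n_features, discrete_features, dtype=bool)

    # Assumption: np.asarray replaces the validator discussed in the thread.
    discrete_features = np.asarray(discrete_features)
    if discrete_features.dtype != bool:
        # Integer indices: mark only the listed columns as discrete.
        mask = np.zeros(n_features, dtype=bool)
        mask[discrete_features] = True
        return mask
    # Already a boolean mask.
    return discrete_features


print(resolve_discrete_mask([0, 2], 3))
```

Because the string branch only rewrites `discrete_features` into a bool, both scalar cases end in the same mask-filling line, and an unknown string fails early with a ValueError.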
I'd say that the other Codecov complaint is also related to your code, and refactoring and fixing the code, along with adding and fixing all the related tests, would be in the scope of this PR.
@adrinjalali I'm not sure what is going on with the CircleCI doc and doc-min-dependencies tests. It looks like they couldn't download some external artifacts (404 error), which is unrelated to what I did?
The CircleCI problem is not yours. We're working on it.
LGTM, thanks @hermidalc. The test failures seem unrelated to this PR; you can rebase on master, though.
@adrinjalali sorry to ask a stupid question, but what does one do to rebase on master? What specific git commands should I execute, and in which branches?
@jnothman @adrinjalali I will add tests for |
git checkout mybranch
git fetch upstream
git merge upstream/master
git push
@adrinjalali @jnothman I added the tests we needed. Do you need to approve again, or should I just rebase now?
@hermidalc rebasing is kinda independent of the reviews. You can do it anytime.
Please add a |Fix| entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:
Oh I forgot the functionality was not currently broken. Perhaps then we don't need a what's new.
Thanks @hermidalc
…13497)
* discrete_features str and value check
* Update if logic
* Add discrete_features bad str value test
* Remove unnecessary nested isinstance str check
* Add back nested isinstance str check
* New/updates to tests
* Add v0.21 whats new entry
* Undo v0.21 whats new entry
…t-learn#13497)" This reverts commit ee08cd0.
Reference Issues/PRs
Fixes #13481
What does this implement/fix? Explain your changes.
In _estimate_mi the parameter discrete_features can be the string value 'auto' as well as an array of indices or a boolean mask. The code initially checks if discrete_features == 'auto', which will error in future versions of numpy. The fix first checks for a str instance and then for a valid value.
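The ordering described above can be sketched as follows. Comparing a NumPy array with a string is attempted elementwise, and the result depends on the NumPy version, so checking the type first means an array never reaches the == comparison. The function name `is_auto` is hypothetical and used only for illustration.

```python
import numpy as np


def is_auto(discrete_features):
    # Type check first: `and` short-circuits, so an array or bool
    # is never compared against the string.
    return isinstance(discrete_features, str) and discrete_features == 'auto'


print(is_auto('auto'))
print(is_auto(np.array([0, 2])))
print(is_auto(np.array([True, False, True])))
```

A string other than 'auto' also returns False here, which leaves room for the caller to reject it explicitly with a ValueError.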