[MRG + 1] fix kdd_kddcup99 #9731
    def test_shuffle():
        dataset = fetch_kddcup99(subset='SA', shuffle=True, percent10=True)
fix a random_state
Ok thanks!
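The point of fixing a random_state can be sketched without the dataset. This is a minimal illustration only: the standard library's `random.Random` stands in for scikit-learn's `random_state` handling, and `shuffled_labels` and the toy labels are hypothetical names, not part of the PR.

```python
import random


def shuffled_labels(labels, seed):
    """Return a shuffled copy of labels using a dedicated, seeded generator."""
    rng = random.Random(seed)  # stand-in for scikit-learn's random_state
    out = list(labels)
    rng.shuffle(out)
    return out


# Toy labels (assumption): a few normal records and a few attacks.
labels = [b'normal.'] * 8 + [b'smurf.'] * 2

first = shuffled_labels(labels, seed=0)
second = shuffled_labels(labels, seed=0)

# Same seed, same permutation: a test built on this order is reproducible.
assert first == second
# Shuffling only reorders the labels; the counts are unchanged.
assert sorted(first) == sorted(labels)
```

Without a fixed seed, a test that inspects a slice of the shuffled data could pass or fail depending on the draw.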
    def test_shuffle():
        dataset = fetch_kddcup99(subset='SA', shuffle=True, percent10=True)
        assert(any(dataset.target[-100:] == b'normal.'))
this fails on master?
Yes it does.
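A toy version of the check shows why it can tell shuffled from unshuffled data. This sketch assumes that the unshuffled targets list the normal records first and the anomalies last; the layout, the label `b'anomaly.'`, and the helper `tail_has_normal` are illustrative assumptions, and plain lists stand in for the NumPy array returned by the fetcher.

```python
import random

# Toy targets (assumed layout): 900 normal records followed by 100 anomalies.
targets = [b'normal.'] * 900 + [b'anomaly.'] * 100


def tail_has_normal(labels, n=100):
    """Same idea as the test: is any of the last n labels b'normal.'?"""
    return any(label == b'normal.' for label in labels[-n:])


# Unshuffled: the tail holds only anomalies, so the check fails.
unshuffled_ok = tail_has_normal(targets)

# Shuffled with a fixed seed: normal records are spread over the whole list.
shuffled = list(targets)
random.Random(0).shuffle(shuffled)
shuffled_ok = tail_has_normal(shuffled)

assert not unshuffled_ok
assert shuffled_ok
```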
@ngoix you'll need an entry in the bug fixes section of what's new.

LGTM
jnothman
left a comment
Shouldn't we also remove the shuffle param from _fetch_brute_kddcup99?
I also added a SkipTest in case the kddcup99 data have not been downloaded, since the dataset is not small.
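The skip-if-missing pattern can be sketched with the standard library alone. Everything here is hypothetical: `load_cached_dataset`, `check_dataset`, and the file name `toy_dataset.txt` stand in for a real fetcher called with `download_if_missing=False`.

```python
import os
from unittest import SkipTest


def load_cached_dataset(cache_dir):
    """Hypothetical loader: read a cached file and never download it."""
    path = os.path.join(cache_dir, "toy_dataset.txt")
    if not os.path.exists(path):
        raise IOError("dataset not found in cache: %s" % path)
    with open(path) as f:
        return f.read().splitlines()


def check_dataset(cache_dir):
    """Run the check only when the data are available locally."""
    try:
        data = load_cached_dataset(cache_dir)
    except IOError:
        # The test runner reports a skip instead of a failure.
        raise SkipTest("toy dataset can not be loaded.")
    assert len(data) > 0
    return data
```

The cost of this pattern is that the lines after the `try` block never execute on machines where the data are absent.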
@jnothman I removed the shuffle param from _fetch_brute_kddcup99.
jnothman
left a comment
I suppose that's reasonable although ideally we'd then have a CI that fetches all the datasets and runs the datasets tests...
LGTM
CI failing?
        try:
            dataset = fetch_kddcup99(random_state=0, subset='SA', shuffle=True,
                                     percent10=True, download_if_missing=False)
        except IOError:
            raise SkipTest("kddcup99 dataset can not be loaded.")
so this is not tested in CIs?
I removed the SkipTest, and now the coverage tests are passing.
The CIs were failing because of coverage issues. My understanding is that we do not want to download datasets on Travis during the tests. Edit: just to be clear, I was suggesting reverting your last commit.
I'd agree that we don't usually want to download them in Travis, and poor
coverage here is acceptable, but it would also be nice if we had a Travis
instance that did test this sort of thing.
On 13 Sep 2017 10:37 pm, "Loïc Estève" wrote:
> I removed the SkipTest, and now the coverage tests are passing.
> The CIs were failing because of coverage issues. My understanding is that
> we do not want to download datasets on Travis during the tests.
I am not sure how this would work on Travis. When I was working on testing the datasets on figshare, downloading all of them from scratch took quite a while (maybe 30-40 minutes, off the top of my head, from a good university network), and we may not have the time to do that within a Travis build. A possible workaround is to run the dataset tests on CircleCI, where some of the datasets are already downloaded and cached.

We chatted about something related with @ogrisel. It would be nice to run some tests once in a while but not on each PR. The idea was to set up a separate repo in the scikit-learn organization and use daily cron jobs in Travis. Amongst the things we thought of:
Note that neither the CircleCI approach nor the Travis cron job plays nicely with the coverage ...
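One way to run a test "once in a while but not on each PR" is to gate it behind an environment variable that only the scheduled job sets. This is a sketch under stated assumptions, not what the project does: the variable name `RUN_DATASET_TESTS`, the class `DatasetTests`, and `run_suite` are all hypothetical.

```python
import os
import unittest

# Hypothetical variable: a scheduled (cron) build would set it, PR builds would not.
RUN_SLOW = os.environ.get("RUN_DATASET_TESTS", "0") == "1"


class DatasetTests(unittest.TestCase):
    @unittest.skipUnless(RUN_SLOW, "set RUN_DATASET_TESTS=1 to run")
    def test_large_dataset(self):
        # Placeholder body standing in for a real download-and-check test.
        self.assertTrue(True)


def run_suite():
    """Load and run the gated tests, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DatasetTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the variable unset, the test is reported as skipped, so it contributes nothing to coverage on ordinary builds.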
This reverts commit ecd1e9e.
commit reverted!

Merging, thanks a lot @ngoix.
fix #9730