[MRG+1] Remove the MLComp text categorization example #8264
Chatting with @ogrisel he agrees that deprecating |
sklearn/datasets/mlcomp.py
@@ -68,6 +68,11 @@ def load_mlcomp(name_or_id, set_="raw", mlcomp_root=None, **kwargs):
    if not os.path.exists(mlcomp_root):
        raise ValueError("Could not find folder: " + mlcomp_root)

    if name_or_id in ['20news-18828', '20news-19997', '20news-bydate']:
        raise DeprecationWarning("please consider using "
Deprecate the whole function with a `@deprecated` decorator. Also mention which version it is deprecated in (0.19) and which version it will be removed in (0.21). Look either at the contributing guidelines or at other deprecation messages in the scikit-learn code.
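For illustration, here is a minimal sketch of what function-level deprecation could look like. scikit-learn has its own `deprecated` helper; the decorator below is a simplified stdlib-only stand-in written for this example, not that implementation, and the body of `load_mlcomp` is a placeholder. The message wording is also an assumption. Note that it emits the warning with `warnings.warn` rather than `raise`, so existing callers keep working.

```python
import functools
import warnings


def deprecated(extra=""):
    """Hypothetical stand-in for a function-level deprecation decorator."""
    def decorator(func):
        message = "Function %s is deprecated" % func.__name__
        if extra:
            message += "; " + extra

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Warn instead of raising, so the wrapped function still runs.
            warnings.warn(message, category=DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("it was deprecated in version 0.19 and will be removed in 0.21.")
def load_mlcomp(name_or_id, set_="raw", mlcomp_root=None, **kwargs):
    # Placeholder body: the real loader reads a dataset from mlcomp_root.
    return name_or_id


# Record warnings so the deprecation message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_mlcomp("20news-18828")
```

With this shape, every call to the function warns, regardless of which dataset name is passed, which is what deprecating the whole function (rather than three dataset names) amounts to.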
Thanks for the review @lesteve. I addressed your comments.
Perhaps note somewhere in docs (deprecation message or docstring) that the site itself is closing down. Otherwise LGTM
Thanks for the review @jnothman! I addressed your comment.
The Travis problem has been fixed, I have restarted the build.
Looks good, merging, thanks a lot!
This PR fixes issue #8229 (document classification example should use 'latin-1' encoding) by removing the example (as suggested by @lesteve).
This also raises a deprecation warning when `load_mlcomp` is used to load the 20 newsgroups example, where `fetch_20newsgroups` should preferably be used instead. However, as the two datasets are not strictly identical, maybe that's not the best solution.

The MLComp text categorization example is mostly redundant with the other text categorization example while being less complete, and encourages users to load the 20 newsgroups dataset in a more complex way, via `load_mlcomp` instead of `fetch_20newsgroups`.