[MRG+1] Test to make sure deletion of stop_words_ does not affect transformation.
#4037
I'm -1 on this feature. The additional parameter just adds noise to what is already a huge and confusing parameter list on
So these tests would suffice for now?
Shouldn't really need a pickle in there. Just need to test that deletion doesn't affect transformation.
If we recommend to do so to users, that is.
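A minimal sketch of the kind of test being asked for here: fit, transform, delete the attribute, transform again, and compare. `ToyCountVectorizer` and `check_delete_stop_words_keeps_transform` are hypothetical stand-ins written for illustration so the snippet runs without scikit-learn; they are not the library's implementation, and the real test would exercise the actual vectorizers instead.

```python
from collections import Counter


class ToyCountVectorizer:
    """Hypothetical stand-in for a count vectorizer; not scikit-learn's code."""

    def __init__(self, max_df=1.0):
        # Drop terms that occur in more than this fraction of documents.
        self.max_df = max_df

    def fit(self, docs):
        n_docs = len(docs)
        # Document frequency: number of documents each term appears in.
        doc_freq = Counter(term for doc in docs for term in set(doc.split()))
        kept = sorted(t for t, df in doc_freq.items() if df / n_docs <= self.max_df)
        self.vocabulary_ = {term: idx for idx, term in enumerate(kept)}
        # Stored for introspection only; transform() never reads it.
        self.stop_words_ = set(doc_freq) - set(kept)
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            counts = Counter(doc.split())
            # One column per vocabulary term, in index order.
            rows.append([counts.get(term, 0) for term in self.vocabulary_])
        return rows


def check_delete_stop_words_keeps_transform():
    docs = ["the cat sat", "the dog ran", "the bird flew"]
    vect = ToyCountVectorizer(max_df=0.6).fit(docs)
    before = vect.transform(docs)
    del vect.stop_words_  # the deletion under test
    after = vect.transform(docs)
    return before == after, hasattr(vect, "stop_words_")
```

No pickling is involved, which matches the point above: the check only needs to show that transformation is unaffected by the deletion.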
Okay! Also, I just realized that transform calls fit_transform, which is the method that sets the stop_words_ attribute... so do we need a test at all?
791366c to b9a2a54
@jnothman Does this look okay...?
Not as far as I can see!
Ah sorry for the blunder, it was
sorry for the confusion :)
b2e899a to a8fa121
Just make a separate test. This is no longer about pickling, and the current approach makes the above loop through instances more confusing.
Please add a Notes section to the
a8fa121 to 7c02fe0
@jnothman Done! BTW why is
+0 on adding the tests. The notes should show up in the HTML; the docstrings are probably malformed.
@amueller Do you feel the test is unnecessary?
It is not even documented that you can remove it, right? I would rather document it than test for it. Or maybe document and also test. Then the test would ensure the documentation is correct.
@amueller Does this change look okay?
sklearn/feature_extraction/text.py
Outdated
I would document it in the attribute documentation, saying "only for introspection and can safely be removed before pickling" or something like that.
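To illustrate why removal before pickling matters, here is a rough sketch using only the standard library. The `SimpleNamespace` object is a hypothetical stand-in for a fitted vectorizer (a small `vocabulary_` plus a large `stop_words_` set), and the term names and sizes are made up for the example.

```python
import pickle
from types import SimpleNamespace

# Hypothetical fitted model: a small vocabulary and a large set of pruned terms.
model = SimpleNamespace(
    vocabulary_={"cat": 0, "dog": 1},
    stop_words_={f"pruned_term_{i}" for i in range(10_000)},
)

size_with = len(pickle.dumps(model))

# Drop the introspection-only attribute before serializing.
del model.stop_words_
size_without = len(pickle.dumps(model))

# The state needed for transformation survives the round trip.
restored = pickle.loads(pickle.dumps(model))
```

In this toy setup the pickled payload shrinks substantially once the set is gone, while `vocabulary_` is restored intact.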
75ef655 to 427fb5f
I thought documenting it at the model level made more sense because users aren't going to be looking for things that can be removed. They're going to look for "why is this model so big?". But I'm not too concerned either way.
I don't have a strong opinion either way, sorry if I created work.
Should I revert to the old form (Notes)?
427fb5f to 0875593
sklearn/feature_extraction/text.py
Outdated
@jnothman Should I revert this to Notes?
070f0b1 to 416fea3
I'll squash the 2nd and 3rd commits once the 3rd is approved!
Do you think the tests are moot? Perhaps we could have the documentation alone? (In which case I'll delete the 1st commit and squash the 2nd and 3rd.)
As I said, I don't have a strong opinion.
5c81e60 to 155d575
@jnothman your view on whether the test feels unnecessary?
I think that if it's documented, the test is worthwhile. Don't promise your
Okay! Thanks for the quick reply! :)
OK then, let's merge.
155d575 to f71248a
70c7435 to 43ad91a
@jnothman Could this be closed / merged?
ff36cb5 to cd8bd93
DOC Add a line to {Count, Tfidf}Vectorizer about removal of stop_words_
DOC Add documentation of stop_words_ attr in TfidfVectorizer
cd8bd93 to 21369dd
[MRG] Test to make sure deletion of `stop_words_` does not affect transformation.
Thanks @ragv
Addresses #4032
stop_words_ does not affect transforming.
@jnothman @amueller Please take a look...