[MRG+1] Test to make sure deletion of `stop_words_` does not affect transformation. #4037
I'm -1 on this feature. The additional parameter just adds noise to what is already a huge and confusing parameter list on |
So these tests would suffice for now?
@@ -857,6 +874,27 @@ def test_pickling_vectorizer():
assert_array_equal(
    copy.fit_transform(JUNK_FOOD_DOCS).toarray(),
    orig.fit_transform(JUNK_FOOD_DOCS).toarray())

# Ensure that deleting the stop_words_ attribute doesn't affect pickling
Shouldn't really need a pickle in there. Just need to test that deletion doesn't affect transformation.
If we recommend to do so to users, that is.
Okay! Also just realized that `transform` calls `fit_transform`, which is the method that sets the `stop_words_` attribute... so do we need a test at all?
791366c to b9a2a54
@jnothman Does this look okay...?
Not as far as I can see!
Ah sorry for the blunder, it was |
Sorry for the confusion :)
b2e899a to a8fa121
@@ -857,6 +860,19 @@ def test_pickling_vectorizer():
assert_array_equal(
    copy.fit_transform(JUNK_FOOD_DOCS).toarray(),
    orig.fit_transform(JUNK_FOOD_DOCS).toarray())

# Ensure that deleting the stop_words_ attribute doesn't affect transform
Just make a separate test. This is no longer about pickling, and the current approach makes the above loop through instances more confusing.
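A standalone check along these lines could look like the sketch below. It is not the code from this PR: the function name and the small corpus are made up for illustration (the real suite uses `JUNK_FOOD_DOCS`), and `min_df=2` is chosen only so that some terms are pruned during fitting. It assumes a scikit-learn version whose `CountVectorizer` and `TfidfVectorizer` accept `min_df` and expose `transform` as used here.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical corpus, standing in for the JUNK_FOOD_DOCS fixture.
# "salad", "water" and "sushi" each occur in a single document only.
DOCS = [
    "pizza beer pizza",
    "pizza burger beer",
    "burger beer coke",
    "salad water",
    "coke burger sushi",
]


def check_stop_words_removal():
    # min_df=2 prunes the rare terms, which is the case where the
    # vectorizer records pruned terms during fitting.
    for vect in (CountVectorizer(min_df=2), TfidfVectorizer(min_df=2)):
        vect.fit(DOCS)
        expected = vect.transform(DOCS).toarray()

        # Setting the attribute to None must not change the output.
        vect.stop_words_ = None
        np.testing.assert_array_equal(vect.transform(DOCS).toarray(), expected)

        # Neither must deleting it outright.
        delattr(vect, "stop_words_")
        np.testing.assert_array_equal(vect.transform(DOCS).toarray(), expected)


check_stop_words_removal()
```

Keeping this separate from the pickling test means the loop over pickled instances stays focused on round-tripping, while this check covers only the attribute removal.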
Please add a Notes section to the |
a8fa121 to 7c02fe0
@jnothman Done! BTW why is |
+0 on adding the tests... The notes should show up in the HTML; probably malformed docstrings.
@amueller Do you feel the test is unnecessary?
It is not even documented that you can remove it, right? I would rather document it than test for it. Or maybe document and also test. Then the test would ensure the documentation is correct.
@amueller Does this change look okay?
Notes
-----
The ``stop_words_`` attribute can get large which may not be desirable when
I would document it at the attribute documentation, saying "only for introspection and can safely be removed before pickling" or something like that.
75ef655 to 427fb5f
I thought documenting it at the model level made more sense because users aren't going to be looking for things that can be removed. They're going to look for "why is this model so big?". But I'm not too concerned either way.
I don't have a strong opinion either way, sorry if I created work.
Should I revert to the old form (Notes)?
427fb5f to 0875593
This attribute is provided only for introspection and can be safely
removed using delattr or set to None before pickling.
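To illustrate why the removal is harmless and what it saves, here is a toy model in plain Python. `ToyCountVectorizer` is a hypothetical stand-in, not scikit-learn's implementation; it assumes only that `transform` consults the learned vocabulary and never reads `stop_words_`.

```python
import pickle
from collections import Counter


class ToyCountVectorizer:
    """Hypothetical stand-in for a vectorizer with document-frequency pruning."""

    def __init__(self, min_df=1):
        self.min_df = min_df

    def fit(self, docs):
        # Document frequency: in how many documents each term occurs.
        df = Counter(term for doc in docs for term in set(doc.split()))
        kept = sorted(t for t, n in df.items() if n >= self.min_df)
        self.vocabulary_ = {term: i for i, term in enumerate(kept)}
        # Pruned terms are recorded for introspection only.
        self.stop_words_ = set(df) - set(kept)
        return self

    def transform(self, docs):
        # Only vocabulary_ is used here; stop_words_ is never read.
        rows = []
        for doc in docs:
            counts = Counter(t for t in doc.split() if t in self.vocabulary_)
            rows.append([counts[t] for t in self.vocabulary_])
        return rows


docs = ["pizza beer pizza", "pizza burger beer", "salad water", "burger sushi"]
vect = ToyCountVectorizer(min_df=2).fit(docs)
before = vect.transform(docs)
# Size of the pickled instance state (vars(vect) is the instance __dict__).
size_with = len(pickle.dumps(vars(vect)))

delattr(vect, "stop_words_")  # alternatively: vect.stop_words_ = None
size_without = len(pickle.dumps(vars(vect)))

assert vect.transform(docs) == before  # output is unchanged
assert size_without < size_with        # pickled state is smaller
```

On a real corpus the set of pruned terms can be far larger than the retained vocabulary, which is where the size difference becomes noticeable.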
@jnothman Should I revert this to Notes?
070f0b1 to 416fea3
I'll squash the 2nd and 3rd commits once the 3rd is approved!
Do you think the tests are moot? Perhaps we could have the documentation alone (in which case I'll delete the 1st commit and squash the 2nd and 3rd)?
As I said, I don't have a strong opinion.
5c81e60 to 155d575
@jnothman What's your view on whether the test feels unnecessary?
I think that if it's documented, the test is worthwhile. Don't promise your On 16 January 2015 at 10:14, ragv [email protected] wrote:
Okay! Thanks for the quick reply! :)
OK then, let's merge.
155d575 to f71248a
70c7435 to 43ad91a
@jnothman Could this be closed / merged?
ff36cb5 to cd8bd93
DOC Add a line to {Count, Tfidf}Vectorizer about removal of stop_words_
DOC Add documentation of stop_words_ attr in TfidfVectorizer
cd8bd93 to 21369dd
[MRG] Test to make sure deletion of `stop_words_` does not affect transformation.
Thanks @ragv
Addresses #4032
@jnothman @amueller Please take a look...