@Dibyo10 commented Jul 29, 2025

Reference Issues/PRs

Closes #30689

What does this implement/fix? Explain your changes.

This PR exposes the requires_fit=False tag on the FeatureHasher and HashingVectorizer classes in sklearn.feature_extraction. Both estimators are stateless: their output depends only on the constructor parameters and the input, so calling .fit() is not required. The tag makes that explicit to users and to downstream tools that inspect estimator tags, and it keeps these classes consistent with other stateless estimators in scikit-learn.
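To illustrate why no fitting is needed, here is a minimal stdlib-only sketch of the hashing trick. It is not scikit-learn's implementation: `hashed_features` is a hypothetical helper, and `hashlib.md5` stands in for the hash function the library actually uses. The point is only that the column index comes from hashing the token, so nothing is learned from training data.

```python
import hashlib


def hashed_features(tokens, n_features=16):
    """Map tokens to a fixed-length count vector using only a hash function.

    Hypothetical helper for illustration; md5 is an assumption chosen
    because it is deterministic and available in the standard library.
    """
    vector = [0] * n_features
    for token in tokens:
        # A deterministic hash of the token picks the column, so no
        # vocabulary is learned or stored between calls.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_features
        vector[index] += 1
    return vector


# Two independent calls agree because there is no fitted state to differ.
first = hashed_features(["cat", "dog", "cat"])
second = hashed_features(["cat", "dog", "cat"])
```

Because the mapping is a pure function of the input and `n_features`, a `fit` step would have nothing to compute, which is what a `requires_fit=False` tag communicates.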

Key changes:

  • Added the _more_tags method to both FeatureHasher and HashingVectorizer, returning {"requires_fit": False}.
  • Added unit tests in sklearn/feature_extraction/tests/test_hash.py and sklearn/feature_extraction/tests/test_text.py to verify that the requires_fit tag is correctly set to False for each class.
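A note on the tag mechanism: the diff context quoted by the linter in this thread (`tags.input_tags.dict = True` followed by `return tags`) suggests both classes already declare tags in the `__sklearn_tags__` style that scikit-learn 1.6 introduced, where a tag is set as an attribute on the returned object instead of being returned from `_more_tags`. The sketch below mimics that pattern with stdlib dataclasses only. `Tags`, `InputTags`, `BaseSketch`, `StatelessHasherSketch`, and `get_tags` are hypothetical stand-ins written for this example, not the library's classes.

```python
from dataclasses import dataclass, field


@dataclass
class InputTags:
    # Hypothetical stand-in for the nested input tags object.
    string: bool = False


@dataclass
class Tags:
    # Hypothetical stand-in; estimators are assumed to need fitting by default.
    requires_fit: bool = True
    input_tags: InputTags = field(default_factory=InputTags)


class BaseSketch:
    def __sklearn_tags__(self):
        # Base class returns the default tags.
        return Tags()


class StatelessHasherSketch(BaseSketch):
    def __sklearn_tags__(self):
        # Start from the parent's tags, then override what differs.
        tags = super().__sklearn_tags__()
        tags.input_tags.string = True
        tags.requires_fit = False
        return tags


def get_tags(estimator):
    # Stand-in for a public accessor that reads an estimator's tags.
    return estimator.__sklearn_tags__()


stateless_tags = get_tags(StatelessHasherSketch())
default_tags = get_tags(BaseSketch())
```

Under this pattern a test would read the attribute from the returned object, for example `get_tags(est).requires_fit is False`, instead of indexing a dictionary.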

Any other comments?

  • No API changes or deprecations are introduced.
  • Code style follows the scikit-learn conventions and all tests pass locally.
  • This brings these estimators in line with other stateless estimators in scikit-learn, improving user and developer experience.

Thank you for your time and consideration!

Dibyo10 added 4 commits July 29, 2025 21:05
  • Expose the requires_fit=False tag in the FeatureHasher class to indicate that it is a stateless estimator and does not require fitting. This ensures consistency with other stateless estimators in scikit-learn and helps downstream tools and users.
  • Expose the requires_fit=False tag in the HashingVectorizer class to indicate that it is a stateless estimator and does not require fitting. This brings it in line with other stateless estimators and improves clarity for downstream users and tools.
  • Add a unit test to verify that FeatureHasher exposes the requires_fit tag as False, ensuring the tag is present and correct.
  • Add a unit test to check that HashingVectorizer exposes the requires_fit tag as False, confirming statelessness.

@github-actions

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff check

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.

sklearn/feature_extraction/text.py:930:1: W293 [*] Blank line contains whitespace
    |
928 |         return{"requires_fit":False}
929 |
930 |  
    | ^ W293
    |
    = help: Remove whitespace from blank line

sklearn/tests/test_hash.py:1:1: I001 [*] Import block is un-sorted or un-formatted
  |
1 | / import pytest
2 | | from sklearn.feature_extraction import FeatureHasher
  | |____________________________________________________^ I001
3 |
4 |   def test_feature_hasher_requires_fit_tag():
  |
  = help: Organize imports

sklearn/tests/test_hash.py:1:8: F401 [*] `pytest` imported but unused
  |
1 | import pytest
  |        ^^^^^^ F401
2 | from sklearn.feature_extraction import FeatureHasher
  |
  = help: Remove unused import: `pytest`

sklearn/tests/test_text.py:1:1: I001 [*] Import block is un-sorted or un-formatted
  |
1 | / import pytest
2 | | from sklearn.feature_extraction.text import HashingVectorizer
  | |_____________________________________________________________^ I001
3 |
4 |   def test_hashing_vectorizer_requires_fit_tag():
  |
  = help: Organize imports

sklearn/tests/test_text.py:1:8: F401 [*] `pytest` imported but unused
  |
1 | import pytest
  |        ^^^^^^ F401
2 | from sklearn.feature_extraction.text import HashingVectorizer
  |
  = help: Remove unused import: `pytest`

Found 5 errors.
[*] 5 fixable with the `--fix` option.

ruff format

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.

--- sklearn/feature_extraction/_hash.py
+++ sklearn/feature_extraction/_hash.py
@@ -205,5 +205,6 @@
         elif self.input_type == "dict":
             tags.input_tags.dict = True
         return tags
+
     def _more_tags(self):
-        return{"requires_fit":False}
+        return {"requires_fit": False}

--- sklearn/feature_extraction/text.py
+++ sklearn/feature_extraction/text.py
@@ -924,10 +924,9 @@
         tags.input_tags.string = True
         tags.input_tags.two_d_array = False
         return tags
-    def _more_tags(self):
-        return{"requires_fit":False}
 
- 
+    def _more_tags(self):
+        return {"requires_fit": False}
 
 
 def _document_frequency(X):

--- sklearn/tests/test_hash.py
+++ sklearn/tests/test_hash.py
@@ -1,6 +1,7 @@
 import pytest
 from sklearn.feature_extraction import FeatureHasher
 
+
 def test_feature_hasher_requires_fit_tag():
     hasher = FeatureHasher()
     assert hasher._get_tags()["requires_fit"] is False

--- sklearn/tests/test_text.py
+++ sklearn/tests/test_text.py
@@ -1,6 +1,7 @@
 import pytest
 from sklearn.feature_extraction.text import HashingVectorizer
 
+
 def test_hashing_vectorizer_requires_fit_tag():
     vectorizer = HashingVectorizer()
     assert vectorizer._get_tags()["requires_fit"] is False

4 files would be reformatted, 924 files already formatted

Generated for commit: 1107fe3. Link to the linter CI: here

@adrinjalali
Member

Closing as a duplicate of #31851.

Development

Successfully merging this pull request may close these issues.

FeatureHasher and HashingVectorizer does not expose requires_fit=False tag