MAINT: Add requires_fit=False tag for FeatureHasher and HashingVectorizer #31852

Dibyo10 · 2025-07-29T15:53:52Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR exposes the requires_fit=False tag for both the FeatureHasher and HashingVectorizer classes in sklearn.feature_extraction. Both estimators are stateless: transform gives the same result whether or not fit has been called. The tag signals to downstream tools and users that calling .fit() is not required, which makes these classes consistent with other stateless estimators in scikit-learn and helps introspection and pipeline tooling.
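To make "stateless" concrete, here is a minimal sketch of the hashing trick in plain Python. `ToyHasher` and `hash_features` are hypothetical names used only for illustration. They are not scikit-learn code, and the MD5-based index is an assumption chosen for reproducibility rather than the hash function the library uses.

```python
import hashlib


def hash_features(tokens, n_features=16):
    """Map tokens to a fixed-length count vector using a stable hash.

    Toy stand-in for the hashing trick: the column index depends only on
    the token itself, so no vocabulary has to be learned beforehand.
    """
    row = [0] * n_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_features
        row[index] += 1
    return row


class ToyHasher:
    """Hypothetical stateless transformer: fit() learns nothing."""

    def __init__(self, n_features=16):
        self.n_features = n_features

    def fit(self, X=None, y=None):
        # Nothing is stored; fit exists only for API compatibility.
        return self

    def transform(self, X):
        return [hash_features(doc, self.n_features) for doc in X]


docs = [["cat", "dog", "cat"], ["bird"]]
unfitted = ToyHasher().transform(docs)
fitted = ToyHasher().fit(docs).transform(docs)
```

Because the output depends only on the input tokens and `n_features`, `unfitted` and `fitted` are identical. That is the property a requires_fit=False tag is meant to advertise.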

Key changes:

Added the _more_tags method to both FeatureHasher and HashingVectorizer, returning {"requires_fit": False}.
Added unit tests in sklearn/tests/test_hash.py and sklearn/tests/test_text.py to verify that the requires_fit tag is correctly set to False for each class.
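The diff context in the format report further down shows that both classes already have a method ending in `return tags`, which suggests an attribute-style tags object. The following is a minimal sketch of setting the tag there instead, assuming the object exposes a `requires_fit` attribute that defaults to True. `ToyTags`, `ToyBase`, and `ToyStatelessTransformer` are hypothetical stand-ins, not scikit-learn classes. The method name mirrors the `__sklearn_tags__` convention of recent scikit-learn releases.

```python
from dataclasses import dataclass


@dataclass
class ToyTags:
    """Hypothetical stand-in for an estimator tags object."""

    requires_fit: bool = True


class ToyBase:
    """Hypothetical base class whose default tags say fitting is required."""

    def __sklearn_tags__(self):
        return ToyTags()


class ToyStatelessTransformer(ToyBase):
    """Hypothetical stateless estimator that overrides one tag."""

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.requires_fit = False  # signal that fit() is not needed
        return tags


def test_toy_transformer_requires_fit_tag():
    # The subclass overrides the tag; the base class keeps the default.
    assert ToyStatelessTransformer().__sklearn_tags__().requires_fit is False
    assert ToyBase().__sklearn_tags__().requires_fit is True


test_toy_transformer_requires_fit_tag()
```

A test in this shape checks the override and the inherited default together, so it would catch a regression in either. Whether the real estimators should use this mechanism rather than `_more_tags` depends on the scikit-learn version being targeted.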

Any other comments?

No API changes or deprecations are introduced.
Code style follows the scikit-learn conventions and all tests pass locally.
This brings these estimators in line with other stateless estimators in scikit-learn, improving user and developer experience.

Thank you for your time and consideration!

Expose the requires_fit=False tag in the FeatureHasher class to indicate that it is a stateless estimator and does not require fitting. This ensures consistency with other stateless estimators in scikit-learn and helps downstream tools and users.

Expose the requires_fit=False tag in the HashingVectorizer class to indicate that it is a stateless estimator and does not require fitting. This brings it in line with other stateless estimators and improves clarity for downstream users and tools.

Add a unit test to verify that FeatureHasher exposes the requires_fit tag as False, ensuring the tag is present and correct.

Add a unit test to check that HashingVectorizer exposes the requires_fit tag as False, confirming statelessness.

github-actions · 2025-07-29T15:54:45Z

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here

`ruff check`

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.

Details


sklearn/feature_extraction/text.py:930:1: W293 [*] Blank line contains whitespace
    |
928 |         return{"requires_fit":False}
929 |
930 |  
    | ^ W293
    |
    = help: Remove whitespace from blank line

sklearn/tests/test_hash.py:1:1: I001 [*] Import block is un-sorted or un-formatted
  |
1 | / import pytest
2 | | from sklearn.feature_extraction import FeatureHasher
  | |____________________________________________________^ I001
3 |
4 |   def test_feature_hasher_requires_fit_tag():
  |
  = help: Organize imports

sklearn/tests/test_hash.py:1:8: F401 [*] `pytest` imported but unused
  |
1 | import pytest
  |        ^^^^^^ F401
2 | from sklearn.feature_extraction import FeatureHasher
  |
  = help: Remove unused import: `pytest`

sklearn/tests/test_text.py:1:1: I001 [*] Import block is un-sorted or un-formatted
  |
1 | / import pytest
2 | | from sklearn.feature_extraction.text import HashingVectorizer
  | |_____________________________________________________________^ I001
3 |
4 |   def test_hashing_vectorizer_requires_fit_tag():
  |
  = help: Organize imports

sklearn/tests/test_text.py:1:8: F401 [*] `pytest` imported but unused
  |
1 | import pytest
  |        ^^^^^^ F401
2 | from sklearn.feature_extraction.text import HashingVectorizer
  |
  = help: Remove unused import: `pytest`

Found 5 errors.
[*] 5 fixable with the `--fix` option.

`ruff format`

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.

Details


--- sklearn/feature_extraction/_hash.py
+++ sklearn/feature_extraction/_hash.py
@@ -205,5 +205,6 @@
         elif self.input_type == "dict":
             tags.input_tags.dict = True
         return tags
+
     def _more_tags(self):
-        return{"requires_fit":False}
+        return {"requires_fit": False}

--- sklearn/feature_extraction/text.py
+++ sklearn/feature_extraction/text.py
@@ -924,10 +924,9 @@
         tags.input_tags.string = True
         tags.input_tags.two_d_array = False
         return tags
-    def _more_tags(self):
-        return{"requires_fit":False}
 
- 
+    def _more_tags(self):
+        return {"requires_fit": False}
 
 
 def _document_frequency(X):

--- sklearn/tests/test_hash.py
+++ sklearn/tests/test_hash.py
@@ -1,6 +1,7 @@
 import pytest
 from sklearn.feature_extraction import FeatureHasher
 
+
 def test_feature_hasher_requires_fit_tag():
     hasher = FeatureHasher()
     assert hasher._get_tags()["requires_fit"] is False

--- sklearn/tests/test_text.py
+++ sklearn/tests/test_text.py
@@ -1,6 +1,7 @@
 import pytest
 from sklearn.feature_extraction.text import HashingVectorizer
 
+
 def test_hashing_vectorizer_requires_fit_tag():
     vectorizer = HashingVectorizer()
     assert vectorizer._get_tags()["requires_fit"] is False

4 files would be reformatted, 924 files already formatted

Generated for commit: 1107fe3. Link to the linter CI: here

adrinjalali · 2025-07-30T10:48:26Z

closing as a duplicate of #31851

Dibyo10 added 4 commits July 29, 2025 21:05

TST: Add test for requires_fit tag in FeatureHasher

5579ff3

TST: Add test for requires_fit tag in HashingVectorizer

1107fe3

github-actions bot added the module:feature_extraction label Jul 29, 2025

Dibyo10 mentioned this pull request Jul 29, 2025

FeatureHasher and HashingVectorizer does not expose requires_fit=False tag #30689

Closed

adrinjalali closed this Jul 30, 2025
