FIX preprocessing: Fix OneHotEncoder handle_unknown='warn' behavior #32592
Status: Open. Nithurshen wants to merge 6 commits into scikit-learn:main from Nithurshen:bug/onehotencoder-handle_unknown=warn
Lines changed: +84 −21
betatim (Member) reviewed on Oct 28, 2025:
"Thanks for working on this. I think this looks good, except for the indentation change."
Branch updated from ff7ee78 to ef52e36.
betatim approved these changes on Oct 29, 2025.
Nithurshen (Contributor, Author):
"@betatim, can you please request a second reviewer, as it has already been two weeks?"
Nithurshen (Contributor, Author):
"@betatim, can you please tell me what to do with the PR?"
Fixes #32589
Description
According to the documentation, handle_unknown='warn' in OneHotEncoder is supposed to behave identically to handle_unknown='infrequent_if_exist' (i.e., map unknown categories to the infrequent category) while also emitting a UserWarning. This PR fixes a bug where handle_unknown='warn' incorrectly behaved like handle_unknown='ignore', causing unknown categories to be encoded as all zeros.

Additionally, the UserWarning itself was misleading: it stated that unknown categories would be "encoded as all zeros" even when handle_unknown='infrequent_if_exist' was used.

Changes Made
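The two changes described in this section can be modeled with a simplified, pure-Python sketch. It is a hypothetical toy model, not scikit-learn's implementation: MAPS_TO_INFREQUENT, map_unknowns, unknown_message_tail and encode are illustrative names, a single feature is assumed to be pre-indexed as [frequent categories..., infrequent], and only the tails of the warning texts quoted in this PR are used. The point it illustrates is that adding 'warn' to the set of modes that un-mask unknown values is what makes its output match 'infrequent_if_exist'.

```python
import warnings
from typing import List, Optional, Tuple

# Hypothetical toy model, not scikit-learn code. Modes whose unknown values
# are un-masked so they land in the infrequent column; the fix adds "warn"
# (previously only "infrequent_if_exist" was checked).
MAPS_TO_INFREQUENT = {"infrequent_if_exist", "warn"}


def map_unknowns(
    codes: List[int],
    known: List[bool],
    infrequent_idx: Optional[int],
    handle_unknown: str,
) -> Tuple[List[int], List[bool]]:
    """Point unknown entries at the infrequent column; for the modes in
    MAPS_TO_INFREQUENT, also un-mask them so they are actually encoded."""
    if infrequent_idx is None:
        # No infrequent category exists: unknowns stay masked (all-zero rows).
        return list(codes), list(known)
    new_codes = [c if k else infrequent_idx for c, k in zip(codes, known)]
    if handle_unknown in MAPS_TO_INFREQUENT:
        return new_codes, [True] * len(known)
    return new_codes, list(known)


def unknown_message_tail(handle_unknown: str) -> str:
    """Pick the warning wording by mode (the conditional-message change)."""
    if handle_unknown in MAPS_TO_INFREQUENT:
        return "encoded as the infrequent category"
    return "encoded as all zeros"


def encode(
    codes: List[int],
    known: List[bool],
    infrequent_idx: Optional[int],
    n_columns: int,
    handle_unknown: str,
) -> List[List[int]]:
    """One-hot encode a single feature; masked entries become all-zero rows."""
    if handle_unknown == "warn" and not all(known):
        warnings.warn(
            "Unknown categories will be " + unknown_message_tail(handle_unknown),
            UserWarning,
        )
    codes, known = map_unknowns(codes, known, infrequent_idx, handle_unknown)
    rows = []
    for code, is_valid in zip(codes, known):
        row = [0] * n_columns
        if is_valid:
            row[code] = 1
        rows.append(row)
    return rows


# Columns: 0 = "a", 1 = "b", 2 = infrequent. The second sample is unknown,
# so its code (-1) is only a placeholder until it is mapped.
for mode in ("ignore", "infrequent_if_exist", "warn"):
    print(mode, encode([0, -1], [True, False], 2, 3, mode))
# ignore [[1, 0, 0], [0, 0, 0]]
# infrequent_if_exist [[1, 0, 0], [0, 0, 1]]
# warn [[1, 0, 0], [0, 0, 1]]  (plus a UserWarning)
```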
This PR addresses the bug in two ways:
1. Corrected the behavior: In _BaseEncoder._map_infrequent_categories, the logic that un-masks unknown values (so that they are mapped to the infrequent category) was updated to include handle_unknown='warn'. Previously it checked only for handle_unknown='infrequent_if_exist'. This ensures that the behavior of 'warn' now matches 'infrequent_if_exist'.
2. Corrected the warning message: In _BaseEncoder._transform, the warning-generation logic was made conditional:
   - It emits "...encoded as the infrequent category." when handle_unknown is 'warn' or 'infrequent_if_exist'.
   - It keeps the "...encoded as all zeros" message when handle_unknown='ignore'.

Testing
- Added a new test, test_onehotencoder_handle_unknown_warn_maps_to_infrequent, to specifically verify that 'warn' produces the same output as 'infrequent_if_exist' and emits the new, correct warning.
- Updated the existing tests (test_ohe_handle_unknown_warn and test_ohe_drop_first_handle_unknown_ignore_warns) that were failing because they asserted the old, incorrect warning message. They now expect the new, correct warning message.
- All tests in sklearn/preprocessing/tests/test_encoders.py now pass.

Checklist