[MRG] ENH Add 'minimum' and 'maximum' strategies to SimpleImputer #27986

mark-thm · 2023-12-19T16:36:52Z

Reference Issues/PRs

none

What does this implement/fix? Explain your changes.

Adds 'minimum' and 'maximum' strategies to SimpleImputer to impute values based on minimum or maximum values, respectively.
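For readers skimming the thread, the behaviour described above can be sketched with plain NumPy. This is only an illustration and not the code in this PR: `impute_extreme` is a hypothetical helper, it handles dense float arrays with NaN as the missing marker, and it ignores the sparse-matrix path and all-missing columns.

```python
import numpy as np


def impute_extreme(X, strategy="minimum"):
    """Fill NaNs in each column with that column's min or max.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    X = np.asarray(X, dtype=float)
    reducer = {"minimum": np.nanmin, "maximum": np.nanmax}[strategy]
    fill = reducer(X, axis=0)  # one statistic per column, NaNs ignored
    # Broadcast the per-column statistic into the missing positions.
    return np.where(np.isnan(X), fill, X)


X = np.array([[1.0, np.nan], [np.nan, 5.0], [3.0, 2.0]])
X_min = impute_extreme(X, "minimum")
X_max = impute_extreme(X, "maximum")
```

Here the missing entry in the first column would be filled with 1.0 or 3.0, and the one in the second column with 2.0 or 5.0, depending on the strategy.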

Any other comments?

github-actions · 2023-12-19T16:39:50Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: b06eb7c. Link to the linter CI: here

mark-thm · 2023-12-20T03:44:44Z

It looks to me like all branches of the new code get exercised. I can't tell why the coverage tool thinks some elif statements don't get hit when the blocks inside them are fully covered.

adrinjalali · 2023-12-25T16:40:12Z

I wonder if we should instead accept a callable which takes a column and returns a constant to fill the values.

mark-thm · 2023-12-26T23:48:06Z

> I wonder if we should instead accept a callable which takes a column and returns a constant to fill the values.

That's probably a bigger change, and it alters the SimpleImputer interface quite a bit more. Notably, it might be awkward because:

- lambda authors would need to write both sparse and dense implementations
- the strategies today are all different kinds of stats (mean, median, most_frequent), so it seems awkward to suddenly expand this list with 'lambda' or 'custom'

Besides the point, I'm happy to field questions about the min/max implementation and usage from a support perspective, but I'd definitely hesitate to support pairs of arbitrary lambdas.

jnothman

What are the use cases for this? Seems reasonable to me as an implementation.

Extending to custom callables would especially enable other quantiles. I think in the custom callable case, we would just pass the callable an array of non-missing values.
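A rough sketch of the callable idea, assuming a dense float array with NaN as the missing marker: `impute_with_callable` is a hypothetical name, not an existing scikit-learn function, and the callable only ever sees the non-missing values of one column.

```python
import numpy as np


def impute_with_callable(X, func):
    """Fill NaNs column by column with func(non-missing values).

    Hypothetical helper for illustration; dense input only.
    """
    X = np.array(X, dtype=float)  # copy so the caller's array is untouched
    for j in range(X.shape[1]):
        col = X[:, j]  # view into X, so assignments below modify X
        missing = np.isnan(col)
        if missing.all():
            continue  # nothing to compute a statistic from; left as NaN here
        col[missing] = func(col[~missing])
    return X


X = np.array([[0.0, 10.0], [np.nan, 20.0], [4.0, np.nan], [8.0, 40.0]])
# Any scalar statistic works, for example the lower quartile.
X_q = impute_with_callable(X, lambda values: np.quantile(values, 0.25))
```

With this shape, minimum and maximum become `np.min` and `np.max`, and other quantiles need no new strategy names.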

mark-thm · 2023-12-27T23:11:28Z

> What are the use cases for this?

For our models, the imputed value for some missing features performs best when set to the minimum or maximum of the feature. Consider a feature which typically has a value of 0, provides a highly negative signal when it's any positive integer, and is missing for some of our inference-time executions: in this case, we want to drive this feature with a value of 0 (so that our model doesn't weigh this feature heavily). We have a number of features where the right missingness value isn't something as obvious as 0, but we know it's "the typical minimum" or "the typical maximum". Rather than pre-compute this value by feature and store the constant, we'd rather 'train' the imputer to impute these features along with all the others.
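To make the train-then-infer flow concrete, here is a minimal sketch under stated assumptions: `ExtremeImputer` is a hypothetical stand-in written for this example (not scikit-learn's SimpleImputer and not this PR's code), and it works on dense float arrays with NaN marking missing values.

```python
import numpy as np


class ExtremeImputer:
    """Learns each feature's min or max at fit time (hypothetical class)."""

    def __init__(self, strategy="minimum"):
        self.strategy = strategy

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        reducer = np.nanmin if self.strategy == "minimum" else np.nanmax
        # Stored once from training data, reused for every later transform.
        self.statistics_ = reducer(X, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.where(np.isnan(X), self.statistics_, X)


# Training data: the first feature is typically 0.
X_train = np.array([[0.0, 7.0], [0.0, 9.0], [3.0, 8.0]])
# Inference data with missing entries.
X_infer = np.array([[np.nan, 8.5], [2.0, np.nan]])

imputer = ExtremeImputer("minimum").fit(X_train)
X_filled = imputer.transform(X_infer)
```

The missing first feature is filled with the training minimum of 0, so no per-feature constant has to be computed and stored by hand.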

mark-thm · 2023-12-31T15:06:26Z

What steps can I take to get this merged?

adrinjalali · 2024-01-02T17:12:09Z

Thanks for the info, I'm happy for it to be included. I don't have bandwidth to review, but there's no blocker from my side.

So next would be for two reviewers to have a look. Unfortunately that's a bottleneck and we don't necessarily have enough reviewers for a timely review of all PRs. So we shall wait and see.

jnothman · 2024-01-02T22:51:00Z

How hard do you think it would be to mock up a PR that takes a callable run over each array of non-missing values? Given how rarely the set of options here has had to change, I suspect that a generic solution will be more helpful for more users than this specific one...?

mark-thm · 2024-01-03T01:02:36Z

#28053

github-actions bot added the module:impute label Dec 19, 2023

mark-thm changed the title ~~Add 'minimum' and 'maximum' strategies to SimpleImputer~~ ENH Add 'minimum' and 'maximum' strategies to SimpleImputer Dec 19, 2023

mark-thm force-pushed the me/min-max-imputation branch 3 times, most recently from 48c0277 to f0755ee Compare December 21, 2023 15:25

mark-thm changed the title ~~ENH Add 'minimum' and 'maximum' strategies to SimpleImputer~~ [MRG] ENH Add 'minimum' and 'maximum' strategies to SimpleImputer Dec 21, 2023

mark-thm added 3 commits December 26, 2023 18:42

Add 'minimum' and 'maximum' strategies to SimpleImputer

27f807b

Satisfy code coverage issues

9951214

Move what's new to 1.5

b06eb7c

mark-thm force-pushed the me/min-max-imputation branch from c381b5a to b06eb7c Compare December 26, 2023 23:43

jnothman reviewed Dec 27, 2023

View reviewed changes

adrinjalali added the Waiting for Reviewer label Jan 2, 2024

mark-thm mentioned this pull request Jan 3, 2024

Add custom imputation strategy to SimpleImputer #28053

Merged

mark-thm closed this Jan 4, 2024

mark-thm deleted the me/min-max-imputation branch January 4, 2024 00:31
