
Conversation

@RUrlus
Contributor

@RUrlus RUrlus commented Oct 21, 2023

Reference Issues/PRs

Closes #27373

What does this implement/fix? Explain your changes.

The current behaviour of preprocessing.QuantileTransformer is to disable subsampling when subsample > n_samples, but this is not documented.
This PR adds a documented option (subsample=None) to disable subsampling as discussed with @lorentzenchr.
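
As a rough illustration of the semantics described above, here is a minimal standard-library sketch. The helper `choose_fit_rows` is hypothetical and is not scikit-learn's implementation; it only shows how `subsample=None` and a `subsample` larger than the data could both end up using every row.

```python
import random


def choose_fit_rows(n_samples, subsample, seed=0):
    """Return the row indices used to estimate quantiles (hypothetical helper)."""
    # subsample=None is the explicit opt-out proposed in this PR.
    # A subsample at least as large as the data also keeps every row,
    # which mirrors the previously undocumented behaviour.
    if subsample is None or subsample >= n_samples:
        return list(range(n_samples))
    # Otherwise draw `subsample` distinct rows without replacement.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_samples), subsample))


all_rows = choose_fit_rows(1_000, None)      # explicit opt-out
also_all = choose_fit_rows(1_000, 10_000)    # implicit: subsample exceeds the data
subset = choose_fit_rows(1_000, 100)         # real subsampling
print(len(all_rows), len(also_all), len(subset))  # prints: 1000 1000 100
```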

Any other comments?

Arguably the default should be changed to None in the future as subsampling introduces measurable bias, see #27373 for details.
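
One plausible way to see where such a bias could come from, sketched with the standard library only (this is an illustrative assumption, not the analysis from #27373): for an unbounded distribution, the largest value in a random subsample can never exceed the largest value in the full data, so quantile estimates fitted on the subsample cannot see the tail beyond that point.

```python
import random

rng = random.Random(0)

# Draws from an exponential distribution, which is unbounded above.
data = [rng.expovariate(1.0) for _ in range(100_000)]

# A random subsample, standing in for what a subsampled fit would see.
subsample = rng.sample(data, 1_000)

full_max = max(data)
sub_max = max(subsample)

# The subsample's top reference value is at most the full data's maximum.
print(f"largest value, full data: {full_max:.2f}")
print(f"largest value, subsample: {sub_max:.2f}")
```

Any value above `sub_max` lies outside the range the subsampled fit has observed, which is the kind of tail artefact the linked issue title refers to.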

@github-actions

github-actions bot commented Oct 21, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a4d4386. Link to the linter CI: here

@RUrlus RUrlus changed the title from "[MRG] ENH Add option to disable subsampling to preprocessing.QuantileTransformer" to "ENH Add option to disable subsampling to preprocessing.QuantileTransformer" Oct 21, 2023
@RUrlus RUrlus force-pushed the qt_add_optional_subsampling branch from dfc209f to 93e9936 Compare October 25, 2023 15:54
@RUrlus
Contributor Author

RUrlus commented Oct 25, 2023

@betatim Thanks for the review, I've incorporated all your feedback.

@RUrlus RUrlus force-pushed the qt_add_optional_subsampling branch from 93e9936 to f37795a Compare October 27, 2023 05:57
@glemaitre glemaitre self-requested a review October 30, 2023 14:37
@glemaitre
Member

In addition to @betatim's comment, could you add an entry in the changelog as an enhancement?

@RUrlus RUrlus force-pushed the qt_add_optional_subsampling branch from af60c36 to 2f9fad3 Compare October 30, 2023 17:55
@glemaitre
Member

@RUrlus Could you merge main into your branch? It seems that something went wrong.

@RUrlus RUrlus force-pushed the qt_add_optional_subsampling branch from c78ceef to 15bdb4b Compare October 30, 2023 19:16
@RUrlus RUrlus force-pushed the qt_add_optional_subsampling branch from 15bdb4b to b357a5d Compare October 31, 2023 16:51
@RUrlus RUrlus force-pushed the qt_add_optional_subsampling branch from b357a5d to 27e4a81 Compare October 31, 2023 16:55
@RUrlus
Contributor Author

RUrlus commented Nov 2, 2023

@betatim, @glemaitre I've incorporated all the feedback. Can we merge this one in?

@RUrlus
Contributor Author

RUrlus commented Nov 22, 2023

Hi @betatim, @glemaitre,

Polite ping on the above. It would be nice if we could get this one done.

@glemaitre glemaitre self-requested a review November 22, 2023 17:39
@glemaitre
Member

I'll have a look tomorrow. Thanks for pinging.

Member

@glemaitre glemaitre left a comment

LGTM apart from those small doc changes.

@glemaitre glemaitre added the Waiting for Second Reviewer First reviewer is done, need a second one! label Nov 23, 2023
Member

@jeremiedbb jeremiedbb left a comment

I applied the remaining requested changes.

LGTM. Thanks @RUrlus

@jeremiedbb jeremiedbb enabled auto-merge (squash) March 8, 2024 10:34
@RUrlus
Contributor Author

RUrlus commented Mar 8, 2024

@jeremiedbb thanks for the approval and for fixing the last comments. I was waiting to get feedback from the second reviewer before making the changes, but this works!

@jeremiedbb jeremiedbb merged commit f4c058a into scikit-learn:main Mar 8, 2024
@RUrlus RUrlus deleted the qt_add_optional_subsampling branch March 8, 2024 13:12

Labels

module:preprocessing Waiting for Second Reviewer First reviewer is done, need a second one!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

QuantileTransformer's default subsampling introduces artefacts for unbounded distributions

10 participants