Move imputation out of preprocessing #9726
Comments
I don't like this (even though we've done that in the past - inconsistently. Why is LDA and LinearSVC not in linear models?) I'm +.5 for |
Actually, given that MICE is not that far away, should be +1 |
MICE is not far away at all. If we make sklearn.impute, do we rename Imputer to |
Of course we could just make KNNImputer live under neighbors and MICE live under ?ensemble. I've wondered whether in some ways it would make sense to have a pseudo-module sklearn.classifiers, sklearn.regressors, sklearn.imputers, etc, that import from the relevant implementation locations... |
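The pseudo-module idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `toy.*` module names and the empty placeholder classes are hypothetical stand-ins built in memory, not scikit-learn code, and a real package would do the same thing with plain `from ... import ...` lines in a small module file.

```python
import types

# Hypothetical stand-ins for the modules where the implementations live.
neighbors = types.ModuleType("toy.neighbors")
ensemble = types.ModuleType("toy.ensemble")


class KNNImputer:
    """Placeholder for an imputer implemented next to nearest neighbours."""


class MICE:
    """Placeholder for an imputer implemented next to the ensembles."""


neighbors.KNNImputer = KNNImputer
ensemble.MICE = MICE

# The pseudo-module holds no implementation of its own, only re-exports.
imputers = types.ModuleType("toy.imputers")
imputers.KNNImputer = neighbors.KNNImputer
imputers.MICE = ensemble.MICE
imputers.__all__ = ["KNNImputer", "MICE"]

# Both import paths resolve to the very same class object.
print(imputers.KNNImputer is neighbors.KNNImputer)
```

Because the facade only rebinds names, there is a single class object behind both paths, so `isinstance` checks and pickling would behave the same whichever location a user imports from.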
Yeah... I would prefer a semantic organization, but it's not how we have done things in the past. I guess you suggest having that in parallel to the current structure? I wouldn't be opposed, but it's a big change. Is the goal to keep two places to import from long-term? That seems slightly confusing.... |
it's not entirely true that we haven't done semantic organisation in the
past: sklearn.cluster, decomposition, manifold...
…On 9 Dec 2017 5:59 am, "Andreas Mueller" ***@***.***> wrote:
Yeah... I would prefer a semantic organization, but it's not how we have
done things in the past. I guess you suggest having that in parallel to the
current structure? I wouldn't be opposed, but it's a big change. Is the
goal to keep two places to import from long-term? That seems slightly
confusing....
|
True, we have done a really weird mix. The fact that we have |
but for the users it's moot, as long as they can find stuff.
|
The MissingnessIndicator (#8075) should also live in this module, which may help users. |
I'd like another opinion, but I think this should happen. Make sklearn.impute and move Imputer and all the open imputation-related PRs to this module. |
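One way such a move might stay backward compatible is to leave a thin alias at the old location that warns on use. The sketch below is an assumption about how that could look, not the approach agreed in this thread: `DeprecatedImputer` is a hypothetical name, and the `Imputer` class here is a minimal stand-in rather than the real estimator.

```python
import warnings


class Imputer:
    """Stand-in for the class as it would live in the new module."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy


class DeprecatedImputer(Imputer):
    """Hypothetical alias left behind at the old import location."""

    def __init__(self, *args, **kwargs):
        # Warn at construction time, pointing at the caller's line.
        warnings.warn(
            "Importing Imputer from the old location is deprecated; "
            "import it from the new impute module instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Record warnings so the behaviour can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = DeprecatedImputer(strategy="median")

print(caught[0].category.__name__, old_style.strategy)
```

Subclassing keeps `isinstance(old_style, Imputer)` true, so code that type-checks against the moved class would keep working during a deprecation period.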
I've opened this to contributors. Please
and after merge, please advise contributors at #8075, #8478, #9212 to move their work to the new module. |
Re: naming, I like |
it's comparable to DummyRegressor, but calling it DummyImputer seems
unreasonably disparaging :p
|
I agree because DummyRegressor is not actually useful for doing work, but the Imputer is quite useful. |
@jnothman I can work on this. Thanks for the detailed steps. |
@krishnakalyan3 Hey! If you have not made significant progress on this PR, I would like to address this issue. I have been working on this as well and will be opening a PR in a while. |
You'd be welcome to find a way to split up the work. You do have a few things still on your plate elsewhere, @thechargedneutron |
@thechargedneutron ah you beat me to it. Thanks for taking up the task! |
@krishnakalyan3 Sorry mate!! See you around 😄 |
While we're considering additional imputers, I've wondered whether preprocessing is the right place for it. Yes, it is a preprocessing step before other learning, but it often makes use of other supervised and unsupervised learners and hence is a learning task of its own. And preprocessing is getting a bit cramped.
We could also do as with other models and have imputers appear in modules on the basis of how they work rather than function:
KNNImputer could appear in neighbors, for instance. MICE could appear where..? And the basic Imputer in dummy? Probably not.
In practice I think it is more useful for users to import sklearn.impute, akin to our clusterers and decomposition, and unlike our predictors and outlier detectors that are grouped by algorithm.
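To illustrate the point that imputation is a learning task of its own, here is a minimal pure-Python sketch of a mean imputer with a fit/transform split. `ToyMeanImputer` is a hypothetical name and not the library's implementation; the sketch assumes every column has at least one observed value in the training rows.

```python
import math


class ToyMeanImputer:
    """Learns per-column means in fit, fills missing values in transform."""

    def fit(self, rows):
        n_cols = len(rows[0])
        self.means_ = []
        for j in range(n_cols):
            # Only observed (non-NaN) entries contribute to the statistic.
            observed = [r[j] for r in rows if not math.isnan(r[j])]
            self.means_.append(sum(observed) / len(observed))
        return self

    def transform(self, rows):
        # Replace each NaN with the mean learned for its column.
        return [
            [self.means_[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows
        ]


nan = float("nan")
train = [[1.0, 10.0], [3.0, nan], [nan, 30.0]]
imputer = ToyMeanImputer().fit(train)
filled = imputer.transform([[nan, nan], [5.0, 7.0]])
print(filled)
```

The statistics are estimated from the training rows only and then reused on new rows, which is the same fit-then-apply pattern as any other estimator.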