IterativeImputer shouldn't just use l2 loss by default #13286
@GaelVaroquaux points out that iterative imputation with a regularised least-squares model is more-or-less the same as using NMF for imputation. We should instead use RandomForestRegressor as the default regressor in IterativeImputer, at least if sample_posterior=False (or we can implement predict(return_std=True) on RandomForestRegressor!).
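For context, here is a minimal sketch of the predict(return_std=True) idea from the last sentence, assuming the spread of the per-tree predictions is an acceptable uncertainty estimate; ForestWithStd is a hypothetical name, not scikit-learn API.

```python
# Hypothetical wrapper: expose the per-tree prediction spread as a std
# estimate, which is what sample_posterior=True needs from its estimator.
import numpy as np
from sklearn.ensemble import RandomForestRegressor


class ForestWithStd(RandomForestRegressor):
    def predict(self, X, return_std=False):
        if not return_std:
            return super().predict(X)
        # Stack each fitted tree's predictions; the mean matches the
        # forest's usual prediction, the std serves as the uncertainty.
        per_tree = np.stack([tree.predict(X) for tree in self.estimators_])
        return per_tree.mean(axis=0), per_tree.std(axis=0)
```

With this, IterativeImputer(estimator=ForestWithStd(), sample_posterior=True) would at least run; whether the tree spread is a good stand-in for a posterior is a separate question.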
NMF? Why? Sorry, I don't follow. |
It'll be way slower with RF and perhaps not better! Is it identical to NMF? Or just similar? I'd love to see an example. |
I think NMF was a typo. Low rank matrix factorization was what was meant.
It would be slower, but if we want a fast imputer, we should code a low rank matrix factorisation.
|
Sorry, not NMF, just MF. Jotted this down in a spare moment so as to not lose your comment on the defaults, Gaël. |
What's the effective rank when doing IterativeImputer? Is there an equivalence proof somewhere, or empirical results I can look at showing this equivalence?
|
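For readers wanting to probe the claimed equivalence, here is a rough sketch of the low-rank matrix-factorisation imputer under discussion, in the style of iterated truncated SVD; mf_impute, the rank, and the iteration count are illustrative choices, not an agreed design.

```python
# Fill missing cells by repeatedly projecting onto a rank-k approximation.
import numpy as np


def mf_impute(X, rank=2, n_iter=100):
    missing = np.isnan(X)
    # Start from column means, then iterate: SVD the current completion,
    # truncate to the requested rank, and overwrite only the missing cells.
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X_filled[missing] = low_rank[missing]
    return X_filled
```

Comparing its held-out reconstruction error against IterativeImputer with a ridge-type estimator, across a few ranks, would be one way to get the empirical evidence asked for here.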
We should consider before 0.22 what a better default estimator is for IterativeImputer. |
What's wrong with BayesianRidge? |
One issue was speed (without sample_posterior); the other was the concern here that it is an inefficient way to effectively do MF, if I understand correctly.
|
I would love to see some experiments or simulations or theory showing the MF equivalence...
|
> I strongly believe a linear estimator is the best default.

If we want linear conditional imputation, we should be using low-rank matrix factorization.

MissForest, the R package implementing conditional imputation based on forests, is one of the packages that works best as a black-box method. I think that this is what the IterativeImputer should be aiming to mimic.
|
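Concretely, the MissForest-style setup Gaël describes can already be expressed through the existing estimator parameter; in this sketch the data is synthetic and the hyperparameters are guesses, not tuned values.

```python
# MissForest-style imputation: IterativeImputer with a forest estimator.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.randn(100, 4)
X[rng.rand(*X.shape) < 0.25] = np.nan  # knock out 25% of the entries

forest_imputer = IterativeImputer(
    estimator=ExtraTreesRegressor(n_estimators=100, random_state=0),
    max_iter=10,
    random_state=0,
)
X_completed = forest_imputer.fit_transform(X)
```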
I would appreciate seeing some performance numbers comparing BayesianRidge vs RandomForest vs ExtraTrees as estimators, in terms of both quality and running time (see the sketches after this comment). Speed is pretty important for defaults, and we'd want to know what kind of gains are possible for the extra time paid. Do you two agree?

In case we decide to keep linear as the default, one option would be to run RidgeCV for one round and freeze the regularisation params chosen for each column. That should stop the odd jumpiness we observed.

I personally think a more important addition for 0.22 is to add a classifier for the categorical columns.
|
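A minimal version of the benchmark asked for above might look like the following; the dataset, missingness rate, and forest sizes are arbitrary choices, and a real comparison would sweep several datasets and missingness mechanisms.

```python
# Quality-vs-speed sketch: hide a fraction of known entries, impute, then
# score RMSE on the hidden cells and record wall-clock time.
import time

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

X_full, _ = load_diabetes(return_X_y=True)
rng = np.random.RandomState(0)
mask = rng.rand(*X_full.shape) < 0.25  # hide 25% of the entries
X_missing = np.where(mask, np.nan, X_full)

for name, est in [
    ("BayesianRidge", BayesianRidge()),
    ("RandomForest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("ExtraTrees", ExtraTreesRegressor(n_estimators=50, random_state=0)),
]:
    imputer = IterativeImputer(estimator=est, max_iter=10, random_state=0)
    tic = time.perf_counter()
    X_imputed = imputer.fit_transform(X_missing)
    elapsed = time.perf_counter() - tic
    rmse = np.sqrt(np.mean((X_imputed[mask] - X_full[mask]) ** 2))
    print(f"{name:15s} rmse={rmse:.4f} time={elapsed:.1f}s")
```

And a rough sketch of the RidgeCV freeze idea. Note that IterativeImputer clones a single estimator for every column, so a single frozen alpha (here the median of the per-column choices) is used as an approximation; truly freezing a different alpha per column would need a custom estimator.

```python
# "Run RidgeCV for one round, then freeze the chosen regularisation."
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge, RidgeCV

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
X[rng.rand(*X.shape) < 0.2] = np.nan

# Round 1: let RidgeCV pick a regularisation strength per column, then
# read the fitted estimators back out of imputation_sequence_.
probe = IterativeImputer(estimator=RidgeCV(alphas=np.logspace(-3, 3, 13)),
                         max_iter=1, random_state=0).fit(X)
alphas = [triplet.estimator.alpha_ for triplet in probe.imputation_sequence_]

# Freeze: reuse one representative alpha for all remaining rounds.
frozen = IterativeImputer(estimator=Ridge(alpha=float(np.median(alphas))),
                          max_iter=10, random_state=0)
X_imputed = frozen.fit_transform(X)
```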