DOC Improve documentation regarding some pitfalls in interpretation #20451

Merged: 27 commits into scikit-learn:main on Jan 11, 2023

Conversation

jygerardy
Contributor

Reference Issues/PRs

Fixes #19413

What does this implement/fix? Explain your changes.

In the example "Common pitfalls in the interpretation of coefficients of linear models":

  • add a warning against giving coefficients a causal interpretation.
  • add a warning that interpretations drawn from a model may not apply to the data generating
    process when the model is poor or the sample is not representative of the population.

Add a tutorial that shows, via simulation, that coefficients can be biased in the presence of
unobserved confounders.
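
As a minimal sketch of the kind of simulation described above (not the tutorial code from this PR), the snippet below fits a linear model with and without a confounder. The variable names `ability`, `education` and `wage`, and the coefficients 0.8, 1.0 and 2.0, are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples = 10_000

# Hypothetical data generating process (all names and numbers are assumptions):
# `ability` drives both `education` and `wage`, so it is a confounder.
ability = rng.normal(size=n_samples)
education = 0.8 * ability + rng.normal(size=n_samples)
wage = 1.0 * education + 2.0 * ability + rng.normal(size=n_samples)

# Model that observes the confounder: the education coefficient is close
# to the true effect of 1.0.
X_full = np.column_stack([education, ability])
coef_full = LinearRegression().fit(X_full, wage).coef_

# Model that omits the confounder: the education coefficient absorbs part
# of the effect of ability and is biased upwards.
X_partial = education.reshape(-1, 1)
coef_partial = LinearRegression().fit(X_partial, wage).coef_

print(f"education coefficient, ability observed: {coef_full[0]:.2f}")
print(f"education coefficient, ability omitted:  {coef_partial[0]:.2f}")
```

Under these assumptions the omitted-variable model should report an education coefficient well above the true value of 1.0, even though it may still predict `wage` reasonably well.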

Any other comments?

@glemaitre glemaitre changed the title from "[MRG] Fix Improve documentation regarding some pitfalls in interpretation" to "DOC Improve documentation regarding some pitfalls in interpretation" on Jul 21, 2021
Member

@glemaitre glemaitre left a comment

I think this is a nice addition. I suggested a couple of style changes, but the example itself looks good. We would still need input from people with more expertise in causal inference.

ping @dsleo @GaelVaroquaux

print(__doc__)

import numpy as np
from sklearn.linear_model import LinearRegression
Member

You can move this import next to where it is used, and do the same for numpy.

@dsleo
Contributor

dsleo commented Jul 23, 2021

Full disclosure: @jygerardy and I work together, and I pointed him to this issue after the last tech committee. @jygerardy also has strong expertise in causal inference (shameless article plug).

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR @jygerardy!

(Thinking about the data generating process first feels very... Bayesian :D)

@jygerardy
Contributor Author

Thank you for all the useful suggestions, @glemaitre @dsleo @thomasjpfan!
I applied them all.

Member

@thomasjpfan thomasjpfan left a comment

I left some small comments, otherwise I think this PR is ready to go!

@github-actions github-actions bot added the cython label Dec 7, 2021
@jygerardy
Contributor Author

I think we're good now @glemaitre @thomasjpfan.
Thanks!

@glemaitre glemaitre self-requested a review December 14, 2021 18:34
@ogrisel
Member

ogrisel commented Dec 20, 2021

It seems that there was a problem in the last merge: many unrelated commits ended up in this branch. Would you mind starting this PR again with only the few files that are originally intended to be edited by the PR?

@jygerardy jygerardy force-pushed the causal_interpretation branch from 5e0d8a6 to 21958f9 Compare January 12, 2022 16:39
@jjerphan jjerphan removed the cython label Jul 29, 2022
@glemaitre
Member

I updated the example with the following changes:

  • sync with main
  • split the analysis of the 2 predictive models into 2 sections
  • add a plot to compare the coefficients of the true generative model and the predictive models

@ArturoAmorQ do you want to have a look at this example as well and give a review?
I think that it can go in the 1.3 release.

@glemaitre glemaitre added this to the 1.3 milestone Dec 1, 2022
Member

@ArturoAmorQ ArturoAmorQ left a comment

Are you still working on this PR @jygerardy? If so, here is a first batch of comments.

Comment on lines +734 to +744
# Warning: data and model quality
# -------------------------------
#
# Keep in mind that the outcome `y` and features `X` are the product
# of a data generating process that is hidden from us. Machine
# learning models are trained to approximate the unobserved
# mathematical function that links `X` to `y` from sample data. As a
# result, any interpretation made about a model may not necessarily
# generalize to the true data generating process. This is especially
# true when the model is of bad quality or when the sample data is
# not representative of the population.
Member

Instead of creating a new section, I would add this text as a note on the current line 21, i.e. in the header. This way such an important statement will gain visibility.
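
To illustrate the caveat in the quoted text about non-representative samples, here is a minimal sketch that is not part of the PR. It assumes a hypothetical data generating process `y = x**2 + noise` and compares the slope of a linear fit on the full population range with the slope on a sample restricted to positive `x`.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample(low, high, n=5_000):
    """Draw (x, y) pairs from an assumed process y = x**2 + noise."""
    x = rng.uniform(low, high, size=n)
    y = x**2 + rng.normal(scale=0.1, size=n)
    return x, y


# Sample covering the whole (assumed) population range of x.
x_pop, y_pop = sample(-1, 1)
# Non-representative sample: only positive values of x are observed.
x_sub, y_sub = sample(0, 1)

# np.polyfit returns coefficients from highest degree to lowest,
# so index 0 is the slope of the degree-1 fit.
slope_pop = np.polyfit(x_pop, y_pop, deg=1)[0]
slope_sub = np.polyfit(x_sub, y_sub, deg=1)[0]

print(f"slope on representative sample:     {slope_pop:.2f}")
print(f"slope on non-representative sample: {slope_sub:.2f}")
```

In this setup the restricted sample suggests a clearly positive linear effect of `x`, while the representative sample gives a slope near zero. Neither linear fit captures the quadratic relationship, which is the "bad quality model" part of the same warning.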

@betatim
Member

betatim commented Dec 8, 2022

This is a nice example that helps with (what I think of) common misconceptions/limits of interpretability.

Is there something I can help with to move this forward/towards merge?

@glemaitre
Member

I will apply the changes and make sure the CI works. Then we can make a final review and merge upon the three approvals.

@haiatn
Contributor

haiatn commented Dec 16, 2022

Waiting for this to merge. Good job

@glemaitre glemaitre merged commit c892ade into scikit-learn:main Jan 11, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
…cikit-learn#20451)

Co-authored-by: Jean-Yves Gerardy <[email protected]>
Co-authored-by: Jean-Yves Gerardy <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Arturo Amor <[email protected]>
Co-authored-by: Tim Head <[email protected]>
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
…cikit-learn#20451)

Co-authored-by: Jean-Yves Gerardy <[email protected]>
Co-authored-by: Jean-Yves Gerardy <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Arturo Amor <[email protected]>
Co-authored-by: Tim Head <[email protected]>
adrinjalali pushed a commit that referenced this pull request Jan 24, 2023
…20451)

Co-authored-by: Jean-Yves Gerardy <[email protected]>
Co-authored-by: Jean-Yves Gerardy <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Arturo Amor <[email protected]>
Co-authored-by: Tim Head <[email protected]>
Development

Successfully merging this pull request may close these issues.

Improve documentation regarding some pitfalls in interpretation
10 participants