DOC Improve documentation regarding some pitfalls in interpretation #20451
I think this is a nice addition. I added a couple of style suggestions, but the example itself looks good. We would indeed need more insight from people with more expertise in causal inference.
ping @dsleo @GaelVaroquaux
print(__doc__)

import numpy as np
from sklearn.linear_model import LinearRegression
You can move this import next to where it is used; the same goes for numpy.
examples/inspection/plot_linear_model_coefficient_interpretation.py
Full disclosure: we work with @jygerardy, and I pointed him to this issue after the last tech committee. @jygerardy has strong expertise in causal inference (shameless article plug).
Thank you for the PR @jygerardy!
(Thinking about the data generating process first feels very... Bayesian :D)
Thank you for all the useful suggestions @glemaitre @dsleo @thomasjpfan!
I left some small comments, otherwise I think this PR is ready to go!
I think we're good now @glemaitre @thomasjpfan.
It seems that something went wrong in the last merge: many unrelated commits ended up in this branch. Would you mind starting this PR again with only the few files that it was originally intended to edit?
Co-authored-by: Thomas J. Fan <[email protected]>
5e0d8a6 to 21958f9
I updated the example with the following changes:
@ArturoAmorQ do you want to have a look at this example as well and give a review?
Are you still working on this PR @jygerardy? If so, here is a first batch of comments.
# Warning: data and model quality
# -------------------------------
#
# Keep in mind that the outcome `y` and features `X` are the product
# of a data generating process that is hidden from us. Machine
# learning models are trained to approximate the unobserved
# mathematical function that links `X` to `y` from sample data. As a
# result, any interpretation made about a model may not necessarily
# generalize to the true data generating process. This is especially
# true when the model is of bad quality or when the sample data is
# not representative of the population.
Instead of creating a new section, I would add this text as a note on the current line 21, i.e. in the header. This way such an important statement will gain visibility.
This is a nice example that helps with what I think are common misconceptions about, and limits of, interpretability. Is there something I can help with to move this forward towards merge?
I will apply the changes and make sure the CI works. Then we can make a final review and merge upon the three approvals.
Co-authored-by: Arturo Amor <[email protected]> Co-authored-by: Tim Head <[email protected]>
Waiting for this to merge. Good job!
…cikit-learn#20451) Co-authored-by: Jean-Yves Gerardy <[email protected]> Co-authored-by: Jean-Yves Gerardy <[email protected]> Co-authored-by: Thomas J. Fan <[email protected]> Co-authored-by: Olivier Grisel <[email protected]> Co-authored-by: Guillaume Lemaitre <[email protected]> Co-authored-by: Arturo Amor <[email protected]> Co-authored-by: Tim Head <[email protected]>
Reference Issues/PRs
Fixes #19413
What does this implement/fix? Explain your changes.
In the Common pitfalls in interpretation of coefficients of linear models:
Add a warning that interpretations may not generalize to the true data generating process when the model is poor or the sample used is not representative of the population.
Add a tutorial to show, via simulation, that coefficients can be biased in the presence of unobserved confounders.
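To make the confounder point concrete, here is a minimal, dependency-free sketch of such a simulation. It is not the code added by this PR: the variable names (`x`, `u`, `y`), the helper functions, and the true coefficients 1.0 and 2.0 are all assumptions chosen for illustration. A confounder `u` drives both the feature `x` and the outcome `y`; regressing `y` on `x` alone then gives a slope that absorbs part of the effect of `u`.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def ols_one(x, y):
    """Slope of y regressed on x alone (with an intercept)."""
    return cov(x, y) / cov(x, x)


def ols_two(x, u, y):
    """Slopes of y regressed on x and u (with an intercept).

    Solves the 2x2 normal equations on centered data in closed form.
    """
    sxx, suu, sxu = cov(x, x), cov(u, u), cov(x, u)
    sxy, suy = cov(x, y), cov(u, y)
    det = sxx * suu - sxu ** 2
    coef_x = (sxy * suu - suy * sxu) / det
    coef_u = (suy * sxx - sxy * sxu) / det
    return coef_x, coef_u


# Hypothetical data generating process (an assumption, not the PR's data):
# the confounder u influences both the feature x and the outcome y.
rng = random.Random(0)
n = 20_000
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [ui + rng.gauss(0.0, 1.0) for ui in u]
y = [1.0 * xi + 2.0 * ui + rng.gauss(0.0, 1.0) for xi, ui in zip(x, u)]

# Model that cannot see u: the coefficient of x is biased.
coef_without_u = ols_one(x, y)

# Model that observes u: the coefficient of x is close to the true 1.0.
coef_x, coef_u = ols_two(x, u, y)

print(f"x coefficient, u unobserved: {coef_without_u:.2f}")
print(f"x coefficient, u observed:   {coef_x:.2f} (u coefficient: {coef_u:.2f})")
```

Under these assumed parameters, var(x) = 2 and cov(x, y) = 4, so the slope with `u` omitted converges to 2.0 rather than the true effect of 1.0, even though that one-feature model still predicts `y` reasonably well. Once `u` is included, the coefficient of `x` moves back towards 1.0.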
Any other comments?