FIX Fixes AgglomerativeClustering with precomputed affinity #20597
    if affinity == "precomputed":
        D = X[np.ix_(idx_i, idx_j)]
Completing the connectivity based on the affinity is a hack, because the connectivity can be arbitrary and unrelated to X and its affinity matrix. In this example, for instance, the connectivity is based on the pixel grid and has nothing to do with the pixel values. But if we do not question this hack, your fix makes it work with precomputed affinity, which does fix the linked issue.
In #20531, the situation is a bit different because the connectivity is not separate from the distance matrix. If the distance matrix is precomputed as a sparse graph, the connectivity is given directly by the sparse graph, and there is no way to reconnect disconnected components based on that precomputed graph. So I raise an error instead.
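To make the precomputed branch quoted above concrete, here is a minimal sketch of the idea, not the actual scikit-learn code. The helper name `block_distances` and the toy data are assumptions made for illustration; only the `np.ix_` indexing comes from the diff. When `X` is already a distance matrix, the distances between two groups of samples are read out of it rather than recomputed from features.

```python
import numpy as np


def block_distances(X, idx_i, idx_j, affinity="euclidean"):
    """Distances between samples idx_i and idx_j (hypothetical helper)."""
    if affinity == "precomputed":
        # X is an (n_samples, n_samples) distance matrix:
        # pick rows idx_i and columns idx_j.
        return X[np.ix_(idx_i, idx_j)]
    # Otherwise X holds features: compute Euclidean distances explicitly.
    diff = X[idx_i][:, None, :] - X[idx_j][None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy data (assumed): four 2-D points and their full distance matrix.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])
all_idx = np.arange(len(points))
full = block_distances(points, all_idx, all_idx)

# Two groups of samples, standing in for two disconnected components.
idx_i, idx_j = np.array([0, 3]), np.array([1, 2])
from_features = block_distances(points, idx_i, idx_j)
from_precomputed = block_distances(full, idx_i, idx_j, affinity="precomputed")
```

Both calls should return the same 2 x 2 block, which is the consistency the fix relies on: with a precomputed matrix, treating `X` as features would give meaningless distances.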
Co-authored-by: Tom Dupré la Tour <[email protected]>
ogrisel
left a comment
LGTM as well. I am convinced that the enforced consistency makes sense. The non-regression test makes sense.
…earn#20597) Co-authored-by: Olivier Grisel <[email protected]> Co-authored-by: Tom Dupré la Tour <[email protected]>
Reference Issues/PRs
Fixes #16151
What does this implement/fix? Explain your changes.
Queries X when it is precomputed.