@thomasjpfan

Reference Issues/PRs

Fixes #16151

What does this implement/fix? Explain your changes.

Queries X when it is precomputed.
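
A minimal sketch of what "querying X" means here, assuming the failure comes from slicing only the rows of a precomputed matrix (which yields a block of shape `(len(idx_i), n_samples)` rather than `(len(idx_i), len(idx_j))`). The names `points` and `X_precomputed` are illustrative and not taken from the PR; only `X`, `idx_i`, `idx_j`, and `np.ix_` appear in the diff.

```python
import numpy as np

# Illustrative data (an assumption, not from the PR): 6 points in 2-D.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))

# Full precomputed Euclidean distance matrix, shape (n_samples, n_samples).
X_precomputed = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Indices of the samples in two different connected components.
idx_i = np.array([0, 1, 2])
idx_j = np.array([3, 4, 5])

# Row-only slicing keeps every column, so the block has the wrong shape.
rows_only = X_precomputed[idx_i]  # shape (3, 6)

# np.ix_ selects rows idx_i and columns idx_j, giving the cross block.
D_lookup = X_precomputed[np.ix_(idx_i, idx_j)]  # shape (3, 3)

# With raw features, the same block would be computed from the points.
D_computed = np.linalg.norm(
    points[idx_i][:, None, :] - points[idx_j][None, :, :], axis=-1
)
```

Under these assumptions, the looked-up block and the recomputed block hold the same values, so the rest of the routine can treat both cases alike.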

Comment on lines +77 to +78
if affinity == "precomputed":
    D = X[np.ix_(idx_i, idx_j)]

@TomDLT I see that #20531 raises an error for precomputed metrics. Does it make sense to complete the matrix like this?


Completing the connectivity based on the affinity is a hack, because the connectivity can be arbitrary and unrelated to X and its affinity matrix. In this example, for instance, the connectivity is based on the pixel grid and has nothing to do with the pixel values. But if we do not question this hack, your fix makes it work with a precomputed affinity, which does fix the linked issue.

In #20531, the situation is a bit different because the connectivity is not separate from the distance matrix. So if the distance matrix is precomputed as a sparse graph, the connectivity is directly given by the sparse graph connectivity, and there is no way to reconnect disconnected components based on the precomputed sparse graph. So I raise an error instead.
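
To make "completing the connectivity" concrete, here is a rough sketch, assuming a dense connectivity matrix and component labels that are already known. `complete_connectivity` is a hypothetical helper written for illustration and is not scikit-learn's implementation; it links each pair of components through their closest pair of samples, read from the precomputed distances.

```python
import numpy as np


def complete_connectivity(connectivity, distances, labels):
    """Hypothetical helper: join each pair of components by their closest samples."""
    connectivity = connectivity.copy()
    n_components = labels.max() + 1
    for i in range(n_components):
        idx_i = np.flatnonzero(labels == i)
        for j in range(i):
            idx_j = np.flatnonzero(labels == j)
            # Query the precomputed matrix for the cross-component block.
            D = distances[np.ix_(idx_i, idx_j)]
            ii, jj = np.unravel_index(D.argmin(), D.shape)
            # Add a symmetric edge between the two closest samples.
            connectivity[idx_i[ii], idx_j[jj]] = 1
            connectivity[idx_j[jj], idx_i[ii]] = 1
    return connectivity


# Illustrative data (an assumption): four points on a line, two components.
positions = np.array([0.0, 1.0, 5.0, 6.0])
distances = np.abs(positions[:, None] - positions[None, :])
connectivity = np.zeros((4, 4), dtype=int)
connectivity[0, 1] = connectivity[1, 0] = 1
connectivity[2, 3] = connectivity[3, 2] = 1
labels = np.array([0, 0, 1, 1])

completed = complete_connectivity(connectivity, distances, labels)
```

In this toy case the only added edge joins samples 1 and 2, the closest pair across the two components. The sketch also shows why the approach is a hack: the new edge depends only on the distances, whatever structure the original connectivity encoded.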

Co-authored-by: Tom Dupré la Tour <[email protected]>
@ogrisel ogrisel left a comment

LGTM as well. I am convinced that the enforced consistency makes sense. The non-regression test makes sense.

@ogrisel ogrisel merged commit bcf2b27 into scikit-learn:main Jul 27, 2021
TomDLT added a commit to TomDLT/scikit-learn that referenced this pull request Jul 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
Development

Successfully merging this pull request may close these issues.

Agglomerative clustering ValueError: Precomputed metric requires shape (n_queries, n_indexed). Got (7, 57) for 50 indexed.