CLN HDBSCAN _tree.pyx::do_labelling refactor #26101
I agree the logic is the same. Since the logic is a little involved, I think this needs a second review.
Have you measured any impact on performance?
I think the do_labelling function is complex enough to be turned into a cpdef function so that it can be covered by a few dedicated unit tests, at least for the nominal cases on simple data, and ideally for a few edge cases.
Testing the nominal cases would serve as:
- documentation of how this function is expected to work, for future maintainers and for current HDBSCAN reviewers,
- non-regression tests for future refactorings.
I also think a docstring would help.
Similar comment for the other functions / classes of this Cython module, although probably in dedicated PRs to keep this one focused.
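To make the suggestion concrete, here is a rough sketch of what nominal-case and edge-case tests could look like. It does not use the real do_labelling signature: assign_labels, its arguments, and the NOISE constant are hypothetical stand-ins written in pure Python, chosen only to show the shape of such tests.

```python
NOISE = -1  # assumed convention: points outside every selected cluster get -1


def assign_labels(point_clusters, cluster_label_map):
    """Hypothetical stand-in for a labelling routine (not the real do_labelling).

    point_clusters: cluster id that each point falls into.
    cluster_label_map: maps selected cluster ids to output labels.
    Points whose cluster was not selected are labelled as noise.
    """
    return [cluster_label_map.get(cluster, NOISE) for cluster in point_clusters]


def test_nominal_case():
    # Two selected clusters (5 and 6); cluster 7 was not selected.
    assert assign_labels([5, 5, 6, 7], {5: 0, 6: 1}) == [0, 0, 1, NOISE]


def test_empty_input():
    # Edge case: no points at all.
    assert assign_labels([], {5: 0}) == []


def test_all_noise():
    # Edge case: no cluster was selected, so every point is noise.
    assert assign_labels([5, 6], {}) == [NOISE, NOISE]
```

Tests of this kind double as documentation of the expected behaviour and as a safety net for later refactorings.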
@ogrisel I've added docstring and tests for
@scikit-learn/core-devs If anyone has some extra bandwidth I would greatly appreciate a second review on this PR :)
Reference Issues/PRs
Addresses #24686
Selected subset of #26011
What does this implement/fix? Explain your changes.
Changes variable names to the new standard and includes an algorithm refactor of do_labelling. The new function is logically equivalent to the old one, just with de-nested if-statements and better-named intermediate values for readability.
Any other comments?
These changes were extracted from #26011 to facilitate quick review
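
For readers unfamiliar with the kind of if-statement de-nesting described above, the following is an illustrative sketch only. The function names, flags, and the decision rule are hypothetical and are not taken from _tree.pyx. It shows a nested conditional, a flattened version with a named intermediate value, and an exhaustive check that the two agree.

```python
from itertools import product

NOISE = -1  # assumed noise label, for illustration only


def label_nested(is_root, allow_single_cluster, within_threshold, label):
    # Original style: every condition adds another level of nesting.
    if is_root:
        if allow_single_cluster:
            if within_threshold:
                return label
            else:
                return NOISE
        else:
            return NOISE
    else:
        return label


def label_flat(is_root, allow_single_cluster, within_threshold, label):
    # De-nested style: an early return handles the common case first.
    if not is_root:
        return label
    # A named intermediate value states what the combined condition means.
    keep_root_point = allow_single_cluster and within_threshold
    return label if keep_root_point else NOISE


def are_equivalent(label=3):
    # Exhaustively compare both versions over all flag combinations.
    return all(
        label_nested(*flags, label) == label_flat(*flags, label)
        for flags in product([False, True], repeat=3)
    )
```

Because the inputs here are just three booleans, checking all eight combinations is enough to show the two versions are logically equivalent; a real refactor would rely on unit tests for the same guarantee.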