Thanks to visit codestin.com
Credit goes to github.com

Skip to content

CLN HDBSCAN _tree.pyx::do_labelling refactor #26101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 16, 2023

Conversation

Micky774
Copy link
Contributor

@Micky774 Micky774 commented Apr 5, 2023

Reference Issues/PRs

Addresses #24686
Selected subset of #26011

What does this implement/fix? Explain your changes.

Changes variable names to new standard, and includes an algorithm refactor to do_labelling. The new function is logically equivalent to the old, just with if-statement de-nesting and improved naming of intermediate values for readability.

Any other comments?

These changes were extracted from #26011 to facilitate quick review

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the logic is the same. Since the logic is a little involved, I think this needs a second review.

@thomasjpfan thomasjpfan added Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one! labels Apr 5, 2023
@jjerphan jjerphan self-requested a review April 24, 2023 15:06
@ogrisel
Copy link
Member

ogrisel commented Apr 24, 2023

Changes variable names to new standard, and includes an algorithm refactor to do_labelling. The new function is logically equivalent to the old, just with if-statement de-nesting and improved naming of intermediate values for readability.

Have you measured any impact on performance?

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the do_labelling function is complex enough to be turned into a cpdef function to be testable with a few dedicated unit tests, at least for the nominal cases, on simplistic data and ideally on a few edge case.

Testing the nominal cases would serve as:

  • documenting how this function is expected to work for future maintainers,
  • and for current day HDBSCAN reviewers,
  • non regression tests for future refactorings.

I also thinkg that a docstring would help (in addition).

Similar comment for the other functions / classes of this Cython module although probably in decidated PRs to keep this one focused.

@Micky774
Copy link
Contributor Author

Micky774 commented May 3, 2023

@ogrisel I've added docstring and tests for _do_labelling and I agree that it is worth revisiting those for other functions as well in other PRs, however I would prefer to save those for after merger into main.

@Micky774
Copy link
Contributor Author

@scikit-learn/core-devs If anyone has some extra bandwidth I would greatly appreciate a second review on this PR :)

@adrinjalali adrinjalali merged commit a773efb into scikit-learn:hdbscan May 16, 2023
@Micky774 Micky774 deleted the hdbscan_labelling branch May 16, 2023 14:00
Micky774 added a commit to Micky774/scikit-learn that referenced this pull request May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cython module:cluster No Changelog Needed Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants