@hmustafamail hmustafamail commented May 26, 2025

Following up on issue #9570, this pull request implements medcouple in O(N log N) time.

The legacy O(N**2) implementation is preserved behind a flag, use_fast, which defaults to the new behavior.
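
For readers unfamiliar with the quadratic approach, here is a minimal sketch of a naive O(N**2) medcouple. It is not the code from this pull request or from statsmodels; the name `medcouple_naive` is hypothetical, and the sketch simply drops pairs in which both values equal the median, whereas the full definition uses a special kernel for those ties.

```python
import numpy as np


def medcouple_naive(y):
    """Naive O(N**2) medcouple sketch (hypothetical helper, illustration only)."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    upper = y[y >= med] - med  # centered values at or above the median (>= 0)
    lower = y[y <= med] - med  # centered values at or below the median (<= 0)
    # Kernel on centered values: h(xi, xj) = (xi + xj) / (xi - xj)
    num = upper[:, None] + lower[None, :]
    den = upper[:, None] - lower[None, :]
    # Assumption: pairs with xi == xj == median (den == 0) are skipped
    valid = den > 0
    return float(np.median(num[valid] / den[valid]))
```

Building the full kernel matrix is what makes this quadratic in both time and memory, which is the cost an O(N log N) algorithm avoids.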

Please let me know if you have any questions, or if you need anything else.

Overview:

  • added fast algorithm

  • added legacy multiplexing option

  • revised tests (passed locally)

  • suggested draft release note

  • closes ENH: Medcouple in O(N Log N) time #9570

  • tests added / passed.

  • code/documentation is well formatted.

  • properly formatted commit message. See
    NumPy's guide.

@hmustafamail hmustafamail changed the title from "medcouple n log n" to "ENH medcouple n log n (see #9570)" May 26, 2025
@hmustafamail hmustafamail changed the title from "ENH medcouple n log n (see #9570)" to "ENH: medcouple n log n (see #9570)" May 26, 2025
Member

@bashtage bashtage left a comment

Overall looking pretty good.

-----
This is a helper function for the O(N log N) medcouple algorithm.
"""
AW = sorted(zip(A, W), key=lambda x: x[0])
Member

Should these be ndarrays at this point? In that case we could avoid the Python zip and sorted, which would normally be much slower even for modest-sized data.
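
One way to do this, sketched under the assumption that `A` and `W` are equal-length one-dimensional sequences, is to sort both arrays with a single `np.argsort` permutation. The helper name `sort_pairs` is hypothetical and not part of the pull request.

```python
import numpy as np


def sort_pairs(A, W):
    """Sort values A and carry their weights W along (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    W = np.asarray(W, dtype=float)
    # A stable sort keeps tied values in input order, like sorted() with a key
    order = np.argsort(A, kind="stable")
    return A[order], W[order]
```

Keeping two aligned arrays instead of a list of tuples lets later steps use vectorized comparisons and sums.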

mid = (beg + end) // 2
trial = AW[mid][0]

wleft = sum(w for a, w in AW if a < trial)
Member

These might also be slow, since they use Python objects.
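
A vectorized alternative could look like the sketch below, which assumes the values and weights are already NumPy arrays with no NaN entries. The function name `split_weights` is hypothetical.

```python
import numpy as np


def split_weights(A, W, trial):
    """Total weight below and at-or-above a trial value (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    W = np.asarray(W, dtype=float)
    # Boolean masks replace the Python generator expressions
    wleft = W[A < trial].sum()
    wright = W[A >= trial].sum()
    return float(wleft), float(wright)
```

Each call is still linear in the array length, but the loop runs in compiled NumPy code rather than over Python tuples.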

wleft = sum(w for a, w in AW if a < trial)
wright = sum(w for a, w in AW if a >= trial)

if 2 * wleft > wtot:
Member

Do we need some NaN protection somewhere? We usually need to ensure that all values are non-NaN when using boolean comparisons.
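
To illustrate the concern: any comparison against NaN evaluates to False, so a NaN entry would land in neither the `a < trial` group nor the `a >= trial` group. The sketch below shows that effect and one possible guard. The helper name `require_no_nan` and the choice to raise `ValueError` are assumptions for illustration, not the behavior of this pull request.

```python
import numpy as np


def require_no_nan(values):
    """Return values as a float array, rejecting NaN (hypothetical guard)."""
    arr = np.asarray(values, dtype=float)
    if np.isnan(arr).any():
        raise ValueError("input contains NaN")
    return arr


a = np.array([1.0, np.nan, 3.0])
# Count entries that satisfy neither comparison; only NaN can do that
in_neither = int((~(a < 2.0) & ~(a >= 2.0)).sum())
```

Here `in_neither` counts the entries whose weight would silently be dropped from both partial sums.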

@hmustafamail hmustafamail marked this pull request as draft May 27, 2025 16:35
ENH: other numpy library usage.
STY: docstring improvements.
Author

hmustafamail commented May 27, 2025

Thank you for your code critique. I made the following main changes:

  • Replaced Python lists with NumPy arrays
  • Removed zip(), replaced sorted() with np.sort(), and made other similar changes
  • Added NaN check to the top of _medcouple_nlogn()

Please let me know what else may be needed.

@hmustafamail hmustafamail marked this pull request as ready for review May 27, 2025 19:28
@hmustafamail hmustafamail requested a review from bashtage May 29, 2025 18:43
@josef-pkt josef-pkt added this to the 0.15 milestone Jul 17, 2025