bpo-45530: speed listobject.c's unsafe_tuple_compare() #29076
Is it okay that this changes observable behavior? In particular:

>>> class X:
...     def __init__(self, label):
...         self.label = label
...     def __eq__(self, other):
...         print(self.label, "==", other.label)
...         return True
...     def __lt__(self, other):
...         print(self.label, "<", other.label)
...         return True
...     def __repr__(self):
...         return self.label
...
>>> L = [X("A"), X("B"), X("C"), X("D")]
>>> sorted(L)
B < A
C < B
D < C
[D, C, B, A]
############## Before ##############
>>> sorted([(a,) for a in L])
B == A
C == B
D == C
[(A,), (B,), (C,), (D,)]
############## After ##############
>>> sorted([(a,) for a in L])
B < A
C < B
D < C
[(D,), (C,), (B,), (A,)]

If we didn't want this change, there could be stricter checks in the pre-sort scan so that …
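To make the difference concrete, here is a small Python model of the two strategies, inferred from the Before/After output above. It is a sketch, not CPython's C code: `lt_eq_first` and `lt_first_try` are hypothetical names. The first skips the equal prefix using `==` and then decides with `<`; the second tries `<` on the first elements in both directions before falling back. Applied to a copy of `X` (which claims every pair is both equal and less-than), they disagree.

```python
import operator


def lt_eq_first(v, w, lt=operator.lt):
    """Pre-patch model: find the first index where elements differ
    according to ==, then decide with <."""
    for a, b in zip(v, w):
        if not (a == b):
            return lt(a, b)
    # Every shared position compared equal: the shorter tuple is smaller.
    return len(v) < len(w)


def lt_first_try(v, w, lt=operator.lt):
    """Patched model: try < on the first elements, both ways, first."""
    start = 0
    if v and w:
        if lt(v[0], w[0]):
            return True
        if lt(w[0], v[0]):
            return False
        # Neither is less. Under a total order that means v[0] == w[0],
        # so the == scan can start at index 1.
        start = 1
    for a, b in zip(v[start:], w[start:]):
        if not (a == b):
            return lt(a, b)
    return len(v) < len(w)


class X:
    """The thread's inconsistent class, minus the print calls."""
    def __init__(self, label):
        self.label = label

    def __eq__(self, other):
        return True

    def __lt__(self, other):
        return True


a, b = (X("A"),), (X("B"),)
print(lt_eq_first(b, a))   # False: == says equal, lengths match
print(lt_first_try(b, a))  # True: X.__lt__ always answers True
```

For element types with a genuine total order, both functions return the same answer for every pair; they can only diverge when `<` and `==` contradict each other, as they do for `X`.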
I think it's fine, just not for a bugfix release. Python defines very little about its sorting algorithm, and effectively defines nothing about the specific example you gave, since class … If you have a class that defines a "for real" total ordering, then the result is defined, including that pairs comparing equal must retain their original order.
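The last guarantee is Python's documented sort stability. A short illustration, using strings (which do form a total order) as the sort key and an integer tag to record each record's input position:

```python
# Keys are strings, which form a genuine total order.
records = [("b", 1), ("a", 2), ("b", 3), ("a", 4), ("c", 5)]

# Sort on the first field only: ties keep their input order.
by_letter = sorted(records, key=lambda r: r[0])
print(by_letter)      # [('a', 2), ('a', 4), ('b', 1), ('b', 3), ('c', 5)]

# Reverse the input and the tied records come out reversed too,
# because stability preserves whatever order they arrived in.
by_letter_rev = sorted(reversed(records), key=lambda r: r[0])
print(by_letter_rev)  # [('a', 4), ('a', 2), ('b', 3), ('b', 1), ('c', 5)]
```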
When the first elements are equal, which is faster: the two calls to tuple_elem_compare() or the one call to PyObject_RichCompareBool(..., Py_EQ)?
Can't answer without knowing the specific function … The two calls may be faster then. In general, though, I expect the two calls to be slower.
@tim-one Even for latin strings, I think if they're long enough, two … Related: the comment above …
Summary of how I see it:
Supposedly common case, where the first element differs: …
Supposedly rare case, where the first element is equal: …
=> Winner overall: … It also depends on how common or rare equality at the first element really is.
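The trade-off in this summary can be tallied with a small Python model. This sketch treats each `==` or `<` call on a tuple element as one unit of cost; `Counted`, `eq_scan_first`, `lt_first`, and `count_calls` are hypothetical names, and the two strategies are simplified models of the pre- and post-patch logic, not the C code in listobject.c.

```python
class Counted:
    """Wraps a value and tallies == and < calls in a shared counter."""
    calls = {"eq": 0, "lt": 0}

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls["eq"] += 1
        return self.value == other.value

    def __lt__(self, other):
        Counted.calls["lt"] += 1
        return self.value < other.value


def eq_scan_first(v, w):
    # Pre-patch model: == until a mismatch, then one <.
    for a, b in zip(v, w):
        if not (a == b):
            return a < b
    return len(v) < len(w)


def lt_first(v, w):
    # Patched model (assumes non-empty tuples): < on the first elements
    # both ways, then the == scan from index 1.
    if v[0] < w[0]:
        return True
    if w[0] < v[0]:
        return False
    for a, b in zip(v[1:], w[1:]):
        if not (a == b):
            return a < b
    return len(v) < len(w)


def count_calls(strategy, v, w):
    Counted.calls = {"eq": 0, "lt": 0}
    strategy(tuple(map(Counted, v)), tuple(map(Counted, w)))
    return Counted.calls


cases = {
    "first differs, v < w": ((1, 9), (2, 0)),
    "first differs, v > w": ((2, 0), (1, 9)),
    "first equal": ((1, 0), (1, 9)),
}
for name, (v, w) in cases.items():
    print(name, count_calls(eq_scan_first, v, w), count_calls(lt_first, v, w))
```

Under this model, when the first elements differ the new strategy makes one or two `<` calls in place of one `==` plus one `<`; when they are equal it pays two `<` calls before the `==` scan even starts. Which side wins overall depends on the relative cost of `<` and `==` for the element type and on how often first elements tie.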
@pochmann, yes, if two latin strings are equal, memcmp will have to look at every pair of characters regardless of which comparison outcome is asked for. The patch here appears to be pretty much a wash for the Stack Overflow program. I don't care: I think his keys were obviously and highly contrived, and so was his raw data. It wasn't "a real program" in any respect. But it was a program anyone could run as-is, which is the primary thing on SO. You noted that there were a lot of duplicates among the …
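The point about equal strings shows up in a toy model of a memcmp-style comparison. `compare_bytes` is a hypothetical Python stand-in, not CPython's string comparison; it returns a three-way result plus the number of byte positions it inspected.

```python
def compare_bytes(a: bytes, b: bytes):
    """memcmp-style three-way compare that also reports how many
    positions were examined before the answer was known."""
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            return (-1 if x < y else 1), examined
    # Common prefix exhausted: order by length, as Python's bytes does.
    return (len(a) > len(b)) - (len(a) < len(b)), examined


s = b"x" * 1000
print(compare_bytes(s, s))                # (0, 1000): equal, full scan
print(compare_bytes(s, b"y" + s[1:]))     # (-1, 1): mismatch at index 0
```

Whether the caller then derives `==` (result is 0) or `<` (result is negative), an equal pair costs the full length. So in this model two `<` tests on long equal strings scan twice as many bytes as one `==` test.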
@tim-one Yes, the Stack Overflow question uses contrived data; I just mean it made me aware that differences at the second element can cause a lot of additional first-element comparisons. Is that unrealistic? If you, for example, sort events by day and then by time, you likely do have days duplicated a lot and not much duplication among times within each day. I noticed the higher duplication among …
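One way to see how often first elements tie in the day-then-time scenario is to count inside a key's `__lt__`. This is a hypothetical sketch: the `Event` class and the data shape (30 days, 40 distinct minutes per day, shuffled) are assumptions for illustration, and the printed counts depend on the random seed.

```python
import random


class Event:
    """(day, time) key that counts comparisons and first-field ties."""
    comparisons = 0
    first_ties = 0
    __slots__ = ("day", "time")

    def __init__(self, day, time):
        self.day = day
        self.time = time

    def __lt__(self, other):
        Event.comparisons += 1
        if self.day != other.day:
            return self.day < other.day
        # Same day: the comparison has to be settled by the second field.
        Event.first_ties += 1
        return self.time < other.time


random.seed(12345)
events = [Event(day, minute)
          for day in range(30)
          for minute in random.sample(range(24 * 60), 40)]
random.shuffle(events)

result = sorted(events)
print(f"{Event.first_ties} of {Event.comparisons} comparisons tied on the day")
```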
… resolved by the very first tuple elements, and adjust strategy accordingly.
… strategy. This looks to be quite successful. It loses a few percent in speed in cases that always want to use the cheaper tests, but can gain far more (compared to this branch's state before this commit) in some cases where PyObject_RichCompareBool(..., Py_EQ) typically returns 1 (they're equal) when applied to the first pair.
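The commit message above describes adapting to how often comparisons are resolved by the first tuple elements. Here is one hypothetical way such adaptation could look, in Python: `AdaptiveTupleLess` and `sort_with` are invented names, and switching off the first-element `<` tests permanently after the first tie is a simplifying assumption, not necessarily the rule the commit uses.

```python
class AdaptiveTupleLess:
    """Tuple '<' that tries cheap first-element < tests until one pair
    ties, then uses the == scan for the rest of the sort."""

    def __init__(self):
        self.lt_first = True  # start optimistic

    def __call__(self, v, w):
        start = 0
        if self.lt_first and v and w:
            if v[0] < w[0]:
                return True
            if w[0] < v[0]:
                return False
            # First elements tie: assume more ties are coming and stop
            # paying for the two extra < tests from now on.
            self.lt_first = False
            start = 1
        for a, b in zip(v[start:], w[start:]):
            if not (a == b):
                return a < b
        return len(v) < len(w)


def sort_with(less, items):
    """Sort tuples with a custom '<' by wrapping them in a key class."""
    class Key:
        __slots__ = ("t",)

        def __init__(self, t):
            self.t = t

        def __lt__(self, other):
            return less(self.t, other.t)

    return [k.t for k in sorted(map(Key, items))]


less = AdaptiveTupleLess()
print(sort_with(less, [(1, 3), (2, 1), (0, 5)]), less.lt_first)
```

The flag makes the cost model self-tuning: data whose first elements almost always differ keeps the cheaper path, while data full of first-element ties stops paying two wasted `<` calls per comparison.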
Swap the order of if/else blocks to put the more likely block first.
…nGH-29076)" This reverts commit 51ed2c5.
bpo-45530: speed listobject.c's unsafe_tuple_compare()
https://bugs.python.org/issue45530