bpo-45530: speed listobject.c's unsafe_tuple_compare() #29076

Merged: 9 commits, Oct 25, 2021

Conversation

tim-one
Member

@tim-one tim-one commented Oct 19, 2021

bpo-45530: speed listobject.c's unsafe_tuple_compare()

https://bugs.python.org/issue45530

@tim-one tim-one self-assigned this Oct 19, 2021
@tim-one tim-one changed the title First stab. About 40% speedup on tupsort.py's "(float,)" case. bpo-45530: First stab. About 40% speedup on tupsort.py's "(float,)" case. Oct 20, 2021
@tim-one tim-one changed the title bpo-45530: First stab. About 40% speedup on tupsort.py's "(float,)" case. bpo-45530: speed listobject.c's unsafe_tuple_compare() Oct 20, 2021
@sweeneyde
Member

Is it okay that this changes observable behavior? In particular,

>>> class X:
...     def __init__(self, label):
...         self.label = label
...     def __eq__(self, other):
...         print(self.label, "==", other.label)
...         return True
...     def __lt__(self, other):
...         print(self.label, "<", other.label)
...         return True
...     def __repr__(self):
...         return self.label
...
>>> L = [X("A"), X("B"), X("C"), X("D")]
>>> sorted(L)
B < A
C < B
D < C
[D, C, B, A]

############## Before ##############
>>> sorted([(a,) for a in L])
B == A
C == B
D == C
[(A,), (B,), (C,), (D,)]

############## After ##############
>>> sorted([(a,) for a in L])
B < A
C < B
D < C
[(D,), (C,), (B,), (A,)]

If we didn't want this change, then there could be stricter checks in the pre-sort scan so that unsafe_tuple_compare only gets used if tuple_elem_compare is known to be safe.

@tim-one
Member Author

tim-one commented Oct 20, 2021

Is it okay that this changes observable behavior?

I think it's fine, just not for a bugfix release. Python defines very little about its sorting algorithm, and effectively doesn't really define anything about the specific example you gave, since class X doesn't define a total ordering (doesn't, e.g., satisfy trichotomy, and for any a and b, a < b and b < a are both True). In "garbage in, garbage out" cases, we don't promise to keep the same garbage out.

If you have a class that defines a "for real" total ordering, then the result is defined, including that pairs comparing equal must retain their original order.
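
For contrast with class X, here is a minimal sketch of a class whose == and < are consistent with each other. Version and its fields are made up for illustration; nothing like it appears in the patch. With consistent comparisons, sorting the bare objects and sorting one-element tuples of them should give the same result, with equal items keeping their original order, regardless of which comparison strategy the sort uses internally.

```python
from functools import total_ordering


@total_ordering
class Version:
    """Hypothetical key type: ordered by `major` only, `tag` just labels the object."""

    def __init__(self, major, tag):
        self.major = major
        self.tag = tag

    def __eq__(self, other):
        return self.major == other.major

    def __lt__(self, other):
        return self.major < other.major

    def __repr__(self):
        return f"{self.major}{self.tag}"


items = [Version(2, "a"), Version(1, "b"), Version(2, "c"), Version(1, "d")]

# Sort the objects directly, then sort them wrapped in one-element tuples.
plain = [repr(v) for v in sorted(items)]
wrapped = [repr(t[0]) for t in sorted((v,) for v in items)]

print(plain)    # ['1b', '1d', '2a', '2c']
print(wrapped)  # ['1b', '1d', '2a', '2c']
```

Items that compare equal (1b and 1d, 2a and 2c) stay in their input order in both results, which is the stability guarantee mentioned above.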

@tim-one tim-one removed the skip news label Oct 20, 2021
@rhettinger
Contributor

When the first elements are equal, which is faster, the two calls to tuple_elem_compare() or the one call to Py_RichCompareBool(Py_EQ)?

@tim-one
Member Author

tim-one commented Oct 20, 2021

When the first elements are equal, which is faster, the two calls to tuple_elem_compare() or the one call to Py_RichCompareBool(Py_EQ)?

Can't answer without knowing the specific function tuple_elem_compare resolves to. For very simple types (like floats, ints that fit in one internal CPython "digit", strings represented with 1-byte characters, ... at least) it resolves to special functions defined in listobject.c, which are leaner and faster than the base types' __lt__ implementations (note that the functions in listobject.c have no logic at all to compute anything other than <, require no type checks, and don't cater to the possibility of needing conversions - the pre-scan of the list that set this all up ensured type homogeneity).

The two calls may be faster then. In general, though, I expect the two calls would be slower.
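
A rough Python model of the pre-scan idea described above, for readers who have not looked at listobject.c. This is only a sketch: pick_compare is a hypothetical name, the real scan is written in C, and the 2**30 bound assumes a build with 30-bit internal digits. The point it illustrates is that one pass over the keys decides whether a lean, type-specific < can be used for every later comparison.

```python
import operator


def pick_compare(keys):
    """Toy pre-scan: return (label, less_than) chosen by one pass over the keys.

    A specialised '<' is picked only when every key has exactly the same
    simple type; anything else falls back to the generic operator.lt.
    """
    if not keys:
        return "generic", operator.lt
    key_type = type(keys[0])
    if any(type(k) is not key_type for k in keys):
        return "generic", operator.lt
    if key_type is float:
        return "float", float.__lt__
    # Assumption: one internal digit holds 30 bits on this build.
    if key_type is int and all(abs(k) < 2**30 for k in keys):
        return "small int", int.__lt__
    # Strings whose characters all fit in one byte (code points below 256).
    if key_type is str and all(s.isascii() or max(map(ord, s)) < 256 for s in keys):
        return "latin str", str.__lt__
    return "generic", operator.lt


print(pick_compare([1.0, 2.5])[0])    # float
print(pick_compare([1, 2.0])[0])      # generic (mixed types)
print(pick_compare([1, 2**40])[0])    # generic (int too large)
print(pick_compare(["a", "é"])[0])    # latin str
```

In this model the specialised functions are just the base types' own __lt__, so nothing gets faster; in C the gain comes from skipping type checks and dispatch on every comparison.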

@pochmann
Contributor

pochmann commented Oct 20, 2021

@tim-one Even for latin strings, I think if they're long enough, two unsafe_latin_compare calls could take twice as long as one __eq__, right?

Related: The comment above unsafe_tuple_compare says "The idea is that most tuple compares don't involve x[1:]". At first that seemed right to me, and it would mean that "half of most of the time", you'd only need one tuple_elem_compare call, not two. So on average it would take 1.5 tuple_elem_compare. But after the analysis in my stackoverflow answer I'm not so sure anymore. In the smallest case, 11.01 out of 11.99 tuple comparisons were decided at the first element. But in the largest case, only 12.06 out of 21.26 were. I think partly because there were more duplicates at the first element, but also partly because the second element frequently differed, causing further tuple comparisons, which involves comparing equal first elements again.

@pochmann
Contributor

Summary of how I see it:

Supposedly common case, where the first element differs:

  • Current way: 1 slow == and 1 fast <.
  • Proposed way: 1 or 2 fast <.
    => Winner: the proposed way

Supposedly rare case, where the first element is equal:

  • Current way: 1 slow ==.
  • Proposed way: 2 fast <.
    => Winner: depends on type/values.

=> Winner overall: Also depends on how common/rare equality at the first element really is.
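
The two strategies in this summary can be modelled in a few lines of Python to count how many == and < calls each one makes. This is a simplified sketch, not the C code from the patch: lt_current, lt_proposed and tally are hypothetical names, and the model does not distinguish the fast specialised < from the generic one.

```python
from collections import Counter


def lt_current(a, b, counts):
    """'Current way': scan with == for the first differing slot, then use <."""
    for x, y in zip(a, b):
        counts["=="] += 1
        if not (x == y):
            counts["<"] += 1
            return x < y
    # All shared slots are equal: the shorter tuple sorts first.
    return len(a) < len(b)


def lt_proposed(a, b, counts):
    """'Proposed way': try < on the first slot in both directions first."""
    counts["<"] += 1
    if a[0] < b[0]:
        return True
    counts["<"] += 1
    if b[0] < a[0]:
        return False
    # First slots are equal, so the remaining slots decide.
    return lt_current(a[1:], b[1:], counts)


def tally(less_than, a, b):
    counts = Counter()
    return less_than(a, b, counts), dict(counts)


print(tally(lt_current, (1, "x"), (2, "x")))   # (True, {'==': 1, '<': 1})
print(tally(lt_proposed, (1, "x"), (2, "x")))  # (True, {'<': 1})
print(tally(lt_proposed, (2, "x"), (1, "x")))  # (False, {'<': 2})
```

When the first elements differ, the proposed way never calls ==. When they are equal, it spends two < calls on the first slot where the current way spends one ==, which is the trade-off the summary describes.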

@tim-one
Member Author

tim-one commented Oct 20, 2021

@pochmann, yes, if two latin strings are equal, memcmp will have to look at every pair of characters regardless of which comparison outcome is asked for.

The patch here appears to be pretty much a wash for the StackOverflow program. I don't care - I think his keys were obviously and highly contrived, and so was his raw data. It wasn't "a real program" in any respect. But it was a program anyone could run as-is, which is the primary thing on SO.

You noted that there were a lot of duplicates among the ''.join(sorted(x)) keys, but there are far more duplicates among the x[::2] keys: the latter effectively builds a 3-digit decimal integer out of a 6-digit decimal integer, and so there are only about a thousand possible distinct results.
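
The "about a thousand" figure is easy to check, assuming x is a six-character decimal string (how the StackOverflow program actually generated x is not shown here, so both readings are counted):

```python
# x[::2] keeps characters 0, 2 and 4 of a six-character string.
padded = {f"{n:06d}"[::2] for n in range(10**6)}
no_leading_zero = {str(n)[::2] for n in range(100_000, 1_000_000)}

print(len(padded), len(no_leading_zero))  # 1000 900
```

Either way, a million possible inputs collapse to at most a thousand distinct secondary keys.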

@pochmann
Contributor

pochmann commented Oct 20, 2021

@tim-one Yes, the stackoverflow question uses contrived data, I just mean it made me aware that differences at the second element can cause a lot of additional first-element comparisons. Is it unrealistic? If you for example sort events by day and then by time, you likely do have days duplicated a lot and not much duplication among times within each day.

I noticed the higher duplication among x[::2] later but forgot to update. Done now. It's less relevant, though. Duplicates of the primary keys are more relevant, as they allow the secondary keys to play a role, which can then cause the additional fruitless comparisons of equal primary keys. I tried the experiment again with different secondary keys. With less secondary-key duplication, the == comparisons for the primary went a bit further up.

Comparisons per element, with the original x[::2] secondary key:
21.26 == ''.join(sorted(x))    (the primary key)
12.06 < ''.join(sorted(x))
 9.20 == x[::2]                (the secondary key)
 6.68 < x[::2]

With whole x as secondary key, i.e., only little duplication
(but still correlated with the primary key):
21.96 == ''.join(sorted(x))
12.03 < ''.join(sorted(x))
 9.92 == x
 8.12 < x

With random() as secondary key, i.e., likely no duplication
(and no relationship with the primary key):
21.96 == ''.join(sorted(x))
12.03 < ''.join(sorted(x))
 9.93 == random()
 9.93 < random()

With the string "x" as secondary key, i.e., complete duplication:
16.25 == ''.join(sorted(x))
13.43 < ''.join(sorted(x))
 2.82 == "x"
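
Counts like these can be collected by wrapping each key in an object that records its own comparison calls. The following is a sketch of that technique, not the script that produced the numbers above: Counted and count_comparisons are hypothetical names, and the data follows the "events by day, then time" example (7 days, random times). The counts it prints depend on the Python version, since that is exactly what this PR changes.

```python
import random
from collections import Counter

calls = Counter()  # (role, operator) -> number of calls


class Counted:
    """Wrap a value and count every == and < made on it."""

    __slots__ = ("value", "role")

    def __init__(self, value, role):
        self.value = value
        self.role = role

    def __eq__(self, other):
        calls[self.role, "=="] += 1
        return self.value == other.value

    def __lt__(self, other):
        calls[self.role, "<"] += 1
        return self.value < other.value


def count_comparisons(pairs):
    """Sort (primary, secondary) pairs; return per-element call counts and the result."""
    calls.clear()
    keyed = [(Counted(p, "primary"), Counted(s, "secondary")) for p, s in pairs]
    keyed.sort()
    per_element = {key: n / len(keyed) for key, n in calls.items()}
    return per_element, [(p.value, s.value) for p, s in keyed]


rng = random.Random(0)
# Heavy duplication in the primary key, essentially none in the secondary key.
events = [(rng.randrange(7), rng.random()) for _ in range(1000)]
per_element, ordered = count_comparisons(events)

for (role, op), n in sorted(per_element.items()):
    print(f"{n:6.2f} {op:2} {role}")
```

The sorted output matches sorting the plain tuples; only the number of comparison calls differs between strategies.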

ambv and others added 7 commits October 20, 2021 20:35

  • ... resolved by the very first tuple elements, and adjust strategy accordingly.
  • ... strategy. This looks to be quite successful. It loses a few per cent in speed in cases that always want to use the cheaper tests, but can gain far more (compared to this branch's state before this commit) in some cases where PyObject_RichCompareBool(..., Py_EQ) typically returns 1 (they're equal) when applied to the first pair.
  • Swap the order of if/else blocks to put the more likely block first.
7 participants