GH-116554: Relax list.sort()'s notion of "descending" runs by tim-one · Pull Request #116578 · python/cpython · GitHub

Conversation

@tim-one
Member

@tim-one tim-one commented Mar 10, 2024

Not yet done, but good enough for timing.

tim-one and others added 5 commits March 10, 2024 19:19
`sortslice_reverse()`. The current

    sortslice_advance(&slice, n - neq);
    reverse_sortslice(&slice, neq);

pair of lines is an affront to the soul ;-) Putting
the compound name in

    <OBJECT TYPE NAME>_<METHOD NAME>

order feels significantly more natural.
with few distinct elements. It's in the nature of the beast that
this will catch nearly all plausible "off by 1" coding errors
in this context.
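A stress test along these lines can be sketched in pure Python (a hypothetical stand-in, not the actual test code; the function name and parameters are illustrative):

```python
import random

def check_sort_with_duplicates(trials=500, maxlen=64, distinct=4):
    # Lists drawn from only a few distinct values exercise the
    # equal-element paths in run detection, where "off by 1" coding
    # errors would surface as an incorrectly sorted result.
    for _ in range(trials):
        n = random.randrange(maxlen + 1)
        xs = [random.randrange(distinct) for _ in range(n)]
        expected = sorted(xs)  # reference result
        xs.sort()              # implementation under test
        assert xs == expected, (xs, expected)

check_sort_with_duplicates()
```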
Switching to what I guess is some tool's different spelling of GH's single backticks.
@tim-one
Member Author

tim-one commented Mar 11, 2024

This looks good to go to me. I wasn't able to provoke any errors in the code, and didn't find any significant timing surprises (compared to the current main branch). The OP's original "problem case" (on StackOverflow) runs much faster now, and, as expected, is down to about 1.5 compares per element (independent of list size - the first call of count_run() now sorts the whole thing in-place).

Reviews always appreciated, but if a few days pass without one, I'll just commit it anyway.
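For reference, the OP's problem case can be reproduced with a list like the following (a hedged sketch; the exact list in the StackOverflow question may differ in detail):

```python
# Pairs of equal elements in descending order. Before this change,
# each equal pair ended the "descending" run; afterwards the whole
# list is one run, so the first count_run() call sorts it in place.
N = 1_000_000
data = [v for v in range(N - 1, -1, -1) for _ in range(2)]
data.sort()
assert data[:4] == [0, 0, 1, 1]
assert data[-4:] == [N - 2, N - 2, N - 1, N - 1]
```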

tim-one added 2 commits March 10, 2024 23:23
comments in binarysort(), and deleted a redundant computation.
Sue me ;-)
tim-one and others added 12 commits March 11, 2024 18:36
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
…e-116554.gYumG5.rst

Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
I kept `lo` mostly to reduce fiddly typing needed for IFLT arguments.

But IFLT calls were tedious and error-prone too. So introduced two
tiny macros to capture the gibberish needed to spell "is the next
element smaller/larger?" once and for all. There was no real use
left for `lo` then, so got rid of it.

Although a variable named `lo` still exists. But with a different meaning.
It's a const capturing slo->keys, viewed as an array for `n` to index.

A modern optimizing compiler shouldn't need my help to realize that
marching through an array one at a time can be strength-reduced to
pointer increments (which is what the old `lo` did).
thing we should be optimizing for ;-) Seriously, they'll
return very early anyway, as a matter of course, after the
first loop terminates without doing anything, and then the
`n == nremaining` test will pass.
@tim-one tim-one merged commit bf121d6 into python:main Mar 13, 2024
@tim-one tim-one deleted the descend branch March 13, 2024 01:01
vstinner pushed a commit to vstinner/cpython that referenced this pull request Mar 20, 2024
…hon#116578)

* pythonGH-116554: Relax list.sort()'s notion of "descending" run

Rewrote `count_run()` so that sub-runs of equal elements no longer end a descending run. Both ascending and descending runs can have arbitrarily many sub-runs of arbitrarily many equal elements now. This is tricky, because we only use ``<`` comparisons, so checking for equality doesn't come "for free". Surprisingly, it turned out there's a very cheap (one comparison) way to determine whether an ascending run consisted of all-equal elements. That sealed the deal.

In addition, after a descending run is reversed in-place, we now go on to see whether it can be extended by an ascending run that just happens to be adjacent. This succeeds in finding at least one additional element to append about half the time, and so appears to more than repay its cost (the savings come from getting to skip a binary search, when a short run is artificially forced to length MINRUN later, for each new element `count_run()` can add to the initial run).

While these have been in the back of my mind for years, a question on StackOverflow pushed me into action:

https://stackoverflow.com/questions/78108792/

They were wondering why it took about 4x longer to sort a list like:

[999_999, 999_999, ..., 2, 2, 1, 1, 0, 0]

than "similar" lists. Of course that runs very much faster after this patch.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
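The relaxed run detection can be modeled in pure Python (a simplified sketch, not the actual C code in `listobject.c`; in particular, the real code resolves the direction of an initial all-equal stretch more cheaply than the two-comparisons-per-element scan used here, and this model makes no attempt at the one-comparison all-equal trick):

```python
def count_run(a, lo, hi):
    """Return (n, descending): a[lo:lo+n] is the longest run at a[lo].

    Only "<" is used, mirroring the C code's single-operator discipline.
    A descending run may now contain sub-runs of equal elements.
    """
    if hi - lo < 2:
        return hi - lo, False
    # Skip an initial stretch of equal elements; it fits either direction.
    i = lo + 1
    while i < hi and not (a[i] < a[i - 1]) and not (a[i - 1] < a[i]):
        i += 1
    if i == hi:
        return i - lo, False        # all equal: treat as ascending
    if a[i] < a[i - 1]:             # first strict drop: descending run
        i += 1
        while i < hi and not (a[i - 1] < a[i]):   # a[i] <= a[i-1]
            i += 1
        return i - lo, True
    else:                           # first strict rise: ascending run
        i += 1
        while i < hi and not (a[i] < a[i - 1]):   # a[i] >= a[i-1]
            i += 1
        return i - lo, False
```

Under the old rules, `count_run([5, 5, 3, 3, 1, 1, 0, 0], 0, 8)` would have stopped at the first equal pair inside the descending stretch; under the relaxed rules it reports the whole list as one descending run.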
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 25, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
@nanonyme

This seems to be resulting in sorting changes between 3.12 and 3.13. Is the sort still Timsort after this? The descending criterion now seems to conflict with https://en.m.wikipedia.org/wiki/Timsort


Labels

interpreter-core (Objects, Python, Grammar, and Parser dirs)

4 participants