[RFC] Replace rust-timsort so RustPython sorting speed ~= CPython sorting speed #6093

@jackoconnordev

WIP/Draft PR

Summary

Replace the single usage of rust-timsort with a more performant sorting algorithm. Either Rust's default sorting algorithm or a crate like glidesort which has positive benchmarks against the default implementation.

Detailed Explanation

As of CPython 3.11, the default sort no longer uses Timsort's original merge policy. list.sort() instead uses the merge policy of an algorithm called Powersort.

rust-timsort

RustPython currently uses a forked project, https://github.com/RustPython/rust-timsort, as its default sorting algorithm. To quote rust-timsort's README:

This is still an extreme work-in-progress, and performance has vast room for improvement.

This performance gap has become noticeable as unit tests introduced in later Python versions attempt to sort larger and larger lists. Sorting 1 million random floats with rust-timsort takes 10-20 minutes, versus about 0.3 seconds for CPython (16m52s vs 0.309s in the run below).

RustPython on  statistics-module-kde-function [$!?] via :snake: v3.13.1 via :crab: v1.88.0 
✦ ❯ time python -c "from random import random; sorted([random() for i in range(1_000_000)]); print('DONE');"
DONE

real    0m0.309s
user    0m0.274s
sys    0m0.036s

RustPython on  statistics-module-kde-function [$!?] via :snake: v3.13.1 via :crab: v1.88.0 
✦ ❯ time cargo run --release -- -c "from random import random; sorted([random() for i in range(1_000_000)]); print('DONE');"
    Finished `release` profile [optimized] target(s) in 0.16s
     Running `target/release/rustpython -c 'from random import random; sorted([random() for i in range(1_000_000)]); print('\''DONE'\'');'`
DONE

real    16m52.217s
user    16m51.926s
sys    0m0.174s
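
To see how the gap grows with input size, a small script along the following lines could be run under both interpreters. This is only a sketch and not part of the original measurements: the helper name `time_sorted` and the chosen sizes are illustrative, and it assumes the `random` and `time` standard library modules behave the same way in the RustPython build being tested.

```python
import random
import time


def time_sorted(n, seed=0):
    """Return the seconds taken to sort n random floats (hypothetical helper)."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    start = time.perf_counter()
    result = sorted(data)
    elapsed = time.perf_counter() - start
    # Sanity check: the output must be in non-decreasing order.
    assert all(a <= b for a, b in zip(result, result[1:]))
    return elapsed


if __name__ == "__main__":
    # Sizes are illustrative; start small, since 1_000_000 elements
    # took minutes under RustPython in the run reported above.
    for n in (1_000, 10_000, 100_000):
        print(f"n={n:>7}: {time_sorted(n):.4f}s")
```

Timing only the `sorted` call, rather than the whole process as `time` does, keeps list construction and interpreter startup out of the comparison.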

glidesort

Glidesort is a crate providing a sorting algorithm of the same name, which is apparently an enhancement of Powersort. Unlike rust-timsort, Glidesort has published benchmarks showing it performing as well as, or (for largely presorted lists) significantly better than, Rust's built-in sorting method, which was a merge sort implementation at the time those benchmarks were published.
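
Because the claimed advantage depends on how presorted the input is, any comparison between candidate algorithms probably needs more than one input shape. The sketch below builds a few such inputs on the Python side; the function name `make_inputs`, the pattern names, and the number of swaps used for the "mostly sorted" case are all assumptions made for illustration, not taken from Glidesort's benchmark suite.

```python
import random
import time


def make_inputs(n, seed=0):
    """Build named test inputs of length n (names and shapes are illustrative)."""
    rng = random.Random(seed)
    shuffled = [rng.random() for _ in range(n)]
    ascending = sorted(shuffled)
    # Mostly sorted: apply n // 100 random swaps to an ascending list.
    mostly_sorted = list(ascending)
    for _ in range(max(1, n // 100)):
        i, j = rng.randrange(n), rng.randrange(n)
        mostly_sorted[i], mostly_sorted[j] = mostly_sorted[j], mostly_sorted[i]
    return {
        "random": shuffled,
        "ascending": ascending,
        "descending": ascending[::-1],
        "mostly_sorted": mostly_sorted,
    }


if __name__ == "__main__":
    for name, data in make_inputs(10_000).items():
        start = time.perf_counter()
        sorted(data)
        print(f"{name:>13}: {time.perf_counter() - start:.4f}s")
```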

Drawbacks, Rationale, and Alternatives

Rationale

CPython and PyPy have both left Timsort behind as better-performing alternatives, namely Powersort, became available. I believe the use of Timsort in any Python implementation is an implementation detail, and if better-performing sorting algorithms are available, they should be fair game.

If we choose to go with the default Rust sort then an entire dependency is removed from RustPython and maintaining the sorting algorithm becomes a non-issue until such time as performance warrants using something different.
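
Although the algorithm itself is an implementation detail, the Python documentation does guarantee that sorting is stable, including with `reverse=True`, so any replacement would need to preserve that. On the Rust side this suggests a stable sort (`slice::sort` and its variants, or Glidesort, which describes itself as stable) rather than `sort_unstable`. The following check is a minimal sketch of that contract that could be run under RustPython after swapping the algorithm; the function name `check_sort_contract` and the sample records are made up for illustration.

```python
def check_sort_contract():
    """Verify stability guarantees any replacement sort must keep (illustrative)."""
    records = [("b", 2), ("a", 2), ("c", 1), ("d", 2), ("e", 1)]

    # Stable: records with equal keys keep their original relative order.
    ascending = sorted(records, key=lambda r: r[1])
    assert ascending == [("c", 1), ("e", 1), ("b", 2), ("a", 2), ("d", 2)]

    # reverse=True reverses the key order but still keeps ties in original order.
    descending = sorted(records, key=lambda r: r[1], reverse=True)
    assert descending == [("b", 2), ("a", 2), ("d", 2), ("c", 1), ("e", 1)]

    # list.sort must agree with sorted().
    in_place = list(records)
    in_place.sort(key=lambda r: r[1])
    assert in_place == ascending
    return True


if __name__ == "__main__":
    print("sort contract holds:", check_sort_contract())
```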

Drawbacks

  • A drawback of Glidesort is that it does not appear to contain any tests and is not actively maintained. It would probably need to be forked, as rust-timsort was.

Unresolved Questions

  • Is Rust's built-in sorting algorithm fast enough that we would be happy to drop dependencies on fancier algorithms in exchange for easier maintenance?
  • If not, is Glidesort's lack of tests and likely need for a fork a blocker, and should other crates be considered instead?
