Commit 9196a05

Merge pull request #31428 from seberg/sort-cleanup
MAINT,DOC: Clean up consolidated sorting code
2 parents 66446be + 44fbad1 commit 9196a05

20 files changed

Lines changed: 195 additions & 731 deletions

numpy/_core/fromnumeric.py

Lines changed: 33 additions & 50 deletions
Original file line number | Diff line number | Diff line change
@@ -953,10 +953,10 @@ def sort(a, axis=-1, kind=None, order=None, *, stable=None, descending=np._NoVal
953953
Axis along which to sort. If None, the array is flattened before
954954
sorting. The default is -1, which sorts along the last axis.
955955
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
956-
Sorting algorithm. The default is 'quicksort'. Note that both 'stable'
957-
and 'mergesort' use timsort or radix sort under the covers and,
958-
in general, the actual implementation will vary with data type.
959-
The 'mergesort' option is retained for backwards compatibility.
956+
Please use the `stable` parameter instead. This argument is retained
957+
for backwards compatibility and provides no additional control.
958+
'quicksort' and 'heapsort' are equivalent to ``stable=False``, while
959+
'mergesort' and 'stable' are equivalent to ``stable=True``.
960960
order : str or list of str, optional
961961
When `a` is an array with fields defined, this argument specifies
962962
which fields to compare first, second, etc. A single field can
@@ -993,24 +993,19 @@ def sort(a, axis=-1, kind=None, order=None, *, stable=None, descending=np._NoVal
993993
994994
Notes
995995
-----
996-
The various sorting algorithms are characterized by their average speed,
997-
worst case performance, work space size, and whether they are stable. A
998-
stable sort keeps items with the same key in the same relative
999-
order. The four algorithms implemented in NumPy have the following
1000-
properties:
1001-
1002-
=========== ======= ============= ============ ========
1003-
kind        speed   worst case    work space   stable
1004-
=========== ======= ============= ============ ========
1005-
'quicksort' 1       O(n^2)        0            no
1006-
'heapsort'  3       O(n*log(n))   0            no
1007-
'mergesort' 2       O(n*log(n))   ~n/2         yes
1008-
'timsort'   2       O(n*log(n))   ~n/2         yes
1009-
=========== ======= ============= ============ ========
1010-
1011-
.. note:: The datatype determines which of 'mergesort' or 'timsort'
1012-
is actually used, even if 'mergesort' is specified. User selection
1013-
at a finer scale is not currently available.
996+
NumPy uses different sorting algorithms depending on whether the sort is
997+
stable and which data types are used. These are characterized by their
998+
worst case performance, work space size, and whether they are stable.
999+
A stable sort keeps items with the same key in the same relative
1000+
order. NumPy chooses between three algorithms:
1001+
1002+
======== ============ ============= ============ ================================
1003+
stable   algorithm    worst case    work space   note
1004+
======== ============ ============= ============ ================================
1005+
no       Introsort    O(n*log(n))   0
1006+
yes      Timsort      O(n*log(n))   ~n/2
1007+
yes      Radix sort   O(n)          n            bools and narrow integers [1]_
1008+
======== ============ ============= ============ ================================
10141009
10151010
For performance, ``sort`` makes a temporary copy if needed to make the data
10161011
`contiguous <https://numpy.org/doc/stable/glossary.html#term-contiguous>`_
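
Review note on the hunk above: the new Notes table lists radix sort as the stable, O(n) choice for bools and narrow integers. To illustrate why a sort of this kind is both linear and stable, here is a minimal pure-Python sketch of a two-pass least-significant-digit radix argsort for unsigned 16-bit keys. The name `radix_argsort_u16` is hypothetical, and this is not NumPy's implementation, which is written in C++ and may differ in every detail.

```python
def radix_argsort_u16(keys):
    """Return a stable sorting permutation for integers in [0, 65535].

    Illustrative only: two counting passes, one per byte, least
    significant byte first. Each pass is O(n) and uses O(n) scratch space.
    """
    order = list(range(len(keys)))
    for shift in (0, 8):
        # Histogram of the current byte, shifted by one slot so that the
        # prefix sum below yields the first output position of each bucket.
        counts = [0] * 257
        for i in order:
            counts[((keys[i] >> shift) & 0xFF) + 1] += 1
        for b in range(256):
            counts[b + 1] += counts[b]
        # Scatter in the current order; equal bytes keep their relative
        # order, which is what makes the whole sort stable.
        out = [0] * len(order)
        for i in order:
            digit = (keys[i] >> shift) & 0xFF
            out[counts[digit]] = i
            counts[digit] += 1
        order = out
    return order


keys = [513, 2, 513, 1, 2, 65535, 0]
perm = radix_argsort_u16(keys)
print([keys[i] for i in perm])
```

The number of passes is fixed by the key width rather than by the input length, which is consistent with the docstring restricting radix sort to types of 16 bits or less.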
@@ -1034,32 +1029,20 @@ def sort(a, axis=-1, kind=None, order=None, *, stable=None, descending=np._NoVal
10341029
placements are sorted according to the non-nan part if it exists.
10351030
Non-nan values are sorted as before.
10361031
1037-
quicksort has been changed to:
1038-
`introsort <https://en.wikipedia.org/wiki/Introsort>`_.
1039-
When sorting does not make enough progress it switches to
1040-
`heapsort <https://en.wikipedia.org/wiki/Heapsort>`_.
1041-
This implementation makes quicksort O(n*log(n)) in the worst case.
1032+
NumPy uses `introsort <https://en.wikipedia.org/wiki/Introsort>`_
1033+
by default for unstable sorting.
10421034
1043-
'stable' automatically chooses the best stable sorting algorithm
1044-
for the data type being sorted.
1045-
It, along with 'mergesort' is currently mapped to
1046-
`timsort <https://en.wikipedia.org/wiki/Timsort>`_
1035+
For stable sorting, NumPy automatically chooses the best stable sorting
1036+
algorithm for the data type being sorted.
1037+
It is currently mapped to `timsort <https://en.wikipedia.org/wiki/Timsort>`_
10471038
or `radix sort <https://en.wikipedia.org/wiki/Radix_sort>`_
1048-
depending on the data type.
1049-
API forward compatibility currently limits the
1050-
ability to select the implementation and it is hardwired for the different
1051-
data types.
1052-
1053-
Timsort is added for better performance on already or nearly
1054-
sorted data. On random data timsort is almost identical to
1055-
mergesort. It is now used for stable sort while quicksort is still the
1056-
default sort if none is chosen. For timsort details, refer to
1057-
`CPython listsort.txt
1058-
<https://github.com/python/cpython/blob/3.7/Objects/listsort.txt>`_
1059-
'mergesort' and 'stable' are mapped to radix sort for integer data types.
1060-
Radix sort is an O(n) sort instead of O(n log n).
1061-
1062-
NaT now sorts to the end of arrays for consistency with NaN.
1039+
for bools and integer types with a width of 16 bits or less.
1040+
1041+
For numerical sorts, NaT and NaN always sort to the end of the array for
1042+
both ascending and descending sort order.
1043+
1044+
.. [1] Radix sort is used for stable sorting of bools and narrow integer
1045+
types (up to 16 bits). For these it performs better than Timsort.
10631046
10641047
Examples
10651048
--------
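
Review note on the hunk above: the rewritten Notes state that NaN and NaT sort to the end of the array. A small check of the ascending case, assuming a reasonably recent NumPy release, could look like the sketch below. The `descending` keyword that appears in the signature in this diff is deliberately not used, since it may not exist in an installed release.

```python
import numpy as np

# Floating-point input with two NaN entries mixed in.
floats = np.array([3.0, np.nan, 1.0, np.nan, 2.0])
sorted_floats = np.sort(floats)
print(sorted_floats)

# datetime64 input with one NaT entry.
dates = np.array(["2021-03-01", "NaT", "2020-01-15"], dtype="datetime64[D]")
sorted_dates = np.sort(dates)
print(sorted_dates)

# Both missing-value markers end up after all ordinary values.
print(np.isnan(sorted_floats[-2:]).all(), np.isnat(sorted_dates[-1]))
```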
@@ -1131,10 +1114,10 @@ def argsort(a, axis=-1, kind=None, order=None, *, stable=None, descending=np._No
11311114
Axis along which to sort. The default is -1 (the last axis). If None,
11321115
the flattened array is used.
11331116
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
1134-
Sorting algorithm. The default is 'quicksort'. Note that both 'stable'
1135-
and 'mergesort' use timsort under the covers and, in general, the
1136-
actual implementation will vary with data type. The 'mergesort' option
1137-
is retained for backwards compatibility.
1117+
Please use the `stable` parameter instead. This argument is retained
1118+
for backwards compatibility and provides no additional control.
1119+
'quicksort' and 'heapsort' are equivalent to ``stable=False``, while
1120+
'mergesort' and 'stable' are equivalent to ``stable=True``.
11381121
order : str or list of str, optional
11391122
When `a` is an array with fields defined, this argument specifies
11401123
which fields to compare first, second, etc. A single field can
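
Review note on the `kind` docstring change: according to the new text, 'mergesort' and 'stable' both mean a stable sort, while 'quicksort' and 'heapsort' mean an unstable one. The sketch below checks that reading with `argsort`, where tie order is observable. It assumes the `stable` keyword is only present from NumPy 2.0 onwards, so that call is guarded by a version check.

```python
import numpy as np

# int16 keys with repeated values, so the order of ties shows up in argsort.
keys = np.array([2, 1, 2, 1, 2, 1], dtype=np.int16)

# Legacy spellings of a stable sort.
by_stable = np.argsort(keys, kind="stable")
by_mergesort = np.argsort(keys, kind="mergesort")

# Ties keep their original relative order:
# the 1s at positions 1, 3, 5 come first, then the 2s at 0, 2, 4.
print(by_stable.tolist())  # [1, 3, 5, 0, 2, 4]
print(by_mergesort.tolist() == by_stable.tolist())  # True

# Unstable kinds still return a valid sorting permutation,
# but make no promise about the order of ties.
by_quick = np.argsort(keys, kind="quicksort")
print(keys[by_quick].tolist())  # [1, 1, 1, 2, 2, 2]

# Preferred spelling per the new docstring (assumed NumPy >= 2.0 only).
if np.lib.NumpyVersion(np.__version__) >= "2.0.0":
    by_flag = np.argsort(keys, stable=True)
    print(by_flag.tolist() == by_stable.tolist())  # True
```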

numpy/_core/meson.build

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -874,7 +874,6 @@ endforeach
874874
# --------------------------------------
875875
multiarray_gen_headers = [
876876
src_file.process('src/multiarray/arraytypes.h.src'),
877-
src_file.process('src/common/npy_sort.h.src'),
878877
]
879878
foreach gen_mtargets : [
880879
[
@@ -1255,7 +1254,6 @@ src_multiarray = multiarray_gen_headers + [
12551254
'src/multiarray/usertypes.c',
12561255
'src/multiarray/vdot.c',
12571256
'src/npysort/quicksort_generic.cpp',
1258-
'src/npysort/mergesort.cpp',
12591257
'src/npysort/timsort_generic.cpp',
12601258
'src/npysort/heapsort.cpp',
12611259
'src/npysort/npysort_methods.cpp',

numpy/_core/src/common/npy_sort.c

Lines changed: 4 additions & 4 deletions
@@ -18,13 +18,13 @@ npy_default_sort_loop(PyArrayMethod_Context *context,
1818
PyArrayMethod_SortParameters *sort_params =
1919
(PyArrayMethod_SortParameters *)context->parameters;
2020
PyArray_SortImpl *sort_func = NULL;
21-
21+
2222
switch (sort_params->flags) {
2323
case NPY_SORT_DEFAULT:
2424
sort_func = npy_quicksort_impl;
2525
break;
2626
case NPY_SORT_STABLE:
27-
sort_func = npy_mergesort_impl;
27+
sort_func = npy_timsort_impl;
2828
break;
2929
default:
3030
PyErr_SetString(PyExc_ValueError, "Invalid sort kind");
@@ -45,13 +45,13 @@ npy_default_argsort_loop(PyArrayMethod_Context *context,
4545
PyArrayMethod_SortParameters *sort_params =
4646
(PyArrayMethod_SortParameters *)context->parameters;
4747
PyArray_ArgSortImpl *argsort_func = NULL;
48-
48+
4949
switch (sort_params->flags) {
5050
case NPY_SORT_DEFAULT:
5151
argsort_func = npy_aquicksort_impl;
5252
break;
5353
case NPY_SORT_STABLE:
54-
argsort_func = npy_amergesort_impl;
54+
argsort_func = npy_atimsort_impl;
5555
break;
5656
default:
5757
PyErr_SetString(PyExc_ValueError, "Invalid sort kind");
Lines changed: 8 additions & 8 deletions
@@ -41,12 +41,8 @@ NPY_NO_EXPORT int register_all_sorts(void);
4141

4242

4343
NPY_NO_EXPORT int npy_quicksort(void *vec, npy_intp cnt, void *arr);
44-
NPY_NO_EXPORT int npy_heapsort(void *vec, npy_intp cnt, void *arr);
45-
NPY_NO_EXPORT int npy_mergesort(void *vec, npy_intp cnt, void *arr);
4644
NPY_NO_EXPORT int npy_timsort(void *vec, npy_intp cnt, void *arr);
4745
NPY_NO_EXPORT int npy_aquicksort(void *vec, npy_intp *ind, npy_intp cnt, void *arr);
48-
NPY_NO_EXPORT int npy_aheapsort(void *vec, npy_intp *ind, npy_intp cnt, void *arr);
49-
NPY_NO_EXPORT int npy_amergesort(void *vec, npy_intp *ind, npy_intp cnt, void *arr);
5046
NPY_NO_EXPORT int npy_atimsort(void *vec, npy_intp *ind, npy_intp cnt, void *arr);
5147

5248
/*
@@ -76,12 +72,16 @@ typedef int (PyArray_ArgSortImpl)(void *vv, npy_intp *tosort, npy_intp n,
7672

7773
NPY_NO_EXPORT int npy_quicksort_impl(void *start, npy_intp num, void *varr,
7874
npy_intp elsize, PyArray_CompareFunc *cmp);
79-
NPY_NO_EXPORT int npy_mergesort_impl(void *start, npy_intp num, void *varr,
80-
npy_intp elsize, PyArray_CompareFunc *cmp);
75+
NPY_NO_EXPORT int npy_heapsort_impl(void *start, npy_intp num, void *varr,
76+
npy_intp elsize, PyArray_CompareFunc *cmp);
77+
NPY_NO_EXPORT int npy_timsort_impl(void *start, npy_intp num, void *varr,
78+
npy_intp elsize, PyArray_CompareFunc *cmp);
8179
NPY_NO_EXPORT int npy_aquicksort_impl(void *vv, npy_intp *tosort, npy_intp num, void *varr,
8280
npy_intp elsize, PyArray_CompareFunc *cmp);
83-
NPY_NO_EXPORT int npy_amergesort_impl(void *v, npy_intp *tosort, npy_intp num, void *varr,
84-
npy_intp elsize, PyArray_CompareFunc *cmp);
81+
NPY_NO_EXPORT int npy_aheapsort_impl(void *vv, npy_intp *tosort, npy_intp num, void *varr,
82+
npy_intp elsize, PyArray_CompareFunc *cmp);
83+
NPY_NO_EXPORT int npy_atimsort_impl(void *v, npy_intp *tosort, npy_intp num, void *varr,
84+
npy_intp elsize, PyArray_CompareFunc *cmp);
8585

8686

8787
#ifdef __cplusplus
Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
1-
#ifndef NUMPY_CORE_SRC_COMMON_NUMPY_TAG_H_
2-
#define NUMPY_CORE_SRC_COMMON_NUMPY_TAG_H_
1+
#ifndef NUMPY_CORE_SRC_COMMON_NUMPY_TAG_HPP_
2+
#define NUMPY_CORE_SRC_COMMON_NUMPY_TAG_HPP_
33

44
#include "numpy/ndarraytypes.h"
55
#include "numpy/npy_common.h"
@@ -267,4 +267,4 @@ constexpr int cmp(Args... args)
267267

268268
} // namespace npy
269269

270-
#endif // NUMPY_CORE_SRC_COMMON_NUMPY_TAG_H_
270+
#endif // NUMPY_CORE_SRC_COMMON_NUMPY_TAG_HPP_

numpy/_core/src/multiarray/item_selection.c

Lines changed: 4 additions & 13 deletions
@@ -3173,11 +3173,6 @@ PyArray_MultiIndexSetItem(PyArrayObject *self, const npy_intp *multi_index,
31733173
}
31743174

31753175

3176-
/* Table of generic sort functions for use in PyArray_SortEx*/
3177-
static PyArray_SortFunc* const generic_sort_table[] = {npy_quicksort,
3178-
npy_heapsort,
3179-
npy_timsort};
3180-
31813176
/*NUMPY_API
31823177
* Sort an array in-place with extended parameters
31833178
*/
@@ -3261,10 +3256,10 @@ PyArray_Sort(PyArrayObject *op, int axis, NPY_SORTKIND flags)
32613256
}
32623257
switch (flags) {
32633258
case NPY_SORT_DEFAULT:
3264-
sort = generic_sort_table[NPY_QUICKSORT];
3259+
sort = npy_quicksort;
32653260
break;
32663261
case NPY_SORT_STABLE:
3267-
sort = generic_sort_table[NPY_STABLESORT];
3262+
sort = npy_timsort;
32683263
break;
32693264
default:
32703265
break;
@@ -3290,10 +3285,6 @@ PyArray_Sort(PyArrayObject *op, int axis, NPY_SORTKIND flags)
32903285
return ret;
32913286
}
32923287

3293-
/* Table of generic argsort function for use by PyArray_ArgSortEx */
3294-
static PyArray_ArgSortFunc* const generic_argsort_table[] = {npy_aquicksort,
3295-
npy_aheapsort,
3296-
npy_atimsort};
32973288

32983289
/*NUMPY_API
32993290
* ArgSort an array with extended parameters
@@ -3374,10 +3365,10 @@ PyArray_ArgSort(PyArrayObject *op, int axis, NPY_SORTKIND flags)
33743365
}
33753366
switch (flags) {
33763367
case NPY_SORT_DEFAULT:
3377-
argsort = generic_argsort_table[NPY_QUICKSORT];
3368+
argsort = npy_aquicksort;
33783369
break;
33793370
case NPY_SORT_STABLE:
3380-
argsort = generic_argsort_table[NPY_STABLESORT];
3371+
argsort = npy_atimsort;
33813372
break;
33823373
default:
33833374
break;

numpy/_core/src/npysort/binsearch.cpp

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
77

88
#include "npy_binsearch.h"
99
#include "npy_sort.h"
10-
#include "numpy_tag.h"
10+
#include "numpy_tag.hpp"
1111

1212
#include <array>
1313
#include <functional> // for std::less and std::less_equal

numpy/_core/src/npysort/heapsort.cpp

Lines changed: 10 additions & 41 deletions
@@ -1,60 +1,30 @@
11
/* -*- c -*- */
22

33
/*
4-
* The purpose of this module is to add faster sort functions
5-
* that are type-specific. This is done by altering the
6-
* function table for the builtin descriptors.
7-
*
8-
* These sorting functions are copied almost directly from numarray
9-
* with a few modifications (complex comparisons compare the imaginary
10-
* part if the real parts are equal, for example), and the names
11-
* are changed.
12-
*
13-
* The original sorting code is due to Charles R. Harris who wrote
14-
* it for numarray.
15-
*/
16-
17-
/*
18-
* Quick sort is usually the fastest, but the worst case scenario can
19-
* be slower than the merge and heap sorts. The merge sort requires
20-
* extra memory and so for large arrays may not be useful.
21-
*
22-
* The merge sort is *stable*, meaning that equal components
23-
* are unmoved from their entry versions, so it can be used to
24-
* implement lexicographic sorting on multiple keys.
25-
*
26-
* The heap sort is included for completeness.
4+
* Comparator-function-driven version of the heapsort implemented in
5+
* ``npysort_heapsort.hpp``. Used by dtypes that register a
6+
* ``PyArray_CompareFunc`` rather than the type-specialised path.
7+
* See ``npysort_heapsort.hpp`` for the algorithm description.
278
*/
289

2910
#define NPY_NO_DEPRECATED_API NPY_API_VERSION
3011

3112
#include "npy_sort.h"
3213
#include "npysort_common.h"
33-
#include "numpy_tag.h"
34-
35-
#include "npysort_heapsort.h"
3614

3715
#include <cstdlib>
3816

39-
#define NOT_USED NPY_UNUSED(unused)
40-
#define PYA_QS_STACK 100
41-
#define SMALL_QUICKSORT 15
42-
#define SMALL_MERGESORT 20
43-
#define SMALL_STRING 16
44-
45-
4617
/*
4718
*****************************************************************************
4819
** GENERIC SORT **
4920
*****************************************************************************
5021
*/
5122

5223
NPY_NO_EXPORT int
53-
npy_heapsort(void *start, npy_intp num, void *varr)
24+
npy_heapsort_impl(void *start, npy_intp num, void *varr, npy_intp elsize,
25+
PyArray_CompareFunc *cmp)
5426
{
55-
PyArrayObject *arr = (PyArrayObject *)varr;
56-
npy_intp elsize = PyArray_ITEMSIZE(arr);
57-
PyArray_CompareFunc *cmp = PyDataType_GetArrFuncs(PyArray_DESCR(arr))->compare;
27+
void *arr = varr;
5828
if (elsize == 0) {
5929
return 0; /* no need for sorting elements of no size */
6030
}
@@ -111,12 +81,11 @@ npy_heapsort(void *start, npy_intp num, void *varr)
11181
}
11282

11383
NPY_NO_EXPORT int
114-
npy_aheapsort(void *vv, npy_intp *tosort, npy_intp n, void *varr)
84+
npy_aheapsort_impl(void *vv, npy_intp *tosort, npy_intp n, void *varr,
85+
npy_intp elsize, PyArray_CompareFunc *cmp)
11586
{
87+
void *arr = varr;
11688
char *v = (char *)vv;
117-
PyArrayObject *arr = (PyArrayObject *)varr;
118-
npy_intp elsize = PyArray_ITEMSIZE(arr);
119-
PyArray_CompareFunc *cmp = PyDataType_GetArrFuncs(PyArray_DESCR(arr))->compare;
12089
npy_intp *a, i, j, l, tmp;
12190

12291
/* The array needs to be offset by one for heapsort indexing */
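
Review note on `heapsort.cpp`: the new header comment describes this file as the comparator-function-driven heapsort, and the last context line mentions offsetting the array by one. As a rough illustration of both ideas, here is a minimal Python sketch of a heapsort that takes a three-way comparison callback and uses 1-based indexing, so the children of node `i` sit at `2*i` and `2*i + 1`. The names `heapsort_with_cmp` and `cmp_complex` are hypothetical, and the code is not a translation of the C++ source. The complex comparator follows the rule stated in the removed header comment: compare imaginary parts when the real parts are equal.

```python
def heapsort_with_cmp(values, cmp):
    """Return a sorted copy of values using cmp(a, b) -> <0, 0 or >0."""
    n = len(values)
    a = [None] + list(values)  # slot 0 is unused, giving 1-based indexing

    def sift_down(root, end):
        # Move a[root] down until neither child within a[1..end] is larger.
        tmp = a[root]
        child = 2 * root
        while child <= end:
            # Pick the larger of the two children.
            if child < end and cmp(a[child], a[child + 1]) < 0:
                child += 1
            if cmp(tmp, a[child]) < 0:
                a[root] = a[child]
                root = child
                child = 2 * root
            else:
                break
        a[root] = tmp

    # Build a max-heap, then repeatedly move the maximum to the end.
    for start in range(n // 2, 0, -1):
        sift_down(start, n)
    for end in range(n, 1, -1):
        a[1], a[end] = a[end], a[1]
        sift_down(1, end - 1)
    return a[1:]


def cmp_complex(x, y):
    """Order by real part, then by imaginary part when the real parts tie."""
    left, right = (x.real, x.imag), (y.real, y.imag)
    return (left > right) - (left < right)


data = [2 + 1j, 1 + 5j, 2 - 3j, 1 + 0j]
print(heapsort_with_cmp(data, cmp_complex))
```

Passing the comparator and element size as arguments, as the new `npy_heapsort_impl` signature does, keeps the routine independent of any array object. The sketch mirrors that by taking `cmp` as a plain parameter.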
