Commit 5ce15fd
ENH: Enable SVE detection for Highway VQSort
Leverage the Meson infrastructure to enable SVE selectively, only for
Highway, which already supports SVE.
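For context, here is a minimal sketch of what SVE detection can look like on Linux/AArch64, where the kernel lists `sve` on the `Features` line of `/proc/cpuinfo`. It is an illustration only, not the check this commit adds; the helper name `has_sve` and the sample text are hypothetical.

```python
def has_sve(cpuinfo_text: str) -> bool:
    """Return True if a 'Features' line in the given cpuinfo text lists 'sve'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the whole token so that e.g. 'sve2' alone does not count as 'sve'.
        if key.strip() == "Features" and "sve" in value.split():
            return True
    return False


# Hypothetical sample; on a real machine, pass the contents of /proc/cpuinfo.
sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm crc32 atomics sve\n"
print(has_sve(sample))
```

A build-time check (compiling a small test program that uses SVE intrinsics) is a separate concern from this kind of runtime probe.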
```
| Change | Before [94bc564] <main> | After [1ffedb85] <sve-sort> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|-------------------------------|---------|----------------------------------------------------------------------------------|
| + | 551±0.8μs | 654±5μs | 1.19 | bench_function_base.Sort.time_sort('merge', 'float32', ('random',)) |
| + | 72.9±0.1μs | 80.5±0.08μs | 1.11 | bench_function_base.Sort.time_argsort('quick', 'float32', ('ordered',)) |
| + | 553±1μs | 606±2μs | 1.1 | bench_function_base.Sort.time_sort('merge', 'float64', ('random',)) |
| + | 41.9±0.3μs | 45.6±0.04μs | 1.09 | bench_function_base.Sort.time_argsort('merge', 'int32', ('sorted_block', 1000)) |
| + | 493±1μs | 532±2μs | 1.08 | bench_function_base.Sort.time_sort('merge', 'float16', ('random',)) |
| + | 73.7±0.2μs | 78.8±0.04μs | 1.07 | bench_function_base.Sort.time_argsort('merge', 'int32', ('sorted_block', 100)) |
| + | 565±1μs | 600±1μs | 1.06 | bench_function_base.Sort.time_argsort('heap', 'float64', ('ordered',)) |
| + | 465±1μs | 491±1μs | 1.06 | bench_function_base.Sort.time_argsort('heap', 'int32', ('reversed',)) |
| + | 696±3μs | 739±2μs | 1.06 | bench_function_base.Sort.time_sort('heap', 'float16', ('random',)) |
| + | 645±3μs | 684±3μs | 1.06 | bench_function_base.Sort.time_sort('heap', 'float16', ('sorted_block', 10)) |
| + | 651±1μs | 691±1μs | 1.06 | bench_function_base.Sort.time_sort('heap', 'float16', ('sorted_block', 100)) |
| + | 627±3μs | 665±1μs | 1.06 | bench_function_base.Sort.time_sort('heap', 'float16', ('sorted_block', 1000)) |
| + | 467±1μs | 494±0.7μs | 1.06 | bench_function_base.Sort.time_sort('heap', 'float32', ('ordered',)) |
| + | 166±0.1μs | 174±1μs | 1.05 | bench_function_base.Sort.time_sort('merge', 'float32', ('sorted_block', 10)) |
| - | 77.4±0.08μs | 73.2±0.2μs | 0.95 | bench_function_base.Sort.time_argsort('merge', 'uint32', ('sorted_block', 100)) |
| - | 379±1μs | 359±0.1μs | 0.95 | bench_function_base.Sort.time_sort('heap', 'int32', ('ordered',)) |
| - | 341±0.5μs | 324±0.5μs | 0.95 | bench_function_base.Sort.time_sort('quick', 'float16', ('sorted_block', 1000)) |
| - | 590±0.5μs | 554±1μs | 0.94 | bench_function_base.Sort.time_argsort('heap', 'float32', ('ordered',)) |
| - | 239±5μs | 226±0.6μs | 0.94 | bench_function_base.Sort.time_argsort('merge', 'float16', ('sorted_block', 10)) |
| - | 195±2μs | 184±1μs | 0.94 | bench_function_base.Sort.time_argsort('merge', 'float32', ('sorted_block', 10)) |
| - | 692±2μs | 637±2μs | 0.92 | bench_function_base.Sort.time_argsort('merge', 'float32', ('random',)) |
| - | 45.5±0.03μs | 42.0±0.2μs | 0.92 | bench_function_base.Sort.time_argsort('merge', 'uint32', ('sorted_block', 1000)) |
| - | 80.5±0.07μs | 73.0±0.1μs | 0.91 | bench_function_base.Sort.time_argsort('quick', 'float64', ('ordered',)) |
| - | 78.9±0.2μs | 71.7±0.2μs | 0.91 | bench_function_base.Sort.time_sort('quick', 'uint32', ('ordered',)) |
| - | 79.2±0.1μs | 72.1±0.2μs | 0.91 | bench_function_base.Sort.time_sort('quick', 'uint32', ('reversed',)) |
| - | 131±2μs | 118±0.8μs | 0.9 | bench_function_base.Sort.time_sort('merge', 'float16', ('sorted_block', 100)) |
| - | 82.8±0.2μs | 73.7±0.3μs | 0.89 | bench_function_base.Sort.time_sort('quick', 'float32', ('ordered',)) |
| - | 83.4±0.07μs | 74.1±0.2μs | 0.89 | bench_function_base.Sort.time_sort('quick', 'float32', ('reversed',)) |
| - | 78.6±0.2μs | 70.3±0.2μs | 0.89 | bench_function_base.Sort.time_sort('quick', 'int32', ('ordered',)) |
| - | 79.2±0.09μs | 70.8±0.08μs | 0.89 | bench_function_base.Sort.time_sort('quick', 'int32', ('reversed',)) |
| - | 3.22±0.02μs | 2.86±0μs | 0.89 | bench_function_base.Sort.time_sort('quick', 'uint32', ('uniform',)) |
| - | 3.26±0.04μs | 2.84±0μs | 0.87 | bench_function_base.Sort.time_sort('quick', 'int32', ('uniform',)) |
| - | 82.6±0.06μs | 71.1±0.08μs | 0.86 | bench_function_base.Sort.time_sort('quick', 'float32', ('sorted_block', 10)) |
| - | 4.91±0.01μs | 4.22±0μs | 0.86 | bench_function_base.Sort.time_sort('quick', 'int64', ('uniform',)) |
| - | 79.0±0.2μs | 66.8±0.05μs | 0.85 | bench_function_base.Sort.time_sort('merge', 'float16', ('sorted_block', 1000)) |
| - | 78.8±0.05μs | 67.0±0.2μs | 0.85 | bench_function_base.Sort.time_sort('quick', 'uint32', ('sorted_block', 10)) |
| - | 84.2±0.07μs | 70.8±0.1μs | 0.84 | bench_function_base.Sort.time_sort('quick', 'float32', ('random',)) |
| - | 89.4±0.1μs | 75.5±0.05μs | 0.84 | bench_function_base.Sort.time_sort('quick', 'float32', ('sorted_block', 1000)) |
| - | 78.9±0.04μs | 65.9±0.1μs | 0.84 | bench_function_base.Sort.time_sort('quick', 'int32', ('sorted_block', 10)) |
| - | 85.4±0.06μs | 71.9±0.05μs | 0.84 | bench_function_base.Sort.time_sort('quick', 'uint32', ('sorted_block', 1000)) |
| - | 85.3±0.08μs | 70.5±0.1μs | 0.83 | bench_function_base.Sort.time_sort('quick', 'int32', ('sorted_block', 1000)) |
| - | 80.5±0.03μs | 66.4±0.1μs | 0.83 | bench_function_base.Sort.time_sort('quick', 'uint32', ('random',)) |
| - | 87.5±0.05μs | 71.6±0.1μs | 0.82 | bench_function_base.Sort.time_sort('quick', 'float32', ('sorted_block', 100)) |
| - | 80.4±0.05μs | 65.4±0.07μs | 0.81 | bench_function_base.Sort.time_sort('quick', 'int32', ('random',)) |
| - | 83.6±0.05μs | 66.9±0.1μs | 0.8 | bench_function_base.Sort.time_sort('quick', 'uint32', ('sorted_block', 100)) |
| - | 83.5±0.05μs | 65.8±0.08μs | 0.79 | bench_function_base.Sort.time_sort('quick', 'int32', ('sorted_block', 100)) |
| - | 6.87±0.01μs | 5.13±0.08μs | 0.75 | bench_function_base.Sort.time_sort('quick', 'float32', ('uniform',)) |
| - | 12.2±0.02μs | 8.79±0.1μs | 0.72 | bench_function_base.Sort.time_sort('quick', 'float64', ('uniform',)) |
| - | 193±0.5μs | 124±0.5μs | 0.65 | bench_function_base.Sort.time_sort('quick', 'float64', ('reversed',)) |
| - | 27.7±0.2ms | 18.0±0.2ms | 0.65 | bench_function_base.Sort.time_sort_worst |
| - | 192±0.4μs | 123±0.2μs | 0.64 | bench_function_base.Sort.time_sort('quick', 'float64', ('ordered',)) |
| - | 202±0.2μs | 128±0.04μs | 0.63 | bench_function_base.Sort.time_sort('quick', 'float64', ('sorted_block', 1000)) |
| - | 203±0.5μs | 125±0.09μs | 0.62 | bench_function_base.Sort.time_sort('quick', 'float64', ('sorted_block', 100)) |
| - | 199±0.4μs | 122±0.07μs | 0.61 | bench_function_base.Sort.time_sort('quick', 'float64', ('random',)) |
| - | 195±0.4μs | 120±0.09μs | 0.61 | bench_function_base.Sort.time_sort('quick', 'float64', ('sorted_block', 10)) |
| - | 215±0.3μs | 121±0.3μs | 0.56 | bench_function_base.Sort.time_sort('quick', 'int64', ('ordered',)) |
| - | 216±0.3μs | 121±0.7μs | 0.56 | bench_function_base.Sort.time_sort('quick', 'int64', ('reversed',)) |
| - | 225±0.3μs | 126±0.3μs | 0.56 | bench_function_base.Sort.time_sort('quick', 'int64', ('sorted_block', 1000)) |
| - | 223±0.2μs | 119±0.09μs | 0.54 | bench_function_base.Sort.time_sort('quick', 'int64', ('random',)) |
| - | 219±0.06μs | 118±0.08μs | 0.54 | bench_function_base.Sort.time_sort('quick', 'int64', ('sorted_block', 10)) |
| - | 227±0.3μs | 123±0.2μs | 0.54 | bench_function_base.Sort.time_sort('quick', 'int64', ('sorted_block', 100)) |
```
1 parent 35b14fe commit 5ce15fd
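As a reading aid for the benchmark table: the Ratio column is After divided by Before, so values below 1 (marked `-`) are speedups and values above 1 (marked `+`) are slowdowns. The sketch below recomputes two rows from the displayed timings; the helper name `ratio` is hypothetical, and recomputing from the rounded timings can differ from the table in the last digit for some rows.

```python
def ratio(before: float, after: float) -> float:
    """After/Before, rounded to two decimals like the Ratio column."""
    return round(after / before, 2)


# time_sort('merge', 'float32', ('random',)): 551 us -> 654 us (slower)
print(ratio(551, 654))
# time_sort('quick', 'int64', ('ordered',)): 215 us -> 121 us (faster)
print(ratio(215, 121))
```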
6 files changed
Lines changed: 32 additions & 3 deletions
File tree
- meson_cpu/arm
- numpy
  - _core
    - src/common
  - distutils/checks