PERF Implement `PairwiseDistancesReduction` backend for `KNeighbors.predict_proba` #24076

Micky774 · 2022-08-01T23:29:23Z

Reference Issues/PRs

Fixes #13783
Resolves #14543 (stalled)
Relates to #23721
Relates to #22587

What does this implement/fix? Explain your changes.

Implements a PairwiseDistancesReduction backend algorithm for KNeighbors.predict.
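For context, with weights='uniform' the quantity being computed boils down to an argkmin over the fitted points followed by a normalized histogram of the neighbors' labels. The pure-Python sketch below spells that out. The function name and the Euclidean metric are assumptions for illustration only. As I understand the thread, the Cython backend fuses both steps into one chunked, parallel reduction loop instead of materializing anything like this.

```python
import math
from collections import Counter


def knn_predict_proba_uniform(X, Y, Y_labels, n_neighbors, n_classes):
    """Slow reference k-NN class probabilities with uniform weights.

    X: query points, Y: fitted points, Y_labels: integer class of each
    point of Y (in 0 .. n_classes - 1). Hypothetical helper, not sklearn API.
    """
    probabilities = []
    for x in X:
        # "argkmin": indices of the n_neighbors closest points of Y
        # (sorted() is stable, so distance ties keep the lower index first).
        dists = [math.dist(x, y) for y in Y]
        neighbors = sorted(range(len(Y)), key=lambda j: dists[j])[:n_neighbors]
        # Reduction: histogram of the neighbors' labels, normalized to sum to 1.
        counts = Counter(Y_labels[j] for j in neighbors)
        probabilities.append([counts[c] / n_neighbors for c in range(n_classes)])
    return probabilities
```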

Any other comments?

Future PRs:

Support "distance" weighting
Support multioutput (y.ndim > 1)
Enable Euclidean specialization
Restudy heuristic
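Regarding the "distance" weighting item: scikit-learn's weights='distance' weights each neighbor by the inverse of its distance, so a single weighted-histogram routine could plausibly serve both weightings. The helper below is a hypothetical sketch of that idea for one query row, not the PR's Cython code; to stay short it assumes all distances are strictly positive (the real estimator handles zero distances specially).

```python
def weighted_class_histogram(neighbor_labels, neighbor_distances, n_classes,
                             weights="uniform"):
    """Accumulate one query's neighbor votes into normalized class scores.

    weights="uniform": every neighbor votes 1.
    weights="distance": each neighbor votes 1 / distance
    (assumes every distance is > 0; hypothetical helper).
    """
    histogram = [0.0] * n_classes
    for label, dist in zip(neighbor_labels, neighbor_distances):
        vote = 1.0 if weights == "uniform" else 1.0 / dist
        histogram[label] += vote
    total = sum(histogram)
    return [h / total for h in histogram]
```

With labels [0, 1, 1] at distances [1.0, 2.0, 2.0], uniform weighting favours class 1 two to one, while inverse-distance weighting makes the two classes tie.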

cc: @jjerphan

jjerphan

Thank you for starting this work, @Micky774.

Here is a first pass, with comments on both the current implementation and a potential path we could take for predict and predict_proba.

Regarding the TODO-list you wrote:

FEA Fused sparse-dense support for PairwiseDistancesReduction #23585 should treat sparse support for current and subsequent PairwiseDistancesReductions on its own.
float32 support should be easy to port to these implementations in another PR, following FEA Add support for float32 on PairwiseDistancesReduction using Tempita #23865
"precomputed" should be addressable generally for all PairwiseDistancesReductions independently of this PR.

Micky774 · 2022-08-05T01:23:14Z

I tested whether it would be worth optionally returning the labels from compute, toggled by a keyword parameter return_labels, since they're computed essentially for free in the same reduction loop. It seems to provide no real benefit, so I simplified compute to return only the probabilities. The results are below:

Plot
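The "essentially for free" remark can be made concrete: with uniform weights, a predicted label is just the argmax of the corresponding probability row, so once the per-query histogram exists, the label costs one extra scan over n_classes values. A minimal sketch with a hypothetical helper name (not part of the PR or of scikit-learn):

```python
def labels_from_proba(probabilities, classes):
    """Derive predict()-style labels from predict_proba()-style rows.

    The label is the class with the largest probability; ties go to the
    first such class (Python's max() returns the first maximal item).
    """
    labels = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best])
    return labels
```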

Micky774 · 2022-08-05T03:01:01Z

It seems that the new implementation performs at least as well as the current one (for weights='uniform') except when parallelizing over Y. See plots:

Plot

Will need to investigate a bit.

jjerphan

Another pass on the new changes. I had not thought of factorising the common logic as done in weighted_histogram_mode. 👍

Regarding your comment:

It seems that the new implementation performs at least as well as the current one (for weights='uniform') except when parallelizing over Y.

I think this is not due to the logic of this PR, but rather to the current global heuristic for choosing between _parallel_on_X and _parallel_on_Y, which is far from optimal (it does not take n_samples_Y into account!):

scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_base.pyx, lines 97 to 103 in 6894a9b:

        if strategy == 'auto':
            # This is a simple heuristic whose constant for the
            # comparison has been chosen based on experiments.
            if 4 * self.chunk_size * self.effective_n_threads < self.n_samples_X:
                strategy = 'parallel_on_X'
            else:
                strategy = 'parallel_on_Y'

#24043 aims at improving this heuristic, and we should probably address it before all the other PRs of this submodule.
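To make the limitation explicit, here is a plain-Python transcription of the quoted 'auto' heuristic as a standalone function (the function and its signature are hypothetical; only the comparison comes from _base.pyx). n_samples_Y is accepted but, exactly as in the quoted code, never consulted.

```python
def choose_strategy(n_samples_X, n_samples_Y, chunk_size, effective_n_threads,
                    strategy="auto"):
    """Transcription of the quoted 'auto' heuristic (hypothetical wrapper).

    n_samples_Y is deliberately unused, mirroring the quoted code.
    """
    if strategy == "auto":
        # Parallelize over X only when X holds more than four chunks of
        # rows per thread; otherwise fall back to parallelizing over Y.
        if 4 * chunk_size * effective_n_threads < n_samples_X:
            strategy = "parallel_on_X"
        else:
            strategy = "parallel_on_Y"
    return strategy
```

For example, with chunk_size=256 and 8 threads the threshold is 8192 rows of X: 10,000 query rows pick parallel_on_X, while 1,000 query rows pick parallel_on_Y even against ten million fitted rows.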

Personally, I think I need to invest some time in coming up with a proper benchmarking suite for PairwiseDistancesReductions: I don't want us to waste our time running quick benchmarks whose results might be misleading (this is what I have been doing until now, thinking I could save time, but in retrospect I think I've lost some).

Also, in the longer term, I would like us, our future selves, and future maintainers to be able to easily and confidently benchmark revisions against each other when changes need to be made.

Hence I've opened #24120.

Also: I've changed the description of this PR to move the follow-up work it mentioned to #22587, to avoid duplicating information. I am in favour of treating multi-output in another PR (keeping PRs as small as possible eases review and thus speeds up the overall integration of their features).

What do you think?

Co-authored-by: Julien <[email protected]>

Micky774 · 2023-02-24T15:04:30Z

@jjerphan Updated with your suggestions! Good catch on the extra sort -- it is a vestige from the original copy-and-paste :)

Edit: Still need to revisit validation and re-run benchmarks -- will get to it soon hopefully (sorry for slow turnaround)

Micky774 · 2023-02-26T03:20:23Z

It turns out that in most circumstances, parallel_on_X is more performant than parallel_on_Y. When adopting it, the performance gains are much more dramatic. I re-ran the benchmarks with the configuration used by @jjerphan:

n_neighbors=500
n_features=30
n_classes=100

see gist

Plots

Running again with the configuration:

n_neighbors=5
n_features=100
n_classes=10

Plots
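For readers following the parallel_on_X vs parallel_on_Y discussion, here is a sequential sketch (no real threads) of the two ways an argkmin can be split across workers. It is a conceptual illustration under my own assumptions (Euclidean distance, heapq-based k-min, hypothetical function names), not how the Cython code schedules OpenMP work. The structural difference: splitting X needs no merge, while splitting Y keeps a partial k-min per worker for every query row and merges them at the end. That extra bookkeeping is one plausible contributor to the gap, not a measured cause.

```python
import heapq
import math


def _kmin_from_chunk(x, Y, start, stop, k):
    """k smallest (distance, index) pairs between x and Y[start:stop]."""
    return heapq.nsmallest(k, ((math.dist(x, Y[j]), j) for j in range(start, stop)))


def argkmin_parallel_on_X(X, Y, k, n_workers):
    # Each worker owns a disjoint block of query rows and scans all of Y,
    # so no merge step is needed.
    results = [None] * len(X)
    step = math.ceil(len(X) / n_workers)
    for w in range(n_workers):
        for i in range(w * step, min((w + 1) * step, len(X))):
            results[i] = _kmin_from_chunk(X[i], Y, 0, len(Y), k)
    return results


def argkmin_parallel_on_Y(X, Y, k, n_workers):
    # Each worker scans a disjoint block of Y for every query row and keeps
    # its own partial k-min; the partial results are merged at the end.
    step = math.ceil(len(Y) / n_workers)
    partials = [
        [_kmin_from_chunk(x, Y, w * step, min((w + 1) * step, len(Y)), k) for x in X]
        for w in range(n_workers)
    ]
    return [
        heapq.nsmallest(k, (pair for w in range(n_workers) for pair in partials[w][i]))
        for i in range(len(X))
    ]
```

Both functions return identical results because the global k smallest pairs are always contained in the union of the per-block k smallest pairs.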

jjerphan

Thank you for the last commits and benchmarks, @Micky774!

I have just one last comment.

Moreover, I think we might want to list the remaining work for this implementation and address it in other small PRs. What do you think?

jjerphan · 2023-03-06T09:57:36Z

A gentle ping, @ogrisel and @thomasjpfan: do you think this PR is mergeable in its current state?

ogrisel

Thanks for the benchmark. I wonder whether the strategy="auto" heuristic should be re-evaluated for other reductions (e.g. the generic argkmin), or whether this effect is specific to this PR.

Anyway, once Julien's last batch of comments and the following ones are addressed, LGTM.

Co-authored-by: Olivier Grisel <[email protected]>

Co-authored-by: Julien Jerphanion <[email protected]>

Signed-off-by: Julien Jerphanion <[email protected]>

jjerphan · 2023-03-13T08:56:13Z

We now have to qualify cdef interfaces with noexcept nogil (courtesy of @adam2392 with #25621). This anticipates the Cython 3 release, which changes the default behavior regarding exceptions: they will be propagated by default (see cython/cython#4670).

d13793d fixes the compilation and the problems on the CI.

This PR still LGTM. I will merge this PR by the end of the day if no one objects.

jjerphan · 2023-03-13T17:25:04Z

Do you have objections or suggestions, @thomasjpfan and @ogrisel? 🙂

ogrisel · 2023-03-14T11:04:46Z

Merged!

jjerphan · 2023-03-14T11:13:04Z

Thank you for this contribution, @Micky774!

Thank you @ogrisel and @thomasjpfan for the reviews.

Jessicakk0711 <[email protected]> Co-authored-by: Ankur Singh <[email protected]> Co-authored-by: Seoeun(Sun☀️) Hong <[email protected]> Co-authored-by: Nightwalkx <[email protected]> Co-authored-by: VIGNESH D <[email protected]> Co-authored-by: Vincent-violet <[email protected]> Co-authored-by: Elabonga Atuo <[email protected]> Co-authored-by: Tom Dupré la Tour <[email protected]> Co-authored-by: André Pedersen <[email protected]> Co-authored-by: Ashish Dutt <[email protected]> Co-authored-by: Phil <[email protected]> Co-authored-by: Stanislav (Stanley) Modrak <[email protected]> Co-authored-by: hujiahong726 <[email protected]> Co-authored-by: James Dean <[email protected]> Co-authored-by: ArturoAmorQ <[email protected]> Co-authored-by: Aleksandr Kokhaniukov <[email protected]> Co-authored-by: c-git <[email protected]> Co-authored-by: annegnx <[email protected]> Co-authored-by: Geoffrey <[email protected]> Co-authored-by: gbolmier <[email protected]>

Micky774 added 5 commits July 29, 2022 19:40

Partial implementation, still broken

8a99217

Major update, mostly working

c071338

Merge branch 'main' into pwd_kncp

cbe5347

Completed X-parallel implementation

a995220

Improved documentation

052e59b

Micky774 added the Performance label Aug 1, 2022

github-actions bot added module:metrics module:neighbors cython labels Aug 1, 2022

Micky774 added Enhancement and removed module:metrics module:neighbors Enhancement labels Aug 1, 2022

Micky774 changed the title ~~ENH: Implement PairwiseDistancesReduction backend for KNeighbors.predict~~ PERF: Implement PairwiseDistancesReduction backend for KNeighbors.predict Aug 1, 2022

Micky774 added module:metrics module:neighbors labels Aug 1, 2022

jjerphan reviewed Aug 3, 2022

jjerphan changed the title ~~PERF: Implement PairwiseDistancesReduction backend for KNeighbors.predict~~ PERF Implement PairwiseDistancesReduction backend for KNeighbors.predict Aug 3, 2022

Micky774 added 3 commits August 4, 2022 13:10

First batch of review feedback

fc8d495

Created inline helper function for weighted mode

6334581

Merge branch 'main' into pwd_kncp

6f45f79

Multioutput support with probabilities output refactor

db31c78
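The commits above mention an inline helper for the weighted mode and a refactor toward a probabilities output. The following is a rough pure-NumPy illustration of what such a reduction computes. It is not the Cython `ArgKminClassMode` code. It finds each query's k nearest fitted rows, adds each neighbour's weight (uniform or inverse distance) to that neighbour's class bin, and normalises each row. `knn_class_proba` is a hypothetical name. Euclidean distance and integer-encoded labels are assumptions.

```python
import numpy as np


def knn_class_proba(X, Y, Y_labels, n_classes, k, weights="uniform"):
    """Brute-force k-NN class probabilities via a weighted mode (sketch).

    Hypothetical helper: Y_labels is assumed integer-encoded in
    [0, n_classes) and the metric is assumed to be Euclidean.
    """
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    Y_labels = np.asarray(Y_labels, dtype=np.intp)

    # Squared Euclidean distances, shape (n_queries, n_samples_Y).
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

    # Indices of the k nearest neighbours of each query (unordered within the k).
    neigh_idx = np.argpartition(sq_dists, kth=k - 1, axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]

    if weights == "uniform":
        neigh_w = np.ones(neigh_idx.shape)
    elif weights == "distance":
        d = np.sqrt(sq_dists[rows, neigh_idx])
        is_zero = d == 0
        # Inverse-distance weights; if a query coincides with a fitted point,
        # only the zero-distance neighbours vote (each with weight 1).
        neigh_w = np.where(
            is_zero.any(axis=1, keepdims=True),
            is_zero.astype(np.float64),
            1.0 / np.where(is_zero, 1.0, d),
        )
    else:
        raise ValueError(f"Unsupported weights: {weights!r}")

    # Weighted mode: add each neighbour's weight to its class bin.
    neigh_labels = Y_labels[neigh_idx]
    proba = np.zeros((X.shape[0], n_classes))
    for c in range(n_classes):
        proba[:, c] = (neigh_w * (neigh_labels == c)).sum(axis=1)

    # Normalise so that each row sums to 1.
    return proba / proba.sum(axis=1, keepdims=True)
```

The sketch builds the full `(n_queries, n_samples_Y)` distance matrix, which a chunked reduction avoids materialising. It trades memory for readability.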

jjerphan reviewed Aug 5, 2022

jjerphan mentioned this pull request Aug 5, 2022

PERF PairwiseDistancesReductions initial work #22587

Closed

Micky774 added 4 commits August 5, 2022 11:49

Merge branch 'main' into pwd_kncp

b96cab4

Code simplification and cleanup

59f60ed

Code simplification

a343b58

Merge branch 'main' into pwd_kncp

e419f96

Update doc/whats_new/v1.3.rst

11b3106

Co-authored-by: Julien <[email protected]>

Fixed inconsistency between strategies

3bc9e2b

Altered strategy in response to benchmarks

ca438ee

jjerphan approved these changes Feb 27, 2023

doc/whats_new/v1.3.rst

Micky774 added 2 commits February 27, 2023 17:48

Fixed when validation occurs, avoiding accidental double validation

8f4b371

Merge branch 'main' into pwd_kncp

a1370fd

jjerphan reviewed Mar 6, 2023

sklearn/neighbors/_classification.py

jjerphan reviewed Mar 6, 2023

setup.py (outdated)

ogrisel approved these changes Mar 7, 2023

Micky774 and others added 7 commits March 10, 2023 10:06

Apply suggestions from code review

6551d86

Co-authored-by: Olivier Grisel <[email protected]>

Updated with review feedback

f52468d

Update sklearn/neighbors/_classification.py

e403681

Co-authored-by: Julien Jerphanion <[email protected]>

Updated .gitignore

25d142f

Merge branch 'main' into pwd_kncp

1c0d1e5

Updated to reconcile new name

6f4433d

Qualify cdef methods with "noexcept nogil"

d13793d

Signed-off-by: Julien Jerphanion <[email protected]>

ogrisel merged commit b6b6f63 into scikit-learn:main Mar 14, 2023

jjerphan mentioned this pull request Mar 17, 2023

PERF PairwiseDistancesReductions subsequent work #25888

Open

21 tasks

Micky774 mentioned this pull request May 10, 2023

FIX Update pairwise distance function argument names #26351

Merged

This was referenced Aug 5, 2023

MAINT Make ArgKminClassMode accept sparse datasets #27018

Merged

PERF Implement PairwiseDistancesReduction backend for RadiusNeighbors.predict_proba #26828

Merged

    if strategy == 'auto':
        # This is a simple heuristic whose constant for the
        # comparison has been chosen based on experiments.
        if 4 * self.chunk_size * self.effective_n_threads < self.n_samples_X:
            strategy = 'parallel_on_X'
        else:
            strategy = 'parallel_on_Y'
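The quoted heuristic can be read as follows. Parallelise over X only when X provides more than four chunks per thread. Otherwise there is presumably too little work on X to keep every thread busy, so parallelise over Y instead. Below is a minimal standalone sketch. It assumes the three attributes are passed in as plain integers, and `select_strategy` is a hypothetical name.

```python
def select_strategy(strategy, chunk_size, effective_n_threads, n_samples_X):
    """Resolve the parallelisation strategy (hypothetical standalone helper)."""
    if strategy == "auto":
        # Equivalent to: n_samples_X / (chunk_size * effective_n_threads) > 4,
        # i.e. X provides more than four chunks per thread.
        if 4 * chunk_size * effective_n_threads < n_samples_X:
            return "parallel_on_X"
        return "parallel_on_Y"
    if strategy not in ("parallel_on_X", "parallel_on_Y"):
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return strategy
```

Take an example chunk size of 256 (an assumed value) and 4 threads. The threshold is 4 * 256 * 4 = 4096 rows, so `select_strategy("auto", 256, 4, 10_000)` returns `"parallel_on_X"`. `select_strategy("auto", 256, 4, 1_000)` returns `"parallel_on_Y"`.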

Micky774 commented Aug 1, 2022 • edited by OmarManzoor

jjerphan left a comment • edited

Micky774 commented Aug 5, 2022

Micky774 commented Aug 5, 2022

jjerphan left a comment • edited

Micky774 commented Feb 24, 2023 • edited

Micky774 commented Feb 26, 2023

jjerphan left a comment

jjerphan commented Mar 6, 2023

ogrisel left a comment

jjerphan commented Mar 13, 2023

jjerphan commented Mar 13, 2023 • edited

ogrisel commented Mar 14, 2023

jjerphan commented Mar 14, 2023
