DOC Use Scientific Python Plausible instance for analytics #25547

lesteve · 2023-02-06T10:37:21Z

This switches analytics to https://views.scientific-python.org/, which is managed by Scientific Python. For examples, see scipy/scipy.org#435, which added it to the scipy.org website, and scipy/scipy#15401, which added it to docs.scipy.org.

I asked on the Scientific Python Discord whether it was fine to use it for scikit-learn, and Stefan Van der Walt said it was. See this Discord message for more details.

Follow-up actions:

do something like "ENH Anonymize Analytics and remove former google search" (scikit-learn.github.io#19) to switch to Plausible for the docs of all older versions.
if we think it is worth it, it looks like it is possible to import Google Analytics data, see https://plausible.io/docs/google-analytics-import

lesteve · 2023-02-06T11:14:34Z

That seems to work fine, clicking around in this PR doc I can see visits on https://views.scientific-python.org/scikit-learn.org

If you want to have a look at the dashboard:

create an account https://views.scientific-python.org
I can add you afterwards, I think I only need your email

tupui

LGTM, great decision 😃

ogrisel · 2023-02-06T16:37:32Z

Thanks for moving forward with this @lesteve. While we are at it, I think we should be more transparent and add a section about this in our README.rst as scipy did.

BTW @tupui what is the load / cost of running this service? scikit-learn is typically averaging between 4.5 and 6.5 million monthly page views (depending on the season) in 2022. It's slowly growing over years but I wouldn't expect those numbers to double before at least a few more years (if ever).

/cc @francoisgoupil as he expressed interest about this topic in the past.

tupui · 2023-02-06T16:39:00Z

BTW @tupui what is the load / cost of running this service? scikit-learn is typically averaging between 4.5 and 6.5 million monthly page views (depending on the season) at the moment.

I don't know, this is a question for @stefanv or @jarrodmillman 😃 I am also interested to know, and in general I am not sure how we are protected from things like DDoS.

ogrisel · 2023-02-06T16:49:52Z

@lesteve we should probably wait for the next scikit-learn monthly meeting before merging this PR (and add it to the agenda).

stefanv · 2023-02-06T18:07:46Z

Easiest way to know if the machine can handle the load is to try it. Calls should be async, so even if the server goes down it won't block loading of the site.
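To illustrate the non-blocking point, here is a minimal sketch of how a tracker tag could be generated for a page template. It assumes the tag uses the `defer` attribute (the `async` attribute would be similarly non-blocking) and Plausible's usual `/js/script.js` path; neither detail is confirmed in this thread, and `plausible_script_tag` is a hypothetical helper.

```python
from html import escape


def plausible_script_tag(domain: str, script_url: str) -> str:
    """Return a <script> tag that loads the tracker without blocking rendering."""
    # `defer` lets the browser download the script in parallel with parsing
    # and run it only after the document is parsed, so a slow or unreachable
    # analytics server does not hold up the page content.
    return (
        f'<script defer data-domain="{escape(domain, quote=True)}" '
        f'src="{escape(script_url, quote=True)}"></script>'
    )


# Assumed script location on the instance discussed in this PR.
tag = plausible_script_tag(
    "scikit-learn.org",
    "https://views.scientific-python.org/js/script.js",
)
print(tag)
```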

thomasjpfan · 2023-02-07T15:23:51Z

Thank you @lesteve for putting this together! I am +1 on moving over to Plausible hosted by Scientific Python. As for the current Google Analytics data, I would want to export it somewhere just in case we need to reference it. If the dataset is too big, I'm okay with aggregating it. The historical number of active visitors is a common metric used to find funding.

With that in mind, I think it is important to back up the data on the Plausible instance, or at least export an aggregation of it. There is a Plausible API that we can query once a week to aggregate some data and place it in a public GitHub repo. For the above use case, I think that would be good enough.
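A weekly export along those lines might look like the sketch below. It assumes the instance exposes Plausible's Stats API v1 `aggregate` endpoint with bearer-token authentication and a `{"results": {metric: {"value": ...}}}` response shape; the CSV layout and the helper names are hypothetical, and nothing here is taken from an existing script.

```python
import csv
import urllib.parse
import urllib.request
from pathlib import Path

BASE_URL = "https://views.scientific-python.org"  # instance named in this thread
SITE_ID = "scikit-learn.org"


def build_aggregate_request(token: str, period: str = "7d") -> urllib.request.Request:
    """Build (but do not send) a request for aggregated visitors and pageviews."""
    query = urllib.parse.urlencode(
        {"site_id": SITE_ID, "period": period, "metrics": "visitors,pageviews"}
    )
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/stats/aggregate?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def snapshot_row(day: str, payload: dict) -> list:
    """Flatten the assumed response shape into one CSV row."""
    results = payload["results"]
    return [day, results["visitors"]["value"], results["pageviews"]["value"]]


def append_snapshot(csv_path: Path, row: list) -> None:
    """Append one row, writing a header first if the file is new."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["date", "visitors", "pageviews"])
        writer.writerow(row)
```

A scheduled job would send the request with `urllib.request.urlopen`, decode the JSON body, and commit the updated CSV file to the public repository.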

As for this PR, I prefer to turn analytics off on PRs. It would add more data into the database that is not useful.
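One way to switch the tracker off for PR builds is a small check in the documentation build configuration. The sketch below is only an illustration: `DOC_BUILD_IS_PULL_REQUEST` is a hypothetical environment variable that the CI job would have to set, not one that scikit-learn's CI is known to define.

```python
import os
from typing import Mapping


def analytics_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True unless the build is flagged as a pull-request preview."""
    # DOC_BUILD_IS_PULL_REQUEST is a hypothetical flag set by the CI job.
    flag = environ.get("DOC_BUILD_IS_PULL_REQUEST", "")
    return flag.strip().lower() not in {"1", "true", "yes"}


# The template would emit the tracker tag only when this is True.
print(analytics_enabled({"DOC_BUILD_IS_PULL_REQUEST": "true"}))
```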

laurburke · 2023-02-07T16:09:17Z

I also vote for us switching over to Plausible. If Numpy and Scipy are using it, I trust that we can consider it to be a good standard.

At this point in time, web analytics data is more of a nice to have. While it's useful to see where our userbase lies and how they like to interact with us, I don't consider it something we should pay for.

stefanv · 2023-02-07T18:46:41Z

As @thomasjpfan mentions, the database is not currently backed up. We have had to reset once before due to a plausible upgrade failure. If having reliable historic tracking is important to the project, I'd welcome help from someone with devops experience.

lesteve · 2023-02-08T07:20:08Z

If having reliable historic tracking is important to the project, I'd welcome help from someone with devops experience.

cc @norbusan since he asked, in our internal mailing list, what kind of help would be useful to make the Plausible instance more reliable. To be clear @norbusan I am not saying you should volunteer some time on this 😉.

stefanv · 2023-02-08T07:32:27Z

The instance is pretty reliable; we just don't have a backup procedure in place. I use backblaze to backup https://discuss.scientific-python.org, but there's only so much unpaid quota.

Here's an example of the type of issue I've run into before: plausible uses clickhouse as an events database. After an upgrade, a table is corrupted. Running a SELECT on the table gives some hideous internal error, and Googling for that error just makes you feel lonely.

So, yes, you could probably create a DB from scratch and populate it from the old volume, using Yandex's outdated Clickhouse client Docker image and praying everything holds together. But that's the kind of effort I don't want you to expect, unless a volunteer steps up.

tupui · 2023-02-08T08:30:14Z

It should just be a simple dump, no? I am not familiar with ClickHouse, but when I set up a Postgres DB, a dump/restore strategy is one of the first things I put in place in the background. It's simple and a bit slow, but still the recommended way to back up.

jjerphan

Thank you for initiating this, @lesteve.

I think having a way to understand which webpages people browse has high value for scikit-learn. I have enough work to do for scikit-learn already, and I trust people to make the best decisions regarding canary-migrating from Google Analytics.

stefanv · 2023-02-08T14:39:31Z

It should just be a simple dump, no? I am not familiar with ClickHouse, but when I set up a Postgres DB, a dump/restore strategy is one of the first things I put in place in the background. It's simple and a bit slow, but still the recommended way to back up.

Backing up the (two) databases is not hard, it just needs to be done. I.e., set up a cron job and find a place to put the data.
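A cron-driven backup of the Postgres side could be sketched as follows. The database name, the file naming scheme, and the retention count are all hypothetical, and the ClickHouse events database would need its own dump step, which is not shown here.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def dump_name(db_name: str, now: datetime) -> str:
    """Timestamped file name, so that sorting by name is sorting by age."""
    return f"{db_name}-{now:%Y%m%dT%H%M%SZ}.dump"


def run_pg_dump(db_name: str, out_dir: Path) -> Path:
    """Write a custom-format dump of one Postgres database with pg_dump."""
    target = out_dir / dump_name(db_name, datetime.now(timezone.utc))
    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={target}", db_name],
        check=True,  # raise if pg_dump fails so cron reports the error
    )
    return target


def select_stale(names: list[str], keep: int) -> list[str]:
    """Return the dump names to delete, keeping the `keep` newest ones."""
    ordered = sorted(names)
    return ordered[:-keep] if keep > 0 else ordered
```

The cron entry would call `run_pg_dump`, upload the resulting file to the chosen storage location, and then delete whatever `select_stale` returns.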

norbusan · 2023-02-10T05:03:02Z

It should just be a simple dump, no? I am not familiar with ClickHouse, but when I set up a Postgres DB, a dump/restore strategy is one of the first things I put in place in the background. It's simple and a bit slow, but still the recommended way to back up.

Backing up the (two) databases is not hard, it just needs to be done. I.e., set up a cron job and find a place to put the data.

How big are the databases as dumps?

Considering other analytics options that require in the hundreds of EUR/USD per month, a backup to backblaze/aws/gcp should be far less problematic.

…nto plausible-analytics

lesteve · 2023-03-20T13:43:49Z

I have added this PR as a topic for the next scikit-learn developer meeting.

I have also kept Google Analytics for now, since I got some feedback that keeping both for some time may be a good idea.

ogrisel

+1 for keeping both at least for a few months.

ogrisel · 2023-04-01T16:14:46Z

I think we have a consensus on merging this, as it was discussed at the last meeting and nobody expressed objections. Let's merge this at the beginning of next week so as not to put unnecessary ops pressure on the server admins over a weekend :)

ogrisel · 2023-04-04T08:12:08Z

I connected today and so far I see very few hits on https://views.scientific-python.org/scikit-learn.org:

But at this point, the plausible tracker is only deployed on the /dev/ subtree of the website. I tried to compare with the numbers from the GA tracker on the /dev/ subtree but unfortunately I cannot get those numbers on an hourly resolution (only daily). So we need to wait for a few days.

On GA we typically get between 0.5k and 2k pageviews on the /dev/ subtree while we get between 120k and 220k on the /stable/ subtree.

lesteve · 2023-04-12T08:51:37Z

I had a quick look comparing Plausible and Google Analytics. I used the same date range 5 April - 11 April:

The numbers are not too far apart, but they are not very close either. I guess there are implementation details that differ and we should not expect an exact match, for example see https://plausible.io/vs-google-analytics#avoiding-the-adblockers.

Is this good enough to try and use Plausible on the stable website? If so, I will open a PR targeting the 1.2.X branch.

betatim · 2023-04-12T09:30:07Z

I'd vote for "close enough". Maybe with it being on stable we get more statistics and can see that the gaps get smaller (as a fraction). For example the sorting of which pages are visited most should become more stable. I think right now the order doesn't match up very well for the lower "top ten" but that is because it is small statistics.
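The two checks mentioned here, the gap as a fraction and the stability of the page ranking, can be sketched as below. The page names and counts are made up purely for illustration and are not the numbers from either dashboard.

```python
def relative_gap(plausible: int, ga: int) -> float:
    """Absolute difference between the two counts as a fraction of the GA count."""
    return abs(plausible - ga) / ga


def top_n_overlap(pages_a: list[str], pages_b: list[str], n: int) -> float:
    """Fraction of the top-n pages that appear in both rankings."""
    return len(set(pages_a[:n]) & set(pages_b[:n])) / n


# Hypothetical rankings, most visited page first.
plausible_top = ["/dev/index.html", "/dev/install.html", "/dev/api.html"]
ga_top = ["/dev/index.html", "/dev/api.html", "/dev/faq.html"]

print(relative_gap(plausible=900, ga=1000))
print(top_n_overlap(plausible_top, ga_top, n=3))
```

With more traffic on the stable site, one would expect the relative gap to shrink and the overlap to move toward 1.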

…arn#25547)

lesteve · 2023-04-12T12:42:59Z

Sounds good, I have opened #26160 to backport this PR in the 1.2.X branch.

Fan <[email protected]> Co-authored-by: Andreas Mueller <[email protected]> Co-authored-by: Jovan Stojanovic <[email protected]> Co-authored-by: Rahil Parikh <[email protected]> Co-authored-by: Bharat Raghunathan <[email protected]> Co-authored-by: Sortofamudkip <[email protected]> Co-authored-by: Gleb Levitski <[email protected]> Co-authored-by: Christian Lorentzen <[email protected]> Co-authored-by: Ashwin Mathur <[email protected]> Co-authored-by: Sahil Gupta <[email protected]> Co-authored-by: Veghit <[email protected]> Co-authored-by: Itay <[email protected]> Co-authored-by: precondition <[email protected]> Co-authored-by: Marc Torrellas Socastro <[email protected]> Co-authored-by: Dominic Fox <[email protected]> Co-authored-by: futurewarning <[email protected]> Co-authored-by: Yao Xiao <[email protected]> Co-authored-by: Joey Ortiz <[email protected]> Co-authored-by: Tim Head <[email protected]> Co-authored-by: Christian Veenhuis <[email protected]> Co-authored-by: adienes <[email protected]> Co-authored-by: Dave Berenbaum <[email protected]> Co-authored-by: Lene Preuss <[email protected]> Co-authored-by: A.H.Mansouri <[email protected]> Co-authored-by: Boris Feld <[email protected]> Co-authored-by: Carla J <[email protected]> Co-authored-by: windiana42 <[email protected]> Co-authored-by: mdarii <[email protected]> Co-authored-by: murezzda <[email protected]> Co-authored-by: Peter Piontek <[email protected]> Co-authored-by: John Pangas <[email protected]> Co-authored-by: Dmitry Nesterov <[email protected]> Co-authored-by: Yuchen Zhou <[email protected]> Co-authored-by: Ekaterina Butyugina <[email protected]> Co-authored-by: Jiawei Zhang <[email protected]> Co-authored-by: Ansam Zedan <[email protected]> Co-authored-by: genvalen <[email protected]> Co-authored-by: farhan khan <[email protected]> Co-authored-by: Arturo Amor <[email protected]> Co-authored-by: Jiawei Zhang <[email protected]> Co-authored-by: Ralf Gommers <[email protected]> Co-authored-by: 
Jessicakk0711 <[email protected]> Co-authored-by: Ankur Singh <[email protected]> Co-authored-by: Seoeun(Sun☀️) Hong <[email protected]> Co-authored-by: Nightwalkx <[email protected]> Co-authored-by: VIGNESH D <[email protected]> Co-authored-by: Vincent-violet <[email protected]> Co-authored-by: Elabonga Atuo <[email protected]> Co-authored-by: Tom Dupré la Tour <[email protected]> Co-authored-by: André Pedersen <[email protected]> Co-authored-by: Ashish Dutt <[email protected]> Co-authored-by: Phil <[email protected]> Co-authored-by: Stanislav (Stanley) Modrak <[email protected]> Co-authored-by: hujiahong726 <[email protected]> Co-authored-by: James Dean <[email protected]> Co-authored-by: ArturoAmorQ <[email protected]> Co-authored-by: Aleksandr Kokhaniukov <[email protected]> Co-authored-by: c-git <[email protected]> Co-authored-by: annegnx <[email protected]> Co-authored-by: Geoffrey <[email protected]> Co-authored-by: gbolmier <[email protected]>

DOC Use Scientific Python Plausible instance for analytics

a6cdfea

github-actions bot added the Documentation label Feb 6, 2023

tupui approved these changes Feb 6, 2023

jjerphan approved these changes Feb 8, 2023

lesteve added 2 commits March 18, 2023 06:13

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

0ed0bbe

…nto plausible-analytics

Reinstate GA to have a period where we have both

7c326bf

ogrisel approved these changes Mar 27, 2023

betatim merged commit 9202cea into scikit-learn:main Apr 3, 2023

lesteve deleted the plausible-analytics branch April 3, 2023 09:10

lesteve added a commit to lesteve/scikit-learn that referenced this pull request Apr 12, 2023

DOC Use Scientific Python Plausible instance for analytics (scikit-le…

23d477f

…arn#25547)

lesteve mentioned this pull request Apr 12, 2023

DOC Use Scientific Python Plausible instance for stable doc analytics #26160

Merged

Veghit pushed a commit to Veghit/scikit-learn that referenced this pull request Apr 15, 2023

DOC Use Scientific Python Plausible instance for analytics (scikit-le…

03b4f9b

…arn#25547)

Conversation

lesteve commented Feb 6, 2023 • edited

lesteve commented Feb 6, 2023

tupui left a comment

ogrisel commented Feb 6, 2023 • edited

tupui commented Feb 6, 2023 • edited

ogrisel commented Feb 6, 2023

stefanv commented Feb 6, 2023

thomasjpfan commented Feb 7, 2023

laurburke commented Feb 7, 2023

stefanv commented Feb 7, 2023

lesteve commented Feb 8, 2023

stefanv commented Feb 8, 2023

tupui commented Feb 8, 2023

jjerphan left a comment

stefanv commented Feb 8, 2023

norbusan commented Feb 10, 2023

lesteve commented Mar 20, 2023

ogrisel left a comment

ogrisel commented Apr 1, 2023

ogrisel commented Apr 4, 2023 • edited

lesteve commented Apr 12, 2023

betatim commented Apr 12, 2023

lesteve commented Apr 12, 2023