Add `assert_docstring_consistency` checks

The [`assert_docstring_consistency`](https://github.com/scikit-learn/scikit-learn/blob/4ec5f69061a9c37e0f6b9920e296e06c6b4669ac/sklearn/utils/_testing.py#L734) function allows you to check the consistency between docstring parameters/attributes/returns of objects.

In scikit-learn, classes often share a parent (e.g., `AdaBoostClassifier`, `AdaBoostRegressor`) and functions are often related (e.g., `f1_score`, `fbeta_score`). In these cases some parameters are shared, and we would like to check that their docstring type and description match.

The [`assert_docstring_consistency`](https://github.com/scikit-learn/scikit-learn/blob/4ec5f69061a9c37e0f6b9920e296e06c6b4669ac/sklearn/utils/_testing.py#L734) function also lets you include/exclude specific parameters/attributes/returns. In some cases only part of the description should match between objects; you can then use `descr_regex_pattern` to pass a regular expression that is matched against all descriptions. Please read the docstring of this function carefully.

Guide on how to contribute to this issue:

1. Pick an item below and comment with the item you are working on so others know it has been taken.
    * Not all items listed require a test to be added. If you find that the item you selected does not require a test, that is still a valuable contribution: please comment with the reason and we can tick it off the list.
2. Determine common parameters/attributes/returns between the objects.
    * If the description does not match but should, decide on the best wording and amend all objects to match. If only part of the description should match, consider using `descr_regex_pattern`.
3. Write a new test.
    * The test should live in `sklearn/tests/test_docstring_parameters_consistency.py` (cf. https://github.com/scikit-learn/scikit-learn/pull/30853)
    * Add `@skip_if_no_numpydoc` to the top of the test (these tests can only be run if numpydoc is installed)

See #29831 for an example. This PR adds a test for the stacking estimators `StackingClassifier` and `StackingRegressor`.

Classes that share a common parent:

- [ ] `BaseWeightBoosting`: ['AdaBoostClassifier', 'AdaBoostRegressor']
- [ ] `BaseBagging`: ['BaggingClassifier', 'BaggingRegressor', 'IsolationForest']
- [ ] `BaseMixture`: ['BayesianGaussianMixture', 'GaussianMixture']
- [ ] `_BaseDiscreteNB`: ['BernoulliNB', 'CategoricalNB', 'ComplementNB', 'MultinomialNB']
- [ ] `_BaseKMeans`: ['BisectingKMeans', 'KMeans', 'MiniBatchKMeans']
- [ ] `_PLS`: ['CCA', 'PLSCanonical', 'PLSRegression']
- [ ] `_BaseChain`: ['ClassifierChain', 'RegressorChain']
- [ ] `_VectorizerMixin`: ['CountVectorizer', 'HashingVectorizer', 'TfidfVectorizer']
- [ ] `BaseDecisionTree`: ['DecisionTreeClassifier', 'DecisionTreeRegressor', 'ExtraTreeClassifier', 'ExtraTreeRegressor']
- [ ] `_BaseSparseCoding`: ['DictionaryLearning', 'MiniBatchDictionaryLearning', 'SparseCoder']
- [ ] `LinearModelCV`: ['ElasticNetCV', 'LassoCV', 'MultiTaskElasticNetCV', 'MultiTaskLassoCV']
- [ ] `OutlierMixin`: ['EllipticEnvelope', 'IsolationForest', 'LocalOutlierFactor', 'OneClassSVM', 'SGDOneClassSVM']
- [ ] `EmpiricalCovariance`: ['EllipticEnvelope', 'GraphicalLasso', 'GraphicalLassoCV', 'LedoitWolf', 'MinCovDet', 'OAS', 'ShrunkCovariance']
- [ ] `ForestClassifier`: ['ExtraTreesClassifier', 'RandomForestClassifier']
- [ ] `BaseForest`: ['ExtraTreesClassifier', 'ExtraTreesRegressor', 'RandomForestClassifier', 'RandomForestRegressor', 'RandomTreesEmbedding']
- [ ] `ForestRegressor`: ['ExtraTreesRegressor', 'RandomForestRegressor']
- [ ] `BaseThresholdClassifier`: ['FixedThresholdClassifier', 'TunedThresholdClassifierCV']
- [ ] `_GeneralizedLinearRegressor`: ['GammaRegressor', 'PoissonRegressor', 'TweedieRegressor']
- [ ] `BaseRandomProjection`: ['GaussianRandomProjection', 'SparseRandomProjection']
- [ ] `_BaseFilter`: ['GenericUnivariateSelect', 'SelectFdr', 'SelectFpr', 'SelectFwe', 'SelectKBest', 'SelectPercentile']
- [ ] `BaseGradientBoosting`: ['GradientBoostingClassifier', 'GradientBoostingRegressor']
- [ ] `BaseGraphicalLasso`: ['GraphicalLasso', 'GraphicalLassoCV']
- [ ] `BaseSearchCV`: ['GridSearchCV', 'RandomizedSearchCV']
- [ ] `BaseHistGradientBoosting`: ['HistGradientBoostingClassifier', 'HistGradientBoostingRegressor']
- [ ] `_BasePCA`: ['IncrementalPCA', 'PCA']
- [ ] `_BaseImputer`: ['KNNImputer', 'SimpleImputer']
- [ ] `KNeighborsMixin`: ['KNeighborsClassifier', 'KNeighborsRegressor', 'KNeighborsTransformer', 'LocalOutlierFactor', 'NearestNeighbors']
- [ ] `NeighborsBase`: ['KNeighborsClassifier', 'KNeighborsRegressor', 'KNeighborsTransformer', 'LocalOutlierFactor', 'NearestNeighbors', 'RadiusNeighborsClassifier', 'RadiusNeighborsRegressor', 'RadiusNeighborsTransformer']
- [ ] `BaseLabelPropagation`: ['LabelPropagation', 'LabelSpreading']
- [ ] `Lars`: ['LarsCV', 'LassoLars', 'LassoLarsCV', 'LassoLarsIC']
- [ ] `ElasticNet`: ['Lasso', 'MultiTaskElasticNet', 'MultiTaskLasso']
- [ ] `BaseMultilayerPerceptron`: ['MLPClassifier', 'MLPRegressor']
- [ ] `_BaseNMF`: ['MiniBatchNMF', 'NMF']
- [ ] `_BaseSparsePCA`: ['MiniBatchSparsePCA', 'SparsePCA']
- [ ] `_MultiOutputEstimator`: ['MultiOutputClassifier', 'MultiOutputRegressor']
- [ ] `Lasso`: ['MultiTaskElasticNet', 'MultiTaskLasso']
- [ ] `RadiusNeighborsMixin`: ['NearestNeighbors', 'RadiusNeighborsClassifier', 'RadiusNeighborsRegressor', 'RadiusNeighborsTransformer']
- [ ] `BaseSVC`: ['NuSVC', 'SVC']
- [ ] `BaseLibSVM`: ['NuSVC', 'NuSVR', 'OneClassSVM', 'SVC', 'SVR']
- [ ] `_BaseEncoder`: ['OneHotEncoder', 'OrdinalEncoder', 'TargetEncoder']
- [ ] `BaseSGDClassifier`: ['PassiveAggressiveClassifier', 'Perceptron', 'SGDClassifier']
- [ ] `BaseSGD`: ['PassiveAggressiveClassifier', 'PassiveAggressiveRegressor', 'Perceptron', 'SGDClassifier', 'SGDOneClassSVM', 'SGDRegressor']
- [ ] `BaseSGDRegressor`: ['PassiveAggressiveRegressor', 'SGDRegressor']
- [ ] `_BaseRidge`: ['Ridge', 'RidgeClassifier']
- [ ] `_BaseRidgeCV`: ['RidgeCV', 'RidgeClassifierCV']
- [ ] `_RidgeClassifierMixin`: ['RidgeClassifier', 'RidgeClassifierCV']
- [ ] `BaseSpectral`: ['SpectralBiclustering', 'SpectralCoclustering']
- [ ] `BiclusterMixin`: ['SpectralBiclustering', 'SpectralCoclustering']
- [ ] `_BaseStacking`: ['StackingClassifier', 'StackingRegressor']
- [ ] `_BaseHeterogeneousEnsemble`: ['StackingClassifier', 'StackingRegressor', 'VotingClassifier', 'VotingRegressor']
- [ ] `_BaseVoting`: ['VotingClassifier', 'VotingRegressor']

Functions from the same module that share parameters:

<details open>
<summary>Details</summary>

I did a lot of manual culling as many functions shared only 1 or 2 parameters and were not actually relevant.

The functions are grouped by the parameters they share, so the list of shared parameters is not exhaustive for any subset of functions within a group. The grouping of functions below is not *necessarily* ideal for the consistency check.

</details>
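
To revisit a grouping, candidate shared parameters can be listed from function signatures. The sketch below uses only the standard library and three hypothetical stand-in functions (`resample_demo`, `shuffle_demo`, `check_demo`); it is not necessarily how the lists in this issue were produced. A shared name is only a candidate, so whether the descriptions should really match still needs a manual look.

```python
import inspect
from collections import defaultdict


def shared_parameters(funcs, min_shared=2):
    """Map each parameter name to the sorted names of the functions accepting it.

    Only parameters accepted by at least `min_shared` functions are kept.
    """
    owners = defaultdict(list)
    for func in funcs:
        for name in inspect.signature(func).parameters:
            owners[name].append(func.__name__)
    return {
        name: sorted(names)
        for name, names in owners.items()
        if len(names) >= min_shared
    }


# Hypothetical stand-ins for functions that live in the same module.
def resample_demo(X, replace=True, random_state=None):
    """Toy function, for illustration only."""


def shuffle_demo(X, random_state=None):
    """Toy function, for illustration only."""


def check_demo(X, copy=True):
    """Toy function, for illustration only."""


groups = shared_parameters([resample_demo, shuffle_demo, check_demo])
for name, functions in groups.items():
    print(f"Functions: {', '.join(functions)} / Shared parameters: {name}")
```

Raising `min_shared` narrows the output to parameters that appear in more functions, which helps filter out names shared by only one incidental pair.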


**Module: sklearn.utils**

- [ ] Functions: compute_class_weight, compute_sample_weight / Shared parameters: class_weight
- [ ] Functions: resample, shuffle / Shared parameters: random_state

**Module: sklearn.utils.class_weight**

- [ ] Functions: compute_class_weight, compute_sample_weight / Shared parameters: class_weight, y

**Module: sklearn.utils.extmath**

- [ ] Functions: randomized_range_finder, randomized_svd / Shared parameters: n_iter, power_iteration_normalizer, random_state

**Module: sklearn.utils.validation**

- [ ] Functions: as_float_array, check_X_y, check_array / Shared parameters: copy, force_all_finite, ensure_all_finite
- [ ] Functions: check_X_y, check_array / Shared parameters: accept_sparse, accept_large_sparse, order, force_writeable, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features

**Module: sklearn.metrics**

- [ ] Functions: adjusted_mutual_info_score, adjusted_rand_score, completeness_score, fowlkes_mallows_score, homogeneity_completeness_v_measure, homogeneity_score, mutual_info_score, normalized_mutual_info_score, pair_confusion_matrix, rand_score, v_measure_score / Shared parameters: labels_true, labels_pred

**Module: sklearn.metrics.pairwise**

- [ ] Functions: pairwise_distances_argmin, pairwise_distances_argmin_min / Shared parameters: axis, metric_kwargs
- [ ] Functions: pairwise_distances, pairwise_distances_chunked, pairwise_kernels / Shared parameters: n_jobs
- [ ] Functions: check_pairwise_arrays, pairwise_distances / Shared parameters: force_all_finite, ensure_all_finite
- [ ] Functions: chi2_kernel, laplacian_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel / Shared parameters: gamma

**Module: sklearn.cluster**

- [ ] Functions: affinity_propagation, estimate_bandwidth, k_means, kmeans_plusplus, spectral_clustering / Shared parameters: random_state
- [ ] Functions: cluster_optics_dbscan, cluster_optics_xi / Shared parameters: reachability, ordering
- [ ] Functions: compute_optics_graph, dbscan / Shared parameters: metric, p, metric_params, leaf_size
- [ ] Functions: linkage_tree, ward_tree / Shared parameters: connectivity, return_distance

**Module: sklearn.datasets**

- [ ] Functions: dump_svmlight_file, load_svmlight_file, load_svmlight_files / Shared parameters: zero_based, query_id, multilabel
- [ ] Functions: fetch_20newsgroups, fetch_20newsgroups_vectorized, fetch_california_housing, fetch_covtype, fetch_file, fetch_kddcup99, fetch_lfw_pairs, fetch_lfw_people, fetch_olivetti_faces, fetch_openml, fetch_rcv1, fetch_species_distributions / Shared parameters: n_retries, delay
- [ ] Functions: fetch_20newsgroups_vectorized, fetch_california_housing, fetch_covtype, fetch_kddcup99, fetch_openml, load_breast_cancer, load_diabetes, load_digits, load_iris, load_linnerud, load_wine / Shared parameters: as_frame
- [ ] Functions: make_biclusters, make_checkerboard / Shared parameters: shape, n_clusters, minval, maxval
- [ ] Functions: make_low_rank_matrix, make_regression / Shared parameters: effective_rank, tail_strength

**Module: sklearn.decomposition**

- [ ] Functions: dict_learning, dict_learning_online, fastica, non_negative_factorization / Shared parameters: X, max_iter, n_components, random_state
- [ ] Functions: dict_learning, dict_learning_online, sparse_encode / Shared parameters: alpha, n_jobs
- [ ] Functions: dict_learning, dict_learning_online / Shared parameters: method, dict_init, callback, positive_dict, positive_code, method_max_iter

**Module: sklearn.feature_extraction**

- [ ] Functions: grid_to_graph, img_to_graph / Shared parameters: mask, return_as, dtype

**Module: sklearn.linear_model**

- [ ] Functions: lars_path, lars_path_gram, orthogonal_mp_gram / Shared parameters: Gram, copy_Gram
- [ ] Functions: lars_path, lars_path_gram / Shared parameters: alpha_min, method
- [ ] Functions: lars_path, lars_path_gram, ridge_regression / Shared parameters: max_iter
- [ ] Functions: lars_path, lars_path_gram, orthogonal_mp, orthogonal_mp_gram / Shared parameters: return_path
- [ ] Functions: orthogonal_mp, orthogonal_mp_gram / Shared parameters: n_nonzero_coefs, tol

**Module: sklearn.neighbors**

- [ ] Functions: kneighbors_graph, radius_neighbors_graph / Shared parameters: X, mode, metric, p, metric_params, include_self, n_jobs

**Module: sklearn.tree**

- [ ] Functions: export_graphviz, export_text, plot_tree / Shared parameters: decision_tree, max_depth, feature_names, class_names
- [ ] Functions: export_graphviz, plot_tree / Shared parameters: label, filled, impurity, node_ids, proportion, rounded, precision

**Module: sklearn.feature_selection**

- [ ] Functions: f_regression, r_regression / Shared parameters: center, force_finite
- [ ] Functions: mutual_info_classif, mutual_info_regression / Shared parameters: discrete_features, n_neighbors, copy, random_state, n_jobs



