Description
The `assert_docstring_consistency` function allows you to check the consistency between docstring parameters/attributes/returns of objects.
In scikit-learn there are often classes that share a parent (e.g., `AdaBoostClassifier`, `AdaBoostRegressor`) or related functions (e.g., `f1_score`, `fbeta_score`). In these cases, some parameters are often shared, and we would like to check that the docstring type and description match.
The `assert_docstring_consistency` function allows you to include/exclude specific parameters/attributes/returns. In some cases only part of the description should match between objects. In this case you can use `descr_regex_pattern` to pass a regular expression to be matched against all descriptions. Please read the docstring of this function carefully.
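To make the idea of partial matching concrete, here is a minimal standard-library sketch. It is not the implementation of `assert_docstring_consistency`: the helper `find_mismatches`, the object names, and the descriptions are all hypothetical, and the use of `re.search` on whitespace-normalized text is an assumption made for illustration.

```python
import re


def find_mismatches(pattern, descriptions):
    """Return names whose description does NOT contain a match for `pattern`.

    `descriptions` maps an object name to one parameter description.
    Whitespace is collapsed first so that line wrapping does not matter.
    (Hypothetical helper, for illustration only.)
    """
    compiled = re.compile(pattern)
    mismatches = []
    for name, text in descriptions.items():
        normalized = " ".join(text.split())
        if compiled.search(normalized) is None:
            mismatches.append(name)
    return mismatches


# Made-up descriptions of one shared parameter in three related docstrings.
descriptions = {
    "metric_a": """This parameter is required for multiclass targets.
        Notes that only apply to metric_a.""",
    "metric_b": """This parameter is required for multiclass targets.
        Different notes for metric_b.""",
    "metric_c": "Controls averaging. Unrelated wording.",
}

# Only the first sentence is expected to be common to all objects.
shared_sentence = r"This parameter is required for multiclass targets\."
mismatches = find_mismatches(shared_sentence, descriptions)
print(mismatches)  # ['metric_c']
```

In this sketch, `metric_a` and `metric_b` pass even though their full descriptions differ, because only the shared sentence is checked.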
Guide on how to contribute to this issue:
- Pick an item below and comment with the item you are working on so others know it has been taken.
- NOT all items listed require a test to be added. If you find that the item you selected does not require a test, this is still a valuable contribution; please comment with the reason why and we can tick it off the list.
- Determine common parameters/attributes/returns between the objects.
  - If the description does not match but should, decide on the best wording and amend all objects to match. If only part of the description should match, consider using `descr_regex_pattern`.
- Write a new test.
  - The test should live in `sklearn/tests/test_docstring_parameters_consistency.py` (cf. TST move test for parameters consistency checks #30853).
  - Add `@skip_if_no_numpydoc` to the top of the test (these tests can only be run if numpydoc is installed).
See #29831 for an example. This PR adds a test for the stacking estimators `StackingClassifier` and `StackingRegressor`.
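The "determine common parameters" step in the guide can be approximated with the standard library alone. The sketch below is a rough aid, not part of scikit-learn: `shared_parameters` and the two toy functions are hypothetical stand-ins. It only inspects call signatures, so attributes and returns, which live in the docstring, still need to be compared by reading the docstrings.

```python
import inspect


def shared_parameters(*callables):
    """Return parameter names found in the signature of every callable.

    The order follows the first callable. For a class, `inspect.signature`
    reports the constructor parameters without `self`.
    (Hypothetical helper, for illustration only.)
    """
    names_per_callable = [
        list(inspect.signature(c).parameters) for c in callables
    ]
    first, *rest = names_per_callable
    return [name for name in first if all(name in other for other in rest)]


# Toy stand-ins for two related functions; these are not the real metrics.
def toy_f1_score(y_true, y_pred, labels=None, average="binary"):
    pass


def toy_fbeta_score(y_true, y_pred, beta, labels=None, average="binary"):
    pass


common = shared_parameters(toy_f1_score, toy_fbeta_score)
print(common)  # ['y_true', 'y_pred', 'labels', 'average']
```

A list like `common` is only a starting point: some shared names may legitimately have different descriptions and belong in an exclude list rather than in the check.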
Classes that share a common parent:
- `BaseWeightBoosting`: ['AdaBoostClassifier', 'AdaBoostRegressor']
- `BaseBagging`: ['BaggingClassifier', 'BaggingRegressor', 'IsolationForest']
- `BaseMixture`: ['BayesianGaussianMixture', 'GaussianMixture']
- `_BaseDiscreteNB`: ['BernoulliNB', 'CategoricalNB', 'ComplementNB', 'MultinomialNB']
- `_BaseKMeans`: ['BisectingKMeans', 'KMeans', 'MiniBatchKMeans']
- `_PLS`: ['CCA', 'PLSCanonical', 'PLSRegression']
- `_BaseChain`: ['ClassifierChain', 'RegressorChain']
- `_VectorizerMixin`: ['CountVectorizer', 'HashingVectorizer', 'TfidfVectorizer']
- `BaseDecisionTree`: ['DecisionTreeClassifier', 'DecisionTreeRegressor', 'ExtraTreeClassifier', 'ExtraTreeRegressor']
- `_BaseSparseCoding`: ['DictionaryLearning', 'MiniBatchDictionaryLearning', 'SparseCoder']
- `LinearModelCV`: ['ElasticNetCV', 'LassoCV', 'MultiTaskElasticNetCV', 'MultiTaskLassoCV']
- `OutlierMixin`: ['EllipticEnvelope', 'IsolationForest', 'LocalOutlierFactor', 'OneClassSVM', 'SGDOneClassSVM']
- `EmpiricalCovariance`: ['EllipticEnvelope', 'GraphicalLasso', 'GraphicalLassoCV', 'LedoitWolf', 'MinCovDet', 'OAS', 'ShrunkCovariance']
- `ForestClassifier`: ['ExtraTreesClassifier', 'RandomForestClassifier']
- `BaseForest`: ['ExtraTreesClassifier', 'ExtraTreesRegressor', 'RandomForestClassifier', 'RandomForestRegressor', 'RandomTreesEmbedding']
- `ForestRegressor`: ['ExtraTreesRegressor', 'RandomForestRegressor']
- `BaseThresholdClassifier`: ['FixedThresholdClassifier', 'TunedThresholdClassifierCV']
- `_GeneralizedLinearRegressor`: ['GammaRegressor', 'PoissonRegressor', 'TweedieRegressor']
- `BaseRandomProjection`: ['GaussianRandomProjection', 'SparseRandomProjection']
- `_BaseFilter`: ['GenericUnivariateSelect', 'SelectFdr', 'SelectFpr', 'SelectFwe', 'SelectKBest', 'SelectPercentile']
- `BaseGradientBoosting`: ['GradientBoostingClassifier', 'GradientBoostingRegressor']
- `BaseGraphicalLasso`: ['GraphicalLasso', 'GraphicalLassoCV']
- `BaseSearchCV`: ['GridSearchCV', 'RandomizedSearchCV']
- `BaseHistGradientBoosting`: ['HistGradientBoostingClassifier', 'HistGradientBoostingRegressor']
- `_BasePCA`: ['IncrementalPCA', 'PCA']
- `_BaseImputer`: ['KNNImputer', 'SimpleImputer']
- `KNeighborsMixin`: ['KNeighborsClassifier', 'KNeighborsRegressor', 'KNeighborsTransformer', 'LocalOutlierFactor', 'NearestNeighbors']
- `NeighborsBase`: ['KNeighborsClassifier', 'KNeighborsRegressor', 'KNeighborsTransformer', 'LocalOutlierFactor', 'NearestNeighbors', 'RadiusNeighborsClassifier', 'RadiusNeighborsRegressor', 'RadiusNeighborsTransformer']
- `BaseLabelPropagation`: ['LabelPropagation', 'LabelSpreading']
- `Lars`: ['LarsCV', 'LassoLars', 'LassoLarsCV', 'LassoLarsIC']
- `ElasticNet`: ['Lasso', 'MultiTaskElasticNet', 'MultiTaskLasso']
- `BaseMultilayerPerceptron`: ['MLPClassifier', 'MLPRegressor']
- `_BaseNMF`: ['MiniBatchNMF', 'NMF']
- `_BaseSparsePCA`: ['MiniBatchSparsePCA', 'SparsePCA']
- `_MultiOutputEstimator`: ['MultiOutputClassifier', 'MultiOutputRegressor']
- `Lasso`: ['MultiTaskElasticNet', 'MultiTaskLasso']
- `RadiusNeighborsMixin`: ['NearestNeighbors', 'RadiusNeighborsClassifier', 'RadiusNeighborsRegressor', 'RadiusNeighborsTransformer']
- `BaseSVC`: ['NuSVC', 'SVC']
- `BaseLibSVM`: ['NuSVC', 'NuSVR', 'OneClassSVM', 'SVC', 'SVR']
- `_BaseEncoder`: ['OneHotEncoder', 'OrdinalEncoder', 'TargetEncoder']
- `BaseSGDClassifier`: ['PassiveAggressiveClassifier', 'Perceptron', 'SGDClassifier']
- `BaseSGD`: ['PassiveAggressiveClassifier', 'PassiveAggressiveRegressor', 'Perceptron', 'SGDClassifier', 'SGDOneClassSVM', 'SGDRegressor']
- `BaseSGDRegressor`: ['PassiveAggressiveRegressor', 'SGDRegressor']
- `_BaseRidge`: ['Ridge', 'RidgeClassifier']
- `_BaseRidgeCV`: ['RidgeCV', 'RidgeClassifierCV']
- `_RidgeClassifierMixin`: ['RidgeClassifier', 'RidgeClassifierCV']
- `BaseSpectral`: ['SpectralBiclustering', 'SpectralCoclustering']
- `BiclusterMixin`: ['SpectralBiclustering', 'SpectralCoclustering']
- `_BaseStacking`: ['StackingClassifier', 'StackingRegressor']
- `_BaseHeterogeneousEnsemble`: ['StackingClassifier', 'StackingRegressor', 'VotingClassifier', 'VotingRegressor']
- `_BaseVoting`: ['VotingClassifier', 'VotingRegressor']
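The issue does not say how the list above was produced. As a rough illustration of the idea of grouping classes by a common parent, here is a standard-library sketch on a toy hierarchy; `group_by_parent` and every class in it are hypothetical and unrelated to the real scikit-learn classes.

```python
from collections import defaultdict


# Toy hierarchy standing in for estimators that share a base class.
class BaseToyBoosting:
    pass


class ToyBoostClassifier(BaseToyBoosting):
    pass


class ToyBoostRegressor(BaseToyBoosting):
    pass


class ToyStandalone:
    pass


def group_by_parent(classes):
    """Map each ancestor name to the sorted names of classes inheriting from it.

    Ancestors with fewer than two descendants among `classes` are dropped,
    since there is nothing to compare. (Hypothetical helper.)
    """
    groups = defaultdict(list)
    for cls in classes:
        for parent in cls.__mro__[1:]:  # skip the class itself
            if parent is object:
                continue
            groups[parent.__name__].append(cls.__name__)
    return {
        parent: sorted(children)
        for parent, children in sorted(groups.items())
        if len(children) >= 2
    }


groups = group_by_parent([ToyBoostClassifier, ToyBoostRegressor, ToyStandalone])
print(groups)  # {'BaseToyBoosting': ['ToyBoostClassifier', 'ToyBoostRegressor']}
```

Because the whole method resolution order is walked, a class can appear under several ancestors, which is consistent with the same estimator showing up in more than one group above.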
Functions, from the same module, that share parameters.
I did a lot of manual culling as many functions shared only 1 or 2 parameters and were not actually relevant.
The functions are grouped by the parameters they share, so the list of shared parameters is not exhaustive for any subset of functions within a group. The grouping below is not necessarily ideal for the consistency check.
Module: sklearn.utils
- Functions: compute_class_weight, compute_sample_weight / Shared parameters: class_weight
- Functions: resample, shuffle / Shared parameters: random_state
Module: sklearn.utils.class_weight
- Functions: compute_class_weight, compute_sample_weight / Shared parameters: class_weight, y
Module: sklearn.utils.extmath
- Functions: randomized_range_finder, randomized_svd / Shared parameters: n_iter, power_iteration_normalizer, random_state
Module: sklearn.utils.validation
- Functions: as_float_array, check_X_y, check_array / Shared parameters: copy, force_all_finite, ensure_all_finite
- Functions: check_X_y, check_array / Shared parameters: accept_sparse, accept_large_sparse, order, force_writeable, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features
Module: sklearn.metrics
- Functions: adjusted_mutual_info_score, adjusted_rand_score, completeness_score, fowlkes_mallows_score, homogeneity_completeness_v_measure, homogeneity_score, mutual_info_score, normalized_mutual_info_score, pair_confusion_matrix, rand_score, v_measure_score / Shared parameters: labels_true, labels_pred
Module: sklearn.metrics.pairwise
- Functions: pairwise_distances_argmin, pairwise_distances_argmin_min / Shared parameters: axis, metric_kwargs
- Functions: pairwise_distances, pairwise_distances_chunked, pairwise_kernels / Shared parameters: n_jobs
- Functions: check_pairwise_arrays, pairwise_distances / Shared parameters: force_all_finite, ensure_all_finite
- Functions: chi2_kernel, laplacian_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel / Shared parameters: gamma
Module: sklearn.cluster
- Functions: affinity_propagation, estimate_bandwidth, k_means, kmeans_plusplus, spectral_clustering / Shared parameters: random_state
- Functions: cluster_optics_dbscan, cluster_optics_xi / Shared parameters: reachability, ordering
- Functions: compute_optics_graph, dbscan / Shared parameters: metric, p, metric_params, leaf_size
- Functions: linkage_tree, ward_tree / Shared parameters: connectivity, return_distance
Module: sklearn.datasets
- Functions: dump_svmlight_file, load_svmlight_file, load_svmlight_files / Shared parameters: zero_based, query_id, multilabel
- Functions: fetch_20newsgroups, fetch_20newsgroups_vectorized, fetch_california_housing, fetch_covtype, fetch_file, fetch_kddcup99, fetch_lfw_pairs, fetch_lfw_people, fetch_olivetti_faces, fetch_openml, fetch_rcv1, fetch_species_distributions / Shared parameters: n_retries, delay
- Functions: fetch_20newsgroups_vectorized, fetch_california_housing, fetch_covtype, fetch_kddcup99, fetch_openml, load_breast_cancer, load_diabetes, load_digits, load_iris, load_linnerud, load_wine / Shared parameters: as_frame
- Functions: make_biclusters, make_checkerboard / Shared parameters: shape, n_clusters, minval, maxval
- Functions: make_low_rank_matrix, make_regression / Shared parameters: effective_rank, tail_strength
Module: sklearn.decomposition
- Functions: dict_learning, dict_learning_online, fastica, non_negative_factorization / Shared parameters: X, max_iter, n_components, random_state
- Functions: dict_learning, dict_learning_online, sparse_encode / Shared parameters: alpha, n_jobs
- Functions: dict_learning, dict_learning_online / Shared parameters: method, dict_init, callback, positive_dict, positive_code, method_max_iter
Module: sklearn.feature_extraction
- Functions: grid_to_graph, img_to_graph / Shared parameters: mask, return_as, dtype
Module: sklearn.linear_model
- Functions: lars_path, lars_path_gram, orthogonal_mp_gram / Shared parameters: Gram, copy_Gram
- Functions: lars_path, lars_path_gram / Shared parameters: alpha_min, method
- Functions: lars_path, lars_path_gram, ridge_regression / Shared parameters: max_iter
- Functions: lars_path, lars_path_gram, orthogonal_mp, orthogonal_mp_gram / Shared parameters: return_path
- Functions: orthogonal_mp, orthogonal_mp_gram / Shared parameters: n_nonzero_coefs, tol
Module: sklearn.neighbors
- Functions: kneighbors_graph, radius_neighbors_graph / Shared parameters: X, mode, metric, p, metric_params, include_self, n_jobs
Module: sklearn.tree
- Functions: export_graphviz, export_text, plot_tree / Shared parameters: decision_tree, max_depth, feature_names, class_names
- Functions: export_graphviz, plot_tree / Shared parameters: label, filled, impurity, node_ids, proportion, rounded, precision
Module: sklearn.feature_selection
- Functions: f_regression, r_regression / Shared parameters: center, force_finite
- Functions: mutual_info_classif, mutual_info_regression / Shared parameters: discrete_features, n_neighbors, copy, random_state, n_jobs