Add assert_docstring_consistency checks #30854

Closed
@glemaitre

Description

The assert_docstring_consistency function allows you to check the consistency between docstring parameters/attributes/returns of objects.

In scikit-learn there are often classes that share a parent (e.g., AdaBoostClassifier, AdaBoostRegressor) or related functions (e.g., f1_score, fbeta_score). In these cases, some parameters are shared, and we would like to check that their docstring types and descriptions match.
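To make the idea concrete, here is a minimal standard-library sketch of comparing the type and description of shared parameters across objects. It is not scikit-learn's implementation (which relies on numpydoc); `parse_parameters`, `mismatched_parameters`, and the two toy functions are hypothetical names invented for illustration.

```python
import inspect


def parse_parameters(obj):
    """Return {name: (type, description)} from a numpydoc "Parameters" section."""
    lines = (inspect.getdoc(obj) or "").splitlines()
    params, current, in_section = {}, None, False
    for i, line in enumerate(lines):
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if following and set(following) == {"-"}:
            # A section title is the line right above a dashed underline.
            in_section = line.strip() == "Parameters"
            current = None
        elif not in_section or set(line.strip()) <= {"-"}:
            continue  # outside the section, a blank line, or the underline itself
        elif not line[0].isspace() and " : " in line:
            current, type_ = (part.strip() for part in line.split(" : ", 1))
            params[current] = (type_, [])
        elif current is not None:
            params[current][1].append(line.strip())
    return {name: (t, " ".join(d)) for name, (t, d) in params.items()}


def mismatched_parameters(objects, names):
    """List the names whose (type, description) differ between objects."""
    parsed = [parse_parameters(obj) for obj in objects]
    return [
        name
        for name in names
        if len({p[name] for p in parsed if name in p}) > 1
    ]


def toy_f1(y_true, zero_division="warn"):
    """Toy metric (hypothetical).

    Parameters
    ----------
    y_true : array-like
        Ground truth labels.
    zero_division : str or int
        Value returned on zero division.
    """


def toy_fbeta(y_true, beta, zero_division="warn"):
    """Toy metric (hypothetical).

    Parameters
    ----------
    y_true : array-like
        Ground truth labels.
    beta : float
        Weight of recall.
    zero_division : str or int
        Value to return when dividing by zero.
    """


print(mismatched_parameters([toy_f1, toy_fbeta], ["y_true", "zero_division"]))
```

In this toy case only `zero_division` is reported, because its description is worded differently in the two docstrings.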

The assert_docstring_consistency function allows you to include/exclude specific parameters/attributes/returns. In some cases only part of the description should match between objects. In this case you can use descr_regex_pattern to pass a regular expression to be matched to all descriptions. Please read the docstring of this function carefully.
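The difference between full and partial matching can be sketched as follows. This is a simplified illustration, not the library's code: `descriptions_consistent` is a hypothetical helper, the descriptions are made up, and the exact matching semantics of the real descr_regex_pattern may differ, so check the function's docstring.

```python
import re


def descriptions_consistent(descriptions, descr_regex_pattern=None):
    """Check descriptions either for equality or against a shared regex."""
    # Collapse whitespace so that line wrapping does not count as a difference.
    normalized = [" ".join(d.split()) for d in descriptions]
    if descr_regex_pattern is None:
        # Default: every description must be identical.
        return len(set(normalized)) <= 1
    # With a pattern: every description must contain a match.
    pattern = re.compile(descr_regex_pattern)
    return all(pattern.search(d) for d in normalized)


# Made-up descriptions that share an opening sentence but end differently.
f1_descr = "Sets the value to return when there is a zero division. Applies to F1."
fbeta_descr = (
    "Sets the value to return when there is a zero division.\n"
    "    Applies to F-beta."
)
shared = r"Sets the value to return when there is a zero division\."

print(descriptions_consistent([f1_descr, fbeta_descr]))
print(descriptions_consistent([f1_descr, fbeta_descr], shared))
```

The full comparison fails because the last sentences differ, while the pattern-based comparison passes because both descriptions contain the shared sentence.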

Guide on how to contribute to this issue:

  1. Pick an item below and comment with the item you are working on so others know it has been taken.
    • NOT all items listed require a test to be added. If you find that the item you selected does not require a test, this is still a valuable contribution; please comment with the reason why and we can tick it off the list.
  2. Determine common parameters/attributes/returns between the objects.
    • If the description does not match but should, decide on the best wording and amend all objects to match. If only part of the description should match, consider using descr_regex_pattern.
  3. Write a new test.

See #29831 for an example. This PR adds a test for the stacking estimators StackingClassifier and StackingRegressor.

Classes that share a common parent:

  • BaseWeightBoosting: ['AdaBoostClassifier', 'AdaBoostRegressor']
  • BaseBagging: ['BaggingClassifier', 'BaggingRegressor', 'IsolationForest']
  • BaseMixture: ['BayesianGaussianMixture', 'GaussianMixture']
  • _BaseDiscreteNB: ['BernoulliNB', 'CategoricalNB', 'ComplementNB', 'MultinomialNB']
  • _BaseKMeans: ['BisectingKMeans', 'KMeans', 'MiniBatchKMeans']
  • _PLS: ['CCA', 'PLSCanonical', 'PLSRegression']
  • _BaseChain: ['ClassifierChain', 'RegressorChain']
  • _VectorizerMixin: ['CountVectorizer', 'HashingVectorizer', 'TfidfVectorizer']
  • BaseDecisionTree: ['DecisionTreeClassifier', 'DecisionTreeRegressor', 'ExtraTreeClassifier', 'ExtraTreeRegressor']
  • _BaseSparseCoding: ['DictionaryLearning', 'MiniBatchDictionaryLearning', 'SparseCoder']
  • LinearModelCV: ['ElasticNetCV', 'LassoCV', 'MultiTaskElasticNetCV', 'MultiTaskLassoCV']
  • OutlierMixin: ['EllipticEnvelope', 'IsolationForest', 'LocalOutlierFactor', 'OneClassSVM', 'SGDOneClassSVM']
  • EmpiricalCovariance: ['EllipticEnvelope', 'GraphicalLasso', 'GraphicalLassoCV', 'LedoitWolf', 'MinCovDet', 'OAS', 'ShrunkCovariance']
  • ForestClassifier: ['ExtraTreesClassifier', 'RandomForestClassifier']
  • BaseForest: ['ExtraTreesClassifier', 'ExtraTreesRegressor', 'RandomForestClassifier', 'RandomForestRegressor', 'RandomTreesEmbedding']
  • ForestRegressor: ['ExtraTreesRegressor', 'RandomForestRegressor']
  • BaseThresholdClassifier: ['FixedThresholdClassifier', 'TunedThresholdClassifierCV']
  • _GeneralizedLinearRegressor: ['GammaRegressor', 'PoissonRegressor', 'TweedieRegressor']
  • BaseRandomProjection: ['GaussianRandomProjection', 'SparseRandomProjection']
  • _BaseFilter: ['GenericUnivariateSelect', 'SelectFdr', 'SelectFpr', 'SelectFwe', 'SelectKBest', 'SelectPercentile']
  • BaseGradientBoosting: ['GradientBoostingClassifier', 'GradientBoostingRegressor']
  • BaseGraphicalLasso: ['GraphicalLasso', 'GraphicalLassoCV']
  • BaseSearchCV: ['GridSearchCV', 'RandomizedSearchCV']
  • BaseHistGradientBoosting: ['HistGradientBoostingClassifier', 'HistGradientBoostingRegressor']
  • _BasePCA: ['IncrementalPCA', 'PCA']
  • _BaseImputer: ['KNNImputer', 'SimpleImputer']
  • KNeighborsMixin: ['KNeighborsClassifier', 'KNeighborsRegressor', 'KNeighborsTransformer', 'LocalOutlierFactor', 'NearestNeighbors']
  • NeighborsBase: ['KNeighborsClassifier', 'KNeighborsRegressor', 'KNeighborsTransformer', 'LocalOutlierFactor', 'NearestNeighbors', 'RadiusNeighborsClassifier', 'RadiusNeighborsRegressor', 'RadiusNeighborsTransformer']
  • BaseLabelPropagation: ['LabelPropagation', 'LabelSpreading']
  • Lars: ['LarsCV', 'LassoLars', 'LassoLarsCV', 'LassoLarsIC']
  • ElasticNet: ['Lasso', 'MultiTaskElasticNet', 'MultiTaskLasso']
  • BaseMultilayerPerceptron: ['MLPClassifier', 'MLPRegressor']
  • _BaseNMF: ['MiniBatchNMF', 'NMF']
  • _BaseSparsePCA: ['MiniBatchSparsePCA', 'SparsePCA']
  • _MultiOutputEstimator: ['MultiOutputClassifier', 'MultiOutputRegressor']
  • Lasso: ['MultiTaskElasticNet', 'MultiTaskLasso']
  • RadiusNeighborsMixin: ['NearestNeighbors', 'RadiusNeighborsClassifier', 'RadiusNeighborsRegressor', 'RadiusNeighborsTransformer']
  • BaseSVC: ['NuSVC', 'SVC']
  • BaseLibSVM: ['NuSVC', 'NuSVR', 'OneClassSVM', 'SVC', 'SVR']
  • _BaseEncoder: ['OneHotEncoder', 'OrdinalEncoder', 'TargetEncoder']
  • BaseSGDClassifier: ['PassiveAggressiveClassifier', 'Perceptron', 'SGDClassifier']
  • BaseSGD: ['PassiveAggressiveClassifier', 'PassiveAggressiveRegressor', 'Perceptron', 'SGDClassifier', 'SGDOneClassSVM', 'SGDRegressor']
  • BaseSGDRegressor: ['PassiveAggressiveRegressor', 'SGDRegressor']
  • _BaseRidge: ['Ridge', 'RidgeClassifier']
  • _BaseRidgeCV: ['RidgeCV', 'RidgeClassifierCV']
  • _RidgeClassifierMixin: ['RidgeClassifier', 'RidgeClassifierCV']
  • BaseSpectral: ['SpectralBiclustering', 'SpectralCoclustering']
  • BiclusterMixin: ['SpectralBiclustering', 'SpectralCoclustering']
  • _BaseStacking: ['StackingClassifier', 'StackingRegressor']
  • _BaseHeterogeneousEnsemble: ['StackingClassifier', 'StackingRegressor', 'VotingClassifier', 'VotingRegressor']
  • _BaseVoting: ['VotingClassifier', 'VotingRegressor']
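The issue does not say how this list was built. One plausible approach, sketched here with toy classes, is to walk each class's `__mro__` and group classes by every ancestor they share. `group_by_parent` and all class names below are hypothetical.

```python
from collections import defaultdict


def group_by_parent(classes):
    """Map each ancestor name to the classes deriving from it (if more than one)."""
    groups = defaultdict(list)
    for cls in classes:
        # __mro__[0] is the class itself; skip it and the universal base `object`.
        for base in cls.__mro__[1:]:
            if base is not object:
                groups[base.__name__].append(cls.__name__)
    return {
        parent: sorted(children)
        for parent, children in groups.items()
        if len(children) > 1
    }


class BaseToy:
    """Hypothetical shared parent."""


class ToyClassifier(BaseToy):
    pass


class ToyRegressor(BaseToy):
    pass


class Unrelated:
    pass


print(group_by_parent([ToyClassifier, ToyRegressor, Unrelated]))
```

Because the whole MRO is walked, a class appears under both its direct and indirect parents, which matches how some estimators above are listed under more than one base class. For real estimators, the input could come from `sklearn.utils.all_estimators()`.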

Functions from the same module that share parameters:

I did a lot of manual culling as many functions shared only 1 or 2 parameters and were not actually relevant.

The functions are grouped by shared parameters, so the listed parameters are not exhaustive for any subset of functions within a group. The grouping below is not necessarily ideal for the consistency check.

Module: sklearn.utils

  • Functions: compute_class_weight, compute_sample_weight / Shared parameters: class_weight
  • Functions: resample, shuffle / Shared parameters: random_state

Module: sklearn.utils.class_weight

  • Functions: compute_class_weight, compute_sample_weight / Shared parameters: class_weight, y

Module: sklearn.utils.extmath

  • Functions: randomized_range_finder, randomized_svd / Shared parameters: n_iter, power_iteration_normalizer, random_state

Module: sklearn.utils.validation

  • Functions: as_float_array, check_X_y, check_array / Shared parameters: copy, force_all_finite, ensure_all_finite
  • Functions: check_X_y, check_array / Shared parameters: accept_sparse, accept_large_sparse, order, force_writeable, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features

Module: sklearn.metrics

  • Functions: adjusted_mutual_info_score, adjusted_rand_score, completeness_score, fowlkes_mallows_score, homogeneity_completeness_v_measure, homogeneity_score, mutual_info_score, normalized_mutual_info_score, pair_confusion_matrix, rand_score, v_measure_score / Shared parameters: labels_true, labels_pred

Module: sklearn.metrics.pairwise

  • Functions: pairwise_distances_argmin, pairwise_distances_argmin_min / Shared parameters: axis, metric_kwargs
  • Functions: pairwise_distances, pairwise_distances_chunked, pairwise_kernels / Shared parameters: n_jobs
  • Functions: check_pairwise_arrays, pairwise_distances / Shared parameters: force_all_finite, ensure_all_finite
  • Functions: chi2_kernel, laplacian_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel / Shared parameters: gamma

Module: sklearn.cluster

  • Functions: affinity_propagation, estimate_bandwidth, k_means, kmeans_plusplus, spectral_clustering / Shared parameters: random_state
  • Functions: cluster_optics_dbscan, cluster_optics_xi / Shared parameters: reachability, ordering
  • Functions: compute_optics_graph, dbscan / Shared parameters: metric, p, metric_params, leaf_size
  • Functions: linkage_tree, ward_tree / Shared parameters: connectivity, return_distance

Module: sklearn.datasets

  • Functions: dump_svmlight_file, load_svmlight_file, load_svmlight_files / Shared parameters: zero_based, query_id, multilabel
  • Functions: fetch_20newsgroups, fetch_20newsgroups_vectorized, fetch_california_housing, fetch_covtype, fetch_file, fetch_kddcup99, fetch_lfw_pairs, fetch_lfw_people, fetch_olivetti_faces, fetch_openml, fetch_rcv1, fetch_species_distributions / Shared parameters: n_retries, delay
  • Functions: fetch_20newsgroups_vectorized, fetch_california_housing, fetch_covtype, fetch_kddcup99, fetch_openml, load_breast_cancer, load_diabetes, load_digits, load_iris, load_linnerud, load_wine / Shared parameters: as_frame
  • Functions: make_biclusters, make_checkerboard / Shared parameters: shape, n_clusters, minval, maxval
  • Functions: make_low_rank_matrix, make_regression / Shared parameters: effective_rank, tail_strength

Module: sklearn.decomposition

  • Functions: dict_learning, dict_learning_online, fastica, non_negative_factorization / Shared parameters: X, max_iter, n_components, random_state
  • Functions: dict_learning, dict_learning_online, sparse_encode / Shared parameters: alpha, n_jobs
  • Functions: dict_learning, dict_learning_online / Shared parameters: method, dict_init, callback, positive_dict, positive_code, method_max_iter

Module: sklearn.feature_extraction

  • Functions: grid_to_graph, img_to_graph / Shared parameters: mask, return_as, dtype

Module: sklearn.linear_model

  • Functions: lars_path, lars_path_gram, orthogonal_mp_gram / Shared parameters: Gram, copy_Gram
  • Functions: lars_path, lars_path_gram / Shared parameters: alpha_min, method
  • Functions: lars_path, lars_path_gram, ridge_regression / Shared parameters: max_iter
  • Functions: lars_path, lars_path_gram, orthogonal_mp, orthogonal_mp_gram / Shared parameters: return_path
  • Functions: orthogonal_mp, orthogonal_mp_gram / Shared parameters: n_nonzero_coefs, tol

Module: sklearn.neighbors

  • Functions: kneighbors_graph, radius_neighbors_graph / Shared parameters: X, mode, metric, p, metric_params, include_self, n_jobs

Module: sklearn.tree

  • Functions: export_graphviz, export_text, plot_tree / Shared parameters: decision_tree, max_depth, feature_names, class_names
  • Functions: export_graphviz, plot_tree / Shared parameters: label, filled, impurity, node_ids, proportion, rounded, precision

Module: sklearn.feature_selection

  • Functions: f_regression, r_regression / Shared parameters: center, force_finite
  • Functions: mutual_info_classif, mutual_info_regression / Shared parameters: discrete_features, n_neighbors, copy, random_state, n_jobs
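When picking an item from the function groups above, a quick way to see which parameters two or more functions have in common is to intersect their signatures. The sketch below uses only the standard library and toy functions. `shared_parameters` is a hypothetical helper, not necessarily how the list above was produced.

```python
import inspect
from itertools import combinations


def shared_parameters(funcs, min_shared=1):
    """Return {(name_a, name_b): sorted shared parameter names} for each pair."""
    names = {f.__name__: set(inspect.signature(f).parameters) for f in funcs}
    result = {}
    for a, b in combinations(sorted(names), 2):
        common = names[a] & names[b]
        if len(common) >= min_shared:
            result[(a, b)] = sorted(common)
    return result


# Toy stand-ins for functions living in the same module.
def toy_resample(X, replace=True, random_state=None):
    pass


def toy_shuffle(X, random_state=None):
    pass


def toy_other(a, b):
    pass


print(shared_parameters([toy_resample, toy_shuffle, toy_other], min_shared=2))
```

Raising `min_shared` filters out pairs that overlap on only one parameter. Sharing a name does not mean the descriptions should match, so each candidate still needs a manual look at the docstrings.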

Labels: Documentation, Meta-issue (general issue associated to an identified list of tasks), Sprint, good first issue (easy with clear instructions to resolve)
