SLEP006: make sure generated set_{method}_request methods are all legit #26505

Closed
Tracked by #22893
adrinjalali opened this issue Jun 3, 2023 · 0 comments · Fixed by #29920

@adrinjalali
Member

For all estimators inheriting from BaseEstimator, if a method explicitly accepts an argument other than X, y, or Y, a corresponding set_{method}_request method is generated for that method, exposing the argument as a keyword-only parameter.
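To make the generation rule concrete, here is a simplified sketch of how the candidate arguments can be derived from a method's signature. It is not scikit-learn's actual implementation: the helper `candidate_metadata` and the `Example` class are hypothetical, and the excluded names are only the X, y, Y rule described above plus `self` and variadic `*args`/`**kwargs`, which cannot be requested by name.

```python
import inspect

# Assumption: names never treated as metadata in this sketch
# (the X, y, Y rule described above, plus self).
EXCLUDED = {"self", "X", "y", "Y"}


def candidate_metadata(method):
    """Return parameter names that would get a set_{method}_request entry."""
    params = inspect.signature(method).parameters.values()
    return [
        p.name
        for p in params
        if p.name not in EXCLUDED
        # *args / **kwargs cannot be requested by name, so skip them
        and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]


class Example:
    """Hypothetical estimator, for illustration only."""

    def fit(self, X, y=None, sample_weight=None):
        return self

    def inverse_transform(self, Xt):
        return Xt


print(candidate_metadata(Example.fit))                # ['sample_weight']
print(candidate_metadata(Example.inverse_transform))  # ['Xt']
```

The second call shows how a method whose input is merely named differently from X (here `Xt`) ends up with a spurious request method, which is the kind of entry visible in the listing below for e.g. `Pipeline` and `GridSearchCV`.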

#26503 proposes a cleanup of the inverse_transform methods so that the corresponding set_inverse_transform_request is only generated when necessary.

We should do the same for all other generated methods and make sure each of them is legit, i.e. that every argument it exposes is actually metadata.

A list of all generated methods can be seen using this code:

```python
import inspect

from sklearn.base import MetaEstimatorMixin
from sklearn.utils import all_estimators

for name, Cls in all_estimators():
    is_meta = issubclass(Cls, MetaEstimatorMixin)
    if is_meta:
        print(f"{name} is a meta-estimator")
    else:
        print(name)
    # collect the generated set_{method}_request methods of this class
    set_methods = [
        method_name
        for method_name, _ in inspect.getmembers(Cls, inspect.isroutine)
        if method_name.startswith("set_") and method_name.endswith("request")
    ]
    # get the keyword-only input arguments of each method in set_methods
    args = {
        method_name: inspect.getfullargspec(getattr(Cls, method_name)).kwonlyargs
        for method_name in set_methods
    }
    for key, value in args.items():
        print(f"  {key}: {value}")
    print()
```
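For triage it can help to invert the listing and see, for each (method, argument) pair, which estimators expose it. A minimal sketch, assuming the same discovery logic as the script above; `index_request_args` is a hypothetical helper, and it takes the (name, class) pairs as input so it does not depend on scikit-learn itself.

```python
import inspect
from collections import defaultdict


def index_request_args(estimators):
    """Map (method, argument) -> sorted list of estimator names exposing it.

    `estimators` is an iterable of (name, class) pairs, such as the result
    of sklearn.utils.all_estimators().
    """
    index = defaultdict(list)
    for name, Cls in estimators:
        for method_name, method in inspect.getmembers(Cls, inspect.isroutine):
            if not (method_name.startswith("set_") and method_name.endswith("_request")):
                continue
            # "set_fit_request" -> "fit"
            target = method_name[len("set_"):-len("_request")]
            for arg in inspect.getfullargspec(method).kwonlyargs:
                index[(target, arg)].append(name)
    return {key: sorted(names) for key, names in index.items()}
```

For example, `index_request_args(all_estimators())[("transform", "copy")]` would list every estimator whose set_transform_request exposes `copy`, which makes it easy to review one questionable argument across the whole library at once.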

This generates the following output:

ARDRegression
  set_predict_request: ['return_std']
  set_score_request: ['sample_weight']

AdaBoostClassifier is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

AdaBoostRegressor is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

AdditiveChi2Sampler

AffinityPropagation

AgglomerativeClustering

BaggingClassifier is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

BaggingRegressor is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

BayesianGaussianMixture

BayesianRidge
  set_fit_request: ['sample_weight']
  set_predict_request: ['return_std']
  set_score_request: ['sample_weight']

BernoulliNB
  set_fit_request: ['sample_weight']
  set_partial_fit_request: ['classes', 'sample_weight']
  set_score_request: ['sample_weight']

BernoulliRBM

Binarizer
  set_transform_request: ['copy']

Birch

BisectingKMeans
  set_fit_request: ['sample_weight']
  set_predict_request: ['sample_weight']
  set_score_request: ['sample_weight']

CCA
  set_predict_request: ['copy']
  set_score_request: ['sample_weight']
  set_transform_request: ['copy']

CalibratedClassifierCV is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

CategoricalNB
  set_fit_request: ['sample_weight']
  set_partial_fit_request: ['classes', 'sample_weight']
  set_score_request: ['sample_weight']

ClassifierChain is a meta-estimator
  set_score_request: ['sample_weight']

ColumnTransformer

ComplementNB
  set_fit_request: ['sample_weight']
  set_partial_fit_request: ['classes', 'sample_weight']
  set_score_request: ['sample_weight']

CountVectorizer
  set_fit_request: ['raw_documents']
  set_transform_request: ['raw_documents']

DBSCAN
  set_fit_request: ['sample_weight']

DecisionTreeClassifier
  set_fit_request: ['check_input', 'sample_weight']
  set_predict_proba_request: ['check_input']
  set_predict_request: ['check_input']
  set_score_request: ['sample_weight']

DecisionTreeRegressor
  set_fit_request: ['check_input', 'sample_weight']
  set_predict_request: ['check_input']
  set_score_request: ['sample_weight']

DictVectorizer
  set_inverse_transform_request: ['dict_type']

DictionaryLearning

DummyClassifier
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

DummyRegressor
  set_fit_request: ['sample_weight']
  set_predict_request: ['return_std']
  set_score_request: ['sample_weight']

ElasticNet
  set_fit_request: ['check_input', 'sample_weight']
  set_score_request: ['sample_weight']

ElasticNetCV
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

EllipticEnvelope
  set_score_request: ['sample_weight']

EmpiricalCovariance
  set_score_request: ['X_test']

ExtraTreeClassifier
  set_fit_request: ['check_input', 'sample_weight']
  set_predict_proba_request: ['check_input']
  set_predict_request: ['check_input']
  set_score_request: ['sample_weight']

ExtraTreeRegressor
  set_fit_request: ['check_input', 'sample_weight']
  set_predict_request: ['check_input']
  set_score_request: ['sample_weight']

ExtraTreesClassifier is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

ExtraTreesRegressor is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

FactorAnalysis

FastICA
  set_inverse_transform_request: ['copy']
  set_transform_request: ['copy']

FeatureAgglomeration
  set_inverse_transform_request: ['Xred']

FeatureHasher
  set_transform_request: ['raw_X']

FeatureUnion

FunctionTransformer

GammaRegressor
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

GaussianMixture

GaussianNB
  set_fit_request: ['sample_weight']
  set_partial_fit_request: ['classes', 'sample_weight']
  set_score_request: ['sample_weight']

GaussianProcessClassifier
  set_score_request: ['sample_weight']

GaussianProcessRegressor
  set_predict_request: ['return_cov', 'return_std']
  set_score_request: ['sample_weight']

GaussianRandomProjection

GenericUnivariateSelect

GradientBoostingClassifier is a meta-estimator
  set_fit_request: ['monitor', 'sample_weight']
  set_score_request: ['sample_weight']

GradientBoostingRegressor is a meta-estimator
  set_fit_request: ['monitor', 'sample_weight']
  set_score_request: ['sample_weight']

GraphicalLasso
  set_score_request: ['X_test']

GraphicalLassoCV
  set_score_request: ['X_test']

GridSearchCV is a meta-estimator
  set_fit_request: ['groups']
  set_inverse_transform_request: ['Xt']

HDBSCAN

HashingVectorizer

HistGradientBoostingClassifier
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

HistGradientBoostingRegressor
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

HuberRegressor
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

IncrementalPCA
  set_partial_fit_request: ['check_input']

IsolationForest is a meta-estimator
  set_fit_request: ['sample_weight']

Isomap

IsotonicRegression
  set_fit_request: ['sample_weight']
  set_predict_request: ['T']
  set_score_request: ['sample_weight']
  set_transform_request: ['T']

KBinsDiscretizer
  set_fit_request: ['sample_weight']
  set_inverse_transform_request: ['Xt']

KMeans
  set_fit_request: ['sample_weight']
  set_predict_request: ['sample_weight']
  set_score_request: ['sample_weight']

KNNImputer

KNeighborsClassifier
  set_score_request: ['sample_weight']

KNeighborsRegressor
  set_score_request: ['sample_weight']

KNeighborsTransformer

KernelCenterer
  set_fit_request: ['K']
  set_transform_request: ['K', 'copy']

KernelDensity
  set_fit_request: ['sample_weight']

KernelPCA

KernelRidge
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

LabelBinarizer
  set_inverse_transform_request: ['threshold']

LabelEncoder

LabelPropagation
  set_score_request: ['sample_weight']

LabelSpreading
  set_score_request: ['sample_weight']

Lars
  set_fit_request: ['Xy']
  set_score_request: ['sample_weight']

LarsCV
  set_fit_request: ['Xy']
  set_score_request: ['sample_weight']

Lasso
  set_fit_request: ['check_input', 'sample_weight']
  set_score_request: ['sample_weight']

LassoCV
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

LassoLars
  set_fit_request: ['Xy']
  set_score_request: ['sample_weight']

LassoLarsCV
  set_fit_request: ['Xy']
  set_score_request: ['sample_weight']

LassoLarsIC
  set_fit_request: ['copy_X']
  set_score_request: ['sample_weight']

LatentDirichletAllocation

LedoitWolf
  set_score_request: ['X_test']

LinearDiscriminantAnalysis
  set_score_request: ['sample_weight']

LinearRegression
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

LinearSVC
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

LinearSVR
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

LocalOutlierFactor

LocallyLinearEmbedding

LogisticRegression
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

LogisticRegressionCV
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

MDS
  set_fit_request: ['init']

MLPClassifier
  set_partial_fit_request: ['classes']
  set_score_request: ['sample_weight']

MLPRegressor
  set_score_request: ['sample_weight']

MaxAbsScaler

MeanShift

MinCovDet
  set_score_request: ['X_test']

MinMaxScaler

MiniBatchDictionaryLearning

MiniBatchKMeans
  set_fit_request: ['sample_weight']
  set_partial_fit_request: ['sample_weight']
  set_predict_request: ['sample_weight']
  set_score_request: ['sample_weight']

MiniBatchNMF
  set_inverse_transform_request: ['W']
  set_partial_fit_request: ['H', 'W']

MiniBatchSparsePCA

MissingIndicator

MultiLabelBinarizer
  set_inverse_transform_request: ['yt']

MultiOutputClassifier is a meta-estimator
  set_fit_request: ['sample_weight']
  set_partial_fit_request: ['classes', 'sample_weight']

MultiOutputRegressor is a meta-estimator
  set_fit_request: ['sample_weight']
  set_partial_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

MultiTaskElasticNet
  set_fit_request: ['check_input', 'sample_weight']
  set_score_request: ['sample_weight']

MultiTaskElasticNetCV
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

MultiTaskLasso
  set_fit_request: ['check_input', 'sample_weight']
  set_score_request: ['sample_weight']

MultiTaskLassoCV
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

MultinomialNB
  set_fit_request: ['sample_weight']
  set_partial_fit_request: ['classes', 'sample_weight']
  set_score_request: ['sample_weight']

NMF
  set_inverse_transform_request: ['W']

NearestCentroid
  set_score_request: ['sample_weight']

NearestNeighbors

NeighborhoodComponentsAnalysis

Normalizer
  set_transform_request: ['copy']

NuSVC
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

NuSVR
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

Nystroem

OAS
  set_score_request: ['X_test']

OPTICS

OneClassSVM
  set_fit_request: ['sample_weight']

OneHotEncoder

OneVsOneClassifier is a meta-estimator
  set_partial_fit_request: ['classes']
  set_score_request: ['sample_weight']

OneVsRestClassifier is a meta-estimator
  set_partial_fit_request: ['classes']
  set_score_request: ['sample_weight']

OrdinalEncoder

OrthogonalMatchingPursuit
  set_score_request: ['sample_weight']

OrthogonalMatchingPursuitCV
  set_score_request: ['sample_weight']

OutputCodeClassifier is a meta-estimator
  set_score_request: ['sample_weight']

PCA

PLSCanonical
  set_predict_request: ['copy']
  set_score_request: ['sample_weight']
  set_transform_request: ['copy']

PLSRegression
  set_predict_request: ['copy']
  set_score_request: ['sample_weight']
  set_transform_request: ['copy']

PLSSVD

PassiveAggressiveClassifier
  set_fit_request: ['coef_init', 'intercept_init']
  set_partial_fit_request: ['classes']
  set_score_request: ['sample_weight']

PassiveAggressiveRegressor
  set_fit_request: ['coef_init', 'intercept_init']
  set_partial_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

PatchExtractor

Perceptron
  set_fit_request: ['coef_init', 'intercept_init', 'sample_weight']
  set_partial_fit_request: ['classes', 'sample_weight']
  set_score_request: ['sample_weight']

Pipeline
  set_inverse_transform_request: ['Xt']
  set_score_request: ['sample_weight']

PoissonRegressor
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

PolynomialCountSketch

PolynomialFeatures

PowerTransformer

QuadraticDiscriminantAnalysis
  set_score_request: ['sample_weight']

QuantileRegressor
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

QuantileTransformer

RANSACRegressor is a meta-estimator
  set_fit_request: ['sample_weight']

RBFSampler

RFE is a meta-estimator

RFECV is a meta-estimator
  set_fit_request: ['groups']

RadiusNeighborsClassifier
  set_score_request: ['sample_weight']

RadiusNeighborsRegressor
  set_score_request: ['sample_weight']

RadiusNeighborsTransformer

RandomForestClassifier is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

RandomForestRegressor is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

RandomTreesEmbedding is a meta-estimator
  set_fit_request: ['sample_weight']

RandomizedSearchCV is a meta-estimator
  set_fit_request: ['groups']
  set_inverse_transform_request: ['Xt']

RegressorChain is a meta-estimator
  set_score_request: ['sample_weight']

Ridge
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

RidgeCV
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

RidgeClassifier
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

RidgeClassifierCV
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

RobustScaler

SGDClassifier
  set_fit_request: ['coef_init', 'intercept_init', 'sample_weight']
  set_partial_fit_request: ['classes', 'sample_weight']
  set_score_request: ['sample_weight']

SGDOneClassSVM
  set_fit_request: ['coef_init', 'offset_init', 'sample_weight']
  set_partial_fit_request: ['sample_weight']

SGDRegressor
  set_fit_request: ['coef_init', 'intercept_init', 'sample_weight']
  set_partial_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

SVC
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

SVR
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

SelectFdr

SelectFpr

SelectFromModel is a meta-estimator

SelectFwe

SelectKBest

SelectPercentile

SelfTrainingClassifier is a meta-estimator

SequentialFeatureSelector is a meta-estimator

ShrunkCovariance
  set_score_request: ['X_test']

SimpleImputer

SkewedChi2Sampler

SparseCoder

SparsePCA

SparseRandomProjection

SpectralBiclustering

SpectralClustering

SpectralCoclustering

SpectralEmbedding

SplineTransformer
  set_fit_request: ['sample_weight']

StackingClassifier is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

StackingRegressor is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

StandardScaler
  set_fit_request: ['sample_weight']
  set_inverse_transform_request: ['copy']
  set_partial_fit_request: ['sample_weight']
  set_transform_request: ['copy']

TSNE

TargetEncoder

TfidfTransformer
  set_transform_request: ['copy']

TfidfVectorizer
  set_fit_request: ['raw_documents']
  set_transform_request: ['raw_documents']

TheilSenRegressor
  set_score_request: ['sample_weight']

TransformedTargetRegressor
  set_score_request: ['sample_weight']

TruncatedSVD

TweedieRegressor
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

VarianceThreshold

VotingClassifier is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

VotingRegressor is a meta-estimator
  set_fit_request: ['sample_weight']
  set_score_request: ['sample_weight']

Observations (not exhaustive):

  • sample_weight is legit for non-meta estimators, but needs checking for meta-estimators: not all of them actually consume sample_weight, even though many accept it as an explicit argument.
  • Not all meta-estimators inherit from MetaEstimatorMixin, so some of them (e.g. Pipeline, FeatureUnion, ColumnTransformer, TransformedTargetRegressor) are not marked as meta-estimators above.
  • copy should probably be considered metadata, but this is not certain.
  • check_input should probably not be considered metadata.
  • return_std should probably be considered metadata.
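The observations above can be written down as a small rule table to triage the listing. This is a hypothetical sketch: the verdict strings and the `triage` helper are illustrative, they encode only the tentative judgments from this issue (not scikit-learn policy), and `is_meta` inherits the MetaEstimatorMixin caveat noted above.

```python
# Tentative judgments from the observations above; "copy" stays undecided
# because the issue itself is unsure about it.
LIKELY_METADATA = {"sample_weight", "return_std"}
PROBABLY_NOT_METADATA = {"check_input"}
UNDECIDED = {"copy"}


def triage(arg, is_meta):
    """Return a tentative verdict for one argument of a set_{method}_request."""
    if arg in PROBABLY_NOT_METADATA:
        return "probably not metadata"
    if arg == "sample_weight" and is_meta:
        # meta-estimators may accept sample_weight without consuming it
        return "needs checking"
    if arg in LIKELY_METADATA:
        return "likely metadata"
    if arg in UNDECIDED:
        return "undecided"
    # anything the observations do not cover has to be reviewed by hand
    return "needs checking"


# A few (estimator, is_meta, method, arg) rows copied from the listing above.
rows = [
    ("DecisionTreeClassifier", False, "set_fit_request", "check_input"),
    ("BaggingClassifier", True, "set_fit_request", "sample_weight"),
    ("GaussianProcessRegressor", False, "set_predict_request", "return_std"),
    ("Binarizer", False, "set_transform_request", "copy"),
    ("MDS", False, "set_fit_request", "init"),
]
for estimator, is_meta, method, arg in rows:
    print(f"{estimator}.{method}({arg}): {triage(arg, is_meta)}")
```

Defaulting to "needs checking" keeps arguments such as `init`, `Xy` or `monitor`, which the observations do not discuss, on the review list instead of silently accepting them.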
Labels
None yet
Projects
Status: Done