Version 1.6.X: ClassifierMixIn failing with new __sklearn_tags__ function #30479


Closed
DaMuBo opened this issue Dec 13, 2024 · 8 comments · Fixed by #30516

DaMuBo commented Dec 13, 2024

Describe the bug

Hi,

We use scikit-learn in our projects for several classification training methods at production level. In the dev stage we upgraded to the latest release, and our training failed due to changes in the ClassifierMixin class. We use it in combination with a sklearn Pipeline.

In 1.6.X the following method was introduced:

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.estimator_type = "classifier"
        tags.classifier_tags = ClassifierTags()
        tags.target_tags.required = True
        return tags

It calls the __sklearn_tags__ method of its parent class. But ClassifierMixin doesn't have a parent class that defines it, so the call to super().__sklearn_tags__() fails because the method does not exist.
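The failure can be reproduced without scikit-learn. The following is a minimal sketch in plain Python; `Base`, `ClassifierLikeMixin`, and `tags()` are hypothetical stand-ins for `BaseEstimator`, `ClassifierMixin`, and `__sklearn_tags__`, not the library's actual code. It shows that a mixin calling `super()` only works when a later class in the method resolution order defines the method.

```python
class Base:
    """Stand-in for BaseEstimator: the root provider of tags()."""

    def tags(self):
        return {"estimator_type": None}


class ClassifierLikeMixin:
    """Stand-in for a mixin that refines tags supplied by a parent class."""

    def tags(self):
        # Needs a later class in the MRO that defines tags().
        tags = super().tags()
        tags["estimator_type"] = "classifier"
        return tags


class MixinOnly(ClassifierLikeMixin):
    """Inherits only from the mixin, like MyEstimator in the report."""


class MixinThenBase(ClassifierLikeMixin, Base):
    """Inherits from the mixin first, then the base class."""


def describe(obj):
    """Return the tags, or the error text if the super() chain is broken."""
    try:
        return obj.tags()
    except AttributeError as exc:
        return f"AttributeError: {exc}"


result_mixin_only = describe(MixinOnly())
result_with_base = describe(MixinThenBase())
print(result_mixin_only)
print(result_with_base)
```

With only the mixin in the hierarchy, `super()` resolves to `object`, which has no `tags()` method, so the call raises `AttributeError`. With the base class listed after the mixin, the chain completes.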

Steps/Code to Reproduce

from sklearn.base import ClassifierMixin
from sklearn.pipeline import Pipeline
import numpy as np

class MyEstimator(ClassifierMixin):
    def __init__(self, *, param=1):
        self.param = param
    def fit(self, X, y=None):
        self.is_fitted_ = True
        return self
    def predict(self, X):
        return np.full(shape=X.shape[0], fill_value=self.param)

X = np.array([[1, 2], [2, 3], [3, 4]])
y = np.array([1, 0, 1])


my_pipeline = Pipeline([("estimator", MyEstimator(param=1))])
my_pipeline.fit(X, y)
my_pipeline.predict(X)

Expected Results

A prediction is returned.

Actual Results

Traceback (most recent call last):
  File "c:\Users\xxxx\error_sklearn\redo_error.py", line 22, in <module>
    my_pipeline.predict(X)
  File "C:\Users\xxxx\error_sklearn\.venv\Lib\site-packages\sklearn\pipeline.py", line 780, in predict
    with _raise_or_warn_if_not_fitted(self):
  File "C:\Program Files\Wpy64-31230\python-3.12.3.amd64\Lib\contextlib.py", line 144, in __exit__
    next(self.gen)
  File "C:\Users\xxxx\error_sklearn\.venv\Lib\site-packages\sklearn\pipeline.py", line 60, in _raise_or_warn_if_not_fitted
    check_is_fitted(estimator)
  File "C:\Users\xxxx\git_projects\error_sklearn\.venv\Lib\site-packages\sklearn\utils\validation.py", line 1756, in check_is_fitted
    if not _is_fitted(estimator, attributes, all_or_any):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxx\error_sklearn\.venv\Lib\site-packages\sklearn\utils\validation.py", line 1665, in _is_fitted
    return estimator.__sklearn_is_fitted__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxx\error_sklearn\.venv\Lib\site-packages\sklearn\pipeline.py", line 1310, in __sklearn_is_fitted__
    check_is_fitted(last_step)
  File "C:\Users\xxxx\error_sklearn\.venv\Lib\site-packages\sklearn\utils\validation.py", line 1751, in check_is_fitted
    tags = get_tags(estimator)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxx\error_sklearn\.venv\Lib\site-packages\sklearn\utils\_tags.py", line 396, in get_tags
    tags = estimator.__sklearn_tags__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxx\error_sklearn\.venv\Lib\site-packages\sklearn\base.py", line 540, in __sklearn_tags__
    tags = super().__sklearn_tags__()
           ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'super' object has no attribute '__sklearn_tags__'

Versions

1.6.0 / 1.6.X

DaMuBo added the Bug and Needs Triage labels Dec 13, 2024

gunsodo (Contributor) commented Dec 13, 2024

@DaMuBo According to the official documentation, I think your MyEstimator should inherit from BaseEstimator as well.

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MyEstimator(ClassifierMixin, BaseEstimator):
    def __init__(self, *, param=1):
        self.param = param

    def fit(self, X, y=None):
        self.is_fitted_ = True
        return self

    def predict(self, X):
        return np.full(shape=X.shape[0], fill_value=self.param)

The code snippet above runs fine with your .fit() and .predict() calls. sklearn also provides sklearn.utils.estimator_checks.check_estimator, which you can use to validate your custom estimator as well. I find it very useful for production-grade implementations.

DaMuBo (Author) commented Dec 13, 2024

@gunsodo
Yes, we are not exactly following the documentation here.
We will change that on our side in the future for more robust code.

Thanks for the hint about check_estimator; we'll look into it.

But I also think ClassifierMixin shouldn't call a method that it can't execute.

glemaitre (Member) commented

I would still consider this breakage a regression. We made an effort to raise a proper deprecation warning so that users who defined compatible estimators have one version to change their code.

While we strongly advocate for using BaseEstimator and would require it in 1.7, I think that the mixin should indeed not break but instead raise the same deprecation warning as when one inherits from BaseEstimator.

We should probably have a try/except around the super call and, if it fails, get the default_tags and populate them.
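A rough sketch of that try/except idea, in plain Python rather than scikit-learn's code: `TolerantClassifierMixin`, `default_tags()`, and `tags()` are hypothetical names chosen for illustration, and the dictionary keys are assumptions, not the library's tag structure.

```python
import warnings


def default_tags():
    """Hypothetical defaults used when no parent class supplies tags."""
    return {"estimator_type": None, "target_required": False}


class TolerantClassifierMixin:
    """Stand-in mixin whose tags() survives a missing parent implementation."""

    def tags(self):
        try:
            tags = super().tags()
        except AttributeError:
            # No later class in the MRO defines tags(): warn and use defaults.
            # Caveat: this also swallows an AttributeError raised inside a
            # parent's tags(), so a real fix would need a narrower check.
            warnings.warn(
                "tags() was called on a mixin without a base class; "
                "inherit from the base class as well.",
                DeprecationWarning,
                stacklevel=2,
            )
            tags = default_tags()
        # Populate the classifier-specific values on top of the parent tags.
        tags["estimator_type"] = "classifier"
        tags["target_required"] = True
        return tags


class Standalone(TolerantClassifierMixin):
    """Uses the mixin without any base class."""


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    standalone_tags = Standalone().tags()

print(standalone_tags)
print([w.category.__name__ for w in caught])
```

The estimator keeps working for one more release cycle while the warning tells the user to add the base class.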

glemaitre added the Regression label and removed the Needs Triage label Dec 14, 2024

glemaitre (Member) commented

@adrinjalali do you think that is a reasonable way forward?

adrinjalali (Member) commented

I'd rather make ClassifierMixin inherit from BaseEstimator instead (it's not a mixin per se anymore then, though).

We certainly shouldn't "fix" the issue as proposed in #30480 since it breaks code if inheritance order is wrong.

The code written here in the OP is simply "not supported"; however, it's the kind of thing that was supported before, albeit dysfunctional and wrong.

I'd suggest two things:

  • Improve the inheritance order check test to make sure that if any of the mixins is present, BaseEstimator is also present, and call the test before trying to get tags.
  • The issue here comes from get_tags being called in check_is_fitted, which was itself a fix in this release. We can make get_tags catch that particular exception, raise a deprecation warning, and return default tags, with the warning saying that the code "runs" but is probably wrong.
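The two suggestions above could look roughly like the following plain-Python sketch. Everything here is a hypothetical stand-in (`Base`, `ClassifierLikeMixin`, `check_mixins_have_base`, and this `get_tags` are illustrative, not scikit-learn's implementation), and matching on the exception message is an assumption about how "that particular exception" might be recognized.

```python
import warnings


class Base:
    """Stand-in for BaseEstimator."""

    def tags(self):
        return {"estimator_type": None}


class ClassifierLikeMixin:
    """Stand-in for a mixin that relies on a parent's tags()."""

    def tags(self):
        tags = super().tags()
        tags["estimator_type"] = "classifier"
        return tags


MIXINS = (ClassifierLikeMixin,)


def check_mixins_have_base(cls):
    """List problems with how cls combines the mixins and Base."""
    problems = []
    mro = cls.__mro__
    for mixin in MIXINS:
        if mixin not in mro:
            continue
        if Base not in mro:
            problems.append(f"{cls.__name__} uses {mixin.__name__} without Base")
        elif mro.index(mixin) > mro.index(Base):
            problems.append(f"{cls.__name__} lists {mixin.__name__} after Base")
    return problems


def get_tags(estimator):
    """Return tags, falling back to defaults when the super() chain is broken."""
    try:
        return estimator.tags()
    except AttributeError as exc:
        if "'super' object has no attribute" not in str(exc):
            raise  # unrelated AttributeError: do not hide it
        warnings.warn(
            "The estimator runs, but its class hierarchy is probably wrong: "
            + "; ".join(check_mixins_have_base(type(estimator))),
            DeprecationWarning,
            stacklevel=2,
        )
        return Base().tags()  # default tags


class Good(ClassifierLikeMixin, Base):
    pass


class MixinOnly(ClassifierLikeMixin):
    pass


class WrongOrder(Base, ClassifierLikeMixin):
    pass


problems = {c.__name__: check_mixins_have_base(c) for c in (Good, MixinOnly, WrongOrder)}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback_tags = get_tags(MixinOnly())
print(problems)
print(fallback_tags)
```

Note that `WrongOrder` does not raise at all: `Base.tags()` is found first and the mixin's override is silently skipped, which is why a separate order check is useful alongside the fallback.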

jxu commented Apr 10, 2025

I still have this issue with TransformerMixin in 1.6.1. What is the intended fix?

My code is something like

import pandas as pd
import xgboost as xgb
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

class TransformCoerce(TransformerMixin, BaseEstimator):
    """Coerce input data to all numeric for xgboost."""
    def fit(self, X, y):
        return self

    def transform(self, X):
        cols = X.columns.tolist()
        value_cols = list(filter(lambda x: x.endswith("value"), cols))
        X[value_cols] = X[value_cols].apply(lambda x: pd.to_numeric(x, errors="coerce"))
        return X

pipeline = Pipeline([("transformer", TransformCoerce()),
                     ("xgb", xgb.XGBClassifier(enable_categorical=True))])

# df (not shown) is a DataFrame with a "GT" target column
X, y = df.drop("GT", axis=1), df["GT"]

clf = pipeline.fit(X, y)

# in-sample predictions for simplicity
y_pred_prob = clf.predict_proba(X)
print(y_pred_prob)

All the online tutorials say to inherit from TransformerMixin and BaseEstimator, in particular https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html

Is it an issue to do with xgboost?

adrinjalali (Member) commented

@jxu this is not a reproducer, so I can't really answer. But xgboost doesn't have any issues if you use the latest xgboost and scikit-learn versions.

jxu commented Apr 28, 2025

> @jxu this is not a reproducer, so I can't really answer. But xgboost doesn't have any issues if you use the latest xgboost and scikit-learn versions.

Yes, it is all fixed with the latest versions. According to https://stackoverflow.com/questions/79290968/super-object-has-no-attribute-sklearn-tags, I had to either downgrade to sklearn 1.5.2 or upgrade xgboost to 3.0.
