TransformedTargetRegressor with Pipeline is not fitting model upon calling .fit #29983

Closed
jmaddalena opened this issue Oct 1, 2024 · 3 comments
Labels: Bug, Needs Triage (issue requires triage)

jmaddalena commented Oct 1, 2024

Describe the bug

A common use case is to use Pipeline to transform the feature set and to wrap it with TransformedTargetRegressor to transform the response variable. But when used in combination and calling fit on the TransformedTargetRegressor object, the model internal to the Pipeline is not actually fit.

Steps/Code to Reproduce

from sklearn.preprocessing import StandardScaler
from sklearn_pandas import DataFrameMapper
from xgboost import XGBRegressor
from sklearn.compose import TransformedTargetRegressor
from sklearn import datasets
from sklearn.pipeline import Pipeline
import pandas as pd
import numpy as np

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
y_col = "sepal length (cm)"
x_cols = [x for x in iris.feature_names if x != y_col]

model = XGBRegressor()

mod_pipeline = Pipeline([("scaler", StandardScaler()), ("model", model)])

mod_pipeline = TransformedTargetRegressor(regressor=mod_pipeline, func=np.log1p, inverse_func=np.expm1)

mod_pipeline.fit(df[x_cols], df[y_col])

mod_pipeline.regressor['model'].__sklearn_is_fitted__()

Expected Results

True

Actual Results

False

Versions

System:
    python: 3.11.9 (main, May 17 2024, 12:31:23) [Clang 14.0.3 (clang-1403.0.22.14.1)]
executable: /Users/jmaddalena/projects/vesta/.venv/bin/python
   machine: macOS-14.5-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.4.1.post1
          pip: 24.0
   setuptools: 66.1.1
        numpy: 1.26.4
        scipy: 1.12.0
       Cython: 3.0.9
       pandas: 2.2.2
   matplotlib: 3.8.3
       joblib: 1.3.2
threadpoolctl: 3.3.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 11
         prefix: libomp
       filepath: /Users/jmaddalena/projects/vesta/.venv/lib/python3.11/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 11
         prefix: libopenblas
       filepath: /Users/jmaddalena/projects/vesta/.venv/lib/python3.11/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Nehalem

       user_api: blas
   internal_api: openblas
    num_threads: 11
         prefix: libopenblas
       filepath: /Users/jmaddalena/projects/vesta/.venv/lib/python3.11/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Nehalem

jmaddalena added the Bug and Needs Triage labels on Oct 1, 2024

jmaddalena changed the title from "TransformedTargetRegressor with Pipeline is not fitting upon calling .fit" to "TransformedTargetRegressor with Pipeline is not fitting model upon calling .fit" on Oct 1, 2024

@glemaitre
Member

This is expected. You are accessing the unfitted estimator. The fitted one is available via the regressor_ attribute:

mod_pipeline.regressor_
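
For illustration, here is a minimal sketch of the difference between the two attributes. It assumes LinearRegression in place of XGBRegressor so that only scikit-learn is needed, and the is_fitted helper is a hypothetical convenience wrapper around check_is_fitted, not part of the library:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted

# Same setup as the report: predict sepal length from the other features.
X_all, _ = load_iris(return_X_y=True)
y = X_all[:, 0]
X = X_all[:, 1:]

pipe = Pipeline([("scaler", StandardScaler()), ("model", LinearRegression())])
ttr = TransformedTargetRegressor(
    regressor=pipe, func=np.log1p, inverse_func=np.expm1
)
ttr.fit(X, y)


def is_fitted(estimator):
    """Hypothetical helper: True if check_is_fitted does not raise."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


# `regressor` is the untouched object passed to __init__.
template_fitted = is_fitted(ttr.regressor["model"])
# `regressor_` is the clone that was actually fitted.
clone_fitted = is_fitted(ttr.regressor_["model"])
print(template_fitted, clone_fitted)
```

To inspect the trained model from the original report, the same lookup on the fitted attribute should apply, i.e. mod_pipeline.regressor_['model'].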

@jmaddalena
Author

jmaddalena commented Oct 1, 2024

Thank you, that is very helpful! May I ask why there is both an unfitted estimator (mod_pipeline.regressor) and a fitted one (mod_pipeline.regressor_) when there is not an equivalent in the Pipeline object? I imagine I'm not the only user confused by this.

@glemaitre
Member

The scikit-learn API requires that the parameters passed to __init__ are not modified. Therefore, we need to clone the object and store the clone as a fitted attribute (its name ends with an _).
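
As a rough sketch of that convention (not the actual TransformedTargetRegressor source), a meta-estimator can leave its __init__ argument alone by fitting a clone. The synthetic data and the names template and fitted are assumptions made up for this example:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

# Small synthetic regression problem, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([1.0, 2.0]) + 0.5

template = LinearRegression()  # plays the role of the __init__ parameter

# clone() builds a new unfitted estimator with the same parameters,
# so fitting it leaves `template` exactly as the user passed it in.
fitted = clone(template).fit(X, y)

template_has_coef = hasattr(template, "coef_")
fitted_has_coef = hasattr(fitted, "coef_")
print(template_has_coef, fitted_has_coef)
```

Because the template is never mutated, the same unfitted object can be reused, for example across cross-validation folds, without state leaking from one fit to the next.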

The only estimator in scikit-learn that does not follow this API is indeed Pipeline, but this is a design bug rather than a feature (cf. #8157). However, solving this issue is far from trivial because we might break a lot of code, so we are refraining from solving it for the moment (cf. #8350).
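
A small sketch of the Pipeline behaviour described here, assuming the default memory=None and synthetic data invented for the example. With those defaults the step objects passed in are fitted in place rather than cloned, although this may change if the design is revisited:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic regression problem, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([1.0, 2.0]) + 0.5

scaler = StandardScaler()
model = LinearRegression()
pipe = Pipeline([("scaler", scaler), ("model", model)])
pipe.fit(X, y)

# The pipeline still holds the very same objects that were passed in...
same_object = pipe.named_steps["model"] is model
# ...and those objects now carry fitted attributes.
model_fitted_in_place = hasattr(model, "coef_")
scaler_fitted_in_place = hasattr(scaler, "mean_")
print(same_object, model_fitted_in_place, scaler_fitted_in_place)
```

This is why checking a step of a standalone Pipeline after fit reports a fitted model, while the same check on the regressor parameter of a TransformedTargetRegressor does not.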

We have thought about breaking backward compatibility when bumping to scikit-learn 2.0, or about creating a new pipeline class that follows the API and deprecating the previous one.
