Description
Describe the issue linked to the documentation
Hey, I don't know if I should call this a bug, but for me at least it was unexpected behavior. I tried to subclass from Pipeline
to implement a customization, so having a simplified configuration, which is used to build a sequence of transformations.
It generates an AttributeError
, due to not having an instance attribute with the same name as a positional argument (same is true for a kwarg) of the subclasses's init. Find a minimal example below.
Is this expected behavior? It does not harm to set the instance attributes with the same name, but it is surprising it is demanded and is very implicit. Also, it does not pop up, when you instantiate the object, but only when you try to call a method on it.
In case it is absolutely necessary, it may need some documentation.
In addition, I tried to globally skip parameter validation and it did not help in this situation, which might be a real bug?
Thanks for your help, and your good work:)
A simple example:
import sklearn
sklearn.set_config(
skip_parameter_validation=True, # disable validation
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd
class TakeColumn(BaseEstimator, TransformerMixin):
def __init__(self, column: str):
self.column = column
def __str__(self):
return self.__class__.__name__ + f"[{self.column}]"
def fit(self, X, y=None):
return self
def transform(self, X: pd.DataFrame) -> pd.DataFrame:
return X[[self.column]]
class CategoricalFeature(Pipeline):
def __init__(self, column: str, encode=True):
take_column = TakeColumn(column)
steps = [(str(take_column), take_column)]
if encode:
encoder = OneHotEncoder()
steps.append((str(encoder), encoder))
# setting instance attributes having the same name, removes the exception
#self.column = column
#self.encode = encode
super().__init__(steps)
df = pd.DataFrame([["a"], ["b"], ["c"]], columns=["column"])
column_feature = CategoricalFeature("column")
some_other_feature = CategoricalFeature("other_column", encode=False)
# this fails, if instance attributes are not set with the same name as the
# corresponding parameter of __init__
result = column_feature.fit_transform(df)
Output:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/kristof/Library/Application Support/JetBrains/IntelliJIdea2024.3/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kristof/Library/Application Support/JetBrains/IntelliJIdea2024.3/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/kristof/Projects/pipeline-issue/default_console.py", line 45, in <module>
result = column_feature.fit_transform(df)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/base.py", line 1382, in wrapper
estimator._validate_params()
File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/base.py", line 438, in _validate_params
self.get_params(deep=False),
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/pipeline.py", line 299, in get_params
return self._get_params("steps", deep=deep)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/utils/metaestimators.py", line 30, in _get_params
out = super().get_params(deep=deep)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/base.py", line 248, in get_params
value = getattr(self, key)
^^^^^^^^^^^^^^^^^^
AttributeError: 'CategoricalFeature' object has no attribute 'column'
System:
python: 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 15:57:01) [Clang 17.0.6 ]
executable: /Users/kristof/Projects/pipeline-issue/.venv/bin/python
machine: macOS-15.3-arm64-arm-64bit
Python dependencies:
sklearn: 1.6.1
pip: None
setuptools: None
numpy: 2.2.2
scipy: 1.15.1
Cython: None
pandas: 2.2.3
matplotlib: None
joblib: 1.4.2
threadpoolctl: 3.5.0
Built with OpenMP: True
threadpoolctl info:
user_api: openmp
internal_api: openmp
num_threads: 12
prefix: libomp
filepath: /Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/.dylibs/libomp.dylib
version: None
Suggest a potential alternative/fix
No response