Description
Describe the bug
I experience the following error when using TransformedTargetRegressor with my skorch model:
ValueError: The target data shouldn't be 1-dimensional but instead have 2 dimensions, with the second dimension having the same size as the number of regression targets (usually 1). Please reshape your target data to be 2-dimensional (e.g. y = y.reshape(-1, 1).
After checking the source code, this led me to the following unexpected behavior, which makes little sense:
If TransformedTargetRegressor is fitted with a 2D y, y is still transformed to a 1D array before it is passed to the regressor.
y should keep the same shape on input and output of a TransformedTargetRegressor, or there should be an init argument to disable the change of shape.
(Yes, internally it gets cast to 2D, but I'm talking about the inputs and outputs.)
https://github.com/scikit-learn/scikit-learn/blob/364c77e04/sklearn/compose/_target.py#L20
TransformedTargetRegressor-->fit
    if y.ndim == 1:
        y_2d = y.reshape(-1, 1)
    else:
        y_2d = y
    self._fit_transformer(y_2d)
    [...]
    if y_trans.ndim == 2 and y_trans.shape[1] == 1:
        y_trans = y_trans.squeeze(axis=1)
But in the end y_trans is squeezed back to 1D, which causes issues for models that expect a 2D y.
y was 2D in the beginning for a reason.
The following change would solve this:
    # only squeeze back to 1D if the original y was 1D
    if y_trans.ndim == 2 and y_trans.shape[1] == 1 and y.ndim == 1:
        y_trans = y_trans.squeeze(axis=1)
This could only create an issue where the y input was for some reason 2D but should be 1D for the regressor.
In that case an attribute (here called output_dim) would be nice:
    if y_trans.ndim == 2 and y_trans.shape[1] == 1 and self.output_dim == 1:
        y_trans = y_trans.squeeze(axis=1)
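To make the first proposal concrete, here is a minimal standalone sketch of the squeeze condition. The function name maybe_squeeze and its original_ndim parameter are made up for illustration and are not part of scikit-learn; the sketch only shows the intended shape behavior, not a patch.

```python
import numpy as np


def maybe_squeeze(y_trans, original_ndim):
    """Hypothetical helper: squeeze a single-column 2D target
    only if the y originally passed by the caller was 1D."""
    if y_trans.ndim == 2 and y_trans.shape[1] == 1 and original_ndim == 1:
        return y_trans.squeeze(axis=1)
    return y_trans


y_1d = np.random.rand(10)        # caller passed a 1D target
y_2d = y_1d.reshape(-1, 1)       # caller passed a 2D target on purpose

# Internally both are handled as 2D; only the 1D case is squeezed back.
shape_from_1d = maybe_squeeze(y_1d.reshape(-1, 1), y_1d.ndim).shape
shape_from_2d = maybe_squeeze(y_2d, y_2d.ndim).shape
print(shape_from_1d)  # (10,)
print(shape_from_2d)  # (10, 1)
```

With this condition a 2D y stays 2D and a 1D y stays 1D, which is the input/output behavior requested in this issue.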
Also, in TransformedTargetRegressor-->predict the result of the estimator's prediction is only squeezed if the original y passed to fit was 1D.
So the result looks as expected, but only if the regressor accepts a 1D y.
If the estimator expects a 2D y, the code fails.
Steps/Code to Reproduce
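import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import MinMaxScaler

regressor = TransformedTargetRegressor(
    transformer=MinMaxScaler()
)
X, y = np.random.rand(10, 10), np.expand_dims(np.random.rand(10), 1)  # y has shape (10, 1)
regressor.fit(X, y)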
Expected Results
The shape of y stays the same as the input, OR there is an attribute which allows choosing between (1D or original) or (1D or 2D).
input | internal | output
2D    -> 2D       -> 2D
1D    -> 2D       -> 1D
Actual Results
The regressor receives just a 1D array even though y was specifically set to 2D.
(I don't know how to extract these results without a debugger.)
It works for this example because the default regressor is used, but other models might need the second dimension of y, because it was specifically reshaped with (-1, 1).
input | internal | output
2D    -> 2D       -> 1D   THIS creates issues for the regressor passed to the TransformedTargetRegressor if it expects a 2D array because a 2D y was given
1D    -> 2D       -> 1D
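One way to observe the shape without a debugger is to pass a small regressor that records the shape of the y it receives. This is only a sketch: ShapeRecorder is a hypothetical helper written for this report, not a scikit-learn or skorch class, and its predict just returns the training mean.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import MinMaxScaler


class ShapeRecorder(RegressorMixin, BaseEstimator):
    """Hypothetical regressor that remembers the shape of y seen in fit."""

    def fit(self, X, y):
        self.y_shape_ = np.shape(y)   # shape actually handed to the regressor
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Constant 1D prediction; enough to exercise the predict path.
        return np.full(len(X), self.mean_)


X = np.random.rand(10, 10)
y = np.random.rand(10, 1)             # explicitly 2D target

model = TransformedTargetRegressor(
    regressor=ShapeRecorder(), transformer=MinMaxScaler()
)
model.fit(X, y)

print(y.shape)                        # (10, 1)
print(model.regressor_.y_shape_)      # (10,): the inner regressor saw 1D y
```

The recorded shape shows that the inner regressor is fitted on a 1D y although a 2D y was passed to TransformedTargetRegressor.fit.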
Versions
System:
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
executable: /anaconda/envs/azureml_py310_sdkv2/bin/python
machine: Linux-5.15.0-1017-azure-x86_64-with-glibc2.31
Python dependencies:
sklearn: 1.1.3
pip: 22.1.2
setuptools: 61.2.0
numpy: 1.23.2
scipy: 1.9.0
Cython: None
pandas: 1.4.3
matplotlib: 3.6.2
joblib: 1.2.0
threadpoolctl: 3.1.0