Describe the bug
The error occurs with the following combination:
- RandomForestRegressor
- multiple targets
- targets containing only integer-valued floats (e.g., 1.0, 2.0, 3.0, ...)
- oob_score=True
In this case, the target check in BaseForest.fit raises:
ValueError: The type of target cannot be used to compute OOB estimates. Got multiclass-multioutput while only the following are supported: continuous, continuous-multioutput, binary, multiclass, multilabel-indicator.
This happens because type_of_target classifies integer-valued targets as multiclass-multioutput instead of continuous-multioutput.
This is a bug because (1) I explicitly used a regressor and (2) there are clearly too many distinct values for this to be a classification problem.
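The misclassification can be checked directly with sklearn.utils.multiclass.type_of_target. This minimal sketch hard-codes the two target columns from the reproduction below as a NumPy array and compares three cases: both columns together, a single column, and a copy with one row scaled by the same 1.0001 factor used in the workaround below. It suggests that only the multi-output, integer-valued case falls outside the list of types the OOB check accepts.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# target1 and target2 from the reproduction, as an (n_samples, 2) array.
y = np.array([
    [4.0, 5.0], [6.0, 5.0], [7.0, 6.0], [3.0, 7.0], [5.0, 3.0],
    [4.0, 4.0], [6.0, 10.0], [7.0, 6.0], [8.0, 6.0], [9.0, 7.0],
])

# Every value is an integer stored as a float, so the "continuous" branch is skipped.
multi_type = type_of_target(y)

# A single integer-valued column maps to "multiclass", which the OOB check accepts.
single_type = type_of_target(y[:, 0])

# One non-integer value per column is enough to make the target continuous.
y_perturbed = y.copy()
y_perturbed[0, :] *= 1.0001
perturbed_type = type_of_target(y_perturbed)

print(multi_type)      # multiclass-multioutput
print(single_type)     # multiclass
print(perturbed_type)  # continuous-multioutput
```

This is consistent with the report: a single integer-valued target would pass the check, and only the multi-output combination is rejected.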
I could work around the problem by slightly perturbing one value in each target column, e.g.:
for target in ["target1", "target2"]:
    df.loc[df.index[0], target] *= 1.0001
but I would like to find a more definitive fix in scikit-learn itself.
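The same workaround can also be applied without modifying the DataFrame, by perturbing a float copy of the targets instead. This is a minimal sketch reusing the reproduction data; the 1.0001 factor is arbitrary and does slightly change one training label per target, so it is a stopgap rather than a fix.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.DataFrame({
    "feat1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "feat2": [2, 6, 8, 1, 3, 5, 7, 9, 4, 10],
    "target1": [4.0, 6.0, 7.0, 3.0, 5.0, 4.0, 6.0, 7.0, 8.0, 9.0],
    "target2": [5.0, 5.0, 6.0, 7.0, 3.0, 4.0, 10.0, 6.0, 6.0, 7.0],
})
features = ["feat1", "feat2"]
targets = ["target1", "target2"]

# Work on a float copy of the targets so df itself is left unchanged.
y = df[targets].to_numpy(dtype=float, copy=True)
# Nudge the first row so type_of_target sees non-integer data in every column.
y[0, :] *= 1.0001

rf = RandomForestRegressor(oob_score=True, random_state=42)
rf.fit(df[features], y)

# One OOB prediction per sample and per target.
print(rf.oob_prediction_.shape)  # (10, 2)
```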
Steps/Code to Reproduce
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
df = pd.DataFrame({
    "feat1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "feat2": [2, 6, 8, 1, 3, 5, 7, 9, 4, 10],
    "target1": [4.0, 6.0, 7.0, 3.0, 5.0, 4.0, 6.0, 7.0, 8.0, 9.0],
    "target2": [5.0, 5.0, 6.0, 7.0, 3.0, 4.0, 10.0, 6.0, 6.0, 7.0],
})
rf = RandomForestRegressor(oob_score=True, random_state=42)
rf.fit(df[["feat1", "feat2"]], df[["target1", "target2"]])
Expected Results
RandomForestRegressor(oob_score=True, random_state=42)
Actual Results
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 12
4 df = pd.DataFrame({
5 "feat1": [1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10],
6 "feat2": [2, 6, 8, 1, 3, 5, 7, 9, 4, 10],
7 "target1": [4.0, 6.0, 7.0, 3.0, 5.0, 4.0, 6.0 ,7.0 ,8.0 ,9.0],
8 "target2": [5.0, 5.0, 6.0, 7.0, 3.0, 4.0, 10.0,6.0,6.0,7.0],
9 })
11 rf = RandomForestRegressor(oob_score=True, random_state=42)
---> 12 rf.fit(df[["feat1", "feat2"]], df[["target1", "target2"]])
File c:\ProgramData\miniconda3\envs\py310\lib\site-packages\sklearn\ensemble\_forest.py:503, in BaseForest.fit(self, X, y, sample_weight)
497 y_type = type_of_target(y)
498 if y_type in ("multiclass-multioutput", "unknown"):
499 # FIXME: we could consider to support multiclass-multioutput if
500 # we introduce or reuse a constructor parameter (e.g.
501 # oob_score) allowing our user to pass a callable defining the
502 # scoring strategy on OOB sample.
--> 503 raise ValueError(
504 "The type of target cannot be used to compute OOB "
505 f"estimates. Got {y_type} while only the following are "
506 "supported: continuous, continuous-multioutput, binary, "
507 "multiclass, multilabel-indicator."
508 )
509 self._set_oob_score_and_attributes(X, y)
511 # Decapsulate classes_ attributes
ValueError: The type of target cannot be used to compute OOB estimates. Got multiclass-multioutput while only the following are supported: continuous, continuous-multioutput, binary, multiclass, multilabel-indicator.
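One possible direction for a more definitive fix, sketched under assumptions: the check shown in the traceback rejects "multiclass-multioutput" for every forest, although a regressor does not treat integer-valued targets as class labels. The helper check_oob_target below is hypothetical (not scikit-learn API); it mirrors that check but rejects "multiclass-multioutput" only when the estimator is a classifier, using sklearn.base.is_classifier.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.utils.multiclass import type_of_target


def check_oob_target(estimator, y):
    """Hypothetical OOB target check; not part of scikit-learn's API.

    "unknown" targets are rejected for every forest, but
    "multiclass-multioutput" is rejected only for classifiers, because a
    regressor does not interpret integer-valued targets as class labels.
    """
    y_type = type_of_target(y)
    if y_type == "unknown" or (
        is_classifier(estimator) and y_type == "multiclass-multioutput"
    ):
        raise ValueError(
            "The type of target cannot be used to compute OOB estimates. "
            f"Got {y_type}."
        )
    return y_type


# Integer-valued float targets from the reproduction: two outputs.
y = np.array([
    [4.0, 5.0], [6.0, 5.0], [7.0, 6.0], [3.0, 7.0], [5.0, 3.0],
    [4.0, 4.0], [6.0, 10.0], [7.0, 6.0], [8.0, 6.0], [9.0, 7.0],
])

# The regressor passes even though y is labelled multiclass-multioutput.
regressor_type = check_oob_target(RandomForestRegressor(), y)

# The classifier is still rejected, preserving the current behavior for it.
try:
    check_oob_target(RandomForestClassifier(), y)
    classifier_rejected = False
except ValueError:
    classifier_rejected = True

print(regressor_type)       # multiclass-multioutput
print(classifier_rejected)  # True
```

Keying the decision on the estimator type rather than on type_of_target alone leaves the classifier path exactly as it is today, while regressors no longer depend on how their numeric targets happen to be labelled.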
Versions
scikit-learn 1.3.2