RandomForestRegressor having problem with integer-values targets: The type of target cannot be used to compute OOB estimates #27814

@danieleongari

Describe the bug

When all of the following are combined:

  • RandomForestRegressor
  • Multiple targets
  • integer values only (e.g., 1.0, 2.0, 3.0, ...) in the targets
  • oob_score=True

The check in BaseForest will raise the error:

ValueError: The type of target cannot be used to compute OOB estimates. Got multiclass-multioutput while only the following are supported: continuous, continuous-multioutput, binary, multiclass, multilabel-indicator.

because type_of_target labels the target as multiclass-multioutput instead of continuous-multioutput when every target value is a whole number, even if stored as a float.
This is a bug because (1) I explicitly requested a regressor and (2) there are clearly too many distinct values for this to be a classification problem.
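The labelling can be checked directly with type_of_target, outside of any forest. The snippet below is a minimal sketch: the arrays are made-up sample data (the variable names are mine, not from scikit-learn), and the second array applies the same 1.0001 scaling to one row to show how a single non-integer value changes the result.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Two float-typed targets whose values all happen to be whole numbers.
y_integer_valued = np.array(
    [[4.0, 5.0], [6.0, 5.0], [7.0, 6.0], [3.0, 7.0], [5.0, 3.0]]
)

# Same data, but the first row is scaled so it is no longer whole-numbered.
y_perturbed = y_integer_valued.copy()
y_perturbed[0, :] *= 1.0001

label_integer_valued = type_of_target(y_integer_valued)
label_perturbed = type_of_target(y_perturbed)

print(label_integer_valued)  # the label reported in the error message
print(label_perturbed)
```

The first label is the one BaseForest rejects; the second is one of the supported types listed in the error message.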

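I could work around the problem by slightly perturbing one value in each target, e.g.:

# targets is the list of target column names, e.g. ["target1", "target2"]
for target in targets:
    # use .loc rather than chained indexing so the assignment reaches df itself
    df.loc[df.index[0], target] *= 1.0001

but I would like to work out a more definitive fix in the program.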
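Until there is a proper fix, one alternative that leaves the data untouched may be to fit one forest per target. This is only a sketch under assumptions: it relies on a single integer-valued target being labelled multiclass, which is in the supported list quoted in the error message, and it is not equivalent to a multi-output forest, since each target gets its own trees and its own OOB score. The array and dictionary names are mine.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Same data as the reproduction, as plain NumPy arrays.
X = np.array(
    [[1, 2], [2, 6], [3, 8], [4, 1], [5, 3],
     [6, 5], [7, 7], [8, 9], [9, 4], [10, 10]]
)
targets = {
    "target1": np.array([4.0, 6.0, 7.0, 3.0, 5.0, 4.0, 6.0, 7.0, 8.0, 9.0]),
    "target2": np.array([5.0, 5.0, 6.0, 7.0, 3.0, 4.0, 10.0, 6.0, 6.0, 7.0]),
}

# Fit an independent single-output regressor per target and collect
# the OOB R^2 score of each one.
oob_scores = {}
for name, y in targets.items():
    rf = RandomForestRegressor(oob_score=True, random_state=42)
    rf.fit(X, y)
    oob_scores[name] = rf.oob_score_

print(oob_scores)
```

With so few samples the scores themselves are not meaningful; the point is only that fitting completes without the ValueError.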

Steps/Code to Reproduce

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.DataFrame({
    "feat1": [1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10],
    "feat2": [2, 6, 8, 1, 3, 5, 7, 9, 4, 10],
    "target1": [4.0, 6.0, 7.0, 3.0, 5.0, 4.0, 6.0 ,7.0 ,8.0 ,9.0],
    "target2": [5.0, 5.0, 6.0, 7.0, 3.0, 4.0, 10.0,6.0,6.0,7.0],
})

rf = RandomForestRegressor(oob_score=True, random_state=42)
rf.fit(df[["feat1", "feat2"]], df[["target1", "target2"]])

Expected Results

RandomForestRegressor(oob_score=True, random_state=42)

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 12
      4 df = pd.DataFrame({
      5     "feat1": [1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10],
      6     "feat2": [2, 6, 8, 1, 3, 5, 7, 9, 4, 10],
      7     "target1": [4.0, 6.0, 7.0, 3.0, 5.0, 4.0, 6.0 ,7.0 ,8.0 ,9.0],
      8     "target2": [5.0, 5.0, 6.0, 7.0, 3.0, 4.0, 10.0,6.0,6.0,7.0],
      9 })
     11 rf = RandomForestRegressor(oob_score=True, random_state=42)
---> 12 rf.fit(df[["feat1", "feat2"]], df[["target1", "target2"]])

File c:\ProgramData\miniconda3\envs\py310\lib\site-packages\sklearn\ensemble\_forest.py:503, in BaseForest.fit(self, X, y, sample_weight)
    497     y_type = type_of_target(y)
    498     if y_type in ("multiclass-multioutput", "unknown"):
    499         # FIXME: we could consider to support multiclass-multioutput if
    500         # we introduce or reuse a constructor parameter (e.g.
    501         # oob_score) allowing our user to pass a callable defining the
    502         # scoring strategy on OOB sample.
--> 503         raise ValueError(
    504             "The type of target cannot be used to compute OOB "
    505             f"estimates. Got {y_type} while only the following are "
    506             "supported: continuous, continuous-multioutput, binary, "
    507             "multiclass, multilabel-indicator."
    508         )
    509     self._set_oob_score_and_attributes(X, y)
    511 # Decapsulate classes_ attributes

ValueError: The type of target cannot be used to compute OOB estimates. Got multiclass-multioutput while only the following are supported: continuous, continuous-multioutput, binary, multiclass, multilabel-indicator.

Versions

1.3.2
