MultiOutputRegressor: Support for more fit parameters #15953

huangk10 · 2019-12-23T02:08:01Z

Description

This is a feature request. As of the latest version of scikit-learn, MultiOutputRegressor.fit only supports an optional sample_weight parameter. It would be nice if it also supported an optional fit_param parameter whose contents are passed on to the underlying estimator's fit. For example, this would let us use the early-stopping fitting of LightGBM or XGBoost to counter over-fitting.

I know this is a bit complicated to implement, but I hope you will consider it. Thanks!
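For context, the behaviour being requested can be sketched without scikit-learn: fit one copy of the base estimator per target column and forward any extra keyword arguments unchanged to each `fit` call. `fit_per_output` and `RecordingRegressor` below are hypothetical names used only for illustration, and `copy.deepcopy` stands in for the `sklearn.base.clone` that scikit-learn itself uses.

```python
import copy

import numpy as np


class RecordingRegressor:
    """Hypothetical toy estimator: predicts the training mean and records its fit kwargs."""

    def fit(self, X, y, **fit_kwargs):
        self.fit_kwargs_ = fit_kwargs
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def fit_per_output(estimator, X, Y, **fit_params):
    """Fit one independent copy of `estimator` per column of Y.

    Every extra keyword argument is forwarded unchanged to each copy's fit().
    """
    Y = np.asarray(Y)
    fitted = []
    for j in range(Y.shape[1]):
        est = copy.deepcopy(estimator)  # fresh, unfitted copy for target column j
        est.fit(X, Y[:, j], **fit_params)
        fitted.append(est)
    return fitted


rng = np.random.RandomState(0)
X = rng.random_sample((10, 10))
Y = rng.random_sample((10, 4))
models = fit_per_output(RecordingRegressor(), X, Y, verbose=False)
print(len(models), models[0].fit_kwargs_)
```

Forwarding unchanged works for scalar options such as `verbose`, but not for data-shaped options such as an `eval_set` whose target has one column per output; every copy would receive the full 2D target.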

Steps/Code to Reproduce

This is my expected usage example.

#!/usr/bin/env python3

import numpy as np
from sklearn.multioutput import MultiOutputRegressor
import lightgbm as lgb

train_X = np.random.random((10, 10))
train_y = np.random.random((10, 4))
eval_X = np.random.random((10, 10))
eval_y = np.random.random((10, 4))
single_model = lgb.LGBMRegressor()
model = MultiOutputRegressor(single_model)
fit_param = {'verbose': False, 'early_stopping_rounds': 10, 'eval_set': (eval_X, eval_y)}
# Proposed (not yet supported) API: forward fit_param to each sub-estimator's fit
model.fit(train_X, train_y, fit_param=fit_param)

Expected Results

Not supported yet.

Actual Results

Not supported yet.

Versions

Scikit-Learn: 0.22
platform: Windows-10-10.0.14393-SP0
python: 3.6.9


jnothman · 2019-12-23T02:39:36Z

I'd happily see a pull request for this

armgilles · 2020-01-02T15:05:52Z

I like this feature request.

Will RegressorChain be impacted too?

huangk10 · 2020-01-03T06:12:53Z

> I like this feature request.
>
> Will RegressorChain be impacted too?

While working on PR #15995, I haven't yet considered passing fit_params to the RegressorChain.fit or ClassifierChain.fit functions. But I think the idea of customizing Chain.fit with fit_params makes sense. You could open a new issue and then perhaps work on the new feature.

armgilles · 2020-01-03T09:15:54Z

> While working on PR #15995,

You mean this one, #15959, no?

glemaitre · 2020-01-03T12:47:48Z

@armgilles Support for *Chain estimators should go in a separate PR. I don't know the source code well enough, but I think it would be meaningful. You can open an issue describing the use case that requires fit parameters.

huangk10 · 2020-01-03T14:06:22Z

> You mean this one, #15959, no?

@armgilles Yes, it should be #15959. Sorry for the wrong PR number.

Kur-Ich · 2021-08-02T12:46:45Z

Any update on this? Would be a great feature to have.

glemaitre · 2021-08-02T12:57:53Z

It was merged in #15959
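As a hedged illustration of the merged behaviour (scikit-learn 0.23 or later, with metadata routing left disabled, which is the default): extra keyword arguments given to `MultiOutputRegressor.fit` are passed on to each sub-estimator's `fit`. `SGDRegressor` is used here only because its `fit` accepts an extra `coef_init` argument; the data is random.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
X = rng.random_sample((20, 5))
Y = rng.random_sample((20, 3))

model = MultiOutputRegressor(SGDRegressor(random_state=0))
# coef_init is not a MultiOutputRegressor parameter; it is forwarded
# as-is to SGDRegressor.fit for each of the three target columns.
model.fit(X, Y, coef_init=np.zeros(X.shape[1]))

print(len(model.estimators_), model.predict(X).shape)
```

Note that the arguments are forwarded unchanged: every sub-estimator receives the same `coef_init`, and in the same way would receive the same full 2D target inside an `eval_set`.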

alessio-ca · 2021-10-16T12:44:26Z

Hi! I believe the current implementation still does not support passing an eval_set for early stopping (at least for XGBoost).
The problem is that the feature matrices and targets provided by eval_set are never propagated along the chain: the feature matrices are never augmented, and the targets (which are themselves 2D matrices, since it's a chain) are never split into single-column vectors before being passed to the fit method.

For example:

import numpy as np
from sklearn.multioutput import RegressorChain
from xgboost import XGBRegressor

train_X = np.random.random((10, 10))
train_y = np.random.random((10, 4))
eval_X = np.random.random((10, 10))
eval_y = np.random.random((10, 4))

base_reg = XGBRegressor()
chain = RegressorChain(base_estimator=base_reg, order=[0, 1,2,3])

fit_param = {'verbose': False, 'early_stopping_rounds':10, 'eval_set':[(eval_X, eval_y)]}
chain.fit(train_X, train_y, **fit_param)

Result:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/var/folders/zg/575k6939237_dd479ds5mxhr0000gn/T/ipykernel_30304/157940229.py in <module>
     12 
     13 fit_param = {'verbose': False, 'early_stopping_rounds':10, 'eval_set':[(eval_X, eval_y)]}
---> 14 chain.fit(train_X, train_y, **fit_param)

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/sklearn/multioutput.py in fit(self, X, Y, **fit_params)
    857         self : object
    858         """
--> 859         super().fit(X, Y, **fit_params)
    860         return self
    861 

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/sklearn/multioutput.py in fit(self, X, Y, **fit_params)
    526             y = Y[:, self.order_[chain_idx]]
    527 
--> 528             estimator.fit(X_aug[:, : (X.shape[1] + chain_idx)], y, **fit_params)
    529             if self.cv is not None and chain_idx < len(self.estimators_) - 1:
    530                 col_idx = X.shape[1] + chain_idx

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    431         for k, arg in zip(sig.parameters, args):
    432             kwargs[k] = arg
--> 433         return f(**kwargs)
    434 
    435     return inner_f

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights, callbacks)
    709         evals_result = {}
    710 
--> 711         train_dmatrix, evals = _wrap_evaluation_matrices(
    712             missing=self.missing,
    713             X=X,

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/sklearn.py in _wrap_evaluation_matrices(missing, X, y, group, qid, sample_weight, base_margin, feature_weights, eval_set, sample_weight_eval_set, base_margin_eval_set, eval_group, eval_qid, create_dmatrix, label_transform)
    279                 evals.append(train_dmatrix)
    280             else:
--> 281                 m = create_dmatrix(
    282                     data=valid_X,
    283                     label=label_transform(valid_y),

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/sklearn.py in <lambda>(**kwargs)
    723             eval_group=None,
    724             eval_qid=None,
--> 725             create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
    726         )
    727         params = self.get_xgb_params()

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    431         for k, arg in zip(sig.parameters, args):
    432             kwargs[k] = arg
--> 433         return f(**kwargs)
    434 
    435     return inner_f

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in __init__(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread, group, qid, label_lower_bound, label_upper_bound, feature_weights, enable_categorical)
    547         self.handle = handle
    548 
--> 549         self.set_info(
    550             label=label,
    551             weight=weight,

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    431         for k, arg in zip(sig.parameters, args):
    432             kwargs[k] = arg
--> 433         return f(**kwargs)
    434 
    435     return inner_f

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in set_info(self, label, weight, base_margin, group, qid, label_lower_bound, label_upper_bound, feature_names, feature_types, feature_weights)
    587 
    588         if label is not None:
--> 589             self.set_label(label)
    590         if weight is not None:
    591             self.set_weight(weight)

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/core.py in set_label(self, label)
    718         """
    719         from .data import dispatch_meta_backend
--> 720         dispatch_meta_backend(self, label, 'label', 'float')
    721 
    722     def set_weight(self, weight):

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/data.py in dispatch_meta_backend(matrix, data, name, dtype)
    693     """Dispatch for meta info."""
    694     handle = matrix.handle
--> 695     _validate_meta_shape(data)
    696     if data is None:
    697         return

~/.pyenv/versions/miniconda3-latest/envs/my_env/lib/python3.8/site-packages/xgboost/data.py in _validate_meta_shape(data)
    637 def _validate_meta_shape(data):
    638     if hasattr(data, "shape"):
--> 639         assert len(data.shape) == 1 or (
    640             len(data.shape) == 2 and (data.shape[1] == 0 or data.shape[1] == 1)
    641         )

AssertionError:

As you can see, the eval_set target (eval_y) is passed to XGBoost as a 2D matrix, which XGBoost rejects: a label must be 1D or have a single column. Even if you fix the problem for eval_y, the eval_set feature matrix (eval_X) is not augmented as the chain is traversed, so an error is raised in the next iteration as well.

Versions:
XGBoost: 1.4.2
scikit-learn: 0.24.2
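To make the missing step concrete, here is a minimal NumPy sketch of what per-link evaluation data would have to look like, assuming the chain is fitted without `cv` (so each link is trained on the original features plus the true values of the earlier targets, which is what `RegressorChain` does when `cv=None`). `chain_eval_sets` is a hypothetical helper, not part of scikit-learn or XGBoost.

```python
import numpy as np


def chain_eval_sets(eval_X, eval_Y, order):
    """Build one (features, target) eval pair per link of a regressor chain.

    Link k sees eval_X augmented with the true eval targets of the links
    before it (mirroring how the chain augments X during fit when cv=None)
    and a 1D target, column order[k] of eval_Y.
    """
    eval_X = np.asarray(eval_X)
    eval_Y = np.asarray(eval_Y)
    order = np.asarray(order, dtype=int)
    pairs = []
    for k, col in enumerate(order):
        previous = eval_Y[:, order[:k]]  # shape (n_samples, k); zero columns when k == 0
        pairs.append((np.hstack([eval_X, previous]), eval_Y[:, col]))
    return pairs


rng = np.random.RandomState(0)
eval_X = rng.random_sample((10, 10))
eval_Y = rng.random_sample((10, 4))
pairs = chain_eval_sets(eval_X, eval_Y, order=[0, 1, 2, 3])
for X_aug, y_col in pairs:
    print(X_aug.shape, y_col.shape)
```

Each pair could then be passed as `eval_set=[(X_aug, y_col)]` to the corresponding link. When `cv` is set, the chain trains later links on cross-validated predictions instead of true targets, and this sketch does not cover that case.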

thomasjpfan · 2022-04-14T17:50:29Z

In this case, neither RegressorChain nor MultiOutputRegressor knows which fit parameters need to be sliced per output.

To be fully generic, we would need to accept a process_fit_params callable parameter in RegressorChain or MultiOutputRegressor. During fit, the indices to slice on and the fit_params are passed to the callable, which returns new fit params with the data correctly sliced.
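A minimal sketch of that proposal, assuming a hypothetical `process_fit_params(output_index, fit_params)` callable; neither this parameter nor `fit_per_output_with_hook`, `slice_eval_targets`, or `RecordingRegressor` exists in scikit-learn. The example hook slices every `eval_set` target down to the current output column, so each per-output estimator receives a 1D evaluation target.

```python
import copy

import numpy as np


class RecordingRegressor:
    """Hypothetical toy estimator that just records the kwargs passed to fit()."""

    def fit(self, X, y, **fit_kwargs):
        self.fit_kwargs_ = fit_kwargs
        return self


def slice_eval_targets(output_index, fit_params):
    """Return a copy of fit_params whose eval_set targets keep a single column."""
    params = dict(fit_params)
    if "eval_set" in params:
        params["eval_set"] = [
            (X_val, np.asarray(Y_val)[:, output_index])
            for X_val, Y_val in params["eval_set"]
        ]
    return params


def fit_per_output_with_hook(estimator, X, Y, process_fit_params=None, **fit_params):
    """Fit one copy of estimator per column of Y, optionally rewriting fit_params per column."""
    Y = np.asarray(Y)
    fitted = []
    for j in range(Y.shape[1]):
        if process_fit_params is not None:
            params = process_fit_params(j, fit_params)
        else:
            params = fit_params
        fitted.append(copy.deepcopy(estimator).fit(X, Y[:, j], **params))
    return fitted


rng = np.random.RandomState(0)
X, Y = rng.random_sample((10, 10)), rng.random_sample((10, 4))
eval_X, eval_Y = rng.random_sample((10, 10)), rng.random_sample((10, 4))
models = fit_per_output_with_hook(
    RecordingRegressor(), X, Y,
    process_fit_params=slice_eval_targets,
    eval_set=[(eval_X, eval_Y)],
)
print(models[1].fit_kwargs_["eval_set"][0][1].shape)
```

Keeping the slicing logic in a user-supplied callable means the meta-estimator never has to guess which parameters are per-output; the caller, who knows the downstream estimator's `fit` signature, decides.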

adrinjalali · 2024-08-22T15:48:26Z

Now fixed with metadata routing.

huangk10 mentioned this issue Dec 23, 2019

Add a optional fit_param to enable custom MultiOutput fit process #15959

Merged

armgilles mentioned this issue Jan 7, 2020

RegressorChain: Support for fit parameters #16034

Closed

This was referenced Jul 29, 2020

[tests][python][scikit-learn] New unit tests and maintenance microsoft/LightGBM#3253

Merged

Support fit_params in stacking #18028

Closed

cmarmo added the module:multioutput, Enhancement and Needs Triage labels Mar 29, 2022

thomasjpfan removed the Needs Triage label Apr 14, 2022

thomasjpfan added the Needs Decision - Include Feature label Apr 14, 2022

adrinjalali mentioned this issue Aug 18, 2022

FEAT SLEP006: metadata routing infrastructure #24027

Merged

adrinjalali mentioned this issue Apr 26, 2023

SLEP006 - Metadata Routing task list #22893

Open

28 tasks

adrinjalali closed this as completed Aug 22, 2024
