Minimize validation of X in ensembles with a base estimator #7768

jnothman · 2016-10-27T12:02:19Z

Currently AdaBoost* requires X to be an array or sparse matrix of numerics. However, since the data is not processed directly by AdaBoost* but by its base estimator (on which fit, predict_proba and predict may be called), we should not need to constrain the data that much, allowing for X to be a list of text blobs or similar.
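To make the request concrete, here is a minimal pure-Python sketch of an ensemble that never inspects X and simply forwards it to its base estimators. It is a toy majority-vote ensemble, not AdaBoost, and `KeywordClassifier` and `MajorityVoteEnsemble` are hypothetical names invented for illustration; neither is part of scikit-learn.

```python
from collections import Counter


class KeywordClassifier:
    """Hypothetical base estimator that works directly on raw text blobs."""

    def __init__(self, keyword):
        self.keyword = keyword

    def fit(self, X, y):
        # The base estimator is the one place where X is validated.
        if not all(isinstance(doc, str) for doc in X):
            raise ValueError("KeywordClassifier expects an iterable of strings")
        hits = [label for doc, label in zip(X, y) if self.keyword in doc]
        misses = [label for doc, label in zip(X, y) if self.keyword not in doc]
        # Fall back to the overall majority label if one side is empty.
        self.hit_label_ = Counter(hits or y).most_common(1)[0][0]
        self.miss_label_ = Counter(misses or y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [
            self.hit_label_ if self.keyword in doc else self.miss_label_
            for doc in X
        ]


class MajorityVoteEnsemble:
    """Toy ensemble (assumption: not AdaBoost) that only forwards X."""

    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self, X, y):
        # The only check made here needs no knowledge of what X contains.
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of samples")
        for est in self.estimators:
            est.fit(X, y)
        return self

    def predict(self, X):
        votes = [est.predict(X) for est in self.estimators]
        # Majority vote per sample across all base estimators.
        return [Counter(column).most_common(1)[0][0] for column in zip(*votes)]


docs = ["cheap pills now", "meeting at noon", "cheap watches", "lunch at noon?"]
labels = ["spam", "ham", "spam", "ham"]

ensemble = MajorityVoteEnsemble(
    [KeywordClassifier("cheap"), KeywordClassifier("noon"), KeywordClassifier("pills")]
).fit(docs, labels)
predictions = ensemble.predict(["cheap pills", "see you at noon"])
print(predictions)
```

Because the ensemble makes no assumption about the type of X, the same class would work unchanged with base estimators that take 3D arrays or any other input.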

The same may apply to other ensemble methods.

Derived from #7767.

chkoar · 2016-10-31T22:40:40Z

That could be applied to any meta-estimator that uses a base estimator, right?

jnothman · 2016-11-01T02:01:43Z

Yes, it could be. I didn't have time when I wrote this issue to check the applicability to other ensembles.

jnothman · 2016-11-01T02:02:26Z

Updated title and description

chkoar · 2016-11-01T10:59:49Z

@jnothman I think that we have two options.

1. Validate the input early, as is done now, and introduce a new parameter check_input in fit, predict, etc., with default value True to preserve the current behavior. Alternatively, check_input could go in the constructor.
2. Relax the validation in the ensemble and let the base estimator handle the validation.

What do you think? I'll send a PR.
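A rough sketch of how the first option might look, with check_input placed in the constructor. Everything here is a simplified assumption for illustration: `ToyEnsemble`, `ConstantBase` and `validate_2d_numeric` are hypothetical names, and the validation is a stand-in for what scikit-learn actually does.

```python
from collections import Counter


def validate_2d_numeric(X):
    """Hypothetical stand-in for the ensemble's strict early validation."""
    for row in X:
        is_row = isinstance(row, (list, tuple))
        if not is_row or not all(isinstance(v, (int, float)) for v in row):
            raise ValueError("X must be a 2D structure of numbers")
    return X


class ConstantBase:
    """Hypothetical base estimator: accepts any X, predicts the majority label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ToyEnsemble:
    """Toy meta-estimator with an opt-out validation flag."""

    def __init__(self, base, check_input=True):
        self.base = base
        self.check_input = check_input  # True preserves the strict behavior

    def fit(self, X, y):
        if self.check_input:
            X = validate_2d_numeric(X)
        self.base.fit(X, y)
        return self

    def predict(self, X):
        if self.check_input:
            X = validate_2d_numeric(X)
        return self.base.predict(X)


docs = ["first blob", "second blob", "third blob"]
y = ["a", "a", "b"]

# Default behavior: text input is rejected before the base estimator sees it.
try:
    ToyEnsemble(ConstantBase()).fit(docs, y)
    strict_failed = False
except ValueError:
    strict_failed = True

# Opting out: X reaches the base estimator untouched.
relaxed = ToyEnsemble(ConstantBase(), check_input=False).fit(docs, y)
relaxed_predictions = relaxed.predict(docs)
print(strict_failed, relaxed_predictions)
```

The second option amounts to dropping the flag and the `validate_2d_numeric` calls altogether, so that the base estimator alone decides what input is acceptable.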

jnothman · 2016-11-01T11:17:53Z

IMO assuming the base estimator manages validation is fine.

Chaitya62 · 2016-12-01T06:35:50Z

Is this still open? Can I work on it?

chkoar · 2016-12-01T08:17:03Z

@Chaitya62 I didn't have the time to work on this. So, go ahead.

Chaitya62 · 2016-12-01T17:55:38Z

@chkoar On it!

Chaitya62 · 2016-12-04T19:46:55Z

After reading the code for two days and trying to understand what actually needs to change, I figured out that a call to check_X_y is being made, which forces X to be 2D. For the patch, should I do what @chkoar suggested?
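The behavior described here can be reproduced in isolation. This is a small sketch, assuming scikit-learn and NumPy are installed; it shows `sklearn.utils.check_X_y` rejecting a 3D array by default and accepting it when `allow_nd=True` is passed. It only illustrates the validation helper and is not necessarily the change that was eventually merged.

```python
import numpy as np
from sklearn.utils import check_X_y

# Illustrative data: 4 samples, each a 100x1 array (3 dimensions in total).
X = np.zeros((4, 100, 1))
y = [0, 1, 0, 1]

# Default validation rejects arrays with more than 2 dimensions.
try:
    check_X_y(X, y)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

# With allow_nd=True the same array passes through with its shape intact.
X_checked, y_checked = check_X_y(X, y, allow_nd=True)

print(strict_error)
print(X_checked.shape)
```

Note that `allow_nd=True` still converts X to a numeric array, so on its own it would not admit inputs such as a list of text blobs.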

jnothman · 2016-12-04T22:30:16Z

As in let the base estimator handle validation? Yes, IMO

Chaitya62 · 2016-12-04T22:49:47Z

Cool, I'll submit a PR soon.

devanshdalal · 2017-01-02T18:08:53Z

@Chaitya62, let me know if you are not working on this anymore. I want to work on this.

Chaitya62 · 2017-01-04T20:55:31Z

@devanshdalal I am working on it. I have a minor issue which I hope I'll solve soon.

dalmia · 2017-02-01T12:43:12Z

@Chaitya62 Are you still working on this?

Chaitya62 · 2017-02-01T14:11:07Z

@dalmia Go ahead and work on it. I am not able to test my code properly.

dalmia · 2017-02-01T14:35:53Z

@Chaitya62 Thanks!

gokart23 · 2018-09-13T22:18:27Z

I'd like to work on this, if that's ok

gokart23 · 2018-09-13T22:22:54Z

As a first step, I tried looking at the behavior of meta-estimators when passed a 3D tensor. It looks like almost all meta-estimators that accept a base estimator fail:

>>> pytest -sx -k 'test_meta_estimators' sklearn/tests/test_common.py
<....>
AdaBoostClassifier raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
AdaBoostRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
BaggingClassifier raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
BaggingRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
ExtraTreesClassifier raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
ExtraTreesRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data

Skipping GradientBoostingClassifier - 'base_estimator' key not supported
Skipping GradientBoostingRegressor - 'base_estimator' key not supported
IsolationForest raised error 'default contamination parameter 0.1 will change in version 0.22 to "auto". This will change the predict method behavior.' when parsing data   

RANSACRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data                                     
RandomForestClassifier raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
RandomForestRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data

@jnothman @amueller considering this, should this be a WONTFIX, or should all the meta-estimators be fixed?

jnothman · 2018-09-15T11:03:49Z

Thanks for looking into this. Not all ensembles are meta-estimators. Here we intend things that should be generic enough to support non-scikit-learn use-cases: not just dealing with rectangular feature matrices.

chkoar · 2019-02-14T16:49:44Z

@jnothman The AdaBoost tests check the sparsity of X. This means that we should skip these tests in order to relax the validation, right?

jnothman · 2019-02-14T23:25:44Z

Sounds like it, as long as it doesn't do anything with X other than fit the base estimator.

jilt-sebastian · 2019-03-21T10:22:03Z

@jnothman Could you please tell me the solution you landed on? I am working with CNNs as my weak classifiers for which the input features need to be 3D. I still face the same issue as referenced in #7767.

jnothman · 2019-03-21T10:59:49Z

Please give code that shows the error you encounter

jilt-sebastian · 2019-03-21T11:20:58Z

I get the error with the following code snippet:
boosted_cnn = AdaBoostClassifier(base_estimator=model(), n_estimators=20)
boosted_cnn.fit(X, Y)

X has shape (a, 100, 1) and Y has shape (a, b), where b is the number of classes and a is the number of examples.
fit gives me the following error:
*** ValueError: Found array with dim 3. Estimator expected <= 2.

chkoar · 2019-03-21T11:29:06Z

@jilt-sebastian This functionality hasn't been released yet. You probably need to install scikit-learn from master.

jnothman · 2019-03-21T12:36:56Z

Or from the nightly build. pip install --pre -f http://nightly.scikit-learn.org scikit-learn

jilt-sebastian · 2019-03-21T12:49:40Z

Cool. Thank you

jnothman added the Enhancement, Moderate, and Need Contributor labels Oct 27, 2016

jnothman mentioned this issue Oct 27, 2016

AdaboostClassifier for 3D or greater Input Data #7767

Closed

jnothman changed the title ~~Minimize validation of X in AdaBoost~~ Minimize validation of X in ensembles with a base estimator Nov 1, 2016

dalmia mentioned this issue Feb 6, 2017

[WIP] minimize validation of X in adaboost #8304

Closed

amueller removed the Need Contributor label Mar 3, 2017

amueller added the help wanted label Aug 21, 2018

gokart23 mentioned this issue Sep 13, 2018

[WIP] Minimize validation of X in ensembles with a base estimator #12072

Closed

chkoar mentioned this issue Feb 15, 2019

[MRG+2] Minimize the validation of X in adaboost #13174

Merged

jnothman closed this as completed in #13174 Feb 28, 2019
