Minimize validation of X in ensembles with a base estimator #7768

Closed
jnothman opened this issue Oct 27, 2016 · 27 comments · Fixed by #13174
Labels
Enhancement help wanted Moderate Anything that requires some knowledge of conventions and best practices

Comments

@jnothman
Member

jnothman commented Oct 27, 2016

Currently AdaBoost* requires X to be an array or sparse matrix of numerics. However, since the data is not processed directly by AdaBoost* but by its base estimator (on which fit, predict_proba and predict may be called), we should not need to constrain the data that much, allowing for X to be a list of text blobs or similar.

Similar may apply to other ensemble methods.

Derived from #7767.
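
The idea can be sketched without scikit-learn at all. The toy classes below are hypothetical stand-ins (neither `KeywordClassifier` nor `MajorityVoteEnsemble` exists in the library): the ensemble only checks that X and y have the same length, then passes the raw samples to copies of its base estimator, which is the only component that needs to understand them.

```python
import copy
import random
from collections import Counter


class KeywordClassifier:
    """Hypothetical base estimator that works directly on text blobs."""

    def fit(self, X, y):
        # Count how often each word appears under each label.
        self.word_counts_ = {}
        for text, label in zip(X, y):
            counts = self.word_counts_.setdefault(label, Counter())
            counts.update(text.lower().split())
        # Fallback label for texts that share no word with the training data.
        self.default_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        predictions = []
        for text in X:
            words = text.lower().split()
            scores = {
                label: sum(counts[w] for w in words)
                for label, counts in self.word_counts_.items()
            }
            best = max(scores, key=scores.get)
            predictions.append(best if scores[best] > 0 else self.default_)
        return predictions


class MajorityVoteEnsemble:
    """Hypothetical meta-estimator that never inspects the samples."""

    def __init__(self, base_estimator, n_estimators=5, random_state=0):
        self.base_estimator = base_estimator
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        # The only check: X and y must have the same number of samples.
        if len(X) != len(y):
            raise ValueError("X and y have inconsistent lengths")
        rng = random.Random(self.random_state)
        n = len(X)
        self.estimators_ = []
        for _ in range(self.n_estimators):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
            est = copy.deepcopy(self.base_estimator)
            est.fit([X[i] for i in idx], [y[i] for i in idx])
            self.estimators_.append(est)
        return self

    def predict(self, X):
        # Each column of votes holds every estimator's label for one sample.
        votes = [est.predict(X) for est in self.estimators_]
        return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]


X = ["cheap pills now", "meeting at noon", "cheap watches now", "lunch at noon"]
y = ["spam", "ham", "spam", "ham"]
model = MajorityVoteEnsemble(KeywordClassifier(), n_estimators=5).fit(X, y)
predictions = model.predict(["cheap pills", "meeting at lunch"])
```

Here X is a plain list of strings, which a strict "2D numeric array" check in the ensemble would reject even though the base estimator handles it without trouble.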

@jnothman jnothman added Enhancement Moderate Anything that requires some knowledge of conventions and best practices Need Contributor labels Oct 27, 2016
@chkoar
Contributor

chkoar commented Oct 31, 2016

That could be applied to any meta-estimator that uses a base estimator, right?

@jnothman
Member Author

jnothman commented Nov 1, 2016

Yes, it could be. I didn't have time when I wrote this issue to check the applicability to other ensembles.

@jnothman jnothman changed the title Minimize validation of X in AdaBoost Minimize validation of X in ensembles with a base estimator Nov 1, 2016
@jnothman
Member Author

jnothman commented Nov 1, 2016

Updated title and description

@chkoar
Contributor

chkoar commented Nov 1, 2016

@jnothman I think that we have two options.

  • Validate the input early, as is done now, and introduce a new parameter check_input in fit, predict, etc. with default value True in order to preserve the current behavior. Alternatively, check_input could go in the constructor.
  • Relax the validation in the ensemble and let the base estimator handle the validation.

What do you think? I'll send a PR.
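
For comparison, the two options might look roughly like this. Everything here is a hypothetical sketch: `check_input` is the parameter name proposed above rather than an existing AdaBoost argument, and `_require_2d_numeric` is a toy stand-in for the library's real validation.

```python
def _require_2d_numeric(X):
    """Toy stand-in for strict validation: rows of numbers only."""
    for row in X:
        if not isinstance(row, (list, tuple)) or not all(
            isinstance(v, (int, float)) for v in row
        ):
            raise ValueError("Expected a 2D array-like of numbers")
    return X


class StrictByDefaultEnsemble:
    """Option 1: validate early unless check_input=False."""

    def __init__(self, base_estimator, check_input=True):
        self.base_estimator = base_estimator
        self.check_input = check_input

    def fit(self, X, y):
        if self.check_input:
            X = _require_2d_numeric(X)
        self.base_estimator.fit(X, y)
        return self


class DelegatingEnsemble:
    """Option 2: no validation here; the base estimator is responsible."""

    def __init__(self, base_estimator):
        self.base_estimator = base_estimator

    def fit(self, X, y):
        self.base_estimator.fit(X, y)
        return self


class LengthBasedEstimator:
    """Hypothetical base estimator that accepts raw strings."""

    def fit(self, X, y):
        self.n_seen_ = len(X)
        return self


texts, labels = ["first blob", "second blob"], [0, 1]

# Option 1 with its default rejects text input before the base estimator sees it.
try:
    StrictByDefaultEnsemble(LengthBasedEstimator()).fit(texts, labels)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

# Option 1 with the check disabled, and option 2, both pass the data through.
opted_out = StrictByDefaultEnsemble(
    LengthBasedEstimator(), check_input=False
).fit(texts, labels)
delegating = DelegatingEnsemble(LengthBasedEstimator()).fit(texts, labels)
```

The first option keeps today's behavior unless the user opts out, at the cost of an extra parameter; the second changes the default but needs no new API.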

@jnothman
Member Author

jnothman commented Nov 1, 2016

IMO assuming the base estimator manages validation is fine.

@Chaitya62

Is this still open? Can I work on it?

@chkoar
Contributor

chkoar commented Dec 1, 2016

@Chaitya62 I didn't have the time to work on this. So, go ahead.

@Chaitya62

@chkoar On it!

@Chaitya62

After reading the code for 2 days and trying to understand what actually needs to be changed, I figured out that a call to check_X_y is being made, which forces X to be 2D. For the patch, should I do what @chkoar suggested?

@jnothman
Member Author

jnothman commented Dec 4, 2016 via email

@Chaitya62

Cool, I'll submit a PR soon.

@devanshdalal
Contributor

@Chaitya62, let me know if you are not working on this anymore. I want to work on this.

@Chaitya62

@devanshdalal I am working on it. I have a minor issue, which I hope I'll solve soon.

@dalmia
Contributor

dalmia commented Feb 1, 2017

@Chaitya62 Are you still working on this?

@Chaitya62

Chaitya62 commented Feb 1, 2017

@dalmia Go ahead and work on it. I am not able to test my code properly.

@dalmia
Contributor

dalmia commented Feb 1, 2017

@Chaitya62 Thanks!

@gokart23

I'd like to work on this, if that's ok

@gokart23

gokart23 commented Sep 13, 2018

As a first step, I tried looking at the behavior of meta-estimators when passed a 3D tensor. It looks like almost all meta-estimators that accept a base estimator fail:

>>> pytest -sx -k 'test_meta_estimators' sklearn/tests/test_common.py
<....>
AdaBoostClassifier raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
AdaBoostRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
BaggingClassifier raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
BaggingRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
ExtraTreesClassifier raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
ExtraTreesRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data

Skipping GradientBoostingClassifier - 'base_estimator' key not supported
Skipping GradientBoostingRegressor - 'base_estimator' key not supported
IsolationForest raised error 'default contamination parameter 0.1 will change in version 0.22 to "auto". This will change the predict method behavior.' when parsing data   

RANSACRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data                                     
RandomForestClassifier raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data
RandomForestRegressor raised error 'Found array with dim 3. Estimator expected <= 2.' when parsing data

@jnothman @amueller considering this, should this be a WONTFIX, or should all the meta-estimators be fixed?
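
The error text in the output above comes from a dimensionality check. A toy version of such a check could look like the sketch below. It works on nested lists and is not scikit-learn's implementation; the `allow_nd` switch is named after the argument of the same name on scikit-learn's `check_array`, while `ndim` and `validate` are hypothetical helpers.

```python
def ndim(X):
    """Nesting depth of a list of lists, e.g. 3 for a batch of 2D samples."""
    depth = 0
    while isinstance(X, (list, tuple)):
        depth += 1
        if not X:
            break
        X = X[0]
    return depth


def validate(X, allow_nd=False):
    """Reject inputs deeper than 2D unless allow_nd is set."""
    d = ndim(X)
    if not allow_nd and d > 2:
        raise ValueError(f"Found array with dim {d}. Estimator expected <= 2.")
    return X


# A batch shaped like (4, 100, 1): 4 samples, each a 100 x 1 grid.
batch = [[[0.0] for _ in range(100)] for _ in range(4)]

try:
    validate(batch)
    message = None
except ValueError as exc:
    message = str(exc)

# With the check relaxed, the same batch is passed through unchanged.
passed_through = validate(batch, allow_nd=True)
```

A meta-estimator that relaxed its check in this way would leave it to the base estimator to decide whether 3D input makes sense.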

@jnothman
Member Author

jnothman commented Sep 15, 2018 via email

@chkoar
Contributor

chkoar commented Feb 14, 2019

@jnothman The AdaBoost tests are testing the sparsity of X. This means that we should skip these tests in order to relax the validation, right?

@jnothman
Member Author

jnothman commented Feb 14, 2019 via email

@jilt-sebastian

@jnothman Could you please tell me the solution you landed on? I am working with CNNs as my weak classifiers for which the input features need to be 3D. I still face the same issue as referenced in #7767.

@jnothman
Member Author

Please give code that shows the error you encounter

@jilt-sebastian

jilt-sebastian commented Mar 21, 2019

I get the error with the following code snippet:
boosted_cnn = AdaBoostClassifier(base_estimator=model(), n_estimators=20)
boosted_cnn.fit(X, Y)

The shape of X is (a, 100, 1) and the shape of Y is (a, b), where b is the number of classes and a is the number of examples.
fit gives me the following error:
*** ValueError: Found array with dim 3. Estimator expected <= 2.

@chkoar
Contributor

chkoar commented Mar 21, 2019

@jilt-sebastian This functionality hasn't been released yet. You probably need to install scikit-learn from master.

@jnothman
Member Author

jnothman commented Mar 21, 2019 via email

@jilt-sebastian

Cool. Thank you

8 participants