
[MRG] Change dataset for test_classifier_chain_vs_independent_models #9255


Merged · 2 commits merged into scikit-learn:master on Jul 4, 2017

Conversation

@qinhanmin2014 (Member) commented Jun 30, 2017

Reference Issue

Fixes #9254

What does this implement/fix? Explain your changes.

I followed the author's approach from the previous test to construct a dataset. I did not use the helper the author provides (generate_multilabel_dataset_with_correlations) because we need a fixed random state to guarantee a deterministic result; without one, the supposedly better model sometimes fails to score better.
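For illustration, the kind of construction meant here is roughly the following: a multiclass problem drawn with a pinned seed whose class index is binary-encoded into correlated label columns. The parameter values below are illustrative, not necessarily the exact ones in this PR.

    import numpy as np
    from sklearn.datasets import make_classification

    # Draw a 16-class problem with a pinned seed, then binary-encode each class
    # index into 4 bits; the resulting label columns are correlated, which is
    # what a ClassifierChain should be able to exploit.
    X, y = make_classification(n_samples=1000, n_features=100, n_classes=16,
                               n_informative=10, random_state=0)
    Y = np.array([[int(bit) for bit in format(int(label), '04b')] for label in y])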

Any other comments?

@qinhanmin2014 changed the title from [WIP] Change dataset for test_classifier_chain_vs_independent_models to [MRG] Change dataset for test_classifier_chain_vs_independent_models on Jun 30, 2017
Review thread on the diff. Excerpt of the lines under discussion:

    X_test = X[2000:, :]
    Y_train = Y[:2000, :]
    Y_test = Y[2000:, :]
    X, y = make_classification(n_samples=1000,
@jnothman (Member):

does generate_multilabel_dataset_with_correlations not work as is?

@qinhanmin2014 (Member, Author) replied:

@jnothman Thanks. I think we need a random state here to guarantee a deterministic result. (In the previous test, the author's aim was only to check that different calculation methods give the same result, so a random state was not needed there.) There are indeed cases where ClassifierChain gets a worse result.

@jnothman (Member) commented Jul 1, 2017

Ordinarily we tend to include a random state in every invocation in the tests to avoid occasional failures... (Although I wish we had marked all tests that should work for any random state.)

Might be cleanest to add a random_state param to generate_multilabel_dataset_with_correlations...?
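Roughly, that change would look something like this. This is a sketch only; the helper's actual body may differ, and the make_classification parameters are assumptions. The point is simply that the seed gets threaded through to the dataset generator:

    import numpy as np
    from sklearn.datasets import make_classification

    def generate_multilabel_dataset_with_correlations(random_state=None):
        # random_state=None keeps the old non-deterministic behaviour; a test
        # that compares two models on a small margin can pass an int seed.
        X, y = make_classification(n_samples=1000, n_features=100, n_classes=16,
                                   n_informative=10, random_state=random_state)
        # binary-encode the class index into correlated label columns
        Y = np.array([[int(bit) for bit in format(int(label), '04b')]
                      for label in y])
        return X, Y

    # hypothetical call from the test, with a pinned seed:
    X, Y = generate_multilabel_dataset_with_correlations(random_state=0)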

@qinhanmin2014 (Member, Author) commented:

@jnothman Thanks. I also think that would be good. For this test, the model's improvement is around 1% with both the original dataset and the new dataset. Since the new dataset relies on the random state, I am really worried about test failures if no random state is fixed.

@jnothman (Member) commented Jul 3, 2017

@adamklec, could you remind me if there was any special motivation for this test being applied to yeast? We're having trouble with the mldata servers' unreliability, and could do without depending on it for tests to pass.

@jmschrei (Member) commented Jul 3, 2017

It is probably worth amending the author's function, to avoid duplicating effort now and in the future.

Review thread on the diff. Excerpt of the lines under discussion:

    order=np.array([0, 2, 4, 6, 8, 10,
                    12, 1, 3, 5, 7, 9,
                    11, 13]))
    chain = ClassifierChain(LogisticRegression())
A reviewer (Member):

Why the change in orderings?

@jnothman (Member) commented Jul 3, 2017 via email

@qinhanmin2014 (Member, Author) replied:

Thanks. It seems there is already such a comment at the beginning of the function. If there is anything more to clarify, please leave a comment.
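For reference, a small self-contained sketch of what the order argument controls; the data here is synthetic and the permutation is simply the one from the excerpt above, not a recommendation:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.multioutput import ClassifierChain

    rng = np.random.RandomState(0)
    X = rng.normal(size=(200, 20))
    Y = (rng.uniform(size=(200, 14)) < 0.3).astype(int)  # 14 dummy label columns

    # Estimator i in the chain is fit to predict column order[i] of Y, using X
    # augmented with the predictions already made for columns order[:i].
    chain = ClassifierChain(LogisticRegression(),
                            order=[0, 2, 4, 6, 8, 10, 12, 1, 3, 5, 7, 9, 11, 13])
    chain.fit(X, Y)
    print(chain.order_)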

@ogrisel (Member) commented Jul 4, 2017

LGTM, merging.

@ogrisel merged commit a294149 into scikit-learn:master on Jul 4, 2017
@adamklec commented:
@jnothman Sorry for the delayed response. Your explanation of why I used the yeast dataset is correct. I wanted to write a test asserting that, in the presence of correlated classes, ClassifierChain outperforms independent models. For some reason I was having difficulty doing this with the generate_multilabel_dataset_with_correlations function. Yeast is small, manageable, and reasonably well known, so I went with it. But I never liked the fact that the unit test needed to connect to a remote server. If there is a way to use generate_multilabel_dataset_with_correlations, I would go with it.
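For illustration, the comparison the test is after looks roughly like this. This is a sketch, not the PR's test verbatim: it uses synthetic correlated labels, arbitrary split/solver settings, and the present-day jaccard_score API in place of the jaccard_similarity_score of that era.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import jaccard_score
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.multioutput import ClassifierChain

    # Correlated multilabel targets: binary-encode a 16-class problem into 5
    # label columns (class index + 1, so every sample has a positive label).
    X, y = make_classification(n_samples=1000, n_features=100, n_classes=16,
                               n_informative=10, random_state=0)
    Y = np.array([[int(b) for b in format(int(c) + 1, '05b')] for c in y])
    X_train, X_test, Y_train, Y_test = X[:600], X[600:], Y[:600], Y[600:]

    # Independent one-vs-rest models ignore the correlations between labels...
    ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)
    ovr_score = jaccard_score(Y_test, ovr.predict(X_test), average='samples')

    # ...while a chain feeds earlier label predictions into later estimators.
    chain = ClassifierChain(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)
    chain_score = jaccard_score(Y_test, chain.predict(X_test), average='samples')

    print(f"independent OvR: {ovr_score:.3f}  chain: {chain_score:.3f}")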

@qinhanmin2014 deleted the my-feature-1 branch on July 26, 2017.

Commits referencing this pull request were later pushed to several forks (massich, dmohns, NelleV, paulha, AishwaryaRK, maskani-moh, jwjohnson314) between July and December 2017.
Labels: none
Projects: none
Development: successfully merging this pull request may close the issue "Change dataset for test_classifier_chain_vs_independent_models"
5 participants