
[MRG+1] Reduce warnings in the model_selection tests #5703


Closed
wants to merge 1 commit into from

Conversation

@raghavrv
Member

raghavrv commented Nov 3, 2015

Fix #5669

  • Stack y and y_reversed and use the result as the target for the multi-output case, and set random_state to 0 (the sag solver is used). That should make the sag solver converge without any warnings. (See the sketch after this list.)
  • For test_cross_val_predict_input_types, use the iris dataset to prevent non-convergence. (This doesn't slow the tests down by any significant amount.)
  • Use at least 3 samples per class wherever cross_val* is used, since it relies on 3-fold CV.
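
For illustration only, a minimal sketch of the kind of setup described above; the choice of Ridge with the sag solver and all variable names here are assumptions, not the PR's actual diff:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale

X, y = load_iris(return_X_y=True)
X = scale(X)  # standardize features, which helps iterative solvers

# Stack y with its reverse to form a two-column target for the
# multi-output case, instead of reusing X itself as the target.
y_multi = np.column_stack([y, y[::-1]])

# Fix random_state so the stochastic sag solver behaves reproducibly.
est = Ridge(solver='sag', random_state=0)
scores = cross_val_score(est, X, y_multi, cv=3)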

@amueller

@raghavrv raghavrv changed the title TST/FIX Use data that will converge for the multioutput case Reduce warnings in the model_selection module Nov 3, 2015
@raghavrv raghavrv changed the title Reduce warnings in the model_selection module Reduce warnings in the model_selection tests Nov 3, 2015
@raghavrv raghavrv force-pushed the test_val_warnings branch 2 times, most recently from 47b3e8a to 31a366a Compare November 3, 2015 15:53
@raghavrv raghavrv changed the title Reduce warnings in the model_selection tests [MRG] Reduce warnings in the model_selection tests Nov 3, 2015
@amueller
Member

amueller commented Nov 3, 2015

Can you quickly summarize why the errors were raised and why your change fixes them?
Also, please squash.

@raghavrv
Member Author

raghavrv commented Nov 3, 2015

  1. The ConvergenceWarning was due to X being used as the target for the multi-output data, for which the sag solver didn't converge. The random_state also needs to be fixed, since even for the given y the sag solver randomly fails to converge.
  2. The "least frequent class has less than n_folds samples" warning was raised when y = arange(10) // 2 was used as the target with cross_val_score, which uses 3-fold CV. So I fixed y to have at least 3 samples per class. (I also didn't want to change the default y used everywhere else, which could affect other tests that may depend on y having only one sample per class?)

Those were the only two warnings I observed, with the first one repeated twice or thrice, I think.
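
An illustrative sketch of the second warning; the data here is assumed for demonstration, not taken from the test file:

import numpy as np
from sklearn.model_selection import StratifiedKFold

# np.arange(10) // 2 gives five classes with only 2 samples each
# ([0 0 1 1 2 2 3 3 4 4]); with 3-fold stratified CV this trips the
# "least populated class" warning (or an error in later versions).
y_small_classes = np.arange(10) // 2

# A target with at least 3 samples per class splits cleanly.
y_ok = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

X = np.zeros((10, 2))
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X, y_ok):
    pass  # every test fold contains at least one sample of each class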

@raghavrv
Member Author

raghavrv commented Nov 3, 2015

I am wondering if this should be extended to the cross_validation tests too? In #5568 I attempted to blindly suppress the warnings raised in the tests for the old c_v / g_s / l_c, reasoning that they will be taken care of by the model_selection tests... ;)

@raghavrv
Member Author

raghavrv commented Nov 4, 2015

Arghhh the ConvergenceWarning is not fully removed.. sorry... on it...

@raghavrv
Member Author

raghavrv commented Nov 6, 2015

@amueller Fixed! This can be reviewed and merged!

@raghavrv
Member Author

raghavrv commented Nov 6, 2015

➜  tests git:(test_val_warnings) ✗ nosetests -v -s . 

test_search.test_parameter_grid ... ok
test_search.test_grid_search ... [Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.0s finished
ok
test_search.test_grid_search_score_method ... ok
test_search.test_grid_search_labels ... ok
test_search.test_trivial_grid_scores ... ok
test_search.test_no_refit ... ok
test_search.test_grid_search_error ... ok
test_search.test_grid_search_iid ... ok
test_search.test_grid_search_no_score ... ok
test_search.test_pandas_input ... ok
test_search.test_refit ... ok
test_search.test_grid_search_one_grid_point ... ok
test_search.test_grid_search_bad_param_grid ... ok
test_search.test_grid_search_sparse ... ok
test_search.test_grid_search_sparse_scoring ... ok
test_search.test_grid_search_precomputed_kernel ... ok
test_search.test_grid_search_precomputed_kernel_error_nonsquare ... ok
test_search.test_grid_search_precomputed_kernel_error_kernel_function ... ok
test_search.test_gridsearch_nd ... ok
test_search.test_X_as_list ... ok
test_search.test_y_as_list ... ok
test_search.test_unsupervised_grid_search ... ok
test_search.test_gridsearch_no_predict ... ok
test_search.test_param_sampler ... ok
test_search.test_randomized_search_grid_scores ... ok
test_search.test_grid_search_score_consistency ... ok
test_search.test_pickle ... ok
test_search.test_grid_search_with_multioutput_data ... ok
test_search.test_predict_proba_disabled ... ok
test_search.test_grid_search_allows_nans ... ok
test_search.test_grid_search_failing_classifier ... ok
test_search.test_grid_search_failing_classifier_raise ... ok
test_search.test_parameters_sampler_replacement ... ok
test_split.test_kfold_valueerrors ... ok
test_split.test_kfold_indices ... ok
test_split.test_kfold_no_shuffle ... ok
test_split.test_stratified_kfold_no_shuffle ... ok
test_split.test_stratified_kfold_ratios ... ok
test_split.test_cross_validator_with_default_indices ... ok
test_split.train_test_split_pandas ... ok
test_split.test_kfold_balance ... ok
test_split.test_stratifiedkfold_balance ... ok
test_split.test_shuffle_kfold ... ok
test_split.test_shuffle_kfold_stratifiedkfold_reproducibility ... ok
test_split.test_shuffle_stratifiedkfold ... ok
test_split.test_kfold_can_detect_dependent_samples_on_digits ... ok
test_split.test_shuffle_split ... ok
test_split.test_stratified_shuffle_split_init ... ok
test_split.test_stratified_shuffle_split_iter ... ok
test_split.test_stratified_shuffle_split_even ... ok
test_split.test_predefinedsplit_with_kfold_split ... ok
test_split.test_label_shuffle_split ... ok
test_split.test_leave_label_out_changing_labels ... ok
test_split.test_train_test_split_errors ... ok
test_split.test_train_test_split ... ok
test_split.train_test_split_mock_pandas ... ok
test_split.test_shufflesplit_errors ... ok
test_split.test_shufflesplit_reproducible ... ok
test_split.test_safe_split_with_precomputed_kernel ... ok
test_split.test_train_test_split_allow_nans ... ok
test_split.test_check_cv ... ok
test_split.test_cv_iterable_wrapper ... ok
test_split.test_label_kfold ... ok
test_split.test_nested_cv ... ok
test_split.test_build_repr ... ok
test_validation.test_cross_val_score ... ok
test_validation.test_cross_val_score_predict_labels ... ok
test_validation.test_cross_val_score_pandas ... ok
test_validation.test_cross_val_score_mask ... ok
test_validation.test_cross_val_score_precomputed ... ok
test_validation.test_cross_val_score_fit_params ... ok
test_validation.test_cross_val_score_score_func ... ok
test_validation.test_cross_val_score_errors ... ok
test_validation.test_cross_val_score_with_score_func_classification ... ok
test_validation.test_cross_val_score_with_score_func_regression ... ok
test_validation.test_permutation_score ... ok
test_validation.test_permutation_test_score_allow_nans ... ok
test_validation.test_cross_val_score_allow_nans ... ok
test_validation.test_cross_val_score_multilabel ... ok
test_validation.test_cross_val_predict ... ok
test_validation.test_cross_val_predict_input_types ... ok
test_validation.test_cross_val_predict_pandas ... ok
test_validation.test_cross_val_score_sparse_fit_params ... ok
test_validation.test_learning_curve ... ok
test_validation.test_learning_curve_unsupervised ... ok
test_validation.test_learning_curve_verbose ... [Parallel(n_jobs=1)]: Done  15 out of  15 | elapsed:    0.0s finished
ok
test_validation.test_learning_curve_incremental_learning_not_possible ... ok
test_validation.test_learning_curve_incremental_learning ... ok
test_validation.test_learning_curve_incremental_learning_unsupervised ... ok
test_validation.test_learning_curve_batch_and_incremental_learning_are_equal ... ok
test_validation.test_learning_curve_n_sample_range_out_of_bounds ... ok
test_validation.test_learning_curve_remove_duplicate_sample_sizes ... ok
test_validation.test_learning_curve_with_boolean_indices ... ok
test_validation.test_validation_curve ... ok
test_validation.test_check_is_permutation ... ok
test_validation.test_cross_val_predict_sparse_prediction ... ok

----------------------------------------------------------------------
Ran 96 tests in 12.914s

OK

@raghavrv raghavrv force-pushed the test_val_warnings branch 2 times, most recently from da8afc8 to 0315318 Compare November 12, 2015 14:28
@raghavrv
Member Author

@amueller could you look at this one too if you are online?

@@ -126,6 +126,7 @@ def _is_training_data(self, X):
X = np.ones((10, 2))
X_sparse = coo_matrix(X)
y = np.arange(10) // 2
y2 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3]) // 2
Member

I guess you don't need the // 2
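
For reference, a quick way to see the point (editorial illustration, not part of the diff):

import numpy as np

y2 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])
print(np.bincount(y2)[1:])   # [3 3 4] -> already >= 3 samples per class
print(np.bincount(y2 // 2))  # [3 7]   -> the // 2 merely merges classes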

@TomDLT TomDLT changed the title [MRG] Reduce warnings in the model_selection tests [MRG+1] Reduce warnings in the model_selection tests Jan 4, 2016
@TomDLT
Member

TomDLT commented Jan 4, 2016

LGTM apart from the nitpick.

@raghavrv
Member Author

raghavrv commented Jan 4, 2016

Thanks for the review! Have addressed your comments.

@@ -216,6 +216,7 @@ def test_kfold_valueerrors():
# though all the classes are not necessarily represented at on each
# side of the split at each split
with warnings.catch_warnings():
warnings.simplefilter("ignore")
Member

Not related to this PR, but what is the rationale behind not raising an error for this extreme case, where an empty test fold is created, i.e. the number of labels for every class is less than the number of folds?

It is highly likely that this will raise a meaningless error at a later stage. For example:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier()
X2 = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([3, 3, -1, -1, 2])
cross_val_score(dtc, X2, y)  # fails later, inside the estimator

ValueError: Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 1 is required.

Member Author

Ah! Thanks for the catch!

Member Author

- Use data that will converge for the multioutput case
- Use at least 3 samples per class to conform to 3-fold CV
- Add the elided ignore-warnings line
- Use the iris dataset to prevent non-convergence of the sag solver
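
A hedged sketch of the last point; the exact estimator used in test_cross_val_predict_input_types is not shown here, so LogisticRegression with the sag solver is an assumption:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A real, well-conditioned dataset (iris) rather than a tiny all-ones
# matrix gives an iterative solver like sag something it can actually
# fit; scaling the features further helps it converge quickly.
X, y = load_iris(return_X_y=True)
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(solver='sag', random_state=0))
preds = cross_val_predict(clf, X, y, cv=3)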
@MechCoder
Member

Thanks!

@MechCoder MechCoder closed this Jan 17, 2016
@raghavrv raghavrv deleted the test_val_warnings branch January 17, 2016 20:54
@raghavrv
Member Author

Thanks for the reviews and merge :D
