
[MRG+1] Reduce warnings in the model_selection tests #5703


Closed
wants to merge 1 commit into from

Conversation

@raghavrv
Member

raghavrv commented Nov 3, 2015

Fix #5669

  • Stack y and y_reversed and use the result as the target for the multi-output case, and set random_state to 0 (the sag solver is used). That should make the sag solver converge without any warnings. (See the sketch after this list.)
  • For test_cross_val_predict_input_types, use the iris dataset to prevent non-convergence. (This doesn't slow the tests down by any significant amount.)
  • Use at least 3 samples per class wherever cross_val* is used, since it relies on 3-fold CV.
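
For illustration only, a minimal sketch of the kind of setup described above; the choice of Ridge with the sag solver and all variable names here are assumptions, not the PR's actual diff:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale

X, y = load_iris(return_X_y=True)
X = scale(X)  # standardize features, which helps iterative solvers

# Stack y with its reverse to form a two-column target for the
# multi-output case, instead of reusing X itself as the target.
y_multi = np.column_stack([y, y[::-1]])

# Fix random_state so the stochastic sag solver behaves reproducibly.
est = Ridge(solver='sag', random_state=0)
scores = cross_val_score(est, X, y_multi, cv=3)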

@amueller

@raghavrv raghavrv changed the title TST/FIX Use data that will converge for the multioutput case Reduce warnings in the model_selection module Nov 3, 2015
@raghavrv raghavrv changed the title Reduce warnings in the model_selection module Reduce warnings in the model_selection tests Nov 3, 2015
@raghavrv raghavrv force-pushed the test_val_warnings branch 2 times, most recently from 47b3e8a to 31a366a Compare November 3, 2015 15:53
@raghavrv raghavrv changed the title Reduce warnings in the model_selection tests [MRG] Reduce warnings in the model_selection tests Nov 3, 2015
@amueller
Member

amueller commented Nov 3, 2015

Can you quickly summarize why the errors were raised and why your change fixes them?
Also, please squash.

@raghavrv
Member Author

raghavrv commented Nov 3, 2015

  1. The ConvergenceWarning was due to X being used as the target for the multi-output data, for which the sag solver didn't converge. The random_state also needs to be fixed, since even for the given y the sag solver randomly fails to converge.
  2. The "least frequent class has less than n_folds samples" warning was raised when y = arange(10) // 2 was used as the target with cross_val_score, which uses 3-fold CV. So I fixed y to have at least 3 samples per class. (I also didn't want to change the default y used everywhere else, which could affect other tests that may depend on y having only one sample per class?)

Those were the only two warnings I observed, with the first one repeated twice or thrice, I think.
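
An illustrative sketch of the second warning; the data here is assumed for demonstration, not taken from the test file:

import numpy as np
from sklearn.model_selection import StratifiedKFold

# np.arange(10) // 2 gives five classes with only 2 samples each
# ([0 0 1 1 2 2 3 3 4 4]); with 3-fold stratified CV this trips the
# "least populated class" warning (or an error in later versions).
y_small_classes = np.arange(10) // 2

# A target with at least 3 samples per class splits cleanly.
y_ok = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

X = np.zeros((10, 2))
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X, y_ok):
    pass  # every test fold contains at least one sample of each class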

@raghavrv
Member Author

raghavrv commented Nov 3, 2015

I am wondering if this should be extended to the cross_validation tests too? In #5568 I attempted to blindly suppress the warnings raised in the tests for the old c_v / g_s / l_c, reasoning that they will be taken care of by the model_selection tests... ;)

@raghavrv
Member Author

raghavrv commented Nov 4, 2015

Arghhh the ConvergenceWarning is not fully removed.. sorry... on it...

@raghavrv
Member Author

raghavrv commented Nov 6, 2015

@amueller Fixed! This can be reviewed and merged!

@raghavrv
Member Author

raghavrv commented Nov 6, 2015

➜  tests git:(test_val_warnings) ✗ nosetests -v -s . 

test_search.test_parameter_grid ... ok
test_search.test_grid_search ... [Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.0s finished
ok
test_search.test_grid_search_score_method ... ok
test_search.test_grid_search_labels ... ok
test_search.test_trivial_grid_scores ... ok
test_search.test_no_refit ... ok
test_search.test_grid_search_error ... ok
test_search.test_grid_search_iid ... ok
test_search.test_grid_search_no_score ... ok
test_search.test_pandas_input ... ok
test_search.test_refit ... ok
test_search.test_grid_search_one_grid_point ... ok
test_search.test_grid_search_bad_param_grid ... ok
test_search.test_grid_search_sparse ... ok
test_search.test_grid_search_sparse_scoring ... ok
test_search.test_grid_search_precomputed_kernel ... ok
test_search.test_grid_search_precomputed_kernel_error_nonsquare ... ok
test_search.test_grid_search_precomputed_kernel_error_kernel_function ... ok
test_search.test_gridsearch_nd ... ok
test_search.test_X_as_list ... ok
test_search.test_y_as_list ... ok
test_search.test_unsupervised_grid_search ... ok
test_search.test_gridsearch_no_predict ... ok
test_search.test_param_sampler ... ok
test_search.test_randomized_search_grid_scores ... ok
test_search.test_grid_search_score_consistency ... ok
test_search.test_pickle ... ok
test_search.test_grid_search_with_multioutput_data ... ok
test_search.test_predict_proba_disabled ... ok
test_search.test_grid_search_allows_nans ... ok
test_search.test_grid_search_failing_classifier ... ok
test_search.test_grid_search_failing_classifier_raise ... ok
test_search.test_parameters_sampler_replacement ... ok
test_split.test_kfold_valueerrors ... ok
test_split.test_kfold_indices ... ok
test_split.test_kfold_no_shuffle ... ok
test_split.test_stratified_kfold_no_shuffle ... ok
test_split.test_stratified_kfold_ratios ... ok
test_split.test_cross_validator_with_default_indices ... ok
test_split.train_test_split_pandas ... ok
test_split.test_kfold_balance ... ok
test_split.test_stratifiedkfold_balance ... ok
test_split.test_shuffle_kfold ... ok
test_split.test_shuffle_kfold_stratifiedkfold_reproducibility ... ok
test_split.test_shuffle_stratifiedkfold ... ok
test_split.test_kfold_can_detect_dependent_samples_on_digits ... ok
test_split.test_shuffle_split ... ok
test_split.test_stratified_shuffle_split_init ... ok
test_split.test_stratified_shuffle_split_iter ... ok
test_split.test_stratified_shuffle_split_even ... ok
test_split.test_predefinedsplit_with_kfold_split ... ok
test_split.test_label_shuffle_split ... ok
test_split.test_leave_label_out_changing_labels ... ok
test_split.test_train_test_split_errors ... ok
test_split.test_train_test_split ... ok
test_split.train_test_split_mock_pandas ... ok
test_split.test_shufflesplit_errors ... ok
test_split.test_shufflesplit_reproducible ... ok
test_split.test_safe_split_with_precomputed_kernel ... ok
test_split.test_train_test_split_allow_nans ... ok
test_split.test_check_cv ... ok
test_split.test_cv_iterable_wrapper ... ok
test_split.test_label_kfold ... ok
test_split.test_nested_cv ... ok
test_split.test_build_repr ... ok
test_validation.test_cross_val_score ... ok
test_validation.test_cross_val_score_predict_labels ... ok
test_validation.test_cross_val_score_pandas ... ok
test_validation.test_cross_val_score_mask ... ok
test_validation.test_cross_val_score_precomputed ... ok
test_validation.test_cross_val_score_fit_params ... ok
test_validation.test_cross_val_score_score_func ... ok
test_validation.test_cross_val_score_errors ... ok
test_validation.test_cross_val_score_with_score_func_classification ... ok
test_validation.test_cross_val_score_with_score_func_regression ... ok
test_validation.test_permutation_score ... ok
test_validation.test_permutation_test_score_allow_nans ... ok
test_validation.test_cross_val_score_allow_nans ... ok
test_validation.test_cross_val_score_multilabel ... ok
test_validation.test_cross_val_predict ... ok
test_validation.test_cross_val_predict_input_types ... ok
test_validation.test_cross_val_predict_pandas ... ok
test_validation.test_cross_val_score_sparse_fit_params ... ok
test_validation.test_learning_curve ... ok
test_validation.test_learning_curve_unsupervised ... ok
test_validation.test_learning_curve_verbose ... [Parallel(n_jobs=1)]: Done  15 out of  15 | elapsed:    0.0s finished
ok
test_validation.test_learning_curve_incremental_learning_not_possible ... ok
test_validation.test_learning_curve_incremental_learning ... ok
test_validation.test_learning_curve_incremental_learning_unsupervised ... ok
test_validation.test_learning_curve_batch_and_incremental_learning_are_equal ... ok
test_validation.test_learning_curve_n_sample_range_out_of_bounds ... ok
test_validation.test_learning_curve_remove_duplicate_sample_sizes ... ok
test_validation.test_learning_curve_with_boolean_indices ... ok
test_validation.test_validation_curve ... ok
test_validation.test_check_is_permutation ... ok
test_validation.test_cross_val_predict_sparse_prediction ... ok

----------------------------------------------------------------------
Ran 96 tests in 12.914s

OK

@raghavrv raghavrv force-pushed the test_val_warnings branch 2 times, most recently from da8afc8 to 0315318 Compare November 12, 2015 14:28
@raghavrv
Member Author

@amueller could you look at this one too if you are online?

@@ -126,6 +126,7 @@ def _is_training_data(self, X):
X = np.ones((10, 2))
X_sparse = coo_matrix(X)
y = np.arange(10) // 2
y2 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3]) // 2
Member

I guess you don't need the // 2
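
For reference, a quick way to see the point (editorial illustration, not part of the diff):

import numpy as np

y2 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])
print(np.bincount(y2)[1:])   # [3 3 4] -> already >= 3 samples per class
print(np.bincount(y2 // 2))  # [3 7]   -> the // 2 merely merges classes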

@TomDLT TomDLT changed the title [MRG] Reduce warnings in the model_selection tests [MRG+1] Reduce warnings in the model_selection tests Jan 4, 2016
@TomDLT
Member

TomDLT commented Jan 4, 2016

LGTM apart from the nitpick.

@raghavrv
Member Author

raghavrv commented Jan 4, 2016

Thanks for the review! Have addressed your comments.

@@ -216,6 +216,7 @@ def test_kfold_valueerrors():
# though all the classes are not necessarily represented at on each
# side of the split at each split
with warnings.catch_warnings():
warnings.simplefilter("ignore")
Member

Not related to this PR, but what is the rationale behind not raising an error for this extreme case, where an empty test fold is created, i.e. the number of labels for every class is less than the number of folds?

It is highly likely that this will raise a meaningless error at a later stage. For example:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier()
X2 = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([3, 3, -1, -1, 2])
cross_val_score(dtc, X2, y)  # fails later, inside the estimator

ValueError: Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 1 is required.

Member Author

Ah! Thanks for the catch!

Member Author

- Use data that will converge for the multioutput case
- Use at least 3 samples per class to conform to 3-fold CV
- Add the elided ignore-warnings line
- Use the iris dataset to prevent non-convergence of the sag solver
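
A hedged sketch of the last point; the exact estimator used in test_cross_val_predict_input_types is not shown here, so LogisticRegression with the sag solver is an assumption:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A real, well-conditioned dataset (iris) rather than a tiny all-ones
# matrix gives an iterative solver like sag something it can actually
# fit; scaling the features further helps it converge quickly.
X, y = load_iris(return_X_y=True)
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(solver='sag', random_state=0))
preds = cross_val_predict(clf, X, y, cv=3)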
@MechCoder
Member

Thanks!

@MechCoder MechCoder closed this Jan 17, 2016
@raghavrv raghavrv deleted the test_val_warnings branch January 17, 2016 20:54
@raghavrv
Member Author

Thanks for the reviews and merge :D
