Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

jdsuryap
Copy link
Contributor

@jdsuryap jdsuryap commented Dec 20, 2020

Fixes #18996

Input X was being converted to float at the beginning of fit() method. However when a pairwise boolean metric was requested, X was again converted to boolean with multiple warning.

This caused were two issues:

  1. The data, if it was already of boolean type, was first converted to float and later to boolean
  2. The low level utility function printed the warning multiple times

As a FIX, a check has been put in place in the fit() method itself to let boolean data pass as such. However, when the data needs to be converted to boolean, it is done at this level only with the same warning that was printed earlier, but only once.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR @Prakashsur , this looks good so far. Could you please also need a non-regression test making sure that:

  • no warning is raised if the data passed o fit is already converted to bool
  • a single warning (the one you just added) is raised if the data passed to fit isn't converted to bool

thanks!

@jdsuryap jdsuryap changed the title FIX- Incorrect warning when clustering boolean data scikit-learn#18996 [MRG] FIX- Incorrect warning when clustering boolean data scikit-learn#18996 Dec 21, 2020
@jdsuryap
Copy link
Contributor Author

@NicolasHug
Both points have been resolved. Shall we take another look and take it forward?

Thanks

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jdsurya ! made some minor comments but LGTM

@cmarmo
Copy link
Contributor

cmarmo commented Jan 18, 2021

Hi @jdsurya , thanks for your pull request.
Do you mind fixing the lint errors so that all the check could be run? Thanks!

sklearn/cluster/tests/test_optics.py:219:80: E501 line too long (87 > 79 characters)
    # non-regression test for https://github.com/scikit-learn/scikit-learn/issues/18996
                                                                               ^
sklearn/cluster/tests/test_optics.py:231:80: E501 line too long (87 > 79 characters)
    # non regression test for https://github.com/scikit-learn/scikit-learn/issues/18996

@jdsuryap
Copy link
Contributor Author

Hi @cmarmo, ntmu!

Thanks for sharing the details! The two linting issues have been fixed and are passing now.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @jdsurya !

LGTM

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to the change log at doc/whats_new/v1.0.rst with tag |Fix|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@thomasjpfan thomasjpfan changed the title [MRG] FIX- Incorrect warning when clustering boolean data scikit-learn#18996 FIX Incorrect warning when clustering boolean data Jan 21, 2021
@jdsuryap
Copy link
Contributor Author

Thank you for details regarding the entry into 'whats_new' Thomas @thomasjpfan! The same has been added.

@thomasjpfan thomasjpfan merged commit 364b1e3 into scikit-learn:master Jan 21, 2021
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect warning when clustering boolean data
5 participants