

[MRG] Fix diagonal in DBSCAN with precomputed sparse neighbors graph #12105



Merged
merged 8 commits into from
Sep 28, 2018


TomDLT
Member

@TomDLT TomDLT commented Sep 18, 2018

This is a bug fix for DBSCAN.
It is also included in #10482, but I isolated it here to make reviewing easier.

From #10482 (comment):

I fixed a bug in DBSCAN: when the neighbors graph was precomputed, DBSCAN assumed that it had been computed with include_self=False and added the diagonal to make it equivalent to include_self=True. In particular, if the diagonal was already present, it was added a second time. I fixed this bug, so include_self=True and include_self=False are now equivalent. It will break some code, but I consider it a bug fix.
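To make the bug concrete, here is a minimal pure-Python sketch, not scikit-learn's implementation. It assumes the precomputed graph is given as hypothetical (row, col, distance) triples instead of a scipy.sparse matrix, and the helper names neighborhoods_old, neighborhoods_fixed and core_samples are illustrative only. The old behaviour always inserts each point into its own neighborhood, so a graph that already stores the diagonal (distance 0, as include_self=True can produce) counts it twice; the fixed behaviour counts each point exactly once.

```python
from collections import defaultdict


def neighborhoods_old(entries, n_samples, eps):
    """Pre-fix behaviour (sketch): assume the diagonal is absent, always add it."""
    neigh = defaultdict(list)
    for i, j, dist in entries:
        if dist <= eps:
            neigh[i].append(j)
    # Unconditionally add each point to its own neighborhood, even if the
    # graph already stored an (i, i) entry, so it may appear twice.
    return [sorted([i] + neigh[i]) for i in range(n_samples)]


def neighborhoods_fixed(entries, n_samples, eps):
    """Fixed behaviour (sketch): each point is its own neighbor exactly once."""
    neigh = [set() for _ in range(n_samples)]
    for i, j, dist in entries:
        if dist <= eps:
            neigh[i].add(j)
    for i in range(n_samples):
        neigh[i].add(i)  # no-op if the diagonal was already stored
    return [sorted(s) for s in neigh]


def core_samples(neighborhoods, min_samples):
    """A sample is core if its neighborhood (itself included) has >= min_samples points."""
    return [i for i, nb in enumerate(neighborhoods) if len(nb) >= min_samples]


# Toy 3-point graph: points 0 and 1 are close, point 2 is isolated.
without_self = [(0, 1, 0.5), (1, 0, 0.5)]
with_self = without_self + [(0, 0, 0.0), (1, 1, 0.0), (2, 2, 0.0)]

for build in (neighborhoods_old, neighborhoods_fixed):
    a = build(without_self, 3, eps=1.0)
    b = build(with_self, 3, eps=1.0)
    print(build.__name__, core_samples(a, 2), core_samples(b, 2))
```

With min_samples=2, the old version turns the isolated point 2 into a core sample once the diagonal is stored, while the fixed version gives the same core samples for both graphs.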

@TomDLT TomDLT changed the title FIX diagonal in DBSCAN with precomputed sparse neighbors graph [MRG] Fix diagonal in DBSCAN with precomputed sparse neighbors graph Sep 18, 2018
Member

@jnothman jnothman left a comment


I guess after sum_duplicates, this is reasonable.

@TomDLT TomDLT added the Bug label Sep 24, 2018
Member

@ogrisel ogrisel left a comment


This LGTM as well.

Shall this be backported to 0.20.1? If so, whats_new.rst needs to be changed accordingly.

dbscan(X, metric=metric)

if use_sparse:
    assert_array_equal(X.A, X_copy.A)
Member


nitpick: I don't find the use of X.A explicit enough. I would prefer assert_array_equal(X.toarray(), X_copy.toarray()).

@jnothman
Member

jnothman commented Sep 27, 2018 via email

Member

@jnothman jnothman left a comment


(I didn't expect you to revert; I was just giving my 2c on when I think it's worth backporting and when it's not. If we want to see 0.20 as an LTS for Py2 support, then we should backport most bug fixes...?)

@TomDLT
Member Author

TomDLT commented Sep 27, 2018

I agree this PR belongs more to 0.21 than 0.20. This is why I reverted my whatsnew entry commit.

Then, the question is indeed "Do we want to backport most bug fixes to 0.20.x?"

@qinhanmin2014
Member

I prefer to include more bug fixes in 0.20.1:
(1) 0.20 is our last version for Python 2, so it will be user-friendly to include more bug fixes.
(2) 0.21 is not likely to be released very soon, so I guess we don't want to leave users complaining about known bugs.

@ogrisel
Member

ogrisel commented Sep 28, 2018

As long as they are easy to backport, yes.

@ogrisel ogrisel merged commit 819d8ef into scikit-learn:master Sep 28, 2018
@ogrisel
Member

ogrisel commented Sep 28, 2018

Thanks!

@ogrisel ogrisel deleted the dbscan_bug branch September 28, 2018 14:22
@qinhanmin2014
Member

We put the what's new entry in the wrong section; I'll push a commit to correct it.
Btw, @TomDLT, can you explain why we need test_dbscan_input_not_modified? It seems unrelated to the PR (it also passes on master, I think).

@TomDLT
Member Author

TomDLT commented Oct 1, 2018

For this bug fix, the non-regression test is in test_dbscan_sparse_precomputed.
I also added a new test, test_dbscan_input_not_modified, to be sure that we don't modify the input. Indeed, we change the input in place, modifying the internal representation of the sparse matrix, so I wanted to make sure we don't also accidentally modify the data.
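The distinction between a sparse matrix's internal representation and the data it represents can be sketched with a toy COO container. ToyCOO and fit_like_dbscan are hypothetical stand-ins, not scipy or scikit-learn APIs; sum_duplicates here only imitates the idea of scipy.sparse's in-place method of the same name. The check follows the spirit of test_dbscan_input_not_modified as described above: the internal lists may be rewritten, but the represented values must stay the same.

```python
class ToyCOO:
    """Hypothetical minimal COO matrix: parallel lists of rows, cols, data."""

    def __init__(self, shape, rows, cols, data):
        self.shape = shape
        self.rows, self.cols, self.data = list(rows), list(cols), list(data)

    def sum_duplicates(self):
        """Merge repeated (row, col) entries in place and sort them.

        This rewrites the internal lists but keeps the represented matrix.
        """
        merged = {}
        for r, c, v in zip(self.rows, self.cols, self.data):
            merged[(r, c)] = merged.get((r, c), 0) + v
        keys = sorted(merged)
        self.rows = [r for r, _ in keys]
        self.cols = [c for _, c in keys]
        self.data = [merged[k] for k in keys]

    def toarray(self):
        """Dense list-of-lists view; duplicate entries are summed."""
        n_rows, n_cols = self.shape
        dense = [[0] * n_cols for _ in range(n_rows)]
        for r, c, v in zip(self.rows, self.cols, self.data):
            dense[r][c] += v
        return dense


def fit_like_dbscan(X):
    """Hypothetical stand-in for an estimator that normalises X's internals in place."""
    X.sum_duplicates()


# The (1, 0) entry is stored twice; the represented value there is 0.5.
X = ToyCOO((2, 2), rows=[1, 0, 1], cols=[0, 1, 0], data=[0.25, 0.5, 0.25])
before = X.toarray()
internals_before = (X.rows[:], X.cols[:], X.data[:])

fit_like_dbscan(X)

# Internals changed, but the data the matrix represents did not.
assert X.toarray() == before
print(internals_before, "->", (X.rows, X.cols, X.data))
```

Comparing dense views rather than internal arrays is what makes such a test tolerant of in-place normalisation while still catching real changes to the values.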

@qinhanmin2014
Member

Fair enough, thanks.

alashkov83 added a commit to alashkov83/hydrocluster that referenced this pull request Nov 23, 2018
…ved. See scikit-learn/scikit-learn#12105

2) Noise filter option '' was changed to 'comb'
3) NZ atom of Arg was removed in 'rekkergroup'
4) The command-line interface of testis has been changed to options in an hjson-based config file
5) Screenshots were changed
6) Articles have been ordered
7) Minor bugs fixed and improved functionality