[MRG] Fix diagonal in DBSCAN with precomputed sparse neighbors graph #12105

TomDLT · 2018-09-18T14:22:34Z

This is a bug fix on DBSCAN.
It is also included in #10482, but I isolated it here to help the reviews.

I fixed a bug in DBSCAN: when the neighbors graph was precomputed, DBSCAN assumed it had been computed with include_self=False and added the diagonal itself to be equivalent to include_self=True. In particular, if the diagonal was already present, it was added a second time. With this fix, graphs computed with include_self=True and include_self=False give equivalent results. This will break some existing code, but I consider it a bug fix.
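To make the double counting concrete, here is a minimal sketch using only NumPy and SciPy. It is not the scikit-learn implementation: the helper `neighborhood_sizes`, its `always_add_self` flag, and the three-point toy graph are hypothetical names and data chosen for illustration. It counts eps-neighbors per row of a sparse precomputed distance graph, once while blindly appending the point itself and once while counting the point exactly once.

```python
import numpy as np
from scipy import sparse


def neighborhood_sizes(graph, eps, always_add_self):
    """Count eps-neighbors per row of a sparse precomputed distance graph.

    Hypothetical helper for illustration only, not scikit-learn code.
    """
    graph = graph.tocsr()
    sizes = []
    for i in range(graph.shape[0]):
        start, stop = graph.indptr[i], graph.indptr[i + 1]
        within_eps = graph.data[start:stop] <= eps
        neighbors = graph.indices[start:stop][within_eps].tolist()
        if always_add_self:
            # Assumes the diagonal is never stored, so the point is appended
            # even if it is already listed as its own neighbor.
            neighbors.append(i)
            sizes.append(len(neighbors))
        else:
            # Count the point itself exactly once, stored or not.
            sizes.append(len(set(neighbors) | {i}))
    return sizes


eps = 0.6

# Three points on a line; only pairs within eps are stored.
# Equivalent to a graph built with include_self=False.
without_self = sparse.csr_matrix(
    (np.array([0.5, 0.5, 0.5, 0.5]),
     np.array([1, 0, 2, 1]),
     np.array([0, 1, 3, 4])),
    shape=(3, 3),
)

# Same graph with the diagonal stored as explicit zeros,
# equivalent to include_self=True.
with_self = sparse.csr_matrix(
    (np.array([0.0, 0.5, 0.5, 0.0, 0.5, 0.5, 0.0]),
     np.array([0, 1, 0, 1, 2, 1, 2]),
     np.array([0, 2, 5, 7])),
    shape=(3, 3),
)

old_without = neighborhood_sizes(without_self, eps, always_add_self=True)
old_with = neighborhood_sizes(with_self, eps, always_add_self=True)
new_without = neighborhood_sizes(without_self, eps, always_add_self=False)
new_with = neighborhood_sizes(with_self, eps, always_add_self=False)
```

With the unconditional append, the graph that stores its diagonal reports one extra neighbor per point ([3, 4, 3] instead of [2, 3, 2]), so with min_samples=3 every point would look like a core point. Counting the point once gives [2, 3, 2] for both graphs.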

jnothman

I guess after sum_duplicates, this is reasonable.

ogrisel

This LGTM as well.

Shall this be backported to 0.20.1? If so the whats_new.rst needs to be changed accordingly.

ogrisel · 2018-09-27T08:08:06Z

sklearn/cluster/tests/test_dbscan.py

+    dbscan(X, metric=metric)
+
+    if use_sparse:
+        assert_array_equal(X.A, X_copy.A)


nitpick: I don't find the use of X.A explicit enough. I would prefer assert_array_equal(X.toarray(), X_copy.toarray()).

jnothman · 2018-09-27T13:07:56Z

This is neither a critical issue, nor a regression in 0.20, so I don't really see why it should be in 0.20.1 if that takes effort, but I don't mind much.

jnothman

(I didn't expect you to revert, I was just giving my 2c on when I think it's worth backporting and when not. If we want to see 0.20 as an LTS for Py2 support, then we should backport most bug fixes...?)

TomDLT · 2018-09-27T13:34:55Z

I agree this PR belongs more to 0.21 than 0.20. This is why I reverted my whatsnew entry commit.

Then, the question is indeed: "Do we want to backport most bug fixes to 0.20.x?"

qinhanmin2014 · 2018-09-27T14:59:49Z

I prefer to include more bug fixes in 0.20.1:
(1) 0.20 is our last version supporting Python 2, so including more bug fixes would be user-friendly.
(2) 0.21 is unlikely to be released very soon, so I guess we don't want users complaining about known bugs.

ogrisel · 2018-09-28T07:57:03Z

As long as they are easy to backport, yes.

ogrisel · 2018-09-28T14:22:19Z

Thanks!

qinhanmin2014 · 2018-09-28T15:25:23Z

We put the what's new entry in the wrong section; I'll push a commit to correct it.
Btw, @TomDLT, can you explain why we need test_dbscan_input_not_modified? It seems unrelated to the PR (it also passes on master, I think).

TomDLT · 2018-10-01T08:50:54Z

> Can you explain why we need test_dbscan_input_not_modified? It seems unrelated to the PR (it also passes on master, I think).

For this bug fix, the non-regression test is in test_dbscan_sparse_precomputed.
I also added a new test, test_dbscan_input_not_modified, to make sure we don't modify the input. We do change the input in place, modifying the internal representation of the sparse matrix, so I wanted to make sure we don't also accidentally modify the data.
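The distinction between changing a sparse matrix's internal representation and changing its values can be sketched as follows. This is an illustration under assumptions, not the test added in this PR: the toy matrix is made up, and `sum_duplicates` stands in for whatever in-place normalization DBSCAN performs. The comparison uses `toarray()`, as suggested in the review above.

```python
import numpy as np
from scipy import sparse

# A 2x2 CSR matrix that stores entry (0, 0) twice (1.0 and 2.0).
# Duplicate entries are legal in CSR and are summed when densified.
X = sparse.csr_matrix(
    (np.array([1.0, 2.0]), np.array([0, 0]), np.array([0, 2, 2])),
    shape=(2, 2),
)
X_copy = X.copy()

nnz_before = X.nnz
X.sum_duplicates()  # modifies X's internal arrays in place
nnz_after = X.nnz

# The stored representation changed, but the values did not.
values_unchanged = np.array_equal(X.toarray(), X_copy.toarray())
```

Here the number of stored entries drops from 2 to 1, while the dense values stay identical, which is the property a test like test_dbscan_input_not_modified is meant to guard.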

qinhanmin2014 · 2018-10-01T09:00:55Z

> For this bug fix, the non-regression test is in test_dbscan_sparse_precomputed.
> I also added a new test, test_dbscan_input_not_modified, to make sure we don't modify the input. We do change the input in place, modifying the internal representation of the sparse matrix, so I wanted to make sure we don't also accidentally modify the data.

Fair enough, thanks.

FIX diagonal in DBSCAN with precomputed sparse neighbors graph

d727d90

TomDLT changed the title ~~FIX diagonal in DBSCAN with precomputed sparse neighbors graph~~ [MRG] Fix diagonal in DBSCAN with precomputed sparse neighbors graph Sep 18, 2018

TomDLT mentioned this pull request Sep 19, 2018

FEA Generalize the use of precomputed sparse distance matr… #10482

Merged

jnothman approved these changes Sep 19, 2018

TomDLT added the Bug label Sep 24, 2018

ogrisel approved these changes Sep 27, 2018

TomDLT added 4 commits September 27, 2018 14:32

nitpick

b8c6f3f

CLN

ebaa8f7

Merge branch 'master' into dbscan_bug

69fb986

move whatsnew entry

1b6b985

Revert "move whatsnew entry"

c627630

This reverts commit 1b6b985.

jnothman reviewed Sep 27, 2018

E261 at least two spaces before inline comment

437586a

add bug fix entry in 0.20.1

03297fa

ogrisel merged commit 819d8ef into scikit-learn:master Sep 28, 2018

ogrisel deleted the dbscan_bug branch September 28, 2018 14:22

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Oct 15, 2018

[MRG] Fix diagonal in DBSCAN with precomputed sparse neighbors graph (s…

4bd468a

…cikit-learn#12105)

peay mentioned this pull request Mar 29, 2019

[MRG] Explicitly ignore SparseEfficiencyWarning in DBSCAN #13539

Merged
