[BUG] Optics float min_samples NN instantiation #14421

SomeUserName1 · 2019-07-19T22:01:40Z

Reference Issues/PRs

None yet.

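    from sklearn.cluster import OPTICS

    data = load_some_data()  # placeholder: any array of shape (n_samples, n_features)

    clust = OPTICS(metric='minkowski', n_jobs=-1, min_samples=0.1)
    clust.fit(data)  # raises the TypeError shown below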

What does this implement/fix? Explain your changes.

When min_samples is passed to OPTICS as a float, lines 439 and 440 scale it into an absolute sample count but do not convert the result to an int:

    if min_samples <= 1:
        min_samples = max(2, min_samples * n_samples)           # Still a float

The resulting float is then passed to NearestNeighbors as n_neighbors (l448), which raises because an integer is required.

Error message:

  File "/home/someusername/anaconda3/envs/bachelor_project/lib/python3.7/site-packages/sklearn/cluster/optics_.py", line 248, in fit
    max_eps=self.max_eps)
  File "/home/someusername/anaconda3/envs/bachelor_project/lib/python3.7/site-packages/sklearn/cluster/optics_.py", line 456, in compute_optics_graph
    nbrs.fit(X)
  File "/home/someusername/anaconda3/envs/bachelor_project/lib/python3.7/site-packages/sklearn/neighbors/base.py", line 930, in fit
    return self._fit(X)
  File "/home/someusername/anaconda3/envs/bachelor_project/lib/python3.7/site-packages/sklearn/neighbors/base.py", line 275, in _fit
    type(self.n_neighbors))
TypeError: n_neighbors does not take <class 'numpy.float64'> value, enter integer value

Fix:

    if min_samples <= 1:
        min_samples = int(round(max(2, min_samples * n_samples)))        # round to get the closest integer

The int(...) is for backwards compatibility with Python 2, where round: T -> T for a Number T, while in Python 3 round: T -> int.
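
The conversion described above can be sketched in isolation, without scikit-learn. This is only an illustration: `resolve_min_samples` is a hypothetical helper name, not a function from the library, and the sample count of 150 is arbitrary.

```python
def resolve_min_samples(min_samples, n_samples):
    """Hypothetical helper mirroring the proposed fix (not library code)."""
    if min_samples <= 1:
        # Interpret the value as a fraction of the dataset, enforce the
        # floor of 2, and convert the result to a plain int.
        min_samples = int(round(max(2, min_samples * n_samples)))
    return min_samples


# Behaviour before the fix: the scaled value stays a float.
buggy = max(2, 0.1 * 150)

# Behaviour after the fix: the scaled value is an int.
fixed = resolve_min_samples(0.1, 150)

print(type(buggy).__name__, type(fixed).__name__)
```

Values above 1 pass through unchanged, and very small fractions fall back to the floor of 2.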

Any other comments?

qinhanmin2014 · 2019-07-20T15:02:00Z

thanks for spotting this
(1) OPTICS was introduced in 0.21, so we don't need to consider python2. maybe use int(...) directly?
(2) please fix similar issues in cluster_optics_xi
(3) please update the doc of min_samples in compute_optics_graph
(4) please add some tests
(5) please add what's new
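
Regarding point (1): using int(...) directly is not quite the same as int(round(...)), because int truncates toward zero while round picks the nearest integer. A small sketch with an arbitrary sample count (37 is an assumption chosen only to make the difference visible):

```python
n_samples = 37
scaled = 0.1 * n_samples  # roughly 3.7

# int(...) applied directly: truncates the fractional part.
truncated = max(2, int(scaled))

# int(round(...)): rounds to the nearest integer first.
rounded = int(round(max(2, scaled)))

print(truncated, rounded)
```

Either choice yields an int, so both avoid the TypeError; they only differ in which neighbor count is selected when the scaled value is not a whole number.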

SomeUserName1 · 2019-07-20T15:46:26Z

Where shall the what's new go? (this PR, the commit message, ...)? Actually it's just the expected behavior, given the documentation

Regarding the test:
I couldn't think of a test for the (no longer occurring) error other than running OPTICS with floating-point values for min_samples and min_cluster_size and asserting that it ran...
Is comparing with an integer parameter example possible?
(I thought the epsilon selection and different choices in initialization would make the algorithm, and especially the labeling, non-deterministic but bijective. With more time reading the existing tests I'll probably figure it out.)

Advice is very welcome!

qinhanmin2014

thanks, please try to avoid irrelevant changes, this will make this PR much easier to review.

sklearn/cluster/optics_.py

sklearn/cluster/tests/test_optics.py

qinhanmin2014 · 2019-07-21T11:00:45Z

Where shall the what's new go?

Please add an entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

qinhanmin2014 · 2019-07-21T12:10:46Z

ping when you are ready for another review. please avoid irrelevant changes.


SomeUserName1 · 2019-07-21T12:30:58Z

Just added the what's new part, ready for review

SomeUserName1 · 2019-07-21T18:12:18Z

ping

rth

Thanks for looking into it !

sklearn/cluster/optics_.py

sklearn/cluster/tests/test_optics.py

SomeUserName1

Integrated changes

rth

A few comments, otherwise LGTM.

sklearn/cluster/optics_.py

rth · 2019-07-22T13:59:38Z

Also please resolve conflicts.

Co-Authored-By: Roman Yurchak <[email protected]>

sklearn/cluster/tests/test_optics.py

sklearn/cluster/optics_.py

jnothman · 2019-07-24T08:16:32Z

@SomeUserName1, are you able to respond to the reviews to complete this work? We would like to include it in 0.21.3 which should be released next week.

SomeUserName1 · 2019-07-24T11:28:40Z

I have a presentation tomorrow for my bachelor's.
I'm going to do it over the weekend (I think it'll already be finished by Friday).

jnothman · 2019-07-27T09:07:38Z

We're going to be releasing 0.21.3 in the coming week, so an update here would be great.

SomeUserName1 · 2019-07-27T15:47:23Z

Updated

jnothman

Otherwise lgtm

jnothman · 2019-07-28T00:59:14Z

sklearn/cluster/tests/test_optics.py

+    C3 = [[100, 100], [100, 96], [100, 106]]
+    X = np.vstack((C1, C2, C3))
+
+    expected_labels = np.r_[[0] * 3, [1] * 3, [2] * 3]


If the point is to test the equivalence between float and int values, then you should be comparing the results of different OPTICS runs to each other rather than comparing to a ground truth again...
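
A minimal sketch of such a comparison, assuming a scikit-learn release that includes the fix (0.21.3 or later) and that a fractional min_samples is converted as int(min_samples * n_samples). The data, the seed, and the values 0.25 and 10 are illustrative assumptions, chosen so that 0.25 * 40 is exactly 10.

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.RandomState(0)

# Three well-separated blobs, 40 points in total (illustrative data only).
X = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(n, 2))
    for center, n in [((0, 0), 14), ((10, 10), 13), ((20, 0), 13)]
])
n_samples = X.shape[0]

# Same neighborhood size expressed as a fraction and as an absolute count.
clust_frac = OPTICS(min_samples=0.25).fit(X)
clust_int = OPTICS(min_samples=10).fit(X)

# The two runs should agree with each other; no ground truth is involved.
assert np.array_equal(clust_frac.labels_, clust_int.labels_)
```

Comparing the two fitted estimators directly checks the float handling itself rather than the clustering quality on a particular dataset.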


qinhanmin2014

I feel that there're still lots of irrelevant changes. I'm going to open a small PR to finish this one because we're going to release very soon. I'll mention the contributor in what's new. Apologies if I make someone unhappy.

qinhanmin2014 · 2019-07-28T13:24:04Z

doc/whats_new/v0.21.rst

  ``tol`` required too strict types. :pr:`14092` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

+:mod:`sklearn.cluster`


we already have this section

qinhanmin2014 · 2019-07-28T13:25:48Z

sklearn/cluster/optics_.py

+        return size
+    elif 0 < size <= 1:
+        return max(2, int(size * n_samples))
+    else:


can we keep this function as it is and add int when needed?

qinhanmin2014 · 2019-07-28T13:27:31Z

sklearn/cluster/tests/test_optics.py

+    C3 = [[100, 100], [100, 96], [100, 106]]
+    X = np.vstack((C1, C2, C3))
+
+    expected_labels = np.r_[[0] * 3, [1] * 3, [2] * 3]


I don't want to introduce a new test for 0.21.3, I think we can make use of existing tests.

SomeUserName1 changed the title ~~BUG: Optics float min_samples NN instantiation~~ [BUG] Optics float min_samples NN instantiation Jul 19, 2019

qinhanmin2014 added the Blocker label Jul 20, 2019

qinhanmin2014 added this to the 0.21.3 milestone Jul 20, 2019

SomeUserName1 force-pushed the optics_convert_int branch 10 times, most recently from 3d0fbf8 to f68dead Compare July 20, 2019 21:52

qinhanmin2014 reviewed Jul 21, 2019

sklearn/cluster/optics_.py

sklearn/cluster/optics_.py

sklearn/cluster/tests/test_optics.py

SomeUserName1 force-pushed the optics_convert_int branch from f68dead to 77a9a10 Compare July 21, 2019 12:06

SomeUserName1 force-pushed the optics_convert_int branch from 77a9a10 to a1d5e34 Compare July 21, 2019 12:30

rth suggested changes Jul 22, 2019

sklearn/cluster/optics_.py (outdated)

sklearn/cluster/tests/test_optics.py (outdated)

sklearn/cluster/tests/test_optics.py (outdated)

sklearn/cluster/tests/test_optics.py (outdated)

requested changes

c785064

SomeUserName1 commented Jul 22, 2019


rth reviewed Jul 22, 2019

sklearn/cluster/optics_.py (outdated)

sklearn/cluster/optics_.py (outdated)

sklearn/cluster/optics_.py (outdated)

sklearn/cluster/optics_.py (outdated)

Update sklearn/cluster/optics_.py

c028b33

Co-Authored-By: Roman Yurchak <[email protected]>

jnothman reviewed Jul 23, 2019

sklearn/cluster/tests/test_optics.py (outdated)

sklearn/cluster/tests/test_optics.py (outdated)

sklearn/cluster/tests/test_optics.py (outdated)

sklearn/cluster/optics_.py

jnothman mentioned this pull request Jul 24, 2019

[MRG] Release 0.21.3 #14188

Merged

merge doc/whats_nes/v0.21.rst

4111f7d

SomeUserName1 force-pushed the optics_convert_int branch from 03f0048 to 87e6ace Compare July 27, 2019 15:45

SomeUserName1 force-pushed the optics_convert_int branch from 87e6ace to 7ed1a77 Compare July 27, 2019 16:03

applied requested changes

e4ef829

SomeUserName1 force-pushed the optics_convert_int branch from 7ed1a77 to e4ef829 Compare July 27, 2019 17:21

jnothman approved these changes Jul 28, 2019


qinhanmin2014 reviewed Jul 28, 2019


qinhanmin2014 mentioned this pull request Jul 28, 2019

[MRG+1] FIX Support float min_samples and min_cluster_size in OPTICS #14496

Merged

jnothman closed this in #14496 Jul 29, 2019

SomeUserName1 deleted the optics_convert_int branch July 29, 2019 13:03
