Conversation

@SomeUserName1 SomeUserName1 commented Jul 19, 2019

Reference Issues/PRs

None yet.

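from sklearn.cluster import OPTICS

data = load_some_data()  # placeholder: any (n_samples, n_features) array

clust = OPTICS(metric='minkowski', n_jobs=-1, min_samples=0.1)
clust.fit(data)  # raises the TypeError shown below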

What does this implement/fix? Explain your changes.

When min_samples is passed as a float to OPTICS, lines 439 and 440 of optics_.py scale it to an absolute number of samples but do not convert it to int:

    if min_samples <= 1:
        min_samples = max(2, min_samples * n_samples)           # Still a float

The NearestNeighbors class is then instantiated with this float (line 448), and fitting it raises because n_neighbors is not an integer.

Error message:

  File "/home/someusername/anaconda3/envs/bachelor_project/lib/python3.7/site-packages/sklearn/cluster/optics_.py", line 248, in fit
    max_eps=self.max_eps)
  File "/home/someusername/anaconda3/envs/bachelor_project/lib/python3.7/site-packages/sklearn/cluster/optics_.py", line 456, in compute_optics_graph
    nbrs.fit(X)
  File "/home/someusername/anaconda3/envs/bachelor_project/lib/python3.7/site-packages/sklearn/neighbors/base.py", line 930, in fit
    return self._fit(X)
  File "/home/someusername/anaconda3/envs/bachelor_project/lib/python3.7/site-packages/sklearn/neighbors/base.py", line 275, in _fit
    type(self.n_neighbors))
TypeError: n_neighbors does not take <class 'numpy.float64'> value, enter integer value
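
The failure can be reproduced without scikit-learn. The sketch below is only an illustration: `resolve_min_samples` copies the two quoted lines, and `check_n_neighbors` is a hypothetical stand-in for the integer check that produces the error message above, not the library's actual code.

```python
import numbers


def resolve_min_samples(min_samples, n_samples):
    """Copy of the pre-fix logic: scale a fraction, but never cast to int."""
    if min_samples <= 1:
        min_samples = max(2, min_samples * n_samples)  # still a float
    return min_samples


def check_n_neighbors(n_neighbors):
    """Hypothetical stand-in for the integer check on n_neighbors."""
    if not isinstance(n_neighbors, numbers.Integral):
        raise TypeError(
            "n_neighbors does not take %s value, enter integer value"
            % type(n_neighbors)
        )
    return n_neighbors


resolved = resolve_min_samples(0.1, 200)
print(type(resolved))  # a float, because 0.1 * 200 is a float product

try:
    check_n_neighbors(resolved)
except TypeError as exc:
    print("rejected:", exc)
```

Note that when the product is below 2, `max` returns the integer literal 2, so the error only appears once the fraction of samples reaches 2 or more.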

Fix:

    if min_samples <= 1:
        min_samples = int(round(max(2, min_samples * n_samples)))        # round to get the closest integer

The int(...) is for backwards compatibility with Python 2, where round: T -> T for a Number type T, while in Python 3 round: T -> int.
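
Whether the value is rounded or truncated changes the result when the scaled value is not a whole number. A minimal sketch of the two conversions, with hypothetical helper names that are not part of scikit-learn:

```python
def to_int_round(min_samples, n_samples):
    """Conversion proposed in the fix: round to the nearest integer."""
    return int(round(max(2, min_samples * n_samples)))


def to_int_truncate(min_samples, n_samples):
    """Alternative conversion: truncate toward zero with a plain int(...)."""
    return max(2, int(min_samples * n_samples))


# 0.1 * 37 is roughly 3.7, so the two conversions disagree here.
rounded = to_int_round(0.1, 37)
truncated = to_int_truncate(0.1, 37)
print(rounded, truncated)
```

Both helpers return a Python int, so either would satisfy an integer check on n_neighbors; they differ only in which neighbourhood size a given fraction maps to.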

Any other comments?

@SomeUserName1 SomeUserName1 changed the title BUG: Optics float min_samples NN instantiation [BUG] Optics float min_samples NN instantiation Jul 19, 2019
@qinhanmin2014
Member

thanks for spotting this
(1) OPTICS was introduced in 0.21, so we don't need to consider python2. maybe use int(...) directly?
(2) please fix similar issues in cluster_optics_xi
(3) please update the doc of min_samples in compute_optics_graph
(4) please add some tests
(5) please add what's new

@qinhanmin2014 qinhanmin2014 added this to the 0.21.3 milestone Jul 20, 2019
@SomeUserName1
Author

SomeUserName1 commented Jul 20, 2019

Where shall the what's new go? (this PR, the commit message, ...)? Actually it's just the expected behavior, given the documentation

Regarding the test:
I couldn't think of a test that checks for the (no longer occurring) error other than running OPTICS with floating point values for min_samples and min_cluster_size and asserting that it ran ...
Is comparing with an integer parameter example possible?
(I thought the epsilon selection and different choices in initialization would make the algorithm, and especially the labeling, non-deterministic but bijective. With more time reading the existing tests I'll probably figure it out.)

Advice is very welcome!

@SomeUserName1 SomeUserName1 force-pushed the optics_convert_int branch 10 times, most recently from 3d0fbf8 to f68dead Compare July 20, 2019 21:52
Member

@qinhanmin2014 qinhanmin2014 left a comment

thanks, please try to avoid irrelevant changes, this will make this PR much easier to review.

@qinhanmin2014
Member

> Where shall the what's new go?

Please add an entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@qinhanmin2014
Member

ping me when you are ready for another review. please avoid irrelevant changes.

@SomeUserName1
Author

Just added the what's new part, ready for review

@SomeUserName1
Author

ping

Member

@rth rth left a comment

Thanks for looking into it !

Author

@SomeUserName1 SomeUserName1 left a comment

Integrated changes

Member

@rth rth left a comment

A few comments, otherwise LGTM.

@rth
Member

rth commented Jul 22, 2019

Also please resolve conflicts.

@jnothman
Member

@SomeUserName1, are you able to respond to the reviews to complete this work? We would like to include it in 0.21.3 which should be released next week.

@jnothman jnothman mentioned this pull request Jul 24, 2019
@SomeUserName1
Author

Have a presentation tomorrow concerning my bachelor's.
I'm going to do it over the weekend (I think it'll already be finished by Friday).

@jnothman
Member

We're going to be releasing 0.21.3 in the coming week, so an update here would be great.

@SomeUserName1
Author

Updated

Member

@jnothman jnothman left a comment

Otherwise lgtm

C3 = [[100, 100], [100, 96], [100, 106]]
X = np.vstack((C1, C2, C3))

expected_labels = np.r_[[0] * 3, [1] * 3, [2] * 3]
Member

If the point is to test the equivalence between float and int values, then you should be comparing the results of different OPTICS runs to each other rather than comparing to a ground truth again...
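
A comparison of that kind might look like the sketch below. It is not the test that was merged: the data set is a hypothetical toy example, and the fraction is chosen so that 0.25 * 40 is exactly 10, which avoids any dependence on rounding versus truncation.

```python
import numpy as np
from numpy.testing import assert_array_equal
from sklearn.cluster import OPTICS

# Hypothetical toy data: three well-separated blobs, 40 points in total.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([
    centers[0] + rng.randn(14, 2),
    centers[1] + rng.randn(13, 2),
    centers[2] + rng.randn(13, 2),
])
n_samples = X.shape[0]

# A fraction of 0.25 on 40 samples should behave like min_samples=10.
clust_frac = OPTICS(min_samples=0.25).fit(X)
clust_int = OPTICS(min_samples=10).fit(X)

# Compare the two runs to each other instead of to a ground truth.
assert_array_equal(clust_frac.labels_, clust_int.labels_)
assert_array_equal(clust_frac.ordering_, clust_int.ordering_)
```

Because both fits see the same data and the same effective neighbourhood size, any difference in `labels_` or `ordering_` would point at the float handling rather than at the clustering itself.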

Member

I don't want to introduce a new test for 0.21.3, I think we can make use of existing tests.

Member

@qinhanmin2014 qinhanmin2014 left a comment

I feel that there're still lots of irrelevant changes. I'm going to open a small PR to finish this one because we're going to release very soon. I'll mention the contributor in what's new. Apologies if I make someone unhappy.

``tol`` required too strict types. :pr:`14092` by
:user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.cluster`
Member

we already have this section

        return size
    elif 0 < size <= 1:
        return max(2, int(size * n_samples))
    else:
Member

can we keep this function as it is and add int when needed?

4 participants