[MRG+2] make_circles() now works with odd number of samples, test added #10045
…ther generated points lie on the expected circles similar to test_make_moons()
LGTM
    inner_circ_y = outer_circ_y * factor
    # so as not to have the first point = last point, we set endpoint=False
    linspace_out = np.linspace(0, 2 * np.pi, n_samples_out, endpoint=False)
    linspace_in = np.linspace(0, 2 * np.pi, n_samples_in, endpoint=True)
Why endpoint=True in the second?
Thanks. I feel like I thought it might have been necessary, but I cannot think of a proper reason right now. I'll change it to endpoint=False. Thanks.
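To illustrate the endpoint question, here is a minimal, dependency-free sketch. The helpers `circle_angles` and `circle_points` are hypothetical stand-ins for `np.linspace(0, 2 * np.pi, n, endpoint=...)` followed by `cos`/`sin`; they are not the code in this PR. With endpoint=True the last angle is 2π, so the last point lands on top of the first one.

```python
import math


def circle_angles(n, endpoint):
    """Hypothetical stand-in for np.linspace(0, 2*pi, n, endpoint=endpoint)."""
    if n == 0:
        return []
    # With endpoint=True the interval is divided into n - 1 steps,
    # so the final angle is exactly 2*pi; otherwise into n steps.
    denom = (n - 1) if endpoint else n
    if denom == 0:
        return [0.0]
    step = 2 * math.pi / denom
    return [i * step for i in range(n)]


def circle_points(n, endpoint, radius=1.0):
    """Points on a circle of the given radius centred at (0, 0)."""
    return [(radius * math.cos(a), radius * math.sin(a))
            for a in circle_angles(n, endpoint)]


closed = circle_points(4, endpoint=True)
open_ = circle_points(4, endpoint=False)

# With endpoint=True the first and last points coincide (up to rounding).
first_last_coincide = (
    math.isclose(closed[0][0], closed[-1][0], abs_tol=1e-12)
    and math.isclose(closed[0][1], closed[-1][1], abs_tol=1e-12)
)

# With endpoint=False all points are distinct.
distinct_open = len({(round(x, 9), round(y, 9)) for x, y in open_})
```

So with endpoint=True one of the requested samples is effectively a duplicate, which is the reason to use endpoint=False for both circles.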
def test_make_circles():
    f = 0.3
    X, y = make_circles(7, shuffle=False, noise=None, factor=f)
Thank you for the PR!
It might be useful to test with both n_samples odd and even, to make sure this change doesn't break the previous behaviour.
Thanks for the advice!
I'll also add an additional test then, which tests whether the samples were correctly distributed across both classes.
Apart from not testing the even case, I think the current test looks fine...?
Please fix the PEP8 error to pass the test :)
@@ -613,20 +613,23 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,

    if factor > 1 or factor < 0:
        raise ValueError("'factor' has to be between 0 and 1.")
./sklearn/datasets/samples_generator.py:616:1: W293 blank line contains whitespace
Thanks.
It seems PyCharm is not the solution to every PEP8-related issue. :/
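For context, W293 flags blank lines that still contain spaces or tabs. A rough, hypothetical helper (not part of the PR or of any linter) to find such lines before pushing could look like this:

```python
def whitespace_only_lines(source):
    """Return 1-based numbers of lines that contain only whitespace.

    Truly empty lines are fine; lines made of spaces or tabs are
    the ones a W293-style check complains about.
    """
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line and not line.strip()
    ]


sample = "a = 1\n    \n\nb = 2\n"
flagged = whitespace_only_lines(sample)
```

Here only the second line of `sample` is reported, because the third line is empty rather than whitespace-only.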
you could add another test, but looks good either way.
@@ -614,19 +614,22 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
    if factor > 1 or factor < 0:
        raise ValueError("'factor' has to be between 0 and 1.")
do you want to add a test for this line, too? ;)
I actually never understood why a factor > 1 raises a ValueError. All it would do is turn the inner circle into the outer one, allowing me to generate the samples more intuitively. Compare "the one circle's radius is 0.142 of the other one's" vs. "the bigger circle is 7 times the size of the smaller one".
It would remove the dataset's property of lying (roughly, due to noise) within a [-1, 1]² box, though...
I think leave it as is. Test if you will
@christianbraune79 I think 0 < factor < 1 might be enough. From my perspective, you can use factor to specify how much bigger the big circle is, and then apply operations on X to set the size and position of the circles. For the example you mentioned, you could use X, y = make_circles(factor=1/7); X = 7 * X to achieve your goal. Even if factor > 1 were allowed, it still would not cover many situations, because the centers of the two circles are fixed to (0, 0) and the radius of one circle is fixed to 1.
Also, as the core dev suggested, it will be better if you can add a test to check the error is raised.
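The rescaling suggestion can be checked with a small stdlib-only sketch. `two_circles` is a hypothetical stand-in for the noise-free geometry of make_circles (outer radius 1, inner radius `factor`, both centred at (0, 0)); it is an assumption for illustration, not the library function.

```python
import math


def two_circles(n_out, n_in, factor):
    """Hypothetical noise-free geometry: outer radius 1, inner radius `factor`."""
    def ring(n, radius):
        return [
            (radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n))
            for i in range(n)
        ]
    return ring(n_out, 1.0) + ring(n_in, factor)


# Generate with factor=1/7, then scale every point by 7.
points = two_circles(4, 4, factor=1 / 7)
scaled = [(7 * x, 7 * y) for x, y in points]

# Distinct radii after scaling (rounded to absorb floating-point noise).
radii = sorted({round(math.hypot(x, y), 9) for x, y in scaled})
```

After scaling, the two rings have radii 1 and 7, i.e. "the bigger circle is 7 times the size of the smaller one", without needing factor > 1.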
    outer_circ_y = np.sin(linspace)
    inner_circ_x = outer_circ_x * factor
    inner_circ_y = outer_circ_y * factor
    # so as not to have the first point = last point, we set endpoint=False
You're so much smarter than me....
    assert_equal(X.shape, (10, 2), "X shape mismatch")
    assert_equal(y.shape, (10,), "y shape mismatch")

    assert_equal(X[y == 0].shape, (5, 2),
If you want you could add a test that it's 3 and 4 for the test above? maybe that it's at most one different from 7 // 2? but fine either way.
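A minimal sketch of the split being discussed, under the assumption that the outer circle gets n_samples // 2 points and the inner circle gets the remainder (the names n_samples_out and n_samples_in come from the diff; the exact formula is an assumption here, and `split_samples` is a hypothetical helper):

```python
def split_samples(n_samples):
    """Assumed split: floor half to the outer circle, the rest to the inner."""
    n_samples_out = n_samples // 2
    n_samples_in = n_samples - n_samples_out
    return n_samples_out, n_samples_in


odd_split = split_samples(7)
even_split = split_samples(10)

# The two class sizes always add up to n_samples and differ by at most one.
sizes_ok = all(
    sum(split_samples(n)) == n
    and abs(split_samples(n)[0] - split_samples(n)[1]) <= 1
    for n in range(0, 50)
)
```

Under this assumption, 7 samples split into 3 and 4, and 10 samples split into 5 and 5, which matches the shapes asserted in the even-case test.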
Please add an entry to the change log at |
Added a test to check if really only factors in (0, 1) (excluding borders) are accepted.
Adjusted factor check (1.0 was accepted before, though doc said otherwise).
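A standalone sketch of the adjusted check and of a test that the error is raised. `check_factor` and `raises_value_error` are hypothetical helpers that mirror the condition and message shown in the diff; they are not the actual test in the PR.

```python
def check_factor(factor):
    """Mirror of the adjusted condition from the diff: factor >= 1 or factor < 0."""
    if factor >= 1 or factor < 0:
        raise ValueError("'factor' has to be between 0 and 1.")
    return factor


def raises_value_error(value):
    """Return True if check_factor rejects the value."""
    try:
        check_factor(value)
    except ValueError:
        return True
    return False


rejected = [value for value in (-0.1, 1.0, 7) if raises_value_error(value)]
accepted = [value for value in (0, 0.3, 0.999) if not raises_value_error(value)]
```

Note that, as written in the diff, the condition rejects 1.0 but still lets 0 through.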
Codecov Report
@@ Coverage Diff @@
## master #10045 +/- ##
==========================================
+ Coverage 96.19% 96.19% +<.01%
==========================================
Files 336 336
Lines 62725 62743 +18
==========================================
+ Hits 60336 60354 +18
Misses 2389 2389
…g was already mentioned in there.
LGTM except for something to confirm from core devs. Also wondering if it will be helpful to explicitly state that the center of the two circles is fixed to (0,0) and the radius of the outer circle is 1.
@@ -611,22 +612,25 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
    The integer labels (0 or 1) for class membership of each sample.
    """

-    if factor > 1 or factor < 0:
+    if factor >= 1 or factor < 0:
Personally, I'm with you. But this latest source code change is not reviewed by anyone else. So need to confirm with core devs to see whether we need factor=1 in extreme case.
Yes, definitely. If it was a typo in the docs then I'll revert the change and adjust the documentation. :)
I think this is fine.
Otherwise LGTM
doc/whats_new/_contributors.rst
Outdated
@@ -141,3 +141,5 @@
.. _Neeraj Gangwar: http://neerajgangwar.in

.. _Arthur Mensch: https://amensch.fr

.. _Christian Braune: https://github.com/christianbraune79
This is unused and unnecessary. These days we try to only use this format for core devs...
@christianbraune79 Could you please take some time to finish this? It's very close to merge.
@qinhanmin2014 |
3 nitpicks more :)
def test_make_circles():
    f = 0.3
can you call this factor instead of f
or you can even assign 0.3 to factor inside the function itself.
def test_make_circles():
    f = 0.3

    # Testing odd and even case, because in the past make_circles always
you can put this comment just below the def
@christianbraune79 Thanks for the issue and the PR :) |
Reference Issues/PRs
Fixes #10037 and adds corresponding tests
What does this implement/fix? Explain your changes.
Fixes the faulty behaviour of make_circles when n_samples is an odd number. Adds test test_make_circles to datasets/tests/test_samples_generator.py.
Any other comments?