[MRG+2] make_circles() now works with odd number of samples, test added #10045

Merged: 14 commits into scikit-learn:master on Nov 11, 2017

Conversation

christianbraune79
Contributor

Reference Issues/PRs

Fixes #10037 and adds corresponding tests

What does this implement/fix? Explain your changes.

Fixes the faulty behaviour of make_circles when n_samples is an odd number.
Adds test test_make_circles to datasets/tests/test_samples_generator.py.

Any other comments?

Member

@TomDLT TomDLT left a comment

LGTM

inner_circ_y = outer_circ_y * factor
# so as not to have the first point = last point, we set endpoint=False
linspace_out = np.linspace(0, 2 * np.pi, n_samples_out, endpoint=False)
linspace_in = np.linspace(0, 2 * np.pi, n_samples_in, endpoint=True)
Member

Why endpoint=True in the second?

Contributor Author

Thanks. I feel like I thought it might have been necessary, but I cannot think of a proper reason right now. I'll change it to endpoint=False.
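To illustrate the point behind the question above: on a closed curve, angle 0 and angle 2π are the same point, so including the endpoint puts two samples on top of each other. The sketch below is a pure-Python stand-in for np.linspace; the helpers circle_angles and circle_points are hypothetical names and not part of the PR. It only assumes evenly spaced angles.

```python
import math


def circle_angles(n, endpoint):
    """Pure-Python stand-in for np.linspace(0, 2 * pi, n, endpoint=endpoint)."""
    if n == 1:
        return [0.0]
    steps = n - 1 if endpoint else n
    return [2 * math.pi * i / steps for i in range(n)]


def circle_points(n, endpoint, radius=1.0):
    """Return n (x, y) points on a circle, rounded to hide float noise."""
    return [(round(radius * math.cos(t), 9), round(radius * math.sin(t), 9))
            for t in circle_angles(n, endpoint)]


closed = circle_points(4, endpoint=True)
open_ = circle_points(4, endpoint=False)

# With endpoint=True the last angle is 2*pi, the same point as angle 0.
duplicate_with_endpoint = closed[0] == closed[-1]
# With endpoint=False all four points are distinct.
distinct_without_endpoint = len(set(open_)) == 4
```

With endpoint=True, four requested samples collapse to three distinct positions, which is why endpoint=False is the safer choice for both circles.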


def test_make_circles():
    f = 0.3
    X, y = make_circles(7, shuffle=False, noise=None, factor=f)
Member

Thank you for the PR!

It might be useful to test with both n_samples odd and even, to make sure this change doesn't break the previous behaviour.

Contributor Author

Thanks for the advice!
I'll also add an additional test then, which tests whether the samples were correctly distributed across both classes.

Member

Apart from not testing the even case, I think the current test looks fine...?

Member

@qinhanmin2014 qinhanmin2014 left a comment

Please fix the PEP8 error to pass the test :)

@@ -613,20 +613,23 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,

if factor > 1 or factor < 0:
    raise ValueError("'factor' has to be between 0 and 1.")

Member

./sklearn/datasets/samples_generator.py:616:1: W293 blank line contains whitespace

Contributor Author

Thanks.
It seems PyCharm is not the solution to every PEP8-related issue. :/

Member

@amueller amueller left a comment

you could add another test, but looks good either way.

@@ -614,19 +614,22 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
if factor > 1 or factor < 0:
    raise ValueError("'factor' has to be between 0 and 1.")
Member

do you want to add a test for this line, too? ;)

Contributor Author

I never actually understood why a factor > 1 raises a ValueError. All it would do is turn the inner circle into the outer one, allowing me to generate the samples more intuitively. Compare "the one circle's radius is 0.142 of the other one's" vs. "the bigger circle is 7 times the size of the smaller one".

It would remove the dataset's property to be (roughly, due to noise) within a [-1, 1]² box, though...

Member

I think leave it as is. Test if you will

Member

@christianbraune79
I think 0 < factor < 1 might be enough. From my perspective, you can use factor to specify how much bigger the big circle is and apply operations on X to specify the position of the big circle (and the small circle). For the example you mentioned, you might use X, y = make_circles(factor=1/7); X = 7 * X to achieve your goal. Even if you allow factor > 1, you still can't meet the needs of many situations, because the centers of the two circles are fixed to (0,0) and the radius of one circle is fixed to 1.
Also, as the core dev suggested, it would be better if you could add a test to check that the error is raised.
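The rescaling workaround described above can be sketched without scikit-learn. Here two_circles is a hypothetical pure-Python stand-in for make_circles (outer radius 1, inner radius factor, no noise, no shuffling); only the final multiplication by 7 is the step the comment actually proposes.

```python
import math


def ring(n, radius):
    """Return n evenly spaced (x, y) points on a circle of the given radius."""
    angles = [2 * math.pi * i / n for i in range(n)]
    return [(radius * math.cos(t), radius * math.sin(t)) for t in angles]


def two_circles(n_outer, n_inner, factor):
    """Hypothetical stand-in: outer circle of radius 1, inner of radius `factor`."""
    if factor >= 1 or factor < 0:
        raise ValueError("'factor' has to be between 0 and 1.")
    return ring(n_outer, 1.0) + ring(n_inner, factor)


n_outer, n_inner = 4, 4
X = two_circles(n_outer, n_inner, factor=1 / 7)

# Scale every point by 7: the outer radius becomes 7, the inner radius becomes 1.
X_scaled = [(7 * x, 7 * y) for x, y in X]
radii = [math.hypot(x, y) for x, y in X_scaled]
```

After scaling, the bigger circle is 7 times the size of the smaller one, which is the description the contributor asked for, while factor itself stays below 1.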

outer_circ_y = np.sin(linspace)
inner_circ_x = outer_circ_x * factor
inner_circ_y = outer_circ_y * factor
# so as not to have the first point = last point, we set endpoint=False
Member

You're so much smarter than me....

assert_equal(X.shape, (10, 2), "X shape mismatch")
assert_equal(y.shape, (10,), "y shape mismatch")

assert_equal(X[y == 0].shape, (5, 2),
Member

If you want you could add a test that it's 3 and 4 for the test above? maybe that it's at most one different from 7 // 2? but fine either way.
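A minimal sketch of the class-size check suggested here. It assumes the samples are split with floor division, the outer circle getting n_samples // 2 points and the inner circle the remainder; split_counts is a hypothetical helper and the split rule is an assumption, chosen to match the "3 and 4" mentioned above.

```python
def split_counts(n_samples):
    """Assumed split: outer circle gets the floor half, inner circle the rest."""
    n_out = n_samples // 2
    n_in = n_samples - n_out
    return n_out, n_in


n_out_odd, n_in_odd = split_counts(7)
n_out_even, n_in_even = split_counts(10)

# Each class size is at most one away from n_samples // 2.
odd_is_balanced = all(abs(n - 7 // 2) <= 1 for n in (n_out_odd, n_in_odd))
```

Under this assumption the two counts always add up to n_samples, so no sample is silently dropped when n_samples is odd.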

@amueller amueller changed the title [MRG] make_circles() now works with odd number of samples, test added [MRG + 1] make_circles() now works with odd number of samples, test added Oct 31, 2017
@jnothman
Member

jnothman commented Nov 1, 2017

Please add an entry to the change log at doc/whats_new.

Added a test to check if really only factors in (0, 1) (excluding borders) are accepted
Adjusted factor check (1.0 was accepted before, though doc said otherwise)
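A sketch of what such a test could look like. check_factor is a hypothetical helper that isolates the condition and error message shown in the diffs of this thread (using the adjusted >= 1 comparison); raises_value_error is likewise an illustrative name, standing in for whatever assertion helper the test suite uses.

```python
def check_factor(factor):
    """Isolated copy of the adjusted validation condition from the diff."""
    if factor >= 1 or factor < 0:
        raise ValueError("'factor' has to be between 0 and 1.")
    return factor


def raises_value_error(factor):
    """Return True if check_factor rejects the given factor."""
    try:
        check_factor(factor)
    except ValueError:
        return True
    return False


rejected = [raises_value_error(f) for f in (-0.1, 1.0, 1.5)]
accepted = [not raises_value_error(f) for f in (0.3, 0.99)]
```

Note that the condition as written in the diff still lets factor = 0 through, since only negative values fail the lower bound.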
@codecov

codecov bot commented Nov 3, 2017

Codecov Report

Merging #10045 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #10045      +/-   ##
==========================================
+ Coverage   96.19%   96.19%   +<.01%     
==========================================
  Files         336      336              
  Lines       62725    62743      +18     
==========================================
+ Hits        60336    60354      +18     
  Misses       2389     2389
Impacted Files                                      Coverage Δ
sklearn/datasets/tests/test_samples_generator.py    100% <100%> (ø) ⬆️
sklearn/datasets/samples_generator.py               93.42% <100%> (+0.05%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9172a59...8a86067. Read the comment docs.

Member

@qinhanmin2014 qinhanmin2014 left a comment

LGTM except for something to confirm from core devs. Also wondering if it will be helpful to explicitly state that the center of the two circles is fixed to (0,0) and the radius of the outer circle is 1.

@@ -611,22 +612,25 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
The integer labels (0 or 1) for class membership of each sample.
"""

-    if factor > 1 or factor < 0:
+    if factor >= 1 or factor < 0:
Member

Personally, I'm with you. But this latest source code change has not been reviewed by anyone else, so we need to confirm with the core devs whether factor=1 should be allowed as an extreme case.

Contributor Author

Yes, definitely. If it was a typo in the docs then I'll revert the change and adjust the documentation. :)

Member

I think this is fine.

Member

@jnothman jnothman left a comment

Otherwise LGTM

@@ -141,3 +141,5 @@
.. _Neeraj Gangwar: http://neerajgangwar.in

.. _Arthur Mensch: https://amensch.fr

.. _Christian Braune: https://github.com/christianbraune79
Member

This is unused and unnecessary. These days we try to only use this format for core devs...

@jnothman jnothman changed the title [MRG + 1] make_circles() now works with odd number of samples, test added [MRG+2] make_circles() now works with odd number of samples, test added Nov 4, 2017
@qinhanmin2014
Member

@christianbraune79 Could you please take some time to finish this? It's very close to merge.
(1) please remove your name from _contributors.rst, that's for core devs
(2) please extend the test to the even case; you might consider a loop to avoid duplicate code. Also consider adding a comment to explain that we are testing both the odd and the even case here.
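One way the requested loop could look, as a sketch only. make_two_circles is a hypothetical pure-Python stand-in for make_circles with shuffle=False and noise=None, and the floor-division split between the two circles is an assumption; the assertion messages mirror the ones used in the test excerpts of this thread.

```python
import math


def make_two_circles(n_samples, factor):
    """Hypothetical stand-in generator: outer radius 1, inner radius `factor`."""
    n_out = n_samples // 2  # assumed split, outer circle first
    n_in = n_samples - n_out
    X, y = [], []
    for label, (count, radius) in enumerate([(n_out, 1.0), (n_in, factor)]):
        for i in range(count):
            angle = 2 * math.pi * i / count
            X.append((radius * math.cos(angle), radius * math.sin(angle)))
            y.append(label)
    return X, y


def check_circles(n_samples, n_outer, n_inner, factor=0.3):
    """Check sample counts, class sizes, and radii for one sample count."""
    X, y = make_two_circles(n_samples, factor)
    assert len(X) == n_samples, "X shape mismatch"
    assert len(y) == n_samples, "y shape mismatch"
    assert y.count(0) == n_outer, "outer class size mismatch"
    assert y.count(1) == n_inner, "inner class size mismatch"
    for (px, py), label in zip(X, y):
        expected = 1.0 if label == 0 else factor
        assert math.isclose(math.hypot(px, py), expected), "radius mismatch"
    return True


# Odd and even sample counts are covered by the same loop.
results = [check_circles(n, n_out, n_in)
           for n, n_out, n_in in [(7, 3, 4), (10, 5, 5)]]
```

Driving both cases from one list of (n_samples, n_outer, n_inner) tuples keeps the assertions in a single place, so a regression in either the odd or the even path fails the same test.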

@christianbraune79
Contributor Author

@qinhanmin2014
Done. Sorry for (1) taking so long to respond (family stuff) and (2) putting myself in the contributors list. At least now I've learned what it is for. :)

Member

@glemaitre glemaitre left a comment

3 nitpicks more :)



def test_make_circles():
    f = 0.3
Member

can you call this factor instead of f

Member

or you can even assign 0.3 to factor inside the function itself.

def test_make_circles():
    f = 0.3

    # Testing odd and even case, because in the past make_circles always
Member

you can put this comment just below the def

@qinhanmin2014
Member

@jnothman @amueller Could you please give a final check? I think it is OK for merge. Thanks :)

@jnothman jnothman merged commit 4bead39 into scikit-learn:master Nov 11, 2017
@qinhanmin2014
Member

@christianbraune79 Thanks for the issue and the PR :)

@christianbraune79 christianbraune79 deleted the make_circles branch November 11, 2017 18:06
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Successfully merging this pull request may close these issues.

make_circles always generates an even number of samples
8 participants