[MRG+2] make_circles() now works with odd number of samples, test added #10045

christianbraune79 · 2017-10-31T10:00:44Z

Reference Issues/PRs

Fixes #10037 and adds corresponding tests

What does this implement/fix? Explain your changes.

Fixes the faulty behaviour of make_circles when n_samples is an odd number.
Adds test test_make_circles to datasets/tests/test_samples_generator.py.

Any other comments?


TomDLT

LGTM

TomDLT · 2017-10-31T13:09:06Z

sklearn/datasets/samples_generator.py

-    inner_circ_y = outer_circ_y * factor
+    # so as not to have the first point = last point, we set endpoint=False
+    linspace_out = np.linspace(0, 2 * np.pi, n_samples_out, endpoint=False)
+    linspace_in = np.linspace(0, 2 * np.pi, n_samples_in, endpoint=True)


Why endpoint=True in the second?

Thanks. I think I assumed it was necessary, but I cannot think of a proper reason right now. I'll change it to endpoint=False.
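The effect of the endpoint flag discussed in this thread can be illustrated with a small pure-Python sketch. The helpers `circle_angles` and `circle_points` are hypothetical stand-ins written for this illustration (they only mimic the spacing of `np.linspace(0, 2 * np.pi, n, endpoint=...)`); they are not the code from the PR.

```python
import math


def circle_angles(n, endpoint):
    """Evenly spaced angles in [0, 2*pi]; hypothetical stand-in for np.linspace."""
    if n <= 0:
        return []
    if n == 1:
        return [0.0]
    # With endpoint=True the interval is split into n - 1 steps and includes 2*pi.
    step = 2 * math.pi / (n - 1 if endpoint else n)
    return [i * step for i in range(n)]


def circle_points(n, endpoint, radius=1.0):
    """Points on a circle of the given radius, one per angle."""
    return [(radius * math.cos(t), radius * math.sin(t))
            for t in circle_angles(n, endpoint)]


def same_point(p, q):
    """Compare two 2D points up to floating-point noise."""
    return (math.isclose(p[0], q[0], abs_tol=1e-9)
            and math.isclose(p[1], q[1], abs_tol=1e-9))


closed = circle_points(4, endpoint=True)    # last angle is 2*pi
opened = circle_points(4, endpoint=False)   # last angle is 3*pi/2

# Angle 0 and angle 2*pi describe the same point, so endpoint=True
# spends one sample on a duplicate.
duplicate_with_endpoint = same_point(closed[0], closed[-1])
duplicate_without_endpoint = same_point(opened[0], opened[-1])
```

Under these assumptions, only the endpoint=True variant places its first and last samples on the same spot, which is why endpoint=False is the safer choice for both circles.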

rth · 2017-10-31T13:25:01Z

sklearn/datasets/tests/test_samples_generator.py

+
+def test_make_circles():
+    f = 0.3
+    X, y = make_circles(7, shuffle=False, noise=None, factor=f)


Thank you for the PR!

It might be useful to test with both n_samples odd and even, to make sure this change doesn't break the previous behaviour.

Thanks for the advice!
I'll also add an additional test then, which tests whether the samples were correctly distributed across both classes.

Apart from not testing the even case, I think the current test looks fine...?

qinhanmin2014

Please fix the PEP8 error to pass the test :)

qinhanmin2014 · 2017-10-31T13:54:34Z

sklearn/datasets/samples_generator.py

@@ -613,20 +613,23 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,

    if factor > 1 or factor < 0:
        raise ValueError("'factor' has to be between 0 and 1.")
+


./sklearn/datasets/samples_generator.py:616:1: W293 blank line contains whitespace

Thanks.
It seems PyCharm is not the solution to every PEP8-related issue. :/
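For readers unfamiliar with W293: it flags lines that look blank but still contain spaces or tabs. A rough way to spot such lines before pushing is sketched below; `whitespace_only_lines` is a hypothetical helper for illustration, not part of flake8 or pycodestyle, and the sample source is made up.

```python
def whitespace_only_lines(source):
    """Return 1-based numbers of lines that contain only spaces or tabs."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        # A truly empty line is fine; a non-empty line that strips to
        # nothing is what W293 complains about.
        if line and not line.strip()
    ]


sample = (
    "if factor > 1 or factor < 0:\n"
    "    raise ValueError(\"'factor' has to be between 0 and 1.\")\n"
    "    \n"  # looks blank, but holds four spaces
    "x = 1\n"
)
flagged = whitespace_only_lines(sample)  # [3]
```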

amueller

you could add another test, but looks good either way.

amueller · 2017-10-31T20:28:11Z

sklearn/datasets/samples_generator.py

@@ -614,19 +614,22 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
    if factor > 1 or factor < 0:
        raise ValueError("'factor' has to be between 0 and 1.")


do you want to add a test for this line, too? ;)

I never actually understood why a factor > 1 raises a ValueError. All it would do is turn the inner circle into the outer one, allowing me to generate the samples more intuitively. Compare "one circle's radius is 0.142 times the other's" with "the bigger circle is 7 times the size of the smaller one".

It would remove the dataset's property of lying (roughly, due to noise) within a [-1, 1]² box, though...

I think leave it as is. Test if you will

@christianbraune79
I think 0 < factor < 1 might be enough. From my perspective, you can use factor to specify how much bigger the big circle is, and then apply operations on X to set the size and position of the big circle (and the small one). For the example you mentioned, you might use X, y = make_circles(factor=1/7); X = 7 * X to achieve your goal. Even if factor > 1 were allowed, it still would not cover many situations, because the centers of the two circles are fixed to (0, 0) and the radius of one circle is fixed to 1.
Also, as the core dev suggested, it would be better if you could add a test to check that the error is raised.
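The rescaling suggestion can be checked with a short pure-Python sketch. The `circle` helper is a hypothetical stand-in for the noise-free, unshuffled output of make_circles; with the real function, the comment's `X = 7 * X` on the returned array would play the role of the list comprehensions here.

```python
import math


def circle(n, radius):
    """n evenly spaced points on a circle centred at (0, 0); illustrative helper."""
    return [
        (radius * math.cos(2 * math.pi * i / n),
         radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


factor = 1 / 7   # inner radius relative to the unit outer circle
scale = 7        # applied afterwards, as in X = 7 * X

outer = circle(6, 1.0)
inner = circle(6, factor)

scaled_outer = [(scale * x, scale * y) for x, y in outer]
scaled_inner = [(scale * x, scale * y) for x, y in inner]

# After scaling, the big circle has radius 7 and the small one radius 1,
# which is the "7 times bigger" layout without needing factor > 1.
outer_radii = [math.hypot(x, y) for x, y in scaled_outer]
inner_radii = [math.hypot(x, y) for x, y in scaled_inner]
```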

amueller · 2017-10-31T20:28:41Z

sklearn/datasets/samples_generator.py

-    outer_circ_y = np.sin(linspace)
-    inner_circ_x = outer_circ_x * factor
-    inner_circ_y = outer_circ_y * factor
+    # so as not to have the first point = last point, we set endpoint=False


You're so much smarter than me....

amueller · 2017-10-31T20:30:53Z

sklearn/datasets/tests/test_samples_generator.py

+    assert_equal(X.shape, (10, 2), "X shape mismatch")
+    assert_equal(y.shape, (10,), "y shape mismatch")
+
+    assert_equal(X[y == 0].shape, (5, 2),


If you want, you could add a check that the class sizes are 3 and 4 for the test above, or perhaps that each is at most one away from 7 // 2. But fine either way.
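A minimal sketch of the class-size check suggested here, assuming the split gives the outer circle `n_samples // 2` points and the inner circle the remainder. That split and the helper name `split_counts` are assumptions made for illustration; only the variable names n_samples_out and n_samples_in appear in the diff.

```python
def split_counts(n_samples):
    """Assumed split: outer circle gets the floor, inner circle the rest."""
    n_samples_out = n_samples // 2
    n_samples_in = n_samples - n_samples_out
    return n_samples_out, n_samples_in


odd_split = split_counts(7)    # (3, 4)
even_split = split_counts(10)  # (5, 5)

for n_samples in (7, 10):
    n_out, n_in = split_counts(n_samples)
    # No sample is lost, and the classes differ by at most one.
    assert n_out + n_in == n_samples
    assert abs(n_out - n_in) <= 1
```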

jnothman · 2017-11-01T06:55:23Z

Please add an entry to the change log at doc/whats_new.

Added a test to check if really only factors in (0, 1) (excluding borders) are accepted. Adjusted factor check (1.0 was accepted before, though doc said otherwise).

codecov · 2017-11-03T04:44:40Z

Codecov Report

Merging #10045 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #10045      +/-   ##
==========================================
+ Coverage   96.19%   96.19%   +<.01%     
==========================================
  Files         336      336              
  Lines       62725    62743      +18     
==========================================
+ Hits        60336    60354      +18     
  Misses       2389     2389

Impacted Files	Coverage Δ
sklearn/datasets/tests/test_samples_generator.py	`100% <100%> (ø)`	⬆️
sklearn/datasets/samples_generator.py	`93.42% <100%> (+0.05%)`	⬆️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9172a59...8a86067. Read the comment docs.


qinhanmin2014

LGTM except for something to confirm from core devs. Also wondering if it will be helpful to explicitly state that the center of the two circles is fixed to (0,0) and the radius of the outer circle is 1.

qinhanmin2014 · 2017-11-03T14:34:09Z

sklearn/datasets/samples_generator.py

@@ -611,22 +612,25 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
        The integer labels (0 or 1) for class membership of each sample.
    """

-    if factor > 1 or factor < 0:
+    if factor >= 1 or factor < 0:


Personally, I'm with you. But this latest source code change has not been reviewed by anyone else, so we need to confirm with the core devs whether factor=1 should be allowed as an extreme case.

Yes, definitely. If it was a typo in the docs then I'll revert the change and adjust the documentation. :)

I think this is fine.
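The behaviour of the adjusted check can be sketched as follows. `check_factor` and `is_rejected` are hypothetical helpers that copy the condition from the diff above; they are not functions from scikit-learn. Note that the condition as written still accepts 0.

```python
def check_factor(factor):
    """Hypothetical helper mirroring the adjusted condition in the diff."""
    if factor >= 1 or factor < 0:
        raise ValueError("'factor' has to be between 0 and 1.")
    return factor


def is_rejected(factor):
    """True if check_factor raises ValueError for this value."""
    try:
        check_factor(factor)
    except ValueError:
        return True
    return False


candidates = (-0.1, 0.0, 0.3, 0.99, 1.0, 1.5)
rejected = [f for f in candidates if is_rejected(f)]  # [-0.1, 1.0, 1.5]
```

A test in the project's own suite would presumably assert the ValueError with the test helpers already used there; the try/except form is used here only to keep the sketch free of dependencies.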

jnothman

Otherwise LGTM

jnothman · 2017-11-04T11:02:25Z

doc/whats_new/_contributors.rst

@@ -141,3 +141,5 @@
 .. _Neeraj Gangwar: http://neerajgangwar.in

 .. _Arthur Mensch: https://amensch.fr
+
+.. _Christian Braune: https://github.com/christianbraune79


This is unused and unnecessary. These days we try to only use this format for core devs...

jnothman · 2017-11-04T11:03:50Z

sklearn/datasets/samples_generator.py

@@ -611,22 +612,25 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
        The integer labels (0 or 1) for class membership of each sample.
    """

-    if factor > 1 or factor < 0:
+    if factor >= 1 or factor < 0:


I think this is fine.

jnothman · 2017-11-04T11:06:19Z

sklearn/datasets/tests/test_samples_generator.py

+
+def test_make_circles():
+    f = 0.3
+    X, y = make_circles(7, shuffle=False, noise=None, factor=f)


Apart from not testing the even case, I think the current test looks fine...?

qinhanmin2014 · 2017-11-08T00:57:00Z

@christianbraune79 Could you please take some time to finish this? It's very close to merge.
(1) Please remove your name from _contributors.rst; that file is for core devs.
(2) Please extend the test to the even case. You might consider a loop to avoid duplicated code. Also consider adding a comment to explain that we are testing both the odd and the even case here.
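One way the looped test from point (2) could look is sketched below. To keep it runnable without scikit-learn, it uses `make_circles_sketch`, a hypothetical pure-Python stand-in for make_circles(shuffle=False, noise=None); the split into floor and remainder, and the labelling of the outer circle as class 0, are assumptions of this sketch.

```python
import math


def make_circles_sketch(n_samples, factor=0.8):
    """Hypothetical stand-in for make_circles(shuffle=False, noise=None)."""
    n_out = n_samples // 2       # assumed: outer circle gets the floor
    n_in = n_samples - n_out     # assumed: inner circle gets the remainder
    X, y = [], []
    for count, radius, label in ((n_out, 1.0, 0), (n_in, factor, 1)):
        for i in range(count):
            angle = 2 * math.pi * i / count  # endpoint=False spacing
            X.append((radius * math.cos(angle), radius * math.sin(angle)))
            y.append(label)
    return X, y


def check_circles(n_samples, factor=0.3):
    """Check sample count and that each point lies on its expected circle."""
    X, y = make_circles_sketch(n_samples, factor=factor)
    assert len(X) == n_samples and len(y) == n_samples
    for (px, py), label in zip(X, y):
        expected_radius = 1.0 if label == 0 else factor
        assert math.isclose(math.hypot(px, py), expected_radius, rel_tol=1e-9)
    return True


# Test the odd and the even case in one loop, since odd sample counts
# were the source of the original bug.
results = {n_samples: check_circles(n_samples) for n_samples in (7, 10)}
```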

christianbraune79 · 2017-11-09T19:33:15Z

@qinhanmin2014
Done. Sorry for (1) taking so long to respond (family stuff) and (2) putting myself in the contributors list. At least now I've learned what it is for. :)

glemaitre

3 nitpicks more :)

glemaitre · 2017-11-10T00:21:54Z

sklearn/datasets/tests/test_samples_generator.py

+
+
+def test_make_circles():
+    f = 0.3


can you call this factor instead of f

Or you could even assign 0.3 to factor inside the function itself.

glemaitre · 2017-11-10T00:22:09Z

sklearn/datasets/tests/test_samples_generator.py

+def test_make_circles():
+    f = 0.3
+
+    # Testing odd and even case, because in the past make_circles always


you can put this comment just below the def

qinhanmin2014 · 2017-11-11T00:13:13Z

@jnothman @amueller Could you please give a final check? I think it is OK for merge. Thanks :)

qinhanmin2014 · 2017-11-11T12:33:28Z

@christianbraune79 Thanks for the issue and the PR :)


cbrauneovgude added 4 commits October 29, 2017 10:22

fixes scikit-learn#10037

7576278

tests for scikit-learn#10037, tests for odd number of samples and whe…

effc6c2

…ther generated points lie on the expected circles similar to test_make_moons()

Merge branch 'make_circles' of https://github.com/christianbraune79/s…

0a101a1

…cikit-learn into make_circles

nasty doubled lines of code removed

18ee5d8

TomDLT approved these changes Oct 31, 2017


rth reviewed Oct 31, 2017


qinhanmin2014 reviewed Oct 31, 2017


cbrauneovgude added 2 commits October 31, 2017 16:05

changes according to comments in PR

32b2ffb

assert_equal obviously has a different signature.

4c99154

amueller approved these changes Oct 31, 2017


amueller changed the title ~~[MRG] make_circles() now works with odd number of samples, test added~~ [MRG + 1] make_circles() now works with odd number of samples, test added Oct 31, 2017

Adjusted documentation for make_circles

8a86067

Added a test to check if really only factors in (0, 1) (excluding borders) are accepted. Adjusted factor check (1.0 was accepted before, though doc said otherwise).

added entry under "Decomposition, ..." as another datasets-related bu…

9ce24e4

…g was already mentioned in there.

qinhanmin2014 approved these changes Nov 3, 2017


jnothman approved these changes Nov 4, 2017


jnothman changed the title ~~[MRG + 1] make_circles() now works with odd number of samples, test added~~ [MRG+2] make_circles() now works with odd number of samples, test added Nov 4, 2017

cbrauneovgude added 3 commits November 9, 2017 20:21

removed wrong entry

0f1321a

all tests for odd and even case

ad1af5d

added final comment

ad07946

'107 > 79' fixed

6126a5c

glemaitre reviewed Nov 10, 2017


cbrauneovgude added 2 commits November 10, 2017 21:25

refactoring f into factor

2a62edc

pep8

e19cecc

jnothman approved these changes Nov 11, 2017


jnothman merged commit 4bead39 into scikit-learn:master Nov 11, 2017

christianbraune79 deleted the make_circles branch November 11, 2017 18:06

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

FIX make_circles() now works with odd number of samples, test added (s…

85be5c6

…cikit-learn#10045)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

FIX make_circles() now works with odd number of samples, test added (s…

d2abf6b

…cikit-learn#10045)

