[MRG+2] make_circles() now works with odd number of samples, test added #10045

Merged: 14 commits, Nov 11, 2017
4 changes: 4 additions & 0 deletions doc/whats_new/v0.20.rst
@@ -135,6 +135,10 @@ Decomposition, manifold learning and clustering
wrapped estimator and its parameter. :issue:`9999` by :user:`Marcus Voss
<marcus-voss>` and `Joel Nothman`_.

- Fixed a bug in :func:`datasets.make_circles`, where no odd number of data
points could be generated. :issue:`10037` by :user:`Christian Braune
<christianbraune79>`_.

Metrics

- Fixed a bug due to floating point error in :func:`metrics.roc_auc_score` with
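The changelog entry for `make_circles` can be illustrated with plain arithmetic. The sketch below is not scikit-learn code; the helper names `old_total` and `new_total` are hypothetical and only mirror the counting logic visible in the diff of `samples_generator.py`: before the patch both circles reused `n_samples // 2` angles, afterwards the remainder goes to the inner circle.

```python
def old_total(n_samples):
    # Pre-patch counting: each circle received n_samples // 2 points,
    # so an odd request silently lost one point.
    return 2 * (n_samples // 2)


def new_total(n_samples):
    # Post-patch counting: the outer circle gets the floor half,
    # the inner circle gets whatever is left.
    n_samples_out = n_samples // 2
    n_samples_in = n_samples - n_samples_out
    return n_samples_out + n_samples_in


for n in (7, 8):
    print(n, old_total(n), new_total(n))
```

For an odd request such as 7 the old counting yields 6 points while the new counting yields 7; even requests are unaffected.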
28 changes: 16 additions & 12 deletions sklearn/datasets/samples_generator.py
@@ -585,7 +585,8 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
Parameters
----------
n_samples : int, optional (default=100)
The total number of points generated.
The total number of points generated. If odd, the inner circle will
have one point more than the outer circle.

shuffle : bool, optional (default=True)
Whether to shuffle the samples.
@@ -599,7 +600,7 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
If None, the random number generator is the RandomState instance used
by `np.random`.

factor : double < 1 (default=.8)
factor : 0 < double < 1 (default=.8)
Scale factor between inner and outer circle.

Returns
@@ -611,22 +612,25 @@ def make_circles(n_samples=100, shuffle=True, noise=None, random_state=None,
The integer labels (0 or 1) for class membership of each sample.
"""

if factor > 1 or factor < 0:
if factor >= 1 or factor < 0:
Member:

Personally, I'm with you. But this latest source code change has not been reviewed by anyone else, so we need to confirm with the core devs whether factor=1 should be allowed as an extreme case.

Contributor Author:

Yes, definitely. If it was a typo in the docs then I'll revert the change and adjust the documentation. :)

Member:

I think this is fine.

raise ValueError("'factor' has to be between 0 and 1.")

n_samples_out = n_samples // 2
n_samples_in = n_samples - n_samples_out

generator = check_random_state(random_state)
# so as not to have the first point = last point, we add one and then
# remove it.
linspace = np.linspace(0, 2 * np.pi, n_samples // 2 + 1)[:-1]
outer_circ_x = np.cos(linspace)
outer_circ_y = np.sin(linspace)
inner_circ_x = outer_circ_x * factor
inner_circ_y = outer_circ_y * factor
# so as not to have the first point = last point, we set endpoint=False
Member:

You're so much smarter than me....

linspace_out = np.linspace(0, 2 * np.pi, n_samples_out, endpoint=False)
linspace_in = np.linspace(0, 2 * np.pi, n_samples_in, endpoint=False)
outer_circ_x = np.cos(linspace_out)
outer_circ_y = np.sin(linspace_out)
inner_circ_x = np.cos(linspace_in) * factor
inner_circ_y = np.sin(linspace_in) * factor

X = np.vstack((np.append(outer_circ_x, inner_circ_x),
np.append(outer_circ_y, inner_circ_y))).T
y = np.hstack([np.zeros(n_samples // 2, dtype=np.intp),
np.ones(n_samples // 2, dtype=np.intp)])
y = np.hstack([np.zeros(n_samples_out, dtype=np.intp),
np.ones(n_samples_in, dtype=np.intp)])
if shuffle:
X, y = util_shuffle(X, y, random_state=generator)

27 changes: 27 additions & 0 deletions sklearn/datasets/tests/test_samples_generator.py
@@ -25,6 +25,7 @@
from sklearn.datasets import make_friedman3
from sklearn.datasets import make_low_rank_matrix
from sklearn.datasets import make_moons
from sklearn.datasets import make_circles
from sklearn.datasets import make_sparse_coded_signal
from sklearn.datasets import make_sparse_uncorrelated
from sklearn.datasets import make_spd_matrix
@@ -385,3 +386,29 @@ def test_make_moons():
dist_sqr = ((x - center) ** 2).sum()
assert_almost_equal(dist_sqr, 1.0,
err_msg="Point is not on expected unit circle")


def test_make_circles():
factor = 0.3

for (n_samples, n_outer, n_inner) in [(7, 3, 4), (8, 4, 4)]:
# Testing odd and even case, because in the past make_circles always
# created an even number of samples.
X, y = make_circles(n_samples, shuffle=False, noise=None,
factor=factor)
assert_equal(X.shape, (n_samples, 2), "X shape mismatch")
assert_equal(y.shape, (n_samples,), "y shape mismatch")
center = [0.0, 0.0]
for x, label in zip(X, y):
dist_sqr = ((x - center) ** 2).sum()
dist_exp = 1.0 if label == 0 else factor**2
assert_almost_equal(dist_sqr, dist_exp,
err_msg="Point is not on expected circle")

assert_equal(X[y == 0].shape, (n_outer, 2),
"Samples not correctly distributed across circles.")
assert_equal(X[y == 1].shape, (n_inner, 2),
"Samples not correctly distributed across circles.")

assert_raises(ValueError, make_circles, factor=-0.01)
assert_raises(ValueError, make_circles, factor=1.)
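The two `assert_raises` calls above exercise the boundaries of the `factor` check discussed in the review thread. As a rough illustration, the sketch below compares the old and new conditions from the diff; the function names `old_rejects` and `new_rejects` are hypothetical and exist only for this comparison.

```python
def old_rejects(factor):
    # Condition before the patch: factor == 1 was accepted.
    return factor > 1 or factor < 0


def new_rejects(factor):
    # Condition after the patch: factor == 1 now raises ValueError.
    return factor >= 1 or factor < 0


for factor in (-0.01, 0.0, 0.3, 1.0):
    print(factor, old_rejects(factor), new_rejects(factor))
```

Going only by the condition shown in the diff, `factor=0.0` passes both checks even though the updated docstring reads `0 < double < 1`, and the tests above do not cover that value.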