[MRG] FIX/TST boundary cases in dbscan #4073

Closed
wants to merge 2 commits into from

jnothman
Member

#3994 handled the min_samples boundary case differently to the prior DBSCAN implementation. This is now clarified in the docs. Unfortunately, when properly testing boundary cases, I found the inconsistency reported at #4072. I fix it here for 'brute' search without tests, pending a complete patch for #4072.
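To make the min_samples boundary concrete, here is a minimal pure-Python sketch of a brute-force core-sample check. It is not scikit-learn's code: `core_samples` and its `count_self` flag are hypothetical names introduced only for illustration. It assumes the convention that the tests in this PR exercise, where a sample counts toward its own neighbourhood.

```python
from math import dist


def core_samples(points, eps, min_samples, count_self=True):
    """Return indices of core samples found by a brute-force scan.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    core = []
    for i, p in enumerate(points):
        # Neighbourhood size: samples at distance <= eps from p,
        # optionally including p itself.
        size = sum(
            1
            for j, q in enumerate(points)
            if (count_self or i != j) and dist(p, q) <= eps
        )
        if size >= min_samples:
            core.append(i)
    return core


points = [[0], [1]]
print(core_samples(points, eps=2, min_samples=2))                    # [0, 1]
print(core_samples(points, eps=2, min_samples=2, count_self=False))  # []
```

With two samples and min_samples=2, whether either one is a core sample depends entirely on whether a sample counts itself, which is why this boundary needs an explicit test.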

@@ -174,6 +175,17 @@ def test_pickle():
assert_equal(type(pickle.loads(s)), obj.__class__)


def test_bounaries():
Member

typo: boundaries

Member Author

fixed

@amueller
Member

amueller commented Mar 3, 2015

This should now be testable, right?

ogrisel changed the title from "[MRG pending #4072] FIX/TST boundary cases in dbscan" to "[MRG] FIX/TST boundary cases in dbscan" Mar 4, 2015
@jnothman
Member Author

jnothman commented Mar 4, 2015

> This should now be testable, right?

Rather the tests were written elsewhere with a different patch.

This is now rebased and ready for review.

@amueller
Member

amueller commented Mar 4, 2015

the case with no core samples fails...

@jnothman
Member Author

jnothman commented Mar 5, 2015

Of course I reviewed #4052, but what basis did we have for thinking X = rng.rand(40, 10); X[X < .8] = 0 would generate data without core samples for eps=.5, min_samples=5? I get:

>>> np.bincount(np.sum(pairwise_distances(X) <= .5, axis=1))
[ 0 18 10  3  4  5]

I've made that test more certain.
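The check above amounts to counting, for each sample, how many samples lie within eps, and then histogramming those counts. A minimal stdlib-only sketch on toy data follows. The function names and the data are hypothetical, and the neighbourhood is assumed to include the sample itself.

```python
from collections import Counter
from math import dist


def neighbourhood_size_histogram(points, eps):
    """hist[k] = number of samples with exactly k samples within eps.

    Each sample is counted in its own neighbourhood (distance 0 <= eps).
    """
    sizes = [sum(1 for q in points if dist(p, q) <= eps) for p in points]
    counts = Counter(sizes)
    return [counts.get(k, 0) for k in range(max(sizes) + 1)]


def has_core_samples(points, eps, min_samples):
    """True if any sample has at least min_samples samples within eps."""
    hist = neighbourhood_size_histogram(points, eps)
    return sum(hist[min_samples:]) > 0


# Toy 1-D data: three close samples and two isolated ones.
points = [[0.0], [0.3], [0.4], [2.0], [5.0]]
print(neighbourhood_size_histogram(points, eps=0.5))  # [0, 2, 0, 3]
print(has_core_samples(points, eps=0.5, min_samples=3))  # True
print(has_core_samples(points, eps=0.5, min_samples=4))  # False
```

Any non-zero entry at index min_samples or above means the data does contain core samples, so a "no core samples" test built on such data would not test what it claims to.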

@amueller
Member

amueller commented Mar 5, 2015

Sorry, that was a hacky test. It probably came from some example that was failing at the time.

core, _ = dbscan([[0], [1]], eps=2, min_samples=2)
assert_in(0, core)
# ensure eps is inclusive of circumference
core, _ = dbscan([[0], [1], [1]], eps=1, min_samples=2)
Member

Maybe a stupid question but why do you need [1] twice?
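For reference, the eps boundary in the snippet under review can be reproduced without scikit-learn. The sketch below is a hypothetical brute-force helper, not the library's implementation. It compares an inclusive distance test (<=) with a strict one (<) on the same three samples. It only shows what each comparison yields and does not claim to explain why the test uses the duplicate.

```python
import operator
from math import dist


def core_samples(points, eps, min_samples, within=operator.le):
    """Indices of samples with at least min_samples samples within eps.

    `within` is the comparison applied to (distance, eps); each sample
    counts itself. Hypothetical helper for illustration only.
    """
    return [
        i
        for i, p in enumerate(points)
        if sum(within(dist(p, q), eps) for q in points) >= min_samples
    ]


points = [[0], [1], [1]]

# Inclusive boundary: the sample at 0 reaches both samples at distance 1.
print(core_samples(points, eps=1, min_samples=2))  # [0, 1, 2]

# Strict boundary: the sample at 0 only reaches itself.
print(core_samples(points, eps=1, min_samples=2, within=operator.lt))  # [1, 2]
```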

jnothman closed this in 15c9c0f Mar 5, 2015
alexsavio pushed a commit to alexsavio/scikit-learn that referenced this pull request Mar 9, 2015
rasbt pushed a commit to rasbt/scikit-learn that referenced this pull request Apr 6, 2015
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Jul 11, 2015
* tag '0.16b1': (1589 commits)
  0.16.X branching, version 0.16b1
  Fix scikit-learn#4351. Rendering of docs in MinMaxScaler.
  Fix rebase conflict
  MAINT use canonical PEP-440 dev version consistently
  Adding fix for issue scikit-learn#4297, isotonic infinite loop
  DOC deprecate random_state for DBSCAN
  FIX/TST boundary cases in dbscan (closes scikit-learn#4073)
  Do not shuffle in DBSCAN (warn if `random_state` is used).
  Update docstring predict_proba()
  Update documentation of predict_proba in tree module
  add scipy2013 tutorial links to presentations on website.
  TST boundary handling in LSHForest.radius_neighbors
  ENH improve docstrings and test for radius_neighbors models
  use a pipeline for pre-processing feature selection, as per best practise
  DOC remove unnecessary backticks in CONTRIBUTING.
  ENH no need for tie breaking jitter in calibration
  Implement "secondary" tie strategy in isotonic.
  Adding unit test to cover ties/duplicate x values in Isotonic Regression re: issue scikit-learn#4184
  MAINT fix typo pyagm -> pygamg in SkipTest
  STYLE trailing spaces
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Jul 11, 2015
* releases: (1589 commits)

Conflicts:
	sklearn/externals/joblib/__init__.py
	sklearn/externals/joblib/numpy_pickle.py
	sklearn/externals/joblib/parallel.py
	sklearn/externals/joblib/pool.py
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Jul 11, 2015
* dfsg: (1589 commits)