[MRG+1] FIX: warns when invalid n_components in LinearDiscriminantAnalysis #11526
Conversation
I think a future warning sounds good.
amueller
left a comment
generally looks good, but the boundary case is unclear in the doc and comments.
sklearn/discriminant_analysis.py
Outdated
    -    n_components : int, optional
    -        Number of components (< n_classes - 1) for dimensionality reduction.
    +    n_components : int, optional (default=None)
    +        Number of components (< min(n_classes - 1, n_features)) for
<=, right? = is the default after all...
That's right, I forgot about the boundary case.
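A quick illustration of the boundary case (a hedged sketch with made-up toy data, not code from the PR): n_components equal to min(n_classes - 1, n_features) is valid and should fit without a warning.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# 3 classes, 2 features -> min(n_classes - 1, n_features) = min(2, 2) = 2
X = np.array([[0., 0.], [1., 0.],    # class 0
              [0., 1.], [0., 2.],    # class 1
              [3., 3.], [4., 4.]])   # class 2
y = np.array([0, 0, 1, 1, 2, 2])

# The boundary value itself is allowed, hence "<=" in the docstring
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
print(lda.transform(X).shape)  # (6, 2)
```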
sklearn/discriminant_analysis.py
Outdated
                            self.n_components)
        if self.n_components > max_components:
            warnings.warn(
                "n_components cannot be superior to min(n_features, "
I would say "larger than" not "superior". I'm not a native English speaker and the usage strikes me as odd. Even if it's correct, we have many users that are not native English speakers and will be thrown off by it.
Yes, that was a French-inspired mistake :p
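For context, the check under discussion boils down to something like this standalone sketch (paraphrased from the hunk above; _check_n_components is a hypothetical helper, not the PR's actual code):

```python
import warnings

def _check_n_components(n_components, n_features, n_classes):
    # Hypothetical helper mirroring the PR's logic: cap n_components at
    # min(n_features, n_classes - 1) and warn when the value is too large.
    max_components = min(n_features, n_classes - 1)
    if n_components is None:
        return max_components
    if n_components > max_components:
        warnings.warn("n_components cannot be larger than "
                      "min(n_features, n_classes - 1); using %d instead."
                      % max_components)
        return max_components
    return n_components

print(_check_n_components(10, n_features=4, n_classes=3))  # warns, prints 2
```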
         if self.n_components is None:
    -        self._max_components = len(self.classes_) - 1
    +        self._max_components = max_components
I feel the private variable should be called _n_components, not _max_components.
I agree. However, maybe calling it _max_components could still be justified for the case where some inputs of LDA are collinear? Indeed, scalings_ would then be truncated and have fewer components than _n_components (see issue #11528).
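A hedged sketch of that collinearity caveat (toy data; the printed shape depends on the solver's rank threshold, so treat it as illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Three classes whose means (0,0), (2,2), (4,4) are exactly collinear, so
# the between-class scatter has rank 1 although
# min(n_classes - 1, n_features) = 2.
centers = np.array([[0., 0.], [2., 2.], [4., 4.]])
offsets = np.array([[0.5, 0.], [-0.5, 0.], [0., 0.5], [0., -0.5]])
X = np.vstack([c + offsets for c in centers])
y = np.repeat([0, 1, 2], len(offsets))

Z = LinearDiscriminantAnalysis().fit(X, y).transform(X)
print(Z.shape)  # likely (12, 1): scalings_ is truncated below _max_components
```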
    @pytest.mark.parametrize('n_features', [3, 5])
    @pytest.mark.parametrize('n_classes', [5, 3])
    def test_lda_dimension_warning(n_classes, n_features):
        RNG = check_random_state(0)
lowercase please. upper case is reserved for module level constants, right? (yes, X violates that, I know).
that's right, will do
    for n_components in [max_components + 1,
                         max(n_features, n_classes - 1) + 1]:
        # if n_components < min(n_classes - 1, n_features), raise warning
Should this be ">"?
yes, sorry, typo
    max_components = min(n_features, n_classes - 1)

    for n_components in [max_components - 1, None, max_components]:
        # if n_components < min(n_classes - 1, n_features), no warning
Should this be "<="?
yes, will do
        assert_no_warnings(lda.fit, X, y)

    for n_components in [max_components + 1,
                         max(n_features, n_classes - 1) + 1]:
I don't understand the second one?
I am not entirely sure about this one either. Since the first value tests just one unit above max_components, I thought I could also test a value higher than both n_features and n_classes - 1, to make sure the warning fires for any invalid value of n_components.
That's good for me. Maybe a small comment that explains this.
Done.
Alright, I'll include one.
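For reference, a hedged sketch of how the pieces above could fit together in the final test (the assertion style and warning category are assumptions; it targets the behavior introduced by this PR, while later scikit-learn versions raise an error for invalid n_components instead):

```python
import warnings

import numpy as np
import pytest
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


@pytest.mark.parametrize('n_features', [3, 5])
@pytest.mark.parametrize('n_classes', [5, 3])
def test_lda_dimension_warning(n_classes, n_features):
    rng = np.random.RandomState(0)
    n_samples = 10
    X = rng.randn(n_samples, n_features)
    y = np.arange(n_samples) % n_classes   # every class is represented
    max_components = min(n_features, n_classes - 1)

    # n_components <= min(n_classes - 1, n_features): no warning expected
    for n_components in [max_components - 1, None, max_components]:
        lda = LinearDiscriminantAnalysis(n_components=n_components)
        with warnings.catch_warnings():
            warnings.simplefilter("error")  # any warning becomes a failure
            lda.fit(X, y)

    # n_components > min(n_classes - 1, n_features): warning expected; the
    # second value exceeds both n_features and n_classes - 1, so the warning
    # must fire whichever term of the min() is the binding one
    for n_components in [max_components + 1,
                         max(n_features, n_classes - 1) + 1]:
        lda = LinearDiscriminantAnalysis(n_components=n_components)
        with pytest.warns(Warning):         # exact category is PR-specific
            lda.fit(X, y)
```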
- fix doc for boundary case using inclusive inequalities
- fix typos
- fix style conventions
and fixes test warning assertion
sklearn/discriminant_analysis.py
Outdated
| "n_classes - 1) = min(%d, %d - 1) = %d components." | ||
| % (X.shape[1], len(self.classes_), max_components), | ||
| ChangedBehaviorWarning) | ||
| future_msg = ("In version 0.22, invalid values for " |
Maybe it would be useful to say in a few words what invalid means.
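For instance (purely illustrative wording, not the PR's final text), the message could spell the constraint out:

```python
future_msg = ("In version 0.22, setting n_components greater than "
              "min(n_features, n_classes - 1) will raise an error. "
              "Use None or a value <= min(n_features, n_classes - 1).")
```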
GaelVaroquaux
left a comment
Aside from the two minor comments that I made, this is good for me.
TomDLT
left a comment
LGTM, but you have a conflict.
You also need to add a whatsnew entry.
Thanks!
…alysis
# Conflicts:
#	sklearn/discriminant_analysis.py
#	sklearn/tests/test_discriminant_analysis.py
Sorry for the late reply. I resolved the conflict, added a what's new entry, and changed the …
Thanks!
…ysis (scikit-learn#11526)" This reverts commit 829d7bb.
Reference Issues/PRs
Fixes #10048.
Fixes #8956. (The second dimension of scalings_ will always be thresholded, not only for the svd solver; see #8956 (comment).)
What does this implement/fix? Explain your changes.
This PR:
- Raises a ChangedBehaviorWarning when the user sets n_components > min(n_features, n_classes - 1). In this case it sets _max_components (the number of first components to take) to min(n_features, n_classes - 1), so n_components is no longer taken into account. It does not throw an error, unlike PCA, so as not to break user code (cf. comment: LinearDiscriminantAnalysis doesn't reduce dimensionality during prediction #6355 (comment)). Should I maybe provide a FutureWarning and throw an error in the future?
- Tests the constraint n_components < min(n_features, n_classes - 1) (and not just n_components < n_classes - 1). I did not check the dimension explicitly (just the presence/absence of warnings), because the dimension can still be unexpected if input points are collinear. I was thinking of tackling this in another PR (raise a warning in that case and/or return the whole scalings_, including zeros, without truncation) (see #11528).

TODO:
- FutureWarning
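To make the new behavior concrete, a hedged end-to-end illustration (toy data; per the FutureWarning above, version 0.22 turns the warning into an error, which the try/except accounts for):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
X = rng.randn(30, 4)                 # n_features = 4
y = np.repeat([0, 1, 2], 10)         # n_classes = 3

# 10 > min(n_features, n_classes - 1) = min(4, 2) = 2, so this is invalid
lda = LinearDiscriminantAnalysis(n_components=10)
try:
    lda.fit(X, y)                    # under this PR: warns and caps at 2
except ValueError as exc:
    print("raised:", exc)            # scikit-learn >= 0.22 raises instead
else:
    print(lda.transform(X).shape)    # (30, 2)
```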