
Conversation

@wdevazelhes (Contributor) commented Jul 15, 2018

Reference Issues/PRs

Fixes #10048.
Fixes #8956. (The second dimension of scalings_ will now always be thresholded, not only for the svd solver; see #8956 (comment).)

What does this implement/fix? Explain your changes.

This PR:

  • Raises a ChangedBehaviorWarning when the user sets n_components > min(n_features, n_classes - 1). In this case it sets max_components (the number of leading components to take) to min(n_features, n_classes - 1), and from then on no longer takes n_components into account. It does not throw an error, unlike PCA, so as not to break user code (cf. comment: LinearDiscriminantAnalysis doesn't reduce dimensionality during prediction #6355 (comment)). Maybe I should provide a FutureWarning now and throw an error in a future release? (See the sketch after the TODO list below.)
  • Changes the docstring, saying that we should have n_components < min(n_features, n_classes - 1) (and not just n_components < n_classes - 1)
  • Tests that no warning is raised when the condition holds, and that a warning is raised otherwise

I did not check the dimension explicitly (just the presence/absence of warnings) because the dimension can still be unexpected if input points are collinear. I was thinking of tackling this in another PR (raise a warning in that case and/or return the whole scalings_, including zeros, without truncation) (see #11528).

TODO:

  • Add FutureWarning
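
A minimal sketch of the new behavior (illustrative only, not part of the diff; the data shapes are made up):

    import warnings
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # 3 classes, 5 features -> at most min(5, 3 - 1) = 2 discriminant components
    rng = np.random.RandomState(0)
    X = rng.randn(30, 5)
    y = np.repeat([0, 1, 2], 10)

    lda = LinearDiscriminantAnalysis(n_components=4)  # 4 > 2: invalid
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        lda.fit(X, y)

    # With this PR, fit warns instead of erroring and caps the effective
    # number of components at min(n_features, n_classes - 1).
    print([w.category.__name__ for w in caught])  # includes 'ChangedBehaviorWarning'
    print(lda.transform(X).shape)                 # (30, 2), not (30, 4)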

@amueller (Member): I think future warning sounds good.

@amueller (Member) left a review comment:

Generally looks good, but the boundary case is unclear in the doc and comments.

-    n_components : int, optional
-        Number of components (< n_classes - 1) for dimensionality reduction.
+    n_components : int, optional (default=None)
+        Number of components (< min(n_classes - 1, n_features)) for
Member: <=, right? = is the default after all...

@wdevazelhes (Author): That's right, I forgot about the boundary case.
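
To make the boundary concrete (an illustrative sketch, not part of the diff):

    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)  # 4 features, 3 classes -> min(4, 2) = 2

    # The boundary value is valid: it is exactly what n_components=None uses.
    lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
    print(lda.transform(X).shape)  # (150, 2), no warning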

                             self.n_components)
        if self.n_components > max_components:
            warnings.warn(
                "n_components cannot be superior to min(n_features, "
Member: I would say "larger than" not "superior". I'm not a native English speaker and the usage strikes me as odd. Even if it's correct, we have many users that are not native English speakers and will be thrown off by it.

@wdevazelhes (Author): Yes, that was a French-inspired mistake :p


     if self.n_components is None:
-        self._max_components = len(self.classes_) - 1
+        self._max_components = max_components
Member: I feel the private variable should be called _n_components, not _max_components.

@wdevazelhes (Author): I agree; however, maybe it could be justified to call it _max_components for the case where some inputs of LDA are collinear? Indeed, scalings_ would then be truncated and have fewer components than _n_components (see issue #11528). (A hypothetical illustration follows.)
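
An illustrative sketch of the collinearity concern (hypothetical data; the exact truncation behavior is what #11528 is about, so the resulting shape is not guaranteed):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.RandomState(0)
    base = rng.randn(60, 1)
    # Three perfectly collinear features: the data matrix has rank 1.
    X = np.hstack([base, 2 * base, -3 * base])
    y = np.repeat([0, 1, 2], 20)

    lda = LinearDiscriminantAnalysis(solver='svd').fit(X, y)
    # min(n_features, n_classes - 1) = 2, but with rank-1 inputs the svd
    # solver may keep fewer directions, so scalings_ can come out truncated.
    print(lda.scalings_.shape)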

@pytest.mark.parametrize('n_features', [3, 5])
@pytest.mark.parametrize('n_classes', [5, 3])
def test_lda_dimension_warning(n_classes, n_features):
    RNG = check_random_state(0)
Member: Lowercase, please. Upper case is reserved for module-level constants, right? (Yes, X violates that, I know.)

@wdevazelhes (Author): That's right, will do.


    for n_components in [max_components + 1,
                         max(n_features, n_classes - 1) + 1]:
        # if n_components < min(n_classes - 1, n_features), raise warning
Member: > ?

@wdevazelhes (Author): Yes, sorry, typo.

    max_components = min(n_features, n_classes - 1)

    for n_components in [max_components - 1, None, max_components]:
        # if n_components < min(n_classes - 1, n_features), no warning
Member: <=?

@wdevazelhes (Author): Yes, will do.

        assert_no_warnings(lda.fit, X, y)

    for n_components in [max_components + 1,
                         max(n_features, n_classes - 1) + 1]:
Member: I don't understand the second one?

@wdevazelhes (Author): I'm not sure about this one either, indeed; it's just that, since I already test something one unit higher than max_components, I thought I could also test something higher than both n_features and n_classes - 1, to ensure the test works for any value of n_components.

Member: That's good for me. Maybe a small comment that explains this.

@wdevazelhes (Author): Done.
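
The added comment might look like this in the test (a reconstruction from the fragments above; assert_warns is assumed to be scikit-learn's test helper of that era):

    for n_components in [max_components + 1,
                         max(n_features, n_classes - 1) + 1]:
        # The second value is deliberately larger than BOTH n_features and
        # n_classes - 1, so the warning path is exercised regardless of the
        # relative ordering of the two quantities.
        lda = LinearDiscriminantAnalysis(n_components=n_components)
        assert_warns(ChangedBehaviorWarning, lda.fit, X, y)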

@wdevazelhes (Author):

> I think future warning sounds good.

Alright, I'll include one.

- fix doc for boundary case by including inequalities
- fix typos
- fix style conventions
@wdevazelhes wdevazelhes changed the title [MRG] FIX: warns when invalid n_components in LinearDiscriminantAnalysis [WIP] FIX: warns when invalid n_components in LinearDiscriminantAnalysis Jul 15, 2018
and fixes test warning assertion
@wdevazelhes wdevazelhes changed the title [WIP] FIX: warns when invalid n_components in LinearDiscriminantAnalysis [MRG] FIX: warns when invalid n_components in LinearDiscriminantAnalysis Jul 15, 2018
"n_classes - 1) = min(%d, %d - 1) = %d components."
% (X.shape[1], len(self.classes_), max_components),
ChangedBehaviorWarning)
future_msg = ("In version 0.22, invalid values for "
Member: Maybe it would be useful to say in a few words what invalid means.
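
For instance, the message could spell out the bound explicitly (illustrative wording, not the exact text that was merged):

    future_msg = ("In version 0.23, setting n_components > min("
                  "n_features, n_classes - 1) will raise an error. "
                  "Set n_components to None or a value smaller than or "
                  "equal to min(n_features, n_classes - 1) instead.")
    warnings.warn(future_msg, FutureWarning)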

@GaelVaroquaux (Member) left a review comment:

Aside from the two minor comments that I made, this is good for me.


@GaelVaroquaux GaelVaroquaux changed the title [MRG] FIX: warns when invalid n_components in LinearDiscriminantAnalysis [MRG+1] FIX: warns when invalid n_components in LinearDiscriminantAnalysis Jul 16, 2018
@TomDLT (Member) left a review comment:

LGTM, but you have a conflict.

You also need to add a whatsnew entry.

Thanks!

@wdevazelhes (Author): Sorry for the late reply. I resolved the conflict, added a what's new entry, and changed the FutureWarning from 0.22 to 0.23. I guess it should be ready to merge.

@TomDLT TomDLT merged commit 6b1d8e5 into scikit-learn:master Dec 7, 2018
@TomDLT (Member) commented Dec 7, 2018:

Thanks!

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019


Successfully merging this pull request may close these issues.

  • LinearDiscriminantAnalysis silently changes user parameter
  • LDA scalings_ gives wrong dimensions
