
MAINT Use check_scalar to validate scalar in: GeneralizedLinearRegressor #21946


Merged
31 commits merged into scikit-learn:main on Jan 24, 2022

Conversation

reshamas (Member)

Reference Issues/PRs

References #21927

What does this implement/fix? Explain your changes.

  • File: sklearn/linear_model/_glm/glm.py
  • Class: GeneralizedLinearRegressor
  • Identify the parameters that are scalars.
  • Add tests to ensure appropriate behavior when invalid arguments are passed in.
  • Use the helper function check_scalar from sklearn.utils to validate the scalar parameters (a usage sketch follows the parameter list below).

Parameters:

  • alpha
    • Add tests
    • Use check_scalar
  • max_iter
    • Add tests
    • Use check_scalar
  • tol
    • Add tests
    • Use check_scalar
  • verbose
    • Add tests
    • Use check_scalar
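
For reference, a minimal sketch of how check_scalar validates one of these parameters; this is a standalone illustration, not this PR's exact diff:

import numbers

from sklearn.utils import check_scalar

# Accepts any real number >= 0; check_scalar raises a TypeError for a
# non-numeric value and a ValueError for a negative one.
check_scalar(0.5, name="alpha", target_type=numbers.Real, min_val=0.0)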

Any other comments?

#DataUmbrella #postsprint

@reshamas reshamas changed the title MNT Use check_scalar to validate scalar in: GeneralizedLinearRegressor MAINT Use check_scalar to validate scalar in: GeneralizedLinearRegressor Dec 10, 2021
glemaitre (Member)

You should remove:

  • test_glm_tol_argument
  • test_glm_alpha_argument

These tests will be covered by the more generic test functions that you created and parametrized.
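
For context, a sketch of what such a generic parametrized test can look like; the parameter values, error-message fragments, and import path below are illustrative assumptions, not the PR's exact code:

import numpy as np
import pytest

from sklearn.linear_model._glm import GeneralizedLinearRegressor

@pytest.mark.parametrize(
    "params, err_msg",
    [
        ({"alpha": -1.0}, "alpha == -1.0, must be >= 0"),
        ({"tol": -1.0}, "tol == -1.0, must be >= 0"),
        ({"max_iter": 0}, "max_iter == 0, must be >= 1"),
    ],
)
def test_glm_scalar_arguments(params, err_msg):
    # Each invalid scalar should be rejected by check_scalar at fit time.
    X = np.array([[1], [2]])
    y = np.array([1, 2])
    with pytest.raises(ValueError, match=err_msg):
        GeneralizedLinearRegressor(**params).fit(X, y)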

@glemaitre glemaitre removed their request for review December 15, 2021 14:58
reshamas (Member Author)

@glemaitre

You should remove:

  • test_glm_tol_argument
  • test_glm_alpha_argument

These tests will be covered by the more generic test functions that you created and parametrized.

The tol parametrization I added only tests for negative values. Should I also add a test for a string?
Same for alpha: should I add a test that checks for a string?

glemaitre (Member) commented Dec 15, 2021

The tol parametrization I added only tests for negative values. Should I also add a test for a string?
Same for alpha: should I add a test that checks for a string?

Yes, we should test with a type of data that is not supported to ensure that we raise a TypeError.

reshamas (Member Author)

The tol parametrization I added only tests for negative values. Should I also add a test for a string?
Same for alpha: should I add a test that checks for a string?

Yes, we should test with a type of data that is not supported to ensure that we raise a TypeError.

OK, I think I will add it for verbose as well.
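
A sketch of such a string case; the error-message fragment and import path are assumptions about check_scalar's TypeError message, not quoted from the PR:

import numpy as np
import pytest

from sklearn.linear_model._glm import GeneralizedLinearRegressor

@pytest.mark.parametrize("param", ["alpha", "tol", "verbose"])
def test_glm_string_argument(param):
    # A string where a number is expected should raise a TypeError.
    X = np.array([[1], [2]])
    y = np.array([1, 2])
    glm = GeneralizedLinearRegressor(**{param: "not a number"})
    with pytest.raises(TypeError, match=f"{param} must be an instance of"):
        glm.fit(X, y)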

reshamas (Member Author)

@glemaitre
I see this error, but I am not sure how to fix it.

Should I change my code so that the error message reads 'stopping criteria must be positive'?

tol = -1.0

    @pytest.mark.parametrize("tol", ["not a number", 0, -1.0, [1e-3]])
    def test_glm_tol_argument(tol):
        """Test GLM for invalid tol argument."""
        y = np.array([1, 2])
        X = np.array([[1], [2]])
        glm = GeneralizedLinearRegressor(tol=tol)
        with pytest.raises(ValueError, match="stopping criteria must be positive"):
>           glm.fit(X, y)
E           AssertionError: Regex pattern 'stopping criteria must be positive' does not match 'tol == -1.0, must be >= 0.'.

/h
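
One possible fix, sketched under the assumption that the test should expect check_scalar's own message rather than the old one, is to parametrize the expected exception type and pattern per input:

import numpy as np
import pytest

from sklearn.linear_model._glm import GeneralizedLinearRegressor

@pytest.mark.parametrize(
    "tol, err_type, err_msg",
    [
        # check_scalar raises TypeError for unsupported types ...
        ("not a number", TypeError, "tol must be an instance of"),
        ([1e-3], TypeError, "tol must be an instance of"),
        # ... and ValueError for out-of-range values (message shown above).
        (-1.0, ValueError, "tol == -1.0, must be >= 0"),
    ],
)
def test_glm_tol_argument(tol, err_type, err_msg):
    """Test GLM for an invalid tol argument."""
    X = np.array([[1], [2]])
    y = np.array([1, 2])
    glm = GeneralizedLinearRegressor(tol=tol)
    with pytest.raises(err_type, match=err_msg):
        glm.fit(X, y)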

thomasjpfan (Member) left a comment

Thank you for the PR, @reshamas!

@@ -72,6 +73,7 @@ class GeneralizedLinearRegressor(RegressorMixin, BaseEstimator):
regularization strength. ``alpha = 0`` is equivalent to unpenalized
GLMs. In this case, the design matrix `X` must have full column rank
(no collinearities).
Values should be >=0.
Review comment (Member):

If we are going to show ranges, I think we should be consistent with #21955 (when that PR gets merged).

reshamas (Member Author) commented Dec 20, 2021

Reminders:

  • tol: test for 0 (boundary) and test for a list item
  • remove blank space before internal range values

reshamas (Member Author)

@thomasjpfan @glemaitre Is the following correct?

  • I don't need to add any tests for the other 3 classes in this file (PoissonRegressor, GammaRegressor, TweedieRegressor).
  • I do need to add the valid intervals for the parameters.
  • I can probably add the valid intervals for the scalar parameters for the other 3 classes to this PR.

glemaitre (Member)

I don't need to add any tests for the other 3 classes in this file (PoissonRegressor, GammaRegressor, TweedieRegressor).

Actually, we should. But it should be easy because I would expect all these models to have the same API as GeneralizedLinearRegressor, meaning that adding

@pytest.mark.parametrize("Estimator", [GeneralizedLinearRegressor, TweedieRegressor, ...])

should be enough.
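
Concretely, a runnable sketch of that parametrization over the four GLM classes discussed in this thread; the test body and expected message are assumptions for illustration:

import numpy as np
import pytest

from sklearn.linear_model import GammaRegressor, PoissonRegressor, TweedieRegressor
from sklearn.linear_model._glm import GeneralizedLinearRegressor

@pytest.mark.parametrize(
    "Estimator",
    [GeneralizedLinearRegressor, PoissonRegressor, GammaRegressor, TweedieRegressor],
)
def test_glm_alpha_argument_all_estimators(Estimator):
    # Every GLM shares the alpha parameter, so one test covers all of them.
    X = np.array([[1], [2]])
    y = np.array([1, 2])
    with pytest.raises(ValueError, match="alpha == -1.0, must be >= 0"):
        Estimator(alpha=-1.0).fit(X, y)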

I do need to add the valid intervals for the parameters.

Yes, it would be better.

I can probably add the valid intervals for the scalar parameters for the other 3 classes to this PR.

Let's first merge this PR. It has a reasonable size and it will be easier to review.

reshamas (Member Author) commented Jan 5, 2022

@glemaitre I think this PR is in a good place for review.
I added the intervals in #22076.

jjerphan (Member) left a comment

Thank you, @reshamas, for this contribution.

Here are a few comments.

glemaitre (Member) left a comment

Otherwise LGTM


warm_start : bool, default=False
If set to ``True``, reuse the solution of the previous call to ``fit``
as initialization for ``coef_`` and ``intercept_``.

verbose : int, default=0
For the lbfgs solver set verbose to any positive number for verbosity.
Values must be in the range `[1, inf)`.
Review comment (Member):

Suggested change:
- Values must be in the range `[1, inf)`.
+ Values must be in the range `[0, inf)`.

name="alpha",
target_type=numbers.Real,
min_val=0.0,
include_boundaries="left",
thomasjpfan (Member):

In general, we are not consistent here, but since the default is "both", I think we can leave out "left":

Suggested change:
- include_boundaries="left",

Also, this is consistent with the rest of this PR, which leaves out include_boundaries.

reshamas (Member Author):

@thomasjpfan I am not sure I understand why we are leaving it as "both" when it is "left"?

thomasjpfan (Member) commented Jan 13, 2022

With max_val set to None, "left" and "both" mean the same thing in terms of the upper bound because the upper bound is not checked.

For alpha, I think it is technically "both" because np.inf is a valid value for alpha in the GLMs:

from sklearn import linear_model
import numpy as np

clf = linear_model.PoissonRegressor(alpha=np.inf)
X = [[1, 2], [2, 3], [3, 4], [4, 3]]
y = [12, 17, 22, 21]

# works but with warnings
clf.fit(X, y)

Setting alpha=np.inf is strange, but it can be educational?

Edit: In other words, should we distinguish between [0.0, inf] and [0.0, inf)? Currently:

  • check_scalar(..., min_val=0.0, include_boundaries="both")
  • check_scalar(..., min_val=0.0, include_boundaries="left")

both mean [0.0, inf], where inf is included. Here is a snippet to show that np.inf passes both checks:

from sklearn.utils.validation import check_scalar
import numpy as np

check_scalar(np.inf, name="value", target_type=float, min_val=0.0, include_boundaries="both")
check_scalar(np.inf, name="value", target_type=float, min_val=0.0, include_boundaries="left")
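
If inf should be rejected, one option (a sketch, assuming the boundary semantics described above) is to pass an explicit max_val and exclude the right boundary:

import numpy as np

from sklearn.utils.validation import check_scalar

try:
    # With include_boundaries="left", the upper bound check is exclusive:
    # np.inf >= max_val fails, so inf is rejected while any finite value
    # >= 0.0 still passes.
    check_scalar(
        np.inf,
        name="value",
        target_type=float,
        min_val=0.0,
        max_val=np.inf,
        include_boundaries="left",
    )
except ValueError as e:
    print(e)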

ogrisel (Member) commented Jan 14, 2022

I would say that most of the time we should note [0.0, inf) in the docstring to exclude inf as invalid.

But there might be places where np.inf is a valid value for the parameter with a special meaning, and in that case the docstring should document what it does.

ogrisel (Member):

BTW: for the Poisson regressor example with infinite alpha I get:

/Users/ogrisel/code/scikit-learn/sklearn/linear_model/_glm/glm.py:302: RuntimeWarning: invalid value encountered in multiply
  coef_scaled = alpha * coef[offset:]
/Users/ogrisel/code/scikit-learn/sklearn/linear_model/_glm/glm.py:323: ConvergenceWarning: lbfgs failed to converge (status=2):
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res)

so rejecting it seems a good idea :)

jjerphan (Member) left a comment

LGTM.

@glemaitre glemaitre merged commit b361f37 into scikit-learn:main Jan 24, 2022