MNT use check_scalar in SpectralBiClustering and SpectralCoClustering #20817

creatornadiran · 2021-08-23T14:08:06Z

Reference Issues/PRs

Reference Issue #20724
PR #20723

What does this implement/fix? Explain your changes.

Used check_scalar function instead of if-else blocks to validate parameters.

Any other comments?

Please let me know if there is a mistake or you have a suggestion.

glemaitre · 2021-09-02T15:42:08Z

sklearn/cluster/_bicluster.py

+        "n_clusters": { "target_type": numbers.Integral,"min_val": 1, "max_val": n_samples},
+        "n_init": { "target_type": numbers.Integral,"min_val": 1 },
+        }
+        for scalar_name in scalars_checks:


You can make 2 calls and not use a loop.
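
A minimal sketch of what this suggestion could look like. `check_scalar_sketch` is a hypothetical, simplified stand-in for `sklearn.utils.check_scalar` (inclusive bounds only) so that the snippet runs without scikit-learn, and `check_parameters` is an illustrative free function rather than the estimator method from the PR.

```python
import numbers


def check_scalar_sketch(x, name, target_type, *, min_val=None, max_val=None):
    """Hypothetical stand-in for sklearn.utils.check_scalar (inclusive bounds only)."""
    if not isinstance(x, target_type):
        raise TypeError(f"{name} must be an instance of {target_type}, not {type(x)}.")
    if min_val is not None and x < min_val:
        raise ValueError(f"{name} == {x}, must be >= {min_val}.")
    if max_val is not None and x > max_val:
        raise ValueError(f"{name} == {x}, must be <= {max_val}.")
    return x


def check_parameters(n_clusters, n_init, n_samples):
    # One explicit call per parameter; no dict of keyword arguments, no loop.
    check_scalar_sketch(
        n_clusters, "n_clusters", numbers.Integral, min_val=1, max_val=n_samples
    )
    check_scalar_sketch(n_init, "n_init", numbers.Integral, min_val=1)
```

With only two parameters, the explicit calls are shorter than the dictionary and keep each constraint next to the name it applies to.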

glemaitre · 2021-09-02T15:42:22Z

sklearn/cluster/_bicluster.py

        legal_svd_methods = ("randomized", "arpack")
        if self.svd_method not in legal_svd_methods:
            raise ValueError(
                "Unknown SVD method: '{0}'. svd_method must be one of {1}.".format(
                    self.svd_method, legal_svd_methods
                )
            )
+        scalars_checks = {
+        "n_clusters": { "target_type": numbers.Integral,"min_val": 1, "max_val": n_samples},
+        "n_init": { "target_type": numbers.Integral,"min_val": 1 },


There are some PEP8 issues here.

glemaitre · 2021-09-02T15:43:34Z

sklearn/cluster/_bicluster.py

+        "n_components": { "target_type": numbers.Integral,"min_val": 1},
+        "n_best": { "target_type": numbers.Integral,"min_val": 1, "max_val": self.n_components },
+        }
+        for scalar_name in scalars_checks:


creatornadiran · 2021-09-02T18:59:01Z

Thanks for the reviews. I made the changes. If I'm still missing something, please let me know.

glemaitre · 2021-09-15T11:21:20Z

The CIs are failing. Can you look at the reported error? You should be able to reproduce it locally by running the tests associated with the scikit-learn estimator that you modified.

glemaitre · 2021-09-15T12:30:59Z

I am posting the patch that will most likely fix the CI, along with some additional changes that I would have asked for later in a full review:

diff --git a/sklearn/cluster/_bicluster.py b/sklearn/cluster/_bicluster.py
index 849afd6cf5..5bb8b95cc6 100644
--- a/sklearn/cluster/_bicluster.py
+++ b/sklearn/cluster/_bicluster.py
@@ -106,7 +106,7 @@ class BaseSpectral(BiclusterMixin, BaseEstimator, metaclass=ABCMeta):
         self.n_init = n_init
         self.random_state = random_state
 
-    def _check_parameters(self, n_samples):
+    def _check_parameters(self, n_samples, n_features):
         legal_svd_methods = ("randomized", "arpack")
         if self.svd_method not in legal_svd_methods:
             raise ValueError(
@@ -114,8 +114,13 @@ class BaseSpectral(BiclusterMixin, BaseEstimator, metaclass=ABCMeta):
                     self.svd_method, legal_svd_methods
                 )
             )
-        check_scalar(self.n_clusters, "n_clusters", target_type=numbers.Integral, min_val=1, max_val=n_samples)
-        check_scalar(self.n_init, "n_init", target_type=numbers.Integral, min_val=1)
+        check_scalar(
+            self.n_init,
+            "n_init",
+            target_type=numbers.Integral,
+            min_val=1,
+            include_boundaries="left",
+        )
 
     def fit(self, X, y=None):
         """Creates a biclustering for X.
@@ -128,7 +133,7 @@ class BaseSpectral(BiclusterMixin, BaseEstimator, metaclass=ABCMeta):
 
         """
         X = self._validate_data(X, accept_sparse="csr", dtype=np.float64)
-        self._check_parameters(X.shape[0])
+        self._check_parameters(*X.shape)
         self._fit(X)
         return self
 
@@ -326,6 +331,17 @@ class SpectralCoclustering(BaseSpectral):
             n_clusters, svd_method, n_svd_vecs, mini_batch, init, n_init, random_state
         )
 
+    def _check_parameters(self, n_samples, n_features):
+        super()._check_parameters(n_samples, n_features)
+        check_scalar(
+            self.n_clusters,
+            "n_clusters",
+            target_type=numbers.Integral,
+            min_val=1,
+            max_val=n_samples,
+            include_boundaries="both",
+        )
+
     def _fit(self, X):
         normalized_data, row_diag, col_diag = _scale_normalize(X)
         n_sv = 1 + int(np.ceil(np.log2(self.n_clusters)))
@@ -487,8 +503,8 @@ class SpectralBiclustering(BaseSpectral):
         self.n_components = n_components
         self.n_best = n_best
 
-    def _check_parameters(self, n_sample):
-        super()._check_parameters()
+    def _check_parameters(self, n_samples, n_features):
+        super()._check_parameters(n_samples, n_features)
         legal_methods = ("bistochastic", "scale", "log")
         if self.method not in legal_methods:
             raise ValueError(
@@ -496,22 +512,60 @@ class SpectralBiclustering(BaseSpectral):
                     self.method, legal_methods
                 )
             )
-        try:
-            int(self.n_clusters)
-        except TypeError:
+
+        n_clusters_type_error = (
+            f"Incorrect parameter n_clusters has value: {self.n_clusters}. It "
+            "should either be a single integer or an iterable with two "
+            "integers: (n_row_clusters, n_column_clusters)"
+        )
+        if isinstance(self.n_clusters, numbers.Integral):
+            check_scalar(
+                self.n_clusters,
+                "n_clusters",
+                target_type=numbers.Integral,
+                min_val=1,
+                max_val=n_samples,
+                include_boundaries="both",
+            )
+        elif isinstance(self.n_clusters, tuple):
             try:
-                r, c = self.n_clusters
-                int(r)
-                int(c)
-            except (ValueError, TypeError) as e:
-                raise ValueError(
-                    "Incorrect parameter n_clusters has value:"
-                    " {}. It should either be a single integer"
-                    " or an iterable with two integers:"
-                    " (n_row_clusters, n_column_clusters)"
-                ) from e
-        check_scalar(self.n_components, "n_components", target_type=numbers.Integral, min_val=1)
-        check_scalar(self.n_best, "n_best", target_type=numbers.Integral, min_val=1, max_val=self.n_components)
+                rows, columns = self.n_clusters
+            except ValueError as e:
+                raise ValueError(n_clusters_type_error) from e
+            check_scalar(
+                rows,
+                "n_rows from n_clusters",
+                target_type=numbers.Integral,
+                min_val=1,
+                max_val=n_samples,
+                include_boundaries="both",
+            )
+            check_scalar(
+                columns,
+                "n_columns from n_clusters",
+                target_type=numbers.Integral,
+                min_val=1,
+                max_val=n_features,
+                include_boundaries="both",
+            )
+        else:
+            raise TypeError(n_clusters_type_error)
+
+        check_scalar(
+            self.n_components,
+            "n_components",
+            target_type=numbers.Integral,
+            min_val=1,
+            include_boundaries="left",
+        )
+        check_scalar(
+            self.n_best,
+            "n_best",
+            target_type=numbers.Integral,
+            min_val=1,
+            max_val=self.n_components,
+            include_boundaries="both",
+        )
 
     def _fit(self, X):
         n_sv = self.n_components
diff --git a/sklearn/cluster/tests/test_bicluster.py b/sklearn/cluster/tests/test_bicluster.py
index ba6d91a537..0eb6a6805e 100644
--- a/sklearn/cluster/tests/test_bicluster.py
+++ b/sklearn/cluster/tests/test_bicluster.py
@@ -208,23 +208,36 @@ def test_perfect_checkerboard():
 
 
 @pytest.mark.parametrize(
-    "args",
+    "args, err_type, err_msg",
     [
-        {"n_clusters": (3, 3, 3)},
-        {"n_clusters": "abc"},
-        {"n_clusters": (3, "abc")},
-        {"method": "unknown"},
-        {"n_components": 0},
-        {"n_best": 0},
-        {"svd_method": "unknown"},
-        {"n_components": 3, "n_best": 4},
+        (
+            {"n_clusters": (3, 3, 3)},
+            ValueError,
+            r"Incorrect parameter n_clusters has value: \(3, 3, 3\)",
+        ),
+        (
+            {"n_clusters": "abc"},
+            TypeError,
+            "Incorrect parameter n_clusters has value: abc",
+        ),
+        (
+            {"n_clusters": (3, "abc")},
+            TypeError,
+            "n_columns from n_clusters must be an instance of <class"
+            " 'numbers.Integral'>, not <class 'str'>.",
+        ),
+        ({"method": "unknown"}, ValueError, "Unknown method: 'unknown'"),
+        ({"n_components": 0}, ValueError, "n_components == 0, must be >= 1."),
+        ({"n_best": 0}, ValueError, "n_best == 0, must be >= 1."),
+        ({"svd_method": "unknown"}, ValueError, "Unknown SVD method: 'unknown'"),
+        ({"n_components": 3, "n_best": 4}, ValueError, "n_best == 4, must be <= 3."),
     ],
 )
-def test_errors(args):
+def test_errors(args, err_type, err_msg):
     data = np.arange(25).reshape((5, 5))
 
     model = SpectralBiclustering(**args)
-    with pytest.raises(ValueError):
+    with pytest.raises(err_type, match=err_msg):
         model.fit(data)

In short, this patch:

specializes the n_clusters check for each Spectral class
reworks the type of error raised for n_clusters: a proper TypeError is raised when the type is not the one expected
improves the tests by checking the error message raised as well as the type of error
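
The error-type rework summarized above can be sketched in isolation. This is an illustration under assumptions, not the code that was merged: `validate_n_clusters` and `_check_int_in_range` are hypothetical names, and the helper stands in for the `check_scalar` calls in the patch.

```python
import numbers


def _check_int_in_range(value, name, max_val):
    # Hypothetical helper: wrong type -> TypeError, out of range -> ValueError.
    if not isinstance(value, numbers.Integral):
        raise TypeError(f"{name} must be an integer, not {type(value)}.")
    if not 1 <= value <= max_val:
        raise ValueError(f"{name} == {value}, must be in [1, {max_val}].")


def validate_n_clusters(n_clusters, n_samples, n_features):
    msg = (
        f"Incorrect parameter n_clusters has value: {n_clusters}. It should "
        "either be a single integer or an iterable with two integers: "
        "(n_row_clusters, n_column_clusters)"
    )
    if isinstance(n_clusters, numbers.Integral):
        _check_int_in_range(n_clusters, "n_clusters", n_samples)
    elif isinstance(n_clusters, tuple):
        try:
            rows, columns = n_clusters
        except ValueError as e:
            # Right container type, wrong number of elements.
            raise ValueError(msg) from e
        _check_int_in_range(rows, "n_rows from n_clusters", n_samples)
        _check_int_in_range(columns, "n_columns from n_clusters", n_features)
    else:
        # Neither an integer nor a tuple, e.g. a string.
        raise TypeError(msg)
```

Under this sketch, `(3, 3, 3)` raises `ValueError`, while `"abc"` and `(3, "abc")` raise `TypeError`, which mirrors the error types listed in the parametrized test of the patch.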

@creatornadiran Could you apply these changes so that I can do another round of review?

glemaitre

Adding "Request changes" to show that we already reviewed this PR.

glemaitre · 2021-09-15T14:04:32Z

n_jobs should also be checked.

creatornadiran · 2021-09-15T21:04:36Z

n_jobs should also be checked.

Wouldn't it be a bit difficult to check n_jobs with check_scalar?

glemaitre · 2021-09-16T07:55:37Z

Wouldn't it be a bit difficult to check n_jobs with check_scalar?

if self.n_jobs is not None:
    check_scalar(self.n_jobs, "n_jobs", numbers.Integral)

This should be enough

glemaitre · 2021-09-16T07:55:54Z

Be aware that the CIs are failing. You should check the logs.

glemaitre · 2021-09-16T09:14:44Z

Be sure to apply black to your file, otherwise the linting CI will fail. The best option is to install pre-commit, as mentioned in the contributing guide (item 9). It will reformat the code for you before committing.

creatornadiran · 2021-09-16T10:15:08Z

Wouldn't it be a bit difficult to check n_jobs with check_scalar?
if self.n_jobs is not None:
    check_scalar(self.n_jobs, "n_jobs", numbers.Integral)
This should be enough

I got the error "'SpectralCoclustering' object has no attribute 'n_jobs'".

creatornadiran · 2021-09-16T18:09:10Z

Finally, all CIs are passing.

thomasjpfan

Thank you for the PR @creatornadiran !

sklearn/cluster/_bicluster.py

creatornadiran · 2021-10-02T15:04:12Z

Thank you for the PR @creatornadiran !

Thanks for the reply!

I missed that super()._check_parameters already checks svd_method. I deleted the duplicate check.

But I don't understand why I should add the n_samples parameter to BaseSpectral._check_parameters. n_samples is only used to check the maximum value of n_clusters, and n_clusters is already checked in the child's _check_parameters.

thomasjpfan · 2021-10-02T15:25:30Z

But I don't understand why I should add the n_samples parameter to BaseSpectral._check_parameters. n_samples is only used to check the maximum value of n_clusters, and n_clusters is already checked in the child's _check_parameters.

BaseSpectral._check_parameters will not actually use n_samples for checking. It's more to make the API consistent.

Currently, BaseSpectral.fit calls self._check_parameters(X.shape[0]), but its own _check_parameters does not accept n_samples. BaseSpectral.fit depends on a subclass to override and change the signature of _check_parameters, which feels non-intuitive.
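
The point about signature consistency can be sketched with two toy classes. `BaseSketch` and `CoclusteringSketch` are hypothetical names used only for illustration; they are not the scikit-learn estimators and leave out everything except the parameter check.

```python
class BaseSketch:
    def _check_parameters(self, n_samples):
        # n_samples is accepted but unused here; accepting it keeps the base
        # signature compatible with the call made in fit().
        pass

    def fit(self, X):
        # Works for the base class and for any subclass that keeps the signature.
        self._check_parameters(len(X))
        return self


class CoclusteringSketch(BaseSketch):
    def __init__(self, n_clusters=3):
        self.n_clusters = n_clusters

    def _check_parameters(self, n_samples):
        super()._check_parameters(n_samples)
        # Only the subclass actually needs n_samples.
        if not 1 <= self.n_clusters <= n_samples:
            raise ValueError(
                f"n_clusters == {self.n_clusters}, must be in [1, {n_samples}]."
            )
```

If the base method took no `n_samples` argument, `fit` would only work when a subclass overrides `_check_parameters` with a wider signature, which is the dependency described above.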

creatornadiran · 2021-10-02T15:39:13Z

I understand now, thanks. I made the changes.

thomasjpfan

Small comment regarding n_jobs, otherwise LGTM!

sklearn/cluster/_bicluster.py

creatornadiran · 2021-10-04T21:55:03Z

Thanks for the check and the approval!
@glemaitre recommended that I check n_jobs, but I think I used it in the wrong class.

thomasjpfan

There is a linting error, which can be fixed by running black .

creatornadiran · 2021-10-07T09:55:12Z

I ran black. I think it looks better now, and it also passes the lint check.

thomasjpfan

LGTM

creatornadiran · 2022-01-29T23:06:40Z

Still getting 2 errors and don't understand why.
Error Lines:
FAILED cluster/tests/test_bicluster.py::test_spectalcoclustering_parameter_validation[params2-ValueError-Incorrect parameter n_clusters has value: abc]

FAILED cluster/tests/test_bicluster.py::test_spectalbiclustering_parameter_validation[params1-TypeError-n_init must be an instance of integer]

glemaitre · 2022-01-31T09:56:21Z

Still getting 2 errors and don't understand why.

The string that you passed was not matching the error message. You can check the fix here: cda955d
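
The thread does not show which part of the pattern failed to match. As a hedged illustration of one common cause: `pytest.raises(match=...)` applies `re.search` to the string form of the exception, so regex metacharacters in the expected message have to be escaped. The message below is taken from the test in the earlier patch; the variable names are illustrative.

```python
import re

message = "Incorrect parameter n_clusters has value: (3, 3, 3)"

# Unescaped parentheses form a regex group, so this pattern looks for the
# literal text "value: 3, 3, 3" and does not match the message.
unescaped = "Incorrect parameter n_clusters has value: (3, 3, 3)"
unescaped_match = re.search(unescaped, message)

# re.escape (or manual backslashes, as in r"\(3, 3, 3\)") matches literally.
escaped_match = re.search(re.escape(unescaped), message)
```

Here `unescaped_match` is `None` while `escaped_match` is a match object, which is why the test in the patch writes the tuple as `\(3, 3, 3\)`.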

glemaitre · 2022-01-31T10:30:28Z

CIs are green. Let's merge then. Thanks @creatornadiran

creatornadiran · 2022-01-31T10:34:16Z

Still getting 2 errors and don't understand why.

The string that you passed was not matching the error message. You can check the fix here: cda955d

Wow, I didn't notice. Thanks for the fix.

github-actions bot added the module:cluster label Aug 23, 2021

glemaitre reviewed Sep 2, 2021

creatornadiran requested a review from glemaitre September 13, 2021 17:14

glemaitre removed their request for review September 15, 2021 12:31

glemaitre changed the title ~~Used check_scalar to check parameters.~~ MNT use check_scalar in SpectralBiClustering and SpectralCoClustering Sep 15, 2021

glemaitre requested changes Sep 15, 2021

creatornadiran requested a review from glemaitre September 16, 2021 18:09

thomasjpfan reviewed Oct 2, 2021

creatornadiran requested a review from thomasjpfan October 2, 2021 16:42

thomasjpfan approved these changes Oct 4, 2021

thomasjpfan reviewed Oct 6, 2021

creatornadiran requested a review from thomasjpfan October 7, 2021 11:11

thomasjpfan approved these changes Oct 8, 2021

creatornadiran and others added 18 commits January 30, 2022 01:08

Update _bicluster.py

dbb7f61

Update _bicluster.py

1a59b6b

Update _bicluster.py

d2c6695

some change in spelling

e878db6

Update sklearn/cluster/_bicluster.py

54b0cf6

super()._check_parameters is already checking svd_method so no need to this block of code Co-authored-by: Thomas J. Fan <[email protected]>

Update _bicluster.py

59ecd9a

Update sklearn/cluster/_bicluster.py

95a458d

Co-authored-by: Thomas J. Fan <[email protected]>

runned black

c5937fe

necessary changes

788675a

test fixed

6ea505b

bicluster_test updated

bf2d6d3

test_bicluster fixed

d77d8ed

test_bicluster.py fixed

87fe753

test_bicluster refixed

42de356

test_bicluster

8615430

tests/test_bicluster fixed

2cf26cc

Update test_bicluster.py

feb581a

rebase and changes

f70c9ef

creatornadiran force-pushed the main branch from 15e18e3 to f70c9ef Compare January 29, 2022 22:14

creatornadiran added 3 commits January 30, 2022 01:19

black

5b20cec

error fix

9569bfa

error fix 2

0194654

creatornadiran requested a review from glemaitre January 31, 2022 01:44

glemaitre added 2 commits January 31, 2022 10:45

Merge remote-tracking branch 'origin/main' into pr/creatornadiran/20817

e5488f4

fix regex to match

cda955d

glemaitre merged commit 604bc5c into scikit-learn:main Jan 31, 2022
