incorrect PLS rank bounds #29662

adrinjalali · 2024-08-13T11:04:57Z

Going through the math, I think the upper bound on n_components for PLSRegression is incorrect. The rank of X.T @ X equals the rank of X, so it is bounded above by min(n_samples, n_features) and not just n_features. We can update the implementation to reflect this:

diff --git a/sklearn/cross_decomposition/_pls.py b/sklearn/cross_decomposition/_pls.py
index dce2c23db2..4346905f00 100644
--- a/sklearn/cross_decomposition/_pls.py
+++ b/sklearn/cross_decomposition/_pls.py
@@ -246,7 +246,9 @@ class _PLS(
         # With PLSRegression n_components is bounded by the rank of (X.T X) see
         # Wegelin page 25. With CCA and PLSCanonical, n_components is bounded
         # by the rank of X and the rank of Y: see Wegelin page 12
-        rank_upper_bound = p if self.deflation_mode == "regression" else min(n, p, q)
+        rank_upper_bound = (
+            min(n, p) if self.deflation_mode == "regression" else min(n, p, q)
+        )
         if n_components > rank_upper_bound:
             raise ValueError(
                 f"`n_components` upper bound is {rank_upper_bound}. "

I would treat this as a bug fix because it makes the implementation consistent with the paper (Wegelin) cited in the code comment.
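To make the effect of the change concrete, here is a minimal standalone sketch of the bound before and after the patch. The function names `old_rank_upper_bound` and `new_rank_upper_bound` are hypothetical and not part of scikit-learn, and it assumes that `n`, `p`, and `q` in the diff stand for n_samples, n_features, and n_targets.

```python
def old_rank_upper_bound(n, p, q, deflation_mode):
    """Bound as in the removed line of the diff (hypothetical helper)."""
    # Assumption: n = n_samples, p = n_features, q = n_targets.
    return p if deflation_mode == "regression" else min(n, p, q)


def new_rank_upper_bound(n, p, q, deflation_mode):
    """Bound as in the added lines of the diff (hypothetical helper)."""
    return min(n, p) if deflation_mode == "regression" else min(n, p, q)


# Wide data: fewer samples than features, single target.
n, p, q = 20, 54, 1
print(old_rank_upper_bound(n, p, q, "regression"))
print(new_rank_upper_bound(n, p, q, "regression"))
```

With more samples than features the two bounds agree, so the change only matters when n_samples < n_features.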

For reference, here is a quick example of the rank being bounded by n_samples:

import numpy as np

rng = np.random.RandomState(42)
X = rng.standard_normal(size=(20, 54))

print(np.linalg.matrix_rank(X.T @ X))
# 20
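The same check can be repeated for wide, tall, and square matrices to see that the rank of X.T @ X tracks min(n_samples, n_features) in each case. This is only a sketch on random Gaussian data; the helper name `gram_rank_report` and the seed are arbitrary choices for illustration.

```python
import numpy as np


def gram_rank_report(n_samples, n_features, seed=0):
    """Return (rank of X, rank of X.T @ X, min(n_samples, n_features)).

    Hypothetical helper for illustration, using random Gaussian data.
    """
    rng = np.random.RandomState(seed)
    X = rng.standard_normal(size=(n_samples, n_features))
    rank_X = int(np.linalg.matrix_rank(X))
    rank_gram = int(np.linalg.matrix_rank(X.T @ X))
    return rank_X, rank_gram, min(n_samples, n_features)


# Wide, tall, and square cases.
for shape in [(20, 54), (54, 20), (30, 30)]:
    rank_X, rank_gram, bound = gram_rank_report(*shape)
    print(shape, rank_X, rank_gram, bound)
```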


github-actions bot added the Needs Triage Issue requires triage label Aug 13, 2024

adrinjalali mentioned this issue Aug 13, 2024

Fix PLSR n_components documentation. #26204

Closed

lesteve removed the Needs Triage Issue requires triage label Aug 20, 2024

thomasjpfan mentioned this issue Aug 24, 2024

FIX Raises error in PLSRegression for invalid n_components #29710

Merged

OmarManzoor closed this as completed in #29710 Aug 29, 2024
