Going through the math, I think the code is incorrect. Since rank(X.T @ X) = rank(X), the rank of X.T @ X is bounded above by min(n_samples, n_features), not just by n_features. We can update the implementation to reflect this:
diff --git a/sklearn/cross_decomposition/_pls.py b/sklearn/cross_decomposition/_pls.py
index dce2c23db2..4346905f00 100644
--- a/sklearn/cross_decomposition/_pls.py
+++ b/sklearn/cross_decomposition/_pls.py
@@ -246,7 +246,9 @@ class _PLS(
         # With PLSRegression n_components is bounded by the rank of (X.T X) see
         # Wegelin page 25. With CCA and PLSCanonical, n_components is bounded
         # by the rank of X and the rank of Y: see Wegelin page 12
-        rank_upper_bound = p if self.deflation_mode == "regression" else min(n, p, q)
+        rank_upper_bound = (
+            min(n, p) if self.deflation_mode == "regression" else min(n, p, q)
+        )
         if n_components > rank_upper_bound:
             raise ValueError(
                 f"`n_components` upper bound is {rank_upper_bound}. "
I would treat this as a bug fix because it makes the implementation consistent with the paper.
For reference, here is a quick example of the rank being bounded by n_samples:
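A minimal sketch of such a check, assuming NumPy and a random X with fewer samples than features (the shapes and the seed are illustrative choices, not taken from the linked review):

```python
import numpy as np

# Illustrative shapes: fewer samples than features.
n_samples, n_features = 5, 10

rng = np.random.default_rng(0)
X = rng.standard_normal((n_samples, n_features))

# X.T @ X has shape (n_features, n_features), but its rank equals
# rank(X), which cannot exceed min(n_samples, n_features).
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)

print(f"shape of X.T @ X: {gram.shape}")
print(f"rank of X.T @ X:  {rank}")
print(f"min(n_samples, n_features): {min(n_samples, n_features)}")
```

With these shapes the rank stays below n_features, which is why n_features alone is too loose an upper bound for `n_components` in the regression case.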
From #26204 (review)