FEA Calculate Explained Variance Ratio for `_PLS` Models #32722

paucablop · 2025-11-16T16:53:17Z

Reference Issues/PRs

This PR addresses issue #32675, which requests the attributes explained_variance_ratio_x_ and explained_variance_ratio_y_ for the _PLS models.

Fixes #32675
Fixes #19896
Fixes #30470

What does this implement/fix? Explain your changes.

This PR implements explained_variance_ratio_x_ and explained_variance_ratio_y_ for _PLS models, analogous to how PCA exposes explained_variance_ratio_. Without these attributes in PLSRegression, it is difficult to:

Quantify how much variance each latent variable captures,
Compare models with different numbers of components, and
Produce diagnostic or variance-explained plots (as is standard in different disciplines where latent space interpretability is important).

In the current PR, I propose adding two new attributes that are calculated during fitting:

explained_variance_ratio_x_ : ndarray of shape (n_components,)
Fraction of variance explained in X-space for each component.
explained_variance_ratio_y_ : ndarray of shape (n_components,)
Fraction of variance explained in Y-space for each component.
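As a rough illustration of the model-comparison use case, per-component ratios like these can be accumulated to find how many components reach a target share of variance. The helper name `components_for_threshold` and the ratio values are hypothetical placeholders, not output from this PR.

```python
import numpy as np


def components_for_threshold(ratios, threshold):
    """Return the smallest number of components whose cumulative
    explained variance ratio reaches `threshold`, or None if it never does.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    cumulative = np.cumsum(ratios)
    reached = np.flatnonzero(cumulative >= threshold)
    return int(reached[0]) + 1 if reached.size else None


# Placeholder values standing in for explained_variance_ratio_x_.
example_ratios = np.array([0.6, 0.25, 0.1, 0.05])
n_needed = components_for_threshold(example_ratios, 0.9)
```

The same cumulative sum is what a variance-explained plot would display on its vertical axis.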

In the initial issue (#32675) I suggested a solution for the PLSRegression model that calculated the explained variance ratio for both matrices right after fitting the superclass.

However, as I was working on it, I realized that a more elegant and extensible solution is to calculate the variance ratio directly in the parent class during fit. This gives:

Extensibility: the suggested solution directly extends the functionality to the other models inheriting from _PLS (PLSRegression, PLSCanonical and CCA) without additional logic for deflation_mode (canonical vs. regression, i.e., symmetric vs. asymmetric), because this logic is already handled during super().fit().
Consistency: all subclasses expose the same attributes without custom logic.
Performance: the suggested approach is more performant because it does not need to deflate the matrices a second time after fitting. The explained variances are calculated in each iteration of the fitting process.
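The idea of computing the ratios inside the deflation loop can be sketched with a small standalone NIPALS routine. This is a simplified illustration under stated assumptions (regression-mode deflation only, centering without scaling, a hypothetical function name), not the code from this PR or from scikit-learn.

```python
import numpy as np


def pls_regression_with_variance(X, Y, n_components, max_iter=500, tol=1e-6):
    """Regression-mode NIPALS PLS that records explained variance ratios
    while deflating. Illustrative sketch only; the name is hypothetical.
    """
    # Center both blocks (scaling is omitted to keep the sketch short).
    Xk = X - X.mean(axis=0)
    Yk = Y - Y.mean(axis=0)
    total_x = np.sum(Xk**2)
    total_y = np.sum(Yk**2)

    ratio_x = np.zeros(n_components)
    ratio_y = np.zeros(n_components)
    for k in range(n_components):
        # Inner NIPALS loop: alternate between X weights and Y weights.
        u = Yk[:, [0]]
        w_old = None
        for _ in range(max_iter):
            w = Xk.T @ u
            w /= np.linalg.norm(w)
            t = Xk @ w
            q = Yk.T @ t
            q /= np.linalg.norm(q)
            u = Yk @ q
            if w_old is not None and np.linalg.norm(w - w_old) < tol:
                break
            w_old = w
        # Loadings from regressing the current residuals on the X score t.
        p = Xk.T @ t / (t.T @ t)
        c = Yk.T @ t / (t.T @ t)
        # Rank-one parts removed by this deflation step. Their share of the
        # initial sum of squares is the component's explained variance ratio.
        x_part = t @ p.T
        y_part = t @ c.T
        ratio_x[k] = np.sum(x_part**2) / total_x
        ratio_y[k] = np.sum(y_part**2) / total_y
        Xk = Xk - x_part
        Yk = Yk - y_part
    return ratio_x, ratio_y


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
Y_demo = X_demo @ rng.normal(size=(4, 2)) + 0.1 * rng.normal(size=(30, 2))
ratio_x_demo, ratio_y_demo = pls_regression_with_variance(X_demo, Y_demo, 4)
```

Because each deflation step is an orthogonal projection, the removed sums of squares telescope, so no second pass over the data is needed to obtain the ratios.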

To ensure correctness, the implementation was compared with literature benchmark values; additional details are provided in the Testing section.

I think the superclass implementation is the better design choice, but I’m happy to adopt the earlier proposal if the maintainers prefer that direction.

Testing

The following test actions have been done:

Added tests
The following test was added to sklearn/cross_decomposition/tests/test_pls.py:

test_pls_variance_ratio_X_y()

This test runs on all PLS models that inherit from _PLS (PLSRegression, PLSCanonical and CCA). A description of the test is provided below:

For PLSRegression, PLSCanonical and CCA:
- ✅ Assert that explained_variance_ratio_x_ and explained_variance_ratio_y_ each contain n_components entries.
- ✅ Assert that the cumulative explained variance in X approaches 1 when using the maximum number of components.
  (This holds for the synthetic test data; symmetric-deflation models may vary depending on the ranks of X and y.)
For PLSRegression:
- ✅ Assert that the variance explained by each component matches the literature values for the X and y matrices [1].
- ✅ Assert that the cumulative variance in the y matrix does not exceed 1 when the maximum number of components is used (due to asymmetric deflation).
For PLSCanonical and CCA, the cumulative variance explained in the y matrix is expected to sum to 1 when the maximum number of components is used.
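The difference between the two y expectations above can be illustrated with a single deflation step on a one-target problem. The sketch below assumes that regression mode deflates y with the X score t while canonical mode deflates y with the Y score u; `deflate_step` is a hypothetical helper, not a function from this PR.

```python
import numpy as np


def deflate_step(Xk, Yk, t, u, deflation_mode):
    """Deflate X and Y once; return residuals and the sum of squares removed.

    t and u are the X and Y score vectors, each of shape (n_samples, 1).
    Hypothetical helper for illustration only.
    """
    # X is always deflated with its own score t.
    p = Xk.T @ t / (t.T @ t)
    x_part = t @ p.T
    # Y is deflated with t in regression mode and with u in canonical mode.
    score = t if deflation_mode == "regression" else u
    q = Yk.T @ score / (score.T @ score)
    y_part = score @ q.T
    return Xk - x_part, Yk - y_part, np.sum(x_part**2), np.sum(y_part**2)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X -= X.mean(axis=0)
y = rng.normal(size=(20, 1))
y -= y.mean(axis=0)

w = X.T @ y
w /= np.linalg.norm(w)
t = X @ w      # X score for the first component
u = y.copy()   # with a single target, the Y score is y itself (up to sign)

total_y = np.sum(y**2)
_, _, _, ss_regression = deflate_step(X, y, t, u, "regression")
_, _, _, ss_canonical = deflate_step(X, y, t, u, "canonical")
ratio_y_regression = ss_regression / total_y
ratio_y_canonical = ss_canonical / total_y
```

In this setup the canonical step removes all of y, so its ratio reaches 1, while the regression step removes only the part of y that lies along t, so its ratio stays below 1.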

[1] Abdi, H. (2003). Partial Least Squares (PLS) Regression. In Lewis-Beck M., Bryman A., Futing T. (Eds.), Encyclopedia of Social Sciences Research Methods. Thousand Oaks (CA): Sage.

The following test suites were run:

✅ pytest sklearn/cross_decomposition/tests/test_pls.py [69/69 passing]
✅ pytest sklearn/tests/test_common.py -k PLSRegression -v [65/66 passing, 1 skipped]*
✅ pytest sklearn/tests/test_common.py -k PLSCanonical -v [65/66 passing, 1 skipped]*
✅ pytest sklearn/tests/test_common.py -k CCA -v [65/66 passing, 1 skipped]*

(Skipped tests are unrelated and also skipped on main.)

Documentation

Added docstrings

✅ The following docstrings were added to PLSRegression, PLSCanonical and CCA.

"""[PLSRegression] / [PLSCanonical] / [CCA]

[...]

explained_variance_ratio_x_ : ndarray of shape (n_components,)
	Percentage of variance explained by each of the selected components in `X`.

explained_variance_ratio_y_ : ndarray of shape (n_components,)
	Percentage of variance explained by each of the selected components in `y`.
"""

NOTE: other attributes include the release version at which the attribute became available (e.g., .. versionadded:: 1.0). I have not added version tags yet; I will include them once maintainers confirm the target release.

✅ Docstrings were added for the helper function _calculate_variance_xy()

"""Calculates the variance of the X and y matrices
The flags has_x_variance and has_y_variance are included
as guards to prevent crashes on constant data
"""

✅ A docstring was added to the new unit test function test_pls_variance_ratio_X_y().
A reference to the literature values is included in this docstring.

Building documentation
The documentation was built as indicated in the contributor documentation page. As suggested in the guide, the generated HTML files were inspected to verify that the documentation built successfully. This was done for all three models in scope, and an example is shown in the image below.

✅ Documentation for PLSRegression.
✅ Documentation for PLSCanonical.
✅ Documentation for CCA.

Examples

Right now, this is only exemplified in the documentation. Since it is a rather small addition, I am not sure which option would be best:

to include it in the examples page,
to include it as a docstring example,
or to leave it as it is, since the attributes are documented in the class docstrings.

Happy to add an example if desired 😄

Performance

The computation is integrated into the existing iterative deflation performed during fit, so the overhead is minimal. No additional matrix factorizations or extra passes over the data are introduced.

There is no impact on estimator instantiation time, or on .transform() or .predict().

github-actions · 2025-11-16T16:54:16Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 3651130. Link to the linter CI: here

paucablop added 6 commits November 14, 2025 19:56

feat: implement explained variance ratio for _pls

730dca0

fix: explained_variance_ratios are calculated as np.arrays

95ff5d5

test: add docstrings to test file

66cd98c

docs: add comment to matrix deflation procedure

c989bc2

docs: update tocumentation in _calculate_variance_xy

8b2735e

docs: fix typo

f62ede7

github-actions bot added the module:cross_decomposition label Nov 16, 2025

paucablop added 3 commits November 17, 2025 01:01

docs: fix incomplete docstring issue in helper class

f94ef99

docs: document change in changelog

68bf634

fix: remove double line braks in docs

3651130
