Conversation

@paucablop

Reference Issues/PRs

This PR targets issue #32675, which proposes adding the attributes explained_variance_ratio_x_ and explained_variance_ratio_y_ to the _PLS models.

Fixes #32675
Fixes #19896
Fixes #30470

What does this implement/fix? Explain your changes.

This PR implements explained_variance_ratio_x_ and explained_variance_ratio_y_ for the _PLS models, analogous to how PCA exposes explained_variance_ratio_. Not having access to these ratios in PLSRegression makes it difficult to:

  • Quantify how much variance each latent variable captures,
  • Compare models with different numbers of components, and
  • Produce diagnostic or variance-explained plots (as is standard in different disciplines where latent space interpretability is important).

In the current PR, I propose adding two new attributes, calculated during fitting (an illustrative usage sketch follows the list):

  • explained_variance_ratio_x_ : ndarray of shape (n_components,)
    Fraction of variance explained in X-space for each component.

  • explained_variance_ratio_y_ : ndarray of shape (n_components,)
    Fraction of variance explained in Y-space for each component.
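For illustration, this is how the attributes would be used after fitting; a minimal sketch assuming this PR is merged (the two ratio attributes are the ones proposed here and are not available in released scikit-learn):

```python
# Minimal usage sketch; explained_variance_ratio_x_/_y_ are the attributes
# proposed in this PR and are not available in released scikit-learn.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import load_linnerud

X, y = load_linnerud(return_X_y=True)
pls = PLSRegression(n_components=3).fit(X, y)

print(pls.explained_variance_ratio_x_)  # fraction of X variance per component
print(pls.explained_variance_ratio_y_)  # fraction of y variance per component

# Cumulative curve, e.g. for variance-explained / scree-style plots.
print(np.cumsum(pls.explained_variance_ratio_x_))
```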

In the initial issue (#32675) I suggested a solution scoped to the PLSRegression model, which computed the explained variance ratio for both matrices right after fitting the superclass.

However, as I was working on it, I realized that a more elegant and extensible solution is to calculate the variance ratio directly in the parent class during fit. This gives:

  • Extensibility: the suggested solution directly extends the functionality to all models inheriting from _PLS (PLSRegression, PLSCanonical and CCA) without additional per-subclass logic (e.g., handling deflation_mode canonical vs. regression, i.e., symmetric vs. asymmetric deflation), since that logic is already handled inside super().fit().

  • Consistency: all subclasses expose the same attributes without custom logic.

  • Performance: the suggested approach avoids redundantly deflating the matrices again after fitting; the explained variances are accumulated in each iteration of the fitting process (see the sketch below).
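To make the computed quantity concrete, the sketch below reconstructs the proposed ratios post hoc from a fitted model's scores and loadings: the variance explained by component k is the sum of squares of its rank-1 reconstruction t_k p_k^T (resp. t_k q_k^T for y), divided by the total sum of squares of the preprocessed data. This is only an illustration of the math, not the PR's code, and it assumes scale=False so that centering alone reproduces the internal preprocessing:

```python
# Post-hoc illustration of the quantities the PR accumulates inside the
# deflation loop of _PLS.fit(); not the PR's actual implementation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import load_linnerud

X, y = load_linnerud(return_X_y=True)
# scale=False so that centering alone matches the internal preprocessing.
pls = PLSRegression(n_components=3, scale=False).fit(X, y)

Xc = X - X.mean(axis=0)
Yc = y - y.mean(axis=0)
total_ss_x = np.sum(Xc**2)
total_ss_y = np.sum(Yc**2)

# Sum of squares captured by each rank-1 reconstruction t_k p_k^T / t_k q_k^T,
# using ||t p^T||_F^2 = ||t||^2 * ||p||^2.
ss_x = np.sum(pls.x_scores_**2, axis=0) * np.sum(pls.x_loadings_**2, axis=0)
ss_y = np.sum(pls.x_scores_**2, axis=0) * np.sum(pls.y_loadings_**2, axis=0)

print(ss_x / total_ss_x)  # should mirror the proposed explained_variance_ratio_x_
print(ss_y / total_ss_y)  # should mirror the proposed explained_variance_ratio_y_
```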

To ensure correctness, the implementation was compared with literature benchmark values; additional details are provided in the Testing section.

I think the superclass implementation is the better design choice, but I’m happy to adopt the earlier proposal if the maintainers prefer that direction.

Testing

The following testing was performed:

Added tests
The following test was added to sklearn/cross_decomposition/tests/test_pls.py:

  • test_pls_variance_ratio_X_y()

This test runs on all PLS models that inherit from _PLS (PLSRegression, PLSCanonical and CCA). A description of the test is provided below, followed by a condensed illustrative sketch of the checks:

  1. For PLSRegression, PLSCanonical and CCA:

    • ✅ Assert that explained_variance_ratio_x_ and explained_variance_ratio_y_ each contain as many items as the model's n_components.
    • ✅ Assert that the cumulative explained variance in X approaches 1 when using the maximum number of components.
      (This holds for the synthetic test data; symmetric-deflation models may vary depending on the ranks of X and y.)
  2. For PLSRegression:

    • ✅ Assert that the variance explained by each component matches the values reported in the literature for the X and y matrices [1].
    • ✅ Assert that the cumulative explained variance in the y matrix does not exceed 1 when the maximum number of components is used (due to asymmetric deflation).
  3. For PLSCanonical and CCA, it is expected that the cumulative explained variance in the y matrix sums to 1 when the maximum number of components is used.

[1] Abdi, H. (2003). Partial Least Squares (PLS) Regression. In M. Lewis-Beck, A. Bryman, & T. Futing (Eds.), Encyclopedia of Social Sciences Research Methods. Thousand Oaks, CA: Sage.
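For orientation, here is a condensed sketch of the shape and bound checks described above; the actual test_pls_variance_ratio_X_y() in the PR additionally compares against the literature values from [1], and the synthetic data below is purely illustrative:

```python
# Condensed, illustrative sketch of the shape and bound checks described above.
# The PR's test_pls_variance_ratio_X_y() additionally verifies literature
# values [1]; the attributes used here are the ones proposed in this PR.
import numpy as np
import pytest
from sklearn.cross_decomposition import CCA, PLSCanonical, PLSRegression


@pytest.mark.parametrize("Estimator", [PLSRegression, PLSCanonical, CCA])
def test_explained_variance_ratio_basic(Estimator):
    rng = np.random.RandomState(0)
    X = rng.normal(size=(50, 4))
    y = rng.normal(size=(50, 3))
    n_components = 3

    est = Estimator(n_components=n_components).fit(X, y)

    # One ratio per fitted component, for both X- and y-space.
    assert est.explained_variance_ratio_x_.shape == (n_components,)
    assert est.explained_variance_ratio_y_.shape == (n_components,)

    # Ratios are non-negative and the cumulative X ratio never exceeds 1.
    assert np.all(est.explained_variance_ratio_x_ >= 0)
    assert np.cumsum(est.explained_variance_ratio_x_)[-1] <= 1 + 1e-10
```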

Run the following test suite:

  • pytest sklearn/cross_decomposition/tests/test_pls.py [69/69 passing]
  • pytest sklearn/tests/test_common.py -k PLSRegression -v [65/66 passing, 1 skipped]*
  • pytest sklearn/tests/test_common.py -k PLSCanonical -v [65/66 passing, 1 skipped]*
  • pytest sklearn/tests/test_common.py -k CCA -v [65/66 passing, 1 skipped]*

(Skipped tests are unrelated and also skipped on main.)

Documentation

Added docstrings

  • ✅ The following docstrings were added to PLSRegression, PLSCanonical and CCA.
"""[PLSRegression] / [PLSCanonical] / [CCA]

[...]

explained_variance_ratio_x_ : ndarray of shape (n_components,)
	Percentage of variance explained by each of the selected components in `X`.

explained_variance_ratio_y_ : ndarray of shape (n_components,)
	Percentage of variance explained by each of the selected components in `y`.
"""

NOTE: other attributes include the release version in which they became available (e.g., .. versionadded:: 1.0). I have not added version tags yet; I will include them once maintainers confirm the target release.

  • ✅ Docstrings were added for the helper function _calculate_variance_xy() (a hypothetical sketch of such a helper is shown after this list).
"""Calculates the variance of the X and y matrices
The flags has_x_variance and has_y_variance are included
as guards to prevent crashes on constant data
"""
  • ✅ Docstrings were added to the new unit test function test_pls_variance_ratio_X_y()
    A reference to the literature values [1] is included in that docstring.
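For context, below is a hypothetical sketch of what such a helper could look like. The name _calculate_variance_xy comes from the PR, but its exact signature, return values and call site are assumptions made here; only the constant-data guard described in the quoted docstring is illustrated:

```python
# Hypothetical sketch: signature and return values are assumptions, not the
# PR's actual helper. It illustrates the has_x_variance / has_y_variance
# guards that prevent dividing by a zero total variance on constant data.
import numpy as np


def _calculate_variance_xy(Xc, Yc):
    """Return total sum-of-squares variance of centered X and Y with guards."""
    total_variance_x = np.sum(Xc**2)
    total_variance_y = np.sum(Yc**2)
    has_x_variance = total_variance_x > 0  # False when X is entirely constant
    has_y_variance = total_variance_y > 0  # False when y is entirely constant
    return total_variance_x, total_variance_y, has_x_variance, has_y_variance
```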

Building documentation
The documentation was built as indicated in the contributor documentation page. As suggested in the guide, the generated HTML files were inspected to verify that the documentation built successfully. This was done for all three models in scope, and an example is shown in the image below.

  • ✅ Documentation for PLSRegression.
  • ✅ Documentation for PLSCanonical.
  • ✅ Documentation for CCA.
PLSRegression Docs

Examples

Right now, this is only exemplified in the documentation. Since it is a rather small addition, I am not sure what would be best:

  • to be included in the examples page,
  • to be included as a docstring example,
  • or to leave it as it is, since the attributes are documented in the docstrings.

Happy to add an example if desired 😄

Performance

The computation is integrated into the existing iterative deflation performed during fit, so the overhead is minimal. No additional matrix factorizations or extra passes over the data are introduced.

There is no impact on estimator instantiation time, or on .transform() or .predict().

github-actions bot commented Nov 16, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 3651130. Link to the linter CI: here
