ENH allow shrunk_covariance to handle multiple matrices at once #25275

qbarthelemy · 2023-01-02T20:55:35Z

Reference Issues/PRs

None.

What does this implement/fix? Explain your changes.

The current version of shrunk_covariance processes only one covariance matrix at a time.

This PR aims to handle n-dimensional arrays with ndim >= 2, i.e. stacks of covariance matrices.

Any other comments?

Example with a 3D array.

import timeit
import numpy as np
from sklearn.utils import check_array
from sklearn.covariance import empirical_covariance, shrunk_covariance


def shrunk_covariance_old(emp_covs, shrinkage=0.1):
    return np.array([shrunk_covariance(e, shrinkage) for e in emp_covs])

def shrunk_covariance_new(emp_cov, shrinkage=0.1):
    emp_cov = check_array(emp_cov, allow_nd=True)
    n_features = emp_cov.shape[-1]

    shrunk_cov = (1.0 - shrinkage) * emp_cov
    mu = np.trace(emp_cov, axis1=-2, axis2=-1) / n_features
    while mu.ndim != emp_cov.ndim:
        mu = mu[..., np.newaxis]
    shrunk_cov += shrinkage * mu * np.eye(n_features)
    return shrunk_cov


def compute_time_3D(func, n_reps=10, n_matrices=200, n_samples=1000, n_features=50):
    rs = np.random.RandomState(42)
    times = np.zeros((n_reps))
    for r in range(n_reps):
        X = rs.randn(n_matrices, n_samples, n_features)
        C = np.array([empirical_covariance(x) for x in X])
        t0 = timeit.default_timer()
        func(C)
        times[r] = timeit.default_timer() - t0
    print('Compute time = {0:.4f} +/- {1:.4f}'.format(times.mean(), times.std()))

compute_time_3D(shrunk_covariance_old)
compute_time_3D(shrunk_covariance_new)

Compute time = 0.0105 +/- 0.0004
Compute time = 0.0032 +/- 0.0001
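The benchmark above compares speed only. As a complement, here is a minimal NumPy-only sketch that checks the batched formula gives the same result as a per-matrix loop. The names shrink_one and shrink_batch are hypothetical helpers written for this illustration, not scikit-learn functions, and np.cov stands in for empirical_covariance.

```python
import numpy as np


def shrink_one(emp_cov, shrinkage=0.1):
    """Shrink a single (n_features, n_features) matrix (per-matrix reference)."""
    n_features = emp_cov.shape[0]
    mu = np.trace(emp_cov) / n_features
    shrunk = (1.0 - shrinkage) * emp_cov
    # Add shrinkage * mu to the diagonal entries only.
    shrunk.flat[:: n_features + 1] += shrinkage * mu
    return shrunk


def shrink_batch(emp_covs, shrinkage=0.1):
    """Shrink a stack of matrices of shape (..., n_features, n_features)."""
    n_features = emp_covs.shape[-1]
    mu = np.trace(emp_covs, axis1=-2, axis2=-1) / n_features
    # Append two trailing axes so that mu broadcasts against each matrix.
    mu = mu[..., np.newaxis, np.newaxis]
    return (1.0 - shrinkage) * emp_covs + shrinkage * mu * np.eye(n_features)


rng = np.random.RandomState(0)
X = rng.randn(5, 100, 4)  # 5 data sets, 100 samples, 4 features
covs = np.array([np.cov(x, rowvar=False) for x in X])

looped = np.array([shrink_one(c) for c in covs])
batched = shrink_batch(covs)
np.testing.assert_allclose(batched, looped)
```

If the assertion passes silently, both code paths agree up to floating-point tolerance on this random input.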

qbarthelemy · 2023-01-11T18:32:40Z

Thx @agramfort if you can help the process!

agramfort

@qbarthelemy can you add a changelog entry?

here is an example entry, just for a bug fix

can you see why CIs are red?

thx @qbarthelemy

sklearn/covariance/_shrunk_covariance.py

qbarthelemy · 2023-01-13T17:01:15Z

Thx for your review @agramfort ! Everything seems OK.

agramfort · 2023-01-13T21:16:21Z

Can you see why CIs fail?

qbarthelemy · 2023-01-14T10:56:55Z

Errors do not seem to be related to my modifications.

/bin/bash: ./build_tools/circle/build_test_arm.sh: No such file or directory

Failed to fetch http://azure.archive.ubuntu.com/ubuntu/pool/universe/m/matplotlib/python-matplotlib-data_3.1.2-1ubuntu4_all.deb Connection failed [IP: 40.119.46.219 80]

/home/vsts/work/1/s/build_tools/azure/test_docs.sh: line 8: testvenv/bin/activate: No such file or directory

glemaitre

Instead of adding this validation, we should use the parameter validation framework: #24862

Before we can do that, we will need a small refactoring so that the function calls the class, as we did for other estimators/functions. I will fix that in another PR and we can build upon it here.

sklearn/covariance/_shrunk_covariance.py

doc/whats_new/v1.3.rst

glemaitre · 2023-01-14T16:00:58Z

sklearn/covariance/_shrunk_covariance.py


    Read more in the :ref:`User Guide <shrunk_covariance>`.

    Parameters
    ----------
-    emp_cov : array-like of shape (n_features, n_features)
-        Covariance matrix to be shrunk.
+    emp_cov : array-like of shape (..., n_features, n_features)


Suggested change

-    emp_cov : array-like of shape (..., n_features, n_features)
+    emp_cov : array-like of shape (n_features, n_features) or \
+            (n_matrices, n_features, n_features)

The suggested docstring seems to restrict usage to 2D and 3D arrays, whereas the function is now able to process any n-D array.

What is the use case of n-D shrinkage? Since the covariance matrix is always a 2-D matrix, I don't really see when you will pass a 4-D array, for instance.

Most NumPy and SciPy functions use (..., n, n) to indicate that they can process stacked n-D arrays,
for example numpy.linalg.eig and scipy.linalg.expm.

Use cases belong to users. But, as an example with a 4D array of shape (k, m, n, n), one might want to shrink k sets, each containing m covariance matrices. I have tested it, and the code works.

For me, the description should not restrict actual usage. But, as you wish.
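To make the 4D case concrete, here is a small sketch with an input of shape (k, m, n, n). It assumes the broadcasting formula discussed in this PR; shrink_nd is a hypothetical stand-in written for this illustration, not the merged scikit-learn code.

```python
import numpy as np


def shrink_nd(emp_cov, shrinkage=0.1):
    """Shrink every trailing (n, n) matrix of an array of shape (..., n, n)."""
    emp_cov = np.asarray(emp_cov, dtype=float)
    n_features = emp_cov.shape[-1]
    mu = np.trace(emp_cov, axis1=-2, axis2=-1) / n_features
    mu = mu[..., np.newaxis, np.newaxis]  # shape (..., 1, 1) for broadcasting
    return (1.0 - shrinkage) * emp_cov + shrinkage * mu * np.eye(n_features)


rng = np.random.RandomState(0)
k, m, n = 2, 3, 4
A = rng.randn(k, m, n, n)
covs = A @ np.swapaxes(A, -1, -2)  # k sets of m symmetric PSD matrices

shrunk = shrink_nd(covs)

# Each matrix in the stack matches what shrinking it alone would give.
np.testing.assert_allclose(shrunk[1, 2], shrink_nd(covs[1, 2]))

# This form of shrinkage leaves the trace of every matrix unchanged.
np.testing.assert_allclose(
    np.trace(shrunk, axis1=-2, axis2=-1), np.trace(covs, axis1=-2, axis2=-1)
)
```

The leading axes are never inspected by the function, which is why the same code covers 2D, 3D and 4D inputs.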

I was thinking more of purposely restricting the usage to what we have in scikit-learn.
But we can go down this road and see what other reviewers think.

sklearn/covariance/_shrunk_covariance.py

glemaitre · 2023-01-17T10:42:53Z

sklearn/covariance/_shrunk_covariance.py

-    shrunk_cov.flat[:: n_features + 1] += shrinkage * mu
+    mu = np.trace(emp_cov, axis1=-2, axis2=-1) / n_features
+    mu = np.expand_dims(mu, axis=tuple(range(mu.ndim, emp_cov.ndim)))
+    shrunk_cov += shrinkage * mu * np.eye(n_features)


Here, we materialize the identity matrix. If we restrict the input to a 3-D array, we can efficiently do the in-place addition by flattening, as was done previously.
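One possible reading of this suggestion, sketched for the 3-D case only: view each matrix as a flat row and add to every (n_features + 1)-th entry, which addresses the diagonal without building an identity matrix. This is an illustrative assumption about what the in-place variant could look like, not code from the PR; shrink_3d_inplace_diag is a hypothetical name.

```python
import numpy as np


def shrink_3d_inplace_diag(emp_covs, shrinkage=0.1):
    """Shrink a (n_matrices, n_features, n_features) stack without np.eye."""
    # A C-contiguous copy guarantees that the reshape below returns a view.
    shrunk = np.array(emp_covs, dtype=float, order="C")
    n_matrices, n_features, _ = shrunk.shape
    mu = np.trace(shrunk, axis1=1, axis2=2) / n_features
    shrunk *= 1.0 - shrinkage
    # Each row of `flat` is one flattened matrix; a step of n_features + 1
    # visits exactly the diagonal entries.
    flat = shrunk.reshape(n_matrices, -1)
    flat[:, :: n_features + 1] += shrinkage * mu[:, np.newaxis]
    return shrunk


rng = np.random.RandomState(0)
covs = rng.randn(6, 5, 5)
covs = covs @ np.swapaxes(covs, 1, 2)

# Reference: the variant that materializes the identity matrix.
mu = np.trace(covs, axis1=1, axis2=2)[:, np.newaxis, np.newaxis] / 5
expected = 0.9 * covs + 0.1 * mu * np.eye(5)

np.testing.assert_allclose(shrink_3d_inplace_diag(covs), expected)
```

The trade-off is the one raised in the comment: this avoids the temporary identity matrix but fixes the number of dimensions to three.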

sklearn/covariance/tests/test_covariance.py

glemaitre · 2023-01-19T09:52:39Z

doc/whats_new/v1.3.rst

+.........................
+
+- |Enhancement| Allow :func:`covariance.shrunk_covariance` to process
+  multiple covariance matrices at once.


Then we need to mention somehow that we handle nd-array.

glemaitre · 2023-01-19T09:55:34Z

sklearn/covariance/_shrunk_covariance.py


    Read more in the :ref:`User Guide <shrunk_covariance>`.

    Parameters
    ----------
-    emp_cov : array-like of shape (n_features, n_features)
-        Covariance matrix to be shrunk.
+    emp_cov : array-like of shape (..., n_features, n_features)


I was thinking more of purposely restricting the usage to what we have in scikit-learn.
But we can go down this road and see what other reviewers think.

sklearn/covariance/tests/test_covariance.py

qbarthelemy · 2023-01-19T12:45:31Z

Tests fail because of np.expand_dims.
It seems that the NumPy version used in the tests does not accept a tuple for the axis argument (tuple support was added in NumPy 1.18.0).

  File "/home/vsts/work/1/s/sklearn/covariance/_shrunk_covariance.py", line 114, in shrunk_covariance
    mu = np.expand_dims(mu, axis=tuple(range(mu.ndim, emp_cov.ndim)))
  File "<__array_function__ internals>", line 5, in expand_dims
  File "/usr/lib/python3/dist-packages/numpy/lib/shape_base.py", line 577, in expand_dims
    if axis > a.ndim or axis < -a.ndim - 1:
TypeError: '>' not supported between instances of 'tuple' and 'int'
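For reference, the same trailing axes can be added without passing a tuple to np.expand_dims, which sidesteps the version constraint. The two options below are a sketch of possible alternatives, not what was merged.

```python
import numpy as np

emp_cov = np.ones((2, 3, 4, 4))  # stack of shape (k, m, n, n)
mu = np.trace(emp_cov, axis1=-2, axis2=-1) / emp_cov.shape[-1]  # shape (k, m)

# Option 1: index with np.newaxis; np.trace always removes exactly two axes.
mu_a = mu[..., np.newaxis, np.newaxis]

# Option 2: reshape with explicit trailing singleton axes.
mu_b = mu.reshape(mu.shape + (1,) * (emp_cov.ndim - mu.ndim))

print(mu_a.shape, mu_b.shape)
```

Both results have shape (k, m, 1, 1) and broadcast against emp_cov in the same way as the tuple-axis call.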

glemaitre · 2023-01-19T13:37:48Z

Indeed, tuple support was introduced in NumPy 1.18, and our minimum dependency is set to 1.17.3.
But I think that we should be fine here. NEP29 tells us that we will drop Python 3.8 support, and oldest-supported-numpy (https://github.com/scipy/oldest-supported-numpy/blob/main/setup.cfg) tells us that we should bump to 1.19.3.

So for scikit-learn 1.3, we will be able to go with the current implementation. We might need to delay merging the feature depending on the bugfix release, but I will put this feature in the milestone.

jeremiedbb · 2023-06-07T16:08:06Z

Actually we're not bumping the numpy min version for 1.3. Let's target this for 1.4 and we'll merge this when we bump the min version.

github-actions · 2023-08-29T15:10:13Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4d88273. Link to the linter CI: here

Co-authored-by: Alexandre Gramfort <[email protected]>

Co-authored-by: Guillaume Lemaitre <[email protected]>

glemaitre · 2023-12-07T16:37:53Z

I opened #27910 to bump the version of NumPy. Once it is merged, this PR should be mergeable.

glemaitre · 2023-12-13T13:56:16Z

Merging since everything is green. Thanks @qbarthelemy

…it-learn#25275) Co-authored-by: Alexandre Gramfort <[email protected]> Co-authored-by: Guillaume Lemaitre <[email protected]> Co-authored-by: Jérémie du Boisberranger <[email protected]>

github-actions bot added the module:covariance label Jan 2, 2023

agramfort reviewed Jan 11, 2023

sklearn/covariance/_shrunk_covariance.py (outdated, resolved)

agramfort approved these changes Jan 13, 2023

glemaitre reviewed Jan 14, 2023

sklearn/covariance/_shrunk_covariance.py (outdated, resolved)

glemaitre requested changes Jan 14, 2023

glemaitre changed the title from "ENH : covariance : parameter checking and ndarray processing for shrunk_covariance" to "ENH allow shrunk_covariance to handle multiple matrices at once" Jan 14, 2023

glemaitre self-requested a review January 17, 2023 10:07

glemaitre reviewed Jan 17, 2023

sklearn/covariance/tests/test_covariance.py (outdated, resolved)

sklearn/covariance/tests/test_covariance.py (outdated, resolved)

glemaitre reviewed Jan 19, 2023

glemaitre added this to the 1.3 milestone Jan 19, 2023

qbarthelemy mentioned this pull request Jan 24, 2023

replace while loop by expand_dims pyRiemann/pyRiemann#221

Merged

jeremiedbb modified the milestones: 1.3, 1.4 Jun 7, 2023

qbarthelemy force-pushed the covariance_shrunk branch from 50e7b8f to 38443e7 Compare August 29, 2023 15:08

qbarthelemy and others added 6 commits September 22, 2023 09:03

update shrunk_covariance

a1e27d1

complete tests

3ff8c2e

correct n_features

09985c1

Co-authored-by: Alexandre Gramfort <[email protected]>

complete changelog

9f2c51f

change quote

44782e8

allow nd in check array

cd75f5a

qbarthelemy and others added 5 commits September 22, 2023 09:03

Apply suggestions from code review

09f8f8a

Co-authored-by: Guillaume Lemaitre <[email protected]>

correct log and correct expand_dims

ab8aacb

split tests and parametrize

22beec4

use assert_allclose

cf1a82c

Co-authored-by: Guillaume Lemaitre <[email protected]>

update whats new

de0ba82

qbarthelemy force-pushed the covariance_shrunk branch from c9ad783 to de0ba82 Compare September 22, 2023 07:05

correct whats new

61e1d7e

glemaitre self-requested a review December 7, 2023 15:51

Merge remote-tracking branch 'origin/main' into pr/qbarthelemy/25275

c7a3dbd

glemaitre mentioned this pull request Dec 7, 2023

MAINT bumpversion Python and dependencies #27910

Merged

Merge branch 'main' into covariance_shrunk

4d88273

glemaitre approved these changes Dec 13, 2023


glemaitre merged commit 2d4197d into scikit-learn:main Dec 13, 2023

qbarthelemy deleted the covariance_shrunk branch December 13, 2023 13:59
