Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@skailasa
Copy link
Contributor

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR adds the parameter to specify the Lapack driver for the internal SVD taken over the matrix B, which is the projection of the matrix being compressed in a low-dimensional subspace (see Halko et al).

This is useful in numerical applications where the defualt 'gesdd' does not offer enough precision.

Any other comments?

This function should also support double precision. However, this should be the subject of a further PR.

@skailasa skailasa changed the title Add lapack driver ENH: Add lapack driver argument to randomized svd function Jul 27, 2021
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

You need to add an additional test to check that we can indeed pass the new parameter as expected.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Could you please add a small unit test for this option?

@skailasa
Copy link
Contributor Author

skailasa commented Jul 28, 2021

I added the test however left the quotation marks as single rather than double around variable names as suggested above in order to match the rest of this function/module.

@glemaitre
Copy link
Member

quotation marks as single rather than double around variable names

I would expect black to fail then since the standard is double-quoted.
FYI: https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=30899&view=logs&jobId=32e2e1bb-a28f-5b18-6cfc-3f01273f5609&j=32e2e1bb-a28f-5b18-6cfc-3f01273f5609&t=fc67071d-c3d4-58b8-d38e-cafc0d3c731a

You just probably want to apply black on the two files and that's it.

@glemaitre
Copy link
Member

@skailasa would you be able to complete your PR?

@skailasa
Copy link
Contributor Author

skailasa commented Sep 7, 2021

@skailasa would you be able to complete your PR?

Hi, yes sorry about dropping the ball on this. Been held up elsewhere/been on my holidays. WIll complete this today!

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed nice enhancement.

Comment on lines 337 to 340
Whether to use the more efficient divide-and-conquer approach
(`"gesdd"`) or more general rectangular approach (`"gesvd"`) to compute
the SVD of the matrix B, which is the projection of M into a low
dimensional subspace, as described by Halko et. al.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as described by Halko et. al.

Can you please add a reference to the mention article?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a reference to the function's docstring, as it's mentioned multiple times during the parameter documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I just checked and a reference is already in the function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. In that case, can you adapt the doc-string to add a Sphinx cross-reference similarly to this?

References
----------
.. [1] Ankerst, Mihael, Markus M. Breunig, Hans-Peter Kriegel,
and Jörg Sander. "OPTICS: ordering points to identify the clustering
structure." ACM SIGMOD Record 28, no. 2 (1999): 49-60.
.. [2] Schubert, Erich, Michael Gertz.
"Improving the Cluster Structure Extracted from OPTICS Plots." Proc. of
the Conference "Lernen, Wissen, Daten, Analysen" (LWDA) (2018): 318-329.

Comment on lines 570 to 581
def test_randomized_svd_lapack_driver():
# Check that different SVD drivers provide consistent results

# Matrix being compressed
n = 123
m = 456
k = 10
rng = np.random.RandomState(0)
X = rng.rand(n, m)

# Number of components
k = 10
Copy link
Member

@jjerphan jjerphan Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about parametrizing this test on a few combinations of n, m, k and on the random seed?

For instance:

Suggested change
def test_randomized_svd_lapack_driver():
# Check that different SVD drivers provide consistent results
# Matrix being compressed
n = 123
m = 456
k = 10
rng = np.random.RandomState(0)
X = rng.rand(n, m)
# Number of components
k = 10
@pytest.mark.parametrize("n", [50, 100, 300])
@pytest.mark.parametrize("m", [50, 100, 300])
@pytest.mark.parametrize("k", [10, 20, 50])
@pytest.mark.parametrize("seed", range(5))
def test_randomized_svd_lapack_driver(n, m, k, seed):
# Check that different SVD drivers provide consistent results
# Matrix being compressed
rng = np.random.RandomState(seed)
X = rng.rand(n, m)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@jjerphan jjerphan Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adapted my suggestion which was shadowing the parametrized value.

(It looks like you have not committed all the changes or pushed them.)

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, after removing the last shadowing.

X = rng.rand(n, m)

# Number of components
k = 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this originates from the mistake I made when suggesting and this has to be removed.

Suggested change
k = 10

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @skailasa !

Comment on lines +340 to +341
dimensional subspace, as described in [1]_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dimensional subspace, as described in [1]_.
dimensional subspace, as described in [1]_.
.. versionadded:: 1.1

these functions were not documented and part from the public API.
:pr:`20521` by :user:`Olivier Grisel <ogrisel>`.

- |Enhancement| :func:`utils.extmath.randomized_svd` now accepts an argument
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to move into 1.1

the value of `random_state` explicitly to suppress the deprecation
warning.
lapack_svd_driver : {"gesdd", "gesvd"}, default="gesdd"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Do you think svd_lapack_driver is better here? It keeps the term "lapack_driver" together, which makes it a bit easier to search for when looking for "lapack_driver" in the scipy docs.

@cmarmo
Copy link
Contributor

cmarmo commented May 10, 2022

@skailasa you already have two approvals. Do you mind fixing conflicts and the last comment? Then this can probably be merged. Thanks!

@glemaitre glemaitre self-assigned this May 17, 2022
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just applied the suggestions of @thomasjpfan.
I will be merging when the CI turns green.

@glemaitre glemaitre merged commit 0c618b2 into scikit-learn:main May 17, 2022
@glemaitre
Copy link
Member

Thanks @skailasa

lesteve pushed a commit to lesteve/scikit-learn that referenced this pull request May 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants