WIP Enabling different array types (CuPy) in PCA with NEP 37 #16574
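For context, a hypothetical end-to-end usage of what this PR works toward: a CuPy array goes into PCA, the computation dispatches through the array's own numpy-like namespace (the NEP 37 idea), and a CuPy array comes out. Only the enable_duck_array flag and the full SVD path appear in the diff below; everything else in this snippet is an assumption about the intended user-facing behaviour, not code from the PR.

import cupy as cp

import sklearn
from sklearn.decomposition import PCA

# GPU-resident input; float32 mirrors the JAX test added in this PR
X_gpu = cp.random.RandomState(0).standard_normal((1000, 100)).astype(cp.float32)

with sklearn.config_context(enable_duck_array=True):
    pca = PCA(n_components=3, svd_solver="full")
    X_trans = pca.fit_transform(X_gpu)  # expected to stay a cupy.ndarray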

Closed

Changes from all commits
38 commits
3882723
MNT Adds labeler
thomasjpfan Feb 21, 2020
d79390b
BUG Fix
thomasjpfan Feb 21, 2020
d71ecae
Double quotes are better
thomasjpfan Feb 21, 2020
919a519
BUG Fix
thomasjpfan Feb 21, 2020
a89754c
BUG Fix
thomasjpfan Feb 21, 2020
0f56d96
MNT Adds build ci tag
thomasjpfan Feb 21, 2020
e4ee673
MNT Use fork for new feature
thomasjpfan Feb 22, 2020
409be8d
Merge branch 'only_change_setup'
thomasjpfan Feb 22, 2020
3e729e4
MNT Uses tagged version
thomasjpfan Feb 22, 2020
faef88c
Merge remote-tracking branch 'upstream/master'
thomasjpfan Feb 26, 2020
60c3834
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
e7bfda8
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
8cdfd2d
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
a8dc598
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
adb07db
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
df8ca82
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
ea1c0fb
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
4fe65fe
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
5ca03e0
Merge remote-tracking branch 'upstream/master' into pca_array_functio…
thomasjpfan Feb 27, 2020
fe6293d
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
8c3001f
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
33bbd52
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
19b9d2a
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
a988b8a
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
a33bde0
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
d50fbbf
WIP Testing nep37 [skip ci]
thomasjpfan Feb 27, 2020
817c4f7
BUG Fix
thomasjpfan Feb 28, 2020
2718bb7
Merge remote-tracking branch 'upstream/master' into pca_array_functio…
thomasjpfan Apr 20, 2020
d14166f
WIP Update
thomasjpfan Apr 20, 2020
e2b74a5
WIP Enables support for JAx
thomasjpfan Apr 21, 2020
100b0f9
WIP [ci skip]
thomasjpfan Apr 21, 2020
7578d90
ENH adds support for minmaxScaler
thomasjpfan Apr 21, 2020
fc5d9e8
Fix extra n_features + better error message
ogrisel Jun 22, 2020
a24515c
Merge master + pass npx to linalg.svd
ogrisel Jun 22, 2020
f33e325
Add missing docstring to make the tests pass
ogrisel Jun 22, 2020
a96afc5
Add a test for jax compat
ogrisel Jun 22, 2020
adf4692
Let's focus on svd_solver='full' for now
ogrisel Jun 22, 2020
c1c2f20
CI Be nicer to the ci
thomasjpfan Jun 22, 2020
206 changes: 103 additions & 103 deletions azure-pipelines.yml
@@ -67,23 +67,23 @@ jobs:
SKLEARN_SKIP_NETWORK_TESTS: '0'

# Will run all the time regardless of linting outcome.
- template: build_tools/azure/posix.yml
parameters:
name: Linux_Runs
vmImage: ubuntu-18.04
matrix:
pylatest_conda_mkl:
DISTRIB: 'conda'
PYTHON_VERSION: '*'
BLAS: 'mkl'
NUMPY_VERSION: '*'
SCIPY_VERSION: '*'
CYTHON_VERSION: '*'
PILLOW_VERSION: '*'
PYTEST_VERSION: '*'
JOBLIB_VERSION: '*'
THREADPOOLCTL_VERSION: '2.0.0'
COVERAGE: 'true'
# - template: build_tools/azure/posix.yml
# parameters:
# name: Linux_Runs
# vmImage: ubuntu-18.04
# matrix:
# pylatest_conda_mkl:
# DISTRIB: 'conda'
# PYTHON_VERSION: '*'
# BLAS: 'mkl'
# NUMPY_VERSION: '*'
# SCIPY_VERSION: '*'
# CYTHON_VERSION: '*'
# PILLOW_VERSION: '*'
# PYTEST_VERSION: '*'
# JOBLIB_VERSION: '*'
# THREADPOOLCTL_VERSION: '2.0.0'
# COVERAGE: 'true'

- template: build_tools/azure/posix.yml
parameters:
@@ -95,31 +95,31 @@
# Linux environment to test that scikit-learn can be built against
# versions of numpy, scipy with ATLAS that comes with Ubuntu Bionic 18.04
# i.e. numpy 1.13.3 and scipy 0.19
py36_ubuntu_atlas:
DISTRIB: 'ubuntu'
PYTHON_VERSION: '3.6'
JOBLIB_VERSION: '0.11'
PYTEST_XDIST: 'false'
THREADPOOLCTL_VERSION: '2.0.0'
# Linux + Python 3.6 build with OpenBLAS and without SITE_JOBLIB
py36_conda_openblas:
DISTRIB: 'conda'
PYTHON_VERSION: '3.6'
BLAS: 'openblas'
NUMPY_VERSION: '1.13.3'
SCIPY_VERSION: '0.19.1'
PANDAS_VERSION: '*'
CYTHON_VERSION: '*'
# temporary pin pytest due to unknown failure with pytest 5.3
PYTEST_VERSION: '5.2'
PILLOW_VERSION: '4.2.1'
MATPLOTLIB_VERSION: '2.1.1'
SCIKIT_IMAGE_VERSION: '*'
# latest version of joblib available in conda for Python 3.6
JOBLIB_VERSION: '0.13.2'
THREADPOOLCTL_VERSION: '2.0.0'
PYTEST_XDIST: 'false'
COVERAGE: 'true'
# py36_ubuntu_atlas:
# DISTRIB: 'ubuntu'
# PYTHON_VERSION: '3.6'
# JOBLIB_VERSION: '0.11'
# PYTEST_XDIST: 'false'
# THREADPOOLCTL_VERSION: '2.0.0'
# # Linux + Python 3.6 build with OpenBLAS and without SITE_JOBLIB
# py36_conda_openblas:
# DISTRIB: 'conda'
# PYTHON_VERSION: '3.6'
# BLAS: 'openblas'
# NUMPY_VERSION: '1.13.3'
# SCIPY_VERSION: '0.19.1'
# PANDAS_VERSION: '*'
# CYTHON_VERSION: '*'
# # temporary pin pytest due to unknown failure with pytest 5.3
# PYTEST_VERSION: '5.2'
# PILLOW_VERSION: '4.2.1'
# MATPLOTLIB_VERSION: '2.1.1'
# SCIKIT_IMAGE_VERSION: '*'
# # latest version of joblib available in conda for Python 3.6
# JOBLIB_VERSION: '0.13.2'
# THREADPOOLCTL_VERSION: '2.0.0'
# PYTEST_XDIST: 'false'
# COVERAGE: 'true'
# Linux environment to test the latest available dependencies and MKL.
# It runs tests requiring lightgbm, pandas and PyAMG.
pylatest_pip_openblas_pandas:
@@ -131,66 +131,66 @@
TEST_DOCSTRINGS: 'true'
CHECK_WARNINGS: 'true'

- template: build_tools/azure/posix-32.yml
parameters:
name: Linux32
vmImage: ubuntu-18.04
dependsOn: [linting]
condition: and(ne(variables['Build.Reason'], 'Schedule'), succeeded('linting'))
matrix:
py36_ubuntu_atlas_32bit:
DISTRIB: 'ubuntu-32'
PYTHON_VERSION: '3.6'
JOBLIB_VERSION: '0.13'
THREADPOOLCTL_VERSION: '2.0.0'
# - template: build_tools/azure/posix-32.yml
# parameters:
# name: Linux32
# vmImage: ubuntu-18.04
# dependsOn: [linting]
# condition: and(ne(variables['Build.Reason'], 'Schedule'), succeeded('linting'))
# matrix:
# py36_ubuntu_atlas_32bit:
# DISTRIB: 'ubuntu-32'
# PYTHON_VERSION: '3.6'
# JOBLIB_VERSION: '0.13'
# THREADPOOLCTL_VERSION: '2.0.0'

- template: build_tools/azure/posix.yml
parameters:
name: macOS
vmImage: macOS-10.14
dependsOn: [linting]
condition: and(ne(variables['Build.Reason'], 'Schedule'), succeeded('linting'))
matrix:
pylatest_conda_mkl:
DISTRIB: 'conda'
PYTHON_VERSION: '*'
BLAS: 'mkl'
NUMPY_VERSION: '*'
SCIPY_VERSION: '*'
CYTHON_VERSION: '*'
PILLOW_VERSION: '*'
PYTEST_VERSION: '*'
JOBLIB_VERSION: '*'
THREADPOOLCTL_VERSION: '2.0.0'
COVERAGE: 'true'
pylatest_conda_mkl_no_openmp:
DISTRIB: 'conda'
PYTHON_VERSION: '*'
BLAS: 'mkl'
NUMPY_VERSION: '*'
SCIPY_VERSION: '*'
CYTHON_VERSION: '*'
PILLOW_VERSION: '*'
PYTEST_VERSION: '*'
JOBLIB_VERSION: '*'
THREADPOOLCTL_VERSION: '2.0.0'
COVERAGE: 'true'
SKLEARN_TEST_NO_OPENMP: 'true'
SKLEARN_SKIP_OPENMP_TEST: 'true'
# - template: build_tools/azure/posix.yml
# parameters:
# name: macOS
# vmImage: macOS-10.14
# dependsOn: [linting]
# condition: and(ne(variables['Build.Reason'], 'Schedule'), succeeded('linting'))
# matrix:
# pylatest_conda_mkl:
# DISTRIB: 'conda'
# PYTHON_VERSION: '*'
# BLAS: 'mkl'
# NUMPY_VERSION: '*'
# SCIPY_VERSION: '*'
# CYTHON_VERSION: '*'
# PILLOW_VERSION: '*'
# PYTEST_VERSION: '*'
# JOBLIB_VERSION: '*'
# THREADPOOLCTL_VERSION: '2.0.0'
# COVERAGE: 'true'
# pylatest_conda_mkl_no_openmp:
# DISTRIB: 'conda'
# PYTHON_VERSION: '*'
# BLAS: 'mkl'
# NUMPY_VERSION: '*'
# SCIPY_VERSION: '*'
# CYTHON_VERSION: '*'
# PILLOW_VERSION: '*'
# PYTEST_VERSION: '*'
# JOBLIB_VERSION: '*'
# THREADPOOLCTL_VERSION: '2.0.0'
# COVERAGE: 'true'
# SKLEARN_TEST_NO_OPENMP: 'true'
# SKLEARN_SKIP_OPENMP_TEST: 'true'

- template: build_tools/azure/windows.yml
parameters:
name: Windows
vmImage: vs2017-win2016
dependsOn: [linting]
condition: and(ne(variables['Build.Reason'], 'Schedule'), succeeded('linting'))
matrix:
py37_conda_mkl:
PYTHON_VERSION: '3.7'
CHECK_WARNINGS: 'true'
PYTHON_ARCH: '64'
PYTEST_VERSION: '*'
COVERAGE: 'true'
py36_pip_openblas_32bit:
PYTHON_VERSION: '3.6'
PYTHON_ARCH: '32'
# - template: build_tools/azure/windows.yml
# parameters:
# name: Windows
# vmImage: vs2017-win2016
# dependsOn: [linting]
# condition: and(ne(variables['Build.Reason'], 'Schedule'), succeeded('linting'))
# matrix:
# py37_conda_mkl:
# PYTHON_VERSION: '3.7'
# CHECK_WARNINGS: 'true'
# PYTHON_ARCH: '64'
# PYTEST_VERSION: '*'
# COVERAGE: 'true'
# py36_pip_openblas_32bit:
# PYTHON_VERSION: '3.6'
# PYTHON_ARCH: '32'
2 changes: 1 addition & 1 deletion build_tools/azure/install.sh
@@ -94,7 +94,7 @@ elif [[ "$DISTRIB" == "conda-pip-latest" ]]; then
python -m pip install -U pip
python -m pip install pytest==$PYTEST_VERSION pytest-cov

python -m pip install pandas matplotlib pyamg scikit-image
python -m pip install pandas matplotlib pyamg scikit-image jax jaxlib
# do not install dependencies for lightgbm since it requires scikit-learn
python -m pip install lightgbm --no-deps
elif [[ "$DISTRIB" == "conda-pip-scipy-dev" ]]; then
2 changes: 1 addition & 1 deletion build_tools/azure/test_script.sh
@@ -42,5 +42,5 @@ cp setup.cfg $TEST_DIR
cd $TEST_DIR

set -x
$TEST_CMD --pyargs sklearn
$TEST_CMD --pyargs sklearn.decomposition
set +x
6 changes: 5 additions & 1 deletion sklearn/_config.py
@@ -8,6 +8,7 @@
'working_memory': int(os.environ.get('SKLEARN_WORKING_MEMORY', 1024)),
'print_changed_only': True,
'display': 'text',
'enable_duck_array': False,
}


@@ -28,7 +29,8 @@ def get_config():


def set_config(assume_finite=None, working_memory=None,
print_changed_only=None, display=None):
print_changed_only=None, display=None,
enable_duck_array=None):
"""Set global scikit-learn configuration

.. versionadded:: 0.19
@@ -80,6 +82,8 @@ def set_config(assume_finite=None, working_memory=None,
_global_config['print_changed_only'] = print_changed_only
if display is not None:
_global_config['display'] = display
if enable_duck_array is not None:
_global_config['enable_duck_array'] = enable_duck_array


@contextmanager
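A minimal sketch of how the new flag is meant to be toggled and read on this branch, assuming config_context forwards its keyword arguments to set_config as in the existing _config.py (released scikit-learn versions do not accept enable_duck_array):

import sklearn

sklearn.set_config(enable_duck_array=True)             # process-wide opt-in
assert sklearn.get_config()["enable_duck_array"] is True
sklearn.set_config(enable_duck_array=False)

with sklearn.config_context(enable_duck_array=True):   # scoped opt-in, as in the new test
    assert sklearn.get_config()["enable_duck_array"] is True
assert sklearn.get_config()["enable_duck_array"] is False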
13 changes: 7 additions & 6 deletions sklearn/decomposition/_pca.py
@@ -14,7 +14,6 @@
import numbers

import numpy as np
from scipy import linalg
from scipy.special import gammaln
from scipy.sparse import issparse
from scipy.sparse.linalg import svds
@@ -26,6 +25,7 @@
from ..utils.extmath import stable_cumsum
from ..utils.validation import check_is_fitted
from ..utils.validation import _deprecate_positional_args
from ..utils import _get_array_module


def _assess_dimension(spectrum, rank, n_samples):
@@ -396,6 +396,7 @@ def _fit(self, X):

X = self._validate_data(X, dtype=[np.float64, np.float32],
ensure_2d=True, copy=self.copy)
npx = _get_array_module(X)

# Handle n_components==None
if self.n_components is None:
Expand All @@ -420,14 +421,14 @@ def _fit(self, X):

# Call different fits for either full or truncated SVD
if self._fit_svd_solver == 'full':
return self._fit_full(X, n_components)
return self._fit_full(X, n_components, npx=npx)
elif self._fit_svd_solver in ['arpack', 'randomized']:
return self._fit_truncated(X, n_components, self._fit_svd_solver)
else:
raise ValueError("Unrecognized svd_solver='{0}'"
"".format(self._fit_svd_solver))

def _fit_full(self, X, n_components):
def _fit_full(self, X, n_components, npx=np):
"""Fit the model by computing full SVD on X"""
n_samples, n_features = X.shape

@@ -448,12 +449,12 @@ def _fit_full(self, X, n_components):
% (n_components, type(n_components)))

# Center data
self.mean_ = np.mean(X, axis=0)
self.mean_ = npx.mean(X, axis=0)
X -= self.mean_

U, S, Vt = linalg.svd(X, full_matrices=False)
U, S, Vt = npx.linalg.svd(X, full_matrices=False)
# flip eigenvectors' sign to enforce deterministic output
U, Vt = svd_flip(U, Vt)
U, Vt = svd_flip(U, Vt, npx=npx)

components_ = Vt

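The helper _get_array_module imported above from ..utils is not part of this diff. Below is a rough stand-in for what it might do, under the assumption that dispatch is gated on the new enable_duck_array flag and resolved from the input's type; the module-name heuristic is illustrative, not the PR's implementation (svd_flip likewise gains an npx keyword elsewhere in the branch):

import importlib

import numpy as np

from sklearn._config import get_config


def _get_array_module(X):
    """Return a numpy-like namespace (numpy, cupy, jax.numpy, ...) for X."""
    if not get_config().get("enable_duck_array", False):
        return np  # duck-array dispatch is opt-in
    mod_name = type(X).__module__
    if mod_name.startswith("cupy"):
        return importlib.import_module("cupy")
    if mod_name.startswith(("jax", "jaxlib")):
        return importlib.import_module("jax.numpy")
    return np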
28 changes: 28 additions & 0 deletions sklearn/decomposition/tests/test_pca.py
@@ -3,6 +3,7 @@

import pytest

import sklearn
from sklearn.utils._testing import assert_allclose

from sklearn import datasets
@@ -638,3 +639,30 @@ def test_assess_dimesion_rank_one():
assert np.isfinite(_assess_dimension(s, rank=1, n_samples=n_samples))
for rank in range(2, n_features):
assert _assess_dimension(s, rank, n_samples) == -np.inf


# XXX: it should be possible to support 'randomized' by adding npx=np
# in appropriate locations. The 'arpack' svd_solver, on the other-hand,
# cannot easily be adapted to work on non-numpy allocated arrays.
# @pytest.mark.parametrize('svd_solver', ["full", "randomized", "auto"])
@pytest.mark.parametrize('svd_solver', ["full"])
@pytest.mark.parametrize('copy', [True, False])
def test_pca_jax_data(svd_solver, copy):
jnp = pytest.importorskip("jax.numpy")
X_np = np.random.RandomState(42).randn(1000, 100)
X_np = X_np.astype(np.float32)
X_jnp = jnp.asarray(X_np)

pca_np = PCA(n_components=3, svd_solver=svd_solver, copy=copy,
random_state=0)
X_pca_np = pca_np.fit_transform(X_np)

with sklearn.config_context(enable_duck_array=True):
pca_jnp = PCA(**pca_np.get_params())
X_pca_jnp = pca_jnp.fit_transform(X_jnp)

assert isinstance(X_pca_jnp, type(X_jnp))
assert isinstance(pca_jnp.components_, type(X_jnp))

assert_allclose(X_pca_np, X_pca_jnp, atol=1e-3)
assert_allclose(pca_np.components_, pca_jnp.components_, atol=1e-3)
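
A hypothetical CuPy counterpart of the JAX test above, in the same spirit but not part of this diff; it assumes the same enable_duck_array code path accepts cupy.ndarray inputs and uses cp.asnumpy to move results back to the host before comparison (imports repeated for completeness, matching the module-level imports of test_pca.py):

import numpy as np
import pytest

import sklearn
from sklearn.decomposition import PCA
from sklearn.utils._testing import assert_allclose


@pytest.mark.parametrize('svd_solver', ["full"])
def test_pca_cupy_data(svd_solver):
    cp = pytest.importorskip("cupy")
    X_np = np.random.RandomState(42).randn(1000, 100).astype(np.float32)
    X_cp = cp.asarray(X_np)

    pca_np = PCA(n_components=3, svd_solver=svd_solver, random_state=0)
    X_pca_np = pca_np.fit_transform(X_np)

    with sklearn.config_context(enable_duck_array=True):
        pca_cp = PCA(**pca_np.get_params())
        X_pca_cp = pca_cp.fit_transform(X_cp)

    # results stay on the GPU; compare on host
    assert isinstance(X_pca_cp, cp.ndarray)
    assert_allclose(X_pca_np, cp.asnumpy(X_pca_cp), atol=1e-3)
    assert_allclose(pca_np.components_, cp.asnumpy(pca_cp.components_), atol=1e-3)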