GridSearchCV with PCA returns object masked array #28350

Closed
MarcoGorelli opened this issue Feb 2, 2024 · 1 comment · Fixed by #28352

@MarcoGorelli
Contributor

MarcoGorelli commented Feb 2, 2024

Describe the bug

I noticed this while looking into #28345

The dtype of the param_n_components column in cv_results_ is object, which means that any pandas object created from it also has dtype object.

Steps/Code to Reproduce

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV

# Grid-search over an integer-only parameter
pca = PCA()
X_digits, y_digits = datasets.load_digits(return_X_y=True)
param_grid = {"n_components": [5, 15]}
search = GridSearchCV(pca, param_grid)
search.fit(X_digits, y_digits)

# Inspect the dtype of the underlying data of the masked parameter column
print(search.cv_results_['param_n_components'].data.dtype)

Expected Results

int64

Actual Results

object
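
The result above can be reproduced with NumPy alone. The following is a minimal sketch, assuming the parameter column is pre-allocated as a fully masked array with dtype=object and then filled candidate by candidate; that allocation strategy is an assumption about the internals and is not confirmed in this thread. The names n_candidates, values and col are illustrative only.

```python
import numpy as np

n_candidates = 2
values = [5, 15]  # the grid values from the reproducer

# Assumption: the column starts out fully masked and object-typed.
col = np.ma.MaskedArray(np.empty(n_candidates), mask=True, dtype=object)
for i, v in enumerate(values):
    col[i] = v  # assigning a value also unmasks that entry

print(col.data.dtype)  # object

# Possible user-side workaround: cast explicitly once the values are known to be ints.
as_int = col.astype(np.int64)
print(as_int.data.dtype)  # int64
```

The explicit cast only works when every unmasked entry is actually an integer; it would raise for grids that mix integers with strings or None.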

Versions

System:
    python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
executable: /home/marcogorelli/tmp/.venv/bin/python3.10
   machine: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.4.0
          pip: 22.0.2
   setuptools: 59.6.0
        numpy: 1.26.3
        scipy: 1.11.4
       Cython: None
       pandas: 2.2.0
   matplotlib: 3.8.2
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libopenblas
       filepath: /home/marcogorelli/tmp/.venv/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: SkylakeX

       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libgomp
       filepath: /home/marcogorelli/tmp/.venv/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libopenblas
       filepath: /home/marcogorelli/tmp/.venv/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: SkylakeX
MarcoGorelli added the Bug and Needs Triage labels Feb 2, 2024
adrinjalali removed the Needs Triage label Feb 2, 2024
@adrinjalali
Member

Hmm, we were just talking the other day about improving cv_results_ and the tools around it.

But you're right: the dtype of n_components inside PCA is int, yet here we get object.
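
One way the mismatch described here could be avoided is to derive the column dtype from the parameter values themselves and fall back to object only for mixed or unknown types. The sketch below illustrates that idea with plain NumPy. Both infer_param_dtype and build_param_column are hypothetical helpers, not scikit-learn functions, and this is not necessarily how #28352 resolved the issue.

```python
import numpy as np


def infer_param_dtype(values):
    """Hypothetical helper: pick a numeric dtype only when the values agree on it."""
    kinds = {type(v) for v in values}
    if not kinds:
        return np.dtype(object)
    if kinds == {bool}:
        return np.dtype(bool)
    if kinds == {int}:
        return np.dtype(np.int64)
    if kinds <= {int, float}:
        return np.dtype(np.float64)
    # Strings, None, estimators, mixed types: keep the generic container.
    return np.dtype(object)


def build_param_column(values_by_candidate, n_candidates):
    """Hypothetical helper: masked column; candidates without a value stay masked."""
    dtype = infer_param_dtype(list(values_by_candidate.values()))
    col = np.ma.MaskedArray(np.empty(n_candidates, dtype=dtype), mask=True)
    for idx, value in values_by_candidate.items():
        col[idx] = value  # assignment unmasks the entry
    return col


ints = build_param_column({0: 5, 1: 15}, n_candidates=2)
mixed = build_param_column({0: 5, 1: "auto"}, n_candidates=2)
partial = build_param_column({0: 0.5}, n_candidates=2)

print(ints.dtype, mixed.dtype, partial.dtype)
```

Checking the Python types explicitly, rather than letting NumPy coerce the list, avoids silently turning a mixed grid such as [5, "auto"] into an array of strings.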
