GridSearchCV fails when parameters are arrays with different sizes #29277

Closed
@Gabriel-Kissin

Describe the bug

SplineTransformer accepts arrays for the knots argument to specify the positions of the knots.

Using GridSearchCV to find the best positions fails if the candidate knots arrays have different sizes (i.e. if they correspond to different n_knots). This appears to be because the code attempts to coerce the candidate parameter values into a single array, which fails because the resulting shape is inhomogeneous.
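The suspected failure mode can be illustrated with NumPy alone. The following is a minimal sketch, not the sklearn code itself: the names `knots_list`, `common_dtype`, `stacking_error` and `as_objects` are made up for illustration, and the object-array fill at the end is only one way to hold ragged candidates, not necessarily what sklearn does or should do.

```python
import numpy as np

# Knot arrays of shape (n_knots, 1) for three different n_knots values,
# built the same way as in the reproduction script.
knots_list = [np.linspace(0, 2 * np.pi, n).reshape(-1, 1) for n in (10, 15, 20)]

# All candidates share a float dtype, so a float container looks viable...
common_dtype = np.result_type(*knots_list)

# ...but packing arrays of different lengths into one float array fails.
try:
    np.array(knots_list, dtype=common_dtype)
    stacking_error = None
except ValueError as exc:
    stacking_error = exc

# An object-dtype array filled element by element keeps each candidate intact.
as_objects = np.empty(len(knots_list), dtype=object)
for i, knots in enumerate(knots_list):
    as_objects[i] = knots

print(type(stacking_error).__name__)
print([k.shape for k in as_objects])
```

The point of the sketch is that the dtype of the candidates is homogeneous while their shapes are not, so a check based on dtype alone would not catch this case.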

Note on sklearn versions: this error occurs in sklearn 1.5.0. The earlier version 1.4.2 did not suffer from this issue.

Note 2: the issue would be avoided by searching over the n_knots parameter instead of the knots parameter. However, it is often important to specify the knot positions directly. For example, with periodic data, as in the provided example, the periodicity is defined by the first and last knots. In any case, there are presumably other places in sklearn where arrays of different shapes can be provided as parameters and where the same issue will occur.

Steps/Code to Reproduce

import numpy as np

import sklearn.pipeline
import sklearn.preprocessing
import sklearn.model_selection
import sklearn.linear_model

import matplotlib.pyplot as plt

x = np.linspace(-np.pi*2,np.pi*5,1000)
y_true = np.sin(x)
y_train = y_true[(0<x) & (x<np.pi*2)]

x_train = x[(0<x) & (x<np.pi*2)]
y_train_noise = y_train + np.random.normal(size=y_train.shape, scale=0.5)

x = x.reshape((-1,1))
x_train = x_train.reshape((-1,1))

spline_reg_pipe = sklearn.pipeline.make_pipeline(
            sklearn.preprocessing.SplineTransformer(extrapolation="periodic"), 
            sklearn.linear_model.LinearRegression(fit_intercept=False)
            )

spline_reg_pipe_cv = sklearn.model_selection.GridSearchCV(
    estimator=spline_reg_pipe,
    param_grid={
        # 'splinetransformer__degree' : [3,4,5],
        'splinetransformer__knots'  : [np.linspace(0,np.pi*2,n_knots).reshape((-1,1)) 
                                       for n_knots in range(10,21,5)],
    },
    verbose=1
)

spline_reg_pipe_cv.fit(X=x_train, y=y_train_noise)

plt.scatter(x_train, y_train_noise, s=1, label='noisy data')
plt.plot(x, y_true, label='truth')
plt.plot(x, spline_reg_pipe_cv.predict(x), label='predictions')
plt.legend()
plt.show()

Expected Results

This is sample output from earlier versions of sklearn:
[image: plot of the noisy data, the truth curve, and the predictions]

Actual Results

Error:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.

Full traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[46], line 26
     11 spline_reg_pipe = sklearn.pipeline.make_pipeline(
     12             sklearn.preprocessing.SplineTransformer(extrapolation="periodic"), 
     13             sklearn.linear_model.LinearRegression(fit_intercept=False)
     14             )
     16 spline_reg_pipe_cv = sklearn.model_selection.GridSearchCV(
     17     estimator=spline_reg_pipe,
     18     param_grid={
   (...)
     23     verbose=1
     24 )
---> 26 spline_reg_pipe_cv.fit(X=x_train, y=y_train_noise)
     28 plt.scatter(x_train, y_train_noise, s=1, label='noisy data')
     29 plt.plot(x, y_true, label='truth')

File ~/Library/Python/3.12/lib/python/site-packages/sklearn/base.py:1473, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1466     estimator._validate_params()
   1468 with config_context(
   1469     skip_parameter_validation=(
   1470         prefer_skip_nested_validation or global_skip_validation
   1471     )
   1472 ):
-> 1473     return fit_method(estimator, *args, **kwargs)

File ~/Library/Python/3.12/lib/python/site-packages/sklearn/model_selection/_search.py:968, in BaseSearchCV.fit(self, X, y, **params)
    962     results = self._format_results(
    963         all_candidate_params, n_splits, all_out, all_more_results
    964     )
    966     return results
--> 968 self._run_search(evaluate_candidates)
    970 # multimetric is determined here because in the case of a callable
    971 # self.scoring the return type is only known after calling
    972 first_test_score = all_out[0]["test_scores"]

File ~/Library/Python/3.12/lib/python/site-packages/sklearn/model_selection/_search.py:1543, in GridSearchCV._run_search(self, evaluate_candidates)
   1541 def _run_search(self, evaluate_candidates):
   1542     """Search all candidates in param_grid"""
-> 1543     evaluate_candidates(ParameterGrid(self.param_grid))

File ~/Library/Python/3.12/lib/python/site-packages/sklearn/model_selection/_search.py:962, in BaseSearchCV.fit.<locals>.evaluate_candidates(candidate_params, cv, more_results)
    959         all_more_results[key].extend(value)
    961 nonlocal results
--> 962 results = self._format_results(
    963     all_candidate_params, n_splits, all_out, all_more_results
    964 )
    966 return results

File ~/Library/Python/3.12/lib/python/site-packages/sklearn/model_selection/_search.py:1098, in BaseSearchCV._format_results(self, candidate_params, n_splits, out, more_results)
   1094     arr_dtype = object
   1095 if len(param_list) == n_candidates and arr_dtype != object:
   1096     # Exclude `object` else the numpy constructor might infer a list of
   1097     # tuples to be a 2d array.
-> 1098     results[key] = MaskedArray(param_list, mask=False, dtype=arr_dtype)
   1099 else:
   1100     # Use one MaskedArray and mask all the places where the param is not
   1101     # applicable for that candidate (which may not contain all the params).
   1102     ma = MaskedArray(np.empty(n_candidates), mask=True, dtype=arr_dtype)

File ~/Library/Python/3.12/lib/python/site-packages/numpy/ma/core.py:2820, in MaskedArray.__new__(cls, data, mask, dtype, copy, subok, ndmin, fill_value, keep_mask, hard_mask, shrink, order)
   2811 """
   2812 Create a new masked array from scratch.
   2813 
   (...)
   2817 
   2818 """
   2819 # Process data.
-> 2820 _data = np.array(data, dtype=dtype, copy=copy,
   2821                  order=order, subok=True, ndmin=ndmin)
   2822 _baseclass = getattr(data, '_baseclass', type(_data))
   2823 # Check that we're not erasing the mask.

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.

Versions

System:
    python: 3.12.3 (v3.12.3:f6650f9ad7, Apr  9 2024, 08:18:47) [Clang 13.0.0 (clang-1300.0.29.30)]
executable: /usr/local/bin/python3
   machine: macOS-14.5-arm64-arm-64bit

Python dependencies:
      sklearn: 1.5.0
          pip: 24.0
   setuptools: 70.0.0
        numpy: 1.26.4
        scipy: 1.13.0
       Cython: 3.0.10
       pandas: 2.2.2
   matplotlib: 3.8.4
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 11
         prefix: libopenblas
       filepath: /Users/gabriel.kissin/Library/Python/3.12/lib/python/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: blas
   internal_api: openblas
    num_threads: 11
         prefix: libopenblas
       filepath: /Users/gabriel.kissin/Library/Python/3.12/lib/python/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.26.dev
threading_layer: pthreads
   architecture: neoversen1

       user_api: openmp
   internal_api: openmp
    num_threads: 11
         prefix: libomp
       filepath: /Users/gabriel.kissin/Library/Python/3.12/lib/python/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: openmp
   internal_api: openmp
    num_threads: 11
         prefix: libomp
       filepath: /Users/gabriel.kissin/Library/Python/3.12/lib/python/site-packages/xgboost/.dylibs/libomp.dylib
        version: None

       user_api: openmp
   internal_api: openmp
    num_threads: 11
         prefix: libomp
       filepath: /opt/homebrew/Cellar/libomp/18.1.7/lib/libomp.dylib
        version: None
