
⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 05, 2024) ⚠️ #30315

Closed
scikit-learn-bot opened this issue Nov 21, 2024 · 8 comments · Fixed by #30393

@scikit-learn-bot
Contributor

scikit-learn-bot commented Nov 21, 2024

CI is still failing on Linux_Nightly.pylatest_pip_scipy_dev (Dec 05, 2024)

  • test_partial_dependence_binary_model_grid_resolution[features0-10-10]
@github-actions github-actions bot added the Needs Triage Issue requires triage label Nov 21, 2024
@glemaitre
Member

Apparently, we try to assign a float value generated from the grid into an integer column, and this will fail in pandas 3.0. We could change the test so that it does not trigger the issue.
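A minimal sketch of one possible workaround, assuming the partial-dependence code writes each grid value in place (with `.iloc`) into a copy of the input DataFrame. The helper `with_feature_set_to` and the column names are hypothetical, not scikit-learn code. Upcasting an integer column to float before the in-place write avoids the float-into-int64 assignment that recent pandas 2.x releases deprecate and the pandas 3.0 development version rejects.

```python
import numpy as np
import pandas as pd

# Toy data: an integer-coded numerical feature plus a float feature.
X = pd.DataFrame(
    {"age": np.array([20, 35, 50], dtype=np.int64), "income": [1.5, 2.0, 3.5]}
)

# A grid spanning the feature range; with 5 points it contains 27.5 and 42.5.
grid = np.linspace(X["age"].min(), X["age"].max(), num=5)


def with_feature_set_to(X, column, value):
    """Hypothetical helper: copy of X where every row has `column` set to `value`."""
    X_eval = X.copy()
    col_idx = X_eval.columns.get_loc(column)
    # Upcast integer columns first: writing 27.5 in place into an int64
    # column is what pandas 3.0 refuses with "Invalid value ... for dtype 'int64'".
    if pd.api.types.is_integer_dtype(X_eval[column].dtype):
        X_eval[column] = X_eval[column].astype(np.float64)
    X_eval.iloc[:, col_idx] = value  # in-place write into the (now float) column
    return X_eval


X_eval = with_feature_set_to(X, "age", grid[1])
print(X_eval["age"].dtype, X_eval["age"].tolist())  # float64 [27.5, 27.5, 27.5]
```

Silently upcasting is only one option; if the integers actually encode categories, evaluating the model at 27.5 is still questionable.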

However, we should revisit this use case: a user may pass integer values to represent categories, in which case sampling floating-point values does not make sense. So we might want to always raise an error stating that the feature should either be converted to float or declared as a categorical feature.

I have to look at it in more detail.
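The "always raise" alternative could look roughly like the sketch below. `check_grid_for_integer_feature` is a hypothetical helper, and its error wording paraphrases the comment above rather than quoting an actual scikit-learn message.

```python
import numpy as np


def check_grid_for_integer_feature(values, grid, feature_name):
    """Hypothetical validation: reject non-integral grid points for integer features.

    An integer column may encode categories, so evaluating the model at
    fractional values such as 0.41 is probably meaningless.
    """
    values = np.asarray(values)
    grid = np.asarray(grid, dtype=np.float64)
    # Only integer-typed features are checked; float features accept any grid.
    if np.issubdtype(values.dtype, np.integer) and not np.all(grid == np.round(grid)):
        raise ValueError(
            f"Feature {feature_name!r} has an integer dtype but the grid contains "
            "non-integral values. Convert the feature to float or declare it "
            "as a categorical feature."
        )
    return grid


ints = np.array([0, 1, 2, 3])
check_grid_for_integer_feature(ints, [0, 1, 2], "x")  # integral grid: accepted
try:
    check_grid_for_integer_feature(ints, [0.0, 0.41, 1.0], "x")
except ValueError as exc:
    print(exc)
```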

@glemaitre glemaitre added Bug and removed Needs Triage Issue requires triage labels Nov 21, 2024
@scikit-learn-bot scikit-learn-bot changed the title ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Nov 21, 2024) ⚠️ ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Nov 22, 2024) ⚠️ Nov 22, 2024
@scikit-learn-bot scikit-learn-bot changed the title ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Nov 22, 2024) ⚠️ ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Nov 23, 2024) ⚠️ Nov 23, 2024
@scikit-learn-bot
Contributor Author

scikit-learn-bot commented Nov 24, 2024

CI is no longer failing! ✅

Successful run on Dec 09, 2024

@scikit-learn-bot scikit-learn-bot changed the title ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Nov 23, 2024) ⚠️ ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Nov 28, 2024) ⚠️ Nov 28, 2024
@lesteve
Member

lesteve commented Nov 29, 2024

It seems weird that the failure does not appear to be deterministic...

  • it failed on November 21 (first observed failure), November 22, November 23, November 28
  • it passed on the other days

@scikit-learn-bot scikit-learn-bot changed the title ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Nov 28, 2024) ⚠️ ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Nov 30, 2024) ⚠️ Nov 30, 2024
@scikit-learn-bot scikit-learn-bot changed the title ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Nov 30, 2024) ⚠️ ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 01, 2024) ⚠️ Dec 1, 2024
@scikit-learn-bot scikit-learn-bot changed the title ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 01, 2024) ⚠️ ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 02, 2024) ⚠️ Dec 2, 2024
@lesteve
Member

lesteve commented Dec 2, 2024

The latest failure, in test_csr_polynomial_expansion_index_overflow, seems to be caused by a scipy change in main. Here is an excerpt from the wheel build log (assertion rewriting works there, so we get more information, because we are not using a meson editable install).

  __ test_csr_polynomial_expansion_index_overflow[csr_array-False-True-2-65535] __
  
  degree = 2, n_features = 65535, interaction_only = True, include_bias = False
  csr_container = <class 'scipy.sparse._csr.csr_array'>
  
      @pytest.mark.parametrize(
          "degree, n_features",
          [
              # Needs promotion to int64 when interaction_only=False
              (2, 65535),
              (3, 2344),
              # This guarantees that the intermediate operation when calculating
              # output columns would overflow a C-long, hence checks that python-
              # longs are being used.
              (2, int(np.sqrt(np.iinfo(np.int64).max) + 1)),
              (3, 65535),
              # This case tests the second clause of the overflow check which
              # takes into account the value of `n_features` itself.
              (2, int(np.sqrt(np.iinfo(np.int64).max))),
          ],
      )
      @pytest.mark.parametrize("interaction_only", [True, False])
      @pytest.mark.parametrize("include_bias", [True, False])
      @pytest.mark.parametrize("csr_container", CSR_CONTAINERS)
      def test_csr_polynomial_expansion_index_overflow(
          degree, n_features, interaction_only, include_bias, csr_container
      ):
          """Tests known edge-cases to the dtype promotion strategy and custom
          Cython code, including a current bug in the upstream
          `scipy.sparse.hstack`.
          """
          data = [1.0]
          row = [0]
          col = [n_features - 1]
      
          # First degree index
          expected_indices = [
              n_features - 1 + int(include_bias),
          ]
          # Second degree index
          expected_indices.append(n_features * (n_features + 1) // 2 + expected_indices[0])
          # Third degree index
          expected_indices.append(
              n_features * (n_features + 1) * (n_features + 2) // 6 + expected_indices[1]
          )
      
          X = csr_container((data, (row, col)))
          pf = PolynomialFeatures(
              interaction_only=interaction_only, include_bias=include_bias, degree=degree
          )
      
          # Calculate the number of combinations a-priori, and if needed check for
          # the correct ValueError and terminate the test early.
          num_combinations = pf._num_combinations(
              n_features=n_features,
              min_degree=0,
              max_degree=degree,
              interaction_only=pf.interaction_only,
              include_bias=pf.include_bias,
          )
          if num_combinations > np.iinfo(np.intp).max:
              msg = (
                  r"The output that would result from the current configuration would have"
                  r" \d* features which is too large to be indexed"
              )
              with pytest.raises(ValueError, match=msg):
                  pf.fit(X)
              return
      
          # In SciPy < 1.8, a bug occurs when an intermediate matrix in
          # `to_stack` in `hstack` fits within int32 however would require int64 when
          # combined with all previous matrices in `to_stack`.
          if sp_version < parse_version("1.8.0"):
              has_bug = False
              max_int32 = np.iinfo(np.int32).max
              cumulative_size = n_features + include_bias
              for deg in range(2, degree + 1):
                  max_indptr = _calc_total_nnz(X.indptr, interaction_only, deg)
                  max_indices = _calc_expanded_nnz(n_features, interaction_only, deg) - 1
                  cumulative_size += max_indices + 1
                  needs_int64 = max(max_indices, max_indptr) > max_int32
                  has_bug |= not needs_int64 and cumulative_size > max_int32
              if has_bug:
                  msg = r"In scipy versions `<1.8.0`, the function `scipy.sparse.hstack`"
                  with pytest.raises(ValueError, match=msg):
                      X_trans = pf.fit_transform(X)
                  return
      
          # When `n_features>=65535`, `scipy.sparse.hstack` may not use the right
          # dtype for representing indices and indptr if `n_features` is still
          # small enough so that each block matrix's indices and indptr arrays
          # can be represented with `np.int32`. We test `n_features==65535`
          # since it is guaranteed to run into this bug.
          if (
              sp_version < parse_version("1.9.2")
              and n_features == 65535
              and degree == 2
              and not interaction_only
          ):  # pragma: no cover
              msg = r"In scipy versions `<1.9.2`, the function `scipy.sparse.hstack`"
              with pytest.raises(ValueError, match=msg):
                  X_trans = pf.fit_transform(X)
              return
          X_trans = pf.fit_transform(X)
      
          expected_dtype = np.int64 if num_combinations > np.iinfo(np.int32).max else np.int32
          # Terms higher than first degree
          non_bias_terms = 1 + (degree - 1) * int(not interaction_only)
          expected_nnz = int(include_bias) + non_bias_terms
          assert X_trans.dtype == X.dtype
          assert X_trans.shape == (1, pf.n_output_features_)
  >       assert X_trans.indptr.dtype == X_trans.indices.dtype == expected_dtype
  E       AssertionError: assert dtype('int64') == <class 'numpy.int32'>
  E        +  where dtype('int64') = array([65534]).dtype
  E        +    where array([65534]) = <Compressed Sparse Row sparse array of dtype 'float64'\n	with 1 stored elements and shape (1, 2147450880)>.indices
  
  ../venv/lib/python3.13t/site-packages/sklearn/preprocessing/tests/test_polynomial.py:1132: AssertionError
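For the failing parametrization (`degree=2`, `n_features=65535`, `interaction_only=True`, `include_bias=False`), the output width shown in the log can be checked by hand, assuming the usual counting rule for `interaction_only=True` without a bias column: the 65535 linear terms plus one column per pair of distinct features.

```python
from math import comb

INT32_MAX = 2**31 - 1

n_features = 65535
# Linear terms plus one interaction column per unordered pair of distinct features.
n_output_features = n_features + comb(n_features, 2)

print(n_output_features)               # 2147450880
print(n_output_features <= INT32_MAX)  # True
print(INT32_MAX - n_output_features)   # 32767
```

The width is only 32767 below the int32 maximum, so the test's `expected_dtype` is `np.int32`, while scipy main returned `int64` indices.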

@ogrisel
Member

ogrisel commented Dec 2, 2024

This is probably a consequence of scipy/scipy#21824. I need to read more carefully, but I think this is not a bug, and we have to adapt our test.
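If the test is adapted, one option is to stop pinning int32 where it would merely suffice and only require int64 once the output width exceeds the int32 range, mirroring the `expected_dtype` logic in the test. `check_sparse_index_dtypes` is a hypothetical helper, not the change that was actually merged.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max


def check_sparse_index_dtypes(indptr, indices, n_output_features):
    """Hypothetical relaxed check for the index arrays of a CSR result.

    int64 is required once the column count no longer fits in int32;
    otherwise either int32 or int64 is accepted, so the assertion no longer
    depends on which index dtype a given scipy version chooses.
    """
    assert indptr.dtype == indices.dtype
    if n_output_features > INT32_MAX:
        assert indices.dtype == np.int64
    else:
        assert indices.dtype in (np.int32, np.int64)


# The case from the log: int64 indices for a width that would fit in int32.
check_sparse_index_dtypes(
    np.array([0, 1], dtype=np.int64), np.array([65534], dtype=np.int64), 2147450880
)
```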

@scikit-learn-bot scikit-learn-bot changed the title ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 02, 2024) ⚠️ ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 03, 2024) ⚠️ Dec 3, 2024
@scikit-learn-bot scikit-learn-bot changed the title ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 03, 2024) ⚠️ ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 04, 2024) ⚠️ Dec 4, 2024
@lesteve
Member

lesteve commented Dec 4, 2024

I can reproduce the test_partial_dependence_binary_model_grid_resolution[features0-10-10] failure, but it is not deterministic: with pytest-repeat, running the test 100 times makes it fail roughly 10 times.

 pytest sklearn/inspection/tests/test_partial_dependence.py -k 'test_partial_dependence_binary_model_grid_resolution and 10-10' --count 100
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-16-100] - TypeError: Invalid value '0.41000000000000014' for dtype 'int64'
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-18-100] - TypeError: Invalid value '0.41000000000000014' for dtype 'int64'
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-30-100] - TypeError: Invalid value '0.9544444444444449' for dtype 'int64'
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-37-100] - TypeError: Invalid value '0.41000000000000014' for dtype 'int64'
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-47-100] - TypeError: Invalid value '0.41000000000000014' for dtype 'int64'
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-52-100] - TypeError: Invalid value '0.41000000000000014' for dtype 'int64'
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-67-100] - TypeError: Invalid value '0.9544444444444449' for dtype 'int64'
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-74-100] - TypeError: Invalid value '0.8888888888888888' for dtype 'int64'
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-84-100] - TypeError: Invalid value '0.9544444444444449' for dtype 'int64'
FAILED sklearn/inspection/tests/test_partial_dependence.py::test_partial_dependence_binary_model_grid_resolution[features0-10-10-85-100] - TypeError: Invalid value '0.41000000000000014' for dtype 'int64'

@ogrisel ogrisel reopened this Dec 4, 2024
@ogrisel
Member

ogrisel commented Dec 4, 2024

@lesteve I have found a fix for test_partial_dependence_binary_model_grid_resolution. It's a real bug. I will open a PR.

@scikit-learn-bot scikit-learn-bot changed the title ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 04, 2024) ⚠️ ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Dec 05, 2024) ⚠️ Dec 5, 2024
@lesteve
Member

lesteve commented Dec 9, 2024

This has been fixed by #30409.


4 participants