CI Use conda-forge for min-dependencies build and add polars and pandas #29502

lesteve · 2024-07-16T13:40:30Z

Reference Issues/PRs

As noticed in #29490 (comment), we currently don't have any CI build with numpy 1.19 or numpy 1.20. The issue was caught in doc-min-dependencies because it actually uses our real minimum supported numpy version.

What does this implement/fix? Explain your changes.

On top of using conda-forge to be able to use our min dependencies, this adds polars and pandas to our min-dependencies build. This would make it easier to notice issues like #29490.

github-actions · 2024-07-16T13:41:57Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e6e9357. Link to the linter CI: here

OmarManzoor

LGTM. Thanks @lesteve. The CI seems to be failing though. Won't doing this require fixing other errors just because of an older version?

lesteve · 2024-07-16T15:19:05Z

The CI seems to be failing though.

Yep, I guess this is the old saying "if it's not tested then it's probably broken" striking again 😉

There are 3 failures (see build log):

FAILED compose/tests/test_column_transformer.py::test_column_transformer_column_renaming[polars] - ValueError: expected 3 values when selecting columns by boolean mask, got 0
FAILED compose/tests/test_column_transformer.py::test_column_transformer_error_with_duplicated_columns[polars] - AssertionError: Regex pattern did not match.
FAILED preprocessing/tests/test_function_transformer.py::test_function_transformer_overwrite_column_names[pandas-polars] - ValueError: Length mismatch: Expected axis has 3 elements, new values have ...

It is probably worth having a closer look to estimate the work needed. The failures seem to be mostly related to polars, and I would say numpy<=1.21 + polars is rather unlikely to happen in reality.

A reasonable compromise may be to bump our minimum NumPy version now since, you know, we are not really testing the oldest versions very thoroughly.

Another reasonable compromise (but maybe more controversial?) would be to do nothing, pretend we did not notice this, and rely only on doc-min-dependencies to catch the worst bugs.

lesteve · 2024-07-17T06:15:38Z

So after a bit of investigation I settled on:

- adding pandas and polars to the min-dependencies CI build.
- fixing an issue with pandas < 1.4. For some reason, when you create a pandas DataFrame from a polars DataFrame in pandas < 1.4, it adds an additional column ... Edit: what actually happens is that the data is transposed.
- moving the minimum supported polars to 0.20.30, which fixed the other issues. polars 0.20.30 was released on May 26, 2024; the original minimum supported version was 0.20.23, released roughly one month earlier (April 28, 2024). My feeling is that polars is moving so fast that scikit-learn users who care about polars support will be on recent versions anyway. cc @lorentzenchr who may have an informed opinion about this.
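
The version thresholds mentioned above (pandas < 1.4, polars 0.20.30) depend on comparing release numbers numerically rather than as strings. Below is a minimal sketch of such a check, assuming plain dotted release numbers; the helper names `parse_release` and `meets_minimum` are hypothetical and are not scikit-learn functions. In practice a real version parser such as `packaging.version.Version` is the safer choice, since it also orders pre-releases correctly.

```python
def parse_release(version):
    """Turn '0.20.30' into (0, 20, 30), padded to at least three components.

    Anything after the leading digits of a component (e.g. 'rc1') is ignored,
    so this is only a rough illustration, not a full version parser.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed release is at least the minimum release."""
    return parse_release(installed) >= parse_release(minimum)


MIN_POLARS = "0.20.30"  # minimum polars version proposed in the comment above

# Comparing raw strings would get this wrong: "0.20.4" > "0.20.30" as text.
print(meets_minimum("0.20.4", MIN_POLARS))
print(meets_minimum("0.20.30", MIN_POLARS))
```

The tuple comparison is what makes 0.20.4 sort below 0.20.30, which a plain string comparison would not do.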

lesteve · 2024-07-17T09:48:57Z

The codecov red status is because there is no CI build without pandas and with coverage enabled. Adding pandas to the min-dependencies build removes the last build without pandas and with coverage enabled. I think this is still OKish.

The CI failure on the previous commit (where I made a mistake in the "no pandas installed" code) shows that we still have CI builds without pandas, mainly the Windows pymin_conda_forge_mkl and Ubuntu Atlas builds (see build log).

betatim · 2024-07-17T13:48:55Z

sklearn/utils/fixes.py

+
+    def _create_pandas_dataframe_from_non_pandas_container(X, *, index, copy):
+        X_output = pd.DataFrame(X, index=index, copy=copy)
+        if "polars" not in str(X.__class__):


Is it good enough to test for "polars" in the class string, or should we check the more specific prefix?

Suggested change

if "polars" not in str(X.__class__):

if str(X.__class__).startswith("polars."):

lesteve

Fair question: originally I wanted to avoid importing polars to check isinstance(X, pl.DataFrame); I don't really remember why, to be fully honest 🤔. There is also sklearn.utils.validation._is_polars_df, although this will likely cause a circular import: sklearn.utils.validation imports stuff from sklearn.utils.fixes, so it's not OK to import sklearn.utils.validation in sklearn.utils.fixes.

Let me try to think a bit more about this.

I pushed some better code in fdf23ed
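
The trade-off in this thread, detecting a polars object without importing polars, can be sketched as follows. This is a minimal illustration and not the code pushed in fdf23ed; the helper names `looks_like_polars` and `is_polars_dataframe` are hypothetical. One detail worth noting: `str(X.__class__)` has the form `<class 'pkg.module.Name'>`, so a `startswith("polars.")` test on that string never matches, whereas `type(X).__module__` holds the bare module path.

```python
import sys


def looks_like_polars(obj):
    """Return True if obj's class is defined inside the polars package.

    Only the module name of the class is inspected, so polars itself is
    never imported by this check.
    """
    module = type(obj).__module__
    return module == "polars" or module.startswith("polars.")


def is_polars_dataframe(obj):
    """Stricter isinstance check that runs only if polars is already imported.

    If polars has never been imported in this process, obj cannot be a
    polars DataFrame, so returning False is safe.
    """
    pl = sys.modules.get("polars")
    return pl is not None and isinstance(obj, pl.DataFrame)


# A plain list is not a polars object under either check.
print(looks_like_polars([1, 2, 3]))
print(is_polars_dataframe([1, 2, 3]))
```

The module-name check is stricter than a substring test such as `"polars" in str(X.__class__)`, which would also match unrelated classes whose module path merely contains the word "polars".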

betatim · 2024-07-17T13:53:44Z

I think bumping the polars version is Ok. It is not an official dependency of scikit-learn, so we don't have to be as conservative.

With this PR we remove the only CI setup where we use the defaults channel of conda. I personally haven't used the defaults channel in a very, very long time, but I suspect that if we asked more "normal" data science users we would find some/many who do. Should we keep at least one defaults channel CI setup, perhaps not with the minimum versions we require but with the lowest ones currently available on that channel?

lesteve · 2024-07-17T14:14:00Z

With this PR we remove the only CI setup where we use the defaults channel of conda

What makes you think this? There are plenty (OK, actually 3 after the changes in this PR) of CI builds still using the "defaults" channel. Edit: thinking about it, 2 are pip-based so they don't really use the conda packages from the defaults channel, but there is still the no-OpenMP one, which is on macOS (do we strongly want to keep a build with conda packages from defaults for Linux?).

The command I use:

❯ rg defaults build_tools/**/*environment.yml
build_tools/azure/pylatest_conda_mkl_no_openmp_environment.yml
5:  - defaults

build_tools/azure/pylatest_pip_openblas_pandas_environment.yml
5:  - defaults

build_tools/azure/pylatest_pip_scipy_dev_environment.yml
5:  - defaults

(IMO there should be one CI build using defaults and all the other using conda-forge, but that's a different discussion 😉)
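
For readers without ripgrep, the search above can be approximated in Python. This is a rough sketch under stated assumptions: the function name `environments_using_channel` is hypothetical, and it matches any YAML list item that is exactly `- defaults`, which is stricter than the substring match `rg defaults` performs and does not verify that the item sits under a `channels:` key.

```python
from pathlib import Path


def environments_using_channel(root, channel="defaults"):
    """Return sorted paths of *environment.yml files under root listing channel.

    A file counts as a match if any line, once stripped of surrounding
    whitespace, is exactly "- <channel>".
    """
    matches = []
    for path in sorted(Path(root).rglob("*environment.yml")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip() == f"- {channel}":
                matches.append(path)
                break  # one hit per file is enough
    return matches


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for env_file in environments_using_channel("build_tools"):
        print(env_file)
```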

ogrisel

LGTM. I am fine with switching this build to conda-forge as long as we still have one CI build using the defaults channel.

lesteve · 2024-07-30T10:09:13Z

Merging my own PR with two approvals

CI Use conda-forge for min-dependencies build and add polars and pandas

148d73a

github-actions bot added the Build / CI label Jul 16, 2024

[azure parallel] trigger CI

4698bd5

adrinjalali approved these changes Jul 16, 2024

lesteve mentioned this pull request Jul 16, 2024

Fix array api in mean_absolute_percentage_error for older versions #29490

Merged

lesteve added 3 commits July 16, 2024 16:05

Better name + clean-up

b1e86f2

[azure parallel] trigger CI

bd54517

[azure parallel] remove outdated comment

c1fe222

OmarManzoor previously approved these changes Jul 16, 2024

lesteve added 6 commits July 16, 2024 17:45

[azure parallel] no pandas no polars

912fde1

[azure parallel] Bump minimum supported polars and fix for pandas<1.4

89d43a4

[azure parallel] fix

669d906

[azure parallel] fix

b8677d9

Keep naming more consistent + move to utils.fixes

423de28

[azure parallel]

6eb7b00

[azure parallel] fix

1ea150e

betatim reviewed Jul 17, 2024

sklearn/utils/fixes.py (outdated)

betatim reviewed Jul 17, 2024

Update sklearn/utils/fixes.py

e283745

Co-authored-by: Tim Head <[email protected]>

lesteve added 4 commits July 17, 2024 16:33

[azure parallel] improve code

fdf23ed

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

a25dfd7

…nto conda-forge-min-deps-ci

[doc build] update lock-files as well as doc-min-dependencies lock-file

90364f2

[azure parallel] trigger CI

a4429c8

ogrisel approved these changes Jul 29, 2024

lesteve added 2 commits July 30, 2024 11:14

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

df0f5bb

…nto conda-forge-min-deps-ci

[doc build] [azure parallel]

e6e9357

lesteve merged commit cb35bd4 into scikit-learn:main Jul 30, 2024
30 of 32 checks passed

lesteve deleted the conda-forge-min-deps-ci branch July 30, 2024 10:09

MarcBresson pushed a commit to MarcBresson/scikit-learn that referenced this pull request Sep 2, 2024

CI Use conda-forge for min-dependencies build and add polars and pand…

7fb47dc

…as (scikit-learn#29502) Co-authored-by: Tim Head <[email protected]>

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 9, 2024

CI Use conda-forge for min-dependencies build and add polars and pand…

20d8321

…as (scikit-learn#29502) Co-authored-by: Tim Head <[email protected]>

glemaitre pushed a commit that referenced this pull request Sep 11, 2024

CI Use conda-forge for min-dependencies build and add polars and pand…

c680035

…as (#29502) Co-authored-by: Tim Head <[email protected]>
