@thomasjpfan
Member

Reference Issues/PRs

Fixes #28174

What does this implement/fix? Explain your changes.

This PR implements the solution proposed in #28174 (comment).

There is no additional overhead because the pandas check only runs if pandas is in sys.modules, and pandas is in sys.modules only if the user has already imported it.

@thomasjpfan added the To backport label Jan 20, 2024
@thomasjpfan added this to the 1.4.1 milestone Jan 20, 2024
@github-actions

github-actions bot commented Jan 20, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5170c39. Link to the linter CI: here

@glemaitre
Member

Do you recall why we were ducktyping in the first place?

@thomasjpfan
Member Author

> Do you recall why we were ducktyping in the first place?

It was an evolution of how we detect dataframes. It started with hasattr(X, "iloc"), then it became:

def _is_pandas_df(X):
    if hasattr(X, "iloc"):
        try:
            import pandas as pd
            return isinstance(X, pd.DataFrame)
        except ImportError:
            return False
    return False

But importing pandas added overhead when pandas was installed but not otherwise used. We then shifted to getting the pandas module from sys.modules, which has no import overhead, but the ducktyping remained.

larger than the number of non-duplicate samples.
:pr:`28165` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| Pandas and Polars dataframe are validated directly without ducktyping checks.
Member

I would not consider it a fix :) but rather an enhancement :)

Member

@glemaitre left a comment

LGTM. Thanks @thomasjpfan

@thomasjpfan changed the title from "FIX Checks pandas and polars directly" to "ENH Checks pandas and polars directly" Jan 20, 2024
Contributor

@OmarManzoor left a comment

LGTM. Thanks @thomasjpfan

@OmarManzoor merged commit b4754ba into scikit-learn:main Jan 22, 2024
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 13, 2024
Labels

module:utils, To backport

Development

Successfully merging this pull request may close these issues.

sklearn 1.4 breaks using astropy tables with KFold.split