Add built-in dataset with missing values and categorical data? #12433

@amueller

Description

Follow up on #12424:
We don't have a built-in dataset with missing values and/or mixed feature types. I think it would be good to have one so that the examples for the column transformer and the imputation strategies can use non-synthetic data.
While I'd like to reduce the number of custom fetchers and use fetch_openml as much as possible, I think there's a benefit to having a built-in dataset so that examples can run without an internet connection.
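
To make the use case concrete, here is a minimal sketch of the kind of example such a dataset would support: a `ColumnTransformer` that imputes and scales numeric columns and imputes and one-hot encodes a categorical column. The column names and values are hand-written placeholders (assumptions for illustration, not taken from any real dataset); a built-in dataset would replace the toy frame.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for a real dataset: column names and values are made up.
X = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0],
    "fare": [7.25, 71.3, np.nan, 26.55],
    "embarked": ["S", "C", np.nan, "S"],
})

# Numeric columns: fill missing values with the median, then standardize.
numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

# Categorical columns: fill missing values with the most frequent category,
# then one-hot encode.
categorical = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"),
)

preprocess = ColumnTransformer([
    ("num", numeric, ["age", "fare"]),
    ("cat", categorical, ["embarked"]),
])

# Two scaled numeric columns plus one indicator column per category.
Xt = preprocess.fit_transform(X)
```

With real data, only the frame and the column lists would change; the preprocessing structure stays the same.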

Not sure what good candidates would be. Titanic is somewhat obvious, though I'm not sure about the missingness patterns there. Adult would be nice but might be too large to ship (4 MB, which would double the size of the wheel, so that seems unreasonable).

Maybe the Ames housing data would be appropriate; not sure if it has missing values.
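
One way to vet a candidate is to tabulate per-column dtype and missing fraction before deciding. The sketch below assumes the candidate is already loaded as a pandas DataFrame; `summarize_missingness` is a hypothetical helper, and the small frame is made-up data used only to show the output shape.

```python
import numpy as np
import pandas as pd


def summarize_missingness(df):
    """Return per-column dtype, missing count and missing fraction,
    sorted so the columns with the most missing values come first."""
    missing = df.isna()
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": missing.sum(),
        "frac_missing": missing.mean(),
    })
    return summary.sort_values("frac_missing", ascending=False)


# Made-up placeholder data, not values from any real dataset.
candidate = pd.DataFrame({
    "area": [8450.0, 9600.0, np.nan, 9550.0],
    "street_type": [np.nan, np.nan, "gravel", np.nan],
    "year": [2003, 1976, 2001, 1915],
})

summary = summarize_missingness(candidate)
print(summary)
```

A real candidate could be loaded with `fetch_openml(..., as_frame=True)` and its `.frame` attribute passed to the same helper, though that step needs an internet connection.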
