Description
Follow up on #12424:
We don't have a built-in dataset with missing values and/or mixed feature types. I think it would be good to have one, so that we have non-synthetic examples for the column transformer and the imputation strategies.
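For context, this is roughly the kind of example such a dataset would feed: a `ColumnTransformer` that imputes and scales numeric columns and imputes and one-hot encodes categorical ones. The four-row frame below is a hand-made stand-in (column names loosely modeled on Titanic, values invented), which is exactly the synthetic data a built-in dataset would replace. This is a minimal sketch, not an existing scikit-learn example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame: numeric and categorical columns, both with
# missing entries. Values are made up for illustration only.
X = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0],
    "fare": [7.25, 71.28, np.nan, 26.55],
    "embarked": ["S", "C", np.nan, "S"],
    "sex": ["male", "female", "female", np.nan],
})

numeric = ["age", "fare"]
categorical = ["embarked", "sex"]

preprocess = ColumnTransformer(
    [
        # Numeric columns: fill NaNs with the median, then standardize.
        ("num", make_pipeline(SimpleImputer(strategy="median"),
                              StandardScaler()), numeric),
        # Categorical columns: fill NaNs with the most frequent value,
        # then one-hot encode.
        ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                              OneHotEncoder(handle_unknown="ignore")),
         categorical),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

Xt = preprocess.fit_transform(X)
print(Xt.shape)
```

With a real built-in dataset, only the `X = ...` part would change; the preprocessing code stays the same.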
While I'd like to reduce the number of custom fetchers and use fetch_openml as much as possible, I think there's a benefit to having a built-in dataset so that examples can run without an internet connection.
I'm not sure what good candidates would be. Titanic is somewhat obvious, though I'm not sure about its missingness patterns. Adult would be nice but is probably too large to ship (4 MB; it would double the size of the wheel, which seems unreasonable).
Maybe the Ames housing data would be appropriate; I'm not sure whether it has missing values.